<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: LLMWare</title>
    <description>The latest articles on Forem by LLMWare (@llmware).</description>
    <link>https://forem.com/llmware</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8208%2F4bf5768d-460d-460b-9ccc-a80499ca040e.png</url>
      <title>Forem: LLMWare</title>
      <link>https://forem.com/llmware</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/llmware"/>
    <language>en</language>
    <item>
      <title>How to Create a Local Chatbot Without Coding in Less Than 10 Minutes on AI PCs</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Wed, 02 Jul 2025 04:10:07 +0000</pubDate>
      <link>https://forem.com/llmware/how-to-create-a-local-chatbot-without-coding-in-less-than-10-minutes-on-ai-pcs-2ajl</link>
      <guid>https://forem.com/llmware/how-to-create-a-local-chatbot-without-coding-in-less-than-10-minutes-on-ai-pcs-2ajl</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;🔖 &lt;em&gt;No cloud. No internet. No coding.&lt;/em&gt; &lt;br&gt;
🔖 &lt;em&gt;Just you, your laptop, and 100+ powerful AI models running locally.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Imagine building your own chatbot that can answer your questions, summarize documents, analyze images, and even understand tables, all without needing an internet connection.&lt;/p&gt;

&lt;p&gt;Sounds futuristic?&lt;/p&gt;

&lt;p&gt;Thanks to &lt;strong&gt;Model HQ&lt;/strong&gt;, this is now a reality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model HQ&lt;/strong&gt;, developed by &lt;a href="http://llmware.ai" rel="noopener noreferrer"&gt;LLMWare&lt;/a&gt;, is an innovative application that allows you to create and run a chatbot locally on your PC or laptop &lt;strong&gt;without an internet connection&lt;/strong&gt;. Best of all, this can be done with &lt;strong&gt;NO CODE&lt;/strong&gt; in &lt;strong&gt;less than 10 minutes&lt;/strong&gt;, even on laptops up to 5 years old, provided they have 16GB or more of RAM.&lt;/p&gt;

&lt;p&gt;In this guide, we’ll walk you through how to create your own local chatbot using &lt;strong&gt;Model HQ&lt;/strong&gt;, a revolutionary AI desktop app by &lt;a href="https://llmware.ai" rel="noopener noreferrer"&gt;LLMWare.ai&lt;/a&gt;. Whether you’re a student, a developer, or a professional looking for a private and offline AI assistant, this tool puts the power of cutting-edge AI models &lt;strong&gt;directly on your laptop&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let’s break it down.&lt;/p&gt;

&lt;p&gt;If you want to know about &lt;strong&gt;Model HQ in detail&lt;/strong&gt;, then read the blog below:&lt;br&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/llmware/how-to-run-ai-models-privately-on-your-ai-pc-with-model-hq-no-cloud-no-code-3o9k" class="crayons-story__hidden-navigation-link"&gt;How to Run AI Models Privately on Your AI PC with Model HQ; No Cloud, No Code&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;
          &lt;a class="crayons-logo crayons-logo--l" href="/llmware"&gt;
            &lt;img alt="LLMWare logo" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8208%2F4bf5768d-460d-460b-9ccc-a80499ca040e.png" class="crayons-logo__image"&gt;
          &lt;/a&gt;

          &lt;a href="/rohan_sharma" class="crayons-avatar  crayons-avatar--s absolute -right-2 -bottom-2 border-solid border-2 border-base-inverted  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1936949%2Fa1fd5434-8c99-4531-9491-2d117d2e6996.jpg" alt="rohan_sharma profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/rohan_sharma" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Rohan Sharma
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Rohan Sharma
                &lt;a href="/++"&gt;&lt;img alt="Subscriber" class="subscription-icon" src="https://assets.dev.to/assets/subscription-icon-805dfa7ac7dd660f07ed8d654877270825b07a92a03841aa99a1093bd00431b2.png"&gt;&lt;/a&gt;
              
              &lt;div id="story-author-preview-content-2629400" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/rohan_sharma" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1936949%2Fa1fd5434-8c99-4531-9491-2d117d2e6996.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Rohan Sharma&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

            &lt;span&gt;
              &lt;span class="crayons-story__tertiary fw-normal"&gt; for &lt;/span&gt;&lt;a href="/llmware" class="crayons-story__secondary fw-medium"&gt;LLMWare&lt;/a&gt;
            &lt;/span&gt;
          &lt;/div&gt;
          &lt;a href="https://dev.to/llmware/how-to-run-ai-models-privately-on-your-ai-pc-with-model-hq-no-cloud-no-code-3o9k" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 27 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/llmware/how-to-run-ai-models-privately-on-your-ai-pc-with-model-hq-no-cloud-no-code-3o9k" id="article-link-2629400"&gt;
          How to Run AI Models Privately on Your AI PC with Model HQ; No Cloud, No Code
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag crayons-tag--filled  " href="/t/showdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;showdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/security"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;security&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/nocode"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;nocode&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/llmware/how-to-run-ai-models-privately-on-your-ai-pc-with-model-hq-no-cloud-no-code-3o9k" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;80&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/llmware/how-to-run-ai-models-privately-on-your-ai-pc-with-model-hq-no-cloud-no-code-3o9k#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              17&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;Step 1: Download Model HQ&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model HQ&lt;/strong&gt; is an AI desktop application that allows you to interact with &lt;strong&gt;100+ top-performing AI models&lt;/strong&gt;, including large ones with up to &lt;strong&gt;32 billion parameters&lt;/strong&gt; — all running &lt;strong&gt;locally on your PC&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Unlike cloud-based tools, there’s &lt;strong&gt;no internet required&lt;/strong&gt;, and your data never leaves your machine. That means &lt;strong&gt;more privacy, better speed&lt;/strong&gt;, and zero cost for each query you run.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In this blog, we will be looking into the &lt;strong&gt;CHAT&lt;/strong&gt; feature of Model HQ that helps us to create a chatbot running locally on our machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;First, get the app.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://llmware-modelhq.checkoutpage.com/modelhq-client-app-for-windows" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;Download or Buy Model HQ for Windows&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not ready to buy? No problem.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://llmware.ai/enterprise#developers-waitlist" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;Join the 90-Day Free Developer Trial&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once installed, you’ll have access to an interface that feels like your own AI control panel.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;Step 2: Choosing the Right AI Model&lt;/h2&gt;

&lt;p&gt;Once installation is done, open the Model HQ application; you will then be prompted to choose a setup method. The setup guide is provided after buying the application.&lt;/p&gt;

&lt;p&gt;After this, you will land in the main menu. Now, click on the Chat button.&lt;/p&gt;

&lt;p&gt;You’ll be prompted to select an AI model. If you’re unsure which model to choose, you can click on “choose for me,” and the application will select a suitable model based on your needs. Model HQ ships with 100+ models.&lt;/p&gt;

&lt;h3&gt;Available Model Options:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Small Model&lt;/strong&gt;:&lt;br&gt;
~1–3 billion parameters — Fastest response time, suitable for basic chat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Medium Model&lt;/strong&gt;:&lt;br&gt;
~7–8 billion parameters — Balanced performance, ideal for chat, data analysis, and standard RAG tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Large Model&lt;/strong&gt;:&lt;br&gt;
~9–32 billion parameters — Most powerful chat and RAG; best for advanced and complex analytical workloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the way, Model HQ will pick a smart default based on your system and use case.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The size of the model you choose can significantly impact both speed and output quality. &lt;strong&gt;Smaller models are faster but may provide less detailed responses&lt;/strong&gt;. Follow this simple rule:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs5p0403z5nb9malw1g1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs5p0403z5nb9malw1g1.png" alt="table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;Step 3: Downloading Models&lt;/h2&gt;

&lt;p&gt;For demonstration purposes, we are selecting the &lt;strong&gt;Small Model&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
If no models have been downloaded previously (e.g., in the &lt;strong&gt;No Setup&lt;/strong&gt;, &lt;strong&gt;Fast Setup&lt;/strong&gt;, or &lt;strong&gt;Full Setup&lt;/strong&gt; paths), the selected model will begin downloading automatically.&lt;br&gt;&lt;br&gt;
This process typically takes &lt;strong&gt;2–7 minutes&lt;/strong&gt;, depending on the model you selected and your internet speed. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is only a &lt;strong&gt;one-time internet requirement&lt;/strong&gt;; once the models are downloaded, you don’t need internet anymore.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;Step 4: Start Chatting&lt;/h2&gt;

&lt;p&gt;Once you’ve selected a model, you can start a chat by typing in your questions. For example, you might ask a simple question like, “What are the top sites to see in Paris?” The model will generate a response based on its training data.&lt;/p&gt;
&lt;h3&gt;Customizing Your Chat Experience&lt;/h3&gt;

&lt;p&gt;Model HQ allows you to customize your chat experience further. You can adjust settings such as the maximum output length and the randomness of the responses (known as temperature). By default, the app is set to generate up to 1,000 tokens, which is usually sufficient for smaller models. Even if you’re using larger models, be cautious about increasing this limit, as it can consume more memory and take longer to generate responses. In short, you can adjust the following &lt;strong&gt;generation settings&lt;/strong&gt; (see the sketch after this list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Max Tokens&lt;/strong&gt;: How long should the response be?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Temperature&lt;/strong&gt;: Should the answer be creative or precise?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stop/Restart&lt;/strong&gt;: Hit ❌ to stop a long generation anytime.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
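
&lt;p&gt;Model HQ itself needs no code, but if you are curious, the same two knobs appear in LLMWare’s open-source Python library (used in LLMWare’s Fast Start tutorial series). A minimal sketch, assuming the &lt;code&gt;llmware&lt;/code&gt; package is installed; the model name is only an example, and the maximum output length is configured in the Model HQ UI, so it appears here only in a comment:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from llmware.prompts import Prompt

#   sketch only - Model HQ exposes these settings in its UI;
#   "bling-phi-3-gguf" is an example model name, not a Model HQ setting
prompter = Prompt().load_model("bling-phi-3-gguf")

#   lower temperature = more precise, higher = more creative
#   (max output length, ~1,000 tokens by default, is set in the app itself)
output = prompter.prompt_main("What are the top sites to see in Paris?",
                              temperature=0.30)

print(output["llm_response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;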

&lt;p&gt;     &lt;/p&gt;
&lt;h2&gt;Step 5: Integrating Sources for Enhanced Responses&lt;/h2&gt;

&lt;p&gt;One of the standout features of Model HQ is its ability to integrate sources, such as documents and images, into your chat. To do this, simply click on the “source” button and upload a file, such as a PDF or Word document.&lt;/p&gt;
&lt;h3&gt;Example: Using a Document as a Source&lt;/h3&gt;

&lt;p&gt;For instance, if you upload an executive employment agreement, you can ask specific questions about the clauses within the document. The model will reference the uploaded document to provide accurate answers. This feature is invaluable for fact-checking and ensuring that you have the right information at your fingertips.&lt;/p&gt;
&lt;h3&gt;Chatting with Images&lt;/h3&gt;

&lt;p&gt;Model HQ also allows you to chat with images. By uploading an image, the application can analyze the content and answer questions based on what it sees. This capability opens up a world of possibilities for multimedia processing, all done locally on your machine without any additional costs.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;Step 6: Saving and Downloading Results&lt;/h2&gt;

&lt;p&gt;After you’ve finished your session, you can save the chat results for future reference. This is particularly useful if you need to compile information for reports or presentations. Simply download the results, and you’ll have everything you need at your fingertips.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;Step 7: Exploring Advanced Features&lt;/h2&gt;

&lt;p&gt;As you become more comfortable with Model HQ, you can explore its advanced features. For example, you can experiment with different models to see how they perform with various types of queries. You can also adjust the generation settings to fine-tune the responses based on your specific needs.&lt;/p&gt;

&lt;p&gt;If you’re a visual learner, then watch this YouTube walkthrough:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/6z3kyUpsGys"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;Future Updates and Community Engagement&lt;/h3&gt;

&lt;p&gt;Stay engaged with the Model HQ community by following their updates and tutorials on platforms like YouTube. The &lt;a href="https://youtube.com/playlist?list=PL1-dn33KwsmBiKZDobr9QT-4xI8bNJvIU&amp;amp;si=dLdhu0kMQWwgBwTE" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;Model HQ YouTube playlist&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; offers valuable insights and tips to help you maximize your experience with the application.&lt;/p&gt;

&lt;p&gt;Join &lt;a href="https://discord.gg/bphreFK4NJ" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;LLMWare’s Official Discord Server&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; to interact with LLMWare’s community of users and to share any questions or feedback.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;Why This Matters&lt;/h3&gt;

&lt;p&gt;Most AI apps require you to upload data to a cloud server. That’s slow, often expensive, and puts your privacy at risk.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;Model HQ&lt;/strong&gt;, everything runs on your own machine with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;✅ No internet needed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✅ No Coding Required&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✅ No API keys or credits&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✅ No data leaves your PC&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✅ Zero cost per query&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s &lt;strong&gt;your personal AI lab&lt;/strong&gt;, fully private and offline.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;Conclusion: Get Started with Model HQ Today!&lt;/h2&gt;

&lt;p&gt;Creating a chatbot that runs locally, without coding or an internet connection, has never been easier. With Model HQ, you have access to a powerful AI tool that can enhance your productivity and streamline your workflow. &lt;/p&gt;

&lt;p&gt;Ready to experience the future of AI? Visit the &lt;a href="https://llmware.ai" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;LLMWare website&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; to learn more about Model HQ and its features. Don’t forget to sign up for the &lt;a href="https://llmware.ai/enterprise#developers-waitlist" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;90-day free trial for developers here&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; and explore the application firsthand. When you’re ready to make the leap, you can &lt;a href="https://llmware-modelhq.checkoutpage.com/modelhq-client-app-for-windows" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;purchase Model HQ directly here&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unlock the full potential of AI on your PC or laptop with Model HQ today, and take the first step towards creating your very own local chatbot!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nocode</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How to Run AI Models Privately on Your AI PC with Model HQ; No Cloud, No Code</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Fri, 27 Jun 2025 04:20:58 +0000</pubDate>
      <link>https://forem.com/llmware/how-to-run-ai-models-privately-on-your-ai-pc-with-model-hq-no-cloud-no-code-3o9k</link>
      <guid>https://forem.com/llmware/how-to-run-ai-models-privately-on-your-ai-pc-with-model-hq-no-cloud-no-code-3o9k</guid>
      <description>&lt;p&gt;In an era where efficiency and data privacy are paramount, &lt;strong&gt;Model HQ by&lt;/strong&gt; &lt;a href="https://llmware.ai" rel="noopener noreferrer"&gt;&lt;strong&gt;LLMWare&lt;/strong&gt;&lt;/a&gt; emerges as a game-changer for professionals and enthusiasts alike. Built by LLMWare, Model HQ is a groundbreaking desktop application that transforms your own PC or laptop into a fully private, high-performance AI workstation.&lt;/p&gt;

&lt;p&gt;Most AI tools rely on the cloud. &lt;strong&gt;Model HQ&lt;/strong&gt; doesn’t.&lt;/p&gt;

&lt;p&gt;No more cloud latency. No more vendor lock-in. Just &lt;strong&gt;100+ cutting-edge AI models&lt;/strong&gt;, blazing-fast document search, and natural language tools, all running &lt;strong&gt;locally&lt;/strong&gt; on your machine.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;What is Model HQ?&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/Dbxb5qfsMaM"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model HQ&lt;/strong&gt; is a powerful, no-code desktop application that enables users to run enterprise-grade AI workflows &lt;strong&gt;locally&lt;/strong&gt;, &lt;strong&gt;securely&lt;/strong&gt;, and &lt;strong&gt;at scale,&lt;/strong&gt; right from their own PC or laptop. Designed for simplicity and performance, it provides point-and-click access to &lt;strong&gt;100+ state-of-the-art AI models&lt;/strong&gt;, ranging from &lt;strong&gt;1B to 32B parameters&lt;/strong&gt;, with built-in optimization for AI PCs and Intel hardware. Whether you’re building AI applications, analyzing documents, or querying data, Model HQ automatically adapts to your device’s specs to ensure &lt;strong&gt;fast, efficient inferencing,&lt;/strong&gt; even for large models that traditionally struggle on standard formats.&lt;/p&gt;

&lt;p&gt;What truly sets Model HQ apart is its &lt;strong&gt;privacy-first, offline capability&lt;/strong&gt;. Once models are downloaded, they can be used without Wi-Fi, keeping &lt;strong&gt;your data and sensitive information 100% on-device&lt;/strong&gt;. This makes it the fastest and most secure way to explore and deploy powerful AI tools without depending on the cloud or external APIs. From developers and researchers to enterprise teams, Model HQ delivers a &lt;strong&gt;seamless, cost-effective, and private AI experience&lt;/strong&gt;, all in one sleek, local platform.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;What Can Model HQ Do?&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febpj6vk0qze2o2myp8qu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febpj6vk0qze2o2myp8qu.png" alt="model hq"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Chat:&lt;/strong&gt;&lt;br&gt;
The Chat feature gives users a fast way to start experimenting with chat models of various sizes, from Small (1–3 billion parameters) and Medium (7–8 billion parameters) to Large (9–32 billion parameters).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Small Model&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
~1–3 billion parameters — Fastest response time, suitable for basic chat.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Medium Model&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
~7–8 billion parameters — Balanced performance, ideal for chat, data analysis and standard RAG tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Large Model&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
~9–32 billion parameters — Most powerful chat and RAG; best for advanced and complex analytical workloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://youtu.be/6z3kyUpsGys?si=4kYvkPEBUJN81nT6" rel="noopener noreferrer"&gt;Watch Chat in Action&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;2. Agents&lt;/strong&gt;&lt;br&gt;
Agents in Model HQ are pre-configured or custom-built workflows that automate complex tasks using local AI models. They allow users to process files, extract insights, or perform multi-step operations, &lt;strong&gt;all with point-and-click simplicity&lt;/strong&gt; and no coding required.&lt;/p&gt;

&lt;p&gt;Users can &lt;strong&gt;build new agents from scratch&lt;/strong&gt;, &lt;strong&gt;load existing ones&lt;/strong&gt; (either from built-in templates or previously created workflows), and manage them through a simple dropdown interface. From editing or deleting agents to running &lt;strong&gt;batch operations&lt;/strong&gt; on multiple documents, the Agent system provides a flexible way to scale private, on-device AI workflows. Pre-created agents include powerful tools like &lt;strong&gt;Contract Analyzer&lt;/strong&gt;, &lt;strong&gt;Customer Support Bot&lt;/strong&gt;, &lt;strong&gt;Financial Data Extractor&lt;/strong&gt;, &lt;strong&gt;Image Tagger&lt;/strong&gt;, and more — each designed to handle specific tasks efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/UTNQxspDi3I?si=yOaPilNSEqY1xLFy" rel="noopener noreferrer"&gt;Watch Agents in Action&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;3. Bots&lt;/strong&gt;&lt;br&gt;
The Bots feature allows users to create their own custom Chat and RAG bots seamlessly for either the AI PC/edge device use case (Fast Start Chatbot and Model HQ Biz Bot) or via API deployment (Model HQ API Server Biz Bot).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/uy53WKrMOXc?si=TAaS_hYj0AddXu2R" rel="noopener noreferrer"&gt;Watch Bots in Action&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;4. RAG&lt;/strong&gt;&lt;br&gt;
RAG combines retrieval-based techniques with generative AI to allow models to answer questions more accurately by retrieving relevant information from external sources or documents. With RAG in Model HQ, you can create knowledge bases that you can query in the chat section or via a custom bot by uploading documents. The RAG section is used only to create the knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/FSjpAgIZnPM?si=5kMR_sXH_pCyNLvg" rel="noopener noreferrer"&gt;Watch Rag in Action&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;5. Models&lt;/strong&gt;&lt;br&gt;
The Models section allows you to explore, manage, and test models within Model HQ. You can discover new models, manage downloaded models, review inference history, and run benchmark tests, all from a single interface.&lt;/p&gt;

&lt;p&gt;And all of this can be done while keeping your &lt;strong&gt;data private, your workflows offline, and your AI performance fully optimized for your device&lt;/strong&gt; — no internet, no cloud, and no compromise. With its powerful features and user-friendly interface, Model HQ empowers you to leverage AI technology without compromising on security. Experience the future of AI today and transform the way you work!&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;
&lt;h2&gt;System Requirements&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nqbehtpmqis291asyjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nqbehtpmqis291asyjc.png" alt="sys_req"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;
&lt;h3&gt;Experience Model HQ Risk-Free&lt;/h3&gt;

&lt;p&gt;We understand that trying new software can be a leap of faith. That’s why we’re offering a &lt;a href="https://llmware.ai/enterprise#developers-waitlist" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;90-day free trial for developers&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;. Experience the full capabilities of Model HQ without any commitment. Sign up for the trial here and discover how it can transform your workflow.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;
&lt;h3&gt;A Powerful Collaboration with Intel&lt;/h3&gt;

&lt;p&gt;LLMWare.ai has partnered with Intel to optimize Model HQ for peak performance on your devices. This collaboration ensures that you receive a reliable and efficient AI experience, making your tasks smoother and more productive. Learn more about this exciting partnership &lt;a href="https://llmware.ai/intel" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;here&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Read the Intel Solution Brief here:&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.intel.com/content/www/us/en/content-details/854280/local-ai-no-code-more-secure-with-ai-pcs-and-the-private-cloud.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.intel.com%2Fetc.clientlibs%2Fsettings%2Fwcm%2Fdesigns%2Fintel%2Fus%2Fen%2Fimages%2Fresources%2Fprintlogo.png" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.intel.com/content/www/us/en/content-details/854280/local-ai-no-code-more-secure-with-ai-pcs-and-the-private-cloud.html" rel="noopener noreferrer" class="c-link"&gt;
            Local AI—No Code, More Secure with AI PCs and the Private Cloud
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Bring secure, no-code GenAI to your enterprise with Intel® AI PCs and LLMWare’s Model HQ—run agents and RAG queries locally without exposing data or incurring cloud costs.
In this brief, learn how to scale private AI simply and affordably.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.intel.com%2Fetc.clientlibs%2Fsettings%2Fwcm%2Fdesigns%2Fintel%2Fdefault%2Fresources%2Ffavicon-32x32.png"&gt;
          intel.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;Take the Next Step Towards AI Empowerment&lt;/h3&gt;

&lt;p&gt;Don’t miss the chance to elevate your productivity with Model HQ. Whether you’re a business professional, a developer, or a student, this application is designed to meet your needs and exceed your expectations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://llmware-modelhq.checkoutpage.com/modelhq-client-app-for-windows" rel="noopener noreferrer"&gt;&lt;strong&gt;Purchase Model HQ Today!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ready to unlock the full potential of AI on your PC or laptop? Buy Model HQ now using the link above and take the first step towards a smarter, more efficient future.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;Learn More About Model HQ&lt;/h3&gt;

&lt;p&gt;For additional information about Model HQ, including detailed features and user guides, &lt;a href="https://llmware.ai" rel="noopener noreferrer"&gt;&lt;em&gt;visit our website&lt;/em&gt;&lt;/a&gt;. Don’t forget to check out our introductory video and explore our &lt;a href="https://youtube.com/playlist?list=PL1-dn33KwsmBiKZDobr9QT-4xI8bNJvIU&amp;amp;si=dLdhu0kMQWwgBwTE" rel="noopener noreferrer"&gt;&lt;em&gt;YouTube playlist&lt;/em&gt;&lt;/a&gt; for tutorials and tips.&lt;/p&gt;

&lt;p&gt;Join &lt;a href="https://discord.gg/bphreFK4NJ" rel="noopener noreferrer"&gt;&lt;em&gt;LLMWare’s official Discord Server&lt;/em&gt;&lt;/a&gt; to interact with LLMWare's community of users and to share any questions or feedback.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model HQ&lt;/strong&gt; isn’t just another AI app; it’s a complete, offline-first platform built for &lt;strong&gt;speed, privacy, and control&lt;/strong&gt;. Whether you’re chatting with LLMs, building agents, analyzing documents, or deploying custom bots, everything runs &lt;strong&gt;securely on your own PC or laptop&lt;/strong&gt;. With support for models up to &lt;strong&gt;32B parameters&lt;/strong&gt;, RAG-enabled document search, natural language SQL, and no-code workflows, Model HQ brings enterprise-grade AI directly to your desktop, no cloud required.&lt;/p&gt;

&lt;p&gt;As the world moves toward AI-powered productivity, Model HQ ensures you’re ahead of the curve with a faster, safer, and smarter way to work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>nocode</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts &amp; Models</title>
      <dc:creator>Julia Zhou</dc:creator>
      <pubDate>Wed, 14 May 2025 12:05:49 +0000</pubDate>
      <link>https://forem.com/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-3-prompts-models-dd7</link>
      <guid>https://forem.com/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-3-prompts-models-dd7</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;It's been a few months since the last installment in this series, but a new year brings more LLMWare Fast Start to RAG examples! In the previous articles, we covered creating libraries and transforming that information into embeddings. Now that we have done the heavy lifting, so to speak, we are ready to begin writing prompts and getting responses. This will be the focus of today's article.&lt;/p&gt;

&lt;h2&gt;Extra resources&lt;/h2&gt;

&lt;p&gt;A few notes before we start! In case you missed them, I will link the previous articles in this series. This example will build upon &lt;a href="https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h"&gt;example 1&lt;/a&gt; and &lt;a href="https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-2-embeddings-2ppc"&gt;example 2&lt;/a&gt; and will assume prior understanding of these topics.  &lt;/p&gt;

&lt;p&gt;For visual learners, here is a video that works through example 3. Feel free to watch the video before following the steps in this article. Also, here is a Python Notebook that breaks down this example's code alongside the output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/cbed79d0c185ab2626a2c53fe20c262734d4e7f5/examples/Notebooks/fast_start_examples/example_3_prompts_and_models_version_1.ipynb" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Notebook for example 3: prompts and models&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/swiu4oBVfbA"&gt;
  &lt;/iframe&gt;
 &lt;/p&gt;
&lt;h2&gt;The code&lt;/h2&gt;

&lt;p&gt;Now, we are ready to take a look at the example's code! This LLMWare Fast Start example can be run in the same way as the previous ones, but instructions can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/README.md" rel="noopener noreferrer"&gt;README file&lt;/a&gt; if needed. Example 3 is directly copy-paste ready!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/rag/example-3-prompts_and_models.py" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Code for example 3: prompts and models&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;Part 1 - What are prompts?&lt;/h2&gt;

&lt;p&gt;While working through this example, I read an MIT Sloan Teaching &amp;amp; Learning Technologies article titled "Effective Prompts for AI: The Essentials". The entire article is definitely worth a read, but I wanted to share a quote that summarizes what &lt;strong&gt;prompts&lt;/strong&gt; are in the AI world. To read the whole article, check out &lt;a href="https://mitsloanedtech.mit.edu/ai/basics/effective-prompts/" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Prompts are your input into the AI system to obtain specific results. In other words, prompts are conversation starters: what and how you tell something to the AI for it to respond in a way that generates useful responses for you ... It’s like having a conversation with another person, only in this case the conversation is text-based, and your interlocutor is AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, the prompt you provide determines how the AI responds. To create the most effective prompts, use specific wording and consider providing context, including in the form of additional text paragraphs. &lt;/p&gt;

&lt;h2&gt;Part 2 - Which model should I use?&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;model catalog&lt;/strong&gt; is a list of all the models LLMWare has registered. Like a dictionary, each model in the catalog is automatically linked with configuration data and implementation classes for easy use. The goal of the catalog is exactly that: ease of use. Given only a model's name, if the model is present in the catalog, it can be loaded and run without any other information. &lt;/p&gt;

&lt;p&gt;The following lines of code provide lists of models included in the catalog. More information about the capabilities and performance of these models is included as comments in the Python code file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   all generative models
llm_models = ModelCatalog().list_generative_models()

#   if you only want to see the local models
llm_local_models = ModelCatalog().list_generative_local_models()

#   to see only the open source models
llm_open_source_models = ModelCatalog().list_open_source_models()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
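
&lt;p&gt;To see what these lists actually contain, you can print the model names. A quick sketch, under the assumption that each catalog entry is a dictionary with a &lt;code&gt;"model_name"&lt;/code&gt; key:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from llmware.models import ModelCatalog

llm_models = ModelCatalog().list_generative_models()

#   assumption: each catalog entry is a dict with a "model_name" key
for i, model_card in enumerate(llm_models):
    print(i, model_card["model_name"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;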



&lt;p&gt;The following line of code selects a model by index. To choose a different model, simply replace the index value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model_name = gguf_generative_models[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, we can choose a specific model by name. For those interested in exploring RAG through &lt;strong&gt;OpenAI&lt;/strong&gt;, all of the LLMWare examples are ready to use. In this particular example, uncomment the following lines and insert the necessary information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   model_name = "gpt-4"
#   os.environ["USER_MANAGED_OPENAI_API_KEY"] = "&amp;lt;insert-your-openai-key&amp;gt;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, these examples also encourage the use of &lt;strong&gt;open-source&lt;/strong&gt; models: locally deployed models that produce top-notch quality right on your laptop. The progress in open-source models over the past few years cannot be overstated. The future of AI is here in these small, specialized models optimized for a specific purpose. &lt;/p&gt;

&lt;p&gt;For example, LLMWare's &lt;strong&gt;Bling 1B&lt;/strong&gt; is a small, fast model fine-tuned for RAG that runs on your local machine. &lt;/p&gt;

&lt;p&gt;To learn more about LLMWare's &lt;strong&gt;Bling&lt;/strong&gt; and &lt;strong&gt;Dragon&lt;/strong&gt; models, consider visiting their &lt;a href="https://huggingface.co/models?other=llmware-rag" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; page!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExbjR3NmsxcnR0am1jN3BxNzdoM3hqMGF1ZnU1cTZmOXF5amIxOGU1cSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FHzPtbOKyBoBFsK4hyc%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="50%" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExbjR3NmsxcnR0am1jN3BxNzdoM3hqMGF1ZnU1cTZmOXF5amIxOGU1cSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FHzPtbOKyBoBFsK4hyc%2Fgiphy.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Part 3 - Main example script&lt;/h2&gt;

&lt;p&gt;Now, we can head to the main example script, &lt;code&gt;fast_start_prompting&lt;/code&gt;. We will follow four general steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pull sample questions&lt;/li&gt;
&lt;li&gt;Load the model&lt;/li&gt;
&lt;li&gt;Prompt the model&lt;/li&gt;
&lt;li&gt;Get results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The sample questions (each with query, answer, and context) are found at the top of the Python file. They cover a variety of fields with a little extra emphasis on business, financial, and legal applications. However, it is always encouraged to change these questions or add to them to better suit your interests and needs! All of the questions will be pulled in through the &lt;code&gt;test_list&lt;/code&gt;. &lt;/p&gt;
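
&lt;p&gt;For reference, each entry in &lt;code&gt;test_list&lt;/code&gt; follows a simple dictionary shape. Here is a sketch that reuses the invoice question whose output appears in Part 4; the context string is only a placeholder for the real passage in the file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   sketch of the structure each sample question follows
test_list = [
    {"query": "What is the total amount of the invoice?",
     "answer": "$22,500.00",                       # the "gold answer"
     "context": "(a passage of invoice text for the model to read)"}
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;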

&lt;p&gt;To use the model, we create a &lt;strong&gt;prompt object&lt;/strong&gt;. Prompts are how we talk to a model: we use them when we have a question and its context that we want to pass to the model to receive a response. This line of code loads the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompter = Prompt().load_model(model_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first time we load the model, it needs to "move" from the LLMWare Hugging Face repository to your local system, which can take a few minutes. However, once that is complete, all the work the model does will happen locally on your computer!&lt;/p&gt;

&lt;p&gt;Now, we loop through our list of questions. The key method &lt;code&gt;.prompt_main&lt;/code&gt; in the prompt class runs inference on the model. The only mandatory parameter for this method is the query. Optionally, &lt;code&gt;context&lt;/code&gt;, &lt;code&gt;prompt_name&lt;/code&gt;, and &lt;code&gt;temperature&lt;/code&gt; can also be passed in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;output = prompter.prompt_main(entries["query"],
                                      context=entries["context"],
                                      prompt_name="default_with_context",
                                      temperature=0.30)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;context&lt;/strong&gt; is a passage of information we want the model to read before answering the question. This allows us to explain what we want the model to consider in its answer, and it will answer based on the passage. &lt;/p&gt;

&lt;p&gt;The prompt catalog supports a range of &lt;strong&gt;prompt names&lt;/strong&gt;. The code uses &lt;code&gt;default_with_context&lt;/code&gt;, which tells the model to read the provided context and answer the question. &lt;/p&gt;

&lt;p&gt;Adjusting the temperature will change the results of the query. In general, a lower temperature yields more factual responses that relate directly to the context, while higher temperatures are more appropriate when we want a more creative response from the model. For RAG-based applications, we set the temperature comparatively low to yield the most consistency and quality. &lt;/p&gt;
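
&lt;p&gt;To make the trade-off concrete, here is a sketch that issues the same query at two temperatures; the 0.0 and 0.8 values are illustrative, not taken from the example file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#   same question, two temperatures - a sketch of the trade-off
factual = prompter.prompt_main(entries["query"], context=entries["context"],
                               prompt_name="default_with_context",
                               temperature=0.0)   # stays close to the context

creative = prompter.prompt_main(entries["query"], context=entries["context"],
                                prompt_name="default_with_context",
                                temperature=0.8)  # allows more variation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;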

&lt;p&gt;The &lt;code&gt;output&lt;/code&gt; is a dictionary with two keys: &lt;code&gt;llm_response&lt;/code&gt; and &lt;code&gt;usage&lt;/code&gt;. &lt;/p&gt;
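
&lt;p&gt;For example, you can unpack the result like this; the &lt;code&gt;usage&lt;/code&gt; keys below match the sample print-out shown in Part 4:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = output["llm_response"]
usage = output["usage"]

print("LLM Response:", response)
print("tokens - input: {} / output: {}".format(usage["input"], usage["output"]))
print("processing time:", usage["processing_time"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;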

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZWZ3cjk3ejJjZnd4Zzc1OGpsYTR2em5za3dyamY3cjluc2gxOHN5YyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2F8bE0EERrvXkq5S9BCa%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="50%" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZWZ3cjk3ejJjZnd4Zzc1OGpsYTR2em5za3dyamY3cjluc2gxOHN5YyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2F8bE0EERrvXkq5S9BCa%2Fgiphy.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Part 4 - Running the model&lt;/h2&gt;

&lt;p&gt;Once you run the code, you will see the queries being iterated through and printed out. Each of these print-outs has an &lt;strong&gt;LLM Response&lt;/strong&gt; and a &lt;strong&gt;Gold Answer&lt;/strong&gt;. The &lt;strong&gt;LLM Response&lt;/strong&gt; is the model's response while the &lt;strong&gt;Gold Answer&lt;/strong&gt; is an "answer key" we created that the model does not see. This allows us to quickly compare the two answers and check for the model's accuracy. &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;LLM Usage&lt;/strong&gt; line provides additional information about how the model formulated its response. In particular, you can see the "processing_time" for each query, which showcases the model's speed. Of course, the computer you run the models on will also affect speed: the amount of RAM available is especially impactful for efficiency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Query: What is the total amount of the invoice?
LLM Response: 22,500.00
Gold Answer: $22,500.00
LLM Usage: {'input': 209, 'output': 9, 'total': 218, 'metric': 'tokens', 'processing_time': 2.0669240951538086}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above output is a sample response. The LLM correctly responded to the query since its response matches the gold answer. &lt;/p&gt;

&lt;p&gt;We have successfully received answers to our questions! Congrats on reaching the end of this example. Here is a link to the full working code!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/cbed79d0c185ab2626a2c53fe20c262734d4e7f5/fast_start/rag/example-3-prompts_and_models.py" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;FULL CODE&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;Part 5 - Further exploration&lt;/h2&gt;

&lt;p&gt;To experiment more with this example, consider changing out the &lt;code&gt;model_name&lt;/code&gt; for other models! How does the LLMWare Bling model compare to the LLMWare Dragon model or OpenAI? Will these models generate the same response when provided the same queries and context? Once you try out these questions, let us know what you think!&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this example about prompts and models! The next example will be about &lt;strong&gt;RAG text query&lt;/strong&gt;; stay tuned for the article. &lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia4.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZHZnd2pleXYzeHYzMTdpeDZtdnoxbWp0c2h2YmVhNm0ycXUzaWU2byZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FAdX5dZF7LjigSMZZMm%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="50%" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia4.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExZHZnd2pleXYzeHYzMTdpeDZtdnoxbWp0c2h2YmVhNm0ycXUzaWU2byZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FAdX5dZF7LjigSMZZMm%2Fgiphy.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;To see more ...&lt;/h2&gt;

&lt;p&gt;Please join our LLMWare community on discord to learn more about RAG / LLMs and share your thoughts! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Visit LLMWare's Website&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.freepik.com/free-vector/personal-computer-screen-with-old-software-windows_36102472.htm#fromView=image_search_similar&amp;amp;page=1&amp;amp;position=49&amp;amp;uuid=6524a515-1b2e-441e-8993-357233bf186d&amp;amp;query=cute+coding" rel="noopener noreferrer"&gt;Image from Freepik&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>How I Learned Generative AI in Two Weeks (and You Can Too): Part 2 - Embeddings</title>
      <dc:creator>Julia Zhou</dc:creator>
      <pubDate>Fri, 11 Oct 2024 11:41:08 +0000</pubDate>
      <link>https://forem.com/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-2-embeddings-2ppc</link>
      <guid>https://forem.com/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-2-embeddings-2ppc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A few weeks ago, I shared my experience learning about Generative AI Libraries through LLMWare's Fast Start to RAG example 1. Today, I will continue this series by taking you through example 2. This is personally one of my favorite "lessons" in this LLMWare series, so I hope you will find it thought-provoking as well! This example will focus on &lt;strong&gt;embeddings and vectors&lt;/strong&gt;. Let us start by exploring what exactly these terms mean! &lt;/p&gt;

&lt;h2&gt;How do embedding models work?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Embedding models&lt;/strong&gt; are trained on large amounts of language tokens to either predict the next token or fill in missing tokens. In either case, these models learn how to represent language! They take in large chunks of text as input and process them through tokenization (breaking the text down into smaller pieces), conversion into numbers, and various layers of transformations. These steps build a representation of the input text that forms the output: vectors. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vectors&lt;/strong&gt; are created when the input text is translated into the language through which the model sees the world. Geometrically speaking, they are points in n-dimensional space, where "n" is the number of embedding dimensions (typically, n is 768). The dimensions are represented by n floats, usually ranging between 0 and 1 or between -1 and 1. Converting the text to numbers allows the model to more easily compare the similarity of two texts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/i1xNYj8xdUmR7z3CrH/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/i1xNYj8xdUmR7z3CrH/giphy.gif" width="480" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try thinking back to high school geometry! You might remember that two points (or shapes) that are close to each other are considered more similar to one another than two far away points. This process is exactly what the model performs to compare texts and is known as a &lt;strong&gt;semantic search&lt;/strong&gt;. Once a query is converted to a vector, that vector is compared to all the other vectors in the database. The ones that are the most similar are returned. &lt;/p&gt;
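
&lt;p&gt;Here is a toy sketch of the idea in plain Python; the three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions, and the exact distance metric depends on the vector database):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

#   toy 3-dimensional "embeddings" (real models typically use ~768 dimensions)
query_vec = [0.9, 0.1, 0.2]   # "incentive compensation"
close_vec = [0.8, 0.2, 0.1]   # a passage about bonus plans
far_vec   = [0.1, 0.9, 0.7]   # an unrelated passage

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean(query_vec, close_vec))   # small distance: similar meaning
print(euclidean(query_vec, far_vec))     # large distance: different meaning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;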

&lt;p&gt;Now, we are ready to take a look at the example's code! This LLMWare Fast Start example can be run in the same way as example 1, but instructions can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/README.md" rel="noopener noreferrer"&gt;README file&lt;/a&gt; if needed. Example 2 is directly copy-paste ready!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/rag/example-2-build_embeddings.py" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Example 2: Embeddings&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;Extra resources&lt;/h2&gt;

&lt;p&gt;In case you missed it, I will link my previous article in this series since this example will continue building on the foundation we built in example 1. The same process for creating libraries is utilized in example 2, so I will skip over it here. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h" class="ltag_cta ltag_cta--branded"&gt;Article - Example 1: Libraries&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;For visual learners, here is a video that works through example 2. Feel free to watch the video before following the steps in this article. Also, here is a Python Notebook that breaks down this example's code alongside the output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/fast_start_examples/example_2_build_embeddings_version_1.ipynb" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Example 2 Notebook&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/2xDefZ4oBOM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;Part 1 - Creating embeddings &amp;amp; storing vectors&lt;/h2&gt;

&lt;p&gt;As mentioned above, we will not cover the library building process in this article and will move directly into embedding models. For this demo, we will use the "mini-lm-sbert" model, which is efficient and is included in the default LLMWare package. Feel free to experiment with different models, including the OpenAI Text Embedding Ada!&lt;/p&gt;

&lt;p&gt;Recall that in example 1, we not only created our library but also added our documents into a database. This database will make it extremely convenient to access text chunks that we can give to the embedding model. &lt;/p&gt;

&lt;p&gt;Once the library has been created, let us focus our attention on the most important line of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library.install_new_embedding(embedding_model_name=embedding_model, vector_db=vector_db,batch_size=100)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line calls the &lt;code&gt;install_new_embedding&lt;/code&gt; function and passes in the embedding model and vector database names as parameters. The final parameter, &lt;code&gt;batch_size&lt;/code&gt;, determines how many text chunks will be processed at a time. Considerations like efficiency, memory, model capability, and database size all factor into choosing the most appropriate batch size. &lt;/p&gt;

&lt;p&gt;We can confirm that our embedding creation and vector storage was a success!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update = Status().get_embedding_status(library_name, embedding_model)
print("update: Embeddings Complete - Status() check at end of embedding - ", update)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Part 2 - Queries
&lt;/h2&gt;

&lt;p&gt;Now that we have the vector database, we can begin running queries on it! We will create a very simple query, pass it to a Query object built on the library, and run a &lt;strong&gt;semantic query&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_query = "incentive compensation"
query_results = Query(library).semantic_query(sample_query, result_count=20)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use the following portion of code to iterate through the query results and view them, paying special attention to the &lt;code&gt;distance&lt;/code&gt; field.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i, entries in enumerate(query_results):
  text = entries["text"]
  document_source = entries["file_source"]
  page_num = entries["page_num"]
  vector_distance = entries["distance"]

  if len(text) &amp;gt; 125: text = text[0:125] + " ... "

  print("\nupdate: query results - {} - document - {} - page num - {} distance - {} ".format(i, document_source, page_num, vector_distance))

  print("update: text sample - ", text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let us run the example to see the results in action!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/nFLW7PNGgN3lI68rdv/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/nFLW7PNGgN3lI68rdv/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3 - The results
&lt;/h2&gt;

&lt;p&gt;Through the output, we can see that at first, we have no embeddings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embedding record - before embedding  [{'embedding_status': 'no', 'embedding_model': 'none', 'embedding_db': 'none', 'embedded_blocks': 0, 'embedding_dims': 0, 'time_stamp': 'NA'}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, a series of outputs shows that we are creating embeddings in batches of 100, as expected. By the end, all of the text chunks will have been converted to vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update: Embeddings Complete - Status() check at end of embedding -  [{'_id': 2, 'key': 'example2_library_embedding_mini-lm-sbert', 'summary': '2211 of 2211 blocks', 'start_time': '1717690179.087806', 'end_time': '1717690199.5373614', 'total': 2211, 'current': 2211, 'units': 'blocks'}]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we have arrived back at the query-results for-loop mentioned above. Looking at the first result, we can see that one of the many metadata fields returned is &lt;code&gt;distance&lt;/code&gt;: the distance between the vector for our query ("incentive compensation") and the vector for this sample block.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update: query results - 0 - document - Artemis Poseidon EXECUTIVE EMPLOYMENT AGREEMENT.pdf - page num - 4 distance - 0.24837934970855713 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query results are sorted from lowest to highest distance - that is, from most to least similar. For comparison, we can see that the result at index 10 (the eleventh returned) has a higher distance than the first one!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;update: query results - 10 - document - Eileithyia EXECUTIVE EMPLOYMENT AGREEMENT.pdf - page num - 3 distance - 0.27305811643600464 
update: text sample -  in Employer's annual cash incentive   bonus plan (the “Plan”), based on the same terms and conditions as in existence for oth ... 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
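
&lt;p&gt;If you want to convince yourself of this ordering, a quick sanity-check sketch over the loop's &lt;code&gt;query_results&lt;/code&gt; might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# sanity check sketch: distances should be non-decreasing across the results
distances = [entries["distance"] for entries in query_results]
assert distances == sorted(distances), "results are not ordered by distance"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;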



&lt;h2&gt;
  
  
  Part 4 - Further exploration
&lt;/h2&gt;

&lt;p&gt;For this example, we used the "faiss" vector database, but I encourage you to experiment with others as well. &lt;/p&gt;

&lt;p&gt;Similarly, try using different embedding models to see how their characteristics might be optimized for certain types of inputs! A series of examples involving embeddings can be found on the LLMWare GitHub page.&lt;/p&gt;
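
&lt;p&gt;Swapping these choices is just a matter of changing the parameters to &lt;code&gt;install_new_embedding&lt;/code&gt;. Here is a sketch - the model and database names are illustrative, so check the llmware model catalog and supported vector databases before using them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# sketch: re-run the embedding step with a different model and vector database
# ("industry-bert-contracts" and "milvus" are illustrative names - verify first)
library.install_new_embedding(embedding_model_name="industry-bert-contracts",
                              vector_db="milvus",
                              batch_size=100)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;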

&lt;p&gt;&lt;a href="https://github.com/llmware%20ai/llmware/tree/main/examples/Embedding" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Embeddings Examples&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this example about embeddings and vectors! The next example will be about &lt;strong&gt;prompts and models&lt;/strong&gt; - stay tuned for the article. &lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;h2&gt;
  
  
  To see more ...
&lt;/h2&gt;

&lt;p&gt;Please join our LLMWare community on Discord to learn more about RAG and LLMs! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit LLMWare's Website&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.freepik.com/free-photo/view-adorable-3d-cat_45138549.htm#fromView=search&amp;amp;page=1&amp;amp;position=1&amp;amp;uuid=c7c3603a-a846-4ddf-8a71-b0346612cef6" rel="noopener noreferrer"&gt;Image from Freepik&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>How I Learned Generative AI in Two Weeks (and You Can Too): Part 1 - Libraries</title>
      <dc:creator>Julia Zhou</dc:creator>
      <pubDate>Thu, 12 Sep 2024 21:54:51 +0000</pubDate>
      <link>https://forem.com/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h</link>
      <guid>https://forem.com/llmware/how-i-learned-generative-ai-in-two-weeks-and-you-can-too-part-1-libraries-215h</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For reference, prior to this journey, I barely had more knowledge about AI than the average person. Sure, I fired off the occasional ChatGPT request for one task or another, but I was always more focused on coding than AI, having picked up Python and Java during quarantine.  &lt;/p&gt;

&lt;p&gt;Despite my initial skepticism at being able to successfully understand the examples, particularly in a short time frame, I found LLMWare's "Fast Start to RAG" series highly accessible. I will cover example one of the course in this article - hopefully it can help you as well! If you are interested in learning more about LLMWare, feel free to check out our &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;website&lt;/a&gt; as well as another &lt;a href="https://dev.to/llmware/become-a-rag-professional-in-2024-go-from-beginner-to-expert-41mg"&gt;DEV article&lt;/a&gt; outlining the Fast Start to RAG examples. &lt;/p&gt;

&lt;p&gt;To clarify, extensive knowledge of coding, specifically Python 3, is not necessarily a prerequisite for the examples that I used to get my start in AI and RAG. However, basic understanding is certainly helpful in comprehending content and parsing code. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/CuuSHzuc0O166MRfjt/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/CuuSHzuc0O166MRfjt/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;To run these examples, you will need to install the LLMWare package by running &lt;code&gt;pip3 install llmware&lt;/code&gt; in the command line. Further instructions can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/README.md" rel="noopener noreferrer"&gt;README file&lt;/a&gt;. Then, you will be able to run &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/example-1-create_first_library.py" rel="noopener noreferrer"&gt;example 1&lt;/a&gt;, which is directly copy-paste ready.  &lt;/p&gt;

&lt;p&gt;I will also point out that the AI community tends to use acronyms (like AI itself!) and technical language extending beyond the scope of everyday conversation. The acronym "RAG" stands for Retrieval Augmented Generation, which enhances outputs of LLMs (Large Language Models) using external knowledge. In Example 1, we will be focusing on the first step in RAG - converting a pile of files into an AI-ready knowledge base.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extra resources
&lt;/h2&gt;

&lt;p&gt;For visual learners, here is a video that works through example 1. Feel free to watch the video before following the steps in this article. Also, here is a Python Notebook that breaks down this example's code alongside the output: &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/NoteBook_Examples/example_1_create_first_library.ipynb" rel="noopener noreferrer"&gt;Example 1 Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/2xDefZ4oBOM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1 - Execution configuration
&lt;/h2&gt;

&lt;p&gt;By default, the active database being used is called "mongo", but we will select "sqlite" since it does not require a separate installation. &lt;/p&gt;

&lt;p&gt;Additionally, we can use different debug mode options to see more or less information as it is processed. We can set &lt;code&gt;debug_mode&lt;/code&gt; to 2 for more detailed outputs compared to 0, the default. &lt;/p&gt;

&lt;p&gt;For this example, sample data sets are imported through &lt;code&gt;from llmware.setup import Setup&lt;/code&gt; and are stored in &lt;code&gt;sample_folders&lt;/code&gt;. These sets include documents of different subject matter and sizes, but you can replace them with your own data as well. We can choose a name for our library (go ahead and customize!) and select a folder from the samples before running the main script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLMWareConfig().set_active_db("sqlite")

LLMWareConfig().set_config("debug_mode", 2)

sample_folders = ["Agreements", "Invoices", "UN-Resolutions-500", "SmallLibrary", "FinDocs", "AgreementsLarge"]
library_name = "example1_library"
selected_folder = sample_folders[0]     # e.g., "Agreements"

output = parsing_documents_into_library(library_name, selected_folder)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://i.giphy.com/media/ua7vVw9awZKWwLSYpW/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/ua7vVw9awZKWwLSYpW/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2 - Main body
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Now, we can create our library! This line of code will set up the database tables as well as supporting file repositories to store information about the library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library = Library().create_new_library(library_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Steps 2 and 3:&lt;/strong&gt; However, our library is still completely empty, so we need to fill it up. To do so, we will load in the LLMWare sample files and save them in &lt;code&gt;sample_files_path&lt;/code&gt;. If you are using your own data sets, you will need to point to a local folder path with your documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_files_path = Setup().load_sample_files(over_write=False)
ingestion_folder_path = os.path.join(sample_files_path, sample_folder)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
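
&lt;p&gt;If you are working with your own data instead of the samples, a one-line substitution (with a hypothetical path) is all that is needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# point the ingestion at your own folder of documents (hypothetical path)
ingestion_folder_path = "/path/to/your/documents"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;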



&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; While adding files to a library, LLMWare performs parsing, text chunking, and indexing in the sqlite database. It automatically chooses the correct parser based on each file's extension, then extracts the text and stores it as chunks in the database. Although this may seem like a lot of steps, it all happens incredibly quickly behind the scenes!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parsing_output = library.add_files(ingestion_folder_path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 5:&lt;/strong&gt; To check our progress, we can look at the &lt;code&gt;updated_library_card&lt;/code&gt;, which contains key metadata, counting data, and other important information. This &lt;code&gt;.get_library_card()&lt;/code&gt; method can be called at any time to retrieve information about your library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;updated_library_card = library.get_library_card()
doc_count = updated_library_card["documents"]
block_count = updated_library_card["blocks"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
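
&lt;p&gt;A quick print of those counts (a small sketch) confirms the ingest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# sketch: confirm how much was ingested
print("update: documents - {} - blocks - {}".format(doc_count, block_count))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;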



&lt;p&gt;&lt;strong&gt;Steps 6 and 7:&lt;/strong&gt; We can check the library's main folder structure, and then the library is ready to start running queries! We will do this by instantiating a Query object on the library. The &lt;code&gt;test_query&lt;/code&gt; may need to be adjusted to best suit the data set. For this example, we chose the "Agreements" sample set, so we can use "base salary" as a "hello world"-esque query. &lt;/p&gt;

&lt;p&gt;Now, a text query will be run that looks at every chunk of text and returns the ones that contain "base salary". The Query class contains many methods for different query types; today, we will use the simplest, the &lt;code&gt;text_query&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;
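
&lt;p&gt;First, the query string is defined - here "base salary", matching the "Agreements" sample set chosen above (the variable name follows the example script):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test_query = "base salary"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;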

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query_results = Query(library).text_query(test_query, result_count=10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can print out our results, giving us a look at the metadata and attributes of the individual text blocks we created!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i, result in enumerate(query_results):
        #   here are a few useful attributes
        text = result["text"]
        file_source = result["file_source"]
        page_number = result["page_num"]
        doc_id = result["doc_ID"]
        block_id = result["block_ID"]
        matches = result["matches"]

        print("query results: ", i, result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://i.giphy.com/media/Z3VgQu8hkVeB1bakS9/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/Z3VgQu8hkVeB1bakS9/giphy.gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3 - The results
&lt;/h2&gt;

&lt;p&gt;The outputted summary will include key information such as &lt;code&gt;total pdf files processed&lt;/code&gt;, &lt;code&gt;total blocks created&lt;/code&gt;, &lt;code&gt;total pages added&lt;/code&gt;, and &lt;code&gt;time elapsed&lt;/code&gt;. See if you can find all of them! &lt;/p&gt;

&lt;p&gt;In particular, the LLMWare package includes C-based parsers that quickly and efficiently parse files. Once parsing is complete, the parsed information is output as a dictionary. You will see the results of your work from the previous steps!&lt;/p&gt;

&lt;p&gt;To summarize, we took our documents and broke them down into thousands of blocks. Then, we extracted text information and put it into the sqlite database. Lastly, we ran a text search against that data to retrieve our results (including details as small as pixel coordinates and character level matches!).&lt;/p&gt;

&lt;p&gt;You just completed your first example, but there is so much more for you to explore! I would suggest rerunning this example with varied data sets to tap into the true potential of this technology, and of course, continue onto example 2 about &lt;a href="https://github.com/llmware-ai/llmware/blob/main/fast_start/example-2-build_embeddings.py" rel="noopener noreferrer"&gt;building embeddings&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4 - To see more ...
&lt;/h2&gt;

&lt;p&gt;Please join our LLMWare community on Discord to learn more about RAG and LLMs! &lt;a href="https://discord.gg/5mx42AGbHm" rel="noopener noreferrer"&gt;https://discord.gg/5mx42AGbHm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Visit LLMWare's Website&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Explore LLMWare on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.freepik.com/free-photo/computer-scientist-updating-ai-systems_237235999.htm#fromView=image_search_similar&amp;amp;page=1&amp;amp;position=3&amp;amp;uuid=0b4ac661-5087-4321-a5cf-0838108d5997" rel="noopener noreferrer"&gt;Image by DC Studio on Freepik&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>Evaluating LLMs and Prompts with Electron UI 🤖 💬</title>
      <dc:creator>Will Taner</dc:creator>
      <pubDate>Wed, 07 Aug 2024 13:02:07 +0000</pubDate>
      <link>https://forem.com/llmware/evaluating-llms-and-prompts-with-electron-ui-4jkl</link>
      <guid>https://forem.com/llmware/evaluating-llms-and-prompts-with-electron-ui-4jkl</guid>
      <description>&lt;h2&gt;
  
  
  What is this UI useful for? 🤨
&lt;/h2&gt;

&lt;p&gt;LLMs are becoming an increasingly prevalent tool across various industries. However, achieving optimal results greatly depends on selecting the correct model and prompts. This process can be extremely time-consuming as it requires extensive trial and error.&lt;/p&gt;

&lt;p&gt;This article serves as a tutorial on a tool designed to streamline the process of testing various models and prompts. By using this tool, developers can efficiently identify the most effective combinations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/MvovQGsMBY9H2/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/MvovQGsMBY9H2/giphy.gif" alt="GIF" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Big shoutout to Kevin Brisson for creating this Electron UI! You may find the GitHub Repo for his tool here: &lt;a href="https://github.com/kbrisso/ai-base" rel="noopener noreferrer"&gt;https://github.com/kbrisso/ai-base&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The CLI commands provided in this article are designed for Linux/Mac systems and may not function correctly on Windows machines. If you encounter any issues, please replace the incompatible commands with their Windows equivalents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step-by-Step Guide on Getting the Tool up and Running!
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/l0IyjiXOXTX6Yemsg/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/l0IyjiXOXTX6Yemsg/giphy.gif" alt="GIF" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(If you would like to watch a video demonstrating the setup, click &lt;a href="https://youtu.be/5VM583r3JaM" rel="noopener noreferrer"&gt;here&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Let's Start by Cloning the Repo 💻
&lt;/h3&gt;

&lt;p&gt;Navigate to a directory of your choosing and run &lt;code&gt;git clone https://github.com/kbrisso/ai-base&lt;/code&gt; in your command line.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Root Directory Installs 📦
&lt;/h3&gt;

&lt;p&gt;Navigate to the root directory, &lt;code&gt;ai-base&lt;/code&gt;, in your command line and run &lt;code&gt;npm install&lt;/code&gt;. If you experience an error, run &lt;code&gt;npm audit fix&lt;/code&gt; to resolve the reported vulnerabilities in the packages.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. llmware-wrapper Installs 📦
&lt;/h3&gt;

&lt;p&gt;Navigate to the directory &lt;code&gt;llmware-wrapper&lt;/code&gt; in your command line and run the same commands as above: &lt;code&gt;npm install&lt;/code&gt;, followed by &lt;code&gt;npm audit fix&lt;/code&gt; if there is an error.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Create a Virtual Environment and Install Packages 💾
&lt;/h3&gt;

&lt;p&gt;While in the same directory as above, &lt;code&gt;llmware-wrapper&lt;/code&gt;, create a new virtual environment: &lt;code&gt;python3 -m venv venv&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then enter into the virtual environment by running &lt;code&gt;source venv/bin/activate&lt;/code&gt; in the command line. &lt;/p&gt;

&lt;p&gt;Then run &lt;code&gt;pip install -r requirements.txt&lt;/code&gt; to install the required packages.&lt;/p&gt;

&lt;p&gt;Finally, deactivate the virtual environment by running &lt;code&gt;deactivate&lt;/code&gt;.&lt;/p&gt;
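
&lt;p&gt;In summary, the commands for this step (Linux/Mac, per the note above) are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
deactivate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;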

&lt;h3&gt;
  
  
  5. Copy Local Python Path 🐍
&lt;/h3&gt;

&lt;p&gt;In the file explorer of your IDE, open the file titled &lt;code&gt;llmware-wrapper.properties&lt;/code&gt; in the &lt;code&gt;llmware-wrapper&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;In this file, delete the path that the variable &lt;code&gt;pythonpath&lt;/code&gt; is currently set to and replace it with the path to your local Python interpreter.&lt;/p&gt;

&lt;p&gt;If you do not know your local path, run &lt;code&gt;which python&lt;/code&gt; in the command line, then copy and paste the result where you deleted the previous path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/o0QlbQONyHwBiBCBi3/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/o0QlbQONyHwBiBCBi3/giphy.gif" alt="GIF" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's start the UI! 🚀
&lt;/h2&gt;

&lt;p&gt;Navigate to the root directory, &lt;code&gt;ai-base&lt;/code&gt;, in your command line and run &lt;code&gt;npm start&lt;/code&gt;. After a few seconds a window of the UI should pop up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmrk7dwysr80zzj82nfm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvmrk7dwysr80zzj82nfm.png" alt="UI startup image" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Selecting a Model and Prompt 💡
&lt;/h2&gt;

&lt;p&gt;Click on the button that says "Choose a Model". This will display all the available models provided by LLMWare. Once you find a model that you would like to try, click the button that says "Choose" next to the model name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwd3ukfg7f2ybktljgt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwd3ukfg7f2ybktljgt7.png" alt="models image" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on the button that says "Choose a Prompt". This will display all the available types of prompts you can choose from provided by LLMWare. Additionally, you may find supplemental information about the prompt type, such as a description, to the right of the prompt name. Once you find a prompt type that you would like to try, click the button that says "Choose" next to the prompt name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w2d6km9b6i475fybm1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w2d6km9b6i475fybm1m.png" alt="prompt image" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting it all together... 🧩
&lt;/h2&gt;

&lt;p&gt;After selecting a model and a prompt type, it is time to start querying! Simply add your query to the box labeled "Query" and click the button that says "Run Query". After some time, the response will show up in the box titled "Response".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Depending on the prompt type chosen, an extra box for context will appear. You may use this space to provide relevant details for your query if needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kf25wz1p6ppvxe9c9l5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kf25wz1p6ppvxe9c9l5.png" alt="example image" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now you can experiment with different models and queries with ease!&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion 🏁
&lt;/h2&gt;

&lt;p&gt;Finding the right combination of model and prompt is crucial to creating a reliable and effective LLM tool. Utilizing this UI, you can find the best combination faster than ever before!&lt;/p&gt;

&lt;p&gt;Please check out our GitHub and leave a star! &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;https://github.com/llmware-ai/llmware&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on Discord here: &lt;a href="https://discord.gg/MgRaZz2VAB" rel="noopener noreferrer"&gt;https://discord.gg/MgRaZz2VAB&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>🤖Dueling AIs: Questioning and Answering with Language Models🚀</title>
      <dc:creator>Prashant Iyer</dc:creator>
      <pubDate>Sun, 28 Jul 2024 20:36:55 +0000</pubDate>
      <link>https://forem.com/llmware/dueling-ais-questioning-and-answering-with-language-models-5f0l</link>
      <guid>https://forem.com/llmware/dueling-ais-questioning-and-answering-with-language-models-5f0l</guid>
      <description>&lt;p&gt;You've probably asked a question to a &lt;em&gt;language model&lt;/em&gt; before and then had it give you an answer. After all, this is what we most commonly use language models for.&lt;/p&gt;

&lt;p&gt;But have you ever received a question from a language model? While not as common, this application of AI has diverse use cases in areas like education, where you might want a model to give you practice questions for a test, and in sales enablement, where you question your business's sales team about your products to improve their ability to make sales.&lt;/p&gt;

&lt;p&gt;Now, &lt;strong&gt;what if we had a face off⚔️ between two different models&lt;/strong&gt;: one that asked questions about a topic and another that answered them? All without human intervention?&lt;/p&gt;

&lt;p&gt;In this article, we're going to look at exactly that. We'll provide a sample passage about OpenAI's AI safety team as context to our models. We'll then let our models duel it out! One model will ask questions based on this passage, and another model will respond!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbe8905k96aob2nvvd1sj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbe8905k96aob2nvvd1sj.gif" alt="Duel GIF" width="500" height="208"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Our AI Models🤖
&lt;/h2&gt;

&lt;p&gt;Introducing &lt;code&gt;slim-q-gen-tiny-tool&lt;/code&gt;. This will be our question model, capable of generating three different types of questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple choice questions&lt;/li&gt;
&lt;li&gt;Boolean (true/false) questions&lt;/li&gt;
&lt;li&gt;General open-ended questions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Facing off against this will be &lt;code&gt;bling-phi-3-gguf&lt;/code&gt;! This will be our answer model, giving appropriate responses to any of the above types of questions.&lt;/p&gt;

&lt;p&gt;One important note is that both of these models are &lt;em&gt;GGUF quantized&lt;/em&gt;, meaning they are smaller and faster versions of their original counterparts. For us, this means we can run them on just a CPU, with no need for resources like GPUs!&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Providing input parameters✏️
&lt;/h2&gt;

&lt;p&gt;This is what our function signature for this example looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_and_answer_game&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_passage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slim-q-gen-tiny-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number_of_tries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;source_passage&lt;/code&gt; is the text input that we will provide our models,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;q_model&lt;/code&gt; is our questioning model,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;number_of_tries&lt;/code&gt; is the number of questions we will attempt to generate (more on this later!)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;question_type&lt;/code&gt; can be either &lt;code&gt;"multiple choice"&lt;/code&gt;, &lt;code&gt;"boolean"&lt;/code&gt; or &lt;code&gt;"question"&lt;/code&gt; corresponding to each of the types of questions we saw above,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;temperature&lt;/code&gt; is a value ranging from 0 to 1 that determines how much variance we will see in our generated questions. Here, the value of 0.5 is relatively high so that we get a good variety of questions with little repetition! (See the sketch just after this list.)&lt;/li&gt;
&lt;/ul&gt;
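
&lt;p&gt;Putting the parameters together, a minimal invocation sketch looks like this (&lt;code&gt;test_passage&lt;/code&gt; is a placeholder for whatever source text you provide):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# sketch: calling the example's function with the defaults described above
test_passage = "..."   # placeholder - substitute your own source passage

ask_and_answer_game(source_passage=test_passage,
                    q_model="slim-q-gen-tiny-tool",
                    number_of_tries=10,
                    question_type="question",
                    temperature=0.5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;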




&lt;h2&gt;
  
  
  Step 2: Loading in our models🪫🔋
&lt;/h2&gt;

&lt;p&gt;With the inputs taken care of, let's now load in both our models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;q_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that we have &lt;code&gt;sample=True&lt;/code&gt; to increase variety in our model output (the questions generated).&lt;/p&gt;

&lt;p&gt;Now, for the answer model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;answer_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bling-phi-3-gguf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We won't mess with the &lt;code&gt;sample&lt;/code&gt; or &lt;code&gt;temperature&lt;/code&gt; options here because we want concise, fact-based answers from this model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Generating our questions🤔💬
&lt;/h2&gt;

&lt;p&gt;We'll try to generate questions &lt;code&gt;number_of_tries&lt;/code&gt; times, which in this case is 10. We'll then update our &lt;code&gt;questions&lt;/code&gt; list with only the unique questions, to avoid repetition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;questions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="c1"&gt;# Loop number_of_tries times
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number_of_tries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_passage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;question_type&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;new_q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Check to see that the question generated is unique
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_q&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;new_q&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An important function here is &lt;code&gt;q_model.function_call()&lt;/code&gt;. This is how the &lt;code&gt;llmware&lt;/code&gt; library lets you prompt language models with &lt;strong&gt;just a single function call&lt;/strong&gt;. Here, we pass in the source text and question type as arguments.&lt;/p&gt;

&lt;p&gt;The function returns &lt;code&gt;response&lt;/code&gt;, a dictionary with a lot of information about the call, but we're only interested in the &lt;code&gt;question&lt;/code&gt; key, which is located inside the &lt;code&gt;llm_response&lt;/code&gt; dictionary.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Responding to our questions📝
&lt;/h2&gt;

&lt;p&gt;Now that the questions have been generated, &lt;strong&gt;the duel is on!&lt;/strong&gt; Let's use our answering model to now respond to these questions. We'll loop through our &lt;code&gt;questions&lt;/code&gt; list, pass in the source passage as context to the model and ask each question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Loop through each question
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Print out the question
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Validate the question list and run inference
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;answer_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;test_passage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Print out the answer
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is important to note that our question model returns each &lt;code&gt;question&lt;/code&gt; as a &lt;code&gt;list&lt;/code&gt;, with the first element (&lt;code&gt;question[0]&lt;/code&gt;) containing the actual string corresponding to the question.&lt;/p&gt;

&lt;p&gt;For each &lt;code&gt;question&lt;/code&gt;, we then need to perform some validation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check to see that the &lt;code&gt;question&lt;/code&gt; is of the correct data type (&lt;code&gt;list&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Check to see that the &lt;code&gt;question&lt;/code&gt; is not empty.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, the &lt;code&gt;answer_model.inference()&lt;/code&gt; function will ask our model the question, passing in the &lt;code&gt;source_passage&lt;/code&gt; as context.&lt;/p&gt;

&lt;p&gt;Finally, we print out the response.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results!✅
&lt;/h2&gt;

&lt;p&gt;Let's quickly look at our sample passage. This passage was taken from a CNBC news story in May 2024 about OpenAI's work with safety and security.&lt;/p&gt;

&lt;p&gt;"OpenAI said Tuesday it has established a new committee to make recommendations to the company’s board about safety and security, weeks after dissolving a team focused on AI safety. In a blog post, OpenAI said the new committee would be led by CEO Sam Altman as well as Bret Taylor, the company’s board chair, and board member Nicole Seligman. The announcement follows the high-profile exit this month of an OpenAI executive focused on safety, Jan Leike. Leike resigned from OpenAI leveling criticisms that the company had under-invested in AI safety work and that tensions with OpenAI’s leadership had reached a breaking point."&lt;/p&gt;

&lt;p&gt;Now, let's see what our output looks like!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0w13ads9o8px7wdrihw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0w13ads9o8px7wdrihw.png" alt="Sample output" width="711" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see all the questions that were asked about the passage, as well as concise, fact-based responses given to them!&lt;/p&gt;

&lt;p&gt;Note that there are only 9 questions here while we provided &lt;code&gt;number_of_tries=10&lt;/code&gt;. This means that one question generated was a duplicate and was ignored.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;And with that, we're done with this example! Recall that we used the &lt;code&gt;llmware&lt;/code&gt; library to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load in a question model and an answer model&lt;/li&gt;
&lt;li&gt;Generate unique questions about a source passage&lt;/li&gt;
&lt;li&gt;Respond to each question with accuracy.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;And remember that we did all of this on just a CPU!&lt;/strong&gt; 💻&lt;/p&gt;

&lt;p&gt;Check out our YouTube video on this example!&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/380Yr2bc_Qk?start=143"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you made it this far, thank you for taking the time to go through this topic with us ❤️! For more content like this, make sure to &lt;a href="https://dev.to/llmware"&gt;visit our dev.to page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The source code for many more examples like this one are on &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;our GitHub&lt;/a&gt;. Find this example &lt;a href="https://github.com/llmware-ai/llmware/blob/a58c2dc7ea94c1a8eef87bc0fd1cc34fb616c743/examples/SLIM-Agents/using-slim-q-gen.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our repository also contains a &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/NoteBook_Examples/using-slim-q-gen-notebook.ipynb" rel="noopener noreferrer"&gt;notebook for this example&lt;/a&gt; that you can run yourself using Google Colab, Jupyter or any other platform that supports .ipynb notebooks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/fCztJQeV7J" rel="noopener noreferrer"&gt;Join our Discord&lt;/a&gt; to interact with a growing community of AI enthusiasts of all levels of experience!&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>ai</category>
      <category>rag</category>
    </item>
    <item>
      <title>🚀Supercharged SLIM models Multistep RAG analysis that never leaves your CPU🧑‍💻</title>
      <dc:creator>Simon Risman</dc:creator>
      <pubDate>Tue, 02 Jul 2024 13:30:11 +0000</pubDate>
      <link>https://forem.com/llmware/supercharged-slim-models-multistep-rag-analysis-that-never-leaves-your-cpu-26j0</link>
      <guid>https://forem.com/llmware/supercharged-slim-models-multistep-rag-analysis-that-never-leaves-your-cpu-26j0</guid>
      <description>&lt;p&gt;Many of us are used to models running in the cloud, sending API calls to far-away servers, filed away as training data for the next wave of GPTs. And how else would this even work? Surely an individual laptop just doesn't have the power to manage and execute the workflows that a cloud based service does. &lt;/p&gt;

&lt;p&gt;Consider, for a moment, the mighty ant. At first glance, it may seem insignificant—a mere speck in the grand tapestry of nature. Yet, beneath its tiny exterior lies a powerhouse of strength, resilience, and ingenuity. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExaW1xM2hjeWRpb3MzZDVrNmFyMmZwNW82dGFxcHoxcnVoank1b3UzZiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/26BRsfBU7ct4jgaCQ/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExaW1xM2hjeWRpb3MzZDVrNmFyMmZwNW82dGFxcHoxcnVoank1b3UzZiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/26BRsfBU7ct4jgaCQ/giphy.gif" width="400" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter &lt;strong&gt;SLIM&lt;/strong&gt; - &lt;strong&gt;S&lt;/strong&gt;tructured &lt;strong&gt;L&lt;/strong&gt;anguage &lt;strong&gt;I&lt;/strong&gt;nstruction &lt;strong&gt;M&lt;/strong&gt;odels.🏋️
&lt;/h2&gt;

&lt;p&gt;These models are tiny and run comfortably on a CPU, but they pack a punch when it comes to providing specialized, structured outputs. Instead of an AI summary made of yet more bullet points or, god forbid, paragraphs, SLIM models output a variety of structured data like CSV, JSON, and SQL. &lt;/p&gt;
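
&lt;p&gt;For instance, a sentiment-focused SLIM returns a small dictionary rather than prose. A sketch - the model name comes from the llmware catalog, while the passage and the exact output shown are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# sketch: a sentiment SLIM produces structured output via a single function call
from llmware.models import ModelCatalog

model = ModelCatalog().load_model("slim-sentiment-tool")
response = model.function_call("The earnings call was a disaster for investors.")
print(response["llm_response"])   # e.g. {'sentiment': ['negative']}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;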

&lt;p&gt;The highly specialized nature of the SLIM models is precisely what makes them so powerful - instead of a general solution to a large problem, stringing together a few SLIM models yields more robust performance with greater flexibility.&lt;/p&gt;

&lt;p&gt;To show just how much these models can do, we are going to take a look at a tech tale worthy of invoking Gavin Belson: The partnership-turned-rivalry between Microsoft and IBM.&lt;/p&gt;

&lt;p&gt;🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜🐜&lt;/p&gt;

&lt;h2&gt;
  
  
  0️⃣ Setup 🛠️
&lt;/h2&gt;

&lt;p&gt;Make sure you have installed llmware and imported the libraries we are going to use. The code below should get you all set up. &lt;/p&gt;

&lt;p&gt;Run this command in your terminal&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install llmware
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add these imports to the top of your code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMfx&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.library&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Library&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.retrieval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Query&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.configs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMWareConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.setup&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Setup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  1️⃣ Build a Knowledge Base of Microsoft Documents 📖
&lt;/h2&gt;

&lt;p&gt;First, we need to create a database to query. In your case, it can be anything from customer service reports to earnings calls, but for now we will use a range of Microsoft-related documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multistep_analysis&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;

    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt; In this example, our objective is to research Microsoft history and rivalry in the 1980s with IBM. &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;#   step 1 - assemble source documents and create library
&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: Starting example - agent-multistep-analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;#   note: the program attempts to automatically pull sample document into local path
&lt;/span&gt;    &lt;span class="c1"&gt;#   depending upon permissions in your environment, you may need to set up directly
&lt;/span&gt;    &lt;span class="c1"&gt;#   if you pull down the samples files with Setup().load_sample_files(), in the Books folder,
&lt;/span&gt;    &lt;span class="c1"&gt;#   you will find the source: "Bill-Gates-Biography.pdf"
&lt;/span&gt;    &lt;span class="c1"&gt;#   if you have pulled sample documents in the past, then to update to latest: set over_write=True
&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: Loading sample files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;sample_files_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Setup&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_sample_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;over_write&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bill_gates_bio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bill-Gates-Biography.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;path_to_bill_gates_bio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_files_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Books&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bill_gates_bio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;microsoft_folder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LLMWareConfig&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_tmp_path&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example_microsoft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: attempting to create source input folder at path: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chmod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mo"&gt;0o777&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_to_bill_gates_bio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bill_gates_bio&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;#   create library
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: creating library and parsing source document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nc"&gt;LLMWareConfig&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;set_active_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;my_lib&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Library&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;create_new_library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;microsoft_history_0210_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;my_lib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;microsoft_folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2️⃣ Locate Mentions of IBM and Create an Agent to Process Them 🔍
&lt;/h2&gt;

&lt;p&gt;In our first pass, we focus on any mention of IBM. Since this is a multi-step process, we can then analyze those instances at a more granular level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ibm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;search_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_lib&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;text_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: executing query to filter to key passages - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - results found - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;#   create an agent and load several tools that we will be using
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMfx&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_tool_list&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emotions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;#   load the search results into the agent's work queue
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3️⃣ Pick out Negative Sentiment 🫳
&lt;/h2&gt;

&lt;p&gt;This is where you get to decide the depth of your analysis for each item. For our scenario, we want only the mentions of IBM that carry negative sentiment (evidence of the rivalry).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;increment_work_iteration&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="c1"&gt;#   analyze sections where the sentiment on ibm was negative
&lt;/span&gt;    &lt;span class="n"&gt;follow_up_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;follow_up_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4️⃣ Deep Dive Analysis 🤿
&lt;/h2&gt;

&lt;p&gt;Now that we have picked out the instances we want to explore further, we put our agent's tools to work - each tool is a SLIM model specialized for a single task, and together they provide a comprehensive overview of the pertinent results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;job_index&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;follow_up_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="c1"&gt;# follow-up 'deep dive' on selected text that references ibm negatively
&lt;/span&gt;        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_work_iteration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec_multitool_function_call&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emotions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is a brief summary?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;my_report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;follow_up_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;activity_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;activity_summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;my_report&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my report entries: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;my_report&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results 🎉🎉🎉
&lt;/h2&gt;

&lt;p&gt;Your multi-step local RAG pipeline should return a filled-out dictionary that looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;report 1 entries:  {'sentiment': ['negative'], 'tags': '["IBM", "COBOL", "PL/1", "BAL", "OS/2", "Presentation Manager", "K.", "OS/2 1.0", "December 1987", "1.0"]', 'emotions': ['anger'], 'topics': ['ibm'], 'people': [], 'organization': ['IBM'], 'misc': ['OS/2', 'Presentation Manager'], 'summary': ['•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote "clunky" code that was top-heavy with lines of documentation to make the software "easy to service."\t\t•IBM wrote'], 'source': {'query': 'ibm', '_id': '174', 'text': 'writers were contemptuous of IBM and it\'s coding   culture. In the increasingly irrelevant world of IBM, the classical   languages were COBOL, PL/1, and BAL (Basic Assembly Language),   NOT C!    J.    In addition, IBM wrote "clunky" code that was top-heavy with lines of   documentation to make the software "easy to service."   K.    Finally, in December 1987 OS/2 1.0 without Presentation Manager ', 'doc_ID': 1, 'block_ID': 173, 'page_num': 35, 'content_type': 'text', 'author_or_speaker': 'IBM_User', 'special_field1': '', 'file_source': 'Bill-Gates-Biography.pdf', 'added_to_collection': 'Mon Jul  1 13:14:36 2024', 'table': '', 'coords_x': 162, 'coords_y': 414, 'coords_cx': 34, 'coords_cy': 45, 'external_files': '', 'score': -4.040003091801133, 'similarity': 0.0, 'distance': 0.0, 'matches': [[29, 'ibm'], [100, 'ibm'], [215, 'ibm']], 'account_name': 'llmware', 'library_name': 'microsoft_history_0210_1'}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty of the output is its structured nature. You could easily hand your report off to another program - one that wouldn't need to waste precious time parsing natural language and could simply index into the right part of the dictionary. Besides saving time, you also gain accuracy and consistency.&lt;/p&gt;
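
&lt;p&gt;For instance, here is a minimal sketch of a downstream consumer of the report (assuming &lt;code&gt;my_report&lt;/code&gt; is the list of dictionaries returned above; the keys match the sample output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def summarize_report(my_report):
    """Minimal sketch: consume the structured report directly - no text parsing needed."""
    for entry in my_report:
        # each entry is a plain dictionary, so we can index straight into it
        sentiment = entry.get("sentiment", [])
        orgs = entry.get("organization", [])
        summary = entry.get("summary", [])
        print(f"sentiment: {sentiment} | organizations: {orgs}")
        if summary:
            print(f"summary: {summary[0]}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;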

&lt;p&gt;If you want to learn more, below is a video walkthrough for this tutorial.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/y4WvwHqRR60"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The full code for this example can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/SLIM-Agents/agent-multistep-analysis.py" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have any questions, or would like to learn more about LLMWare, come to our Discord community. Click &lt;a href="https://discord.gg/6nNVdn7A" rel="noopener noreferrer"&gt;here&lt;/a&gt; to join. See you there!🚀🚀🚀&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>🔉From Sound to Insights: Using AI🤖 for Audio File Transcription and Analysis!🚀</title>
      <dc:creator>Prashant Iyer</dc:creator>
      <pubDate>Fri, 28 Jun 2024 20:57:16 +0000</pubDate>
      <link>https://forem.com/llmware/from-sound-to-insights-using-ai-for-audio-file-transcription-and-analysis-36ek</link>
      <guid>https://forem.com/llmware/from-sound-to-insights-using-ai-for-audio-file-transcription-and-analysis-36ek</guid>
      <description>&lt;p&gt;If we were given an audio file, is there any way we could identify the time stamps where specific words were said? Is there any way we could extract all the key words mentioned about a topic?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With AI 🤖, we can do all of this and much more!&lt;/strong&gt; The key lies in being able to parse audio into text, allowing us to then harness the natural language processing capabilities of &lt;em&gt;language models&lt;/em&gt; to perform sophisticated analyses and inferences on our data.&lt;/p&gt;

&lt;p&gt;Regardless of who you are, such an approach to audio transcription and analysis will augment how you interact with and extract knowledge from audio files.&lt;/p&gt;

&lt;p&gt;Let's see how we can do this with &lt;code&gt;llmware&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI Tools 🤖
&lt;/h2&gt;

&lt;p&gt;We'll be using two models for this example.&lt;/p&gt;

&lt;p&gt;The first is Whisper by OpenAI. This is the model that will allow us to parse the audio files, i.e. convert them from audio to text.&lt;/p&gt;

&lt;p&gt;The second is the SLIM (&lt;em&gt;Structured Language Instruction Model&lt;/em&gt;) Extract Tool by LLMWare, which we'll be using to ask questions about our audio. This is a &lt;em&gt;GGUF quantized&lt;/em&gt; version of a much larger model called &lt;em&gt;slim-extract&lt;/em&gt;. All this means is that our model, the SLIM Extract Tool, is a smaller and faster version of the original model. &lt;strong&gt;This allows us to run it locally on a CPU&lt;/strong&gt;, without the need for powerful computational resources like GPUs!&lt;/p&gt;

&lt;p&gt;With that out of the way, let's get started with the example.&lt;/p&gt;
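
&lt;p&gt;Before we start, the snippets below assume a handful of imports from the &lt;code&gt;llmware&lt;/code&gt; library (plus &lt;code&gt;os&lt;/code&gt; from the standard library). The module paths shown here are our assumptions based on the library's layout, so double-check them against the full example linked at the end of this article:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

# assumed module paths - verify against the full example in the llmware repo
from llmware.setup import Setup           # sample audio file downloads
from llmware.parsers import Parser        # audio-to-text parsing
from llmware.util import Utilities        # fast text search over parser output
from llmware.models import ModelCatalog   # loading the SLIM Extract Tool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;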




&lt;h2&gt;
  
  
  Step 1: Loading in audio files 🔉🔉
&lt;/h2&gt;

&lt;p&gt;If you have audio files that you want to run the example with, then feel free to use those by setting &lt;code&gt;input_folder&lt;/code&gt; appropriately, but if not, the &lt;code&gt;llmware&lt;/code&gt; library provides you with several sets of sample audio files!&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;voice_sample_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Setup&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_voice_sample_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;small_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;input_folder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;voice_sample_files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;greatest_speeches&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here, we're loading in the &lt;code&gt;greatest_speeches&lt;/code&gt; set of audio files.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Parsing our audio files 📝
&lt;/h2&gt;

&lt;p&gt;Now that we have our audio files, we can parse them into chunks of text. Recall that we'll need the Whisper model (via WhisperCPP) to do this. Fortunately, you won't have to interact with the model directly, since the &lt;code&gt;Parser&lt;/code&gt; class from the &lt;code&gt;llmware&lt;/code&gt; library will take care of this for you!&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;parser_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;parse_voice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_to_db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;copy_to_library&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remove_segment_markers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_by_segment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;real_time_progress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here, the &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;max_chunk_size&lt;/code&gt; indicate how big each chunk of parsed text will be. We're passing in our folder containing the audio files to the &lt;code&gt;parse_voice()&lt;/code&gt; function of the &lt;code&gt;Parser&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;The function accepts many more optional arguments controlling how we'd like the audio to be parsed, but we can leave the rest at their defaults for this example.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Text searching 🕵️
&lt;/h2&gt;

&lt;p&gt;Let's now run a text search on our parsed audio. We can try searching for the word "president". In other words, we want to find all the portions of the audio, and the corresponding text, that contain the word "president". We can do this using the &lt;code&gt;fast_search_dicts()&lt;/code&gt; function in the &lt;code&gt;Utilities&lt;/code&gt; class of the &lt;code&gt;llmware&lt;/code&gt; library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Utilities&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;fast_search_dicts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;president&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parser_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 4: Making an AI call on text chunks 🤖
&lt;/h2&gt;

&lt;p&gt;Now that we have a list of text blocks containing the word "president", let's use an AI model to identify which presidents are mentioned in the selected text blocks.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;extract_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slim-extract-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here, we're using the &lt;code&gt;ModelCatalog&lt;/code&gt; class to load in our SLIM Extract Tool. Let's now iterate over each text block containing "president".&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;final_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extract_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;president name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We're making a &lt;code&gt;function_call()&lt;/code&gt; for "president name". &lt;strong&gt;This is how we ask our Tool to identify the president name in the text block.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: Analyzing our output 🔍
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;function_call()&lt;/code&gt; function returns a dictionary containing a lot of data about the call. We specifically want the &lt;code&gt;president_name&lt;/code&gt; key inside its &lt;code&gt;llm_response&lt;/code&gt; value.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;extracted_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;president_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;president_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;extracted_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;president_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;update: skipping result - no president name found - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the value of the &lt;code&gt;president_name&lt;/code&gt; key is a non-empty string, then we store its value in &lt;code&gt;extracted_name&lt;/code&gt;. Otherwise, no result was found and we print this out.&lt;/p&gt;

&lt;p&gt;Now let's see if the extracted name matches any of the American presidents in this list:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;various_american_presidents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kennedy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;carter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nixon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reagan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clinton&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;obama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To do this, we'll check if the &lt;code&gt;extracted_name&lt;/code&gt; contains any of these American presidents. If we have a match, then we'll add it to our &lt;code&gt;final_list&lt;/code&gt; as a dictionary containing some information about the location of the name in the audio as well as the text block it was in.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;president&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;various_american_presidents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;president&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;extracted_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;final_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;president&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coords_x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Results! ✅
&lt;/h2&gt;

&lt;p&gt;Let's now output the &lt;code&gt;final_list&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final results: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is what one search result in the output looks like after running the code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3g0jpwu6s1vq2mvwtjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3g0jpwu6s1vq2mvwtjc.png" alt="Sample output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, we have a Python dictionary as output containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;key&lt;/code&gt;: the name of the president identified, which here is "kennedy"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;source&lt;/code&gt;: the audio file this was found in, which here is "ConcessionStand.wav"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;time_start&lt;/code&gt;: the time stamp in seconds where the president was mentioned, which here is 339.9 seconds&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;text&lt;/code&gt;: the text chunk the name was found in.&lt;/li&gt;
&lt;/ul&gt;
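
&lt;p&gt;Because each entry records its source file and time stamp, post-processing is straightforward. As a quick sketch (assuming &lt;code&gt;final_list&lt;/code&gt; is populated as above), we could print a compact "where to listen" index:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# turn the structured results into a simple listening guide
for entry in final_list:
    minutes, seconds = divmod(int(entry["time_start"]), 60)
    print(f'{entry["key"]} mentioned in {entry["source"]} at {minutes}m {seconds}s')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;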




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;And we're done! To recap, we were able to parse our audio files into text, run a text search on them for the word "president", and then use our SLIM Extract Tool to identify the specific presidents named in our text chunks! &lt;strong&gt;And remember that we did all this on just a CPU! 💻&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Be sure to check out our YouTube video on this example!&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/5y0ez5ZBpPE?start=804"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you made it this far, thank you for taking the time to go through this topic with us ❤️! For more content like this, make sure to &lt;a href="https://dev.to/llmware"&gt;visit our dev.to page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The source code for many more examples like this one is on &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;our GitHub&lt;/a&gt;. Find this example &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Use_Cases/parsing_great_speeches.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our repository also contains a &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/NoteBook_Examples/parsing_great_speeches_notebook.ipynb" rel="noopener noreferrer"&gt;notebook for this example&lt;/a&gt; that you can run yourself using Google Colab, Jupyter or any other platform that supports .ipynb notebooks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/fCztJQeV7J" rel="noopener noreferrer"&gt;Join our Discord&lt;/a&gt; to interact with a growing community of AI enthusiasts of all levels of experience!&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>🤖AI-Powered Contract Queries: Use Language Models for Effective Analysis!🔥</title>
      <dc:creator>Prashant Iyer</dc:creator>
      <pubDate>Fri, 28 Jun 2024 20:48:31 +0000</pubDate>
      <link>https://forem.com/llmware/ai-powered-contract-queries-use-language-models-for-effective-analysis-461o</link>
      <guid>https://forem.com/llmware/ai-powered-contract-queries-use-language-models-for-effective-analysis-461o</guid>
      <description>&lt;p&gt;Imagine you were given a large contract and asked a really specific question about it: "What is the notice for termination for convenience?" It would be an ordeal to locate the answer for this in the contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But what if we could use AI 🤖&lt;/strong&gt; to analyze the contract and answer this for us?&lt;/p&gt;

&lt;p&gt;What we want here is to perform something known as &lt;em&gt;retrieval-augmented generation&lt;/em&gt; (RAG). This is the process by which we give a &lt;em&gt;language model&lt;/em&gt; some external sources (such as a contract). The external sources are intended to enhance the model's context, giving it a more comprehensive understanding of a topic. The model should then give us more accurate responses to the questions we ask it on the topic.&lt;/p&gt;

&lt;p&gt;Now, a general-purpose model like ChatGPT might be able to answer questions about contracts with RAG, but &lt;strong&gt;what if we instead used a model that's been trained and fine-tuned&lt;/strong&gt; specifically on contract data?&lt;/p&gt;




&lt;h2&gt;
  
  
  Our AI model 🤖
&lt;/h2&gt;

&lt;p&gt;For this example, we'll be using LLMWare's &lt;em&gt;dragon-yi-6b-gguf&lt;/em&gt; model. This model is RAG-finetuned for fact-based question-answering on complex business and legal documents.&lt;/p&gt;

&lt;p&gt;This means that it is specialized in giving us short and concise responses to questions involving documents like contracts. This makes it perfect for our example!&lt;/p&gt;

&lt;p&gt;This is also a &lt;em&gt;GGUF quantized&lt;/em&gt; model, meaning that it is a smaller and faster version of the original 6 billion parameter &lt;em&gt;dragon-yi-6b&lt;/em&gt; model. Fortunately for us, this means that &lt;strong&gt;we can run it on a CPU 💻&lt;/strong&gt; without the need for powerful computational resources like GPUs!&lt;/p&gt;

&lt;p&gt;Now, let's look at an example of using the &lt;code&gt;llmware&lt;/code&gt; library for contract analysis from start to end!&lt;/p&gt;
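
&lt;p&gt;As in our other examples, the snippets below assume a few imports from &lt;code&gt;llmware&lt;/code&gt; (plus &lt;code&gt;os&lt;/code&gt;). The module paths are our assumptions based on the library's layout, so verify them against the full example linked at the end:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

# assumed module paths - verify against the full example in the llmware repo
from llmware.setup import Setup                     # sample contract files
from llmware.library import Library                 # document parsing and storage
from llmware.retrieval import Query                 # text search over the library
from llmware.prompts import Prompt, HumanInTheLoop  # model calls and CSV export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;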




&lt;h2&gt;
  
  
  Step 1: Loading in files 📁
&lt;/h2&gt;

&lt;p&gt;Let's start off by loading in our contracts to be analyzed. The &lt;code&gt;llmware&lt;/code&gt; library provides sample contracts via the &lt;code&gt;Setup&lt;/code&gt; class, but you can also use your own files in this example by replacing the &lt;code&gt;agreements_path&lt;/code&gt; below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;local_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Setup&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_sample_files&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;agreements_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgreementsLarge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here, we load in the &lt;code&gt;AgreementsLarge&lt;/code&gt; set of files.&lt;/p&gt;

&lt;p&gt;Next, we'll create a &lt;code&gt;Library&lt;/code&gt; object and add our sample files to this library. An &lt;code&gt;llmware&lt;/code&gt; library breaks documents down into text chunks and stores them in a database so that we can access them easily later.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;msa_lib&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Library&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;create_new_library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;msa_lib503_635&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;msa_lib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agreements_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 2: Locating MSA files 🔍
&lt;/h2&gt;

&lt;p&gt;Let's say that we want to consider only MSA (master services agreement) files from our sample contracts.&lt;/p&gt;

&lt;p&gt;We can first create a &lt;code&gt;Query&lt;/code&gt; object containing all our files, and then run a &lt;code&gt;text_search_by_page()&lt;/code&gt; to filter only the files that contain "master services agreement" on their front page.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msa_lib&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;master services agreement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_search_by_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;msa_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;results&lt;/code&gt; from the text search will be a dictionary containing detailed information about the text query. However, we're only interested in the &lt;code&gt;file_source&lt;/code&gt; key representing the file names.&lt;/p&gt;

&lt;p&gt;Great! We now have our MSA files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38f6e5r3ki5xwrhxgve0.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38f6e5r3ki5xwrhxgve0.gif" alt="Simpsons GIF"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Loading our model 🪫🔋
&lt;/h2&gt;

&lt;p&gt;Now, we can load in our model using the &lt;code&gt;Prompt&lt;/code&gt; class in the &lt;code&gt;llmware&lt;/code&gt; library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llmware/dragon-yi-6b-gguf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;prompter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Prompt&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 4: Analyzing our files using AI 🧠💡
&lt;/h2&gt;

&lt;p&gt;Let's now iterate over our MSA files, and for each file, we'll:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identify the text chunks containing the word "termination",&lt;/li&gt;
&lt;li&gt;add those chunks as a source for our AI call, and&lt;/li&gt;
&lt;li&gt;run the AI call "What is the notice for termination for convenience?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can start by performing a text query for the word "termination".&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msa_docs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;doc_filter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;termination_provisions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_query_with_document_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;termination&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc_filter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We'll then add these &lt;code&gt;termination_provisions&lt;/code&gt; as a source to our model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_source_query_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;termination_provisions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And with that done, we can call the LLM and ask it our question.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt_with_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the notice for termination for convenience?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Results! ✅
&lt;/h2&gt;

&lt;p&gt;Let's print out our &lt;code&gt;response&lt;/code&gt; and see what the output looks like.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: llm response - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here's what the output of our code looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1z3rck11d71foci7rvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1z3rck11d71foci7rvu.png" alt="Sample output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What we have is a Python dictionary with several keys, notably:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm_response&lt;/code&gt;: giving us the answer to our question, which here is "30 days written notice"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;evidence&lt;/code&gt;: giving us the text where the model found the answer to the question&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The dictionary also contains detailed information about the metadata of the AI call, but these are not relevant to our example and have been omitted from the output above.&lt;/p&gt;
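
&lt;p&gt;If you only care about the answer and its supporting text, you can pull just those two keys from each response dictionary. Here is a small sketch, assuming the &lt;code&gt;response&lt;/code&gt; list returned by &lt;code&gt;prompt_with_source()&lt;/code&gt; above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# keep only the fields we care about from each response dictionary
for resp in response:
    print("answer: ", resp["llm_response"])
    print("evidence: ", resp["evidence"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;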




&lt;h2&gt;
  
  
  Human in the loop! 👤
&lt;/h2&gt;

&lt;p&gt;We're not done just yet! If we want to generate a CSV report so a human can review the results of our analysis, we can make use of the &lt;code&gt;HumanInTheLoop&lt;/code&gt; class. All we need to do is save the current state of our &lt;code&gt;prompter&lt;/code&gt; and call the &lt;code&gt;export_current_interaction_to_csv()&lt;/code&gt; function.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;prompter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_state&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;csv_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HumanInTheLoop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompter&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;export_current_interaction_to_csv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;And that brings us to the end of our example! To summarize, we used the &lt;code&gt;llmware&lt;/code&gt; library to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load in sample files&lt;/li&gt;
&lt;li&gt;Filter only the MSA files&lt;/li&gt;
&lt;li&gt;Use the dragon-yi-6b-gguf model to ask questions about termination provisions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;And remember that we did all of this on just a CPU! 💻&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Check out our YouTube video on this example!&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Cf-07GBZT68"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you made it this far, thank you for taking the time to go through this topic with us ❤️! For more content like this, make sure to &lt;a href="https://dev.to/llmware"&gt;visit our dev.to page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The source code for many more examples like this one is on &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;our GitHub&lt;/a&gt;. Find this example &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Use_Cases/msa_processing.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our repository also contains a &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/NoteBook_Examples/msa_processing_notebook.ipynb" rel="noopener noreferrer"&gt;notebook for this example&lt;/a&gt; that you can run yourself using Google Colab, Jupyter or any other platform that supports .ipynb notebooks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/fCztJQeV7J" rel="noopener noreferrer"&gt;Join our Discord&lt;/a&gt; to interact with a growing community of AI enthusiasts of all levels of experience!&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>The Hardest Problem in RAG... Handling 'NOT FOUND' Answers 🔍🤔</title>
      <dc:creator>Will Taner</dc:creator>
      <pubDate>Mon, 24 Jun 2024 17:04:03 +0000</pubDate>
      <link>https://forem.com/llmware/the-hardest-problem-in-rag-handling-not-found-answers-7md</link>
      <guid>https://forem.com/llmware/the-hardest-problem-in-rag-handling-not-found-answers-7md</guid>
      <description>&lt;h2&gt;
  
  
  First of All... What is RAG? 🕵️‍♂️
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is an approach to natural language processing that references external documents to provide more accurate and contextually relevant answers. Despite its advantages, RAG faces some challenges, one of which is handling 'NOT FOUND' answers. Addressing this issue is crucial for developing an effective and reliable model that everyone can use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why 'NOT FOUND' Answers Can Be Concerning ⛔️
&lt;/h2&gt;

&lt;p&gt;Some models respond with "hallucinations" when they cannot find an answer, creating inaccurate responses that may mislead the user. This can undermine the trust users have in the model, making it less reliable and effective.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Can We Remedy This? 🛠️
&lt;/h2&gt;

&lt;p&gt;For starters, it is better for the model to inform the user that it could not find the answer rather than fabricating one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/wocf9LwylhReij2MZE/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/wocf9LwylhReij2MZE/giphy.gif" alt="It's Okay" width="480" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we will delve into one way LLMWare handles 'NOT FOUND' cases effectively. By examining these methods, we can gain a better understanding of how to address this issue and enhance the overall performance and reliability of RAG systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  For the Visual Learners... 📺
&lt;/h2&gt;

&lt;p&gt;Here is a video discussing the same topic as this article. A good approach is to watch the video first, then work through the steps below.&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/slDeF7bYuv0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  Framework 🖼️
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LLMWare&lt;/strong&gt;&lt;br&gt;
For our new readers, LLMWare is a comprehensive, open-source framework that provides a unified platform for application patterns based on LLMs, including Retrieval-Augmented Generation (RAG).&lt;/p&gt;

&lt;p&gt;Please run &lt;code&gt;pip3 install llmware&lt;/code&gt; in the command line to download the package.&lt;/p&gt;


&lt;h2&gt;
  
  
  Import Libraries and Create Context 📚
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelCatalog&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WikiParser&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;ModelCatalog&lt;/strong&gt;: A class within &lt;code&gt;llmware&lt;/code&gt; that manages selecting, loading, and configuring models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WikiParser&lt;/strong&gt;: A class within &lt;code&gt;llmware&lt;/code&gt; that handles the retrieval and packaging of content from Wikipedia.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BEAVERTON, Ore.--(BUSINESS WIRE)--NIKE, Inc. (NYSE:NKE) today reported fiscal 2024 financial results for its &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;third quarter ended February 29, 2024.) “We are making the necessary adjustments to drive NIKE’s next chapter &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;of growth Post this Third quarter revenues were slightly up on both a reported and currency-neutral basis* &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;at $12.4 billion NIKE Direct revenues were $5.4 billion, slightly up on a reported and currency-neutral basis &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NIKE Brand Digital sales decreased 3 percent on a reported basis and 4 percent on a currency-neutral basis &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wholesale revenues were $6.6 billion, up 3 percent on a reported and currency-neutral basis Gross margin &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;increased 150 basis points to 44.8 percent, including a detriment of 50 basis points due to restructuring charges &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Selling and administrative expense increased 7 percent to $4.2 billion, including $340 million of restructuring &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;charges Diluted earnings per share was $0.77, including $0.21 of restructuring charges. Excluding these &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;charges, Diluted earnings per share would have been $0.98* “We are making the necessary adjustments to &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drive NIKE’s next chapter of growth,” said John Donahoe, President &amp;amp; CEO, NIKE, Inc. “We’re encouraged by &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;the progress we’ve seen, as we build a multiyear cycle of new innovation, sharpen our brand storytelling and &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;work with our wholesale partners to elevate and grow the marketplace.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the initial text for our extraction. It provides details about the popular sports brand, Nike. Feel free to modify this text to suit your needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/KEYEpIngcmXlHetDqz/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/KEYEpIngcmXlHetDqz/giphy.gif" alt="It's All Coming Together" width="480" height="284"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Create Key for Extraction 🔐
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;extract_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company founding date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;dict_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extract_key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we set the company founding date as the target key to extract from the text (after replacing spaces with underscores, the dictionary key becomes &lt;code&gt;company_founding_date&lt;/code&gt;). Note that the press release above never states Nike's founding date, so this first extraction is expected to come back empty: exactly the 'NOT FOUND' case this example is designed to handle.&lt;/p&gt;




&lt;h2&gt;
  
  
  Run Initial Extract 🏃
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slim-extract-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;extract_key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;llm_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt;: In this snippet, we load LLMWare's slim-extract-tool, a 2.8B-parameter model fine-tuned for general-purpose extraction and packaged in GGUF format (GGUF is a file format for quantized models; quantization reduces model size and speeds up inference at a small cost in accuracy).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temperature&lt;/strong&gt;: This controls the randomness of the output. Typical values range between 0 and 1: lower values make the model more deterministic, while higher values make it more random and creative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample&lt;/strong&gt;: Determines whether the next token is sampled from the probability distribution (&lt;code&gt;True&lt;/code&gt;) or always chosen greedily as the single most likely token (&lt;code&gt;False&lt;/code&gt;), which makes the output deterministic.&lt;/p&gt;

&lt;p&gt;We then attempt to extract the information from the text using the model and store it in &lt;code&gt;llm_response&lt;/code&gt;.&lt;/p&gt;
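
&lt;p&gt;Before looking at the branching logic, it helps to know the shape of a slim-extract response: each extract key maps to a list of values, and that list comes back empty when nothing is found. Here is a minimal sketch with hypothetical values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def has_answer(llm_response, dict_key):
    # True only if the extract key is present AND its value list is non-empty
    return dict_key in llm_response and len(llm_response[dict_key]) &gt; 0

# hypothetical response shapes for illustration
print(has_answer({"company_founding_date": ["1964"]}, "company_founding_date"))  # True
print(has_answer({"company_founding_date": []}, "company_founding_date"))        # False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;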




&lt;h2&gt;
  
  
  If Answer is Found... ✅
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dict_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dict_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

            &lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: found the &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;extract_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; value - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the model successfully finds and extracts the company founding date, we will return the information.&lt;/p&gt;




&lt;h2&gt;
  
  
  If Answer is Not Found... ❌
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: did not find the target value in the text - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: initiating a secondary process to try to find the information&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the model does not find the company founding date, we run a second query to extract the company name, which we will then use to look up more information in a secondary source.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3ov9kbWOowpMeMTu6c/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3ov9kbWOowpMeMTu6c/giphy.gif" alt="Emmy Awards" width="480" height="266"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Retrieve Information from Wiki 📖
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;company_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;update: found the company name - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - now using to lookup in secondary source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WikiParser&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;add_wiki_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;target_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After extracting the company's name from the text, we retrieve additional information about the company from Wikipedia.&lt;/p&gt;
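
&lt;p&gt;For reference, &lt;code&gt;add_wiki_topic&lt;/code&gt; returns a dictionary whose &lt;code&gt;articles&lt;/code&gt; list holds the retrieved entries, each with a &lt;code&gt;summary&lt;/code&gt; field - the two fields this example relies on. A standalone sketch (the topic string is just an illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from llmware.parsers import WikiParser

# pull a single article for a topic and inspect the fields used in this example
output = WikiParser().add_wiki_topic("Nike, Inc.", target_results=1)

first_article = output["articles"][0]
print("update: article summary (first 150 chars) - ", first_article["summary"][:150])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;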




&lt;h2&gt;
  
  
  Generate a Summary Snippet from Retrieved Article Data ✍️
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;supplemental_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;articles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;supplemental_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;supplemental_text_pp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;supplemental_text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; ... &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;supplemental_text_pp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;supplemental_text&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: using lookup - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - found secondary source article &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                              &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(extract displayed) - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;supplemental_text_pp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we have successfully retrieved additional data from Wikipedia, we set &lt;code&gt;supplemental_text_pp&lt;/code&gt; to a display-friendly version of the article summary: truncated to 150 characters with a trailing ellipsis if it is longer than that, or the full text otherwise. Note that the model still receives the full &lt;code&gt;supplemental_text&lt;/code&gt; in the next step; the truncation is only for printing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Call Extract Again With New Information 📞
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;new_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;supplemental_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company founding date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;update: reviewed second source article - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the new information retrieved from Wikipedia, we run the same extraction with the model again.&lt;/p&gt;




&lt;h2&gt;
  
  
  Print Response If Found 🖨️
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_founding_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;new_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_founding_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: success - found the answer - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we find the company founding date after incorporating the new information, we print the result.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fully Integrated Code 🧑‍💻
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelCatalog&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmware.parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WikiParser&lt;/span&gt;

&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BEAVERTON, Ore.--(BUSINESS WIRE)--NIKE, Inc. (NYSE:NKE) today reported fiscal 2024 financial results for its &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;third quarter ended February 29, 2024.) “We are making the necessary adjustments to drive NIKE’s next chapter &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;of growth Post this Third quarter revenues were slightly up on both a reported and currency-neutral basis* &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;at $12.4 billion NIKE Direct revenues were $5.4 billion, slightly up on a reported and currency-neutral basis &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NIKE Brand Digital sales decreased 3 percent on a reported basis and 4 percent on a currency-neutral basis &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wholesale revenues were $6.6 billion, up 3 percent on a reported and currency-neutral basis Gross margin &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;increased 150 basis points to 44.8 percent, including a detriment of 50 basis points due to restructuring charges &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Selling and administrative expense increased 7 percent to $4.2 billion, including $340 million of restructuring &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;charges Diluted earnings per share was $0.77, including $0.21 of restructuring charges. Excluding these &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;charges, Diluted earnings per share would have been $0.98* “We are making the necessary adjustments to &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drive NIKE’s next chapter of growth,” said John Donahoe, President &amp;amp; CEO, NIKE, Inc. “We’re encouraged by &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;the progress we’ve seen, as we build a multiyear cycle of new innovation, sharpen our brand storytelling and &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;work with our wholesale partners to elevate and grow the marketplace.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;not_found_then_triage_lookup&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Not Found Example - if info not found, then lookup in another source.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;extract_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company founding date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;dict_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extract_key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slim-extract-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;extract_key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;llm_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: first text reviewed for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;extract_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - llm response: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dict_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dict_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

            &lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: found the &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;extract_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; value - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;

        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: did not find the target value in the text - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: initiating a secondary process to try to find the information&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;company_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;update: found the company name - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - now using to lookup in secondary source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WikiParser&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;add_wiki_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;target_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

                        &lt;span class="n"&gt;supplemental_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;articles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

                        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;supplemental_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;supplemental_text_pp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;supplemental_text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; ... &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;supplemental_text_pp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;supplemental_text&lt;/span&gt;

                        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: using lookup - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - found secondary source article &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                              &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(extract displayed) - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;supplemental_text_pp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                        &lt;span class="n"&gt;new_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;supplemental_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company founding date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

                        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;update: reviewed second source article - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

                        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_founding_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;new_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                            &lt;span class="n"&gt;company_founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_founding_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update: success - found the answer - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;company_founding_date&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;founding_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;not_found_then_triage_lookup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may also find the fully integrated code on our GitHub &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/SLIM-Agents/not_found_extract_with_lookup.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Additionally, the notebook version (ipynb) is available &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks/NoteBook_Examples/not_found_extract_with_lookup.ipynb" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion 🤖
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/rSVRXeKPgeM5xfGyCR/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/rSVRXeKPgeM5xfGyCR/giphy.gif" alt="Men In Kilts" width="600" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Handling 'NOT FOUND' answers is one of the hardest problems in RAG, but it's a challenge that can be mitigated with thoughtful design. By implementing techniques like broader lookups, LLMWare aims to enhance the overall user experience and reliability of its AI systems.&lt;/p&gt;

&lt;p&gt;Please check out our GitHub and leave a star! &lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;https://github.com/llmware-ai/llmware&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on Discord here: &lt;a href="https://discord.gg/MgRaZz2VAB" rel="noopener noreferrer"&gt;https://discord.gg/MgRaZz2VAB&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="//llmware.ai"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Are we all prompting wrong? Balancing Creativity and Consistency in RAG.</title>
      <dc:creator>Simon Risman</dc:creator>
      <pubDate>Mon, 17 Jun 2024 18:44:07 +0000</pubDate>
      <link>https://forem.com/llmware/are-we-all-prompting-wrong-balancing-creativity-and-consistency-in-rag-20fm</link>
      <guid>https://forem.com/llmware/are-we-all-prompting-wrong-balancing-creativity-and-consistency-in-rag-20fm</guid>
      <description>&lt;p&gt;For a Boston native like myself, there are few things more heartwarming than Artificial Intelligence understanding the brilliance of &lt;em&gt;Good Will Hunting&lt;/em&gt;. A few cursory prompts reveal that it views it as a "must-watch tale of redemption and self discovery". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figxw6cfugs9mb2ivue1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figxw6cfugs9mb2ivue1i.png" alt="Chat Will Hunting" width="800" height="683"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But a slightly closer look reveals what many users of LLMs have accepted as a given - slight variations on an otherwise consistent topic. This is the result of Stochastic Generation. &lt;/p&gt;

&lt;h2&gt;
  
  
  Stochastic generation 🤖
&lt;/h2&gt;

&lt;p&gt;This is a fairly common term: from online bootcamps to college lectures, students of AI are familiar with the concept. For those who need a quick refresher, here is the 3-step generation loop that many LLMs follow.&lt;/p&gt;

&lt;p&gt;LLMs are trained using a next-token prediction task, where the model predicts the next token in a sequence based on the previous tokens. This process involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tokenized Input&lt;/strong&gt;: The input text is converted into a sequence of numbers (tokens).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probability Distribution&lt;/strong&gt;: The model generates a probability distribution over the possible next tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sampling Algorithm&lt;/strong&gt;: This distribution is passed through a sampling algorithm to select the next token.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The probabilistic elements that this process introduces enable LLMs to generate more captivating dialogue, produce novel images, and creatively praise award-winning films. &lt;br&gt;
&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExbHpta3l3aXZicWp4dDhjbjc1b3p5MjVmMnZvcGQ2d3FqMzNnZDF2ZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/7pLv68ItwBaHS/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExbHpta3l3aXZicWp4dDhjbjc1b3p5MjVmMnZvcGQ2d3FqMzNnZDF2ZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/7pLv68ItwBaHS/giphy.gif" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Randomness and RAG 🎰
&lt;/h2&gt;

&lt;p&gt;When building RAG-based applications, we are often not as concerned with creativity as we are with facts, and when dealing with facts, we want as little randomness involved as possible. In other words, instead of sampling from a probability distribution, it's beneficial to simply take the maximum-likelihood token every time. &lt;/p&gt;

&lt;p&gt;LLMWare allows you to measure how random your generated results are, as well as adjust how random you want them to be. Here's a quick demonstration:&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo 🙌
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Load the model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bling-stablelm-3b-tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;get_logits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;max_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the load_model method, we make a few important selections. The BLING StableLM 3B model (&lt;code&gt;bling-stablelm-3b-tool&lt;/code&gt;) is one of our newest and highest-performing models. &lt;/p&gt;

&lt;p&gt;Setting the sample attribute to &lt;code&gt;True&lt;/code&gt; or &lt;code&gt;False&lt;/code&gt; lets you switch between stochastic sampling and greedy (top-token) decoding. &lt;/p&gt;

&lt;p&gt;The temperature can be an important tool to control the randomness of the output, with lower values making responses more focused and higher values increasing diversity in the generated text.&lt;/p&gt;

&lt;p&gt;These key settings will allow you to see what kind of approach you want to take when it comes to the probabilistic nature of your model. &lt;/p&gt;
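
&lt;p&gt;For example, a quick way to see the difference is to load the model twice - once with sampling and once without - and repeat the same question. A sketch, where &lt;code&gt;test_passage&lt;/code&gt; is a stand-in for whatever source text you are analyzing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from llmware.models import ModelCatalog

# a stand-in passage - substitute your own source text here
test_passage = ("NIKE, Inc. reported third-quarter revenues of $12.4 billion, "
                "with gross margin up 150 basis points to 44.8 percent.")

sampled = ModelCatalog().load_model("bling-stablelm-3b-tool", sample=True, temperature=0.7)
greedy = ModelCatalog().load_model("bling-stablelm-3b-tool", sample=False, temperature=0.0)

# the two sampled runs may differ from each other; the greedy runs should not
for m in (sampled, sampled, greedy, greedy):
    response = m.inference("What is a list of the key points?", add_context=test_passage)
    print(response["llm_response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;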

&lt;p&gt;&lt;strong&gt;Run a simple inference model on some sample text&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is a list of the key points?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step is where your model does the heavy lifting, analyzing and summarizing the passage you load in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run a sampling analysis&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sampling_analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelCatalog&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;analyze_sampling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sampling analysis: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sampling_analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you get to see the analytics - giving you a better idea of how heavily your model samples from the lower-probability side of the distribution.&lt;/p&gt;

&lt;p&gt;This analysis includes the percentage of selected tokens that were also the highest-probability output, and notes the cases where a lower-ranked token was chosen instead. &lt;/p&gt;

&lt;p&gt;In cases where the top token was not selected, the code below prints the exact entries, including each token's rank.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampling_analysis&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not_top_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sampled choices: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All these tools can help you make an informed decision about whether you want your model to think a little outside the box or stick to the most likely answer. To see this process in action, check out our YouTube video on consistent LLM output generation.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/iXp1tj-pPjM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The full code for this example can be found in our &lt;a href="https://github.com/llmware-ai/llmware/blob/main/examples/Models/adjusting_sampling_settings.py" rel="noopener noreferrer"&gt;Github repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have any questions, or would like to learn more about LLMWARE, come to our Discord community. Click &lt;a href="https://discord.gg/6nNVdn7A" rel="noopener noreferrer"&gt;here&lt;/a&gt; to join. See you there!🚀🚀🚀&lt;/p&gt;

&lt;p&gt;Please be sure to visit our website &lt;a href="https://llmware.ai/" rel="noopener noreferrer"&gt;llmware.ai&lt;/a&gt; for more information and updates.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
