<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yemi Adejumobi</title>
    <description>The latest articles on Forem by Yemi Adejumobi (@yemi_adejumobi).</description>
    <link>https://forem.com/yemi_adejumobi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1786281%2F36423309-2602-483b-8c54-9ed73f815f3b.png</url>
      <title>Forem: Yemi Adejumobi</title>
      <link>https://forem.com/yemi_adejumobi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yemi_adejumobi"/>
    <language>en</language>
    <item>
      <title>Run &amp; Debug your LLM Apps locally using Ollama &amp; Llama 3.1</title>
      <dc:creator>Yemi Adejumobi</dc:creator>
      <pubDate>Wed, 14 Aug 2024 19:34:25 +0000</pubDate>
      <link>https://forem.com/yemi_adejumobi/run-debug-your-llm-apps-locally-using-ollama-llama-31-39mc</link>
      <guid>https://forem.com/yemi_adejumobi/run-debug-your-llm-apps-locally-using-ollama-llama-31-39mc</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxh3mmcjfamyf7flb94w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxh3mmcjfamyf7flb94w.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the rapidly evolving landscape of AI and ML, large language models (LLMs) have become increasingly powerful and ubiquitous. However, the costs and complexities associated with running these models in cloud environments can be prohibitive, especially for developers and small teams looking to experiment and innovate.&lt;/p&gt;

&lt;p&gt;Enter Ollama, a game-changing tool that brings the power of LLMs to your local machine. This blog post explores how Ollama can simplify your development process, letting you run LLM applications locally with ease and efficiency. We will also add Langtrace, an open-source observability tool that complements Ollama by providing crucial insights into your LLM application's performance and behavior.&lt;/p&gt;

&lt;p&gt;Whether you're a seasoned AI developer or just starting your journey with language models, this guide will equip you with the knowledge and tools to take your LLM projects to the next level. Let’s dive in. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Ollama?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ollama is a tool that makes it easy to run large language models (LLMs) locally, providing a cost-effective environment for testing and development, so you can experiment and refine your ideas without incurring production costs.&lt;/p&gt;

&lt;p&gt;By running LLMs locally, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced cloud costs&lt;/strong&gt;: Save on cloud computing expenses by running LLMs on your own machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster experimentation&lt;/strong&gt;: Quickly test and iterate on your ideas without relying on remote servers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved data privacy&lt;/strong&gt;: Keep your data local and secure, reducing the risk of data breaches.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Setting up Ollama and running LLMs locally&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For this step, we will use Meta's latest open-source model, Llama 3.1. For optimal performance with Ollama, make sure your machine has at least 16GB of RAM, then follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download and install Ollama &lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;https://ollama.com/download&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Download the desired model (e.g., Llama 3.1 or another open-source model). For example, run the following in a terminal window to start &lt;code&gt;llama3.1&lt;/code&gt; locally:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run llama3.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Much like a Docker command, this pulls the llama3.1 model and then runs it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5b40rt6kp9hcczhpdek3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5b40rt6kp9hcczhpdek3.png" alt="Image description" width="727" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once the pull completes, you will get a terminal prompt where you can start chatting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bvxfsgzqbl1ahxzx9tj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bvxfsgzqbl1ahxzx9tj.png" alt="Image description" width="693" height="117"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For further customization, such as using a &lt;code&gt;Modelfile&lt;/code&gt; to define your own custom system prompt, refer to the Ollama documentation.&lt;/p&gt;
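As a quick illustration, a minimal `Modelfile` might look like the following; the model name `fashionista` and the system prompt here are placeholders of my own, not taken from the Ollama docs:

```
# Modelfile
FROM llama3.1
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant with expertise in men's clothing."
```

You would then build and run the custom model with `ollama create fashionista -f Modelfile` followed by `ollama run fashionista`.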

&lt;h2&gt;
  
  
  &lt;strong&gt;Instrumenting Ollama with Langtrace&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now that you have a local LLM, suppose you are building a customer service bot and want detailed traces of its LLM requests. This is where Langtrace shines. Langtrace provides a Python SDK that enables observability for Ollama, allowing you to trace LLM calls and gain valuable insights into your application's performance. To instrument Ollama with Langtrace:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate an API key from &lt;a href="http://langtrace.ai" rel="noopener noreferrer"&gt;langtrace.ai&lt;/a&gt; (you can also self-host).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.langtrace.ai/introduction#step-2-install-the-sdk-on-your-project" rel="noopener noreferrer"&gt;Install&lt;/a&gt; the Langtrace Python or TypeScript SDK.&lt;/li&gt;
&lt;li&gt;Import and initialize the SDK.&lt;/li&gt;
&lt;li&gt;Start tracing!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example code snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;from langtrace_python_sdk import langtrace, with_langtrace_root_span
import ollama
from dotenv import load_dotenv

load_dotenv&lt;span class="o"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;# langtrace.init(write_spans_to_console=False)&lt;/span&gt;
langtrace.init&lt;span class="o"&gt;(&lt;/span&gt;api_key &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'YOUR_API_KEY'&lt;/span&gt;, &lt;span class="nv"&gt;write_spans_to_console&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;False&lt;span class="o"&gt;)&lt;/span&gt;

@with_langtrace_root_span&lt;span class="o"&gt;()&lt;/span&gt;
def give_recs&lt;span class="o"&gt;()&lt;/span&gt;:
  response &lt;span class="o"&gt;=&lt;/span&gt; ollama.chat&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'llama3.1'&lt;/span&gt;, &lt;span class="nv"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;
    &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s1"&gt;'role'&lt;/span&gt;: &lt;span class="s1"&gt;'user'&lt;/span&gt;,
      &lt;span class="s1"&gt;'content'&lt;/span&gt;: &lt;span class="s1"&gt;'You are an AI assistant with expertise in mens clothing. Help me pick clothing for a black tie dinner at work.'&lt;/span&gt;,
    &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;])&lt;/span&gt;
  print&lt;span class="o"&gt;(&lt;/span&gt;response[&lt;span class="s1"&gt;'message'&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;__name__ &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"__main__"&lt;/span&gt;:
  print&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Running fashionista bot..."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  give_recs&lt;span class="o"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what the trace looks like in the Langtrace UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp49e9j0zpsmewxdxw0rv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp49e9j0zpsmewxdxw0rv.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a reference &lt;a href="https://github.com/Scale3-Labs/langtrace-recipes/blob/main/integrations/tools/ollama/ollama-fashionista.ipynb" rel="noopener noreferrer"&gt;cookbook&lt;/a&gt; for the Ollama integration with &lt;a href="http://www.langtrace.ai" rel="noopener noreferrer"&gt;Langtrace&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tracing LLM calls&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With Langtrace, you can now trace LLM calls and capture essential metadata, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input, output, and total tokens&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Error rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This data provides valuable insights into your application's performance, helping you optimize and improve it over time.&lt;/p&gt;
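Langtrace records these metrics automatically, but as a rough illustration of what is being summarized, here is a small hypothetical helper (not part of the Langtrace SDK) that derives the same numbers from an Ollama-style response dict; `prompt_eval_count` and `eval_count` are the token counts Ollama's chat API reports:

```python
def summarize_llm_call(response, latency_seconds):
    """Summarize token usage and latency from an Ollama-style response dict.

    Assumes the dict carries 'prompt_eval_count' (input tokens) and
    'eval_count' (output tokens), as Ollama's chat API reports.
    """
    input_tokens = response.get("prompt_eval_count", 0)
    output_tokens = response.get("eval_count", 0)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "latency_seconds": round(latency_seconds, 3),
    }

# Example with a stubbed response (no Ollama server needed):
fake_response = {"prompt_eval_count": 26, "eval_count": 112}
print(summarize_llm_call(fake_response, 1.5))
# {'input_tokens': 26, 'output_tokens': 112, 'total_tokens': 138, 'latency_seconds': 1.5}
```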

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0t1ojyheftey0ou8byqe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0t1ojyheftey0ou8byqe.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next blog in this series, we will cover how to use Langtrace to perform evaluations on your application’s accuracy and optimize its behavior. &lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Update
&lt;/h2&gt;

&lt;p&gt;I added a UI option to this bot; feel free to check out the code &lt;a href="https://github.com/Scale3-Labs/langtrace-recipes/blob/main/integrations/tools/ollama/ollama-fashionistav2.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;. I use Streamlit for the UI, but you can swap it out for Gradio or any other library.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxh3mmcjfamyf7flb94w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxh3mmcjfamyf7flb94w.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To see this in action, install Streamlit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the code using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamlit run ollama-fashionistav2.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Next steps&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In conclusion, combining Ollama's local LLM capabilities with Langtrace's observability features unlocks a powerful toolset for building and optimizing LLM applications. By following the steps outlined in this post, you can leverage the benefits of running LLMs locally with Ollama, including reduced cloud costs, accelerated experimentation, and improved data privacy. &lt;/p&gt;

&lt;p&gt;With &lt;a href="http://www.langtrace.ai" rel="noopener noreferrer"&gt;Langtrace&lt;/a&gt;, you can gain valuable insights into your application's performance, identify bottlenecks, and optimize its behavior. By integrating Ollama and &lt;a href="http://www.langtrace.ai" rel="noopener noreferrer"&gt;Langtrace&lt;/a&gt;, you can build more efficient, effective, and innovative LLM applications. Try out Ollama and &lt;a href="http://www.langtrace.ai" rel="noopener noreferrer"&gt;Langtrace&lt;/a&gt; today and discover the advantages of local LLM development and open-source observability for yourself!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ollama</category>
      <category>llama31</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building a Traceable RAG System with Qdrant and Langtrace: A Step-by-Step Guide</title>
      <dc:creator>Yemi Adejumobi</dc:creator>
      <pubDate>Tue, 16 Jul 2024 19:24:48 +0000</pubDate>
      <link>https://forem.com/yemi_adejumobi/building-a-traceable-rag-system-with-qdrant-and-langtrace-a-step-by-step-guide-47ki</link>
      <guid>https://forem.com/yemi_adejumobi/building-a-traceable-rag-system-with-qdrant-and-langtrace-a-step-by-step-guide-47ki</guid>
      <description>&lt;p&gt;Vector databases are the backbone of AI applications, providing the crucial infrastructure for efficient similarity search and retrieval of high-dimensional data. Among these, &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; stands out as one of the most versatile projects. Written in Rust, Qdrant is a vector search database designed for turning embeddings or neural network encoders into full-fledged applications for matching, searching, recommending, and more.&lt;/p&gt;

&lt;p&gt;In this blog post, we'll explore how to leverage &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; in a Retrieval-Augmented Generation (RAG) system and demonstrate how to trace its operations using Langtrace. This combination allows us to build and optimize AI applications that can understand and generate human-like text based on vast amounts of information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete Code Repository
&lt;/h3&gt;

&lt;p&gt;Before we dive into the details, I'm excited to share that the complete code for this RAG system implementation is available in our GitHub repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Scale3-Labs/langtrace-recipes/tree/main/integrations/vector-db/qdrant/rag-tracing-with-qdrant-langtrace" rel="noopener noreferrer"&gt;RAG System with Qdrant and Langtrace&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This repository contains all the code examples discussed in this blog post, along with additional scripts, documentation, and setup instructions. Feel free to clone, fork, or star the repository if you find it useful!&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a RAG System?
&lt;/h3&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is an AI framework that enhances large language models (LLMs) with external knowledge. The process typically involves three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: Given a query, relevant information is retrieved from a knowledge base (in our case, stored in Qdrant).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augmentation&lt;/strong&gt;: The retrieved information is combined with the original query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt;: An LLM uses the augmented input to generate a response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach allows for more accurate and up-to-date responses, as the system can reference specific information rather than relying solely on its pre-trained knowledge.&lt;/p&gt;
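Before bringing in Qdrant and an LLM, the three steps can be sketched in a deliberately tiny, self-contained form. Retrieval here is naive word-overlap scoring rather than vector search, and generation is a stub you would replace with a real LLM call; none of this is Qdrant or OpenAI API, it only illustrates the flow:

```python
import re

def _words(text):
    # Lowercase word tokens, ignoring punctuation
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=2):
    """Step 1 (toy): rank documents by word overlap with the query."""
    q = _words(query)
    ranked = sorted(docs, key=lambda d: len(q.intersection(_words(d))), reverse=True)
    return ranked[:k]

def augment(query, context):
    """Step 2: combine the retrieved context with the original query."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

def generate(prompt, llm):
    """Step 3: hand the augmented prompt to an LLM (stubbed via a callable here)."""
    return llm(prompt)

docs = [
    "Qdrant is a vector database for similarity search.",
    "Docker packages applications into containers.",
]
prompt = augment("What is Qdrant?", retrieve("What is Qdrant?", docs, k=1))
answer = generate(prompt, llm=lambda p: "A vector database.")  # stub LLM call
```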

&lt;h2&gt;
  
  
  Implementing a RAG System with Qdrant
&lt;/h2&gt;

&lt;p&gt;Let's walk through the process of implementing a RAG system using Qdrant as our vector database. We'll use OpenAI's GPT model for generation and Langtrace for tracing our system's operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up the Environment
&lt;/h3&gt;

&lt;p&gt;First, we need to set up our environment with the necessary libraries:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

import os
import &lt;span class="nb"&gt;time
&lt;/span&gt;import openai
from qdrant_client import QdrantClient, models
from langtrace_python_sdk import langtrace, with_langtrace_root_span
from typing import List, Dict, Any

&lt;span class="c"&gt;# Initialize environment and clients&lt;/span&gt;
os.environ[&lt;span class="s2"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your_openai_api_key_here"&lt;/span&gt;
langtrace.init&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your_langtrace_api_key_here'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
qdrant_client &lt;span class="o"&gt;=&lt;/span&gt; QdrantClient&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;":memory:"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; 
openai_client &lt;span class="o"&gt;=&lt;/span&gt; openai.Client&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;os.getenv&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Initializing the Knowledge Base
&lt;/h3&gt;

&lt;p&gt;Next, we'll create a function to initialize our knowledge base in Qdrant:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

@with_langtrace_root_span&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"initialize_knowledge_base"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
def initialize_knowledge_base&lt;span class="o"&gt;(&lt;/span&gt;documents: List[str]&lt;span class="o"&gt;)&lt;/span&gt; -&amp;gt; None:
    start_time &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;# Check if collection exists, if not create it&lt;/span&gt;
    collections &lt;span class="o"&gt;=&lt;/span&gt; qdrant_client.get_collections&lt;span class="o"&gt;()&lt;/span&gt;.collections
    &lt;span class="k"&gt;if &lt;/span&gt;not any&lt;span class="o"&gt;(&lt;/span&gt;collection.name &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"knowledge-base"&lt;/span&gt; &lt;span class="k"&gt;for &lt;/span&gt;collection &lt;span class="k"&gt;in &lt;/span&gt;collections&lt;span class="o"&gt;)&lt;/span&gt;:
        qdrant_client.create_collection&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"knowledge-base"&lt;/span&gt;
        &lt;span class="o"&gt;)&lt;/span&gt;
        print&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Created 'knowledge-base' collection"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

    qdrant_client.add&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"knowledge-base"&lt;/span&gt;,
        &lt;span class="nv"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;documents
    &lt;span class="o"&gt;)&lt;/span&gt;
    end_time &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;
    print&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="s2"&gt;"Knowledge base initialized with {len(documents)} documents in {end_time - start_time:.2f} seconds"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Querying the Vector Database
&lt;/h3&gt;

&lt;p&gt;We'll create a function to query our Qdrant vector database:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

@with_langtrace_root_span&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"query_vector_db"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
def query_vector_db&lt;span class="o"&gt;(&lt;/span&gt;question: str, n_points: int &lt;span class="o"&gt;=&lt;/span&gt; 3&lt;span class="o"&gt;)&lt;/span&gt; -&amp;gt; List[Dict[str, Any]]:
    start_time &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;
    results &lt;span class="o"&gt;=&lt;/span&gt; qdrant_client.query&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"knowledge-base"&lt;/span&gt;,
        &lt;span class="nv"&gt;query_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;question,
        &lt;span class="nv"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;n_points,
    &lt;span class="o"&gt;)&lt;/span&gt;
    end_time &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;results


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Generating LLM Responses
&lt;/h3&gt;

&lt;p&gt;We'll use OpenAI's GPT model to generate responses:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

@with_langtrace_root_span&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"generate_llm_response"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
def generate_llm_response&lt;span class="o"&gt;(&lt;/span&gt;prompt: str, model: str &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gpt-3.5-turbo"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; -&amp;gt; str:
    start_time &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;
    completion &lt;span class="o"&gt;=&lt;/span&gt; openai_client.chat.completions.create&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;model,
        &lt;span class="nv"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;
            &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"role"&lt;/span&gt;: &lt;span class="s2"&gt;"user"&lt;/span&gt;, &lt;span class="s2"&gt;"content"&lt;/span&gt;: prompt&lt;span class="o"&gt;}&lt;/span&gt;,
        &lt;span class="o"&gt;]&lt;/span&gt;,
        &lt;span class="nb"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10.0,
    &lt;span class="o"&gt;)&lt;/span&gt;
    end_time &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;
    response &lt;span class="o"&gt;=&lt;/span&gt; completion.choices[0].message.content
    &lt;span class="k"&gt;return &lt;/span&gt;response


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  The RAG Process
&lt;/h3&gt;

&lt;p&gt;Finally, we'll tie it all together in our RAG function:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

@with_langtrace_root_span&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"rag"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
def rag&lt;span class="o"&gt;(&lt;/span&gt;question: str, n_points: int &lt;span class="o"&gt;=&lt;/span&gt; 3&lt;span class="o"&gt;)&lt;/span&gt; -&amp;gt; str:
    print&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="s2"&gt;"Processing RAG for question: {question}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

    context_start &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;
    context &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;.join&lt;span class="o"&gt;([&lt;/span&gt;r.document &lt;span class="k"&gt;for &lt;/span&gt;r &lt;span class="k"&gt;in &lt;/span&gt;query_vector_db&lt;span class="o"&gt;(&lt;/span&gt;question, n_points&lt;span class="o"&gt;)])&lt;/span&gt;
    context_end &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;

    prompt_start &lt;span class="o"&gt;=&lt;/span&gt; time.time&lt;span class="o"&gt;()&lt;/span&gt;
    metaprompt &lt;span class="o"&gt;=&lt;/span&gt; f&lt;span class="s2"&gt;"""
    You are a software architect.
    Answer the following question using the provided context.
    If you can't find the answer, do not pretend you know it, but answer "&lt;/span&gt;I don&lt;span class="s1"&gt;'t know".

    Question: {question.strip()}

    Context:
    {context.strip()}

    Answer:
    """
    prompt_end = time.time()

    answer = generate_llm_response(metaprompt)
    print(f"RAG completed, answer length: {len(answer)} characters")
    return answer


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Tracing with Langtrace
&lt;/h2&gt;

&lt;p&gt;As you may have noticed, we've decorated our functions with &lt;code&gt;@with_langtrace_root_span&lt;/code&gt;. This allows us to trace the execution of our RAG system using Langtrace, an open-source LLM observability tool. You can read more about group traces in the Langtrace &lt;a href="https://docs.langtrace.ai/features/grouptraces" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. &lt;/p&gt;
&lt;h3&gt;
  
  
  What is Langtrace?
&lt;/h3&gt;

&lt;p&gt;Langtrace is a powerful, open-source tool designed specifically for LLM observability. It provides developers with the ability to trace, monitor, and analyze the performance and behavior of LLM-based systems. By using Langtrace, we can gain valuable insights into our RAG system's operation, helping us to optimize performance, identify bottlenecks, and ensure the reliability of our AI applications.&lt;/p&gt;

&lt;p&gt;Key features of Langtrace include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy integration with existing LLM applications&lt;/li&gt;
&lt;li&gt;Detailed tracing of LLM operations&lt;/li&gt;
&lt;li&gt;Performance metrics and analytics&lt;/li&gt;
&lt;li&gt;Open-source nature, allowing for community contributions and customizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our RAG system, each decorated function will create a span in our trace, providing a comprehensive view of the system's execution flow. This level of observability is crucial when working with complex AI systems like RAG, where multiple components interact to produce the final output.&lt;/p&gt;
&lt;h3&gt;
  
  
  Using Langtrace in Our RAG System
&lt;/h3&gt;

&lt;p&gt;Here's how we're using Langtrace in our implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We initialize Langtrace at the beginning of our script:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langtrace_python_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;langtrace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;with_langtrace_root_span&lt;/span&gt;
&lt;span class="n"&gt;langtrace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_langtrace_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="2"&gt;
&lt;li&gt;We decorate each main function with the root span decorator:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="nd"&gt;@with_langtrace_root_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# function implementation
&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This setup allows us to create a hierarchical trace of our RAG system's execution, from initializing the knowledge base to generating the final response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the RAG System
&lt;/h2&gt;

&lt;p&gt;Let's test our RAG system with a few sample questions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

def demonstrate_different_queries&lt;span class="o"&gt;()&lt;/span&gt;:
    questions &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"What is Qdrant used for?"&lt;/span&gt;,
        &lt;span class="s2"&gt;"How does Docker help developers?"&lt;/span&gt;,
        &lt;span class="s2"&gt;"What is the purpose of MySQL?"&lt;/span&gt;,
        &lt;span class="s2"&gt;"Can you explain what FastAPI is?"&lt;/span&gt;,
    &lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;question &lt;span class="k"&gt;in &lt;/span&gt;questions:
        try:
            answer &lt;span class="o"&gt;=&lt;/span&gt; rag&lt;span class="o"&gt;(&lt;/span&gt;question&lt;span class="o"&gt;)&lt;/span&gt;
            print&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="s2"&gt;"Question: {question}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            print&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="s2"&gt;"Answer: {answer}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        except Exception as e:
            print&lt;span class="o"&gt;(&lt;/span&gt;f&lt;span class="s2"&gt;"Error processing question '{question}': {str(e)}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Initialize knowledge base and run queries&lt;/span&gt;
documents &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;"Qdrant is a vector database &amp;amp; vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!"&lt;/span&gt;,
    &lt;span class="s2"&gt;"Docker helps developers build, share, and run applications anywhere — without tedious environment configuration or management."&lt;/span&gt;,
    &lt;span class="s2"&gt;"PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing."&lt;/span&gt;,
    &lt;span class="s2"&gt;"MySQL is an open-source relational database management system (RDBMS). A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database."&lt;/span&gt;,
    &lt;span class="s2"&gt;"NGINX is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. NGINX is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption."&lt;/span&gt;,
    &lt;span class="s2"&gt;"FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints."&lt;/span&gt;,
    &lt;span class="s2"&gt;"SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similar, semantic search, or paraphrase mining."&lt;/span&gt;,
    &lt;span class="s2"&gt;"The cron command-line utility is a job scheduler on Unix-like operating systems. Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts), also known as cron jobs, to run periodically at fixed times, dates, or intervals."&lt;/span&gt;,
&lt;span class="o"&gt;]&lt;/span&gt;
initialize_knowledge_base&lt;span class="o"&gt;(&lt;/span&gt;documents&lt;span class="o"&gt;)&lt;/span&gt;
demonstrate_different_queries&lt;span class="o"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6xl5j63ludr7o5vxgg1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6xl5j63ludr7o5vxgg1.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Analyzing the Traces
&lt;/h2&gt;

&lt;p&gt;After running our RAG system, we can analyze the traces in the Langtrace dashboard. Here's what to look for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check the Langtrace dashboard for a visual representation of the traces.&lt;/li&gt;
&lt;li&gt;Look for the 'rag' root span and its child spans to understand the flow of operations.&lt;/li&gt;
&lt;li&gt;Examine the timing information printed for each operation to identify potential bottlenecks.&lt;/li&gt;
&lt;li&gt;Review any error messages printed to understand and address issues.&lt;/li&gt;
&lt;/ol&gt;
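
&lt;p&gt;As a quick illustration of step 3, here is a minimal timing sketch. The &lt;code&gt;timed&lt;/code&gt; decorator and the &lt;code&gt;retrieve&lt;/code&gt; stand-in are hypothetical helpers for this example, not part of the Langtrace SDK; in practice you would read these per-operation durations from the spans in the Langtrace dashboard rather than printing them yourself.&lt;/p&gt;

```python
import time
from functools import wraps

def timed(operation_name):
    """Print wall-clock time for one pipeline operation, mirroring the
    per-operation timing you would review in the Langtrace dashboard."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                print(f"{operation_name} took {elapsed:.3f}s")
        return wrapper
    return decorator

# Stand-in for the tutorial's retrieval step; swap in the real Qdrant query.
@timed("vector_search")
def retrieve(question):
    time.sleep(0.01)  # simulate a vector search round trip
    return ["matched document"]

retrieve("What is Qdrant?")
```

&lt;p&gt;If an operation's printed (or traced) duration is consistently large relative to the others, that is the bottleneck to optimize first.&lt;/p&gt;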

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog post, we've explored how to use Qdrant, a powerful vector database, to build a Retrieval-Augmented Generation (RAG) system. We implemented a complete RAG pipeline, from initializing the knowledge base to generating responses, and added tracing with Langtrace to gain insight into the system's performance. By leveraging open-source tools like Qdrant for vector search and Langtrace for LLM observability, we're not only building powerful AI applications but also contributing to and benefiting from the broader AI development community. These tools empower developers to create, optimize, and understand complex AI systems, paving the way for more reliable AI applications.&lt;/p&gt;

&lt;p&gt;Remember, you can find the complete implementation of this RAG system in our &lt;a href="https://github.com/Scale3-Labs/langtrace-recipes/tree/main/integrations/vector-db/qdrant/rag-tracing-with-qdrant-langtrace" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. We encourage you to explore the code, experiment with it, and adapt it to your specific use cases. If you have any questions or improvements, feel free to open an issue or submit a pull request. Happy coding!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llmops</category>
      <category>observability</category>
    </item>
  </channel>
</rss>
