Forem: M Quamer Nasim

Enhancing Data Security with Role-Based Access Control of Qdrant Vector Database

M Quamer Nasim — Fri, 31 May 2024 09:54:59 +0000

Data security has emerged as a major concern with the growing need for Retrieval Augmented Generation (RAG)-powered Generative AI applications in large companies. At the heart of RAG applications lies the vector database, which stores all the company’s proprietary data. This database is used by large language models (LLMs) to perform similarity searches and retrieve relevant content.

In large organizations, there are multiple levels, various departments, and different roles, each with access to different levels of sensitive information. For example, financial and company roadmap-related documents may only be accessible to top officials and are not required by developers. Therefore, it’s essential to restrict database or collection access based on defined roles. This approach not only helps to maintain security but also ensures that the LLM gives each user responses that are accurate and relevant to that user’s role.

To address these needs, new Role-Based Access Control (RBAC) options have been introduced via JSON Web Tokens (JWT) in the Qdrant Vector Database in their latest 1.9 release. API keys previously supported basic read and write operations. However, recognizing the evolving needs of users, particularly large organizations, additional options for finer control over data access within internal environments have been implemented.

Qdrant 1.9: Introducing Role-Based Access Control and JWT Tokens

With the release of Qdrant version 1.9, significant advancements have been made in enhancing data security through the introduction of RBAC and JWT tokens. These new access control options offer a more granular and secure way to manage data access within large organizations.

In earlier versions of Qdrant, access control was managed with API keys, which supported only basic read and write operations. Version 1.9 adds further access control options based on JSON Web Tokens (JWT).

JWT allows a user to have limited access to specific data or collections in the database. By using JWT-based authentication, tokens with restricted access can be issued, which will help in the implementation of RBAC. This basically means administrators can define permissions for users, restricting access to sensitive endpoints and ensuring that only authorized individuals can access particular data segments.

The use of RBAC will help administrators assign specific roles and privileges to users based on their positions and responsibilities within the organization. This will be very useful in environments where different departments and roles require varying levels of access to the vector database. For instance, while developers might need access to certain datasets, financial information can be restricted to top-level executives.

Role-Based Access Control (RBAC)

RBAC in Qdrant allows administrators to define roles and assign specific privileges to users based on their roles within the organization. This ensures that users only have access to the data and actions necessary for their role, enhancing security and operational efficiency. Administrators can use the table below that outlines the actions allowed or denied based on the access level.

Actions allowed for different Roles (Symbols: ✅ Allowed | ❌ Denied | 🟡 Allowed, but filtered)

By using JWT tokens and RBAC, Qdrant ensures that each user has the appropriate level of access to perform their tasks efficiently while maintaining strict security protocols. This system provides a scalable and secure approach to managing user permissions, making it ideal for enterprises of all sizes.

Qdrant emerges as the best choice for organizations seeking fine-grained user access control and enhanced security measures. Unlike other databases such as Pinecone, Milvus, Chroma, and Weaviate, Qdrant offers a much higher level of granularity in access control, which sets it apart. With its JWT-based RBAC approach, Qdrant allows users to define permissions and restrict access to specific data parts, ensuring sensitive endpoints remain protected. This fine-grained control is coupled with Qdrant’s ability to integrate seamlessly with hybrid cloud environments and Kubernetes clusters, providing organizations with scalability and enhanced security.

Guide to Use JWT Auth for Role-Based Access

Starting from version 1.9.0, Qdrant supports granular access control using JSON Web Tokens (JWT). This means you can create tokens that grant specific permissions to access different parts of your data. With JWT, you can set up RBAC, defining what each user can and cannot do.

Enabling JWT-Based Authentication

To enable JWT-based authentication in Qdrant, we need to configure it by setting an api_key and enabling the jwt_rbac feature. There are two ways to do this: using a configuration file or environment variables.

Using Configuration File: We will open our Qdrant configuration file and add the following lines to the configuration:

service:
  api_key: your_secret_api_key_here
  jwt_rbac: true

Using Environment Variables: We can also set the following environment variables:

export QDRANT__SERVICE__API_KEY=your_secret_api_key_here
export QDRANT__SERVICE__JWT_RBAC=true

Make sure to replace your_secret_api_key_here with your actual secret key. This api_key is crucial because it is the secret used to sign the JWTs and to verify their signatures, so it needs to be kept secure.
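
Because the same key signs every token, it helps to generate it from a cryptographically secure source rather than typing one by hand. The following is a minimal sketch, assuming Python’s standard secrets module is acceptable for this purpose. The helper name make_api_key and the 32-byte length are illustrative choices, not something Qdrant requires.

```python
import secrets


def make_api_key(num_bytes: int = 32) -> str:
    """Return a random URL-safe string to use as the Qdrant api_key."""
    # token_urlsafe reads from the operating system's secure random source
    # and base64url-encodes num_bytes of randomness.
    return secrets.token_urlsafe(num_bytes)


# Hypothetical usage: paste the printed value into config.yaml or the
# QDRANT__SERVICE__API_KEY environment variable.
api_key = make_api_key()
print(f"api_key candidate: {api_key}")
```

The generated value can then replace your_secret_api_key_here in either configuration method shown above.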

Generating JSON Web Tokens

JWTs can be generated with any standard JWT library, and we don’t need access to the Qdrant instance to generate them. For example, we can use PyJWT (Python), jsonwebtoken (JavaScript), or jsonwebtoken (Rust).

JWT Structure

Let’s briefly understand the structure of the JWT token used to set up the RBAC. A JWT consists of three parts: the header, the payload, and the signature.

Header: Specifies the algorithm used to sign the token. Qdrant uses the HS256 algorithm.

{
"alg": "HS256",
"typ": "JWT"
}

Payload: Contains the claims or data you want to include in the token. Here are some common claims you might use:

{
"exp": 1640995200, // Expiration time (Unix timestamp)
"value_exists": { /* See explanation below */ },
"access": "r" // Access level
}

Signature: The token is signed with your api_key to ensure its validity. Qdrant can verify this signature using the same api_key.
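
To make the three parts concrete, here is a sketch that builds and checks an HS256 token using only the Python standard library. It is meant to illustrate the structure described above, not to replace a JWT library. The function names and the secret "demo-secret" are assumptions made for this example, and the check does not validate the exp claim.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Build header.payload.signature for an HS256 token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Recompute the signature and return the payload if it matches."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(
        secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(b64url(expected).encode("utf-8"),
                               signature.encode("utf-8")):
        raise ValueError("signature mismatch")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# "demo-secret" stands in for the api_key configured in Qdrant
token = sign_hs256({"access": "r", "exp": 1640995200}, "demo-secret")
print(token)
```

Anyone holding the secret can both create and verify tokens, which is why the api_key must stay private.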

Using JWT in Requests

Once JWT-based authentication is enabled, we now need to include the JWT in our requests to Qdrant. This can be done in two ways:

Authorization Header: Add the JWT as a bearer token in the Authorization header of the request.

Authorization: Bearer <JWT>

Api-Key Header: Alternatively, we can include the JWT as the value of the Api-Key header.

Api-Key: <JWT>

Here’s an example using the Qdrant client in Python:

from qdrant_client import QdrantClient

qdrant_client = QdrantClient(
    "your_qdrant_instance_url",
    api_key="<JWT>",
)
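
For raw REST calls without the client library, the two header options can be sketched as follows. The helper name auth_headers is hypothetical, and the URL assumes a local instance on the default port. The request is only constructed here, not sent.

```python
import urllib.request


def auth_headers(token: str, use_bearer: bool = True) -> dict:
    """Return the HTTP headers that carry a JWT to Qdrant."""
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"Api-Key": token}


# Assumption: local instance on port 6333; "<JWT>" is a placeholder token.
request = urllib.request.Request(
    "http://localhost:6333/collections",
    headers=auth_headers("<JWT>"),
    method="GET",
)
# urllib.request.urlopen(request) would send it; this sketch only builds it.
print(request.get_header("Authorization"))
```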

Generating JWT Tokens from Web UI

Qdrant provides a convenient JWT generation tool within the Web UI. This tool is accessible under the 🔑 tab. On a local instance, it can be found at http://localhost:6333/dashboard#/jwt.

Here’s a quick guide on how to generate JWT tokens from the Web UI:

Access the JWT Tool: Navigate to the 🔑 tab in the Qdrant Web UI.
Provide API Key: When prompted for the API key on the JWT dashboard, enter your API key.
Generate Token: Follow the on-screen instructions to generate a JWT token. This token will encapsulate the user’s permissions and access levels.
Use the Token: Include this token in the header of your API requests to authenticate and authorize the actions performed by the user.

Step-by-Step Tutorial to Set Up RBAC on Local Qdrant Instance

In this blog post, I will show you how to implement RBAC (Role-Based Access Control) with the help of JWT in the Qdrant Vector Database. I will be using the following data structure to create multiple collections.

├── data
│   ├── financial
│   │   └── Sample-Accounting-Income-Statement-PDF-File.pdf
│   └── general
│       ├── avengers-endgame-script-pdf.pdf
│       └── security_policy.pdf

The idea is to create two collections, one for financial data and the other for general data. General data will have multiple files, and financial data will have only one file. Then we will see RBAC in action and see how we can restrict access of the user based on the role assigned to them.

To install Qdrant, we will be using Docker. Run the following commands to install Docker (on Ubuntu and Debian, the engine package is named docker.io) and pull the Qdrant image.

sudo apt-get update
sudo apt install docker.io
docker pull qdrant/qdrant

Once this is done, we need to create a config.yaml file so that we can enable RBAC in Qdrant. Copy the following lines into your config.yaml file.

service:
 api_key: {your_API_key}
 jwt_rbac: true

After creating the config.yaml file let’s now run the Qdrant container so that we can begin the RBAC tutorial.

docker run -p 6333:6333 -v /home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/:/qdrant/storage -v /home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/config.yaml:/qdrant/config/config.yaml qdrant/qdrant

Now, we can either open the dashboard or get started with Python. In this blog, we don’t really care much about the dashboard; we will get everything done in Python. So, let’s dive in.

Let’s start by connecting to Qdrant.

# Qdrant related Parameters
api = 'jhvfegfeboihf313fekfgejbv' # 'your_api_key'
host = 'localhost'
port = 6333
url = f'http://{host}:{port}'

I will be keeping the loading of the dataset, generating embeddings, and creating collections at a very high level. If you want to know more about these topics, you can refer to one of my previous blog posts where I explained how to build a chatbot using Qdrant, Llama3, Ollama, and LangChain. In this blog post, I will be focusing on RBAC.

Now, let’s start with different security scenarios.

Without Any Token, with RBAC Enabled

We have enabled RBAC in Qdrant, but we have not created any tokens yet. Let’s see how it behaves in this case.

from qdrant_client import QdrantClient

client = QdrantClient(url=url)
client.get_collections()

# Output of the above code
UnexpectedResponse: Unexpected Response: 401 (Unauthorized)
Raw response content:
b'Must provide an API key or an Authorization bearer token'

As expected, it should not allow any operation without a token. Now, in the next section, let’s create a JWT token and try to access the Qdrant API.

Global Read-Only Access

Let’s first create a function that we can reuse to generate JWT tokens.

import jwt
def generate_jwt(api, payload):
   '''
   This function generates a JWT token using the payload and the API key

   Args:
   api: API key
   payload: Payload to be encoded in the JWT token. It contains the access rights

   Returns:
   encoded_jwt: JWT token
   '''
   encoded_jwt = jwt.encode(payload, api, algorithm='HS256')
   return encoded_jwt

Here, let’s first create a token with global read-only access. With global read-only access, the user can only read the resources in the cluster. They cannot create, update, or delete resources. This essentially means that the user can read all the collections available, so be careful when granting this permission.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has global read only access to all the collections.
# It also specifies that this token will expire in 1 hour.
payload = {
 "access": "r",
 "exp": current_time + 3600, # 1 hour
}

# Generate the JWT token
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)
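
Since every payload in this tutorial combines an access claim with an expiry, the pattern can be wrapped in a small helper. This is only a sketch: make_payload and seconds_left are hypothetical names, and the optional now argument exists purely to make the behaviour easy to check.

```python
import time
from typing import Optional, Union


def make_payload(access: Union[str, list], ttl_seconds: int = 3600,
                 now: Optional[int] = None) -> dict:
    """Build a JWT payload with an access claim and an expiry timestamp."""
    issued_at = int(time.time()) if now is None else now
    return {"access": access, "exp": issued_at + ttl_seconds}


def seconds_left(payload: dict, now: Optional[int] = None) -> int:
    """Return how many seconds remain before the payload's exp, never negative."""
    current = int(time.time()) if now is None else now
    return max(0, payload["exp"] - current)


# Global read-only access that expires one hour from now
read_only = make_payload("r")
print(read_only, seconds_left(read_only))
```

Short lifetimes limit the damage if a token leaks, since the token cannot be revoked individually without changing the api_key.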

Currently, we have no collections in the Qdrant Vector DB. Let’s see how the API behaves in this case.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
client.get_collections()

# Output of the above code
CollectionsResponse(collections=[])

Great, it returned an empty list of collections. Now let’s try a collection-management operation, deleting a collection, with the same token. Note that this should fail as the token has only read-only access.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
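# Try to delete a collection; this is a collection-management operation
collection_name = 'general'  # the name we will use for our first collection later on
client.delete_collection(collection_name=collection_name)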

# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Global manage access is required"},"time":0.000023168}'

As we can see, the API returned a 403 Forbidden error saying ‘Global manage access is required’. Managing collections, whether creating or deleting them, needs manage access.
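
So far we have seen two different failures: 401 when no credentials were sent and 403 when the token lacked the required access. A small helper can turn these into readable messages. This sketch assumes the raised exception exposes a status_code attribute, as qdrant_client’s UnexpectedResponse does. The function name explain_auth_failure is hypothetical.

```python
def explain_auth_failure(exc: Exception) -> str:
    """Map the HTTP status carried by an exception to a short explanation."""
    # Assumption: the exception has a status_code attribute (duck typing),
    # so this helper works without importing qdrant_client.
    status = getattr(exc, "status_code", None)
    if status == 401:
        return "unauthorized: no API key or bearer token was accepted"
    if status == 403:
        return "forbidden: the token does not grant the required access"
    return "other error"


# Hypothetical usage around any client call:
# try:
#     client.delete_collection(collection_name="general")
# except Exception as exc:
#     print(explain_auth_failure(exc))
```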

Now, in the next section, let’s create a token with global manage access and try to create a collection.

Global Manage Access

Now let’s create a token with global manage access. With Global Manage Access, the user can read, create, update, and delete collections in the cluster. This essentially means that the user can perform all the operations on all the collections available, so be extremely careful when granting this permission. You should only grant this permission to Admins.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has global manage access to all the collections.
# It also specifies that this token will expire in 1 hour.
# You should only generate this token for admin users.
payload = {
 "access": "m",
 "exp": current_time + 3600, # 1 hour
}

# Generate the JWT token
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)

Let’s again try to list the collections using this new token.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
client.get_collections()

# Output of the above code
CollectionsResponse(collections=[])

Since no collections are available, it returned an empty list. Next, let’s try to delete a collection using this token. Remember, this operation failed using the previous read-only token. Even though we don’t have any collections, let’s try to delete a collection and see what happens.

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
# Delete the collection if it exists
client.delete_collection(collection_name=collection_name)

# Output of the above code
False

As we can see, it ran successfully and returned False as there was no collection to delete. Now let’s try to create two collections, one for financial data and the other for general data. Then we will try to explore the RBAC in more detail.

Before creating the collections, let’s first load the embeddings model to generate embeddings for the documents. Here we will keep the indexing phase at a high level. If you’re interested in knowing more about it, you can refer to one of my previous blog posts. For this blog post, I will be focusing on RBAC.

import fasttext as ft
# download the fasttext model from https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz and then unzip it
embedding_model_path = '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/embedding_model/cc.en.300.bin'
# Load the embedding model
embed_model = ft.load_model(embedding_model_path)

Let’s define some constants for the creation of collections.

# data related parameters
chunk_size = 500
chunk_overlap = 50
batch_size = 4000

# vector related parameters
vector_size = 300
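
To see what chunk_size and chunk_overlap mean in practice, here is a deliberately simplified sliding-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers natural separators such as paragraphs and sentences). It only illustrates how consecutive chunks share chunk_overlap characters. The function name sliding_chunks is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size chunks where neighbours share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# With chunk_size=500 and chunk_overlap=50, each chunk starts 450 characters
# after the previous one.
sample = "x" * 1200
print([len(chunk) for chunk in sliding_chunks(sample, 500, 50)])
```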

Let’s create the chunk out of the general data. But before we move on to this step, let’s first create two essential utility functions that will help us in creating the chunks and respective embeddings of the data.

import pandas as pd
from tqdm.notebook import tqdm

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, Batch

def generate_embeddings_from_fastext_model(docs, embed_model):
    '''
    Generate embeddings for the documents using the FastText model

    Args:
    docs: List of documents
    embed_model: FastText model

    Returns:
    df: Dataframe with the documents, embeddings, metadata and payload
    '''

    # convert the documents to a dataframe
    # This dataframe will be used to create the embeddings
    # And later will be used to update the Qdrant Vector Database
    data = []
    for doc in tqdm(docs):
        # Get the page content and metadata for each chunk
        # Meta data contains chunk source or file name
        row_data = {
            "page_content": doc.page_content,
            "metadata": doc.metadata
        }
        data.append(row_data)

    df = pd.DataFrame(data)

    # Replace the new line characters with space
    df['page_content'] = df['page_content'].replace('\\n', ' ', regex=True)

    # Create a unique id for each document.
    # This id will be used to update the Qdrant Vector Database
    df['id'] = range(1, len(df) + 1)

    # Create a payload column in the dataframe
    # This payload column includes the page content and metadata
    # This payload will be used when LLM needs to answer a query
    df['payload'] = df[['page_content', 'metadata']].to_dict(orient='records')

    # Create embeddings for each chunk
    # This embeddings will be used when doing a similarity search with the user query
    df['embeddings'] = df['page_content'].apply(lambda x: (embed_model.get_sentence_vector(x)).tolist())

    return df


def create_new_collection(url, jwt, collection_name, df, vector_size, batch_size, delete_prev = False, create_from_scratch = False):

    '''
    This function creates a new collection in Qdrant Vector Database
    and updates the collection with the embeddings

    It starts by creating a connection to the Qdrant Vector Database running in Docker
    Then it deletes the collection if delete_prev is True
    Then it creates a new collection with the specified name and vector size if create_from_scratch is True
    Then it upserts the first batch_size embeddings into the collection
    Finally, it closes the connection to the Qdrant Vector Database

    Args:
    url: URL of the Qdrant Vector Database
    jwt: JWT token
    collection_name: Name of the collection
    df: Dataframe with the documents, embeddings, metadata and payload
    vector_size: Dimension of the embedding vectors
    batch_size: Maximum number of points to upsert
    delete_prev: If True, delete the existing collection first
    create_from_scratch: If True, create the collection before upserting

    Returns:
    None
    '''

    # Create a QdrantClient object
    # client = QdrantClient('https://localhost:6333')
    client = QdrantClient(url=url, api_key = jwt)

    # delete the collection if it already exists
    # remove or comment this line if you want to keep the existing collection
    # and want to use the existing collection to update new points
    if delete_prev:
        client.delete_collection(collection_name=collection_name)

    # Create a fresh collection in Qdrant
    # remove or comment this line if you do not want to create a new collection
    if create_from_scratch:
        client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE),
        )

    # Update the Qdrant Vector Database with the embeddings
    # Only the first batch_size rows (4000 here) are upserted, in a single request
    client.upsert(
        collection_name=collection_name,
        points=Batch(
            ids=df['id'].to_list()[:batch_size],
            payloads=df['payload'].to_list()[:batch_size],
            vectors=df['embeddings'].to_list()[:batch_size],
        ),
    )

    # Close the QdrantClient
    client.close()

    print(f"Collection {collection_name} created and updated with the embeddings")
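
The function above sends only the first batch_size rows. If a dataset has more rows than that, one way to cover all of them is to walk the dataframe in slices. The following is a sketch of the slicing logic only. iter_batches is a hypothetical helper, and the commented loop shows how it might be combined with the upsert call used above.

```python
from typing import Iterator, Tuple


def iter_batches(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that cover range(total) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


# Hypothetical usage with the dataframe and client from this tutorial:
# for start, end in iter_batches(len(df), batch_size):
#     client.upsert(
#         collection_name=collection_name,
#         points=Batch(
#             ids=df['id'].to_list()[start:end],
#             payloads=df['payload'].to_list()[start:end],
#             vectors=df['embeddings'].to_list()[start:end],
#         ),
#     )
print(list(iter_batches(10, 4)))
```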

Great! Now let’s go ahead and start creating the chunks and their respective embeddings.

from langchain_community.document_loaders import DirectoryLoader
# from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders.pdf import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from os.path import join as pjoin  # used below to build data_path

collection_type = 'general'
root = '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data'
data_path = pjoin(root, collection_type)
collection_name = collection_type

# Load the documents from the directory
loader = DirectoryLoader(data_path, loader_cls=PyPDFLoader)

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=chunk_size,
   chunk_overlap=chunk_overlap,
   length_function=len,
   is_separator_regex=False,
)
docs = loader.load_and_split(text_splitter=text_splitter)

from utils import generate_embeddings_from_fastext_model
# Generate the embeddings for the data
df = generate_embeddings_from_fastext_model(docs, embed_model)

from utils import create_new_collection
# Create a new collection with manage access
create_new_collection(url, jwt, collection_name, df, vector_size, batch_size, delete_prev = True, create_from_scratch = True)

We do the same for the financial data.

from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from os.path import join as pjoin  # used below to build data_path

collection_type = 'financial'
root = '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data'
data_path = pjoin(root, collection_type)
collection_name = collection_type

# Load the documents from the directory
loader = DirectoryLoader(data_path, loader_cls=PyPDFLoader)

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=chunk_size,
   chunk_overlap=chunk_overlap,
   length_function=len,
   is_separator_regex=False,
)
docs = loader.load_and_split(text_splitter=text_splitter)

from utils import generate_embeddings_from_fastext_model
# Generate the embeddings for the data
df = generate_embeddings_from_fastext_model(docs, embed_model)

from utils import create_new_collection
# Create a new collection with manage access
create_new_collection(url, jwt, collection_name, df, vector_size, batch_size, delete_prev = True, create_from_scratch = True)

Great! Now we have created two collections, ‘general’ and ‘financial’ collections. Let’s see if we can read these collections with different sets of tokens having different permissions.

Collection Specific Access

With this access, we can limit the access of the user to a specific collection only. This is the most secure way to grant access to the user. We can also limit the access to the types of documents in that collection or pages as well. Let’s see how we can do this.

In our Qdrant Vector Database, we now have two collections, ‘general’ and ‘financial’. As can be understood from the names, the ‘general’ collection contains general data, and the ‘financial’ collection contains financial data. Due to the nature of the data, we want to restrict access of each user to specific collections as per their roles in the organization.

Read-Only Access

Here, in this section, we will create a token with read-only access to a specific collection only. Let’s see how it behaves.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has read access to the general collection only.
# You can give access to multiple collections by adding multiple dictionaries in the access list.
# For now, we are only giving access to the general collection.
# It also specifies that this token will expire in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general", # collection name - Change this to the collection you want to give access to, like financial
     "access": "r"
   },
 ]
}

# Generate the JWT token
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)
client.get_collections()

# Output of the above code
CollectionsResponse(collections=[CollectionDescription(name='general')])

This is great. We can see that the user can only access the ‘general’ collection and not the ‘financial’ collection. Now let’s try to verify if the user has read-only access to the ‘general’ collection.

import numpy as np

# We are generating a random query vector of size vector_size
query_vector = np.random.rand(vector_size)

# We are searching for the closest points to the query vector in the general collection
# Since we have the read access to the general collection, we can search in it.
hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits

# Output of the above code
[ScoredPoint(id=11, version=2, score=0.07114598, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'is built using the Rust language - a static multi-paradigm, memory-efficient, low-level programming language focused on speed, security, and performance. The intention is to build Qdrant with as few moving parts as possible, thereby keeping the attack vector as low as possible. Email security Qdrant supports TLS encryption on all inbound and outbound emails. Qdrant uses Gmail to provide email and communication services. For an explanation of how email encryption works, take a look at this'}, vector=None, shard_key=None),
 ScoredPoint(id=10, version=2, score=0.045524757, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer, and the stateless access control firewall rules before then being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language - a static'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.043969806, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=12, version=2, score=0.03491432, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'how email encryption works, take a look at this overview from Google. Data residency On Qdrant Cloud, the location of data can be specified. Locations may include London, Ireland, Belgium, Germany, Switzerland, North America, South America, Australia, Canada, Tokyo, or Singapore. Data will not be moved or replicated outside of a specified location. Data in transit All data is encrypted when it is being transmitted between client devices and Qdrant Cloud. SSL/TLS certificates shield data using'}, vector=None, shard_key=None),
 ScoredPoint(id=13, version=2, score=0.027367812, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'Cloud. SSL/TLS certificates shield data using 256-byte signatures and either 2048-bit or 4096-bit keys. All connections to Content Delivery Network (CDN) servers and the database layer are'}, vector=None, shard_key=None)]
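
The score attached to each ScoredPoint is the similarity under the collection’s distance metric, which is cosine similarity here because the collections were created with Distance.COSINE. The scores above are low, presumably because the query is a random vector rather than an embedding of real text. As a reference, this is a plain-Python sketch of the cosine formula. It is an illustration, not Qdrant’s implementation.

```python
import math


def cosine_similarity(a, b) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: a zero vector matches nothing
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Values close to 1 mean the vectors point in nearly the same direction, while values near 0 mean they are almost unrelated.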

As we can see, the user can read the ‘general’ collection. Now let’s try to update the ‘general’ collection with the same read-only token. Let’s hope that it fails.

import numpy as np
from qdrant_client.models import PointStruct

# We are generating 100 random vectors of size vector_size
vectors = np.random.rand(100, vector_size)

# We are upserting these vectors in the general collection
# Since we have read-only access to the general collection, we won't be able to insert the vectors.
client.upsert(
  collection_name="general",
  points=[
     PointStruct(
           id=idx,
           vector=vector.tolist(),
           payload={"color": "red", "rand_number": idx % 10}
     )
     for idx, vector in enumerate(vectors)
  ]
)

# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Write access to collection general is required"},"time":0.000079842}'

As we’d guessed, the API returned a 403 Forbidden error saying, ‘Write access to collection general is required’. This is great. Now let’s see if we can read the financial collection with the same read-only token for the general collection. Just a heads up — this should fail.

# We are generating a random query vector of size vector_size
query_vector = np.random.rand(vector_size)

# We are searching for the closest points to the query vector in the financial collection
# Since we have read-only access to the general collection only, we won't be able to search in the financial collection.
hits = client.search(
  collection_name="financial",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits

# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Access to collection financial is required"},"time":7.61e-6}'

Great. Once again, the API returned a 403 Forbidden error saying, ‘Access to collection financial is required’. In the next section, let’s test with read-write access to a specific collection. We will also see how we can grant access to multiple collections to a user.

Read-Write Access

Here we will grant the user read-write access to the ‘general’ collection only. And, on top of that, we will limit access of the ‘financial’ collection to read-only.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload, along with the API key, is used to generate the JWT token.
# This token indicates that the user has access to two collections: general and financial.
# Access to the general collection is read-write and access to the financial collection is read-only.
# It also specifies that this token will expire in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general",
     "access": "rw"
   },
   {
     "collection": 'financial',
     "access": "r"
   }
 ]
}

# Generate the JWT token
# This token will be used to authenticate the user.
jwt = generate_jwt(api, payload)
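
Before testing this token against the server, it can help to reason about what the payload should allow. The function below is a local model of the access rules as they are described in this post (global "r" and "m", per-collection "r" and "rw"). It is a sketch for building intuition, not Qdrant’s actual authorization code, and the name is_allowed is hypothetical.

```python
def is_allowed(payload: dict, collection: str, operation: str) -> bool:
    """Return True if the payload permits 'read' or 'write' on a collection."""
    access = payload.get("access")
    if isinstance(access, str):
        # Global access: "r" is read-only, "m" is manage (everything)
        if operation == "read":
            return access in ("r", "m")
        return access == "m"
    # Collection-specific access: a list of {"collection", "access"} rules
    for rule in access or []:
        if rule.get("collection") == collection:
            mode = rule.get("access", "")
            if operation == "read":
                return mode in ("r", "rw")
            return mode == "rw"
    # Collections that are not listed are not accessible at all
    return False


example_payload = {
    "access": [
        {"collection": "general", "access": "rw"},
        {"collection": "financial", "access": "r"},
    ]
}
for name in ("general", "financial"):
    print(name, is_allowed(example_payload, name, "read"),
          is_allowed(example_payload, name, "write"))
```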

from qdrant_client import QdrantClient
client = QdrantClient(url=url, api_key=jwt)

collection = client.get_collections()
collection

# Output of the above code
CollectionsResponse(collections=[CollectionDescription(name='general'), CollectionDescription(name='financial')])

Nice! The user has access to both the collections, ‘general’ and ‘financial’. Now let’s try to update the ‘general’ collection with the same token. Since the token has read-write access to the ‘general’ collection, it should work.

import numpy as np
from qdrant_client.models import PointStruct

# We are generating 100 random vectors of size vector_size
vectors = np.random.rand(100, vector_size)

# We are inserting these vectors in the general collection
# Since we have read-write access to the general collection, we can insert the vectors.
client.upsert(
  collection_name="general",
  points=[
     PointStruct(
           id=idx,
           vector=vector.tolist(),
           payload={"color": "red", "rand_number": idx % 10}
     )
     for idx, vector in enumerate(vectors)
  ]
)

# Output of the above code
UpdateResult(operation_id=1, status=<UpdateStatus.COMPLETED: 'completed'>)

Looks good. We can see that the user can update the ‘general’ collection. Now let’s try to update the ‘financial’ collection with the same token. This should fail as the token has only read-only access to the ‘financial’ collection.

import numpy as np
from qdrant_client.models import PointStruct

vectors = np.random.rand(100, vector_size)
client.upsert(
  collection_name="financial",
  points=[
     PointStruct(
           id=idx,
           vector=vector.tolist(),
           payload={"color": "red", "rand_number": idx % 10}
     )
     for idx, vector in enumerate(vectors)
  ]
)

# Output of the above code
UnexpectedResponse: Unexpected Response: 403 (Forbidden)
Raw response content:
b'{"status":{"error":"Forbidden: Write access to collection financial is required"},"time":0.000062905}'

As expected, the API returned a 403 Forbidden error saying ‘Write access to collection financial is required’. Now let’s try to do a final check for this token. Let’s see if the user can read the ‘financial’ collection.

# We are generating a query vector from a string that appears in one of the documents in the financial collection
x = "based on the equation: assets = liabilities + owners' equity."
query_vector = embed_model.get_sentence_vector(x).tolist()

# We are searching for the closest points to the query vector in the financial collection
# Since we have read-only access to the financial collection, we can search in it.
hits = client.search(
  collection_name="financial",
  query_vector=query_vector,
  limit=20  # Return the 20 closest points
)
hits

# Output of the above code
[ScoredPoint(id=3, version=0, score=0.7493207, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "Balance Sheet The balance sheet is based on the equation: assets = liabilities + owners' equity . It indicates everything the company owns (assets), everything the company owes to creditors (liabilities) and the value of the ownership stake in the company (shareholders' equity, or capital). The balance sheet date is the ending date of the period or year and is a continuation of the amounts recorded since the"}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=0, score=0.7304637, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'period. Sources of cash listed on the statement include revenues, long-term financing, sales of non-current assets, an increase in any current liability account or a decrease in any current asset account. Uses of cash include operating losses, debt repayment, equipment purchases and increases in current asset accounts.'}, vector=None, shard_key=None),
 ScoredPoint(id=4, version=0, score=0.72308195, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'inception of the company or organization. The balance sheet is a "snapshot" of the financial position of the company at the balance sheet date and shows the accumulated balance of the accounts. Assets and liabilities are separated between current  and long-term , where current items are those items which will be realized or paid within one year of the balance sheet date. Typical current assets are cash, prepaid expenses, accounts receivable and inventory. Income Statement'}, vector=None, shard_key=None),
 ScoredPoint(id=26, version=0, score=0.7186612, payload={'metadata': {'page': 6, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Bank term loan bearing interest at prime plus 2%, repayable in monthly principal instalments of $2,100.00plus interest to November 2007, secured by a general security agreement on the assets of the company and a personal guaranteefrom the shareholder. 2002-2001 $ 111,300 $ Less current portion: 25,200; $ 86,100; approximate principal repayments are as follows: 2004 $ 25,2002005 25,2002006 25,2002007 10,500 $ 86,100 5. STATED CAPITAL Authorized: Unlimited number of Common shares'}, vector=None, shard_key=None),
 ScoredPoint(id=23, version=0, score=0.7100816, payload={'metadata': {'page': 5, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Significant Accounting Policies INVENTORY The inventory is valued at the lower of cost or market, with cost being  determined on a first-in, first-out basis. PROPERTY, PLANT AND EQUIPMENT Property, plant and equipment are stated at cost less accumulated amortization. Amortization is recorded at rates designed to amortize the cost of capital assets overtheir estimated useful lives. Amortization rates used are as follows: Furniture and equipment 20% declining balance'}, vector=None, shard_key=None),
 ScoredPoint(id=16, version=0, score=0.70792955, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Deposits and prepaid expenses (254)           688                Inventory (2,487)        (904)               Accounts payable and accrued liabilities (9,290)        34,543           Long-term debt, current portion: 25,200; income tax payable: 14,387       2,206          Cash flows from operating activities: 115,402; 85,966        CASH FLOWS FROM INVESTING ACTIVITIE S Acquisition of property, plant and equipment (1,426)        (10,342)'}, vector=None, shard_key=None),
 ScoredPoint(id=18, version=0, score=0.70589364, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'CASH (DEFICIENCY) RESOURCES: Beginning of Year (69,474)      17,789        CASH RESOURCES (DEFICIENCY) - End of Yea r $ 11,552    $ (69,474)     Cash resources (deficiency) is comprised of: Cash: 11,552 $; bank overdraft: 9,474 Bank loan: (60,000) $ 11,552 $ (69,474) The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=14, version=0, score=0.7057368, payload={'metadata': {'page': 2, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'DIVIDENDS -- (16,000)       RETAINED EARNINGS (DEFICIT) - End of Yea r $ 17,166 $ (61,350) The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=15, version=0, score=0.7047419, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'XYZ COMPANY LIMITE D STATEMENT OF CASH FLO W FOR THE YEAR ENDE D JUNE 30, 2002 UNAUDITED - See "Notice to Reader" 2002–2001. CASH FLOWS FROM OPERATING ACTIVITIE S Net income for the year was $78,516; $8,810 Adjustment for:   Amortization 17,854 16,856   Loss on disposal of property, plant and equipment: 387   Gain on disposal of investment (16,149) Cash derived from operations: 80,221 and 26,053 Decrease (increase) in working capital items    Accounts receivable 7,625         23,380'}, vector=None, shard_key=None),
 ScoredPoint(id=17, version=0, score=0.6888898, payload={'metadata': {'page': 3, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Proceeds from disposal of property, plant and equipment -- 3,113          Proceeds from disposal of investment: 61,150       Dividends: 16,000  Cash flows from investing activities: 59,724       (23,229)       CASH FLOWS FROM FINANCING ACTIVITIE S Advances from (repayments to) shareholder (180,200) and (150,000)     Acquisition of (repayment of) long-term debt 86,100       -- (94,100)     (150,000)     NET INCREASE (DECREASE) IN CASH RESOURCES 81,026      (87,263)'}, vector=None, shard_key=None),
 ScoredPoint(id=1, version=0, score=0.68225944, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'Understanding Basic Financial Statements During the accounting cycle, the accounting system is used to track, organize and record the financial transactions of an organization. At the close of each period, the information is used to prepare the financial statements, which are usually composed of a balance sheet (statement of financial position); income statement (statement of income and expenses); statement of retained earnings (owners’ equity); and a statement of cash flow.'}, vector=None, shard_key=None),
 ScoredPoint(id=7, version=0, score=0.6741429, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "income or loss is added to the opening amount of retained earnings to arrive at the closing retained earnings. Retained earnings can be decreased by such items as dividends paid to shareholders. On the sample financial statements shown below, the statement of retained earnings is combined with the income statement presentation. Statement of Cash Flow The statement of cash flow shows all sources and uses of a company's cash during the accounting period."}, vector=None, shard_key=None),
 ScoredPoint(id=11, version=0, score=0.67381775, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': '17,167 (61,349) $ 276,498 $ 331,259 APPROVED The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=21, version=0, score=0.6718547, payload={'metadata': {'page': 4, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': '$ 286,817 $ 339,905 The accompanying summary of significant accounting policies and notes are an integral part of these financial statements.'}, vector=None, shard_key=None),
 ScoredPoint(id=25, version=0, score=0.6679283, payload={'metadata': {'page': 6, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'XYZ COMPANY LIMITED NOTES TO THE FINANCIAL STATEMENTS FOR THE YEAR ENDED JUNE 30, 2002 UNAUDITED - See "Notice to Reader." 3. DUE TO SHAREHOLDER The amount due to the shareholder bears interest at a rate determined annually and has no fixed terms of repayment.Interest paid for 2002 was $1,823 (2001 - $6,831) 4. LONG - TERM DEBT Bank term loan bearing interest at prime plus 2%,'}, vector=None, shard_key=None),
 ScoredPoint(id=13, version=0, score=0.66429466, payload={'metadata': {'page': 2, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'INCOME FROM OPERATIONS 77,855      8,860          OTHER INCOME (EXPENSES) Loss on disposal of property, plant and equipment (387)            Gain on sale of investment: 16,149       -- Miscellaneous (1,101)        337             15,048      (50)              NET INCOME BEFORE TA X 92,903      8,810          INCOME TAX  EXPENSE 14,387      -- NET INCOME 78,516      8,810          (DEFICIT) Beginning of Yea r (61,350)     (54,160)       DIVIDENDS -- (16,000)'}, vector=None, shard_key=None),
 ScoredPoint(id=5, version=0, score=0.6599545, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "Income Statement An income statement is a type of summary flow report that lists and categorizes the various revenues and expenses that result from operations during a given period—a year, a quarter or a month. The difference between revenues and expenses represents a company's net income or net loss. The amounts shown in the income statement are the amounts recorded for the given period (a year, a"}, vector=None, shard_key=None),
 ScoredPoint(id=6, version=0, score=0.65201813, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'quarter or a month . The next period’s income statement will start over with all amounts reset to zero. While the balance sheet shows accumulated balances since inception, the income statement only shows the amounts earned or expensed during the period in question. Statement of Retained Earnings The statement of retained earnings shows the amount of accumulated earnings that have been retained within the company since its inception. At the end of each fiscal year-end, the amount of net'}, vector=None, shard_key=None),
 ScoredPoint(id=22, version=0, score=0.6324039, payload={'metadata': {'page': 5, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': 'XYZ COMPANY LIMITED NOTES TO THE FINANCIAL STATEMENTS FOR THE YEAR ENDED JUNE 30, 2002 UNAUDITED - See "Notice to Reader" 1. SIGNIFICANT ACCOUNTING POLICIES AND GENERAL INFORMATION Nature of Business The company is a Canadian-controlled private corporation subject to the Business Corporations Act, 1982 (Ontario), was incorporated in May 1995 and operates as a manufacturer of widgets in Anytown, Ontario. Significant Accounting Policies INVENTORY'}, vector=None, shard_key=None),
 ScoredPoint(id=10, version=0, score=0.62833935, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/financial/Sample-Accounting-Income-Statement-PDF-File.pdf'}, 'page_content': "$ 276,498 $ 331,259 LIABILITIES CURRENT Bank overdraft $ -- $ 9,474Bank loan: $60,000 Accounts payable and accrued liabilities: 82,053; 91,343Long-term debt: current portion 25,200 --income tax payable 14,387 -- 121,640 -- 160,817 DUE TO SHAREHOLDER (Note 3) 51,591 231,791LONG-TERM DEBT (Note 4): 86,100 -- 259,331 -- 392,608 SHAREHOLDER'S EQUIT AND STATED CAPITAL (Note 5) 1 1 RETAINED EARNINGS (DEFICIT) 17,166 (61,350) 17,167 (61,349) $ 276,498 $ 331,259 APPROVED"}, vector=None, shard_key=None)]

Great! The user can read the ‘financial’ collection. In the next section, let’s see how we can limit user access within a single collection, so that only specific types of documents in that collection are accessible.
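The three results in this section (write to ‘general’ allowed, write to ‘financial’ rejected, read from ‘financial’ allowed) follow a simple rule. Below is a minimal sketch of that rule in plain Python. It is only an illustrative model of the behaviour observed above, not Qdrant's actual implementation, and the helper name `is_allowed` is hypothetical.

```python
from typing import Dict, List


def is_allowed(access_claims: List[Dict[str, str]], collection: str, write: bool) -> bool:
    """Illustrative model: does the token's access list permit this operation?"""
    for claim in access_claims:
        if claim["collection"] != collection:
            continue
        level = claim["access"]
        # "rw" permits reads and writes; "r" permits reads only.
        return level == "rw" or (level == "r" and not write)
    # No claim mentions this collection, so nothing is permitted.
    return False


# Same access list as in the payload used in this section.
access_claims = [
    {"collection": "general", "access": "rw"},
    {"collection": "financial", "access": "r"},
]

print(is_allowed(access_claims, "general", write=True))     # True
print(is_allowed(access_claims, "financial", write=True))   # False
print(is_allowed(access_claims, "financial", write=False))  # True
```

A collection that is not listed in the access claims at all is treated as inaccessible in this model, which matches the 403 response seen earlier when a token did not cover the ‘financial’ collection.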

Document-Specific Access

In this last section, we will limit the user's access to specific types of documents in the collection. This is one of the most secure ways to grant access to the user. Assume a scenario where you have a collection of ‘general’ data and, in that collection, you have multiple types of documents. You can limit the user's access to specific types of documents only. Let’s see how we can do this.

Here we create a token that allows the user to access only the ‘general’ collection and, within it, only the chunks that come from the document ‘security_policy.pdf’.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload, along with the API key, is used to generate the JWT token.
# This token indicates that the user has access to the general collection only.
# Access to the general collection is read-write.
# It also restricts access to the document security_policy.pdf in the general collection.
# It also specifies that this token will expire in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general",
     "access": "rw",
     "payload": {
       "metadata.source": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf",
     }
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)

# We are generating a query vector from a string that appears in the document security_policy.pdf in the general collection
x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

# We are searching for the closest points to the query vector in the general collection
hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits

# Output of the above code
[ScoredPoint(id=10, version=2, score=0.8121032, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer and the stateless access control firewall rules before being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language (a static'}, vector=None, shard_key=None),
 ScoredPoint(id=5, version=2, score=0.7998578, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'into account the impact of company threats and vulnerabilities; must design and implement a comprehensive suite of information security controls and other forms of risk management to address company and architecture security risks; and adopt an overarching management process to ensure that the information security controls meet the information security needs on an ongoing basis. In addition, all hosting providers are certified at PCI DSS Level 1, which means that the application is run on the'}, vector=None, shard_key=None),
 ScoredPoint(id=4, version=2, score=0.77687603, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'centers are staffed 24x7x365 by security guards, and access is authorized strictly on a least privileged basis. The cloud hosting providers are certified with the ISO 9001:2008, ISO 27001:2013, ISO 27017:2015, and ISO 27018:2014 security standards—global standards that outline the requirements for information security management systems. This requires that the hosting provider systematically evaluate its information security risks, taking into account the impact of company threats and'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.76660895, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=3, version=2, score=0.7539886, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'will work with you to assess and understand the scope of the issue and fully address any concerns. Any emails are immediately sent to our engineering staff to ensure that issues are addressed rapidly. Any security emails are treated with the highest priority, as the safety and security of our service are our primary concerns. Physical security Qdrant Cloud services are hosted on Google Cloud Computing, Amazon Web Services, and Azure. The data centers are staffed 24x7x365 by security guards. '}, vector=None, shard_key=None)]

As we can see, the search query returned only chunks of the document ‘security_policy.pdf’. It did not return chunks from any other document. Next, let’s go even further and limit access to a specific page of the document.

On top of all the previous restrictions, we now also limit access to the second page of the document ‘security_policy.pdf’ (page index 1, since page indices start at 0). Let’s see if the user can access any page other than the second page of the document.

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has access to the general collection only.
# The access to the general collection is read-write.
# It also specifies that the token only limits the access to the document security_policy.pdf in the general collection.
# It also specifies that the token only limits the access to the second page (page index starts with 0) of the document.
# It also specifies that this token will expire in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "access": [
   {
     "collection": "general",
     "access": "rw",
     "payload": {
       "metadata.source": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf",
       "metadata.page": 1
     }
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)

x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=20  # Return up to 20 closest points
)
hits

# Output of the above code
[ScoredPoint(id=10, version=2, score=0.8121032, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer, and the stateless access control firewall rules before then being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language - a static'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.76660895, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=11, version=2, score=0.74504614, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'is built using the Rust language - a static multi-paradigm, memory-efficient, low-level programming language focused on speed, security, and performance. The intention is to build Qdrant with as few moving parts as possible, thereby keeping the attack vector as low as possible. Email security Qdrant supports TLS encryption on all inbound and outbound emails. Qdrant uses Gmail to provide email and communication services. For an explanation of how email encryption works, take a look at this'}, vector=None, shard_key=None),
 ScoredPoint(id=9, version=2, score=0.71566045, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'DSS Level 4 requirements for security management, policies, procedures, network architecture, software design, and other critical protective measures. Network security The system is designed with scalability and redundancy in mind. Web load balancers and database servers are distributed globally across geographically dispersed data centers in different operating regions. Each database server has its own firewall configuration based on its role within the Qdrant Cloud platform, and only the'}, vector=None, shard_key=None),
 ScoredPoint(id=12, version=2, score=0.70694244, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'how email encryption works, take a look at this overview from Google. Data residency On Qdrant Cloud, the location of data can be specified. Locations may include London, Ireland, Belgium, Germany, Switzerland, North America, South America, Australia, Canada, Tokyo, or Singapore. Data will not be moved or replicated outside of a specified location. Data in transit All data is encrypted when it is being transmitted between client devices and Qdrant Cloud. SSL/TLS certificates shield data using'}, vector=None, shard_key=None),
 ScoredPoint(id=13, version=2, score=0.63619953, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'Cloud. SSL/TLS certificates shield data using 256-byte signatures and either 2048-bit or 4096-bit keys. All connections to Content Delivery Network (CDN) servers and the database layer are'}, vector=None, shard_key=None)]

As we can see, the same search query as in the previous example now returned only chunks from the second page of the document ‘security_policy.pdf’, as expected. Even though we asked for up to 20 points, only the 6 chunks on that page came back.
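The effect of the two payload restrictions used above (document source, then page) can be pictured as an equality filter applied to every point before results are returned. The following is a minimal sketch of that idea in plain Python. It is an illustrative model under that assumption, not Qdrant's implementation; the helpers `get_nested` and `visible_points` are hypothetical, and the file names are shortened stand-ins for the full paths used in this tutorial.

```python
from typing import Any, Dict, List


def get_nested(payload: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve a dotted key such as 'metadata.page'; return None if it is absent."""
    current: Any = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def visible_points(points: List[Dict[str, Any]], constraints: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the points whose payload equals every constrained key/value pair."""
    return [
        point
        for point in points
        if all(get_nested(point, key) == value for key, value in constraints.items())
    ]


# Shortened, made-up payloads standing in for the chunks stored in the collection.
points = [
    {"metadata": {"source": "security_policy.pdf", "page": 0}},
    {"metadata": {"source": "security_policy.pdf", "page": 1}},
    {"metadata": {"source": "other.pdf", "page": 1}},
]

constraints = {"metadata.source": "security_policy.pdf", "metadata.page": 1}
print(visible_points(points, constraints))
# [{'metadata': {'source': 'security_policy.pdf', 'page': 1}}]
```

Adding a second key to the constraints narrows the visible set further, which is why the page-restricted token returned fewer chunks than the document-restricted one.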

Before I end this tutorial, let me show you one more way to limit the user's access: the ‘value_exists’ claim. With this claim, the token is accepted only if the specified collection contains a point whose payload has the given key with the given value; otherwise, the token is rejected. This can be extended to several use cases, such as checking a user_id or user_role, but to keep this tutorial simple, let’s use ‘value_exists’ to check for the presence of a specific document. The user can access the collection only if that document exists.

Let’s first see what happens if the document named in the token does not exist in the collection.
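The ‘value_exists’ check described above can be sketched as a lookup that must succeed before the token is honoured. The code below is a minimal illustrative model in plain Python, not Qdrant's implementation. It assumes that all listed key/value pairs must match on the same point; the helper names and the shortened file names are hypothetical.

```python
from typing import Any, Dict, List


def get_nested(payload: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve a dotted key such as 'metadata.source'; return None if it is absent."""
    current: Any = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def value_exists(points: List[Dict[str, Any]], matches: List[Dict[str, Any]]) -> bool:
    """Return True if at least one point satisfies every key/value pair in matches."""
    return any(
        all(get_nested(point, m["key"]) == m["value"] for m in matches)
        for point in points
    )


# Shortened, made-up payloads standing in for the chunks stored in the collection.
points = [
    {"metadata": {"source": "security_policy.pdf", "page": 0}},
    {"metadata": {"source": "avengers-endgame-script-pdf.pdf", "page": 3}},
]

# A document that is present: the token would pass this check.
print(value_exists(points, [{"key": "metadata.source", "value": "avengers-endgame-script-pdf.pdf"}]))  # True

# A document that is absent: the token would be rejected.
print(value_exists(points, [{"key": "metadata.source", "value": "blah blah blah.pdf"}]))  # False
```

Note that in this model the check only decides whether the token is valid at all. It does not filter search results, so a valid token still sees the whole collection.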

import time
from utils import generate_jwt

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has access to the general collection only.
# The access to the general collection is read-write.
# Apart from the read-write access to the general collection, the token also specifies a check:
# does any document in the collection have a metadata.source key with the value "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/blah blah blah.pdf"?
# If such a document exists, the user will have read-write access to the general collection.
# If it doesn't exist (as is the case here), the user won't have any access to the general collection.
# It also specifies that this token will expire in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "value_exists": {
   "collection": "general",
   "matches": [
     {
         "key": "metadata.source",
         "value": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/blah blah blah.pdf"
     }
   ]
 },
 "access": [
   {
     "collection": "general",
     "access": "rw",
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

from qdrant_client import QdrantClient

client = QdrantClient(url=url, api_key=jwt)

# We are generating a query vector from a string that appears in the document security_policy.pdf in the general collection
x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits

# Output of the above code
UnexpectedResponse: Unexpected Response: 401 (Unauthorized)
Raw response content:
b'Invalid JWT, stateful validation failed'

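Woah! The API returned a 401 Unauthorized error, and the message clearly says that the stateful validation failed, because no document with that source exists in the collection. This is exactly what we wanted. Now let’s see what happens if the document named in the token does exist in the collection.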

import time

current_time = int(time.time())

# This payload along with the API is used to generate the JWT token.
# This token tells that the user has access to the general collection only.
# The access to the general collection is read-write.
# Apart from the read-write access to the general collection, the token also specifies a check:
# does any document in the collection have a metadata.source key with the value "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/avengers-endgame-script-pdf.pdf"?
# If such a document exists, the user will have read-write access to the general collection.
# If it doesn't exist, the user won't have any access to the general collection.
# It also specifies that this token will expire in 1 hour.
payload = {
 "exp": current_time + 3600, # 1 hour
 "value_exists": {
   "collection": "general",
   "matches": [
     {
         "key": "metadata.source",
         "value": "/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/avengers-endgame-script-pdf.pdf"
     }
   ]
 },
 "access": [
   {
     "collection": "general",
     "access": "rw",
   },
 ]
}

# Generate the JWT token
jwt = generate_jwt(api, payload)

client = QdrantClient(url=url, api_key=jwt)

x = 'take the security of Qdrant code'
query_vector = embed_model.get_sentence_vector(x).tolist()

hits = client.search(
  collection_name="general",
  query_vector=query_vector,
  limit=5  # Return 5 closest points
)
hits

# Output of the above code
[ScoredPoint(id=10, version=2, score=0.8121032, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'within the Qdrant Cloud platform, and only the necessary ports are opened on each server. All outbound connections pass through the stateless access control rules, whilst inbound connections from the internet must pass through a secure, highly-available load balancer layer, and the stateless access control firewall rules before then being routed to each server. Software security We take the security of the Qdrant code very seriously. The database is built using the Rust language - a static'}, vector=None, shard_key=None),
 ScoredPoint(id=5, version=2, score=0.7998578, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'into account the impact of company threats and vulnerabilities; must design and implement a comprehensive suite of information security controls and other forms of risk management to address company and architecture security risks; and adopt an overarching management process to ensure that the information security controls meet the information security needs on an ongoing basis. In addition, all hosting providers are certified at PCI DSS Level 1, which means that the application is run on the'}, vector=None, shard_key=None),
 ScoredPoint(id=4, version=2, score=0.77687603, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'centers are staffed 24x7x365 by security guards, and access is authorized strictly on a least privileged basis. The cloud hosting providers are certified with the ISO 9001:2008, ISO 27001:2013, ISO 27017:2015, and ISO 27018:2014 security standards - global standards that outline the requirements for information security management systems. This requires that the hosting provider must systematically evaluate its information security risks, taking into account the impact of company threats and'}, vector=None, shard_key=None),
 ScoredPoint(id=8, version=2, score=0.76660895, payload={'metadata': {'page': 1, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'All servers are tested for vulnerability and intrusion detection quarterly. The servers and services hosted on them are certified as complying with the PCI Data Security Standard established by the PCI Security Standards Council, which is an open global forum for the development, enhancement, storage, dissemination, and implementation of security standards for account data protection. The certification confirms that the services adhere to the PCI DSS Level 4 requirements for security management,'}, vector=None, shard_key=None),
 ScoredPoint(id=3, version=2, score=0.7539886, payload={'metadata': {'page': 0, 'source': '/home/quamer23nasim38/Role-Based-Access-Control-of-Qdrant-Vector-Database/data/general/security_policy.pdf'}, 'page_content': 'will work with you to assess and understand the scope of the issue and fully address any concerns. Any emails are immediately sent to our engineering staff to ensure that issues are addressed rapidly. Any security emails are treated with the highest priority, as the safety and security of our service are our primary concerns. Physical security Qdrant Cloud services are hosted on Google Cloud Computing and Amazon Web Services, and Azure. The data centers are staffed 24x7x365 by security guards,'}, vector=None, shard_key=None)]

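Nice! Qdrant checked the value_exists condition and confirmed that a point with the given metadata.source value exists in the collection, so the token was accepted. Note that this check only validates the token and does not filter the search results: once validation succeeded, the search returned the most relevant chunks from the collection (here, from security_policy.pdf), regardless of which document was named in the validation condition.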
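The payload above can also be assembled and signed by hand. The following is a minimal sketch using only the Python standard library, assuming the token is signed with HS256 and the API key as the secret; the generate_jwt helper used in this blog may be implemented differently. The names b64url, sign_hs256, and document_scoped_claims, as well as the secret and the file path, are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the compact JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    """Build a compact HS256 JWT from a claims dict (illustrative helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


def document_scoped_claims(collection: str, source: str, ttl_seconds: int = 3600) -> dict:
    """Claims with rw access to one collection, valid only if `source` exists in it."""
    return {
        "exp": int(time.time()) + ttl_seconds,
        "value_exists": {
            "collection": collection,
            "matches": [{"key": "metadata.source", "value": source}],
        },
        "access": [{"collection": collection, "access": "rw"}],
    }


# Placeholder values, not the ones used in this blog
claims = document_scoped_claims("general", "/data/general/example.pdf")
token = sign_hs256(claims, "my-secret-api-key")
print(token)
```

The resulting string can be passed as api_key to QdrantClient in the same way as the token produced by generate_jwt above.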

To sum up, we have seen how to use the Qdrant Vector Database with RBAC enabled and how to grant access to users based on their roles. We covered global read-only access, global manage access, collection-specific access, and document-specific access. We have also seen how to make a token valid only when a given value exists in a collection by using the value_exists claim. This is a very powerful feature of Qdrant and can be used in various use cases.

Conclusion

Role-Based Access Control is super important for keeping our data safe and making sure the right people have the right access. When we mix RBAC with a Hybrid Cloud setup, it gives us a lot more flexibility to store and manage our data in different ways. Qdrant really shines here because it lets us control access in a really detailed way using JWT and Hybrid Cloud. Unlike some other databases like Pinecone, Milvus, Chroma, and Weaviate, Qdrant stands out for its strong security and privacy features. In this blog, I showed how we can get Qdrant up and running in a Hybrid Cloud setup and set up JWT for RBAC, showing just how easy and effective it can be to manage access in today’s data environments.

GitHub Repo

The code for this blog can be found at https://github.com/quamernasim/Role-Based-Access-Control-of-Qdrant-Vector-Database

References

https://qdrant.tech/documentation/guides/security/
https://qdrant.tech/blog/qdrant-1.9.x/
https://quamernasim.medium.com/hindi-language-ai-chatbot-for-enterprises-using-llama-3-qdrant-ollama-langchain-and-mlflow-9b69391d3348

This article was originally published on: https://quamernasim.medium.com/enhancing-data-security-with-role-based-access-control-of-qdrant-vector-database-3878769bec83

Hindi-Language AI Chatbot for Enterprises Using Qdrant, MLFlow, and LangChain

M Quamer Nasim — Thu, 02 May 2024 17:16:20 +0000

In today's digital era, where businesses are increasingly leveraging technology to enhance customer interactions, AI-powered chatbots have emerged as a game-changer. These chatbots can have a natural conversation with users, providing real-time support and information. Though chatbots have become popular in the last two years, most of them are designed to interact in English.

However, in a country like India, where Hindi is spoken by millions as the first language, there is a need for chatbots that can interact in Hindi. Building a Hindi-language chatbot can help businesses cater to a wider audience and provide better customer service. In this blog, we will discuss the technical journey of building a Hindi-language AI chatbot for enterprises. By the end of this blog, you will understand the challenges associated with building a Hindi-language chatbot and how to overcome them.

Building an AI chatbot is a two-step process: Indexing and Querying. In the indexing phase, we will create a database of Hindi-language documents for the chatbot to refer to. This data is basically going to be the knowledge base of the chatbot. It can be a collection of FAQs, product manuals, or any other information that the chatbot needs to refer to while interacting with users. In the querying phase, we will use this indexed data to answer user queries with the help of an LLM.

In this blog, I will be using the following tools and frameworks for building the RAG-based AI-powered Hindi Chatbot:

LangChain: I'll be using LangChain to build the RAG application, which will enhance the chatbot's ability to generate responses by leveraging information retrieved from a knowledge base.
Qdrant: I'll be using Qdrant as the vector database to store the documents and their corresponding embeddings.
FastText: I'll be using FastText as the language embedding framework to load the Hindi language embedding model.
Ollama: Ollama will help us load the LLM very easily. We'll integrate the Ollama with LangChain to load the LLM.
MLFlow: I'll be using MLFlow to manage the configurations of the RAG pipeline.

To create a knowledge base for our chatbot, I'll be using the Hindi Aesthetic Corpus dataset. This dataset contains a large number of Hindi texts, more than 1000 text files. You can replace this dataset with your business-related data. It can be a collection of FAQs, product manuals, or any other information that you want your chatbot to have.

Great. Now that we have introduced you to all the tech stacks and data to use, let's start building one!

Indexing the Data and Creating the Knowledge Base

To start the process of indexing the data, we first need to load the dataset. As mentioned earlier, we will be using the Hindi Aesthetic Corpus dataset. Once the dataset is loaded, we will split the text into chunks using the RecursiveCharacterTextSplitter. Creating smaller chunks of text is essential since LLMs come with a limited context size.

Having a smaller and more relevant context will help us in two ways: First, the LLM will only see high-quality, relevant context when generating its answer. Second, a larger chunk or context means more tokens to process, which increases both the total runtime and the cost.

from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

data_path = '../Hindi-Aesthetics-Corpus/Corpus'
chunk_size = 500
chunk_overlap = 50

# Load the documents from the directory
loader = DirectoryLoader(data_path, loader_cls=TextLoader)

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=chunk_size,
   chunk_overlap=chunk_overlap,
   length_function=len,
   is_separator_regex=False,
)
docs = loader.load_and_split(text_splitter=text_splitter)
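To make chunk_size and chunk_overlap concrete, here is a simplified sketch of a fixed-window splitter. It is not what RecursiveCharacterTextSplitter does internally (that class prefers to split on separators such as paragraph and line breaks); the function name split_with_overlap and the sample string are hypothetical and only illustrate how consecutive chunks share characters.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a chunk boundary still appears intact in at least one chunk.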

Once we have converted the raw data into smaller chunks of text, we will then convert these chunks into embeddings using the FastText model. In this blog, we experimented with two different embedding models: the Hindi Model by FastText and IndicFT.

The performance of IndicFT was not that good, so we decided to go with the FastText model. We will use the FastText model to convert the text into embeddings. These embeddings will be stored in a vector database using Qdrant. The embeddings will be used to retrieve the most relevant documents for a given query.

import fasttext as ft

# You will need to download these models from the URL mentioned below
embedding_model_path = '../wiki.hi.bin' #https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hi.zip
# embedding_model_path = '../indicnlp.ft.hi.300.bin' #https://storage.googleapis.com/ai4bharat-public-indic-nlp-corpora/embedding-v2/indicnlp.ft.hi.300.bin
embed_model = ft.load_model(embedding_model_path)

Once we have downloaded the Hindi embedding model, let's proceed to generate the embeddings for each chunk.

import pandas as pd

# convert the documents to a dataframe
# This dataframe will be used to create the embeddings
# And later will be used to update the Qdrant Vector Database
data = []
for doc in docs:
   # Get the page content and metadata for each chunk
   # Meta data contains chunk source or file name
   row_data = {
       "page_content": doc.page_content,
       "metadata": doc.metadata
   }
   data.append(row_data)

df = pd.DataFrame(data)

# Replace the new line characters with space
df['page_content'] = df['page_content'].replace('\\n', ' ', regex=True)

# Create a unique id for each document.
# This id will be used to update the Qdrant Vector Database
df['id'] = range(1, len(df) + 1)

# Create a payload column in the dataframe
# This payload column includes the page content and metadata
# This payload will be used when LLM needs to answer a query
df['payload'] = df[['page_content', 'metadata']].to_dict(orient='records')

# Create embeddings for each chunk
# This embeddings will be used when doing a similarity search with the user query
df['embeddings'] = df['page_content'].apply(lambda x: (embed_model.get_sentence_vector(x)).tolist())

Great. Now that we have the embeddings, we need to store them in a vector database. We will be using Qdrant for this purpose. Qdrant is an open-source vector database that allows you to store and query high-dimensional vectors. The easiest way to get started with Qdrant is to run it with Docker.

Follow the below steps to get the Qdrant database up and running:

# Run the following command in terminal to get the docker image of the qdrant
docker pull qdrant/qdrant

# Run the following command in terminal to start the qdrant server
# The host side of the volume mount must be an absolute path, hence $(pwd)
docker run -p 6333:6333 -v "$(pwd)/Hindi-Language-AI-Chatbot-for-Enterprises-using-Qdrant-MLFlow-and-LangChain:/qdrant/storage" qdrant/qdrant

Now, let's open a connection to the Qdrant database using the qdrant_client. We then need to create a new collection in the Qdrant database in which we will store the embeddings. Once this is done, we will insert the embeddings, along with the corresponding document IDs and payloads, into the collection. The document IDs will be used to identify the documents; the payloads will contain the actual text of the document, and the embeddings will be used to retrieve the most relevant documents for a given query.

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, Batch

# Create a QdrantClient object
host = 'localhost'
port = 6333
client = QdrantClient(host=host, port=port)

# delete the collection if it already exists
client.delete_collection(collection_name="my_collection")

# Create a fresh collection in Qdrant
client.recreate_collection(
  collection_name="my_collection",
  vectors_config=VectorParams(size=300, distance=Distance.COSINE),
)

# Update the Qdrant Vector Database with the embeddings
# Since the data is large, we only insert the first batch of 4000 points here
batch_size = 4000
client.upsert(
   collection_name="my_collection",
   points=Batch(
       ids=df['id'].to_list()[:batch_size],
       # Convert the pandas Series to a plain list, as Batch expects lists
       payloads=df['payload'].to_list()[:batch_size],
       vectors=df['embeddings'].to_list()[:batch_size],
   ),
)

# Close the QdrantClient
client.close()
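The code above inserts only the first 4000 points. If you want to index the whole dataset, one option is to loop over it in fixed-size slices. The helper below is a minimal sketch of that idea; batch_ranges is a hypothetical name, and the total of 9500 points is a made-up figure for illustration.

```python
from typing import Iterator, Tuple


def batch_ranges(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that cover range(total) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


ranges = list(batch_ranges(total=9500, batch_size=4000))
print(ranges)  # [(0, 4000), (4000, 8000), (8000, 9500)]

# Possible usage with the dataframe from this blog (run it before client.close()):
# for start, end in batch_ranges(len(df), 4000):
#     client.upsert(
#         collection_name="my_collection",
#         points=Batch(
#             ids=df['id'].to_list()[start:end],
#             payloads=df['payload'].to_list()[start:end],
#             vectors=df['embeddings'].to_list()[start:end],
#         ),
#     )
```

Smaller batches keep each request to the Qdrant server at a manageable size, at the cost of more round trips.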

After saving the embeddings in the Qdrant database, we can view the collection in the Qdrant dashboard. We can see from the dashboard that each chunk has 3 pieces of information: metadata, chunk text, and embeddings.

Great. We have completed the first part of building our Hindi Chatbot. Let’s now move on to the next part, querying.

Querying the LLM

Now, let's start building the next part of the chatbot. In this part, we will load the LLM through Ollama and integrate it with the chatbot. Specifically, we will be using the Llama 3 model. Llama 3 is Meta's latest and most advanced open-source large language model (LLM). It is the successor to the previous Llama 2 model and represents a significant improvement in performance across a variety of benchmarks and tasks. Llama 3 comes in two main versions - an 8 billion parameter model and a 70 billion parameter model. Llama 3 supports longer context lengths of up to 8,000 tokens.

We will be using MLFlow to track all the configurations and the model results. Let's first install Ollama, pull the Llama 3 model, and install MLFlow.

# install the Ollama
curl -fsSL https://ollama.com/install.sh | sh

# get the llama3 model
ollama pull llama3

# install the MLFlow
pip install mlflow

Now, let's start by loading the Qdrant Client, which will be used to retrieve the context for a given query. We will also start logging the configurations and the results of the workflows using MLFlow.

import mlflow
from qdrant_client import QdrantClient

mlflow_lgging = True

if mlflow_lgging:
   # set the experiment name in the mlflow
   mlflow.set_experiment("Hindi Chatbot")
   # start the mlflow run
   mlflow.start_run()

# load the Qdrant client from the same host and port
# this client will be used to interact with the Qdrant server
host = "localhost"
port = 6333
client = QdrantClient(host=host, port=port)

# log the parameters in the mlflow
if mlflow_lgging:
   mlflow.log_param("qdrant_host", host)
   mlflow.log_param("qdrant_port", port)

We also need to load the embedding model. This embedding model is necessary to convert the query to the embedding that can be used to do a similarity search in Qdrant. The ultimate goal is to retrieve the context for a given query based on the similarity of the query embedding with the context embeddings.

import fasttext as ft

# You will need to download these models from the URL mentioned below
embedding_model_path = '../wiki.hi.bin' #https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.hi.zip
# embedding_model_path = '../indicnlp.ft.hi.300.bin' #https://storage.googleapis.com/ai4bharat-public-indic-nlp-corpora/embedding-v2/indicnlp.ft.hi.300.bin
embed_model = ft.load_model(embedding_model_path)

if mlflow_lgging:
   mlflow.log_param("embed_model_path", embedding_model_path)
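To illustrate what "similarity of the query embedding with the context embeddings" means, here is a small brute-force sketch of cosine similarity and top-k ranking in plain Python. It is only an illustration: Qdrant uses its own index and does not scan every vector like this, and the names cosine_similarity and top_k as well as the toy 2-dimensional vectors are hypothetical (the real embeddings in this blog have 300 dimensions).

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs of the k stored vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "collection" of three stored embeddings and one query embedding
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best = top_k([1.0, 0.1], stored, k=2)
print([index for index, _ in best])  # [0, 2]
```

The query points almost in the same direction as the first stored vector, so that vector ranks first; the limit parameter of the Qdrant search plays the role of k here.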

LangChain, by default, does not support the FastText embedding framework. Its built-in embedding integrations cover providers such as Hugging Face and OpenAI, so we cannot plug our FastText model into a ready-made retriever. That is why we need to define a custom LangChain retriever class that will be used to retrieve the context for a given query. In this class, we will have one method, _get_relevant_documents, which will do the similarity search in Qdrant based on the FastText embedding model and return the context for a given query.

from typing import List
from qdrant_client import QdrantClient
import fasttext as ft
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

# Define a custom retriever class that uses Qdrant for document retrieval
# Since we're using FastText embeddings, which LangChain does not support out of the box, we can't use the default LangChain retriever
class QdrantRetriever(BaseRetriever):
   client: QdrantClient
   embed_model: ft.FastText._FastText
   collection_name: str
   limit: int

   def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun) -> List[Document]:
       """Converts query to a vector and retrieves relevant documents using Qdrant."""
       # Get the vector representation of the query using the FastText model
       query_vector = self.embed_model.get_sentence_vector(query).tolist()

       # Search for the most similar documents in the Qdrant collection
       # The search method returns a list of hits, where each hit contains the most similar document
       # we can limit the number of hits to return using the limit parameter
       search_results = self.client.search(
           collection_name=self.collection_name,
           query_vector=query_vector,
           limit=self.limit
       )
       # Finally, we convert the search results to a list of Document objects
       # that can be used by the pipeline
       return [Document(page_content=hit.payload['page_content']) for hit in search_results]

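# Name of the collection created during indexing, and the number of chunks to retrieve per query
# (limit = 1 is an example value; tune it for your data)
collection_name = "my_collection"
limit = 1

# Use the custom QdrantRetriever class to create a retriever object
retriever = QdrantRetriever(
   client=client,
   embed_model=embed_model,
   collection_name=collection_name,
   limit=limit
)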

if mlflow_lgging:
   mlflow.log_param("collection_name", collection_name)
   mlflow.log_param("limit", limit)

Now, we need to load the Llama 3 model. We will be using the 8 billion parameter model. Instead of using Hugging Face to load the model, we will use Ollama to load it. Ollama provides a simple and easy way to load the models without much of a hassle. The class Ollama takes in a number of arguments, the most important of which are num_predict (number of tokens to be generated) and num_ctx (maximum context size).

from langchain_community.llms.ollama import Ollama

# Generation settings, defined once so the same values are passed to Ollama and logged in MLFlow
model_name = 'llama3'
num_predict = 100
num_ctx = 3000
num_gpu = 2
temperature = 0.7
top_k = 50
top_p = 0.95

# Create an Ollama object with the specified parameters
# This loads the Llama 3 8B model without handling the tokenizer separately, as we would with Hugging Face
llm = Ollama(model=model_name, num_predict=num_predict, num_ctx=num_ctx, num_gpu=num_gpu,
            temperature=temperature, top_k=top_k, top_p=top_p)

if mlflow_lgging:
   mlflow.log_param("model_name", model_name)
   mlflow.log_param("num_predict", num_predict)
   mlflow.log_param("num_ctx", num_ctx)
   mlflow.log_param("num_gpu", num_gpu)
   mlflow.log_param("temperature", temperature)
   mlflow.log_param("top_k", top_k)
   mlflow.log_param("top_p", top_p)

Great. So far, we have been able to set up the retriever, which will retrieve the context from the database based on the similarity of the query embedding with the context embeddings. We have also loaded the Llama 3 model.

Now, there's just one more thing left to do. We need to create a chat template. The chat template includes two types of prompts: system prompts and user prompts. System prompts are written to control the behavior of the chatbot or LLM. It is very important to have a good system prompt to get responses that meet expectations. A bad system prompt can lead to poor or incorrect behavior of your chatbot. I spent some time optimizing the system prompt to get the best results. User prompts are the questions or queries that the user wants to ask the chatbot. Just like a good system prompt, it is always recommended to have a good user prompt. It should be concise, informative, and to the point.

Next, we will create these chat templates based on these two prompts.

from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
   """<s>[INST] आप एक सम्मानीय सहायक हैं। आपका काम नीचे दिए गए संदर्भ से प्रश्नों का उत्तर देना है। आप केवल हिंदी भाषा में उत्तर दे सकते हैं। धन्यवाद।

   You are never ever going to generate responses in English. You are always going to generate responses in Hindi no matter what. You also need to keep your answer short and to the point.

   संदर्भ: {context} </s>
"""
)

prompt = ChatPromptTemplate.from_messages(
   [
       ("system", system_prompt),
       ("human", "{input}"),
   ]
)

if mlflow_lgging:
   mlflow.log_param("system_prompt", system_prompt)

Now, let's tie everything together and create a chain of actions. We first want to retrieve the relevant documents based on the query. We then want to generate the response based on the retrieved context and the prompt. create_stuff_documents_chain and create_retrieval_chain are exactly what we need to do this.

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# Create a chain that combines the retriever and the question-answer chain
# Essentially, this chain retrieves the relevant documents using the retriever,
# inserts them into the prompt as context, and passes the prompt to the LLM
question_answer_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, question_answer_chain)
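Conceptually, "stuffing" means that the retrieved chunks are joined into one string and substituted for {context} in the system prompt before the LLM is called. The sketch below mimics that step in plain Python. It is an illustration under assumptions, not LangChain's actual implementation: the names SYSTEM_TEMPLATE, stuff_context, and build_messages, the blank-line separator, and the English template text are all hypothetical.

```python
from typing import List, Tuple

# Simplified stand-in for the system prompt used in this blog
SYSTEM_TEMPLATE = "Answer only from the context below.\n\nContext: {context}"


def stuff_context(chunks: List[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks into a single context string."""
    return separator.join(chunks)


def build_messages(question: str, chunks: List[str]) -> List[Tuple[str, str]]:
    """Return (role, text) pairs in the same order as the chat template: system, then human."""
    system_text = SYSTEM_TEMPLATE.format(context=stuff_context(chunks))
    return [("system", system_text), ("human", question)]


messages = build_messages(
    "What attracts film makers?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
for role, text in messages:
    print(f"{role}: {text}")
```

Because every retrieved chunk ends up in the prompt, the retriever's limit and the chunk size together determine how much of the model's context window is used.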

Finally, we have successfully built the chatbot using the Llama 3 model. Let's now test the chatbot and see how it performs.

query = 'किस तरह के किरदार और कहानी तत्व रचनाकारों और फिल्म निर्माताओं को आकर्षित करते हैं?'

if mlflow_lgging:
   mlflow.log_param("query", query)

response = chain.invoke({"input": query})

if mlflow_lgging:
   mlflow.log_param("context", response['context'])
   mlflow.log_param("response", response['answer'])

print(response)

# end the logging of the mlflow
mlflow.end_run()

Let’s see how our chatbot responded to the query.

{'input': 'किस तरह के किरदार और कहानी तत्व रचनाकारों और फिल्म निर्माताओं को आकर्षित करते हैं?',
'context': [Document(page_content='अक्सर रचनाकारों और फिल्म निर्माताओं को ऐसी कहानियाँ आकर्षित करती रही हैं  जिनके जांबाज नायक नामी हैं और जीवित हैं  शहीदों से लेकर डाकुओं तक के जीवन ने कई फार्मूला फिल्म निर्देशकों से लेकर कला निर्देशकों तक को प्रेरित किया है  जब मैंने सुना कि राजस्थान के छोटे से गाँव भटेरी में महिला विकास कार्यक्रम में काम करने वाली  साथिन  भँवरी देवी के जीवन पर फिल्म का निर्माण हो रहा है  तो मेरे लिए यह आश्चर्य का विषय नहीं था')],
'answer': 'सामान्य तौर पर, रचनाकारों और फिल्म निर्माताओं को ऐसे किरदार और कहानी तत्व आकर्षित करते हैं जिनके साथ सम्बंधित लोग हों, या जिनके साथ उनका अपना अनुभव हो। इसके अलावा, रचनाकारों और फिल्म निर्माताओं को ऐसे किरदार'}

Great, we can see that our chatbot was able to retrieve the right chunk from the database and answer the question correctly.

We have also been logging the chatbot parameters in MLFlow. Let's now check that out and see how it looks. In order to view the content of MLFlow, we’ll need to launch the dashboard of MLFlow using the following command.

# launches the MLFlow dashboard
mlflow ui --port 5000

This is what the MLFlow dashboard looks like. We can see our Hindi Chatbot experiment listed there.

We can also see that all the parameters – query, context, and response – have been saved in MLFlow.

Great. In this blog, we saw how we can use LangChain, Ollama, Qdrant, MLFlow, and the Llama 3 model to build a Hindi-language chatbot. We also saw how to track the parameters and the results of the chatbot using MLFlow. As a bonus, let's also build a Gradio UI for the chatbot.

import gradio as gr

def answer_question(query, history):
   response = chain.invoke({"input": query})
   return response['answer']

gr.ChatInterface(answer_question).launch(share=True)

That's it for this blog. I hope you enjoyed reading it.

You can find the code related to this blog at the below-mentioned GitHub link: https://github.com/quamernasim/Hindi-Language-AI-Chatbot-for-Enterprises-using-Qdrant-MLFlow-and-LangChain

References

This article was originally published on: https://quamernasim.medium.com/hindi-language-ai-chatbot-for-enterprises-using-llama-3-qdrant-ollama-langchain-and-mlflow-9b69391d3348