Retrieval-Augmented Generation (RAG) for AI starter

AI is everywhere, but it's not private

With the advancement of AI technologies, it is now possible to access a wide range of information through conversational interfaces, simply by asking "What is xxx?". Because of this usefulness, many companies encourage employees to use AI chatbots to boost productivity, and some even cover the subscription costs.

However, these AI models are trained on publicly available data, so naturally they don't know about proprietary or internal company information. For instance, if an employee asks, "What does it take to get a good performance review at work?", a general-purpose AI may respond with vague advice like "Deliver good results", rather than offering company-specific guidelines.

Introduction to RAG (Retrieval-Augmented Generation)

To build an AI system that can answer such internal questions accurately, organizations typically need to integrate their own data into the AI's capabilities.

One common approach is to train a generative AI model on internal documents through a method known as fine-tuning. However, this is often resource-intensive, requiring significant time and cost, and it is inefficient when internal information changes frequently. For this reason, the Retrieval-Augmented Generation (a.k.a. RAG) approach is generally preferred.

[Figure: what-is-rag]

RAG leaves the base generative model untouched and instead retrieves relevant documents at query time. When a user submits a question, the system searches internal knowledge sources, retrieves the most relevant content, and provides it to the chatbot as context. This allows the AI to deliver accurate and up-to-date responses, while also enabling users to trace the source of the information.

The following code is a simple example of this logic, using the Gemini API:

import google.generativeai as genai

genai.configure(api_key="<your-api-key>")
model = genai.GenerativeModel('<gemini-model-name>')


def gemini_rag_query(question, context_docs):
    # Join the retrieved documents into a single context block.
    context = "\n".join(context_docs)

    prompt = f"""
    Introduction:
    {context}

    Question: {question}

    Please answer by referring to the "Introduction" above.
    """

    response = model.generate_content(prompt)
    return response.text


cont_doc = """
My name is 'kination'.
I'm software engineer. My skill is

Programming:
Skilled: Java, Python, TypeScript(JavaScript), Scala
Others: Rust

Experienced in:
Data Engineering
Cloud Infrastructure
Application development: Web, Android

Language:
Korean: Native
English: Business Level
Japanese: Conversational (JLPT - N2 certificated)
"""

# Pass the document as a list; calling "\n".join() on a raw string
# would insert a newline between every single character.
response = gemini_rag_query("Could you describe the user 'kination'?", [cont_doc])
print(response)

and here's the result:

...
Based on the provided introduction, 'kination' is a software engineer.

They possess strong programming skills, being skilled in Java, Python, TypeScript (JavaScript), and Scala, with additional familiarity with Rust. Their experience extends to Data Engineering, Cloud Infrastructure, and application development for both web and Android platforms.

Regarding languages, 'kination' is a native Korean speaker, proficient in English at a business level, and conversational in Japanese, holding a JLPT - N2 certificate.

If you input another username (such as 'Jimmy Rock'), it will respond:

There is no information about 'Jimmy Rock' in this document...

In this sample, the content is provided as a raw text variable. If you need to build the document base from files such as PDF or Word, additional logic is required to load the content and split it into chunk data.

from langchain.document_loaders import PyPDFLoader, UnstructuredWordDocumentLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the source document (in newer LangChain versions these loaders
# live in `langchain_community.document_loaders`).
pdf_loader = PyPDFLoader("kination.pdf")
# If the content is a Word document:
# word_loader = UnstructuredWordDocumentLoader("kination.docx")

# Split into overlapping chunks so each piece stays small enough for retrieval.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50)
chunks = text_splitter.split_documents(pdf_loader.load())

...

This typically involves splitting the content into smaller, manageable chunks, which helps improve the relevance of search results during retrieval.
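
For instance, the chunks produced above can be fed straight into the earlier helper. This is a minimal sketch; it assumes the `chunks` list and the `gemini_rag_query` function from the previous snippets:

# Each LangChain Document exposes its text via `.page_content`.
context_docs = [chunk.page_content for chunk in chunks]

response = gemini_rag_query("Could you describe the user 'kination'?", context_docs)
print(response)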

[Figure: vector]

Once the content is chunked, each chunk needs to be transformed into a vector: a numerical representation that captures the semantic meaning of the text (usually produced by an embedding model). These vectors are then stored in a vector DB, which allows efficient similarity search against the user's query.
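
Here is a minimal sketch of that step, using the Gemini embedding API and a plain in-memory cosine-similarity search. The embedding model name and `top_k` value are illustrative assumptions; a real setup would store the vectors in a dedicated vector DB such as FAISS, Chroma, or Pinecone instead of recomputing them per query:

import numpy as np
import google.generativeai as genai


def embed(text):
    # Turn text into an embedding vector (model name is an assumption;
    # check the Gemini docs for the current embedding model).
    result = genai.embed_content(model="models/text-embedding-004", content=text)
    return np.array(result["embedding"])


def retrieve(query, chunk_texts, top_k=2):
    # Naive in-memory similarity search; a vector DB does this at scale.
    query_vec = embed(query)
    scored = []
    for text in chunk_texts:
        vec = embed(text)
        score = np.dot(query_vec, vec) / (np.linalg.norm(query_vec) * np.linalg.norm(vec))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Retrieve the most relevant chunks, then pass them as context as before.
question = "Could you describe the user 'kination'?"
top_chunks = retrieve(question, [chunk.page_content for chunk in chunks])
print(gemini_rag_query(question, top_chunks))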

