
Omogbai Atakpu

Posted on • Originally published at techlog.hashnode.dev


How to build an AI-Powered Retrieval-Augmented Generation (RAG) Chatbot Assistant with TypeScript, Node.js and LangGraph

Learn how to create your first AI chatbot with easy-to-follow directions

Have you ever wanted to integrate AI into your development projects but weren’t sure where to start? Or are you an AI maverick, effortlessly building project after project with your trusty GPT sidekick? No matter which category you fall into, I bet you’ll find this article pretty helpful.

In this tutorial, I’ll guide Node.js beginners through building an AI-powered chatbot using Node.js, LangGraph, and Express.js. The only prerequisite is a basic understanding of TypeScript and Node.js. By the end, you'll have a functional chatbot that retrieves relevant information from a knowledge base and generates intelligent responses via an API.

AI can enhance applications in many ways, but one of the most powerful techniques is Retrieval-Augmented Generation (RAG). So, what exactly is RAG, and why should we use it? Let’s dive in!

What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. (Source: Amazon AWS)

Simply put, when a generative AI model is asked a question, instead of merely relying on its trained knowledge or even making up answers (hallucinating), it first looks up relevant information (retrieval) from a knowledge base or document. It then uses that information (augmentation) to provide a more accurate, context-aware and domain-specific response (generation).

We can break this into four steps:

  1. External Data: First, there must be an external knowledge base. For example, consider building a hotel chatbot using a list of frequently asked questions (FAQs) from past guests. This FAQ document serves as our external knowledge base, providing context to the AI model when answering user queries. However, this data cannot be used in its raw form; it must first be converted into vectors and stored in a vector database, a knowledge library that the LLM can understand. A vector database is designed to efficiently store and query vector embeddings, which are "numerical representations of data points that express different types of data, including nonmathematical data such as words or images, as an array of numbers that machine learning (ML) models can process" (IBM).

  2. Retrieval: A retrieval system is used to fetch relevant content from the data source based on the user’s query.

  3. Augmentation: The retrieved documents are combined with the user’s query to provide additional context. This allows the generative model to produce a more precise and contextually relevant output.

  4. Generation: A generative model like llama-3.3-70b-versatile (Meta's Llama 3.3, served via Groq), OpenAI's GPT or a Hugging Face Transformer model creates the final response to the user.

Let's go a bit further into these steps as we begin setting up our project.
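
To make these four steps concrete before we touch any real libraries, here is a tiny, purely illustrative TypeScript sketch. None of these helpers are real APIs; the actual retrieval and generation are implemented with LangChain and LangGraph later in the article.

// Purely illustrative: the four RAG steps with toy stand-ins for real components.
type Chunk = { content: string };

// 1. External data: pretend these chunks already live in a vector database.
const knowledgeBase: Chunk[] = [
  { content: "Check-in starts at 2pm and check-out is at 11am." },
  { content: "Breakfast is served from 7am to 10am in the main restaurant." },
];

// 2. Retrieval: fetch the chunks most relevant to the question (naive keyword match here).
const retrieve = (question: string): Chunk[] =>
  knowledgeBase.filter((chunk) =>
    question
      .toLowerCase()
      .split(" ")
      .some((word) => word.length > 3 && chunk.content.toLowerCase().includes(word))
  );

// 3. Augmentation: combine the retrieved context with the user's question.
const augment = (question: string, context: Chunk[]): string =>
  `Context:\n${context.map((c) => c.content).join("\n")}\n\nQuestion: ${question}`;

// 4. Generation: in the real app an LLM answers; here we just echo the prompt.
const generate = (prompt: string): string => `LLM answer based on:\n${prompt}`;

console.log(generate(augment("When does check-in start?", retrieve("When does check-in start?"))));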

Project Setup

Create a new Node.js project and install the necessary dependencies. This is a TypeScript project, so we will also install typescript and its related type packages.

mkdir hotel-chatbot
cd hotel-chatbot
npm install ts-node typescript @types/node @types/express @types/cors cors express langchain @langchain/community @langchain/mistralai @langchain/groq @langchain/langgraph @langchain/core mammoth uuid zod
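
The rest of the article assumes a standard TypeScript setup. If you are starting from scratch, something along these lines will do; the dev script is just a suggestion, not something the tutorial depends on:

npm init -y
npx tsc --init
# optional convenience script in package.json:
# "scripts": { "dev": "ts-node src/server.ts" }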

src Structure

Let's also set up the project structure. This application is going to function as an API reached over HTTP. As a result, we need to create folders for our controllers, routes, services and utils, along with our app and server files. Your src folder is going to look like this:

src/
  controller/
  routes/
  services/
  utils/
  app.ts
  server.ts

API Setup

Let's create the app.ts and server.ts files.

app.ts

import express from "express";
import cors from "cors";

const app = express();
app.use(express.json()); // Middleware to parse JSON
app.use(cors()); // Enable CORS for frontend access

export default app;

server.ts

import app from "./app";

const PORT = process.env.PORT || 5000;
const startServer = async () => {
  console.log("🚀 Starting Server...");
  app.listen(PORT, () => {
    console.log(`🚀 Server running on http://localhost:${PORT}`);
  });
};

startServer();

At this point, you can start the application (for example, with npx ts-node src/server.ts). This is what will be printed to your console:

🚀 Starting Server...
🚀 Server running on http://localhost:5000

Chatbot Service

Within the services folder, create a file called chatbotGraph.services.ts. This file will contain all the logic related to the chatbot, including the integration of the Large Language Model (LLM), embedding, and the LangGraph workflow.

Step 1: Configure LLM and Embeddings

Using LangChain’s AI services, we will first instantiate our preferred LLM and embedding model. Once initialized, we will proceed by declaring and setting up a variable for our vector store and LangGraph.

import { MistralAIEmbeddings } from "@langchain/mistralai";
import { ChatGroq } from "@langchain/groq";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// Instantiate LLM, Groq AI
const llm = new ChatGroq({
  model: "llama-3.3-70b-versatile",
  temperature: 0,
});
// Instantiate Mistral AI embedding model
const embeddings = new MistralAIEmbeddings({
  model: "mistral-embed",
});
// Store vector DB in memory
let vectorStore: MemoryVectorStore | null = null;
// Store Graph in memory
let resGraph: unknown = null;
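
Both clients expect API keys to be available as environment variables. Assuming the default variable names these LangChain integrations look for (double-check the docs for the versions you install), you can export them before starting the server, or load them from a .env file with a package such as dotenv:

export GROQ_API_KEY="your-groq-api-key"        # placeholder value
export MISTRAL_API_KEY="your-mistral-api-key"  # placeholder value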

Step 2: Create a Vector Store

To initialize the FAQs, we need to create a vector database. A vector database is specifically designed to efficiently store and query vector embeddings. This is where the FAQs will be stored for later retrieval by our model.

We’ve extracted this process into the initFAQs function, which allows us to create the vector store and populate it with the FAQs as soon as the service starts. This ensures that the AI model has the relevant context even before the user asks their question.

import { splitDocs } from "../utils/splitDocs";
// initialize FAQs
// create Vector store
export const initFAQs = async () => {
  if (vectorStore) return vectorStore; // Prevent reloading if already initialized
  console.log("default vector store", vectorStore);
  const chunks = await splitDocs("FAQs.docx");
  console.log("🟢 Initializing vector store...");

  // Initialise vector store
  vectorStore = new MemoryVectorStore(embeddings);
  await vectorStore.addDocuments(chunks);
  console.log("✅ Vector store initialized successfully with hotel FAQs.");

  return vectorStore;
};

The vector store, however, cannot be populated with vector embeddings of the entire document at once. The document must first be broken down into smaller, manageable chunks. This process, known as document splitting or chunking, offers many benefits, such as ensuring consistent processing of varying document lengths, overcoming the input size limitations of models, and improving the search accuracy of retrieval systems. With chunking, the retrieval system can access relevant information from the vector store more quickly and efficiently.

While there are various chunking techniques, the splitDocs function splits our document with the help of LangChain’s RecursiveCharacterTextSplitter which takes a large text and splits it based on a specified chunk size and chunk overlap. You can save this helper function in the utils/ folder as splitDocs.ts.

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { faqLoader } from "./faqLoader";

// function to split loaded docs into chunks
export const splitDocs = async (filePath: string) => {
  // Load the data
  const loadedDocs = await faqLoader(filePath);
  // create your splitter
  const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  //  split the docs into chunks
  const chunks = await textSplitter.splitDocuments(loadedDocs);
  return chunks;
};

Before the text can be split, it must first be loaded from an FAQ document. In our case, the FAQs are stored in a .docx file, but they could also be in a .pdf file or even on a webpage. The faqLoader function extracts the text from the file using LangChain’s DocxLoader class. You can save this helper function in the utils/ folder as faqLoader.ts.

import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";
import { Document } from "@langchain/core/documents";

// function to load content from .docx file
export const faqLoader = async (
  absoluteFilePath: string
): Promise<Document<Record<string, any>>[]> => {
  const loader = new DocxLoader(absoluteFilePath);
  const docs = await loader.load();
  return docs;
};

Recap of the initFAQs Function

Now, let’s recap what happens in the initFAQs function:

  • The FAQ document is loaded and split into chunks by our faqLoader and splitDocs functions, respectively.
  • The vector store is initialized with the MistralAIEmbeddings embedding model, which converts text into vector embeddings.
  • The vector embeddings of these chunks are stored in the vector store, and the vector store is returned.
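
If you want to sanity-check the vector store at this point, a quick similarity search (for example, in a throwaway async script that imports initFAQs, assuming your FAQs.docx is in place) will show which chunks come back for a sample question:

// Quick sanity check: see which FAQ chunks match a sample question.
const store = await initFAQs();
const matches = await store.similaritySearch("What time is check-in?", 2);
matches.forEach((doc, i) =>
  console.log(`Match ${i + 1}:`, doc.pageContent.slice(0, 120))
);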

Following so far? Good! Take a deep breath before we dive into the next section.

Step 3: Building the LangGraph workflow

LangGraph is a framework built on LangChain that enables us to create graph-based workflows for AI applications. Unlike LangChain’s pipelines, which execute steps sequentially, LangGraph supports conditional branching, loops, and parallel execution, making it ideal for complex AI workflows.

LangGraph has a built-in persistence layer, which is implemented through checkpointers. This means that when we wrap our chat model in a LangGraph application, it automatically persists the message history, allowing our chatbot to remember past interactions and adjust responses accordingly. We will demonstrate this in the createGraph function below:

import { z } from "zod";
import { tool } from "@langchain/core/tools";
import { StateGraph, MessagesAnnotation, MemorySaver } from "@langchain/langgraph";
import { ToolNode, toolsCondition } from "@langchain/langgraph/prebuilt";
import {
  AIMessage,
  HumanMessage,
  SystemMessage,
  ToolMessage,
  trimMessages,
} from "@langchain/core/messages";

// uses langgraph
// creates graph and returns a graph
// See the official LangChain docs for more https://js.langchain.com/docs/tutorials/qa_chat_history/
export const createGraph = async () => {
  if (!vectorStore) {
    console.warn("⚠ Vector store not initialized, initializing now...");
    await initFAQs();
  }

  // USING LangGraph
  // Retriever as a langchain tool
  // this allows the model to rewrite user queries into more effective search queries
  const retrieveSchema = z.object({ query: z.string() });

  // this converts the retriever function into a tool that must return a query
  const retrieve = tool(
    // the JS function to be converted
    async ({ query }) => {
      try {
        const retrievedDocs = await vectorStore!.similaritySearch(query, 2);
        const serialized = retrievedDocs
          .map(
            (doc) =>
              `Source: ${doc.metadata.source}\nContent: ${doc.pageContent}`
          )
          .join("\n");
        return [serialized || "No relevant information found.", retrievedDocs];
      } catch (error) {
        console.error("Error in retrieve tool:", error);
        return "Error retrieving documents.";
      }
    },
    {
      name: "retrieve",
      description: "Retrieve information related to a query.",
      schema: retrieveSchema,
      responseFormat: "content_and_artifact",
    }
  );

  // Function to generate AI Message that may include a tool-call to be sent.
  async function queryOrRespond(state: typeof MessagesAnnotation.State) {
    const llmWithTools = llm.bindTools([retrieve]);

    // Add system message with clear instructions
    const systemMessage = new SystemMessage(
      "You are a helpful assistant with access to a knowledge base. " +
        "When asked a question, ALWAYS use the 'retrieve' tool first to search for relevant information " +
        "before attempting to answer. Formulate a search query based on the user's question."
    );

    // Combine with the existing messages, making sure the system message comes first
    // so that the model keeps our prompt top of mind
    const userMessages = state.messages.filter(
      (msg) => msg instanceof HumanMessage || msg instanceof AIMessage
    );

    const messagesWithSystem = [systemMessage, ...userMessages];

    // trims the history to the last 80 messages (the token counter below simply counts messages) to prevent it from getting too long
    const trimmer = trimMessages({
      maxTokens: 80,
      strategy: "last",
      tokenCounter: (msgs) => msgs.length,
      includeSystem: true,
      allowPartial: false,
      startOn: "human",
    });

    const trimmedMessages = await trimmer.invoke(messagesWithSystem);

    const response = await llmWithTools.invoke(trimmedMessages);

    // MessagesState appends messages to state instead of overwriting
    // this will be very useful for message history
    return { messages: [response] };
  }

  // Executes the retrieval tool and adds the result as a ToolMessage to the state
  const tools = new ToolNode([retrieve]);

  // Generates a response using the retrieved content.
  async function generate(state: typeof MessagesAnnotation.State) {
    let recentToolMessages = [];
    for (let i = state["messages"].length - 1; i >= 0; i--) {
      let message = state["messages"][i];
      if (message instanceof ToolMessage) {
        recentToolMessages.push(message);
      } else {
        break;
      }
    }
    let toolMessages = recentToolMessages.reverse();

    // Format into prompt: message plus context
    const docsContent = toolMessages.map((doc) => doc.content).join("\n");
    const systemMessageContent =
      "You are a knowledgeable and very helpful assistant with access to a list of FAQs. " +
      "Use the following pieces of retrieved context to answer " +
      "the question. If you don't know the answer, just say that you " +
      "don't know, don't try to make up an answer. " +
      "Use three sentences maximum and keep the answer as concise as possible." +
      "\n\n" +
      `${docsContent}`;

    // get all messages relevant to the conversation from the state, i.e. no AI messages with tool calls
    const conversationMessages = state.messages.filter(
      (message) =>
        message instanceof HumanMessage ||
        message instanceof SystemMessage ||
        (message instanceof AIMessage && message.tool_calls?.length == 0)
    );

    // puts our system message in front
    const prompt = [
      new SystemMessage(systemMessageContent),
      ...conversationMessages,
    ];

    // Run
    const response = await llm.invoke(prompt);
    return { messages: [response] };
  }

  // Add logging to the toolsCondition to debug
  const myToolsCondition = (state: typeof MessagesAnnotation.State) => {
    const result = toolsCondition(state);
    console.log("Tools condition result:", result);
    return result;
  };

  const graphBuilder = new StateGraph(MessagesAnnotation)
    .addNode("queryOrRespond", queryOrRespond)
    .addNode("tools", tools)
    .addNode("generate", generate)
    .addEdge("__start__", "queryOrRespond")
    .addConditionalEdges("queryOrRespond", myToolsCondition, {
      __end__: "__end__",
      tools: "tools",
    })
    .addEdge("tools", "generate")
    .addEdge("generate", "__end__");

  // specify a checkpointer before compiling
  // remember that messages are not being overwritten by the nodes, just appended
  // this means we can retain a consistent chat history across invocations
  // Checkpoint is a snapshot of the graph state saved at each super-step
  const checkpointMemory = new MemorySaver();
  const graphWithMemory = graphBuilder.compile({
    checkpointer: checkpointMemory,
  });

  return graphWithMemory;
};

With this, we can see the graph workflow clearly outlined in the graphBuilder variable. Each node in the workflow represents a function that performs a specific task, while the edges connect the nodes and define how the chatbot decides what to do next.

Recap of the createGraph Function

Let’s recap what happens in the createGraph function:

  • If the vector store is not initialized, the initFAQs function is called to load and process the FAQ document.
  • The retrieve tool is created using LangChain’s tools, enabling the model to rewrite user queries into more effective search queries. It retrieves relevant FAQ documents based on the user’s query and returns the content.
  • The queryOrRespond function uses the retrieve tool to search for relevant information and formats a system message to guide the assistant in providing accurate and concise responses based on the conversation context.
  • The message history is trimmed to the last 80 messages (the token counter here simply counts messages) to help keep the conversation within the model's input limits and prevent message overflow.
  • The generate function formats the retrieved content and generates a concise AI response based on the available context from the vector store and the conversation history.
  • A ToolNode is used to invoke the retrieve tool, with a conditional edge to determine the next step based on the results of the retrieval tool.
  • A StateGraph is built using the nodes (queryOrRespond, tools, and generate) and edges that define the flow of the chatbot’s logic, connecting the querying step to the response generation.
  • The graph is compiled with checkpoint memory, saving the state of the graph at each step, allowing the chatbot to maintain and retrieve a consistent conversation history across interactions.

Step 4: Create a function to handle user questions

import { v4 as uuidv4 } from "uuid";
import { exportLastAIMsg } from "../utils/exportLastAIMsg";

export const answerQuestion = async (question: string, threadId?: string) => {
  let inputs = { messages: [{ role: "user", content: question }] };
  let newThreadId = threadId ?? uuidv4();
  if (!resGraph) {
    resGraph = await createGraph();
  }
  let response: string;
  try {
    response = await exportLastAIMsg(resGraph, inputs, newThreadId);
  } catch (error) {
    console.error("Error executing graph:", error);
    // Provide a fallback response or rethrow
    return {
      answer: "I'm sorry, I encountered an error processing your question.",
      threadId: newThreadId,
    };
  }

  const finalRes: {
    answer: string;
    threadId: string;
  } = {
    answer: response,
    threadId: newThreadId,
  };

  return finalRes;
};

The threadId leverages LangGraph's persistence layer and is used to track and maintain the conversation history for each user session. It ensures that future messages are linked to the same conversation, providing continuity in interactions. Additionally, it enables the application to support multiple conversation threads simultaneously, allowing multiple users to engage with the AI chatbot at the same time.
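
To see this in action, here is a small illustrative sketch (run inside an async function; the questions are just examples) of two calls that share a threadId and therefore a conversation history:

import { v4 as uuidv4 } from "uuid";

// Illustrative only: both calls share a threadId, so the second question
// can refer back to the first answer ("that") and still make sense.
const threadId = uuidv4();
const first = await answerQuestion("What time is check-in?", threadId);
console.log(first.answer);

const followUp = await answerQuestion("Can I do that any earlier?", threadId);
console.log(followUp.answer);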

import { AIMessage, isAIMessage } from "@langchain/core/messages";

export const exportLastAIMsg = async (
  resGraph: any,
  input: any,
  threadId: string
) => {
  const threadConfig = {
    configurable: { thread_id: threadId },
    streamMode: "values" as const,
  };

  let lastAIMessage: AIMessage | null = null;

  for await (const step of await resGraph.stream(input, threadConfig)) {
    const lastMessage = step.messages[step.messages.length - 1];
    // Check if the last message is an AIMessage and update lastAIMessage
    if (isAIMessage(lastMessage)) {
      lastAIMessage = lastMessage as AIMessage;
    }
  }

  // Return the last AIMessage if found
  return lastAIMessage?.content as string;
};

Putting it all together

Let’s create our controller in the chatbot.controller.ts file under the controller/ folder:

import { Request, Response } from "express";
import { answerQuestion } from "../services/chatbotGraph.services";

export async function askQuestion(req: Request, res: Response): Promise<void> {
  try {
    const { question, threadId } = req.body;
    if (!question) {
      res.status(400).json({ error: "Question is required." });
      return;
    }
    const data = await answerQuestion(question, threadId);
    res.json({ data });
  } catch (error) {
    const message = error instanceof Error ? error.message : "Unknown error";
    res.status(500).json({ error: message });
  }
}

And our chatbot.routes.ts under the routes/ folder:

import express from "express";
import { askQuestion } from "../controller/chatbot.controller";

const router = express.Router();

// @route    POST api/chatbot/ask
router.post("/ask", askQuestion);

export default router;

Now we can update our app.ts file with the chatbot route:

import express from "express";
import cors from "cors";
import chatbotRoutes from "./routes/chatbot.routes";

const app = express();

app.use(express.json()); // Middleware to parse JSON
app.use(cors()); // Enable CORS for frontend access

app.use("/api/chatbot", chatbotRoutes);

export default app;

In our server.ts file, we initialize the vector store as soon as the server starts. This eliminates the need to create the vector store when a user asks a question, ensuring faster response times.

import app from "./app";
import { initFAQs } from "./services/chatbotGraph.services";

const PORT = process.env.PORT || 5000;

const startServer = async () => {
  console.log("🚀 Starting Server...");

  try {
    console.log("🟢 Initializing FAQs in vector store...");
    const vectorStore = await initFAQs();
    if (vectorStore) console.log("✅ FAQs initialized successfully.");
  } catch (error) {
    console.error("❌ Failed to initialize FAQs:", error);
  }

  app.listen(PORT, () => {
    console.log(`🚀 Server running on http://localhost:${PORT}`);
  });
};

startServer();

Test your web API

Passing a question to the Chatbot via the API

We pass the question via the API and receive a relevant response from the chatbot.
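
For example, with the server running locally, a request like this (the question is just a sample; the answer will depend on your FAQ document) returns an answer along with a threadId:

curl -X POST http://localhost:5000/api/chatbot/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What time is check-in?"}'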

Passing a question and threadId

Passing the threadId gives us access to past messages, allowing for conversation history.
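
Sending the threadId returned by the previous response along with the next question keeps the two turns in the same conversation:

curl -X POST http://localhost:5000/api/chatbot/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "Can I do that any earlier?", "threadId": "<threadId-from-previous-response>"}'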

Passing a question and threadId and getting access to conversation history

Conclusion

Congratulations! You've successfully built a basic RAG chatbot using TypeScript, Node.js, and LangGraph, and exposed it as an API with Express.js. You can enhance it further by integrating additional tools or evolving it into a full AI agent. For these and other advanced features, check out the official LangChain documentation. Feel free to explore the GitHub repository for this tutorial: clone, fork, or contribute by submitting a PR with improvements. Happy coding!

This article was originally published on my Hashnode blog.
