<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Edwin Kys</title>
    <description>The latest articles on Forem by Edwin Kys (@edwinkys).</description>
    <link>https://forem.com/edwinkys</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1276498%2F5bd87a2d-dcd8-4a32-8e1b-47b33bd0b583.png</url>
      <title>Forem: Edwin Kys</title>
      <link>https://forem.com/edwinkys</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/edwinkys"/>
    <language>en</language>
    <item>
      <title>How to Build Tools For AI Agents With Human in The Loop in Python</title>
      <dc:creator>Edwin Kys</dc:creator>
      <pubDate>Thu, 14 Nov 2024 18:11:49 +0000</pubDate>
      <link>https://forem.com/edwinkys/how-to-build-tools-for-ai-agents-with-human-in-the-loop-in-python-259j</link>
      <guid>https://forem.com/edwinkys/how-to-build-tools-for-ai-agents-with-human-in-the-loop-in-python-259j</guid>
      <description>&lt;p&gt;With LangChain, you can equip your LLM agent with tools to expand its capability way beyond a simple conversational interaction. For example, you can create an LLM agent that can automatically reply to emails you receive or an agent that can write and publish a blog post on your behalf on a certain trendy subject. The potential is quite limitless.&lt;/p&gt;

&lt;h2&gt;What are tools?&lt;/h2&gt;

&lt;p&gt;In the context of LLM agents, tools are functions or methods that you define in your source code, giving the agent the ability to perform specific tasks or access external resources. These tools extend the language model, allowing it to go beyond generating text in response to a prompt. Instead, it can interact with various functions, APIs, or databases as part of its workflow.&lt;/p&gt;

&lt;p&gt;For example, tools might include functions for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Retrieval&lt;/strong&gt;: Accessing information from a database or a web API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculations&lt;/strong&gt;: Performing computations beyond the model's capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Generation&lt;/strong&gt;: Generating or posting content, such as drafting blog posts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task Automation&lt;/strong&gt;: Triggering actions like sending emails or updating records.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What does it look like in code?&lt;/h2&gt;

&lt;p&gt;As of LangChain version 0.3, defining tools in Python is straightforward. Tools are typically just standard Python functions that you decorate with &lt;code&gt;@tool&lt;/code&gt; to make them accessible to your language model agent. This allows the agent to "call" these functions as part of its decision-making process.&lt;/p&gt;

&lt;p&gt;Here’s a basic example illustrating how to create tools in LangChain. The actual functionality within each tool function is left out for simplicity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;publish_blog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Write and publish a blog post based on the topic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Add code to draft and publish the blog post.
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A blog post (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) is published.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Sends an email with to the specified recipient.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Add code to send an email using an email API.
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An email is sent to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Once we have our tools defined, we can bind them to our language model. The binding process typically involves specifying which functions are accessible to the model and defining the input and output formats, so the language model can correctly interact with each tool. The code below shows an example of how to bind tools to your language model.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Replace this with your OpenAI API Key.
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;publish_blog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;llm_with_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The last step is to let LangChain &lt;em&gt;call&lt;/em&gt; these tools using the output generated by the language model. In practice, this often means validating the output to ensure it contains the parameters each tool expects and parsing out the relevant information, such as data inputs. With LangChain, we can use its chain functionality to simplify this process.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tool_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;tool_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_with_tools&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;call_tools&lt;/span&gt;
&lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Remind elon@x.com about the Monday meeting at 9AM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When we execute this code, the language model first assesses the available tools it can use to accomplish the task. It then generates a message describing the required operation and its parameters, and passes it to the tool-caller function. This function interprets the message, identifying which tool to invoke based on the LLM’s output.&lt;/p&gt;
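&lt;p&gt;To make the tool-caller logic more concrete, here is a minimal, framework-free sketch of the structure the model's output takes. The dictionary keys mirror the entries of LangChain's &lt;code&gt;tool_calls&lt;/code&gt; list, but the concrete values and the stand-in tool below are made up purely for illustration:&lt;/p&gt;

```python
# Illustrative only: the shape of the tool-call data an LLM emits.
# The dict keys mirror LangChain's AIMessage.tool_calls entries;
# the values are invented for this example.
tool_calls = [
    {
        "name": "send_email",
        "args": {
            "to": "elon@x.com",
            "subject": "Monday meeting",
            "body": "Reminder: the meeting is on Monday at 9AM.",
        },
    }
]

def dispatch(tool_calls, tool_map):
    """Route each tool call to the matching Python function."""
    return [tool_map[call["name"]](**call["args"]) for call in tool_calls]

# A stand-in "tool" so this sketch runs without LangChain installed.
tool_map = {"send_email": lambda to, subject, body: f"An email is sent to {to}."}
print(dispatch(tool_calls, tool_map))  # ['An email is sent to elon@x.com.']
```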
&lt;h2&gt;Adding a human approval layer&lt;/h2&gt;

&lt;p&gt;Certain tools we have given our LLM, such as publishing a blog post or sending an email, carry significant risks. To mitigate them, we can integrate a human approval layer into our chain. Instead of executing these high-stakes tools immediately upon request, our agent will wait for human approval. This extra step ensures that every critical action is reviewed, allowing us to maintain control and safety over our agent's actions.&lt;/p&gt;

&lt;p&gt;In this article, we’ll be using &lt;a href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;Phantasm&lt;/a&gt;, an open-source platform designed for human-in-the-loop workflows for AI agents. To get started with Phantasm, we’ll use its Docker images and Python SDK. The Docker images let us quickly set up its components, while the Python SDK lets us integrate Phantasm into our AI agent's source code.&lt;/p&gt;
&lt;h3&gt;Running Phantasm&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull the server and the dashboard images.&lt;/span&gt;
docker pull ghcr.io/phantasmlabs/phantasm/dashboard:latest
docker pull ghcr.io/phantasmlabs/phantasm/server:latest

&lt;span class="c"&gt;# Run the server and the dashboard.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 2515:2515 ghcr.io/phantasmlabs/phantasm/dashboard:latest
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 2505:2505 &lt;span class="nt"&gt;-p&lt;/span&gt; 2510:2510 ghcr.io/phantasmlabs/phantasm/server:latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These commands pull the necessary Docker images and launch both the server and the dashboard in the background. To start receiving approval requests from our AI agent, we must first connect the dashboard to the coordinator server:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;a href="http://localhost:2515" rel="noopener noreferrer"&gt;http://localhost:2515&lt;/a&gt; on your browser.&lt;/li&gt;
&lt;li&gt;Click on the &lt;strong&gt;Add Connection&lt;/strong&gt; button.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;localhost:2510&lt;/strong&gt; as the connection address.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Integrating Phantasm's Python SDK&lt;/h3&gt;

&lt;p&gt;Now that the setup is complete, the final step is to integrate our agent with Phantasm via its Python SDK. This requires installing the SDK on our local machine.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;phantasmpy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With the SDK installed, our next step is to initialize it and integrate the human approval logic into our tool-caller function. This involves adding a checkpoint so the tool caller waits for human approval before invoking each tool.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;phantasmpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Phantasm&lt;/span&gt;

&lt;span class="n"&gt;phantasm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Phantasm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tool_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Added the human approval logic below.
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;phantasm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_approval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tool_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rejected when calling tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When we run the updated code, we should see an approval request from our agent appear in the dashboard. Phantasm will relay the parameters from the AI agent, allowing us to review and modify them as needed. Once approved, our agent will invoke the tool with the parameters specified by the approvers, significantly enhancing the accuracy and effectiveness of each tool call.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g8ehl8mato3spqueg5a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g8ehl8mato3spqueg5a.png" alt="Dashboard Preview" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Equipping our AI agent with tools enables it to deliver greater impact and value to our users. However, some tools carry inherent risks, which is why human oversight is essential. I developed Phantasm to provide teams with the ability to monitor and manage their AI agents in real time to ensure high performance in production environments.&lt;/p&gt;

&lt;p&gt;If you found this article valuable, consider supporting Phantasm to help enhance safe and effective AI deployment. Your support means a lot to an open-source developer like me 😁&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/phantasmlabs" rel="noopener noreferrer"&gt;
        phantasmlabs
      &lt;/a&gt; / &lt;a href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;
        phantasm
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Toolkits to create a human-in-the-loop approval layer to monitor and guide AI agents workflow in real-time.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/6d6134d4fb3b27071474e0cd4a8f76e46c88d5befec53b552daa97f09f05bf9e/68747470733a2f2f74696e7975726c2e636f6d2f35786574737a6665" alt="GitHub"&gt;&lt;/a&gt;
&lt;a href="https://discord.gg/dgevsYhh7P" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5887cfc1f71defa1cdab70e620b897b829994187c00165d2417cdae2db8eb582/68747470733a2f2f74696e7975726c2e636f6d2f3566763833753268" alt="Discord"&gt;&lt;/a&gt;
&lt;a href="https://github.com/phantasmlabs/phantasm/blob/main/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/34b5f6da13803f9650f40fbf608874ee3d46a6859c75ee940cfc2660a336f095/68747470733a2f2f74696e7975726c2e636f6d2f6d72786e3866767a" alt="License"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/190a8a69e02694bdf27c3e2a99fb3277cfd1d9b88c928b5a175db8d545bd686a/68747470733a2f2f7068616e7461736d2d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e312e302e706e67"&gt;&lt;img src="https://camo.githubusercontent.com/190a8a69e02694bdf27c3e2a99fb3277cfd1d9b88c928b5a175db8d545bd686a/68747470733a2f2f7068616e7461736d2d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e312e302e706e67" alt="Phantasm"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Phantasm offers open-source toolkits that allow you to create human-in-the-loop
(HITL) workflows for modern AI agents. Phantasm comes with 3 main components
that work together to create a seamless HITL experience:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt;: Coordinating the HITL workflows between humans and AI agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt;: For the human team to monitor and manage the workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt;: A library to integrate the workflows into your AI agents.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Features&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;✅ Fully open-source and free to use&lt;/li&gt;
&lt;li&gt;✅ Works out of the box with any AI framework or model&lt;/li&gt;
&lt;li&gt;✅ Load balancer to distribute the requests to multiple approvers (Beta)&lt;/li&gt;
&lt;li&gt;✅ Web-based dashboard to manage the approval workflows&lt;/li&gt;
&lt;li&gt;✅ Easy-to-use client libraries for popular programming languages&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;How It Works&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;Phantasm allows you to have an approval layer on top of your AI agents. This
means, you're free to use any AI framework or model you see fit. By using
Phantasm, you can delay…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>tutorial</category>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Phantasm: A Human Approval Layer For AI Agents</title>
      <dc:creator>Edwin Kys</dc:creator>
      <pubDate>Thu, 31 Oct 2024 16:20:05 +0000</pubDate>
      <link>https://forem.com/edwinkys/phantasm-a-human-approval-layer-for-ai-agents-491k</link>
      <guid>https://forem.com/edwinkys/phantasm-a-human-approval-layer-for-ai-agents-491k</guid>
      <description>&lt;p&gt;Building impactful AI agents requires enabling them to execute business-critical actions. For example, let's take a look at the e-commerce space, where a user might want to cancel an order. An AI agent or chatbot that simply explains how to cancel an order is helpful, but one that can actually cancel an order on behalf of the user is far more impactful.&lt;/p&gt;

&lt;p&gt;By nature, these actions are risky, especially when performed by an AI agent. That's why we are building &lt;a href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;Phantasm&lt;/a&gt;. Phantasm allows you to build an impactful AI agent safely by providing an approval layer that a human team can use to monitor and manage the workflow of an AI agent.&lt;/p&gt;

&lt;p&gt;With Phantasm, your AI agent workflow will look somewhat like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozk4wgg5nr65oyi559r9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozk4wgg5nr65oyi559r9.png" alt="Workflow" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your AI agent needs to call a function based on an LLM response.&lt;/li&gt;
&lt;li&gt;An approval request is sent to the approver.&lt;/li&gt;
&lt;li&gt;If an approver approves the request, the function will be executed.&lt;/li&gt;
&lt;/ol&gt;
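
&lt;p&gt;In code, the three-step flow above can be sketched roughly as follows. Note that &lt;code&gt;request_approval&lt;/code&gt; and &lt;code&gt;cancel_order&lt;/code&gt; here are hypothetical stand-ins to show the shape of the flow, not part of Phantasm's actual API:&lt;/p&gt;

```python
# A framework-agnostic sketch of the three-step approval flow.
# Both functions below are hypothetical placeholders.
def request_approval(name: str, parameters: dict) -> bool:
    """Ask a human approver; auto-approve here for demonstration."""
    print(f"Approval requested for {name} with {parameters}")
    return True

def cancel_order(order_id: str) -> str:
    """A business-critical action the agent wants to perform."""
    return f"Order {order_id} cancelled."

def run_agent_action(name, parameters, tools):
    # Step 1: the agent has decided to call `name` with `parameters`.
    # Step 2: an approval request is sent to the human approver.
    if request_approval(name, parameters):
        # Step 3: execute the function only after approval.
        return tools[name](**parameters)
    return f"{name} was rejected by the approver."

result = run_agent_action("cancel_order", {"order_id": "A-123"},
                          {"cancel_order": cancel_order})
print(result)  # Order A-123 cancelled.
```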

&lt;h2&gt;Try Phantasm Today!&lt;/h2&gt;

&lt;p&gt;If you're building an AI agent and you want to make sure that it can be deployed safely in the real world, please feel free to give it a shot! Phantasm works out of the box and doesn't require you to create an account or an API key.&lt;/p&gt;

&lt;p&gt;If you’re interested in our journey and goal, please support us by giving us a star and sharing it within your network. Every bit of support counts!&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/phantasmlabs" rel="noopener noreferrer"&gt;
        phantasmlabs
      &lt;/a&gt; / &lt;a href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;
        phantasm
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Toolkits to create a human-in-the-loop approval layer to monitor and guide AI agents workflow in real-time.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/6d6134d4fb3b27071474e0cd4a8f76e46c88d5befec53b552daa97f09f05bf9e/68747470733a2f2f74696e7975726c2e636f6d2f35786574737a6665" alt="GitHub"&gt;&lt;/a&gt;
&lt;a href="https://discord.gg/dgevsYhh7P" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5887cfc1f71defa1cdab70e620b897b829994187c00165d2417cdae2db8eb582/68747470733a2f2f74696e7975726c2e636f6d2f3566763833753268" alt="Discord"&gt;&lt;/a&gt;
&lt;a href="https://github.com/phantasmlabs/phantasm/blob/main/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/34b5f6da13803f9650f40fbf608874ee3d46a6859c75ee940cfc2660a336f095/68747470733a2f2f74696e7975726c2e636f6d2f6d72786e3866767a" alt="License"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/190a8a69e02694bdf27c3e2a99fb3277cfd1d9b88c928b5a175db8d545bd686a/68747470733a2f2f7068616e7461736d2d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e312e302e706e67"&gt;&lt;img src="https://camo.githubusercontent.com/190a8a69e02694bdf27c3e2a99fb3277cfd1d9b88c928b5a175db8d545bd686a/68747470733a2f2f7068616e7461736d2d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e312e302e706e67" alt="Phantasm"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Phantasm offers open-source toolkits that allow you to create human-in-the-loop
(HITL) workflows for modern AI agents. Phantasm comes with 3 main components
that work together to create a seamless HITL experience:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt;: Coordinating the HITL workflows between humans and AI agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt;: For the human team to monitor and manage the workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client&lt;/strong&gt;: A library to integrate the workflows into your AI agents.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Features&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;✅ Fully open-source and free to use&lt;/li&gt;
&lt;li&gt;✅ Works out of the box with any AI framework or model&lt;/li&gt;
&lt;li&gt;✅ Load balancer to distribute the requests to multiple approvers (Beta)&lt;/li&gt;
&lt;li&gt;✅ Web-based dashboard to manage the approval workflows&lt;/li&gt;
&lt;li&gt;✅ Easy-to-use client libraries for popular programming languages&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;How It Works&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Phantasm allows you to have an approval layer on top of your AI agents. This
means, you're free to use any AI framework or model you see fit. By using
Phantasm, you can delay…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/phantasmlabs/phantasm" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>news</category>
      <category>tooling</category>
    </item>
    <item>
      <title>5 Things You Need to Know About RAG with Examples</title>
      <dc:creator>Edwin Kys</dc:creator>
      <pubDate>Thu, 27 Jun 2024 19:28:10 +0000</pubDate>
      <link>https://forem.com/edwinkys/5-terms-to-get-yourself-familiar-with-rag-3cep</link>
      <guid>https://forem.com/edwinkys/5-terms-to-get-yourself-familiar-with-rag-3cep</guid>
      <description>&lt;p&gt;If you're new to RAG, vector search, and related concepts, this article will guide you through the key terms and principles used in modern LLM-based applications.&lt;/p&gt;

&lt;p&gt;This article provides a high-level overview of the key concepts and terms used in the LLM ecosystem, with easy-to-relate explanations. For a more in-depth understanding, I recommend reading dedicated resources on each topic.&lt;/p&gt;

&lt;p&gt;With that said, let's get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  Embedding
&lt;/h2&gt;

&lt;p&gt;Embedding is a way to represent unstructured data as numbers to capture the semantic meaning of the data. In the context of LLMs, embeddings are used to represent words, sentences, or documents.&lt;/p&gt;

&lt;p&gt;Let's say we have a couple of words that we want to represent as numbers. For simplicity, we will only consider 2 aspects of the words: edibility and affordability.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Word&lt;/th&gt;
&lt;th&gt;Edibility&lt;/th&gt;
&lt;th&gt;Affordability&lt;/th&gt;
&lt;th&gt;Label&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apple&lt;/td&gt;
&lt;td&gt;0.9&lt;/td&gt;
&lt;td&gt;0.8&lt;/td&gt;
&lt;td&gt;Fruit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apple&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;Tech Company&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Banana&lt;/td&gt;
&lt;td&gt;0.8&lt;/td&gt;
&lt;td&gt;0.8&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In the table above, we can roughly deduce that the first apple is a fruit, while the second apple refers to a tech company. If we had to decide whether the banana here is a fruit or some tech company we've never heard of, we could reasonably guess that it's a fruit, since its edibility and affordability values are similar to those of the first apple.&lt;/p&gt;

&lt;p&gt;In practice, embeddings are much more complex and have many more dimensions, often capturing semantic properties far beyond simple attributes like edibility and affordability. For instance, embeddings in models like Word2Vec, GloVe, BERT, or GPT-3 can have hundreds or thousands of dimensions. These embeddings are learned by neural networks and are used in numerous applications, such as search engines, recommendation systems, sentiment analysis, and machine translation.&lt;/p&gt;

&lt;p&gt;Moreover, modern LLMs use contextual embeddings, meaning the representation of a word depends on the context in which it appears. This allows the model to distinguish between different meanings of the same word based on its usage in a sentence.&lt;/p&gt;

&lt;p&gt;Note that the terms &lt;em&gt;embedding&lt;/em&gt; and &lt;em&gt;vector&lt;/em&gt; are often used interchangeably in the context of LLMs.&lt;/p&gt;
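&lt;p&gt;To make the toy example concrete, here is a minimal sketch of how similarity between such embeddings is typically measured, using cosine similarity on the two-dimensional vectors from the table (the values are the illustrative ones above, not real model outputs):&lt;/p&gt;

```python
import math

# Toy 2-D embeddings from the table: [edibility, affordability].
embeddings = {
    "apple (fruit)": [0.9, 0.8],
    "apple (tech company)": [0.0, 0.0],
    "banana": [0.8, 0.8],
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; 0.0 means they are unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Guard against the all-zero vector.
    return dot / (norm_a * norm_b)

# Banana lands much closer to the fruit apple than to the tech company.
print(cosine_similarity(embeddings["banana"], embeddings["apple (fruit)"]))
```

&lt;p&gt;Real embedding models produce vectors with hundreds or thousands of dimensions, but the comparison works exactly the same way.&lt;/p&gt;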
&lt;h2&gt;
  
  
  Indexing
&lt;/h2&gt;

&lt;p&gt;Indexing is the process of organizing and storing data to optimize search and retrieval efficiency. In the context of RAG and vector search, indexing organizes data based on their embeddings.&lt;/p&gt;

&lt;p&gt;Let's consider 4 data points below with their respective embeddings representing features: alive and edible.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ID&lt;/th&gt;
&lt;th&gt;Embedding&lt;/th&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;[0.0, 0.8]&lt;/td&gt;
&lt;td&gt;Apple&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;[0.0, 0.7]&lt;/td&gt;
&lt;td&gt;Banana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;[1.0, 0.4]&lt;/td&gt;
&lt;td&gt;Dog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;[0.0, 0.0]&lt;/td&gt;
&lt;td&gt;BMW&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To illustrate simple indexing, let's use a simplified version of the NSW (Navigable Small World) algorithm. This algorithm establishes links between data points based on the distances between their embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ID -&amp;gt; Closest IDs
&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  ANNS
&lt;/h3&gt;

&lt;p&gt;ANNS (Approximate Nearest Neighbor Search) is a technique for efficiently finding the nearest data points to a given query, albeit approximately. While it may not always return the exact nearest data points, ANNS provides results that are close enough. This probabilistic approach trades a little accuracy for a lot of efficiency.&lt;/p&gt;

&lt;p&gt;Imagine we have a query with specific constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find the closest data to [0.0, 0.9].&lt;/li&gt;
&lt;li&gt;Calculate a maximum of 2 distances using the Euclidean distance formula.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how we utilize the index created above to find the closest data point:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We start at a random data point, say 4, which is linked to 3 and 2.&lt;/li&gt;
&lt;li&gt;We calculate the distances and find that 2 is closer to [0.0, 0.9] than 3.&lt;/li&gt;
&lt;li&gt;We determine that the closest data to [0.0, 0.9] is Banana.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This method isn't perfect; in this case, the actual closest data point to [0.0, 0.9] is Apple. Under the same constraints, however, a linear search could only compare two randomly chosen points, so it would rely heavily on chance. Indexing mitigates this issue by efficiently narrowing down the search based on the data's embeddings.&lt;/p&gt;

&lt;p&gt;In real-world applications with millions of data points, linear search becomes impractical. Indexing, however, enables swift retrieval by structuring data intelligently according to their embeddings.&lt;/p&gt;

&lt;p&gt;Note that for managing billions of data points, sophisticated disk-based indexing algorithms may be necessary to ensure efficient data handling.&lt;/p&gt;
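&lt;p&gt;The walkthrough above can be sketched in a few lines of Python. This is a heavily simplified greedy search over the toy index, not a production NSW implementation:&lt;/p&gt;

```python
import math

# Toy data points and neighbor links from the tables above.
embeddings = {1: [0.0, 0.8], 2: [0.0, 0.7], 3: [1.0, 0.4], 4: [0.0, 0.0]}
data = {1: "Apple", 2: "Banana", 3: "Dog", 4: "BMW"}
links = {1: [2, 3], 2: [1, 3], 3: [2, 4], 4: [3, 2]}

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(query, start, max_distances=2):
    """Check only the start node's neighbors, computing at most max_distances distances."""
    candidates = links[start][:max_distances]
    best = min(candidates, key=lambda i: distance(query, embeddings[i]))
    return data[best]

# Starting from node 4, the search lands on Banana: an approximate, not exact, answer.
print(greedy_search([0.0, 0.9], start=4))
```

&lt;p&gt;A full NSW search would keep hopping to ever-closer neighbors instead of stopping after one step, but the budget of two distance computations forces it to settle for an approximate result here.&lt;/p&gt;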
&lt;h2&gt;
  
  
  RAG
&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) is a framework that combines information retrieval and large language models (LLMs) to generate high-quality, contextually relevant responses to user queries. This approach enhances the capabilities of LLMs by incorporating relevant information retrieved from external sources into the model's input.&lt;/p&gt;

&lt;p&gt;In practice, RAG works by retrieving relevant information from a vector database, which allows efficient searching for the most relevant data based on the user query. This retrieved information is then inserted into the input context of the language model, providing it with additional knowledge to generate more accurate and informative responses.&lt;/p&gt;

&lt;p&gt;Below is an example of a prompt with and without RAG in a simple Q&amp;amp;A scenario:&lt;/p&gt;
&lt;h3&gt;
  
  
  Without RAG
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is the name of my dog?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;LLM: I don't know.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  With RAG
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based on the context below:
I have a dog named Pluto.

Answer the following question: What is the name of my dog?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;LLM: The name of your dog is Pluto.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By integrating retrieval with generation, RAG significantly improves the performance of LLMs in tasks that require specific, up-to-date, or external information, making it a powerful tool for various applications such as customer support, knowledge management, and content generation.&lt;/p&gt;
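&lt;p&gt;The retrieve-then-generate flow above can be sketched end to end. Here, a naive keyword-overlap retriever (with a tiny hand-picked stopword list) stands in for a real vector database, and the output is the augmented prompt rather than an actual LLM call:&lt;/p&gt;

```python
# Stand-in knowledge base; a real system would store embeddings in a vector database.
documents = [
    "I have a dog named Pluto.",
    "My favorite color is blue.",
]

# Tiny hand-picked stopword list so only content words count toward the score.
STOPWORDS = {"a", "i", "is", "my", "of", "the", "what"}

def words(text):
    cleaned = text.lower().replace("?", "").replace(".", "")
    return set(cleaned.split()) - STOPWORDS

def retrieve(query, docs):
    """Naive retrieval: pick the document sharing the most content words with the query."""
    return max(docs, key=lambda d: len(words(query).intersection(words(d))))

def build_prompt(query, docs):
    context = retrieve(query, docs)
    return f"Based on the context below:\n{context}\n\nAnswer the following question: {query}"

print(build_prompt("What is the name of my dog?", documents))
```

&lt;p&gt;Swapping the keyword retriever for an embedding-based ANNS lookup gives you the usual RAG pipeline; the prompt-assembly step stays the same.&lt;/p&gt;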
&lt;h2&gt;
  
  
  Token
&lt;/h2&gt;

&lt;p&gt;A token is a unit of text that AI models use to process and understand natural language. Tokens can be words, subwords, or characters, depending on the model's architecture. Tokenization is a crucial preprocessing step in natural language processing (NLP) and is essential for breaking down text into manageable pieces that the model can process.&lt;/p&gt;

&lt;p&gt;In this example, we'll use &lt;code&gt;WordPunctTokenizer&lt;/code&gt; from the NLTK library to tokenize the sentence: "OasysDB is awesome."&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nltk.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WordPunctTokenizer&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WordPunctTokenizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OasysDB is awesome.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OasysDB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awesome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Tokenization plays a big role in LLMs and embedding models, and understanding it can help with everything from optimizing model performance to managing costs.&lt;/p&gt;

&lt;p&gt;Many AI service providers charge based on the number of tokens processed, so you'll often encounter this term when working with LLMs and embedding models, especially when estimating the cost of using a specific model.&lt;/p&gt;
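&lt;p&gt;As a rough illustration of token-based pricing, here is a back-of-the-envelope cost estimate. The rate used is hypothetical, not any provider's actual price:&lt;/p&gt;

```python
# Hypothetical rate for illustration only: 0.50 USD per million input tokens.
PRICE_PER_MILLION_TOKENS = 0.50

def estimate_cost(token_count, price_per_million=PRICE_PER_MILLION_TOKENS):
    """Estimate the cost in USD of processing token_count tokens."""
    return token_count / 1_000_000 * price_per_million

# A common rule of thumb: about 750 English words come out to roughly 1,000 tokens.
monthly_tokens = 30 * 200_000  # e.g. 200k tokens per day for a month
print(f"Estimated monthly cost: ${estimate_cost(monthly_tokens):.2f}")
```

&lt;p&gt;Since different tokenizers split the same text differently, always estimate with the tokenizer that matches the model you plan to use.&lt;/p&gt;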
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;These five concepts are crucial in understanding and implementing RAG effectively.&lt;/p&gt;

&lt;p&gt;Thank you for reading! If you have any questions or if there's anything I missed, please let me know in the comments section.&lt;/p&gt;

&lt;p&gt;If you found this article helpful, consider supporting OasysDB. We are developing a production-ready vector database that supports hybrid ANN searches from the ground up.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/oasysai" rel="noopener noreferrer"&gt;
        oasysai
      &lt;/a&gt; / &lt;a href="https://github.com/oasysai/oasysdb" rel="noopener noreferrer"&gt;
        oasysdb
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Hybrid vector database with flexible SQL storage engine &amp;amp; multi-index support.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/244d33704915132fb2af4091c16b4c66e922186595c8b302d25c9d40811cdee9/68747470733a2f2f6f64622d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e372e302e706e67"&gt;&lt;img src="https://camo.githubusercontent.com/244d33704915132fb2af4091c16b4c66e922186595c8b302d25c9d40811cdee9/68747470733a2f2f6f64622d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e372e302e706e67" alt="OasysDB Use Case"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/oasysai/oasysdb" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/c1831b9cfcde967df697fad09d5d42cd485cdced165515f441792bba6c243439/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f73746172732f6f6173797361692f6f6173797364623f7374796c653d666f722d7468652d6261646765266c6f676f3d676974687562266c6f676f436f6c6f723d253233303030303030266c6162656c436f6c6f723d25323366636433346426636f6c6f723d253233366237323830" alt="GitHub Stars"&gt;&lt;/a&gt;
&lt;a href="https://discord.gg/bDhQrkqNP4" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/517e4f7724b302e8689bda1cba64bfe0d29268167169911cb0a9a84055d8ac42/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f636861742d2532333662373238303f7374796c653d666f722d7468652d6261646765266c6f676f3d646973636f7264266c6f676f436f6c6f723d253233666666666666266c6162656c3d646973636f7264266c6162656c436f6c6f723d253233373238396461" alt="Discord"&gt;&lt;/a&gt;
&lt;a href="https://docs.oasysdb.com" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/43f96c133c36a237515bf25428a69cd70ef098294a06cbc926680852daf33556/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f726561642d3662373238303f7374796c653d666f722d7468652d6261646765266c6162656c3d6f617379736462253230646f6373266c6162656c436f6c6f723d313462386136" alt="Documentation"&gt;&lt;/a&gt;
&lt;a href="https://crates.io/crates/oasysdb" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/a4a32ec6090aa2c3f7578d25a090c39ce2a5f451ba3fc0526c7eae6aa7ff1999/68747470733a2f2f696d672e736869656c64732e696f2f6372617465732f642f6f6173797364623f7374796c653d666f722d7468652d6261646765266c6f676f3d72757374266c6f676f436f6c6f723d253233303030266c6162656c3d6372617465732e696f266c6162656c436f6c6f723d25323366646261373426636f6c6f723d253233366237323830" alt="Crates.io"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Introducing OasysDB 👋&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Quickstart 🚀&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Contributing 🤝&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;The easiest way to contribute to this project is to star this project and share
it with your friends. This will help us grow the community and make the project
more visible to others who might need it.&lt;/p&gt;
&lt;p&gt;If you want to go further and contribute your expertise, we will gladly welcome
your code contributions. For more information and guidance about this, please
see &lt;a href="https://github.com/oasysai/oasysdbdocs/contributing.md" rel="noopener noreferrer"&gt;Contributing to OasysDB&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you have a deep experience in the space but don't have the free time to
contribute codes, we also welcome advices, suggestions, or feature requests. We
are also looking for advisors to help guide the project direction and roadmap.&lt;/p&gt;
&lt;p&gt;If you are interested about the project in any way, please join us on &lt;a href="https://discord.gg/bDhQrkqNP4" rel="nofollow noopener noreferrer"&gt;Discord
Server&lt;/a&gt;. Help us grow the community and make OasysDB better 😁&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Disclaimer&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;This project is still in the early…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/oasysai/oasysdb" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>beginners</category>
      <category>ai</category>
      <category>learning</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How I Got My First Rust Job by Doing Open Source</title>
      <dc:creator>Edwin Kys</dc:creator>
      <pubDate>Tue, 30 Apr 2024 16:36:56 +0000</pubDate>
      <link>https://forem.com/edwinkys/how-i-got-my-first-rust-job-by-doing-open-source-117b</link>
      <guid>https://forem.com/edwinkys/how-i-got-my-first-rust-job-by-doing-open-source-117b</guid>
      <description>&lt;p&gt;Hi everyone 👋&lt;/p&gt;

&lt;p&gt;I just want to share my story and my most recent achievement: landing a Rust-oriented software engineering position at a startup by creating and maintaining an open-source project, &lt;a href="https://github.com/oasysai/oasysdb" rel="noopener noreferrer"&gt;OasysDB&lt;/a&gt;, an embedded vector database.&lt;/p&gt;

&lt;p&gt;I initially posted this on &lt;a href="https://www.reddit.com/r/rust/comments/1cdlaqs/i_finally_got_my_first_rust_job_doing_opensource/" rel="noopener noreferrer"&gt;Reddit&lt;/a&gt;, and it received more attention than I could have ever imagined. I also got a lot of questions from the community about starting an open-source project or getting hired in general.&lt;/p&gt;

&lt;p&gt;So, in addition to the content of my original Reddit post, I'll expand on it a bit to answer some of those questions.&lt;/p&gt;

&lt;p&gt;Around two weeks ago, someone opened an issue on OasysDB asking to integrate it into his platform, &lt;a href="https://github.com/tensorlakeai/indexify" rel="noopener noreferrer"&gt;Indexify&lt;/a&gt;, an open-source platform that extracts and processes unstructured data from different sources for generative AI apps in real time.&lt;/p&gt;

&lt;p&gt;He also connected with me via LinkedIn (my username is the same across all platforms 😂). He noticed my #OpenToWork badge and asked if I was looking for a job.&lt;/p&gt;

&lt;p&gt;I was like, Yes! I told him that if his company was hiring, I'd love to apply. It turned out he was. We then scheduled a call and chatted about Indexify and OasysDB: the motivation behind them, the goals, and some other stuff.&lt;/p&gt;

&lt;p&gt;We had another call the day after, an interview of sorts but way more casual. We discussed the team, the vision, and other things related to the role. He made the decision to bring me on so fast that it was kind of mind-blowing 🤯&lt;/p&gt;

&lt;p&gt;We discussed compensation over the weekend, and after signing some paperwork, I got my first Rust-oriented job! I started working last Monday and so far, I'm loving it. The team is also very helpful and friendly, which makes the orientation period much more enjoyable.&lt;/p&gt;

&lt;p&gt;I just want to say: if you're currently struggling to land a software engineering position, it might be worth expanding your network by trying different things. Contributing to or creating an open-source project is one of them 😁&lt;/p&gt;

&lt;p&gt;Anyway, if you have any questions, feel free to ask me in the comments!&lt;/p&gt;

&lt;p&gt;I've added some extra content below. Don't forget to check it out :)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why I created OasysDB&lt;/li&gt;
&lt;li&gt;Improving your chances of getting hired (community)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why I created OasysDB&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I initially created OasysDB to learn more about Rust and vector indexing. I had no prior experience with Rust and the only experience I had with vector databases was using them at the previous startup I worked at to build a custom RAG pipeline.&lt;/p&gt;

&lt;p&gt;So, yeah. I don't even know what got into me 😂&lt;/p&gt;

&lt;p&gt;I came from Python and TypeScript, and my whole professional experience as a software engineer revolves around startups: one that I started myself and another where I was a co-founder/founding engineer.&lt;/p&gt;

&lt;p&gt;So I have a pretty diverse skill set, ranging from frontend and backend web development, DevOps, and other engineering skills to UI/UX design and even administrative work like incorporation.&lt;/p&gt;

&lt;p&gt;Anyway, my point is that I'm adaptable and willing to learn.&lt;/p&gt;

&lt;p&gt;After I got laid off from my previous job (the startup didn't take off), I decided that I wanted to add a new programming language to my arsenal.&lt;/p&gt;

&lt;p&gt;I watched a couple of YouTube videos and read a couple of blog posts and decided to give Rust a try.&lt;/p&gt;

&lt;p&gt;My favorite way to learn something new is to build a project with it. Since I had read that Rust is a good language for building a database and I had some experience with vector databases, I decided to do just that.&lt;/p&gt;

&lt;p&gt;Oh what a rough couple of weeks following that simple-minded decision 😅&lt;/p&gt;

&lt;p&gt;I ended up learning more than just Rust and vectors. I learned a lot about creating and growing an open-source community, supporting the early users of OasysDB, and many other things, both engineering and non-engineering related.&lt;/p&gt;

&lt;p&gt;Overall, it was a wholesome experience that I'd recommend anyone try.&lt;/p&gt;

&lt;p&gt;Self-promotion really quick 🤣, this is OasysDB now:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/oasysai" rel="noopener noreferrer"&gt;
        oasysai
      &lt;/a&gt; / &lt;a href="https://github.com/oasysai/oasysdb" rel="noopener noreferrer"&gt;
        oasysdb
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Hybrid vector database with flexible SQL storage engine &amp;amp; multi-index support.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/244d33704915132fb2af4091c16b4c66e922186595c8b302d25c9d40811cdee9/68747470733a2f2f6f64622d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e372e302e706e67"&gt;&lt;img src="https://camo.githubusercontent.com/244d33704915132fb2af4091c16b4c66e922186595c8b302d25c9d40811cdee9/68747470733a2f2f6f64622d6173736574732e73332e616d617a6f6e6177732e636f6d2f62616e6e6572732f302e372e302e706e67" alt="OasysDB Use Case"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/oasysai/oasysdb" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/c1831b9cfcde967df697fad09d5d42cd485cdced165515f441792bba6c243439/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f73746172732f6f6173797361692f6f6173797364623f7374796c653d666f722d7468652d6261646765266c6f676f3d676974687562266c6f676f436f6c6f723d253233303030303030266c6162656c436f6c6f723d25323366636433346426636f6c6f723d253233366237323830" alt="GitHub Stars"&gt;&lt;/a&gt;
&lt;a href="https://discord.gg/bDhQrkqNP4" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/517e4f7724b302e8689bda1cba64bfe0d29268167169911cb0a9a84055d8ac42/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f636861742d2532333662373238303f7374796c653d666f722d7468652d6261646765266c6f676f3d646973636f7264266c6f676f436f6c6f723d253233666666666666266c6162656c3d646973636f7264266c6162656c436f6c6f723d253233373238396461" alt="Discord"&gt;&lt;/a&gt;
&lt;a href="https://docs.oasysdb.com" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/43f96c133c36a237515bf25428a69cd70ef098294a06cbc926680852daf33556/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f726561642d3662373238303f7374796c653d666f722d7468652d6261646765266c6162656c3d6f617379736462253230646f6373266c6162656c436f6c6f723d313462386136" alt="Documentation"&gt;&lt;/a&gt;
&lt;a href="https://crates.io/crates/oasysdb" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/a4a32ec6090aa2c3f7578d25a090c39ce2a5f451ba3fc0526c7eae6aa7ff1999/68747470733a2f2f696d672e736869656c64732e696f2f6372617465732f642f6f6173797364623f7374796c653d666f722d7468652d6261646765266c6f676f3d72757374266c6f676f436f6c6f723d253233303030266c6162656c3d6372617465732e696f266c6162656c436f6c6f723d25323366646261373426636f6c6f723d253233366237323830" alt="Crates.io"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Introducing OasysDB 👋&lt;/h1&gt;
&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Quickstart 🚀&lt;/h1&gt;
&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Contributing 🤝&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;The easiest way to contribute to this project is to star this project and share
it with your friends. This will help us grow the community and make the project
more visible to others who might need it.&lt;/p&gt;
&lt;p&gt;If you want to go further and contribute your expertise, we will gladly welcome
your code contributions. For more information and guidance about this, please
see &lt;a href="https://github.com/oasysai/oasysdbdocs/contributing.md" rel="noopener noreferrer"&gt;Contributing to OasysDB&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you have a deep experience in the space but don't have the free time to
contribute codes, we also welcome advices, suggestions, or feature requests. We
are also looking for advisors to help guide the project direction and roadmap.&lt;/p&gt;
&lt;p&gt;If you are interested about the project in any way, please join us on &lt;a href="https://discord.gg/bDhQrkqNP4" rel="nofollow noopener noreferrer"&gt;Discord
Server&lt;/a&gt;. Help us grow the community and make OasysDB better 😁&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Disclaimer&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;This project is still in the early…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/oasysai/oasysdb" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Also, this is Indexify, the open-source platform I'm currently working on:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tensorlakeai" rel="noopener noreferrer"&gt;
        tensorlakeai
      &lt;/a&gt; / &lt;a href="https://github.com/tensorlakeai/indexify" rel="noopener noreferrer"&gt;
        indexify
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A realtime serving engine for Data-Intensive Generative AI Applications
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Indexify&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/tensorlakeai/indexify/actions/workflows/test.yaml/badge.svg?branch=main"&gt;&lt;img src="https://github.com/tensorlakeai/indexify/actions/workflows/test.yaml/badge.svg?branch=main" alt="Tests"&gt;&lt;/a&gt;
&lt;a href="https://discord.gg/VXkY7zVmTD" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/e2e5fc3dc6c944e772a93bdd97052a0fb2a1bc65149f2acd476b8cfa76a384dc/68747470733a2f2f646362616467652e76657263656c2e6170702f6170692f7365727665722f56586b59377a566d54443f7374796c653d666c617426636f6d706163743d74727565" alt="Discord"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Create and Deploy Durable, Data-Intensive Agentic Workflows&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Indexify simplifies building and serving durable, multi-stage workflows as inter-connected Python functions and automagically deploys them as APIs.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;workflow&lt;/strong&gt; encodes data ingestion and transformation stages that can be implemented using Python functions. Each of these functions is a logical compute unit that can be retried upon failure or assigned to specific hardware.&lt;/p&gt;
&lt;br&gt;
&lt;div&gt;
    &lt;a rel="noopener noreferrer nofollow" href="https://raw.githubusercontent.com/tensorlakeai/indexify/main/docs/images/PDF_Extraction_Demo-VEED.gif"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Ftensorlakeai%2Findexify%2Fmain%2Fdocs%2Fimages%2FPDF_Extraction_Demo-VEED.gif" alt="PDF Extraction Demo"&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;p&gt;&lt;em&gt;To give you a taste of the project, in the above video - Indexify running PDF Extraction on a cluster of 3 machines. &lt;br&gt;
top left -    A GPU accelerated machine running document layout and OCR model on a PDF,&lt;br&gt;
bottom left - chunking texts, embedding image and text using CLIP and a text embedding model. &lt;br&gt;
top right - A function writing image and text embeddings to ChromaDB. &lt;br&gt;
All three functions of the workflow are running in parallel and coordinated by the Indexify server.&lt;/em&gt;&lt;/p&gt;
&lt;div class="markdown-alert markdown-alert-note"&gt;
&lt;p class="markdown-alert-title"&gt;Note&lt;/p&gt;
&lt;p&gt;Indexify is the Open-Source core…&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tensorlakeai/indexify" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
&lt;strong&gt;Improving your chances of getting hired (community)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In addition to, of course, applying for jobs via some job boards like Indeed or LinkedIn, these are some other things you can try to get hired.&lt;/p&gt;

&lt;p&gt;By the way, if something else has worked for you that isn't on this list, please share it and I'll add it, so the list can help more people 😁&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Creating an open-source project&lt;/strong&gt;: The project also functions as a portfolio of sorts. When people use your project, connect with them and help them succeed with it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local professional networking&lt;/strong&gt;: Meeting people in person builds a deeper connection faster. Exchange contacts and don't forget to keep in touch with them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Volunteer teaching how to code&lt;/strong&gt;: There are programs like &lt;a href="https://codeinplace.stanford.edu/" rel="noopener noreferrer"&gt;Code in Place&lt;/a&gt; teaching people how to code where you can volunteer as a teacher and connect with other teachers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sharing your own experience in the comments&lt;/strong&gt; 😁&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rust</category>
      <category>opensource</category>
      <category>career</category>
    </item>
  </channel>
</rss>
