Forem: Soham Ganatra

How to Build an AI Investment Analyst Agent?

Soham Ganatra — Mon, 10 Jun 2024 04:18:41 +0000

Introduction

Investing in stocks and other assets is interesting, but it can be challenging and hectic even for the best of us. Now, imagine having a personal analyst who follows news and trends and advises on financial strategy based on those observations. Sounds great, right? But let’s be honest: most of us cannot afford to hire a personal financial analyst. What if you could have an intelligent financial analyst who works around the clock and keeps you updated on trends? Thanks to recent advances in AI, you can create a personal financial analyst within a few minutes.

This article demonstrates how to build an AI investment analyst using CrewAI, Gemini models, and Composio.

Learning Objectives

Learn about the basics of CrewAI and Composio.
Understand the workflow of the AI investment analyst.
Build an AI investment analyst agent with CrewAI and Composio.

What is CrewAI?

CrewAI is an open-source framework for building collaborative multi-agent systems. It allows developers to build complex agentic automation workflows where interaction among multiple agents is paramount. Individual AI agents can assume roles, delegate tasks, and share goals, akin to a real-world crew. CrewAI consists of five core concepts: Agents, Tasks, Tools, Processes, and Crews.

Agents: Agents operate as autonomous entities tasked with reasoning, delegating tasks, and communicating with fellow agents, much like a team in the real world.
Tasks: Tasks are precise assignments allocated to agents. They outline the steps and actions required for an agent to achieve a specific goal.
Tools: Tools equip agents to carry out tasks that exceed the capabilities of LLMs, such as web scraping, email responses, and task scheduling.
Process: In CrewAI, processes manage the execution of tasks by agents, ensuring that tasks are allocated and performed effectively and systematically. These processes can be sequential, where tasks are completed one after another, or hierarchical, where tasks are carried out based on a tiered authority structure.
Crews: Crews within CrewAI consist of collaborative agents equipped with tasks and tools, all working together to tackle complex tasks.

Here is a mind map for CrewAI.

Agent Workflow

Now, let’s explore the workflow of our AI investment analyst. We will use CrewAI to build a collaborative crew of agents. The crew will have a researcher, an analyst, and a recommender agent. Individual agents will have goals and backstories to give more context to the LLM about the agent before doing the task. The agents will have access to the necessary tools. We will equip the agents with a web search tool in this case. We will use SerpApi, so grab an API key.

For the LLM, we will use Google Gemini Pro, so get your API key from Google AI Studio. You can use any other LLM as well.

The workflow starts with the user sending the query to the crew. The researcher agent picks up the query and searches the web to gather resources regarding the query. The search results are passed to the analyst agent to analyze the information and prepare a report. Finally, the report is sent to the recommender agent to give well-rounded advice on whether to invest or not.
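
Before bringing in CrewAI, the hand-off described above can be sketched in plain Python. This is only an illustration: the three functions are hypothetical stand-ins for the agents, and they return canned strings instead of searching the web or calling an LLM.

```python
from typing import Callable, List

# Hypothetical stand-ins for the three agents; none of this is CrewAI code.
def researcher(query: str) -> str:
    # A real researcher agent would call a web search tool here.
    return f"search results for '{query}'"

def analyst(findings: str) -> str:
    # A real analyst agent would ask the LLM to analyze the findings.
    return f"report based on {findings}"

def recommender(report: str) -> str:
    # A real recommender agent would weigh pros and cons with the LLM.
    return f"recommendation derived from {report}"

def run_pipeline(query: str, stages: List[Callable[[str], str]]) -> str:
    """Pass the query through each stage, feeding every output to the next stage."""
    result = query
    for stage in stages:
        result = stage(result)
    return result

advice = run_pipeline("ACME Corp stock", [researcher, analyst, recommender])
print(advice)
```

Each stage only sees the output of the previous one, which is the same shape as the researcher, analyst, and recommender chain in the crew.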

Building the Agent

Now that you know the workflow, the next step is to code the agent. First, as with any Python project, create a virtual environment and install the necessary dependencies. We will need CrewAI, LangChain, Composio, and SerpApi.

pip install crewai
pip install composio-langchain
pip install composio-core
pip install langchain-community
pip install langchain-google-genai
pip install google-search-results
pip install python-dotenv

Add Gemini API key and SerpApi key to a .env file.

SERPAPI_API_KEY = "Your Key"
GOOGLE_API_KEY = "Your Key"
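
A missing key usually surfaces later as an opaque authentication error, so it can help to check the environment up front. The helper below is a hypothetical addition, not part of the original script. It uses only the standard library and is demonstrated against a fake environment.

```python
import os
from typing import Iterable, List, Mapping

def missing_keys(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or blank in the given environment mapping."""
    return [name for name in names if not env.get(name, "").strip()]

# Demonstration with a fake environment. In the real script, call
# missing_keys([...]) after load_dotenv() with the key names from your .env file.
fake_env = {"GOOGLE_API_KEY": "abc123", "OTHER_KEY": "  "}
print(missing_keys(["GOOGLE_API_KEY", "OTHER_KEY", "ABSENT_KEY"], fake_env))
```

If the returned list is not empty, you can stop early with a clear message instead of letting the agent run fail midway.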

Add the SerpApi to your Composio account.

# Connect your serpapi so agents can use it.

composio add serpapi

Import the necessary modules.

from crewai import Agent, Task, Crew, Process
from composio_langchain import ComposioToolSet, Action, App
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
import os
load_dotenv()

Now initialize the language model.

llm = ChatGoogleGenerativeAI(
    model="gemini-pro", verbose=True, temperature=0.9, google_api_key=os.getenv("GOOGLE_API_KEY")
)

Define tools for the agents.

composio_toolset = ComposioToolSet()
tools = composio_toolset.get_actions(actions=[Action.SERPAPI_SEARCH])

Defining the Agent

The next step is to define the agents with their goals and backstories. As mentioned earlier, there are three agents: a researcher, an analyst, and a recommender. We will define the agents using CrewAI.

# Define the Investment Researcher agent
researcher = Agent(
    role='Investment Researcher',
    goal='Use SERP to research the top 2 results based on the input given to you and provide a report',
    backstory="""
    You are an expert Investment researcher. Using the information given to you, conduct comprehensive research using
    various sources and provide a detailed report. Don't pass in location as an argument to the tool
    """,
    verbose=True,
    allow_delegation=True,
    tools=tools,
    llm=llm
)

# Define the Investment Analyst agent
analyser = Agent(
    role='Investment Analyst',
    goal='Analyse the stock based on information available to it, use all the tools',
    backstory="""
    You are an expert Investment Analyst. You research the given topic and analyze your research for insights.
    Note: Do not use SERP when you're writing the report
    """,
    verbose=True,
    tools=tools,
    llm=llm
)

# Define the Investment Recommender agent
recommend = Agent(
    role='Investment Recommendation',
    goal='Based on the analyst insights, you offer recommendations',
    backstory="""
    You are an expert Investment Recommender. You understand the analyst insights and with your expertise suggest and offer
    advice on whether to invest or not. List the Pros and Cons as bullet points
    """,
    verbose=True,
    tools=tools,
    llm=llm
)

Each agent has a defined role, goal, tools, and a backstory. This provides LLMs with extra information about the agent, which aids in grounding the responses of the LLM.

Defining the Task and Kicking Off the Process

Now, define the task for the analyst agent.

# Get user input for the research topic
user_input = input("Please provide a topic: ")

# Define the task for the analyst agent
analyst_task = Task(
    description=f'Research on {user_input}',
    agent=analyser,
    expected_output="When the input is well researched, thoroughly analyzed and recommendation is offered"
)

# Create the crew with the defined agents and task
investment_crew = Crew(
    agents=[researcher, analyser, recommend],
    tasks=[analyst_task],
    verbose=1,
    full_output=True,
)

# Execute the process

res = investment_crew.kickoff()

Putting it all together.

from crewai import Agent, Task, Crew, Process
from composio_langchain import ComposioToolSet, Action, App
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
import os

# Environment setup: load the API keys from the .env file
load_dotenv()

# Initialize the language model
llm = ChatGoogleGenerativeAI(
    model="gemini-pro", verbose=True, temperature=0.9, google_api_key=os.getenv("GOOGLE_API_KEY")
)

# Define tools for the agents
composio_toolset = ComposioToolSet()
tools = composio_toolset.get_actions(actions=[Action.SERPAPI_SEARCH])

# Define the Investment Researcher agent
researcher = Agent(
    role='Investment Researcher',
    goal='Use SERP to research the top 2 results based on the input given to you and provide a report',
    backstory="""
    You are an expert Investment researcher. Using the information given to you, conduct comprehensive research using
    various sources and provide a detailed report. Don't pass in location as an argument to the tool
    """,
    verbose=True,
    allow_delegation=True,
    tools=tools,
    llm=llm
)

# Define the Investment Analyst agent
analyser = Agent(
    role='Investment Analyst',
    goal='Analyse the stock based on information available to it, use all the tools',
    backstory="""
    You are an expert Investment Analyst. You research the given topic and analyze your research for insights.
    Note: Do not use SERP when you're writing the report
    """,
    verbose=True,
    tools=tools,
    llm=llm
)

# Define the Investment Recommender agent
recommend = Agent(
    role='Investment Recommendation',
    goal='Based on the analyst insights, you offer recommendations',
    backstory="""
    You are an expert Investment Recommender. You understand the analyst insights and with your expertise suggest and offer
    advice on whether to invest or not. List the Pros and Cons as bullet points
    """,
    verbose=True,
    tools=tools,
    llm=llm
)

# Get user input for the research topic
user_input = input("Please provide a topic: ")

# Define the task for the analyst agent
analyst_task = Task(
    description=f'Research on {user_input}',
    agent=analyser,
    expected_output="When the input is well researched, thoroughly analyzed and recommendation is offered"
)

# Create the crew with the defined agents and task
investment_crew = Crew(
    agents=[researcher, analyser, recommend],
    tasks=[analyst_task],
    verbose=1,
    full_output=True,
)

# Execute the process
res = investment_crew.kickoff()

Once you execute the script, the agent workflow will kick off, and you can follow the logs in your terminal.

Conclusion

In this tutorial, you developed an AI investment analyst utilizing CrewAI, Gemini, and Composio. We initially implemented a basic web search tool. To enhance the agent's capabilities, consider integrating a tool like Yahoo Finance, which provides detailed financial data. Additionally, incorporating a code interpreter with the Yahoo Finance tool will enable the agent to conduct sophisticated data analysis and create visual representations. This expansion allows for a more diverse and robust analysis capability, adapting to various financial scenarios and data requirements.

For additional tutorials, explore Composio’s collection of example use cases.

Custom AI Agent: how to build an AI Agent

Soham Ganatra — Thu, 06 Jun 2024 05:20:58 +0000

The advent of Large Language Models (LLMs) has revolutionized the field of artificial intelligence, introducing new ways to interact with software. These models excel in reasoning, mathematics, programming, summarizing, and more. They can comprehend complex problems, decompose them into simpler sub-problems, and provide solutions. This makes them ideal for automating tasks that require logical and situational reasoning and decision-making capabilities. The systems that enable LLMs to understand and solve problems using tools are known as custom AI agents.

This article will explore the concept of custom AI agents, their applications, and the steps involved in creating them.

Learning Objectives while Building Custom AI Agents

Understand what custom AI agents are.
Learn when to use custom AI agents.
Learn how to build custom AI agents.
Discover the benefits and drawbacks of using custom AI agents.
Explore how Composio can help you build custom AI agents.
Build a to-do list to Google Calendar AI agent using CrewAI and Composio.

What are Custom AI Agents?

AI agents are systems powered by AI models that autonomously perform tasks, interact with their environment, and make decisions based on their programming and the data they process. These custom agents can handle tasks requiring reasoning and decision-making abilities, such as scheduling meetings, managing emails, reading from files, and determining subsequent actions.

For instance, you can use an automation tool to sync your project tasks between GitHub and a project management platform like Trello or Asana. Alternatively, you can develop a custom AI agent to send personalized sales emails crafted specifically for your customers. These are just a few examples. You can automate more complicated tasks with the right AI agent tool integrations and LLMs like GPT-4.

When to Create Custom AI Agents?

Creating custom AI agents becomes particularly valuable when you need to automate complex workflows that involve multiple decision points and require high adaptability. Here are some scenarios where developing custom AI agents can be beneficial:

1. Custom AI Agent for Personalized Customer Interaction

Building custom AI agents can be highly beneficial for businesses that need to interact with customers in a personalized and efficient manner. They can handle a variety of tasks, including:

Customer Support : AI agents with access to user data can provide automated user-specific tailored assistance.
Personalized Recommendation : By analyzing customer data, AI agents can offer product or service recommendations that cater to individual preferences and needs.

2. Custom Sales and Marketing Agent

These custom AI agents can automate many routine tasks so the sales and marketing team can focus on what is important. They can automate tasks like:

Lead Scoring : The agent can score sales leads based on custom criteria like company size, industry, likelihood of conversion, etc. The agent can ingest lead data from CSVs and score them.
Sales Forecasting : The agents can process a dataset of historical data, analyze the data, and provide future sales insights with necessary plots, graphs, and textual summaries.

3. Make your own Media and News Agent

Custom AI agents can streamline the creation and distribution of content across websites. Here are a few examples of how agents can automate your social media strategy.

Podcast-Tweet Writer : The agent takes in a URL to a podcast and a topic. The agent will find the interesting part of the podcast related to the topic and post a Twitter thread on behalf of the user. This can be extended to writing an article or Instagram post.
Subreddit Analysis : The AI agent can take a subreddit (r/shopify) and analyze user posts to create a report on trends, general sentiment, etc.

4. Building HR/Hiring Agent

AI agents can efficiently handle many routine tasks from shortlisting candidates to conducting employee surveys. Here are some use cases where agents can aid HR processes:

LinkedIn Profile Processing : The agent can process potential LinkedIn profiles, score candidates, summarize their work experiences, and put them in the organization’s database for manual evaluation.
Employee Survey : HR professionals can streamline the survey process to gauge employee sentiment and gather relevant metrics, ultimately improving workplace productivity.

5. Administration Custom Agent

An intelligent agent can automate many tedious workflows in administrative processes. This subsequently frees personnel to devote more time to improving the quality of services.

Event Creator : An intelligent agent can read emails and extract relevant information like event name, date, time, and participants to create a calendar event.
Response Scorer : An AI agent with Typeform integration can read user surveys to evaluate user preferences, monitor customer satisfaction, and prioritize product features based on user responses.

How to Build your Own AI Agents?

Building custom AI agents to solve unique problems can be both interesting and challenging. The process involves several steps, each critical to the agent's functionality and effectiveness. Here's a high-level overview:

Goal Initialization : The AI models used in most software-based agents are LLMs (Large Language Models) or LMMs (Large Multi-modal Models). To perform tasks, you need to give agents an objective. The model interprets the objective and then works toward solving it.
Choosing Models : Choosing models is an important step. While GPT-4 and Claude Opus are excellent at solving problems, they can be expensive. For less complicated tasks, models like GPT-4o, Llama 3, and Claude Sonnet are better suited. Factors such as cost, inference speed, model capability, and the nature of the model (whether open-source or proprietary) need to be considered.
Tool Integration : Custom AI agents require access to the appropriate tools to perform meaningful tasks. For instance, an agent needs a web search tool to browse the web and a code interpreter to execute code. These tools are software components that encapsulate the functionalities of external applications.
Develop the Logic and Workflow: Design the logic that governs how the AI agent interacts with its environment and makes decisions. This involves creating algorithms, setting up rules, and defining workflows.
Test and Refine: Thoroughly test the AI agent to ensure it performs as expected. Collect feedback, identify issues, and refine the agent’s algorithms and workflows to improve accuracy and efficiency.
Deploy and Monitor: Once the AI agent is ready, deploy it in your desired environment. Continuously monitor its performance and adjust as needed to ensure it remains effective and aligned with your objectives.
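
The goal, tool, and logic steps above can be illustrated with a toy agent loop. This is a minimal sketch under strong assumptions: `fake_model` is a hypothetical rule-based stand-in for an LLM, and the tool registry contains a single made-up tool.

```python
from typing import Callable, Dict, List, Tuple

# Tool integration: a registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}

def fake_model(goal: str, history: List[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for an LLM: returns (action, argument)."""
    if not history:
        # First step: decide to call a tool.
        return "word_count", goal
    # A tool result is available, so finish with an answer.
    return "finish", f"The goal contains {history[-1]} words."

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Logic and workflow: loop until the model finishes or the step budget runs out."""
    history: List[str] = []
    for _ in range(max_steps):
        action, argument = fake_model(goal, history)
        if action == "finish":
            return argument
        if action not in TOOLS:
            # Record the problem so the model can react on the next step.
            history.append(f"unknown tool: {action}")
            continue
        history.append(TOOLS[action](argument))
    return "Stopped: step budget exhausted."

print(run_agent("summarize the quarterly sales report"))
```

The step budget is a simple safeguard: a real model can loop indefinitely, so capping the number of iterations keeps a run from consuming tokens without end.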

Benefits of Using Custom AI Agents

There are many advantages to using custom AI agents in your workflows that can significantly impact business operations.

Improved Efficiency: Custom AI agents can manage tedious and repetitive tasks such as data entry, scheduling, and basic analysis. This allows companies to free up time and resources for more demanding and creative projects. Businesses can allocate their resources more effectively by delegating these routine tasks to AI agents.
Enhanced Personalization: Custom AI agents excel at delivering personalized experiences by analyzing customer data. By integrating AI agents into their products, companies can provide tailored interactions based on customer data and browsing history. This enables AI agents to offer customized solutions to customer queries, enhancing overall satisfaction.
Higher Availability: In scenarios requiring 24/7 availability, custom AI agents can complement human staff to improve the overall customer experience. They can handle simpler tasks and queries, allowing human staff to focus on more complex issues that require a human touch. This ensures continuous service and support.
Scalability: Custom AI agents are highly scalable. They can be scaled to meet surging demand without requiring additional human resources, so businesses can continue to deliver quality services even during peak times.

Drawbacks of Building Custom AI Agents

While custom AI agents offer numerous benefits, there are also several drawbacks to consider:

Reliability : One of the biggest issues with current AI agents is reliability. The AI models that power these agents are stochastic, so it is hard to get consistent results across agent runs. State-of-the-art models and extensive prompt engineering are often needed to make agents reliable and useful.
Integration Challenges: Integrating custom AI agents with existing systems and workflows can be complex and time-consuming. Compatibility issues may arise, requiring additional customization and development efforts to ensure seamless integration.
Complexity and maintenance : Custom AI agents for automating complex tasks can be difficult to build, deploy, and maintain. They require ongoing updates and maintenance to ensure they function correctly and efficiently. This can demand significant technical expertise and resources.
Cost : While custom AI agents can automate numerous tasks, it is crucial to consider the cost-to-efficiency ratio. A complex multi-agent setup requires constant back-and-forth communication between different agents, which will use a lot of tokens. Powering these agents with frontier AI models can rack up bills easily.
Security : When custom AI agents require access to external tools and APIs, managing user authorization and authentication is crucial. Ensuring secure access involves implementing robust authentication mechanisms and safeguarding sensitive credentials.
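
The cost point can be made concrete with back-of-the-envelope arithmetic. The prices and token counts below are made-up placeholders, not real pricing for any model, so substitute your provider's current rates.

```python
def estimate_run_cost(
    turns: int,
    input_tokens_per_turn: int,
    output_tokens_per_turn: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost in dollars of one multi-agent run."""
    input_cost = turns * input_tokens_per_turn * input_price_per_million / 1_000_000
    output_cost = turns * output_tokens_per_turn * output_price_per_million / 1_000_000
    return input_cost + output_cost

# Hypothetical numbers: 20 agent-to-agent turns, 3,000 input and 500 output
# tokens per turn, priced at $10 and $30 per million tokens.
cost = estimate_run_cost(20, 3_000, 500, 10.0, 30.0)
print(f"${cost:.2f} per run")
```

Because the cost grows linearly with the number of turns, trimming unnecessary back-and-forth between agents is often the cheapest optimization.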

How Can Composio Assist with Your Custom AI Agent Needs?

Composio offers a comprehensive tooling solution for custom AI agents, empowering developers to create custom AI agents for production use cases. These tools allow custom AI agents to interact with external systems dynamically. For example, to use Slack or Discord functionality in an agentic workflow, you would need integration modules that allow the AI agents to send messages, manage channels, or perform administrative tasks within these platforms.

These integrations are essential for custom AI agents to perform meaningful actions based on real-time data and interactions. Composio offers a range of pre-built integrations that facilitate seamless connectivity between AI agents and various external applications. This allows developers to focus on building intelligent workflows without worrying about the complexities of interfacing with third-party services.

Furthermore, Composio implements robust security measures. The developers can manage user authentication and authorization efficiently, protecting sensitive information and maintaining compliance with industry standards.

Custom AI Agent Tutorials - Composio

Composio has native support for popular AI agent-building frameworks like LangChain, AutoGen, CrewAI, and more. You can add Composio tool sets to a new or existing AI agent with a few lines of code. This seamless integration ensures secure access to tools through robust authorization and authentication mechanisms.

So, let’s build a custom AI agent that converts to-do lists to Google Calendar events.

To-do to Google Calendar Custom Agent

We need CrewAI, a framework for building collaborative multi-agent systems, access to an LLM API, and the Composio SDK to create the AI agent. For this project, we will use Gemini Flash, so get the API key for Gemini from Google AI Studio and save it in a .env file as GEMINI_API_KEY.

As with any Python project, create a virtual environment and install the dependencies below.

composio_core
composio-crewai
crewai
python-dotenv
langchain-google-genai

Now log in to your Composio account and add the Google Calendar integration by running the below commands.

composio update
composio login
composio add googlecalendar

This will prompt you to grant access to the integration. Once approved, you can use it in agentic workflows.

Create a Python file and add these import statements.

# Import base packages
import os
import dotenv
from datetime import datetime
from crewai import Agent, Task
from composio_crewai import ComposioToolSet, App
from langchain_google_genai import ChatGoogleGenerativeAI

Now, configure the LLM and Composio Google calendar tool.

dotenv.load_dotenv()
llm = ChatGoogleGenerativeAI(google_api_key=os.environ["GEMINI_API_KEY"], model="gemini-1.5-flash")

composiotoolset = ComposioToolSet()
tools = composiotoolset.get_tools(apps=[App.GOOGLECALENDAR])

date = datetime.today().strftime('%Y-%m-%d')
timezone = datetime.now().astimezone().tzinfo

Define a sample to-do list that you want to add to your calendar.

todo = '''
    1PM - 3PM -> Code,
    5PM - 7PM -> Meeting,
    9AM - 12PM -> Learn something,
    8PM - 10PM -> Game
'''
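
The agent has to turn each free-form line into a start time, an end time, and a label. To make that concrete, here is a hand-written parser for the same line format. It only illustrates the structure the LLM is expected to infer and is not used by the agent in this tutorial. The names `LINE_PATTERN` and `parse_todo` are hypothetical.

```python
import re
from datetime import datetime, time
from typing import List, Tuple

# Matches lines such as "1PM - 3PM -> Code," and captures start, end, and label.
LINE_PATTERN = re.compile(r"(\d{1,2}(?:AM|PM))\s*-\s*(\d{1,2}(?:AM|PM))\s*->\s*([^,\n]+)")

def parse_todo(text: str) -> List[Tuple[time, time, str]]:
    """Extract (start, end, label) triples from lines like '1PM - 3PM -> Code'."""
    slots = []
    for start, end, label in LINE_PATTERN.findall(text):
        start_time = datetime.strptime(start, "%I%p").time()
        end_time = datetime.strptime(end, "%I%p").time()
        slots.append((start_time, end_time, label.strip()))
    return slots

sample = """
    1PM - 3PM -> Code,
    5PM - 7PM -> Meeting,
"""
for start_time, end_time, label in parse_todo(sample):
    print(start_time, end_time, label)
```

A fixed pattern like this breaks as soon as the wording changes, which is the kind of variation the LLM-based agent is meant to absorb.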

Now, define the CrewAI agent with the LLM, a task, a goal, and a backstory.

def run_crew():
    # Agent that acts on Google Calendar through the Composio tools
    gcal_agent = Agent(
        role='Google Calendar Agent',
        goal="""You take action on Google Calendar using Google Calendar APIs""",
        backstory="""You are an AI agent that is responsible for taking actions on
        Google Calendar on users' behalf. You need to take action on
        Calendar using Google Calendar APIs. Use the Correct tools to run
        APIs from the given tool-set""",
        verbose=True,
        tools=tools,
        llm=llm,
    )
    # Task that asks the agent to book one calendar slot per to-do entry
    task = Task(
        description=f"book slots according to {todo}. Label them with the work provided to be done in that time period. Schedule it for today. Today's date is {date} (it's in YYYY-MM-DD format) and make the timezone be {timezone}.",
        agent=gcal_agent,
        expected_output="if a free slot is found",
    )
    task.execute()
    return "Crew run initiated", 200

run_crew()

Run the code.

python todo.py

This will trigger the agent and you can observe the logs on your terminal.

Once the agent run is finished, you can visit your Google calendar and see the to-dos.

You can also monitor the live tools on Composio’s dedicated dashboard. You can explore the available actions, triggers, and logs of past runs on the dashboard.

GitHub link for the code: https://github.com/anonthedev/composio-todo-to-calendar

Conclusion

Custom AI agents are here to stay. With the improvement in LLMs, especially in tool-calling capabilities, the potential for automating complex workflows and enhancing decision-making will only continue to grow. As these models become more sophisticated and capable, the range of tasks AI agents can handle will expand.

Custom AI agents, powered by these LLMs and tool integrations, can handle complex workflows and make intelligent decisions based on real-time data. Composio stands out as a comprehensive solution for developing these agents, offering seamless integration with popular frameworks and robust support for various tools and APIs. Additionally, Composio provides efficient user authorization and authentication management, ensuring secure access to integrated tools and data. This allows developers to build efficient, reliable, and production-ready AI agents with confidence.

Frequently Asked Questions

1. What are custom AI agent solutions?

Custom AI agent solutions are tailored systems powered by AI models that autonomously perform specific tasks, interact with their environment, and make decisions based on programmed instructions and data processing. They are designed to automate complex workflows and enhance operational efficiency for unique business needs.

2. How can custom AI agents be developed rapidly for individual use cases?

Custom AI agents can be rapidly developed by leveraging platforms like Composio, which offer pre-built integrations with popular AI frameworks and provide tools and APIs for quick and seamless integration, requiring minimal code.

3. What are the benefits of using custom AI agents for individual use cases?

Custom AI agents improve efficiency by automating repetitive tasks, enhance personalization through data analysis, provide 24/7 availability, and easily scale to meet increasing demands.

4. Can custom AI agents be tailored to specific industries or sectors?

Yes, custom AI agents can be tailored to specific industries by leveraging industry-specific data and integrating relevant tools, ensuring they address unique challenges and requirements.

5. How much does a custom AI agent cost?

The cost of custom AI agent development varies based on task complexity, model used, tool integration, and scale of deployment. While it is easy to build agents for simple use-cases, complex multi-agent systems may require more engineering hours.

AI Agents 101: Types, Examples, and Trends

Soham Ganatra — Sat, 01 Jun 2024 11:41:44 +0000

Since the release of ChatGPT, there has been a surge in interest in AI automation. When it comes to automation, AI agents take the front seat. From robots to self-driving cars to software systems, AI agents hold the potential to transform our world as we know it. With the continuous improvements in frontier AI models, these agents are becoming more capable and versatile.

However, despite all the hype and speculation, we are still in the early era of AI Agents, and building reliable and useful agents is challenging. A significant amount of effort is being dedicated to developing infrastructures, AI architectures, frameworks, and tooling ecosystems for creating reliable agents. This is similar to the early 90s era of the internet when foundational technologies were being built to support the massive growth and innovation that followed. As we stand at the cusp of this transformative era, now is the perfect time to learn about AI, AI agents, and the tools driving this revolution.

This article will explore what AI agents are, the different types of agents and their workflows, and real-world examples and resources to help you build your own AI agents.

Learning Objectives

Understand what AI agents are.
Explore different types of AI agents.
Discover the key components of AI agents.
Learn about AI agent workflows.
Explore practical use cases of AI agents with examples.
Find out how Composio can help build reliable and useful AI agents in the wild.

What are AI Agents?

AI agents are systems powered by AI models that can autonomously perform tasks, interact with their environment, and make decisions based on their programming and the data they process. The agents can receive input from their environment via sensors or software integrations, and with the help of the decision-making prowess of AI models, they can act to influence it. The input data could be texts, images, audio, or videos. The AI model, typically an LLM (Large Language Model) or an LMM (Large Multi-modal Model), is responsible for interpreting the data and taking the necessary steps to achieve a given task.

Example:

Consider a customer service AI agent for an e-commerce platform. This AI agent uses an LLM to understand customer queries received through text messages. When a customer asks about the status of their order, the AI agent interprets the text input, retrieves the relevant information from the order database, and provides an accurate response. If the query involves a product return, the agent can initiate the return process by interacting with the return management system, providing the customer with instructions and updates.

What are the key principles that define agents in AI?

You might be wondering: isn't traditional software doing the same thing, autonomously completing pre-determined tasks? So, what is the difference between AI agents and traditional software?

AI agents run on powerful LLMs like GPT-4. These models are trained on human-generated data, including logical reasoning, math, and coding tasks. This enables them to understand the context of questions, make informed decisions, and adapt to new information in ways traditional software cannot.

For instance, Figure 01, a humanoid robot built by Figure in collaboration with OpenAI, uses a multi-modal model to reason and execute tasks. The robot processes auditory and visual data from its surroundings via the multi-modal AI model. The model then decides which course of action to take to accomplish a task. The agent does not need human guidance at every step of decision-making; it can take cues from previous states to plan further.

Types of AI Agents

Now that you know what AI agents are, let’s dig a bit deeper and look at the different types of AI agents.

1. Simple reflex agents

Simple reflex agents are the most basic type of AI agent; their functionality is limited to pre-defined rules. The agent receives external stimuli via sensors and responds with a specific action based on condition-action rules.

Example : In a thermostat, when the temperature drops below a certain threshold, it turns on the heater. It doesn't store past data or learn from new information.
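
The thermostat rule fits in a single function. The threshold value and the action names below are illustrative assumptions.

```python
def thermostat_action(temperature_c: float, threshold_c: float = 20.0) -> str:
    """Condition-action rule: heat when the reading is below the threshold."""
    return "heater_on" if temperature_c < threshold_c else "heater_off"

# The function keeps no state, so the same reading always gives the same action.
for reading in (18.5, 20.0, 23.1):
    print(reading, thermostat_action(reading))
```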

2. Model-based reflex agents

These are similar to simple reflex agents but have more advanced decision-making capabilities. Instead of reacting only to the current input, model-based reflex agents maintain an internal model of the world that tracks its state and the effects of their actions, allowing them to make more informed and flexible decisions.

Example : A vacuum-cleaning robot. It maintains an internal model of its surroundings while cleaning. When it senses dirt, it cleans the spot; when it sees an obstacle, it updates its map and chooses a new path.
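
A toy version of the vacuum robot shows what the internal model adds. The grid coordinates, sensor flags, and action names are hypothetical, chosen only to keep the sketch short.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]

class VacuumAgent:
    """Toy model-based reflex agent on a grid; all names here are illustrative."""

    def __init__(self) -> None:
        self.known_obstacles: Set[Cell] = set()  # internal model of the world
        self.cleaned: Set[Cell] = set()

    def step(self, position: Cell, dirty: bool, obstacle_ahead: bool, ahead: Cell) -> str:
        if dirty:
            self.cleaned.add(position)
            return "clean"
        if obstacle_ahead:
            # Update the internal map so the obstacle is remembered later.
            self.known_obstacles.add(ahead)
            return "turn"
        if ahead in self.known_obstacles:
            # No sensor reading needed: the model already marks this cell as blocked.
            return "turn"
        return "forward"

robot = VacuumAgent()
print(robot.step((0, 0), dirty=True, obstacle_ahead=False, ahead=(0, 1)))
print(robot.step((0, 0), dirty=False, obstacle_ahead=True, ahead=(0, 1)))
print(robot.step((0, 0), dirty=False, obstacle_ahead=False, ahead=(0, 1)))
print(robot.step((0, 0), dirty=False, obstacle_ahead=False, ahead=(1, 0)))
```

The third call returns "turn" even though the sensor reports nothing, because the obstacle was stored in the model on the previous step. A simple reflex agent would have moved forward.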

3. Goal-based Agents

Goal-based agents are a step up from reflex agents. The agents are motivated by a specific goal. The agents evaluate multiple actions based on how well they help achieve the goal. The agents can plan ahead of time and take possible sequences of actions to accomplish the goal.

Example : a self-driving car that navigates from point A to point B.

4. Utility-based Agents

Utility-based agents possess a sophisticated decision-making framework. These agents can evaluate the effectiveness and desirability of different outcomes. They assess various possible courses of action to complete a task and select the one that maximizes utility. Utility factors can include efficiency, cost, time, and risk.

Example : An investment trading system that manages a portfolio of stocks. Instead of just aiming to increase the portfolio's value (a goal), it evaluates potential trades based on their expected return and risk (utility).
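One simple way to express this trade-off is a utility function that rewards expected return and penalizes risk. The formula below (return minus a risk-aversion weight times risk), the `Trade` fields, and the sample numbers are assumptions made for illustration, not a recommended trading model.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    name: str
    expected_return: float  # e.g. 0.08 means 8%
    risk: float             # e.g. standard deviation of returns


def utility(trade: Trade, risk_aversion: float = 0.5) -> float:
    # Higher expected return raises utility; higher risk lowers it.
    return trade.expected_return - risk_aversion * trade.risk


def choose_trade(trades, risk_aversion: float = 0.5) -> Trade:
    # A utility-based agent picks the option with the highest utility.
    return max(trades, key=lambda t: utility(t, risk_aversion))


candidates = [
    Trade("growth_stock", 0.12, 0.20),
    Trade("index_fund", 0.07, 0.08),
    Trade("bond", 0.03, 0.01),
]
best = choose_trade(candidates)
print(best.name)
```

With `risk_aversion=0.0` the agent behaves like a pure goal-seeker and picks the highest expected return; raising the weight shifts the choice toward safer options.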

5. Learning Agents

Learning Agents, as the name suggests, learn from their past interactions to improve at a given task over time. They use a problem generator to simulate new tasks, which helps refine their decision-making abilities and adapt to new situations. This continuous learning process allows them to become more efficient and effective in their operations.

Example : A social media recommendation engine starts by recommending popular content and over time, it starts recommending content based on previous interactions.

6. Multi-agent System

Multi-agent systems are required when the task requires coordination among other agentic systems. These systems allow multiple AI Agents to work in tandem by sharing states and data. These systems are useful when tasks are interconnected and the actions of one agent affect others.

Example: A collaborative crew of AI agents that consists of a research agent, an analyst agent, and a coding agent. The research agent, with access to knowledge bases, can autonomously extract relevant information; the analyst agent then analyzes the data and instructs the coding agent to prepare graphs and plots summarizing the result.

Components of Artificial Intelligence Agent Architecture

The architecture of an AI Agent depends on the specific application and requirements. The architecture can be physical, software-based, or a mix of both. So, let’s discuss the components of an agent system.

Sensors/Prompts : The agent receives external stimuli via sensors or text prompts. A physical robotic agent perceives the surroundings via a camera, mic, proximity, RADAR, and other such sensors. The input could come from these sensors or be provided in text format for software-based systems. For example, data can be provided in JSON, XML, or other structured text formats.
Actuators/Tools: The actuators and tools help the agent execute tasks in the real world. Robotic systems depend on wheels, hands, legs, etc, while software-based systems use tool integrations.
Processors/Decision-making system: These handle inputs from sensors, analyze the data, and determine the appropriate actions to accomplish a given task. Usually, this is an AI model.
Knowledge Base: For long-term memory, previous interactions and any external data are stored in a database. This enables the agents to access external data as and when needed.

Example: A self-driving car

Sensors : A self-driving car uses LIDAR, RADAR, and a Camera to perceive its surroundings, and navigate traffic and other obstacles. It may receive voice instructions from passengers through a mic.
Actuators : The car uses the steering wheel, brakes, and other mechanical components to drive.
Processors : The car’s onboard computer will use an AI model to process input data to avoid obstacles and find optimal routes.
Knowledge Base : The car may have databases for storing map data, route information, and other such data to aid in better navigation.

How do AI Agents Work?

So far, you have learned what makes an AI Agent, the types of agents, and the different components of a typical AI Agent system. To summarize, AI Agents are systems that can dynamically interact with their environment with the help of sensors, actuators, AI Models, and Knowledge bases. Now you will learn how these components work together to achieve a goal.

Goal Initialization : The first step of the process is to provide the LLM in the back end with the desired goal. The LLM processes the goal and acknowledges the objective.
Task Planning: The LLM prepares a step-by-step task list to accomplish the job and starts searching for components to finish jobs.
Tool use: The LLMs are provided with a set of tools, and depending on the task, they will pick appropriate tools to accomplish the task. For example, if the task requires gathering information from the web, the LLM will choose a tool to surf the internet and collect data.
Data Storing and Accessing: If the data needs to be saved on disk or in a database, the agent will select a tool to store the data in the appropriate format. The agent can also access data systems for task execution. For example, an AI Agent can retrieve documents from a file system to process them for further downstream tasks like report generation.
Termination: The workflow ends when predefined conditions are met. This can occur when the execution is complete, when the agent lacks access to the necessary tools, or when it reaches a threshold number of iterations.
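The steps above can be sketched as a small loop. Everything here is a stand-in: `plan` returns a fixed task list where a real agent would call an LLM, and `search_web` and `save_note` are hypothetical tools rather than functions from any framework.

```python
def search_web(query):
    return f"results for {query!r}"  # stand-in for a real search tool


def save_note(text, store):
    store.append(text)
    return "saved"


def plan(goal):
    """Stand-in for the LLM planner: returns a fixed list of (tool, argument) steps."""
    return [("search_web", goal), ("save_note", f"summary of {goal}")]


def run_agent(goal, max_iterations=5):
    store = []  # plays the role of the knowledge base
    tools = {
        "search_web": search_web,
        "save_note": lambda text: save_note(text, store),
    }
    log = []
    for i, (tool_name, arg) in enumerate(plan(goal)):
        if i >= max_iterations:  # termination: iteration threshold reached
            log.append("stopped: iteration limit reached")
            break
        tool = tools.get(tool_name)
        if tool is None:  # termination: required tool is missing
            log.append(f"stopped: no tool named {tool_name}")
            break
        log.append(tool(arg))  # tool use
    else:
        log.append("done")  # termination: every planned step completed
    return log, store


log, store = run_agent("AI agent frameworks")
print(log)
```

The three exit paths in the loop correspond to the termination conditions listed above: the plan is finished, a tool is unavailable, or the iteration budget runs out.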

This is the overall structure of typical agentic workflows.

AI Agents Example

Let's explore some promising real-world examples of AI agents.

Figure's Humanoid Robots

Figure, a robotics company supported by OpenAI, launched a humanoid robot powered by OpenAI's multi-modal GPT model. The robot perceives the environment via a camera, mic, and other sensors. When the robot receives the command, it uses the AI model’s reasoning and decision-making ability to understand the task and uses actuators to finish the job. The robot has also shown the capability to learn by seeing activities.

Devin: The First AI Software Engineer

Devin from Cognition Labs took the internet by storm when it showcased its remarkable software development skills. It could navigate GitHub repositories, fix code, and much more. It scored 13.86% on SWE-bench, a benchmark for AI software engineering tasks. After Devin, many open-source alternatives have emerged showing similar or better performance.

Waymo Self-driving Cars

Google's Waymo has turned the vision of autonomous driving into reality, enabling cars to travel from point A to point B without human intervention. With advanced sensors, AI models, and learning systems, the cars can process their environment to navigate traffic, avoid obstacles, and reach their destination safely.

Similar technologies, like Tesla's FSD and comma.ai's openpilot, are revolutionizing self-driving.

Applications of AI Agents

AI agents can be utilized across various business sectors, from customer relationship management and sales to personal productivity and software development. Here are some use cases of AI Agents.

1. AI Agent in Customer Relationship Management (CRM)

AI Agents can change the way businesses interact with customers. They can automate customer support, personalize interactions, manage and analyze data, assist sales teams, and collect feedback. These agents can respond to customer queries, record customer feedback automatically for trend analysis, and offer real-time sales insights. This can save businesses costs and time and free up personnel to work on more complex and creative activities.

2. Productivity

AI agents can be game changers in the realm of personal productivity. They can automate routine tasks such as scheduling meetings, managing emails, setting reminders, etc. By integrating with various productivity tools, AI agents can manage to-do lists, prioritize tasks by deadlines and importance, and offer personalized suggestions to boost efficiency.

3. HR/Hiring

There are different ways AI agents can improve hiring and other HR processes. They can be used to scan LinkedIn profiles, score candidates, and record the results in Google Sheets. AI agents can also grade or filter resumes based on pre-defined criteria. They can also be used to collect survey responses from employees automatically.

4. Software development

There are agents like Devin and OpenDevin that assist developers by automating code generation, debugging, and even optimizing code. But even on a personal level, you can build agents to aid in improving your productivity. For example, an automated GitHub PR agent that summarizes the diffs in a new PR and tags relevant members from the team for manual review.

Benefits of Using AI Agent

AI agents can provide value at every stage of a business and be incorporated across various business verticals. From efficient hiring and customer service to improved sales and administration, integrating AI agents into workflows can drive productivity and profitability.

1. Improved Efficiency

AI agents can handle tedious, repetitive tasks such as data entry, scheduling, and basic analysis. This frees up time and resources for other activities. By assigning these tasks to AI agents, companies can allocate resources to more demanding and creative projects.

2. Enhanced Personalization

AI agents excel at effective personalization by analyzing custom data. Companies can integrate AI agents into their products to deliver tailored experiences. With access to customer data and browsing history, AI agents can offer personalized solutions to customer queries.

3. Higher Availability

In many situations requiring 24/7 availability, AI agents can complement human staff to enhance the overall experience. AI agents can handle simpler tasks and queries, allowing human staff to concentrate on more complex tasks or those that require a human touch.

4. Scalability

AI agents are highly scalable. The agents can be scaled to meet surging demands without requiring additional human resources. The scalability ensures that businesses can continue to deliver quality services even during peak times.

Challenges and Limitations of AI Agents

Despite the numerous benefits of AI agents, the technology is still in its early stages. The infrastructure, frameworks, tooling ecosystem, and protocols are still being researched and developed. Many AI agents currently available in the market are unreliable and lack practical utility. The agents are bloated and not production-friendly. The running cost of AI agents is also high, largely because frontier models like GPT-4 and Claude Opus are very expensive to run.

Additionally, there is an increasingly negative perception regarding the use of AI agents, as they are branded as a replacement for the human workforce. However, this is far from the truth. As the technology currently stands, AI agents cannot replace humans but can be used to complement human employees, thereby enhancing shareholder value.

Future Trends

We have just discussed the challenges and limitations hindering the wide-scale adoption of AI agents in critical applications with significant consequences. Future efforts will focus on building efficient infrastructure, frameworks, and protocols for developing reliable agents. A big problem with AI agents is the AI models themselves: current frontier models are expensive to run. With growing interest, we anticipate more companies developing high-quality models, which will subsequently drive down costs.

How Can Composio Assist with Your AI Agent Needs?

Composio is building the tooling infrastructure for next-generation AI agents. Composio allows production-ready integration of 150+ tools with agents so they can accomplish more. The tools integrate with popular AI agent frameworks like LangChain, CrewAI, and AutoGen, making it easier for AI engineers to build reliable agents.

Composio is designed for production environments, offering safe and secure managed authentication, popular app integrations, and user-friendly APIs, allowing you to focus on delivering results rather than reinventing the wheel.

You can seamlessly integrate tools like Slack, Discord, Trello, Asana, GitHub, and many more apps to augment your AI agent workflows. You are not limited to these: Composio also lets you define custom tools for your specific needs.

Read this article to learn more about Composio’s tool integrations.

https://blog.composio.dev/ai-agent-tools/

Build AI Agents with Composio

With extensive tool integrations, Composio allows you to build reliable AI agents. These tools come with various actions and triggers to achieve specific objectives. Composio enables agents to execute tasks requiring interaction with the external world via APIs, RPCs, Shells, File Managers, and Browsers.

Agents can now execute code, interact with your local system, receive triggers, and perform actions for 150+ external tools.

For instance, to accomplish a task like "Create a new repository on GitHub," your agent needs to integrate with GitHub's API. This involves translating API specifications into callable functions, managing authentication for multiple users, and other complexities that Composio handles out of the box.

Composio also provides an interactive dashboard that keeps track of all your authenticated tools.

Check out this in-depth walk-through guide to explore how to build AI agents with Composio.

https://www.analyticsvidhya.com/blog/2024/05/ai-research-assistant-using-crewai-and-composio/

Conclusion

The field of AI is rapidly evolving. While we are still in the early stages of this technological revolution, the advancements made so far are promising. AI agents can handle repetitive tasks, enhance personalization, provide 24/7 availability, and scale effortlessly to meet growing demands. Despite the challenges and limitations, the future looks bright with continuous improvements in AI models and supporting infrastructures.

Composio is a key contributor in this field, offering the essential tools and integrations needed to build robust AI agents. With its production-friendly environment, secure authentication, and extensive toolset, Composio enables businesses to harness the power of AI efficiently and effectively. By integrating AI agents into various business processes, companies can enhance productivity, improve customer experiences, and drive innovation.

AI agents FAQ

1. Is ChatGPT an AI agent?

ChatGPT is not an AI agent in the traditional sense. However, it shows many agent-like characteristics: input sensors (mic, camera), actuators (tools like web search, DALL-E image generation, and Code Interpreter), a knowledge base (it can remember messages across chats), and the LLM itself.

2. Are GPTs AI agents?

GPTs (Generative Pre-trained Transformers) themselves are not AI agents. They are language models that generate text based on the input they receive. However, they can be integrated into AI agents to provide natural language understanding and generation capabilities.

3. Are AI agents sentient?

While there are raging debates about whether current AI models have consciousness, it is generally accepted that AI models are not sentient. They operate based on programmed instructions and learned patterns from data.

4. Will AI agents take our jobs?

While AI agents can and will automate some jobs, they are not direct replacements for humans. They are more effective when used as complementary tools rather than substitutes. AI agents tend to fail in complex situations; when that happens, you would want a human to step in and get the job done.

5. Do AI agents perpetuate bias and discrimination?

Yes, AI agents can perpetuate bias and discrimination. The behaviors of AI agents depend on the data they have been trained on. A biased dataset will result in a biased model.

6. Who's to blame when an AI agent makes a mistake?

This is a matter of debate and discussion. As the field matures, we can expect proper laws and regulations to be enacted for customer protection. However, it is important to develop ethical guardrails and reliable software systems to mitigate mistakes with huge consequences.

7. What is a goal-based agent?

A goal-based agent is an AI agent designed to achieve specific objectives or goals. It evaluates different actions based on how well they contribute to achieving the goal and can plan and execute sequences of actions to reach the desired outcome.

8. What is a performance element in the context of AI agents?

The performance element in the context of AI agents refers to the component that determines the agent’s actions. It is responsible for selecting the actions that will maximize the agent's performance based on the information it receives from its sensors and the goals it aims to achieve.

9. How does a language model differ from other AI agents?

A language model, like GPT, is designed to generate and understand text based on patterns learned from large datasets. It does not autonomously perform tasks or interact with its environment. In contrast, AI agents are designed to perform tasks, make decisions, and interact with their environment autonomously.

10. What are reactive agents, and how do they operate?

Reactive agents respond to environmental stimuli based on pre-defined rules. They do not maintain an internal model of the world or plan long-term actions. Instead, they operate by mapping inputs directly to actions, making decisions based solely on current perceptions rather than past experiences or future goals.

Making the Most of LLMs with AI Agent Tools

Soham Ganatra — Wed, 29 May 2024 12:59:37 +0000

AI agents are all the rage. With the ever-improving quality of Large Language Models, the demand for AI automation is also increasing. Large Language Models and Multi-modal models are efficient at reasoning, summarizing, general question answering, etc. These reasoning abilities enable LLMs to analyze complex tasks and break them into smaller sub-tasks.

However, to fully leverage their capabilities in complex automation scenarios, LLMs require the right tools. By equipping them with such tools, these models can intelligently decide which tool to use and when to use it during task execution. To make sure the tasks are executed properly the tools need to be reliable.

Composio is a platform that provides production-ready tool integration with LLM frameworks like LangChain, AutoGen, and CrewAI to build reliable AI agents. Composio’s repertoire has 150+ out-of-the-box tools for apps across categories like CRM, Productivity, SDE, etc., and it also lets you easily add custom tools.

So, this article will explore AI agents, agent tools, and specifically Composio’s tool integrations.

Learning Objectives

Learn about AI agents, their definition, working, and usefulness.
Understand what agent tools are.
Explore Composio tool integrations to empower agents.
Learn about custom tools integration on Composio.

What are AI agents?

We now have a brief idea about agents, so let’s dig a bit deeper. Agents are pieces of software that can dynamically interact with their environment, and the AI in the term “AI Agents” refers to Large Language Models (LLMs) or Large Multi-modal Models (LMMs).

LLMs possess great reasoning ability (thanks to extensive training on reasoning tasks). This enables them to analyze a complex task step-by-step. When LLMs have access to the right tools, they can break down a problem statement and use the right tools to execute tasks as and when needed. The best example of this would be the ChatGPT app itself. It has access to a code interpreter, the Internet, and DALL-E. Based on the given query, it decides which tool to use. If you ask it to create an image, it will use DALL-E; for executing code, it will choose the code interpreter. However, agents do not always need to live in a chat app. We can use external apps like Discord, Slack, GitHub, etc. to trigger an agentic workflow. For instance, an agent with Slack and Notion integration will trigger when a message is sent in the Slack channel. The agent will pick up the task, execute it, write the result to a Notion doc, and return a confirmation message in the Slack channel.

So, AI agents are LLMs augmented with tools and goals.

What are Agent tools?

Tools are software that enables the LLMs to execute a given task. They are interfaces for LLMs allowing them to interact with the external environment. Tools provide the agency for LLMs to carry out a given task. It's like painting; you need good brushes, colors, and a canvas to make something great.

With this in mind, Composio offers over 150 agent toolkits, each packed with built-in actions and triggers. It's like having a fully stocked art supply store at your disposal. This ensures that whatever the project, you've got the tools needed to tackle it efficiently.

Composio’s Comprehensive Toolkit

Imagine you're managing multiple projects across different platforms. Normally, this could involve a lot of manual coordination and checking in and might become tedious and time-consuming. That's where Composio's tools come into play. For instance, you could set up an automation that syncs your project tasks between GitHub and a project management tool like Trello or Asana. Whenever an issue is updated in GitHub, it updates your project board.

Or you can use Composio’s Typeform and Google Sheet integrations to automate user feedback collection and update it to Google Sheet for further downstream tasks like CRM integration for lead management, trend analysis, etc.

These are only a few examples. You can integrate multiple Composio tools across the categories with the LLM frameworks of your choice to automate complex workflows.

Check out the official tools supported here in the Tools Catalog.

For a detailed look at how Composio operates, read this walk-through guide, where Composio’s Slack and Notion integration is used to create an AI research assistant.

Build an AI Research Assistant Using CrewAI and Composio

Custom AI Tools

Composio also gives developers the freedom to build custom integrations for specific needs. All you need to do is follow a few structured steps using the OpenAPI Spec. Here's how to get started:

Create or Obtain OpenAPI Spec : Begin by acquiring the OpenAPI Spec for the application you want to integrate with Composio. If your chosen application lacks an OpenAPI Spec, you can create one using the Swagger Editor.
Create the integrations.yaml File : Prepare an integrations.yaml file using the provided base template. This file should be customized to include the authentication schemes suitable for the tool you are integrating, detailing essential aspects such as the application's name, description, and authentication methods.
Fill Out Authentication Details :
- Select the appropriate authentication method—OAuth1, OAuth2, API-KEY, or BASIC—based on what the custom tool supports.
- For tools like GitHub, utilize OAuth2 and include necessary details such as authorization_url, token_url, default_scopes, and other relevant parameters.
Push and Copy Repository URL : Once your integrations.yaml file is ready, push the changes to your repository and copy the URL of this repository.
Add Your Custom Tool on Composio :
- Navigate to the settings page on Composio.
- Access the "Add Open API spec" section.
- Upload both the OpenAPI Spec file and the integrations.yaml file.
- Initiate the integration by clicking on the "Start import" button.
Test Your Custom Tool on Composio :
- Return to the tools catalog on Composio.
- Locate and select the newly created tool.
- Connect your account to use the tool and ensure it functions as expected.

With these steps, you can efficiently integrate and automate your workflows using Composio, harnessing the full potential of its expansive toolkit to meet your specific project needs.

Here’s a quick video for adding a custom tool to Composio.

Conclusion

AI Agents are inevitable, and it is safe to assume that in the future most software systems will integrate AI agents in one way or another. From automating mundane day-to-day tasks to handling complex enterprise operations, the scope is vast. Composio provides a robust platform designed to meet these needs. With 150+ production-ready agent tools offering many actions and triggers, developers can create reliable and useful AI agents that work.

Tool sets from different categories like Productivity, SDE, CRM, social media, marketing, design, and more can be tailored to specific tasks, ensuring versatility across all business functions. Whether automating communication channels, code deployments, managing customer relationships, enhancing social media engagement, driving marketing campaigns, or facilitating design processes, Composio’s tools adapt to the unique demands of each domain. This will allow businesses to operate more efficiently, reduce costs, and focus human efforts on more creative tasks.

Frequently Asked Questions about AI Agent Tools

What are AI agents?

Ans. The AI agents are LLMs augmented with specific tools and goals that enable them to carry out complex tasks autonomously.

What are Agent tools?

Ans. Agent tools are software that serves as the interface between Large Language Models (LLMs) and external applications, enabling the completion of tasks.

What is Composio?

Ans. Composio is a platform that provides production-ready tool integration with LLM frameworks to build reliable AI agents for automating complex workflows.

Optimising Function Calling (GPT4 vs Opus vs Haiku vs Sonnet)

Soham Ganatra — Sun, 12 May 2024 09:06:32 +0000

Code: https://github.com/SamparkAI/Composio-Function-Calling-Benchmark/

In the last blog, we introduced the ClickUp function calling benchmark and experimented with different optimisation approaches for improving function calling using gpt-4-turbo-preview.

This time, we wanted to check a selection of other models, which might or might not claim to be superior in performance 😅. We also wanted to make our benchmark test more generalised to find compatible optimisation approaches to specific models for function calling.

Optimisation Techniques

As function calling is a new concept, and not much literature is available, we checked different experiments by the community. From these and our intuition, we realised techniques like flattening the schema structure, making system prompts more focused on function calls, improving the function names, descriptions, parameter descriptions, adding examples, etc. will enhance the function calling performance. So, we decided on this elaborate experiment. To list the methods we experimented with:

No System Prompt: Only the problem statement
Flattening Schema : All the hierarchical parameters are flattened to a shallow tree structure
Flattened Schema + Simple System Prompt : Added a simple system prompt mentioning that function calling needs to be used
Flattened Schema + Focused System Prompt : Added characterisation on its role in solving function calling problems.
Flattened Schema + Focused System Prompt + Function Name Optimised : The function names were elaborated.
Flattened Schema + Focused System Prompt + Function Description Optimised : Explained the descriptions clearly.
Flattened Schema + Focused System Prompt containing Schema summary : Added summarised version of all function schema to the system prompts
Flattened Schema + Focused System Prompt containing Schema summary + Function Name Optimised : Summarised function schema in system prompt, with elaborated function names.
Flattened Schema + Focused System Prompt containing Schema summary + Function Description Optimised : Summarised function schema in system prompt, with clearly explained function descriptions.
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimised : Additionally, the description of the parameters was improved
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimised + Function Call examples added : Examples of function calls were added along with function descriptions.
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimised + Function Parameter examples added: Examples of parameter values were added to parameter descriptions.
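To make the "Flattening Schema" step concrete, here is a minimal sketch of one way to flatten nested parameters. It is not the code from the benchmark repository: the function name, the `__` separator, and the sample schema are assumptions chosen for illustration.

```python
def flatten_properties(properties, prefix="", sep="__"):
    """Turn nested JSON-schema object properties into a single shallow dict."""
    flat = {}
    for name, spec in properties.items():
        key = f"{prefix}{sep}{name}" if prefix else name
        if spec.get("type") == "object" and "properties" in spec:
            # Recurse into nested objects, carrying the parent name as a prefix.
            flat.update(flatten_properties(spec["properties"], key, sep))
        else:
            flat[key] = spec
    return flat


# Hypothetical nested parameters for a task-creation function.
nested = {
    "name": {"type": "string"},
    "assignee": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "notify": {"type": "boolean"},
        },
    },
}
flat = flatten_properties(nested)
print(list(flat))
```

After flattening, the model only has to fill top-level fields such as `assignee__id`; the caller would need to rebuild the nested structure before invoking the real API.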

OpenAI Models

As we checked gpt-4-turbo-preview in the previous experiment, we wanted to test the performance of both its predecessor, gpt-4-0125-preview, and its successor gpt-4-turbo. As we have seen before, even though the next-generation models are pretty advanced in benchmark scores, they are often not better in an all-encompassing way. So, comparing with our previous scores, here is the performance of these two OpenAI models.

Optimization Approach	gpt-4-turbo-preview	gpt-4-turbo	gpt-4-0125-preview
No System Prompt	0.36	0.36	0.353
Flattening Schema	0.527	0.487	0.533
Flattened Schema + Simple System Prompt	0.553	0.533	0.54
Flattened Schema + Focused System Prompt	0.633	0.633	0.64
Flattened Schema + Focused System Prompt + Function Name Optimized	0.553	0.607	0.587
Flattened Schema + Focused System Prompt + Function Description Optimized	0.633	0.66	0.673
Flattened Schema + Focused System Prompt containing Schema summary	0.64	0.553	0.64
Flattened Schema + Focused System Prompt containing Schema summary + Function Name Optimized	0.70	0.707	0.686
Flattened Schema + Focused System Prompt containing Schema summary + Function Description Optimized	0.687	0.707	0.68
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized	0.767	0.767	0.787
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Call examples added	0.693	0.6	0.707
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Parameter examples added	0.787	0.693	0.787

So we can see that, in most cases, gpt-4-0125-preview performed best or tied for best. When we added examples of parameter values to the parameter descriptions, gpt-4-0125-preview scored 0.787, level with gpt-4-turbo-preview and well ahead of gpt-4-turbo (0.693). In most of the cases where we optimised or elaborated only the function names and descriptions, gpt-4-turbo seems to do better.

Anthropic Models

Next, we did the same experimentation with Anthropic's Claude 3 series of models. Claude 3 has three models, haiku, sonnet and opus, in increasing order of parameters and performance (at least that is expected).

When we tried these models, we discovered that Claude models, especially opus, are very costly and very slow! Running the whole benchmark with GPT-4 for one run took ~4 minutes, while claude-3-opus-20240229 took ~13 minutes. claude-3-haiku-20240307 and claude-3-sonnet-20240229 took ~3 minutes and ~6 minutes, respectively.

We faced several problems while running the benchmark for Claude models. For example, unlike with OpenAI models, most function/tool calls from Claude models are preceded by a block of thoughts text, which required some changes in our benchmark code.

Then, while we ran it, we found that the scores were incredibly low in some cases and kind of absurd.

After some digging, we found that sometimes the models predicted the boolean variables as strings, like True was predicted as "True" and False was predicted as "False". We added a fix for that and then finally obtained our results.
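A fix of this kind might look roughly like the sketch below. It is not the code from the benchmark repository; `coerce_booleans` and the sample arguments are hypothetical.

```python
def coerce_booleans(value):
    """Recursively convert the strings "True"/"False" (any casing) to real booleans."""
    if isinstance(value, dict):
        return {k: coerce_booleans(v) for k, v in value.items()}
    if isinstance(value, list):
        return [coerce_booleans(v) for v in value]
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    return value


# Hypothetical predicted arguments from a tool call.
predicted = {"archived": "True", "tags": ["urgent", "False"], "name": "True story"}
cleaned = coerce_booleans(predicted)
print(cleaned)
```

Note that this blanket conversion would also change a genuine string field whose value happens to be "true"; a stricter version would consult the function schema and convert only parameters declared as boolean.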

Optimization Approach	claude-3-haiku-20240307	claude-3-sonnet-20240229	claude-3-opus-20240229
No System Prompt	0.48	0.6	0.42
Flattening Schema	0.5	0.58	0.5
Flattened Schema + Simple System Prompt	0.54	0.6	0.54
Flattened Schema + Focused System Prompt	0.54	0.54	0.54
Flattened Schema + Focused System Prompt + Function Name Optimized	0.52	0.62	0.52
Flattened Schema + Focused System Prompt + Function Description Optimized	0.52	0.6	0.52
Flattened Schema + Focused System Prompt containing Schema summary	0.46	0.62	0.46
Flattened Schema + Focused System Prompt containing Schema summary + Function Name Optimized	0.5	0.64	0.46
Flattened Schema + Focused System Prompt containing Schema summary + Function Description Optimized	0.5	0.6	0.6
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized	0.58	0.74	0.58
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Call examples added	0.6	0.76	0.64
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Parameter examples added	0.68	0.76	0.66

Now I know.., you think they must have messed up the haiku and opus models scores. But believe me, I am equally surprised and can ensure that we ran the opus benchmark multiple times and checked the code quite a lot for probable bugs.

opus, sonnet and haiku initially outperform GPT models in non-optimized scenarios. sonnet consistently outpaces haiku, as expected. Had opus maintained this trend, it likely would have surpassed Openai models.

Finally

OpenAI models, especially gpt-4-turbo-preview, are still the better choice regarding performance and cost.

Optimization Approach	gpt-4-turbo-preview	gpt-4-turbo	gpt-4-0125-preview	claude-3-haiku-20240307	claude-3-sonnet-20240229	claude-3-opus-20240229
No System Prompt	0.36	0.36	0.353	0.48	0.6	0.42
Flattening Schema	0.527	0.487	0.533	0.5	0.58	0.5
Flattened Schema + Simple System Prompt	0.553	0.533	0.54	0.54	0.6	0.54
Flattened Schema + Focused System Prompt	0.633	0.633	0.64	0.54	0.54	0.54
Flattened Schema + Focused System Prompt + Function Name Optimized	0.553	0.607	0.587	0.52	0.62	0.52
Flattened Schema + Focused System Prompt + Function Description Optimized	0.633	0.66	0.673	0.52	0.6	0.52
Flattened Schema + Focused System Prompt containing Schema summary	0.64	0.553	0.64	0.46	0.62	0.46
Flattened Schema + Focused System Prompt containing Schema summary + Function Name Optimized	0.70	0.707	0.686	0.5	0.64	0.46
Flattened Schema + Focused System Prompt containing Schema summary + Function Description Optimized	0.687	0.707	0.68	0.5	0.6	0.6
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized	0.767	0.767	0.787	0.58	0.74	0.58
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Call examples added	0.693	0.6	0.707	0.6	0.76	0.64
Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Parameter examples added	0.787	0.693	0.787	0.68	0.76	0.66

All the codes are organised at: https://github.com/SamparkAI/Composio-Function-Calling-Benchmark/.

We're currently deciding which models to test next—perhaps Mistral or open-source options like Functionary or NexusRaven. Check out our repository and try running these models to compare their performance. If you have questions or suggestions, please submit a pull request. Thank you!

Building CrewAI Agents to turn TODO in Code to Linear Issues

Soham Ganatra — Mon, 01 Apr 2024 12:56:32 +0000

Introduction

I use #TODO comments in commit messages to flag future tasks. Tracking these tasks manually is inefficient and error-prone.

I am trying to build using CrewAI an Agent that converts these ** TODOs from Github code into Linear tasks**. Also, I added a trigger to run agent every time I push a commit.

We also tried doing this with Autogen earlier, but did not connect triggers and I also wanted to measure accuracy on both implementations so will be writing about it later.

_TL;DR: [Show me the Code_](https://blog.composio.dev/avoid-any-missed-todos-using-crewai/#final-code)

Setup

!pip install crewai composio_crewai --quiet

Initialising Agents

Configure LLM: Use the gpt-4-1106-preview model. Provide the OpenAI environment key through an environment variable or modify the code directly.
Set Up CrewAI Agent: Create it with a system prompt and the LLM configuration. Adjust as needed to improve outcomes.

Goal = Take action on Linear via Linear APIs based on Github commits. Linear Project to create issues: Hermes

Backstory= You are an AI Agent with access to Github and Linear and wants to keep the Github Code TODOs and Linear in Sync. Linear Project to create issues: Hermes

from crewai import Agent, Task
from composio_crewai import ComposioToolset, App, Action
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(openai_api_key="sk-uPYkz***", model="gpt-4-0613")

composioCrewAI = ComposioToolset([App.GITHUB, App.LINEAR])

agent = Agent(role='Github-Linear TODO Agent',
  goal="""Take action on Linear via Linear APIs based on Github commits. Linear Project to create issues: Hermes""",
  backstory="""You are an AI Agent with access to Github and Linear and wants to keep the Github Code TODOs and Linear in Sync. Linear Project to create issues: Hermes""",
  verbose=True,
  tools=composioCrewAI,
  llm=llm)

Agent Initialisation Code giving power to agent to interact with Github & Linear

Allowing agents to interact with Tools is done using below code

composioCrewAI = ComposioToolset([App.GITHUB, App.LINEAR])

tools=composioCrewAI

Enabling Triggers

Triggers start your agent when an event happens in a connected app. On every commit, github will send a webhook to run the agent.

Create a Server: Use tools like ngrok to expose the local server/poty for testing.

ngrok http 2000

Set Callback URL: Use composio-cli to set a global callback URL pointing to your exposed server.

composio-cli set global-trigger-callback "<https://try.ngrok.io/asdfjh2>"

Replace your ngrok url or server url to get webhooks.

Configure the Trigger: Use composio-cli commands to enable and configure the github_commit_event trigger.

composio-cli enable-trigger github_commit_event

For our use-case we only need github_commit_event , so after running the above command it will ask for some details to enable the triggers.

> Enabling trigger: github_commit_event...

Owner (Owner of the repository): utkarsh-dixit
Repo (Repository name): speedy

✔ Trigger enabled successfully!


from flask import Flask, request

app = Flask( __name__ )

@app.route('/webhook', methods=['POST'])
def webhook():
  task = Task(description=f"""Given the following Github patch: {request.json}, create a Linears issues for the TODOs in the patch and assign them to right people. Please read the patch carefully and create issues for the new TODOs only, avoid removed/old TODOs.""", expected_output="A LINEAR issue created for the commit", agent=agent)
  task.execute()
  return 'Payload received and processed', 200

if __name__ == ' __main__':
  app.run(port=2000, debug=True)

Python code to receive the webhook and execute the agent

Action! 🚀

After setup, commit a TODO in your GitHub repository. The AI agent should identify the commit, parse out TODOs, and create corresponding issues in Linear, assigning them accordingly.

Payload from Composio:
{'trigger_id': 'github_commit_event', 'connection_id': '714ee37a-fb3d-4ef4-a2e8-abde5ce228ca', 'payload': {'id': '307cc46d8c7a3ba56e9c855dce338542b26414a9', 'message': 'Update README.md', 'timestamp': '2024-04-01T13:56:03+05:30', 'author': 'kaavee315', 'url': '<https://github.com/kaavee315/ML_assignment/commit/307cc46d8c7a3ba56e9c855dce338542b26414a9>'}}

> Entering new CrewAgentExecutor chain...
I should fetch the patch file for the given commit to identify any TODOs in the code.

Action: 
github_get_patch_for_commit

Action Input: 
{"owner": "kaavee315", "repo": "ML_assignment", "sha": "307cc46d8c7a3ba56e9c855dce338542b26414a9"}


{'execution_details': {'executed': True}, 'response_data': {'patch': 'From 307cc46d8c7a3ba56e9c855dce338542b26414a9 Mon Sep 17 00:00:00 2001\\nFrom: Karan Vaidya <kaavee315@gmail.com>\\nDate: Mon, 1 Apr 2024 13:56:03 +0530\\nSubject: [PATCH] Update README.md\\n\\n---\\n README.md | 1 +\\n 1 file changed, 1 insertion(+)\\n\\ndiff --git a/README.md b/README.md\\nindex 6167076..6c47ce7 100644\\n--- a/README.md\\n+++ b/README.md\\n@@ -1,3 +1,4 @@\\n ML Assignment Readme\\n Change\\n TODO(karan): Fresh set of TODOs\\n+TODO(Utkarsh): Merge the sdk to master\\n'}}

Thought: 
Now that I have the patch, I should parse it to identify any new TODOs that have been added in this commit.

Action: 
None

Action Input: 
None

Action: 
linear_create_linear_issue
------------ ... ---------------------

Final Answer: 
The Linear issue created for the commit is: 

ID: 0d3329f7-1fed-4f01-873c-e44f01cc3e85
Title: Merge the sdk to master
Description: This task is related to the commit 307cc46d8c7a3ba56e9c855dce338542b26414a9 in the ML_assignment repository. The TODO was added by Utkarsh.

> Finished chain.
127.0.0.1 - - [01/Apr/2024 13:57:20] "POST /webhook HTTP/1.1" 200 -

CrewAI Agent Output when in Action!

Final Code

from flask import Flask, request
from crewai import Agent, Task
from composio_crewai import ComposioToolset, App, Action
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(openai_api_key="sk-uPYkz***", model="gpt-4-0613")

composioCrewAI = ComposioToolset([App.GITHUB, App.LINEAR])

agent = Agent(role='Github-Linear TODO Agent',
  goal="""Take action on Linear via Linear APIs based on Github commits. Linear Project to create issues: Hermes""",
  backstory="""You are an AI Agent with access to Github and Linear and wants to keep the Github Code TODOs and Linear in Sync. Linear Project to create issues: Hermes""",
  verbose=True,
  tools=composioCrewAI,
  llm=llm)

from flask import Flask, request

app = Flask( __name__ )

@app.route('/webhook', methods=['POST'])
def webhook():
  task = Task(description=f"""Given the following Github patch: {request.json}, create a Linears issues for the TODOs in the patch and assign them to right people. Please read the patch carefully and create issues for the new TODOs only, avoid removed/old TODOs.""", expected_output="A LINEAR issue created for the commit", agent=agent)
  task.execute()
  return 'Payload received and processed', 200

if __name__ == ' __main__':
  app.run(port=2000, debug=True)

Complete Code

Future Plans

I plan to handle small todos directly via agent and let it create PRs for the same. I will publish my results soon in next couple of weeks.

Join our Discord Community and check out what we're building!

Create Issues from Code Commits - using Autogen

Soham Ganatra — Wed, 27 Mar 2024 05:45:17 +0000

Join our Discord Community and check out what we're building!

Introduction

I use #TODO comments in commit messages to flag future tasks. Tracking these tasks manually is inefficient and error-prone.

Let's try to build using Autogen Agents - Convert these TODOs from Github code commits into Linear tasks.

Background

Autogen Agentic Framework allows for natural language understanding and action-taking and is super easy to start with.

Composio is a very easy way to connect your Autogen Agents with real life tools. We will be using it to connect our agent to Linear and Github.

TL;DR: Show me the Code

Setup

!pip install pyautogen composio_autogen --quiet

Installing the relevant packages

Initialising Agents

To initialise, perform the following steps:

Configure LLM: Use the gpt-4-1106-preview model. Provide the OpenAI environment key through an environment variable or modify the code directly.
Set Up Assistant Agent: Create it with a system prompt and the LLM configuration. Adjust as needed to improve outcomes.
Establish User Proxy Agent: These agents simulate users, interacting with Autogen agents. They include a function to end the process upon detecting the word "Terminate," as specified in the system prompt.

import os

from autogen import AssistantAgent, UserProxyAgent

from composio_autogen import App, Action, ComposioToolset

llm_config = {
    "config_list": [
        {
            "model": "gpt-4-1106-preview",
            "api_key": os.environ.get(
                "OPENAI_API_KEY", "sk-123131 ****"
            ),
        }
    ]
}

super_agent = AssistantAgent(
    "chatbot",
    system_message="""You are a super intelligent personal assistant.
    You have been given a set of tools that you are supposed to choose from.
    You decide the right tool and execute it to achieve your task.
    Reply TERMINATE when the task is done or when user's content is empty""",
    llm_config=llm_config,
)

# create a UserProxyAgent instance named "user_proxy"
user_proxy = UserProxyAgent(
    "user_proxy",
    is_termination_msg=lambda x: x.get("content", "")
    and "TERMINATE" in x.get("content", ""),
    human_input_mode="NEVER", # Don't take input from User
    code_execution_config={"use_docker": False},
)

Agent Initialisation Code

Powering agents with Tools

We need to connect Github and linear to our agents and via function calling they ideally accomplish the task. I will be using Composio to do that.

Authorise Github and Linear connections, so Autogen Agents can interact with it.

pip install composio-autogen

composio-cli add github # Authorise Github 

composio-cli add linear # Authorise Linear

Adding Github and Linear Connections to Composio. (Run on Terminal)

# Initialise the Composio Tool Set
composio_tools = ComposioToolset()

# Register the authorised Applications, with our agent.
composio_tools.register_tools(
    tools=[App.LINEAR, App.GITHUB], caller=super_agent, executor=user_proxy
)

This allows our agent and user proxy to interact with external Applications

Executing the Task

Define the task so agent can execute it. 🚀

task = """For all the todos in my last commit of SamparkAI/Docs,
create a linear issue on project name hermes board and assign to right person"""

response = user_proxy.initiate_chat(super_agent, message=task)

print(response.chat_history)

And it works!

The issues have been successfully created on the Linear board for the project "Hermes" as follows:

1. **Extracted TODO from Docs commit**
   - Issue ID: `d578a113-a7ab-4084-9a76-5dbbd4f23b21`
   - Description: A new Linear issue created for the TODO from the latest commit in the SamparkAI/Docs repository.

2. **Evaluate documentation structure**
   - Issue ID: `0c710f8a-8037-4bf4-8ebd-a801531117b5`
   - Description: Evaluate and update the documentation structure based on the latest standards.

3. **Review and finalize API docs**
   - Issue ID: `961651df-ad3c-4b4c-b162-931eead8be19`
   - Description: Review the API documentation to ensure accuracy and completeness.

All issues have been assigned to the appropriate team member. The task is now complete.

Terminate

Output From Autogen

Final Code

DeepNote Link

#!pip install pyautogen composio_autogen --quiet

from autogen import AssistantAgent, UserProxyAgent
from composio_autogen import App, Action, ComposioToolset
import os

llm_config = {
    "config_list": [
        {
            "model": "gpt-4-turbo-preview",
            "api_key": os.environ.get("OPENAI_API_KEY", "sk-123***"),
        }
    ]
}

super_agent = AssistantAgent(
    "chatbot",
    system_message="""You are a super intelligent personal assistant.
    You have been given a set of tools that you are supposed to choose from.
    You decide the right tool and execute it to achieve your task.
    Reply TERMINATE when the task is done or when user's content is empty""",
    llm_config=llm_config,
)

# create a UserProxyAgent instance named "user_proxy"
user_proxy = UserProxyAgent(
    "user_proxy",
    is_termination_msg=lambda x: x.get("content", "")
    and "TERMINATE" in x.get("content", ""),
    human_input_mode="NEVER", # Don't take input from User
    code_execution_config={"use_docker": False},
)

# Initialise the Composio Tool Set
composio_tools = ComposioToolset()

# Register the preferred Applications, with right executor.
composio_tools.register_tools(
    tools=[App.LINEAR, App.GITHUB], caller=super_agent, executor=user_proxy
)

task = """For all the todos in my last commit of SamparkAI/Docs,
create a linear issue on project name hermes board and assign to right person"""

response = user_proxy.initiate_chat(super_agent, message=task)

print(response.chat_history)

Complete Code

Join our Discord Community and check out what we're building!

Improving Function Calling Accuracy

Soham Ganatra — Sat, 16 Mar 2024 09:19:09 +0000

Introduction

Large language models have recently been giving the ability to function-calling. Given the details(function-schema) of a number of functions, the LLM will be able to select and run the function with appropriate parameters, if the prompt demands for it. OpenAI’s GPT-4 is one of the best function-calling LLMs available for use. In addition to the GPT4, there are also open-source function calling LLMs like OpenGorilla, Functionary, NexusRaven and FireFunction that I will try and compare performance with. Example Function Calling Code can be found at OpenAI Function Calling Cookbook.

TLDR: Show me the results

Integration-Focused Agentic Function Calling

We are transitioning towards Agentic applications for more effective use of LLMs in our daily workflow. In this setup, each AI agent is designated a specific role, equipped with distinct functionalities, often collaborating with other agents to perform complex tasks.

To enhance user experience and streamline workflows, these agents must interact with the tools used by users and automate some functionalities. Currently, AI development allows agents to interact with various software tools to a certain extent through proper integration using software APIs or SDKs. While we can integrate these points into AI agents and hope for flawless operation, the question arises:

Are the common design of API endpoints compatible with Agentic Process Automation (APA)? Maybe we can redesign APIs to be more suitable to function calling?

Selecting Endpoints

We referenced the docs of ClickUp (Popular Task management App) and curated a selection of endpoints. We decided this due to the impracticality of expecting the LLM to choose from hundreds of endpoints, considering the limitation of context length.

**get_spaces** (team_id:string, archived:boolean)
create_space(team_id:string, name:string, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
get_space(space_id:string)
update_space(space_id:string, name:string, color:string, private:boolean, admin_can_manage:boolean, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
delete_space(space_id:string)
get_space_tags(space_id:string)
create_space_tag(space_id:string, tag:(name:string, tag_fg:string, tag_bg:string))
delete_space_tag(space_id:string, tag_name:string, tag:(name:string, tag_fg:string, tag_bg:string))

We converted them to the corresponding OpenAI function schema, which is available here. These were specifically selected as they combine endpoints with both flattened and nested parameters.

Creating Benchmark Dataset

To evaluate our approaches effectively, we require a benchmark dataset that is small and focuses specifically on the software-integration aspect of function-calling Language Models (LLMs).

Despite reviewing various existing function calling datasets, none were ideal for this article.

Consequently, we developed our own dataset called the ClickUp-Space dataset , which replicates real-world scenarios to some extent.

The prompts require one of eight selected functions to solve , ranging from simple to complex. Our evaluation will be based on how accurately the functions are called with the correct parameters. We also prepared code for assessing performance.

Next, we developed a problem set consisting of 50 pairs of prompts along with their respective function calling solutions.

[
  {
    "prompt": "As the new fiscal year begins, the management team at a marketing agency decides it's time to archive older projects to make way for new initiatives. They remember that one of their teams is called \"Innovative Solutions\" and operates under the team ID \"team123\". They want to check which spaces under this team are still active before deciding which ones to archive.",
    "solution": "get_spaces(team_id=\"team123\", archived=False)"
  },
  {
    "prompt": "Ella, the project coordinator, is setting up a new project space in ClickUp for the \"Creative Minds\" team with team ID \"cm789\". This space, named \"Innovative Campaigns 2023\", should allow multiple assignees for tasks, but keep due dates and time tracking disabled, as the initial planning phase doesn't require strict deadlines or time monitoring.",
    "solution": "create_space(team_id=\"cm789\", name=\"Innovative Campaigns 2023\", multiple_assignees=True, features=(due_dates=(enabled=False, start_date=False, remap_due_dates=False, remap_closed_due_date=False), time_tracking=(enabled=False)))"
  },
...
]

Measuring Baseline Performance

Initially, we wanted to assess GPT-4's performance independently, without any system prompts.

fcalling_llm = lambda fprompt : client.chat.completions.create(
  model="gpt-4-turbo-preview",
  messages=[
    {
      "role": "system",
      "content": """"""
    },
    {
      "role": "user",
      "content": prompt
    },
  ],
  temperature=0,
  max_tokens=4096,
  top_p=1,
  tools=tools,
  tool_choice="auto"
)

response = fcalling_llm(bench_data[1]["prompt"])

We set the temperature to 0 to make the results more predictable. The experiment was repeated three times, resulting in an average accuracy of 0.3 , which is below our target.

Benchmark without System Prompt - [Code Here]

Flattening the Parameters

As mentioned earlier, some functions require output parameters in a nested structure. An example below-

{
    "name": "create_space",
    "description": "Add a new Space to a Workspace.",
    "parameters": {
      "type": "object",
      "properties": {
        "team_id": {
          "type": "string",
          "description": "The ID of the team"
        },
        "name": {
          "type": "string",
          "description": "The name of the new space"
        },
        "multiple_assignees": {
          "type": "boolean",
          "description": "Enable or disable multiple assignees for tasks within the space"
        },
        "features": {
          "type": "object",
          "description": "Enabled features within the space",
          "properties": {
            "due_dates": {
              "type": "object",
              "description": "Due dates feature settings",
              "properties": {
                "enabled": { "type": "boolean" },
                "start_date": { "type": "boolean" },
                "remap_due_dates": { "type": "boolean" },
                "remap_closed_due_date": { "type": "boolean" }
              }
            },
            "time_tracking": {
              "type": "object",
              "description": "Time tracking feature settings",
              "properties": {
                "enabled": { "type": "boolean" }
              }
            }
          }
        }
      },
      "required": ["team_id", "name", "multiple_assignees", "features"]
    }
  }

Based on our experience with LLMs, we believe that while the model (GPT-4) has been optimised for structured output, a complex output structure may actually reduce performance and accuracy.

Therefore, we programmatically flatten the parameters.

Above function flattened will look as follows:

{
        "description": "Add a new Space to a Workspace.",
        "name": "create_space",
        "parameters": {
            "properties": {
                "features __due_dates__ enabled": {
                    "description": "enabled __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __due_dates__ remap_closed_due_date": {
                    "description": "remap_closed_due_date __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __due_dates__ remap_due_dates": {
                    "description": "remap_due_dates __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __due_dates__ start_date": {
                    "description": "start_date __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __time_tracking__ enabled": {
                    "description": "enabled __Time tracking feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "multiple_assignees": {
                    "description": "Enable or disable multiple assignees for tasks within the space__",
                    "type": "boolean"
                },
                "name": {
                    "description": "The name of the new space__",
                    "type": "string"
                },
                "team_id": {
                    "description": "The ID of the team__",
                    "type": "string"
                }
            },
            "required": [
                "team_id",
                "name",
                "multiple_assignees",
                "features __due_dates__ enabled",
                "features __due_dates__ start_date",
                "features __due_dates__ remap_due_dates",
                "features __due_dates__ remap_closed_due_date",
                "features __time_tracking__ enabled"
            ],
            "type": "object"
        }
    }

We attached the parameter name to its parent parameters (ex:features __due_dates__ enabled ) by __ , and joined the parameter descriptions to its predecessor ( Ex:enabled__due_dates feature settings __enabled features within the space__ ).

Benchmark after Flattening Schema [Code Here]

Adding System Prompt

We didn't have a system prompt before, so the LLM wasn't instructed on its role or interacting with ClickUp APIs.

Let's add a simple system prompt now.

System

from openai import OpenAI
client = OpenAI()

fcalling_llm = lambda fprompt : client.chat.completions.create(
  model="gpt-4-turbo-preview",
  messages=[
    {
      "role": "system",
      "content": """
You are an agent who is responsible for managing various employee management platform, 
one of which is CliuckUp.

When you are presented with a technical situation, that a person of a team is facing, 
you must give the soulution utilizing your functionalities. 
"""
    },
    {
      "role": "user",
      "content": fprompt
    },
  ],
  temperature=0,
  max_tokens=4096,
  top_p=1,
  tools=tools,
  tool_choice="auto"
)

response = fcalling_llm(bench_data[1]["prompt"])

Code Change

Benchmark with System Prompt - [Code Here]

Improving System Prompt

Now that we've observed an improvement in performance by adding a system prompt, we will enhance its detail to assess if the performance increase is sustained.

You are an agent who is responsible for managing various employee management platform, 
one of which is CliuckUp. 

You are given a number of tools as functions, you must use one of those tools and fillup 
all the parameters of those tools ,whose answers you will get from the given situation.

When you are presented with a technical situation, that a person of a team is facing, 
you must give the soulution utilizing your functionalities. 

First analyze the given situation to fully anderstand what is the intention of the user,
what they need and exactly which tool will fill up that necessity.

Then look into the parameters and extract all the relevant informations to fillup the 
parameter with right values.

New System Prompt

Seems to work great! [Code Here]

Benchmark after Flattened Schema + Improved System Prompt

Adding Schema Summary in Schema Prompt

Let's enhance the system prompts further by focusing on the functions and their purpose, building upon the clear instructions provided for the LLM's role.

Here is a concise summary of the system functions which we add to prompt.

get_spaces - View the Spaces available in a Workspace.
create_space - Add a new Space to a Workspace.
get_space - View the details of a specific Space in a Workspace.
update_space - Rename, set the Space color, and enable ClickApps for a Space.
delete_space - Delete a Space from your Workspace.
get_space_tags - View the task Tags available in a Space.
create_space_tag - Add a new task Tag to a Space.
delete_space_tag - Delete a task Tag from a Space.

Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary. [Code Here]

Optimising Function Names

Now, let's improve the schemas starting with more descriptive function names.

schema_func_name_dict = {
    "get_spaces": "get_all_clickup_spaces_available",
    "create_space": "create_a_new_clickup_space",
    "get_space": "get_a_specific_clickup_space_details",
    "update_space": "modify_an_existing_clickup_space",
    "delete_space": "delete_an_existing_clickup_space",
    "get_space_tags": "get_all_tags_of_a_clickup_space",
    "create_space_tag": "assign_a_tag_to_a_clickup_space",
    "delete_space_tag": "remove_a_tag_from_a_clickup_space",
}

Replacing Current Function Names with Above

optimized_schema = []
for sc in flattened_schema:
    temp_dict = sc.copy()
    temp_dict["name"] = schema_func_name_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)

Replace names in the schema Code

Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + Function Names Optimised [Code Here]

Optimising Function Description

Here, we focus on the function descriptions and make those more clear and focused.

schema_func_decription_dict = {
    "get_spaces": "Retrives information of all the spaces available in user's Clickup Workspace.",
    "create_space": "Creates a new ClickUp space",
    "get_space": "Retrives information of a specific Clickup space",
    "update_space": "Modifies name, settings the Space color, and assignee management Space.",
    "delete_space": "Delete an existing space from user's ClickUp Workspace",
    "get_space_tags": "Retrives all the Tags assigned on all the tasks in a Space.",
    "create_space_tag": "Assigns a customized Tag in a ClickUp Space.",
    "delete_space_tag": "Deletes a specific tag previously assigned in a space.",
}

New Descriptions

And change schema with:

optimized_schema = []
for sc in flattened_schema:
    temp_dict = sc.copy()
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)

Changing Schema

Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + Function Names Optimised + Function Descriptions Optimised [Code Here]

Optimising Function Parameter Descriptions

Earlier, we flattened the schema by stacking nested parameters' descriptions with their parents' descriptions until they were in a flattened state.

Let's now replace them with:

schema_func_params_dict = {
    'create_space': {
        'features__due_dates__enabled': 'If due date feature is enabled within the space. Default: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. Default: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
        'name': 'The name of the new space to create',
        'team_id': 'The ID of the team'
        },
    'create_space_tag': {
        'space_id': 'The ID of the space',
        'tag__name': 'The name of the tag to assign',
        'tag__tag_bg': 'The background color of the tag to assign',
        'tag__tag_fg': 'The foreground(text) color of the tag to assign'
        },
    'delete_space': {
        'space_id': 'The ID of the space to delete'
        },
    'delete_space_tag': {
        'space_id': 'The ID of the space',
        'tag__name': 'The name of the tag to delete',
        'tag__tag_bg': 'The background color of the tag to delete',
        'tag__tag_fg': 'The foreground color of the tag to delete',
        'tag_name': 'The name of the tag to delete'
        },
    'get_space': {
        'space_id': 'The ID of the space to retrieve details'
        },
    'get_space_tags': {
        'space_id': 'The ID of the space to retrieve all the tags from'
        },
    'get_spaces': {
        'archived': 'A flag to decide whether to include archived spaces or not. Default: True',
        'team_id': 'The ID of the team'
        },
    'update_space': {
        'admin_can_manage': 'A flag to determine if the administrator can manage the space or not. Default: True',
        'color': 'The color used for the space',
        'features__due_dates__enabled': 'If due date feature is enabled within the space. Default: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. Default: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
        'name': 'The new name of the space',
        'private': 'A flag to determine if the space is private or not. Default: False',
        'space_id': 'The ID of the space'
        }
        }

We then apply them to the previous schema:

import copy

optimized_schema = []
for sc in flattened_schema:
    # Deep-copy so the nested parameter dicts of flattened_schema stay untouched.
    temp_dict = copy.deepcopy(sc)
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    for func_param_name, func_param_description in schema_func_params_dict[temp_dict["name"]].items():
        temp_dict["parameters"]["properties"][func_param_name]["description"] = func_param_description
    optimized_schema.append(temp_dict)
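
The loop above raises a KeyError as soon as an override names a parameter that is missing from the schema, and it silently skips schema parameters that have no override. A minimal sketch of a pre-check follows, assuming each schema entry has the `name` / `parameters` / `properties` layout used above. The helper `find_description_mismatches` and the sample data are hypothetical and only illustrate the idea.

```python
def find_description_mismatches(schema, params_dict):
    """Return {function_name: (missing_in_schema, missing_overrides)}."""
    mismatches = {}
    for func in schema:
        name = func["name"]
        schema_params = set(func["parameters"]["properties"])
        override_params = set(params_dict.get(name, {}))
        # Overrides that would trigger a KeyError in the update loop.
        missing_in_schema = sorted(override_params - schema_params)
        # Schema parameters that would keep their old description.
        missing_overrides = sorted(schema_params - override_params)
        if missing_in_schema or missing_overrides:
            mismatches[name] = (missing_in_schema, missing_overrides)
    return mismatches


# Hypothetical two-function schema, used only to exercise the check.
sample_schema = [
    {"name": "get_space",
     "parameters": {"properties": {"space_id": {"description": ""}}}},
    {"name": "get_spaces",
     "parameters": {"properties": {"team_id": {"description": ""},
                                   "archived": {"description": ""}}}},
]
sample_overrides = {
    "get_space": {"space_id": "The ID of the space to retrieve details"},
    "get_spaces": {"team_id": "The ID of the team", "archivd": "misspelled key"},
}

print(find_description_mismatches(sample_schema, sample_overrides))
```

Running a check like this before building `optimized_schema` makes a misspelled or garbled parameter key visible up front instead of surfacing as an exception midway through the loop.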

Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + (Function Names + Function Descriptions + Parameter Descriptions) Optimised [Code Here]

Wow! Every run scored 75% or higher.

Adding Examples of Function Calls

LLMs tend to perform better when they are given examples of the expected response. Let's add examples and analyse the outcome.

To start, we append an example call for each function to its description in the schema.

schema_func_decription_dict = {
    "get_spaces": """\
Retrieves information of all the spaces available in user's ClickUp Workspace. Example Call:

python
get_spaces({'team_id': 'a1b2c3d4', 'archived': False})

    """,
    "create_space": """\
Creates a new ClickUp space. Example Call:

python
create_space ({
'team_id': 'abc123',
'name': 'NewWorkspace',
'multiple_assignees': True,
'features__due_dates__enabled': True,
'features__due_dates__start_date': False,
'features__due_dates__remap_due_dates': False,
'features__due_dates__remap_closed_due_date': False,
'features__time_tracking__enabled': True
})

""",
    "get_space": """\
Retrieves information of a specific ClickUp space. Example Call:

python
get_space({'space_id': 's12345'})

""",
    "update_space": """\
Modifies the name, color, and assignee-management settings of a Space. Example Call:

python
update_space({
'space_id': 's12345',
'name': 'UpdatedWorkspace',
'color': '#f0f0f0',
'private': True,
'admin_can_manage': False,
'multiple_assignees': True,
'features__due_dates__enabled': True,
'features__due_dates__start_date': False,
'features__due_dates__remap_due_dates': False,
'features__due_dates__remap_closed_due_date': False,
'features__time_tracking__enabled': True
})


""",
    "delete_space": """\
Delete an existing space from user's ClickUp Workspace. Example Call:

python
delete_space({'space_id': 's12345'})

    """,
    "get_space_tags": """\
Retrieves all the Tags assigned on all the tasks in a Space. Example Call:

python
get_space_tags({'space_id': 's12345'})

""",
    "create_space_tag": """\
        Assigns a customized Tag in a ClickUp Space. Example Call:

python
create_space_tag({
'space_id': 's12345',
'tag__name': 'Important',
'tag__tag_bg': '#ff0000',
'tag__tag_fg': '#ffffff'
})

        """,
    "delete_space_tag": """\
    Deletes a specific tag previously assigned in a space. Example Call:

python
delete_space_tag({
'space_id': 's12345',
'tag__name': 'Important',
'tag_name': 'Important',
'tag__tag_bg': '#ff0000',
'tag__tag_fg': '#ffffff'
})

    """,
}

And when we run the benchmark,

Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + (Function Names + Function Descriptions + Parameter Descriptions) Optimised + Function Call Examples Added [Code Here]

Sadly, the score seems to degrade!

Adding Example Parameter Values

Since adding function call examples did not help, let's now try adding sample values to the function parameters to give a clearer idea of what to pass. We adjust the descriptions of our function parameters accordingly.

schema_func_params_dict = {
    'create_space': {
        'features__due_dates__enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
        'name': 'The name of the new space to create \nExample: \'NewWorkspace\', \'TempWorkspace\'',
        'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\' '
        },
    'create_space_tag': {
        'space_id': 'The ID of the space \nExample: \'abc123\', \'def456\'',
        'tag__name': 'The name of the tag to assign \nExample: \'NewTag\', \'TempTag\'',
        'tag__tag_bg': 'The background color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\'',
        'tag__tag_fg': 'The foreground(text) color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\''
        },
    'delete_space': {
        'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\''
        },
    'delete_space_tag': {
        'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\'',
        'tag__name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\'',
        'tag__tag_bg': 'The background color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
        'tag__tag_fg': 'The foreground color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
        'tag_name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\''
        },
    'get_space': {
        'space_id': 'The ID of the space to retrieve details \nExample: \'abc123\', \'def456\''
        },
    'get_space_tags': {
        'space_id': 'The ID of the space to retrieve all the tags from \nExample: \'abc123\', \'def456\''
        },
    'get_spaces': {
        'archived': 'A flag to decide whether to include archived spaces or not \nExample: True, False. Default: True',
        'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\''
        },
    'update_space': {
        'admin_can_manage': 'A flag to determine if the administrator can manage the space or not \nExample: True, False. Default: True',
        'color': 'The color used for the space \nExample: \'#FF0000\', \'#00FF00\'',
        'features__due_dates__enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
        'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features__time_tracking__enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
        'name': 'The new name of the space \nExample: \'NewWorkspace\', \'TempWorkspace\'',
        'private': 'A flag to determine if the space is private or not \nExample: True, False. Default: False',
        'space_id': 'The ID of the space to update \nExample: \'abc123\', \'def456\''
        }
        }

And using these in the function schema, we get:

Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + (Function Names + Function Descriptions + Parameter Descriptions) Optimised + Function Call Examples Added + Example Parameter Values Added [Code Here]

Wow! The intuition of adding examples pays off.
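
Hand-writing every description string makes it easy to format entries inconsistently or to leave the example line off a few of them. As a minimal sketch, the strings could be generated by a small helper instead. `describe_param` is a hypothetical name that is not part of the benchmark code, and the exact line layout it produces is an assumption modelled on the entries above.

```python
def describe_param(text, examples=(), default=None):
    """Build a parameter description with optional Example and Default lines."""
    # Normalise the trailing punctuation so every description ends with one period.
    lines = [text.rstrip(". ") + "."]
    if examples:
        lines.append("Example: " + ", ".join(repr(e) for e in examples))
    # Compare against None so that a default of False is still reported.
    if default is not None:
        lines.append(f"Default: {default!r}")
    return "\n".join(lines)


print(describe_param("The ID of the team", examples=["abc123", "def456"]))
print(describe_param("If time tracking feature is available within the space",
                     examples=[True, False], default=True))
```

Generating the strings this way keeps the `Example:` and `Default:` lines uniform across all functions, which removes one variable when comparing benchmark runs.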

Compiling the Results

To summarise all our experiments and their results:

We experimented with strategies to improve the function calling ability of LLMs, specifically for Agentic Software integrations. Starting from a baseline score of 36%, we boosted performance to an average of 78%. The insights shared in this article aim to enhance your applications as well.

Moreover, we discovered a key distinction between general function calling and function calling for software integrations. In general function calling, even when several functions are available, each call is independent of the others and can run in any order. In software integrations, however, functions often have to be called in a specific sequence to accomplish an action.

All the code for this article is available here. Thank you!

Further Experiments & Challenges

We have been experimenting with this for a while and plan to write more about:

Parallel Function calling accuracy
Sequential Function Call Planning Accuracy (RAG + CoT)
Comparison with Open Source Function Calling Models (OpenGorilla, Functionary, NexusRaven, and FireFunction)

When dealing with integration-centric function calls, the process can be complex. For instance, the agent may need to gather data from various endpoints like get_spaces_members, get_current_active_members, and get_member_whose_contract_is_over before responding with the update_member_list function.

This means there could be additional data not yet discussed in the conversation that requires the agent to fetch from other endpoints silently to formulate a complete response.
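
The sequencing described above can be sketched as follows. The four function names come from the example in the text, but their signatures, return values, and the filtering rule are assumptions made purely for illustration. Here they are local stubs returning made-up data, not real ClickUp or Composio endpoints.

```python
# All four functions are local stubs with made-up data. They stand in for
# integration endpoints whose real signatures the article does not specify.
def get_spaces_members(space_id):
    return ["ana", "bo", "cy", "di"]


def get_current_active_members(space_id):
    return ["ana", "bo", "cy"]


def get_member_whose_contract_is_over(space_id):
    return ["cy"]


def update_member_list(space_id, members):
    return {"space_id": space_id, "members": members}


def plan_and_update(space_id):
    """Run the read-only calls first, then the single write call."""
    all_members = get_spaces_members(space_id)
    active = set(get_current_active_members(space_id))
    expired = set(get_member_whose_contract_is_over(space_id))
    # Assumed rule: keep members who are active and whose contract has not
    # ended, preserving the original ordering.
    keep = [m for m in all_members if m in active and m not in expired]
    return update_member_list(space_id, keep)


print(plan_and_update("s12345"))
```

The point of the sketch is the ordering constraint: the final `update_member_list` call cannot be formed until the three read calls have returned, which is what separates this kind of planning from picking a single independent function.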

Optimisations like this are a crucial aspect of our efforts at Composio to make agentic integrations smoother. If you are interested in improving the accuracy of your agents, connect with us at hello@composio.dev.

Subscribe if you are interested in learning more!

Better interface between Agents <--> Tools

Soham Ganatra — Sat, 02 Mar 2024 15:26:26 +0000

What are we working on?

We’re on the cusp of a future where multiple AI agents will soon work together and interact with diverse tools for complex tasks. The rise in platforms for AI workflow and agent orchestration signals this shift. Yet, these platforms face challenges: limited scope, variety, and reliability of integrations. Developers often grapple with authentication and API specifications to implement basic agentic use cases. This hampers the seamless communication between agents and tools, a cornerstone for enabling real-world applications.

Our goal is to simplify this. By managing your integrations, we let you focus on creating your agentic platform. We’re crafting the vital integration layer for AI agents, smoothing out the rough edges for innovation.

What can we offer now?

Our SDK offers over 90 connectors optimized for LLM tool actions and triggers. Enjoy a customizable, white-label authentication experience. We also offer best-in-class reliability and detailed observability for each API call, sparing you sleepless nights spent debugging faulty API calls.