<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Composio</title>
    <description>The latest articles on Forem by Composio (@composio).</description>
    <link>https://forem.com/composio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8532%2F58c71fb5-3c56-413b-a0f4-abe500bde109.png</url>
      <title>Forem: Composio</title>
      <link>https://forem.com/composio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/composio"/>
    <language>en</language>
    <item>
      <title>Optimising Function Calling (GPT4 vs Opus vs Haiku vs Sonnet)</title>
      <dc:creator>Soham Ganatra</dc:creator>
      <pubDate>Sun, 12 May 2024 09:06:32 +0000</pubDate>
      <link>https://forem.com/composio/optimising-function-calling-gpt4-vs-opus-vs-haiku-vs-sonnet-15dh</link>
      <guid>https://forem.com/composio/optimising-function-calling-gpt4-vs-opus-vs-haiku-vs-sonnet-15dh</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.composio.dev%2Fcontent%2Fimages%2F2024%2F05%2Fblog_feature.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.composio.dev%2Fcontent%2Fimages%2F2024%2F05%2Fblog_feature.png" alt="Optimising Function Calling (GPT4 vs Opus vs Haiku vs Sonnet)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/SamparkAI/Composio-Function-Calling-Benchmark/tree/master?ref=blog.composio.dev" rel="noopener noreferrer"&gt;https://github.com/SamparkAI/Composio-Function-Calling-Benchmark/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/sohamganatra/improving-gpt-4-function-calling-accuracy-3nd0-temp-slug-4895779"&gt;last blog post&lt;/a&gt;, we introduced the ClickUp function calling benchmark and experimented with different optimisation approaches for improving function calling with &lt;code&gt;gpt-4-turbo-preview&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This time, we wanted to test a selection of other models, some of which may or may not be superior in performance 😅. We also wanted to make our benchmark more general, so that we could find out which optimisation approaches work best for which models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimisation Techniques&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because function calling is a relatively new capability and little literature is available, we looked at experiments shared by the community. Based on these and our own intuition, we expected that techniques such as flattening the schema structure, focusing the system prompt on function calls, improving function names, function descriptions, and parameter descriptions, and adding examples would improve function calling performance. So we designed this more elaborate experiment. The methods we tested were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No System Prompt:&lt;/strong&gt; Provided only the problem statement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattening Schema:&lt;/strong&gt; Flattened all hierarchical parameters into a shallow structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Simple System Prompt:&lt;/strong&gt; Added a simple system prompt stating that function calling needs to be used.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt:&lt;/strong&gt; Added a system prompt characterising the model's role as a solver of function calling problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt + Function Name Optimised:&lt;/strong&gt; Made the function names more descriptive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt + Function Description Optimised:&lt;/strong&gt; Rewrote the function descriptions to explain each function clearly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt containing Schema summary:&lt;/strong&gt; Added a summarised version of all function schemas to the system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function Name Optimised:&lt;/strong&gt; Summarised function schemas in the system prompt, with more descriptive function names.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function Description Optimised:&lt;/strong&gt; Summarised function schemas in the system prompt, with clearly explained function descriptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimised:&lt;/strong&gt; Additionally improved the parameter descriptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimised + Function Call examples added:&lt;/strong&gt; Added example function calls alongside the function descriptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimised + Function Parameter examples added:&lt;/strong&gt; Added example parameter values to the parameter descriptions.&lt;/li&gt;
&lt;/ul&gt;
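&lt;p&gt;To make the "Schema summary" idea more concrete, here is a minimal sketch of how a summarised schema could be generated and embedded in a focused system prompt. The helper names (&lt;code&gt;summarise_schema&lt;/code&gt;, &lt;code&gt;build_focused_system_prompt&lt;/code&gt;), the summary format, and the prompt wording are all hypothetical; our benchmark code may summarise the schemas differently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def summarise_schema(function):
    """Summarise one function schema as 'name(param: type, ...) - description'."""
    properties = function["parameters"].get("properties", {})
    params = ", ".join(
        f"{name}: {spec.get('type', 'any')}" for name, spec in properties.items()
    )
    return f"{function['name']}({params}) - {function['description']}"


def build_focused_system_prompt(functions):
    """Hypothetical focused system prompt that embeds a summary of every schema."""
    summaries = "\n".join("- " + summarise_schema(fn) for fn in functions)
    return (
        "You are an assistant that solves ClickUp tasks by calling functions. "
        "Always answer with a function call that uses the exact parameter names.\n"
        "Available functions:\n" + summaries
    )


# Illustrative schema (description text is made up for this example)
example = {
    "name": "get_space",
    "description": "Get the details of a Space.",
    "parameters": {"type": "object", "properties": {"space_id": {"type": "string"}}},
}
print(build_focused_system_prompt([example]))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the summary to one line per function is a design choice that keeps the system prompt short even when many functions are available.&lt;/p&gt;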

&lt;p&gt;&lt;strong&gt;OpenAI Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since we tested &lt;code&gt;gpt-4-turbo-preview&lt;/code&gt; in the previous experiment, this time we also wanted to test the dated snapshot &lt;code&gt;gpt-4-0125-preview&lt;/code&gt; and the newer &lt;code&gt;gpt-4-turbo&lt;/code&gt;. As we have seen before, newer models often score higher on general benchmarks, but they are not necessarily better at every task. Here is how these two OpenAI models compare with our previous scores.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization Approach&lt;/th&gt;
&lt;th&gt;gpt-4-turbo-preview&lt;/th&gt;
&lt;th&gt;gpt-4-turbo&lt;/th&gt;
&lt;th&gt;gpt-4-0125-preview&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No System Prompt&lt;/td&gt;
&lt;td&gt;0.36&lt;/td&gt;
&lt;td&gt;0.36&lt;/td&gt;
&lt;td&gt;0.353&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattening Schema&lt;/td&gt;
&lt;td&gt;0.527&lt;/td&gt;
&lt;td&gt;0.487&lt;/td&gt;
&lt;td&gt;0.533&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Simple System Prompt&lt;/td&gt;
&lt;td&gt;0.553&lt;/td&gt;
&lt;td&gt;0.533&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt&lt;/td&gt;
&lt;td&gt;0.633&lt;/td&gt;
&lt;td&gt;0.633&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.64&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt + Function Name Optimized&lt;/td&gt;
&lt;td&gt;0.553&lt;/td&gt;
&lt;td&gt;0.607&lt;/td&gt;
&lt;td&gt;0.587&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt + Function Description Optimized&lt;/td&gt;
&lt;td&gt;0.633&lt;/td&gt;
&lt;td&gt;0.66&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.673&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.64&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.553&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.64&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function Name Optimized&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.707&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.686&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function Description Optimized&lt;/td&gt;
&lt;td&gt;0.687&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.707&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.68&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function and Parameter Descriptions Optimized&lt;/td&gt;
&lt;td&gt;0.767&lt;/td&gt;
&lt;td&gt;0.767&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.787&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function and Parameter Descriptions Optimized + Function Call examples added&lt;/td&gt;
&lt;td&gt;0.693&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.707&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function and Parameter Descriptions Optimized + Function Parameter examples added&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.787&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.693&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.787&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In most configurations (7 of 12), &lt;code&gt;gpt-4-0125-preview&lt;/code&gt; scored highest or tied for highest. Once the function and parameter descriptions were optimised, it reached the best overall score of 0.787, and it kept that score (tied with &lt;code&gt;gpt-4-turbo-preview&lt;/code&gt;) when parameter examples were added to the parameter descriptions. In several configurations where only the function names or descriptions were optimised, &lt;code&gt;gpt-4-turbo&lt;/code&gt; did better.&lt;/p&gt;
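&lt;p&gt;The comparison can be checked directly against the table. This short script copies the scores from the OpenAI table (one row per optimisation approach, in table order) and counts, for each model, the configurations in which it scored highest. A tie counts as a win for every tied model; this tie rule is our choice for the illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Scores copied from the OpenAI table above, in the column order of MODELS.
MODELS = ["gpt-4-turbo-preview", "gpt-4-turbo", "gpt-4-0125-preview"]
SCORES = [
    (0.36, 0.36, 0.353),
    (0.527, 0.487, 0.533),
    (0.553, 0.533, 0.54),
    (0.633, 0.633, 0.64),
    (0.553, 0.607, 0.587),
    (0.633, 0.66, 0.673),
    (0.64, 0.553, 0.64),
    (0.70, 0.707, 0.686),
    (0.687, 0.707, 0.68),
    (0.767, 0.767, 0.787),
    (0.693, 0.6, 0.707),
    (0.787, 0.693, 0.787),
]


def count_wins(models, rows):
    """Count how often each model has the top score in a row (ties count for all)."""
    wins = dict.fromkeys(models, 0)
    for row in rows:
        best = max(row)
        for model, score in zip(models, row):
            if score == best:
                wins[model] += 1
    return wins


print(count_wins(MODELS, SCORES))
# {'gpt-4-turbo-preview': 4, 'gpt-4-turbo': 4, 'gpt-4-0125-preview': 7}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;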

&lt;p&gt;&lt;strong&gt;Anthropic Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next, we ran the same experiments with Anthropic's Claude 3 family of models. Claude 3 has three models, &lt;code&gt;haiku&lt;/code&gt;, &lt;code&gt;sonnet&lt;/code&gt;, and &lt;code&gt;opus&lt;/code&gt;, in increasing order of size and performance (at least, that is the expectation).&lt;/p&gt;

&lt;p&gt;When we tried these models, we found that the Claude models, especially &lt;code&gt;opus&lt;/code&gt;, are very costly and very slow! A single run of the whole benchmark took ~4 minutes with GPT-4 but ~13 minutes with &lt;code&gt;claude-3-opus-20240229&lt;/code&gt;. &lt;code&gt;claude-3-haiku-20240307&lt;/code&gt; and &lt;code&gt;claude-3-sonnet-20240229&lt;/code&gt; took ~3 minutes and ~6 minutes, respectively.&lt;/p&gt;

&lt;p&gt;We faced several problems while running the benchmark for the Claude models. For example, unlike the OpenAI models, the Claude models usually precede a function/tool call with a block of "thinking" text, which required some changes to our benchmark code.&lt;br&gt;&lt;br&gt;
Then, when we ran it, some of the scores were implausibly low.&lt;br&gt;&lt;br&gt;
After some digging, we found that the models sometimes returned boolean values as strings: &lt;code&gt;True&lt;/code&gt; was predicted as &lt;code&gt;"True"&lt;/code&gt; and &lt;code&gt;False&lt;/code&gt; as &lt;code&gt;"False"&lt;/code&gt;. After adding a fix for that, we finally obtained our results.&lt;/p&gt;
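&lt;p&gt;Both fixes can be sketched as two small post-processing steps. This is a minimal sketch, not our exact benchmark code: it assumes the response content is available as plain dicts shaped like the JSON of Anthropic's Messages API (&lt;code&gt;text&lt;/code&gt; blocks, and &lt;code&gt;tool_use&lt;/code&gt; blocks carrying &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;input&lt;/code&gt;), and that &lt;code&gt;properties&lt;/code&gt; is the (flattened) parameter schema of the called function. The helper names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def extract_tool_calls(content_blocks):
    """Return (name, input) for each tool_use block, skipping the text blocks
    that Claude often emits before a tool call."""
    return [
        (block["name"], block["input"])
        for block in content_blocks
        if block.get("type") == "tool_use"
    ]


def coerce_booleans(arguments, properties):
    """Turn "True"/"False" strings into real booleans wherever the schema
    declares a boolean parameter; all other values are left untouched."""
    fixed = {}
    for key, value in arguments.items():
        expected_type = properties.get(key, {}).get("type")
        if expected_type == "boolean" and isinstance(value, str) and value.lower() in ("true", "false"):
            fixed[key] = value.lower() == "true"
        else:
            fixed[key] = value
    return fixed


# Hypothetical response content: a reasoning text block, then the tool call
content = [
    {"type": "text", "text": "I'll look up the active spaces for this team."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_spaces",
     "input": {"team_id": "team123", "archived": "False"}},
]
schema = {"team_id": {"type": "string"}, "archived": {"type": "boolean"}}
for name, args in extract_tool_calls(content):
    print(name, coerce_booleans(args, schema))
# get_spaces {'team_id': 'team123', 'archived': False}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checking the declared type before converting avoids turning a genuine string parameter (for example, a space named "True") into a boolean.&lt;/p&gt;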

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization Approach&lt;/th&gt;
&lt;th&gt;claude-3-haiku-20240307&lt;/th&gt;
&lt;th&gt;claude-3-sonnet-20240229&lt;/th&gt;
&lt;th&gt;claude-3-opus-20240229&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No System Prompt&lt;/td&gt;
&lt;td&gt;0.48&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattening Schema&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Simple System Prompt&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt + Function Name Optimized&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.62&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt + Function Description Optimized&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary&lt;/td&gt;
&lt;td&gt;0.46&lt;/td&gt;
&lt;td&gt;0.62&lt;/td&gt;
&lt;td&gt;0.46&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function Name Optimized&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;td&gt;0.46&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function Description Optimized&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function and Parameter Descriptions Optimized&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;td&gt;0.74&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function and Parameter Descriptions Optimized + Function Call examples added&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.76&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary +  Function and Parameter Descriptions Optimized + Function Parameter examples added&lt;/td&gt;
&lt;td&gt;0.68&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.76&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.66&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now, I know what you're thinking: we must have messed up the &lt;code&gt;haiku&lt;/code&gt; and &lt;code&gt;opus&lt;/code&gt; scores. But believe me, I was equally surprised, and I can assure you that we ran the &lt;code&gt;opus&lt;/code&gt; benchmark multiple times and checked the code thoroughly for bugs.&lt;/p&gt;

&lt;p&gt;Without a system prompt, &lt;code&gt;opus&lt;/code&gt;, &lt;code&gt;sonnet&lt;/code&gt;, and &lt;code&gt;haiku&lt;/code&gt; all outperform the GPT models. &lt;code&gt;sonnet&lt;/code&gt; matches or outperforms &lt;code&gt;haiku&lt;/code&gt; in every configuration, as expected. Had &lt;code&gt;opus&lt;/code&gt; continued this trend, it would likely have surpassed the OpenAI models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finally&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenAI models, especially &lt;code&gt;gpt-4-turbo-preview&lt;/code&gt;, are still the better choice regarding performance and cost.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization Approach&lt;/th&gt;
&lt;th&gt;gpt-4-turbo-preview&lt;/th&gt;
&lt;th&gt;gpt-4-turbo&lt;/th&gt;
&lt;th&gt;gpt-4-0125-preview&lt;/th&gt;
&lt;th&gt;claude-3-haiku-20240307&lt;/th&gt;
&lt;th&gt;claude-3-sonnet-20240229&lt;/th&gt;
&lt;th&gt;claude-3-opus-20240229&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No System Prompt&lt;/td&gt;
&lt;td&gt;0.36&lt;/td&gt;
&lt;td&gt;0.36&lt;/td&gt;
&lt;td&gt;0.353&lt;/td&gt;
&lt;td&gt;0.48&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattening Schema&lt;/td&gt;
&lt;td&gt;0.527&lt;/td&gt;
&lt;td&gt;0.487&lt;/td&gt;
&lt;td&gt;0.533&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Simple System Prompt&lt;/td&gt;
&lt;td&gt;0.553&lt;/td&gt;
&lt;td&gt;0.533&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt&lt;/td&gt;
&lt;td&gt;0.633&lt;/td&gt;
&lt;td&gt;0.633&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.64&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt + Function Name Optimized&lt;/td&gt;
&lt;td&gt;0.553&lt;/td&gt;
&lt;td&gt;0.607&lt;/td&gt;
&lt;td&gt;0.587&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.62&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt + Function Description Optimized&lt;/td&gt;
&lt;td&gt;0.633&lt;/td&gt;
&lt;td&gt;0.66&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.673&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.64&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.553&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.64&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.46&lt;/td&gt;
&lt;td&gt;0.62&lt;/td&gt;
&lt;td&gt;0.46&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function Name Optimized&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.707&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.686&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;td&gt;0.46&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function Description Optimized&lt;/td&gt;
&lt;td&gt;0.687&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.707&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.68&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized&lt;/td&gt;
&lt;td&gt;0.767&lt;/td&gt;
&lt;td&gt;0.767&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.787&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;td&gt;0.74&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Call examples added&lt;/td&gt;
&lt;td&gt;0.693&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.707&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.76&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flattened Schema + Focused System Prompt containing Schema summary + Function and Parameter Descriptions Optimized + Function Parameter examples added&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.787&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.693&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.787&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.68&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.76&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.66&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All the code is available at: &lt;a href="https://github.com/SamparkAI/Composio-Function-Calling-Benchmark/tree/master?ref=blog.composio.dev" rel="noopener noreferrer"&gt;https://github.com/SamparkAI/Composio-Function-Calling-Benchmark/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We're currently deciding which models to test next—perhaps Mistral or open-source options like Functionary or NexusRaven. Check out our repository and try running these models to compare their performance. If you have questions or suggestions, please submit a pull request. Thank you!&lt;/p&gt;

</description>
      <category>functioncalling</category>
      <category>gpt4</category>
      <category>claude</category>
    </item>
    <item>
      <title>Improving Function Calling Accuracy</title>
      <dc:creator>Soham Ganatra</dc:creator>
      <pubDate>Sat, 16 Mar 2024 09:19:09 +0000</pubDate>
      <link>https://forem.com/composio/improving-function-calling-accuracy-mjf</link>
      <guid>https://forem.com/composio/improving-function-calling-accuracy-mjf</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UK8Dunnl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/benchmark.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UK8Dunnl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/benchmark.png" alt="Improving Function Calling Accuracy" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;Introduction&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;Large language models have recently gained the ability to call functions. Given the details (function schemas) of a number of functions, the LLM can select the appropriate function and generate its parameters when the prompt calls for it; the application then runs the function. OpenAI’s GPT-4 is one of the best function-calling LLMs available. In addition to GPT-4, there are also open-source function-calling LLMs, such as &lt;a href="https://huggingface.co/gorilla-llm/gorilla-openfunctions-v1?ref=blog.composio.dev"&gt;OpenGorilla&lt;/a&gt;, &lt;a href="https://github.com/MeetKai/functionary?ref=blog.composio.dev"&gt;Functionary&lt;/a&gt;, &lt;a href="https://github.com/nexusflowai/NexusRaven-V2?ref=blog.composio.dev"&gt;NexusRaven&lt;/a&gt;, and &lt;a href="https://huggingface.co/fireworks-ai/firefunction-v1?ref=blog.composio.dev"&gt;FireFunction&lt;/a&gt;, whose performance I will try to compare. Example function calling code can be found in &lt;a href="https://platform.openai.com/docs/guides/function-calling?ref=blog.composio.dev"&gt;OpenAI’s function calling guide&lt;/a&gt;.&lt;/p&gt;
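&lt;p&gt;To illustrate the division of labour: the model only proposes a call (a function name plus a JSON string of arguments), and the application executes it. The sketch below assumes a hypothetical local registry and a stub &lt;code&gt;get_space&lt;/code&gt; function standing in for a real ClickUp API request.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def get_space(space_id):
    # Hypothetical stub standing in for a real ClickUp API request
    return {"id": space_id, "name": "Demo space"}


# Maps the function names exposed to the model to local Python callables
REGISTRY = {"get_space": get_space}


def dispatch(name, arguments_json):
    """Execute the function the model selected, with the arguments it generated."""
    func = REGISTRY[name]
    kwargs = json.loads(arguments_json)  # arguments arrive as a JSON string
    return func(**kwargs)


print(dispatch("get_space", '{"space_id": "s1"}'))
# {'id': 's1', 'name': 'Demo space'}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the OpenAI Python SDK, the two inputs would typically come from a tool call in the response, e.g. &lt;code&gt;call = response.choices[0].message.tool_calls[0]&lt;/code&gt; followed by &lt;code&gt;dispatch(call.function.name, call.function.arguments)&lt;/code&gt;. Note that &lt;code&gt;tool_calls&lt;/code&gt; is empty or absent when the model answers without calling a function.&lt;/p&gt;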

&lt;h3&gt;
  
  
  &lt;em&gt;TLDR:&lt;/em&gt; &lt;a href="https://blog.composio.dev/p/c132f4e2-743c-42bd-9fef-a633cd961472/#compiling-the-results"&gt;&lt;em&gt;&lt;u&gt;Show me the results&lt;/u&gt;&lt;/em&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Integration-Focused Agentic Function Calling&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are transitioning towards Agentic applications for more effective use of LLMs in our daily workflow. In this setup, each AI agent is designated a specific role, equipped with distinct functionalities, often collaborating with other agents to perform complex tasks.&lt;/p&gt;

&lt;p&gt;To enhance the user experience and streamline workflows, these agents must interact with the tools people already use and automate some of their functionality. Today, agents can interact with various software tools to a certain extent through integrations built on those tools' APIs or SDKs. While we can plug these endpoints into AI agents and hope for flawless operation, the question arises:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are common API endpoint designs compatible with Agentic Process Automation (APA)?&lt;/strong&gt; &lt;strong&gt;Could we redesign APIs to be better suited to function calling?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;Selecting Endpoints&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;We referred to the &lt;a href="https://clickup.com/api/?ref=blog.composio.dev"&gt;docs of ClickUp (a popular task-management app)&lt;/a&gt; and curated a small selection of endpoints. We limited the selection because it is impractical to expect the LLM to choose from hundreds of endpoints, given the limited context length.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get_spaces(team_id:string, archived:boolean)
create_space(team_id:string, name:string, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
get_space(space_id:string)
update_space(space_id:string, name:string, color:string, private:boolean, admin_can_manage:boolean, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
delete_space(space_id:string)
get_space_tags(space_id:string)
create_space_tag(space_id:string, tag:(name:string, tag_fg:string, tag_bg:string))
delete_space_tag(space_id:string, tag_name:string, tag:(name:string, tag_fg:string, tag_bg:string))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We converted them to the corresponding OpenAI function schemas, which are available &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/clickup_space_schema.json?ref=blog.composio.dev"&gt;here&lt;/a&gt;. These endpoints were chosen specifically because they include both &lt;em&gt;flat and nested parameters&lt;/em&gt;.&lt;/p&gt;
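&lt;p&gt;For orientation, a compact signature like &lt;code&gt;get_spaces(team_id:string, archived:boolean)&lt;/code&gt; maps onto OpenAI's &lt;code&gt;tools&lt;/code&gt; format roughly as follows. The helper &lt;code&gt;to_openai_tool&lt;/code&gt; is a hypothetical sketch that only handles flat parameters and, by default, marks every parameter as required; the linked schema file is the authoritative version and also includes parameter descriptions and nested objects.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def to_openai_tool(name, description, params, required=None):
    """Wrap a flat {parameter: JSON type} mapping in OpenAI's tools format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {p: {"type": t} for p, t in params.items()},
                # Default: treat every parameter as required
                "required": list(params if required is None else required),
            },
        },
    }


tool = to_openai_tool(
    "get_spaces",
    "List the Spaces of a team.",  # illustrative description, not ClickUp's wording
    {"team_id": "string", "archived": "boolean"},
)
print(tool["function"]["parameters"]["required"])  # ['team_id', 'archived']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;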

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;u&gt;Creating Benchmark Dataset&lt;/u&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To evaluate our approaches effectively, we require a benchmark dataset that is small and focuses specifically on the software-integration aspect of function-calling Language Models (LLMs).&lt;/p&gt;

&lt;p&gt;Despite reviewing various existing &lt;a href="https://huggingface.co/collections/admarcosai/function-calling-dataset-656c498b27cb1927ca276e8a?ref=blog.composio.dev"&gt;function calling datasets&lt;/a&gt;, none were ideal for this article.&lt;/p&gt;

&lt;p&gt;Consequently, &lt;strong&gt;we developed our own dataset, called the&lt;/strong&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/clickup_space_benchmark.json?ref=blog.composio.dev"&gt;&lt;strong&gt;ClickUp-Space dataset&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;, which replicates real-world scenarios to some extent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Each prompt requires one of the eight selected functions to solve&lt;/em&gt;&lt;/strong&gt;, and the prompts range from simple to complex. Our evaluation is based on whether the correct function is called with the correct parameters. We also prepared &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main.ipynb?ref=blog.composio.dev"&gt;code for assessing performance&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Specifically, we developed a &lt;strong&gt;&lt;em&gt;problem set of 50 pairs&lt;/em&gt;&lt;/strong&gt;, each consisting of a prompt and its expected function calling solution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {
    "prompt": "As the new fiscal year begins, the management team at a marketing agency decides it's time to archive older projects to make way for new initiatives. They remember that one of their teams is called \"Innovative Solutions\" and operates under the team ID \"team123\". They want to check which spaces under this team are still active before deciding which ones to archive.",
    "solution": "get_spaces(team_id=\"team123\", archived=False)"
  },
  {
    "prompt": "Ella, the project coordinator, is setting up a new project space in ClickUp for the \"Creative Minds\" team with team ID \"cm789\". This space, named \"Innovative Campaigns 2023\", should allow multiple assignees for tasks, but keep due dates and time tracking disabled, as the initial planning phase doesn't require strict deadlines or time monitoring.",
    "solution": "create_space(team_id=\"cm789\", name=\"Innovative Campaigns 2023\", multiple_assignees=True, features=(due_dates=(enabled=False, start_date=False, remap_due_dates=False, remap_closed_due_date=False), time_tracking=(enabled=False)))"
  },
...
]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
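&lt;p&gt;The solutions use a compact call notation in which nested objects are written as &lt;code&gt;(key=value, ...)&lt;/code&gt; groups. One way to turn them into comparable Python values is sketched below: rewriting each &lt;code&gt;=(&lt;/code&gt; as &lt;code&gt;=dict(&lt;/code&gt; makes the string valid Python syntax, which the standard &lt;code&gt;ast&lt;/code&gt; module can then parse without executing anything. This sketch assumes no string value contains &lt;code&gt;=(&lt;/code&gt;, and it is not necessarily how our evaluation code parses solutions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ast


def _to_value(node):
    """Convert an AST node into a Python value, recursing into dict(...) groups."""
    if isinstance(node, ast.Call):  # a rewritten nested (key=value, ...) group
        return {kw.arg: _to_value(kw.value) for kw in node.keywords}
    return ast.literal_eval(node)  # strings, numbers, True/False, None


def parse_solution(solution):
    """Parse 'fn(a=1, b=(c=True))' into ('fn', {'a': 1, 'b': {'c': True}})."""
    # Nested groups like features=(due_dates=(...)) are not valid Python, so
    # rewrite them as dict(...) calls first. Assumes no string value contains "=(".
    call = ast.parse(solution.replace("=(", "=dict("), mode="eval").body
    return call.func.id, {kw.arg: _to_value(kw.value) for kw in call.keywords}


name, args = parse_solution(
    'create_space(team_id="cm789", multiple_assignees=True, '
    'features=(time_tracking=(enabled=False)))'
)
print(name, args)
# create_space {'team_id': 'cm789', 'multiple_assignees': True, 'features': {'time_tracking': {'enabled': False}}}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;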



&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;u&gt;Measuring Baseline Performance&lt;/u&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Initially, we wanted to assess GPT-4's performance independently, without any system prompts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
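&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `tools` (the ClickUp function schemas in OpenAI's tools format) and
# `bench_data` (the list of prompt/solution pairs) are assumed to be loaded already.
def fcalling_llm(prompt):
    return client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": ""},  # empty system prompt for the baseline
            {"role": "user", "content": prompt},
        ],
        temperature=0,  # reduce sampling randomness
        max_tokens=4096,
        top_p=1,
        tools=tools,
        tool_choice="auto",  # let the model decide whether to call a function
    )

response = fcalling_llm(bench_data[1]["prompt"])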

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We set the &lt;code&gt;temperature&lt;/code&gt; to &lt;code&gt;0&lt;/code&gt; to make the results more predictable. The experiment was repeated three times, resulting in an &lt;strong&gt;average accuracy of 0.3&lt;/strong&gt;, which is below our target.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Benchmark&lt;em&gt; without System Prompt&lt;/em&gt;&lt;/u&gt; &lt;strong&gt;-&lt;/strong&gt; &lt;em&gt;[&lt;/em&gt;&lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;Code Here&lt;/em&gt;&lt;/a&gt;&lt;em&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SnRiSF_2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SnRiSF_2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-9.png" alt="Improving Function Calling Accuracy" width="800" height="156"&gt;&lt;/a&gt;&lt;/p&gt;
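&lt;p&gt;As a rough illustration of how an average accuracy like this can be computed, the sketch below scores a prediction as correct only if both the function name and every argument match exactly, then averages over repeated runs. The exact-match rule and the data layout (a list of &lt;code&gt;(name, arguments)&lt;/code&gt; tuples per run, with &lt;code&gt;None&lt;/code&gt; when no function was called) are assumptions; our evaluation notebook may score calls differently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def run_accuracy(predictions, expected):
    """Fraction of prompts whose predicted call exactly matches the expected call."""
    # Dict comparison ignores key order, so argument order does not matter
    hits = sum(pred == exp for pred, exp in zip(predictions, expected))
    return hits / len(expected)


def mean_accuracy(runs, expected):
    """Average accuracy over repeated runs of the whole benchmark."""
    return sum(run_accuracy(run, expected) for run in runs) / len(runs)


expected = [
    ("get_spaces", {"team_id": "team123", "archived": False}),
    ("get_space", {"space_id": "s1"}),
]
runs = [
    [expected[0], None],          # second prompt: no function call
    [expected[0], expected[1]],   # both correct
    [("get_spaces", {"team_id": "team123", "archived": True}), None],
]
print(mean_accuracy(runs, expected))  # 0.5

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;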

&lt;h2&gt;
  
  
  &lt;u&gt;Flattening the Parameters&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, some functions take parameters with a nested structure. Here is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "name": "create_space",
    "description": "Add a new Space to a Workspace.",
    "parameters": {
      "type": "object",
      "properties": {
        "team_id": {
          "type": "string",
          "description": "The ID of the team"
        },
        "name": {
          "type": "string",
          "description": "The name of the new space"
        },
        "multiple_assignees": {
          "type": "boolean",
          "description": "Enable or disable multiple assignees for tasks within the space"
        },
        "features": {
          "type": "object",
          "description": "Enabled features within the space",
          "properties": {
            "due_dates": {
              "type": "object",
              "description": "Due dates feature settings",
              "properties": {
                "enabled": { "type": "boolean" },
                "start_date": { "type": "boolean" },
                "remap_due_dates": { "type": "boolean" },
                "remap_closed_due_date": { "type": "boolean" }
              }
            },
            "time_tracking": {
              "type": "object",
              "description": "Time tracking feature settings",
              "properties": {
                "enabled": { "type": "boolean" }
              }
            }
          }
        }
      },
      "required": ["team_id", "name", "multiple_assignees", "features"]
    }
  }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Based on our experience with LLMs, we believe that while the model (GPT-4) has been optimised for structured output, a &lt;strong&gt;&lt;em&gt;complex output structure may actually reduce performance and accuracy&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Therefore, &lt;strong&gt;&lt;em&gt;we programmatically flatten the parameters&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;
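&lt;p&gt;The flattening step can be sketched as a recursive walk over the schema. In this minimal sketch, nested keys are joined with a &lt;code&gt;__&lt;/code&gt; separator, each leaf's description is combined with its parents' descriptions, and a leaf is marked as required when its top-level parameter was required. These conventions are assumptions chosen to resemble the flattened output shown next; the exact key and description strings our code produces may differ. The &lt;code&gt;unflatten_arguments&lt;/code&gt; helper (also hypothetical) shows how the model's flat arguments could be rebuilt into the nested structure the real API expects.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;SEP = "__"  # assumed separator between nested key names


def flatten_properties(properties, prefix="", parent_desc=""):
    """Recursively flatten nested object properties into a single level."""
    flat = {}
    for key, spec in properties.items():
        name = prefix + SEP + key if prefix else key
        # Use the key itself when a property has no description
        desc = spec.get("description", key)
        if parent_desc:
            desc = desc + SEP + parent_desc
        if spec.get("type") == "object" and "properties" in spec:
            flat.update(flatten_properties(spec["properties"], name, desc))
        else:
            flat[name] = {"type": spec["type"], "description": desc}
    return flat


def flatten_schema(function):
    """Return a copy of an OpenAI function schema with flattened parameters."""
    params = function["parameters"]
    properties = flatten_properties(params.get("properties", {}))
    top_level_required = set(params.get("required", []))
    # A leaf is required when its top-level ancestor was required
    required = [k for k in properties if k.split(SEP)[0] in top_level_required]
    return {
        "name": function["name"],
        "description": function["description"],
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def unflatten_arguments(flat_args):
    """Rebuild nested arguments from the flat ones the model returns."""
    nested = {}
    for key, value in flat_args.items():
        *parents, leaf = key.split(SEP)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


print(unflatten_arguments({"team_id": "cm789", "features__time_tracking__enabled": False}))
# {'team_id': 'cm789', 'features': {'time_tracking': {'enabled': False}}}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;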

&lt;p&gt;After flattening, the &lt;code&gt;create_space&lt;/code&gt; function shown earlier looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
        "description": "Add a new Space to a Workspace.",
        "name": "create_space",
        "parameters": {
            "properties": {
                "features __due_dates__ enabled": {
                    "description": "enabled __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __due_dates__ remap_closed_due_date": {
                    "description": "remap_closed_due_date __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __due_dates__ remap_due_dates": {
                    "description": "remap_due_dates __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __due_dates__ start_date": {
                    "description": "start_date __Due dates feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "features __time_tracking__ enabled": {
                    "description": "enabled __Time tracking feature settings__ Enabled features within the space__",
                    "type": "boolean"
                },
                "multiple_assignees": {
                    "description": "Enable or disable multiple assignees for tasks within the space__",
                    "type": "boolean"
                },
                "name": {
                    "description": "The name of the new space__",
                    "type": "string"
                },
                "team_id": {
                    "description": "The ID of the team__",
                    "type": "string"
                }
            },
            "required": [
                "team_id",
                "name",
                "multiple_assignees",
                "features __due_dates__ enabled",
                "features __due_dates__ start_date",
                "features __due_dates__ remap_due_dates",
                "features __due_dates__ remap_closed_due_date",
                "features __time_tracking__ enabled"
            ],
            "type": "object"
        }
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We &lt;em&gt;joined each nested parameter's name to its parents' names&lt;/em&gt; with &lt;code&gt;__&lt;/code&gt; (e.g. &lt;code&gt;features __due_dates__ enabled&lt;/code&gt;), and &lt;em&gt;appended each parent's description to the child's description&lt;/em&gt;, again separated by &lt;code&gt;__&lt;/code&gt; (e.g. &lt;code&gt;enabled __Due dates feature settings__ Enabled features within the space__&lt;/code&gt;).&lt;/p&gt;
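&lt;p&gt;The flattening rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the benchmark repository's own code: &lt;code&gt;flatten_properties&lt;/code&gt; and &lt;code&gt;flatten_function_schema&lt;/code&gt; are hypothetical names, the separator is a bare &lt;code&gt;__&lt;/code&gt; (the keys shown above also carry spaces around it), and a flattened key is marked required whenever its top-level ancestor was required, which reproduces the &lt;code&gt;required&lt;/code&gt; list in the example output.&lt;/p&gt;

```python
# Minimal sketch of schema flattening, assuming OpenAI-style JSON schemas.
# SEP is an assumption: the article's keys show spaces around "__".
SEP = "__"


def flatten_properties(properties, prefix="", parent_desc=""):
    """Recursively flatten nested object properties into one level."""
    flat = {}
    for name, spec in properties.items():
        key = prefix + SEP + name if prefix else name
        # A node without its own description falls back to its name.
        desc = spec.get("description", name)
        if parent_desc:
            desc = desc + SEP + parent_desc
        if spec.get("type") == "object" and "properties" in spec:
            # Recurse, carrying the joined name and description down.
            flat.update(flatten_properties(spec["properties"], key, desc))
        else:
            leaf = {k: v for k, v in spec.items() if k != "description"}
            leaf["description"] = desc + SEP
            flat[key] = leaf
    return flat


def flatten_function_schema(func):
    params = func["parameters"]
    top_required = set(params.get("required", []))
    flat_props = flatten_properties(params["properties"])
    # Assumption that reproduces the example output: a flattened key is
    # required when its top-level ancestor was required.
    required = [k for k in flat_props if k.split(SEP)[0] in top_required]
    return {
        "name": func["name"],
        "description": func["description"],
        "parameters": {
            "type": "object",
            "properties": flat_props,
            "required": required,
        },
    }


# A trimmed version of the create_space schema shown earlier.
create_space = {
    "name": "create_space",
    "description": "Add a new Space to a Workspace.",
    "parameters": {
        "type": "object",
        "properties": {
            "team_id": {"type": "string", "description": "The ID of the team"},
            "features": {
                "type": "object",
                "description": "Enabled features within the space",
                "properties": {
                    "time_tracking": {
                        "type": "object",
                        "description": "Time tracking feature settings",
                        "properties": {"enabled": {"type": "boolean"}},
                    }
                },
            },
        },
        "required": ["team_id", "features"],
    },
}

flat = flatten_function_schema(create_space)
```

&lt;p&gt;Recursing only into properties of type &lt;code&gt;object&lt;/code&gt; keeps arrays and scalars as leaves; arrays of objects would need their own handling, which this sketch does not attempt.&lt;/p&gt;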

&lt;p&gt;&lt;u&gt;Benchmark after Flattening Schema&lt;/u&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;[Code Here]&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vzlWwVkS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vzlWwVkS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-8.png" alt="Improving Function Calling Accuracy" width="800" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;u&gt;Adding System Prompt&lt;/u&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Until now there was no system prompt, so the LLM had no instructions about its role or about how to interact with the ClickUp APIs.&lt;/p&gt;

&lt;p&gt;Let's add a simple system prompt now.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI
client = OpenAI()

fcalling_llm = lambda fprompt : client.chat.completions.create(
  model="gpt-4-turbo-preview",
  messages=[
    {
      "role": "system",
      "content": """
You are an agent who is responsible for managing various employee management platform, 
one of which is CliuckUp.

When you are presented with a technical situation, that a person of a team is facing, 
you must give the soulution utilizing your functionalities. 
"""
    },
    {
      "role": "user",
      "content": fprompt
    },
  ],
  temperature=0,
  max_tokens=4096,
  top_p=1,
  tools=tools,
  tool_choice="auto"
)

response = fcalling_llm(bench_data[1]["prompt"])

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Code Change&lt;/em&gt;&lt;/p&gt;
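&lt;p&gt;Whatever scoring a benchmark applies, the chosen function and its arguments first have to be read out of &lt;code&gt;response&lt;/code&gt;. A minimal sketch, assuming the openai v1 Python SDK's response shape; &lt;code&gt;extract_tool_call&lt;/code&gt; is a hypothetical helper, and a &lt;code&gt;SimpleNamespace&lt;/code&gt; stands in for a real response so the example runs offline.&lt;/p&gt;

```python
import json
from types import SimpleNamespace


def extract_tool_call(response):
    """Return (function_name, arguments_dict) for the first tool call, or None.

    Assumes an openai v1 ChatCompletion object, where tool calls live at
    response.choices[0].message.tool_calls.
    """
    message = response.choices[0].message
    if not message.tool_calls:
        # With tool_choice="auto" the model may answer in plain text instead.
        return None
    call = message.tool_calls[0]
    # The API returns arguments as a JSON-encoded string, not a dict.
    return call.function.name, json.loads(call.function.arguments)


# Offline stand-in for a real response object, for illustration only.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                tool_calls=[
                    SimpleNamespace(
                        function=SimpleNamespace(
                            name="get_space",
                            arguments='{"space_id": "s12345"}',
                        )
                    )
                ]
            )
        )
    ]
)

print(extract_tool_call(fake_response))
```

&lt;p&gt;Since &lt;code&gt;tool_choice="auto"&lt;/code&gt; lets the model reply without calling a function, the helper returns &lt;code&gt;None&lt;/code&gt; in that case rather than raising.&lt;/p&gt;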

&lt;p&gt;&lt;u&gt;Benchmark with System Prompt&lt;/u&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt1.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;[Code Here]&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lbjNkUc2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lbjNkUc2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-7.png" alt="Improving Function Calling Accuracy" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;u&gt;Improving System Prompt&lt;/u&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Adding a system prompt improved performance, so we now make it more detailed to check whether the improvement holds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an agent who is responsible for managing various employee management platform, 
one of which is CliuckUp. 

You are given a number of tools as functions, you must use one of those tools and fillup 
all the parameters of those tools ,whose answers you will get from the given situation.

When you are presented with a technical situation, that a person of a team is facing, 
you must give the soulution utilizing your functionalities. 

First analyze the given situation to fully anderstand what is the intention of the user,
what they need and exactly which tool will fill up that necessity.

Then look into the parameters and extract all the relevant informations to fillup the 
parameter with right values.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;New System Prompt&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Seems to work great!&lt;/strong&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt2.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;[Code Here]&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Benchmark after Flattened Schema + Improved System Prompt&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bYmQFfyk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bYmQFfyk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-6.png" alt="Improving Function Calling Accuracy" width="800" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;u&gt;Adding Schema Summary in System Prompt&lt;/u&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building on the clear instructions about the LLM's role, let's enhance the system prompt further by describing the available functions and their purpose.&lt;/p&gt;

&lt;p&gt;Here is a &lt;strong&gt;&lt;em&gt;concise summary of the functions, which we add to the system prompt.&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get_spaces - View the Spaces available in a Workspace.
create_space - Add a new Space to a Workspace.
get_space - View the details of a specific Space in a Workspace.
update_space - Rename, set the Space color, and enable ClickApps for a Space.
delete_space - Delete a Space from your Workspace.
get_space_tags - View the task Tags available in a Space.
create_space_tag - Add a new task Tag to a Space.
delete_space_tag - Delete a task Tag from a Space.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
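&lt;p&gt;Rather than writing the summary by hand, it can be generated from the schemas themselves. A minimal sketch; &lt;code&gt;build_schema_summary&lt;/code&gt;, &lt;code&gt;build_system_prompt&lt;/code&gt; and the "Available functions:" header are hypothetical, and the article does not say exactly where in the prompt the summary was placed.&lt;/p&gt;

```python
def build_schema_summary(schemas):
    """One "name - description" line per function, as in the summary above."""
    return "\n".join(f"{s['name']} - {s['description']}" for s in schemas)


def build_system_prompt(base_prompt, schemas):
    # Hypothetical layout: the base instructions followed by the summary.
    summary = build_schema_summary(schemas)
    return base_prompt.rstrip() + "\n\nAvailable functions:\n" + summary + "\n"


schemas = [
    {"name": "get_spaces", "description": "View the Spaces available in a Workspace."},
    {"name": "create_space", "description": "Add a new Space to a Workspace."},
]

system_prompt = build_system_prompt(
    "You are an agent who is responsible for managing ClickUp.", schemas
)
print(system_prompt)
```

&lt;p&gt;Generating the summary from the same list passed as &lt;code&gt;tools&lt;/code&gt; keeps the two in sync when functions are renamed or re-described in the later steps.&lt;/p&gt;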



&lt;p&gt;&lt;u&gt;Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary. &lt;/u&gt;&lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt3.ipynb?ref=blog.composio.dev"&gt;&lt;u&gt;[Code Here]&lt;/u&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NEzsXOyn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NEzsXOyn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-5.png" alt="Improving Function Calling Accuracy" width="800" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;Optimising Function Names&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;Now, let's improve the schemas starting with more descriptive function names.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_func_name_dict = {
    "get_spaces": "get_all_clickup_spaces_available",
    "create_space": "create_a_new_clickup_space",
    "get_space": "get_a_specific_clickup_space_details",
    "update_space": "modify_an_existing_clickup_space",
    "delete_space": "delete_an_existing_clickup_space",
    "get_space_tags": "get_all_tags_of_a_clickup_space",
    "create_space_tag": "assign_a_tag_to_a_clickup_space",
    "delete_space_tag": "remove_a_tag_from_a_clickup_space",
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Replacing Current Function Names with Above&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;optimized_schema = []
for sc in flattened_schema:
    temp_dict = sc.copy()
    temp_dict["name"] = schema_func_name_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Code to replace the names in the schema&lt;/em&gt;&lt;/p&gt;
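&lt;p&gt;After renaming, the model answers with the descriptive names, so anything that executes or scores the call has to map them back to the original ClickUp action. A minimal sketch, assuming the mapping above; &lt;code&gt;original_name_for&lt;/code&gt; and &lt;code&gt;to_original_name&lt;/code&gt; are hypothetical names. It also checks each new name against the pattern the OpenAI API reference gives for function names (letters, digits, underscores and dashes, at most 64 characters).&lt;/p&gt;

```python
import re

schema_func_name_dict = {
    "get_spaces": "get_all_clickup_spaces_available",
    "create_space": "create_a_new_clickup_space",
    "get_space": "get_a_specific_clickup_space_details",
    "update_space": "modify_an_existing_clickup_space",
    "delete_space": "delete_an_existing_clickup_space",
    "get_space_tags": "get_all_tags_of_a_clickup_space",
    "create_space_tag": "assign_a_tag_to_a_clickup_space",
    "delete_space_tag": "remove_a_tag_from_a_clickup_space",
}

# Function-name rule from the OpenAI API reference: a-z, A-Z, 0-9,
# underscores and dashes, up to 64 characters.
VALID_NAME = re.compile(r"[a-zA-Z0-9_-]{1,64}")

for new_name in schema_func_name_dict.values():
    assert VALID_NAME.fullmatch(new_name), f"invalid function name: {new_name}"

# Invert the mapping to recover the original action from the model's call.
original_name_for = {new: old for old, new in schema_func_name_dict.items()}


def to_original_name(called_name):
    # Fall back to the name itself if it was never renamed.
    return original_name_for.get(called_name, called_name)
```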

&lt;p&gt;&lt;u&gt;Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + Function Names Optimised &lt;/u&gt;&lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt3_schemaOptimize.ipynb?ref=blog.composio.dev"&gt;&lt;u&gt;[Code Here]&lt;/u&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I13MvvBk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I13MvvBk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-4.png" alt="Improving Function Calling Accuracy" width="800" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;Optimising Function Description&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;Next, we rewrite the function descriptions to make them clearer and more focused.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_func_decription_dict = {
    "get_spaces": "Retrives information of all the spaces available in user's Clickup Workspace.",
    "create_space": "Creates a new ClickUp space",
    "get_space": "Retrives information of a specific Clickup space",
    "update_space": "Modifies name, settings the Space color, and assignee management Space.",
    "delete_space": "Delete an existing space from user's ClickUp Workspace",
    "get_space_tags": "Retrives all the Tags assigned on all the tasks in a Space.",
    "create_space_tag": "Assigns a customized Tag in a ClickUp Space.",
    "delete_space_tag": "Deletes a specific tag previously assigned in a space.",
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;New Descriptions&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And update the schema with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;optimized_schema = []
for sc in flattened_schema:
    temp_dict = sc.copy()
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Changing Schema&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + Function Names Optimised + Function Descriptions Optimised&lt;/u&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt3_schemaOptimize2.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;[Code Here]&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Cx8m_Cr4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Cx8m_Cr4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-3.png" alt="Improving Function Calling Accuracy" width="800" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;Optimising Function Parameter Descriptions&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;Earlier, when flattening the schema, we built each nested parameter's description by stacking its parents' descriptions onto its own.&lt;/p&gt;

&lt;p&gt;Let's now replace those stacked descriptions with hand-written ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_func_params_dict = {
    'create_space': {
        'features __due_dates__ enabled': 'If due date feature is enabled within the space. Default: True',
        'features __due_dates__ remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features __due_dates__ remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features __due_dates__ start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features __time_tracking__ enabled': 'If time tracking feature is available within the space. Default: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
        'name': 'The name of the new space to create',
        'team_id': 'The ID of the team'
        },
    'create_space_tag': {
        'space_id': 'The ID of the space',
        'tag__name': 'The name of the tag to assign',
        'tag__tag_bg': 'The background color of the tag to assign',
        'tag__tag_fg': 'The foreground(text) color of the tag to assign'
        },
    'delete_space': {
        'space_id': 'The ID of the space to delete'
        },
    'delete_space_tag': {
        'space_id': 'The ID of the space',
        'tag__name': 'The name of the tag to delete',
        'tag__tag_bg': 'The background color of the tag to delete',
        'tag__tag_fg': 'The foreground color of the tag to delete',
        'tag_name': 'The name of the tag to delete'
        },
    'get_space': {
        'space_id': 'The ID of the space to retrieve details'
        },
    'get_space_tags': {
        'space_id': 'The ID of the space to retrieve all the tags from'
        },
    'get_spaces': {
        'archived': 'A flag to decide whether to include archived spaces or not. Default: True',
        'team_id': 'The ID of the team'
        },
    'update_space': {
        'admin_can_manage': 'A flag to determine if the administrator can manage the space or not. Default: True',
        'color': 'The color used for the space',
        'features __due_dates__ enabled': 'If due date feature is enabled within the space. Default: True',
        'features __due_dates__ remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features __due_dates__ remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features __due_dates__ start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features __time_tracking__ enabled': 'If time tracking feature is available within the space. Default: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
        'name': 'The new name of the space',
        'private': 'A flag to determine if the space is private or not. Default: False',
        'space_id': 'The ID of the space'
        }
        }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And apply them to the previous schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;optimized_schema = []
for sc in flattened_schema:
    temp_dict = sc.copy()
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    for func_param_name, func_param_description in schema_func_params_dict[temp_dict["name"]].items():
        sc["parameters"]["properties"][func_param_name]["description"] = func_param_description
    optimized_schema.append(temp_dict)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
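&lt;p&gt;Two details are easy to trip over here. First, &lt;code&gt;dict.copy()&lt;/code&gt; is shallow, so the nested &lt;code&gt;parameters&lt;/code&gt; dict is shared between the copy and &lt;code&gt;flattened_schema&lt;/code&gt;. Second, a typo in a flattened parameter key only surfaces as a bare &lt;code&gt;KeyError&lt;/code&gt;. A minimal sketch that deep-copies, reports the offending key, and wraps the result in the &lt;code&gt;{"type": "function", "function": ...}&lt;/code&gt; form the Chat Completions &lt;code&gt;tools&lt;/code&gt; argument expects; &lt;code&gt;apply_param_descriptions&lt;/code&gt; and &lt;code&gt;to_openai_tools&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```python
import copy


def apply_param_descriptions(schema, overrides):
    """Return a copy of one flattened schema with parameter descriptions replaced.

    Raises KeyError naming the parameter if an override targets a key the
    schema does not have (e.g. a typo in a flattened name).
    """
    updated = copy.deepcopy(schema)
    properties = updated["parameters"]["properties"]
    for param_name, description in overrides.items():
        if param_name not in properties:
            raise KeyError(f"{schema['name']}: unknown parameter {param_name!r}")
        properties[param_name]["description"] = description
    return updated


def to_openai_tools(schemas):
    # The Chat Completions tools argument expects each function wrapped like this.
    return [{"type": "function", "function": s} for s in schemas]


schema = {
    "name": "delete_space",
    "description": "Delete an existing space from user's ClickUp Workspace",
    "parameters": {
        "type": "object",
        "properties": {"space_id": {"type": "string", "description": "The ID of the space__"}},
        "required": ["space_id"],
    },
}

updated = apply_param_descriptions(schema, {"space_id": "The ID of the space to delete"})
tools = to_openai_tools([updated])
```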



&lt;p&gt;&lt;u&gt;Benchmark after &lt;em&gt;Flattened Schema + Improved System Prompt containing Schema Summary&lt;/em&gt; + (Function Names + Function Descriptions + Parameter Descriptions) Optimised&lt;/u&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt3_schemaOptimize3.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;[Code Here]&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---cGYOusA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---cGYOusA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-2.png" alt="Improving Function Calling Accuracy" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Wow! Every run scored 75% or higher.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;u&gt;Adding Examples of Function Calls&lt;/u&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;LLMs often perform better when given examples of the expected response. Let's add examples and analyse the outcome.&lt;/p&gt;

&lt;p&gt;To start, we append an example call for each function to that function's description in the schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_func_decription_dict = {
    "get_spaces": """\
Retrives information of all the spaces available in user's Clickup Workspace. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
get_spaces({'team_id': 'a1b2c3d4', 'archived': False})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    """,
    "create_space": """\
Creates a new ClickUp space. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
create_space ({&lt;br&gt;
  'team_id': 'abc123',&lt;br&gt;
  'name': 'NewWorkspace',&lt;br&gt;
  'multiple_assignees': True,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; enabled': True,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; start_date': False,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; remap_due_dates': False,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; remap_closed_due_date': False,&lt;br&gt;
  'features &lt;strong&gt;time_tracking&lt;/strong&gt; enabled': True&lt;br&gt;
})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;""",
    "get_space": """\
Retrives information of a specific Clickup space. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
get_space({'space_id': 's12345'})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;""",
    "update_space": """\
Modifies name, settings the Space color, and assignee management Space. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
update_space({&lt;br&gt;
  'space_id': 's12345',&lt;br&gt;
  'name': 'UpdatedWorkspace',&lt;br&gt;
  'color': '#f0f0f0',&lt;br&gt;
  'private': True,&lt;br&gt;
  'admin_can_manage': False,&lt;br&gt;
  'multiple_assignees': True,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; enabled': True,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; start_date': False,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; remap_due_dates': False,&lt;br&gt;
  'features &lt;strong&gt;due_dates&lt;/strong&gt; remap_closed_due_date': False,&lt;br&gt;
  'features &lt;strong&gt;time_tracking&lt;/strong&gt; enabled': True&lt;br&gt;
})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
""",
    "delete_space": """\
Delete an existing space from user's ClickUp Workspace. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
delete_space({'space_id': 's12345'})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    """,
    "get_space_tags": """\
Retrives all the Tags assigned on all the tasks in a Space. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
get_space_tags({'space_id': 's12345'})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;""",
    "create_space_tag": """\
        Assigns a customized Tag in a ClickUp Space. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
create_space_tag({&lt;br&gt;
  'space_id': 's12345',&lt;br&gt;
  'tag_&lt;em&gt;name': 'Important',&lt;br&gt;
  'tag&lt;/em&gt;&lt;em&gt;tag_bg': '#ff0000',&lt;br&gt;
  'tag&lt;/em&gt;_tag_fg': '#ffffff'&lt;br&gt;
})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        """,
    "delete_space_tag": """\
    Deletes a specific tag previously assigned in a space. Example Call:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
delete_space_tag({&lt;br&gt;
  'space_id': 's12345',&lt;br&gt;
  'tag_name': 'Important',&lt;br&gt;
  'tag_&lt;em&gt;name': 'Important',&lt;br&gt;
  'tag&lt;/em&gt;&lt;em&gt;tag_bg': '#ff0000',&lt;br&gt;
  'tag&lt;/em&gt;_tag_fg': '#ffffff'&lt;br&gt;
})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    """,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
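&lt;p&gt;Since every entry above follows the same shape (description, "Example Call:", then the call in a fenced python block), the text could also be generated from a dict of sample arguments. A minimal sketch; &lt;code&gt;with_example_call&lt;/code&gt; is a hypothetical helper, and it relies on Python's &lt;code&gt;repr&lt;/code&gt; of a dict matching the quoting used in the hand-written examples.&lt;/p&gt;

```python
def with_example_call(description, func_name, example_args):
    """Append an "Example Call:" block in the format used in the dict above.

    Hypothetical helper: builds the text from a dict of sample arguments
    instead of writing each example by hand.
    """
    fence = "`" * 3  # a markdown code fence inside the description string
    lines = [f"{description} Example Call:", fence + "python"]
    lines.append(f"{func_name}({example_args!r})")
    lines.append(fence)
    return "\n".join(lines) + "\n"


text = with_example_call(
    "Retrieves information of a specific ClickUp space.",
    "get_space",
    {"space_id": "s12345"},
)
print(text)
```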



&lt;p&gt;And when we run the benchmark,&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + (Function Names + Function Descriptions + Parameter Descriptions) Optimised + Function Call Examples Added&lt;/u&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt3_schemaOptimize3_withExample1.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;[Code Here]&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pepVLRvF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pepVLRvF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image.png" alt="Improving Function Calling Accuracy" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sadly, the score seems to degrade!&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Example Parameter Values
&lt;/h2&gt;

&lt;p&gt;Since adding function call examples did not help, let's instead add sample values to the function parameters to give the model a clearer idea of the values to input. We adjust the descriptions of our function parameters accordingly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_func_params_dict = {
    'create_space': {
        'features __due_dates__ enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
        'features __due_dates__ remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features __due_dates__ remap_due_dates': 'If remapping due date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features __due_dates__ start_date': 'If start date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
        'features __time_tracking__ enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
        'name': 'The name of the new space to create \nExample: \'NewWorkspace\', \'TempWorkspace\'',
        'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\' '
        },
    'create_space_tag': {
        'space_id': 'The ID of the space \nExample: \'abc123\', \'def456\'',
        'tag__name': 'The name of the tag to assign \nExample: \'NewTag\', \'TempTag\'',
        'tag__tag_bg': 'The background color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\'',
        'tag__tag_fg': 'The foreground(text) color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\''
        },
    'delete_space': {
        'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\''
        },
    'delete_space_tag': {
        'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\'',
        'tag__name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\'',
        'tag__tag_bg': 'The background color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
        'tag__tag_fg': 'The foreground color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
        'tag_name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\''
        },
    'get_space': {
        'space_id': 'The ID of the space to retrieve details \nExample: \'abc123\', \'def456\''
        },
    'get_space_tags': {
        'space_id': 'The ID of the space to retrieve all the tags from \nExample: \'abc123\', \'def456\''
        },
    'get_spaces': {
        'archived': 'A flag to decide whether to include archived spaces or not \nExample: True, False. Default: True',
        'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\''
        },
    'update_space': {
        'admin_can_manage': 'A flag to determine if the administrator can manage the space or not \nExample: True, False. Default: True',
        'color': 'The color used for the space \nExample: \'#FF0000\', \'#00FF00\'',
        'features __due_dates__ enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
        'features __due_dates__ remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
        'features __due_dates__ remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
        'features __due_dates__ start_date': 'If start date feature in due dates is available within the space. Default: False',
        'features __time_tracking__ enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
        'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
        'name': 'The new name of the space \nExample: \'NewWorkspace\', \'TempWorkspace\'',
        'private': 'A flag to determine if the space is private or not \nExample: True, False. Default: False',
        'space_id': 'The ID of the space to update \nExample: \'abc123\', \'def456\''
        }
        }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And using these in the function schema, we get:&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Benchmark after Flattened Schema + Improved System Prompt containing Schema Summary + (Function Names + Function Descriptions + Parameter Descriptions) Optimised + Function Call Examples Added + Example Parameter Values Added&lt;/u&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/main_flattened_sysprompt3_schemaOptimize3_withExample2.ipynb?ref=blog.composio.dev"&gt;&lt;em&gt;[Code Here]&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6g7bPsaY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6g7bPsaY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/image-1.png" alt="Improving Function Calling Accuracy" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wow! The intuition behind adding examples pays off.&lt;/p&gt;
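&lt;p&gt;The example-laden parameter descriptions used in this step follow a regular pattern: the description, then an &lt;code&gt;Example:&lt;/code&gt; line, then an optional &lt;code&gt;Default:&lt;/code&gt; line. A minimal sketch of a hypothetical &lt;code&gt;describe_param&lt;/code&gt; helper that builds strings in that format; string examples are quoted via &lt;code&gt;repr&lt;/code&gt;, while booleans appear bare.&lt;/p&gt;

```python
def describe_param(text, examples=None, default=None):
    """Build a parameter description with sample values.

    Hypothetical helper: string examples are quoted with repr(), so
    'abc123' appears quoted while True/False appear bare.
    """
    parts = [text]
    if examples:
        parts.append("Example: " + ", ".join(repr(e) for e in examples))
    if default is not None:
        parts.append(f"Default: {default}")
    return " \n".join(parts)


team_id_desc = describe_param("The ID of the team", ["abc123", "def456"])
enabled_desc = describe_param(
    "If due date feature is enabled within the space.", [True, False], default=True
)
```

&lt;p&gt;Generating the strings this way would also avoid the small inconsistencies that creep into hand-written descriptions, such as some &lt;code&gt;update_space&lt;/code&gt; parameters missing their examples.&lt;/p&gt;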

&lt;h2&gt;
  
  
  &lt;u&gt;Compiling the Results&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;To summarise all our experiments and their results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--stDTwzaF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:1400/1%2ABVFL1NGl3778zHvrqmxk-w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--stDTwzaF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:1400/1%2ABVFL1NGl3778zHvrqmxk-w.png" alt="Improving Function Calling Accuracy" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We experimented with strategies to improve the function calling ability of LLMs, specifically for agentic software integrations. Starting from a baseline score of 36%, we raised performance to an average of 78%. We hope the insights shared in this article help improve your applications as well.&lt;/p&gt;

&lt;p&gt;We also found a key distinction between general function calling and function calling for software integrations. In general function calling, even when several functions are available, each call is independent and the calls need not follow any particular order. In software integrations, however, functions must be called in a specific sequence to accomplish an action.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;All the code for this article is available&lt;/em&gt; &lt;a href="https://github.com/SamparkAI/fcalling-clickup-blog-code?ref=blog.composio.dev"&gt;&lt;em&gt;here&lt;/em&gt;&lt;/a&gt;&lt;em&gt;. Thank you!&lt;/em&gt;
&lt;/h3&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;Further Experiments &amp;amp; Challenges&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;We have been experimenting with this for a while and plan to write further on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parallel Function Calling Accuracy&lt;/li&gt;
&lt;li&gt;Sequential Function Call Planning Accuracy (RAG + CoT)&lt;/li&gt;
&lt;li&gt;Comparison with Open Source Function Calling Models (&lt;a href="https://huggingface.co/gorilla-llm/gorilla-openfunctions-v1?ref=blog.composio.dev"&gt;Gorilla OpenFunctions&lt;/a&gt;, &lt;a href="https://github.com/MeetKai/functionary?ref=blog.composio.dev"&gt;Functionary&lt;/a&gt;, &lt;a href="https://github.com/nexusflowai/NexusRaven-V2?ref=blog.composio.dev"&gt;NexusRaven&lt;/a&gt;, and &lt;a href="https://huggingface.co/fireworks-ai/firefunction-v1?ref=blog.composio.dev"&gt;FireFunction&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Integration-centric function calls can be complex. For instance, the agent may need to gather data from several endpoints, such as &lt;code&gt;get_spaces_members&lt;/code&gt;, &lt;code&gt;get_current_active_members&lt;/code&gt;, and &lt;code&gt;get_member_whose_contract_is_over&lt;/code&gt;, before it can call the &lt;code&gt;update_member_list&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;In other words, the agent may need data that has not yet come up in the conversation, and must silently fetch it from other endpoints to formulate a complete response.&lt;/p&gt;
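&lt;p&gt;The member-list scenario above can be sketched as follows. Every function here is a hypothetical stub returning made-up data, and the rule for combining the results (keep members who are active and whose contract is not over) is our assumption for illustration. The point is that a request mentioning only &lt;code&gt;update_member_list&lt;/code&gt; still requires three silent lookups first.&lt;/p&gt;

```python
# Hypothetical stubs for the endpoints named above; the data is made up.
def get_spaces_members(space_id: str) -> set[str]:
    return {"alice", "bob", "carol", "dave"}


def get_current_active_members(space_id: str) -> set[str]:
    return {"alice", "bob", "carol"}


def get_member_whose_contract_is_over(space_id: str) -> set[str]:
    return {"carol"}


def update_member_list(space_id: str, members: list[str]) -> dict:
    return {"space_id": space_id, "members": members}


def refresh_member_list(space_id: str) -> dict:
    # The user only asked to "update the member list"; the agent must
    # gather these three inputs before it can fill in the final call.
    members = get_spaces_members(space_id)
    active = get_current_active_members(space_id)
    expired = get_member_whose_contract_is_over(space_id)
    # Assumed rule: active space members whose contract is still running.
    keep = sorted(members.intersection(active) - expired)
    return update_member_list(space_id, keep)


print(refresh_member_list("space-42"))
```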

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Optimisations like this are a crucial part of our efforts at&lt;/em&gt;&lt;/strong&gt; &lt;a href="https://www.composio.dev/?ref=blog.composio.dev"&gt;&lt;strong&gt;&lt;em&gt;Composio&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;&lt;em&gt;to make Agentic integrations smoother. If you are interested in improving the accuracy of your agents, connect with us at&lt;/em&gt;&lt;/strong&gt; &lt;strong&gt;&lt;em&gt;&lt;a href="mailto:hello@composio.dev"&gt;hello@composio.dev&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Subscribe if you are interested in learning more!&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Better interface between Agents &lt;--&gt; Tools</title>
      <dc:creator>Soham Ganatra</dc:creator>
      <pubDate>Sat, 02 Mar 2024 15:26:26 +0000</pubDate>
      <link>https://forem.com/composio/better-interface-between-agents-tools-3827</link>
      <guid>https://forem.com/composio/better-interface-between-agents-tools-3827</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--W79pBeYs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/NzPNNnq8Nf8XtN4XBLjlpuqF8-1.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--W79pBeYs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.composio.dev/content/images/2024/03/NzPNNnq8Nf8XtN4XBLjlpuqF8-1.webp" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are we working on?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’re on the cusp of a future where multiple AI agents work together and interact with diverse tools to carry out complex tasks. The rise of platforms for AI workflows and agent orchestration signals this shift. Yet these platforms face challenges: their integrations are limited in scope, variety, and reliability. Developers often have to wrestle with authentication and API specifications just to implement basic agentic use cases. This hampers seamless communication between agents and tools, a cornerstone of real-world applications.&lt;/p&gt;

&lt;p&gt;Our goal is to simplify this. By managing your integrations for you, we let you focus on building your agentic platform. We’re crafting the vital integration layer for AI agents, smoothing out the rough edges that slow down innovation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What can we offer now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our SDK offers over 90 connectors optimized for LLM tool actions and triggers, along with a customizable, white-label authentication experience. We also offer best-in-class reliability and detailed observability for each API call, so you don’t have to spend sleepless nights debugging faulty API calls.&lt;/p&gt;
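&lt;p&gt;As a generic illustration of what per-call observability can look like (this is not Composio’s SDK or API), here is a minimal Python decorator that records each tool call’s name, arguments, duration, and outcome. The &lt;code&gt;observed&lt;/code&gt; decorator, the &lt;code&gt;CALL_LOG&lt;/code&gt; list, and the &lt;code&gt;create_task&lt;/code&gt; stub are all hypothetical.&lt;/p&gt;

```python
import functools
import time

# In-memory sink for call records; a real system would ship these to a log store.
CALL_LOG: list[dict] = []


def observed(fn):
    """Record the name, arguments, duration, and outcome of every call to `fn`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"tool": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            record["status"] = "ok"
            return result
        except Exception as exc:
            record["status"] = "error"
            record["error"] = repr(exc)
            raise  # re-raise so callers still see the failure
        finally:
            # Runs on success and on error, so every call is logged.
            record["duration_s"] = time.perf_counter() - start
            CALL_LOG.append(record)
    return wrapper


@observed
def create_task(list_id: str, name: str) -> dict:
    # Hypothetical stand-in for a real API call.
    if not list_id:
        raise ValueError("list_id is required")
    return {"list_id": list_id, "name": name}


create_task("901234567", name="Write launch post")
try:
    create_task("", name="Oops")
except ValueError:
    pass

for entry in CALL_LOG:
    print(entry["tool"], entry["status"], entry.get("error", ""))
```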




</description>
    </item>
  </channel>
</rss>
