Vercel AI SDK v5 Internals - Part 8 — Server & Client Tool Execution Deep Dive

Yigit Konur

We've been exploring the Vercel AI SDK v5 canary in this series, and today we're tackling a big one: Tools. If you're building any kind of agentic behavior, or just need your AI to interact with the outside world (or your own app's functions), this is where the rubber meets the road. v5 brings some serious structure and developer experience improvements to how tools are defined, called, and represented in the UI.

As a quick reminder, we're building on concepts from previous posts: UIMessage and UIMessagePart (the new message anatomy), v5's streaming capabilities, V2 Model Interfaces (especially LanguageModelV2FunctionTool), and the overall client/server architecture. If those are new to you, you might want to glance back.

🖖🏿 A Note on Process & Curation: While I didn't personally write every word, this piece is a product of my dedicated curation. It's a new approach to content creation: I guided powerful AI tools (like Gemini Pro 2.5 for synthesis, working from a git diff of main vs. the canary v5 branch, informed by extensive research including OpenAI's Deep Research, with 10M+ tokens spent) to explore and articulate complex ideas. This method, including my fact-checking and refinement, aims to deliver depth and accuracy efficiently. I encourage you to see this as a potent blend of human oversight and AI capability. I use these tools for my own LLM chats on Thinkbuddy, and I do some of this editing and publishing there too.

Let's get into how v5 makes tool calls first-class citizens.

1. Tool Invocation Lifecycle Recap: From AI Request to UI Update

Before we dive into the v5 specifics, let's quickly refresh the general lifecycle of an AI tool call, as this sets the stage for understanding the structured approach v5 brings.

Why this matters?

Tool calling is a fundamental capability for making LLMs more than just text generators. Tools allow models to fetch real-time data, perform actions, or interact with other systems. Historically, managing this interaction – the request from the AI, executing the tool, getting the result back to the AI, and updating the UI – could be a bit loose, often involving custom logic and parsing.

How it generally works (Pre-v5 context):

  1. AI Decides to Use a Tool: LLM processes input, determines need for external help, decides to call a specific function/tool.
  2. AI Specifies Tool and Arguments: LLM generates a request specifying toolName and args (often JSON).
  3. Application Executes Tool: Your application (server-side or client-side) receives the request and executes the named tool with provided arguments.
  4. Result Fed Back to AI: Tool output (data, confirmation, or error) is packaged and sent back to the LLM.
  5. AI Generates Final Response: LLM incorporates tool result and formulates its final response to the user.

v5 Enhancement Teaser:

This general flow remains, but Vercel AI SDK v5 provides a much more structured, typed, and integrated way to represent and manage this entire lifecycle, especially within the chat UI (using ToolInvocationUIPart) and the data flow.

Take-aways / Migration Checklist Bullets

  • Tool calling involves the AI requesting, your app executing, and the AI using the result.
  • v5 brings enhanced structure to this lifecycle, especially for UI and data flow.

2. Defining V2 LanguageModelV2FunctionTool with Zod (Server-Side)

To empower your LLM with tools in Vercel AI SDK v5, you first need to define them on the server-side using the LanguageModelV2FunctionTool interface, with Zod playing a crucial role in schema definition and argument validation.

Why this matters?

LLMs need clear, machine-readable definitions of tools: name, description, and argument schema. Without this, calls fail. v5 standardizes this with LanguageModelV2FunctionTool and strongly encourages Zod for schema validation.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

Tools are defined using LanguageModelV2FunctionTool from @ai-sdk/provider.

// From packages/provider/src/language-model/v2/language-model-v2.ts
export interface LanguageModelV2FunctionTool<NAME extends string, ARGS, RESULT> {
  readonly type: 'function';
  readonly function: {
    readonly name: NAME;
    readonly description?: string;
    readonly parameters: ZodSchema<ARGS>; // Typically a Zod schema
    readonly execute?: (args: ARGS) => PromiseLike<RESULT>; // Optional server-side execution
  };
}

+---------------------------------+
| LanguageModelV2FunctionTool     |
|---------------------------------|
| .type: 'function'               |
| .function: {                    |
|    .name: string (ToolName)     |
|    .description?: string        |
|    .parameters: ZodSchema<ARGS> | <--- Zod for arg schema & validation
|    .execute?: (ARGS)=>Promise<RESULT> | <--- Optional server-side logic
| }                               |
+---------------------------------+

[FIGURE 1: Diagram showing the structure of LanguageModelV2FunctionTool and its fields]

Key fields within function:

  • name: NAME (string): Unique name LLM uses to call the tool.
  • description?: string: Natural language description for the LLM (what it does, when to use it). Good descriptions are crucial for correct tool usage.
  • parameters: ZodSchema<ARGS>: Zod schema for arguments.
    • Why Zod? Provides static typing and runtime validation. SDK often converts to JSON Schema for the LLM.
    • Automatic Validation: If tool has server-side execute, SDK uses this Zod schema to validate LLM-provided arguments before execute is called. This is key for security and robustness.
  • execute?: (args: ARGS) => PromiseLike<RESULT>: Optional server-side function.
    • If provided, SDK can auto-run the tool. Receives validated args.
    • If not provided (or for client-side tools), tool call info is streamed to client.

Example with Zod:

import { z } from 'zod';
import { LanguageModelV2FunctionTool } from '@ai-sdk/provider';

const weatherParamsSchema = z.object({
  city: z.string().describe("The city, e.g., 'San Francisco'"),
  unit: z.enum(['celsius', 'fahrenheit']).optional().default('celsius'),
});
type WeatherParams = z.infer<typeof weatherParamsSchema>;

const getWeatherTool: LanguageModelV2FunctionTool<'getWeatherInformation', WeatherParams, string> = {
  type: 'function',
  function: {
    name: 'getWeatherInformation',
    description: 'Fetches current weather for a city.',
    parameters: weatherParamsSchema,
    execute: async ({ city, unit }) => {
      // ... (call weather API or simulate) ...
      return `Weather in ${city} is X°${unit === 'celsius' ? 'C' : 'F'}.`;
    },
  },
};

// Usage with streamText:
// await streamText({
//   model: openai('gpt-4o-mini'),
//   messages: modelMessages,
//   tools: { getWeatherInformation: getWeatherTool },
//   toolChoice: 'auto'
// });

Hints added via .describe() on Zod properties help the LLM generate accurate arguments.

Take-aways / Migration Checklist Bullets

  • Define server-side tools using LanguageModelV2FunctionTool.
  • Use Zod schemas for function.parameters for argument definition and auto-validation.
  • function.execute is optional for server-side execution.
  • Write clear function.description. Use .describe() on Zod schema properties.
  • Update V4 tool definitions to this V2 structure.

3. Server-side Tool Execution Flow (with execute function)

When an LLM calls a server-defined tool that includes an execute function, Vercel AI SDK v5 orchestrates a seamless flow: validating arguments, running your function, and then automatically feeding the results back to the LLM for further processing, all while keeping the client UI informed via streamed updates.

Why this matters?

The automated server-side tool execution flow saves you boilerplate: parsing arguments, calling your tool code, formatting results, and constructing the follow-up messages for the LLM.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

Flow when streamText uses a tool with an execute method (a minimal route-handler sketch follows these steps):

  1. LLM Decides to Call Tool: Generates tool call request (toolName, args).
  2. V2 Provider Adapter Parses: Extracts tool call info from LLM response.
  3. AI SDK Core Logic (in streamText):

    • Identifies Tool.
    • Validates Arguments: Uses tool's Zod schema from parameters to validate LLM-provided args. If fails, error (e.g., InvalidToolArgumentsError) generated.
    • Calls execute(args): If valid, calls your tool's execute with typed, validated args.
    • Awaits Result: SDK awaits the Promise from execute.
    +---------+   Tool Call Req   +-----------------+   Validated Args   +-----------------+   Tool Result   +-----------------+
    |   LLM   |------------------>| SDK Core        |------------------->| Tool's .execute |---------------->| SDK Core        |
    |         |                   | (Arg Validation)|                    | Function        |                 | (Gets Result)   |
    +---------+                   +-----------------+                    +-----------------+                 +-----------------+
    

    [FIGURE 2: Sequence diagram of server-side tool execution: LLM -> SDK -> Validate Args -> Execute Tool -> Get Result -> SDK]

  4. Result Sent Back to LLM (for multi-step):

    • SDK constructs ModelMessage (role: 'tool') with LanguageModelV2ToolResultPart(s) (containing toolCallId, toolName, result).
    • If streamText is configured for multi-step (e.g., maxSteps > 1), SDK automatically sends this tool message back to LLM in the same streamText operation.
    • LLM processes result, generates next response (text or another tool call).
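
Putting these steps together, here's a minimal route-handler sketch. It assumes a Next.js-style POST handler, the getWeatherTool from Section 2 living in a local ./tools module, and convertToModelMessages as the v5 helper that turns the client's UIMessages into ModelMessages; adjust names to your installed canary version.

// app/api/chat/route.ts - a minimal sketch, not a definitive implementation.
// convertToModelMessages is assumed to be the v5 helper for UIMessage -> ModelMessage conversion.
import { streamText, convertToModelMessages } from 'ai';
import { openai } from '@ai-sdk/openai';
import { getWeatherTool } from './tools'; // hypothetical module exporting the tool from Section 2

export async function POST(req: Request) {
  const { messages } = await req.json(); // UIMessage[] sent by useChat

  const result = await streamText({
    model: openai('gpt-4o-mini'),
    messages: convertToModelMessages(messages),
    tools: { getWeatherInformation: getWeatherTool },
    toolChoice: 'auto',
    maxSteps: 5, // let the SDK feed tool results back to the LLM automatically
  });

  // Streams text, 'tool-call', and 'tool-result' parts to the client as a v5 UI Message Stream.
  return result.toUIMessageStreamResponse();
}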

3.1 Auto-stream of tool-call and tool-result parts to the client

Simultaneously, result.toUIMessageStreamResponse() streams updates to client UI:

  • Streaming Tool Call Intention:
    • 'tool-call-delta' UIMessageStreamPart (if toolCallStreaming enabled) then 'tool-call' part (with toolCallId, toolName, args).
    • Client useChat creates/updates ToolInvocationUIPart (state: 'call').
    • UI shows AI using tool, e.g., "AI is using getWeatherInformation...".
  • Streaming Tool Result:

    • After server execute completes, 'tool-result' UIMessageStreamPart streamed (with toolCallId, toolName, result).
    • Client ToolInvocationUIPart updates (state: 'result', populated with result).
    • UI shows tool outcome.
    Server (streamText & toUIMessageStreamResponse)        Client (useChat & processUIMessageStream)
    -----------------------------------------------        -----------------------------------------
    1. LLM emits tool_call(getWeather, {city:"L"})
           |
           v
    2. SDK: 'tool-call-delta' sent to client  -------------> Client UI: Display "Tool: getWeather, Args: {city:\"L\"..." (state: 'partial-call')
           |
           v
    3. SDK: 'tool-call' sent to client (full args) -------> Client UI: Update ToolInvocationUIPart (state: 'call')
           |
           v
    4. SDK: Server executes getWeatherTool.execute()
           | (result = "Rainy...")
           v
    5. SDK: 'tool-result' sent to client (result) --------> Client UI: Update ToolInvocationUIPart with result (state: 'result')
           |
           v
    6. SDK: Result fed back to LLM (if maxSteps > 1)
           |
           v
    7. LLM: Generates final text based on tool result
           |
           v
    8. SDK: 'text' parts sent to client ------------------> Client UI: Display final text message
    

    [FIGURE 3: Diagram showing UIMessageStreamParts ('tool-call', 'tool-result') flowing to the client, updating a ToolInvocationUIPart]
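
On the client, rendering these parts might look like the sketch below. The exact part shape (type: 'tool-invocation' with a toolInvocation object carrying state, toolName, args, result, and errorMessage) is an assumption about the v5 canary's UIMessagePart naming; adapt the property names to your installed version.

// Sketch: rendering ToolInvocationUIParts inside an assistant message.
// The part/toolInvocation property names are assumptions about the v5 canary.
function AssistantMessageParts({ message }: { message: any /* UIMessage */ }) {
  return (
    <>
      {message.parts.map((part: any, i: number) => {
        if (part.type === 'text') return <p key={i}>{part.text}</p>;
        if (part.type === 'tool-invocation') {
          const inv = part.toolInvocation;
          switch (inv.state) {
            case 'partial-call':
            case 'call':
              return <p key={i}>Using {inv.toolName}…</p>;
            case 'result':
              return <pre key={i}>{JSON.stringify(inv.result, null, 2)}</pre>;
            case 'error':
              return <p key={i}>Tool {inv.toolName} failed: {inv.errorMessage}</p>;
          }
        }
        return null;
      })}
    </>
  );
}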

3.2 Injecting results for model follow-up (Automatic if maxSteps > 1)

  • If maxSteps (e.g., maxSteps: 5) allows, SDK appends tool role ModelMessage (with tool output) to internal history and calls LLM again.
  • Loop (LLM -> tool call -> execute -> tool result -> LLM -> text/tool call) continues until final text or maxSteps limit.
  • All intermediate steps streamed to client.

Take-aways / Migration Checklist Bullets

  • If server tool has execute, streamText can auto-validate args & run it.
  • Arg validation uses Zod schema from LanguageModelV2FunctionTool.parameters.
  • SDK streams 'tool-call' and 'tool-result' UIMessageStreamParts to client, updating ToolInvocationUIPart.
  • If maxSteps > 1, SDK auto-sends tool results back to LLM for continued processing.

4. Client-side onToolCall for Browser-Based Tools

Sometimes, a tool needs to be executed directly in the user's browser—to access browser APIs like geolocation, interact with a browser extension, or simply to ask the user for a confirmation via window.confirm. Vercel AI SDK v5's useChat hook facilitates this through its onToolCall prop.

Why this matters?

Not all tools run on servers (e.g., browser navigator.geolocation, window.confirm). v5's onToolCall in useChat handles these.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

Use onToolCall callback prop in useChat.
Scenario: LLM calls a tool.

  • Case 1: The tool is defined on the server without an execute function.
  • Case 2: The LLM calls a tool that isn't defined on the server at all (a client-only handler).

In both cases, the server streams a 'tool-call' UIMessageStreamPart. The client useChat processes it, creating a ToolInvocationUIPart (state: 'call'), and then your onToolCall callback runs.

// Client-side React component
import { useChat, UIMessage, ToolCall } from '@ai-sdk/react';

function MyChatComponent() {
  const { messages, addToolResult /* ...etc */ } = useChat({
    onToolCall: async ({ toolCall }: { toolCall: ToolCall }) => {
      // toolCall: { toolCallId: string, toolName: string, args: any }
      if (toolCall.toolName === 'getClientGeolocation') {
        try {
          const position = await new Promise<GeolocationPosition>((res, rej) => navigator.geolocation.getCurrentPosition(res, rej));
          return { toolCallId: toolCall.toolCallId, result: { lat: position.coords.latitude, lon: position.coords.longitude } };
        } catch (error: any) {
          return { toolCallId: toolCall.toolCallId, error: error.message };
        }
      }
      // ... handle other client tools ...
      return { toolCallId: toolCall.toolCallId, error: `Tool '${toolCall.toolName}' not handled.` };
    }
  });
  // ... render UI ...
}

+-------------------+   Streams 'tool-call'   +-------------------+   Invokes callback   +-------------------+
| Server (LLM)      |------------------------>| Client (useChat)  |--------------------->| onToolCall(       |
| (decides to call  |                         | (receives part,   |                      |  toolCall         |
|  client tool)     |                         |  updates UI msg)  |                      | )                 |
+-------------------+                         +-------------------+                      +-------------------+
                                                        ^                                       |
                                                        | (updates ToolInvocationUIPart         | . Returns {toolCallId, result/error}
                                                        |  state to 'result'/'error')           v
                                                        +---------------------------------------+
                                                        | (Optional: if maxSteps, useChat auto-resubmits to server with tool result)

[FIGURE 4: Diagram showing client-side onToolCall flow: Server streams 'tool-call' -> Client useChat -> onToolCall executes -> Result updates ToolInvocationUIPart -> (Optional) Resubmit to server]

onToolCall Receives: { toolCall: ToolCall } with toolCallId, toolName, args.
onToolCall Must Return: Promise<{ toolCallId: string; result?: any; error?: string; }> (must include original toolCallId).

SDK Action After onToolCall Completes:

  1. SDK takes returned result/error.
  2. Updates ToolInvocationUIPart in UIMessage (state: 'result' or 'error').
  3. If maxSteps in useChat allows and all pending tools resolved, useChat auto-resubmits messages (with client tool results) to server. Server/LLM continues.

4.1 Browser APIs (geolocation example)

onToolCall allows awaiting async browser APIs like navigator.geolocation.

4.2 UX patterns for confirmation dialogs (using addToolResult for manual submission)

This pattern applies to tools that need user interaction after the AI's request (e.g., "The AI wants to book a flight. [Confirm] [Cancel]"); a sketch follows the steps below.

  1. LLM streams 'tool-call' for, e.g., requestConfirmation.
  2. UI renders ToolInvocationUIPart (state: 'call') with Confirm/Cancel buttons. onToolCall for this tool might do nothing or just log.
  3. User clicks button. Handler calls addToolResult({ toolCallId, result: "User confirmed." }).
  4. addToolResult updates ToolInvocationUIPart state. If maxSteps allows, useChat auto-resubmits.
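
A minimal sketch of that confirmation UI, reusing the toolInvocation shape assumed in Section 3.1 (the requestConfirmation tool name and args.message field are illustrative):

// Sketch: rendering a 'call'-state requestConfirmation tool with Confirm/Cancel buttons.
// `addToolResult` comes from useChat(); the toolInvocation shape is an assumption.
function ConfirmationToolPart({
  inv,
  addToolResult,
}: {
  inv: { toolCallId: string; toolName: string; state: string; args: { message?: string } };
  addToolResult: (r: { toolCallId: string; result: any }) => void;
}) {
  if (inv.toolName !== 'requestConfirmation' || inv.state !== 'call') return null;
  return (
    <div>
      <p>{inv.args.message ?? 'The assistant wants to proceed. Confirm?'}</p>
      <button onClick={() => addToolResult({ toolCallId: inv.toolCallId, result: 'User confirmed.' })}>
        Confirm
      </button>
      <button onClick={() => addToolResult({ toolCallId: inv.toolCallId, result: 'User cancelled.' })}>
        Cancel
      </button>
    </div>
  );
}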

Take-aways / Migration Checklist Bullets

  • Use useChat's onToolCall for browser-executed tools.
  • onToolCall receives { toolCall: ToolCall }, returns Promise<{ toolCallId, result?, error? }>.
  • Return original toolCallId.
  • SDK updates ToolInvocationUIPart. If maxSteps allows, auto-resubmits to server.
  • For UI-driven confirmations, render 'call' state, use addToolResult() from button handlers.

5. Error Propagation & Recovery in Tool Calls

Tool calls, like any external interaction, can fail. Vercel AI SDK v5 provides mechanisms for these errors to propagate through the system and offers strategies for recovery, ensuring your application can handle hiccups gracefully.

Why this matters?
Tool calls can fail (invalid LLM args, tool execution error, tool not found). Robust apps need to catch, display, and offer recovery.

How it’s solved in v5? (Step-by-step, Code, Diagrams)
v5 surfaces tool errors via specific error types and ToolInvocationUIPart.state.

  1. Schema Validation Errors (InvalidToolArgumentsError):

    • LLM args don't match Zod schema in LanguageModelV2FunctionTool.parameters.
    • SDK (server-side) auto-validates before execute. If fails, error like InvalidToolArgumentsError thrown.
    • Error streamed to client as 'tool-error' UIMessageStreamPart (with toolCallId, toolName, errorMessage).
    • ToolInvocationUIPart on client updates to state: 'error', errorMessage populated.
  2. Tool Execution Errors (ToolExecutionError):

    • Server-side execute throws unhandled exception (e.g., external API down). SDK catches, often wraps as ToolExecutionError. Streamed as 'tool-error'. ToolInvocationUIPart updates to state: 'error'.
    • Client-side onToolCall throws or returns { toolCallId, error: "..." }. useChat updates ToolInvocationUIPart to state: 'error'.
  3. Tool Not Found Errors (NoSuchToolError):

    • LLM calls tool not defined in streamText's tools or not handled by client onToolCall. SDK recognizes, results in NoSuchToolError. Streamed as 'tool-error' or general stream error.
  4. SDK Error Handling & Repair (Experimental - ToolCallRepairError):

    • V4 had experimental_repairToolCall to fix invalid tool calls. If this mechanism persists or is enhanced in v5 and the repair itself fails, a ToolCallRepairError might occur.
    Error Points in Tool Call Lifecycle:
    
    1. LLM -> Args -> [SDK: Zod Validation] --(FAIL)--> InvalidToolArgumentsError
                                | (PASS)
                                v
    2. SDK -> Tool.execute() -> [Tool Logic] --(FAIL)--> ToolExecutionError (or custom error in execute)
                                | (PASS)
                                v
    3. Tool -> Result
    
    (If LLM calls non-existent tool --> NoSuchToolError)
    
    All these errors are typically streamed to client as 'tool-error' UIMessageStreamPart,
    updating the ToolInvocationUIPart.state to 'error'.
    

    [FIGURE 5: Flowchart showing different points where tool errors can occur and how they propagate to the UI]

Recovery Strategies:

  • Retry with reload(): useChat().reload() resends last user message. LLM might try tool again (maybe with corrected args), choose different tool, or answer without tool.
  • AI Self-Correction (multi-step): If ToolInvocationUIPart error (with message) is sent back to LLM, sophisticated LLM might understand error and retry tool with corrected args or try alternative.
  • Clear UI Feedback: Crucial. UI must display ToolInvocationUIPart errors from errorMessage. Helps user rephrase or retry.
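
For the user-driven retry path, a small sketch (detecting the error uses the toolInvocation shape assumed in Section 3.1; `reload` comes from useChat):

// Sketch: offer a retry when the last assistant turn ended with a tool error.
function RetryOnToolError({ messages, reload }: { messages: any[]; reload: () => void }) {
  const last = messages[messages.length - 1];
  const hadToolError = last?.parts?.some(
    (p: any) => p.type === 'tool-invocation' && p.toolInvocation.state === 'error',
  );
  if (!hadToolError) return null;
  return <button onClick={() => reload()}>A tool call failed. Retry?</button>;
}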

Take-aways / Migration Checklist Bullets

  • Anticipate errors: LLM arg errors, tool execution failures, tool not found.
  • v5 surfaces these via specific error types and ToolInvocationUIPart.state to 'error' with errorMessage.
  • Errors typically streamed via 'tool-error' UIMessageStreamParts.
  • Implement UI rendering for ToolInvocationUIPart error state.
  • Use useChat().reload() for user-driven retries.
  • Informative error messages to LLM can enable AI self-correction.

6. Security – Validating Args & Results (Crucial Emphasis)

When integrating tools with LLMs, security is paramount. Always validate arguments provided by the LLM before tool execution and sanitize results from tools before displaying them or feeding them back to the LLM. Vercel AI SDK v5's emphasis on Zod schemas for tool parameters is a key enabler for input validation.

Why this matters?
CRUCIAL EMPHASIS. LLMs generate text; unchecked arguments or unsanitized tool results can lead to vulnerabilities (SQL injection, XSS, DoS via large payloads). Prompt injection is a real risk.

How it’s solved in v5? (Practices & SDK Features)
v5's LanguageModelV2FunctionTool encourages practices to mitigate risks.

6.1 Validating LLM-Generated Arguments (Input Validation for Tools)

  • Golden Rule: Always validate LLM-generated arguments before tool execution.
  • Zod Schemas:
    • For server tools with execute, SDK auto-validates LLM args against Zod schema in LanguageModelV2FunctionTool.parameters. If fails, error raised, execute not called with bad data. Leverage fully with strict schemas.
  • Manual Validation for Client-Side Tools (onToolCall):
    • In your client onToolCall, if the tool is critical or its arguments are complex, manually validate toolCall.args against a Zod schema (see the sketch after this list).
  • Preventing Injections: Strict schema validation is one defense against prompt injection leading to malicious args (e.g., SQL injection). Secure coding in tool (e.g., parameterized queries) is another.
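
A sketch of that manual client-side validation, using Zod's safeParse (the getClientGeolocation tool and its argument schema are illustrative):

// Sketch: validate LLM-provided args on the client before touching browser APIs.
import { z } from 'zod';

const geoArgsSchema = z.object({
  highAccuracy: z.boolean().optional().default(false),
});

// Call this from useChat's onToolCall callback.
async function handleClientTool(toolCall: { toolCallId: string; toolName: string; args: unknown }) {
  if (toolCall.toolName === 'getClientGeolocation') {
    const parsed = geoArgsSchema.safeParse(toolCall.args);
    if (!parsed.success) {
      // Reject bad args instead of passing them through.
      return { toolCallId: toolCall.toolCallId, error: `Invalid args: ${parsed.error.message}` };
    }
    // ...proceed with navigator.geolocation using parsed.data.highAccuracy...
  }
  return { toolCallId: toolCall.toolCallId, error: `Tool '${toolCall.toolName}' not handled.` };
}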

6.2 Sanitizing/Validating Tool Results (Output Validation from Tools)

  • Golden Rule: If tool results from untrusted sources, validate/sanitize before displaying in UI (prevent XSS) or sending back to LLM (prevent input poisoning/DoS).
  • Examples:

    • XSS: Tool returns title "<img src=x onerror=alert(1)>". Render as plain text or use HTML sanitizer (DOMPurify).
    • LLM Input Poisoning/DoS: Tool returns 10MB random chars. Validate structure/size, truncate, sanitize control chars before sending to LLM.
    +----------+  args   +---------------------+  result  +---------------------+
    |   LLM    |-------->| Tool Execution      |<---------| External API/Data   |
    +----------+         | (Your Code)         |          +---------------------+
        ^                |                     |                   ^
        | (result_clean) | - INPUT VALIDATION  |                   | (potentially unsafe data)
        |                |   (Zod on args)     |                   |
        |                | - OUTPUT VALIDATION/|                   |
        +----------------|   SANITIZATION      |-------------------+
                         |   (on result)       |
                         +---------------------+
    

    [FIGURE 6: Diagram illustrating the two-way validation: LLM args -> Tool (Input Validation) and Tool result -> UI/LLM (Output Validation/Sanitization)]
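
A minimal sketch of output-side hygiene before a tool result reaches the UI or the LLM (DOMPurify and the character cap are illustrative choices, not SDK features):

// Sketch: sanitize and bound a tool result before rendering it or re-prompting the LLM.
import DOMPurify from 'dompurify';

const MAX_RESULT_CHARS = 8_000; // illustrative cap to avoid flooding the LLM context

export function sanitizeToolResult(raw: string): string {
  // Strip all HTML/script content so the UI can render the result as plain text.
  const safeText = DOMPurify.sanitize(raw, { ALLOWED_TAGS: [] });
  // Truncate oversized payloads before they go back to the model.
  return safeText.length > MAX_RESULT_CHARS
    ? safeText.slice(0, MAX_RESULT_CHARS) + ' …[truncated]'
    : safeText;
}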

Take-aways / Migration Checklist Bullets

  • SECURITY IS PARAMOUNT for tools.
  • ALWAYS validate LLM-generated arguments. Use Zod for server tools. Manually validate in client onToolCall.
  • ALWAYS validate/sanitize tool results before UI render (XSS) or sending to LLM.
  • Be careful with tools executing code, DB queries, file system access. Use least privilege.

7. Composing Multi-Step Chains (Server-Side and Client-Involved)

Vercel AI SDK v5 excels at facilitating multi-step conversational flows where the AI might call a tool, get a result, then call another tool or generate text, all within a single user turn. This is powered by the maxSteps option on both the server (streamText) and client (useChat), along with the structured streaming of tool interactions.

Why this matters?
Conversational agents often need sequential actions/reasoning (e.g., "Weather in French capital? Find bistro there."). v5 simplifies orchestrating these chains.

How it’s solved in v5? (Step-by-step, Code, Diagrams)

7.1 maxSteps in streamText (Server-Side Multi-Step)

  • Scenario: Call streamText with server tools (with execute) and maxSteps > 1 (e.g., maxSteps: 5).
  • SDK Orchestration:
    1. User msg -> streamText.
    2. LLM (Turn 1) -> calls ServerToolA.
    3. SDK: Validates args, executes ServerToolA, gets resultA.
    4. SDK: Constructs tool ModelMessage with resultA, auto-sends back to LLM (same streamText op).
    5. LLM (Turn 2): Processes resultA. Might gen final text, or call ServerToolB.
    6. Loop continues until final text or maxSteps.
  • Streaming: All intermediate 'tool-call', 'tool-result' parts streamed to client UI.

    Server-Side Multi-Step (maxSteps > 1):
    User Prompt --> [streamText Call] --> LLM
                                          | (decides ToolA)
                                          v
                       SDK: Executes ToolA --> ResultA
                                          |
                                          v (ResultA fed back)
                                         LLM
                                          | (decides ToolB or Final Text)
                                          v
                       SDK: Executes ToolB --> ResultB (if ToolB called)
                                          |
                                          v (ResultB fed back)
                                         LLM --> Final Text Response
    (Each tool call and result is streamed to client as UIMessageStreamParts)
    

    [FIGURE 7: Diagram of server-side multi-step flow with maxSteps: LLM -> ToolA -> ResultA -> LLM -> ToolB -> ResultB -> LLM -> Final Text. All streamed to client.]

7.2 maxSteps in useChat (Client-Involved Multi-Step)

  • A useChat client option. Controls how many rounds of (user msg -> server/LLM -> client tool -> server/LLM -> client response/tool) happen automatically; a configuration sketch follows the flow below.
  • Flow with Client Tools (onToolCall):
    1. User msg -> useChat POSTs to server.
    2. Server (streamText): LLM calls ClientToolX. Server streams 'tool-call' for ClientToolX.
    3. Client (useChat): onToolCall for ClientToolX runs, returns resultX. ToolInvocationUIPart updates.
    4. Client (useChat Auto Resubmit): If maxSteps allows & all tools resolved, useChat auto-POSTs updated messages (with resultX as tool role msg) back to server.
    5. Server (Again): streamText runs with resultX in history.
    6. LLM: Processes resultX, generates final text or another tool call.
    7. Loop continues until final text or useChat maxSteps.
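
Wiring this up on the client is mostly configuration; a minimal sketch below (the api and maxSteps options follow this post's description of the v5 canary and may differ in your installed version):

// Sketch: client-involved multi-step. After onToolCall resolves, useChat can
// auto-resubmit the updated messages to the server when maxSteps allows.
import { useChat } from '@ai-sdk/react';

function MultiStepChat() {
  const { messages } = useChat({
    api: '/api/chat',
    maxSteps: 5, // allow several server <-> client tool round trips per user turn
    onToolCall: async ({ toolCall }: { toolCall: { toolCallId: string; toolName: string; args: unknown } }) => {
      // ...handle browser-executed tools as in Section 4, returning { toolCallId, result }...
      return { toolCallId: toolCall.toolCallId, error: `Unhandled tool: ${toolCall.toolName}` };
    },
  });
  // ... render messages, including ToolInvocationUIParts ...
  return <div>{messages.length} messages</div>;
}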

7.3 StepStartUIPart for UI Delineation

  • streamText().toUIMessageStreamResponse() might auto-insert 'step-start' UIMessageStreamParts into v5 UI Message Stream during multi-step tool calls.
  • Purpose: Marker parts { type: 'step-start'; }.
  • Function: Indicate new logical "step" in AI's process.
  • UI Rendering: Render as visual separator (e.g., <hr>, "Step 2:"). Helps user follow complex AI actions.
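
A tiny rendering sketch for these marker parts (the 'step-start' type string follows the description above; the <hr> choice is illustrative):

// Sketch: render 'step-start' parts as visual separators between AI steps.
function StepSeparators({ parts }: { parts: Array<{ type: string }> }) {
  return (
    <>
      {parts.map((part, i) => (part.type === 'step-start' ? <hr key={i} /> : null))}
    </>
  );
}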

Take-aways / Migration Checklist Bullets

  • Use maxSteps in server streamText for automated multi-step server tool chains.
  • Use maxSteps in client useChat for auto-resubmission of client tool results.
  • SDK streams intermediate tool calls/results as ToolInvocationUIParts.
  • Look for StepStartUIPart to visually delineate steps.

8. Showcase: Calendar-booking Wizard (Conceptual Walkthrough)

To illustrate the power of v5's structured tool handling and multi-step chains, let's imagine building a conceptual calendar-booking assistant. This example will highlight how server-side tools, client-side interactions, and UI updates come together.

Why this matters?
Booking meetings involves multiple steps (check availability, present options, confirm). Hard with simple request-response LLM. v5 helps AI guide user through this.

The Scenario: User: "Book a 1-hour meeting with Jane for next Tuesday afternoon."

Conceptual Tools:

  1. checkAvailability(person, dateRange, durationHours) (Server-side, execute fn): Queries calendar, returns available slots.
  2. displaySlotOptions(slots, prompt) (Client-side, UI render + addToolResult): AI calls to tell UI to show slots as clickable options. User click provides result.
  3. confirmBooking(person, selectedSlot, durationHours) (Server-side, execute fn): Books meeting.

Walkthrough (v5 Features Highlighted, maxSteps: 5 assumed):

  1. User: "Book..." -> Client useChat POSTs.
  2. AI (Turn 1 - Server): LLM -> calls checkAvailability. SDK validates, executes. Client UI: Shows ToolInvocationUIPart "Checking Jane's availability...".
  3. Server (checkAvailability): Returns result: ["Tue 2PM", "Tue 3PM"]. SDK auto-sends to LLM. Client UI: ToolInvocationUIPart updates "Found 2 slots.".
  4. AI (Turn 2 - Server): LLM -> calls displaySlotOptions(slots: ["Tue 2PM", "Tue 3PM"], prompt: "Select time:"). No server execute, so streams 'tool-call' to client. Client UI: New ToolInvocationUIPart "Select time:" with buttons [Tue 2PM] [Tue 3PM].

    Chat UI - AI Turn:
    -------------------------------------------------
    AI: Okay, I found these times for Jane next Tuesday. Which one works for you?
        [Tool: displaySlotOptions - awaiting user input]
        [ Button: Tuesday 2:00 PM ]
        [ Button: Tuesday 3:00 PM ]
    -------------------------------------------------
    

    [FIGURE 8: Mockup of UI showing these clickable slot buttons within a ToolInvocationUIPart]

  5. User (Client): Clicks "Tue 3PM". Button handler calls addToolResult({ toolCallId: toolCallId_displaySlots, result: "Tue 3PM" }). ToolInvocationUIPart updates. useChat auto-POSTs back to server (a rendering sketch for this slot picker follows the walkthrough).

  6. AI (Turn 3 - Server): LLM gets "Tue 3PM". Calls confirmBooking. SDK validates, executes. Client UI: New ToolInvocationUIPart "Confirming meeting...".

  7. Server (confirmBooking): Returns result: "Booked!". SDK auto-sends to LLM. Client UI: ToolInvocationUIPart updates "Booking confirmed!".

  8. AI (Turn 4 - Server): LLM gets success. Generates final text: "Great! Booked for Tue 3PM." Server streams 'text' parts. Client UI: Final assistant message appears.
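
For step 5, the slot-picker UI for displaySlotOptions might be sketched like this (the tool and argument names mirror the conceptual walkthrough; the toolInvocation shape is the same assumption as in Section 3.1):

// Sketch: render displaySlotOptions as clickable buttons; clicking calls addToolResult.
function SlotOptionsToolPart({
  inv,
  addToolResult,
}: {
  inv: { toolCallId: string; toolName: string; state: string; args: { slots: string[]; prompt: string } };
  addToolResult: (r: { toolCallId: string; result: any }) => void;
}) {
  if (inv.toolName !== 'displaySlotOptions' || inv.state !== 'call') return null;
  return (
    <div>
      <p>{inv.args.prompt}</p>
      {inv.args.slots.map((slot) => (
        <button key={slot} onClick={() => addToolResult({ toolCallId: inv.toolCallId, result: slot })}>
          {slot}
        </button>
      ))}
    </div>
  );
}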

v5 Features Highlighted:

  • Structured ToolInvocationUIPart for each tool step.
  • Mix of server (execute) and client (onToolCall/addToolResult) tools.
  • maxSteps (server & client) for automated chaining.
  • addToolResult for UI-driven tool completion.
  • Clear UI feedback via ToolInvocationUIPart states.

Take-aways / Migration Checklist Bullets

  • v5 tool features enable complex multi-step agents.
  • Combine server and client tools. maxSteps automates chain.
  • ToolInvocationUIPart key for UI state. Design tools/prompts for workflow.

9. Key Lessons Learned (Summary of Tool Usage in v5)

Wrapping up our deep dive into Vercel AI SDK v5's tool capabilities, let's consolidate the main advantages and best practices that emerge from this new, more structured approach.

Why this matters?
v5 brings organization and power to complex AI tool interactions. Understanding core principles helps build robust, maintainable, user-friendly tool-using AI apps.

Actionable Takeaways & Best Practices:

  1. Structured is Better: Shift to v5's ToolInvocationUIPart and server LanguageModelV2FunctionTool for robust, typed, stateful tool handling.
  2. Schema is Your Friend (Embrace Zod): Always use Zod schemas for LanguageModelV2FunctionTool.parameters for clear arg definition, auto server-side validation, and type safety.
  3. Clear Client vs. Server Execution Strategy: Decide where tools run. Server execute for backend resources/security. Client onToolCall/addToolResult for browser APIs/UI interaction.
  4. Design for Multi-Step Interactions: Leverage maxSteps (server streamText, client useChat) for chained tool calls, result processing, continued reasoning.
  5. Build Rich, Informative UIs for Tools: Use ToolInvocationUIPart states ('partial-call', 'call', 'result', 'error') for clear visual feedback.
  6. Security First, Always: Validate LLM args (Zod helps). Sanitize tool results before UI display (XSS) and before sending to LLM.

Teasing Post 9: Persisting Rich Chat Histories

With tool calls as first-class citizens in UIMessages, how do we reliably save and restore these intricate conversations?

Post 9: "Persisting Rich UIMessage Histories: The v5 'Persist Once, Render Anywhere' Model." We'll dive into database schema strategies, best practices for saving UIMessage arrays with parts/metadata, and v5's high-fidelity restoration.
