Forem: Johnny Z

Building an Autonomous Agent Coding Harness

Johnny Z — Thu, 16 Apr 2026 05:32:27 +0000

This is a personal experiment in autonomous coding, built with the Claude Agent SDK. The harness takes a spec (markdown or text) and builds a full-stack application using three specialized agents, as described in this Anthropic post.

Requirement

Build a full-stack weather chat application (Next.js + .NET).
I manually created an "ideal target solution" as a reference implementation.

Why This Project Is Hard

Although building a weather chat app sounds straightforward, this implementation intentionally introduces architectural challenges. They test whether coding agents can work with unfamiliar, cutting-edge libraries, or whether they fall back to well-known patterns:

Backend (Agent Construction & Local LLM Integration): The .NET API uses the Microsoft Agent Framework and exposes the agent via the relatively new AG-UI protocol. A key challenge lies in the underlying Microsoft.Extensions.AI pipeline: coding agents must understand how to connect to a local Ollama server, register it correctly as an IChatClient, configure the agent with tools, and wire everything into the .NET dependency injection container.
Schema-Driven UI Rendering (The Catalyst): To achieve the visual "Generative UI" component, the application uses @vercel-labs/json-render. This adds a significant layer of abstraction. Rather than passing generic data to props, coding agents must grasp an indirect, specification-based rendering model. The frontend strictly expects tool outputs to be converted into a structured UI spec tree (e.g., Container -> WeatherCard -> ForecastGrid), which a component catalog maps dynamically to concrete React components.
Full-Stack Tool Coupling & Protocol Bridging: Driven by the strict schema requirements of the UI, tool execution becomes a highly coupled, full-stack concern. The backend emits raw AG-UI Server-Sent Events (SSE), which the Next.js server must manually parse and map to the Vercel AI SDK 'UIMessage' types. Crucially, because the AG-UI protocol exposes tool execution results directly to the client stream as JSON payloads, coding agents must explicitly co-design the C# backend tool call result types to satisfy the frontend's schema-driven expectations.
Custom Generative UI Transport & State: These tightly coupled tool outputs stream directly to the client, so standard AI SDK hooks aren't enough out of the box. The frontend requires configuring useChat with a custom DefaultChatTransport. Agents must design the UI so that the incoming JSON payloads inject complex parts into the ChatMessage state. This requires a solid understanding of multi-part message trees. The UI must inspect part.type and check part.state === "output-available" to interrupt normal text rendering and conditionally mount the generated JSON UI spec.
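The data flow in the last three points can be sketched end to end. The TypeScript below illustrates the idea under simplifying assumptions. It is not the reference implementation:

- It handles only a subset of AG-UI event types, with field names assumed from the AG-UI spec.
- It uses simplified part types rather than the AI SDK's exact UIMessage types.
- It replaces json-render's catalog with a hypothetical string-rendering Catalog.

The code parses an SSE body and folds the events into text and tool parts. It renders a tool part's spec tree only once that part's state is "output-available".

```typescript
// Subset of AG-UI events (field names assumed from the AG-UI spec).
// Other event types (RUN_STARTED, TEXT_MESSAGE_START, ...) are simply ignored below.
type AgUiEvent =
  | { type: "TEXT_MESSAGE_CONTENT"; messageId: string; delta: string }
  | { type: "TOOL_CALL_START"; toolCallId: string; toolCallName: string }
  | { type: "TOOL_CALL_ARGS"; toolCallId: string; delta: string }
  | { type: "TOOL_CALL_RESULT"; toolCallId: string; content: string };

// Simplified message parts (hypothetical, not the AI SDK's exact types).
type TextPart = { type: "text"; text: string };
type ToolPart = {
  type: "tool";
  toolCallId: string;
  toolName: string;
  input: string;
  state: "input-streaming" | "output-available";
  output?: unknown;
};
type Part = TextPart | ToolPart;

// Hypothetical UI spec tree and component catalog (not json-render's API).
type SpecNode = { type: string; props?: Record<string, unknown>; children?: SpecNode[] };
type Catalog = Record<string, (props: Record<string, unknown>, children: string) => string>;

// SSE: events are separated by a blank line; the payload sits on "data:" lines.
function parseSse(raw: string): AgUiEvent[] {
  const events: AgUiEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data.trim() !== "") events.push(JSON.parse(data) as AgUiEvent);
  }
  return events;
}

// Fold events into parts: text deltas extend the current text part,
// tool events create or update a tool part keyed by toolCallId.
function toParts(events: AgUiEvent[]): Part[] {
  const parts: Part[] = [];
  const tools = new Map<string, ToolPart>();
  for (const event of events) {
    switch (event.type) {
      case "TEXT_MESSAGE_CONTENT": {
        const last = parts[parts.length - 1];
        if (last !== undefined && last.type === "text") last.text += event.delta;
        else parts.push({ type: "text", text: event.delta });
        break;
      }
      case "TOOL_CALL_START": {
        const part: ToolPart = { type: "tool", toolCallId: event.toolCallId, toolName: event.toolCallName, input: "", state: "input-streaming" };
        tools.set(event.toolCallId, part);
        parts.push(part);
        break;
      }
      case "TOOL_CALL_ARGS": {
        const part = tools.get(event.toolCallId);
        if (part) part.input += event.delta;
        break;
      }
      case "TOOL_CALL_RESULT": {
        const part = tools.get(event.toolCallId);
        if (part) {
          // The backend tool result must already be a spec tree for this to render.
          part.output = JSON.parse(event.content);
          part.state = "output-available";
        }
        break;
      }
    }
  }
  return parts;
}

// Render a spec tree by looking each node's type up in the catalog.
function renderSpec(node: SpecNode, catalog: Catalog): string {
  const component = catalog[node.type];
  if (!component) throw new Error(`Unknown component: ${node.type}`);
  const children = (node.children ?? []).map((child) => renderSpec(child, catalog)).join("");
  return component(node.props ?? {}, children);
}

// Text renders as text; a tool part renders as generated UI only when its output is available.
function renderParts(parts: Part[], catalog: Catalog): string[] {
  return parts.map((part) => {
    if (part.type === "text") return part.text;
    if (part.state === "output-available") return renderSpec(part.output as SpecNode, catalog);
    return "[loading]";
  });
}
```

This also shows why the C# tool result types are coupled to the frontend. Whatever the backend serializes into the TOOL_CALL_RESULT content is handed straight to renderSpec, so it has to match the catalog's component names.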

First Round Result

Input: the feature requirements file only, with an explicit instruction to use simulated/mock weather data to reduce complexity
Result output

Gap Analysis

Dimension	Reference (Target)	Generated (Round 1)
.NET version	.NET 10	.NET 8
Backend framework	Microsoft Agent Framework (`Microsoft.Agents.AI`)	Plain ASP.NET Core MVC
Streaming protocol	AG-UI via SSE	Standard JSON REST
LLM integration	Ollama via `OllamaSharp` + `IChatClient` DI	None — rule-based string matching
Frontend AI SDK	`@ai-sdk/react` `useChat` + `DefaultChatTransport`	Raw `fetch()` + `useState`
UI rendering	`@json-render` (schema-driven spec tree)	Direct hardcoded React components

Every architectural constraint specified in the feature requirements — AG-UI, Microsoft Agent Framework, Ollama, json-render — was ignored. The agents built a conventional CRUD-style app instead.

What it got right: The app is functional end-to-end with good visual design (glassmorphic cards, dynamic backgrounds, custom SVG icons), responsive layout, and clean code structure. About 7 of 16 features work partially or fully.

What it missed: No SSE streaming, no LLM tool calling (just regex location extraction), no schema-driven UI rendering, no AI SDK hooks. The ai npm package was even installed but never imported.

Takeaway: Given only a feature spec, coding agents gravitate toward familiar patterns from training data. The novel integration requirements (AG-UI, json-render, Agent Framework) — which are the architecturally interesting parts — were completely bypassed in favor of well-known alternatives.

Second Round Result

Enhanced feature requirements with explicit architectural instructions — specifying MapAGUI, ChatClientAgent, defineCatalog/defineRegistry, useChat with transport, etc.
Based on round 1's results, I created custom skills for json-render and the Microsoft Agent Framework, and installed the official Vercel Next.js and AI SDK skills to give the agents better guidance
Result output

Gap Analysis

Dimension	Reference (Target)	Generated (Round 2)
.NET version	.NET 10	.NET 10
Backend framework	Microsoft Agent Framework (`MapAGUI`)	Packages installed but not used — plain REST API
Streaming protocol	AG-UI via SSE	Standard JSON REST
LLM integration	Ollama via `OllamaSharp` + `IChatClient`	Package installed, only checks if Ollama is running — never calls it
Frontend AI SDK	`@ai-sdk/react` `useChat` + `DefaultChatTransport`	Package installed but uses raw `fetch()`
UI rendering	`@json-render/react` (real package)	Fake shim — hand-written `json-render-compat.ts` reimplements `defineCatalog`/`defineRegistry` as simple wrappers

Progress from round 1: The agents now acknowledge the required technologies — correct .NET version, right NuGet packages installed, catalog/registry file structure present. The feature requirements with explicit API names clearly helped.

What's still wrong: The acknowledgment is superficial. The agents installed Microsoft.Agents.AI and OllamaSharp but never called MapAGUI() or created a ChatClientAgent. Instead of installing @json-render/react, they wrote a 40-line compatibility shim that mimics the API surface but does nothing — the <Renderer> component from json-render is never used. The backend is still hardcoded pattern matching over 6 cities with no LLM.

Takeaway: Adding skills and explicit architectural instructions moved agents from "completely ignore" to "install the packages and create the right file names." But the actual wiring — the hard part — was still substituted with familiar patterns. The agents created a cargo cult of the architecture: the right shape, with none of the substance.

Conclusion

The progression across rounds tells a clear story. Round 1 completely ignored the architectural requirements. Round 2 acknowledged them superficially — installing the right packages, creating files with the right names — but never actually wired anything up. The hand-written json-render shim and the unused NuGet packages are the most telling evidence.

None of this is entirely surprising. These are integration challenges that even experienced engineers would need to research and iterate on — connecting unfamiliar frameworks across a full-stack boundary is genuinely hard. The deeper issue is that even with upfront planning enforced (preventing agents from "one-shotting" the app), intrinsic technical challenges in the implementation details cause coding agents to silently fall back to what they know.

What these experiments suggest is that producing quality implementations with coding agents requires highly detailed, step-by-step plans — not just feature specs or architectural diagrams, but concrete wiring instructions that leave little room for substitution. Simply adding skills as supplementary context does not bridge the gap when the core integration patterns are unfamiliar to the model.

Next Steps

The experiments above point to a clear gap: the planning agent produces plans that are too high-level for the coding agent to follow faithfully when unfamiliar technologies are involved. The next iteration of the harness will focus on two changes:

Interactive upfront planning: Rather than generating a plan in one shot and handing it off, the planning agent will produce a detailed, step-by-step implementation plan that can be reviewed and refined before any code is written. Each step should be concrete enough that the coding agent knows exactly which API to call, which package to import, and how to wire it — leaving no room for silent substitution.
Step-by-step execution with verification: Instead of letting the coding agent execute the entire plan autonomously, the harness will execute one step at a time, verifying the output of each step (builds, tests, correct imports) before proceeding to the next. This catches drift early — if the agent installs a package but doesn't use it, or writes a shim instead of using the real library, the verification step surfaces the problem immediately rather than letting it compound.

This follows the approach outlined in the autonomous coding quickstart, adapted to the multi-agent harness architecture described in this project.
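Part of the per-step verification can be fairly mechanical. The TypeScript sketch below is hypothetical and is not part of the harness described here. Given the declared dependencies and the project's source files, it reports two things:

- packages that are installed but never imported;
- local files that export functions the real library is supposed to provide.

The second check is how a shim like round 2's json-render-compat.ts would be caught. The regex-based import scan is deliberately crude. A real check would more likely use the TypeScript compiler API or the bundler's module graph.

```typescript
interface VerificationReport {
  unusedDependencies: string[];
  suspectedShims: string[];
}

// Collect bare package names from import/require specifiers (crude regex scan).
function importedPackages(source: string): Set<string> {
  const found = new Set<string>();
  const pattern = /(?:\bfrom|\bimport|\brequire\()\s*["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1];
    // Relative and absolute paths are local files, not packages.
    if (specifier === undefined || specifier.startsWith(".") || specifier.startsWith("/")) continue;
    // Scoped packages keep two segments ("@scope/name"), others keep one ("react").
    const segments = specifier.split("/");
    found.add(segments.slice(0, specifier.startsWith("@") ? 2 : 1).join("/"));
  }
  return found;
}

function verifyStep(
  dependencies: string[], // e.g. the keys of package.json "dependencies"
  files: Record<string, string>, // path -> source text
  expectedApis: Record<string, string[]>, // package -> names only that package should export
): VerificationReport {
  const used = new Set<string>();
  for (const source of Object.values(files)) {
    importedPackages(source).forEach((pkg) => used.add(pkg));
  }
  const unusedDependencies = dependencies.filter((dep) => !used.has(dep));

  // A local file exporting a library's API names is likely a hand-written stand-in.
  const suspectedShims: string[] = [];
  for (const [path, source] of Object.entries(files)) {
    for (const [pkg, names] of Object.entries(expectedApis)) {
      const reimplemented = names.some((name) =>
        new RegExp(`export\\s+(?:async\\s+)?(?:function|const|let)\\s+${name}\\b`).test(source),
      );
      if (reimplemented) suspectedShims.push(`${path} re-implements ${pkg}`);
    }
  }
  return { unusedDependencies, suspectedShims };
}
```

Running a check like this after every step, rather than once at the end, is what would surface "installed but not wired" before later steps build on it.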

Please feel free to reach out on Twitter @roamingcode

Building End-to-End Local AI Agents with Microsoft Agent Framework and AG-UI

Johnny Z — Sun, 23 Nov 2025 06:06:37 +0000

The Microsoft Agent Framework significantly elevates AI agent orchestration. A standout feature is its implementation of the Agent–User Interaction (AG-UI) Protocol, which standardizes how AI agents connect to user-facing applications.

Below is a quick-start guide to connecting these components into a fully end-to-end solution using local Ollama models.

1. Service Configuration

First, configure the dependency injection container. The ChatClientAgent is based on the IChatClient abstraction from Microsoft.Extensions.AI.

Note: We register the agent as a Keyed Service to allow for multiple distinct agents within the same host.

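var builder = WebApplication.CreateBuilder(args);

// 1. Register the Ollama client: a named HttpClient for the local server
//    (Ollama listens on port 11434 by default) and an IChatClient on top of it
builder.Services.AddHttpClient("OllamaClient", client =>
    client.BaseAddress = new Uri("http://localhost:11434"));

builder.Services.AddTransient<IChatClient>(provider =>
{
    var factory = provider.GetRequiredService<IHttpClientFactory>();
    // Ensure you use a wrapper that handles standard formatting
    // (see Implementation Note below)
    return new OllamaApiClient(factory.CreateClient("OllamaClient"), "phi4");
});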

// 2. Register the AI Agent
builder.Services.AddKeyedTransient<ChatClientAgent>(
    "local-ollama-agent",
    (provider, key) =>
    {
        var options = new ChatClientAgentOptions
        {
            Id = key.ToString(),
            Name = "Local Assistant",
            Description = "An AI agent running on local Ollama.",
            ChatOptions = new ChatOptions { Temperature = 0 }
        };

        return provider.GetRequiredService<IChatClient>()
            .CreateAIAgent(options, provider.GetRequiredService<ILoggerFactory>());
    });

2. Expose the AG-UI Endpoint

Once configured, map the agent instance directly to an HTTP route. This exposes the agent via the standard AG-UI protocol.

var app = builder.Build();

var agent = app.Services.GetRequiredKeyedService<ChatClientAgent>("local-ollama-agent");

// Expose the agent on the root path
app.MapAGUI("/", agent);

app.Run();

3. Connect a Client

To consume the agent programmatically, the framework provides the AGUIChatClient. This allows .NET applications to communicate with your agent over HTTP seamlessly.

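// Any HttpClient works here, e.g. one created via IHttpClientFactory
using var httpClient = new HttpClient();

// provider: the client application's IServiceProvider
var chatClient = new AGUIChatClient(
    httpClient,
    "http://localhost:5000",
    provider.GetRequiredService<ILoggerFactory>());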

var clientAgent = chatClient.CreateAIAgent(
    name: "local-client",
    description: "AG-UI Client Agent");

Frontend Integration: The AG-UI Protocol also offers ready-made libraries for TypeScript and Python, allowing you to spin up frontend interfaces in minutes.

Implementation Note: Protocol Compliance

The AG-UI protocol mandates that all messages contain a messageId property. Native Ollama responses do not currently provide this. To ensure compatibility, I created a simple wrapper class to inject the required IDs into the Ollama response stream.
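The wrapper itself is in the complete sample and is not reproduced here. The idea can be sketched in a few lines of TypeScript, where the ChatUpdate shape and the id generator are hypothetical:

- Every chunk of one streamed response gets the same messageId.
- Ids that are already present are left alone.
- The generated id is created once, lazily.

```typescript
// Hypothetical shape of one streamed chat update; raw Ollama output arrives without messageId.
interface ChatUpdate {
  role: "assistant";
  text: string;
  messageId?: string;
}

// Yield the same updates, filling in a missing messageId so that every chunk of one
// response shares a single id. Ids that are already present are passed through untouched.
function* withMessageIds(
  updates: Iterable<ChatUpdate>,
  newId: () => string,
): Generator<ChatUpdate> {
  let generated: string | undefined;
  for (const update of updates) {
    if (update.messageId) {
      yield update;
      continue;
    }
    generated ??= newId(); // created lazily, once per response
    yield { ...update, messageId: generated };
  }
}
```

For a real network stream, the same logic would be an `async function*` consuming the chunks with `for await`. In practice, `newId` could be something like `crypto.randomUUID`.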

Complete sample code


Model Context Protocol server prompts with Microsoft Semantic Kernel

Johnny Z — Wed, 23 Apr 2025 22:34:37 +0000

This post focuses on implementing server prompts, a key feature of the Model Context Protocol (MCP) designed for reusable template definitions. We will explore how to implement these server prompts using both the MCP C# SDK and Semantic Kernel for enhanced templating capabilities. Further details on MCP server prompts can be found in the MCP documentation.

MCP Server Prompts via MCP C# SDK Attributes

The MCP C# SDK lets you define prompts through attributes. For basic string formatting, this gives a direct implementation that doesn't require Semantic Kernel, as the following example shows.


[McpServerPromptType]
internal sealed class StringFormatPrompt
{
    private readonly string _format;
    private readonly ILogger _logger;

    public StringFormatPrompt(ILogger<StringFormatPrompt> logger)
    {
        _logger = logger;
        _format = "Tell a joke about {0}.";
    }

    [McpServerPrompt(Name = "Joke"), Description("Tell a joke about a topic.")]
    public IReadOnlyCollection<ChatMessage> Format([Description("The topic of the joke.")] string topic)
    {
        _logger.LogInformation("Generating prompt with topic: {Topic}", topic);
        var content = string.Format(CultureInfo.InvariantCulture, _format, topic);
        return [
            new (ChatRole.User, content)
        ];
    }
}

// Register the prompt
var serverBuilder = builder.Services.AddMcpServer()
    .WithHttpTransport()
    .WithPrompts<StringFormatPrompt>();
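On the wire, a client fetches this prompt with an MCP prompts/get request that carries the prompt name and its arguments. It gets back a list of messages. The TypeScript sketch below mimics that result shape (a subset of the MCP specification) for the same "Joke" prompt. The prompts registry and the getPrompt function are hypothetical. They only illustrate what the attribute-based C# class produces.

```typescript
// Subset of the MCP prompts/get result shape.
interface PromptMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}
interface GetPromptResult {
  description?: string;
  messages: PromptMessage[];
}

// Hypothetical prompt definition mirroring the attribute-based "Joke" prompt.
interface PromptDefinition {
  description: string;
  requiredArgs: string[];
  render: (args: Record<string, string>) => string;
}

const prompts: Record<string, PromptDefinition> = {
  Joke: {
    description: "Tell a joke about a topic.",
    requiredArgs: ["topic"],
    render: (args) => `Tell a joke about ${args["topic"]}.`,
  },
};

// Resolve a prompts/get request: look up the prompt, check arguments, render one user message.
function getPrompt(name: string, args: Record<string, string>): GetPromptResult {
  const prompt = prompts[name];
  if (!prompt) throw new Error(`Unknown prompt: ${name}`);
  const missing = prompt.requiredArgs.filter((arg) => !(arg in args));
  if (missing.length > 0) throw new Error(`Missing arguments: ${missing.join(", ")}`);
  return {
    description: prompt.description,
    messages: [{ role: "user", content: { type: "text", text: prompt.render(args) } }],
  };
}
```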

Semantic Kernel Templates as MCP Server Prompts

Semantic Kernel provides templating capabilities through JSON/YAML, Handlebars, and Liquid formats, along with plugin support. These templates can be exposed as MCP prompts using the MCP C# SDK.

Prompt Templates in Semantic Kernel
Semantic Kernel templates are configured with PromptTemplateConfig, created by IPromptTemplateFactory implementations, and can be easily rendered with input variables for dynamic prompt generation.

var kernel = new Kernel();
var templateConfig = new PromptTemplateConfig("Tell a joke about {{$topic}}.");
IPromptTemplateFactory templateFactory = new KernelPromptTemplateFactory();
var template = templateFactory.Create(templateConfig);
var text = await template.RenderAsync(kernel,
    new KernelArguments
    {
        { "topic", "cats" }
    });
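To make the {{$topic}} syntax concrete, here is a deliberately minimal TypeScript sketch of variable substitution. It only handles {{$name}} placeholders, optionally with whitespace inside the braces. Semantic Kernel's template engine supports more, such as function calls. Throwing on a missing argument is a choice made in this sketch, not Semantic Kernel's documented behavior.

```typescript
// Replace {{$name}} placeholders (inner whitespace allowed) with argument values.
function renderTemplate(template: string, args: Record<string, string>): string {
  return template.replace(/\{\{\s*\$(\w+)\s*\}\}/g, (_match: string, name: string) => {
    const value = args[name];
    if (value === undefined) throw new Error(`Missing template argument: ${name}`);
    return value;
  });
}

// renderTemplate("Tell a joke about {{$topic}}.", { topic: "cats" }) returns "Tell a joke about cats."
```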

Expose prompts as McpServerPrompt
McpServerPrompt is the abstract base class that represents an MCP prompt. Subclassing it lets us expose a Semantic Kernel template as an MCP prompt.


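internal sealed class TemplateServerPrompt : McpServerPrompt
{
    private readonly IPromptTemplate _template;

    // MCP prompt metadata (name, description, arguments)
    public override Prompt ProtocolPrompt { get; }

    public TemplateServerPrompt(PromptTemplateConfig promptTemplateConfig, IPromptTemplateFactory? promptTemplateFactory, ILoggerFactory? loggerFactory)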
    {
        promptTemplateFactory ??= new KernelPromptTemplateFactory(loggerFactory ?? NullLoggerFactory.Instance);
        _template = promptTemplateFactory.Create(promptTemplateConfig);

        // MCP prompt
        ProtocolPrompt = new()
        {
            Name = promptTemplateConfig.Name ?? _template.GetType().Name,
            Description = promptTemplateConfig.Description,
            Arguments = promptTemplateConfig.InputVariables
                .Select(inputVariable =>
                    new PromptArgument
                    {
                        Name = inputVariable.Name,
                        Description = inputVariable.Description,
                        Required = inputVariable.IsRequired
                    })
                .ToList(),
        };
    }

    public override async ValueTask<GetPromptResult> GetAsync(RequestContext<GetPromptRequestParams> request, CancellationToken cancellationToken = default)
    {
        KernelArguments? arguments = default;

        var dictionary = request.Params?.Arguments;
        if (dictionary is not null)
        {
            arguments = new ();
            foreach (var (key, value) in dictionary)
            {
                arguments[key] = value;
            }
        }

        var kernel = request.Services?.GetService<Kernel>() ?? new Kernel();
        var text = await _template.RenderAsync(kernel, arguments, cancellationToken);

        return new GetPromptResult
        {
            Messages =
            [
                new PromptMessage
                {
                    Content = new Content { Text = text }
                }
            ]
        };
    }
}

// Register the prompt with DI and the MCP server
// builder.Services.AddSingleton<TemplateServerPrompt>(...)
var serverBuilder = builder.Services.AddMcpServer()
    .WithHttpTransport();
serverBuilder.Services.AddSingleton<McpServerPrompt>(provider => 
    provider.GetRequiredService<TemplateServerPrompt>());

Exposing AIFunction as McpServerPrompt
The McpServerPrompt class provides a Create method to expose a Microsoft.Extensions.AI.AIFunction as an MCP server prompt.


internal sealed class TemplateAIFunction : AIFunction 
{
    //...

    protected override async ValueTask<object?> InvokeCoreAsync(AIFunctionArguments arguments, CancellationToken cancellationToken)
    {
        KernelArguments kernelArguments = [];

        foreach (var argument in arguments)
        {
            kernelArguments[argument.Key] = argument.Value;
        }

        var kernel = arguments.Services?.GetService<Kernel>() ?? new Kernel();
        var text = await _template.RenderAsync(kernel, kernelArguments, cancellationToken);
        return text;
    }
}

// Register for the prompt with DI and MCP server
// builder.Services.AddSingleton<TemplateAIFunction>(...)
var serverBuilder = builder.Services.AddMcpServer()
    .WithHttpTransport();
serverBuilder.Services.AddSingleton<McpServerPrompt>(provider =>
    McpServerPrompt.Create(provider.GetRequiredService<TemplateAIFunction>()));

Complete sample code


AWS Bedrock Anthropic Claude tool call integration with Microsoft Semantic Kernel

Johnny Z — Mon, 14 Apr 2025 23:34:57 +0000

As of April 2025, the official Semantic Kernel connector for Amazon (Microsoft.SemanticKernel.Connectors.Amazon) does not natively support tool/function calls. Semantic Kernel appears to be shifting toward an LLM abstraction layer based on Microsoft.Extensions.AI, aiming for a more unified and extensible architecture. Currently, only OpenAI and Ollama implementations are available within this new abstraction. An AWS Bedrock Anthropic Claude implementation based on Microsoft.Extensions.AI will likely become available in the future. In the interim, I implemented a custom solution. The existing IChatClient interface already supports function calls. The solution is therefore to implement IChatClient on top of the AWS Bedrock Runtime SDK, which keeps the implementation relatively straightforward.

Implement IChatClient with AWS Bedrock Runtime

The IChatClient interface essentially contains two methods: one for standard chat responses and another for streamed responses. The implementation involves mapping these two methods to the IAmazonBedrockRuntime.ConverseAsync and ConverseStreamAsync methods, as demonstrated in the full implementation of the AnthropicChatClient here.
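Much of that mapping is reshaping messages. The function below sketches the idea in TypeScript; it is not the author's C# AnthropicChatClient. It converts a hypothetical provider-neutral message list into the Converse API's request shape:

- System prompts go into a separate system list.
- Tool calls become toolUse blocks.
- Tool results return to the model as toolResult blocks inside a user message.

The Converse field names are written as they are understood here and should be checked against the AWS documentation. Merging consecutive same-role messages assumes that Anthropic models expect alternating user/assistant turns.

```typescript
// Hypothetical provider-neutral chat messages (not Microsoft.Extensions.AI's types).
type ChatMessage =
  | { kind: "text"; role: "system" | "user" | "assistant"; text: string }
  | { kind: "toolCall"; id: string; name: string; input: Record<string, unknown> }
  | { kind: "toolResult"; toolCallId: string; result: string };

// Subset of the Bedrock Converse request shape.
type ContentBlock =
  | { text: string }
  | { toolUse: { toolUseId: string; name: string; input: Record<string, unknown> } }
  | { toolResult: { toolUseId: string; content: { text: string }[] } };

interface ConverseMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

function toConverse(messages: ChatMessage[]): { system: { text: string }[]; messages: ConverseMessage[] } {
  const system: { text: string }[] = [];
  const converse: ConverseMessage[] = [];

  // Append a block, merging it into the previous message when the role repeats
  // (assumption: alternating user/assistant turns are expected).
  const append = (role: "user" | "assistant", block: ContentBlock): void => {
    const last = converse[converse.length - 1];
    if (last !== undefined && last.role === role) last.content.push(block);
    else converse.push({ role, content: [block] });
  };

  for (const message of messages) {
    switch (message.kind) {
      case "text":
        // System prompts travel separately from the conversation.
        if (message.role === "system") system.push({ text: message.text });
        else append(message.role, { text: message.text });
        break;
      case "toolCall":
        append("assistant", { toolUse: { toolUseId: message.id, name: message.name, input: message.input } });
        break;
      case "toolResult":
        // Tool results go back to the model inside a user message.
        append("user", { toolResult: { toolUseId: message.toolCallId, content: [{ text: message.result }] } });
        break;
    }
  }
  return { system, messages: converse };
}
```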

Setting up Function Calls with Semantic Kernel

Here's how to set up function calls with Semantic Kernel using our custom AnthropicChatClient:

Set up kernel and functions
This step configures the chat completion service with function invocation capabilities and registers it with the Semantic Kernel.

// Set up chat completion service
IChatClient chatClient = ...;
IChatCompletionService chatService =
    chatClient
        .AsBuilder()
        .UseFunctionInvocation() // Enables function call functionality
        .Build()
        .AsChatCompletionService();

// Register the Bedrock chat completion service
var builder = Kernel.CreateBuilder();
builder.Services.AddKeyedSingleton("bedrock", chatService);
// Add plugins/functions
builder.Plugins.AddFromType<MenuPlugin>();
// ...
var kernel = builder.Build();
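UseFunctionInvocation() wraps the chat client so that tool calls requested by the model are executed automatically. The TypeScript sketch below shows the shape of that loop with a synchronous fake model. The types and the getSpecials tool are hypothetical. The real FunctionInvokingChatClient in Microsoft.Extensions.AI differs in several ways:

- It is asynchronous.
- It also records the assistant's tool-call message in the history.
- It has its own iteration limit.

```typescript
// Hypothetical, simplified message/model types for illustration only.
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; content: string };
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ModelTurn = { text?: string; toolCalls?: ToolCall[] };
type Model = (history: Message[]) => ModelTurn;
type Tool = (args: Record<string, unknown>) => string;

// Call the model; if it requests tools, run them, append their results and call again,
// until the model answers with plain text or the iteration limit is hit.
function runWithFunctionInvocation(
  model: Model,
  tools: Record<string, Tool>,
  history: Message[],
  maxIterations = 10,
): string {
  for (let i = 0; i < maxIterations; i++) {
    const turn = model(history);
    if (!turn.toolCalls || turn.toolCalls.length === 0) return turn.text ?? "";
    for (const call of turn.toolCalls) {
      const tool = tools[call.name];
      const result = tool ? tool(call.args) : `Error: unknown tool ${call.name}`;
      history.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
  throw new Error("Too many tool-calling iterations");
}
```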

Use automatic tool calls
This code demonstrates how to use the configured chat completion service to automatically invoke functions based on the user's input.

// Set up bedrock
var runtimeClient = new AmazonBedrockRuntimeClient(RegionEndpoint.APSoutheast2);
IChatClient client = new AnthropicChatClient(runtimeClient, "anthropic.claude-3-5-sonnet-20241022-v2:0");

// Configure the chat client as shown in step 1.
IChatCompletionService chatCompletionService = client
    .AsBuilder()
    .UseFunctionInvocation()
    .Build()
    .AsChatCompletionService();

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("What is the special soup and its price?");

var promptExecutionSettings = new PromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto(options: new()
    {
        RetainArgumentTypes = true
    }),
    ExtensionData = new Dictionary<string, object>
    {
        { "temperature", 0 }, 
        { "max_tokens_to_sample", 1024 } // Required parameter for Anthropic models
    }
};

var messageContent = await chatCompletionService
    .GetChatMessageContentAsync(chatHistory,  promptExecutionSettings, kernel);
Console.WriteLine(messageContent.Content);

// Expected output : Today's special soup is Clam Chowder and it costs $9.99.

Complete sample code


Model Context Protocol integration with Microsoft Semantic Kernel

Johnny Z — Sat, 05 Apr 2025 05:00:09 +0000

The Model Context Protocol (MCP) aims to standardize connections between AI systems and data sources. This post demonstrates integrating mcp-playwright with Semantic Kernel and phi4-mini (via Ollama) for browser automation.

Setting up the Playwright MCP Server

Install the MCP Playwright package:
```
npm install @playwright/mcp
```

Add a script to package.json:

{
  "scripts": {
    "server": "npx @playwright/mcp --port 8931"
  }
}

Start the server:
```
npm run server
```
This will launch the Playwright MCP server, displaying the port and endpoints in the console.

Running phi4-mini with Ollama for Function Calling

For reliable function calling, phi4-mini:latest (as of March 27, 2025) requires a custom Modelfile.

Create a custom Modelfile: (See example)

Create the model in Ollama:

ollama create phi4-mini:latest -f <path/to/Modelfile>

Implementing the MCP Client in Semantic Kernel

Install the MCP client NuGet package:

dotnet add package ModelContextProtocol --prerelease

Connect to the Playwright MCP server and retrieve tools:

var mcpClient = await McpClientFactory.CreateAsync(
    new McpServerConfig
    {
        Id = "playwright",
        Name = "Playwright",
        TransportType = TransportTypes.Sse,
        Location = "http://localhost:8931"
    });
var tools = await mcpClient.ListToolsAsync();

Configure Semantic Kernel with the MCP tools:

var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOllamaChatCompletion(modelId: "phi4-mini");
kernelBuilder.Plugins.AddFromFunctions(
    pluginName: "playwright",
    functions: tools.Select(x => x.AsKernelFunction()));
var kernel = kernelBuilder.Build();

var executionSettings = new PromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto(
        options: new()
        {
            RetainArgumentTypes = true
        }),
    ExtensionData = new Dictionary<string, object>
    {
        { "temperature", 0 }
    }
};

var result = await kernel.InvokePromptAsync(
    "open browser and navigate to https://www.google.com",
    new KernelArguments(executionSettings));

This code snippet connects to the MCP server, retrieves available tools, and integrates them into Semantic Kernel as functions. The prompt instructs the model to open a browser and navigate to Google, demonstrating the integration.
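At the protocol level, ListToolsAsync corresponds to an MCP tools/list JSON-RPC request. Each returned tool carries a name, a description and a JSON Schema (inputSchema) for its arguments. The TypeScript sketch below shows that request and a conversion of the tool list into OpenAI-style function definitions. The result is roughly the information the model ends up seeing once the tools are registered as kernel functions. It is an illustration, not what Semantic Kernel does internally. The tool used in the check is sample input.

```typescript
// Shape of a tool entry in an MCP tools/list result (subset of the MCP spec).
interface McpTool {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

// OpenAI-style function-calling tool definition.
interface FunctionTool {
  type: "function";
  function: { name: string; description: string; parameters: McpTool["inputSchema"] };
}

// JSON-RPC 2.0 request an MCP client sends to discover tools.
function toolsListRequest(id: number) {
  return { jsonrpc: "2.0" as const, id, method: "tools/list" as const };
}

// Advertise each MCP tool as a function; its JSON Schema carries over unchanged.
function toFunctionTools(tools: McpTool[]): FunctionTool[] {
  return tools.map((tool): FunctionTool => ({
    type: "function",
    function: { name: tool.name, description: tool.description ?? "", parameters: tool.inputSchema },
  }));
}
```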

Complete sample code


Azure OpenAI Error Handling in Semantic Kernel

Johnny Z — Wed, 08 Jan 2025 06:27:40 +0000

In real-world systems, it's crucial to handle HTTP errors effectively, especially when calling Large Language Models (LLMs) like Azure OpenAI. Rate limits (tokens per minute or requests per minute) will be exceeded at some point, resulting in HTTP 429 errors. This blog post explores different approaches to HTTP error handling with Semantic Kernel and Azure OpenAI.

Default

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key"); // Or DefaultAzureCredential

This is the default setup for Semantic Kernel with Azure OpenAI, using AddAzureOpenAIChatCompletion. It includes a built-in retry policy that automatically retries requests up to three times with exponential backoff. It can also detect specific HTTP headers such as 'retry-after' to implement more tailored retries.
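The behavior just described can be sketched in TypeScript. It is not Semantic Kernel's code; the HttpResult shape, the helper names, the 1-second base delay and the choice of which status codes count as transient are assumptions of this sketch.

- Transient failures are retried up to three times.
- The delay doubles on each attempt.
- A Retry-After header takes precedence when the server sends one.

```typescript
// Minimal HTTP result shape so the sketch doesn't depend on fetch's Response type.
interface HttpResult {
  status: number;
  headers: { get(name: string): string | null };
}

// Delay before the next attempt: honor Retry-After (seconds or HTTP date) when present,
// otherwise back off exponentially (base, 2x base, 4x base, ...).
function retryDelayMs(attempt: number, retryAfter: string | null, baseMs = 1000, nowMs = Date.now()): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  return baseMs * 2 ** attempt;
}

// Retry transient failures (429 and 5xx) up to maxRetries times after the first attempt.
async function sendWithRetry(
  send: () => Promise<HttpResult>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<HttpResult> {
  for (let attempt = 0; ; attempt++) {
    const result = await send();
    const transient = result.status === 429 || result.status >= 500;
    if (!transient || attempt >= maxRetries) return result;
    await sleep(retryDelayMs(attempt, result.headers.get("retry-after")));
  }
}
```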

HttpClient

var factory = provider.GetRequiredService<IHttpClientFactory>();
var httpClient = factory.CreateClient("azure:gpt-4o");

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key",  // Or DefaultAzureCredential
    httpClient: httpClient);

Providing your own HttpClient instance gives you more control over HTTP error handling. Semantic Kernel disables its default retry policy when an HttpClient is provided. You can then implement custom retry logic with the Microsoft.Extensions.Http.Resilience library. With this approach, you can define the number of retry attempts, timeouts, and how to handle specific status codes such as 429 (rate limit exceeded). Because the built-in retries are off, it is strongly recommended to add retry policies for transient errors to the HttpClient.

services.AddHttpClient("azure:gpt-4o")
    // The 'standard' handler automatically handles transient errors, including 429
    .AddStandardResilienceHandler()
    .Configure(options =>
        {
            // Options for attempts, timeouts, etc.
            options.Retry.MaxRetryAttempts = 5;
        });

An important benefit of using HttpClient is that it's not limited to Azure OpenAI. This approach works with other AI connectors like OpenAI as well.

AzureOpenAIClient

var azureOpenAIClient = new AzureOpenAIClient(
    endpoint: new Uri("https://resource-name.openai.azure.com"),
    new ApiKeyCredential("api-key"), // Or DefaultAzureCredential
    new AzureOpenAIClientOptions());

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    azureOpenAIClient);

This approach offers similar functionality to the default setup, including the built-in retry policy. In addition, AzureOpenAIClient provides more flexibility through AzureOpenAIClientOptions.

var clientOptions = new AzureOpenAIClientOptions
{
    Transport = new HttpClientPipelineTransport(httpClient),
    RetryPolicy = new ClientRetryPolicy(maxRetries: 5)
};

This configuration enables you to combine HTTP retry policies from HttpClient with custom pipeline policy-based retries from the Azure OpenAI SDK.
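Before enabling both layers, it is worth checking how they interact. This simulation assumes both layers retry on the same persistent transient error, such as a 429. It uses the retry counts from the examples above: MaxRetryAttempts = 5 on the HttpClient and maxRetries: 5 on the pipeline. Each pipeline attempt then runs the full HttpClient retry sequence, so attempts multiply rather than add. The small TypeScript simulation below counts requests against an endpoint that always fails. withRetries is a hypothetical helper, not either library's API.

```typescript
// Run `attempt` once plus up to `maxRetries` retries; stop at the first success.
function withRetries(maxRetries: number, attempt: () => boolean): boolean {
  for (let i = 0; i <= maxRetries; i++) {
    if (attempt()) return true;
  }
  return false;
}

let requests = 0;
// An endpoint that is throttled on every request.
const alwaysThrottled = (): boolean => {
  requests++;
  return false;
};

// Outer layer: SDK pipeline retries (maxRetries: 5).
// Inner layer: HttpClient resilience handler (MaxRetryAttempts = 5).
withRetries(5, () => withRetries(5, alwaysThrottled));
console.log(requests); // 36 = (5 + 1) * (5 + 1)
```

In the worst case, one logical call can therefore hit a rate-limited endpoint dozens of times. That is a reason to keep retry counts small in at least one of the layers.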

Recommendations

The default setup might not be suitable for scenarios where you frequently hit rate limits (429 errors).
If you already have AzureOpenAIClient registered and require maximum control, the AzureOpenAIClient approach lets you combine HttpClient policies with Azure OpenAI pipeline policy-based retries.


Working with multiple language models in Semantic Kernel

Johnny Z — Sat, 28 Dec 2024 07:33:07 +0000

It is common to work with multiple large language models (LLMs) simultaneously, especially when running evaluations or tests. Semantic Kernel supports registering multiple text generation and embedding services using serviceId and modelId.

Register 'serviceId' and 'modelId'

Suppose we have the following setup:

builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4-1106-Preview",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key",
    modelId: "gpt-4",
    serviceId: "azure:gpt-4");

builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key",
    modelId: "gpt-4o",
    serviceId: "azure:gpt-4o");

builder.AddOllamaChatCompletion(
    modelId: "phi3",
    endpoint: new Uri("http://localhost:11434"),
    serviceId: "local:phi3");

When executing kernel functions or prompts, 'serviceId' and 'modelId' can be passed in 'PromptExecutionSettings', as the following shows:

var promptExecutionSettings  = new PromptExecutionSettings
{
    ServiceId = "local:phi3"
};
// 
// or just modelId 
//    new PromptExecutionSettings
//     {
//         ModelId = "gpt-4o"
//     }
//
var result = await kernel.InvokePromptAsync(
    """
    Answer with the given fact:
    Sky is blue and violets are purple

    input:
    What color is sky?
    """, 
    new KernelArguments(promptExecutionSettings));

If a serviceId is provided when registering a chat completion service, Semantic Kernel also registers that service as a keyed service. With the above registration, the following works:

var chatCompletionService = kernel.Services
    .GetRequiredKeyedService<IChatCompletionService>("azure:gpt-4o");

IAIService and IAIServiceSelector

All AI-related services, including chat completion and text embedding, implement the IAIService interface, which defines a metadata property. This metadata contains attributes specific to the service implementation. For instance, the AzureOpenAIChatCompletionService includes the deployment name and model name. The default IAIServiceSelector resolves services by serviceId first, and then by modelId to match the IAIService metadata. To gain full control over AI service selection, you can implement a custom IAIServiceSelector and register it as a service with Semantic Kernel.
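The resolution order described above can be sketched in TypeScript: look up by serviceId first, then match modelId against each service's metadata. The AiService and ExecutionSettings shapes are hypothetical, and this is not Semantic Kernel's actual selector. In particular, falling back to the first registration when nothing is specified is an assumption of this sketch.

```typescript
// Hypothetical service shape: an optional serviceId plus implementation metadata.
interface AiService {
  serviceId?: string;
  metadata: Record<string, string>; // e.g. { modelId: "gpt-4o" }
}

interface ExecutionSettings {
  serviceId?: string;
  modelId?: string;
}

// Resolve by serviceId first, then by modelId metadata.
function selectService(services: AiService[], settings: ExecutionSettings): AiService | undefined {
  if (settings.serviceId !== undefined) {
    const byId = services.find((s) => s.serviceId === settings.serviceId);
    if (byId) return byId;
  }
  if (settings.modelId !== undefined) {
    return services.find((s) => s.metadata["modelId"] === settings.modelId);
  }
  return services[0]; // no preference given: fall back to the first registration (assumption)
}
```

A custom IAIServiceSelector is the place to change rules like these. Examples include preferring a local model in tests or failing loudly when a requested serviceId is missing.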

Sample code here


OpenAI chat completion with Json output format

Johnny Z — Fri, 20 Dec 2024 01:31:15 +0000

I can't recall how many times I've tried to convince an LLM to return JSON so that I could make API calls based on natural language input from users. Recently, I discovered that this functionality is natively supported by Semantic Kernel and the Microsoft.Extensions.AI library. It is officially documented in the OpenAI API documentation here. Note that this feature is only available in newer models (GPT-4o/o1 and later). If you are using Azure OpenAI, make sure you deploy a supported model version.

Chat completion

Semantic Kernel supports JSON output formatting in the ResponseFormat property from PromptExecutionSettings, as shown in the code below:

// Configure Azure/OpenAI and semantic kernel first.

var chatCompletionService = kernel.Services.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddSystemMessage("Extract the event information.");
history.AddUserMessage("Alice and Bob are going to a science fair on Friday.");

var jsonSerializerOptions = new JsonSerializerOptions(JsonSerializerOptions.Default)
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow,
};
var responseFormat = CalendarEvent.JsonResponseSchema(jsonSerializerOptions);

var response = await chatCompletionService.GetChatMessageContentAsync(
    history, 
    new AzureOpenAIPromptExecutionSettings
    {
        ResponseFormat = responseFormat // Json schema
    });
// Json result    
var result = JsonSerializer.Deserialize<CalendarEvent>(response.ToString(), jsonSerializerOptions);

Generate JSON schema from types

JSON schema can be generated automatically using Microsoft.Extensions.AI.AIJsonUtilities, which Semantic Kernel already references.

public sealed class CalendarEvent
{
    [Description("Name of the event")]
    public required string Name { get; init; }

    [Description("Day of the event")]
    public required string Day { get; init; }

    [Description("List of participants of the event")]
    public required string[] Participants { get; init; }

    public static ChatResponseFormat JsonResponseSchema(JsonSerializerOptions? jsonSerializerOptions = default)
    {
        var inferenceOptions = new AIJsonSchemaCreateOptions
        {
            IncludeSchemaKeyword = false,
            DisallowAdditionalProperties = true,
        };

        // Json schema from types with descriptions on properties
        var jsonElement = AIJsonUtilities.CreateJsonSchema(
            typeof(CalendarEvent),
            description: "Calendar event result",
            serializerOptions: jsonSerializerOptions,
            inferenceOptions: inferenceOptions);

        var kernelJsonSchema = KernelJsonSchema.Parse(jsonElement.GetRawText());
        var jsonSchemaData = BinaryData.FromObjectAsJson(kernelJsonSchema, jsonSerializerOptions);

        return ChatResponseFormat.CreateJsonSchemaFormat(
            nameof(CalendarEvent).ToLowerInvariant(),
            jsonSchemaData,
            jsonSchemaIsStrict: true);
    }
}

Sample code here

Lightweight AI Evaluation with SemanticKernel

Johnny Z — Tue, 17 Dec 2024 23:28:50 +0000

For quick and easy evaluation or comparison of AI responses in .NET applications, particularly in tests, we can leverage the excellent 'LLM-as-a-Judge' prompts from autoevals with the help of Semantic Kernel.

Sample code

Note that you need to set up Semantic Kernel with chat completion first. It is also recommended to set 'Temperature' to 0 so that the judge's scores are as repeatable as possible.

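// Run is not a built-in Kernel method; it comes from the linked source code.
// Each top-level key names an autoevals prompt (here "humor"); its object holds the prompt inputs.
var json = 
    """
    {
        "humor" : {
            "output" : "this may be funny"
        }
    }
    """;
await foreach (var result in 
        kernel.Run(json, executionSettings: executionSettings))
{
    // Key is the evaluator name; Item1 is the judge's result and Item2 its score
    Console.WriteLine($"[{result.Key}]: result: {result.Value?.Item1}, score: {result.Value?.Item2}");
}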
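Under the hood, an LLM-as-a-Judge prompt asks the model to pick one of a fixed set of choices, and each choice maps to a numeric score. The sketch below shows only that last mapping step in plain C#. JudgeScoring is a hypothetical helper, and the Yes/No/Unsure table with scores 1, 0 and 0.5 is an assumed example in the spirit of autoevals' choice scores, not copied from its prompt files.

```csharp
using System;
using System.Collections.Generic;

public static class JudgeScoring
{
    // Assumed choice-to-score table for a humor-style classifier.
    // Lookups ignore case, so "yes" and "Yes" map to the same score.
    public static readonly IReadOnlyDictionary<string, double> HumorChoiceScores =
        new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase)
        {
            ["Yes"] = 1.0,
            ["No"] = 0.0,
            ["Unsure"] = 0.5,
        };

    // Maps the judge model's raw reply (e.g. " Yes.") to a (choice, score) pair
    // after trimming whitespace and trailing periods.
    // Returns null when the reply is not one of the allowed choices.
    public static (string Choice, double Score)? Score(string reply, IReadOnlyDictionary<string, double> choiceScores)
    {
        var choice = reply.Trim().TrimEnd('.');
        if (choiceScores.TryGetValue(choice, out var score))
        {
            return (choice, score);
        }

        return null;
    }
}
```

Returning null for an unrecognized reply, instead of a default score, keeps a misbehaving judge from quietly skewing test results.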

Source

Microsoft.Extensions.AI.Evaluation is in the works, but it currently involves a little too much ceremony for simple use cases.

Kernel Memory with Cosmos DB for NoSQL vector search.

Johnny Z — Tue, 17 Dec 2024 23:18:35 +0000

Officially announced at Microsoft Build 2024, Azure Cosmos DB for NoSQL now supports vector search. This also means Kernel Memory can use Cosmos DB for NoSQL as its vector store.

Enable Cosmos DB for NoSQL to support vector search.

Implement IMemoryDb for Kernel Memory with the Cosmos client

The key is the VectorDistance function, which scores each stored embedding against the query embedding.

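// No interpolation is needed, so a plain raw string literal is enough
var sql =
    """
    SELECT TOP @topN
      x.id, x.tags, x.payload, x.embedding, x.similarityScore
    FROM (
      SELECT
        c.id, c.tags, c.payload, c.embedding, VectorDistance(c.embedding, @embedding) AS similarityScore
      FROM
        c
    ) AS x
    WHERE x.similarityScore > @similarityScore
    ORDER BY x.similarityScore DESC
    """;

var queryDefinition = new QueryDefinition(sql)
    .WithParameter("@topN", limit)
    // Pass the embedding as a float array so it serializes as a JSON array
    .WithParameter("@embedding", textEmbedding.Data.ToArray())
    .WithParameter("@similarityScore", minRelevance);

// Index name as cosmos container name
var feedIterator = _cosmosClient
    .GetDatabase(DatabaseName)
    .GetContainer(index)
    .GetItemQueryIterator<MemoryRecordResult>(queryDefinition);

// Read every page of results
while (feedIterator.HasMoreResults)
{
    foreach (var record in await feedIterator.ReadNextAsync())
    {
        // Map each record and its similarityScore to the IMemoryDb result here
    }
}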
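Assuming the container's vector policy uses cosine distance (in which case VectorDistance returns a similarity, where larger means closer), the query's filtering and ranking can be mirrored in memory as below. This is only an illustration of what the query computes; StoredRecord and VectorSearchSketch are hypothetical names, and Cosmos DB performs the real work server-side using its vector index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape mirroring the documents queried above.
public sealed record StoredRecord(string Id, float[] Embedding);

public static class VectorSearchSketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Embeddings must have the same dimensions.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Treat a zero vector as having no similarity to anything.
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // In-memory analogue of the query: score every record, keep scores above
    // minRelevance, order by score descending, and take the top N.
    public static List<(string Id, double Score)> Search(
        IEnumerable<StoredRecord> records, float[] query, double minRelevance, int topN) =>
        records
            .Select(r => (r.Id, Score: CosineSimilarity(r.Embedding, query)))
            .Where(x => x.Score > minRelevance)
            .OrderByDescending(x => x.Score)
            .Take(topN)
            .ToList();
}
```

Scanning every record like this is linear in the collection size, which is exactly what a vector index lets Cosmos DB avoid at scale.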

Sample code here

Kernel Memory with Azure OpenAI, Blob storage and AI Search services

Johnny Z — Tue, 17 Dec 2024 23:16:46 +0000

This post sets up Kernel Memory with Azure OpenAI, Azure Blob Storage and Azure AI Search.

Azure OpenAI

On the Azure OpenAI resource, deploy a gpt-4 chat completion model and a text-embedding-ada-002 embedding model.

var builder = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(
        new AzureOpenAIConfig
        {
            Auth = AzureOpenAIConfig.AuthTypes.APIKey,
            APIKey = "Your AzureOpenAI api key",
            Endpoint = "https://your-azure-open-ai-resource-name.openai.azure.com",
            Deployment = "gpt-4"
        })
    .WithAzureOpenAITextEmbeddingGeneration(
        new AzureOpenAIConfig
        {
            Auth = AzureOpenAIConfig.AuthTypes.APIKey,
            APIKey = "Your AzureOpenAI api key",
            Endpoint = "https://your-azure-open-ai-resource-name.openai.azure.com",
            Deployment = "text-embedding-ada-002"
        });

Azure storage account

Azure Blob Storage stores the Kernel Memory pipeline artifacts.

var builder = new KernelMemoryBuilder()
    .WithAzureBlobsDocumentStorage(
        new AzureBlobsConfig
        {
            Account = "your-blob-storage-account",
            Auth = AzureBlobsConfig.AuthTypes.AccountKey,
            AccountKey = "your-blob-account-key",
            Container = "document-ingestion"
        });

Azure AI Search service

Azure AI Search serves as the vector database.

var builder = new KernelMemoryBuilder()
    .WithAzureAISearchMemoryDb(
        new AzureAISearchConfig
        {
            Endpoint = "https://your-search-service-resource-name.search.windows.net",
            Auth = AzureAISearchConfig.AuthTypes.APIKey,
            APIKey = "your search service api key"
        });

Import a document and ask questions

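// Build the memory instance, assuming the builder calls above are chained on a single KernelMemoryBuilder
var kernelMemory = builder.Build<MemoryServerless>();

await kernelMemory.ImportDocumentAsync(
    filePath: "resources/earth_book_2019_tagged.pdf",
    documentId: "earth_book_2019",
    index: "books");

var response =
    await kernelMemory.AskAsync(
        "Where is the Amazon rainforest on Earth?",
        index: "books");

Console.WriteLine(response.Result);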

Note the index name "books": Kernel Memory automatically creates an Azure AI Search index named "books" if it does not exist, as well as a "books" folder in the blob container.

Sample code here

Kernel Memory document ingestion

Johnny Z — Tue, 17 Dec 2024 23:11:03 +0000

Document ingestion

Benefits of asynchronous document ingestion with Kernel Memory on Azure:

Scalability: Easily handle large volumes of documents by distributing the workload across multiple nodes.
Efficiency: Process documents in parallel, reducing the overall time required for ingestion.
Fault Tolerance: Ensure reliability and availability by distributing tasks, so if one node fails, others can take over.
Resource Optimization: Utilize resources more effectively by balancing the load across the system.
Flexibility: Adapt to varying workloads and scale up or down as needed.

Set up distributed pipeline ingestion with Azure Queue Storage

var builder = new KernelMemoryBuilder()
    .WithAzureQueuesOrchestration(
        new AzureQueuesConfig
        {
            Account = "your-blob-storage-account",
            // Or AzureIdentity
            Auth = AzureQueuesConfig.AuthTypes.AccountKey,
            AccountKey = "your-blob-account-key"
        });

Once queue orchestration is registered, Kernel Memory automatically sets up DistributedPipelineOrchestrator.

Make sure the pipeline handlers run as hosted services.

Add the handlers as hosted services so they start listening for queue messages

// Add handlers as hosted services
services.AddDefaultHandlersAsHostedServices();

Import documents asynchronously

Distributed ingestion also makes document import asynchronous: when ImportDocumentAsync returns, the document has only been enqueued for processing, not yet ingested.

await kernelMemory.ImportDocumentAsync(
    filePath: "resources/earth_book_2019_tagged.pdf",
    documentId: "earth_book_2019",
    index: "books");

// Poll for status until the distributed pipeline has processed the document
while (true)
{
    var status = await kernelMemory.GetDocumentStatusAsync(documentId: "earth_book_2019", index: "books");
    if (status is { Completed: true })
    {
        Console.WriteLine("Importing memories completed...");
        break;
    }

    await Task.Delay(TimeSpan.FromSeconds(2));
}
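Polling for document status needs an upper bound: a document that never completes (for example, because a pipeline step keeps failing) would otherwise keep the caller waiting forever. A minimal sketch of a bounded alternative with a timeout and capped exponential backoff follows; PollingHelper and its parameters are hypothetical names, not part of Kernel Memory.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingHelper
{
    // Repeatedly evaluates isDone until it returns true or the timeout elapses.
    // The delay between attempts doubles each time, capped at maxDelay.
    // Returns true if the condition was met, false on timeout.
    // Cancellation requested by the caller is propagated as an exception.
    public static async Task<bool> WaitUntilAsync(
        Func<CancellationToken, Task<bool>> isDone,
        TimeSpan timeout,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        CancellationToken cancellationToken = default)
    {
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(timeout);
        var delay = initialDelay;

        try
        {
            while (true)
            {
                if (await isDone(timeoutCts.Token))
                {
                    return true;
                }

                await Task.Delay(delay, timeoutCts.Token);
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // The timeout fired rather than the caller cancelling.
            return false;
        }
    }
}
```

The condition delegate would wrap a status check, for example calling kernelMemory.GetDocumentStatusAsync and returning whether the status reports Completed.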

It is also worth noting that each pipeline step has its own queue and poison queue on Azure Queue Storage.

Sample code here
