<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Loïc Carrère</title>
    <description>The latest articles on Forem by Loïc Carrère (@loc_carrre_0d798813c662).</description>
    <link>https://forem.com/loc_carrre_0d798813c662</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3052497%2F2da60c2c-4f49-4481-90bd-62a1c959ac33.jpg</url>
      <title>Forem: Loïc Carrère</title>
      <link>https://forem.com/loc_carrre_0d798813c662</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/loc_carrre_0d798813c662"/>
    <language>en</language>
    <item>
      <title>Agent Skills Explained: What They Are, What They Aren't, and How to Use Them</title>
      <dc:creator>Loïc Carrère</dc:creator>
      <pubDate>Sat, 07 Feb 2026 17:35:42 +0000</pubDate>
      <link>https://forem.com/loc_carrre_0d798813c662/agent-skills-explained-what-they-are-what-they-arent-and-how-to-use-them-bf9</link>
      <guid>https://forem.com/loc_carrre_0d798813c662/agent-skills-explained-what-they-are-what-they-arent-and-how-to-use-them-bf9</guid>
      <description>&lt;p&gt;&lt;em&gt;A comprehensive guide to the open standard that turns AI agents into on-demand specialists&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;p&gt;Agent Skills is an open standard for packaging reusable AI agent capabilities as plain Markdown files. A skill is a folder with a &lt;code&gt;SKILL.md&lt;/code&gt; file containing metadata and instructions. When an agent activates a skill, it loads the instructions on demand, follows them as an explicit workflow, and produces structured, repeatable output. Skills solve the problems of bloated prompts, inconsistent behavior, and tightly coupled agent logic. They are supported by leading tools including OpenAI Codex, GitHub Copilot, VS Code, Cursor, and LM-Kit.NET.&lt;/p&gt;

&lt;p&gt;This article covers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What Agent Skills actually are (and what they are not)&lt;/li&gt;
&lt;li&gt;How they compare to MCP, tool use, system prompts, and subagents&lt;/li&gt;
&lt;li&gt;How progressive disclosure works&lt;/li&gt;
&lt;li&gt;Why skills are particularly well-suited for local inference&lt;/li&gt;
&lt;li&gt;A complete, runnable C# tutorial using LM-Kit.NET&lt;/li&gt;
&lt;li&gt;Best practices for writing your own skills&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Quick start with LM-Kit.NET&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package LM-Kit.NET
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. Load skills from a folder&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SkillRegistry&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromDirectory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./skills"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Activate a skill&lt;/span&gt;
&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TryParseSlashCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/explain"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Inject instructions into the conversation&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;activator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SkillActivator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;activator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FormatForInjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SkillInjectionMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n\n---\n\nUser request: "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full tutorial in Section 9. Full demo on &lt;a href="https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/skill_based_assistant" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;1. The Problem: Why We Need Agent Skills&lt;/h2&gt;

&lt;p&gt;If you have built an AI agent that does more than one thing, you have probably hit this wall:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your system prompt keeps growing.&lt;/strong&gt; Every new capability means more instructions, more examples, more edge cases. A prompt that started at 200 tokens is now 5,000. The agent gets slower, more expensive, and less reliable because it is trying to juggle everything at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behavior is inconsistent.&lt;/strong&gt; Ask the same agent to review code on Monday and again on Friday, and you might get different formats, different levels of detail, different criteria. There is nothing enforcing a consistent process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts live in code.&lt;/strong&gt; The instructions that define your agent's behavior are buried in C# strings, Python f-strings, or YAML configs. Non-developers cannot review or improve them. Version control is awkward. Deploying a small wording change means a code release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge is not reusable.&lt;/strong&gt; You build a great code review workflow for Project A. Project B needs the same thing. You copy-paste the prompt, it drifts, and now you maintain two copies.&lt;/p&gt;

&lt;p&gt;These are not edge cases. They are the daily reality of production AI development. Agent Skills were created to solve exactly these problems.&lt;/p&gt;




&lt;h2&gt;2. What Agent Skills Are&lt;/h2&gt;

&lt;p&gt;Agent Skills is an &lt;a href="https://agentskills.io/specification" rel="noopener noreferrer"&gt;open specification&lt;/a&gt; for defining modular, reusable AI agent capabilities as self-contained directories. It was &lt;a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills" rel="noopener noreferrer"&gt;developed by Anthropic&lt;/a&gt; and introduced publicly on October 16, 2025. On December 18, 2025, the format was &lt;a href="https://agentskills.io/home" rel="noopener noreferrer"&gt;published as an open standard&lt;/a&gt; for cross-platform portability, open to contributions from the broader ecosystem.&lt;/p&gt;

&lt;p&gt;At its core, a skill is a folder containing one required file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;explain/
  SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The &lt;code&gt;SKILL.md&lt;/code&gt; file has two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;YAML frontmatter&lt;/strong&gt; with metadata (name, description, version)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Markdown body&lt;/strong&gt; with the actual instructions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is a complete, working skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;explain&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Explains any topic in plain language. Type a word or phrase and get a clear, jargon-free explanation.&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Plain Language Explainer&lt;/span&gt;

You explain topics so anyone can understand them. The user gives you a word,
phrase, or concept. You explain it clearly.

&lt;span class="gu"&gt;## Output Format&lt;/span&gt;

&lt;span class="gu"&gt;## &amp;lt;Topic&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**In one sentence:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;simple&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;one-sentence&lt;/span&gt; &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**How it works:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="err"&gt;2&lt;/span&gt;&lt;span class="na"&gt;-3&lt;/span&gt; &lt;span class="na"&gt;sentences&lt;/span&gt; &lt;span class="na"&gt;explaining&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;mechanism&lt;/span&gt; &lt;span class="na"&gt;or&lt;/span&gt; &lt;span class="na"&gt;idea&lt;/span&gt; &lt;span class="na"&gt;using&lt;/span&gt; &lt;span class="na"&gt;an&lt;/span&gt; &lt;span class="na"&gt;everyday&lt;/span&gt; &lt;span class="na"&gt;analogy&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Why it matters:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="err"&gt;1&lt;/span&gt;&lt;span class="na"&gt;-2&lt;/span&gt; &lt;span class="na"&gt;sentences&lt;/span&gt; &lt;span class="na"&gt;on&lt;/span&gt; &lt;span class="na"&gt;why&lt;/span&gt; &lt;span class="na"&gt;someone&lt;/span&gt; &lt;span class="na"&gt;should&lt;/span&gt; &lt;span class="na"&gt;care&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Example:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;one&lt;/span&gt; &lt;span class="na"&gt;concrete&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;real-world&lt;/span&gt; &lt;span class="na"&gt;example&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**No jargon.**&lt;/span&gt; If you must use a technical term, define it in parentheses.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Use analogies.**&lt;/span&gt; Compare unfamiliar concepts to everyday things.
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Be concise.**&lt;/span&gt; The entire explanation fits on one screen.
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Assume zero background knowledge.**&lt;/span&gt;
&lt;span class="p"&gt;5.&lt;/span&gt; Never say "it's complicated" or "it depends." Just explain it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a complete Agent Skill. Save it as &lt;code&gt;explain/SKILL.md&lt;/code&gt;, point your agent at it, and the agent will follow these instructions precisely whenever the skill is activated.&lt;/p&gt;

&lt;h3&gt;Optional Resources&lt;/h3&gt;

&lt;p&gt;For more complex skills, the directory can include additional folders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;code-review/
  SKILL.md                # Required: instructions + metadata
  scripts/                # Optional: executable code
    lint-check.sh
  references/             # Optional: documentation
    coding-standards.md
  assets/                 # Optional: templates, data files
    review-template.json
  examples/               # Optional: sample inputs/outputs
    sample-review.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These resources are loaded lazily: the agent only reads them when it actually needs them, not at startup.&lt;/p&gt;

&lt;h3&gt;YAML Frontmatter Reference&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://agentskills.io/specification" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt; defines these frontmatter fields:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Required&lt;/th&gt;
&lt;th&gt;Constraints&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;1-64 chars. Lowercase letters, numbers, and hyphens only. Must match the parent directory name. No consecutive hyphens (&lt;code&gt;--&lt;/code&gt;). Must not start or end with a hyphen.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;description&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;1-1024 chars. Describes what the skill does and when to use it. Should include keywords that help agents match tasks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;license&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;License name or reference to a bundled license file.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;compatibility&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Max 500 chars. Environment requirements (intended product, system packages, network access).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;metadata&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Arbitrary key-value mapping. Use this for &lt;code&gt;version&lt;/code&gt;, &lt;code&gt;author&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;, or any custom properties.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;allowed-tools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Space-delimited list of pre-approved tools the skill may use. (Experimental; support varies by implementation.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
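&lt;p&gt;The &lt;code&gt;name&lt;/code&gt; constraints above are easy to check mechanically. Here is a minimal sketch in Python; the regex is one reading of the table's rules, and it does not cover the separate requirement that the name match the parent directory:&lt;/p&gt;

```python
import re

# Lowercase letters, digits, and hyphens; no leading/trailing hyphen;
# no consecutive hyphens. Length (1-64 chars) is checked separately.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def is_valid_skill_name(name: str) -> bool:
    return len(name) <= 64 and bool(NAME_PATTERN.match(name))

print(is_valid_skill_name("code-review"))   # True
print(is_valid_skill_name("Code-Review"))   # False (uppercase)
print(is_valid_skill_name("code--review"))  # False (consecutive hyphens)
print(is_valid_skill_name("-review"))       # False (leading hyphen)
```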

&lt;p&gt;Note that &lt;code&gt;version&lt;/code&gt; is not a top-level spec field. Use the &lt;code&gt;metadata&lt;/code&gt; map for it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-org&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some implementations (including LM-Kit.NET) also accept &lt;code&gt;version&lt;/code&gt; as a top-level convenience field, but for maximum portability across tools, place it inside &lt;code&gt;metadata&lt;/code&gt;.&lt;/p&gt;
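&lt;p&gt;A loader that wants to stay portable can prefer the spec-compliant location and fall back to the top-level field. A small illustrative helper; the function name and fallback order are my own, not part of the spec or of LM-Kit.NET:&lt;/p&gt;

```python
def skill_version(frontmatter: dict) -> str:
    """Read a skill's version: metadata.version per the spec,
    falling back to the top-level field some implementations accept."""
    metadata = frontmatter.get("metadata") or {}
    return metadata.get("version") or frontmatter.get("version") or "unversioned"

print(skill_version({"name": "explain", "metadata": {"version": "1.0"}}))  # 1.0
print(skill_version({"name": "explain", "version": "2.3"}))                # 2.3
print(skill_version({"name": "explain"}))                                  # unversioned
```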




&lt;h2&gt;3. What Agent Skills Are NOT&lt;/h2&gt;

&lt;p&gt;This is just as important as understanding what they are. Misunderstanding the boundaries leads to poor design decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills are not tools.&lt;/strong&gt; A tool (MCP tool, function call, API endpoint) is a deterministic action: call it with inputs, get a structured output. A skill is a set of instructions interpreted by an LLM. Skills describe &lt;em&gt;how&lt;/em&gt; to do something. Tools &lt;em&gt;do&lt;/em&gt; something. They are complementary layers. As the Goose team at Block &lt;a href="https://block.github.io/goose/blog/2025/12/22/agent-skills-vs-mcp/" rel="noopener noreferrer"&gt;put it&lt;/a&gt;: skills describe the workflow, while MCP provides the runner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills are not prompts.&lt;/strong&gt; A prompt is ephemeral, reactive, and typically embedded in code. A skill is a persistent, portable, version-controlled artifact. Skills load dynamically based on context. Prompts are always present.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills are not agents.&lt;/strong&gt; An agent is an execution runtime with its own tools, memory, and decision loop. A skill is a knowledge module that &lt;em&gt;any&lt;/em&gt; agent can load. Think of skills as "apps" and agents as the "operating system."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills are not deterministic.&lt;/strong&gt; Because an LLM interprets the instructions, there is inherent non-determinism. The same skill can produce slightly different outputs. If you need guaranteed structure, combine a skill with structured output constraints (grammar, JSON schema) or use tool calls for the critical parts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills are not a replacement for MCP.&lt;/strong&gt; MCP (Model Context Protocol) provides secure connectivity to external systems: databases, APIs, file systems. Skills provide procedural knowledge for &lt;em&gt;using&lt;/em&gt; those systems. A "Database Query" skill might instruct the agent on how to write safe SQL and handle edge cases, while an MCP server provides the actual database connection. Different layers, same stack.&lt;/p&gt;




&lt;h2&gt;4. Agent Skills vs. MCP vs. Tool Use vs. Prompts&lt;/h2&gt;

&lt;p&gt;One of the most common points of confusion in the current AI tooling landscape is where Agent Skills fit relative to MCP, function calling/tool use, system prompts, and subagents. This section lays it out concretely.&lt;/p&gt;

&lt;h3&gt;Skills vs. MCP (Model Context Protocol)&lt;/h3&gt;

&lt;p&gt;MCP and Agent Skills both originated at Anthropic, but they solve different problems at different layers.&lt;/p&gt;

&lt;p&gt;MCP is a &lt;strong&gt;communication protocol&lt;/strong&gt; (&lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;spec&lt;/a&gt;). It defines how an agent talks to external systems: databases, APIs, file systems, SaaS applications. An MCP server exposes tools (structured functions with JSON schemas) and resources (data the agent can read). When the agent calls an MCP tool, it sends a JSON-RPC request and gets a deterministic response. MCP runs as a separate process with its own authentication and isolation.&lt;/p&gt;

&lt;p&gt;Agent Skills are &lt;strong&gt;knowledge files&lt;/strong&gt;. A skill tells the agent &lt;em&gt;how&lt;/em&gt; to think about a task: what steps to follow, what output format to use, what constraints to respect. Skills run inside the agent's own context window. There is no separate process, no network call, no schema.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://block.github.io/goose/blog/2025/12/22/agent-skills-vs-mcp/" rel="noopener noreferrer"&gt;Goose team at Block&lt;/a&gt; summarized it well: MCP gives agents abilities; skills teach agents how to use those abilities well. Anthropic product manager Mahesh Murag confirmed in the &lt;a href="https://venturebeat.com/technology/anthropic-launches-enterprise-agent-skills-and-opens-the-standard" rel="noopener noreferrer"&gt;VentureBeat launch coverage&lt;/a&gt; that the two are designed as complementary layers.&lt;/p&gt;

&lt;p&gt;A concrete example: you might have an MCP server that connects to your PostgreSQL database. The MCP server exposes a &lt;code&gt;run_query&lt;/code&gt; tool. But the agent still needs to know &lt;em&gt;how&lt;/em&gt; to write safe, efficient SQL for your specific schema. That is what a &lt;code&gt;database-query&lt;/code&gt; skill provides: the procedural knowledge for using the tool well.&lt;/p&gt;
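&lt;p&gt;To make that division of labor concrete, a hypothetical &lt;code&gt;database-query/SKILL.md&lt;/code&gt; might look like this. Every detail below (the rules, the schema name, the &lt;code&gt;run_query&lt;/code&gt; tool) is illustrative, not taken from any published skill:&lt;/p&gt;

```markdown
---
name: database-query
description: Writes safe, efficient SQL for the orders database. Use when the user asks for data stored in PostgreSQL.
---

# Database Query

Use the run_query tool exposed by the MCP server.

## Rules

1. Prefer SELECT statements. Never run UPDATE or DELETE without explicit user confirmation.
2. Add a LIMIT clause to every exploratory query.
3. Schema-qualify table names (public.orders, not orders).
4. If a query fails, show the error and propose a corrected query instead of retrying blindly.
```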

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Agent Skills&lt;/th&gt;
&lt;th&gt;MCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Knowledge / procedure&lt;/td&gt;
&lt;td&gt;Connectivity / action&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Markdown + YAML file&lt;/td&gt;
&lt;td&gt;JSON-RPC 2.0 protocol&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM interprets instructions&lt;/td&gt;
&lt;td&gt;Deterministic API call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shares agent's context&lt;/td&gt;
&lt;td&gt;Separate process per server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero (local file read)&lt;/td&gt;
&lt;td&gt;Network round-trip&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None (trust the file)&lt;/td&gt;
&lt;td&gt;OAuth-native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;State&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stateless (text)&lt;/td&gt;
&lt;td&gt;Stateful (running server)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workflows, expertise, formats&lt;/td&gt;
&lt;td&gt;Data access, external actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Risk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Misinterpretation, hallucination&lt;/td&gt;
&lt;td&gt;Tool poisoning, auth leaks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a deeper technical analysis, see the &lt;a href="https://www.llamaindex.ai/blog/skills-vs-mcp-tools-for-agents-when-to-use-what" rel="noopener noreferrer"&gt;LlamaIndex comparison&lt;/a&gt; and &lt;a href="https://www.friedrichs-it.de/blog/agent-skills-vs-model-context-protocol/" rel="noopener noreferrer"&gt;Friedrichs-IT security analysis&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Skills vs. Tool Use / Function Calling&lt;/h3&gt;

&lt;p&gt;Tool use (also called function calling) is the mechanism where an LLM decides to invoke a structured function. The model outputs a JSON object with a function name and arguments; the runtime executes it and returns the result. OpenAI, Anthropic, Google, and most providers support this natively.&lt;/p&gt;

&lt;p&gt;Skills operate at a higher level of abstraction. A single skill might orchestrate multiple tool calls as part of a workflow. For example, a &lt;code&gt;research-report&lt;/code&gt; skill could instruct the agent to: (1) search the web, (2) read the top 5 results, (3) synthesize findings, (4) format them as a report. The skill defines the &lt;em&gt;procedure&lt;/em&gt;; individual tool calls handle the &lt;em&gt;actions&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You can also combine the two directly. In LM-Kit.NET, a skill can be exposed as a tool via &lt;code&gt;SkillTool&lt;/code&gt;, letting the model decide when to activate a particular skill through the function calling interface.&lt;/p&gt;

&lt;h3&gt;Skills vs. System Prompts&lt;/h3&gt;

&lt;p&gt;A system prompt is a block of text prepended to every conversation. It is always present, always consuming tokens, and typically hardcoded in your application.&lt;/p&gt;

&lt;p&gt;A skill is loaded only when needed and unloaded when done. This is the progressive disclosure advantage. But there is a deeper difference: skills are &lt;strong&gt;portable artifacts&lt;/strong&gt; that live outside your code. A system prompt is part of your application. A skill is a file you can share, version, review, and swap without changing a line of code.&lt;/p&gt;

&lt;h3&gt;Skills vs. Subagents&lt;/h3&gt;

&lt;p&gt;A subagent is a fully independent agent with its own model, tools, system prompt, and conversation history. Orchestration patterns (pipeline, supervisor, parallel) coordinate multiple subagents.&lt;/p&gt;

&lt;p&gt;A skill is lighter. It does not create a new execution context. It augments an &lt;em&gt;existing&lt;/em&gt; agent's behavior. You can think of subagents as "hiring a specialist contractor" and skills as "reading the specialist's playbook yourself." Both are valid; the choice depends on whether you need autonomous execution (subagent) or guided behavior within the current conversation (skill).&lt;/p&gt;

&lt;h3&gt;The Full Comparison&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Agent Skills&lt;/th&gt;
&lt;th&gt;MCP Tools&lt;/th&gt;
&lt;th&gt;Function Calling&lt;/th&gt;
&lt;th&gt;System Prompts&lt;/th&gt;
&lt;th&gt;Subagents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What it is&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Portable knowledge module&lt;/td&gt;
&lt;td&gt;External connectivity&lt;/td&gt;
&lt;td&gt;Structured action invocation&lt;/td&gt;
&lt;td&gt;Static instruction text&lt;/td&gt;
&lt;td&gt;Independent execution context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SKILL.md (Markdown)&lt;/td&gt;
&lt;td&gt;JSON-RPC server&lt;/td&gt;
&lt;td&gt;JSON schema&lt;/td&gt;
&lt;td&gt;String in code&lt;/td&gt;
&lt;td&gt;Agent with own config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-deterministic (LLM)&lt;/td&gt;
&lt;td&gt;Deterministic (API)&lt;/td&gt;
&lt;td&gt;Deterministic (API)&lt;/td&gt;
&lt;td&gt;Non-deterministic (LLM)&lt;/td&gt;
&lt;td&gt;Non-deterministic (LLM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Loading&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-demand&lt;/td&gt;
&lt;td&gt;Always connected&lt;/td&gt;
&lt;td&gt;Always available&lt;/td&gt;
&lt;td&gt;Always in context&lt;/td&gt;
&lt;td&gt;On-demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only when active&lt;/td&gt;
&lt;td&gt;Schema always present&lt;/td&gt;
&lt;td&gt;Schema always present&lt;/td&gt;
&lt;td&gt;Always present&lt;/td&gt;
&lt;td&gt;Separate context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Versioned file&lt;/td&gt;
&lt;td&gt;Running process&lt;/td&gt;
&lt;td&gt;Code definition&lt;/td&gt;
&lt;td&gt;Embedded in code&lt;/td&gt;
&lt;td&gt;Running process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Portability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cross-platform standard&lt;/td&gt;
&lt;td&gt;Cross-platform standard&lt;/td&gt;
&lt;td&gt;Provider-specific schema&lt;/td&gt;
&lt;td&gt;Vendor-specific&lt;/td&gt;
&lt;td&gt;Framework-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workflows, expertise, roles&lt;/td&gt;
&lt;td&gt;Data access, external APIs&lt;/td&gt;
&lt;td&gt;Single actions&lt;/td&gt;
&lt;td&gt;Baseline behavior&lt;/td&gt;
&lt;td&gt;Complex autonomous work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;When to Use What&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;Agent Skills&lt;/strong&gt; when you need the agent to follow a specific workflow, produce a specific output format, or behave with domain expertise. Skills are the right choice when behavior should be portable, reviewable, and swappable without code changes.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;MCP&lt;/strong&gt; when the agent needs to interact with external systems: read a database, call a REST API, access a file system on a remote server.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;function calling / tool use&lt;/strong&gt; when you need the agent to perform a specific, well-defined action with structured inputs and outputs.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;system prompts&lt;/strong&gt; for baseline personality, safety constraints, and always-on behavior that applies to every conversation regardless of task.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;subagents&lt;/strong&gt; when a task requires autonomous multi-step reasoning with its own dedicated tools and context, especially in orchestration patterns (pipelines, parallel work, supervisor delegation).&lt;/p&gt;

&lt;p&gt;In practice, production agents combine all of these. A typical architecture: system prompt for baseline behavior, skills for task-specific expertise, MCP for external data, function calling for actions, and subagents for complex orchestration.&lt;/p&gt;

&lt;p&gt;For the authoritative comparison, see the &lt;a href="https://claude.com/blog/skills-explained" rel="noopener noreferrer"&gt;Claude blog on Skills&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;5. How Progressive Disclosure Works&lt;/h2&gt;

&lt;p&gt;The key architectural innovation of Agent Skills is &lt;strong&gt;progressive disclosure&lt;/strong&gt;: a three-tier loading strategy that keeps context efficient.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;When&lt;/th&gt;
&lt;th&gt;What loads&lt;/th&gt;
&lt;th&gt;Token cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;At startup&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; from YAML frontmatter only&lt;/td&gt;
&lt;td&gt;~50 tokens per skill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Activation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When the skill is triggered&lt;/td&gt;
&lt;td&gt;Full SKILL.md body: instructions, output format, rules, examples&lt;/td&gt;
&lt;td&gt;~500 to 5,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When the agent actually needs them&lt;/td&gt;
&lt;td&gt;Files in &lt;code&gt;references/&lt;/code&gt;, &lt;code&gt;scripts/&lt;/code&gt;, &lt;code&gt;assets/&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;~2,000+ tokens per resource&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;At startup&lt;/strong&gt;, the agent loads only names and descriptions of all available skills. If you have 20 skills, that is roughly 1,000 tokens of metadata. The agent knows &lt;em&gt;what&lt;/em&gt; it can do, but carries none of the detailed instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When a skill is activated&lt;/strong&gt; (by user command or automatic matching), the full SKILL.md body is loaded into context. The agent now has the detailed instructions, output format, and rules for that specific task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;During execution&lt;/strong&gt;, if the skill references a file in &lt;code&gt;references/&lt;/code&gt; or &lt;code&gt;scripts/&lt;/code&gt;, that content is loaded only when the agent actually needs it.&lt;/p&gt;

&lt;p&gt;This means you can give an agent access to a large library of capabilities without paying the token cost for all of them at once. The context window stays clean, the agent stays focused, and you only pay for what you use.&lt;/p&gt;
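&lt;p&gt;The three tiers can be sketched in a few lines of code. This is an illustrative model of the loading strategy, not LM-Kit.NET's implementation; the class shape and the naive frontmatter split are assumptions:&lt;/p&gt;

```python
class LazySkill:
    """Three-tier loading sketch: metadata at startup, body on
    activation, bundled resources only when the agent asks."""

    def __init__(self, skill_md: str, resources: dict):
        # Naive split: assumes frontmatter fences and no '---' in the body.
        front, _, body = skill_md.strip().removeprefix("---").partition("---")
        self._body = body.strip()        # parsed, but not yet in context
        self._resources = resources      # e.g. {"references/standards.md": "..."}
        # Tier 1 (discovery): only these lines cost context tokens at startup.
        self.meta = {}
        for line in front.strip().splitlines():
            if ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                self.meta[key.strip()] = value.strip()

    def activate(self) -> str:
        # Tier 2 (activation): the full instruction body enters the context.
        return self._body

    def resource(self, path: str) -> str:
        # Tier 3 (execution): read a bundled file on demand.
        return self._resources[path]

skill = LazySkill(
    "---\nname: explain\ndescription: Plain-language explanations.\n---\n# Explainer\nExplain clearly.",
    {"references/glossary.md": "jargon: specialist vocabulary"},
)
print(skill.meta["name"])                # explain
print(skill.activate().splitlines()[0])  # # Explainer
```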




&lt;h2&gt;6. What Problems Do Agent Skills Solve in Practice?&lt;/h2&gt;

&lt;h3&gt;Context efficiency&lt;/h3&gt;

&lt;p&gt;Without skills, a multi-capability agent needs all instructions loaded simultaneously. With 10 workflows averaging 500 tokens each, that is 5,000 tokens of permanent context overhead. With skills, the overhead is ~500 tokens for the catalog plus ~500 tokens for whichever skill is currently active: a 5x reduction in steady-state context cost.&lt;/p&gt;
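&lt;p&gt;Spelled out with the rough per-tier figures from the progressive disclosure table (estimates, not benchmarks):&lt;/p&gt;

```python
WORKFLOWS = 10
BODY_TOKENS = 500      # average instruction body per workflow
CATALOG_TOKENS = 50    # name + description per skill (tier 1)

monolithic = WORKFLOWS * BODY_TOKENS                    # everything loaded, always
with_skills = WORKFLOWS * CATALOG_TOKENS + BODY_TOKENS  # catalog + 1 active skill

print(monolithic)                 # 5000
print(with_skills)                # 1000
print(monolithic // with_skills)  # 5
```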

&lt;h3&gt;
  
  
  Consistent, repeatable output
&lt;/h3&gt;

&lt;p&gt;A skill specifies an exact output format, a set of rules, and optionally examples. Every time the agent runs that skill, it follows the same structure. This is the difference between "review this code" (unpredictable) and "review this code using the code-review skill" (structured checklist, consistent format, every time).&lt;/p&gt;

&lt;h3&gt;
  
  
  Modularity
&lt;/h3&gt;

&lt;p&gt;Add a new capability? Drop a folder. Remove one? Delete the folder. Update a workflow? Edit the Markdown. No code changes, no redeployment, no risk of breaking unrelated features. Skills decouple &lt;em&gt;what the agent knows&lt;/em&gt; from &lt;em&gt;how the agent runs&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Version control and governance
&lt;/h3&gt;

&lt;p&gt;Skills are plain text files. They live in your Git repository. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review changes in pull requests&lt;/li&gt;
&lt;li&gt;Track the history of every instruction change&lt;/li&gt;
&lt;li&gt;Roll back to a previous version if a new one underperforms&lt;/li&gt;
&lt;li&gt;Require approval before behavior changes go live&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters enormously in regulated industries (finance, healthcare, legal) where AI behavior must be auditable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Portability
&lt;/h3&gt;

&lt;p&gt;Agent Skills is an open standard adopted by a &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;growing number of agent products&lt;/a&gt;, including OpenAI Codex (&lt;a href="https://developers.openai.com/codex/skills" rel="noopener noreferrer"&gt;docs&lt;/a&gt;), GitHub Copilot (&lt;a href="https://docs.github.com/en/copilot/concepts/agents/about-agent-skills" rel="noopener noreferrer"&gt;docs&lt;/a&gt;), VS Code (&lt;a href="https://code.visualstudio.com/docs/copilot/customization/agent-skills" rel="noopener noreferrer"&gt;docs&lt;/a&gt;), Cursor (&lt;a href="https://cursor.com/docs/context/skills" rel="noopener noreferrer"&gt;docs&lt;/a&gt;), Block's Goose, and LM-Kit.NET. A skill you write once works across all of them. No vendor lock-in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Team collaboration
&lt;/h3&gt;

&lt;p&gt;Because skills are Markdown, anyone can write or improve them. A domain expert (lawyer, compliance officer, product manager) can author a skill without touching code. Developers review and deploy it. This separates "what the AI should do" (domain knowledge) from "how the AI works" (infrastructure). Both teams contribute to what they know best.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Why Agent Skills Make Even More Sense for Local Inference
&lt;/h2&gt;

&lt;p&gt;Most of the discussion around Agent Skills focuses on cloud-hosted models: Claude, GPT-4, Gemini. But there is an argument to be made that skills become &lt;em&gt;even more&lt;/em&gt; valuable when you run models locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smaller models need more guidance
&lt;/h3&gt;

&lt;p&gt;Cloud models like GPT-4o or Claude Sonnet have been trained on enormous corpora and can often figure out a reasonable output format on their own. A 4B or 8B parameter model running on your GPU does not have the same capacity to guess your intent. It needs explicit instructions: what format to follow, what constraints to respect, what to include and what to skip. This is exactly what a well-written SKILL.md provides. Skills compensate for a smaller model's weaker instruction-following by giving it the specific, structured guidance it needs.&lt;/p&gt;

&lt;p&gt;In our own testing with LM-Kit.NET, the difference is striking. Ask a 4B model to "review this code" with no skill, and you get a generic response that varies every time. Activate a &lt;code&gt;code-review&lt;/code&gt; skill with a clear checklist and output template, and the same 4B model produces structured, consistent reviews. The skill effectively narrows the solution space, which is precisely what a smaller model needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context windows are tighter
&lt;/h3&gt;

&lt;p&gt;Cloud models now offer 128K to 1M token context windows. Local models typically run with 4K to 32K contexts, sometimes up to 128K but at a real performance cost (memory, speed). Every token matters more.&lt;/p&gt;

&lt;p&gt;This is where progressive disclosure goes from "nice optimization" to "architectural necessity." With a local model, you simply cannot afford to load all possible instructions at once. You need the skill system: load only metadata at startup (~50 tokens per skill), activate the relevant skill on demand (~500 to 2,000 tokens), keep the rest of the context free for the actual user data.&lt;/p&gt;
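&lt;p&gt;A quick back-of-the-envelope calculation makes the point concrete. The figures below are the assumed averages from this section, not measurements:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

int contextWindow = 8192;      // a common local-model configuration
int skillCount = 20;
int metadataPerSkill = 50;     // tier 1: name + description
int activeSkillBody = 1500;    // tier 2: one activated SKILL.md

// Eager loading: every skill body in context at once.
int eagerCost = skillCount * activeSkillBody;                    // 30,000 tokens: does not fit
// Progressive disclosure: catalog plus one active skill.
int lazyCost = skillCount * metadataPerSkill + activeSkillBody;  // 2,500 tokens

Console.WriteLine(contextWindow - lazyCost);   // tokens left for the user's actual data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Loading every skill eagerly would overflow an 8K window almost four times over; with progressive disclosure, roughly 70% of the context stays available for the conversation itself.&lt;/p&gt;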

&lt;h3&gt;
  
  
  Every token costs compute, not dollars
&lt;/h3&gt;

&lt;p&gt;With a cloud API, the cost of extra context tokens is measured in money. With local inference, it is measured in latency and memory. Stuffing a 5,000-token system prompt into every request to a local model directly impacts response time, especially on consumer hardware. Skills let you keep prompts lean for the common case and only expand them when the specific expertise is actually needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  No network dependency
&lt;/h3&gt;

&lt;p&gt;Skills are local files. The model is local. The entire skill activation pipeline (read file, inject into prompt, generate response) happens in-process with zero network calls. This makes the architecture simpler, faster, and fully offline-capable. For edge deployments, air-gapped environments, or embedded applications, this matters. You ship the model and the skills folder together, and the system works with no external dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Privacy and control
&lt;/h3&gt;

&lt;p&gt;When skills run locally, the instructions themselves never leave your machine. If you have a skill encoding proprietary compliance rules, internal procedures, or sensitive domain knowledge, that content stays in your process memory. It is never sent to a third-party API. This is a significant advantage for healthcare, finance, legal, defense, and any industry with data residency requirements.&lt;/p&gt;

&lt;p&gt;In short: skills make large cloud models more organized. They make small local models actually capable. The smaller the model, the more it benefits from well-structured skill instructions.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Before and After: What a Skill Actually Changes
&lt;/h2&gt;

&lt;p&gt;The concept is clearest with a concrete example. Here is the same 4B model, the same user question, with and without a skill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without a skill&lt;/strong&gt; (bare prompt: "Review this function for issues"):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The function looks okay but you might want to add some error handling. Also the variable names could be more descriptive. Overall it seems fine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Generic, unstructured, inconsistent. Ask again tomorrow and you will get a different format and a different level of detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With a &lt;code&gt;code-review&lt;/code&gt; skill active&lt;/strong&gt; (which defines a checklist, output format, and severity levels):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Code Review&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Line&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Unchecked null dereference on &lt;code&gt;user.Email&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;SQL string concatenation (injection risk)&lt;/td&gt;
&lt;td&gt;Critical&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Magic number &lt;code&gt;86400&lt;/code&gt; should be a named constant&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; 2 issues require fixing before merge. The SQL injection on line 18 is the priority.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Same model, same weights, same temperature. The only difference is 800 tokens of skill instructions loaded into the context. The skill provides the structure the model cannot invent on its own.&lt;/p&gt;

&lt;p&gt;This pattern applies across domains:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;What changes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customer support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;support-playbook&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Agent follows a triage checklist instead of guessing what to ask&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Report generation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;weekly-report&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Output always has the same sections (summary, metrics, action items)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gdpr-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Agent checks a specific list of requirements instead of improvising&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Email writing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;email-writer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Produces subject line, greeting, body, sign-off in consistent format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-mode assistant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple skills&lt;/td&gt;
&lt;td&gt;Same chatbot switches between explainer, analyst, and writer on command&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  9. Implementing Agent Skills with LM-Kit.NET
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.lm-kit.com/lm-kit-net/" rel="noopener noreferrer"&gt;LM-Kit.NET&lt;/a&gt; provides a complete, production-ready implementation of the Agent Skills specification. Let's build a working skill-based assistant from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.1 Create the Project
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet new console &lt;span class="nt"&gt;-n&lt;/span&gt; SkillAssistant
&lt;span class="nb"&gt;cd &lt;/span&gt;SkillAssistant
dotnet add package LM-Kit.NET
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9.2 Create Your Skills
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;skills/&lt;/code&gt; directory with two skills:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;skills/explain/SKILL.md&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;explain&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Explains any topic in plain language. Type a word or phrase and get a clear, jargon-free explanation.&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Plain Language Explainer&lt;/span&gt;

You explain topics so anyone can understand them. The user gives you a word,
phrase, or concept. You explain it clearly.

&lt;span class="gu"&gt;## Output Format&lt;/span&gt;

&lt;span class="gu"&gt;## &amp;lt;Topic&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**In one sentence:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;simple&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;one-sentence&lt;/span&gt; &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**How it works:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="err"&gt;2&lt;/span&gt;&lt;span class="na"&gt;-3&lt;/span&gt; &lt;span class="na"&gt;sentences&lt;/span&gt; &lt;span class="na"&gt;explaining&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;mechanism&lt;/span&gt; &lt;span class="na"&gt;or&lt;/span&gt; &lt;span class="na"&gt;idea&lt;/span&gt; &lt;span class="na"&gt;using&lt;/span&gt; &lt;span class="na"&gt;an&lt;/span&gt; &lt;span class="na"&gt;everyday&lt;/span&gt; &lt;span class="na"&gt;analogy&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Why it matters:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;&lt;/span&gt;&lt;span class="err"&gt;1&lt;/span&gt;&lt;span class="na"&gt;-2&lt;/span&gt; &lt;span class="na"&gt;sentences&lt;/span&gt; &lt;span class="na"&gt;on&lt;/span&gt; &lt;span class="na"&gt;why&lt;/span&gt; &lt;span class="na"&gt;someone&lt;/span&gt; &lt;span class="na"&gt;should&lt;/span&gt; &lt;span class="na"&gt;care&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Example:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;one&lt;/span&gt; &lt;span class="na"&gt;concrete&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="na"&gt;real-world&lt;/span&gt; &lt;span class="na"&gt;example&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**No jargon.**&lt;/span&gt; If you must use a technical term, define it in parentheses.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Use analogies.**&lt;/span&gt; Compare unfamiliar concepts to everyday things.
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Be concise.**&lt;/span&gt; The entire explanation fits on one screen.
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Assume zero background knowledge.**&lt;/span&gt;
&lt;span class="p"&gt;5.&lt;/span&gt; Never say "it's complicated" or "it depends." Just explain it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;skills/email-writer/SKILL.md&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email-writer&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Writes a professional email from a short description. Type what you need and get a ready-to-send email.&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Professional Email Writer&lt;/span&gt;

You write complete, professional emails from a short description. The user
types one line describing the situation. You produce a full email.

&lt;span class="gu"&gt;## Output Format&lt;/span&gt;

Subject: &lt;span class="nt"&gt;&amp;lt;concise&lt;/span&gt; &lt;span class="na"&gt;subject&lt;/span&gt; &lt;span class="na"&gt;line&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;greeting&amp;gt;&lt;/span&gt;,

&lt;span class="nt"&gt;&amp;lt;body:&lt;/span&gt; &lt;span class="err"&gt;2&lt;/span&gt;&lt;span class="na"&gt;-3&lt;/span&gt; &lt;span class="na"&gt;short&lt;/span&gt; &lt;span class="na"&gt;paragraphs&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;sign-off&amp;gt;&lt;/span&gt;,
[Your Name]

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Never ask for more details.**&lt;/span&gt; If information is missing, make reasonable assumptions.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Be concise.**&lt;/span&gt; No paragraph longer than 3 sentences.
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Start with the purpose.**&lt;/span&gt; The first sentence states why the sender is writing.
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**End with a next step.**&lt;/span&gt; The last paragraph includes a specific action or timeline.
&lt;span class="p"&gt;5.&lt;/span&gt; &lt;span class="gs"&gt;**Use [Name] placeholders**&lt;/span&gt; for any names you do not know.
&lt;span class="p"&gt;6.&lt;/span&gt; &lt;span class="gs"&gt;**Default to professional tone.**&lt;/span&gt; No slang, no emojis, no exclamation marks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9.3 Copy Skills to Output
&lt;/h3&gt;

&lt;p&gt;Add this to your &lt;code&gt;.csproj&lt;/code&gt; so the skills folder ships with your binary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;None&lt;/span&gt; &lt;span class="na"&gt;Include=&lt;/span&gt;&lt;span class="s"&gt;"skills\**\*"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;CopyToOutputDirectory&amp;gt;&lt;/span&gt;PreserveNewest&lt;span class="nt"&gt;&amp;lt;/CopyToOutputDirectory&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;Link&amp;gt;&lt;/span&gt;skills\%(RecursiveDir)%(Filename)%(Extension)&lt;span class="nt"&gt;&amp;lt;/Link&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/None&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9.4 Write the Application
&lt;/h3&gt;

&lt;p&gt;Here is a complete, minimal &lt;code&gt;Program.cs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.Agents.Skills&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.Model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.TextGeneration.Chat&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.TextGeneration.Sampling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;System.Text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InputEncoding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UTF8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OutputEncoding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UTF8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Step 1: Load all skills from the skills/ folder&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SkillRegistry&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;activator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SkillActivator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;skillsPath&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Combine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AppContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BaseDirectory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"skills"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Directory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skillsPath&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromDirectory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skillsPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errorHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"  Warning: could not load &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetFileName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"Loaded &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt; skill(s).\n"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Step 2: Load a model&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Loading model..."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromModelID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemma3:4b"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Step 3: Set up conversation&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MultiTurnConversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaximumCompletionTokens&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SamplingMode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;RandomSampling&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0.7f&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Stream tokens to the console as they are generated.&lt;/span&gt;
&lt;span class="c1"&gt;// LM-Kit.NET fires this event for each chunk of text the model produces,&lt;/span&gt;
&lt;span class="c1"&gt;// so the user sees output appear word by word instead of waiting for the full response.&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AfterTextCompletion&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;AgentSkill&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;activeSkill&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Step 4: Show available skills&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Available skills:"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Skills&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"  /&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  /off - Deactivate the current skill\n"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Step 5: Chat loop&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\nYou: "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ReadLine&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;IsNullOrWhiteSpace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Handle slash commands&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;StartsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"/off"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;activeSkill&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Skill deactivated."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TryParseSlashCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;activeSkill&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"Skill activated: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unknown command."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Build the prompt: inject skill instructions when one is active&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;activeSkill&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;activator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FormatForInjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;activeSkill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;SkillInjectionMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n\n---\n\nUser request: "&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Submit and stream the response&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\nAssistant: "&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9.5 Run It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A typical session looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Loaded 2 skill(s).

Available skills:
  /explain - Explains any topic in plain language.
  /email-writer - Writes a professional email from a short description.
  /off - Deactivate the current skill

You: /explain
Skill activated: explain

You: blockchain

Assistant:

## Blockchain

**In one sentence:** A blockchain is a shared digital ledger where every
entry is verified by a network of computers instead of a single authority.

**How it works:** Imagine a notebook that thousands of people hold copies of.
Whenever someone writes a new line, everyone checks it matches and updates
their copy. Because no single person controls the notebook, nobody can
secretly alter past entries.

**Why it matters:** It enables trust between strangers without a middleman,
which is useful for sending money, proving ownership, and recording agreements.

**Example:** Bitcoin uses a blockchain to track who owns which coins, so two
people anywhere in the world can exchange value without a bank in between.

You: /email-writer
Skill activated: email-writer

You: thank a vendor for fast delivery

Assistant:

Subject: Thank You for the Quick Delivery

Dear [Name],

I am writing to thank you for the prompt delivery of our recent order. The
shipment arrived ahead of schedule, which allowed our team to stay on track
with our project timeline.

We appreciate the care taken in packaging and the clear communication
throughout the process. This level of service reinforces our confidence in
your team.

We look forward to continuing our partnership. Please let us know if there
is anything we can do to support future orders.

Best regards,
[Your Name]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how the same model produces completely different output formats depending on which skill is active, with no code change required: you simply switch the active skill.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.6 What Just Happened (Under the Hood)
&lt;/h3&gt;

&lt;p&gt;Here is the flow for each user message when a skill is active:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. User types: "blockchain"

2. SkillActivator.FormatForInjection() reads the full SKILL.md
   and formats it for injection

3. The prompt becomes:
   "[skill instructions]\n\n---\n\nUser request: blockchain"

4. MultiTurnConversation.Submit() sends this to the model

5. The model follows the skill's output format and rules

6. Result: structured, consistent output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill instructions are injected as a user message (via &lt;code&gt;SkillInjectionMode.UserMessage&lt;/code&gt;). LM-Kit.NET also supports &lt;code&gt;SystemPrompt&lt;/code&gt; (prepend to system prompt) and &lt;code&gt;ToolResult&lt;/code&gt; (return as tool output) injection modes.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Production Features in LM-Kit.NET
&lt;/h2&gt;

&lt;p&gt;The tutorial above covers the core workflow. LM-Kit.NET provides several additional capabilities for production deployments. This section gives a brief overview of each; see the &lt;a href="https://docs.lm-kit.com/lm-kit-net/" rel="noopener noreferrer"&gt;full API documentation&lt;/a&gt; for details and code samples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic skill matching.&lt;/strong&gt; Instead of explicit slash commands, &lt;code&gt;SkillRegistry&lt;/code&gt; can match a user's natural-language request to the best skill using keyword scoring or embedding-based semantic search via &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Skills.SkillRegistry.html" rel="noopener noreferrer"&gt;&lt;code&gt;FindMatches&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;FindMatchesWithEmbeddings&lt;/code&gt;. This lets you build assistants that route to the right skill automatically.&lt;/p&gt;
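The keyword-scoring idea can be sketched without the SDK. The snippet below is an illustrative toy scorer, not LM-Kit.NET's actual `FindMatches` implementation: it ranks skills by how many description words overlap with the user's request.

```csharp
using System;
using System.Linq;

// Toy keyword scorer (illustration only, not the SDK's algorithm):
// rank skill descriptions by word overlap with the user's request.
var skills = new (string Name, string Description)[]
{
    ("explain", "Explains any topic in plain language."),
    ("email-writer", "Writes a professional email from a short description."),
};

string request = "please write an email to thank a vendor";

static double Score(string request, string description)
{
    static string[] Words(string s) =>
        s.ToLowerInvariant().Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);
    var requestWords = Words(request).ToHashSet();
    // Count description words that also appear in the request.
    return Words(description).Count(requestWords.Contains);
}

var best = skills.OrderByDescending(s => Score(request, s.Description)).First();
Console.WriteLine(best.Name); // email-writer ranks first for this request
```

A production matcher would normalize word forms and weight rare terms (or use embeddings, as `FindMatchesWithEmbeddings` does), but the routing principle is the same.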

&lt;p&gt;&lt;strong&gt;Remote skill loading.&lt;/strong&gt; Skills do not have to live on the local file system. &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Skills.SkillRegistry.LoadFromUrlAsync.html" rel="noopener noreferrer"&gt;&lt;code&gt;SkillRegistry.LoadFromUrlAsync&lt;/code&gt;&lt;/a&gt; fetches skills from URLs, ZIP archives, or GitHub repositories. This enables enterprise distribution: host a skill library centrally and have agents pull the latest versions at startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hot reload.&lt;/strong&gt; During development, &lt;code&gt;SkillWatcher&lt;/code&gt; monitors the skills directory and automatically reloads any SKILL.md that changes. Edit the file, save, and the agent picks up the new instructions without restarting. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;watcher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SkillWatcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skillsPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;watcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Edit any SKILL.md and the registry updates automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Programmatic skill creation.&lt;/strong&gt; &lt;code&gt;SkillBuilder&lt;/code&gt; provides a fluent API for constructing skills in code rather than from files. This is useful when skill content comes from a database, a UI, or is generated dynamically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills as tools.&lt;/strong&gt; &lt;code&gt;SkillTool&lt;/code&gt; wraps the skill registry as an &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Tools.ITool.html" rel="noopener noreferrer"&gt;&lt;code&gt;ITool&lt;/code&gt;&lt;/a&gt;, letting the model decide which skill to activate through the function calling interface. Register it with an &lt;code&gt;Agent&lt;/code&gt; and the model can self-select skills based on the conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation.&lt;/strong&gt; &lt;code&gt;SkillRegistry.ValidateAll()&lt;/code&gt; checks all loaded skills against the spec (name format, required fields, directory structure). Run this in CI to catch errors before deployment.&lt;/p&gt;
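The kinds of rules a validator enforces can be illustrated with a self-contained sketch. These checks mirror the constraints stated in this article (lowercase hyphenated name of at most 64 characters, description of at most 1,024 characters); they are not the SDK's `ValidateAll` implementation.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative spec checks (sketch, not the SDK's ValidateAll):
// name: lowercase, hyphen-separated, at most 64 characters
// description: non-empty, at most 1,024 characters
static bool IsValidName(string name) =>
    name.Length <= 64 && Regex.IsMatch(name, "^[a-z0-9]+(-[a-z0-9]+)*$");

static bool IsValidDescription(string description) =>
    description.Length is > 0 and <= 1024;

Console.WriteLine(IsValidName("email-writer"));   // True
Console.WriteLine(IsValidName("Email Writer"));   // False: uppercase and space
Console.WriteLine(IsValidDescription(""));        // False: empty
```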

&lt;h3&gt;
  
  
  LM-Kit.NET Agent Skills API at a Glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Skills.AgentSkill.html" rel="noopener noreferrer"&gt;&lt;code&gt;AgentSkill&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;A loaded skill: name, description, instructions, resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Skills.SkillRegistry.html" rel="noopener noreferrer"&gt;&lt;code&gt;SkillRegistry&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Discovers, loads, stores, and searches skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Skills.SkillActivator.html" rel="noopener noreferrer"&gt;&lt;code&gt;SkillActivator&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Formats skill instructions for injection into conversations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SkillParser&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reads and parses SKILL.md files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SkillBuilder&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fluent API for creating skills in code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SkillTool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exposes skills as tools for LLM function calling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SkillWatcher&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Monitors files and hot-reloads on change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SkillRemoteLoader&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetches skills from URLs, ZIPs, and GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Enum&lt;/th&gt;
&lt;th&gt;Values&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SkillInjectionMode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SystemPrompt&lt;/code&gt;, &lt;code&gt;UserMessage&lt;/code&gt;, &lt;code&gt;ToolResult&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SkillResourceType&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Reference&lt;/code&gt;, &lt;code&gt;Script&lt;/code&gt;, &lt;code&gt;Asset&lt;/code&gt;, &lt;code&gt;Example&lt;/code&gt;, &lt;code&gt;Other&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the full API reference with all classes, methods, and examples, see &lt;a href="https://docs.lm-kit.com/lm-kit-net/" rel="noopener noreferrer"&gt;docs.lm-kit.com&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  11. Best Practices for Writing Great Skills
&lt;/h2&gt;

&lt;p&gt;Writing a SKILL.md is easy. Writing a &lt;em&gt;good&lt;/em&gt; one takes craft. Here are guidelines based on what works in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Naming
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use lowercase with hyphens: &lt;code&gt;code-review&lt;/code&gt;, &lt;code&gt;email-writer&lt;/code&gt;, &lt;code&gt;weekly-report&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Keep it short and descriptive (max 64 characters)&lt;/li&gt;
&lt;li&gt;Avoid generic names like &lt;code&gt;helper&lt;/code&gt; or &lt;code&gt;assistant&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Description
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Write it as a one-sentence action statement: "Reviews code for bugs, security issues, and style violations"&lt;/li&gt;
&lt;li&gt;Include keywords that help with automatic matching&lt;/li&gt;
&lt;li&gt;Keep it under 1024 characters&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Instructions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Put the most important constraints first.&lt;/strong&gt; LLMs pay more attention to the beginning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specify the exact output format.&lt;/strong&gt; Show a template with placeholders. The more specific, the more consistent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write explicit rules.&lt;/strong&gt; Number them. State what the model should and should not do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include at least one example.&lt;/strong&gt; Show a sample input and the expected output. This is the single most effective technique for consistent behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep it concise.&lt;/strong&gt; Move lengthy reference material to &lt;code&gt;references/&lt;/code&gt; and load it on demand. The instructions themselves should fit in ~500 to 2,000 tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Structure Template
&lt;/h3&gt;

&lt;p&gt;Every SKILL.md should follow this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-skill-name&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;One sentence explaining when to use this skill.&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Role Title&lt;/span&gt;

One paragraph defining the agent's role and what it does.

&lt;span class="gu"&gt;## Output Format&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;exact&lt;/span&gt; &lt;span class="na"&gt;template&lt;/span&gt; &lt;span class="na"&gt;with&lt;/span&gt; &lt;span class="na"&gt;placeholders&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Rule one.**&lt;/span&gt; Short explanation.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Rule two.**&lt;/span&gt; Short explanation.
&lt;span class="p"&gt;3.&lt;/span&gt; ...

&lt;span class="gu"&gt;## Example&lt;/span&gt;

&lt;span class="gs"&gt;**Input:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;sample&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="gs"&gt;**Output:**&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;sample&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt; &lt;span class="na"&gt;following&lt;/span&gt; &lt;span class="na"&gt;the&lt;/span&gt; &lt;span class="na"&gt;format&lt;/span&gt; &lt;span class="na"&gt;above&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Skill Authoring Checklist
&lt;/h3&gt;

&lt;p&gt;Before shipping a skill, verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Name is lowercase, hyphenated, and descriptive&lt;/li&gt;
&lt;li&gt;[ ] Description clearly states when to use the skill&lt;/li&gt;
&lt;li&gt;[ ] Version is set (start with &lt;code&gt;1.0.0&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] Output format is explicitly defined with a template&lt;/li&gt;
&lt;li&gt;[ ] At least 3 clear rules constrain behavior&lt;/li&gt;
&lt;li&gt;[ ] At least 1 example input/output is included&lt;/li&gt;
&lt;li&gt;[ ] Instructions fit within ~2,000 tokens (move long references to &lt;code&gt;references/&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] Skill validates without errors (&lt;code&gt;AgentSkill.Validate()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] Tested with at least 2 different models&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  12. Security Considerations
&lt;/h2&gt;

&lt;p&gt;Agent Skills are powerful, and that power requires care. Cisco's AI Defense team &lt;a href="https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare" rel="noopener noreferrer"&gt;found&lt;/a&gt; that a significant number of publicly shared skills contained vulnerabilities, including prompt injection and credential exfiltration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practices for skill security:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Review all skills before deployment.&lt;/strong&gt; Treat SKILL.md files like code: review them in pull requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be cautious with &lt;code&gt;scripts/&lt;/code&gt;.&lt;/strong&gt; Scripts execute on the host machine. Only include scripts from trusted sources. Consider sandboxing execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the &lt;code&gt;allowed-tools&lt;/code&gt; field.&lt;/strong&gt; Explicitly declare which system tools a skill can invoke. This helps runtime environments enforce boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never put secrets in skills.&lt;/strong&gt; No API keys, passwords, or tokens in SKILL.md files or their resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate inputs.&lt;/strong&gt; If a skill processes user-provided data, ensure your agent validates that data before acting on it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin versions.&lt;/strong&gt; When loading remote skills, pin to specific versions or commit hashes rather than tracking &lt;code&gt;main&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
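As a concrete illustration of the `allowed-tools` field, a skill can declare its tool surface up front in the frontmatter. The tool names below are hypothetical, and the exact value syntax depends on the host runtime:

```markdown
---
name: code-review
description: Reviews code for bugs, security issues, and style violations.
allowed-tools: read_file, grep   # hypothetical tool names; syntax varies by runtime
---
```

A runtime that honors this field can refuse any tool call outside the declared list, turning the skill's intent into an enforceable boundary.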




&lt;h2&gt;
  
  
  13. Industry Adoption
&lt;/h2&gt;

&lt;p&gt;Agent Skills is not a theoretical standard. It has been adopted by the major players in the AI ecosystem:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;How It Uses Agent Skills&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Codex&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reads skills from &lt;code&gt;.agents/skills&lt;/code&gt; directories (&lt;a href="https://developers.openai.com/codex/skills" rel="noopener noreferrer"&gt;docs&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Loads skills from &lt;code&gt;.github/skills/&lt;/code&gt; in repositories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VS Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native agent skills support in Copilot (&lt;a href="https://code.visualstudio.com/docs/copilot/customization/agent-skills" rel="noopener noreferrer"&gt;docs&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cursor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports SKILL.md for workspace-level agent customization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Block (Goose)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open-source agent with full skills support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LM-Kit.NET&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complete implementation with registry, activator, hot reload, remote loading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spring AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Java/Spring implementation (&lt;a href="https://spring.io/blog/2026/01/13/spring-ai-generic-agent-skills/" rel="noopener noreferrer"&gt;blog&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mintlify&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-generates skills from documentation sites (&lt;a href="https://www.mintlify.com/blog/skill-md" rel="noopener noreferrer"&gt;blog&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The standard was developed by Anthropic and published as open source. The broader agentic AI ecosystem, including related standards like MCP and AGENTS.md, is coordinated through the &lt;a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation" rel="noopener noreferrer"&gt;Agentic AI Foundation&lt;/a&gt; under the Linux Foundation, whose platinum members include AWS, Anthropic, Block, Google, Microsoft, and OpenAI.&lt;/p&gt;




&lt;h2&gt;
  
  
  14. Conclusion
&lt;/h2&gt;

&lt;p&gt;Agent Skills solve a real, specific problem: how to give AI agents specialized, reusable expertise without overwhelming their context windows. The solution is elegant in its simplicity. A skill is a Markdown file in a folder. The agent loads it on demand. The output becomes consistent.&lt;/p&gt;

&lt;p&gt;Here is what to remember:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A skill is a folder with a SKILL.md file.&lt;/strong&gt; YAML frontmatter for metadata, Markdown body for instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive disclosure keeps context efficient.&lt;/strong&gt; Metadata at startup, instructions on activation, resources on execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills are not tools, not prompts, not agents.&lt;/strong&gt; They are portable knowledge modules that complement all three.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The standard is open and widely adopted.&lt;/strong&gt; OpenAI, Microsoft, Anthropic, and a growing number of platforms support it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM-Kit.NET provides a complete implementation.&lt;/strong&gt; Registry, activator, hot reload, remote loading, validation, tool integration.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Try the demo:&lt;/strong&gt; &lt;a href="https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/skill_based_assistant" rel="noopener noreferrer"&gt;Skill-Based Assistant on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the how-to guide:&lt;/strong&gt; &lt;a href="https://docs.lm-kit.com/lm-kit-net/guides/how-to/add-skills-to-assistant.html" rel="noopener noreferrer"&gt;Add Skills to Your AI Assistant&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install LM-Kit.NET:&lt;/strong&gt; &lt;a href="https://www.nuget.org/packages/LM-Kit.NET" rel="noopener noreferrer"&gt;NuGet package&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the specification:&lt;/strong&gt; &lt;a href="https://agentskills.io/specification" rel="noopener noreferrer"&gt;agentskills.io/specification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore the API docs:&lt;/strong&gt; &lt;a href="https://docs.lm-kit.com/lm-kit-net/" rel="noopener noreferrer"&gt;docs.lm-kit.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start with one skill. Get it working. Then build a library your whole team can reuse.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://agentskills.io/specification" rel="noopener noreferrer"&gt;Agent Skills Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://claude.com/blog/skills-explained" rel="noopener noreferrer"&gt;Claude Blog: Skills Explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/skills" rel="noopener noreferrer"&gt;OpenAI Codex Skills Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.visualstudio.com/docs/copilot/customization/agent-skills" rel="noopener noreferrer"&gt;VS Code Agent Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://block.github.io/goose/blog/2025/12/22/agent-skills-vs-mcp/" rel="noopener noreferrer"&gt;Goose Blog: "Did Skills Kill MCP?"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.llamaindex.ai/blog/skills-vs-mcp-tools-for-agents-when-to-use-what" rel="noopener noreferrer"&gt;LlamaIndex: Skills vs MCP Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thenewstack.io/agent-skills-anthropics-next-bid-to-define-ai-standards/" rel="noopener noreferrer"&gt;The New Stack: Agent Skills Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/19/agent-skills/" rel="noopener noreferrer"&gt;Simon Willison on Agent Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://spring.io/blog/2026/01/13/spring-ai-generic-agent-skills/" rel="noopener noreferrer"&gt;Spring AI: Agent Skills in Java&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare" rel="noopener noreferrer"&gt;Cisco: Agent Skills Security Research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.lm-kit.com/lm-kit-net/" rel="noopener noreferrer"&gt;LM-Kit.NET Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.lm-kit.com/lm-kit-net/guides/glossary/agent-skills.html" rel="noopener noreferrer"&gt;LM-Kit.NET Agent Skills Glossary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/LM-Kit/lm-kit-net-samples" rel="noopener noreferrer"&gt;LM-Kit.NET Samples Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>🧰 Meet LM-Kit Tool Calling for Local Agents</title>
      <dc:creator>Loïc Carrère</dc:creator>
      <pubDate>Fri, 17 Oct 2025 11:14:01 +0000</pubDate>
      <link>https://forem.com/loc_carrre_0d798813c662/meet-lm-kit-tool-calling-for-local-agents-1ae9</link>
      <guid>https://forem.com/loc_carrre_0d798813c662/meet-lm-kit-tool-calling-for-local-agents-1ae9</guid>
      <description>&lt;h2&gt;
  
  
  LM-Kit Tool Calling: Build Reliable Local AI Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; LM-Kit .NET SDK now supports state-of-the-art tool calling for building AI agents in C#. Create on-device agents that discover, invoke, and chain tools with structured JSON schemas, safety policies, and human-in-the-loop controls, all running locally with full privacy. Works with thousands of local models from Mistral, LLaMA, Qwen, Granite, GPT-OSS, and more. Supports all tool calling modes: simple function, multiple function, parallel function, and parallel multiple function. No cloud dependencies, no API costs, complete control over your agent workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Tools in Agentic AI?
&lt;/h2&gt;

&lt;p&gt;Tools are a fundamental capability of agentic AI.&lt;/p&gt;

&lt;p&gt;While language models excel at understanding and generating text, tools extend their abilities by letting them interact with the real world: searching the web for current information, executing code for calculations, accessing databases, reading files, or connecting to external services through APIs. Think of tools as the hands and eyes of an AI agent. They transform a conversational system into an agent that can accomplish tasks by bridging the gap between reasoning and action. When an agent needs to check the weather, analyze a spreadsheet, or send an email, it invokes the appropriate tool, receives the result, and incorporates that information into its response. This moves AI beyond pure text generation toward practical, real-world problem solving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interested in how agents retain and use context over time?&lt;/strong&gt; Explore our deep dive on &lt;a href="https://lm-kit.com/blog/introducing-agent-memory/" rel="noopener noreferrer"&gt;agent memory&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local Agents Have Been Hard
&lt;/h2&gt;

&lt;p&gt;Building AI agents that can actually do things locally has been surprisingly hard. You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Models that understand when and how to call external functions&lt;/li&gt;
&lt;li&gt;Privacy without sending data to the cloud&lt;/li&gt;
&lt;li&gt;A runtime that can parse tool calls, validate arguments, and inject results&lt;/li&gt;
&lt;li&gt;Model-specific handling, because each model family has its own tool calling format and interaction pattern, which requires custom logic for interception, result injection, and action ordering&lt;/li&gt;
&lt;li&gt;Safety controls to prevent infinite loops and runaway costs&lt;/li&gt;
&lt;li&gt;Clear observability so you know what your agent is doing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Until now, most agentic frameworks forced a choice: powerful cloud-based agents with latency and privacy concerns, or limited local models without proper tool support. Today, that changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Tool Calling Changes Everything
&lt;/h2&gt;

&lt;p&gt;With LM-Kit's new tool calling capabilities, your local agents can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ground answers in real data.&lt;/strong&gt; No more hallucinated weather forecasts or exchange rates. Agents fetch actual API responses and can cite sources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chain complex workflows.&lt;/strong&gt; For example: check the weather, convert temperature to the user's preferred units, then suggest activities. All in one conversational turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain full privacy.&lt;/strong&gt; Everything runs on-device. Your users' queries, tool arguments, and results never leave their machines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay deterministic and safe.&lt;/strong&gt; Typed schemas, validated inputs, policy controls, and approval hooks prevent agents from going rogue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale with your domain.&lt;/strong&gt; Add business APIs, internal databases, or external &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Mcp.Client.McpClient.html" rel="noopener noreferrer"&gt;MCP catalogs&lt;/a&gt; as tools. The model learns to use them from descriptions and schemas alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbol6tmvlf78vslidygag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbol6tmvlf78vslidygag.png" alt="Tabbycat empowered with Tools" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New at a Glance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State-of-the-art tool calling&lt;/strong&gt;, right in chatbot flows. Models decide when to call tools, pass structured JSON args, and use results to answer users accurately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated flow support&lt;/strong&gt; across model families like Mistral, GPT-OSS, Qwen, Granite, LLaMA, and more — all via one runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three ways to add tools:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Implement &lt;code&gt;ITool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Annotate methods with &lt;code&gt;[LMFunction]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Import catalogs from &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Mcp.Client.McpClient.html" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Unified API&lt;/strong&gt; that runs local SLMs with per-turn policy, guardrails, and events for human-in-the-loop and observability at every stage.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;All function calling modes supported.&lt;/strong&gt; Simple Function, Multiple Function, Parallel Function, and Parallel Multiple Function — choose strict sequencing or safe parallelism.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Model-aware tool call flow.&lt;/strong&gt; Modern SLMs emit structured tool calls. LM-Kit parses calls, routes them to your tools, and feeds results back with correlation and clear result types for a reliable inference path.&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works: Getting Started
&lt;/h2&gt;

&lt;p&gt;Here's a complete working example in under 20 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.Model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.TextGeneration&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.Agents.Tools&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;System.Text.Json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// 1) Load a local model from the catalog&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromModelID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gptoss:20b"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// OpenAI GPT-OSS 20B&lt;/span&gt;

&lt;span class="c1"&gt;// Optional: confirm tool-calling capability&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HasToolCalls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* choose a different model or fallback */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// 2) Create a multi-turn conversation&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MultiTurnConversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3) Register tools (see three options below)&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;WeatherTool&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// 4) Shape the behavior per turn&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Choice&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ToolChoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Auto&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// let the model decide&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaxCallsPerTurn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// guard against loops&lt;/span&gt;

&lt;span class="c1"&gt;// 5) Ask a question&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Plan my weekend and check the weather in Toulouse."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model catalog includes GPT-OSS and many other families. &lt;code&gt;LM.LoadFromModelID&lt;/code&gt; lets you pull a named card like &lt;code&gt;gptoss:20b&lt;/code&gt;. You can also check &lt;code&gt;HasToolCalls&lt;/code&gt; before you rely on tools.&lt;/p&gt;

&lt;p&gt;See the &lt;a href="https://docs.lm-kit.com/lm-kit-net/guides/getting-started/model-catalog.html" rel="noopener noreferrer"&gt;Model Catalog documentation&lt;/a&gt; for details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try it now — GitHub sample
&lt;/h3&gt;

&lt;p&gt;A production-ready console sample demonstrates multi-turn chat with tool calling (currency, weather, unit conversion), per-turn policies, progress feedback, and special commands. Jump to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/multi_turn_chat_with_tools" rel="noopener noreferrer"&gt;Create Multi-Turn Chatbot with Tools in .NET Applications&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Ways to Add Tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Implement ITool (Full Control)
&lt;/h3&gt;

&lt;p&gt;Best when you need clear contracts and custom validation.&lt;/p&gt;

&lt;p&gt;This snippet demonstrates implementing the &lt;code&gt;ITool&lt;/code&gt; interface so an LLM can call your tool directly. It declares the tool contract (&lt;code&gt;Name&lt;/code&gt;, &lt;code&gt;Description&lt;/code&gt;, &lt;code&gt;InputSchema&lt;/code&gt;), parses JSON args, runs your logic, and returns structured JSON to the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;sealed&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WeatherTool&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ITool&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;Description&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"Get current weather for a city. Returns temperature, conditions, and optional hourly forecast."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// JSON Schema defines expected arguments&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;InputSchema&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"""
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"City name (e.g., 'Paris' or 'New York')"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="s"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="s"&gt;""";
&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;InvokeAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Parse the model's JSON arguments&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JsonDocument&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;RootElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;GetString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Call your weather API&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;weatherData&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;FetchWeatherAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Return structured JSON the model can understand&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp_c&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weatherData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conditions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weatherData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Conditions&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Register it&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;WeatherTool&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why use ITool?&lt;/strong&gt; Complete control over validation, async execution, error handling, and result formatting.&lt;/p&gt;
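
&lt;p&gt;The &lt;code&gt;InvokeAsync&lt;/code&gt; body above trusts the model's arguments as-is. A defensive variant can be sketched like this (the &lt;code&gt;FetchWeatherAsync&lt;/code&gt; helper and the error-payload shape are illustrative assumptions, not part of the LM-Kit API): validate before executing and return a structured error the model can react to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;public async Task&amp;lt;string&amp;gt; InvokeAsync(string arguments, CancellationToken ct = default)
{
    try
    {
        using var doc = JsonDocument.Parse(arguments);

        // Reject a missing or empty "city" before doing any work
        if (!doc.RootElement.TryGetProperty("city", out var cityProp) ||
            string.IsNullOrWhiteSpace(cityProp.GetString()))
        {
            return JsonSerializer.Serialize(new { error = "Missing required argument: city" });
        }

        var city = cityProp.GetString()!;
        var weatherData = await FetchWeatherAsync(city); // illustrative helper
        return JsonSerializer.Serialize(new { city, temp_c = weatherData.Temp, conditions = weatherData.Conditions });
    }
    catch (JsonException)
    {
        // Malformed JSON from the model: report it instead of throwing
        return JsonSerializer.Serialize(new { error = "Arguments were not valid JSON" });
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning errors as data keeps the conversation alive: the model sees the message and can retry with corrected arguments.&lt;/p&gt;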

&lt;h3&gt;
  
  
  2) Annotate Methods with &lt;code&gt;[LMFunction]&lt;/code&gt; (Quick Binding)
&lt;/h3&gt;

&lt;p&gt;Best for rapid prototyping and simple synchronous tools.&lt;/p&gt;

&lt;p&gt;What it does: Add &lt;code&gt;[LMFunction(name, description)]&lt;/code&gt; to public instance methods. LM-Kit discovers them and exposes each as an &lt;code&gt;ITool&lt;/code&gt;, generating a JSON schema from method parameters.&lt;/p&gt;

&lt;p&gt;How it's wired: Reflect and bind with &lt;code&gt;LMFunctionToolBinder.FromType&amp;lt;MyDomainTools&amp;gt;()&lt;/code&gt; (or &lt;code&gt;FromInstance&lt;/code&gt;/&lt;code&gt;FromAssembly&lt;/code&gt;), then register the resulting tools via &lt;code&gt;chat.Tools.Register(...)&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;sealed&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyDomainTools&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;LMFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"search_docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Search internal documentation by keyword. Returns top 5 matches."&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nf"&gt;SearchDocs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_documentIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;LMFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"get_user_info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Retrieve user profile and preferences."&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nf"&gt;GetUserInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;JsonSerializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Automatically scan and register all annotated methods&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LMFunctionToolBinder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromType&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;MyDomainTools&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why use [LMFunction]?&lt;/strong&gt; Less boilerplate. The &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Tools.LMFunctionToolBinder.html" rel="noopener noreferrer"&gt;binder&lt;/a&gt; generates schemas from parameter types and registers everything in one line.&lt;/p&gt;
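
&lt;p&gt;Note that &lt;code&gt;FromType&amp;lt;MyDomainTools&amp;gt;()&lt;/code&gt; constructs the class itself, so it suits parameterless types. When your tool class carries dependencies like the &lt;code&gt;_documentIndex&lt;/code&gt; and &lt;code&gt;_database&lt;/code&gt; fields above, bind an already-configured object with &lt;code&gt;FromInstance&lt;/code&gt; instead (a sketch; the constructor shown is an illustrative assumption):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Bind an existing instance so its dependencies are already in place
var domainTools = new MyDomainTools(documentIndex, database); // hypothetical constructor
var tools = LMFunctionToolBinder.FromInstance(domainTools);
chat.Tools.Register(tools);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;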

&lt;h3&gt;
  
  
  3) Import MCP Catalogs (External Services)
&lt;/h3&gt;

&lt;p&gt;Best for connecting to third-party tool ecosystems via the Model Context Protocol.&lt;/p&gt;

&lt;p&gt;What it does: Uses &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Mcp.Client.McpClient.html" rel="noopener noreferrer"&gt;McpClient&lt;/a&gt; to establish a JSON-RPC session with an MCP server, fetch its tool catalog, and adapt those tools so your agent can call them.&lt;/p&gt;

&lt;p&gt;How it's wired: Create &lt;code&gt;new McpClient(uri, httpClient)&lt;/code&gt; (optionally set a bearer token), then &lt;code&gt;chat.Tools.Register(mcp, overwrite: false)&lt;/code&gt; to import the catalog; LM-Kit manages &lt;code&gt;tools/list&lt;/code&gt;, &lt;code&gt;tools/call&lt;/code&gt;, retries, and session persistence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;LMKit.Mcp.Client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Connect to an MCP server&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;McpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://mcp.example.com/api"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;HttpClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Import all available tools from the server&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;toolCount&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"Imported &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;toolCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tools from MCP server"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why use MCP?&lt;/strong&gt; Instant access to curated tool catalogs. The server handles &lt;code&gt;tools/list&lt;/code&gt; and &lt;code&gt;tools/call&lt;/code&gt; over JSON-RPC; LM-Kit validates schemas locally.&lt;/p&gt;

&lt;p&gt;See &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Mcp.Client.McpClient.html" rel="noopener noreferrer"&gt;McpClient documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Execution Modes That Match Your Workflow
&lt;/h2&gt;

&lt;p&gt;Choose the right policy for each conversational turn:&lt;/p&gt;

&lt;h3&gt;
  
  
  Simple Function
&lt;/h3&gt;

&lt;p&gt;One tool, one answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaxCallsPerTurn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Choice&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ToolChoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Required&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// force at least one call&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; "What is the weather in Tokyo?" → calls &lt;code&gt;get_weather&lt;/code&gt; once → answers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiple Function
&lt;/h3&gt;

&lt;p&gt;Chain tools sequentially.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaxCallsPerTurn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Choice&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ToolChoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Auto&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; "Convert 75°F to Celsius, then tell me if I need a jacket."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Calls &lt;code&gt;convert_temperature(75, "F", "C")&lt;/code&gt; → gets 23.9°C&lt;/li&gt;
&lt;li&gt;Calls &lt;code&gt;get_weather("current_location")&lt;/code&gt; → gets conditions&lt;/li&gt;
&lt;li&gt;Synthesizes answer → "It is 24°C and sunny. A light jacket should be fine."&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Parallel Function
&lt;/h3&gt;

&lt;p&gt;Execute multiple tools concurrently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AllowParallelCalls&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MaxCallsPerTurn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; "Compare weather in Paris, London, and Berlin."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calls &lt;code&gt;get_weather("Paris")&lt;/code&gt;, &lt;code&gt;get_weather("London")&lt;/code&gt;, &lt;code&gt;get_weather("Berlin")&lt;/code&gt; simultaneously&lt;/li&gt;
&lt;li&gt;Waits for all results → compares → answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enable parallel execution only when your tools are idempotent and thread-safe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parallel Multiple Function
&lt;/h3&gt;

&lt;p&gt;Combine chaining and parallelism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; "Check weather in 3 cities, convert all temps to Fahrenheit, and recommend which to visit."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parallel → fetches weather for 3 cities&lt;/li&gt;
&lt;li&gt;Parallel → converts all temperatures&lt;/li&gt;
&lt;li&gt;Sequential → recommends based on results&lt;/li&gt;
&lt;/ol&gt;
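
&lt;p&gt;Configuration-wise, this mode simply combines the settings from the two previous sections: parallelism on, with a call budget large enough for both batches plus the final step (the limits here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;chat.ToolPolicy.Choice = ToolChoice.Auto;   // model plans the chain
chat.ToolPolicy.AllowParallelCalls = true;  // allow parallel batches within it
chat.ToolPolicy.MaxCallsPerTurn = 10;       // 3 weather + 3 conversions + headroom
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;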

&lt;p&gt;See &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Tools.ToolCallPolicy.html" rel="noopener noreferrer"&gt;ToolCallPolicy documentation&lt;/a&gt; for all options including &lt;code&gt;ToolChoice.Specific&lt;/code&gt; and &lt;code&gt;ForcedToolName&lt;/code&gt;. Defaults are conservative: parallel off, max calls capped.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety, Control, and Observability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Policy Controls
&lt;/h3&gt;

&lt;p&gt;Configure safe defaults and per-turn limits. See &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Agents.Tools.ToolCallPolicy.html" rel="noopener noreferrer"&gt;ToolCallPolicy documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ToolCallPolicy&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Choice&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ToolChoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Auto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// let model decide&lt;/span&gt;
    &lt;span class="n"&gt;MaxCallsPerTurn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// prevent runaway loops&lt;/span&gt;
    &lt;span class="n"&gt;AllowParallelCalls&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// safe default: sequential only&lt;/span&gt;

    &lt;span class="c1"&gt;// Optional: force a specific tool first&lt;/span&gt;
    &lt;span class="c1"&gt;// Choice = ToolChoice.Specific,&lt;/span&gt;
    &lt;span class="c1"&gt;// ForcedToolName = "authenticate_user"&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Human in the Loop
&lt;/h3&gt;

&lt;p&gt;Review, approve, or block tool execution. Hooks: &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.TextGeneration.MultiTurnConversation.BeforeToolInvocation.html" rel="noopener noreferrer"&gt;BeforeToolInvocation&lt;/a&gt;, &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.TextGeneration.MultiTurnConversation.AfterToolInvocation.html" rel="noopener noreferrer"&gt;AfterToolInvocation&lt;/a&gt;, &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.TextGeneration.MultiTurnConversation.BeforeTokenSampling.html" rel="noopener noreferrer"&gt;BeforeTokenSampling&lt;/a&gt;, &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.TextGeneration.MultiTurnConversation.MemoryRecall.html" rel="noopener noreferrer"&gt;MemoryRecall&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Approve tool calls before execution&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BeforeToolInvocation&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"🔔 About to call: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"   Arguments: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgumentsJson&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Block sensitive operations&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"delete_user"&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;UserHasApproved&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cancel&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"   ❌ Blocked by policy"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Audit results after execution&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AfterToolInvocation&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolCallResult&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"✅ &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt; completed"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"   Status: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"   Result: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResultJson&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;_telemetry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LogToolCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// send to monitoring&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Override token sampling in real time&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BeforeTokenSampling&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_needsDeterministicOutput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sampling&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0.1f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Control memory injection&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MemoryRecall&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"💭 Injecting memory: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s"&gt;..."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// e.Cancel = true; // optionally cancel&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Structured Data Flow
&lt;/h3&gt;

&lt;p&gt;Every call flows through a typed pipeline for reproducibility and clear logs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;→ Incoming:&lt;/strong&gt; &lt;code&gt;ToolCall&lt;/code&gt; with stable &lt;code&gt;Id&lt;/code&gt; and &lt;code&gt;ArgumentsJson&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;← Outgoing:&lt;/strong&gt; &lt;code&gt;ToolCallResult&lt;/code&gt; with &lt;code&gt;ToolCallId&lt;/code&gt;, &lt;code&gt;ToolName&lt;/code&gt;, &lt;code&gt;ResultJson&lt;/code&gt;, and &lt;code&gt;Type&lt;/code&gt; (Success or Error).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It: Multi-Turn Chat Sample
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create Multi-Turn Chatbot with Tools in .NET Applications
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Demonstrates LM-Kit.NET's agentic tool-calling: during a conversation, the model can decide to call one or more tools to fetch data or run computations, pass JSON arguments that match each tool's &lt;code&gt;InputSchema&lt;/code&gt;, and use each tool's JSON result to produce a grounded reply, all while preserving full multi-turn context. Tools implement &lt;code&gt;ITool&lt;/code&gt; and are managed by a registry; per-turn behavior is shaped via &lt;code&gt;ToolChoice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why tools in chatbots?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliable, source-backed answers (weather, FX, conversions, business APIs).&lt;/li&gt;
&lt;li&gt;Agentic chaining: call several tools in one turn and combine results.&lt;/li&gt;
&lt;li&gt;Determinism &amp;amp; safety: typed schemas, clear failure modes, policy control.&lt;/li&gt;
&lt;li&gt;Extensibility: implement &lt;code&gt;ITool&lt;/code&gt; for domain logic; keep code auditable.&lt;/li&gt;
&lt;li&gt;Efficiency: offload math/lookup to tools; keep the model focused on reasoning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Target audience:&lt;/strong&gt; Product &amp;amp; platform teams; DevOps &amp;amp; internal tools; B2B apps; educators &amp;amp; demos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem solved:&lt;/strong&gt; Actionable answers, deterministic conversions/quotes, multi-turn memory, easy extensibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample app:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lets you choose a local model (or a custom URI)&lt;/li&gt;
&lt;li&gt;Registers three tools (currency, weather, unit conversion)&lt;/li&gt;
&lt;li&gt;Runs a multi-turn chat where the model decides when to call tools&lt;/li&gt;
&lt;li&gt;Prints generation stats (tokens, stop reason, speed, context usage)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool calling via JSON arguments&lt;/li&gt;
&lt;li&gt;Full dialogue memory&lt;/li&gt;
&lt;li&gt;Progress feedback (download/load bars)&lt;/li&gt;
&lt;li&gt;Special commands: &lt;code&gt;/reset&lt;/code&gt;, &lt;code&gt;/continue&lt;/code&gt;, &lt;code&gt;/regenerate&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Multiple tool calls per turn (and across turns)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Built-in Tools
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool name&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Online?&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;convert_currency&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ECB rates via Frankfurter (latest or historical) + optional trend&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No API key; business days; rounding &amp;amp; date support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_weather&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open-Meteo current weather + optional short hourly forecast&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No API key; geocoding + metric/us/si&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;convert_units&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Offline conversions (length, mass, temperature, speed, etc.)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Temperature is non-linear; can list supported units&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tools implement &lt;code&gt;ITool&lt;/code&gt;: &lt;code&gt;Name&lt;/code&gt;, &lt;code&gt;Description&lt;/code&gt;, &lt;code&gt;InputSchema&lt;/code&gt; (JSON Schema), and &lt;code&gt;InvokeAsync(string json)&lt;/code&gt; returning JSON.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extend with your own tool:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MyCustomTool&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// implements ITool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use unique, stable, lowercase &lt;code&gt;snake_case&lt;/code&gt; names.&lt;/p&gt;
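&lt;p&gt;For orientation, here is a minimal sketch of a custom tool. The member shapes follow the &lt;code&gt;ITool&lt;/code&gt; contract described above (&lt;code&gt;Name&lt;/code&gt;, &lt;code&gt;Description&lt;/code&gt;, &lt;code&gt;InputSchema&lt;/code&gt;, &lt;code&gt;InvokeAsync&lt;/code&gt;); the tool itself, its schema, and the stubbed data access are hypothetical placeholders for your own domain logic:&lt;/p&gt;

```csharp
// Hypothetical custom tool; member shapes follow the ITool contract
// described above (Name, Description, InputSchema, InvokeAsync).
public sealed class GetStockPriceTool : ITool
{
    public string Name => "get_stock_price"; // unique, stable, lowercase snake_case

    public string Description =>
        "Returns the latest price for a stock ticker symbol.";

    // JSON Schema describing the expected arguments.
    public string InputSchema => """
    {
      "type": "object",
      "properties": {
        "symbol": { "type": "string", "description": "Ticker symbol, e.g. MSFT" }
      },
      "required": ["symbol"]
    }
    """;

    public async Task<string> InvokeAsync(string json)
    {
        // Parse the model-provided arguments, call your backend, return JSON.
        var symbol = System.Text.Json.JsonDocument.Parse(json)
            .RootElement.GetProperty("symbol").GetString();
        var price = await FetchPriceAsync(symbol);    // your own data access
        return $"{{\"symbol\":\"{symbol}\",\"price\":{price}}}";
    }

    private static Task<decimal> FetchPriceAsync(string symbol) =>
        Task.FromResult(123.45m); // stub for illustration
}
```

&lt;p&gt;Register it with &lt;code&gt;chat.Tools.Register(new GetStockPriceTool());&lt;/code&gt; exactly as shown above.&lt;/p&gt;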

&lt;h3&gt;
  
  
  Supported Models (Pick per Hardware)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Mistral Nemo 2407 12.2B (~7.7 GB VRAM)&lt;/li&gt;
&lt;li&gt;Meta Llama 3.1 8B (~6 GB VRAM)&lt;/li&gt;
&lt;li&gt;Google Gemma 3 4B Medium (~4 GB VRAM)&lt;/li&gt;
&lt;li&gt;Microsoft Phi-4 Mini 3.82B (~3.3 GB VRAM)&lt;/li&gt;
&lt;li&gt;Alibaba Qwen-3 8B (~5.6 GB VRAM)&lt;/li&gt;
&lt;li&gt;Microsoft Phi-4 14.7B (~11 GB VRAM)&lt;/li&gt;
&lt;li&gt;IBM Granite 4 7B (~6 GB VRAM)&lt;/li&gt;
&lt;li&gt;OpenAI GPT-OSS 20B (~16 GB VRAM)&lt;/li&gt;
&lt;li&gt;Or provide a custom model URI (GGUF/LMK)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Commands
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/reset&lt;/code&gt; — clear conversation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/continue&lt;/code&gt; — continue last assistant message&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/regenerate&lt;/code&gt; — new answer for last user input&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example Prompts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"Convert 125 USD to EUR and show a 7-day trend."&lt;/li&gt;
&lt;li&gt;"Weather in Toulouse next 6 hours (metric)."&lt;/li&gt;
&lt;li&gt;"Convert 65 mph to km/h." / "List pressure units."&lt;/li&gt;
&lt;li&gt;"Now 75 °F to °C, then 2 km to miles."&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Behavior &amp;amp; Policies (Quick Reference)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool selection policy:&lt;/strong&gt; By default the sample lets the model decide (&lt;code&gt;ToolChoice.Auto&lt;/code&gt;). You can Require / Forbid / Force a specific tool per turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple tool calls:&lt;/strong&gt; Supports several tool invocations per turn; outputs are injected back into context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schemas matter:&lt;/strong&gt; Precise &lt;code&gt;InputSchema&lt;/code&gt; + concise &lt;code&gt;Description&lt;/code&gt; improve argument construction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Networking:&lt;/strong&gt; Currency &amp;amp; weather require internet; unit conversion is offline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors:&lt;/strong&gt; Clear exceptions for invalid inputs (units, dates, locations).&lt;/li&gt;
&lt;/ul&gt;
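&lt;p&gt;In code, the selection policy above maps onto &lt;code&gt;ToolCallPolicy&lt;/code&gt;. Only &lt;code&gt;ToolChoice.Auto&lt;/code&gt; appears in the sample; the member names for the require/forbid/force variants are assumptions to be checked against your LM-Kit version:&lt;/p&gt;

```csharp
// Let the model decide when to call tools (the sample's default).
chat.ToolPolicy = new ToolCallPolicy { Choice = ToolChoice.Auto };

// Hypothetical variants for the Require / Forbid / Force behaviors
// described above; verify the exact ToolChoice member names in the
// LM-Kit API reference before relying on them.
// chat.ToolPolicy.Choice = ToolChoice.Required;            // must call some tool
// chat.ToolPolicy.Choice = ToolChoice.None;                // forbid tools this turn
// chat.ToolPolicy.Choice = ToolChoice.Force("get_weather"); // force one tool
```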

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; .NET Framework 4.6.2 or .NET 6.0&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Download:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/LM-Kit/lm-kit-net-samples.git
&lt;span class="nb"&gt;cd &lt;/span&gt;lm-kit-net-samples/console_net/multi_turn_chat_with_tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Run:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet build
dotnet run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pick a model or paste a custom URI. Chat naturally; the assistant will call one or more tools as needed. Use &lt;code&gt;/reset&lt;/code&gt;, &lt;code&gt;/continue&lt;/code&gt;, &lt;code&gt;/regenerate&lt;/code&gt; anytime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project link:&lt;/strong&gt; &lt;a href="https://github.com/LM-Kit/lm-kit-net-samples" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Complete Example: All Three Integration Paths
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Load a capable local model&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromModelID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gptoss:20b"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MultiTurnConversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 1) ITool implementation&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;WeatherTool&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="c1"&gt;// 2) LMFunctionAttribute methods&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LMFunctionToolBinder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromType&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;MyDomainTools&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3) MCP import&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;McpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://mcp.example/api"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;HttpClient&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Safety and behavior&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToolPolicy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;ToolCallPolicy&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Choice&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ToolChoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Auto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MaxCallsPerTurn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// AllowParallelCalls = true // enable only if tools are idempotent&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Human-in-the-loop&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BeforeToolInvocation&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* approve or cancel */&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AfterToolInvocation&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* log results */&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Run&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"Find 3 relevant docs for 'safety policy' and summarize."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Go Local with LM-Kit?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  vs. Cloud Agent Frameworks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero API costs:&lt;/strong&gt; No per-token charges. Run unlimited conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete privacy:&lt;/strong&gt; User data never leaves the device. GDPR/HIPAA friendly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No network latency:&lt;/strong&gt; Local inference eliminates network roundtrips entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works offline:&lt;/strong&gt; Agents function without internet connectivity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rate limits:&lt;/strong&gt; Scale to millions of requests without throttling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full control:&lt;/strong&gt; Own the stack. No vendor lock-in or API deprecations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  vs. Basic Prompt Engineering
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type-safe schemas:&lt;/strong&gt; JSON Schema validation catches bad arguments before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic results:&lt;/strong&gt; Clear success/error states, not fragile regex parsing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution:&lt;/strong&gt; Run multiple tools concurrently when safe.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full observability:&lt;/strong&gt; Structured events at every stage, not log archaeology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testable contracts:&lt;/strong&gt; Mock tools, inject results, replay conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error boundaries:&lt;/strong&gt; Graceful failures with retry logic and fallbacks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  vs. Manual Function Calling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model decides:&lt;/strong&gt; Agent autonomously picks tools and arguments—no brittle if/else chains.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-chaining:&lt;/strong&gt; Multiple tool calls per turn, results fed back automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Far less boilerplate:&lt;/strong&gt; Register tools once, not per-model or per-prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in safety:&lt;/strong&gt; Loop prevention, max-calls limits, approval hooks out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model-agnostic API:&lt;/strong&gt; Same code works across Mistral, LLaMA, Qwen, Granite, GPT-OSS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive enhancement:&lt;/strong&gt; Add tools without refactoring conversation logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance and Limitations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Performance Expectations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool invocation overhead:&lt;/strong&gt; ~2–5 ms per call (parsing + validation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network tools:&lt;/strong&gt; 50–500 ms depending on API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local tools:&lt;/strong&gt; &amp;lt;1 ms&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model inference remains the primary latency factor.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
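&lt;p&gt;You can verify these figures on your own hardware by timing tool calls with the invocation events shown earlier; this sketch assumes only the event and property names already used above:&lt;/p&gt;

```csharp
using System.Diagnostics;

// Measure wall-clock time per tool call using the events shown earlier.
var timers = new Dictionary<string, Stopwatch>();

chat.BeforeToolInvocation += (sender, e) =>
    timers[e.ToolCall.Id] = Stopwatch.StartNew();

chat.AfterToolInvocation += (sender, e) =>
{
    // Correlate via the stable ToolCallId carried by the result.
    if (timers.Remove(e.ToolCallResult.ToolCallId, out var sw))
        Console.WriteLine(
            $"{e.ToolCallResult.ToolName}: {sw.Elapsed.TotalMilliseconds:F1} ms");
};
```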

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Models must support tool calling (check &lt;code&gt;HasToolCalls&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Network-dependent tools require internet connectivity.&lt;/li&gt;
&lt;li&gt;Parallel execution requires thread-safe, idempotent tools.&lt;/li&gt;
&lt;li&gt;Recommended GPU memory: 6–16 GB VRAM depending on model size.&lt;/li&gt;
&lt;/ul&gt;
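&lt;p&gt;A fail-fast guard for the first requirement might look like this; the exact placement of &lt;code&gt;HasToolCalls&lt;/code&gt; on the model object is an assumption based on the property name above:&lt;/p&gt;

```csharp
var model = LM.LoadFromModelID("gptoss:20b");

// Guard: refuse to start if the selected model cannot emit tool calls.
// Check the API reference for where HasToolCalls lives in your version.
if (!model.HasToolCalls)
    throw new NotSupportedException(
        "Selected model does not support tool calling; pick another model.");
```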

&lt;h3&gt;
  
  
  Known Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tool selection quality depends on clear descriptions and schemas.&lt;/li&gt;
&lt;li&gt;Complex nested objects in arguments may confuse smaller models.&lt;/li&gt;
&lt;li&gt;Very long tool chains (&amp;gt;10 calls) may exceed context windows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Ready to Build?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone the sample&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   git clone https://github.com/LM-Kit/lm-kit-net-samples.git
   &lt;span class="nb"&gt;cd &lt;/span&gt;lm-kit-net-samples/console_net/multi_turn_chat_with_tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pick your integration approach&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need full control? Use &lt;code&gt;ITool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Prototyping quickly? Use &lt;code&gt;[LMFunction]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Using external catalogs? Use &lt;code&gt;McpClient&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add your domain logic&lt;/strong&gt;&lt;br&gt;
Replace demo tools with your APIs, databases, or business logic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set policies that fit your use case&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple lookups: &lt;code&gt;MaxCallsPerTurn = 1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Complex workflows: &lt;code&gt;MaxCallsPerTurn = 10&lt;/code&gt; with approval hooks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ship agents that actually work&lt;/strong&gt;&lt;br&gt;
On-device. Private. Reliable. Observable.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Start building agentic workflows that respect user privacy, run anywhere, and stay under your control.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dotnet</category>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>🧬 Four Local Vector Storage Patterns for C# Developers</title>
      <dc:creator>Loïc Carrère</dc:creator>
      <pubDate>Thu, 24 Apr 2025 09:13:23 +0000</pubDate>
      <link>https://forem.com/loc_carrre_0d798813c662/four-local-vector-storage-patterns-for-c-developers-1gno</link>
      <guid>https://forem.com/loc_carrre_0d798813c662/four-local-vector-storage-patterns-for-c-developers-1gno</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this post, we'll break down four vector storage patterns supported by LM-Kit.&lt;/p&gt;

&lt;p&gt;LM-Kit simplifies the complexity of embedding storage by offering a unified, developer-friendly interface that supports both instant prototyping and scalable deployment.&lt;/p&gt;

&lt;p&gt;It supports four storage patterns, each tailored to different stages of your project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In-Memory:&lt;/strong&gt; Ideal for fast prototyping and low-volume tasks with zero setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-In Vector DB:&lt;/strong&gt; Self-contained file-based storage for local tools or offline apps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant Vector Store:&lt;/strong&gt; External high-performance DB for cloud or large-scale deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom IVectorStore:&lt;/strong&gt; Build your own backend to integrate with proprietary systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All methods use the same &lt;code&gt;DataSource&lt;/code&gt; abstraction, so you can switch storage backends without changing your code logic.&lt;/p&gt;

&lt;p&gt;Modern AI apps, from semantic search to retrieval-augmented generation, rely on embeddings to turn text or images into dense vectors. LM-Kit gives you a few ways to handle those vectors: you can compute them in memory when needed or store them for the long haul. Each approach has tradeoffs in speed, complexity, and cost.&lt;/p&gt;

&lt;p&gt;If you're new to embeddings, check out our &lt;a href="https://lm-kit.com/glossary/" rel="noopener noreferrer"&gt;Embeddings Glossary&lt;/a&gt;. Or better yet, talk to LM-Kit Maestro, our free chatbot generator: &lt;a href="https://github.com/LM-Kit/Maestro" rel="noopener noreferrer"&gt;GitHub: LM-Kit/Maestro&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Patterns
&lt;/h2&gt;

&lt;p&gt;LM-Kit supports four main embedding storage patterns, ranging from ephemeral in-memory use to persistent vector databases, so you can match your infrastructure to the needs of your app.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;On-the-Fly (In-Memory) Embeddings&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Storage with an External Vector Database&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Prebuilt Qdrant Vector Store&lt;/li&gt;
&lt;li&gt;Custom Vector Store via &lt;code&gt;IVectorStore&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Persistent Storage with LM-Kit's Built-In Vector DB&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;LM-Kit uses a &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.DataSource.html" rel="noopener noreferrer"&gt;DataSource&lt;/a&gt; class to manage all of this. It's your main tool for embedding storage, representing a collection that can handle anything from text and documents to images or web pages. A &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.DataSource.html" rel="noopener noreferrer"&gt;DataSource&lt;/a&gt; element contains multiple &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Section.html" rel="noopener noreferrer"&gt;Section&lt;/a&gt; elements, and each section holds &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.TextPartition.html" rel="noopener noreferrer"&gt;TextPartition&lt;/a&gt;¹ objects (essentially embedding vectors). &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Metadata.html" rel="noopener noreferrer"&gt;Metadata&lt;/a&gt; can be associated with &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.DataSource.html" rel="noopener noreferrer"&gt;DataSource&lt;/a&gt; and &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Section.html" rel="noopener noreferrer"&gt;Section&lt;/a&gt; structures.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DataSource&lt;/strong&gt; (can include metadata)

&lt;ul&gt;
&lt;li&gt;→ &lt;strong&gt;Sections&lt;/strong&gt; (can include metadata)&lt;/li&gt;
&lt;li&gt;→ &lt;strong&gt;TextPartitions&lt;/strong&gt; (stores dense vectors)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;¹ &lt;em&gt;The "Text" in TextPartition is a bit misleading now. These partitions can handle images too.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjhik894lakpi7cwlb4z.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjhik894lakpi7cwlb4z.webp" alt="LM-Kit Vector Storage Architecture" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  On-the-Fly Embeddings (In-Memory)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;When you need an embedding, LM-Kit calls the model, gets the vector, and keeps it in RAM for immediate use. Nothing is written to disk. No external service needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Example
&lt;/h3&gt;

&lt;p&gt;Create an in-memory vector database and perform retrieval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Define some strings from which we want to generate embeddings&lt;/span&gt;
&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"How do I bake a chocolate cake?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"What is the recipe for chocolate cake?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I want to make a chocolate cake."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Chocolate cake is delicious."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How do I cook pasta?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I need instructions to bake a cake."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Baking requires precise measurements."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I like vanilla ice cream."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"The weather is sunny today."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"What is the capital of France?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Paris is a beautiful city."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How can I improve my coding skills?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Programming requires practice."&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Load the embedding model&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromModelID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Specify optional metadata to attach to the new collection (a.k.a. DataSource)&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;collectionMetadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my description"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"another-pair-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"another-pair-value"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Create a new in-memory vector database&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateInMemoryDataSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"my-collection"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collectionMetadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Compute embeddings to insert into the collection&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VectorEntry&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;vectorEntries&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VectorEntry&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Run multithreaded embedding on the list of examples&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;++)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;vectorEntries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;VectorEntry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;SectionIdentifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-section-identifier"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Specify optional metadata to attach to the new section.&lt;/span&gt;
&lt;span class="c1"&gt;// Note: a single collection can contain multiple sections.&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;sectionMetadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my description"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"another-pair-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"another-pair-value"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Add the computed embedding vectors to the collection&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;SectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vectorEntries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sectionMetadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Now perform search&lt;/span&gt;
&lt;span class="c1"&gt;// Build the query vector&lt;/span&gt;
&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"How do I bake a chocolate cake?"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;queryVector&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;GetEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Search for similar vectors across partitions&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;similarPartitions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorSearch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FindMatchingPartitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;queryVector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Do something with the search results...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example demonstrates how to set up an in-memory vector store using &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.DataSource.CreateInMemoryDataSource.html" rel="noopener noreferrer"&gt;DataSource.CreateInMemoryDataSource&lt;/a&gt;, generate embeddings with a model, and organize them into a section with optional metadata. It finishes by performing a semantic search using a query vector. Everything runs locally in RAM, making it ideal for fast prototyping, real-time classification, or experimentation without persistent storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  In-Memory Embeddings: When and Why to Use Them
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best When&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Rapid prototyping or experimentation&lt;br&gt;• One-shot queries (no need to reuse later)&lt;br&gt;• Small datasets that fit in memory&lt;br&gt;• Real-time classification/semantic search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Zero infrastructure or setup&lt;br&gt;• Instant start&lt;br&gt;• No file I/O overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Downsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Lost on restart&lt;br&gt;• Not suitable for large collections&lt;br&gt;• No sharing across processes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Serialization Note
&lt;/h3&gt;

&lt;p&gt;Although referred to as "in-memory", any &lt;code&gt;DataSource&lt;/code&gt; instance can be serialized to a file or stream using the &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.DataSource.Serialize.html" rel="noopener noreferrer"&gt;Serialize()&lt;/a&gt; method. It can then be fully restored into memory with the &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.DataSource.Deserialize.html" rel="noopener noreferrer"&gt;Deserialize()&lt;/a&gt; method.&lt;/p&gt;

&lt;p&gt;This provides flexibility to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Save your in-memory collections between sessions&lt;/li&gt;
&lt;li&gt;Store intermediate states during experimentation&lt;/li&gt;
&lt;li&gt;Transfer embeddings across environments without requiring an external vector database&lt;/li&gt;
&lt;/ul&gt;
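
&lt;p&gt;As a minimal sketch of the round trip (the exact overloads shown are illustrative; see the linked API pages for the authoritative signatures):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Persist the in-memory collection to a file between sessions
collection.Serialize("my-collection.bin");

// Later, or in another process, restore it fully into memory
var restored = DataSource.Deserialize("my-collection.bin", model);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;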

&lt;h2&gt;
  
  
  Persistent Storage with an External Vector DB
&lt;/h2&gt;

&lt;p&gt;For production use or anything large-scale, you'll want to persist your vectors in a proper database. LM-Kit supports this through the &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Storage.IVectorStore.html" rel="noopener noreferrer"&gt;IVectorStore&lt;/a&gt; abstraction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Qdrant Vector Store (Prebuilt)
&lt;/h3&gt;

&lt;p&gt;LM-Kit offers an out-of-the-box integration with &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; via the &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Storage.Qdrant.QdrantEmbeddingStore.html" rel="noopener noreferrer"&gt;QdrantEmbeddingStore&lt;/a&gt; class. Qdrant is an open-source, high-performance vector database that supports HNSW indexing and advanced payload filtering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Storage.Qdrant.QdrantEmbeddingStore.html" rel="noopener noreferrer"&gt;QdrantEmbeddingStore&lt;/a&gt; is a simple implementation of the &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Storage.IVectorStore.html" rel="noopener noreferrer"&gt;IVectorStore&lt;/a&gt; interface. It has been open-sourced and is available as part of the dedicated &lt;a href="https://www.nuget.org/packages/LM-Kit.NET.Data.Connectors.Qdrant" rel="noopener noreferrer"&gt;LM-Kit.NET.Data.Connectors.Qdrant package&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The source code for this package is hosted in the &lt;a href="https://github.com/LM-Kit/lm-kit-net-data-connectors" rel="noopener noreferrer"&gt;LM-Kit.NET Data Connectors GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Additional prebuilt vector store integrations will be added progressively to the same repository. If you require a specific implementation on a short timeline, feel free to &lt;a href="https://lm-kit.com/contact/" rel="noopener noreferrer"&gt;reach out to our team&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Initializing store&lt;/span&gt;
&lt;span class="c1"&gt;// We're using local environment that we've started with: docker run -p 6333:6333 -p 6334:6334&lt;/span&gt;
&lt;span class="c1"&gt;// Check this tutorial to setup qdrant local environment: https://qdrant.tech/documentation/quickstart/&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;QdrantEmbeddingStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:6334"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromModelID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateVectorStoreDataSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my-collection"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Qdrant Vector Store: When and Why to Use It
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best When&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Production-scale semantic search&lt;br&gt;• Cloud or distributed deployments&lt;br&gt;• Need advanced filtering by metadata&lt;br&gt;• Sharing embeddings across multiple services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Battle-tested, open-source vector DB&lt;br&gt;• High performance (HNSW indexing)&lt;br&gt;• Powerful metadata filtering&lt;br&gt;• Scales horizontally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Downsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Requires standing up a Qdrant instance&lt;br&gt;• Network latency vs local&lt;br&gt;• Additional operational overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Custom Vector Store with IVectorStore
&lt;/h3&gt;

&lt;p&gt;If you're building your own backend or want to hook into existing systems, just implement the &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.Storage.IVectorStore.html" rel="noopener noreferrer"&gt;IVectorStore&lt;/a&gt; interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;IVectorStore&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;CollectionExistsAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;CreateCollectionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;vectorSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;DeleteCollectionAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;UpsertAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;DeleteFromMetadataAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;UpdateMetadataAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;clearFirst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;MetadataCollection&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetMetadataAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;VectorRecord&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;RetrieveFromMetadataAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;getVector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;getMetadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;VectorRecord&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;SearchSimilarVectorsAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;collectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;getVector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;getMetadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;cancellationToken&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
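
&lt;p&gt;As a rough illustration of what an implementation can look like, here is a toy store backed by in-process dictionaries. This is a partial sketch only: the class and field names are invented for the example, and the remaining interface members are elided, so it won't compile as-is.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Partial sketch of a custom IVectorStore backed by in-process dictionaries
public sealed class DictionaryVectorStore : IVectorStore
{
    // collectionIdentifier -&amp;gt; (id -&amp;gt; stored vector + metadata)
    private readonly Dictionary&amp;lt;string, Dictionary&amp;lt;string, (float[] Vector, MetadataCollection Metadata)&amp;gt;&amp;gt; _data = new();

    public Task&amp;lt;bool&amp;gt; CollectionExistsAsync(string collectionIdentifier, CancellationToken cancellationToken = default)
        =&amp;gt; Task.FromResult(_data.ContainsKey(collectionIdentifier));

    public Task CreateCollectionAsync(string collectionIdentifier, uint vectorSize, CancellationToken cancellationToken = default)
    {
        _data[collectionIdentifier] = new();
        return Task.CompletedTask;
    }

    public Task DeleteCollectionAsync(string collectionIdentifier, CancellationToken cancellationToken = default)
    {
        _data.Remove(collectionIdentifier);
        return Task.CompletedTask;
    }

    public Task UpsertAsync(string collectionIdentifier, string id, float[] vectors, MetadataCollection metadata, CancellationToken cancellationToken = default)
    {
        _data[collectionIdentifier][id] = (vectors, metadata);
        return Task.CompletedTask;
    }

    // SearchSimilarVectorsAsync would rank stored vectors by similarity (e.g. cosine)
    // against the query vector and return the top `limit` hits as VectorRecord instances;
    // the metadata-oriented members reduce to the same dictionary lookups.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;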



&lt;h3&gt;
  
  
  Custom Vector Store: When and Why to Build Your Own
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best When&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Integrating with proprietary systems&lt;br&gt;• Using an existing vector DB not yet supported by LM-Kit&lt;br&gt;• Need custom indexing or sharding logic&lt;br&gt;• Full control over storage/retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Total flexibility&lt;br&gt;• Can leverage existing infrastructure&lt;br&gt;• Tailored to your exact requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Downsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• You own the implementation and maintenance&lt;br&gt;• More upfront development effort&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Persistent Storage with LM-Kit's Built-In Vector DB
&lt;/h2&gt;

&lt;p&gt;When you need durable embedding storage without deploying an external service, LM-Kit's built-in vector database is your go-to solution. Think of it as SQLite for dense vectors: a self-contained, file-based engine optimized for storing and querying embeddings at scale. Designed to handle millions of vectors on a single node, it delivers low-latency insertions, deletions, and searches even as your dataset grows.&lt;/p&gt;

&lt;p&gt;Under the hood, it stores vectors and metadata in an optimized file format and provides two clear APIs for managing and querying the data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;DataSource.CreateFileDataSource(path, name, model, metadata, overwrite: true)&lt;/code&gt; - Initialize or overwrite a local vector store at the specified file path.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DataSource.LoadFromFile(path, model, readOnly: true)&lt;/code&gt; - Reopen an existing store for querying or modification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These methods let you insert, delete, and search embeddings entirely on disk. This makes the built-in store ideal for rapid prototyping, desktop tools, or any scenario where you want portable, versionable vector storage without standing up a full vector-DB cluster.&lt;/p&gt;
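
&lt;p&gt;For instance, reopening a previously created store in a later session is a one-liner (a sketch reusing the model ID and file path from this article's examples; the &lt;code&gt;readOnly&lt;/code&gt; flag guards against accidental writes):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Reopen an existing on-disk collection for querying
var model = LM.LoadFromModelID("nomic-embed-text");
var collection = DataSource.LoadFromFile("d:\\collection.ds", model, readOnly: true);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;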

&lt;p&gt;Now, let's dive into some source code to see how LM-Kit's built-in vector storage works in practice, from creating a local database to querying it later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating and Populating a Local Vector Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Define some strings from which we want to generate embeddings&lt;/span&gt;
&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"How do I bake a chocolate cake?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"What is the recipe for chocolate cake?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I want to make a chocolate cake."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Chocolate cake is delicious."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How do I cook pasta?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I need instructions to bake a cake."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Baking requires precise measurements."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I like vanilla ice cream."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"The weather is sunny today."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"What is the capital of France?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Paris is a beautiful city."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"How can I improve my coding skills?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Programming requires practice."&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Load the embedding model&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromModelID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Specify optional metadata to attach to the new collection (a.k.a. DataSource)&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;collectionMetadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my description"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"another-pair-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"another-pair-value"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Create a new local vector database (overwriting if it already exists)&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;CollectionPath&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"d:\\collection.ds"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateFileDataSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CollectionPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"my-collection"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collectionMetadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;overwrite&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Compute embeddings to insert into the collection&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VectorEntry&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;vectorEntries&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VectorEntry&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Run multithreaded embedding on the list of examples&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;++)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;vectorEntries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;VectorEntry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;SectionIdentifier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-section-identifier"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Specify optional metadata to attach to the new section.&lt;/span&gt;
&lt;span class="c1"&gt;// Note: a single collection can contain multiple sections.&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;sectionMetadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;MetadataCollection&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"my description"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"another-pair-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"another-pair-value"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Add the computed embedding vectors to the collection&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;SectionIdentifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vectorEntries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sectionMetadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Close the database&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Dispose&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Loading and Querying an Existing Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Load the embedding model&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromModelID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;CollectionPath&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"d:\\collection.ds"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Load our previously created database in read-only mode (sufficient for querying)&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LoadFromFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CollectionPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;readOnly&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Build the query vector&lt;/span&gt;
&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"How do I bake a chocolate cake?"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;queryVector&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;GetEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Search for similar vectors across partitions&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;similarPartitions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorSearch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FindMatchingPartitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;queryVector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Do something with the search results...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, these two snippets show the full lifecycle of LM-Kit's built-in vector storage: how to create, populate, persist, and later reload your embedding collection for querying, all without relying on external infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎁 A CEO's Modest Proposal for the Brave
&lt;/h3&gt;

&lt;p&gt;I'm offering a gift to anyone who manages to implement a faster .NET version of the two snippets above without using LM-Kit.&lt;/p&gt;

&lt;h3&gt;
  
  
  LM-Kit's Built-In Vector DB: The Unsung Hero
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best When&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Desktop or local applications&lt;br&gt;• Offline-first tools&lt;br&gt;• Medium-scale datasets (up to millions of vectors)&lt;br&gt;• No external database infrastructure available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Zero external dependencies&lt;br&gt;• File-based portability&lt;br&gt;• Fast local queries&lt;br&gt;• Version control friendly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Downsides&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;• Single-machine scale&lt;br&gt;• Not designed for distributed queries&lt;br&gt;• Limited compared to specialized vector DBs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Embedding Storage Methods Compared
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🔧 Method&lt;/th&gt;
&lt;th&gt;✅ Best For&lt;/th&gt;
&lt;th&gt;💾 Persistence&lt;/th&gt;
&lt;th&gt;📈 Scale&lt;/th&gt;
&lt;th&gt;🌐 Infra Required&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;In-Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quick tests, small-scale prototyping&lt;/td&gt;
&lt;td&gt;Temporary (can serialize manually)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Built-In Vector DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local apps, offline tools, medium-scale use&lt;/td&gt;
&lt;td&gt;Yes (file-based)&lt;/td&gt;
&lt;td&gt;Medium (single machine)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qdrant Vector Store&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High-scale, distributed or cloud deployments&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Qdrant instance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom via IVectorStore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom backends, proprietary infra&lt;/td&gt;
&lt;td&gt;Yes (you implement it)&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Your own infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each method serves a purpose. Use in-memory embeddings for quick tests or when you're feeding results into something immediately. If you need persistence but want to keep things simple and local, go with LM-Kit's built-in vector DB. For large-scale or distributed systems, Qdrant is a solid external option. And if none of these fit your stack, you can always bring your own backend via &lt;code&gt;IVectorStore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The best part? &lt;a href="https://docs.lm-kit.com/lm-kit-net/api/LMKit.Data.DataSource.html" rel="noopener noreferrer"&gt;DataSource&lt;/a&gt; is unified. You can switch between these options without rewriting your code: just plug in a different backend and you're set.&lt;/p&gt;
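&lt;p&gt;As a minimal sketch of what that swap looks like (the file-based call and the query path come straight from the snippets above; the &lt;code&gt;IVectorStore&lt;/code&gt;-backed line is a hypothetical placeholder, not the real factory name, so check the LM-Kit docs for the actual signature):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// The query path is identical whichever backend produced the DataSource;
// only the construction call changes.
var model = LM.LoadFromModelID("nomic-embed-text");

// Built-in file-based backend (from the snippets above):
var collection = DataSource.LoadFromFile("d:\\collection.ds", model, readOnly: true);

// Hypothetical alternative backed by a custom IVectorStore implementation
// (factory name assumed for illustration, not the real LM-Kit API):
// var collection = DataSource.CreateFromStore(myVectorStore, "my-collection", model);

// Everything below stays the same regardless of the backend
var queryVector = new Embedder(model).GetEmbeddings("How do I bake a chocolate cake?");
var results = VectorSearch.FindMatchingPartitions([collection], model, queryVector);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;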

&lt;p&gt;Let us know what you're building. And if you're doing something wild with embeddings, we want to hear about it. ✨&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dotnet</category>
      <category>embeddings</category>
      <category>vectordb</category>
    </item>
  </channel>
</rss>
