<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Holger Imbery</title>
    <description>The latest articles on Forem by Holger Imbery (@holgerimbery).</description>
    <link>https://forem.com/holgerimbery</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2736716%2Fa3ddf62e-8581-4eea-b140-b3fc2121c057.png</url>
      <title>Forem: Holger Imbery</title>
      <link>https://forem.com/holgerimbery</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/holgerimbery"/>
    <language>en</language>
    <item>
      <title>Copilot Studio Billing – A Short Answer to a Frequent Question</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 02 May 2026 08:04:40 +0000</pubDate>
      <link>https://forem.com/holgerimbery/copilot-studio-billing-a-short-answer-to-a-frequent-question-50n5</link>
      <guid>https://forem.com/holgerimbery/copilot-studio-billing-a-short-answer-to-a-frequent-question-50n5</guid>
      <description>&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Organizations implementing Microsoft Copilot Studio often ask about the machine structure and how to monitor and analyze resource use across their deployed agents. This article addresses these common questions by examining the foundational principles of Copilot Studio's billing methodology. Specifically, the platform employs a usage-based billing model that operates through Copilot Credits, a standardized measurement unit that quantifies billable computational resources. Microsoft Copilot Studio gives you comprehensive visibility into consumption and costs through integrated Analytics. You get real-time visibility into billing metrics, consumption attribution, and capacity planning, without needing external tools or manual reconciliation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage-based billing with Copilot Credits
&lt;/h2&gt;

&lt;p&gt;Microsoft Copilot Studio employs a consumption-based billing model centered on Copilot Credits, a standardized unit of measurement that quantifies the billable computational resources your agent uses. We add up each agent's operational costs by summing all Copilot Credits used across your organizational tenant, providing a transparent, predictable billing framework. The volume of Copilot Credits your agent consumes is determined by multiple contributing factors, including the frequency and intensity of user interactions, the specific AI capabilities invoked during those interactions (such as natural language responses, backend action execution, or business process flows), and the overall complexity of the scenarios the agent must handle. This approach lets your costs scale with your actual usage and feature use across your organization. &lt;/p&gt;

&lt;h2&gt;
  
  
  Where to see billing and consumption
&lt;/h2&gt;

&lt;p&gt;Copilot Studio provides a dedicated Analytics page at the agent level that shows billing‑relevant data for a selected time period. This includes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p42tumzxnw0a7rmp2ij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p42tumzxnw0a7rmp2ij.png" alt="upgit_20260401_1775038928.png" width="385" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbfk25eo3xe6e29lgqrm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbfk25eo3xe6e29lgqrm.png" alt="upgit_20260401_1775040164.png" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Total billed Copilot Credits for the agent
&lt;/h3&gt;

&lt;p&gt;The Analytics dashboard provides comprehensive visibility into your agent's billing metrics through several key components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Billing trend visualization&lt;/strong&gt;: A temporal representation of Copilot Credit consumption plotted over your selected time period, enabling stakeholders to identify consumption patterns, peak usage intervals, and seasonal fluctuations in agent utilization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Activity-based consumption breakdown&lt;/strong&gt;: A detailed attribution analysis that segments credit consumption by interaction type and feature category. This granular view helps identify which capabilities—such as natural language processing, action executions, or business process integrations—are the primary drivers of your organization's computational costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Credit allocation and remaining capacity&lt;/strong&gt;: A dashboard element that displays your monthly credit allocation alongside actual consumption to date, providing clear visibility into remaining available credits within the current billing cycle and helping prevent unexpected cost overruns.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This analytical approach helps makers and administrators go beyond simple cost visibility to understand the factors and user behaviors that drive consumption patterns in their agents. &lt;/p&gt;

&lt;h3&gt;
  
  
  Near‑real‑time visibility
&lt;/h3&gt;

&lt;p&gt;Please note that consumption data in the Analytics experience isn't reflected immediately. Because data collection and aggregation are distributed across Microsoft's infrastructure, there is a deliberate processing interval between when an interaction occurs in your agent and when the corresponding credit consumption metrics appear in the Analytics dashboard. Specifically, recent user activity and associated Copilot Credit charges typically require several hours to propagate through the telemetry pipeline and surface in the analytics interface. This temporal lag between actual consumption and reported metrics is a critical consideration when conducting performance monitoring, particularly in scenarios involving new agent deployments, recent architectural modifications, or optimization initiatives. Organizations implementing consumption tracking during these periods should account for this delay when interpreting analytics data and making operational decisions based on observed billing trends. &lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;The relationship between billing metrics and agent design decisions is a core principle in building and running Microsoft Copilot Studio deployments. Organizations that recognize and leverage this connection are better positioned to make informed architectural decisions that balance functional requirements with cost efficiency. The Analytics experience provided within Copilot Studio serves a dual purpose that extends well beyond simple billing transparency. As a primary governance tool, it enables organizations to establish consumption baselines, define and enforce cost budgets, and implement optimization strategies at both the agent and organizational levels. This comprehensive analytical framework becomes increasingly critical as agents transition from the experimentation and proof-of-concept phases to production environments, where operational costs accumulate rapidly, and optimization opportunities become more constrained. By establishing clear visibility into consumption patterns during early development stages, teams can identify inefficient architectural patterns, optimize interaction flows, and refine AI capabilities—all before deploying agents at scale. Furthermore, the transparency provided by the Analytics dashboard facilitates organizational governance by empowering stakeholders to establish accountability for resource utilization, track consumption trends against budgetary targets, and make data-driven decisions regarding agent expansion, feature prioritization, and technology investments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links and resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/requirements-messages-management?source=recommendations" rel="noopener noreferrer"&gt;billing rates&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Microsoft Copilot Studio's billing model, centered on usage-based Copilot Credits, provides a transparent and scalable framework for managing the costs associated with AI agent deployment. By leveraging the built-in Analytics experience, organizations can gain comprehensive insights into their agents' consumption patterns, enabling informed decision-making around agent design, optimization, and cost management. As organizations continue to adopt and scale their use of AI agents, understanding the nuances of billing and consumption visibility will be essential for maximizing the value of their investments in Microsoft Copilot Studio while maintaining control over operational costs.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>agents</category>
    </item>
    <item>
      <title>Microsoft IQ: The New Intelligence Layer for Enterprise AI Agents</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 25 Apr 2026 08:42:12 +0000</pubDate>
      <link>https://forem.com/holgerimbery/microsoft-iq-the-new-intelligence-layer-for-enterprise-ai-agents-281b</link>
      <guid>https://forem.com/holgerimbery/microsoft-iq-the-new-intelligence-layer-for-enterprise-ai-agents-281b</guid>
      <description>&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt; &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Microsoft's new IQ layers - &lt;strong&gt;Work IQ&lt;/strong&gt;, &lt;strong&gt;Fabric IQ&lt;/strong&gt;, and &lt;strong&gt;Foundry IQ&lt;/strong&gt; - are unified intelligence systems that give enterprise AI agents deep organizational context. Rather than relying solely on general knowledge, these layers ground agents in your company's real data, workflows, and knowledge domains, enabling them to make decisions with the credibility and awareness of experienced employees.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why read this:&lt;/strong&gt; If you're building or deploying AI agents in enterprise environments, understanding these three intelligence layers is essential. Learn what each layer does, the business value they unlock, practical developer integration steps with Copilot Studio, and real limitations to watch for - whether you're looking to protect your competitive edge or avoid common deployment pitfalls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: Why Enterprise Context Is the New AI Differentiator
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AI agents - autonomous systems that plan, reason, and act on behalf of users - are moving from experimentation to production at enterprise scale.&lt;/strong&gt; A July 2024 Capgemini survey of 1,100 companies with over \$1 billion in annual revenue found that &lt;strong&gt;82% plan to integrate AI agents within the next one to three years&lt;/strong&gt;, with only 7% reporting no plans at all Of those surveyed, 71% expect AI agents to drive automation, and 64% expect them to free human workers from repetitive tasks so they can focus on higher-value functions. Separately, KPMG's Q2 2025 AI Quarterly Pulse Survey of 130 U.S.-based C-suite leaders (from organizations with \$1 billion or more in revenue) reports that &lt;strong&gt;33% of organizations have now deployed at least some agents - a three-fold increase after two consecutive quarters at 11%&lt;/strong&gt;. The same survey found that &lt;strong&gt;82% of leaders agree their industry's competitive landscape will look fundamentally different within 24 months&lt;/strong&gt; due to AI.&lt;/p&gt;

&lt;p&gt;Yet deploying powerful language models alone does not guarantee business impact. Large language models ship with broad general knowledge but lack awareness of an organization's current data, internal processes, contractual obligations, and human workflow patterns. The real differentiator is no longer &lt;em&gt;how smart the model is&lt;/em&gt;, but &lt;strong&gt;how well it understands your organization&lt;/strong&gt;. At &lt;strong&gt;Microsoft Ignite 2025&lt;/strong&gt;, Microsoft addressed this gap by introducing a &lt;strong&gt;"Unified Context Layer"&lt;/strong&gt; comprising three tightly connected intelligence systems: &lt;strong&gt;Work IQ&lt;/strong&gt;, &lt;strong&gt;Fabric IQ&lt;/strong&gt;, and &lt;strong&gt;Foundry IQ&lt;/strong&gt;. Together, they form &lt;strong&gt;"Microsoft IQ"&lt;/strong&gt; - a unified intelligence layer spanning productivity, data, and knowledge - designed to ground AI agents with deep enterprise context so they can make reliable decisions and continuously optimize operations.&lt;/p&gt;

&lt;p&gt;Each IQ layer addresses a distinct dimension of organizational context:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;IQ Layer&lt;/th&gt;
&lt;th&gt;Context Domain&lt;/th&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Primary Function&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Work IQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User &amp;amp; work context&lt;/td&gt;
&lt;td&gt;Microsoft 365&lt;/td&gt;
&lt;td&gt;Captures collaboration signals - emails, meetings, chats, documents, relationships - and builds persistent memory of how people and teams work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fabric IQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Business &amp;amp; data context&lt;/td&gt;
&lt;td&gt;Microsoft Fabric&lt;/td&gt;
&lt;td&gt;Unifies analytical, operational, and real-time data into a governed semantic model with ontologies, graphs, and business rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Foundry IQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Knowledge &amp;amp; reasoning context&lt;/td&gt;
&lt;td&gt;Microsoft Foundry (Azure AI Foundry)&lt;/td&gt;
&lt;td&gt;Creates multi-source, permission-aware knowledge bases with agentic retrieval for grounded, citation-backed answers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each workload is standalone, but they can be used together to provide a comprehensive organizational context for agents. For business decision-makers, these layers translate into &lt;strong&gt;faster time-to-insight, more trustworthy AI outputs, and the ability to deploy agents that act with the contextual awareness of experienced employees rather than generic assistants&lt;/strong&gt;. For developers building with &lt;strong&gt;Microsoft Copilot Studio&lt;/strong&gt; (low-code) or the &lt;strong&gt;Microsoft Agent Framework / Agent 365&lt;/strong&gt; (pro-code), the IQ layers provide ready-made intelligence services that eliminate the need to hand-build retrieval pipelines, semantic models, or user-context systems from scratch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93d1mlgwb5q77aluvnjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93d1mlgwb5q77aluvnjj.png" alt="upgit_20260407_1775544328.png" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Work IQ - Personalizing AI with User &amp;amp; Collaboration Context
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Work IQ is the intelligence layer in Microsoft 365 that gives AI agents a real-time, permission-aware understanding of how people actually work.&lt;/strong&gt; It is built on three tightly integrated layers - &lt;strong&gt;Data, Memory, and Inference&lt;/strong&gt; - that work together to provide Microsoft 365 Copilot and custom agents with continuous contextual understanding of work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data&lt;/strong&gt; unifies signals from files, emails, meetings, chats, and business systems across Microsoft 365 to capture how work happens across the organization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; builds a persistent understanding of how people and teams work, enabling agents to stay aligned to priorities and remain consistent across tasks, apps, and sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference&lt;/strong&gt; brings together models, skills, and tools so agents can reason and take action while the &lt;strong&gt;Agent 365 control plane&lt;/strong&gt; ensures those actions remain observable, governed, and compliant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Work IQ connects to organizational and personal data - SharePoint files, Outlook emails, Teams meetings - and builds personalized memory based on user preferences, habits, and workflows. &lt;strong&gt;Conversational memory&lt;/strong&gt; in Microsoft 365 Copilot, powered by Work IQ, enables it to retain context and important details across sessions by drawing on a user's work profile, instructions, preferences, and insights from past chats. Users stay in control and can review or delete these memories at any time.&lt;/p&gt;

&lt;p&gt;Importantly, Work IQ does not merely retrieve information - &lt;strong&gt;it interprets context&lt;/strong&gt;. This is why a Work IQ-enabled agent can answer questions like &lt;em&gt;"What did we decide last week about the field project budget?"&lt;/em&gt; or &lt;em&gt;"Summarise the latest customer escalations and draft a report"&lt;/em&gt;. It reasons over signals, patterns, and workflows rather than searching a document library.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategic Value for Business
&lt;/h3&gt;

&lt;p&gt;For decision-makers, Work IQ delivers &lt;strong&gt;personalization at scale&lt;/strong&gt; without sacrificing security. It enables AI agents to recognize who a user works with, what they focus on, and how they typically accomplish tasks. Microsoft is now exposing the power of Work IQ through &lt;strong&gt;APIs&lt;/strong&gt;, allowing developers to build AI agents targeting specific enterprise scenarios beyond what the built-in Copilot offers.&lt;/p&gt;

&lt;p&gt;Work IQ also surfaces &lt;strong&gt;workflow intelligence&lt;/strong&gt; - for example, identifying that operations teams are overloaded handling exceptions manually, observing long email chains, or spotting recurring "delay review" meetings that consume capacity. This kind of insight goes beyond analytics into organizational awareness, helping leaders understand &lt;em&gt;where human effort is being consumed&lt;/em&gt; and where agents can provide the most value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff consideration:&lt;/strong&gt; Work IQ can only be as effective as the signals it can access. It is accessed primarily through Copilot and works best when collaboration data is well-structured and consistently captured in Microsoft 365 tools. Organizations with fragmented communication (e.g., heavy use of external email, shadow IT chat tools, or poorly organized SharePoint) will see diminished returns until information management improves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Integration - Copilot Studio
&lt;/h3&gt;

&lt;p&gt;Work IQ is surfaced to developers as &lt;strong&gt;Model Context Protocol (MCP) tools&lt;/strong&gt; that can be attached to agents in Copilot Studio. The following step-by-step process adds the &lt;strong&gt;Work IQ Mail&lt;/strong&gt; server to an agent:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4or7wocb3mh2u7idd0jl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4or7wocb3mh2u7idd0jl.png" alt="upgit_20260407_1775544135.png" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign in to &lt;strong&gt;Copilot Studio&lt;/strong&gt; and select or create your agent.&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;Tools&lt;/strong&gt; tab and then &lt;strong&gt;Add Tool&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;On the &lt;em&gt;Add tool&lt;/em&gt; page, select &lt;strong&gt;Model Context Protocol&lt;/strong&gt; to see Work IQ MCP servers and other MCP servers.&lt;/li&gt;
&lt;li&gt;Type &lt;strong&gt;"mail"&lt;/strong&gt; in the search box.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Work IQ Mail&lt;/strong&gt; and expand the &lt;strong&gt;connection&lt;/strong&gt; dropdown to select &lt;strong&gt;Create New Connection&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Create&lt;/strong&gt;, provide credentials, and complete the sign-in process.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Add and Configure&lt;/strong&gt; to complete the process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the agent&lt;/strong&gt; - for example, prompt: &lt;em&gt;"Send an email to [name] and ask how the hands-on lab is going."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;When asked to allow the Work IQ tool to connect and use services, select &lt;strong&gt;Allow&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After configuration, the agent can read email content, understand the context, and respond accordingly. &lt;strong&gt;Repeat these steps&lt;/strong&gt; for &lt;strong&gt;Work IQ Calendar&lt;/strong&gt; or &lt;strong&gt;Work IQ Teams&lt;/strong&gt; to extend the agent's capabilities with meeting insights, chats, and more&amp;gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisite:&lt;/strong&gt; A &lt;strong&gt;Microsoft 365 Copilot license&lt;/strong&gt; is required to use Work IQ MCP servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Integration - Microsoft Agent Framework (Agent 365)
&lt;/h3&gt;

&lt;p&gt;For pro-code developers, the Work IQ tooling infrastructure is built into the &lt;strong&gt;Microsoft Agent 365 SDK and CLI&lt;/strong&gt;, Microsoft Foundry, and Copilot Studio. Agent 365 provides a secure, centralized gateway for extending agents with enterprise-ready tools through Work IQ for Microsoft 365 services and custom tooling servers for specialized workflows. This means developers building agents programmatically - using the Agent 365 SDK in Python, C#, or other supported languages - can invoke Work IQ's capabilities (querying recent communications, documents, or calendar entries) as part of an agent's reasoning loop without manually calling Microsoft Graph APIs and interpreting raw data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Governance and Security
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;IT administrators retain full control&lt;/strong&gt; over Work IQ MCP tools through the &lt;strong&gt;Microsoft 365 admin center&lt;/strong&gt; under the &lt;strong&gt;Agents and Tools&lt;/strong&gt; section, where they can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;View all activated Work IQ MCP servers (Work IQ Mail, Work IQ Calendar, Work IQ Teams, and any custom servers).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Allow or block&lt;/strong&gt; specific servers based on organizational policies.&lt;/li&gt;
&lt;li&gt;Apply &lt;strong&gt;scoped permissions&lt;/strong&gt; so agents only access what they need.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an admin blocks a Work IQ MCP tool or MCP server, it blocks access for &lt;strong&gt;every user and every agent&lt;/strong&gt;. Permissions always take precedence over configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; is built in via &lt;strong&gt;Microsoft Defender&lt;/strong&gt;. Admins can run queries in &lt;strong&gt;Advanced Hunting&lt;/strong&gt; to inspect trace logs of tool calls made by agents, monitor execution details (which tools were invoked, parameters passed, and outcomes), and detect anomalies or unauthorized usage patterns.&lt;/p&gt;

&lt;p&gt;All Work IQ MCP servers also undergo &lt;strong&gt;continuous evaluation&lt;/strong&gt;to measure accuracy, latency, and reliability, ensuring production-grade robustness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fabric IQ - Turning Enterprise Data into Business-Meaningful Intelligence
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Fabric IQ (preview) is a workload in Microsoft Fabric that unifies data across OneLake and organizes it in the language of your business.&lt;/strong&gt; The unified data is then exposed to analytics, AI agents, and applications with &lt;strong&gt;consistent semantic meaning and context&lt;/strong&gt;. While Work IQ understands &lt;em&gt;work&lt;/em&gt;, Fabric IQ understands &lt;em&gt;data&lt;/em&gt; - and specifically what data &lt;strong&gt;means&lt;/strong&gt; in business terms.&lt;/p&gt;

&lt;p&gt;Fabric IQ models business data through &lt;strong&gt;ontologies, semantic models, and graphs&lt;/strong&gt; so agents can reason over analytics in OneLake and Power BI. It combines the following items into one semantic intelligence workload&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fabric IQ Item&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ontology (preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise vocabulary and semantic layer that defines entity types, relationships, properties, and condition-action rules (through Fabric Activator). Binds definitions to real data so downstream tools share the same language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plan (preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unified no-code platform for collaborative planning, reporting, analytics, data integration, and management on a single platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph (preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native graph storage and compute for nodes, edges, and traversals over connected data - suited to path finding, dependency analysis, and graph algorithms. Integrated with the ontology item&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Agent (preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conversational Q&amp;amp;A systems using generative AI that connect to the ontology to understand business concepts when answering questions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operations Agent (preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI agent to monitor real-time data and recommend business actions, aware of business terminology from the ontology&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power BI Semantic Models&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Curated analytics models optimized for reporting and interactive analysis with measures, hierarchies, and relationships. Ontologies can be generated directly from them&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Several of these items are shared with other Fabric workloads (e.g., Graph and Operations Agent are also part of Real-Time Intelligence; Data Agent is shared with Data Science).&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategic Value for Business
&lt;/h3&gt;

&lt;p&gt;Fabric IQ delivers six key benefits identified by Microsoft:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unification of data&lt;/strong&gt; - Combines data from various OneLake sources (lakehouses, eventhouses, Power BI semantic models) into a single consistent model. Can also unify external operational data using OneLake shortcuts without copying data or building ETL pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent language across tools&lt;/strong&gt; - A single definition of a concept (like &lt;em&gt;Customer&lt;/em&gt;, &lt;em&gt;Material&lt;/em&gt;, or &lt;em&gt;Asset&lt;/em&gt;) drives how Power BI, notebooks, and agents interpret data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster onboarding&lt;/strong&gt; - Business concepts only need to be declared once, then new dashboards and AI experiences inherit that meaning automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance and trust&lt;/strong&gt; - Reduces duplication and inconsistent definitions across teams by enforcing clear semantics, while constraints improve data quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-domain reasoning&lt;/strong&gt; - Represents relationships between concepts with graph links, enabling traversals like &lt;em&gt;Order → Shipment → Temperature Sensor → Cold Chain Breach&lt;/em&gt; to explain outcomes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI readiness and decision-ready actions&lt;/strong&gt; - Provides structured grounding for copilots and agents so answers reflect enterprise language. Rules defined in the ontology (via Fabric Activator) enable governed, real-time actions (e.g., alerts and notifications) when conditions are met.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;finance teams&lt;/strong&gt;, this means an AI agent can answer questions like &lt;em&gt;"Why did request volume spike in the North region last month?"&lt;/em&gt; or &lt;em&gt;"Show anomalies in field service cycle time"&lt;/em&gt; - grounded in what the data &lt;strong&gt;means&lt;/strong&gt;, not just where it lives. The semantic model ensures that metrics such as "net profit" and "customer churn" are calculated exactly as defined by the business, so CFOs can trust the AI's output.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;operations teams&lt;/strong&gt;, Fabric IQ's ontology can model supply chain entities, inventory levels, delivery metrics, and their relationships. When an on-time delivery percentage dips below historical norms, a Fabric IQ-informed agent can detect the anomaly, correlate it with upstream bottlenecks, and surface the issue before it escalates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff consideration:&lt;/strong&gt; Fabric IQ requires upfront investment in &lt;strong&gt;semantic modeling&lt;/strong&gt;. Organizations must define ontologies and business rules, which demands collaboration between data engineers and domain experts. Once agents depend on business meaning, that meaning becomes production infrastructure - semantic models must be versioned, governed, deployed, and monitored with the same rigor applied to code. For organizations already using Power BI, existing data models provide a head start: they instantly serve as a catalyst, giving agents rich, business-specific context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Integration
&lt;/h3&gt;

&lt;p&gt;Developers interact with Fabric IQ primarily through the &lt;strong&gt;Microsoft Fabric portal&lt;/strong&gt;, where they can create ontologies, bind them to data sources, and build Data Agents or Operations Agents that reason over the unified data model. The recommended approach for choosing the right Fabric IQ item depends on the scenario:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ontology (preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cross-domain consistency, governance, and AI/agent grounding; reasoning across processes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph (preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Relationship-heavy questions (impact chains, communities, shortest paths) dominate; GQL-style pattern matching needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power BI Semantic Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Business users need trusted KPIs and fast visuals with dimensional modeling and governed datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Several key item relationships support combined use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ontology + Semantic Model:&lt;/strong&gt; Generate or align Power BI semantic models so terminology and KPIs stay consistent across reports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ontology + Graph:&lt;/strong&gt; Ontology declares which things connect and why; Graph stores and computes traversals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ontology + Data/Operations Agents:&lt;/strong&gt; Ontology grounds agents in shared business semantics and rules, enabling them to retrieve context, reason across domains, and trigger governed actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan + Semantic Model:&lt;/strong&gt; Plan connects to existing semantic models, allowing dimensions and measures to be used in planning sheets for seamless plan-versus-actuals analytics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Foundry IQ - Unified Knowledge Retrieval and Reasoning for AI Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Foundry IQ (preview) is a managed knowledge layer in Microsoft Foundry that enables the creation of configurable, multi-source &lt;em&gt;knowledge bases&lt;/em&gt; providing agents with permission-aware, citation-backed responses based on organizational data&lt;/strong&gt;. It tackles what many architects consider the hardest challenge in agent design: &lt;strong&gt;knowledge retrieval and grounding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A Foundry IQ knowledge base consists of &lt;strong&gt;knowledge sources&lt;/strong&gt; (connections to internal and external data stores) and &lt;strong&gt;parameters that control retrieval behavior&lt;/strong&gt;. Multiple agents can share the same knowledge base. When an agent queries the knowledge base, Foundry IQ uses &lt;strong&gt;agentic retrieval&lt;/strong&gt; - a multi-query pipeline - to process the query, retrieve relevant information, enforce user permissions, and return grounded answers with citations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Capabilities
&lt;/h3&gt;

&lt;p&gt;Foundry IQ offers the following capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-source knowledge bases:&lt;/strong&gt; Connect one knowledge base to multiple agents. Supported knowledge sources include &lt;strong&gt;Azure Blob Storage, SharePoint, OneLake&lt;/strong&gt;, and &lt;strong&gt;public web data&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated document processing:&lt;/strong&gt; Automate document chunking, &lt;strong&gt;vector embedding generation&lt;/strong&gt;, and metadata extraction for indexed knowledge sources. Schedule &lt;strong&gt;recurring indexer runs&lt;/strong&gt; for incremental data refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible query modes:&lt;/strong&gt; Issue &lt;strong&gt;keyword, vector, or hybrid queries&lt;/strong&gt; across indexed and remote knowledge sources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic retrieval engine:&lt;/strong&gt; Uses a large language model to &lt;strong&gt;plan queries, select sources, run parallel searches, and aggregate results&lt;/strong&gt;. The retrieval reasoning effort can be configured at three levels: &lt;strong&gt;minimal, low, or medium&lt;/strong&gt; for LLM processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extractive data with citations:&lt;/strong&gt; Returns answers with source references so agents can reason over raw content and &lt;strong&gt;trace answers back to source documents&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permission-aware:&lt;/strong&gt; Synchronizes &lt;strong&gt;access control lists (ACLs)&lt;/strong&gt; for supported sources and honors &lt;strong&gt;Microsoft Purview sensitivity labels&lt;/strong&gt;. Enforces permissions at query time so agents return only authorized content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity-based queries:&lt;/strong&gt; Runs queries under the caller's &lt;strong&gt;Microsoft Entra identity&lt;/strong&gt; for end-to-end permission enforcement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying indexing and retrieval infrastructure is powered by &lt;strong&gt;Azure AI Search&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Components
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge base&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top-level resource that orchestrates agentic retrieval. Defines which knowledge sources to query and parameters that control retrieval behavior, including retrieval reasoning effort (minimal, low, or medium)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge sources&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connections to indexed or remote content. A knowledge base references one or more knowledge sources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agentic retrieval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-query pipeline that decomposes complex questions into subqueries, executes them in parallel, semantically reranks results, and returns unified responses. Uses an optional LLM from Azure OpenAI in Foundry Models for query planning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Foundry IQ knowledge bases can be used in &lt;strong&gt;Foundry Agent Service, Microsoft Agent Framework, or any custom application&lt;/strong&gt; by calling the knowledge base APIs from Azure AI Search.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategic Value for Business
&lt;/h3&gt;

&lt;p&gt;Foundry IQ is the layer that makes AI &lt;strong&gt;enterprise-grade for knowledge work&lt;/strong&gt;. It enables agents to understand contracts, policies, procedures, SLAs, regulatory constraints, and unstructured documents - and to reason across them safely. The combination of &lt;strong&gt;permission enforcement and source citations&lt;/strong&gt; directly addresses the two most common executive concerns about enterprise AI: data leakage and hallucination. Every assertion the agent makes can be traced to a vetted document, supporting trust and auditability.&lt;/p&gt;

&lt;p&gt;Foundry IQ serves as &lt;strong&gt;a single endpoint for high-quality organizational data&lt;/strong&gt; to maximize context for AI applications. Its knowledge retrieval engine runs over multiple data sources including Work IQ, Fabric IQ, Azure data services, custom web applications, and the web.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff consideration:&lt;/strong&gt; Because Foundry IQ indexes and retrieves content automatically, the &lt;strong&gt;quality of the knowledge base depends heavily on the quality and curation of source content&lt;/strong&gt;. Outdated, duplicated, or poorly written documents will produce lower-quality retrieval results. Organizations should invest in content hygiene (removing obsolete documents, standardizing formatting, clarifying ownership) before connecting sources to Foundry IQ. Additionally, Foundry IQ is currently in &lt;strong&gt;public preview&lt;/strong&gt; without a production service-level agreement, which means production-critical workloads should be tested thoroughly and planned around the preview constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Integration - Setting Up via the Microsoft Foundry Portal
&lt;/h3&gt;

&lt;p&gt;The typical portal-based workflow for Foundry IQ:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign in to &lt;strong&gt;Microsoft Foundry&lt;/strong&gt; at &lt;a href="https://ai.azure.com" rel="noopener noreferrer"&gt;https://ai.azure.com&lt;/a&gt;. Ensure the &lt;strong&gt;"New Foundry"&lt;/strong&gt; toggle is on.&lt;/li&gt;
&lt;li&gt;Create a project or select an existing project.&lt;/li&gt;
&lt;li&gt;From the top menu, select &lt;strong&gt;Build&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;On the &lt;strong&gt;Knowledge&lt;/strong&gt; tab:

&lt;ul&gt;
&lt;li&gt;Create or connect to an existing search service that supports agentic retrieval.&lt;/li&gt;
&lt;li&gt;Create a knowledge base by adding one knowledge source at a time.&lt;/li&gt;
&lt;li&gt;Configure knowledge base properties for retrieval behavior.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;On the &lt;strong&gt;Agents&lt;/strong&gt; tab:

&lt;ul&gt;
&lt;li&gt;Create or select an existing agent.&lt;/li&gt;
&lt;li&gt;Connect to your knowledge base.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;playground&lt;/strong&gt; to send messages and refine your agent.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;For proof-of-concept testing, you can use the &lt;strong&gt;free tier for Azure AI Search&lt;/strong&gt; and a free allocation of tokens for agentic retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Integration - Connecting Foundry IQ to Agents Programmatically
&lt;/h3&gt;

&lt;p&gt;For pro-code developers, the connection from an agent to a Foundry IQ knowledge base uses the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; to facilitate tool calls. When invoked by the agent, the knowledge base orchestrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plans and decomposes the user query into subqueries.&lt;/li&gt;
&lt;li&gt;Processes the subqueries simultaneously using keyword, vector, or hybrid techniques.&lt;/li&gt;
&lt;li&gt;Applies &lt;strong&gt;semantic reranking&lt;/strong&gt; to identify the most relevant results.&lt;/li&gt;
&lt;li&gt;Synthesizes the results into a &lt;strong&gt;unified response with source references&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SDK and API support&lt;/strong&gt; (as of the documentation)&amp;gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Python SDK&lt;/th&gt;
&lt;th&gt;C# SDK&lt;/th&gt;
&lt;th&gt;JavaScript SDK&lt;/th&gt;
&lt;th&gt;Java SDK&lt;/th&gt;
&lt;th&gt;REST API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft Foundry&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt; for programmatic setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;strong&gt;Azure AI Search service&lt;/strong&gt; with a knowledge base containing one or more knowledge sources.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Microsoft Foundry project&lt;/strong&gt; with an LLM deployment (such as gpt-4.1-mini).&lt;/li&gt;
&lt;li&gt;Authentication and permissions configured on the search service and project.&lt;/li&gt;
&lt;li&gt;Python SDK version &lt;strong&gt;2.0.0 or later&lt;/strong&gt; or the &lt;strong&gt;2025-11-01-preview&lt;/strong&gt; REST API version:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For role-based access control (RBAC):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure AI User&lt;/strong&gt; role on the parent resource to access model deployments and create agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure AI Project Manager&lt;/strong&gt; role on the parent resource to create a project connection for MCP authentication.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;system-assigned managed identity&lt;/strong&gt; on the project for interactions with Azure AI Search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsoft provides an end-to-end Python sample on GitHub - the &lt;strong&gt;agentic-retrieval-pipeline-example&lt;/strong&gt; - for integrating Azure AI Search and Foundry Agent Service for knowledge retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Three IQ Layers Work Together - A Practical Scenario
&lt;/h2&gt;

&lt;p&gt;Understanding each IQ layer individually is important; understanding how they &lt;strong&gt;combine&lt;/strong&gt; is what unlocks transformative use cases. As one analysis frames it: think of a three-layer stack - &lt;strong&gt;Fabric IQ&lt;/strong&gt; at the foundation (structured data intelligence), &lt;strong&gt;Foundry IQ&lt;/strong&gt; in the middle (reasoning and knowledge grounding), and &lt;strong&gt;Work IQ&lt;/strong&gt; on top (human workflow intelligence).&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario: Supply Chain Delay Management (Operations)
&lt;/h3&gt;

&lt;p&gt;Consider a company building an AI agent to help manage supply chain delays:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fabric IQ&lt;/strong&gt; detects anomalies in delivery metrics. It sees that certain suppliers are trending late beyond historical norms, notices that on-time delivery percentages are dipping in specific regions, and correlates delays with upstream bottlenecks. &lt;strong&gt;This is data-driven awareness&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundry IQ&lt;/strong&gt; grounds the agent in supplier contracts, SLAs, penalty clauses, and internal policies. It understands what the agreement actually says about late deliveries, interprets escalation thresholds, and knows which suppliers have stricter terms. &lt;strong&gt;This is contextual reasoning&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Work IQ&lt;/strong&gt; observes that operations teams are overloaded handling these exceptions manually. It sees long email chains, recurring "delay review" meetings, and individuals spending hours every week tracking updates from vendors. It identifies patterns of reactive work consuming capacity. &lt;strong&gt;This is workflow intelligence&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;agent&lt;/strong&gt; combines all three streams: it recommends which delays need escalation based on contractual impact, drafts communications to suppliers referencing the correct SLA language, suggests internal reprioritization, and surfaces issues before they become crises.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario: Customer Service Knowledge Agent
&lt;/h3&gt;

&lt;p&gt;A customer service team deploys an agent using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Foundry IQ&lt;/strong&gt; as the primary knowledge source, indexing product manuals, troubleshooting guides, FAQs, and past support ticket resolutions. When a customer asks about error code E305, the agent retrieves the relevant manual section with a citation to the source document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Work IQ&lt;/strong&gt; to access recent internal communications - for example, identifying that an engineering team discussed this exact error in a Teams conversation last week and already developed a workaround. The agent can surface this workaround alongside the official documentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fabric IQ&lt;/strong&gt; to check whether the error is correlated with a particular product batch or region by querying the semantic model of manufacturing and logistics data, enabling the support agent to proactively notify affected customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario: Financial Reporting and Analysis (Finance)
&lt;/h3&gt;

&lt;p&gt;A finance team connects Fabric IQ to its consolidated financial data in OneLake and Power BI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Ontology&lt;/strong&gt; defines entities like &lt;em&gt;Account&lt;/em&gt;, &lt;em&gt;Transaction&lt;/em&gt;, &lt;em&gt;Cost Center&lt;/em&gt;, and &lt;em&gt;Region&lt;/em&gt; with standardized definitions for metrics like &lt;em&gt;Operating Margin&lt;/em&gt; and &lt;em&gt;Revenue Growth&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Data Agent&lt;/strong&gt; in Fabric IQ allows analysts to ask natural-language questions such as &lt;em&gt;"What are the top five cost centers by budget variance this quarter?"&lt;/em&gt; - grounded in the official semantic model, ensuring the answer uses Finance's own calculation methodology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundry IQ&lt;/strong&gt; supplements this with knowledge from internal accounting policies, audit findings, and regulatory guidance documents, so the agent can explain &lt;em&gt;why&lt;/em&gt; a variance occurred and whether it triggers any policy-based escalation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Work IQ&lt;/strong&gt; can surface the context of recent discussions among the finance team (e.g., &lt;em&gt;"The CFO discussed this variance in Monday's meeting and requested a root-cause analysis by Friday"&lt;/em&gt;), ensuring the AI's recommendations are aligned with current priorities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical Guidance for Copilot Studio and Agent Framework Developers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choosing the Right Development Platform
&lt;/h3&gt;

&lt;p&gt;Microsoft offers two primary paths for agent development, and &lt;strong&gt;both&lt;/strong&gt; can leverage the IQ layers. The choice depends on the developer persona and scenario complexity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;Copilot Studio&lt;/th&gt;
&lt;th&gt;Microsoft Agent Framework (Agent 365 SDK)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Business users, makers, power users, fusion teams&lt;/td&gt;
&lt;td&gt;Professional developers, IT teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Approach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low-code / no-code with visual design canvas&lt;/td&gt;
&lt;td&gt;Pro-code (Python, C# SDKs) with full programmatic control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IQ integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Work IQ via MCP tools (Add Tool &amp;gt; Model Context Protocol); Foundry IQ via MCP connection; Fabric IQ via Fabric Data Agent&lt;/td&gt;
&lt;td&gt;Work IQ via Agent 365 SDK; Foundry IQ via knowledge base APIs and Python SDK; Fabric IQ via APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Governance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in through Agent 365 control plane + M365 admin center&lt;/td&gt;
&lt;td&gt;Same Agent 365 governance + Azure RBAC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FAQ bots, task-specific assistants, business-process agents with moderate complexity&lt;/td&gt;
&lt;td&gt;Multi-agent orchestration, complex retrieval pipelines, custom reasoning logic, enterprise-grade production agents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Copilot Studio is aimed at &lt;strong&gt;low-code builders&lt;/strong&gt; while Azure AI Foundry serves &lt;strong&gt;pro-code developers&lt;/strong&gt;; Agent 365 delivers a consistent, developer-friendly experience backed by rigorous evaluation for accuracy, latency, and reliability across both paths.&lt;/p&gt;

&lt;p&gt;These platforms are not mutually exclusive. A common pattern is to use &lt;strong&gt;Foundry&lt;/strong&gt; to build and fine-tune the agent's reasoning backend (including Foundry IQ knowledge bases) and &lt;strong&gt;Copilot Studio&lt;/strong&gt; for the conversational front-end and deployment to Microsoft 365 channels. Microsoft also supports &lt;strong&gt;connecting a Foundry agent directly into Copilot Studio&lt;/strong&gt; for organizations that want pro-code control over the backend with low-code deployment】.&lt;/p&gt;

&lt;h3&gt;
  
  
  Additional Developer Resources
&lt;/h3&gt;

&lt;p&gt;Microsoft provides a &lt;strong&gt;hands-on learning experience&lt;/strong&gt; called the &lt;strong&gt;IQ Series&lt;/strong&gt; - an official GitHub repository (microsoft/iq-series) that includes video episodes, Jupyter notebooks, and Azure deployment templates spanning Foundry IQ, Work IQ, and Fabric IQ. This is a valuable starting point for developers exploring integration patterns.&lt;/p&gt;

&lt;p&gt;For Copilot Studio, the &lt;strong&gt;Copilot-Studio-and-Azure&lt;/strong&gt; GitHub repository includes a lab on &lt;strong&gt;Microsoft Foundry agentic retrieval&lt;/strong&gt; (labs/2.4-microsoft-foundry-agentic-retrieval) with a notebook (foundry-IQ-agents.ipynb) that demonstrates how to connect Copilot Studio agents to Foundry IQ.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Best Practices
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Curate your knowledge sources deliberately.&lt;/strong&gt; For Foundry IQ, prioritize high-quality, authoritative content - official policy libraries, product documentation, and knowledge articles that customer service reps already use. Remove outdated or duplicated material before indexing. Please use scheduled indexing for incremental data refresh so agents always use the current information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Invest in semantic modeling.&lt;/strong&gt; For Fabric IQ, collaborate with business domain experts to design ontologies that capture actual business rules, relationships, and terminology. Start from existing &lt;strong&gt;Power BI semantic models&lt;/strong&gt; when possible - they can be used as a bootstrap for ontologies, keeping language consistent across Fabric experiences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Clean up collaboration data.&lt;/strong&gt; For Work IQ, ensure that the organization's SharePoint structures are tidy, file ownership is clear, and key processes are documented. Reduce duplication and align Dataverse models with real business logic. As noted in an analysis by VisualLabs, &lt;em&gt;"Copilot cannot infer intent from SharePoint chaos"&lt;/em&gt; - AI amplifies what already exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Combine IQ layers for maximum impact. **Foundry IQ can incorporate Work IQ and Fabric IQ as data sources&lt;/strong&gt;, enabling custom agents to unify all three context dimensions through a single retrieval interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Govern and monitor rigorously.&lt;/strong&gt; Use the &lt;strong&gt;M365 admin center&lt;/strong&gt; for Work IQ tool governance and &lt;strong&gt;Azure RBAC&lt;/strong&gt; for Foundry IQ permissions. Use &lt;strong&gt;Microsoft Defender's Advanced Hunting&lt;/strong&gt; to audit all tool calls your agents make in production. For Foundry IQ, enforce permissions at query time by passing user tokens to filter results based on identity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Use the "Bring Your Own Model" capability strategically.&lt;/strong&gt; Copilot Studio supports connecting models from &lt;strong&gt;Microsoft Foundry's model catalog&lt;/strong&gt; (including GPT 4.5, Llama, DeepSeek, and &lt;strong&gt;11,000+ more models&lt;/strong&gt;) to specific prompt actions within an agent. This allows you to pick the best-performing model for each task - not just use a single model globally. Governance for these model connections is managed through &lt;strong&gt;Power Platform admin center policies&lt;/strong&gt; under the "Microsoft Foundry" connector.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case Summary
&lt;/h2&gt;

&lt;p&gt;The following table consolidates practical use cases across three domains, illustrating which IQ layers contribute and how:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Work IQ Contribution&lt;/th&gt;
&lt;th&gt;Fabric IQ Contribution&lt;/th&gt;
&lt;th&gt;Foundry IQ Contribution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customer Service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Intelligent support agent that resolves complex tickets&lt;/td&gt;
&lt;td&gt;Surfaces recent internal discussions about the issue (Teams chats, email threads)&lt;/td&gt;
&lt;td&gt;Correlates the issue with product batch data, defect rates, or regional patterns&lt;/td&gt;
&lt;td&gt;Retrieves official troubleshooting guides, product manuals, and policy documents with citations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Finance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automated financial analysis and variance reporting&lt;/td&gt;
&lt;td&gt;Identifies which finance team members have been discussing a variance and surfaces meeting action items&lt;/td&gt;
&lt;td&gt;Provides a governed semantic model of financial KPIs, ensuring consistent definitions (e.g., how "operating margin" is calculated)&lt;/td&gt;
&lt;td&gt;Grounds the agent in accounting policies, audit findings, and regulatory guidance documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supply chain delay advisor&lt;/td&gt;
&lt;td&gt;Detects that ops teams are overloaded with manual exception handling (long email chains, recurring meetings)&lt;/td&gt;
&lt;td&gt;Identifies anomalies in delivery metrics, correlates delays with upstream bottlenecks, and detects regional performance dips&lt;/td&gt;
&lt;td&gt;Retrieves supplier contracts, SLAs, and penalty clauses to determine contractual obligations and escalation thresholds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HR / Onboarding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;New employee onboarding assistant&lt;/td&gt;
&lt;td&gt;Understands who the new hire's team members are and what projects are active&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Retrieves onboarding guides, IT setup instructions, benefits documentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Regulatory compliance advisor&lt;/td&gt;
&lt;td&gt;Tracks which compliance officers have been communicating about a specific regulatory change&lt;/td&gt;
&lt;td&gt;Monitors regulatory metrics and flags anomalies against defined thresholds&lt;/td&gt;
&lt;td&gt;Retrieves the latest regulatory texts, internal policies, and past audit reports with citations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Addressing the Runtime and Governance Layer: Agent 365
&lt;/h2&gt;

&lt;p&gt;No discussion of enterprise agent deployment is complete without addressing &lt;strong&gt;who monitors these agents and who is accountable for them&lt;/strong&gt;. Microsoft's answer is &lt;strong&gt;Agent 365&lt;/strong&gt; - the runtime and governance layer that sits over all IQ workloads and agent interactions.&lt;/p&gt;

&lt;p&gt;Agent 365 monitors agent decisions, tracks accuracy over time, enforces compliance boundaries, and provides deep &lt;strong&gt;observability&lt;/strong&gt; into how agents behave in the real world. It gives visibility into what the agent is doing, why it is doing it, and whether it is staying within defined guardrails. This is not just logging - it is operational control: knowing when performance drifts, when policies change, and when human override is required. Without this layer, as one analysis notes, &lt;em&gt;"you have experiments - clever, promising, but fragile. With it, you have enterprise-grade systems that can scale responsibly."&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For organizations evaluating AI agent initiatives, the presence of Agent 365 as a centralized governance layer is a critical factor in the build-versus-buy decision. It combines extensibility, security, and compliance to help organizations confidently scale AI agents across productivity and business systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Turning Enterprise Context into Competitive Advantage
&lt;/h2&gt;

&lt;p&gt;Work IQ, Fabric IQ, and Foundry IQ represent a structural shift in how enterprises build AI agents. Rather than bolting generic AI onto existing workflows, these intelligence layers &lt;strong&gt;embed organizational understanding directly into the agent's reasoning process&lt;/strong&gt; - from user collaboration patterns (Work IQ) to governed business semantics (Fabric IQ) to secure, multi-source knowledge retrieval (Foundry IQ).&lt;/p&gt;

&lt;p&gt;For business decision-makers evaluating AI investments, the implication is clear: &lt;strong&gt;the value of enterprise AI is proportional to the quality and depth of context it can access&lt;/strong&gt;. Organizations that invest in structuring their work data (M365), defining business semantics (Fabric), and curating knowledge bases (Foundry) will extract far more value from AI agents than those relying on generic models. KPMG's survey found that among organizations scaling AI, the top ROI metrics are productivity (cited by 98% of leaders), profitability (97%), and improved performance and work quality (94%) - all outcomes that depend on context-rich, trustworthy AI rather than raw model capability.&lt;/p&gt;

&lt;p&gt;For developers, the three IQ layers provide &lt;strong&gt;a clear architecture and integration path&lt;/strong&gt; - whether through Copilot Studio's visual MCP tool integration or through the Agent 365 SDK's programmatic APIs. The key is to start with well-defined data foundations (clean SharePoint, governed Fabric models, curated knowledge sources) and progressively layer in IQ capabilities as agent scenarios mature.&lt;/p&gt;

&lt;p&gt;The organizations that will lead in the agentic era are those that recognize deployment is only the beginning - and that &lt;strong&gt;contextual intelligence, not model size, is the true differentiator&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
    </item>
    <item>
      <title>Connecting Azure Application Insights to Microsoft Copilot Studio: Unlocking Deep Analytics for Agentic Systems</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 18 Apr 2026 07:08:42 +0000</pubDate>
      <link>https://forem.com/holgerimbery/connecting-azure-application-insights-to-microsoft-copilot-studio-unlocking-deep-analytics-for-13m8</link>
      <guid>https://forem.com/holgerimbery/connecting-azure-application-insights-to-microsoft-copilot-studio-unlocking-deep-analytics-for-13m8</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Agentic systems demand visibility. By connecting Azure Application Insights to your Copilot Studio agents, you gain enterprise-grade monitoring that goes far beyond built-in analytics—enabling real-time diagnostics, performance optimization, and strategic business insights in a single integrated platform.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why read this&lt;/strong&gt;: You'll discover how to unlock faster issue detection and resolution, measure and improve user experience, demonstrate ROI to stakeholders, and establish the telemetry foundation that separates high-performing teams from those operating blindly. This guide walks through prerequisites, configuration steps, and best practices to help you implement a mature observability strategy immediately.&lt;/p&gt;

&lt;p&gt;As agentic systems grow in complexity and autonomy, visibility becomes critical. Analytics illuminate how agents interpret user intent, make decisions, and interact with external systems—transforming a "black box" into an understandable, debuggable, and continuously improving system. In production environments, telemetry reveals performance bottlenecks, catches errors before users notice them, and provides the evidence base for optimizing agent behavior and dialog flows. Without analytics, teams operate blindly; with it, they make data-driven decisions and build trust through transparency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of Connecting Azure Application Insights to Copilot Studio
&lt;/h2&gt;

&lt;p&gt;Connecting &lt;strong&gt;Azure Application Insights&lt;/strong&gt; to &lt;strong&gt;Microsoft Copilot Studio&lt;/strong&gt; agents significantly extends your monitoring, diagnostics, and analytics capabilities far beyond the native tooling provided by Copilot Studio alone. Application Insights, a powerful component of the broader &lt;strong&gt;Azure Monitor&lt;/strong&gt; platform, is a fully extensible Application Performance Management (APM) service designed to meet enterprise-scale requirements. This service captures granular message-level telemetry, topic trigger events, interaction latency measurements, custom domain-specific events, and comprehensive error details in near real-time, enabling immediate visibility into agent behavior. By establishing this integration, organizations gain access to both fine-grained technical observability, which empowers engineering teams to debug and optimize agent performance, and strategic usage intelligence, which informs business stakeholders about adoption patterns, user satisfaction trends, and operational efficiency metrics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4398sluzbf9vmm1a51wk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4398sluzbf9vmm1a51wk.png" alt="upgit_20260401_1775031491.png" width="800" height="952"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Benefits Overview
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Benefit Area&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Technical Capabilities&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Business / Strategic Value&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-Time Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Live telemetry stream of conversations; configurable Azure Monitor alerts for anomalies or thresholds&lt;/td&gt;
&lt;td&gt;Proactive issue detection minimizes downtime; enables swift scaling responses during usage spikes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance Optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Latency and performance data per interaction; Smart Detection flags unusual performance drops automatically&lt;/td&gt;
&lt;td&gt;Faster, more reliable agent increases user satisfaction; reduces abandonment from slow responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Diagnostics &amp;amp; Error Logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic capture of exceptions with full context (stack traces, conversation state, topic/step); custom telemetry events for domain-specific tracking&lt;/td&gt;
&lt;td&gt;Faster troubleshooting lowers support costs; higher reliability builds user trust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User Interaction Analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conversation counts, active users, channels, topics triggered, session durations — queryable via KQL&lt;/td&gt;
&lt;td&gt;Data-driven improvements to dialog design; evidence base for prioritizing development effort&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dashboards &amp;amp; Reporting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-built Copilot Studio Dashboard (Azure Workbook) with total conversations, latency, exceptions, tool usage, and topic analytics — editable and shareable&lt;/td&gt;
&lt;td&gt;Cross-functional visibility for technical and business stakeholders; supports ROI reporting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ecosystem Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connects to Power BI, Azure Data Lake, Azure Monitor alerts, and other Azure services&lt;/td&gt;
&lt;td&gt;Enterprise-grade reporting pipelines; cross-system correlation between bot telemetry and business outcomes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom Events &amp;amp; Extensibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Log a custom telemetry event" action in Copilot Studio for domain-specific tracking; KQL for arbitrary analysis&lt;/td&gt;
&lt;td&gt;Tailored KPI tracking (resolution rates, conversion events); structured A/B testing of agent configurations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How Telemetry Is Collected
&lt;/h2&gt;

&lt;p&gt;Setting up Application Insights for your Copilot Studio agents involves configuring where and how telemetry data flows. The process is straightforward but requires understanding key configuration options and prerequisites. Here's what you need to know to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Per-agent configuration&lt;/strong&gt;: There is no tenant-wide switch. Each agent must be connected to Application Insights individually.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Setup&lt;/strong&gt;: Add the Application Insights connection string in &lt;code&gt;Settings → Advanced → Application Insights&lt;/code&gt; within Copilot Studio.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Azure Subscription&lt;/strong&gt;: Required to use Application Insights.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Logging Options&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;Log activities&lt;/code&gt;: Logs all incoming/outgoing messages and events.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;Log sensitive Activity properties&lt;/code&gt;: Includes &lt;code&gt;userid&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, and &lt;code&gt;speak&lt;/code&gt;. Off by default due to privacy implications.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Metrics and Data Are Available
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Custom Dimensions
&lt;/h3&gt;

&lt;p&gt;Telemetry records include rich metadata in the &lt;code&gt;customDimensions&lt;/code&gt; field:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Sample Values&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Type of activity&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;message&lt;/code&gt;, &lt;code&gt;conversationUpdate&lt;/code&gt;, &lt;code&gt;event&lt;/code&gt;, &lt;code&gt;invoke&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;channelId&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Channel identifier&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;emulator&lt;/code&gt;, &lt;code&gt;directline&lt;/code&gt;, &lt;code&gt;msteams&lt;/code&gt;, &lt;code&gt;webchat&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fromId&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sender identifier&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;id&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fromName&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Username from client&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;John Bonham&lt;/code&gt;, &lt;code&gt;Keith Moon&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;locale&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Client origin locale&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;en-us&lt;/code&gt;, &lt;code&gt;zh-cn&lt;/code&gt;, &lt;code&gt;de-de&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;recipientId&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Recipient identifier&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;id&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;recipientName&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Recipient name&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;John Bonham&lt;/code&gt;, &lt;code&gt;Keith Moon&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;text&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Text in message&lt;/td&gt;
&lt;td&gt;&lt;code&gt;find a coffee shop&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;designMode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Whether conversation occurred in the test canvas&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;True&lt;/code&gt; / &lt;code&gt;False&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;{: .important }&lt;br&gt;
Note: Data quality varies by channel. For example, unique user counts are only reliable when users are authenticated.&lt;/p&gt;
&lt;h3&gt;
  
  
  Built-In Copilot Studio Dashboard
&lt;/h3&gt;

&lt;p&gt;Copilot Studio provides a pre-built Azure Workbook dashboard that offers immediate visibility into your agent's performance and usage patterns. This dashboard aggregates key metrics without requiring custom configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Location&lt;/strong&gt;: &lt;code&gt;Application Insights → Monitoring → Workbooks → Copilot Studio Dashboard&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Metrics Included&lt;/strong&gt;: Total conversations, latency, exceptions, tool usage, topic analytics&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Customizable&lt;/strong&gt;: Add tiles using KQL, save and share dashboards with team members (requires Reader role)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  KQL Querying
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;Kusto Query Language (KQL)&lt;/strong&gt; to analyze telemetry data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let queryStartDate = ago(14d);
let queryEndDate = now();
let groupByInterval = 1d;
customEvents
| where timestamp &amp;gt; queryStartDate
| where timestamp &amp;lt; queryEndDate
| summarize uc=dcount(user_Id) by bin(timestamp, groupByInterval)
| render timechart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To exclude test conversations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customEvents
| extend isDesignMode = customDimensions['designMode']
| where isDesignMode == "False"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Built-In Analytics vs. Application Insights
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Built-In Analytics&lt;/th&gt;
&lt;th&gt;Azure Application Insights&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Track topic usage and completion&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes (with custom events)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Understand user satisfaction&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes (if instrumented)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debug dialog transitions&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitor API latency or errors&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualize trends over time&lt;/td&gt;
&lt;td&gt;yes (limited)&lt;/td&gt;
&lt;td&gt;yes (custom dashboards)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Correlate with external systems&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerting and anomaly detection&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;{: .important }&lt;br&gt;
Application Insights complements — not replaces — built-in analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Benefits
&lt;/h2&gt;

&lt;p&gt;Application Insights delivers powerful technical capabilities that transform agent monitoring and diagnostics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Live Metrics&lt;/strong&gt;: Real-time monitoring of bot activity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Smart Detection&lt;/strong&gt;: Automatic anomaly and performance issue detection.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Custom Telemetry&lt;/strong&gt;: Log domain-specific events from within Copilot Studio.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Centralized Monitoring&lt;/strong&gt;: Consolidate logs, metrics, and traces across agents.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scalability&lt;/strong&gt;: Monitor bots across multiple environments and regions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Extensibility&lt;/strong&gt;: Integrate with Power BI, Azure Data Lake, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Business Value and Strategic Advantages
&lt;/h2&gt;

&lt;p&gt;Application Insights transforms agent monitoring from a purely technical exercise into a strategic business tool. By connecting telemetry data to measurable outcomes, organizations can demonstrate ROI, accelerate innovation cycles, and build confidence in agentic systems. The following sections explore the key business advantages:&lt;/p&gt;

&lt;h3&gt;
  
  
  Data-Driven Decision-Making
&lt;/h3&gt;

&lt;p&gt;Telemetry provides the foundation for evidence-based improvements across your agent ecosystem.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Use telemetry to understand user behavior, optimize dialog flows, and prioritize development.&lt;/li&gt;
&lt;li&gt;  Dashboards and reports provide evidence for product decisions and stakeholder communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Operational Efficiency
&lt;/h3&gt;

&lt;p&gt;Application Insights dramatically reduces the time and effort required to maintain reliable agents in production. By automating detection and providing detailed diagnostics, teams can respond faster to issues and prevent recurring problems from consuming resources.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Reduce mean time to detect and resolve issues.&lt;/li&gt;
&lt;li&gt;  Identify systemic issues and eliminate recurring failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Customer Satisfaction
&lt;/h3&gt;

&lt;p&gt;Application Insights enables you to measure and enhance user experience by providing visibility into agent responsiveness and identifying friction points in conversations. By understanding where users encounter delays or confusion, teams can make targeted improvements that directly impact satisfaction and retention.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Improve response times and reduce errors.&lt;/li&gt;
&lt;li&gt;  Analyze drop-offs and confusion points to refine UX.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Compliance and Auditing
&lt;/h3&gt;

&lt;p&gt;In regulated industries and enterprises subject to data governance requirements, comprehensive audit trails and compliance documentation are non-negotiable. Application Insights provides the forensic capabilities needed to demonstrate regulatory compliance, investigate incidents, and maintain defensible records of agent behavior and data handling.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Maintain detailed logs for audit trails.&lt;/li&gt;
&lt;li&gt;  Support regulatory requirements with timestamped, queryable data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;Implementing a robust telemetry strategy requires discipline and intentionality. The following best practices will help you maximize the value of Application Insights while minimizing operational overhead and ensuring data quality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log Meaningful Events&lt;/strong&gt;: When instrumenting your Copilot Studio agents, focus exclusively on capturing events that provide actionable intelligence. This includes clear indicators of user intent (such as topic invocations or explicit requests), documented dialog outcomes (successful resolutions, escalations, or abandonment), and comprehensive error context. By avoiding noise and focusing on the signal, you reduce data volume while improving analysis quality and reducing query latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Correlation IDs&lt;/strong&gt;: Implement correlation identifiers to link related activities across multiple services, dialogs, and organizational boundaries. This practice is essential in distributed systems where a user's interaction may involve multiple agents, backend APIs, and cloud services. Correlation IDs enable end-to-end tracing of requests, making it significantly easier to diagnose complex failures and understand latency across the entire interaction pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set Up Alerts&lt;/strong&gt;: Configure Azure Monitor alerts on critical performance thresholds and anomalies. Rather than waiting to discover problems through manual dashboard review, proactive alerting ensures that your team is immediately notified of concerning patterns—such as sudden spikes in error rates, performance degradation, or unexpected traffic patterns. This enables rapid response before issues escalate into user-facing problems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Review Dashboards Regularly&lt;/strong&gt;: Establish a cadence for reviewing Application Insights dashboards with relevant stakeholders—both technical teams who investigate issues and business stakeholders who track adoption metrics. Regular review sessions transform telemetry from a passive record into an active feedback loop that informs prioritization, guides feature development, and validates hypotheses about agent behavior and user satisfaction.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connect your Copilot Studio agent to Application Insights
&lt;/h2&gt;

&lt;p&gt;Establishing a connection between your Copilot Studio agent and Azure Application Insights requires careful configuration to ensure telemetry data flows correctly to your monitoring environment. This section provides comprehensive guidance on the setup process, prerequisites, and optional configuration settings that affect which data is captured and transmitted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites and Initial Setup
&lt;/h3&gt;

&lt;p&gt;Before you can establish a successful connection between your Copilot Studio agent and Application Insights, ensure that you have an active Azure subscription and an existing Application Insights resource. The connection process relies on authentication credentials stored in your agent's configuration so that you will need administrative access to both your Copilot Studio environment and your Azure resources.&lt;/p&gt;

&lt;p&gt;To initiate the connection process, navigate to the &lt;strong&gt;Settings&lt;/strong&gt; page for your agent within Copilot Studio. From the Settings page, locate and select the &lt;strong&gt;Advanced&lt;/strong&gt; tab. This tab contains configuration options not exposed in the standard settings interface and typically reserved for operational and monitoring settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring the Application Insights Connection String
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80rrub187s7syde2rker.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80rrub187s7syde2rker.png" alt="upgit_20260401_1775031412.png" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Within the Advanced settings tab, you will find a dedicated &lt;strong&gt;Application Insights&lt;/strong&gt; section. In this section, locate the &lt;strong&gt;Connection string&lt;/strong&gt; field and populate it with the connection string obtained from your Azure Application Insights resource.&lt;/p&gt;

&lt;p&gt;The connection string serves as the authentication and routing credential that enables your agent to transmit telemetry data to your specific Application Insights instance securely. Refer to the &lt;a href="https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview" rel="noopener noreferrer"&gt;Azure Monitor documentation&lt;/a&gt; for comprehensive instructions on locating and retrieving your connection string from your Application Insights resource in the Azure portal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optional Logging Configuration
&lt;/h3&gt;

&lt;p&gt;In addition to basic connectivity, Application Insights offers two optional configuration flags that let you control the scope and sensitivity of captured telemetry data. These settings provide flexibility to balance comprehensive monitoring with privacy and compliance considerations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log Activities&lt;/strong&gt;: When this setting is enabled, Application Insights captures comprehensive details of all incoming and outgoing messages exchanged between users and your agent, as well as all event notifications triggered during agent operation. This option provides maximum visibility into agent behavior and user interactions, enabling detailed diagnostics and comprehensive audit trails. However, enabling this option increases telemetry volume and may have cost implications for higher-traffic agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log Sensitive Activity Properties&lt;/strong&gt;: This setting governs whether certain data fields that may contain personally identifiable information (PII) or other sensitive information are included in logged telemetry. When enabled, the following properties are captured in logs: &lt;code&gt;userid&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, and &lt;code&gt;speak&lt;/code&gt; (note that the &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;speak&lt;/code&gt; properties apply exclusively to message-type activities and are not captured for other event types).&lt;/p&gt;

&lt;p&gt;By default, this setting is &lt;strong&gt;disabled&lt;/strong&gt; due to data privacy and compliance considerations. Organizations operating under strict data governance frameworks, healthcare regulations (HIPAA), financial regulations (PCI-DSS), or general privacy standards (GDPR) should exercise caution when enabling this option. If enabled, ensure that appropriate data retention policies, encryption measures, and access controls are in place to protect sensitive information logged to Application Insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Connecting your Copilot Studio agents to Azure Application Insights represents a fundamental shift in how you approach agent observability and operational excellence. By moving beyond built-in analytics, you gain enterprise-grade monitoring capabilities that illuminate every aspect of agent behavior—from granular message-level telemetry to high-level strategic insights about adoption and ROI.&lt;/p&gt;

&lt;p&gt;The integration delivers immediate practical benefits: faster issue detection and resolution, performance optimization grounded in real data, and comprehensive audit trails that satisfy regulatory requirements. It also turns your organization's relationship with agentic systems from speculative to evidence-driven, so you can scale confidently and keep improving.&lt;/p&gt;

&lt;p&gt;Whether your priority is reducing support costs, accelerating time-to-resolution, or demonstrating measurable business value to stakeholders, Application Insights provides the visibility and analytical power to achieve those goals. Start with the fundamentals—configure the connection string, enable appropriate logging, and establish a dashboard review cadence. As your telemetry practice matures, layer in custom events, automated alerts, and deeper correlation across your broader system ecosystem.&lt;/p&gt;

&lt;p&gt;Telemetry infrastructure pays off right away and grows over time as your teams build data-driven habits, and your agents become more reliable, responsive, and aligned with business goals.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>applicationinsights</category>
      <category>azure</category>
    </item>
    <item>
      <title>Building On‑Prem AI Agents with Azure Local, Foundry Local, and Microsoft Agent Framework</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 11 Apr 2026 07:26:18 +0000</pubDate>
      <link>https://forem.com/holgerimbery/building-on-prem-ai-agents-with-azure-local-foundry-local-and-microsoft-agent-framework-5a83</link>
      <guid>https://forem.com/holgerimbery/building-on-prem-ai-agents-with-azure-local-foundry-local-and-microsoft-agent-framework-5a83</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cloud-native architecture belongs on-premises too&lt;br&gt;&lt;br&gt;
&lt;strong&gt;On-premises and cloud-native are not contradictions — they are complementary&lt;/strong&gt;. While enterprises have spent years building cloud-native practices in the cloud, those same principles—containerization, orchestration, API-driven integration, and infrastructure-as-code - deliver even greater value when deployed on-premises. This guide shows you how to build production AI agents that (must) run locally and using cloud native deployment schemas with Azure Local, Foundry Local, and Microsoft Agent Framework - this is proving that cloud-native excellence is not constrained by your network boundary.&lt;br&gt;&lt;br&gt;
If you operate in regulated industries, manage constrained connectivity, or face data residency requirements, this architecture gives you the operational consistency of the cloud without leaving your premises.&lt;/p&gt;

&lt;p&gt;This article is the second in a series on Azure Local:&lt;br&gt;&lt;br&gt;
&lt;a href="https://holgerimbery.blog/azure-local-foundry-local-and-microsoft-365-local-a-comprehensive-guide-for-it-architects-and-decision-makers" rel="noopener noreferrer"&gt;Azure Local, Foundry Local, and Microsoft 365 Local: A Comprehensive Guide for IT Architects and Decision-Makers&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Enterprise teams are moving beyond “chatbots” toward agents that can retrieve internal knowledge, call tools, orchestrate workflows, and produce outcomes aligned to real business processes. The challenge is that many agent reference designs assume always‑on cloud connectivity and cloud-hosted inference. That assumption does not hold everywhere.&lt;/p&gt;

&lt;p&gt;In regulated industries, in plants and branches with constrained connectivity, or in environments where latency and data locality are non‑negotiable, the architecture has to follow the use case. This post describes a pragmatic design you can implement today by combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure Local&lt;/strong&gt; as the on‑prem infrastructure substrate, managed through &lt;strong&gt;Azure Arc&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AKS on Azure Local&lt;/strong&gt; as the standardized Kubernetes runtime for agent services and supporting components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundry Local (preview)&lt;/strong&gt; as the local inference runtime exposing an &lt;strong&gt;OpenAI‑compatible REST interface&lt;/strong&gt; for model calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Agent Framework (MAF)&lt;/strong&gt; as the agent and workflow layer, including tool integration, session/state management, middleware, and telemetry patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A critical insight: cloud-native architecture and practices are not limited to cloud deployments.&lt;/strong&gt; The principles—containerization, orchestration, infrastructure-as-code, API-driven integration, observability, and declarative state management—are equally valuable on‑premises. In fact, they become &lt;em&gt;more essential&lt;/em&gt; when your infrastructure cannot scale elastically or rely on the implicit redundancy of cloud regions. By applying cloud-native architecture to on‑prem agent deployments, you gain consistent operational models across locations, faster iteration, clear boundaries between layers, and the ability to treat infrastructure changes as routine rather than exceptional.&lt;/p&gt;

&lt;p&gt;One design choice drives everything that follows: &lt;strong&gt;separate the agent runtime from the model runtime.&lt;/strong&gt; You want the agent layer (routing, tools, workflows, state, observability) to evolve independently from inference, especially when local inference is in preview and can change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture in one picture (logical view)
&lt;/h2&gt;

&lt;p&gt;A practical baseline pattern is “local inference, centralized orchestration.”&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7s07rlie2kn629q44p3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7s07rlie2kn629q44p3.png" alt="upgit_20260409_1775761970.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This separation keeps your application surface stable by establishing a clear boundary between stateless agent logic and stateful model inference. Because the agent layer and model runtime are decoupled, you can update agent code, refine routing logic, add new tools, or adjust middleware without touching the inference layer. Tools can be added safely behind constrained proxies or API gateways, allowing you to apply fine-grained network controls and audit trails at the integration boundary. Governance policies, observability hooks, and logging patterns remain consistent across agent operations regardless of where inference is placed. Simultaneously, inference becomes a managed dependency that can scale, relocate, or upgrade independently of application code. This architectural separation is particularly valuable in regulated environments where model serving and application logic often require separate operational controls, hardware isolation, or audit commitments. By decoupling these layers, you achieve the flexibility to place inference close to hardware accelerators (GPUs, NPUs) and data sources without forcing agent code to depend on infrastructure choices that are still evolving, especially when the inference runtime is in preview status and subject to API changes or performance tuning.&lt;/p&gt;
&lt;h2&gt;
  
  
  Where this approach fits (and where it does not)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  This is a good fit when
&lt;/h3&gt;

&lt;p&gt;This pattern becomes the right choice when one or more of the following constraints are fundamental to your deployment environment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data residency and regulatory compliance&lt;/strong&gt; are hard boundaries. When regulations, industry standards, or organizational policy require that prompts, retrieved context, and inference results remain physically within an on‑premises boundary—whether for financial data, healthcare records, or proprietary intelligence—local inference becomes non‑negotiable. Cloud-based APIs, even with encryption and data-deletion assertions, may not satisfy audit requirements or legal obligations in certain jurisdictions. In these cases, the agent architecture must be designed to keep the full inference pipeline local while still benefiting from cloud-based observability and control planes, where appropriate, via segregated connections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency is a direct measure of operational usability.&lt;/strong&gt; In manufacturing plants, field service operations, retail branches, or other environments where agents serve human users on the shop floor or remote locations, response time is not a performance metric—it is a functional requirement that affects whether the agent is used at all. When users are waiting for a troubleshooting recommendation or a work instruction, a response that takes tens of seconds to traverse cloud networks is often abandoned. Local inference, combined with local agent orchestration, ensures that the slowest part of the response pipeline is your own internal network and compute capacity, not external connectivity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connectivity cannot be assumed to be always available and high-bandwidth.&lt;/strong&gt; Many operational environments have constrained connectivity: scheduled outbound traffic windows, rate-limited connections, air-gapped subnets, or intentional network fragmentation for security isolation. The agent needs to function usefully within these constraints rather than degrade into a pass-through to cloud APIs. Azure Local supports this by enabling local execution and local state, while Arc provides a control-plane integration path when connectivity is available, rather than requiring continuous connectivity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want cloud-native operational practices applied on‑premises.&lt;/strong&gt; This includes containerized deployments, Kubernetes orchestration for workload management, infrastructure-as-code for reproducibility, GitOps-driven delivery pipelines, policy enforcement at the runtime boundary, and standardized telemetry and logging. These practices are not exclusive to cloud deployments; they provide the same benefits on‑premises—clear separation of concerns, predictable deployments, and auditability—but require an infrastructure platform like Azure Local to realize them consistently.&lt;/p&gt;
&lt;h3&gt;
  
  
  When this pattern becomes problematic
&lt;/h3&gt;

&lt;p&gt;Reconsidering a local-first architecture is warranted in several practical scenarios. If your inference workload demands elastic horizontal scaling and you cannot predict peak capacity without overprovisioning on-premises infrastructure, then chasing elastic scale with local hardware becomes economically and operationally inefficient. Building auto-scaling logic that manages standby capacity across stateful models would contradict the efficiency argument for locality. Similarly, if your operational environment requires production-grade stability guarantees from the inference API layer with minimal risk of breaking changes between deployments, the current maturity of local inference runtimes (such as Foundry Local, which remains in preview) presents a material risk. Preview components introduce uncertainty regarding backward compatibility, performance-tuning recommendations, and troubleshooting depth, which may not align with production SLAs. Finally, if the problem you are solving is fundamentally deterministic—where steps follow a fixed sequence, validation rules are static, and branching logic is known in advance—a structured workflow orchestration tool or a conventional microservice often provides clearer observability, simpler debugging, and lower operational overhead than an agent. Not every problem with tools and state management requires agentic behavior; sometimes explicit choreography is both simpler and more reliable.&lt;/p&gt;

&lt;p&gt;These are the constraints that inform the "whether" decision. The next section moves to the "why Azure Local" specifically, grounded in use-case context rather than abstract on-premises philosophy.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Azure Local makes sense here (the use case drives locality)
&lt;/h2&gt;

&lt;p&gt;Azure Local is not the point of the architecture. It is the platform choice that becomes rational when the &lt;strong&gt;agent pattern has to follow the environment&lt;/strong&gt;: where data and tools live, what network rules allow, what latency targets are required, and what failure modes are acceptable.&lt;/p&gt;
&lt;h3&gt;
  
  
  1) The agent needs to live where the tools and data already are
&lt;/h3&gt;

&lt;p&gt;High‑value agents are typically tool-heavy, and the distribution of that tooling directly affects where the agent runtime should run. The model call itself—the inference step that generates a response — is only one part of the agent interaction. The larger portion of the interaction involves retrieving documents from internal repositories, querying operational databases, validating business rules and constraints, and writing outcomes back to systems of record. Each of these operations carries a latency cost, integration overhead, and often data governance implications.&lt;/p&gt;

&lt;p&gt;When the authoritative data sources and the systems that perform work live on‑premises—whether that is an ERP system, a manufacturing execution system, a document repository, or a service dispatch platform—moving the agent runtime closer to those systems becomes pragmatically necessary rather than architecturally optional. A remote agent calling back into on‑premises tools over the network incurs not only the latency of each call but also the operational complexity of maintaining secure, reliable network pathways between cloud and on‑premises infrastructure, managing retry logic for transient failures across that boundary, and reasoning about whether a failure in the agent's response came from the model inference or from a tool integration issue.&lt;/p&gt;

&lt;p&gt;Placing the agent runtime on‑premises reduces integration complexity by collapsing tool interactions into local function calls with minimal network hops. It also materially shrinks the trust boundary. Data that would otherwise traverse a cloud service boundary—even with encryption in transit and assertions of deletion—can remain inside the perimeter where it originated. Azure Local provides a consistent, repeatable substrate on which to host the runtime within the organizational boundary while still enabling cloud-native operational practices such as containerization, orchestration, and declarative configuration management that teams have come to expect.&lt;/p&gt;
&lt;h3&gt;
  
  
  2) Latency is a functional requirement, not a nice-to-have
&lt;/h3&gt;

&lt;p&gt;In operational scenarios, predictable response time is not an optimization target—it is a functional requirement embedded in the task itself. When a field technician, floor supervisor, or support worker invokes an agent for a troubleshooting recommendation or work instruction, they are typically performing a task that cannot proceed until they receive guidance. A response that arrives within seconds fits naturally into human workflow and decision-making; the user can act on it immediately and move forward. A response that takes tens of seconds—or worse, becomes non-deterministic depending on cloud API load—exceeds the mental context window of the task. Users abandon the agent, fall back to phone calls or manual lookups, or proceed without the agent's input entirely, making the agent operationally irrelevant regardless of its intelligence.&lt;/p&gt;

&lt;p&gt;The latency problem compounds when the agent's response is not a single inference call. A typical operational agent orchestrates multiple steps: retrieving context from a document repository, querying a database to validate prerequisites, calling an external service to fetch the current state, and then synthesizing a response. Each of these operations incurs round-trip time. When those dependencies live on-premises, and the agent runtime lives in a cloud region thousands of kilometers away, the baseline latency floor is determined by geography and internet backbone capacity, not by the execution speed of any individual component. You cannot optimize away the speed of light or your ISP's rate limiting. Placing the agent runtime and its tooling in the same local environment—where internal networks typically offer latency in the single-digit millisecond range—ensures that the slowest element becomes your own infrastructure capacity, which you can measure, predict, and scale. This transforms latency from an externality you absorb to a variable you control.&lt;/p&gt;

&lt;p&gt;Azure Local supports this placement strategy by providing AKS-hosted agent services and local model-serving infrastructure in a shared operational footprint. The inference engine, the agent orchestration layer, and the tool integrations all run in the same data center or facility where the authoritative systems live. This collapse of distance translates directly into a collapse of latency, which translates into usability in environments where response time affects task completion.&lt;/p&gt;
&lt;h3&gt;
  
  
  3) Connectivity constraints are a design input
&lt;/h3&gt;

&lt;p&gt;Many environments are not "cloud-connected" as cloud reference architectures assume. The assumption embedded in most cloud-native architecture guidance is that outbound connectivity is available, reliable, and incurs acceptable latency and throughput. In practice, many operational environments operate under very different constraints. Outbound traffic to public cloud endpoints may be restricted by security policy or rate-limited by egress gateways. Connectivity may be scheduled—available only during specific windows or subject to maintenance blackouts. In other cases, network segments may be deliberately disconnected by design: operations technology networks in manufacturing facilities, isolated domains in financial institutions, or intentionally air-gapped environments in highly regulated sectors all follow this pattern. Even when connectivity exists, it may be mediated by proxies, firewalls, or VPNs, adding latency and complexity to troubleshooting when the agent's inference or tool calls fail.&lt;/p&gt;

&lt;p&gt;Azure Local enables local execution of the agent runtime and inference engine regardless of whether upstream cloud connectivity is available. Simultaneously, it aligns with Azure's control-plane concepts and governance models via Azure Arc when connectivity is available. This dual capability means you can design and operate an agent system that functions reliably in disconnected or intermittently-connected scenarios without abandoning cloud-native operational practices. When connectivity is available, Arc can be used for centralized observability, policy enforcement, and update orchestration. When connectivity is unavailable, the local agent continues to function using local tools and data. This gives you an operational path that respects the actual constraints of your environment rather than forcing an architecture that assumes away those constraints or requires workarounds to compensate for them.&lt;/p&gt;
&lt;h3&gt;
  
  
  4) You can keep cloud-native operations without reinventing on‑prem deployment
&lt;/h3&gt;

&lt;p&gt;Teams generally want repeatable delivery, policy enforcement, and consistent observability. The conventional tension between on-premises deployments and cloud-native operations has historically forced a false choice: either accept the operational discipline and automation of cloud platforms at the cost of moving workloads outside your perimeter, or keep infrastructure on-premises and revert to manual configuration management, bespoke deployment scripts, and fragmented observability tooling.&lt;/p&gt;

&lt;p&gt;Azure Local plus AKS on Azure Local severs that coupling. Containerized deployments, GitOps-driven configuration management, Kubernetes namespaces, and declarative rollout strategies work identically whether your agent runtime is in a public cloud region or in your own data center. The infrastructure boundary becomes transparent to operational practices. Teams can maintain the same deployment pipelines, policy engines, and observability systems they have built for cloud workloads and apply them without modification to on-premises clusters. This continuity of tooling and process significantly reduces the operational friction that typically accompanies on-premises agent deployments. The "local" decision becomes a deployment location decision—a choice about where to run proven, familiar infrastructure patterns—rather than a return to bespoke server management, manual patching, and isolated monitoring infrastructure that would otherwise characterize traditional on-premises deployments.&lt;/p&gt;
&lt;h3&gt;
  
  
  5) Local inference forces you to manage capacity and hardware intentionally
&lt;/h3&gt;

&lt;p&gt;If inference is local, capacity planning and acceleration hardware become first-class concerns that demand explicit decision-making rather than outsourced abstraction. When inference runs in a public cloud service, capacity is nominally infinite—or at least, the perception of infinity is maintained through multi-tenancy and auto-scaling tiers that obscure the underlying hardware realities. Costs accumulate by token count and API call frequency, but the physical infrastructure remains opaque. The tradeoff is acceptable if your workload is occasional or bursty; the cost volatility is a known variable you can budget for.&lt;/p&gt;

&lt;p&gt;When inference runs locally, however, the hardware economics become tangible. A single GPU accelerator costs tens of thousands of dollars upfront, requires power and cooling infrastructure, and has a finite lifespan. Acquiring that hardware is no longer a usage-based charge smoothed into monthly billing; it is a capital expenditure that sits in your facility and has opportunity cost. This visibility forces intentional capacity planning: you must understand your typical inference load, peak throughput requirements, model sizes, and acceptable latency percentiles, and then purchase hardware that meets those requirements with some headroom for growth. You cannot simply add capacity by changing a tier or waiting for auto-scaling to provision more instances; you provision intentionally.&lt;/p&gt;

&lt;p&gt;Azure Local provides a platform to run and govern those resources, allowing you to isolate inference nodes, stage updates, and enforce change control without coupling the inference lifecycle to the agent code lifecycle. You can reserve specific nodes for specific models, apply resource quotas to prevent one workload from starving another, and manage hardware refreshes independently from application deployments. This separation of concerns means you can upgrade your inference engine or swap model versions without draining the entire cluster, and you can plan hardware replacement without triggering emergency application refactorings. The operational rigor this imposes is not a burden—it is an alignment of technical decision-making with the actual cost structure of your infrastructure.&lt;/p&gt;
&lt;h3&gt;
  
  
  6) The architecture stays incremental and reversible
&lt;/h3&gt;

&lt;p&gt;By separating agent runtime from model runtime, you establish a deployment boundary that allows you to make infrastructure decisions independently from application logic. This separation is critical in practice because it decouples two sources of change that typically move at different velocities: the agent orchestration layer (tools, workflows, routing, state management) tends to evolve rapidly as teams refine business logic and respond to operational feedback, while the inference runtime makes infrequent but high-impact decisions around model selection, hardware acceleration strategy, and inference node topology that are capital-intensive and difficult to reverse.&lt;/p&gt;

&lt;p&gt;Starting small means you can pilot with a single inference node running a small quantized model, then grow to multiple specialized nodes—some optimized for latency-sensitive operations, others for throughput—without requiring changes to the agent code itself. The agent layer continues to interact with inference through the same OpenAI-compatible API boundary, indifferent to whether a single GPU or a distributed cluster backs that endpoint. You can keep your agent API stable while swapping models; if a new quantization or a different model family becomes available, you can stage it on a secondary node and route traffic to validate behavior before completing the migration. You can change inference node placement by adjusting scheduling constraints or moving nodes between racks without triggering a redeploy of agent services. This mobility is not possible when agent code and inference are tightly coupled—for example, when inference decisions are embedded in application code or when the agent layer depends on model-specific features or tokenization strategies.&lt;/p&gt;

&lt;p&gt;Azure Local supports this incremental expansion by providing a consistent Kubernetes control plane and standard scheduling mechanisms that treat compute resources as fungible. Your initial pilot might span a single machine running AKS on Azure Local in a branch or regional office; as you validate the model and prove business value, you can expand to a small cluster in your primary data center. Each step remains operationally routine because you are not changing how workloads are deployed or managed—you are only changing the scale and distribution of resources. A pilot deployment and a production cluster follow the same GitOps patterns, use the same artifact promotion pipelines, and respond to the same observability signals, allowing you to graduate from proof-of-concept to production without a redesign of your delivery model or a learning curve on unfamiliar operational practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical decision test:&lt;/strong&gt; Azure Local tends to be the right call when most of these are true: the authoritative tools/data are on‑prem, prompts and retrieved context must remain local, latency is a requirement, connectivity is constrained, and you want cloud-native operations in the same footprint.&lt;/p&gt;

&lt;p&gt;With that context, we can move from "why" to "how".&lt;/p&gt;
&lt;h2&gt;
  
  
  Step‑by‑step implementation runbook
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Phase 0 — Define boundaries: agent vs workflow, and what "done" means
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Write the outcome in business terms.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Define success in measurable outcomes, not in model features. Examples include reduced downtime, faster triage, fewer escalations, shorter handling time, or improved compliance auditability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Classify steps as agent or workflow.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use an &lt;strong&gt;agent&lt;/strong&gt; for open-ended interpretation, conversational assistance, flexible tool use, and summarization.&lt;/li&gt;
&lt;li&gt;Use a &lt;strong&gt;workflow&lt;/strong&gt; for deterministic steps, routing, approvals, checkpoints, and auditable state transitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Produce a tool inventory and trust boundary map.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For each tool, define authentication, authorization, validation, allowed destinations, and audit requirements.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Teams often prototype by giving agents broad access "to move fast." That security debt becomes expensive later. Start with constrained proxies and allow-lists from day one.&lt;/p&gt;
&lt;h3&gt;
  
  
  Phase 1 — Platform baseline: Azure Local + Arc + AKS on Azure Local
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1.1 Establish baseline assumptions
&lt;/h4&gt;

&lt;p&gt;Decide upfront:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Topology: pilot node, datacenter cluster, or distributed sites.&lt;/li&gt;
&lt;li&gt;OS mix: Linux nodes, Windows nodes, or mixed.&lt;/li&gt;
&lt;li&gt;Acceleration: CPU only vs GPU/NPU inference nodes.&lt;/li&gt;
&lt;li&gt;Connectivity mode: connected, constrained, or partially disconnected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Constrained connectivity changes everything about artifact flow. Treat "how will nodes pull images and models?" as a first-class requirement (private registry, artifact promotion, caching).&lt;/p&gt;
&lt;h4&gt;
  
  
  1.2 Build a minimal AKS baseline (repeatable)
&lt;/h4&gt;

&lt;p&gt;Include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Namespaces for separation (&lt;code&gt;platform&lt;/code&gt;, &lt;code&gt;agents&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;observability&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Ingress and certificate strategy.&lt;/li&gt;
&lt;li&gt;Secrets management strategy.&lt;/li&gt;
&lt;li&gt;Logging/metrics pipeline.&lt;/li&gt;
&lt;li&gt;Network policies and egress controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example namespace baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restricted"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tools&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restricted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Without early namespace boundaries and baseline policies, your cluster becomes a collection of special cases that are hard to govern and hard to migrate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2 — GitOps delivery (recommended even for pilots)
&lt;/h3&gt;

&lt;h3&gt;
  
  
  2.1 Repository layout pattern
&lt;/h3&gt;

&lt;p&gt;A structure that scales:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;clusters/&amp;lt;cluster-name&amp;gt;/&lt;/code&gt; for cluster-specific overlays&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;platform/&lt;/code&gt; for shared add-ons (ingress, monitoring, policy)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workloads/agents/&lt;/code&gt; for agent services&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workloads/tools/&lt;/code&gt; for tool proxies and connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2.2 Kustomization pattern (example)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kustomize.toolkit.fluxcd.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./workloads/agents/overlays/prod&lt;/span&gt;
  &lt;span class="na"&gt;prune&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;sourceRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GitRepository&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;platform-repo&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; GitOps only reduces drift if "kubectl apply in production" is the exception with a documented break-glass process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3 — Foundational services: state, memory, and observability
&lt;/h3&gt;

&lt;p&gt;Make state explicit and intentional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conversation state&lt;/strong&gt; (threads, session context) belongs in agent stores designed for that purpose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business state&lt;/strong&gt; (work items, approvals, tickets) belongs in systems of record.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common supporting components on AKS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redis for caching and rate limiting&lt;/li&gt;
&lt;li&gt;PostgreSQL (or equivalent) for durable state&lt;/li&gt;
&lt;li&gt;A vector store if you implement local RAG&lt;/li&gt;
&lt;li&gt;OpenTelemetry collector for traces/metrics/logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Agent telemetry can explode. Define retention, sampling, and content redaction policies early. In regulated environments, you often cannot log raw prompts or retrieved text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4 — Install Foundry Local (preview) on inference nodes
&lt;/h3&gt;

&lt;p&gt;Treat Foundry Local as a managed runtime dependency.&lt;/p&gt;

&lt;h4&gt;
  
  
  4.1 Placement and isolation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Prefer dedicated inference nodes where possible.&lt;/li&gt;
&lt;li&gt;Place them where the acceleration hardware lives.&lt;/li&gt;
&lt;li&gt;Segment networking so AKS can reach them reliably while keeping exposure minimal.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4.2 Endpoint discovery (avoid hard-coded ports)
&lt;/h4&gt;

&lt;p&gt;Prefer one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Discovery service pattern:&lt;/strong&gt; publish the current base URL into a config store that your agent services read.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway pattern:&lt;/strong&gt; place a stable internal proxy in front of Foundry Local to normalize routing and policies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Hard-coded ports work in a lab and fail after reboots, upgrades, or runtime changes. Build discovery or stable routing into the design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 5 — Network, TLS, and identity between AKS and Foundry Local
&lt;/h3&gt;

&lt;h4&gt;
  
  
  5.1 Connectivity options
&lt;/h4&gt;

&lt;p&gt;Common choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct HTTPS from agent pods to Foundry node IP/DNS&lt;/li&gt;
&lt;li&gt;Internal L4/L7 proxy for stable routing and policy&lt;/li&gt;
&lt;li&gt;Service mesh for mTLS and telemetry (only if you already operate one)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5.2 TLS strategy
&lt;/h4&gt;

&lt;p&gt;Use your standard PKI approach, if possible, and ensure that clients validate certificates by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; "Works with curl -k" is a warning sign, not a milestone. Fix trust chains early so insecure shortcuts do not become permanent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 6 — Implement the inference adapter in your MAF service
&lt;/h3&gt;

&lt;p&gt;Design goal: agent code calls a model client abstraction, not a concrete endpoint.&lt;/p&gt;

&lt;h4&gt;
  
  
  6.1 Configuration pattern (ConfigMap + Secret)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-config&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;FOUNDRY_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://foundry-local.internal.example"&lt;/span&gt;
  &lt;span class="na"&gt;FOUNDRY_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local-chat-model"&lt;/span&gt;
  &lt;span class="na"&gt;INFERENCE_TIMEOUT_SECONDS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30"&lt;/span&gt;
  &lt;span class="na"&gt;INFERENCE_MAX_RETRIES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-secrets&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;span class="na"&gt;stringData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;FOUNDRY_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;placeholder-if-required-by-client"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deployment consuming it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;maf-agent-api&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;maf-agent-api&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;maf-agent-api&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.local/agents/maf-agent-api:1.0.0&lt;/span&gt;
        &lt;span class="na"&gt;envFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;configMapRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-config&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-secrets&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250m"&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2Gi"&lt;/span&gt;
        &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health/ready&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
          &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
          &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  6.2 Client policy (timeouts, retries, circuit breakers)
&lt;/h4&gt;

&lt;p&gt;Start with conservative defaults:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timeout: 20–60s depending on model/prompt size&lt;/li&gt;
&lt;li&gt;Retries: 1–2 for transient failures only&lt;/li&gt;
&lt;li&gt;Circuit breaker: open after repeated failures to prevent cascading latency&lt;/li&gt;
&lt;li&gt;Concurrency limits: protect inference nodes from overload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Without explicit backpressure, a single busy agent route can saturate inference and degrade every workload that shares the runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 7 — Tool integration with constrained proxies
&lt;/h3&gt;

&lt;p&gt;Do not give agents direct access to sensitive systems.&lt;/p&gt;

&lt;p&gt;Recommended approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy tool proxy services in a dedicated namespace.&lt;/li&gt;
&lt;li&gt;Restrict outbound connectivity to approved destinations only.&lt;/li&gt;
&lt;li&gt;Enforce authorization, validation, and allow-lists in the proxy.&lt;/li&gt;
&lt;li&gt;Log every invocation with correlation IDs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A default-deny egress policy concept:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tools-default-deny-egress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tools&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;policyTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Egress&lt;/span&gt;
  &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; If you skip network controls early, you will discover "mystery dependencies" later when tools call endpoints that were never approved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 8 — Observability: correlate agent → tools → inference
&lt;/h3&gt;

&lt;p&gt;Minimum requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correlation ID propagated across inbound request, tool calls, inference calls, and response&lt;/li&gt;
&lt;li&gt;Latency breakdown (tool time vs inference time vs orchestration time)&lt;/li&gt;
&lt;li&gt;Error classification by category (tool failure, inference failure, policy block, timeout)&lt;/li&gt;
&lt;li&gt;Token/prompt size metadata if available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Decide what is safe to log. For many environments, metadata and hashes are acceptable, but raw prompts and retrieved snippets are not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 9 — Hardening: safety, governance, regression testing
&lt;/h3&gt;

&lt;p&gt;Hardening checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt and tool regression tests for critical flows&lt;/li&gt;
&lt;li&gt;Golden conversations for validation after runtime updates&lt;/li&gt;
&lt;li&gt;Tool schemas and allow-lists are enforced centrally&lt;/li&gt;
&lt;li&gt;Timeouts on every external call&lt;/li&gt;
&lt;li&gt;Rate limits per user and per route&lt;/li&gt;
&lt;li&gt;Graceful degradation when inference is unavailable (fallback to workflow/human)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Preview inference runtimes can introduce behavior changes that are not "errors" but still break user expectations. Without regression tests, you will find out in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 10 — Operations: versioning, rollouts, and capacity planning
&lt;/h3&gt;

&lt;h4&gt;
  
  
  10.1 Independent update cadencess
&lt;/h4&gt;

&lt;p&gt;Operate on separate cadences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent services: frequent updates via CI/CD&lt;/li&gt;
&lt;li&gt;Inference runtime: cautious updates via staged rollout&lt;/li&gt;
&lt;li&gt;Cluster/platform: regular maintenance windows&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  10.2 Rollout strategy
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Canary agent changes with a small traffic slice and compares latency/error rates&lt;/li&gt;
&lt;li&gt;Pin inference runtime versions and validate with representative load before expanding rollout&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  10.3 Capacity planning
&lt;/h4&gt;

&lt;p&gt;Define explicit SLOs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p95 latency target for a representative prompt&lt;/li&gt;
&lt;li&gt;maximum concurrent sessions per inference node&lt;/li&gt;
&lt;li&gt;acceptable queueing delay under peak load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Size for peaks and recovery scenarios. A thundering herd is common when shifts start, sites reconnect, or batch processes trigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical “day‑1 to day‑30” plan
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Day 1–3: Foundation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Define business outcomes and agent/workflow boundaries&lt;/li&gt;
&lt;li&gt;Stand up AKS baseline namespaces, ingress, and GitOps scaffolding&lt;/li&gt;
&lt;li&gt;Deploy telemetry pipeline and basic dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 4–10: Inference integration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install Foundry Local on inference nodes&lt;/li&gt;
&lt;li&gt;Implement endpoint discovery and TLS trust&lt;/li&gt;
&lt;li&gt;Add inference adapter in the MAF service with externalized configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 11–20: Tools and data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Build constrained tool proxies with allow-lists and audit logs&lt;/li&gt;
&lt;li&gt;Implement retrieval paths that keep data inside the boundary&lt;/li&gt;
&lt;li&gt;Add correlation IDs end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 21–30: Hardening and operations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add regression tests and golden conversations&lt;/li&gt;
&lt;li&gt;Implement rollouts, version pinning, and canary strategy&lt;/li&gt;
&lt;li&gt;Load test and finalize a capacity plan and operational runbooks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This stack is not about "on‑prem versus cloud." It is about aligning the agent pattern with the constraints imposed by the use case: data locality, tool proximity, latency targets, and network realities. Azure Local provides a consistent on‑prem platform for that pattern; AKS keeps operations cloud-native; Foundry Local enables local inference; and Agent Framework provides the application layer to build agents and workflows that map to real business outcomes. By following this architecture and implementation runbook, you can deliver production‑grade AI agents that run locally, proving that cloud-native excellence is not constrained by your network boundary.&lt;/p&gt;

</description>
      <category>azurelocal</category>
      <category>foundrylocal</category>
      <category>agents</category>
    </item>
    <item>
      <title>Testing Copilot Agents: When to Use Agent Evaluation vs. the Copilot Studio Kit</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sun, 05 Apr 2026 07:29:14 +0000</pubDate>
      <link>https://forem.com/holgerimbery/testing-copilot-agents-when-to-use-agent-evaluation-vs-the-copilot-studio-kit-4f3e</link>
      <guid>https://forem.com/holgerimbery/testing-copilot-agents-when-to-use-agent-evaluation-vs-the-copilot-studio-kit-4f3e</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Microsoft's Agent Evaluation GA announcement on March 31, 2026, update to &lt;a href="https://holgerimbery.blog/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview" rel="noopener noreferrer"&gt;Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt; &lt;br&gt;
Agent Evaluation and the Copilot Studio Kit are not competing tools—they represent a layered quality-assurance strategy. Agent Evaluation provides fast, AI-assisted behavioral validation embedded directly in Copilot Studio, ideal for iteration and rapid feedback. The Copilot Studio Kit delivers deterministic, enterprise-grade verification for production gates, compliance, and governance. This article breaks down what each tool does, when to use them, and how to adopt both as your agent quality matures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why read this&lt;/strong&gt;&lt;br&gt;
If you're building or scaling Copilot agents in your organization, you need clarity on testing strategy. This article cuts through the positioning and provides a practical decision framework for when to reach for Agent Evaluation versus the Copilot Studio Kit, with real-world scenarios showing how mature teams layer both tools across their development lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Microsoft shipped with Agent Evaluation (GA)
&lt;/h2&gt;

&lt;p&gt;On March 31, 2026, Microsoft announced the general availability of Agent Evaluation, marking a significant milestone in Copilot Studio's testing and validation capabilities. Agent Evaluation is now generally available and built directly into Copilot Studio. Its goal is to make agent quality visible, repeatable, and scalable without requiring external tools or setup. This GA release represents the culmination of Microsoft's efforts to democratize agent quality assurance, bringing evaluation capabilities previously limited to advanced setups directly into the hands of everyday agent makers in the Copilot Studio authoring environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core characteristics (as of 31. March 2026)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Integrated directly into the Copilot Studio authoring experience&lt;/strong&gt;&lt;br&gt;
Agent Evaluation is not a separate tool or external service. It lives within the Copilot Studio interface, where agents are built, allowing makers to validate their agents without context-switching or complex integrations. This tight integration reduces friction and encourages frequent validation during development cycles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designed to answer the production question:&lt;/strong&gt;&lt;br&gt;
"Can we trust this agent to behave correctly, consistently, and safely?"&lt;br&gt;
This core question drives the entire design philosophy. Agent Evaluation focuses on behavioral confidence—whether the agent produces appropriate, consistent, and safe responses across diverse scenarios and user inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replaces unscalable manual testing and spot‑checking&lt;/strong&gt;&lt;br&gt;
Before Agent Evaluation, agent validation relied heavily on manual testing: individually testing scenarios, reviewing responses, and hoping coverage was adequate. This approach doesn't scale with agent complexity or usage volume. Agent Evaluation automates and scales this process through AI-assisted evaluation and reusable test sets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intended to be used before launch and continuously after changes&lt;/strong&gt;&lt;br&gt;
Agent Evaluation is not a one-time gate. It's designed for continuous validation: before initial launch, before deploying updates, and continuously as conversations flow through production. This shift from ceremonial testing to continuous validation aligns with modern DevOps practices.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluation capabilities
&lt;/h3&gt;

&lt;p&gt;Agent Evaluation allows makers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Create evaluation sets from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manually added questions&lt;/li&gt;
&lt;li&gt;Imported test sets&lt;/li&gt;
&lt;li&gt;AI‑generated queries derived from agent metadata and knowledge sources &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Choose flexible evaluation methods, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact/partial match&lt;/li&gt;
&lt;li&gt;Semantic similarity&lt;/li&gt;
&lt;li&gt;Intent recognition&lt;/li&gt;
&lt;li&gt;Relevance and completeness scoring &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Mix AI‑generated and human‑defined scenarios to balance breadth and depth&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Reuse evaluations over time and run them via APIs for lifecycle testing&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key framing&lt;/strong&gt;:&lt;br&gt;
Agent Evaluation is positioned as a lightweight, AI‑assisted validation layer that fits naturally into everyday agent authoring and iteration. Unlike heavy external testing frameworks that require context switching and specialized infrastructure, Agent Evaluation operates within Copilot Studio itself, where agents are built. This embedded approach acknowledges that agent makers are iterating rapidly, testing comprehensively at each step, and need validation feedback within their authoring flow rather than as a post-production bottleneck. The AI-assisted scoring means makers don't need to hand-write every test case or define complex rubrics upfront; they can generate relevant test scenarios from their agent's own knowledge sources and metadata, then refine them. This makes evaluation accessible to makers of all skill levels and scales with agent complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Copilot Studio Kit provides for testing
&lt;/h2&gt;

&lt;p&gt;The Copilot Studio Kit (Power CAT) is a separate, solution‑based toolkit that augments Copilot Studio with enterprise‑grade testing, governance, and analytics. Developed by the Microsoft Power CAT (Patterns and Practices) team, the Kit represents a mature, production-ready framework built for organizations requiring rigorous quality assurance, regulatory compliance, and scalable CI/CD integration. While Agent Evaluation addresses everyday iteration and behavioral confidence within the authoring canvas, the Copilot Studio Kit provides the structural backbone for organizations that need deterministic verification, audit trails, multi-layer testing orchestration, and governance enforcement across large deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Explicit testing capabilities
&lt;/h3&gt;

&lt;p&gt;The Kit supports structured, deterministic, and multi‑layer testing, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response Match (exact or conditional text comparison)&lt;/li&gt;
&lt;li&gt;Attachment Match (Adaptive Cards/files)&lt;/li&gt;
&lt;li&gt;Topic Match (requires Dataverse enrichment)&lt;/li&gt;
&lt;li&gt;Generative Answer evaluation using AI Builder and rubrics&lt;/li&gt;
&lt;li&gt;Multi‑turn tests running in a shared conversation context&lt;/li&gt;
&lt;li&gt;Plan Validation for generative orchestration (verifying which tools/actions are invoked, not just what the agent says)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Execution and automation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tests are executed via Copilot Studio APIs (Direct Line)&lt;/li&gt;
&lt;li&gt;Bulk creation and maintenance via Excel import/export&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Detailed run‑level telemetry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pass/fail&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Observed responses&lt;/li&gt;
&lt;li&gt;Aggregated metrics&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Results can be enriched with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Application Insights&lt;/li&gt;
&lt;li&gt;Dataverse conversation transcripts&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enterprise extensions beyond testing
&lt;/h3&gt;

&lt;p&gt;The Kit also includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conversation KPIs for Power BI&lt;/li&gt;
&lt;li&gt;Prompt Advisor&lt;/li&gt;
&lt;li&gt;Agent Inventory&lt;/li&gt;
&lt;li&gt;Agent Review Tool&lt;/li&gt;
&lt;li&gt;Compliance Hub with policy enforcement and SLA‑driven reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key framing&lt;/strong&gt;:&lt;br&gt;
The Copilot Studio Kit is built for verification, regression testing, production gates, and governance at scale. Unlike Agent Evaluation's lightweight, AI-assisted approach that lives within the authoring canvas, the Kit functions as an enterprise testing backbone designed for organizations that require deterministic verification, full audit trails, and regulatory compliance enforcement. It bridges the gap between development-time validation and production-readiness, enabling structured quality gates that align with enterprise DevOps pipelines. The Kit's emphasis on exact response matching, topic validation, and orchestration plan verification makes it essential for mission-critical deployments where agent behavior must be predictable, traceable, and compliant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Direct comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Agent Evaluation (GA)&lt;/th&gt;
&lt;th&gt;Copilot Studio Kit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Where it lives&lt;/td&gt;
&lt;td&gt;Built into Copilot Studio UI&lt;/td&gt;
&lt;td&gt;Separate Power CAT solution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary purpose&lt;/td&gt;
&lt;td&gt;Behavioral validation&lt;/td&gt;
&lt;td&gt;Functional verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup effort&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Higher (Dataverse, AI Builder, App Insights optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test creation&lt;/td&gt;
&lt;td&gt;Manual, import, AI‑generated&lt;/td&gt;
&lt;td&gt;Manual + Excel bulk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI‑assisted scoring&lt;/td&gt;
&lt;td&gt;Yes (core feature)&lt;/td&gt;
&lt;td&gt;Yes (Generative Answers via AI Builder)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic checks&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Strong (exact match, topic, attachments)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi‑turn scenarios&lt;/td&gt;
&lt;td&gt;Not explicitly documented&lt;/td&gt;
&lt;td&gt;Explicitly supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration plan validation&lt;/td&gt;
&lt;td&gt;Not documented&lt;/td&gt;
&lt;td&gt;Explicitly supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD &amp;amp; quality gates&lt;/td&gt;
&lt;td&gt;Implicit / API‑based&lt;/td&gt;
&lt;td&gt;Explicit pipeline integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance &amp;amp; compliance&lt;/td&gt;
&lt;td&gt;Not in scope&lt;/td&gt;
&lt;td&gt;First‑class feature&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How they relate (this is the key insight)
&lt;/h2&gt;

&lt;p&gt;Microsoft is not replacing the Copilot Studio Kit with Agent Evaluation.&lt;br&gt;
Instead, the sources show a clear layering strategy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Agent Evaluation&lt;br&gt;
→ Fast, AI‑assisted, in‑product validation&lt;br&gt;
→ Ideal for early feedback, iteration, and continuous confidence&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copilot Studio Kit&lt;br&gt;
→ Deep, deterministic, automatable verification&lt;br&gt;
→ Ideal for release gates, regression testing, orchestration correctness, and governance&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This positioning is also explicitly reflected in community and Microsoft guidance that frames Agent Evaluation as filling the gap that manual testing cannot scale, while the Kit remains the system‑level quality backbone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical takeaway for enterprise teams
&lt;/h2&gt;

&lt;p&gt;Based on what is explicitly documented:&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use each tool
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Agent Evaluation&lt;/strong&gt; is best suited for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rapid iteration cycles during agent development&lt;/li&gt;
&lt;li&gt;Early-stage quality validation before formal review&lt;/li&gt;
&lt;li&gt;Continuous behavioral checks without infrastructure complexity&lt;/li&gt;
&lt;li&gt;Scenarios where AI-assisted, semantic evaluation is sufficient&lt;/li&gt;
&lt;li&gt;Teams prioritizing speed of feedback over deterministic guarantees&lt;/li&gt;
&lt;li&gt;Questions like: "Is this agent generally behaving well after my last change?"
→ Use Agent Evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; is best suited for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production release gates and formal deployment approval&lt;/li&gt;
&lt;li&gt;Regression testing before pushing updates to production&lt;/li&gt;
&lt;li&gt;Regulatory and compliance-driven scenarios requiring audit trails&lt;/li&gt;
&lt;li&gt;Mission-critical agents where deterministic verification is mandatory&lt;/li&gt;
&lt;li&gt;Complex orchestration scenarios requiring plan and tool invocation validation&lt;/li&gt;
&lt;li&gt;Multi-turn conversations that need end-to-end correctness&lt;/li&gt;
&lt;li&gt;Questions like: "Did we break anything? Are the topics correct? Are the tools invoked? Can this ship?"
→ Use the Copilot Studio Kit.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How they complement each other
&lt;/h3&gt;

&lt;p&gt;In mature setups, the tools are complementary, not competitive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Development phase&lt;/strong&gt;: Agent Evaluation provides fast feedback loops for iteration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-production phase&lt;/strong&gt;: Copilot Studio Kit enforces deterministic verification gates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production phase&lt;/strong&gt;: Both tools support continuous monitoring—Agent Evaluation for behavioral trends, the Kit for functional regression detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance phase&lt;/strong&gt;: The Kit's compliance and KPI tracking provide the enterprise audit trail and policy enforcement layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Organizations scaling from single-agent projects to enterprise deployments should expect to adopt both tools at different maturity stages, using them in sequence rather than as either/or choices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg53qyk5yjjz216z1x0fq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg53qyk5yjjz216z1x0fq.png" alt="upgit_20260404_1775315507.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Agent Evaluation and the Copilot Studio Kit represent Microsoft's thoughtful answer to the agent testing maturity curve. As organizations build, iterate, and scale agents from proof-of-concept to mission-critical systems, both tools play essential roles at different stages of the lifecycle.&lt;/p&gt;

&lt;p&gt;Agent Evaluation brings quality validation into the authoring experience, reducing friction in everyday iteration and making behavioral confidence accessible to all agent makers. Its AI-assisted approach acknowledges the reality of rapid development cycles and the need for fast feedback loops.&lt;/p&gt;

&lt;p&gt;The Copilot Studio Kit, by contrast, provides the deterministic backbone that enterprises require—exact verification, governance enforcement, regulatory compliance, and the audit trails necessary for mission-critical deployments.&lt;/p&gt;

&lt;p&gt;The key insight is that these tools are not competitors but complementary. Teams should adopt them in sequence, starting with Agent Evaluation during development for rapid iteration, then layering in the Copilot Studio Kit as the agent approaches production. Organizations serious about agent quality at scale will ultimately adopt both, using them to build confidence at every stage from ideation to production and beyond.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>agents</category>
      <category>agentevaluation</category>
    </item>
    <item>
      <title>Azure Local, Foundry Local, and Microsoft 365 Local: A Comprehensive Guide for IT Architects and Decision-Makers</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 04 Apr 2026 07:56:37 +0000</pubDate>
      <link>https://forem.com/holgerimbery/azure-local-foundry-local-and-microsoft-365-local-a-comprehensive-guide-for-it-architects-and-261p</link>
      <guid>https://forem.com/holgerimbery/azure-local-foundry-local-and-microsoft-365-local-a-comprehensive-guide-for-it-architects-and-261p</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cloud Capabilities Without Leaving Premises&lt;br&gt;&lt;br&gt;
As regulatory demands tighten, latency requirements become critical, and data sovereignty moves from a nice-to-have to a must-have, Microsoft has engineered a comprehensive answer: &lt;strong&gt;Sovereign Private Cloud&lt;/strong&gt;. This three-pillar platform—Azure Local infrastructure, Microsoft 365 Local productivity, and Foundry Local AI—enables organizations to operate complete, intelligent cloud systems entirely within their boundaries. Whether you're managing classified government systems, running millisecond-critical manufacturing operations, sustaining teams in air-gapped locations, or processing sensitive AI workloads behind regulatory firewalls, this guide walks you through architectures, deployment strategies, and real-world patterns for implementing on-premises cloud at enterprise scale.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reasons to Read This Article&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complete Platform Understanding&lt;/strong&gt;: Grasp all three components of this Sovereign Private Cloud approach, how they integrate, and which combination matches your operational model (connected, intermittently connected, or fully offline).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Confidence&lt;/strong&gt;: Learn the hardware requirements, licensing models, connectivity tolerances, and planning phases required to deploy Azure Local (hyperconverged or disconnected), Microsoft 365 Local, and Foundry Local in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Case Alignment&lt;/strong&gt;: Identify whether your organization fits one of the key scenarios—government/defense data sovereignty, manufacturing low-latency control, retail edge compute, isolated locations, or confidential AI—with architectural patterns and reference implementations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic AI on Premises&lt;/strong&gt;: Discover how to build multi-agent AI systems using Microsoft Agent Framework + Foundry Local + Azure Local infrastructure, enabling autonomous reasoning and automation with zero cloud dependency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk Mitigation and Best Practices&lt;/strong&gt;: Understand connectivity tolerance, failover strategies, backup approaches, and testing protocols to ensure your on-premises cloud operates reliably and compliantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation Path&lt;/strong&gt;: Explore trial options (60-day Azure Local eval, free Foundry Local, partner-delivered M365 Local pilots) tailored to your budget and risk profile.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Azure Local is Microsoft's distributed infrastructure solution — formerly known as Azure Stack HCI — that extends Azure capabilities to customer-owned environments. It enables local deployment of both modern and legacy applications across distributed or sovereign locations, using Azure Arc as the unifying control plane. Azure Local is the foundation of Microsoft's Sovereign Private Cloud offering, which unifies three components: Azure Local (infrastructure), Microsoft 365 Local (productivity), and Foundry Local (AI inference) to deliver a full-stack private cloud that operates at any connectivity level — connected, intermittently connected, or fully disconnected. This article provides a comprehensive overview of these offerings, their use cases, deployment options, and best practices for IT architects and decision-makers considering on-premises Azure solutions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3wml57a7s32fj9o7ngh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3wml57a7s32fj9o7ngh.png" alt="upgit_20260331_1774988563.png" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Azure Local: Hyperconverged and Disconnected
&lt;/h2&gt;

&lt;p&gt;Azure Local addresses a set of business requirements for which the public cloud alone is insufficient: compute that must remain on-premises, mission-critical application resiliency, low-latency decision-making, and specific compliance mandates. Microsoft positions it as part of the adaptive cloud approach — bringing the cloud to the customer so they can build and innovate anywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hyperconverged Deployments (Connected Mode)
&lt;/h3&gt;

&lt;p&gt;A hyperconverged deployment of Azure Local consists of one machine or a cluster of machines connected to Azure. Clusters support 1 to 16 physical machines with hyperconverged storage (up to 8 machines in rack-aware configurations). The architecture is built on proven technologies: Hyper-V, Storage Spaces Direct, and Failover Clustering.&lt;br&gt;
In connected mode, the Azure cloud serves as the management plane. Administrators use the Azure Portal, Azure CLI, or PowerShell to view, monitor, and manage individual Azure Local instances or an entire fleet. Azure Local includes a secure-by-default configuration with more than 300 security settings, providing a consistent security baseline and a drift-control mechanism.&lt;br&gt;
Connectivity tolerance: If internet connectivity is lost, all host infrastructure and existing VMs continue to run normally. However, features that directly rely on cloud services become unavailable. Azure Local must successfully sync with Azure at least once every 30 consecutive days. If that window is exceeded, the cluster enters a reduced-functionality mode — existing VMs continue running, but new VMs cannot be created until connectivity is restored.&lt;/p&gt;
&lt;h3&gt;
  
  
  Disconnected Operations (Full Offline Mode)
&lt;/h3&gt;

&lt;p&gt;For environments where any cloud connectivity is undesired or impossible, disconnected operations bring the entire Azure control plane on-premises. Organizations can deploy and manage Azure Local instances, build VMs, and run containerized applications using select Azure Arc-enabled services from a local control plane that provides a familiar Azure Portal and Azure CLI experience — all without a connection to the Azure public cloud.&lt;br&gt;
Key constraint: Disconnected mode requires extra capacity for a dedicated management cluster to host the local control plane appliance.&lt;/p&gt;

&lt;p&gt;This management cluster has the following minimum hardware requirements:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;th&gt;Minimum Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Number of nodes&lt;/td&gt;
&lt;td&gt;3 nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory per node&lt;/td&gt;
&lt;td&gt;96 GB (appliance alone needs ≥64 GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cores per node&lt;/td&gt;
&lt;td&gt;24 physical cores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage per node&lt;/td&gt;
&lt;td&gt;2 TB SSD/NVMe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boot disk drive storage&lt;/td&gt;
&lt;td&gt;960 GB SSD/NVMe&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Disconnected operations are intended for organizations that cannot connect to Azure due to connectivity issues or regulatory restrictions. To procure this capability, a valid business justification and a Microsoft Customer Agreement for Enterprises (MCA-E) (or other eligible agreement type) are required.&lt;/p&gt;
&lt;h3&gt;
  
  
  Connected vs. Disconnected: Decision Framework
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision Factor&lt;/th&gt;
&lt;th&gt;Connected (Hyperconverged)&lt;/th&gt;
&lt;th&gt;Disconnected&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud dependency&lt;/td&gt;
&lt;td&gt;Requires outbound HTTPS to Azure ≥ once per 30 days&lt;/td&gt;
&lt;td&gt;Zero cloud dependency; local control plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Management plane&lt;/td&gt;
&lt;td&gt;Azure public cloud (Azure Portal, Arc)&lt;/td&gt;
&lt;td&gt;On-premises Azure Portal and CLI replica&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware overhead&lt;/td&gt;
&lt;td&gt;Workload cluster only (1–16 nodes)&lt;/td&gt;
&lt;td&gt;Workload cluster + dedicated 3-node management cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eligibility&lt;/td&gt;
&lt;td&gt;Any Azure subscription&lt;/td&gt;
&lt;td&gt;Requires MCA-E and business justification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Hybrid scenarios, branch offices, edge with periodic connectivity&lt;/td&gt;
&lt;td&gt;Air-gapped facilities, classified environments, remote sites without Internet&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Typical Use Cases
&lt;/h2&gt;

&lt;p&gt;Azure Local, Foundry Local, and Microsoft 365 Local serve scenarios in which the traditional public cloud alone cannot meet operational, regulatory, or latency requirements. The following use cases emerge from Microsoft's documentation and partner ecosystem:&lt;/p&gt;
&lt;h3&gt;
  
  
  Government and Defense (Data Sovereignty)
&lt;/h3&gt;

&lt;p&gt;Organizations in government, defense, and intelligence sectors require that data, operations, and control remain within organizational boundaries. Azure Local enables sovereign private clouds in which all workloads run locally. Microsoft 365 Local adds core collaboration tools — Exchange Server, SharePoint Server, and Skype for Business Server — that run entirely within the customer's sovereign operational boundary, keeping teams productive even when disconnected from the cloud. In disconnected mode, data residency and sovereign requirements are met without relying solely on public sovereign cloud controls. &lt;/p&gt;
&lt;h3&gt;
  
  
  Manufacturing and Industrial Operations (Low Latency &amp;amp; Reliability)
&lt;/h3&gt;

&lt;p&gt;Azure Local targets control systems and near real-time operations with extreme latency requirements — manufacturing execution systems, industrial quality assurance, and production line operations that must continue through network outages. On-premises compute clusters enable decisions in milliseconds without cloud round-trip delays. Azure Local's integration with Azure IoT Operations (deployed on AKS clusters enabled by Azure Arc on Azure Local) provides a turnkey approach for managing and processing IoT data at the edge. &lt;/p&gt;
&lt;h3&gt;
  
  
  Retail and Branch Offices (Edge Compute)
&lt;/h3&gt;

&lt;p&gt;Azure Local supports single-machine deployments through full clusters, making it suitable for distributed retail or branch scenarios where local AI inference at the source is needed — for example, self-checkout systems and loss-prevention applications in retail stores. The hyperconverged design ensures that even if WAN connectivity to central services drops, local operations continue uninterrupted. &lt;/p&gt;
&lt;h3&gt;
  
  
  Remote and Isolated Locations
&lt;/h3&gt;

&lt;p&gt;Industries operating in areas with limited network infrastructure — oil rigs, mining sites, rural clinics, and vessels at sea — benefit from operating in disconnected environments. Azure Local lets them use Azure Arc services and run workloads without relying on internet connectivity. Foundry Local extends this by enabling on-device inference of AI models in offline or bandwidth-constrained environments. &lt;/p&gt;
&lt;h3&gt;
  
  
  Confidential AI and Data Processing
&lt;/h3&gt;

&lt;p&gt;Organizations that need to run AI on sensitive data without exposing it to third-party clouds can combine Azure Local with Foundry Local. This enables local AI inferencing, where data is processed at the source. Foundry Local supports chat completions (text generation) and audio transcription (speech-to-text) through a single runtime that runs entirely on-device, with no cloud dependency for inference. Foundry Local now supports large multimodal models on Azure Local infrastructure, using the latest GPUs from partners like NVIDIA, so you can run advanced AI inference in sovereign environments.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Is Available
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Azure Local Core Infrastructure
&lt;/h3&gt;

&lt;p&gt;Azure Local is a full-stack infrastructure software running on validated hardware in customer facilities. It supports VMs, containers, and select Azure services locally while maintaining Azure-consistent management through Azure Arc.&lt;/p&gt;

&lt;p&gt;Features and architecture of hyperconverged deployments:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;Validated hardware from Microsoft partners; 1–16 machines per instance (max 8 for rack-aware clusters)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Storage Spaces Direct; external SAN storage in preview for qualified opportunities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Networking&lt;/td&gt;
&lt;td&gt;Customer-managed with physical switches and VLANs; optional software-defined networking (SDN)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local Services&lt;/td&gt;
&lt;td&gt;VMs for general-purpose workloads; AKS enabled by Azure Arc for containerized workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Management&lt;/td&gt;
&lt;td&gt;Azure Policy, Azure Monitor, Microsoft Defender for Cloud, and others via Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Metrics and logs sent to Azure Monitor and Log Analytics for infrastructure and workload resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Management Tools&lt;/td&gt;
&lt;td&gt;Azure Portal, CLI, ARM/Bicep/Terraform (cloud); PowerShell, Windows Admin Center, Hyper-V Manager, Failover Cluster Manager (local)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disaster Recovery&lt;/td&gt;
&lt;td&gt;Azure Backup, Azure Site Recovery, and non-Microsoft partners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;300+ security settings for consistent baseline and drift control; Trusted Launch for VMs; Microsoft Defender for Cloud integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Common Azure services on Azure Local:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Virtual machines&lt;/td&gt;
&lt;td&gt;Azure Local VMs enabled by Azure Arc (Windows/Linux, with Trusted Launch support)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Virtual desktops&lt;/td&gt;
&lt;td&gt;Azure Virtual Desktop (AVD) session hosts on-premises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container orchestration&lt;/td&gt;
&lt;td&gt;Azure Kubernetes Service (AKS) enabled by Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arc-enabled services&lt;/td&gt;
&lt;td&gt;Select Azure services for hybrid workloads via Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-performance databases&lt;/td&gt;
&lt;td&gt;SQL Server on Azure Local with extra resiliency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Media analytics&lt;/td&gt;
&lt;td&gt;Azure AI Video Indexer enabled by Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI chat assistants&lt;/td&gt;
&lt;td&gt;Azure Edge RAG (Preview) — turnkey RAG solution for custom chat over private data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IoT management&lt;/td&gt;
&lt;td&gt;Azure IoT Operations on AKS clusters on Azure Local&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Disconnected operations support a subset of these services via the local control plane:   &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Azure Portal&lt;/td&gt;
&lt;td&gt;Local portal experience similar to Azure Public&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Resource Manager (ARM)&lt;/td&gt;
&lt;td&gt;Subscriptions, resource groups, ARM templates, CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RBAC&lt;/td&gt;
&lt;td&gt;Role-based access control for subscriptions and resource groups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed identity&lt;/td&gt;
&lt;td&gt;System-assigned managed identity for supported resource types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arc-enabled servers&lt;/td&gt;
&lt;td&gt;VM guest management for Azure Local VMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local VMs&lt;/td&gt;
&lt;td&gt;Windows or Linux VMs via disconnected operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arc-enabled Kubernetes (Preview)&lt;/td&gt;
&lt;td&gt;CNCF Kubernetes cluster management on Azure Local VMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AKS enabled by Arc (Preview)&lt;/td&gt;
&lt;td&gt;AKS on Azure Local in disconnected mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local device management&lt;/td&gt;
&lt;td&gt;Create and manage instances, add/remove nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Container Registry&lt;/td&gt;
&lt;td&gt;Store and retrieve container images and artifacts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Key Vault&lt;/td&gt;
&lt;td&gt;Store and access secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Policy&lt;/td&gt;
&lt;td&gt;Enforce standards and governance on new resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Deployment types include hyperconverged deployments, multi-rack deployments (in preview), Microsoft 365 on Local, and disconnected operations. Multi-rack deployments support larger configurations with prescriptive hardware BOMs featuring pre-integrated racks containing SAN storage, servers, and network devices; re-use of existing hardware is not supported for multi-rack at this time.&lt;/p&gt;
&lt;h3&gt;
  
  
  Microsoft 365 Local
&lt;/h3&gt;

&lt;p&gt;Microsoft 365 Local runs Exchange Server, SharePoint Server, and Skype for Business Server on Azure Local infrastructure that is entirely customer-owned and managed. It supports both hybrid and fully disconnected deployments and provides an Azure-consistent management experience with a unified control plane. &lt;/p&gt;

&lt;p&gt;Core capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exchange, SharePoint, and Skype for Business&lt;/strong&gt;: Enterprise-grade email, document management, and unified communications on-premises, addressing stringent data residency requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certified and validated solutions&lt;/strong&gt;: Deployed on Azure Local Premier Solutions from hardware partners, guaranteeing compatibility for sovereign deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-stack validated reference architecture&lt;/strong&gt;: Prescriptive guidance for networking, storage, compute, and identity integration based on best practices for optimal performance and resiliency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sovereign Private Cloud capabilities&lt;/strong&gt;: Azure-consistent management with enhanced security features (encryption, access controls, compliance mechanisms) aligned with local regulatory frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid or fully disconnected support&lt;/strong&gt;: Connected mode uses Azure as the cloud control plane; disconnected mode uses a local control plane for complete isolation and air-gapped operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example large-scale server role allocation (connected mode): &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 servers configured as a three-node Azure Local instance for SharePoint Server and SQL Server workloads.&lt;/li&gt;
&lt;li&gt;4 servers each as single-node Azure Local instances for Exchange Server mailbox roles.&lt;/li&gt;
&lt;li&gt;2 servers each as single-node Azure Local instances for Exchange Server edge transport roles.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsoft 365 Local is now generally available and must be deployed through a Microsoft-certified solution partner. Microsoft has committed to supporting these on-premises productivity server workloads through at least 2035.&lt;/p&gt;
&lt;h3&gt;
  
  
  Foundry Local
&lt;/h3&gt;

&lt;p&gt;Foundry Local is an on-device AI inference solution (currently in public preview) that enables local execution of AI models through a CLI, SDK, or REST API. It provides an OpenAI-compatible REST endpoint running entirely on-device, meaning prompts and model outputs are processed locally without being sent to the cloud.&lt;/p&gt;

&lt;p&gt;System requirements:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OS&lt;/td&gt;
&lt;td&gt;Windows 10 (x64), Windows 11 (x64/ARM), Windows Server 2025, macOS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum hardware&lt;/td&gt;
&lt;td&gt;8 GB RAM, 3 GB free disk space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommended hardware&lt;/td&gt;
&lt;td&gt;16 GB RAM, 15 GB free disk space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optional acceleration&lt;/td&gt;
&lt;td&gt;NVIDIA GPU (2000 series+), AMD GPU (6000 series+), AMD NPU, Intel iGPU, Intel NPU (32 GB+ memory), Qualcomm Snapdragon X Elite (8 GB+ memory), Qualcomm NPU, Apple silicon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Supported AI capabilities:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Foundry Local Service&lt;/td&gt;
&lt;td&gt;An OpenAI-compatible REST server providing a standard interface for inference. The endpoint is dynamically allocated when the service starts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ONNX Runtime&lt;/td&gt;
&lt;td&gt;Executes optimized ONNX models on CPUs, GPUs, or NPUs; supports multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and quantized models for faster inference.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Management&lt;/td&gt;
&lt;td&gt;CLI and cache system for downloading, listing, and managing AI models locally.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key architectural components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Foundry Local Service&lt;/strong&gt;: An OpenAI-compatible REST server providing a standard interface for inference. The endpoint is dynamically allocated when the service starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Runtime&lt;/strong&gt;: Executes optimized ONNX models on CPUs, GPUs, or NPUs; supports multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and quantized models for faster inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Management&lt;/strong&gt;: CLI and cache system for downloading, listing, and managing AI models locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No Azure subscription is required to use Foundry Local on a device; it runs on local hardware with no recurring cloud costs for inference. &lt;br&gt;
For sovereign environments requiring heavier AI workloads, the integration of Foundry Local with Azure Local supports large-scale models utilizing the latest GPUs from NVIDIA, with Microsoft providing comprehensive support for deployments, updates, and operational health.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6phu5leex2ixbgcffym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6phu5leex2ixbgcffym.png" alt="upgit_20260331_1774988652.png" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites and Planning for Deployment
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Hardware and Catalog Selection
&lt;/h3&gt;

&lt;p&gt;Azure Local runs exclusively on validated hardware configurations listed in the Azure Local Solutions Catalog. Hardware solutions fall into three categories: Validated Nodes, Integrated Systems, and Premier Solutions. Premier Solutions delivers deep integration and validation for a smooth end-to-end experience. For hyperconverged deployments, you can reuse existing hardware only if it matches a supported configuration in the catalog; otherwise, upgrades or new hardware are required.&lt;br&gt;&lt;br&gt;
Each Azure Local machine in a hyperconverged cluster must meet system requirements for CPU, memory, storage, and network. For planning, the Azure Local Catalog and available sizing tools help estimate hardware requirements for the intended workload profile. Networking must be designed for redundancy and performance—typically using 10–25 GbE or higher links, physical switches, and VLANs. Optional SDN services can be enabled for software-defined networking.&lt;br&gt;&lt;br&gt;
For disconnected operations, plan additional capacity for the management cluster as detailed in Section 1 (3 nodes, 96 GB RAM/node, 24 cores/node, 2 TB SSD/node, 960 GB boot disk/node).&lt;br&gt;&lt;br&gt;
For Microsoft 365 Local, hardware must be an Azure Local Premier Solution that specifically meets the M365 Local requirements listed in the Azure Local Solutions Catalog. Please work with your authorized Microsoft partner to size the deployment appropriately. We have reference architectures for small-, mid-, and large-scale configurations tailored to your needs.  &lt;/p&gt;
&lt;h3&gt;
  
  
  Azure Subscription and Licensing
&lt;/h3&gt;

&lt;p&gt;An Azure subscription is required for Azure Local. The billing model charges a per-physical-core fee on on-premises machines, plus consumption-based charges for any additional Azure services used. All charges roll up to the existing Azure subscription. For disconnected operations, an eligible enterprise agreement (such as MCA-E) is also needed, and qualification must be discussed with the Microsoft account team before procurement.&lt;br&gt;&lt;br&gt;
Additional licensing considerations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OS licenses for workload VMs (e.g., Windows Server)
&lt;/li&gt;
&lt;li&gt;Microsoft 365 server licenses if deploying M365 Local (Exchange, SharePoint, Skype)&lt;/li&gt;
&lt;li&gt;Foundry Local requires no Azure subscription and has no RBAC role requirements when running solely on-device.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Network Connectivity Planning
&lt;/h3&gt;

&lt;p&gt;In connected mode, each machine must have outbound HTTPS connectivity to well-known Azure endpoints at least every 30 days. If SDN is planned, review the SDN overview before deployment. Network and host requirements must be met per Microsoft's published specifications.&lt;br&gt;&lt;br&gt;
In disconnected mode, the local management cluster must be networked to the workload clusters within the customer's environment, but no external internet is required post-deployment (only registration data is exchanged during initial deployment, registration, and license renewal).  &lt;/p&gt;
&lt;h3&gt;
  
  
  Assessment and Planning Phases
&lt;/h3&gt;

&lt;p&gt;A structured planning process reduces risk. Microsoft and its partners typically follow phased engagement for Azure Local projects, especially for M365 Local:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Assessment&lt;/td&gt;
&lt;td&gt;Analyze organizational requirements, compliance needs, and desired outcomes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Planning&lt;/td&gt;
&lt;td&gt;Define hardware configurations, software solutions, migration, and integration strategies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Acquisition&lt;/td&gt;
&lt;td&gt;Procure necessary hardware, software, and licenses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Execute the planned rollout in accordance with best practices&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For disconnected operations, organizations must additionally identify workloads and application requirements for disconnected operation, and staff (or partners) with the capability to deploy and operate disconnected environments.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploying Azure Local: Steps and Best Practices
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Cluster Installation and Registration
&lt;/h3&gt;

&lt;p&gt;For hyperconverged deployments, the Azure Local operating system can be downloaded from the Azure Portal, which includes a free 60-day trial. Alternatively, pre-integrated systems from OEM partners arrive with Azure Local pre-installed. After installing the OS on each server node and configuring the cluster (using Storage Spaces Direct for storage and Failover Clustering for high availability), the cluster must be registered with Azure Arc to enable cloud management through the Azure Portal and Arc tools.&lt;br&gt;&lt;br&gt;
Hardware can be purchased from any Microsoft hardware partner listed in the Azure Local Catalog, and the available sizing tool can help estimate hardware requirements before purchase.   &lt;/p&gt;
&lt;h3&gt;
  
  
  Post-Deployment Configuration
&lt;/h3&gt;

&lt;p&gt;Once registered, the Azure Local instance appears in the Azure Portal as a manageable resource. Post-deployment steps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enabling Arc-enabled services&lt;/strong&gt;: Configure AKS clusters, Arc-enabled data services, or other platform services as needed for workload requirements.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Applying governance policies&lt;/strong&gt;: Use Azure Policy to enforce compliance standards across the on-premises environment, and configure Microsoft Defender for Cloud to assess and improve security posture.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setting up monitoring&lt;/strong&gt;: Configure Azure Monitor and Log Analytics for metrics and log collection from both infrastructure and workloads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keeping the environment current&lt;/strong&gt;: Azure Local provides Solution Updates that simplify keeping the entire stack up to date across OS, firmware, and drivers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For disconnected deployments, these management services are configured on the local control plane appliance rather than through Azure public endpoints. The local Azure Portal and CLI provide an equivalent experience for managing policies, deploying VMs, and monitoring infrastructure within the isolated environment.&lt;br&gt;&lt;br&gt;
Deploying Microsoft 365 Local&lt;br&gt;
M365 Local must be deployed through a Microsoft-certified solution partner. The partner follows the reference architecture to provision the required Azure Local instances and configure Exchange, SharePoint, and Skype for Business server roles. The reference architectures include prescriptive guidance for networking and security — covering virtual networks, network security groups, and load balancers to segment, isolate, and secure access to workloads. In connected mode, architectures use Azure as the cloud-connected control plane; in disconnected mode, they use a local control plane.&lt;br&gt;&lt;br&gt;
Organizations can contact their Microsoft account team or visit the Microsoft 365 Local General Availability sign-up page for information about authorized partners.   &lt;/p&gt;
&lt;h3&gt;
  
  
  Testing and Validation
&lt;/h3&gt;

&lt;p&gt;Thorough validation after deployment is critical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster validation&lt;/strong&gt;: Run the built-in validation tools to confirm hardware, storage, and network configurations meet requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VM and failover testing&lt;/strong&gt;: Create test VMs, perform live migrations between nodes, and simulate node failures to verify high availability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connectivity resilience (connected mode)&lt;/strong&gt;: Simulate internet outages to confirm workloads continue uninterrupted and that the cluster correctly reconnects and syncs within the 30-day window.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disconnected mode testing&lt;/strong&gt;: Verify that the local management portal supports all required operations (VM provisioning, policy enforcement, monitoring) without any external connectivity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backup and recovery validation&lt;/strong&gt;: Test backup and restore procedures using Azure Backup, Azure Site Recovery, or third-party solutions.
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Planning and Deploying VM Workloads
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Capacity Planning
&lt;/h3&gt;

&lt;p&gt;Unlike the elastic scaling of public Azure, on-premises capacity is finite. IT architects must right-size VMs based on the physical resources available in the Azure Local cluster, while maintaining headroom for peak loads and failover overhead. Consider future growth when sizing: adding capacity requires purchasing and deploying new server nodes — a slower process than cloud scaling. The Azure Local Catalog and sizing tools assist with estimating how many VMs of given sizes a cluster configuration can support. &lt;/p&gt;
&lt;h3&gt;
  
  
  Creating VMs via Azure Arc
&lt;/h3&gt;

&lt;p&gt;Azure Local manages VMs as Azure resources through the Azure Arc Resource Bridge. VMs can be created using the Azure Portal, Azure CLI, ARM templates, Bicep, or Terraform. The creation workflow through the Azure Portal involves:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to the Azure Local cluster resource and select + Create VM.&lt;/li&gt;
&lt;li&gt;Specify project details: subscription, resource group.&lt;/li&gt;
&lt;li&gt;Configure instance details: VM name, custom location (associated with the Azure Local cluster), security type (Standard or Trusted Launch), storage path, OS image, administrator account, vCPU count, memory allocation (static or dynamic — cannot be changed post-deployment).&lt;/li&gt;
&lt;li&gt;Optionally enable Guest Management for Arc extensions integration, Domain Join for Active Directory, and additional data disks.&lt;/li&gt;
&lt;li&gt;Configure networking: attach at least one network interface with appropriate IP allocation (DHCP or static).&lt;/li&gt;
&lt;li&gt;Review and create.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This information is based on the documented VM deployment process for Azure Local environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image management&lt;/strong&gt;: Custom VM images (VHDs) can be uploaded or imported as templates. Preparing golden images — pre-hardened with security agents, configurations, and required software — streamlines consistent provisioning across the fleet.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Security for VM Workloads
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trusted Launch&lt;/strong&gt;: Supported for Azure Local VMs, enabling secure boot and virtual TPM (vTPM). The vTPM state automatically transfers within a cluster, and attestation confirms whether the VM started in a known-good state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Defender for Cloud&lt;/strong&gt;: Can assess and improve the security posture of both the Azure Local instance and individual VMs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arc guest management&lt;/strong&gt;: Extensions can be deployed inside VMs for configuration management, monitoring, and security agent installation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  GPU Workloads
&lt;/h3&gt;

&lt;p&gt;For AI or graphics-intensive workloads, Azure Local supports GPU-equipped servers. GPUs can be made accessible to VMs through direct pass-through or shared via GPU partitioning (GPU-P), which allows a single physical GPU to be divided into multiple virtual GPUs for different workloads simultaneously. This is valuable when multiple AI inference services, rendering tasks, or data processing workloads need GPU acceleration concurrently. NVIDIA GPUs (such as A-series models) are validated for Azure Local deployments.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tryout and Evaluation Options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local 60-Day Trial&lt;/td&gt;
&lt;td&gt;Download the Azure Local OS from the Azure Portal for a free 60-day evaluation for proof-of-concept deployments on your own hardware. Even a single validated server can be used to test core features. Microsoft's Azure Arc Jumpstart project provides step-by-step demo scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Foundry Local (Preview)&lt;/td&gt;
&lt;td&gt;Free, no Azure subscription required. Install via &lt;code&gt;winget install Microsoft.FoundryLocal&lt;/code&gt; (Windows) or &lt;code&gt;brew tap microsoft/foundrylocal &amp;amp;&amp;amp; brew install foundrylocal&lt;/code&gt; (macOS). Run a model immediately: &lt;code&gt;foundry model run qwen2.5-0.5b&lt;/code&gt;. Experiment with text generation and speech-to-text on existing hardware. Alternatively, download the installer from the Foundry Local GitHub repository.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft 365 Local&lt;/td&gt;
&lt;td&gt;No standalone trial download; engagement through Microsoft or a certified solution partner is required for proof-of-concept or pilot deployments. Contact your Microsoft account team or visit the M365 Local GA sign-up page. Hardware requirements are significant (enterprise-scale server configurations), so evaluations typically take place in partner labs or test environments.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning Resources&lt;/td&gt;
&lt;td&gt;Microsoft Learn modules, tutorials, and the Azure Arc Jumpstart provide guided lab experiences. Community blogs, partner solution briefs (from Dell, HPE, Lenovo, etc.), and the Microsoft Tech Community contain implementation case studies and architectural guidance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tradeoff consideration for trials: The 60-day Azure Local trial enables self-service evaluation of the core hyperconverged platform and VM management. However, testing disconnected operations requires the dedicated management cluster hardware and MCA-E eligibility, which limits ad hoc experimentation. For M365 Local, the partner-delivered model ensures proper configuration, but it means organizations cannot independently test before engaging commercially. Foundry Local, by contrast, offers the lowest barrier to entry — it runs on a standard laptop or desktop with no cloud dependencies.&lt;/p&gt;
&lt;h2&gt;
  
  
  Appendix: Building Agentic AI Solutions with Azure Local, Microsoft Agent Framework, and Foundry Local
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Conceptual Overview
&lt;/h3&gt;

&lt;p&gt;Modern AI applications increasingly follow an agentic pattern — multiple specialized AI agents that reason, communicate, and act to perform complex tasks. Microsoft provides tools to develop and run these solutions entirely on local infrastructure by combining three components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Azure Local — the on-premises infrastructure providing compute, storage, networking, and (optionally) GPU acceleration.&lt;/li&gt;
&lt;li&gt;Foundry Local — the on-device AI inference runtime serving LLM and other models via an OpenAI-compatible API endpoint.&lt;/li&gt;
&lt;li&gt;Microsoft Agent Framework (MAF) — an open-source framework (Python and .NET SDKs) for building, orchestrating, and deploying AI agents and multi-agent workflows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Agent Framework was introduced as an open-source project by Microsoft and is hosted on GitHub at microsoft/agent-framework with over 8,300 stars and 1,400 forks. The latest release at the time of research was python-1.0.0rc5 (dated 2026-03-19).&lt;/p&gt;
&lt;h3&gt;
  
  
  Architecture Pattern
&lt;/h3&gt;

&lt;p&gt;A concrete reference implementation was published on the Microsoft Developer Community Blog, demonstrating real-world AI automation with Foundry Local and MAF — described as running with "no cloud subscription, no API keys, no internet required". The system uses four specialized agents orchestrated by MAF:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PlannerAgent&lt;/td&gt;
&lt;td&gt;Sends user commands to the Foundry Local LLM and produces a structured JSON action plan&lt;/td&gt;
&lt;td&gt;4–45 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SafetyAgent&lt;/td&gt;
&lt;td&gt;Validates actions against workspace bounds and schema constraints&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ExecutorAgent&lt;/td&gt;
&lt;td&gt;Dispatches validated actions to the target system (e.g., robotics simulator for inverse kinematics and gripper control)&lt;/td&gt;
&lt;td&gt;&amp;lt; 2 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NarratorAgent&lt;/td&gt;
&lt;td&gt;Produces a template-based summary of actions taken (with optional LLM elaboration)&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The orchestration flow follows a sequential pipeline: User → Orchestrator → Planner → Safety → Executor → Target System, with the Narrator providing observability. &lt;/p&gt;

&lt;p&gt;In this reference, the PlannerAgent uses Foundry Local as its AI backend, invoking a local model (e.g., qwen2.5-coder-0.5b) via the standard OpenAI Python client pointing to the Foundry Local endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;foundry_local&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FoundryLocalManager&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FoundryLocalManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5-coder-0.5b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern — structured JSON output from an LLM, validated by a safety layer, dispatched to a domain-specific engine — generalizes beyond robotics to home automation, game AI, CAD, lab equipment, and any domain requiring safe, structured control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment Patterns on Azure Local
&lt;/h3&gt;

&lt;p&gt;For production deployment of agentic AI on Azure Local infrastructure, the following layered architecture applies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1&lt;/strong&gt; — AI Model Hosting: One or more Azure Local VMs (or containers) running Foundry Local to serve AI models. For small models, a standard CPU-equipped VM suffices. For large multimodal models, VMs with dedicated GPU access on Azure Local infrastructure leverage the latest NVIDIA GPUs for high-throughput inference. Foundry Local automatically selects the best execution provider (NPU &amp;gt; GPU &amp;gt; CPU) for the available hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2&lt;/strong&gt; — Agent Orchestration: The Microsoft Agent Framework runs as a service (in a container on AKS or in a VM) and orchestrates the multi-agent pipeline. It handles agent-to-agent communication, memory management, tool integrations, and calls to the Foundry Local inference endpoint. Domain-specific engines (simulation environments, database connectors, control system APIs) can be integrated as tools that agents invoke during execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 3&lt;/strong&gt; — Application Interface: A custom frontend (web application, dashboard, CLI, or API gateway) through which users submit tasks and receive results. This can be hosted on the same Azure Local cluster.
All inter-layer communication occurs over the cluster's internal network, keeping data fully on-premises and latency to a minimum.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Applicable Scenarios
&lt;/h3&gt;

&lt;p&gt;The combination of Azure Local + Foundry Local + MAF enables agentic AI solutions where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Industrial automation&lt;/strong&gt;: Agents interpret natural-language operator commands, plan machine actions, validate safety constraints, and execute robotic or process-control operations — all on the factory floor without cloud dependency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sovereign AI assistants&lt;/strong&gt;: Multi-agent systems that collate local data, reason using on-device LLMs, and provide decision support in classified or regulated environments (defense, finance, healthcare) where data must never leave the premises.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge intelligence&lt;/strong&gt;: IoT-connected environments where agents monitor sensor data streams, use local AI for anomaly detection and root-cause analysis, and actuate responses in real time — applicable to energy infrastructure, transportation systems, or smart facilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline automation&lt;/strong&gt;: Field operations, shipboard systems, or disaster-response scenarios where internet connectivity is unavailable but sophisticated AI reasoning and automation are still required.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Advantages and Tradeoffs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Advantages&lt;/strong&gt;: Running agentic AI entirely on Azure Local provides data sovereignty (all prompts, model outputs, and orchestration data remain local), low latency (no network hops to cloud endpoints), deterministic cost (no per-token API charges), and operational resilience (functions without internet).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tradeoffs&lt;/strong&gt;: On-device models are constrained by local GPU memory and compute — the largest cloud-hosted models (e.g., GPT-4 at full scale) may not be runnable locally without significant GPU investment. Model updates require manual download and deployment rather than automatic cloud-side updates. Additionally, Foundry Local remains in public preview, meaning features and supported models are still evolving and may have limitations before general availability. Organizations should evaluate whether the models available for local inference meet their quality bar for production use, and plan for a path to larger models as Foundry Local's support for large-scale models on Azure Local with NVIDIA GPUs matures.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Azure Local, Foundry Local, and Microsoft 365 Local together form a cohesive platform for organizations seeking sovereign, on-premises cloud capabilities without compromise. As data residency, regulatory compliance, and operational resilience become non-negotiable requirements across industries, Microsoft's investment in distributed infrastructure and local AI inference reflects a fundamental shift in how enterprises architect their digital ecosystems.&lt;/p&gt;

&lt;p&gt;The combination of &lt;strong&gt;Azure Local&lt;/strong&gt; (providing edge-aware infrastructure and hybrid compute), &lt;strong&gt;Microsoft 365 Local&lt;/strong&gt; (delivering productivity and collaboration on-premises), and &lt;strong&gt;Foundry Local&lt;/strong&gt; (enabling local LLM inference) addresses the long-standing tension between cloud agility and data sovereignty. Whether your organization operates in a connected, intermittently connected, or fully disconnected environment, these solutions let you innovate locally without sacrificing the governance, scale, or intelligence that cloud-native architectures offer.&lt;/p&gt;

&lt;p&gt;For IT architects and decision-makers, the path forward is clear: evaluate your specific regulatory, latency, and data residency requirements; prototype on a small cluster or Azure Local Appliance; and progressively expand as organizational confidence and operational maturity grow. The learning curve is manageable, the economics are favorable for regulated industries, and the competitive advantage in markets demanding data sovereignty is significant.&lt;/p&gt;

&lt;p&gt;As Foundry Local and Azure Local move toward general availability and mature their feature sets, the case for Sovereign Private Cloud becomes stronger. The future of enterprise computing is not "cloud vs. on-premises" — it is a thoughtfully designed hybrid architecture that respects both business logic and the regulatory terrain in which that logic operates.&lt;/p&gt;

</description>
      <category>azurelocal</category>
      <category>foundrylocal</category>
      <category>m365local</category>
    </item>
    <item>
      <title>Practical Guideline: How to Move Agents Beyond POCs and Deliver Real Enterprise Value</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 21 Mar 2026 08:29:29 +0000</pubDate>
      <link>https://forem.com/holgerimbery/practical-guideline-how-to-move-agents-beyond-pocs-and-deliver-real-enterprise-value-3267</link>
      <guid>https://forem.com/holgerimbery/practical-guideline-how-to-move-agents-beyond-pocs-and-deliver-real-enterprise-value-3267</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I hear the same question repeatedly from customers exploring agent or broader AI adoption: &lt;em&gt;"How do we escape the endless POC phase and actually deliver real business value?"&lt;/em&gt; Most organizations get stuck prototyping broadly instead of executing narrowly, trapped in cycles of experimentation that never reach production. This practical guideline distills &lt;strong&gt;ten core principles&lt;/strong&gt; proven to move agents from ideation into measurable enterprise impact. Read on to discover how to anchor initiatives in real processes, maintain scope discipline, connect agents to live input channels, enforce production-grade behavior from day one, integrate with mission-critical systems early, deliver in short iteration cycles, create lightweight review processes, commit to real usage within 30 days, use multiple small agents, and plan for long-term flexibility—transforming your AI investment from experimentation into sustainable value delivery.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Organizations often get stuck in endless prototyping cycles because they experiment broadly rather than execute narrowly. This guideline distills the core principles that move agents from ideation to measurable business impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Anchor the Initiative in a Single Real Process
&lt;/h2&gt;

&lt;p&gt;Begin by identifying a single operational workflow where your organization currently loses productive time on a recurring basis. This workflow should exhibit characteristics such as repetitive manual steps, rule-based decision logic, or intensive data manipulation. Avoid starting with abstract experimentation or exploratory prototypes that lack connection to actual business operations.&lt;/p&gt;

&lt;p&gt;The economic rationale for this approach is straightforward. When you &lt;strong&gt;ground your agent development in a concrete, existing process&lt;/strong&gt;, you force alignment with &lt;strong&gt;real data sources&lt;/strong&gt;, &lt;strong&gt;actual system dependencies&lt;/strong&gt;, and &lt;strong&gt;measurable business outcomes&lt;/strong&gt;. This concrete anchoring eliminates the disconnection that is characteristic of laboratory environments and proof-of-concept work, which often remain isolated from production constraints and real-world variability.&lt;/p&gt;

&lt;p&gt;To establish this baseline understanding, you should document the current state of the process by answering these questions. &lt;strong&gt;First, identify what data inputs currently exist and where they originate within your organization.&lt;/strong&gt; &lt;strong&gt;Second, determine which specific steps within the workflow consume the most human effort&lt;/strong&gt; and therefore represent the highest opportunity for efficiency gains. &lt;strong&gt;Third, establish what quantifiable outcome should improve as a result of agent implementation&lt;/strong&gt;, whether measured in terms of time savings per transaction, reduction in human errors, or increase in process throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Keep the First Agent Extremely Narrow
&lt;/h2&gt;

&lt;p&gt;The most decisive factor in moving beyond proof-of-concept phases is maintaining &lt;strong&gt;scope discipline&lt;/strong&gt;. Organizations frequently fail to operationalize agents because they attempt to expand functionality too broadly before achieving stable baseline performance in a narrow domain. This expansion pattern increases complexity exponentially while simultaneously distributing development resources across multiple problem dimensions.&lt;/p&gt;

&lt;p&gt;The essential discipline requires that you define the agent's responsibilities in a single, unambiguous sentence of the form: &lt;strong&gt;"This agent is responsible for X and nothing beyond X."&lt;/strong&gt; This constraint forces explicit trade-offs between capability breadth and implementation depth, ensuring that resources concentrate on achieving reliable performance in one well-defined function rather than fragmented performance across multiple functions.&lt;/p&gt;

&lt;p&gt;Consider these concrete examples of appropriately scoped initial deployments. An agent might be chartered to &lt;strong&gt;look up customer pricing from a master database and return one verified result&lt;/strong&gt;, without attempting to negotiate, modify, or recommend alternative pricing. Another agent might be restricted to &lt;strong&gt;extracting structured fields from incoming documents and validating them against schema requirements&lt;/strong&gt;, without attempting interpretation or applying business rules. A third agent might be limited to &lt;strong&gt;classifying incoming inquiries into exactly three predefined categories&lt;/strong&gt;, without attempting subcategories or fuzzy classifications.&lt;/p&gt;

&lt;p&gt;This discipline against over-engineering serves multiple economic functions. It reduces the surface area for defects, shortens the time to reach measurable operational impact, and simplifies the governance model for operating the agent in production environments. By deferring expansion until baseline performance is established, organizations create a foundation of operational reliability upon which additional capabilities can be layered incrementally.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Connect the Agent to a Real Input Channel Early
&lt;/h2&gt;

&lt;p&gt;Manual testing through a studio UI creates an isolated environment that does not reflect operational reality. The agent is evaluated against synthetic inputs, clean data structures, and predetermined response patterns—conditions that rarely occur in production systems. &lt;strong&gt;Real business value emerges only when the agent receives actual operational requests&lt;/strong&gt; that contain the variability, ambiguity, and edge cases inherent in genuine work.&lt;/p&gt;

&lt;p&gt;Operational input channels include the following mechanisms through which work currently flows into your organization: &lt;strong&gt;forwarded email messages&lt;/strong&gt; containing unstructured customer inquiries, &lt;strong&gt;Teams chat messages&lt;/strong&gt; that combine urgent questions with side conversations, &lt;strong&gt;CRM cases&lt;/strong&gt; that reference prior interactions and incomplete context, and &lt;strong&gt;uploaded documents&lt;/strong&gt; that may contain inconsistent formatting or missing required fields. Each of these channels introduces distinct data quality challenges and user expectations.&lt;/p&gt;

&lt;p&gt;The economic rationale for early channel integration stems from the principle of &lt;strong&gt;revealed preference through actual behavior&lt;/strong&gt;. When users interact with the agent through their existing workflow channels rather than a lab environment, their usage patterns reveal which capabilities create genuine value and which create friction. Synthetic testing cannot substitute for this behavioral signal. Furthermore, &lt;strong&gt;exposure to live variability from the first iteration accelerates learning&lt;/strong&gt; about edge cases and failure modes that would otherwise remain hidden until full production deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;: Select one input channel that currently delivers the highest volume of work into your target process and route genuine operational requests through the agent beginning in the first development cycle. This approach ensures that early versions contend with real data distributions and authentic user patterns rather than idealized test scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Enforce Production‑Grade Behavior From Day One
&lt;/h2&gt;

&lt;p&gt;Development environments and production environments typically operate under fundamentally different constraints and enforcement mechanisms. In many organizations, agents developed during the proof-of-concept phase operate with suspended governance controls, synthetic data, and permissive access policies that would never be acceptable in operational systems. This separation creates a structural barrier to production adoption because moving an agent from a development environment to production then requires a complete redevelopment of its data connectors, compliance controls, and operational behaviors.&lt;/p&gt;

&lt;p&gt;The efficient approach eliminates this artificial separation by imposing production-grade constraints from the initial development phase. This requires the agent to &lt;strong&gt;use the actual data sources&lt;/strong&gt; employees rely on for their daily work, rather than sanitized copies or test databases. The agent must &lt;strong&gt;apply existing data access restrictions and compliance controls&lt;/strong&gt; that govern access to sensitive information within your organization, rather than running with elevated or unrestricted permissions. The agent must maintain &lt;strong&gt;consistent tone, content, and response handling&lt;/strong&gt; in line with organizational standards, rather than developing ad hoc response patterns during development that would later require modification. The agent must draw on &lt;strong&gt;approved knowledge sources&lt;/strong&gt; that align with organizational information governance policies, rather than accessing ad hoc files or unvetted external data.&lt;/p&gt;

&lt;p&gt;This approach reduces the economic cost of production deployment by eliminating the need to redesign and re-implement governance controls at transition time. Additionally, exposing the agent to genuine constraints during development accelerates the identification of edge cases and failure modes that would otherwise remain hidden until production deployment, when they would be far more costly to address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;: Establish the expectation that the first version of the agent operates under the same governance framework and data access policies as the final production system. This mindset collapses the artificial gap between proof-of-concept development and production readiness.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Integrate With One Mission‑Critical System Early
&lt;/h2&gt;

&lt;p&gt;From an operational perspective, &lt;strong&gt;a proof-of-concept implementation that remains disconnected from your organization's core systems generates negligible business value&lt;/strong&gt; regardless of how well the agent performs in isolation. The critical transformation occurs when the agent &lt;strong&gt;gains the ability to read from or write to systems that directly affect your operational workflows&lt;/strong&gt;, such as customer relationship management platforms, enterprise resource planning systems, document management repositories, or human resources information systems. At that point, the agent transitions from a theoretical capability into a practical tool that produces measurable outcomes within your existing business processes.&lt;/p&gt;

&lt;p&gt;The economic principle underlying this requirement is straightforward: &lt;strong&gt;manual handoff steps between system boundaries represent a fundamental source of friction and delay&lt;/strong&gt;. When an agent completes its analysis but requires a human to manually transfer its output into another system, you have failed to eliminate the bottleneck that prompted the agent's development in the first place. Conversely, when the agent can &lt;strong&gt;directly query information from, or write results to, systems where decisions take effect&lt;/strong&gt;, the entire workflow collapses into a unified operational flow that removes intermediate steps.&lt;/p&gt;

&lt;p&gt;Your implementation approach should prioritize identifying which single system integration would eliminate the greatest volume of repetitive manual work, and then building only the minimal version of that integration during the initial development phase. This targeted approach might manifest as the agent &lt;strong&gt;querying a system to retrieve structured reference data&lt;/strong&gt; that previously required manual lookup, &lt;strong&gt;writing a record to a system to capture the decision the agent has reached&lt;/strong&gt;, &lt;strong&gt;extracting and processing a document that originated from a system's document repository&lt;/strong&gt;, or &lt;strong&gt;triggering an automated workflow in a system that would otherwise require manual initiation&lt;/strong&gt;. Even a single integration point of this magnitude, when implemented at the outset rather than deferred until later phases, serves as a forcing function that exposes the real constraints your agent must navigate within your organization's operational environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Deliver in Short Iteration Cycles
&lt;/h2&gt;

&lt;p&gt;Extended design phases pose a structural impediment to effective agent development by deferring real-world validation and prolonging the time horizon before measurable feedback becomes available. Organizations attempting comprehensive upfront design face two competing failures: either they design systems that do not align with operational reality once implemented, or they extend the pre-implementation phase so long that organizational priorities shift before deployment occurs. &lt;strong&gt;Agents improve most rapidly through short cycles of genuine operational usage&lt;/strong&gt; because each cycle generates concrete evidence of performance gaps and behavioral mismatches that cannot be anticipated through laboratory analysis alone.&lt;/p&gt;

&lt;p&gt;The organizational practice that supports this principle involves establishing a &lt;strong&gt;defined release rhythm of 7 to 10 days&lt;/strong&gt; as the baseline cadence for detecting problems, gathering behavioral feedback, and incorporating refinements. This rhythm creates a predictable organizational rhythm while ensuring sufficient time for both development work and operational assessment. Within this structured cycle, the work proceeds through sequential phases: during the initial week, the focus concentrates on delivering a &lt;strong&gt;working version of the agent that operates within the narrowly defined scope&lt;/strong&gt; established by principle two. During the second week, attention shifts toward &lt;strong&gt;integrating the agent with the single mission-critical system&lt;/strong&gt; identified during principle five, which forces the agent to operate under genuine operational constraints. During the third week, the team prioritizes incorporating feedback-driven refinements based on direct observations of how the agent performs under real operational patterns and edge cases. By the fourth week, the agent &lt;strong&gt;transitions into daily-use rotation&lt;/strong&gt;, becoming a standard component of the operational workflow rather than an experimental capability.&lt;/p&gt;

&lt;p&gt;This structured iteration discipline transforms theoretical value into &lt;strong&gt;practical, measurable improvements&lt;/strong&gt; by compressing the feedback loop between hypothesis and evidence to a manageable timeframe. Organizations that maintain shorter iteration cycles identify defects and misalignments exponentially faster than those that attempt extended design phases, resulting in a substantially faster path to production-grade performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Create a Lightweight Review and Quality Model
&lt;/h2&gt;

&lt;p&gt;Heavyweight governance structures—comprehensive architecture review boards, multi-stage approval processes, and extensive documentation requirements—impose transaction costs that delay feedback cycles and create organizational friction. These formal processes were designed for environments where deployment cycles measured months and the cost of errors remained relatively stable. Agent development operates under fundamentally different constraints: deployment cycles measure days, and the cost of a minor behavioral inconsistency in an agent can compound over hundreds of interactions before detection.&lt;/p&gt;

&lt;p&gt;The economic principle underlying lightweight review processes is that &lt;strong&gt;not all decisions require the same deliberative overhead&lt;/strong&gt;. Operational decisions about individual agent behaviors—how to handle edge cases, whether a response meets quality standards, or how the agent should escalate undefined requests—benefit from &lt;strong&gt;frequent, lightweight validation&lt;/strong&gt; rather than formal approval hierarchies. Conversely, decisions about expanding an agent's scope or integrating new system connectors require structured deliberation, but this deliberation should remain episodic rather than continuous.&lt;/p&gt;

&lt;p&gt;The practical implementation of this principle involves establishing separate review cadences calibrated to the urgency of decisions. &lt;strong&gt;Weekly operational reviews&lt;/strong&gt; should examine direct evidence of agent performance, specifically documented failures, observed edge cases that the agent failed to handle correctly, and user experience friction points that emerged during actual operational usage. These reviews operate without approval authority; they serve as diagnostic sessions that generate recommendations for refinement. &lt;strong&gt;Monthly functional expansion decisions&lt;/strong&gt; should convene stakeholder representatives to evaluate whether the agent's scope should be widened, which integration points to add next, or whether the agent should be split into multiple specialized agents. These decisions operate with explicit approval authority because scope decisions determine resource allocation for the subsequent month's development work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standard templates&lt;/strong&gt; for agent instructions, response formats, and escalation procedures ensure consistency across agents without requiring case-by-case review. A template encodes learned patterns from prior agent implementations into repeatable structures that new agents can adopt immediately, reducing both development time and the likelihood of behavioral inconsistencies between agents.&lt;/p&gt;

&lt;p&gt;This calibrated review model reduces unnecessary transaction costs while maintaining deliberative oversight for decisions that require it, so alignment occurs without creating a delivery bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Commit to Real Usage Within 30 Days
&lt;/h2&gt;

&lt;p&gt;The principle of time-bound value realization addresses a fundamental problem in agent adoption: organizations frequently defer the transition from development to operational deployment indefinitely, justifying continued laboratory work with incremental improvements that never add up to genuine business impact. The economic cost of this deferral compounds over time because development resources consumed during extended proof-of-concept phases represent opportunity costs that could have been deployed toward other organizational priorities.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;disciplined commitment to a fixed time horizon&lt;/strong&gt; solves this problem by establishing an explicit deadline for demonstrating measurable operational value. The specific timeframe of &lt;strong&gt;thirty days&lt;/strong&gt; aligns with the typical organizational planning cycle, allowing early agent deployment results to inform resource allocation decisions for the subsequent planning period. This timeframe is sufficiently compressed to prevent indefinite deferral while remaining realistic for narrowly scoped agents integrated with single system connectors.&lt;/p&gt;

&lt;p&gt;The operational rule is straightforward: &lt;strong&gt;if the agent is not delivering quantifiable value within 30 days of initial deployment to a real operational channel, the scope must be simplified rather than expanded. **&lt;/strong&gt; This is not a judgment of development competence but rather a signal that the current scope-to-resource ratio has become misaligned. Value delivery failure indicates that either the scope remains too broad to achieve stability within the available development effort, or the integration points do not connect to work patterns that generate sufficient transaction volume to demonstrate impact. In either case, the remedy is to &lt;strong&gt;reduce scope further rather than to invest additional effort in the current design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This discipline creates accountability structures that prevent laboratory research from consuming indefinite organizational resources while also forcing difficult conversations about scope alignment early in the adoption cycle, before significant resource commitments have been made.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Use Multiple Small Agents Instead of One Overloaded One
&lt;/h2&gt;

&lt;p&gt;As operational processes expand and additional requirements pile up, assigning more responsibilities to a single agent compounds performance issues and makes the governance framework needed to maintain operational consistency much harder to manage. Each additional responsibility you layer onto an existing agent increases the dimensionality of the state space the agent must navigate, exponentially expanding the set of edge cases and behavioral scenarios that must be designed for, tested against, and monitored in production.&lt;/p&gt;

&lt;p&gt;From an economic perspective, this multifaceted complexity imposes two distinct costs. First, &lt;strong&gt;development velocity decreases substantially&lt;/strong&gt; as the cognitive burden of managing interdependencies between distinct responsibilities grows. When an agent handles both classification and task execution, modifications to classification logic require careful analysis of how those changes cascade through task execution behavior. Second, &lt;strong&gt;operational failure modes become increasingly difficult to isolate and remediate&lt;/strong&gt; because a performance problem observed at the system boundary may originate from any of several distinct layers of responsibility.&lt;/p&gt;

&lt;p&gt;The principle of agent specialization addresses this problem by establishing the &lt;strong&gt;discipline of splitting responsibilities across multiple focused agents&lt;/strong&gt; as operational scope expands. Rather than expanding a single agent to handle routing decisions, classification decisions, domain-specific task execution, and document processing in sequence, you would instead deploy four distinct agents, each responsible for a single function. The &lt;strong&gt;routing agent&lt;/strong&gt; receives incoming work and determines which specialized agent should handle the request. The &lt;strong&gt;classification agent&lt;/strong&gt; processes the routed work and assigns it to the appropriate category within a predefined taxonomy. The &lt;strong&gt;domain-specific task agent&lt;/strong&gt; performs the operational work within that category, calling back-end systems and generating results. The &lt;strong&gt;document processing agent&lt;/strong&gt; extracts structured information from unstructured documents and prepares it for downstream task agents.&lt;/p&gt;

&lt;p&gt;This decomposition yields multiple benefits that justify the additional engineering required to orchestrate multiple agents. Small, &lt;strong&gt;specialized agents reach production stability faster&lt;/strong&gt; because each agent operates within a constrained state space with fewer edge-case combinations. &lt;strong&gt;Governance remains explicit and traceable&lt;/strong&gt; because each agent has a single defined responsibility, making it straightforward to document expected behavior and audit actual behavior against that standard. &lt;strong&gt;Failure isolation becomes tractable&lt;/strong&gt; because a performance degradation can be attributed to a specific agent component rather than requiring analysis across all bundled responsibilities. When a specific agent begins exhibiting unexpected behavior, the blast radius of potential impact remains constrained to the specific function that agent performs, rather than cascading through multiple dependent responsibilities.&lt;/p&gt;

&lt;p&gt;Over extended operational timelines, this modular architecture provides additional economic value through &lt;strong&gt;reduced cost of capability evolution&lt;/strong&gt;. When organizational requirements change, you can modify or replace a single specialized agent without requiring a redesign of the entire set of responsibilities. This flexibility allows organizations to adapt their agent ecosystem as operational priorities change.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Plan for Long‑Term Flexibility
&lt;/h2&gt;

&lt;p&gt;Long-term organizational success with agent systems depends on architectural decisions that preserve future optionality without imposing excessive upfront complexity. Adoption frameworks and industry analysis show that organizations with modular architectures, rather than monolithic designs, have significantly lower total cost of ownership over multi-year operational timelines. The economic principle underlying this requirement is that &lt;strong&gt;modular systems distribute change costs across smaller component boundaries&lt;/strong&gt;, whereas monolithic systems concentrate change costs across tightly coupled dependencies.&lt;/p&gt;

&lt;p&gt;Your agent architecture should prioritize &lt;strong&gt;flexibility in integrating capabilities&lt;/strong&gt; by establishing well-defined interfaces between agents and external systems, rather than embedding system-specific logic directly into agent instructions or prompts. This approach means that when your organization adopts a new CRM platform or replaces a document management system, you can update the system integration layer without requiring redesign of agent behavior specifications. Additionally, the architecture should remain &lt;strong&gt;protocol-driven&lt;/strong&gt;, meaning that agents communicate with each other and with external systems through standardized APIs and message formats rather than through proprietary connectors. This discipline ensures that as your organization's technology infrastructure evolves, your agent ecosystem can adapt without requiring wholesale redevelopment.&lt;/p&gt;

&lt;p&gt;The practical implication of this principle is that your initial agent deployment should &lt;strong&gt;incorporate extensibility patterns from the outset&lt;/strong&gt; rather than deferring architectural considerations until later phases. When you define how an agent accesses your customer database, design that access pattern to accommodate a future change in the database platform without requiring modifications to the agent's core logic. When you establish how agents communicate with business systems, use standardized protocols and well-documented interfaces that would allow additional agents to access those same systems without requiring new connector development. This forward-looking engineering discipline imposes modest additional design effort during initial implementation but eliminates expensive rearchitecting work later as organizational requirements evolve and technology infrastructure changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: The Fast‑Path to Production
&lt;/h2&gt;

&lt;p&gt;Organizations that successfully transition agents from proof-of-concept phases into sustained operational deployment share a consistent pattern of implementation discipline. These ten principles represent a synthesis of organizational practices that have demonstrated measurable results across diverse operational contexts.&lt;/p&gt;

&lt;p&gt;The foundational requirement is to &lt;strong&gt;anchor agent development in a specific, real operational process&lt;/strong&gt; rather than pursue abstract experimentation. This grounding in actual business workflows ensures that agent capabilities connect directly to measurable organizational problems. Building on this foundation, &lt;strong&gt;maintaining scope discipline through narrowly defined initial agent responsibilities&lt;/strong&gt; creates the conditions for rapid stabilization and early demonstration of operational value. The agent should then &lt;strong&gt;receive genuine operational input&lt;/strong&gt; through the channels where work currently flows into the organization, exposing the agent to real data distributions and authentic user behaviors from the initial development phase.&lt;/p&gt;

&lt;p&gt;Throughout the development cycle, &lt;strong&gt;applying production-grade governance controls, data access policies, and behavioral standards from day one&lt;/strong&gt; eliminates the artificial gap between development and production environments. Simultaneously, &lt;strong&gt;integrating with at least one mission-critical system early in the development process&lt;/strong&gt; forces the agent to operate under genuine operational constraints rather than remaining isolated in a laboratory environment. The development methodology should employ &lt;strong&gt;short iteration cycles measured in weeks rather than months&lt;/strong&gt;, which compresses the feedback loop between hypothesis and evidence, enabling rapid identification of misalignments between designed behavior and operational reality.&lt;/p&gt;

&lt;p&gt;Supporting this development rhythm requires establishing &lt;strong&gt;lightweight review processes calibrated to the urgency of decisions&lt;/strong&gt;, and separating continuous operational assessments from episodic capability expansion decisions. Organizations must enforce &lt;strong&gt;time-bound value realization through a commitment to deliver measurable operational results within thirty days&lt;/strong&gt;, which prevents indefinite deferral of production deployment and forces disciplined conversations about scope alignment. As operational requirements expand, maintaining &lt;strong&gt;modular architectures that distribute capabilities across multiple specialized agents&lt;/strong&gt; rather than accumulating responsibilities within single agents preserves development velocity and simplifies operational governance. Finally, &lt;strong&gt;planning for long-term flexibility through well-defined interfaces and standardized protocols&lt;/strong&gt; enables the agent ecosystem to adapt as organizational technology infrastructure and business requirements evolve.&lt;/p&gt;

&lt;p&gt;These principles work together to create implementation patterns that compress the transition from conception to the delivery of operational value.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>adoption</category>
      <category>aifoundry</category>
    </item>
    <item>
      <title>Microsoft 365 E7: Why Microsoft's New License Is a Logical Step for Agent‑Driven Enterprises</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 14 Mar 2026 08:13:21 +0000</pubDate>
      <link>https://forem.com/holgerimbery/microsoft-365-e7-why-microsofts-new-license-is-a-logical-step-for-agent-driven-enterprises-46gp</link>
      <guid>https://forem.com/holgerimbery/microsoft-365-e7-why-microsofts-new-license-is-a-logical-step-for-agent-driven-enterprises-46gp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft's announcement of Microsoft 365 E7 in March 2026 marks a watershed moment in enterprise technology strategy. For the first time in over a decade, Microsoft introduced a new top-tier enterprise license—not to add incremental features, but to fundamentally reconceptualize how organizations govern both human workers and autonomous AI agents as integrated components of the workforce. At $99 per user per month, E7 bundles Microsoft 365 E5, Microsoft 365 Copilot Wave 3 with agentic capabilities, the Microsoft Entra Suite, and the newly introduced Agent 365 control plane. This consolidation signals that AI agents have transitioned from experimental pilots to production-grade organizational resources requiring enterprise-grade identity, access, compliance, and auditability frameworks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why You Should Read This&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If you lead enterprise technology strategy, manage cloud infrastructure, evaluate AI adoption roadmaps, or determine software licensing budgets, E7 represents a critical inflection point in how enterprises will architect their IT operating models over the next five years. This article explains not just what E7 includes, but why Microsoft built it—addressing the architectural gaps E5 left exposed as organizations scale agent deployment from hundreds of thousands to tens of millions of instances. You'll understand the economic logic, the governance infrastructure, and the strategic positioning underlying this licensing evolution, enabling you to make informed decisions about whether E7 aligns with your organization's agent deployment trajectory and control requirements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction: From Productivity Suites to Agent Platforms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Historical Context and Evolution
&lt;/h3&gt;

&lt;p&gt;Microsoft's enterprise licensing strategy has traditionally centered on supporting productivity and organizational efficiency through cloud services and security infrastructure. The Microsoft 365 E5 tier, introduced in 2015, represented the established enterprise standard, designed to address the comprehensive security, compliance, productivity, and governance requirements of large organizations during the cloud adoption phase. For the next 11 years, E5 served as the highest-tier enterprise licensing option within the Microsoft 365 portfolio.&lt;/p&gt;

&lt;h3&gt;
  
  
  The March 2026 Announcement
&lt;/h3&gt;

&lt;p&gt;On 9 March 2026, Microsoft announced the availability of Microsoft 365 E7, designated as the Frontier Suite, representing the first introduction of a new top‑tier enterprise license since the E5 tier was originally established in 2015. This announcement signals a deliberate architectural evolution in how Microsoft structures enterprise licensing and organizational governance at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Composition and Technical Structure
&lt;/h3&gt;

&lt;p&gt;The Microsoft 365 E7 offering, priced at $99 per user per month, consolidates multiple previously distinct components into a unified licensing structure. This bundled approach encompasses Microsoft 365 E5, Microsoft 365 Copilot, the complete Microsoft Entra Suite, and the newly introduced Agent 365 control plane. Each component addresses specific operational and governance requirements within the modern enterprise technology infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fundamental Architectural Shift
&lt;/h3&gt;

&lt;p&gt;The introduction of E7 should not be interpreted as a simple price adjustment or repackaging of existing capabilities. Rather, E7 represents a substantive architectural shift in Microsoft's strategic positioning and technical philosophy. Microsoft is fundamentally repositioning Microsoft 365 from a platform optimized for human-centric productivity and security to a comprehensive control plane that manages and governs an integrated, mixed workforce comprising both human workers and autonomous artificial intelligence agents. This transition reflects evolving organizational requirements as enterprises move beyond pilot implementations of AI technologies toward systematic, organization-wide agent deployment at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Microsoft 365 E7 Actually Includes
&lt;/h2&gt;

&lt;p&gt;Microsoft 365 E7 consolidates four distinct product components, previously available as separate subscription offerings, into a unified licensing structure. This architectural consolidation reflects Microsoft's strategic decision to bundle interdependent capabilities that are increasingly required for enterprise-scale deployment of autonomous AI agents. The following sections provide detailed technical specifications for each included component.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft 365 E5: Foundation Layer
&lt;/h3&gt;

&lt;p&gt;Microsoft 365 E5 is the foundational component of the E7 licensing tier, providing core productivity, compliance, security, and identity management capabilities. These capabilities encompass the complete Microsoft Office productivity suite, including Exchange Online for messaging infrastructure, SharePoint Online for content management and collaboration, Teams for unified communications, OneDrive for business cloud storage, Microsoft Defender for comprehensive threat protection, Microsoft Intune for mobile and device management, Microsoft Purview for data governance and compliance, and Power BI Pro for business analytics and visualization. These capabilities provide the fundamental infrastructure required for enterprise productivity, data protection, and organizational governance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft 365 Copilot (Wave 3): Advanced AI Integration
&lt;/h3&gt;

&lt;p&gt;The E7 tier includes Microsoft 365 Copilot at the Wave 3 release level, which represents a significant evolution in AI integration across the Microsoft 365 application portfolio. Copilot is embedded across Microsoft Word, Excel, PowerPoint, Outlook, Teams, and the Loop workspace collaboration platform. Beyond traditional copilot assistance functions, Wave 3 introduces expanded agentic capabilities that enable autonomous planning, decision-making, and action execution. Additionally, Wave 3 extends multi-model support to integrate with multiple language model providers, specifically OpenAI and Anthropic Claude, providing organizations with flexibility in selecting the underlying AI model infrastructure based on specific organizational requirements, performance characteristics, or policy constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft Entra Suite: Identity and Access Management
&lt;/h3&gt;

&lt;p&gt;The E7 offering includes the complete Microsoft Entra Suite, an expanded product tier that goes beyond the standard Entra ID P2 offering. The Entra Suite encompasses advanced identity verification, comprehensive access governance frameworks, Zero Trust network access architecture for conditional connectivity, and sophisticated conditional access policy enforcement mechanisms. These capabilities provide an enterprise-grade identity management and access control infrastructure necessary to manage both human and non-human (agent-based) organizational identities within a unified framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent 365: Governance and Control Infrastructure
&lt;/h3&gt;

&lt;p&gt;Agent 365 represents a newly introduced governance and security layer specifically designed to manage autonomous AI agents at an organizational scale. Agent 365 provides centralized inventory tracking across both Microsoft-native and third-party AI agent frameworks; comprehensive observability and monitoring capabilities; policy enforcement mechanisms specific to agent behavior and resource utilization; and lifecycle management functionality, including agent provisioning, update orchestration, and controlled retirement procedures. This component addresses the operational requirement for centralized governance of non-human autonomous entities executing within enterprise systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Economic Analysis and Bundle Composition
&lt;/h3&gt;

&lt;p&gt;When these four components are purchased individually through separate licensing arrangements after July 2026, the aggregate monthly cost per user is projected to be approximately $117 USD. The E7 consolidation bundle is offered at $99 USD per user per month, representing an aggregate cost reduction of approximately 15–17% when compared to the sum of individually purchased components. This pricing structure reflects both the operational efficiency gains from unified licensing administration and Microsoft's strategic intent to incentivize adoption of the consolidated governance framework for agent deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why E7 Exists: E5 Was Built for the Cloud Era, Not the Agentic Era
&lt;/h2&gt;

&lt;p&gt;Microsoft executives have been explicit that E5 was designed "pre‑agentic".&lt;br&gt;&lt;br&gt;
E5 assumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;humans are the primary actors,&lt;/li&gt;
&lt;li&gt;automation is largely scripted,&lt;/li&gt;
&lt;li&gt;identities map cleanly to employees.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern enterprises increasingly violate all three assumptions.  &lt;/p&gt;

&lt;p&gt;AI agents today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;act autonomously,&lt;/li&gt;
&lt;li&gt;access mailboxes, calendars, files, and APIs,&lt;/li&gt;
&lt;li&gt;execute multi‑step workflows over time,&lt;/li&gt;
&lt;li&gt;are often created outside central IT using low‑code or no‑code tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Agent 365: The Missing Control Plane Enterprises Have Been Lacking
&lt;/h2&gt;

&lt;p&gt;Agent 365 is the genuinely new element in E7, and the main reason E7 is more than a repackaged bundle.&lt;br&gt;
Agent 365 provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized agent inventory across Microsoft and third‑party frameworks&lt;/li&gt;
&lt;li&gt;Identity and access controls via Entra&lt;/li&gt;
&lt;li&gt;Security monitoring via Defender XDR&lt;/li&gt;
&lt;li&gt;Compliance and auditability via Purview&lt;/li&gt;
&lt;li&gt;Lifecycle management (provisioning, update, retirement)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, Agent 365 does not build or host agents. It governs them.&lt;br&gt;
Compute and execution remain consumption‑based via Copilot Studio, Azure AI Foundry, or partner platforms.&lt;br&gt;
This mirrors how enterprises already separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application development,&lt;/li&gt;
&lt;li&gt;runtime infrastructure,&lt;/li&gt;
&lt;li&gt;identity and governance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why E7 Is a Good Move for Enterprises Leveraging Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It Normalizes Agents as Enterprise Identities
&lt;/h3&gt;

&lt;p&gt;Microsoft is treating agents as digital workers, subject to the same identity, access, and policy frameworks as humans.&lt;br&gt;
This is a necessary prerequisite for scaling agents beyond experimentation. &lt;/p&gt;

&lt;h3&gt;
  
  
  It Reduces Architectural Fragmentation
&lt;/h3&gt;

&lt;p&gt;Prior to E7, organizations had to stitch together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;E5,&lt;/li&gt;
&lt;li&gt;Copilot add‑ons,&lt;/li&gt;
&lt;li&gt;Entra extensions,&lt;/li&gt;
&lt;li&gt;emerging agent governance tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;E7 consolidates these into a single, coherent enterprise architecture aligned with Zero Trust principles. &lt;/p&gt;

&lt;h3&gt;
  
  
  It Shifts AI from "Assistance" to "Execution".
&lt;/h3&gt;

&lt;p&gt;Wave 3 of Copilot introduces agentic capabilities that plan, act, and execute, not just summarize or draft.&lt;br&gt;
E7 provides the governance layer required to allow that execution safely. &lt;br&gt;
Without E7‑level controls, many organizations would be forced to block these capabilities entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  It Aligns Cost Models with Reality
&lt;/h3&gt;

&lt;p&gt;While $99 per user appears high, E7 reflects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the true cost of enterprise security,&lt;/li&gt;
&lt;li&gt;identity governance for both humans and agents,&lt;/li&gt;
&lt;li&gt;reduced overhead compared to managing multiple SKUs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Importantly, Microsoft is signaling that agents will be licensed like users, potentially with hybrid subscription and consumption models over time. &lt;/p&gt;

&lt;h2&gt;
  
  
  Microsoft 365 Enterprise Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  E5 vs E5 + Copilot vs E7 (Frontier Suite)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Status (March 2026)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft 365 E7 GA: May 1, 2026 &lt;/li&gt;
&lt;li&gt;Pricing shown is list price (USD, per user/month) &lt;/li&gt;
&lt;li&gt;Consumption costs for building/running agents are not included in any plan&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E5&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E5 + Copilot&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E7 (Frontier Suite)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What’s included&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full Microsoft 365 E5 (Office apps, Exchange/SharePoint/OneDrive, Teams*, Defender, Intune, Purview, Power BI Pro). No Copilot, no Agent 365, and no full Entra Suite.&lt;/td&gt;
&lt;td&gt;E5 (as left) &lt;strong&gt;plus&lt;/strong&gt; Microsoft 365 Copilot add‑on. No Agent 365, no full Entra Suite.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;E5 + Copilot + Entra Suite + Agent 365&lt;/strong&gt; in one SKU; positioned as the “Frontier Suite” for agent‑at‑scale scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;List price (USD/user/month)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$60&lt;/strong&gt; (from July 1, 2026).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$90&lt;/strong&gt; (E5 $60 + Copilot $30).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$99&lt;/strong&gt; (bundle). With Teams‑excluded option reported at &lt;strong&gt;$90.45&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot (Wave 3) agentic capabilities&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not included.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included&lt;/strong&gt; (via add‑on). Multi‑model (OpenAI + Anthropic) support arrives with Wave 3.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included by default&lt;/strong&gt; with Wave 3 agentic features (planning, acting across Microsoft 365).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent 365 (agent governance &amp;amp; control plane)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not included.&lt;/td&gt;
&lt;td&gt;Not included by default (can be added at &lt;strong&gt;$15&lt;/strong&gt;/user/month).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included&lt;/strong&gt;; GA on &lt;strong&gt;May 1, 2026&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Entra Suite&lt;/strong&gt; (beyond Entra ID P2)&lt;/td&gt;
&lt;td&gt;Not included (E5 includes Entra ID P2 but not the broader Entra Suite).&lt;/td&gt;
&lt;td&gt;Not included.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included&lt;/strong&gt; (e.g., Private Access, Internet Access, ID Governance/Protection, Verified ID).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security &amp;amp; compliance posture for agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Human‑centric controls only; no unified agent inventory/observability.&lt;/td&gt;
&lt;td&gt;Adds creation/use of Copilot agents but &lt;strong&gt;without&lt;/strong&gt; centralized agent governance unless Agent 365 is added.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Unified&lt;/strong&gt; agent inventory, policy enforcement, auditability across Defender/Entra/Purview.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bundle economics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Baseline plan.&lt;/td&gt;
&lt;td&gt;A la carte add‑on model. &lt;strong&gt;E5 + Copilot&lt;/strong&gt; = ~$90. Adding &lt;strong&gt;Entra Suite&lt;/strong&gt; (+$12) and &lt;strong&gt;Agent 365&lt;/strong&gt; (+$15) pushes to &lt;strong&gt;~$117&lt;/strong&gt;.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~$99&lt;/strong&gt; vs &lt;strong&gt;~$117&lt;/strong&gt; à la carte → &lt;strong&gt;~15–17% discount&lt;/strong&gt;; simpler procurement/governance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Availability date&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available now.&lt;/td&gt;
&lt;td&gt;Available now (Copilot GA prior to E7; Wave 3 rolling out).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;GA May 1, 2026&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Who it fits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Organizations prioritizing core productivity/security without near‑term agent scale‑out.&lt;/td&gt;
&lt;td&gt;Teams piloting Copilot or limited agentic use cases, willing to bolt on governance later.&lt;/td&gt;
&lt;td&gt;Enterprises &lt;strong&gt;standardizing&lt;/strong&gt; on agents (cross‑department), requiring &lt;strong&gt;identity‑first&lt;/strong&gt; governance, zero‑trust access, and consolidated risk controls.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Teams availability depends on regional licensing rules; E7 is also offered as a *&lt;/em&gt;“without Teams”** SKU in some regions.*&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent execution costs&lt;/strong&gt; (LLM tokens, orchestration runtime, long‑running workflows) are &lt;strong&gt;not included&lt;/strong&gt; in any license and must be budgeted separately.&lt;/li&gt;
&lt;li&gt;E7 is the &lt;strong&gt;first Microsoft 365 SKU designed explicitly for the agentic AI era&lt;/strong&gt;, not just productivity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Important Caveat: E7 Is Not the Full Cost of an Agentic Enterprise
&lt;/h2&gt;

&lt;p&gt;E7 does not include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent execution compute,&lt;/li&gt;
&lt;li&gt;LLM consumption,&lt;/li&gt;
&lt;li&gt;orchestration runtime costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These remain variable and are billed separately via Copilot Studio, Azure AI Foundry, or partner services.&lt;/p&gt;

&lt;p&gt;Enterprises should therefore view E7 as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the governance and control foundation, not the entire AI budget.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion: E7 as an Architectural Statement
&lt;/h2&gt;

&lt;p&gt;Microsoft 365 E7 represents a significant shift in how enterprises conceptualize their licensing strategy and operational architecture. Rather than functioning primarily as a licensing vehicle, E7 serves as a declaration that Microsoft 365 is positioned as the foundational operating system for enterprises that intend to operate within an agentic computing paradigm.&lt;/p&gt;

&lt;p&gt;For organizations that anticipate the need to execute a comprehensive deployment strategy involving autonomous AI agents across multiple business functions, integrate these agents into mission-critical business processes, and maintain the requisite levels of security governance, compliance attestation, and auditability, the scope and depth of capabilities provided by E7 are not superfluous. Instead, these capabilities represent structural and architectural necessities that must be addressed to enable safe and controlled agent deployment at an organizational scale.&lt;/p&gt;

&lt;p&gt;The evolution from E5 to E7 reflects a fundamental recalibration of enterprise platform design philosophy. The E5 licensing tier was optimized and engineered for the cloud-centric era, where enterprises sought to modernize their infrastructure, data management, and collaboration mechanisms through cloud-native services. E7, by contrast, is optimized for an organizational context in which the workforce composition includes both human workers and autonomous AI agents, each requiring appropriate identity governance, access controls, security monitoring, and compliance instrumentation within an integrated control plane.&lt;/p&gt;

&lt;p&gt;This architectural shift acknowledges that managing agents as first-class organizational entities—rather than as peripheral or experimental capabilities—requires the same level of systematic governance, policy enforcement, and observability that enterprises have come to expect from their core identity and security infrastructure.&lt;/p&gt;

</description>
      <category>microsoft365</category>
    </item>
    <item>
      <title>Introducing MATE: A Modular Testing Environment for AI Agents</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 07 Mar 2026 08:07:02 +0000</pubDate>
      <link>https://forem.com/holgerimbery/introducing-mate-a-modular-testing-environment-for-ai-agents-576l</link>
      <guid>https://forem.com/holgerimbery/introducing-mate-a-modular-testing-environment-for-ai-agents-576l</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
As AI agents become integral to business processes, reliable and repeatable testing is essential for confidence in deployment. This article introduces the &lt;strong&gt;Multi-Agent Test Environment (MATE)&lt;/strong&gt; – an enterprise-grade framework for automated testing of AI agents across platforms and frameworks – and explains how its modular design addresses key challenges in agent testing. We explore why testing AI agents is critical, delve into MATE's architecture and features, compare MATE with alternative testing approaches, and outline MATE's roadmap including red-team testing, enhanced cloud deployment, and support for emerging agent frameworks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Importance of Testing AI Agents
&lt;/h2&gt;

&lt;p&gt;AI &lt;strong&gt;agents&lt;/strong&gt; built with Microsoft Copilot Studio are powerful but complex systems. They combine &lt;strong&gt;natural language understanding, generative AI, and business logic&lt;/strong&gt;, often operating in critical scenarios (customer support, data retrieval, workflow automation, etc.). Ensuring these agents work correctly and safely under diverse conditions is as important as testing traditional software – if not more so. Key reasons why rigorous agent testing is essential include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reliability and Consistency:&lt;/strong&gt; Unlike deterministic software, AI agents can produce different answers to the same question due to their probabilistic nature. Without structured tests, one might only catch issues by &lt;strong&gt;manually typing questions and hoping for the right answer&lt;/strong&gt;, a fragile approach. Automated testing provides consistency – the same test can be run repeatedly to ensure the agent’s behavior remains reliable after updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-Grade Quality:&lt;/strong&gt; In enterprise deployments, an untested agent can lead to incorrect or even &lt;strong&gt;unsafe outputs&lt;/strong&gt;, damaging user trust or violating compliance. Ad-hoc testing that “relies on intuition instead of structured testing” &lt;strong&gt;doesn’t scale&lt;/strong&gt; for enterprise needs. Organizations require &lt;strong&gt;repeatable, at-scale test processes&lt;/strong&gt; to validate that agents meet quality standards (accuracy, relevance, safety) consistently before and after release.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex Multi-turn Interactions:&lt;/strong&gt; Copilot Studio agents often handle &lt;strong&gt;multi-turn conversations&lt;/strong&gt;, maintaining context across multiple user and agent turns. Testing these multi-step dialogues manually is time-consuming and error-prone. Automated test suites allow developers to simulate complex conversation flows (with varying user inputs, branching dialogs, tool invocations, etc.) and verify the end-to-end behavior in one run. This ensures that the agent can handle &lt;strong&gt;scenario-based conversations&lt;/strong&gt; robustly, from greeting to task completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nondeterministic and Generative Responses:&lt;/strong&gt; When agents use &lt;strong&gt;generative AI capabilities&lt;/strong&gt;, they might produce creative or unexpected phrasing. Verifying such responses is not as simple as exact string matching. Effective testing must evaluate responses on &lt;strong&gt;semantic correctness, completeness, and compliance&lt;/strong&gt;, even if wording varies. This introduces a challenge: how do you automatically judge an AI-generated answer’s quality? We’ll see how MATE tackles this with an &lt;strong&gt;AI-based “judge”&lt;/strong&gt; component.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequent Updates and Continuous Integration:&lt;/strong&gt; Agents are rarely static – their underlying &lt;strong&gt;prompts, skills, and knowledge sources&lt;/strong&gt; evolve. Without automation, re-testing the agent after each change or on a schedule (for example, to catch drift or regressions) would be prohibitively labor-intensive. A good agent testing framework enables &lt;strong&gt;continuous integration (CI)&lt;/strong&gt; pipelines and nightly runs, so that any breaking change or quality degradation is caught early. This is crucial for &lt;strong&gt;scaling up&lt;/strong&gt; the number of agents in production while keeping maintenance overhead low.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency and Debugging:&lt;/strong&gt; When a test fails, developers need insights into &lt;em&gt;why&lt;/em&gt;. For example, did the agent retrieve the wrong data because of an intent misclassification? Or did it produce a partially correct answer that was marked as a failure due to a strict check? Good testing tools provide &lt;strong&gt;detailed reporting&lt;/strong&gt; – conversation transcripts, logs, and metrics – to help pinpoint the root cause of failures. This accelerates debugging and &lt;strong&gt;continuous improvement&lt;/strong&gt; of the agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In summary, robust testing of agents is the linchpin for &lt;strong&gt;trustworthy AI deployments&lt;/strong&gt;. It allows teams to &lt;strong&gt;validate functionality, accuracy, robustness, and safety&lt;/strong&gt; in a systematic way. This need has driven Microsoft to introduce solutions like the &lt;strong&gt;Power CAT Copilot Studio Kit&lt;/strong&gt; (a Power Platform solution for agent testing) and, more recently, the &lt;strong&gt;built-in Agent Evaluation&lt;/strong&gt; feature in Copilot Studio (now in preview). However, each solution comes with certain limitations or prerequisites, which the new &lt;strong&gt;MATE&lt;/strong&gt; aims to overcome. Before comparing approaches, let’s first introduce MATE and how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing MATE: A Modular Testing Framework for AI Agents
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Link to &lt;a href="https://github.com/holgerimbery/mate" rel="noopener noreferrer"&gt;MATE GitHub Repository&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Link to &lt;a href="https://github.com/holgerimbery/mate/wiki" rel="noopener noreferrer"&gt;MATE Wiki&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Multi-Agent Test Environment (MATE)&lt;/strong&gt; is an internal project and framework designed to provide &lt;strong&gt;automated, comprehensive testing&lt;/strong&gt; for AI agents, initially focusing on Microsoft Copilot Studio agents. MATE was created to address the challenges above by combining &lt;strong&gt;enterprise-grade tooling with a modular, extensible architecture&lt;/strong&gt;. In essence, MATE allows developers and testers to &lt;strong&gt;connect to a running Copilot Studio agent, simulate conversations, evaluate the agent’s responses against expected outcomes using AI, and produce detailed metrics and reports&lt;/strong&gt; – all in an automated fashion.&lt;/p&gt;



&lt;p&gt;MATE’s approach can be seen as bringing many of the benefits of the Copilot Studio Kit into a &lt;strong&gt;single, code-first testing environment&lt;/strong&gt;. Rather than a Power App solution, MATE is a &lt;strong&gt;pure .NET 9&lt;/strong&gt; application  that you can run in a container stack. This design choice means MATE operates outside the constraints of the Power Platform, giving developers more flexibility in how and where they run their tests.&lt;/p&gt;

&lt;p&gt;Let’s break down how MATE works and how it addresses key testing challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct Line Integration (Live Agent Testing):&lt;/strong&gt; MATE connects to the agent through its Direct Line API endpoint. This is the same interface used by real chat channels (like Teams or a custom web chat). By using Direct Line, MATE ensures it's testing the &lt;em&gt;deployed agent exactly as end-users experience it&lt;/em&gt;. The tool can send a sequence of user messages and receive the agent’s replies in turn, thereby automating full &lt;strong&gt;multi-turn conversations&lt;/strong&gt;. This addresses the challenge of multi-turn flows by allowing complex scenario scripts to be executed automatically. It’s effectively like an automated “test chat” but running dozens or hundreds of predefined conversations unattended.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Case Definition and Multi-turn Flows:&lt;/strong&gt; In MATE, you can define a &lt;strong&gt;test case&lt;/strong&gt; with multiple steps of user input (representing a conversation) and the expected outcomes. Expected outcomes can include:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expected Intent&lt;/strong&gt; and &lt;strong&gt;Entities&lt;/strong&gt; – i.e., which topic or action the agent should trigger and which key data (entities) it should extract.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance Criteria&lt;/strong&gt; – specific conditions that constitute a pass/fail for the test (for example, certain keywords must appear in the answer, or a certain API call must be made).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference Answer&lt;/strong&gt; – an ideal answer text or outline for comparison.
Each test case can be labeled with a priority or category, useful for organizing large test suites (e.g., “P1 critical flows”, “Edge cases”, etc.). By supporting multi-step conversations in test cases, MATE ensures you can test end-to-end agent behavior, not just isolated single-turn Q&amp;amp;A.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;“Model-as-a-Judge” Evaluations:&lt;/strong&gt; One of MATE’s most powerful features is using an AI model to evaluate the quality of the agent’s response. Rather than relying only on hard-coded checks (exact matches or simple contained keywords), MATE sends the agent’s answer along with the reference answer and validation criteria to a &lt;strong&gt;Large Language Model (LLM)&lt;/strong&gt; – for instance, an &lt;strong&gt;Azure OpenAI GPT-4 model&lt;/strong&gt; – which acts as an impartial &lt;em&gt;judge&lt;/em&gt;. This &lt;strong&gt;AI Judge&lt;/strong&gt; scores the response across multiple &lt;strong&gt;evaluation dimensions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Task Success:&lt;/em&gt; Did the agent fulfill the user’s request or solve the user’s problem?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Intent Match:&lt;/em&gt; Did the agent correctly understand what the user was asking for?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Factuality:&lt;/em&gt; Is the information provided true and accurate (no hallucinations or incorrect data)?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Helpfulness/Completeness:&lt;/em&gt; Is the answer complete, well-structured, and does it address the user’s need effectively?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Safety/Compliance:&lt;/em&gt; Does the response avoid policy violations (no sensitive data exposure, no disallowed content)?
Each of these dimensions is scored (e.g. 0.0 to 1.0), and MATE can apply &lt;strong&gt;configurable weightings&lt;/strong&gt; to decide if a test passes or fails overall. For example, you may require 0.9+ on Task Success and Intent Match, tolerate a lower score on style metrics like Helpfulness, and demand a perfect score on Safety. This approach directly tackles the challenge of evaluating &lt;em&gt;nondeterministic generative answers&lt;/em&gt;: even if the agent’s wording differs from the expected answer, the AI Judge can still determine that the answer is essentially correct and useful. Conversely, if the agent’s response is irrelevant or contains errors, the AI Judge will assign low scores, causing the test to fail. This method provides a nuanced, context-aware evaluation that traditional automated tests struggle to achieve. (Internally, the AI Judge uses prompt-based prompting of an LLM with the expected answer or criteria to get these scores.)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;In Addition, MATE also supports other judge types, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;RubricsJudge&lt;/strong&gt; – A fully deterministic judge that evaluates responses using explicit rules such as Contains, NotContains, and Regex, making it ideal for compliance, safety, and reproducible pass/fail checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HybridJudge&lt;/strong&gt; – A cost‑efficient combination judge that first gates responses with deterministic rubrics and then applies an LLM for deeper qualitative scoring only where needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CopilotStudioJudge&lt;/strong&gt; – A Copilot‑Studio‑specific LLM judge that is citation‑ and grounding‑aware, aligning evaluations with Copilot Studio’s default reasoning and response patterns:&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GenericJudge&lt;/strong&gt; – A lightweight, zero‑cost judge based on simple keyword and regex matching, intended for fast smoke tests and offline or CI scenarios
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Automated Test Generation from Documentation:&lt;/strong&gt; Authoring a comprehensive set of test cases can be labor-intensive. MATE addresses this by allowing you to &lt;strong&gt;upload documents (PDFs or text files)&lt;/strong&gt; that are relevant to your agent’s domain or knowledge base. It then automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extracts&lt;/strong&gt; text content from the documents (using a PDF parser).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexes&lt;/strong&gt; and &lt;strong&gt;chunks&lt;/strong&gt; the content for semantic analysis (using a Lucene-based index).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates&lt;/strong&gt; potential questions and answers from the content using an LLM.
The outcome is a set of suggested Q&amp;amp;A pairs or even multi-turn conversation scenarios derived from the documentation. For example, if you upload a product FAQ PDF, MATE can generate likely customer questions and the correct answers from that PDF. These can be reviewed and added to your test suites. This feature helps broaden test coverage &lt;em&gt;automatically&lt;/em&gt;, ensuring the agent is tested on real knowledge it’s supposed to have, and catching gaps where it might not respond correctly. It’s an intelligent way to keep tests in sync with content. (Notably, Copilot Studio Kit in the Power Platform also introduced an AI-based test generation in preview, which uses the agent’s topics and knowledge to generate example questions. MATE provides a similar capability but on external docs and with full control of the generated cases.)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Detailed Reporting and Analysis:&lt;/strong&gt; After executing tests, MATE provides rich &lt;strong&gt;metrics and logs&lt;/strong&gt;. In the &lt;strong&gt;Web Dashboard&lt;/strong&gt;, you can see overall pass rates, success trends over time, and drill down into individual test runs. Each test run retains the &lt;strong&gt;transcript of the conversation&lt;/strong&gt; and the scores for each evaluation dimension, so you can inspect exactly where a particular test failed. This addresses transparency: instead of just “Test 5 failed”, you can see that it failed because, say, &lt;em&gt;Factuality&lt;/em&gt; scored low (perhaps the agent gave a wrong detail), and even read the conversation to diagnose the issue. MATE’s &lt;strong&gt;Runs&lt;/strong&gt; view lets you compare results between runs – useful for spotting regressions after an update. All test data (test cases, results, transcripts, etc.) are stored in a local &lt;strong&gt;PostgreSQL database&lt;/strong&gt; for quick retrieval and can be queried or exported for additional analysis.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Web UI and CLI for Different Use Cases:&lt;/strong&gt; MATE offers two interfaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Web Application&lt;/strong&gt; (built with ASP.NET Blazor Server) for an interactive experience. This is ideal for exploratory testing, configuring your test suites, and reviewing results. The UI includes a setup wizard for initial configuration (entering your agent’s Direct Line credentials and your AI model info) to generate the necessary settings file. Testers can use the web UI to kick off test runs on-demand, monitor progress, and view results in real time.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Command-Line Interface (CLI)&lt;/strong&gt; tool for automation. The CLI allows you to run tests as part of scripts or pipelines. For example, you can incorporate &lt;code&gt;dotnet run --suite "Regression Suite"&lt;/code&gt; into a DevOps or GitHub Actions pipeline, so that whenever the agent’s bot is updated or its content changes, the test suite runs and verifies everything still works. The CLI returns an exit code indicating success or failure (0 if all tests passed, non-zero if any test failed), which CI systems can use to pass/fail a build. This enables true &lt;strong&gt;CI/CD for AI agents&lt;/strong&gt; – a failed test can halt a deployment, preventing flawed agent versions from going live.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Containerized and Extensible Architecture:&lt;/strong&gt; MATE is designed to be run in a self-hosted manner, giving teams full control. It doesn’t require a SaaS backend or a Dataverse environment – you just need a machine that can reach the internet for calling the agent service and the AI model endpoint. This avoids many of the &lt;strong&gt;Power Platform licensing constraints&lt;/strong&gt; associated with the Copilot Studio Kit (discussed later). The architecture is modular by design, with separate components (projects) for domain logic, data storage, core services, web UI, and CLI. This modularity not only enforces clean separation of concerns, but also sets the stage for supporting &lt;strong&gt;multiple types of agents in the future&lt;/strong&gt;. In fact, MATE’s roadmap includes extending support to other agent frameworks beyond Copilot Studio – the core logic (test execution, AI judging, etc.) can be adapted to different agent APIs by swapping out the integration layer. Early code commits already hint at multi-agent support being developed and even an upcoming &lt;strong&gt;“red teaming” module for adversarial testing&lt;/strong&gt; (there are structural hooks in the codebase for this, though the feature is not yet implemented). This means MATE is not a one-off tool but a growing platform for comprehensive AI agent testing across the board.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  MATE Architecture at a Glance
&lt;/h3&gt;

&lt;p&gt;Internally, MATE is built with a modern software architecture using the latest Microsoft stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;.NET 9&lt;/strong&gt; with C# – providing performance and cross-platform support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ASP.NET Core Blazor Server&lt;/strong&gt; for the web front-end – delivering a rich interactive UI for managing tests and viewing results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity Framework Core (with PostgreSQL)&lt;/strong&gt; – for the local database that stores test cases, results, transcripts, etc., ensuring persistence without requiring an external DB server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure AI OpenAI SDK&lt;/strong&gt; – to connect to the AI Judge model hosted on Azure’s AI services (Azure OpenAI “Foundry”). This is how MATE queries an LLM for evaluation of answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lucene.NET&lt;/strong&gt; – used for full-text indexing in the document-driven test generation feature, to find relevant content in uploaded docs for question generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDF processing libraries&lt;/strong&gt; (e.g., UglyToad.PdfPig) – to extract text from PDF documents for test generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serilog&lt;/strong&gt; – for structured logging of events and errors, helping with diagnosing issues in test executions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution is divided into several components (projects) reflecting a &lt;strong&gt;modular design&lt;/strong&gt;: a &lt;strong&gt;Domain&lt;/strong&gt; layer for core models and interfaces, a &lt;strong&gt;Data&lt;/strong&gt; layer for database access, a &lt;strong&gt;Core&lt;/strong&gt; services layer (implementing the judge logic, execution engine, etc.), a &lt;strong&gt;WebUI&lt;/strong&gt; for the front-end, and a &lt;strong&gt;CLI&lt;/strong&gt; project for the command-line interface. This modularity makes it easier to maintain and extend specific parts (for example, adding a new agent connector could be done by introducing a new service in the Core or a new API integration, without touching the UI or data layers).&lt;/p&gt;

&lt;p&gt;Overall, MATE is engineered to be a &lt;strong&gt;scalable, extensible test harness&lt;/strong&gt; for AI agents. It’s currently focused on Copilot Studio agents, but its principles apply broadly. Next, we’ll compare MATE to the Copilot Studio Kit – the established testing solution from Microsoft’s Power CAT team – to understand their differences and use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing MATE with Copilot Studio Kit
&lt;/h2&gt;

&lt;p&gt;Microsoft’s &lt;strong&gt;Power CAT Copilot Studio Kit&lt;/strong&gt; is an existing solution aimed at testing and managing Copilot Studio agents. It’s a &lt;strong&gt;Power Platform&lt;/strong&gt; solution (managed package) that provides a canvas app or model-driven app interface, along with Dataverse entities and Power Automate flows, enabling test case creation, automated test runs via the Direct Line API, and analytics (such as conversation transcripts, dashboards, etc.). The Copilot Studio Kit was instrumental in early adoption of agent testing – it allowed makers to do things like bulk import test cases (via Excel), run them from a UI, and even integrate with Azure DevOps pipelines via Power Platform build tools.&lt;/p&gt;

&lt;p&gt;However, the &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; has some inherent characteristics stemming from its Power Platform foundation. Below is a comparison of &lt;strong&gt;MATE vs. Copilot Studio Kit&lt;/strong&gt; across key dimensions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;MATE (Multi-Agent Test Environment)&lt;/th&gt;
&lt;th&gt;Copilot Studio Kit (Power CAT)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;.NET application (containarized). Runs locally or in cloud; launched via web UI or CLI on demand.&lt;/td&gt;
&lt;td&gt;Power Platform managed solution. Deployed to a Dataverse environment; accessed via Power Apps interface.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Licensing &amp;amp; Costs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;source-available - CC BY-NC 4.0 . Requires .NET runtime and an Azure OpenAI endpoint (for AI Judge) which may incur usage costs. No special Power Platform licensing needed beyond having a Copilot Studio agent to test.&lt;/td&gt;
&lt;td&gt;Provided by Microsoft Power CAT as a sample solution (available on GitHub). However, requires Power Platform &lt;strong&gt;premium licenses&lt;/strong&gt;: a Dataverse environment, and for certain features, &lt;strong&gt;AI Builder credits&lt;/strong&gt; (for generative answer analysis). Usage of Dataverse and Power Automate in the kit might consume capacity or require specific licenses.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Technology Stack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Modern .NET 9 stack; Blazor Web UI, CLI tool, local PostgreSQL DB. Integrates with Azure services (OpenAI) for evaluation. Highly customizable and extendable by developers (source code available).&lt;/td&gt;
&lt;td&gt;Low-code Power App + Dataverse. Relies on standard Power Platform tech (model-driven app or canvas app, Dataverse tables, Power Automate flows, AI Builder for some AI tasks). Customization is limited to what Power Platform allows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Creation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports manual creation of test cases via UI or by defining JSON/CSV, etc., and &lt;strong&gt;auto-generation&lt;/strong&gt; of test cases from documents using LLMs. Test cases can include multi-turn dialogues in one case. Organized into test suites for batch execution.&lt;/td&gt;
&lt;td&gt;Supports manual test case input (through the app or via Excel import/export). Also supports multi-turn test cases and offers some &lt;strong&gt;AI-assisted generation&lt;/strong&gt; of test questions from agent topics/knowledge (in Preview, via the Agent Evaluation integration). Test cases stored in Dataverse; grouping of tests supported (by test set).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs tests externally by connecting to the agent’s Direct Line channel (or Web Channel with secret &amp;amp; bot ID). Offers a &lt;strong&gt;CLI&lt;/strong&gt; for headless execution (suitable for CI pipelines) and a web interface for interactive runs. Test results are stored locally and displayed in the web UI with analytics.&lt;/td&gt;
&lt;td&gt;Executes tests through Copilot Studio’s Direct Line API as well, orchestrated by Power Automate flows under the hood. Typically run on-demand from the app’s interface. There is integration for pipelines via Power Platform build tools, though this is more complex to set up. Results are stored in Dataverse and can be viewed via in-app dashboards or exported.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation Methodology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;AI-driven semantic evaluation&lt;/strong&gt;: uses a GPT-based &lt;strong&gt;AI Judge&lt;/strong&gt; to score responses on multiple quality dimensions (task success, intent match, factual correctness, etc.). This allows flexible, semantic comparisons rather than simple exact matches. Configurable pass thresholds provide fine-grained control. Also supports explicit pass/fail rules (acceptance criteria) where needed.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Rule-based and some AI&lt;/strong&gt;: supports exact or partial &lt;strong&gt;response matching&lt;/strong&gt;, checking for expected keywords or presence of attachments, etc. For &lt;strong&gt;generative answers&lt;/strong&gt;, the kit uses &lt;strong&gt;AI Builder&lt;/strong&gt; to compare the agent’s answer with a reference answer for similarity. It also retrieves telemetry from &lt;strong&gt;Application Insights&lt;/strong&gt; to help explain failures. Plan validation tests examine the agent’s action plan against expected tools (for orchestration scenarios).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Modularity &amp;amp; Extensibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Designed to be &lt;strong&gt;modular and extensible&lt;/strong&gt;. The core can be extended to new agent types (plans to support other AI agent frameworks are in progress). The evaluation component (AI Judge) can be pointed to different models or adapted with different prompts. Being source-available, organizations can modify or extend MATE (e.g., add custom evaluation metrics, integrate with other data sources) to fit their needs.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Focused scope, limited extensibility.&lt;/strong&gt; The Copilot Studio Kit is specific to Copilot Studio agents and deeply tied to the Power Platform environment structure. It’s not architected to test arbitrary other agents. Customizing it generally means modifying the Power App or creating new Dataverse fields/flows, which requires Power Platform expertise.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data &amp;amp; Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stores test artifacts and results in a local database  within the application. No cloud infrastructure needed to get started; data stays within the user’s environment. For scaling up, the application could be hosted on a server or in Kubernetes (containerization support is under development). Because it’s self-hosted, data sovereignty and privacy can be managed internally.&lt;/td&gt;
&lt;td&gt;Relies on &lt;strong&gt;Dataverse&lt;/strong&gt; for storing tests and results, and optionally uses other services (App Insights, SharePoint, etc.) for logs and knowledge management. This provides seamless integration if you’re already within the Microsoft ecosystem, but it requires that all data be in the Power Platform cloud.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Table:&lt;/strong&gt; Comparison of MATE vs. Copilot Studio Kit across key aspects of testing functionality and usage.&lt;/p&gt;

&lt;p&gt;As seen above, &lt;strong&gt;MATE and Copilot Studio Kit share the same goal – improving agent quality through automated testing – but they differ in implementation approach&lt;/strong&gt;. MATE is more developer-oriented, offering flexibility, openness, and extensibility, whereas the Copilot Studio Kit is maker-friendly, integrated in the Power Platform with a ready-to-use interface but comes with platform constraints.&lt;/p&gt;

&lt;p&gt;From a Microsoft perspective, the &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; was a bridge solution to empower agent creators with testing capabilities before deeper platform features were available. Now, with &lt;strong&gt;Agent Evaluation built directly into Copilot Studio&lt;/strong&gt; (currently in preview), some capabilities of the kit are being absorbed into the product itself – for instance, AI-generated test queries and built-in execution of test sets. Still, the Kit provides additional tooling (like dashboards, inventory, governance features) that are useful in complex environments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;see articles:&lt;br&gt;&lt;br&gt;
 &lt;a href="https://preview.holgerimbery.blog/ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit" rel="noopener noreferrer"&gt;Ship Copilot Studio Agents with Confidence: Master Automated Testing with the Copilot Studio Kit&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://preview.holgerimbery.blog/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview" rel="noopener noreferrer"&gt;Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;MATE, on the other hand, is an &lt;strong&gt;independent effort to provide a robust testing harness&lt;/strong&gt; that can evolve fast and go beyond what the closed-source product features offer. It is not limited by Power Platform’s boundaries (for example, one could imagine integrating MATE with other LLM evaluation criteria, or hooking it up to monitor backend APIs invoked by the agent). Additionally, MATE’s modular nature means it could incorporate &lt;strong&gt;other agent types&lt;/strong&gt; into the same testing dashboard. For example, if you have a fleet of different AI bots – some built in Copilot Studio, some using Azure OpenAI Orchestration, some third-party – MATE could theoretically be extended to test them all in one place, whereas the Copilot Studio Kit is only for Copilot Studio agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use which?&lt;/strong&gt; If you are a &lt;strong&gt;Power Platform maker or IT admin&lt;/strong&gt; who wants a straightforward, supported way to test Copilot Studio agents and you’re already comfortable with Power Apps and Dataverse, the &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; is a solid choice. It integrates nicely with the environment (and your data, logs, etc.) and doesn’t require coding to use. However, you’ll need the necessary licenses and some patience to configure the environment, and you won’t be able to easily customize how tests are evaluated beyond what Microsoft provides.&lt;/p&gt;

&lt;p&gt;If you are a &lt;strong&gt;developer or dev team&lt;/strong&gt; looking for a more flexible, code-driven approach – especially if you want to integrate agent testing into a DevOps pipeline or extend testing to specialized scenarios – &lt;strong&gt;MATE&lt;/strong&gt; is very appealing. It does require .NET and some setup, but it gives you &lt;strong&gt;full control&lt;/strong&gt;. You can run it locally for rapid iteration, include it in automated builds, and tweak it to your needs. There’s also no dependency on having a Power Platform environment or any particular license. You do need access to an &lt;strong&gt;Azure OpenAI service&lt;/strong&gt; (or you could swap in another LLM API if desired) to leverage the AI judge, but that is relatively straightforward for most enterprise scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap and Future Enhancements in MATE
&lt;/h2&gt;

&lt;p&gt;MATE is an evolving project, and there are several notable enhancements on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Support for Additional Agent Types:&lt;/strong&gt; As of now, MATE &lt;strong&gt;supports testing Microsoft Copilot Studio agents exclusively&lt;/strong&gt;, because it specifically uses Copilot’s Direct Line API and related assumptions (like the concept of “topics” and Dataverse knowledge base) in its current version. However, the architecture is being extended to accommodate other agent platforms. Future versions are expected to introduce modules for &lt;strong&gt;other agent types&lt;/strong&gt; – for example, the ability to test &lt;strong&gt;Microsoft Agent Framework agents&lt;/strong&gt;, or even agents built with entirely different frameworks. This will broaden MATE’s applicability across various “agentic AI” solutions used within Microsoft and beyond, making it a one-stop testing hub for heterogeneous AI systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated Red Teaming:&lt;/strong&gt; In addition to “blue team” style functional testing (checking that the agent does what it’s supposed to), MATE aims to incorporate &lt;strong&gt;“red teaming”&lt;/strong&gt; capabilities. Red teaming in AI refers to attacking or stress-testing the agent with malicious or unexpected inputs to probe its defenses and safety measures. This can include testing the agent’s response to prompt injections, inappropriate content requests, or attempts to trick the agent into breaking rules. The goal is to ensure the agent is robust against misuse or adversarial users. The MATE codebase already contains the &lt;strong&gt;foundation for a Red Teaming module&lt;/strong&gt;, but this is currently just a skeleton (non-functional in the current release). Once completed, this feature will allow users to run a suite of adversarial tests (perhaps using predefined malicious prompts or common attack patterns) against their agents and get a report on vulnerabilities or policy compliance issues. This is a critical part of AI system testing, especially for enterprise scenarios, and its inclusion will differentiate MATE further by offering a more &lt;strong&gt;comprehensive safety evaluation&lt;/strong&gt; than what is currently possible with the Copilot Studio Kit or built-in Agent Evaluation (which, so far, focus on correctness and performance rather than adversarial robustness).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud and Scalable Deployment:&lt;/strong&gt; Presently, MATE runs as a local docker stack. Looking forward, the project plans to simplify &lt;strong&gt;deployment on Azure&lt;/strong&gt;, likely via containerization. Kubernetes support is on the roadmap, meaning you might be able to deploy MATE as a set of containers (web app, background worker, etc.) in an AKS (Azure Kubernetes Service) or similar environment. This will enable &lt;strong&gt;team-wide usage at scale&lt;/strong&gt; – multiple testers or developers could share a MATE instance, run tests concurrently, and store results in a central location, much like a web application service. Cloud deployment will also facilitate integration with other services (for example, connecting to Azure DevOps for automatic test triggers, or scaling out the AI Judge component). &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI/UX and Usability Improvements:&lt;/strong&gt; As an source-available project, MATE will continue to refine its user interface and ease of use. Features on the horizon could include richer test editing experiences (perhaps a visual conversation flow editor), more analytics dashboards (trend of agent performance over time, flakiness of certain tests, etc.), and integration with agent design tools (for example, pulling in agent topics or suggesting tests based on recent real user conversations – aligning with how built-in Agent Evaluation reuses Test Pane interactions).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Testing AI agents is no longer optional – it’s a necessity for any organization that wants to &lt;strong&gt;confidently deploy AI solutions&lt;/strong&gt;. Agent Development Solutions empower the creation of sophisticated AI Agents, but ensuring these agents function correctly, safely, and efficiently requires going beyond manual testing or one-off trials. This is where testing frameworks like Copilot Studio Kit and MATE come into play.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MATE (Multi-Agent Test Environment)&lt;/strong&gt; represents a next-generation approach to agent testing. It addresses the limitations of earlier tools by adopting a &lt;strong&gt;fully modular, code-first architecture&lt;/strong&gt; that can keep pace with the rapidly changing AI landscape. By using MATE, developers and testers gain the ability to thoroughly &lt;strong&gt;automate conversations with their agents, evaluate responses with the help of AI, generate tests from existing knowledge, and integrate all this into continuous delivery pipelines&lt;/strong&gt;. The outcome is a higher degree of assurance that your Copilot Studio agent will perform as expected when it’s in production – responding correctly to user queries, using the right tools, and staying within the guardrails.&lt;/p&gt;

&lt;p&gt;In comparison to the Power Platform-based Copilot Studio Kit, MATE offers more &lt;strong&gt;flexibility, extensibility, and independence&lt;/strong&gt;. You won’t be constrained by specific licensing or environment setups, and you can tailor the tool to your needs. On the other hand, it’s a more technical solution that may require developer effort to set up and maintain, whereas the Copilot Studio Kit is more turn-key if you’re already within Microsoft’s ecosystem. It’s encouraging to see both approaches available, as they cater to different audiences.&lt;/p&gt;

&lt;p&gt;Ultimately, MATE’s importance goes beyond just testing Copilot Studio agents. It signifies an evolving philosophy in the AI agent world: that &lt;strong&gt;testing and evaluation should be first-class citizens in the development lifecycle of AI systems&lt;/strong&gt;, just as they are in traditional software development. With AI models and agents becoming increasingly central to applications, tools like MATE help ensure we can trust these systems through systematic validation. MATE’s deep integration of AI for testing (using an AI to test another AI) is an innovative approach that can significantly enhance the rigor of evaluations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In summary, MATE enables teams to ship Copilot Studio agents (and, in the future, other AI agents) with greater confidence&lt;/strong&gt;. It provides the means to catch issues early, improve agents iteratively based on test feedback, and guard against regressions as agents evolve. By combining the power of automation with the wisdom of AI judging, MATE exemplifies a “test smarter” strategy for the era of generative AI – ensuring that our intelligent agents are not only smart, but also reliable, safe, and effective when they go to work for us.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>testing</category>
    </item>
    <item>
      <title>Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 14 Feb 2026 07:09:12 +0000</pubDate>
      <link>https://forem.com/holgerimbery/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview-l4a</link>
      <guid>https://forem.com/holgerimbery/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview-l4a</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Copilot Studio agents deserve testing at scale—but which tool fits your team? &lt;strong&gt;Agent Evaluation&lt;/strong&gt; brings lightweight, AI-powered checks into the Studio authoring UI for rapid iteration, while &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; delivers enterprise-grade multi-turn validation, plan verification, and telemetry for production gates. This guide cuts through the hype and shows you exactly when to reach for each tool—and why the best teams use both together.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why Read This&lt;/strong&gt; Choosing the right testing tool can make or break your Copilot Studio agent quality strategy. &lt;strong&gt;This article goes deeper&lt;/strong&gt; into how Copilot Studio Kit and Agent Evaluation complement—and differ in—their testing approaches. If you’ve read my earlier piece on &lt;a href="//./ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit"&gt;&lt;em&gt;Ship Copilot Studio Agents with Confidence: Master Automated Testing with the Copilot Studio Kit&lt;/em&gt;&lt;/a&gt; (2026-01-31), which focused on Kit capabilities alone, this comparison will help you decide &lt;em&gt;when&lt;/em&gt; to use Kit, &lt;em&gt;when&lt;/em&gt; to use Agent Evaluation, and &lt;em&gt;when&lt;/em&gt; to use both together. Perfect for teams scaling from dev iteration to production release gates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation vs. Verification in AI Agent Testing
&lt;/h2&gt;

&lt;p&gt;Before diving into the two tools, it’s worth clarifying a critical distinction in quality assurance: &lt;strong&gt;verification&lt;/strong&gt; and &lt;strong&gt;validation&lt;/strong&gt; address different concerns when testing conversational agents. &lt;strong&gt;Verification&lt;/strong&gt; answers the question “Did we build it right?” It focuses on ensuring your agent behaves as designed, follows the intended logic flows, and produces outputs that meet specifications. In practical terms, verification tests check that your agent’s instructions are correctly implemented, that topics route to the right handlers, and that expected responses are generated for known inputs. &lt;strong&gt;Validation&lt;/strong&gt; , by contrast, asks “Did we build the right thing?”—it assesses whether your agent actually meets user needs, provides accurate and helpful information, and performs well in real-world scenarios. Validation is inherently broader and more subjective; it often involves human judgment, user feedback, and quality metrics like relevance, groundedness, and user satisfaction. The Copilot Studio Kit excels at verification through structured test cases and telemetry analysis, while Agent Evaluation bridges both worlds by enabling quality assessments that combine deterministic checks (verification) with AI-based graders that approximate validation concerns. Most production-grade agent programs require both verification to catch implementation bugs and ensure consistency, and validation to ensure the agent truly solves the problem it was designed to solve.&lt;/p&gt;

&lt;p&gt;Microsoft provides two complementary approaches to test Copilot Studio agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; (Power CAT): A Dataverse-backed, installable toolkit that runs batch tests via the &lt;strong&gt;Direct Line API&lt;/strong&gt; , enriches results with &lt;strong&gt;Application Insights&lt;/strong&gt; / &lt;strong&gt;Dataverse&lt;/strong&gt; , and supports &lt;strong&gt;multi‑turn&lt;/strong&gt; and &lt;strong&gt;plan validation&lt;/strong&gt; scenarios, plus Excel import/export and dashboards. (&lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-overview" rel="noopener noreferrer"&gt;Kit overview&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-test-capabilities" rel="noopener noreferrer"&gt;Test capabilities&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-run-tests" rel="noopener noreferrer"&gt;Run tests&lt;/a&gt;, &lt;a href="https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent Evaluation (preview)&lt;/strong&gt;: A &lt;strong&gt;built‑in&lt;/strong&gt; Copilot Studio experience for creating &lt;strong&gt;evaluation sets&lt;/strong&gt; , generating prompts with AI, selecting &lt;strong&gt;test methods&lt;/strong&gt; (exact/partial, similarity/intent, AI‑judged quality metrics), and running structured checks directly from the authoring UI. (&lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/build-smarter-test-smarter-agent-evaluation-in-microsoft-copilot-studio/" rel="noopener noreferrer"&gt;Microsoft Copilot Blog announcement&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;Preview documentation&lt;/a&gt;, &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/whats-new-in-copilot-studio-october-2025/" rel="noopener noreferrer"&gt;What’s new&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Used together, &lt;strong&gt;Agent Evaluation&lt;/strong&gt; accelerates inner‑loop quality checks during design, while &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; anchors enterprise‑grade testing, telemetry, and CI/CD gates prior to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Each Option Provides
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Copilot Studio Kit (Power CAT)
&lt;/h3&gt;

&lt;p&gt;The Copilot Studio Kit is a comprehensive, installable testing framework built on Microsoft’s Power CAT (Power Customer Advisory Team) architecture. It extends Copilot Studio’s native capabilities by providing infrastructure for large-scale, automated agent validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The Kit operates through a structured workflow: you define test cases within &lt;strong&gt;Agents&lt;/strong&gt; , organize them into &lt;strong&gt;Test Sets&lt;/strong&gt; , and execute these sets as &lt;strong&gt;Test Runs&lt;/strong&gt; on demand or via scheduled processes. Test execution occurs through the &lt;strong&gt;Direct Line API&lt;/strong&gt; , which simulates authentic user interactions with your published agent. Results are enriched with telemetry from &lt;strong&gt;Application Insights&lt;/strong&gt; and &lt;strong&gt;Dataverse&lt;/strong&gt; , capturing detailed diagnostic information, including which topics were invoked, confidence scores for intent matching, and end-to-end latency measurements. This multi-layer instrumentation enables root-cause analysis of test failures and performance anomalies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test types:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The Kit supports multiple validation approaches to accommodate different agent behaviors.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Response Match&lt;/em&gt; tests verify that agent outputs match expected text or patterns.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Attachment/Adaptive Card&lt;/em&gt; tests validate that rich response elements (cards, files, structured outputs) are generated correctly.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Topic Match&lt;/em&gt; tests confirm that conversations trigger the intended dialog flows.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Generative Answers&lt;/em&gt; tests assess responses from generative models embedded in agents.&lt;/p&gt;

&lt;p&gt;Beyond single-turn exchanges, the Kit handles &lt;strong&gt;Multi‑turn&lt;/strong&gt; conversations - sequences of user inputs and agent responses within a single session - and &lt;strong&gt;Plan validation&lt;/strong&gt; , which verifies that generative orchestration components select and invoke the correct tools, actions, and connected agents in the proper sequence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run management:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Once you execute a Test Run, the Kit provides tools for iterative validation work. You can duplicate previous runs to establish baselines, re-run enrichment steps (regenerating Application Insights correlations and Dataverse transcripts) without re-executing the entire test, and analyze aggregate pass/fail statistics or drill down into individual case results to identify failure patterns and trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Artifacts and maintenance:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The Kit’s source code, configuration schemas, and runbooks are maintained in the &lt;a href="https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit" rel="noopener noreferrer"&gt;Power CAT GitHub repository&lt;/a&gt;, where you can review implementation details, file issues, and contribute improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Evaluation (Preview) in Copilot Studio
&lt;/h3&gt;

&lt;p&gt;Agent Evaluation is a feature integrated directly into the Copilot Studio authoring environment, designed to facilitate the systematic evaluation of conversational agents during development. Unlike external toolkits or custom test harnesses, Agent Evaluation operates natively within the Studio UI, allowing authors to create, manage, and execute test cases without leaving the design context. This approach is intended to reduce the barrier to agent testing and encourage more frequent, iterative validation within the authoring workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Agent Evaluation introduces the concept of &lt;strong&gt;evaluation sets&lt;/strong&gt; , which are collections of test prompts and expected responses. Authors can create these sets manually, import them from CSV files, or generate them automatically using AI based on the agent’s metadata or knowledge sources. This flexibility supports both targeted test case authoring and rapid bootstrapping of test coverage. Additionally, logs from the &lt;strong&gt;Test Pane&lt;/strong&gt; - the interactive testing interface within Copilot Studio - can be reused to create evaluation sets, streamlining the conversion of ad hoc tests into structured checks.&lt;/p&gt;

&lt;p&gt;For each test case, authors define the user prompt, the expected answer, and the &lt;strong&gt;success criteria&lt;/strong&gt;. The evaluation process can then be run directly within Studio, producing aggregate and per-case results that are immediately accessible for inspection and troubleshooting. This tight integration with the authoring environment is intended to support a rapid feedback loop, enabling authors to identify and address issues early in the development cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test methods:&lt;/strong&gt; Agent Evaluation supports several methods for assessing agent responses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lexical matching&lt;/strong&gt; : This includes exact and partial string matching between the agent’s response and the expected answer. Lexical methods are useful for scenarios where deterministic outputs are required, such as FAQ responses or compliance-driven answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity/Intent matching&lt;/strong&gt; : These methods use semantic similarity algorithms or intent classification to determine whether the agent’s response is sufficiently close in meaning to the expected answer, even if the wording differs. This is particularly relevant for conversational agents that employ generative models or paraphrasing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-judged quality metrics&lt;/strong&gt; : Agent Evaluation can apply AI-based graders to assess qualitative aspects of responses, such as relevance, completeness, and groundedness. These metrics provide a more nuanced view of agent performance, especially in open-ended or knowledge-intensive scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt; Please note that Agent Evaluation is currently in public preview. As such, its feature set, supported test methods, and grading criteria may evolve based on user feedback and ongoing development. The tool is not intended to replace Responsible AI (RAI) or safety reviews, which remain essential for production deployments of conversational agents. Instead, Agent Evaluation is best viewed as a complementary capability for improving agent quality during the iterative design and testing phases. (&lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;Preview docs&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Side‑by‑Side Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Copilot Studio Kit&lt;/th&gt;
&lt;th&gt;Agent Evaluation (preview)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workspace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Separate model‑driven app (Dataverse) you install&lt;/td&gt;
&lt;td&gt;Built directly into Copilot Studio authoring UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test data authoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dataverse entities; &lt;strong&gt;Excel&lt;/strong&gt; import/export&lt;/td&gt;
&lt;td&gt;Create/import CSV; reuse Test Pane; &lt;strong&gt;AI‑generate&lt;/strong&gt; prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Via &lt;strong&gt;Direct Line API&lt;/strong&gt; with cloud flows/enrichment&lt;/td&gt;
&lt;td&gt;Run directly in Studio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test methods&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Response/Attachment/Topic match; &lt;strong&gt;Generative Answers&lt;/strong&gt; ; &lt;strong&gt;Multi‑turn&lt;/strong&gt; ; &lt;strong&gt;Plan validation&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Exact/Partial match; &lt;strong&gt;Similarity/Intent&lt;/strong&gt; ; &lt;strong&gt;AI quality&lt;/strong&gt; graders&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enrichment with &lt;strong&gt;App Insights&lt;/strong&gt; + &lt;strong&gt;Dataverse&lt;/strong&gt; (topics, intent scores, latencies)&lt;/td&gt;
&lt;td&gt;Aggregate pass/fail and scores with drill‑downs in Studio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong for release gates, duplication, and re‑runs&lt;/td&gt;
&lt;td&gt;Strong for inner loop; roadmap for broader scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adjacent features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Part of a broader kit (Compliance Hub, KPIs, SharePoint sync)&lt;/td&gt;
&lt;td&gt;Focused on evaluation inside Studio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maturity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generally available toolkit (maintained by Power CAT)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Public preview&lt;/strong&gt; (subject to change)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When to Use What
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prefer &lt;strong&gt;Agent Evaluation (preview)&lt;/strong&gt; when
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;You want &lt;strong&gt;fast, in‑Studio&lt;/strong&gt; feedback while iterating on instructions, topics, or knowledge.&lt;/li&gt;
&lt;li&gt;Your scenarios are &lt;strong&gt;single‑turn&lt;/strong&gt; (FAQ‑style) and benefit from &lt;strong&gt;lexical/semantic/quality&lt;/strong&gt; scoring.&lt;/li&gt;
&lt;li&gt;You need &lt;strong&gt;AI‑generated&lt;/strong&gt; test prompts to bootstrap coverage quickly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Prefer &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; when
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;You must validate &lt;strong&gt;multi‑turn, end‑to‑end&lt;/strong&gt; flows in one conversation context.&lt;/li&gt;
&lt;li&gt;You need to verify &lt;strong&gt;generative orchestration plans&lt;/strong&gt; (correct tools/actions/connected agents).&lt;/li&gt;
&lt;li&gt;You require &lt;strong&gt;deep telemetry&lt;/strong&gt; (topics, intent scores, latencies) and &lt;strong&gt;App Insights&lt;/strong&gt; correlation.&lt;/li&gt;
&lt;li&gt;You manage &lt;strong&gt;release gates&lt;/strong&gt; with repeatable Test Runs and re‑runs of enrichment steps.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Practical Scenarios
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inner‑loop tuning for HR FAQ&lt;/strong&gt; → Use &lt;strong&gt;Agent Evaluation&lt;/strong&gt; with &lt;strong&gt;Similarity&lt;/strong&gt; and &lt;strong&gt;AI quality&lt;/strong&gt; graders; generate 25 prompts from metadata, add a few gold answers, iterate on failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre‑release regression for tool‑using policy agent&lt;/strong&gt; → Use &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; : &lt;strong&gt;Multi‑turn&lt;/strong&gt; + &lt;strong&gt;Plan validation&lt;/strong&gt; ; App Insights + Dataverse enrichment to diagnose routing/tool‑selection issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic drift watch&lt;/strong&gt; → Weekly &lt;strong&gt;Agent Evaluation&lt;/strong&gt; runs for &lt;strong&gt;relevance/groundedness&lt;/strong&gt; ; investigate dips by reviewing knowledge changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI pipeline across multiple agents&lt;/strong&gt; → Nightly &lt;strong&gt;Kit&lt;/strong&gt; Test Runs with Excel‑managed test sets; export failures for triage; duplicate runs after fixes; investigate latencies via App Insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Strengths &amp;amp; Limitations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Copilot Studio Kit - strengths&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Multi‑turn, plan validation, deep observability (App Insights + Dataverse), and release‑friendly run management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot Studio Kit - limitations&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Separate installation/governance; relies on Direct Line and cloud flows; AI Builder needed for AI‑based answer analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Evaluation - strengths&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Native UI, AI‑generated test sets, flexible lexical/semantic/AI graders; quick to adopt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Evaluation - limitations&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Public preview; multi‑turn on the roadmap; does not replace RAI/safety reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Guide (Quick Reference)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need fast Studio‑native checks while editing?&lt;/strong&gt; → &lt;strong&gt;Agent Evaluation (preview)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need enterprise regression, multi‑turn, plan validation, telemetry?&lt;/strong&gt; → &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mature teams:&lt;/strong&gt; Use both - Evaluation for inner loop; Kit for outer loop, and promotion gates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Helpful How‑Tos &amp;amp; Deep‑dives (Clickable Links)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Copilot Studio Kit: &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-overview" rel="noopener noreferrer"&gt;Overview&lt;/a&gt; · &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-test-capabilities" rel="noopener noreferrer"&gt;Test capabilities&lt;/a&gt; · &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-run-tests" rel="noopener noreferrer"&gt;Run tests&lt;/a&gt; · &lt;a href="https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; · Community guides: &lt;a href="https://forwardforever.com/test-your-custom-copilot-with-power-cat-copilot-studio-kit/" rel="noopener noreferrer"&gt;Forward Forever&lt;/a&gt;, &lt;a href="https://www.matthewdevaney.com/configure-ms-auth-for-test-automation-in-copilot-studio-kit/" rel="noopener noreferrer"&gt;Matthew Devaney (MS Auth setup)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agent Evaluation (preview): &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/build-smarter-test-smarter-agent-evaluation-in-microsoft-copilot-studio/" rel="noopener noreferrer"&gt;Blog announcement&lt;/a&gt; · &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;Preview docs&lt;/a&gt; · &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/whats-new-in-copilot-studio-october-2025/" rel="noopener noreferrer"&gt;What’s new&lt;/a&gt; · Community guides: &lt;a href="https://sharepoint247.com/ai/how-to-evaluate-your-copilot-studio-agent/" rel="noopener noreferrer"&gt;How‑to&lt;/a&gt;, &lt;a href="https://dev.to/balagmadhu/agent-evaluation-in-action-tips-pitfalls-and-best-practices-5cje"&gt;Tips &amp;amp; pitfalls&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Testing conversational agents in Microsoft Copilot Studio requires a nuanced approach, as development team needs can vary significantly by project stage and agent complexity. The Copilot Studio Kit and Agent Evaluation (preview) are two distinct tools that address different aspects of the testing process.&lt;/p&gt;

&lt;p&gt;The Copilot Studio Kit is designed for scenarios requiring comprehensive, repeatable, and instrumented testing. It is particularly well-suited for validating multi-turn conversations, verifying orchestration plans, and collecting detailed telemetry for analysis. Because it operates as a separate, installable solution and integrates with Dataverse and Application Insights, it is most appropriate for teams that need to enforce quality gates before production releases or require historical tracking of test results. The Kit’s support for Excel-based test management and its extensibility through source code access make it a practical choice for organizations with established DevOps practices or those managing multiple agents at scale.&lt;/p&gt;

&lt;p&gt;Agent Evaluation (preview), on the other hand, is integrated directly into the Copilot Studio authoring environment. Its primary focus is to provide rapid feedback during agent development and tuning, especially for single-turn or FAQ-style interactions. The ability to generate test prompts with AI and apply a range of grading methods (from exact match to semantic similarity and quality metrics) makes it accessible to authors who may not have a background in test automation. However, as a preview feature, its capabilities and scope are still evolving, and it does not currently address multi-turn or orchestration scenarios in depth.&lt;/p&gt;

&lt;p&gt;In practice, many teams will find value in using both tools: Agent Evaluation for quick, iterative checks during agent design, and Copilot Studio Kit for more thorough validation prior to deployment. Understanding each tool’s intended use cases, technical requirements, and current limitations will help teams select the most appropriate approach for their workflow. As the Copilot Studio platform continues to evolve, the boundaries between these tools will likely shift, but the need for systematic, context-aware testing will remain constant to deliver reliable conversational agents.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>agentevaluation</category>
      <category>testing</category>
    </item>
    <item>
      <title>Microsoft Frontier Agents: A Deep Technical Overview</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 07 Feb 2026 10:57:11 +0000</pubDate>
      <link>https://forem.com/holgerimbery/microsoft-frontier-agents-a-deep-technical-overview-5h70</link>
      <guid>https://forem.com/holgerimbery/microsoft-frontier-agents-a-deep-technical-overview-5h70</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft Frontier Agents represent a fundamental shift in enterprise automation, moving from rigid rule-based systems to reasoning-capable AI agents that can gather information across organizational boundaries, synthesize complex data, and execute multi-step workflows autonomously. By operating within your existing Microsoft 365 infrastructure and security frameworks, Frontier Agents enable organizations to reimagine how work gets accomplished without requiring separate governance infrastructure or bypassing compliance controls.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Read on&lt;/strong&gt; , if you are an enterprise architect, IT administrator, or business leader evaluating how to incorporate AI-based reasoning and automation into your organization, this article provides the technical and strategic foundation you need to understand what Frontier Agents actually do, how they differ from traditional automation approaches, and what you should consider before implementing them. You will learn the specific capabilities of Frontier Agents, the governance mechanisms that control them, and the workflow redesign required to realize genuine value - not just the technology deployment itself. Whether you are preparing to participate in the Frontier program or evaluating whether agent-based automation makes sense for your organization, this deep technical overview will help you make more informed decisions about where and how to invest in this emerging technology.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is the Frontier Program?
&lt;/h2&gt;

&lt;p&gt;The Frontier program provides organizations with a structured mechanism to participate in early evaluation of emerging artificial intelligence capabilities within their existing Microsoft 365 infrastructure. Rather than deploying untested features across production systems, Frontier allows selected teams and administrators within an organization to voluntarily work with experimental AI features in a contained manner. These experimental features include various types of AI-powered agent systems - software entities capable of autonomous or semi-autonomous reasoning and action - as well as extensions to the Copilot assistant interface and specialized agent modes tailored for specific applications. Importantly, all of these experimental capabilities operate within the same security and data boundaries as the organization’s regular production Microsoft 365 environment. This design ensures that experimental features automatically inherit the organization’s established security policies, identity and access management through Entra ID, data location requirements, and compliance obligations. Organizations do not need to build separate infrastructure or bypass their existing governance frameworks to participate in these early evaluations.&lt;/p&gt;

&lt;p&gt;Key characteristics of the Frontier program include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Tenant Integration:&lt;/strong&gt; Frontier agents operate seamlessly within your existing Microsoft 365 tenant, leveraging established security postures, Entra ID identity controls, data residency requirements, and compliance frameworks without requiring separate infrastructure or governance bypass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controlled Experimental Access:&lt;/strong&gt; Organizations can grant selective access to preview capabilities through granular admin controls, ensuring that experimental features remain isolated to designated teams while maintaining full visibility and auditability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback-Driven Iteration:&lt;/strong&gt; Microsoft actively incorporates operational data and user insights to refine, enhance, or retire Frontier capabilities, creating a responsive feedback loop that shapes agent behavior and feature maturity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Licensing and Activation Requirements:&lt;/strong&gt; Access to Frontier agents typically requires valid Microsoft 365 and Microsoft 365 Copilot licenses, with explicit administrator enablement through the Microsoft 365 Admin Center to control program participation at the organizational level.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frontier Agents in Microsoft 365 Copilot
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Are Frontier Agents?
&lt;/h3&gt;

&lt;p&gt;Frontier Agents are software systems that are built directly into Microsoft 365 applications and the Copilot experience. These systems work by gathering information from across your organization, analyzing it step by step using logical reasoning, and then executing tasks that require multiple sequential steps.&lt;/p&gt;

&lt;p&gt;When you interact with a Frontier Agent, here is what actually happens beneath the surface: The agent first receives some information or a request from you. It then searches and retrieves data from various sources across your organization, including your emails, stored documents, calendar entries, meeting transcripts, and other business systems. Rather than simply retrieving information through a traditional search, the agent analyzes and reasons with the information it has gathered. It considers how different data elements relate to one another, understands the context in which that data exists, and then breaks down your original request into smaller, sequential steps that it can execute.&lt;/p&gt;

&lt;p&gt;Two concrete examples of Frontier Agents currently available are the Researcher and Analyst agents. The Researcher agent, for instance, can accept a specific question from you and then search through the documents, emails, and other sources available within your organization to locate relevant information. After gathering this information, it synthesizes the pieces to provide a comprehensive answer to your question. The Analyst agent operates on similar principles but is designed to examine data from multiple sources and identify patterns, trends, or themes that emerge across that data, rather than being built to answer a specific user question.&lt;/p&gt;

&lt;p&gt;An important distinction to note is that these agents are not limited to a single predefined action. Rather, they can take a single complex request from a user - such as “prepare a written summary of all customer feedback that arrived via email during the last quarter” - and break it down into multiple sequential steps. The agent might begin by identifying which emails and documents actually contain customer feedback. Following that, it might classify the feedback into categories by topic or theme. After categorization, it would extract the most significant points from each category. Finally, it would compile all of this into a single coherent summary document. Instead of a human staff member manually executing each step in sequence, the agent can reason through the problem and execute them in a logical order.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-World Scenarios
&lt;/h3&gt;

&lt;p&gt;To better understand how these agents function in practical situations, consider the following concrete examples of work that they can perform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analyzing customer feedback across multiple communication channels:&lt;/strong&gt; Organizations receive customer feedback through various channels - email messages, support tickets, conversation logs, and direct messages. Rather than having a team member manually review months of accumulated customer communications to identify recurring problems, an agent can systematically examine all these disparate sources of customer input across your organization, identify which specific issues or concerns appear multiple times, and surface those recurring themes for leadership. This process would otherwise require significant manual effort to collate information from systems that were never designed to be analyzed together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Constructing comprehensive business review documents by synthesizing information from heterogeneous sources:&lt;/strong&gt; Businesses often need to prepare periodic review documents that combine information from multiple different places - strategic documents that were written and stored in your document systems, records of what was discussed and decided during meetings throughout the review period, and email conversations where important decisions and discussions occurred. An agent can gather these distinct types of information across different systems and synthesize them into a coherent narrative that presents a unified view of what occurred during the period in question.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Locating decision points that remain unresolved in organizational communications:&lt;/strong&gt; In any organization, communications - whether through email, chat systems, or meeting notes - often contain discussion about decisions that need to be made, but the outcome of those discussions remains unclear or incomplete. An agent can review your organization’s communications, identify discussions where no clear resolution was documented, and compile a summary that clarifies which decisions remain pending and who should be involved in resolving them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Evaluating patterns in historical meeting content and suggesting corresponding workflow adjustments:&lt;/strong&gt; Organizations hold many meetings over time, and the topics, themes, and concerns that emerge across multiple meetings often reveal something meaningful about where real work challenges lie. An agent can review the captured records from your organization’s past meetings, identify recurring themes and concerns across meetings and groups, and use that analysis to suggest where the organization might redirect its focus or restructure its processes to address the underlying issues that keep arising.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Microsoft Agent 365: The Control Plane Behind Frontier Agents
&lt;/h2&gt;

&lt;p&gt;Agent 365 is the administrative and governance layer that supports AI agents across an enterprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent Discovery and Inventory Management:&lt;/strong&gt; Agent 365 maintains a comprehensive record and catalog of all AI agents operating within an organization. This inventory function allows administrators and operators to understand which agents exist, where they are deployed, what purposes they serve, and their operational status at any given time. Rather than having agents scattered across the organization with no central visibility, this inventory system provides a unified view of the agent ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identity-Based Resource Access Controls:&lt;/strong&gt; Agent 365 works in conjunction with Entra ID (Azure’s identity management system) to establish which specific resources, data sources, and systems that individual agents are allowed to access. Instead of granting agents unlimited access across an organization’s systems, Agent 365 enforces granular permission boundaries, ensuring each agent can access only the data, applications, and services necessary to perform their assigned functions. This principle of least privilege prevents an agent from inadvertently or maliciously accessing sensitive data or systems outside its defined scope.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Standardized Integration Interfaces and Development Kits:&lt;/strong&gt; Agent 365 provides standardized software development kits (SDKs) and application programming interfaces (APIs) that establish consistent patterns for how new agents can be built and integrated with existing organizational systems. Rather than building each agent in isolation using different approaches, these standardized interfaces ensure that agents built by different teams or vendors can work together and communicate in predictable ways, following established patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relationship Mapping and System Visualization:&lt;/strong&gt; Agent 365 provides administrators and architects with visual representations that illustrate how different agents relate to one another, how they connect to and interact with people in the organization, what data sources they access or modify, and how they fit into broader business workflows. This visualization capability helps organizations understand the dependencies and interactions across their entire agent estate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrated Security Operations and Threat Detection:&lt;/strong&gt; Agent 365 incorporates monitoring and alerting functions that work in coordination with Microsoft Defender (security threat detection) and Microsoft Purview (data governance and compliance). This integration enables security teams to monitor agent behavior for signs of anomalous or suspicious activity and compliance teams to track whether agents are handling regulated data appropriately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SDK Components and Development Framework Support:&lt;/strong&gt; Agent 365 supplies developers with pre-built components, software libraries, and reference implementations that accelerate the process of creating new agents or extending existing ones. It also supports industry-standard protocols, such as the Model Context Protocol (MCP), which establishes common patterns for how components can exchange information and coordinate their behavior.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Role of Frontier Agents in Enterprise Transformation
&lt;/h2&gt;

&lt;p&gt;As organizations implement Frontier Agents, they are fundamentally reconsidering how work gets accomplished. Rather than viewing these systems as replacements for human workers, the most effective deployments treat agents as tools that handle specific, well-defined operational tasks while people focus on decision-making, judgment calls, and work that requires contextual understanding.&lt;/p&gt;

&lt;p&gt;This shift requires more than simply enabling a new technology. Organizations need to redesign their operational workflows to leverage agents’ capabilities. This means identifying which portions of existing processes are primarily repetitive information gathering, data synthesis, or sequential task execution - the kinds of work that agents handle effectively - and then separating those portions from the work that genuinely requires human judgment, client interaction, or strategic thinking. Once this separation occurs, you can structure a workflow where the agent handles the mechanical portions while a person reviews and takes responsibility for the decisions that matter.&lt;/p&gt;

&lt;p&gt;One practical implication worth considering: when an agent executes multiple steps in sequence to complete a task, there are often opportunities to build in checkpoints that allow a human to review the agent’s work before it moves to the next stage. This prevents compounding errors and allows people to correct the agent’s reasoning when it goes astray. Rather than treating agent output as completely reliable or completely unreliable, organizations learn to understand where agents tend to make mistakes and design their approval workflows accordingly.&lt;/p&gt;

&lt;p&gt;Another important consideration is that moving work to agents does not necessarily mean work disappears from the organization. Instead, the work’s character changes. Where someone previously spent time reading through emails to find relevant information, they now spend time reviewing the agent’s findings to verify that the needed information was captured. Where someone previously spent time copying data from one system into a report, they now spend time checking whether the agent’s synthesis of that data makes sense. This redistribution of work can free up capacity for higher-value activities, but only if organizations deliberately redirect that freed-up capacity toward them rather than simply eliminating positions and expecting the same output from fewer people.&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples of Frontier Experiences Available Today
&lt;/h2&gt;

&lt;p&gt;The Frontier ecosystem includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Experimental Copilot agents in the Agent Store.&lt;/li&gt;
&lt;li&gt;Agent Mode in Excel for the web.&lt;/li&gt;
&lt;li&gt;Agents for Dynamics 365 workloads.&lt;/li&gt;
&lt;li&gt;Experimental functionalities in Word, PowerPoint, and Copilot Chat.&lt;/li&gt;
&lt;li&gt;AI-enhanced Windows 365 Cloud PC capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Administrators Enable and Manage Frontier Agents
&lt;/h2&gt;

&lt;p&gt;Administrators must:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Confirm licensing prerequisites.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable Frontier in the Admin Center. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3u0zzfl7tkwoix67pea.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3u0zzfl7tkwoix67pea.png" alt="upgit_20260124_1769262371.png" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assign Frontier access to users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Agent 365 tools to manage lifecycle, visibility, and compliance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide SDK access for development.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why Frontier Agents Matter for Enterprise Architects
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Understanding the Shift from Rule-Based to Reasoning-Based Systems
&lt;/h3&gt;

&lt;p&gt;Traditionally, enterprise automation has relied on rule-based systems - tools that follow explicit, predetermined logic written by engineers. These systems work well when the tasks they handle are predictable and straightforward: if condition A is true, then execute action B. However, rule-based systems struggle when circumstances change unexpectedly or when situations don’t fit neatly into predefined categories. When an edge case arises that no one anticipated, the rule-based system either fails or performs the wrong action, requiring human intervention or system modification.&lt;/p&gt;

&lt;p&gt;Frontier Agents introduce a different approach. Rather than following rigid if-then-else logic, these agents can examine a situation, understand context and nuance, reason through the problem using information available to them, and adapt their approach as they encounter new information. This capability is critical for enterprise architects because it enables them to automate and streamline work that was previously too variable or context-dependent to automate. Enterprise architects should understand where agents can genuinely improve their organizations and where the investment in agent systems would not yield meaningful returns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical Reasoning Capabilities and Their Implications
&lt;/h3&gt;

&lt;p&gt;Agents can perform multi-step reasoning that mirrors how skilled humans approach complex problems. An agent can gather information from multiple sources, recognize patterns and relationships within that information, adjust its understanding as it discovers new details, and modify its approach based on what it learns. This capability matters for enterprise architects because it changes which business processes become candidates for automation. Instead of limiting automation to straightforward, fixed-sequence tasks, architects can now consider automating substantially more nuanced work. However, architects need to think carefully about which problems benefit from this capability and which would be simpler to solve through other means. Not every situation requires sophisticated reasoning - sometimes simpler solutions serve organizations better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unified Governance Without Separate Infrastructure
&lt;/h3&gt;

&lt;p&gt;One of the most significant practical benefits for enterprise architects is that Frontier Agents operate within existing organizational boundaries. Rather than requiring architects to design separate systems, approval processes, or compliance frameworks for agent-based work, agents inherit your organization’s existing security, identity, and compliance structure. This simplifies governance - architects do not need to build parallel systems. However, architects considered how to ensure that existing governance frameworks accommodate agent operations and that identity and access controls properly constrain what agents can do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid Human–Agent Workflows and Redesigned Work Processes
&lt;/h3&gt;

&lt;p&gt;Implementing agents require architects and process owners to rethink how work flows through the organization. It is not sufficient to add an agent to an existing workflow. Instead, organizations need to identify where human judgment and decision-making are essential, where human oversight is valuable but not essential, and where mechanical task execution can be delegated entirely to an agent. This often means redesigning workflows substantially. An enterprise architect should expect that workflow redesign will be more challenging and time-consuming than deploying agent technology alone and should budget time and expertise accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incremental Adoption and Learning-Based Deployment
&lt;/h3&gt;

&lt;p&gt;Rather than requiring organizations to commit to large-scale transformation immediately, agents can be introduced gradually. This allows organizations to learn how agents affect their work, identify which applications genuinely improve outcomes, develop troubleshooting skills, and build internal expertise. Enterprise architects should view this as an extended learning period in which the organization creates an understanding of what agents can do within their specific context, rather than a one-time implementation that commits the organization to a comprehensive agent-based transformation immediately. This approach carries risks - it means initial projects may not yield dramatic returns - but it also creates space for learning and course correction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Frontier program is a practical approach organizations can use to test experimental AI-based systems that perform multi-step reasoning and task execution within their existing Microsoft 365 environments. Rather than requiring organizations to establish entirely separate infrastructure, sign off on different governance frameworks, or bypass critical requirements, the Frontier program enables them to experiment while remaining within their organizations’ existing security and identity boundaries.&lt;/p&gt;

&lt;p&gt;Automate Frontier Agents - software systems built into Microsoft 365 applications that gather information from multiple sources, reason with it, and execute sequential tasks to accomplish complex objectives - provides a concrete technology that organizations can use to explore how AI-based reasoning and automation might apply to their own workflows. Agent 365, the administrative layer beneath these agents, provides the inventory management, access controls, monitoring, and coordination mechanisms organizations need to maintain visibility and control over agent operations at scale.&lt;/p&gt;

&lt;p&gt;In practice, organizations that choose to implement Frontier Agents will need to do more than enable new technology features. They will need to rethink which parts of their business processes benefit from having agents handle information gathering and task execution, how to structure approval workflows that allow human staff to focus on judgment and decision-making, and how to ensure that the technology actually produces better outcomes rather than simply shifting the character of the work without improving results. This redesign work is often more demanding than the technical implementation itself, but it is also where genuine value is created.&lt;/p&gt;

&lt;p&gt;Organizations that approach Frontier Agents as an extended learning opportunity - where initial projects serve as experiments to develop organizational understanding of what agents can accomplish within specific contexts - are likely to make more thoughtful implementation decisions than organizations that pursue large-scale deployment without this learning period. This incremental approach has both benefits and drawbacks: initial projects may not deliver dramatic returns on investment, but the organization will develop practical knowledge of where agents are genuinely helpful, where they create problems, and how to design workflows that leverage agent capabilities effectively.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>frontier</category>
      <category>microsoft365</category>
      <category>agents</category>
    </item>
    <item>
      <title>Ship Copilot Studio Agents with Confidence: Master Automated Testing with the Copilot Studio Kit</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 31 Jan 2026 12:35:30 +0000</pubDate>
      <link>https://forem.com/holgerimbery/ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit-3k7n</link>
      <guid>https://forem.com/holgerimbery/ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit-3k7n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Shipping Copilot Studio agents without systematic, automated testing is risky: large language models (LLMs) are non‑deterministic, topic routing can drift, and integrations fail in ways casual “chat tests” won’t catch. Microsoft’s Copilot Studio Kit adds structured, repeatable testing (including multi‑turn and generative answer validation), integrates with Power Platform pipelines for gated deployments, and provides analytics and compliance tooling—so you can test as software teams do and ship with confidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why Read This Article&lt;/strong&gt; : AI agents powered by LLMs are unpredictable by nature. Casual testing—clicking through a chat interface a few times—is not enough to catch latent quality issues, topic confusion, or integration failures that emerge under real-world conditions. This guide walks you through a comprehensive, systematic approach to testing Copilot Studio agents that mirrors enterprise software practices: from defining repeatable test cases with the Copilot Studio Kit, to embedding quality gates in your deployment pipelines, to monitoring compliance and performance post-launch. If you are shipping agents to production—or planning to do so—you need structured, auditable testing to reduce risk and build stakeholder confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why agent testing is different (and non‑optional) in Copilot Studio
&lt;/h2&gt;

&lt;p&gt;Conversational agents in Copilot Studio operate under fundamentally different constraints than traditional software. Where conventional applications follow deterministic code paths and produce consistent output for given inputs, conversational agents integrate multiple layers: natural language processing, topic routing, prompt-based reasoning, and knowledge retrieval across enterprise systems. This architectural difference creates testing challenges that cannot be addressed by conventional unit testing or ad hoc verification approaches.&lt;/p&gt;

&lt;p&gt;The non-deterministic nature of these systems stems from several sources. Large language models (LLMs) introduce inherent variability in response generation even when given identical inputs. Topic routing may inconsistently classify user intents, particularly for edge cases or ambiguous queries. Knowledge retrieval systems may return different result sets depending on index state, search ranking, or temporal data updates. When agents compose multiple actions—such as querying multiple data sources, applying filters, and synthesizing responses—the combinations of these variations compound, making manual testing insufficient to validate behavior across the full range of user interactions.&lt;/p&gt;

&lt;p&gt;Microsoft’s published guidance emphasizes that this variability necessitates systematic, large-scale evaluation rather than point-in-time chat testing. A developer might test an agent through a few interactive sessions and conclude it functions correctly based on limited exposure. However, when the agent encounters hundreds or thousands of real-world queries—including paraphrased variations, context-dependent follow-ups, and edge cases—latent quality issues, topic confusion, and integration failures emerge. Automated testing frameworks assess correctness and performance across representative input sets, providing measurable confidence in agent behavior.&lt;/p&gt;

&lt;p&gt;It is important to note that while automated evaluation effectively identifies accuracy and performance issues, Microsoft explicitly documents that this approach cannot replace responsible AI reviews or the safety filters built into governance processes. Automated testing validates the agent’s logical correctness and consistency, but human review remains essential for assessing potential harms, compliance with organizational policies, and alignment with responsible AI principles. These complementary activities must both occur during the release process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The testing toolbox at a glance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Copilot Studio Kit (Power CAT)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2afr73vnfhjugbk08i03.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2afr73vnfhjugbk08i03.png" alt="upgit_20260118_1768740326.png" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-overview" rel="noopener noreferrer"&gt;Copilot Studio Kit&lt;/a&gt; is an open‑source, solution‑aware extension that adds a formal testing and analysis layer to Copilot Studio. At its core, the Kit lets you define agents, tests, and test sets and then run batch tests against your agent through the Direct Line API. Results are not limited to raw strings; they can be enriched with Dataverse conversation transcripts (to expose the exact triggered topic and intent scores) and Azure Application Insights (for telemetry and failure diagnostics). The Kit supports deterministic checks (e.g., response matching and Attachment comparisons) as well as LLM-assisted validation for Generative answers using AI Builder—a critical capability when answers are non-deterministic. For complex scenarios, you can compose multi‑turn tests to validate end‑to‑end flows in a single conversation context, and use plan validation to ensure that agents with generative orchestration select the expected tools/actions above a configured threshold. In enterprise rollouts, the Kit’s managed solution model, Excel import/export for bulk test authoring, and optional Compliance Hub significantly shorten the path to repeatable, auditable test evidence.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need scalable, repeatable test execution with artifacts you can retain and trend over time.&lt;/li&gt;
&lt;li&gt;You must validate both deterministic paths (topic routing, attachments) and non‑deterministic LLM outputs (generative answers).&lt;/li&gt;
&lt;li&gt;You want to instrument tests with Dataverse and Application Insights to explain why a test passed/failed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Agent evaluation (in‑product, preview)&lt;/strong&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefn4qsdrvtyfym9ngbz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefn4qsdrvtyfym9ngbz4.png" alt="upgit_20260118_1768740528.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent evaluation is a built-in &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;preview&lt;/a&gt; feature of Copilot Studio that lets you author or generate test sets and run automated evaluations directly in the product. It is designed to measure answer quality and coverage at scale and can generate tests from your agent’s topics/knowledge or import them from a file. Evaluations are executed with a designated test user profile, which is essential when tools and knowledge sources require authentication. Because this feature is in preview, treat it as a complementary capability to the Kit: use it for fast, in-product evaluations and keep the Kit’s runs and exports for long-term retention and pipeline gating.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Early iterative cycles where makers want quick, in‑canvas evaluation runs.&lt;/li&gt;
&lt;li&gt;Seeding a broader test corpus by auto‑generating questions from topics/knowledge, then curating.&lt;/li&gt;
&lt;li&gt;Validating that a specific test identity can access the same tools/knowledge as target users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Power Platform pipelines
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46wcojdnczr8igc2khqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46wcojdnczr8igc2khqr.png" alt="upgit_20260118_1768740684.png" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Power Platform pipelines provide the native ALM path to move solution-packaged agents across Dev → Test → Prod with approvals, audit trails, and environment isolation. Critically, pipelines can be extended to run Copilot Studio Kit tests as a pre-deployment quality gate: the deployment is paused, tests execute, results are evaluated against pass thresholds, and only then does the pipeline promote to the next stage. This pattern converts “publish from dev” into a governed release with repeatable validation, versioning, and rollback via managed solutions. In practice, you configure a pipeline host environment, wire cloud flows + Dataverse events to call the Kit’s test runner, and enforce gate criteria before production.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any production‑bound agent; manual promotion without a gate is a risk.&lt;/li&gt;
&lt;li&gt;Teams that need approvals, traceability, and the ability to block a release on failed tests.&lt;/li&gt;
&lt;li&gt;Organizations standardizing on solution‑based ALM across Power Platform assets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Direct Line performance testing
&lt;/h3&gt;

&lt;p&gt;Functional correctness is not sufficient if the agent cannot meet performance objectives. Microsoft’s guidance documents how to load and performance-test Copilot Studio agents using Direct Line over WebSockets (preferred for realistic behavior) or HTTP GET polling when WebSockets are not feasible. Test harnesses should track response times at the stages that affect user experience: Generate Token, Start Conversation, Send Activity, and Receive/Get Activities. These measurements, collected under realistic concurrency and payload conditions, enable you to baseline and detect regressions as topics, tools, and knowledge sources evolve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before go‑live and on every significant change to prompts, tools, or knowledge that could affect latency.&lt;/li&gt;
&lt;li&gt;When you must validate SLA adherence (for example, first token and full‑response times).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CoE &amp;amp; Compliance
&lt;/h3&gt;

&lt;p&gt;Testing is part of a broader governance and compliance posture. Microsoft’s Phase 4 guidance emphasizes structured testing, deployment, and launch practices, including final security and compliance checks, telemetry enablement, and controlled rollout. The Copilot Studio Kit’s Compliance Hub complements this by continuously evaluating agent configurations captured via Agent Inventory against configurable thresholds, creating compliance cases, and supporting SLA-driven triage (manual review, quarantine, or delete). Together with Managed Environments, DLP policies, and CoE Starter Kit telemetry, these controls provide continuous post-deployment oversight of agents, reducing configuration drift and helping teams keep production behavior within approved boundaries.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organization‑wide programs where multiple business units ship agents and you need consistent review and enforcement.&lt;/li&gt;
&lt;li&gt;Environments with strict regulatory or data‑handling requirements that require continuous configuration posture checks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the Copilot Studio Kit for structured, repeatable tests (including multi-turn, generative-answer validation, enrichment, and exports). Use Agent evaluation (preview) for in-product, fast iteration. Enforce release quality with Power Platform pipelines by gating promotion on automated test results. Validate scalability and user-perceived latency with Direct Line performance testing. Finally, operate agents within a governed framework using Phase-4 practices and Compliance Hub to maintain compliance and configuration integrity over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase-4 testing (detailed guidance and checklist)
&lt;/h2&gt;

&lt;p&gt;Phase 4 refers to Microsoft’s “Testing, deployment, and launch” stage in the Copilot Studio governance and security best-practices sequence. It defines what must happen after build-time design and before (and immediately after) a production release. In practical terms, Phase 4 defines the quality gates, security checks, controlled rollout, and post-release monitoring you should apply to every Copilot Studio agent before it serves real users.&lt;/p&gt;

&lt;p&gt;Below is a concise, implementation-oriented summary of Phase 4 practices and their execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing and validation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Prove the agent’s functional behavior and non‑deterministic answer quality with repeatable, automated tests—not manual chats.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated scenario testing&lt;/strong&gt; : Use the Copilot Studio Kit to define tests and test sets (response/attachment/topic/generative, including multi-turn and plan validation) and run batches against the agent. Enrich results with Dataverse transcripts and Application Insights for root-cause analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD readiness&lt;/strong&gt; : Maintain test artifacts and runs as part of your release process. Microsoft’s Phase 4 guidance explicitly recommends automated testing and evaluation as a prerequisite for deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality gates in pipelines&lt;/strong&gt; : Integrate Kit test runs into Power Platform pipelines so a deployment pauses, executes tests, evaluates pass thresholds, and only then promotes to the next stage. This delivers an auditable “test-before-deploy” control.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final security and compliance checks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Ensure the production environment enforces the right data and access boundaries and that all Azure resources are approved.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data policies and RBAC&lt;/strong&gt; : Verify environment-level policies (e.g., DLP), role assignments, and connection security in the production environment—not just in Dev/Test. This prevents accidental connector drift at go-live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure resource review&lt;/strong&gt; : Confirm approvals for app registrations, networks, keys, and endpoints associated with the agent’s external dependencies. Use secure secret storage and rotate keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production knowledge sources&lt;/strong&gt; : Point the agent to the production document libraries and data sets (many teams test with separate SharePoint paths or sample data; Phase 4 requires verification of the production bindings).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Controlled production rollout
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Promote a versioned, solution‑packaged agent to production via ALM—not via ad‑hoc publish.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deploy via pipelines&lt;/strong&gt; : Package the agent in a Power Platform solution and promote Dev → Test → Prod with approvals, audit trail, and environment isolation. This is the supported, governed path for Copilot Studio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-deployment steps&lt;/strong&gt; : In the pipeline, add hooks to run Kit tests and evaluate pass rates as a quality gate before import into the target environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Launch plan&lt;/strong&gt; : Communicate availability and usage guidance to the intended audience and stakeholders as part of the release checklist.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enable monitoring and ongoing governance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Operate the agent as a managed service with telemetry, compliance posture tracking, and corrective workflows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry&lt;/strong&gt; : Configure Azure Application Insights for usage, performance, and error logging. This supports regression detection and incident response after go‑live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance operations&lt;/strong&gt; : Use the Copilot Studio Kit’s Compliance Hub to continuously evaluate agent configurations against policy thresholds, raise review cases, and track SLA‑bound remediation (review, quarantine, delete).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CoE integration&lt;/strong&gt; : Leverage the Power Platform CoE Starter Kit to inventory agents, watch adoption/health signals, and maintain platform‑wide governance routines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Phase 4 matters
&lt;/h3&gt;

&lt;p&gt;Phase 4 formalizes the last mile from “it works in development” to “it is safe to expose to users.” It replaces manual spot checks and one‑click publishing with automated validation, governed deployment, and observable operations—the baseline expected of enterprise‑grade AI agents. By following Phase 4 practices, teams reduce the risk of production incidents, data leaks, and compliance violations. They gain confidence that agents behave as intended under real‑world conditions and that any deviations are detected and addressed promptly. Ultimately, Phase 4 transforms Copilot Studio agents from experimental prototypes into reliable, governed components within the organizational AI landscape.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test strategy for low‑code/no‑code agents
&lt;/h3&gt;

&lt;p&gt;Test types you should plan for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conversational/functional (does the response match expectations for known intents?)&lt;/strong&gt; – Use Kit’s Response match and Topic match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generative answer validation (LLM output quality and guardrails)&lt;/strong&gt; – Use AI Builder-based Generative answers with Application Insights context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End-to-end scenarios across multiple turns and tools&lt;/strong&gt; – Use Multi-turn and Plan validation for generative orchestration to ensure the plan contains the expected tools/actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration (Dataverse, connectors, actions)&lt;/strong&gt; – Validate topic routing via Dataverse enrichment and attachment payloads (e.g., Adaptive Cards).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance &amp;amp; reliability under load&lt;/strong&gt; – Use Direct Line guidance to capture token/start/send/receive latencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety/compliance&lt;/strong&gt; – Follow governance phase guidance and consider Kit’s Compliance Hub to flag configuration policy issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to automate first&lt;/strong&gt; : high-volume intents, critical business workflows (e.g., authentication-gated actions), and generative answers that must adhere to strict constraints. Then expand to long-tail intents and exploratory questions using generated test sets (Agent evaluation) and bulk import (Kit).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementing automated testing with Copilot Studio Kit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Set up Copilot Studio Kit for automated testing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install the Kit from Marketplace or GitHub as a managed solution in your chosen environment; complete post‑deployment connection references.&lt;/li&gt;
&lt;li&gt;Configure the agent connection for testing:&lt;/li&gt;
&lt;li&gt;Set agent base configuration (name, region/token endpoint), and Direct Line channel security as applicable.&lt;/li&gt;
&lt;li&gt;Enable Dataverse enrichment to analyze conversation transcripts (topic names, intent scores).&lt;/li&gt;
&lt;li&gt;Enable Application Insights enrichment for diagnostics and negative tests (e.g., moderated/no result cases).&lt;/li&gt;
&lt;li&gt;Configure AI Builder as the LLM provider for Generative answers validation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your agent uses Microsoft Authentication, register an app and configure the Kit’s test automation so the test runner can authenticate as a user.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design test cases and test sets
&lt;/h3&gt;

&lt;p&gt;Supported test types (Kit):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response matches with string operators (equals/contains/starts/ends).&lt;/li&gt;
&lt;li&gt;Attachments comparison (JSON array) or AI Validation for structure/semantics.&lt;/li&gt;
&lt;li&gt;Topic match (requires Dataverse enrichment).&lt;/li&gt;
&lt;li&gt;Generative answers (requires AI Builder + optional Application Insights).&lt;/li&gt;
&lt;li&gt;Multi-turn (compose several tests into one conversation).&lt;/li&gt;
&lt;li&gt;Plan validation (ensure the generated plan includes expected tools/actions above a threshold).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bulk authoring/import: Use Kit’s Excel import/export to create or modify multiple tests efficiently. In-product test sets (Agent evaluation, preview): Create up to 100 test cases per set; generate questions from agent description/topics/knowledge or import from a file; run with a selected test user profile that has the right connections/authentication.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run, analyze, and iterate
&lt;/h3&gt;

&lt;p&gt;Run test sets against your agent; Kit records observed responses and latencies, and aggregates them. Enrichment adds topic routing and detailed diagnostics. Export results (CSV) for long‑term retention or integration with other tools. Use Kit’s analytics artifacts (e.g., Analyze Test Results) and conversation KPIs in Dataverse for trend analysis beyond the built‑in Copilot Studio analytics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automate “test → deploy” with pipelines
&lt;/h3&gt;

&lt;p&gt;Treat agents as solution components and move them with Power Platform pipelines; add automated Kit test runs as a pre‑deployment gate so only solutions that meet pass thresholds are deployed to Test/Prod. Microsoft’s guidance and Kit docs provide a pattern using cloud flows and Dataverse to pause the pipeline, run tests, evaluate the pass rate, and decide whether to continue or stop. This replaces the brittle “publish from dev” habit with auditable CI/CD and quality gates aligned to enterprise ALM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance and reliability testing via Direct Line
&lt;/h3&gt;

&lt;p&gt;For load testing, simulate real user behavior using Direct Line with WebSockets where possible; otherwise, use HTTP GET polling. Track and report response times for Generate Token, Start Conversation, Send Activity, and Receive/Get Activities to understand user‑perceived latency under load. Microsoft’s documentation provides detailed guidance on setting up these tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Governance: beyond the build—compliance, environments, and monitoring
&lt;/h2&gt;

&lt;p&gt;Microsoft’s governance phase recommends validating security and compliance, using ALM pipelines for controlled rollout, and enabling telemetry (Application Insights). The CoE Starter Kit and Kit’s Compliance Hub can help continuously evaluate agent posture and enforce approvals/quarantines when violations occur.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-patterns: why “chat a bit and publish” fails
&lt;/h2&gt;

&lt;p&gt;Clicking Publish pushes the current agent state to channels; it is not a deployment pipeline. Without environments, solutions, and quality gates, you lack versioning, rollback, and documented test evidence—unacceptable for enterprise operations. Use solutions, pipelines, and automated tests instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical checklist for Phase-4 testing and deployment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pre-requisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agent in a solution; environments for Dev/Test/Prod; pipeline host environment configured.&lt;/li&gt;
&lt;li&gt;Copilot Studio Kit installed and connected; App Insights + Dataverse enrichment; AI Builder available.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test design
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Identify top intents and critical flows; write Response/Topic/Attachment tests.&lt;/li&gt;
&lt;li&gt;Add Generative answers tests for non‑deterministic responses (with validation instructions and sample answers).&lt;/li&gt;
&lt;li&gt;Compose Multi‑turn scenarios for end‑to‑end paths; add Plan validation for generative orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Automation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create test sets in Kit and (optionally) in Agent evaluation to complement coverage; export results to CSV for retention.&lt;/li&gt;
&lt;li&gt;Wire pre‑deployment pipeline steps to run Kit tests and enforce pass thresholds (quality gate).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Execute load tests through Direct Line, prefer WebSockets, and track the specific latencies Microsoft recommends.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Governance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Run security/compliance checks; monitor with CoE and Compliance Hub after go‑live.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Automated testing is essential for Copilot Studio agents due to their non‑deterministic nature and complex integrations. Microsoft’s Copilot Studio Kit provides a robust framework for defining, executing, and analyzing tests, enabling teams to validate both deterministic and generative behaviors at scale. By integrating these tests into Power Platform pipelines, teams can enforce quality gates that ensure only validated agents reach production. Complementing this with performance testing via Direct Line and ongoing governance through telemetry and compliance tools establishes a comprehensive lifecycle for reliable, enterprise-grade AI agents. Adopting these practices transforms Copilot Studio agents from experimental prototypes into trusted components of the organizational AI landscape, capable of delivering consistent value while adhering to security and compliance standards.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>powerplatform</category>
      <category>agentevaluation</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
