<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Holger Imbery</title>
    <description>The latest articles on Forem by Holger Imbery (@holgerimbery).</description>
    <link>https://forem.com/holgerimbery</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2736716%2Fa3ddf62e-8581-4eea-b140-b3fc2121c057.png</url>
      <title>Forem: Holger Imbery</title>
      <link>https://forem.com/holgerimbery</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/holgerimbery"/>
    <language>en</language>
    <item>
      <title>Building On‑Prem AI Agents with Azure Local, Foundry Local, and Microsoft Agent Framework</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 11 Apr 2026 07:26:18 +0000</pubDate>
      <link>https://forem.com/holgerimbery/building-on-prem-ai-agents-with-azure-local-foundry-local-and-microsoft-agent-framework-5a83</link>
      <guid>https://forem.com/holgerimbery/building-on-prem-ai-agents-with-azure-local-foundry-local-and-microsoft-agent-framework-5a83</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cloud-native architecture belongs on-premises too&lt;br&gt;&lt;br&gt;
&lt;strong&gt;On-premises and cloud-native are not contradictions — they are complementary&lt;/strong&gt;. While enterprises have spent years building cloud-native practices in the cloud, those same principles—containerization, orchestration, API-driven integration, and infrastructure-as-code - deliver even greater value when deployed on-premises. This guide shows you how to build production AI agents that (must) run locally and using cloud native deployment schemas with Azure Local, Foundry Local, and Microsoft Agent Framework - this is proving that cloud-native excellence is not constrained by your network boundary.&lt;br&gt;&lt;br&gt;
If you operate in regulated industries, manage constrained connectivity, or face data residency requirements, this architecture gives you the operational consistency of the cloud without leaving your premises.&lt;/p&gt;

&lt;p&gt;This article is the second in a series on Azure Local:&lt;br&gt;&lt;br&gt;
&lt;a href="https://holgerimbery.blog/azure-local-foundry-local-and-microsoft-365-local-a-comprehensive-guide-for-it-architects-and-decision-makers" rel="noopener noreferrer"&gt;Azure Local, Foundry Local, and Microsoft 365 Local: A Comprehensive Guide for IT Architects and Decision-Makers&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Enterprise teams are moving beyond “chatbots” toward agents that can retrieve internal knowledge, call tools, orchestrate workflows, and produce outcomes aligned to real business processes. The challenge is that many agent reference designs assume always‑on cloud connectivity and cloud-hosted inference. That assumption does not hold everywhere.&lt;/p&gt;

&lt;p&gt;In regulated industries, in plants and branches with constrained connectivity, or in environments where latency and data locality are non‑negotiable, the architecture has to follow the use case. This post describes a pragmatic design you can implement today by combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure Local&lt;/strong&gt; as the on‑prem infrastructure substrate, managed through &lt;strong&gt;Azure Arc&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AKS on Azure Local&lt;/strong&gt; as the standardized Kubernetes runtime for agent services and supporting components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundry Local (preview)&lt;/strong&gt; as the local inference runtime exposing an &lt;strong&gt;OpenAI‑compatible REST interface&lt;/strong&gt; for model calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Agent Framework (MAF)&lt;/strong&gt; as the agent and workflow layer, including tool integration, session/state management, middleware, and telemetry patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A critical insight: cloud-native architecture and practices are not limited to cloud deployments.&lt;/strong&gt; The principles—containerization, orchestration, infrastructure-as-code, API-driven integration, observability, and declarative state management—are equally valuable on‑premises. In fact, they become &lt;em&gt;more essential&lt;/em&gt; when your infrastructure cannot scale elastically or rely on the implicit redundancy of cloud regions. By applying cloud-native architecture to on‑prem agent deployments, you gain consistent operational models across locations, faster iteration, clear boundaries between layers, and the ability to treat infrastructure changes as routine rather than exceptional.&lt;/p&gt;

&lt;p&gt;One design choice drives everything that follows: &lt;strong&gt;separate the agent runtime from the model runtime.&lt;/strong&gt; You want the agent layer (routing, tools, workflows, state, observability) to evolve independently from inference, especially when local inference is in preview and can change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture in one picture (logical view)
&lt;/h2&gt;

&lt;p&gt;A practical baseline pattern is “local inference, centralized orchestration.”&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7s07rlie2kn629q44p3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7s07rlie2kn629q44p3.png" alt="upgit_20260409_1775761970.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This separation keeps your application surface stable by establishing a clear boundary between stateless agent logic and stateful model inference. Because the agent layer and model runtime are decoupled, you can update agent code, refine routing logic, add new tools, or adjust middleware without touching the inference layer. Tools can be added safely behind constrained proxies or API gateways, allowing you to apply fine-grained network controls and audit trails at the integration boundary. Governance policies, observability hooks, and logging patterns remain consistent across agent operations regardless of where inference is placed. Simultaneously, inference becomes a managed dependency that can scale, relocate, or upgrade independently of application code. This architectural separation is particularly valuable in regulated environments where model serving and application logic often require separate operational controls, hardware isolation, or audit commitments. By decoupling these layers, you achieve the flexibility to place inference close to hardware accelerators (GPUs, NPUs) and data sources without forcing agent code to depend on infrastructure choices that are still evolving, especially when the inference runtime is in preview status and subject to API changes or performance tuning.&lt;/p&gt;
&lt;h2&gt;
  
  
  Where this approach fits (and where it does not)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  This is a good fit when
&lt;/h3&gt;

&lt;p&gt;This pattern becomes the right choice when one or more of the following constraints are fundamental to your deployment environment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data residency and regulatory compliance&lt;/strong&gt; are hard boundaries. When regulations, industry standards, or organizational policy require that prompts, retrieved context, and inference results remain physically within an on‑premises boundary—whether for financial data, healthcare records, or proprietary intelligence—local inference becomes non‑negotiable. Cloud-based APIs, even with encryption and data-deletion assertions, may not satisfy audit requirements or legal obligations in certain jurisdictions. In these cases, the agent architecture must be designed to keep the full inference pipeline local while still benefiting from cloud-based observability and control planes, where appropriate, via segregated connections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency is a direct measure of operational usability.&lt;/strong&gt; In manufacturing plants, field service operations, retail branches, or other environments where agents serve human users on the shop floor or remote locations, response time is not a performance metric—it is a functional requirement that affects whether the agent is used at all. When users are waiting for a troubleshooting recommendation or a work instruction, a response that takes tens of seconds to traverse cloud networks is often abandoned. Local inference, combined with local agent orchestration, ensures that the slowest part of the response pipeline is your own internal network and compute capacity, not external connectivity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connectivity cannot be assumed to be always available and high-bandwidth.&lt;/strong&gt; Many operational environments have constrained connectivity: scheduled outbound traffic windows, rate-limited connections, air-gapped subnets, or intentional network fragmentation for security isolation. The agent needs to function usefully within these constraints rather than degrade into a pass-through to cloud APIs. Azure Local supports this by enabling local execution and local state, while Arc provides a control-plane integration path when connectivity is available, rather than requiring continuous connectivity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want cloud-native operational practices applied on‑premises.&lt;/strong&gt; This includes containerized deployments, Kubernetes orchestration for workload management, infrastructure-as-code for reproducibility, GitOps-driven delivery pipelines, policy enforcement at the runtime boundary, and standardized telemetry and logging. These practices are not exclusive to cloud deployments; they provide the same benefits on‑premises—clear separation of concerns, predictable deployments, and auditability—but require an infrastructure platform like Azure Local to realize them consistently.&lt;/p&gt;
&lt;h3&gt;
  
  
  When this pattern becomes problematic
&lt;/h3&gt;

&lt;p&gt;Reconsidering a local-first architecture is warranted in several practical scenarios. If your inference workload demands elastic horizontal scaling and you cannot predict peak capacity without overprovisioning on-premises infrastructure, then chasing elastic scale with local hardware becomes economically and operationally inefficient. Building auto-scaling logic that manages standby capacity across stateful models would contradict the efficiency argument for locality. Similarly, if your operational environment requires production-grade stability guarantees from the inference API layer with minimal risk of breaking changes between deployments, the current maturity of local inference runtimes (such as Foundry Local, which remains in preview) presents a material risk. Preview components introduce uncertainty regarding backward compatibility, performance-tuning recommendations, and troubleshooting depth, which may not align with production SLAs. Finally, if the problem you are solving is fundamentally deterministic—where steps follow a fixed sequence, validation rules are static, and branching logic is known in advance—a structured workflow orchestration tool or a conventional microservice often provides clearer observability, simpler debugging, and lower operational overhead than an agent. Not every problem with tools and state management requires agentic behavior; sometimes explicit choreography is both simpler and more reliable.&lt;/p&gt;

&lt;p&gt;These are the constraints that inform the "whether" decision. The next section moves to the "why Azure Local" specifically, grounded in use-case context rather than abstract on-premises philosophy.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Azure Local makes sense here (the use case drives locality)
&lt;/h2&gt;

&lt;p&gt;Azure Local is not the point of the architecture. It is the platform choice that becomes rational when the &lt;strong&gt;agent pattern has to follow the environment&lt;/strong&gt;: where data and tools live, what network rules allow, what latency targets are required, and what failure modes are acceptable.&lt;/p&gt;
&lt;h3&gt;
  
  
  1) The agent needs to live where the tools and data already are
&lt;/h3&gt;

&lt;p&gt;High‑value agents are typically tool-heavy, and the distribution of that tooling directly affects where the agent runtime should run. The model call itself—the inference step that generates a response — is only one part of the agent interaction. The larger portion of the interaction involves retrieving documents from internal repositories, querying operational databases, validating business rules and constraints, and writing outcomes back to systems of record. Each of these operations carries a latency cost, integration overhead, and often data governance implications.&lt;/p&gt;

&lt;p&gt;When the authoritative data sources and the systems that perform work live on‑premises—whether that is an ERP system, a manufacturing execution system, a document repository, or a service dispatch platform—moving the agent runtime closer to those systems becomes pragmatically necessary rather than architecturally optional. A remote agent calling back into on‑premises tools over the network incurs not only the latency of each call but also the operational complexity of maintaining secure, reliable network pathways between cloud and on‑premises infrastructure, managing retry logic for transient failures across that boundary, and reasoning about whether a failure in the agent's response came from the model inference or from a tool integration issue.&lt;/p&gt;

&lt;p&gt;Placing the agent runtime on‑premises reduces integration complexity by collapsing tool interactions into local function calls with minimal network hops. It also materially shrinks the trust boundary. Data that would otherwise traverse a cloud service boundary—even with encryption in transit and assertions of deletion—can remain inside the perimeter where it originated. Azure Local provides a consistent, repeatable substrate on which to host the runtime within the organizational boundary while still enabling cloud-native operational practices such as containerization, orchestration, and declarative configuration management that teams have come to expect.&lt;/p&gt;
&lt;h3&gt;
  
  
  2) Latency is a functional requirement, not a nice-to-have
&lt;/h3&gt;

&lt;p&gt;In operational scenarios, predictable response time is not an optimization target—it is a functional requirement embedded in the task itself. When a field technician, floor supervisor, or support worker invokes an agent for a troubleshooting recommendation or work instruction, they are typically performing a task that cannot proceed until they receive guidance. A response that arrives within seconds fits naturally into human workflow and decision-making; the user can act on it immediately and move forward. A response that takes tens of seconds—or worse, becomes non-deterministic depending on cloud API load—exceeds the mental context window of the task. Users abandon the agent, fall back to phone calls or manual lookups, or proceed without the agent's input entirely, making the agent operationally irrelevant regardless of its intelligence.&lt;/p&gt;

&lt;p&gt;The latency problem compounds when the agent's response is not a single inference call. A typical operational agent orchestrates multiple steps: retrieving context from a document repository, querying a database to validate prerequisites, calling an external service to fetch the current state, and then synthesizing a response. Each of these operations incurs round-trip time. When those dependencies live on-premises, and the agent runtime lives in a cloud region thousands of kilometers away, the baseline latency floor is determined by geography and internet backbone capacity, not by the execution speed of any individual component. You cannot optimize away the speed of light or your ISP's rate limiting. Placing the agent runtime and its tooling in the same local environment—where internal networks typically offer latency in the single-digit millisecond range—ensures that the slowest element becomes your own infrastructure capacity, which you can measure, predict, and scale. This transforms latency from an externality you absorb to a variable you control.&lt;/p&gt;

&lt;p&gt;Azure Local supports this placement strategy by providing AKS-hosted agent services and local model-serving infrastructure in a shared operational footprint. The inference engine, the agent orchestration layer, and the tool integrations all run in the same data center or facility where the authoritative systems live. This collapse of distance translates directly into a collapse of latency, which translates into usability in environments where response time affects task completion.&lt;/p&gt;
&lt;h3&gt;
  
  
  3) Connectivity constraints are a design input
&lt;/h3&gt;

&lt;p&gt;Many environments are not "cloud-connected" as cloud reference architectures assume. The assumption embedded in most cloud-native architecture guidance is that outbound connectivity is available, reliable, and incurs acceptable latency and throughput. In practice, many operational environments operate under very different constraints. Outbound traffic to public cloud endpoints may be restricted by security policy or rate-limited by egress gateways. Connectivity may be scheduled—available only during specific windows or subject to maintenance blackouts. In other cases, network segments may be deliberately disconnected by design: operations technology networks in manufacturing facilities, isolated domains in financial institutions, or intentionally air-gapped environments in highly regulated sectors all follow this pattern. Even when connectivity exists, it may be mediated by proxies, firewalls, or VPNs, adding latency and complexity to troubleshooting when the agent's inference or tool calls fail.&lt;/p&gt;

&lt;p&gt;Azure Local enables local execution of the agent runtime and inference engine regardless of whether upstream cloud connectivity is available. Simultaneously, it aligns with Azure's control-plane concepts and governance models via Azure Arc when connectivity is available. This dual capability means you can design and operate an agent system that functions reliably in disconnected or intermittently-connected scenarios without abandoning cloud-native operational practices. When connectivity is available, Arc can be used for centralized observability, policy enforcement, and update orchestration. When connectivity is unavailable, the local agent continues to function using local tools and data. This gives you an operational path that respects the actual constraints of your environment rather than forcing an architecture that assumes away those constraints or requires workarounds to compensate for them.&lt;/p&gt;
&lt;h3&gt;
  
  
  4) You can keep cloud-native operations without reinventing on‑prem deployment
&lt;/h3&gt;

&lt;p&gt;Teams generally want repeatable delivery, policy enforcement, and consistent observability. The conventional tension between on-premises deployments and cloud-native operations has historically forced a false choice: either accept the operational discipline and automation of cloud platforms at the cost of moving workloads outside your perimeter, or keep infrastructure on-premises and revert to manual configuration management, bespoke deployment scripts, and fragmented observability tooling.&lt;/p&gt;

&lt;p&gt;Azure Local plus AKS on Azure Local severs that coupling. Containerized deployments, GitOps-driven configuration management, Kubernetes namespaces, and declarative rollout strategies work identically whether your agent runtime is in a public cloud region or in your own data center. The infrastructure boundary becomes transparent to operational practices. Teams can maintain the same deployment pipelines, policy engines, and observability systems they have built for cloud workloads and apply them without modification to on-premises clusters. This continuity of tooling and process significantly reduces the operational friction that typically accompanies on-premises agent deployments. The "local" decision becomes a deployment location decision—a choice about where to run proven, familiar infrastructure patterns—rather than a return to bespoke server management, manual patching, and isolated monitoring infrastructure that would otherwise characterize traditional on-premises deployments.&lt;/p&gt;
&lt;h3&gt;
  
  
  5) Local inference forces you to manage capacity and hardware intentionally
&lt;/h3&gt;

&lt;p&gt;If inference is local, capacity planning and acceleration hardware become first-class concerns that demand explicit decision-making rather than outsourced abstraction. When inference runs in a public cloud service, capacity is nominally infinite—or at least, the perception of infinity is maintained through multi-tenancy and auto-scaling tiers that obscure the underlying hardware realities. Costs accumulate by token count and API call frequency, but the physical infrastructure remains opaque. The tradeoff is acceptable if your workload is occasional or bursty; the cost volatility is a known variable you can budget for.&lt;/p&gt;

&lt;p&gt;When inference runs locally, however, the hardware economics become tangible. A single GPU accelerator costs tens of thousands of dollars upfront, requires power and cooling infrastructure, and has a finite lifespan. Acquiring that hardware is no longer a usage-based charge smoothed into monthly billing; it is a capital expenditure that sits in your facility and has opportunity cost. This visibility forces intentional capacity planning: you must understand your typical inference load, peak throughput requirements, model sizes, and acceptable latency percentiles, and then purchase hardware that meets those requirements with some headroom for growth. You cannot simply add capacity by changing a tier or waiting for auto-scaling to provision more instances; you provision intentionally.&lt;/p&gt;

&lt;p&gt;Azure Local provides a platform to run and govern those resources, allowing you to isolate inference nodes, stage updates, and enforce change control without coupling the inference lifecycle to the agent code lifecycle. You can reserve specific nodes for specific models, apply resource quotas to prevent one workload from starving another, and manage hardware refreshes independently from application deployments. This separation of concerns means you can upgrade your inference engine or swap model versions without draining the entire cluster, and you can plan hardware replacement without triggering emergency application refactorings. The operational rigor this imposes is not a burden—it is an alignment of technical decision-making with the actual cost structure of your infrastructure.&lt;/p&gt;
&lt;h3&gt;
  
  
  6) The architecture stays incremental and reversible
&lt;/h3&gt;

&lt;p&gt;By separating agent runtime from model runtime, you establish a deployment boundary that allows you to make infrastructure decisions independently from application logic. This separation is critical in practice because it decouples two sources of change that typically move at different velocities: the agent orchestration layer (tools, workflows, routing, state management) tends to evolve rapidly as teams refine business logic and respond to operational feedback, while the inference runtime makes infrequent but high-impact decisions around model selection, hardware acceleration strategy, and inference node topology that are capital-intensive and difficult to reverse.&lt;/p&gt;

&lt;p&gt;Starting small means you can pilot with a single inference node running a small quantized model, then grow to multiple specialized nodes—some optimized for latency-sensitive operations, others for throughput—without requiring changes to the agent code itself. The agent layer continues to interact with inference through the same OpenAI-compatible API boundary, indifferent to whether a single GPU or a distributed cluster backs that endpoint. You can keep your agent API stable while swapping models; if a new quantization or a different model family becomes available, you can stage it on a secondary node and route traffic to validate behavior before completing the migration. You can change inference node placement by adjusting scheduling constraints or moving nodes between racks without triggering a redeploy of agent services. This mobility is not possible when agent code and inference are tightly coupled—for example, when inference decisions are embedded in application code or when the agent layer depends on model-specific features or tokenization strategies.&lt;/p&gt;

&lt;p&gt;Azure Local supports this incremental expansion by providing a consistent Kubernetes control plane and standard scheduling mechanisms that treat compute resources as fungible. Your initial pilot might span a single machine running AKS on Azure Local in a branch or regional office; as you validate the model and prove business value, you can expand to a small cluster in your primary data center. Each step remains operationally routine because you are not changing how workloads are deployed or managed—you are only changing the scale and distribution of resources. A pilot deployment and a production cluster follow the same GitOps patterns, use the same artifact promotion pipelines, and respond to the same observability signals, allowing you to graduate from proof-of-concept to production without a redesign of your delivery model or a learning curve on unfamiliar operational practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical decision test:&lt;/strong&gt; Azure Local tends to be the right call when most of these are true: the authoritative tools/data are on‑prem, prompts and retrieved context must remain local, latency is a requirement, connectivity is constrained, and you want cloud-native operations in the same footprint.&lt;/p&gt;

&lt;p&gt;With that context, we can move from "why" to "how".&lt;/p&gt;
&lt;h2&gt;
  
  
  Step‑by‑step implementation runbook
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Phase 0 — Define boundaries: agent vs workflow, and what "done" means
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Write the outcome in business terms.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Define success in measurable outcomes, not in model features. Examples include reduced downtime, faster triage, fewer escalations, shorter handling time, or improved compliance auditability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Classify steps as agent or workflow.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use an &lt;strong&gt;agent&lt;/strong&gt; for open-ended interpretation, conversational assistance, flexible tool use, and summarization.&lt;/li&gt;
&lt;li&gt;Use a &lt;strong&gt;workflow&lt;/strong&gt; for deterministic steps, routing, approvals, checkpoints, and auditable state transitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Produce a tool inventory and trust boundary map.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For each tool, define authentication, authorization, validation, allowed destinations, and audit requirements.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Teams often prototype by giving agents broad access "to move fast." That security debt becomes expensive later. Start with constrained proxies and allow-lists from day one.&lt;/p&gt;
&lt;h3&gt;
  
  
  Phase 1 — Platform baseline: Azure Local + Arc + AKS on Azure Local
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1.1 Establish baseline assumptions
&lt;/h4&gt;

&lt;p&gt;Decide upfront:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Topology: pilot node, datacenter cluster, or distributed sites.&lt;/li&gt;
&lt;li&gt;OS mix: Linux nodes, Windows nodes, or mixed.&lt;/li&gt;
&lt;li&gt;Acceleration: CPU only vs GPU/NPU inference nodes.&lt;/li&gt;
&lt;li&gt;Connectivity mode: connected, constrained, or partially disconnected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Constrained connectivity changes everything about artifact flow. Treat "how will nodes pull images and models?" as a first-class requirement (private registry, artifact promotion, caching).&lt;/p&gt;
&lt;h4&gt;
  
  
  1.2 Build a minimal AKS baseline (repeatable)
&lt;/h4&gt;

&lt;p&gt;Include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Namespaces for separation (&lt;code&gt;platform&lt;/code&gt;, &lt;code&gt;agents&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;observability&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Ingress and certificate strategy.&lt;/li&gt;
&lt;li&gt;Secrets management strategy.&lt;/li&gt;
&lt;li&gt;Logging/metrics pipeline.&lt;/li&gt;
&lt;li&gt;Network policies and egress controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example namespace baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restricted"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tools&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restricted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Without early namespace boundaries and baseline policies, your cluster becomes a collection of special cases that are hard to govern and hard to migrate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2 — GitOps delivery (recommended even for pilots)
&lt;/h3&gt;

&lt;h3&gt;
  
  
  2.1 Repository layout pattern
&lt;/h3&gt;

&lt;p&gt;A structure that scales:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;clusters/&amp;lt;cluster-name&amp;gt;/&lt;/code&gt; for cluster-specific overlays&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;platform/&lt;/code&gt; for shared add-ons (ingress, monitoring, policy)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workloads/agents/&lt;/code&gt; for agent services&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workloads/tools/&lt;/code&gt; for tool proxies and connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2.2 Kustomization pattern (example)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kustomize.toolkit.fluxcd.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./workloads/agents/overlays/prod&lt;/span&gt;
  &lt;span class="na"&gt;prune&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;sourceRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GitRepository&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;platform-repo&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; GitOps only reduces drift if "kubectl apply in production" is the exception with a documented break-glass process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3 — Foundational services: state, memory, and observability
&lt;/h3&gt;

&lt;p&gt;Make state explicit and intentional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conversation state&lt;/strong&gt; (threads, session context) belongs in agent stores designed for that purpose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business state&lt;/strong&gt; (work items, approvals, tickets) belongs in systems of record.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common supporting components on AKS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redis for caching and rate limiting&lt;/li&gt;
&lt;li&gt;PostgreSQL (or equivalent) for durable state&lt;/li&gt;
&lt;li&gt;A vector store if you implement local RAG&lt;/li&gt;
&lt;li&gt;OpenTelemetry collector for traces/metrics/logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Agent telemetry can explode. Define retention, sampling, and content redaction policies early. In regulated environments, you often cannot log raw prompts or retrieved text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4 — Install Foundry Local (preview) on inference nodes
&lt;/h3&gt;

&lt;p&gt;Treat Foundry Local as a managed runtime dependency.&lt;/p&gt;

&lt;h4&gt;
  
  
  4.1 Placement and isolation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Prefer dedicated inference nodes where possible.&lt;/li&gt;
&lt;li&gt;Place them where the acceleration hardware lives.&lt;/li&gt;
&lt;li&gt;Segment networking so AKS can reach them reliably while keeping exposure minimal.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4.2 Endpoint discovery (avoid hard-coded ports)
&lt;/h4&gt;

&lt;p&gt;Prefer one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Discovery service pattern:&lt;/strong&gt; publish the current base URL into a config store that your agent services read.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway pattern:&lt;/strong&gt; place a stable internal proxy in front of Foundry Local to normalize routing and policies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Hard-coded ports work in a lab and fail after reboots, upgrades, or runtime changes. Build discovery or stable routing into the design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 5 — Network, TLS, and identity between AKS and Foundry Local
&lt;/h3&gt;

&lt;h4&gt;
  
  
  5.1 Connectivity options
&lt;/h4&gt;

&lt;p&gt;Common choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct HTTPS from agent pods to Foundry node IP/DNS&lt;/li&gt;
&lt;li&gt;Internal L4/L7 proxy for stable routing and policy&lt;/li&gt;
&lt;li&gt;Service mesh for mTLS and telemetry (only if you already operate one)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5.2 TLS strategy
&lt;/h4&gt;

&lt;p&gt;Use your standard PKI approach, if possible, and ensure that clients validate certificates by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; "Works with curl -k" is a warning sign, not a milestone. Fix trust chains early so insecure shortcuts do not become permanent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 6 — Implement the inference adapter in your MAF service
&lt;/h3&gt;

&lt;p&gt;Design goal: agent code calls a model client abstraction, not a concrete endpoint.&lt;/p&gt;

&lt;h4&gt;
  
  
  6.1 Configuration pattern (ConfigMap + Secret)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-config&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;FOUNDRY_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://foundry-local.internal.example"&lt;/span&gt;
  &lt;span class="na"&gt;FOUNDRY_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local-chat-model"&lt;/span&gt;
  &lt;span class="na"&gt;INFERENCE_TIMEOUT_SECONDS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30"&lt;/span&gt;
  &lt;span class="na"&gt;INFERENCE_MAX_RETRIES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-secrets&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;span class="na"&gt;stringData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;FOUNDRY_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;placeholder-if-required-by-client"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deployment consuming it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;maf-agent-api&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;maf-agent-api&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;maf-agent-api&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.local/agents/maf-agent-api:1.0.0&lt;/span&gt;
        &lt;span class="na"&gt;envFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;configMapRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-config&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-secrets&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250m"&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2Gi"&lt;/span&gt;
        &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health/ready&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
          &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
          &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  6.2 Client policy (timeouts, retries, circuit breakers)
&lt;/h4&gt;

&lt;p&gt;Start with conservative defaults:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timeout: 20–60s depending on model/prompt size&lt;/li&gt;
&lt;li&gt;Retries: 1–2 for transient failures only&lt;/li&gt;
&lt;li&gt;Circuit breaker: open after repeated failures to prevent cascading latency&lt;/li&gt;
&lt;li&gt;Concurrency limits: protect inference nodes from overload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Without explicit backpressure, a single busy agent route can saturate inference and degrade every workload that shares the runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 7 — Tool integration with constrained proxies
&lt;/h3&gt;

&lt;p&gt;Do not give agents direct access to sensitive systems.&lt;/p&gt;

&lt;p&gt;Recommended approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy tool proxy services in a dedicated namespace.&lt;/li&gt;
&lt;li&gt;Restrict outbound connectivity to approved destinations only.&lt;/li&gt;
&lt;li&gt;Enforce authorization, validation, and allow-lists in the proxy.&lt;/li&gt;
&lt;li&gt;Log every invocation with correlation IDs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A default-deny egress policy concept:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tools-default-deny-egress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tools&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;policyTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Egress&lt;/span&gt;
  &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; If you skip network controls early, you will discover "mystery dependencies" later when tools call endpoints that were never approved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 8 — Observability: correlate agent → tools → inference
&lt;/h3&gt;

&lt;p&gt;Minimum requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correlation ID propagated across inbound request, tool calls, inference calls, and response&lt;/li&gt;
&lt;li&gt;Latency breakdown (tool time vs inference time vs orchestration time)&lt;/li&gt;
&lt;li&gt;Error classification by category (tool failure, inference failure, policy block, timeout)&lt;/li&gt;
&lt;li&gt;Token/prompt size metadata if available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Decide what is safe to log. For many environments, metadata and hashes are acceptable, but raw prompts and retrieved snippets are not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 9 — Hardening: safety, governance, regression testing
&lt;/h3&gt;

&lt;p&gt;Hardening checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt and tool regression tests for critical flows&lt;/li&gt;
&lt;li&gt;Golden conversations for validation after runtime updates&lt;/li&gt;
&lt;li&gt;Tool schemas and allow-lists are enforced centrally&lt;/li&gt;
&lt;li&gt;Timeouts on every external call&lt;/li&gt;
&lt;li&gt;Rate limits per user and per route&lt;/li&gt;
&lt;li&gt;Graceful degradation when inference is unavailable (fallback to workflow/human)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Preview inference runtimes can introduce behavior changes that are not "errors" but still break user expectations. Without regression tests, you will find out in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 10 — Operations: versioning, rollouts, and capacity planning
&lt;/h3&gt;

&lt;h4&gt;
  
  
  10.1 Independent update cadencess
&lt;/h4&gt;

&lt;p&gt;Operate on separate cadences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent services: frequent updates via CI/CD&lt;/li&gt;
&lt;li&gt;Inference runtime: cautious updates via staged rollout&lt;/li&gt;
&lt;li&gt;Cluster/platform: regular maintenance windows&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  10.2 Rollout strategy
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Canary agent changes with a small traffic slice and compares latency/error rates&lt;/li&gt;
&lt;li&gt;Pin inference runtime versions and validate with representative load before expanding rollout&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  10.3 Capacity planning
&lt;/h4&gt;

&lt;p&gt;Define explicit SLOs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;p95 latency target for a representative prompt&lt;/li&gt;
&lt;li&gt;maximum concurrent sessions per inference node&lt;/li&gt;
&lt;li&gt;acceptable queueing delay under peak load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operational gotcha:&lt;/strong&gt; Size for peaks and recovery scenarios. A thundering herd is common when shifts start, sites reconnect, or batch processes trigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical “day‑1 to day‑30” plan
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Day 1–3: Foundation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Define business outcomes and agent/workflow boundaries&lt;/li&gt;
&lt;li&gt;Stand up AKS baseline namespaces, ingress, and GitOps scaffolding&lt;/li&gt;
&lt;li&gt;Deploy telemetry pipeline and basic dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 4–10: Inference integration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install Foundry Local on inference nodes&lt;/li&gt;
&lt;li&gt;Implement endpoint discovery and TLS trust&lt;/li&gt;
&lt;li&gt;Add inference adapter in the MAF service with externalized configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 11–20: Tools and data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Build constrained tool proxies with allow-lists and audit logs&lt;/li&gt;
&lt;li&gt;Implement retrieval paths that keep data inside the boundary&lt;/li&gt;
&lt;li&gt;Add correlation IDs end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Day 21–30: Hardening and operations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add regression tests and golden conversations&lt;/li&gt;
&lt;li&gt;Implement rollouts, version pinning, and canary strategy&lt;/li&gt;
&lt;li&gt;Load test and finalize a capacity plan and operational runbooks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This stack is not about "on‑prem versus cloud." It is about aligning the agent pattern with the constraints imposed by the use case: data locality, tool proximity, latency targets, and network realities. Azure Local provides a consistent on‑prem platform for that pattern; AKS keeps operations cloud-native; Foundry Local enables local inference; and Agent Framework provides the application layer to build agents and workflows that map to real business outcomes. By following this architecture and implementation runbook, you can deliver production‑grade AI agents that run locally, proving that cloud-native excellence is not constrained by your network boundary.&lt;/p&gt;

</description>
      <category>azurelocal</category>
      <category>foundrylocal</category>
      <category>agents</category>
    </item>
    <item>
      <title>Testing Copilot Agents: When to Use Agent Evaluation vs. the Copilot Studio Kit</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sun, 05 Apr 2026 07:29:14 +0000</pubDate>
      <link>https://forem.com/holgerimbery/testing-copilot-agents-when-to-use-agent-evaluation-vs-the-copilot-studio-kit-4f3e</link>
      <guid>https://forem.com/holgerimbery/testing-copilot-agents-when-to-use-agent-evaluation-vs-the-copilot-studio-kit-4f3e</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Microsoft's Agent Evaluation GA announcement on March 31, 2026, update to &lt;a href="https://holgerimbery.blog/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview" rel="noopener noreferrer"&gt;Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt; &lt;br&gt;
Agent Evaluation and the Copilot Studio Kit are not competing tools—they represent a layered quality-assurance strategy. Agent Evaluation provides fast, AI-assisted behavioral validation embedded directly in Copilot Studio, ideal for iteration and rapid feedback. The Copilot Studio Kit delivers deterministic, enterprise-grade verification for production gates, compliance, and governance. This article breaks down what each tool does, when to use them, and how to adopt both as your agent quality matures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why read this&lt;/strong&gt;&lt;br&gt;
If you're building or scaling Copilot agents in your organization, you need clarity on testing strategy. This article cuts through the positioning and provides a practical decision framework for when to reach for Agent Evaluation versus the Copilot Studio Kit, with real-world scenarios showing how mature teams layer both tools across their development lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Microsoft shipped with Agent Evaluation (GA)
&lt;/h2&gt;

&lt;p&gt;On March 31, 2026, Microsoft announced the general availability of Agent Evaluation, marking a significant milestone in Copilot Studio's testing and validation capabilities. Agent Evaluation is now generally available and built directly into Copilot Studio. Its goal is to make agent quality visible, repeatable, and scalable without requiring external tools or setup. This GA release represents the culmination of Microsoft's efforts to democratize agent quality assurance, bringing evaluation capabilities previously limited to advanced setups directly into the hands of everyday agent makers in the Copilot Studio authoring environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core characteristics (as of 31. March 2026)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Integrated directly into the Copilot Studio authoring experience&lt;/strong&gt;&lt;br&gt;
Agent Evaluation is not a separate tool or external service. It lives within the Copilot Studio interface, where agents are built, allowing makers to validate their agents without context-switching or complex integrations. This tight integration reduces friction and encourages frequent validation during development cycles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designed to answer the production question:&lt;/strong&gt;&lt;br&gt;
"Can we trust this agent to behave correctly, consistently, and safely?"&lt;br&gt;
This core question drives the entire design philosophy. Agent Evaluation focuses on behavioral confidence—whether the agent produces appropriate, consistent, and safe responses across diverse scenarios and user inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replaces unscalable manual testing and spot‑checking&lt;/strong&gt;&lt;br&gt;
Before Agent Evaluation, agent validation relied heavily on manual testing: individually testing scenarios, reviewing responses, and hoping coverage was adequate. This approach doesn't scale with agent complexity or usage volume. Agent Evaluation automates and scales this process through AI-assisted evaluation and reusable test sets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intended to be used before launch and continuously after changes&lt;/strong&gt;&lt;br&gt;
Agent Evaluation is not a one-time gate. It's designed for continuous validation: before initial launch, before deploying updates, and continuously as conversations flow through production. This shift from ceremonial testing to continuous validation aligns with modern DevOps practices.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluation capabilities
&lt;/h3&gt;

&lt;p&gt;Agent Evaluation allows makers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Create evaluation sets from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manually added questions&lt;/li&gt;
&lt;li&gt;Imported test sets&lt;/li&gt;
&lt;li&gt;AI‑generated queries derived from agent metadata and knowledge sources &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Choose flexible evaluation methods, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact/partial match&lt;/li&gt;
&lt;li&gt;Semantic similarity&lt;/li&gt;
&lt;li&gt;Intent recognition&lt;/li&gt;
&lt;li&gt;Relevance and completeness scoring &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Mix AI‑generated and human‑defined scenarios to balance breadth and depth&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Reuse evaluations over time and run them via APIs for lifecycle testing&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key framing&lt;/strong&gt;:&lt;br&gt;
Agent Evaluation is positioned as a lightweight, AI‑assisted validation layer that fits naturally into everyday agent authoring and iteration. Unlike heavy external testing frameworks that require context switching and specialized infrastructure, Agent Evaluation operates within Copilot Studio itself, where agents are built. This embedded approach acknowledges that agent makers are iterating rapidly, testing comprehensively at each step, and need validation feedback within their authoring flow rather than as a post-production bottleneck. The AI-assisted scoring means makers don't need to hand-write every test case or define complex rubrics upfront; they can generate relevant test scenarios from their agent's own knowledge sources and metadata, then refine them. This makes evaluation accessible to makers of all skill levels and scales with agent complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Copilot Studio Kit provides for testing
&lt;/h2&gt;

&lt;p&gt;The Copilot Studio Kit (Power CAT) is a separate, solution‑based toolkit that augments Copilot Studio with enterprise‑grade testing, governance, and analytics. Developed by the Microsoft Power CAT (Patterns and Practices) team, the Kit represents a mature, production-ready framework built for organizations requiring rigorous quality assurance, regulatory compliance, and scalable CI/CD integration. While Agent Evaluation addresses everyday iteration and behavioral confidence within the authoring canvas, the Copilot Studio Kit provides the structural backbone for organizations that need deterministic verification, audit trails, multi-layer testing orchestration, and governance enforcement across large deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Explicit testing capabilities
&lt;/h3&gt;

&lt;p&gt;The Kit supports structured, deterministic, and multi‑layer testing, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response Match (exact or conditional text comparison)&lt;/li&gt;
&lt;li&gt;Attachment Match (Adaptive Cards/files)&lt;/li&gt;
&lt;li&gt;Topic Match (requires Dataverse enrichment)&lt;/li&gt;
&lt;li&gt;Generative Answer evaluation using AI Builder and rubrics&lt;/li&gt;
&lt;li&gt;Multi‑turn tests running in a shared conversation context&lt;/li&gt;
&lt;li&gt;Plan Validation for generative orchestration (verifying which tools/actions are invoked, not just what the agent says)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Execution and automation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tests are executed via Copilot Studio APIs (Direct Line)&lt;/li&gt;
&lt;li&gt;Bulk creation and maintenance via Excel import/export&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Detailed run‑level telemetry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pass/fail&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Observed responses&lt;/li&gt;
&lt;li&gt;Aggregated metrics&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Results can be enriched with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Application Insights&lt;/li&gt;
&lt;li&gt;Dataverse conversation transcripts&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enterprise extensions beyond testing
&lt;/h3&gt;

&lt;p&gt;The Kit also includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conversation KPIs for Power BI&lt;/li&gt;
&lt;li&gt;Prompt Advisor&lt;/li&gt;
&lt;li&gt;Agent Inventory&lt;/li&gt;
&lt;li&gt;Agent Review Tool&lt;/li&gt;
&lt;li&gt;Compliance Hub with policy enforcement and SLA‑driven reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key framing&lt;/strong&gt;:&lt;br&gt;
The Copilot Studio Kit is built for verification, regression testing, production gates, and governance at scale. Unlike Agent Evaluation's lightweight, AI-assisted approach that lives within the authoring canvas, the Kit functions as an enterprise testing backbone designed for organizations that require deterministic verification, full audit trails, and regulatory compliance enforcement. It bridges the gap between development-time validation and production-readiness, enabling structured quality gates that align with enterprise DevOps pipelines. The Kit's emphasis on exact response matching, topic validation, and orchestration plan verification makes it essential for mission-critical deployments where agent behavior must be predictable, traceable, and compliant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Direct comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Agent Evaluation (GA)&lt;/th&gt;
&lt;th&gt;Copilot Studio Kit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Where it lives&lt;/td&gt;
&lt;td&gt;Built into Copilot Studio UI&lt;/td&gt;
&lt;td&gt;Separate Power CAT solution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary purpose&lt;/td&gt;
&lt;td&gt;Behavioral validation&lt;/td&gt;
&lt;td&gt;Functional verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup effort&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Higher (Dataverse, AI Builder, App Insights optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test creation&lt;/td&gt;
&lt;td&gt;Manual, import, AI‑generated&lt;/td&gt;
&lt;td&gt;Manual + Excel bulk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI‑assisted scoring&lt;/td&gt;
&lt;td&gt;Yes (core feature)&lt;/td&gt;
&lt;td&gt;Yes (Generative Answers via AI Builder)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic checks&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Strong (exact match, topic, attachments)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi‑turn scenarios&lt;/td&gt;
&lt;td&gt;Not explicitly documented&lt;/td&gt;
&lt;td&gt;Explicitly supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration plan validation&lt;/td&gt;
&lt;td&gt;Not documented&lt;/td&gt;
&lt;td&gt;Explicitly supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD &amp;amp; quality gates&lt;/td&gt;
&lt;td&gt;Implicit / API‑based&lt;/td&gt;
&lt;td&gt;Explicit pipeline integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance &amp;amp; compliance&lt;/td&gt;
&lt;td&gt;Not in scope&lt;/td&gt;
&lt;td&gt;First‑class feature&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How they relate (this is the key insight)
&lt;/h2&gt;

&lt;p&gt;Microsoft is not replacing the Copilot Studio Kit with Agent Evaluation.&lt;br&gt;
Instead, the sources show a clear layering strategy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Agent Evaluation&lt;br&gt;
→ Fast, AI‑assisted, in‑product validation&lt;br&gt;
→ Ideal for early feedback, iteration, and continuous confidence&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copilot Studio Kit&lt;br&gt;
→ Deep, deterministic, automatable verification&lt;br&gt;
→ Ideal for release gates, regression testing, orchestration correctness, and governance&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This positioning is also explicitly reflected in community and Microsoft guidance that frames Agent Evaluation as filling the gap that manual testing cannot scale, while the Kit remains the system‑level quality backbone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical takeaway for enterprise teams
&lt;/h2&gt;

&lt;p&gt;Based on what is explicitly documented:&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use each tool
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Agent Evaluation&lt;/strong&gt; is best suited for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rapid iteration cycles during agent development&lt;/li&gt;
&lt;li&gt;Early-stage quality validation before formal review&lt;/li&gt;
&lt;li&gt;Continuous behavioral checks without infrastructure complexity&lt;/li&gt;
&lt;li&gt;Scenarios where AI-assisted, semantic evaluation is sufficient&lt;/li&gt;
&lt;li&gt;Teams prioritizing speed of feedback over deterministic guarantees&lt;/li&gt;
&lt;li&gt;Questions like: "Is this agent generally behaving well after my last change?"
→ Use Agent Evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; is best suited for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production release gates and formal deployment approval&lt;/li&gt;
&lt;li&gt;Regression testing before pushing updates to production&lt;/li&gt;
&lt;li&gt;Regulatory and compliance-driven scenarios requiring audit trails&lt;/li&gt;
&lt;li&gt;Mission-critical agents where deterministic verification is mandatory&lt;/li&gt;
&lt;li&gt;Complex orchestration scenarios requiring plan and tool invocation validation&lt;/li&gt;
&lt;li&gt;Multi-turn conversations that need end-to-end correctness&lt;/li&gt;
&lt;li&gt;Questions like: "Did we break anything? Are the topics correct? Are the tools invoked? Can this ship?"
→ Use the Copilot Studio Kit.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How they complement each other
&lt;/h3&gt;

&lt;p&gt;In mature setups, the tools are complementary, not competitive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Development phase&lt;/strong&gt;: Agent Evaluation provides fast feedback loops for iteration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-production phase&lt;/strong&gt;: Copilot Studio Kit enforces deterministic verification gates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production phase&lt;/strong&gt;: Both tools support continuous monitoring—Agent Evaluation for behavioral trends, the Kit for functional regression detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance phase&lt;/strong&gt;: The Kit's compliance and KPI tracking provide the enterprise audit trail and policy enforcement layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Organizations scaling from single-agent projects to enterprise deployments should expect to adopt both tools at different maturity stages, using them in sequence rather than as either/or choices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg53qyk5yjjz216z1x0fq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg53qyk5yjjz216z1x0fq.png" alt="upgit_20260404_1775315507.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Agent Evaluation and the Copilot Studio Kit represent Microsoft's thoughtful answer to the agent testing maturity curve. As organizations build, iterate, and scale agents from proof-of-concept to mission-critical systems, both tools play essential roles at different stages of the lifecycle.&lt;/p&gt;

&lt;p&gt;Agent Evaluation brings quality validation into the authoring experience, reducing friction in everyday iteration and making behavioral confidence accessible to all agent makers. Its AI-assisted approach acknowledges the reality of rapid development cycles and the need for fast feedback loops.&lt;/p&gt;

&lt;p&gt;The Copilot Studio Kit, by contrast, provides the deterministic backbone that enterprises require—exact verification, governance enforcement, regulatory compliance, and the audit trails necessary for mission-critical deployments.&lt;/p&gt;

&lt;p&gt;The key insight is that these tools are not competitors but complementary. Teams should adopt them in sequence, starting with Agent Evaluation during development for rapid iteration, then layering in the Copilot Studio Kit as the agent approaches production. Organizations serious about agent quality at scale will ultimately adopt both, using them to build confidence at every stage from ideation to production and beyond.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>agents</category>
      <category>agentevaluation</category>
    </item>
    <item>
      <title>Azure Local, Foundry Local, and Microsoft 365 Local: A Comprehensive Guide for IT Architects and Decision-Makers</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 04 Apr 2026 07:56:37 +0000</pubDate>
      <link>https://forem.com/holgerimbery/azure-local-foundry-local-and-microsoft-365-local-a-comprehensive-guide-for-it-architects-and-261p</link>
      <guid>https://forem.com/holgerimbery/azure-local-foundry-local-and-microsoft-365-local-a-comprehensive-guide-for-it-architects-and-261p</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Cloud Capabilities Without Leaving Premises&lt;br&gt;&lt;br&gt;
As regulatory demands tighten, latency requirements become critical, and data sovereignty moves from a nice-to-have to a must-have, Microsoft has engineered a comprehensive answer: &lt;strong&gt;Sovereign Private Cloud&lt;/strong&gt;. This three-pillar platform—Azure Local infrastructure, Microsoft 365 Local productivity, and Foundry Local AI—enables organizations to operate complete, intelligent cloud systems entirely within their boundaries. Whether you're managing classified government systems, running millisecond-critical manufacturing operations, sustaining teams in air-gapped locations, or processing sensitive AI workloads behind regulatory firewalls, this guide walks you through architectures, deployment strategies, and real-world patterns for implementing on-premises cloud at enterprise scale.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reasons to Read This Article&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complete Platform Understanding&lt;/strong&gt;: Grasp all three components of this Sovereign Private Cloud approach, how they integrate, and which combination matches your operational model (connected, intermittently connected, or fully offline).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Confidence&lt;/strong&gt;: Learn the hardware requirements, licensing models, connectivity tolerances, and planning phases required to deploy Azure Local (hyperconverged or disconnected), Microsoft 365 Local, and Foundry Local in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Case Alignment&lt;/strong&gt;: Identify whether your organization fits one of the key scenarios—government/defense data sovereignty, manufacturing low-latency control, retail edge compute, isolated locations, or confidential AI—with architectural patterns and reference implementations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic AI on Premises&lt;/strong&gt;: Discover how to build multi-agent AI systems using Microsoft Agent Framework + Foundry Local + Azure Local infrastructure, enabling autonomous reasoning and automation with zero cloud dependency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk Mitigation and Best Practices&lt;/strong&gt;: Understand connectivity tolerance, failover strategies, backup approaches, and testing protocols to ensure your on-premises cloud operates reliably and compliantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation Path&lt;/strong&gt;: Explore trial options (60-day Azure Local eval, free Foundry Local, partner-delivered M365 Local pilots) tailored to your budget and risk profile.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Azure Local is Microsoft's distributed infrastructure solution — formerly known as Azure Stack HCI — that extends Azure capabilities to customer-owned environments. It enables local deployment of both modern and legacy applications across distributed or sovereign locations, using Azure Arc as the unifying control plane. Azure Local is the foundation of Microsoft's Sovereign Private Cloud offering, which unifies three components: Azure Local (infrastructure), Microsoft 365 Local (productivity), and Foundry Local (AI inference) to deliver a full-stack private cloud that operates at any connectivity level — connected, intermittently connected, or fully disconnected. This article provides a comprehensive overview of these offerings, their use cases, deployment options, and best practices for IT architects and decision-makers considering on-premises Azure solutions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3wml57a7s32fj9o7ngh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3wml57a7s32fj9o7ngh.png" alt="upgit_20260331_1774988563.png" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Azure Local: Hyperconverged and Disconnected
&lt;/h2&gt;

&lt;p&gt;Azure Local addresses a set of business requirements for which the public cloud alone is insufficient: compute that must remain on-premises, mission-critical application resiliency, low-latency decision-making, and specific compliance mandates. Microsoft positions it as part of the adaptive cloud approach — bringing the cloud to the customer so they can build and innovate anywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hyperconverged Deployments (Connected Mode)
&lt;/h3&gt;

&lt;p&gt;A hyperconverged deployment of Azure Local consists of one machine or a cluster of machines connected to Azure. Clusters support 1 to 16 physical machines with hyperconverged storage (up to 8 machines in rack-aware configurations). The architecture is built on proven technologies: Hyper-V, Storage Spaces Direct, and Failover Clustering.&lt;br&gt;
In connected mode, the Azure cloud serves as the management plane. Administrators use the Azure Portal, Azure CLI, or PowerShell to view, monitor, and manage individual Azure Local instances or an entire fleet. Azure Local includes a secure-by-default configuration with more than 300 security settings, providing a consistent security baseline and a drift-control mechanism.&lt;br&gt;
Connectivity tolerance: If internet connectivity is lost, all host infrastructure and existing VMs continue to run normally. However, features that directly rely on cloud services become unavailable. Azure Local must successfully sync with Azure at least once every 30 consecutive days. If that window is exceeded, the cluster enters a reduced-functionality mode — existing VMs continue running, but new VMs cannot be created until connectivity is restored.&lt;/p&gt;
&lt;h3&gt;
  
  
  Disconnected Operations (Full Offline Mode)
&lt;/h3&gt;

&lt;p&gt;For environments where any cloud connectivity is undesired or impossible, disconnected operations bring the entire Azure control plane on-premises. Organizations can deploy and manage Azure Local instances, build VMs, and run containerized applications using select Azure Arc-enabled services from a local control plane that provides a familiar Azure Portal and Azure CLI experience — all without a connection to the Azure public cloud.&lt;br&gt;
Key constraint: Disconnected mode requires extra capacity for a dedicated management cluster to host the local control plane appliance.&lt;/p&gt;

&lt;p&gt;This management cluster has the following minimum hardware requirements:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;th&gt;Minimum Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Number of nodes&lt;/td&gt;
&lt;td&gt;3 nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory per node&lt;/td&gt;
&lt;td&gt;96 GB (appliance alone needs ≥64 GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cores per node&lt;/td&gt;
&lt;td&gt;24 physical cores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage per node&lt;/td&gt;
&lt;td&gt;2 TB SSD/NVMe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boot disk drive storage&lt;/td&gt;
&lt;td&gt;960 GB SSD/NVMe&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Disconnected operations are intended for organizations that cannot connect to Azure due to connectivity issues or regulatory restrictions. To procure this capability, a valid business justification and a Microsoft Customer Agreement for Enterprises (MCA-E) (or other eligible agreement type) are required.&lt;/p&gt;
&lt;h3&gt;
  
  
  Connected vs. Disconnected: Decision Framework
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision Factor&lt;/th&gt;
&lt;th&gt;Connected (Hyperconverged)&lt;/th&gt;
&lt;th&gt;Disconnected&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud dependency&lt;/td&gt;
&lt;td&gt;Requires outbound HTTPS to Azure ≥ once per 30 days&lt;/td&gt;
&lt;td&gt;Zero cloud dependency; local control plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Management plane&lt;/td&gt;
&lt;td&gt;Azure public cloud (Azure Portal, Arc)&lt;/td&gt;
&lt;td&gt;On-premises Azure Portal and CLI replica&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware overhead&lt;/td&gt;
&lt;td&gt;Workload cluster only (1–16 nodes)&lt;/td&gt;
&lt;td&gt;Workload cluster + dedicated 3-node management cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eligibility&lt;/td&gt;
&lt;td&gt;Any Azure subscription&lt;/td&gt;
&lt;td&gt;Requires MCA-E and business justification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Hybrid scenarios, branch offices, edge with periodic connectivity&lt;/td&gt;
&lt;td&gt;Air-gapped facilities, classified environments, remote sites without Internet&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Typical Use Cases
&lt;/h2&gt;

&lt;p&gt;Azure Local, Foundry Local, and Microsoft 365 Local serve scenarios in which the traditional public cloud alone cannot meet operational, regulatory, or latency requirements. The following use cases emerge from Microsoft's documentation and partner ecosystem:&lt;/p&gt;
&lt;h3&gt;
  
  
  Government and Defense (Data Sovereignty)
&lt;/h3&gt;

&lt;p&gt;Organizations in government, defense, and intelligence sectors require that data, operations, and control remain within organizational boundaries. Azure Local enables sovereign private clouds in which all workloads run locally. Microsoft 365 Local adds core collaboration tools — Exchange Server, SharePoint Server, and Skype for Business Server — that run entirely within the customer's sovereign operational boundary, keeping teams productive even when disconnected from the cloud. In disconnected mode, data residency and sovereign requirements are met without relying solely on public sovereign cloud controls. &lt;/p&gt;
&lt;h3&gt;
  
  
  Manufacturing and Industrial Operations (Low Latency &amp;amp; Reliability)
&lt;/h3&gt;

&lt;p&gt;Azure Local targets control systems and near real-time operations with extreme latency requirements — manufacturing execution systems, industrial quality assurance, and production line operations that must continue through network outages. On-premises compute clusters enable decisions in milliseconds without cloud round-trip delays. Azure Local's integration with Azure IoT Operations (deployed on AKS clusters enabled by Azure Arc on Azure Local) provides a turnkey approach for managing and processing IoT data at the edge. &lt;/p&gt;
&lt;h3&gt;
  
  
  Retail and Branch Offices (Edge Compute)
&lt;/h3&gt;

&lt;p&gt;Azure Local supports single-machine deployments through full clusters, making it suitable for distributed retail or branch scenarios where local AI inference at the source is needed — for example, self-checkout systems and loss-prevention applications in retail stores. The hyperconverged design ensures that even if WAN connectivity to central services drops, local operations continue uninterrupted. &lt;/p&gt;
&lt;h3&gt;
  
  
  Remote and Isolated Locations
&lt;/h3&gt;

&lt;p&gt;Industries operating in areas with limited network infrastructure — oil rigs, mining sites, rural clinics, and vessels at sea — benefit from operating in disconnected environments. Azure Local lets them use Azure Arc services and run workloads without relying on internet connectivity. Foundry Local extends this by enabling on-device inference of AI models in offline or bandwidth-constrained environments. &lt;/p&gt;
&lt;h3&gt;
  
  
  Confidential AI and Data Processing
&lt;/h3&gt;

&lt;p&gt;Organizations that need to run AI on sensitive data without exposing it to third-party clouds can combine Azure Local with Foundry Local. This enables local AI inferencing, where data is processed at the source. Foundry Local supports chat completions (text generation) and audio transcription (speech-to-text) through a single runtime that runs entirely on-device, with no cloud dependency for inference. Foundry Local now supports large multimodal models on Azure Local infrastructure, using the latest GPUs from partners like NVIDIA, so you can run advanced AI inference in sovereign environments.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Is Available
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Azure Local Core Infrastructure
&lt;/h3&gt;

&lt;p&gt;Azure Local is a full-stack infrastructure software running on validated hardware in customer facilities. It supports VMs, containers, and select Azure services locally while maintaining Azure-consistent management through Azure Arc.&lt;/p&gt;

&lt;p&gt;Features and architecture of hyperconverged deployments:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;Validated hardware from Microsoft partners; 1–16 machines per instance (max 8 for rack-aware clusters)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Storage Spaces Direct; external SAN storage in preview for qualified opportunities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Networking&lt;/td&gt;
&lt;td&gt;Customer-managed with physical switches and VLANs; optional software-defined networking (SDN)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local Services&lt;/td&gt;
&lt;td&gt;VMs for general-purpose workloads; AKS enabled by Azure Arc for containerized workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Management&lt;/td&gt;
&lt;td&gt;Azure Policy, Azure Monitor, Microsoft Defender for Cloud, and others via Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Metrics and logs sent to Azure Monitor and Log Analytics for infrastructure and workload resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Management Tools&lt;/td&gt;
&lt;td&gt;Azure Portal, CLI, ARM/Bicep/Terraform (cloud); PowerShell, Windows Admin Center, Hyper-V Manager, Failover Cluster Manager (local)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disaster Recovery&lt;/td&gt;
&lt;td&gt;Azure Backup, Azure Site Recovery, and non-Microsoft partners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;300+ security settings for consistent baseline and drift control; Trusted Launch for VMs; Microsoft Defender for Cloud integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Common Azure services on Azure Local:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Virtual machines&lt;/td&gt;
&lt;td&gt;Azure Local VMs enabled by Azure Arc (Windows/Linux, with Trusted Launch support)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Virtual desktops&lt;/td&gt;
&lt;td&gt;Azure Virtual Desktop (AVD) session hosts on-premises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container orchestration&lt;/td&gt;
&lt;td&gt;Azure Kubernetes Service (AKS) enabled by Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arc-enabled services&lt;/td&gt;
&lt;td&gt;Select Azure services for hybrid workloads via Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-performance databases&lt;/td&gt;
&lt;td&gt;SQL Server on Azure Local with extra resiliency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Media analytics&lt;/td&gt;
&lt;td&gt;Azure AI Video Indexer enabled by Azure Arc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI chat assistants&lt;/td&gt;
&lt;td&gt;Azure Edge RAG (Preview) — turnkey RAG solution for custom chat over private data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IoT management&lt;/td&gt;
&lt;td&gt;Azure IoT Operations on AKS clusters on Azure Local&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Disconnected operations support a subset of these services via the local control plane:   &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Azure Portal&lt;/td&gt;
&lt;td&gt;Local portal experience similar to Azure Public&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Resource Manager (ARM)&lt;/td&gt;
&lt;td&gt;Subscriptions, resource groups, ARM templates, CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RBAC&lt;/td&gt;
&lt;td&gt;Role-based access control for subscriptions and resource groups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed identity&lt;/td&gt;
&lt;td&gt;System-assigned managed identity for supported resource types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arc-enabled servers&lt;/td&gt;
&lt;td&gt;VM guest management for Azure Local VMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local VMs&lt;/td&gt;
&lt;td&gt;Windows or Linux VMs via disconnected operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arc-enabled Kubernetes (Preview)&lt;/td&gt;
&lt;td&gt;CNCF Kubernetes cluster management on Azure Local VMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AKS enabled by Arc (Preview)&lt;/td&gt;
&lt;td&gt;AKS on Azure Local in disconnected mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local device management&lt;/td&gt;
&lt;td&gt;Create and manage instances, add/remove nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Container Registry&lt;/td&gt;
&lt;td&gt;Store and retrieve container images and artifacts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Key Vault&lt;/td&gt;
&lt;td&gt;Store and access secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Policy&lt;/td&gt;
&lt;td&gt;Enforce standards and governance on new resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Deployment types include hyperconverged deployments, multi-rack deployments (in preview), Microsoft 365 on Local, and disconnected operations. Multi-rack deployments support larger configurations with prescriptive hardware BOMs featuring pre-integrated racks containing SAN storage, servers, and network devices; re-use of existing hardware is not supported for multi-rack at this time.&lt;/p&gt;
&lt;h3&gt;
  
  
  Microsoft 365 Local
&lt;/h3&gt;

&lt;p&gt;Microsoft 365 Local runs Exchange Server, SharePoint Server, and Skype for Business Server on Azure Local infrastructure that is entirely customer-owned and managed. It supports both hybrid and fully disconnected deployments and provides an Azure-consistent management experience with a unified control plane. &lt;/p&gt;

&lt;p&gt;Core capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exchange, SharePoint, and Skype for Business&lt;/strong&gt;: Enterprise-grade email, document management, and unified communications on-premises, addressing stringent data residency requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certified and validated solutions&lt;/strong&gt;: Deployed on Azure Local Premier Solutions from hardware partners, guaranteeing compatibility for sovereign deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-stack validated reference architecture&lt;/strong&gt;: Prescriptive guidance for networking, storage, compute, and identity integration based on best practices for optimal performance and resiliency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sovereign Private Cloud capabilities&lt;/strong&gt;: Azure-consistent management with enhanced security features (encryption, access controls, compliance mechanisms) aligned with local regulatory frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid or fully disconnected support&lt;/strong&gt;: Connected mode uses Azure as the cloud control plane; disconnected mode uses a local control plane for complete isolation and air-gapped operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example large-scale server role allocation (connected mode): &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 servers configured as a three-node Azure Local instance for SharePoint Server and SQL Server workloads.&lt;/li&gt;
&lt;li&gt;4 servers each as single-node Azure Local instances for Exchange Server mailbox roles.&lt;/li&gt;
&lt;li&gt;2 servers each as single-node Azure Local instances for Exchange Server edge transport roles.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsoft 365 Local is now generally available and must be deployed through a Microsoft-certified solution partner. Microsoft has committed to supporting these on-premises productivity server workloads through at least 2035.&lt;/p&gt;
&lt;h3&gt;
  
  
  Foundry Local
&lt;/h3&gt;

&lt;p&gt;Foundry Local is an on-device AI inference solution (currently in public preview) that enables local execution of AI models through a CLI, SDK, or REST API. It provides an OpenAI-compatible REST endpoint running entirely on-device, meaning prompts and model outputs are processed locally without being sent to the cloud.&lt;/p&gt;

&lt;p&gt;System requirements:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OS&lt;/td&gt;
&lt;td&gt;Windows 10 (x64), Windows 11 (x64/ARM), Windows Server 2025, macOS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimum hardware&lt;/td&gt;
&lt;td&gt;8 GB RAM, 3 GB free disk space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommended hardware&lt;/td&gt;
&lt;td&gt;16 GB RAM, 15 GB free disk space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optional acceleration&lt;/td&gt;
&lt;td&gt;NVIDIA GPU (2000 series+), AMD GPU (6000 series+), AMD NPU, Intel iGPU, Intel NPU (32 GB+ memory), Qualcomm Snapdragon X Elite (8 GB+ memory), Qualcomm NPU, Apple silicon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Supported AI capabilities:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Foundry Local Service&lt;/td&gt;
&lt;td&gt;An OpenAI-compatible REST server providing a standard interface for inference. The endpoint is dynamically allocated when the service starts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ONNX Runtime&lt;/td&gt;
&lt;td&gt;Executes optimized ONNX models on CPUs, GPUs, or NPUs; supports multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and quantized models for faster inference.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Management&lt;/td&gt;
&lt;td&gt;CLI and cache system for downloading, listing, and managing AI models locally.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key architectural components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Foundry Local Service&lt;/strong&gt;: An OpenAI-compatible REST server providing a standard interface for inference. The endpoint is dynamically allocated when the service starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Runtime&lt;/strong&gt;: Executes optimized ONNX models on CPUs, GPUs, or NPUs; supports multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and quantized models for faster inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Management&lt;/strong&gt;: CLI and cache system for downloading, listing, and managing AI models locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No Azure subscription is required to use Foundry Local on a device; it runs on local hardware with no recurring cloud costs for inference. &lt;br&gt;
For sovereign environments requiring heavier AI workloads, the integration of Foundry Local with Azure Local supports large-scale models utilizing the latest GPUs from NVIDIA, with Microsoft providing comprehensive support for deployments, updates, and operational health.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6phu5leex2ixbgcffym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6phu5leex2ixbgcffym.png" alt="upgit_20260331_1774988652.png" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites and Planning for Deployment
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Hardware and Catalog Selection
&lt;/h3&gt;

&lt;p&gt;Azure Local runs exclusively on validated hardware configurations listed in the Azure Local Solutions Catalog. Hardware solutions fall into three categories: Validated Nodes, Integrated Systems, and Premier Solutions. Premier Solutions delivers deep integration and validation for a smooth end-to-end experience. For hyperconverged deployments, you can reuse existing hardware only if it matches a supported configuration in the catalog; otherwise, upgrades or new hardware are required.&lt;br&gt;&lt;br&gt;
Each Azure Local machine in a hyperconverged cluster must meet system requirements for CPU, memory, storage, and network. For planning, the Azure Local Catalog and available sizing tools help estimate hardware requirements for the intended workload profile. Networking must be designed for redundancy and performance—typically using 10–25 GbE or higher links, physical switches, and VLANs. Optional SDN services can be enabled for software-defined networking.&lt;br&gt;&lt;br&gt;
For disconnected operations, plan additional capacity for the management cluster as detailed in Section 1 (3 nodes, 96 GB RAM/node, 24 cores/node, 2 TB SSD/node, 960 GB boot disk/node).&lt;br&gt;&lt;br&gt;
For Microsoft 365 Local, hardware must be an Azure Local Premier Solution that specifically meets the M365 Local requirements listed in the Azure Local Solutions Catalog. Please work with your authorized Microsoft partner to size the deployment appropriately. We have reference architectures for small-, mid-, and large-scale configurations tailored to your needs.  &lt;/p&gt;
&lt;h3&gt;
  
  
  Azure Subscription and Licensing
&lt;/h3&gt;

&lt;p&gt;An Azure subscription is required for Azure Local. The billing model charges a per-physical-core fee on on-premises machines, plus consumption-based charges for any additional Azure services used. All charges roll up to the existing Azure subscription. For disconnected operations, an eligible enterprise agreement (such as MCA-E) is also needed, and qualification must be discussed with the Microsoft account team before procurement.&lt;br&gt;&lt;br&gt;
Additional licensing considerations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OS licenses for workload VMs (e.g., Windows Server)
&lt;/li&gt;
&lt;li&gt;Microsoft 365 server licenses if deploying M365 Local (Exchange, SharePoint, Skype)&lt;/li&gt;
&lt;li&gt;Foundry Local requires no Azure subscription and has no RBAC role requirements when running solely on-device.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Network Connectivity Planning
&lt;/h3&gt;

&lt;p&gt;In connected mode, each machine must have outbound HTTPS connectivity to well-known Azure endpoints at least every 30 days. If SDN is planned, review the SDN overview before deployment. Network and host requirements must be met per Microsoft's published specifications.&lt;br&gt;&lt;br&gt;
In disconnected mode, the local management cluster must be networked to the workload clusters within the customer's environment, but no external internet is required post-deployment (only registration data is exchanged during initial deployment, registration, and license renewal).  &lt;/p&gt;
&lt;h3&gt;
  
  
  Assessment and Planning Phases
&lt;/h3&gt;

&lt;p&gt;A structured planning process reduces risk. Microsoft and its partners typically follow phased engagement for Azure Local projects, especially for M365 Local:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Assessment&lt;/td&gt;
&lt;td&gt;Analyze organizational requirements, compliance needs, and desired outcomes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Planning&lt;/td&gt;
&lt;td&gt;Define hardware configurations, software solutions, migration, and integration strategies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Acquisition&lt;/td&gt;
&lt;td&gt;Procure necessary hardware, software, and licenses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Execute the planned rollout in accordance with best practices&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For disconnected operations, organizations must additionally identify workloads and application requirements for disconnected operation, and staff (or partners) with the capability to deploy and operate disconnected environments.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploying Azure Local: Steps and Best Practices
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Cluster Installation and Registration
&lt;/h3&gt;

&lt;p&gt;For hyperconverged deployments, the Azure Local operating system can be downloaded from the Azure Portal, which includes a free 60-day trial. Alternatively, pre-integrated systems from OEM partners arrive with Azure Local pre-installed. After installing the OS on each server node and configuring the cluster (using Storage Spaces Direct for storage and Failover Clustering for high availability), the cluster must be registered with Azure Arc to enable cloud management through the Azure Portal and Arc tools.&lt;br&gt;&lt;br&gt;
Hardware can be purchased from any Microsoft hardware partner listed in the Azure Local Catalog, and the available sizing tool can help estimate hardware requirements before purchase.   &lt;/p&gt;
&lt;h3&gt;
  
  
  Post-Deployment Configuration
&lt;/h3&gt;

&lt;p&gt;Once registered, the Azure Local instance appears in the Azure Portal as a manageable resource. Post-deployment steps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enabling Arc-enabled services&lt;/strong&gt;: Configure AKS clusters, Arc-enabled data services, or other platform services as needed for workload requirements.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Applying governance policies&lt;/strong&gt;: Use Azure Policy to enforce compliance standards across the on-premises environment, and configure Microsoft Defender for Cloud to assess and improve security posture.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setting up monitoring&lt;/strong&gt;: Configure Azure Monitor and Log Analytics for metrics and log collection from both infrastructure and workloads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keeping the environment current&lt;/strong&gt;: Azure Local provides Solution Updates that simplify keeping the entire stack up to date across OS, firmware, and drivers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For disconnected deployments, these management services are configured on the local control plane appliance rather than through Azure public endpoints. The local Azure Portal and CLI provide an equivalent experience for managing policies, deploying VMs, and monitoring infrastructure within the isolated environment.&lt;br&gt;&lt;br&gt;
Deploying Microsoft 365 Local&lt;br&gt;
M365 Local must be deployed through a Microsoft-certified solution partner. The partner follows the reference architecture to provision the required Azure Local instances and configure Exchange, SharePoint, and Skype for Business server roles. The reference architectures include prescriptive guidance for networking and security — covering virtual networks, network security groups, and load balancers to segment, isolate, and secure access to workloads. In connected mode, architectures use Azure as the cloud-connected control plane; in disconnected mode, they use a local control plane.&lt;br&gt;&lt;br&gt;
Organizations can contact their Microsoft account team or visit the Microsoft 365 Local General Availability sign-up page for information about authorized partners.   &lt;/p&gt;
&lt;h3&gt;
  
  
  Testing and Validation
&lt;/h3&gt;

&lt;p&gt;Thorough validation after deployment is critical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster validation&lt;/strong&gt;: Run the built-in validation tools to confirm hardware, storage, and network configurations meet requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VM and failover testing&lt;/strong&gt;: Create test VMs, perform live migrations between nodes, and simulate node failures to verify high availability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connectivity resilience (connected mode)&lt;/strong&gt;: Simulate internet outages to confirm workloads continue uninterrupted and that the cluster correctly reconnects and syncs within the 30-day window.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disconnected mode testing&lt;/strong&gt;: Verify that the local management portal supports all required operations (VM provisioning, policy enforcement, monitoring) without any external connectivity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backup and recovery validation&lt;/strong&gt;: Test backup and restore procedures using Azure Backup, Azure Site Recovery, or third-party solutions.
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Planning and Deploying VM Workloads
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Capacity Planning
&lt;/h3&gt;

&lt;p&gt;Unlike the elastic scaling of public Azure, on-premises capacity is finite. IT architects must right-size VMs based on the physical resources available in the Azure Local cluster, while maintaining headroom for peak loads and failover overhead. Consider future growth when sizing: adding capacity requires purchasing and deploying new server nodes — a slower process than cloud scaling. The Azure Local Catalog and sizing tools assist with estimating how many VMs of given sizes a cluster configuration can support. &lt;/p&gt;
&lt;h3&gt;
  
  
  Creating VMs via Azure Arc
&lt;/h3&gt;

&lt;p&gt;Azure Local manages VMs as Azure resources through the Azure Arc Resource Bridge. VMs can be created using the Azure Portal, Azure CLI, ARM templates, Bicep, or Terraform. The creation workflow through the Azure Portal involves:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to the Azure Local cluster resource and select + Create VM.&lt;/li&gt;
&lt;li&gt;Specify project details: subscription, resource group.&lt;/li&gt;
&lt;li&gt;Configure instance details: VM name, custom location (associated with the Azure Local cluster), security type (Standard or Trusted Launch), storage path, OS image, administrator account, vCPU count, memory allocation (static or dynamic — cannot be changed post-deployment).&lt;/li&gt;
&lt;li&gt;Optionally enable Guest Management for Arc extensions integration, Domain Join for Active Directory, and additional data disks.&lt;/li&gt;
&lt;li&gt;Configure networking: attach at least one network interface with appropriate IP allocation (DHCP or static).&lt;/li&gt;
&lt;li&gt;Review and create.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This information is based on the documented VM deployment process for Azure Local environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image management&lt;/strong&gt;: Custom VM images (VHDs) can be uploaded or imported as templates. Preparing golden images — pre-hardened with security agents, configurations, and required software — streamlines consistent provisioning across the fleet.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Security for VM Workloads
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trusted Launch&lt;/strong&gt;: Supported for Azure Local VMs, enabling secure boot and virtual TPM (vTPM). The vTPM state automatically transfers within a cluster, and attestation confirms whether the VM started in a known-good state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Defender for Cloud&lt;/strong&gt;: Can assess and improve the security posture of both the Azure Local instance and individual VMs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arc guest management&lt;/strong&gt;: Extensions can be deployed inside VMs for configuration management, monitoring, and security agent installation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  GPU Workloads
&lt;/h3&gt;

&lt;p&gt;For AI or graphics-intensive workloads, Azure Local supports GPU-equipped servers. GPUs can be made accessible to VMs through direct pass-through or shared via GPU partitioning (GPU-P), which allows a single physical GPU to be divided into multiple virtual GPUs for different workloads simultaneously. This is valuable when multiple AI inference services, rendering tasks, or data processing workloads need GPU acceleration concurrently. NVIDIA GPUs (such as A-series models) are validated for Azure Local deployments.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tryout and Evaluation Options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Azure Local 60-Day Trial&lt;/td&gt;
&lt;td&gt;Download the Azure Local OS from the Azure Portal for a free 60-day evaluation for proof-of-concept deployments on your own hardware. Even a single validated server can be used to test core features. Microsoft's Azure Arc Jumpstart project provides step-by-step demo scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Foundry Local (Preview)&lt;/td&gt;
&lt;td&gt;Free, no Azure subscription required. Install via &lt;code&gt;winget install Microsoft.FoundryLocal&lt;/code&gt; (Windows) or &lt;code&gt;brew tap microsoft/foundrylocal &amp;amp;&amp;amp; brew install foundrylocal&lt;/code&gt; (macOS). Run a model immediately: &lt;code&gt;foundry model run qwen2.5-0.5b&lt;/code&gt;. Experiment with text generation and speech-to-text on existing hardware. Alternatively, download the installer from the Foundry Local GitHub repository.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft 365 Local&lt;/td&gt;
&lt;td&gt;No standalone trial download; engagement through Microsoft or a certified solution partner is required for proof-of-concept or pilot deployments. Contact your Microsoft account team or visit the M365 Local GA sign-up page. Hardware requirements are significant (enterprise-scale server configurations), so evaluations typically take place in partner labs or test environments.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning Resources&lt;/td&gt;
&lt;td&gt;Microsoft Learn modules, tutorials, and the Azure Arc Jumpstart provide guided lab experiences. Community blogs, partner solution briefs (from Dell, HPE, Lenovo, etc.), and the Microsoft Tech Community contain implementation case studies and architectural guidance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tradeoff consideration for trials: The 60-day Azure Local trial enables self-service evaluation of the core hyperconverged platform and VM management. However, testing disconnected operations requires the dedicated management cluster hardware and MCA-E eligibility, which limits ad hoc experimentation. For M365 Local, the partner-delivered model ensures proper configuration, but it means organizations cannot independently test before engaging commercially. Foundry Local, by contrast, offers the lowest barrier to entry — it runs on a standard laptop or desktop with no cloud dependencies.&lt;/p&gt;
&lt;h2&gt;
  
  
  Appendix: Building Agentic AI Solutions with Azure Local, Microsoft Agent Framework, and Foundry Local
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Conceptual Overview
&lt;/h3&gt;

&lt;p&gt;Modern AI applications increasingly follow an agentic pattern — multiple specialized AI agents that reason, communicate, and act to perform complex tasks. Microsoft provides tools to develop and run these solutions entirely on local infrastructure by combining three components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Azure Local — the on-premises infrastructure providing compute, storage, networking, and (optionally) GPU acceleration.&lt;/li&gt;
&lt;li&gt;Foundry Local — the on-device AI inference runtime serving LLM and other models via an OpenAI-compatible API endpoint.&lt;/li&gt;
&lt;li&gt;Microsoft Agent Framework (MAF) — an open-source framework (Python and .NET SDKs) for building, orchestrating, and deploying AI agents and multi-agent workflows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Agent Framework was introduced as an open-source project by Microsoft and is hosted on GitHub at microsoft/agent-framework with over 8,300 stars and 1,400 forks. The latest release at the time of research was python-1.0.0rc5 (dated 2026-03-19).&lt;/p&gt;
&lt;h3&gt;
  
  
  Architecture Pattern
&lt;/h3&gt;

&lt;p&gt;A concrete reference implementation was published on the Microsoft Developer Community Blog, demonstrating real-world AI automation with Foundry Local and MAF — described as running with "no cloud subscription, no API keys, no internet required". The system uses four specialized agents orchestrated by MAF:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PlannerAgent&lt;/td&gt;
&lt;td&gt;Sends user commands to the Foundry Local LLM and produces a structured JSON action plan&lt;/td&gt;
&lt;td&gt;4–45 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SafetyAgent&lt;/td&gt;
&lt;td&gt;Validates actions against workspace bounds and schema constraints&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ExecutorAgent&lt;/td&gt;
&lt;td&gt;Dispatches validated actions to the target system (e.g., robotics simulator for inverse kinematics and gripper control)&lt;/td&gt;
&lt;td&gt;&amp;lt; 2 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NarratorAgent&lt;/td&gt;
&lt;td&gt;Produces a template-based summary of actions taken (with optional LLM elaboration)&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The orchestration flow follows a sequential pipeline: User → Orchestrator → Planner → Safety → Executor → Target System, with the Narrator providing observability. &lt;/p&gt;

&lt;p&gt;In this reference, the PlannerAgent uses Foundry Local as its AI backend, invoking a local model (e.g., qwen2.5-coder-0.5b) via the standard OpenAI Python client pointing to the Foundry Local endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;foundry_local&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FoundryLocalManager&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FoundryLocalManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5-coder-0.5b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern — structured JSON output from an LLM, validated by a safety layer, dispatched to a domain-specific engine — generalizes beyond robotics to home automation, game AI, CAD, lab equipment, and any domain requiring safe, structured control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment Patterns on Azure Local
&lt;/h3&gt;

&lt;p&gt;For production deployment of agentic AI on Azure Local infrastructure, the following layered architecture applies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1&lt;/strong&gt; — AI Model Hosting: One or more Azure Local VMs (or containers) running Foundry Local to serve AI models. For small models, a standard CPU-equipped VM suffices. For large multimodal models, VMs with dedicated GPU access on Azure Local infrastructure leverage the latest NVIDIA GPUs for high-throughput inference. Foundry Local automatically selects the best execution provider (NPU &amp;gt; GPU &amp;gt; CPU) for the available hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2&lt;/strong&gt; — Agent Orchestration: The Microsoft Agent Framework runs as a service (in a container on AKS or in a VM) and orchestrates the multi-agent pipeline. It handles agent-to-agent communication, memory management, tool integrations, and calls to the Foundry Local inference endpoint. Domain-specific engines (simulation environments, database connectors, control system APIs) can be integrated as tools that agents invoke during execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 3&lt;/strong&gt; — Application Interface: A custom frontend (web application, dashboard, CLI, or API gateway) through which users submit tasks and receive results. This can be hosted on the same Azure Local cluster.
All inter-layer communication occurs over the cluster's internal network, keeping data fully on-premises and latency to a minimum.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Applicable Scenarios
&lt;/h3&gt;

&lt;p&gt;The combination of Azure Local + Foundry Local + MAF enables agentic AI solutions where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Industrial automation&lt;/strong&gt;: Agents interpret natural-language operator commands, plan machine actions, validate safety constraints, and execute robotic or process-control operations — all on the factory floor without cloud dependency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sovereign AI assistants&lt;/strong&gt;: Multi-agent systems that collate local data, reason using on-device LLMs, and provide decision support in classified or regulated environments (defense, finance, healthcare) where data must never leave the premises.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge intelligence&lt;/strong&gt;: IoT-connected environments where agents monitor sensor data streams, use local AI for anomaly detection and root-cause analysis, and actuate responses in real time — applicable to energy infrastructure, transportation systems, or smart facilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline automation&lt;/strong&gt;: Field operations, shipboard systems, or disaster-response scenarios where internet connectivity is unavailable but sophisticated AI reasoning and automation are still required.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Advantages and Tradeoffs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Advantages&lt;/strong&gt;: Running agentic AI entirely on Azure Local provides data sovereignty (all prompts, model outputs, and orchestration data remain local), low latency (no network hops to cloud endpoints), deterministic cost (no per-token API charges), and operational resilience (functions without internet).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tradeoffs&lt;/strong&gt;: On-device models are constrained by local GPU memory and compute — the largest cloud-hosted models (e.g., GPT-4 at full scale) may not be runnable locally without significant GPU investment. Model updates require manual download and deployment rather than automatic cloud-side updates. Additionally, Foundry Local remains in public preview, meaning features and supported models are still evolving and may have limitations before general availability. Organizations should evaluate whether the models available for local inference meet their quality bar for production use, and plan for a path to larger models as Foundry Local's support for large-scale models on Azure Local with NVIDIA GPUs matures.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Azure Local, Foundry Local, and Microsoft 365 Local together form a cohesive platform for organizations seeking sovereign, on-premises cloud capabilities without compromise. As data residency, regulatory compliance, and operational resilience become non-negotiable requirements across industries, Microsoft's investment in distributed infrastructure and local AI inference reflects a fundamental shift in how enterprises architect their digital ecosystems.&lt;/p&gt;

&lt;p&gt;The combination of &lt;strong&gt;Azure Local&lt;/strong&gt; (providing edge-aware infrastructure and hybrid compute), &lt;strong&gt;Microsoft 365 Local&lt;/strong&gt; (delivering productivity and collaboration on-premises), and &lt;strong&gt;Foundry Local&lt;/strong&gt; (enabling local LLM inference) addresses the long-standing tension between cloud agility and data sovereignty. Whether your organization operates in a connected, intermittently connected, or fully disconnected environment, these solutions let you innovate locally without sacrificing the governance, scale, or intelligence that cloud-native architectures offer.&lt;/p&gt;

&lt;p&gt;For IT architects and decision-makers, the path forward is clear: evaluate your specific regulatory, latency, and data residency requirements; prototype on a small cluster or Azure Local Appliance; and progressively expand as organizational confidence and operational maturity grow. The learning curve is manageable, the economics are favorable for regulated industries, and the competitive advantage in markets demanding data sovereignty is significant.&lt;/p&gt;

&lt;p&gt;As Foundry Local and Azure Local move toward general availability and mature their feature sets, the case for Sovereign Private Cloud becomes stronger. The future of enterprise computing is not "cloud vs. on-premises" — it is a thoughtfully designed hybrid architecture that respects both business logic and the regulatory terrain in which that logic operates.&lt;/p&gt;

</description>
      <category>azurelocal</category>
      <category>foundrylocal</category>
      <category>m365local</category>
    </item>
    <item>
      <title>Practical Guideline: How to Move Agents Beyond POCs and Deliver Real Enterprise Value</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 21 Mar 2026 08:29:29 +0000</pubDate>
      <link>https://forem.com/holgerimbery/practical-guideline-how-to-move-agents-beyond-pocs-and-deliver-real-enterprise-value-3267</link>
      <guid>https://forem.com/holgerimbery/practical-guideline-how-to-move-agents-beyond-pocs-and-deliver-real-enterprise-value-3267</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I hear the same question repeatedly from customers exploring agent or broader AI adoption: &lt;em&gt;"How do we escape the endless POC phase and actually deliver real business value?"&lt;/em&gt; Most organizations get stuck prototyping broadly instead of executing narrowly, trapped in cycles of experimentation that never reach production. This practical guideline distills &lt;strong&gt;ten core principles&lt;/strong&gt; proven to move agents from ideation into measurable enterprise impact. Read on to discover how to anchor initiatives in real processes, maintain scope discipline, connect agents to live input channels, enforce production-grade behavior from day one, integrate with mission-critical systems early, deliver in short iteration cycles, create lightweight review processes, commit to real usage within 30 days, use multiple small agents, and plan for long-term flexibility—transforming your AI investment from experimentation into sustainable value delivery.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Organizations often get stuck in endless prototyping cycles because they experiment broadly rather than execute narrowly. This guideline distills the core principles that move agents from ideation to measurable business impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Anchor the Initiative in a Single Real Process
&lt;/h2&gt;

&lt;p&gt;Begin by identifying a single operational workflow where your organization currently loses productive time on a recurring basis. This workflow should exhibit characteristics such as repetitive manual steps, rule-based decision logic, or intensive data manipulation. Avoid starting with abstract experimentation or exploratory prototypes that lack connection to actual business operations.&lt;/p&gt;

&lt;p&gt;The economic rationale for this approach is straightforward. When you &lt;strong&gt;ground your agent development in a concrete, existing process&lt;/strong&gt;, you force alignment with &lt;strong&gt;real data sources&lt;/strong&gt;, &lt;strong&gt;actual system dependencies&lt;/strong&gt;, and &lt;strong&gt;measurable business outcomes&lt;/strong&gt;. This concrete anchoring eliminates the disconnection that is characteristic of laboratory environments and proof-of-concept work, which often remain isolated from production constraints and real-world variability.&lt;/p&gt;

&lt;p&gt;To establish this baseline understanding, you should document the current state of the process by answering these questions. &lt;strong&gt;First, identify what data inputs currently exist and where they originate within your organization.&lt;/strong&gt; &lt;strong&gt;Second, determine which specific steps within the workflow consume the most human effort&lt;/strong&gt; and therefore represent the highest opportunity for efficiency gains. &lt;strong&gt;Third, establish what quantifiable outcome should improve as a result of agent implementation&lt;/strong&gt;, whether measured in terms of time savings per transaction, reduction in human errors, or increase in process throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Keep the First Agent Extremely Narrow
&lt;/h2&gt;

&lt;p&gt;The most decisive factor in moving beyond proof-of-concept phases is maintaining &lt;strong&gt;scope discipline&lt;/strong&gt;. Organizations frequently fail to operationalize agents because they attempt to expand functionality too broadly before achieving stable baseline performance in a narrow domain. This expansion pattern increases complexity exponentially while simultaneously distributing development resources across multiple problem dimensions.&lt;/p&gt;

&lt;p&gt;The essential discipline requires that you define the agent's responsibilities in a single, unambiguous sentence of the form: &lt;strong&gt;"This agent is responsible for X and nothing beyond X."&lt;/strong&gt; This constraint forces explicit trade-offs between capability breadth and implementation depth, ensuring that resources concentrate on achieving reliable performance in one well-defined function rather than fragmented performance across multiple functions.&lt;/p&gt;

&lt;p&gt;Consider these concrete examples of appropriately scoped initial deployments. An agent might be chartered to &lt;strong&gt;look up customer pricing from a master database and return one verified result&lt;/strong&gt;, without attempting to negotiate, modify, or recommend alternative pricing. Another agent might be restricted to &lt;strong&gt;extracting structured fields from incoming documents and validating them against schema requirements&lt;/strong&gt;, without attempting interpretation or applying business rules. A third agent might be limited to &lt;strong&gt;classifying incoming inquiries into exactly three predefined categories&lt;/strong&gt;, without attempting subcategories or fuzzy classifications.&lt;/p&gt;

&lt;p&gt;This discipline against over-engineering serves multiple economic functions. It reduces the surface area for defects, shortens the time to reach measurable operational impact, and simplifies the governance model for operating the agent in production environments. By deferring expansion until baseline performance is established, organizations create a foundation of operational reliability upon which additional capabilities can be layered incrementally.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Connect the Agent to a Real Input Channel Early
&lt;/h2&gt;

&lt;p&gt;Manual testing through a studio UI creates an isolated environment that does not reflect operational reality. The agent is evaluated against synthetic inputs, clean data structures, and predetermined response patterns—conditions that rarely occur in production systems. &lt;strong&gt;Real business value emerges only when the agent receives actual operational requests&lt;/strong&gt; that contain the variability, ambiguity, and edge cases inherent in genuine work.&lt;/p&gt;

&lt;p&gt;Operational input channels include the following mechanisms through which work currently flows into your organization: &lt;strong&gt;forwarded email messages&lt;/strong&gt; containing unstructured customer inquiries, &lt;strong&gt;Teams chat messages&lt;/strong&gt; that combine urgent questions with side conversations, &lt;strong&gt;CRM cases&lt;/strong&gt; that reference prior interactions and incomplete context, and &lt;strong&gt;uploaded documents&lt;/strong&gt; that may contain inconsistent formatting or missing required fields. Each of these channels introduces distinct data quality challenges and user expectations.&lt;/p&gt;

&lt;p&gt;The economic rationale for early channel integration stems from the principle of &lt;strong&gt;revealed preference through actual behavior&lt;/strong&gt;. When users interact with the agent through their existing workflow channels rather than a lab environment, their usage patterns reveal which capabilities create genuine value and which create friction. Synthetic testing cannot substitute for this behavioral signal. Furthermore, &lt;strong&gt;exposure to live variability from the first iteration accelerates learning&lt;/strong&gt; about edge cases and failure modes that would otherwise remain hidden until full production deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;: Select one input channel that currently delivers the highest volume of work into your target process and route genuine operational requests through the agent beginning in the first development cycle. This approach ensures that early versions contend with real data distributions and authentic user patterns rather than idealized test scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Enforce Production‑Grade Behavior From Day One
&lt;/h2&gt;

&lt;p&gt;Development environments and production environments typically operate under fundamentally different constraints and enforcement mechanisms. In many organizations, agents developed during the proof-of-concept phase operate with suspended governance controls, synthetic data, and permissive access policies that would never be acceptable in operational systems. This separation creates a structural barrier to production adoption because moving an agent from a development environment to production then requires a complete redevelopment of its data connectors, compliance controls, and operational behaviors.&lt;/p&gt;

&lt;p&gt;The efficient approach eliminates this artificial separation by imposing production-grade constraints from the initial development phase. This requires the agent to &lt;strong&gt;use the actual data sources&lt;/strong&gt; employees rely on for their daily work, rather than sanitized copies or test databases. The agent must &lt;strong&gt;apply existing data access restrictions and compliance controls&lt;/strong&gt; that govern access to sensitive information within your organization, rather than running with elevated or unrestricted permissions. The agent must maintain &lt;strong&gt;consistent tone, content, and response handling&lt;/strong&gt; in line with organizational standards, rather than developing ad hoc response patterns during development that would later require modification. The agent must draw on &lt;strong&gt;approved knowledge sources&lt;/strong&gt; that align with organizational information governance policies, rather than accessing ad hoc files or unvetted external data.&lt;/p&gt;

&lt;p&gt;This approach reduces the economic cost of production deployment by eliminating the need to redesign and re-implement governance controls at transition time. Additionally, exposing the agent to genuine constraints during development accelerates the identification of edge cases and failure modes that would otherwise remain hidden until production deployment, when they would be far more costly to address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action&lt;/strong&gt;: Establish the expectation that the first version of the agent operates under the same governance framework and data access policies as the final production system. This mindset collapses the artificial gap between proof-of-concept development and production readiness.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Integrate With One Mission‑Critical System Early
&lt;/h2&gt;

&lt;p&gt;From an operational perspective, &lt;strong&gt;a proof-of-concept implementation that remains disconnected from your organization's core systems generates negligible business value&lt;/strong&gt; regardless of how well the agent performs in isolation. The critical transformation occurs when the agent &lt;strong&gt;gains the ability to read from or write to systems that directly affect your operational workflows&lt;/strong&gt;, such as customer relationship management platforms, enterprise resource planning systems, document management repositories, or human resources information systems. At that point, the agent transitions from a theoretical capability into a practical tool that produces measurable outcomes within your existing business processes.&lt;/p&gt;

&lt;p&gt;The economic principle underlying this requirement is straightforward: &lt;strong&gt;manual handoff steps between system boundaries represent a fundamental source of friction and delay&lt;/strong&gt;. When an agent completes its analysis but requires a human to manually transfer its output into another system, you have failed to eliminate the bottleneck that prompted the agent's development in the first place. Conversely, when the agent can &lt;strong&gt;directly query information from, or write results to, systems where decisions take effect&lt;/strong&gt;, the entire workflow collapses into a unified operational flow that removes intermediate steps.&lt;/p&gt;

&lt;p&gt;Your implementation approach should prioritize identifying which single system integration would eliminate the greatest volume of repetitive manual work, and then building only the minimal version of that integration during the initial development phase. This targeted approach might manifest as the agent &lt;strong&gt;querying a system to retrieve structured reference data&lt;/strong&gt; that previously required manual lookup, &lt;strong&gt;writing a record to a system to capture the decision the agent has reached&lt;/strong&gt;, &lt;strong&gt;extracting and processing a document that originated from a system's document repository&lt;/strong&gt;, or &lt;strong&gt;triggering an automated workflow in a system that would otherwise require manual initiation&lt;/strong&gt;. Even a single integration point of this magnitude, when implemented at the outset rather than deferred until later phases, serves as a forcing function that exposes the real constraints your agent must navigate within your organization's operational environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Deliver in Short Iteration Cycles
&lt;/h2&gt;

&lt;p&gt;Extended design phases pose a structural impediment to effective agent development by deferring real-world validation and prolonging the time horizon before measurable feedback becomes available. Organizations attempting comprehensive upfront design face two competing failures: either they design systems that do not align with operational reality once implemented, or they extend the pre-implementation phase so long that organizational priorities shift before deployment occurs. &lt;strong&gt;Agents improve most rapidly through short cycles of genuine operational usage&lt;/strong&gt; because each cycle generates concrete evidence of performance gaps and behavioral mismatches that cannot be anticipated through laboratory analysis alone.&lt;/p&gt;

&lt;p&gt;The organizational practice that supports this principle involves establishing a &lt;strong&gt;defined release rhythm of 7 to 10 days&lt;/strong&gt; as the baseline cadence for detecting problems, gathering behavioral feedback, and incorporating refinements. This rhythm creates a predictable organizational rhythm while ensuring sufficient time for both development work and operational assessment. Within this structured cycle, the work proceeds through sequential phases: during the initial week, the focus concentrates on delivering a &lt;strong&gt;working version of the agent that operates within the narrowly defined scope&lt;/strong&gt; established by principle two. During the second week, attention shifts toward &lt;strong&gt;integrating the agent with the single mission-critical system&lt;/strong&gt; identified during principle five, which forces the agent to operate under genuine operational constraints. During the third week, the team prioritizes incorporating feedback-driven refinements based on direct observations of how the agent performs under real operational patterns and edge cases. By the fourth week, the agent &lt;strong&gt;transitions into daily-use rotation&lt;/strong&gt;, becoming a standard component of the operational workflow rather than an experimental capability.&lt;/p&gt;

&lt;p&gt;This structured iteration discipline transforms theoretical value into &lt;strong&gt;practical, measurable improvements&lt;/strong&gt; by compressing the feedback loop between hypothesis and evidence to a manageable timeframe. Organizations that maintain shorter iteration cycles identify defects and misalignments exponentially faster than those that attempt extended design phases, resulting in a substantially faster path to production-grade performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Create a Lightweight Review and Quality Model
&lt;/h2&gt;

&lt;p&gt;Heavyweight governance structures—comprehensive architecture review boards, multi-stage approval processes, and extensive documentation requirements—impose transaction costs that delay feedback cycles and create organizational friction. These formal processes were designed for environments where deployment cycles measured months and the cost of errors remained relatively stable. Agent development operates under fundamentally different constraints: deployment cycles measure days, and the cost of a minor behavioral inconsistency in an agent can compound over hundreds of interactions before detection.&lt;/p&gt;

&lt;p&gt;The economic principle underlying lightweight review processes is that &lt;strong&gt;not all decisions require the same deliberative overhead&lt;/strong&gt;. Operational decisions about individual agent behaviors—how to handle edge cases, whether a response meets quality standards, or how the agent should escalate undefined requests—benefit from &lt;strong&gt;frequent, lightweight validation&lt;/strong&gt; rather than formal approval hierarchies. Conversely, decisions about expanding an agent's scope or integrating new system connectors require structured deliberation, but this deliberation should remain episodic rather than continuous.&lt;/p&gt;

&lt;p&gt;The practical implementation of this principle involves establishing separate review cadences calibrated to the urgency of decisions. &lt;strong&gt;Weekly operational reviews&lt;/strong&gt; should examine direct evidence of agent performance, specifically documented failures, observed edge cases that the agent failed to handle correctly, and user experience friction points that emerged during actual operational usage. These reviews operate without approval authority; they serve as diagnostic sessions that generate recommendations for refinement. &lt;strong&gt;Monthly functional expansion decisions&lt;/strong&gt; should convene stakeholder representatives to evaluate whether the agent's scope should be widened, which integration points to add next, or whether the agent should be split into multiple specialized agents. These decisions operate with explicit approval authority because scope decisions determine resource allocation for the subsequent month's development work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standard templates&lt;/strong&gt; for agent instructions, response formats, and escalation procedures ensure consistency across agents without requiring case-by-case review. A template encodes learned patterns from prior agent implementations into repeatable structures that new agents can adopt immediately, reducing both development time and the likelihood of behavioral inconsistencies between agents.&lt;/p&gt;

&lt;p&gt;This calibrated review model reduces unnecessary transaction costs while maintaining deliberative oversight for decisions that require it, so alignment occurs without creating a delivery bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Commit to Real Usage Within 30 Days
&lt;/h2&gt;

&lt;p&gt;The principle of time-bound value realization addresses a fundamental problem in agent adoption: organizations frequently defer the transition from development to operational deployment indefinitely, justifying continued laboratory work with incremental improvements that never add up to genuine business impact. The economic cost of this deferral compounds over time because development resources consumed during extended proof-of-concept phases represent opportunity costs that could have been deployed toward other organizational priorities.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;disciplined commitment to a fixed time horizon&lt;/strong&gt; solves this problem by establishing an explicit deadline for demonstrating measurable operational value. The specific timeframe of &lt;strong&gt;thirty days&lt;/strong&gt; aligns with the typical organizational planning cycle, allowing early agent deployment results to inform resource allocation decisions for the subsequent planning period. This timeframe is sufficiently compressed to prevent indefinite deferral while remaining realistic for narrowly scoped agents integrated with single system connectors.&lt;/p&gt;

&lt;p&gt;The operational rule is straightforward: &lt;strong&gt;if the agent is not delivering quantifiable value within 30 days of initial deployment to a real operational channel, the scope must be simplified rather than expanded. **&lt;/strong&gt; This is not a judgment of development competence but rather a signal that the current scope-to-resource ratio has become misaligned. Value delivery failure indicates that either the scope remains too broad to achieve stability within the available development effort, or the integration points do not connect to work patterns that generate sufficient transaction volume to demonstrate impact. In either case, the remedy is to &lt;strong&gt;reduce scope further rather than to invest additional effort in the current design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This discipline creates accountability structures that prevent laboratory research from consuming indefinite organizational resources while also forcing difficult conversations about scope alignment early in the adoption cycle, before significant resource commitments have been made.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Use Multiple Small Agents Instead of One Overloaded One
&lt;/h2&gt;

&lt;p&gt;As operational processes expand and additional requirements pile up, assigning more responsibilities to a single agent compounds performance issues and makes the governance framework needed to maintain operational consistency much harder to manage. Each additional responsibility you layer onto an existing agent increases the dimensionality of the state space the agent must navigate, exponentially expanding the set of edge cases and behavioral scenarios that must be designed for, tested against, and monitored in production.&lt;/p&gt;

&lt;p&gt;From an economic perspective, this multifaceted complexity imposes two distinct costs. First, &lt;strong&gt;development velocity decreases substantially&lt;/strong&gt; as the cognitive burden of managing interdependencies between distinct responsibilities grows. When an agent handles both classification and task execution, modifications to classification logic require careful analysis of how those changes cascade through task execution behavior. Second, &lt;strong&gt;operational failure modes become increasingly difficult to isolate and remediate&lt;/strong&gt; because a performance problem observed at the system boundary may originate from any of several distinct layers of responsibility.&lt;/p&gt;

&lt;p&gt;The principle of agent specialization addresses this problem by establishing the &lt;strong&gt;discipline of splitting responsibilities across multiple focused agents&lt;/strong&gt; as operational scope expands. Rather than expanding a single agent to handle routing decisions, classification decisions, domain-specific task execution, and document processing in sequence, you would instead deploy four distinct agents, each responsible for a single function. The &lt;strong&gt;routing agent&lt;/strong&gt; receives incoming work and determines which specialized agent should handle the request. The &lt;strong&gt;classification agent&lt;/strong&gt; processes the routed work and assigns it to the appropriate category within a predefined taxonomy. The &lt;strong&gt;domain-specific task agent&lt;/strong&gt; performs the operational work within that category, calling back-end systems and generating results. The &lt;strong&gt;document processing agent&lt;/strong&gt; extracts structured information from unstructured documents and prepares it for downstream task agents.&lt;/p&gt;

&lt;p&gt;This decomposition yields multiple benefits that justify the additional engineering required to orchestrate multiple agents. Small, &lt;strong&gt;specialized agents reach production stability faster&lt;/strong&gt; because each agent operates within a constrained state space with fewer edge-case combinations. &lt;strong&gt;Governance remains explicit and traceable&lt;/strong&gt; because each agent has a single defined responsibility, making it straightforward to document expected behavior and audit actual behavior against that standard. &lt;strong&gt;Failure isolation becomes tractable&lt;/strong&gt; because a performance degradation can be attributed to a specific agent component rather than requiring analysis across all bundled responsibilities. When a specific agent begins exhibiting unexpected behavior, the blast radius of potential impact remains constrained to the specific function that agent performs, rather than cascading through multiple dependent responsibilities.&lt;/p&gt;

&lt;p&gt;Over extended operational timelines, this modular architecture provides additional economic value through &lt;strong&gt;reduced cost of capability evolution&lt;/strong&gt;. When organizational requirements change, you can modify or replace a single specialized agent without requiring a redesign of the entire set of responsibilities. This flexibility allows organizations to adapt their agent ecosystem as operational priorities change.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Plan for Long‑Term Flexibility
&lt;/h2&gt;

&lt;p&gt;Long-term organizational success with agent systems depends on architectural decisions that preserve future optionality without imposing excessive upfront complexity. Adoption frameworks and industry analysis show that organizations with modular architectures, rather than monolithic designs, have significantly lower total cost of ownership over multi-year operational timelines. The economic principle underlying this requirement is that &lt;strong&gt;modular systems distribute change costs across smaller component boundaries&lt;/strong&gt;, whereas monolithic systems concentrate change costs across tightly coupled dependencies.&lt;/p&gt;

&lt;p&gt;Your agent architecture should prioritize &lt;strong&gt;flexibility in integrating capabilities&lt;/strong&gt; by establishing well-defined interfaces between agents and external systems, rather than embedding system-specific logic directly into agent instructions or prompts. This approach means that when your organization adopts a new CRM platform or replaces a document management system, you can update the system integration layer without requiring redesign of agent behavior specifications. Additionally, the architecture should remain &lt;strong&gt;protocol-driven&lt;/strong&gt;, meaning that agents communicate with each other and with external systems through standardized APIs and message formats rather than through proprietary connectors. This discipline ensures that as your organization's technology infrastructure evolves, your agent ecosystem can adapt without requiring wholesale redevelopment.&lt;/p&gt;

&lt;p&gt;The practical implication of this principle is that your initial agent deployment should &lt;strong&gt;incorporate extensibility patterns from the outset&lt;/strong&gt; rather than deferring architectural considerations until later phases. When you define how an agent accesses your customer database, design that access pattern to accommodate a future change in the database platform without requiring modifications to the agent's core logic. When you establish how agents communicate with business systems, use standardized protocols and well-documented interfaces that would allow additional agents to access those same systems without requiring new connector development. This forward-looking engineering discipline imposes modest additional design effort during initial implementation but eliminates expensive rearchitecting work later as organizational requirements evolve and technology infrastructure changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: The Fast‑Path to Production
&lt;/h2&gt;

&lt;p&gt;Organizations that successfully transition agents from proof-of-concept phases into sustained operational deployment share a consistent pattern of implementation discipline. These ten principles represent a synthesis of organizational practices that have demonstrated measurable results across diverse operational contexts.&lt;/p&gt;

&lt;p&gt;The foundational requirement is to &lt;strong&gt;anchor agent development in a specific, real operational process&lt;/strong&gt; rather than pursue abstract experimentation. This grounding in actual business workflows ensures that agent capabilities connect directly to measurable organizational problems. Building on this foundation, &lt;strong&gt;maintaining scope discipline through narrowly defined initial agent responsibilities&lt;/strong&gt; creates the conditions for rapid stabilization and early demonstration of operational value. The agent should then &lt;strong&gt;receive genuine operational input&lt;/strong&gt; through the channels where work currently flows into the organization, exposing the agent to real data distributions and authentic user behaviors from the initial development phase.&lt;/p&gt;

&lt;p&gt;Throughout the development cycle, &lt;strong&gt;applying production-grade governance controls, data access policies, and behavioral standards from day one&lt;/strong&gt; eliminates the artificial gap between development and production environments. Simultaneously, &lt;strong&gt;integrating with at least one mission-critical system early in the development process&lt;/strong&gt; forces the agent to operate under genuine operational constraints rather than remaining isolated in a laboratory environment. The development methodology should employ &lt;strong&gt;short iteration cycles measured in weeks rather than months&lt;/strong&gt;, which compresses the feedback loop between hypothesis and evidence, enabling rapid identification of misalignments between designed behavior and operational reality.&lt;/p&gt;

&lt;p&gt;Supporting this development rhythm requires establishing &lt;strong&gt;lightweight review processes calibrated to the urgency of decisions&lt;/strong&gt;, and separating continuous operational assessments from episodic capability expansion decisions. Organizations must enforce &lt;strong&gt;time-bound value realization through a commitment to deliver measurable operational results within thirty days&lt;/strong&gt;, which prevents indefinite deferral of production deployment and forces disciplined conversations about scope alignment. As operational requirements expand, maintaining &lt;strong&gt;modular architectures that distribute capabilities across multiple specialized agents&lt;/strong&gt; rather than accumulating responsibilities within single agents preserves development velocity and simplifies operational governance. Finally, &lt;strong&gt;planning for long-term flexibility through well-defined interfaces and standardized protocols&lt;/strong&gt; enables the agent ecosystem to adapt as organizational technology infrastructure and business requirements evolve.&lt;/p&gt;

&lt;p&gt;These principles work together to create implementation patterns that compress the transition from conception to the delivery of operational value.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>adoption</category>
      <category>aifoundry</category>
    </item>
    <item>
      <title>Microsoft 365 E7: Why Microsoft's New License Is a Logical Step for Agent‑Driven Enterprises</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 14 Mar 2026 08:13:21 +0000</pubDate>
      <link>https://forem.com/holgerimbery/microsoft-365-e7-why-microsofts-new-license-is-a-logical-step-for-agent-driven-enterprises-46gp</link>
      <guid>https://forem.com/holgerimbery/microsoft-365-e7-why-microsofts-new-license-is-a-logical-step-for-agent-driven-enterprises-46gp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft's announcement of Microsoft 365 E7 in March 2026 marks a watershed moment in enterprise technology strategy. For the first time in over a decade, Microsoft introduced a new top-tier enterprise license—not to add incremental features, but to fundamentally reconceptualize how organizations govern both human workers and autonomous AI agents as integrated components of the workforce. At $99 per user per month, E7 bundles Microsoft 365 E5, Microsoft 365 Copilot Wave 3 with agentic capabilities, the Microsoft Entra Suite, and the newly introduced Agent 365 control plane. This consolidation signals that AI agents have transitioned from experimental pilots to production-grade organizational resources requiring enterprise-grade identity, access, compliance, and auditability frameworks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why You Should Read This&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If you lead enterprise technology strategy, manage cloud infrastructure, evaluate AI adoption roadmaps, or determine software licensing budgets, E7 represents a critical inflection point in how enterprises will architect their IT operating models over the next five years. This article explains not just what E7 includes, but why Microsoft built it—addressing the architectural gaps E5 left exposed as organizations scale agent deployment from hundreds of thousands to tens of millions of instances. You'll understand the economic logic, the governance infrastructure, and the strategic positioning underlying this licensing evolution, enabling you to make informed decisions about whether E7 aligns with your organization's agent deployment trajectory and control requirements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction: From Productivity Suites to Agent Platforms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Historical Context and Evolution
&lt;/h3&gt;

&lt;p&gt;Microsoft's enterprise licensing strategy has traditionally centered on supporting productivity and organizational efficiency through cloud services and security infrastructure. The Microsoft 365 E5 tier, introduced in 2015, represented the established enterprise standard, designed to address the comprehensive security, compliance, productivity, and governance requirements of large organizations during the cloud adoption phase. For the next 11 years, E5 served as the highest-tier enterprise licensing option within the Microsoft 365 portfolio.&lt;/p&gt;

&lt;h3&gt;
  
  
  The March 2026 Announcement
&lt;/h3&gt;

&lt;p&gt;On 9 March 2026, Microsoft announced the availability of Microsoft 365 E7, designated as the Frontier Suite, representing the first introduction of a new top‑tier enterprise license since the E5 tier was originally established in 2015. This announcement signals a deliberate architectural evolution in how Microsoft structures enterprise licensing and organizational governance at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Composition and Technical Structure
&lt;/h3&gt;

&lt;p&gt;The Microsoft 365 E7 offering, priced at $99 per user per month, consolidates multiple previously distinct components into a unified licensing structure. This bundled approach encompasses Microsoft 365 E5, Microsoft 365 Copilot, the complete Microsoft Entra Suite, and the newly introduced Agent 365 control plane. Each component addresses specific operational and governance requirements within the modern enterprise technology infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fundamental Architectural Shift
&lt;/h3&gt;

&lt;p&gt;The introduction of E7 should not be interpreted as a simple price adjustment or repackaging of existing capabilities. Rather, E7 represents a substantive architectural shift in Microsoft's strategic positioning and technical philosophy. Microsoft is fundamentally repositioning Microsoft 365 from a platform optimized for human-centric productivity and security to a comprehensive control plane that manages and governs an integrated, mixed workforce comprising both human workers and autonomous artificial intelligence agents. This transition reflects evolving organizational requirements as enterprises move beyond pilot implementations of AI technologies toward systematic, organization-wide agent deployment at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Microsoft 365 E7 Actually Includes
&lt;/h2&gt;

&lt;p&gt;Microsoft 365 E7 consolidates four distinct product components, previously available as separate subscription offerings, into a unified licensing structure. This architectural consolidation reflects Microsoft's strategic decision to bundle interdependent capabilities that are increasingly required for enterprise-scale deployment of autonomous AI agents. The following sections provide detailed technical specifications for each included component.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft 365 E5: Foundation Layer
&lt;/h3&gt;

&lt;p&gt;Microsoft 365 E5 is the foundational component of the E7 licensing tier, providing core productivity, compliance, security, and identity management capabilities. These capabilities encompass the complete Microsoft Office productivity suite, including Exchange Online for messaging infrastructure, SharePoint Online for content management and collaboration, Teams for unified communications, OneDrive for business cloud storage, Microsoft Defender for comprehensive threat protection, Microsoft Intune for mobile and device management, Microsoft Purview for data governance and compliance, and Power BI Pro for business analytics and visualization. These capabilities provide the fundamental infrastructure required for enterprise productivity, data protection, and organizational governance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft 365 Copilot (Wave 3): Advanced AI Integration
&lt;/h3&gt;

&lt;p&gt;The E7 tier includes Microsoft 365 Copilot at the Wave 3 release level, which represents a significant evolution in AI integration across the Microsoft 365 application portfolio. Copilot is embedded across Microsoft Word, Excel, PowerPoint, Outlook, Teams, and the Loop workspace collaboration platform. Beyond traditional copilot assistance functions, Wave 3 introduces expanded agentic capabilities that enable autonomous planning, decision-making, and action execution. Additionally, Wave 3 extends multi-model support to integrate with multiple language model providers, specifically OpenAI and Anthropic Claude, providing organizations with flexibility in selecting the underlying AI model infrastructure based on specific organizational requirements, performance characteristics, or policy constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft Entra Suite: Identity and Access Management
&lt;/h3&gt;

&lt;p&gt;The E7 offering includes the complete Microsoft Entra Suite, an expanded product tier that goes beyond the standard Entra ID P2 offering. The Entra Suite encompasses advanced identity verification, comprehensive access governance frameworks, Zero Trust network access architecture for conditional connectivity, and sophisticated conditional access policy enforcement mechanisms. These capabilities provide an enterprise-grade identity management and access control infrastructure necessary to manage both human and non-human (agent-based) organizational identities within a unified framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent 365: Governance and Control Infrastructure
&lt;/h3&gt;

&lt;p&gt;Agent 365 represents a newly introduced governance and security layer specifically designed to manage autonomous AI agents at an organizational scale. Agent 365 provides centralized inventory tracking across both Microsoft-native and third-party AI agent frameworks; comprehensive observability and monitoring capabilities; policy enforcement mechanisms specific to agent behavior and resource utilization; and lifecycle management functionality, including agent provisioning, update orchestration, and controlled retirement procedures. This component addresses the operational requirement for centralized governance of non-human autonomous entities executing within enterprise systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Economic Analysis and Bundle Composition
&lt;/h3&gt;

&lt;p&gt;When these four components are purchased individually through separate licensing arrangements after July 2026, the aggregate monthly cost per user is projected to be approximately $117 USD. The E7 consolidation bundle is offered at $99 USD per user per month, representing an aggregate cost reduction of approximately 15–17% when compared to the sum of individually purchased components. This pricing structure reflects both the operational efficiency gains from unified licensing administration and Microsoft's strategic intent to incentivize adoption of the consolidated governance framework for agent deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why E7 Exists: E5 Was Built for the Cloud Era, Not the Agentic Era
&lt;/h2&gt;

&lt;p&gt;Microsoft executives have been explicit that E5 was designed "pre‑agentic".&lt;br&gt;&lt;br&gt;
E5 assumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;humans are the primary actors,&lt;/li&gt;
&lt;li&gt;automation is largely scripted,&lt;/li&gt;
&lt;li&gt;identities map cleanly to employees.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern enterprises increasingly violate all three assumptions.  &lt;/p&gt;

&lt;p&gt;AI agents today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;act autonomously,&lt;/li&gt;
&lt;li&gt;access mailboxes, calendars, files, and APIs,&lt;/li&gt;
&lt;li&gt;execute multi‑step workflows over time,&lt;/li&gt;
&lt;li&gt;are often created outside central IT using low‑code or no‑code tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Agent 365: The Missing Control Plane Enterprises Have Been Lacking
&lt;/h2&gt;

&lt;p&gt;Agent 365 is the genuinely new element in E7, and the main reason E7 is more than a repackaged bundle.&lt;br&gt;
Agent 365 provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized agent inventory across Microsoft and third‑party frameworks&lt;/li&gt;
&lt;li&gt;Identity and access controls via Entra&lt;/li&gt;
&lt;li&gt;Security monitoring via Defender XDR&lt;/li&gt;
&lt;li&gt;Compliance and auditability via Purview&lt;/li&gt;
&lt;li&gt;Lifecycle management (provisioning, update, retirement)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, Agent 365 does not build or host agents. It governs them.&lt;br&gt;
Compute and execution remain consumption‑based via Copilot Studio, Azure AI Foundry, or partner platforms.&lt;br&gt;
This mirrors how enterprises already separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application development,&lt;/li&gt;
&lt;li&gt;runtime infrastructure,&lt;/li&gt;
&lt;li&gt;identity and governance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why E7 Is a Good Move for Enterprises Leveraging Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It Normalizes Agents as Enterprise Identities
&lt;/h3&gt;

&lt;p&gt;Microsoft is treating agents as digital workers, subject to the same identity, access, and policy frameworks as humans.&lt;br&gt;
This is a necessary prerequisite for scaling agents beyond experimentation. &lt;/p&gt;

&lt;h3&gt;
  
  
  It Reduces Architectural Fragmentation
&lt;/h3&gt;

&lt;p&gt;Prior to E7, organizations had to stitch together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;E5,&lt;/li&gt;
&lt;li&gt;Copilot add‑ons,&lt;/li&gt;
&lt;li&gt;Entra extensions,&lt;/li&gt;
&lt;li&gt;emerging agent governance tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;E7 consolidates these into a single, coherent enterprise architecture aligned with Zero Trust principles. &lt;/p&gt;

&lt;h3&gt;
  
  
  It Shifts AI from "Assistance" to "Execution".
&lt;/h3&gt;

&lt;p&gt;Wave 3 of Copilot introduces agentic capabilities that plan, act, and execute, not just summarize or draft.&lt;br&gt;
E7 provides the governance layer required to allow that execution safely. &lt;br&gt;
Without E7‑level controls, many organizations would be forced to block these capabilities entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  It Aligns Cost Models with Reality
&lt;/h3&gt;

&lt;p&gt;While $99 per user appears high, E7 reflects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the true cost of enterprise security,&lt;/li&gt;
&lt;li&gt;identity governance for both humans and agents,&lt;/li&gt;
&lt;li&gt;reduced overhead compared to managing multiple SKUs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Importantly, Microsoft is signaling that agents will be licensed like users, potentially with hybrid subscription and consumption models over time. &lt;/p&gt;

&lt;h2&gt;
  
  
  Microsoft 365 Enterprise Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  E5 vs E5 + Copilot vs E7 (Frontier Suite)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Status (March 2026)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft 365 E7 GA: May 1, 2026 &lt;/li&gt;
&lt;li&gt;Pricing shown is list price (USD, per user/month) &lt;/li&gt;
&lt;li&gt;Consumption costs for building/running agents are not included in any plan&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E5&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E5 + Copilot&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E7 (Frontier Suite)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What’s included&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full Microsoft 365 E5 (Office apps, Exchange/SharePoint/OneDrive, Teams*, Defender, Intune, Purview, Power BI Pro). No Copilot, no Agent 365, and no full Entra Suite.&lt;/td&gt;
&lt;td&gt;E5 (as left) &lt;strong&gt;plus&lt;/strong&gt; Microsoft 365 Copilot add‑on. No Agent 365, no full Entra Suite.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;E5 + Copilot + Entra Suite + Agent 365&lt;/strong&gt; in one SKU; positioned as the “Frontier Suite” for agent‑at‑scale scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;List price (USD/user/month)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$60&lt;/strong&gt; (from July 1, 2026).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$90&lt;/strong&gt; (E5 $60 + Copilot $30).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$99&lt;/strong&gt; (bundle). With Teams‑excluded option reported at &lt;strong&gt;$90.45&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot (Wave 3) agentic capabilities&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not included.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included&lt;/strong&gt; (via add‑on). Multi‑model (OpenAI + Anthropic) support arrives with Wave 3.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included by default&lt;/strong&gt; with Wave 3 agentic features (planning, acting across Microsoft 365).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent 365 (agent governance &amp;amp; control plane)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not included.&lt;/td&gt;
&lt;td&gt;Not included by default (can be added at &lt;strong&gt;$15&lt;/strong&gt;/user/month).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included&lt;/strong&gt;; GA on &lt;strong&gt;May 1, 2026&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Entra Suite&lt;/strong&gt; (beyond Entra ID P2)&lt;/td&gt;
&lt;td&gt;Not included (E5 includes Entra ID P2 but not the broader Entra Suite).&lt;/td&gt;
&lt;td&gt;Not included.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Included&lt;/strong&gt; (e.g., Private Access, Internet Access, ID Governance/Protection, Verified ID).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security &amp;amp; compliance posture for agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Human‑centric controls only; no unified agent inventory/observability.&lt;/td&gt;
&lt;td&gt;Adds creation/use of Copilot agents but &lt;strong&gt;without&lt;/strong&gt; centralized agent governance unless Agent 365 is added.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Unified&lt;/strong&gt; agent inventory, policy enforcement, auditability across Defender/Entra/Purview.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bundle economics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Baseline plan.&lt;/td&gt;
&lt;td&gt;A la carte add‑on model. &lt;strong&gt;E5 + Copilot&lt;/strong&gt; = ~$90. Adding &lt;strong&gt;Entra Suite&lt;/strong&gt; (+$12) and &lt;strong&gt;Agent 365&lt;/strong&gt; (+$15) pushes to &lt;strong&gt;~$117&lt;/strong&gt;.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~$99&lt;/strong&gt; vs &lt;strong&gt;~$117&lt;/strong&gt; à la carte → &lt;strong&gt;~15–17% discount&lt;/strong&gt;; simpler procurement/governance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Availability date&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available now.&lt;/td&gt;
&lt;td&gt;Available now (Copilot GA prior to E7; Wave 3 rolling out).&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;GA May 1, 2026&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Who it fits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Organizations prioritizing core productivity/security without near‑term agent scale‑out.&lt;/td&gt;
&lt;td&gt;Teams piloting Copilot or limited agentic use cases, willing to bolt on governance later.&lt;/td&gt;
&lt;td&gt;Enterprises &lt;strong&gt;standardizing&lt;/strong&gt; on agents (cross‑department), requiring &lt;strong&gt;identity‑first&lt;/strong&gt; governance, zero‑trust access, and consolidated risk controls.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Teams availability depends on regional licensing rules; E7 is also offered as a *&lt;/em&gt;“without Teams”** SKU in some regions.*&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent execution costs&lt;/strong&gt; (LLM tokens, orchestration runtime, long‑running workflows) are &lt;strong&gt;not included&lt;/strong&gt; in any license and must be budgeted separately.&lt;/li&gt;
&lt;li&gt;E7 is the &lt;strong&gt;first Microsoft 365 SKU designed explicitly for the agentic AI era&lt;/strong&gt;, not just productivity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Important Caveat: E7 Is Not the Full Cost of an Agentic Enterprise
&lt;/h2&gt;

&lt;p&gt;E7 does not include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent execution compute,&lt;/li&gt;
&lt;li&gt;LLM consumption,&lt;/li&gt;
&lt;li&gt;orchestration runtime costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These remain variable and are billed separately via Copilot Studio, Azure AI Foundry, or partner services.&lt;/p&gt;

&lt;p&gt;Enterprises should therefore view E7 as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the governance and control foundation, not the entire AI budget.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion: E7 as an Architectural Statement
&lt;/h2&gt;

&lt;p&gt;Microsoft 365 E7 represents a significant shift in how enterprises conceptualize their licensing strategy and operational architecture. Rather than functioning primarily as a licensing vehicle, E7 serves as a declaration that Microsoft 365 is positioned as the foundational operating system for enterprises that intend to operate within an agentic computing paradigm.&lt;/p&gt;

&lt;p&gt;For organizations that anticipate the need to execute a comprehensive deployment strategy involving autonomous AI agents across multiple business functions, integrate these agents into mission-critical business processes, and maintain the requisite levels of security governance, compliance attestation, and auditability, the scope and depth of capabilities provided by E7 are not superfluous. Instead, these capabilities represent structural and architectural necessities that must be addressed to enable safe and controlled agent deployment at an organizational scale.&lt;/p&gt;

&lt;p&gt;The evolution from E5 to E7 reflects a fundamental recalibration of enterprise platform design philosophy. The E5 licensing tier was optimized and engineered for the cloud-centric era, where enterprises sought to modernize their infrastructure, data management, and collaboration mechanisms through cloud-native services. E7, by contrast, is optimized for an organizational context in which the workforce composition includes both human workers and autonomous AI agents, each requiring appropriate identity governance, access controls, security monitoring, and compliance instrumentation within an integrated control plane.&lt;/p&gt;

&lt;p&gt;This architectural shift acknowledges that managing agents as first-class organizational entities—rather than as peripheral or experimental capabilities—requires the same level of systematic governance, policy enforcement, and observability that enterprises have come to expect from their core identity and security infrastructure.&lt;/p&gt;

</description>
      <category>microsoft365</category>
    </item>
    <item>
      <title>Introducing MATE: A Modular Testing Environment for AI Agents</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 07 Mar 2026 08:07:02 +0000</pubDate>
      <link>https://forem.com/holgerimbery/introducing-mate-a-modular-testing-environment-for-ai-agents-576l</link>
      <guid>https://forem.com/holgerimbery/introducing-mate-a-modular-testing-environment-for-ai-agents-576l</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
As AI agents become integral to business processes, reliable and repeatable testing is essential for confidence in deployment. This article introduces the &lt;strong&gt;Multi-Agent Test Environment (MATE)&lt;/strong&gt; – an enterprise-grade framework for automated testing of AI agents across platforms and frameworks – and explains how its modular design addresses key challenges in agent testing. We explore why testing AI agents is critical, delve into MATE's architecture and features, compare MATE with alternative testing approaches, and outline MATE's roadmap including red-team testing, enhanced cloud deployment, and support for emerging agent frameworks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Importance of Testing AI Agents
&lt;/h2&gt;

&lt;p&gt;AI &lt;strong&gt;agents&lt;/strong&gt; built with Microsoft Copilot Studio are powerful but complex systems. They combine &lt;strong&gt;natural language understanding, generative AI, and business logic&lt;/strong&gt;, often operating in critical scenarios (customer support, data retrieval, workflow automation, etc.). Ensuring these agents work correctly and safely under diverse conditions is as important as testing traditional software – if not more so. Key reasons why rigorous agent testing is essential include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reliability and Consistency:&lt;/strong&gt; Unlike deterministic software, AI agents can produce different answers to the same question due to their probabilistic nature. Without structured tests, one might only catch issues by &lt;strong&gt;manually typing questions and hoping for the right answer&lt;/strong&gt;, a fragile approach. Automated testing provides consistency – the same test can be run repeatedly to ensure the agent’s behavior remains reliable after updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-Grade Quality:&lt;/strong&gt; In enterprise deployments, an untested agent can lead to incorrect or even &lt;strong&gt;unsafe outputs&lt;/strong&gt;, damaging user trust or violating compliance. Ad-hoc testing that “relies on intuition instead of structured testing” &lt;strong&gt;doesn’t scale&lt;/strong&gt; for enterprise needs. Organizations require &lt;strong&gt;repeatable, at-scale test processes&lt;/strong&gt; to validate that agents meet quality standards (accuracy, relevance, safety) consistently before and after release.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex Multi-turn Interactions:&lt;/strong&gt; Copilot Studio agents often handle &lt;strong&gt;multi-turn conversations&lt;/strong&gt;, maintaining context across multiple user and agent turns. Testing these multi-step dialogues manually is time-consuming and error-prone. Automated test suites allow developers to simulate complex conversation flows (with varying user inputs, branching dialogs, tool invocations, etc.) and verify the end-to-end behavior in one run. This ensures that the agent can handle &lt;strong&gt;scenario-based conversations&lt;/strong&gt; robustly, from greeting to task completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nondeterministic and Generative Responses:&lt;/strong&gt; When agents use &lt;strong&gt;generative AI capabilities&lt;/strong&gt;, they might produce creative or unexpected phrasing. Verifying such responses is not as simple as exact string matching. Effective testing must evaluate responses on &lt;strong&gt;semantic correctness, completeness, and compliance&lt;/strong&gt;, even if wording varies. This introduces a challenge: how do you automatically judge an AI-generated answer’s quality? We’ll see how MATE tackles this with an &lt;strong&gt;AI-based “judge”&lt;/strong&gt; component.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequent Updates and Continuous Integration:&lt;/strong&gt; Agents are rarely static – their underlying &lt;strong&gt;prompts, skills, and knowledge sources&lt;/strong&gt; evolve. Without automation, re-testing the agent after each change or on a schedule (for example, to catch drift or regressions) would be prohibitively labor-intensive. A good agent testing framework enables &lt;strong&gt;continuous integration (CI)&lt;/strong&gt; pipelines and nightly runs, so that any breaking change or quality degradation is caught early. This is crucial for &lt;strong&gt;scaling up&lt;/strong&gt; the number of agents in production while keeping maintenance overhead low.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency and Debugging:&lt;/strong&gt; When a test fails, developers need insights into &lt;em&gt;why&lt;/em&gt;. For example, did the agent retrieve the wrong data because of an intent misclassification? Or did it produce a partially correct answer that was marked as a failure due to a strict check? Good testing tools provide &lt;strong&gt;detailed reporting&lt;/strong&gt; – conversation transcripts, logs, and metrics – to help pinpoint the root cause of failures. This accelerates debugging and &lt;strong&gt;continuous improvement&lt;/strong&gt; of the agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In summary, robust testing of agents is the linchpin for &lt;strong&gt;trustworthy AI deployments&lt;/strong&gt;. It allows teams to &lt;strong&gt;validate functionality, accuracy, robustness, and safety&lt;/strong&gt; in a systematic way. This need has driven Microsoft to introduce solutions like the &lt;strong&gt;Power CAT Copilot Studio Kit&lt;/strong&gt; (a Power Platform solution for agent testing) and, more recently, the &lt;strong&gt;built-in Agent Evaluation&lt;/strong&gt; feature in Copilot Studio (now in preview). However, each solution comes with certain limitations or prerequisites, which the new &lt;strong&gt;MATE&lt;/strong&gt; aims to overcome. Before comparing approaches, let’s first introduce MATE and how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing MATE: A Modular Testing Framework for AI Agents
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Link to &lt;a href="https://github.com/holgerimbery/mate" rel="noopener noreferrer"&gt;MATE GitHub Repository&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Link to &lt;a href="https://github.com/holgerimbery/mate/wiki" rel="noopener noreferrer"&gt;MATE Wiki&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Multi-Agent Test Environment (MATE)&lt;/strong&gt; is an internal project and framework designed to provide &lt;strong&gt;automated, comprehensive testing&lt;/strong&gt; for AI agents, initially focusing on Microsoft Copilot Studio agents. MATE was created to address the challenges above by combining &lt;strong&gt;enterprise-grade tooling with a modular, extensible architecture&lt;/strong&gt;. In essence, MATE allows developers and testers to &lt;strong&gt;connect to a running Copilot Studio agent, simulate conversations, evaluate the agent’s responses against expected outcomes using AI, and produce detailed metrics and reports&lt;/strong&gt; – all in an automated fashion.&lt;/p&gt;



&lt;p&gt;MATE’s approach can be seen as bringing many of the benefits of the Copilot Studio Kit into a &lt;strong&gt;single, code-first testing environment&lt;/strong&gt;. Rather than a Power App solution, MATE is a &lt;strong&gt;pure .NET 9&lt;/strong&gt; application  that you can run in a container stack. This design choice means MATE operates outside the constraints of the Power Platform, giving developers more flexibility in how and where they run their tests.&lt;/p&gt;

&lt;p&gt;Let’s break down how MATE works and how it addresses key testing challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct Line Integration (Live Agent Testing):&lt;/strong&gt; MATE connects to the agent through its Direct Line API endpoint. This is the same interface used by real chat channels (like Teams or a custom web chat). By using Direct Line, MATE ensures it's testing the &lt;em&gt;deployed agent exactly as end-users experience it&lt;/em&gt;. The tool can send a sequence of user messages and receive the agent’s replies in turn, thereby automating full &lt;strong&gt;multi-turn conversations&lt;/strong&gt;. This addresses the challenge of multi-turn flows by allowing complex scenario scripts to be executed automatically. It’s effectively like an automated “test chat” but running dozens or hundreds of predefined conversations unattended.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Case Definition and Multi-turn Flows:&lt;/strong&gt; In MATE, you can define a &lt;strong&gt;test case&lt;/strong&gt; with multiple steps of user input (representing a conversation) and the expected outcomes. Expected outcomes can include:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expected Intent&lt;/strong&gt; and &lt;strong&gt;Entities&lt;/strong&gt; – i.e., which topic or action the agent should trigger and which key data (entities) it should extract.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance Criteria&lt;/strong&gt; – specific conditions that constitute a pass/fail for the test (for example, certain keywords must appear in the answer, or a certain API call must be made).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference Answer&lt;/strong&gt; – an ideal answer text or outline for comparison.
Each test case can be labeled with a priority or category, useful for organizing large test suites (e.g., “P1 critical flows”, “Edge cases”, etc.). By supporting multi-step conversations in test cases, MATE ensures you can test end-to-end agent behavior, not just isolated single-turn Q&amp;amp;A.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;“Model-as-a-Judge” Evaluations:&lt;/strong&gt; One of MATE’s most powerful features is using an AI model to evaluate the quality of the agent’s response. Rather than relying only on hard-coded checks (exact matches or simple contained keywords), MATE sends the agent’s answer along with the reference answer and validation criteria to a &lt;strong&gt;Large Language Model (LLM)&lt;/strong&gt; – for instance, an &lt;strong&gt;Azure OpenAI GPT-4 model&lt;/strong&gt; – which acts as an impartial &lt;em&gt;judge&lt;/em&gt;. This &lt;strong&gt;AI Judge&lt;/strong&gt; scores the response across multiple &lt;strong&gt;evaluation dimensions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Task Success:&lt;/em&gt; Did the agent fulfill the user’s request or solve the user’s problem?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Intent Match:&lt;/em&gt; Did the agent correctly understand what the user was asking for?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Factuality:&lt;/em&gt; Is the information provided true and accurate (no hallucinations or incorrect data)?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Helpfulness/Completeness:&lt;/em&gt; Is the answer complete, well-structured, and does it address the user’s need effectively?&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Safety/Compliance:&lt;/em&gt; Does the response avoid policy violations (no sensitive data exposure, no disallowed content)?
Each of these dimensions is scored (e.g. 0.0 to 1.0), and MATE can apply &lt;strong&gt;configurable weightings&lt;/strong&gt; to decide if a test passes or fails overall. For example, you may require 0.9+ on Task Success and Intent Match, tolerate a lower score on style metrics like Helpfulness, and demand a perfect score on Safety. This approach directly tackles the challenge of evaluating &lt;em&gt;nondeterministic generative answers&lt;/em&gt;: even if the agent’s wording differs from the expected answer, the AI Judge can still determine that the answer is essentially correct and useful. Conversely, if the agent’s response is irrelevant or contains errors, the AI Judge will assign low scores, causing the test to fail. This method provides a nuanced, context-aware evaluation that traditional automated tests struggle to achieve. (Internally, the AI Judge uses prompt-based prompting of an LLM with the expected answer or criteria to get these scores.)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;In Addition, MATE also supports other judge types, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;RubricsJudge&lt;/strong&gt; – A fully deterministic judge that evaluates responses using explicit rules such as Contains, NotContains, and Regex, making it ideal for compliance, safety, and reproducible pass/fail checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HybridJudge&lt;/strong&gt; – A cost‑efficient combination judge that first gates responses with deterministic rubrics and then applies an LLM for deeper qualitative scoring only where needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CopilotStudioJudge&lt;/strong&gt; – A Copilot‑Studio‑specific LLM judge that is citation‑ and grounding‑aware, aligning evaluations with Copilot Studio’s default reasoning and response patterns:&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GenericJudge&lt;/strong&gt; – A lightweight, zero‑cost judge based on simple keyword and regex matching, intended for fast smoke tests and offline or CI scenarios
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Automated Test Generation from Documentation:&lt;/strong&gt; Authoring a comprehensive set of test cases can be labor-intensive. MATE addresses this by allowing you to &lt;strong&gt;upload documents (PDFs or text files)&lt;/strong&gt; that are relevant to your agent’s domain or knowledge base. It then automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extracts&lt;/strong&gt; text content from the documents (using a PDF parser).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexes&lt;/strong&gt; and &lt;strong&gt;chunks&lt;/strong&gt; the content for semantic analysis (using a Lucene-based index).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates&lt;/strong&gt; potential questions and answers from the content using an LLM.
The outcome is a set of suggested Q&amp;amp;A pairs or even multi-turn conversation scenarios derived from the documentation. For example, if you upload a product FAQ PDF, MATE can generate likely customer questions and the correct answers from that PDF. These can be reviewed and added to your test suites. This feature helps broaden test coverage &lt;em&gt;automatically&lt;/em&gt;, ensuring the agent is tested on real knowledge it’s supposed to have, and catching gaps where it might not respond correctly. It’s an intelligent way to keep tests in sync with content. (Notably, Copilot Studio Kit in the Power Platform also introduced an AI-based test generation in preview, which uses the agent’s topics and knowledge to generate example questions. MATE provides a similar capability but on external docs and with full control of the generated cases.)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Detailed Reporting and Analysis:&lt;/strong&gt; After executing tests, MATE provides rich &lt;strong&gt;metrics and logs&lt;/strong&gt;. In the &lt;strong&gt;Web Dashboard&lt;/strong&gt;, you can see overall pass rates, success trends over time, and drill down into individual test runs. Each test run retains the &lt;strong&gt;transcript of the conversation&lt;/strong&gt; and the scores for each evaluation dimension, so you can inspect exactly where a particular test failed. This addresses transparency: instead of just “Test 5 failed”, you can see that it failed because, say, &lt;em&gt;Factuality&lt;/em&gt; scored low (perhaps the agent gave a wrong detail), and even read the conversation to diagnose the issue. MATE’s &lt;strong&gt;Runs&lt;/strong&gt; view lets you compare results between runs – useful for spotting regressions after an update. All test data (test cases, results, transcripts, etc.) are stored in a local &lt;strong&gt;PostgreSQL database&lt;/strong&gt; for quick retrieval and can be queried or exported for additional analysis.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Web UI and CLI for Different Use Cases:&lt;/strong&gt; MATE offers two interfaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Web Application&lt;/strong&gt; (built with ASP.NET Blazor Server) for an interactive experience. This is ideal for exploratory testing, configuring your test suites, and reviewing results. The UI includes a setup wizard for initial configuration (entering your agent’s Direct Line credentials and your AI model info) to generate the necessary settings file. Testers can use the web UI to kick off test runs on-demand, monitor progress, and view results in real time.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Command-Line Interface (CLI)&lt;/strong&gt; tool for automation. The CLI allows you to run tests as part of scripts or pipelines. For example, you can incorporate &lt;code&gt;dotnet run --suite "Regression Suite"&lt;/code&gt; into a DevOps or GitHub Actions pipeline, so that whenever the agent’s bot is updated or its content changes, the test suite runs and verifies everything still works. The CLI returns an exit code indicating success or failure (0 if all tests passed, non-zero if any test failed), which CI systems can use to pass/fail a build. This enables true &lt;strong&gt;CI/CD for AI agents&lt;/strong&gt; – a failed test can halt a deployment, preventing flawed agent versions from going live.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Containerized and Extensible Architecture:&lt;/strong&gt; MATE is designed to be run in a self-hosted manner, giving teams full control. It doesn’t require a SaaS backend or a Dataverse environment – you just need a machine that can reach the internet for calling the agent service and the AI model endpoint. This avoids many of the &lt;strong&gt;Power Platform licensing constraints&lt;/strong&gt; associated with the Copilot Studio Kit (discussed later). The architecture is modular by design, with separate components (projects) for domain logic, data storage, core services, web UI, and CLI. This modularity not only enforces clean separation of concerns, but also sets the stage for supporting &lt;strong&gt;multiple types of agents in the future&lt;/strong&gt;. In fact, MATE’s roadmap includes extending support to other agent frameworks beyond Copilot Studio – the core logic (test execution, AI judging, etc.) can be adapted to different agent APIs by swapping out the integration layer. Early code commits already hint at multi-agent support being developed and even an upcoming &lt;strong&gt;“red teaming” module for adversarial testing&lt;/strong&gt; (there are structural hooks in the codebase for this, though the feature is not yet implemented). This means MATE is not a one-off tool but a growing platform for comprehensive AI agent testing across the board.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  MATE Architecture at a Glance
&lt;/h3&gt;

&lt;p&gt;Internally, MATE is built with a modern software architecture using the latest Microsoft stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;.NET 9&lt;/strong&gt; with C# – providing performance and cross-platform support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ASP.NET Core Blazor Server&lt;/strong&gt; for the web front-end – delivering a rich interactive UI for managing tests and viewing results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity Framework Core (with PostgreSQL)&lt;/strong&gt; – for the local database that stores test cases, results, transcripts, etc., ensuring persistence without requiring an external DB server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure AI OpenAI SDK&lt;/strong&gt; – to connect to the AI Judge model hosted on Azure’s AI services (Azure OpenAI “Foundry”). This is how MATE queries an LLM for evaluation of answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lucene.NET&lt;/strong&gt; – used for full-text indexing in the document-driven test generation feature, to find relevant content in uploaded docs for question generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDF processing libraries&lt;/strong&gt; (e.g., UglyToad.PdfPig) – to extract text from PDF documents for test generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serilog&lt;/strong&gt; – for structured logging of events and errors, helping with diagnosing issues in test executions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution is divided into several components (projects) reflecting a &lt;strong&gt;modular design&lt;/strong&gt;: a &lt;strong&gt;Domain&lt;/strong&gt; layer for core models and interfaces, a &lt;strong&gt;Data&lt;/strong&gt; layer for database access, a &lt;strong&gt;Core&lt;/strong&gt; services layer (implementing the judge logic, execution engine, etc.), a &lt;strong&gt;WebUI&lt;/strong&gt; for the front-end, and a &lt;strong&gt;CLI&lt;/strong&gt; project for the command-line interface. This modularity makes it easier to maintain and extend specific parts (for example, adding a new agent connector could be done by introducing a new service in the Core or a new API integration, without touching the UI or data layers).&lt;/p&gt;

&lt;p&gt;Overall, MATE is engineered to be a &lt;strong&gt;scalable, extensible test harness&lt;/strong&gt; for AI agents. It’s currently focused on Copilot Studio agents, but its principles apply broadly. Next, we’ll compare MATE to the Copilot Studio Kit – the established testing solution from Microsoft’s Power CAT team – to understand their differences and use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing MATE with Copilot Studio Kit
&lt;/h2&gt;

&lt;p&gt;Microsoft’s &lt;strong&gt;Power CAT Copilot Studio Kit&lt;/strong&gt; is an existing solution aimed at testing and managing Copilot Studio agents. It’s a &lt;strong&gt;Power Platform&lt;/strong&gt; solution (managed package) that provides a canvas app or model-driven app interface, along with Dataverse entities and Power Automate flows, enabling test case creation, automated test runs via the Direct Line API, and analytics (such as conversation transcripts, dashboards, etc.). The Copilot Studio Kit was instrumental in early adoption of agent testing – it allowed makers to do things like bulk import test cases (via Excel), run them from a UI, and even integrate with Azure DevOps pipelines via Power Platform build tools.&lt;/p&gt;

&lt;p&gt;However, the &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; has some inherent characteristics stemming from its Power Platform foundation. Below is a comparison of &lt;strong&gt;MATE vs. Copilot Studio Kit&lt;/strong&gt; across key dimensions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;MATE (Multi-Agent Test Environment)&lt;/th&gt;
&lt;th&gt;Copilot Studio Kit (Power CAT)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;.NET application (containarized). Runs locally or in cloud; launched via web UI or CLI on demand.&lt;/td&gt;
&lt;td&gt;Power Platform managed solution. Deployed to a Dataverse environment; accessed via Power Apps interface.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Licensing &amp;amp; Costs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;source-available - CC BY-NC 4.0 . Requires .NET runtime and an Azure OpenAI endpoint (for AI Judge) which may incur usage costs. No special Power Platform licensing needed beyond having a Copilot Studio agent to test.&lt;/td&gt;
&lt;td&gt;Provided by Microsoft Power CAT as a sample solution (available on GitHub). However, requires Power Platform &lt;strong&gt;premium licenses&lt;/strong&gt;: a Dataverse environment, and for certain features, &lt;strong&gt;AI Builder credits&lt;/strong&gt; (for generative answer analysis). Usage of Dataverse and Power Automate in the kit might consume capacity or require specific licenses.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Technology Stack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Modern .NET 9 stack; Blazor Web UI, CLI tool, local PostgreSQL DB. Integrates with Azure services (OpenAI) for evaluation. Highly customizable and extendable by developers (source code available).&lt;/td&gt;
&lt;td&gt;Low-code Power App + Dataverse. Relies on standard Power Platform tech (model-driven app or canvas app, Dataverse tables, Power Automate flows, AI Builder for some AI tasks). Customization is limited to what Power Platform allows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Creation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports manual creation of test cases via UI or by defining JSON/CSV, etc., and &lt;strong&gt;auto-generation&lt;/strong&gt; of test cases from documents using LLMs. Test cases can include multi-turn dialogues in one case. Organized into test suites for batch execution.&lt;/td&gt;
&lt;td&gt;Supports manual test case input (through the app or via Excel import/export). Also supports multi-turn test cases and offers some &lt;strong&gt;AI-assisted generation&lt;/strong&gt; of test questions from agent topics/knowledge (in Preview, via the Agent Evaluation integration). Test cases stored in Dataverse; grouping of tests supported (by test set).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs tests externally by connecting to the agent’s Direct Line channel (or Web Channel with secret &amp;amp; bot ID). Offers a &lt;strong&gt;CLI&lt;/strong&gt; for headless execution (suitable for CI pipelines) and a web interface for interactive runs. Test results are stored locally and displayed in the web UI with analytics.&lt;/td&gt;
&lt;td&gt;Executes tests through Copilot Studio’s Direct Line API as well, orchestrated by Power Automate flows under the hood. Typically run on-demand from the app’s interface. There is integration for pipelines via Power Platform build tools, though this is more complex to set up. Results are stored in Dataverse and can be viewed via in-app dashboards or exported.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evaluation Methodology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;AI-driven semantic evaluation&lt;/strong&gt;: uses a GPT-based &lt;strong&gt;AI Judge&lt;/strong&gt; to score responses on multiple quality dimensions (task success, intent match, factual correctness, etc.). This allows flexible, semantic comparisons rather than simple exact matches. Configurable pass thresholds provide fine-grained control. Also supports explicit pass/fail rules (acceptance criteria) where needed.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Rule-based and some AI&lt;/strong&gt;: supports exact or partial &lt;strong&gt;response matching&lt;/strong&gt;, checking for expected keywords or presence of attachments, etc. For &lt;strong&gt;generative answers&lt;/strong&gt;, the kit uses &lt;strong&gt;AI Builder&lt;/strong&gt; to compare the agent’s answer with a reference answer for similarity. It also retrieves telemetry from &lt;strong&gt;Application Insights&lt;/strong&gt; to help explain failures. Plan validation tests examine the agent’s action plan against expected tools (for orchestration scenarios).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Modularity &amp;amp; Extensibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Designed to be &lt;strong&gt;modular and extensible&lt;/strong&gt;. The core can be extended to new agent types (plans to support other AI agent frameworks are in progress). The evaluation component (AI Judge) can be pointed to different models or adapted with different prompts. Being source-available, organizations can modify or extend MATE (e.g., add custom evaluation metrics, integrate with other data sources) to fit their needs.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Focused scope, limited extensibility.&lt;/strong&gt; The Copilot Studio Kit is specific to Copilot Studio agents and deeply tied to the Power Platform environment structure. It’s not architected to test arbitrary other agents. Customizing it generally means modifying the Power App or creating new Dataverse fields/flows, which requires Power Platform expertise.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data &amp;amp; Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stores test artifacts and results in a local database  within the application. No cloud infrastructure needed to get started; data stays within the user’s environment. For scaling up, the application could be hosted on a server or in Kubernetes (containerization support is under development). Because it’s self-hosted, data sovereignty and privacy can be managed internally.&lt;/td&gt;
&lt;td&gt;Relies on &lt;strong&gt;Dataverse&lt;/strong&gt; for storing tests and results, and optionally uses other services (App Insights, SharePoint, etc.) for logs and knowledge management. This provides seamless integration if you’re already within the Microsoft ecosystem, but it requires that all data be in the Power Platform cloud.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Table:&lt;/strong&gt; Comparison of MATE vs. Copilot Studio Kit across key aspects of testing functionality and usage.&lt;/p&gt;

&lt;p&gt;As seen above, &lt;strong&gt;MATE and Copilot Studio Kit share the same goal – improving agent quality through automated testing – but they differ in implementation approach&lt;/strong&gt;. MATE is more developer-oriented, offering flexibility, openness, and extensibility, whereas the Copilot Studio Kit is maker-friendly, integrated in the Power Platform with a ready-to-use interface but comes with platform constraints.&lt;/p&gt;

&lt;p&gt;From a Microsoft perspective, the &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; was a bridge solution to empower agent creators with testing capabilities before deeper platform features were available. Now, with &lt;strong&gt;Agent Evaluation built directly into Copilot Studio&lt;/strong&gt; (currently in preview), some capabilities of the kit are being absorbed into the product itself – for instance, AI-generated test queries and built-in execution of test sets. Still, the Kit provides additional tooling (like dashboards, inventory, governance features) that are useful in complex environments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;see articles:&lt;br&gt;&lt;br&gt;
 &lt;a href="https://preview.holgerimbery.blog/ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit" rel="noopener noreferrer"&gt;Ship Copilot Studio Agents with Confidence: Master Automated Testing with the Copilot Studio Kit&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://preview.holgerimbery.blog/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview" rel="noopener noreferrer"&gt;Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;MATE, on the other hand, is an &lt;strong&gt;independent effort to provide a robust testing harness&lt;/strong&gt; that can evolve fast and go beyond what the closed-source product features offer. It is not limited by Power Platform’s boundaries (for example, one could imagine integrating MATE with other LLM evaluation criteria, or hooking it up to monitor backend APIs invoked by the agent). Additionally, MATE’s modular nature means it could incorporate &lt;strong&gt;other agent types&lt;/strong&gt; into the same testing dashboard. For example, if you have a fleet of different AI bots – some built in Copilot Studio, some using Azure OpenAI Orchestration, some third-party – MATE could theoretically be extended to test them all in one place, whereas the Copilot Studio Kit is only for Copilot Studio agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use which?&lt;/strong&gt; If you are a &lt;strong&gt;Power Platform maker or IT admin&lt;/strong&gt; who wants a straightforward, supported way to test Copilot Studio agents and you’re already comfortable with Power Apps and Dataverse, the &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; is a solid choice. It integrates nicely with the environment (and your data, logs, etc.) and doesn’t require coding to use. However, you’ll need the necessary licenses and some patience to configure the environment, and you won’t be able to easily customize how tests are evaluated beyond what Microsoft provides.&lt;/p&gt;

&lt;p&gt;If you are a &lt;strong&gt;developer or dev team&lt;/strong&gt; looking for a more flexible, code-driven approach – especially if you want to integrate agent testing into a DevOps pipeline or extend testing to specialized scenarios – &lt;strong&gt;MATE&lt;/strong&gt; is very appealing. It does require .NET and some setup, but it gives you &lt;strong&gt;full control&lt;/strong&gt;. You can run it locally for rapid iteration, include it in automated builds, and tweak it to your needs. There’s also no dependency on having a Power Platform environment or any particular license. You do need access to an &lt;strong&gt;Azure OpenAI service&lt;/strong&gt; (or you could swap in another LLM API if desired) to leverage the AI judge, but that is relatively straightforward for most enterprise scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap and Future Enhancements in MATE
&lt;/h2&gt;

&lt;p&gt;MATE is an evolving project, and there are several notable enhancements on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Support for Additional Agent Types:&lt;/strong&gt; As of now, MATE &lt;strong&gt;supports testing Microsoft Copilot Studio agents exclusively&lt;/strong&gt;, because it specifically uses Copilot’s Direct Line API and related assumptions (like the concept of “topics” and Dataverse knowledge base) in its current version. However, the architecture is being extended to accommodate other agent platforms. Future versions are expected to introduce modules for &lt;strong&gt;other agent types&lt;/strong&gt; – for example, the ability to test &lt;strong&gt;Microsoft Agent Framework agents&lt;/strong&gt;, or even agents built with entirely different frameworks. This will broaden MATE’s applicability across various “agentic AI” solutions used within Microsoft and beyond, making it a one-stop testing hub for heterogeneous AI systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated Red Teaming:&lt;/strong&gt; In addition to “blue team” style functional testing (checking that the agent does what it’s supposed to), MATE aims to incorporate &lt;strong&gt;“red teaming”&lt;/strong&gt; capabilities. Red teaming in AI refers to attacking or stress-testing the agent with malicious or unexpected inputs to probe its defenses and safety measures. This can include testing the agent’s response to prompt injections, inappropriate content requests, or attempts to trick the agent into breaking rules. The goal is to ensure the agent is robust against misuse or adversarial users. The MATE codebase already contains the &lt;strong&gt;foundation for a Red Teaming module&lt;/strong&gt;, but this is currently just a skeleton (non-functional in the current release). Once completed, this feature will allow users to run a suite of adversarial tests (perhaps using predefined malicious prompts or common attack patterns) against their agents and get a report on vulnerabilities or policy compliance issues. This is a critical part of AI system testing, especially for enterprise scenarios, and its inclusion will differentiate MATE further by offering a more &lt;strong&gt;comprehensive safety evaluation&lt;/strong&gt; than what is currently possible with the Copilot Studio Kit or built-in Agent Evaluation (which, so far, focus on correctness and performance rather than adversarial robustness).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud and Scalable Deployment:&lt;/strong&gt; Presently, MATE runs as a local docker stack. Looking forward, the project plans to simplify &lt;strong&gt;deployment on Azure&lt;/strong&gt;, likely via containerization. Kubernetes support is on the roadmap, meaning you might be able to deploy MATE as a set of containers (web app, background worker, etc.) in an AKS (Azure Kubernetes Service) or similar environment. This will enable &lt;strong&gt;team-wide usage at scale&lt;/strong&gt; – multiple testers or developers could share a MATE instance, run tests concurrently, and store results in a central location, much like a web application service. Cloud deployment will also facilitate integration with other services (for example, connecting to Azure DevOps for automatic test triggers, or scaling out the AI Judge component). &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI/UX and Usability Improvements:&lt;/strong&gt; As an source-available project, MATE will continue to refine its user interface and ease of use. Features on the horizon could include richer test editing experiences (perhaps a visual conversation flow editor), more analytics dashboards (trend of agent performance over time, flakiness of certain tests, etc.), and integration with agent design tools (for example, pulling in agent topics or suggesting tests based on recent real user conversations – aligning with how built-in Agent Evaluation reuses Test Pane interactions).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Testing AI agents is no longer optional – it’s a necessity for any organization that wants to &lt;strong&gt;confidently deploy AI solutions&lt;/strong&gt;. Agent Development Solutions empower the creation of sophisticated AI Agents, but ensuring these agents function correctly, safely, and efficiently requires going beyond manual testing or one-off trials. This is where testing frameworks like Copilot Studio Kit and MATE come into play.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MATE (Multi-Agent Test Environment)&lt;/strong&gt; represents a next-generation approach to agent testing. It addresses the limitations of earlier tools by adopting a &lt;strong&gt;fully modular, code-first architecture&lt;/strong&gt; that can keep pace with the rapidly changing AI landscape. By using MATE, developers and testers gain the ability to thoroughly &lt;strong&gt;automate conversations with their agents, evaluate responses with the help of AI, generate tests from existing knowledge, and integrate all this into continuous delivery pipelines&lt;/strong&gt;. The outcome is a higher degree of assurance that your Copilot Studio agent will perform as expected when it’s in production – responding correctly to user queries, using the right tools, and staying within the guardrails.&lt;/p&gt;

&lt;p&gt;In comparison to the Power Platform-based Copilot Studio Kit, MATE offers more &lt;strong&gt;flexibility, extensibility, and independence&lt;/strong&gt;. You won’t be constrained by specific licensing or environment setups, and you can tailor the tool to your needs. On the other hand, it’s a more technical solution that may require developer effort to set up and maintain, whereas the Copilot Studio Kit is more turn-key if you’re already within Microsoft’s ecosystem. It’s encouraging to see both approaches available, as they cater to different audiences.&lt;/p&gt;

&lt;p&gt;Ultimately, MATE’s importance goes beyond just testing Copilot Studio agents. It signifies an evolving philosophy in the AI agent world: that &lt;strong&gt;testing and evaluation should be first-class citizens in the development lifecycle of AI systems&lt;/strong&gt;, just as they are in traditional software development. With AI models and agents becoming increasingly central to applications, tools like MATE help ensure we can trust these systems through systematic validation. MATE’s deep integration of AI for testing (using an AI to test another AI) is an innovative approach that can significantly enhance the rigor of evaluations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In summary, MATE enables teams to ship Copilot Studio agents (and, in the future, other AI agents) with greater confidence&lt;/strong&gt;. It provides the means to catch issues early, improve agents iteratively based on test feedback, and guard against regressions as agents evolve. By combining the power of automation with the wisdom of AI judging, MATE exemplifies a “test smarter” strategy for the era of generative AI – ensuring that our intelligent agents are not only smart, but also reliable, safe, and effective when they go to work for us.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>testing</category>
    </item>
    <item>
      <title>Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 14 Feb 2026 07:09:12 +0000</pubDate>
      <link>https://forem.com/holgerimbery/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview-l4a</link>
      <guid>https://forem.com/holgerimbery/testing-copilot-studio-agents-copilot-studio-kit-vs-agent-evaluation-preview-l4a</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Copilot Studio agents deserve testing at scale—but which tool fits your team? &lt;strong&gt;Agent Evaluation&lt;/strong&gt; brings lightweight, AI-powered checks into the Studio authoring UI for rapid iteration, while &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; delivers enterprise-grade multi-turn validation, plan verification, and telemetry for production gates. This guide cuts through the hype and shows you exactly when to reach for each tool—and why the best teams use both together.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why Read This&lt;/strong&gt; Choosing the right testing tool can make or break your Copilot Studio agent quality strategy. &lt;strong&gt;This article goes deeper&lt;/strong&gt; into how Copilot Studio Kit and Agent Evaluation complement—and differ in—their testing approaches. If you’ve read my earlier piece on &lt;a href="//./ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit"&gt;&lt;em&gt;Ship Copilot Studio Agents with Confidence: Master Automated Testing with the Copilot Studio Kit&lt;/em&gt;&lt;/a&gt; (2026-01-31), which focused on Kit capabilities alone, this comparison will help you decide &lt;em&gt;when&lt;/em&gt; to use Kit, &lt;em&gt;when&lt;/em&gt; to use Agent Evaluation, and &lt;em&gt;when&lt;/em&gt; to use both together. Perfect for teams scaling from dev iteration to production release gates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation vs. Verification in AI Agent Testing
&lt;/h2&gt;

&lt;p&gt;Before diving into the two tools, it’s worth clarifying a critical distinction in quality assurance: &lt;strong&gt;verification&lt;/strong&gt; and &lt;strong&gt;validation&lt;/strong&gt; address different concerns when testing conversational agents. &lt;strong&gt;Verification&lt;/strong&gt; answers the question “Did we build it right?” It focuses on ensuring your agent behaves as designed, follows the intended logic flows, and produces outputs that meet specifications. In practical terms, verification tests check that your agent’s instructions are correctly implemented, that topics route to the right handlers, and that expected responses are generated for known inputs. &lt;strong&gt;Validation&lt;/strong&gt; , by contrast, asks “Did we build the right thing?”—it assesses whether your agent actually meets user needs, provides accurate and helpful information, and performs well in real-world scenarios. Validation is inherently broader and more subjective; it often involves human judgment, user feedback, and quality metrics like relevance, groundedness, and user satisfaction. The Copilot Studio Kit excels at verification through structured test cases and telemetry analysis, while Agent Evaluation bridges both worlds by enabling quality assessments that combine deterministic checks (verification) with AI-based graders that approximate validation concerns. Most production-grade agent programs require both verification to catch implementation bugs and ensure consistency, and validation to ensure the agent truly solves the problem it was designed to solve.&lt;/p&gt;

&lt;p&gt;Microsoft provides two complementary approaches to test Copilot Studio agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; (Power CAT): A Dataverse-backed, installable toolkit that runs batch tests via the &lt;strong&gt;Direct Line API&lt;/strong&gt; , enriches results with &lt;strong&gt;Application Insights&lt;/strong&gt; / &lt;strong&gt;Dataverse&lt;/strong&gt; , and supports &lt;strong&gt;multi‑turn&lt;/strong&gt; and &lt;strong&gt;plan validation&lt;/strong&gt; scenarios, plus Excel import/export and dashboards. (&lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-overview" rel="noopener noreferrer"&gt;Kit overview&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-test-capabilities" rel="noopener noreferrer"&gt;Test capabilities&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-run-tests" rel="noopener noreferrer"&gt;Run tests&lt;/a&gt;, &lt;a href="https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent Evaluation (preview)&lt;/strong&gt;: A &lt;strong&gt;built‑in&lt;/strong&gt; Copilot Studio experience for creating &lt;strong&gt;evaluation sets&lt;/strong&gt; , generating prompts with AI, selecting &lt;strong&gt;test methods&lt;/strong&gt; (exact/partial, similarity/intent, AI‑judged quality metrics), and running structured checks directly from the authoring UI. (&lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/build-smarter-test-smarter-agent-evaluation-in-microsoft-copilot-studio/" rel="noopener noreferrer"&gt;Microsoft Copilot Blog announcement&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;Preview documentation&lt;/a&gt;, &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/whats-new-in-copilot-studio-october-2025/" rel="noopener noreferrer"&gt;What’s new&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Used together, &lt;strong&gt;Agent Evaluation&lt;/strong&gt; accelerates inner‑loop quality checks during design, while &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; anchors enterprise‑grade testing, telemetry, and CI/CD gates prior to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Each Option Provides
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Copilot Studio Kit (Power CAT)
&lt;/h3&gt;

&lt;p&gt;The Copilot Studio Kit is a comprehensive, installable testing framework built on Microsoft’s Power CAT (Power Customer Advisory Team) architecture. It extends Copilot Studio’s native capabilities by providing infrastructure for large-scale, automated agent validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The Kit operates through a structured workflow: you define test cases within &lt;strong&gt;Agents&lt;/strong&gt; , organize them into &lt;strong&gt;Test Sets&lt;/strong&gt; , and execute these sets as &lt;strong&gt;Test Runs&lt;/strong&gt; on demand or via scheduled processes. Test execution occurs through the &lt;strong&gt;Direct Line API&lt;/strong&gt; , which simulates authentic user interactions with your published agent. Results are enriched with telemetry from &lt;strong&gt;Application Insights&lt;/strong&gt; and &lt;strong&gt;Dataverse&lt;/strong&gt; , capturing detailed diagnostic information, including which topics were invoked, confidence scores for intent matching, and end-to-end latency measurements. This multi-layer instrumentation enables root-cause analysis of test failures and performance anomalies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test types:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The Kit supports multiple validation approaches to accommodate different agent behaviors.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Response Match&lt;/em&gt; tests verify that agent outputs match expected text or patterns.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Attachment/Adaptive Card&lt;/em&gt; tests validate that rich response elements (cards, files, structured outputs) are generated correctly.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Topic Match&lt;/em&gt; tests confirm that conversations trigger the intended dialog flows.&lt;br&gt;&lt;br&gt;
&lt;em&gt;Generative Answers&lt;/em&gt; tests assess responses from generative models embedded in agents.&lt;/p&gt;

&lt;p&gt;Beyond single-turn exchanges, the Kit handles &lt;strong&gt;Multi‑turn&lt;/strong&gt; conversations - sequences of user inputs and agent responses within a single session - and &lt;strong&gt;Plan validation&lt;/strong&gt; , which verifies that generative orchestration components select and invoke the correct tools, actions, and connected agents in the proper sequence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run management:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Once you execute a Test Run, the Kit provides tools for iterative validation work. You can duplicate previous runs to establish baselines, re-run enrichment steps (regenerating Application Insights correlations and Dataverse transcripts) without re-executing the entire test, and analyze aggregate pass/fail statistics or drill down into individual case results to identify failure patterns and trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Artifacts and maintenance:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The Kit’s source code, configuration schemas, and runbooks are maintained in the &lt;a href="https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit" rel="noopener noreferrer"&gt;Power CAT GitHub repository&lt;/a&gt;, where you can review implementation details, file issues, and contribute improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Evaluation (Preview) in Copilot Studio
&lt;/h3&gt;

&lt;p&gt;Agent Evaluation is a feature integrated directly into the Copilot Studio authoring environment, designed to facilitate the systematic evaluation of conversational agents during development. Unlike external toolkits or custom test harnesses, Agent Evaluation operates natively within the Studio UI, allowing authors to create, manage, and execute test cases without leaving the design context. This approach is intended to reduce the barrier to agent testing and encourage more frequent, iterative validation within the authoring workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; Agent Evaluation introduces the concept of &lt;strong&gt;evaluation sets&lt;/strong&gt; , which are collections of test prompts and expected responses. Authors can create these sets manually, import them from CSV files, or generate them automatically using AI based on the agent’s metadata or knowledge sources. This flexibility supports both targeted test case authoring and rapid bootstrapping of test coverage. Additionally, logs from the &lt;strong&gt;Test Pane&lt;/strong&gt; - the interactive testing interface within Copilot Studio - can be reused to create evaluation sets, streamlining the conversion of ad hoc tests into structured checks.&lt;/p&gt;

&lt;p&gt;For each test case, authors define the user prompt, the expected answer, and the &lt;strong&gt;success criteria&lt;/strong&gt;. The evaluation process can then be run directly within Studio, producing aggregate and per-case results that are immediately accessible for inspection and troubleshooting. This tight integration with the authoring environment is intended to support a rapid feedback loop, enabling authors to identify and address issues early in the development cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test methods:&lt;/strong&gt; Agent Evaluation supports several methods for assessing agent responses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lexical matching&lt;/strong&gt; : This includes exact and partial string matching between the agent’s response and the expected answer. Lexical methods are useful for scenarios where deterministic outputs are required, such as FAQ responses or compliance-driven answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity/Intent matching&lt;/strong&gt; : These methods use semantic similarity algorithms or intent classification to determine whether the agent’s response is sufficiently close in meaning to the expected answer, even if the wording differs. This is particularly relevant for conversational agents that employ generative models or paraphrasing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-judged quality metrics&lt;/strong&gt; : Agent Evaluation can apply AI-based graders to assess qualitative aspects of responses, such as relevance, completeness, and groundedness. These metrics provide a more nuanced view of agent performance, especially in open-ended or knowledge-intensive scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt; Please note that Agent Evaluation is currently in public preview. As such, its feature set, supported test methods, and grading criteria may evolve based on user feedback and ongoing development. The tool is not intended to replace Responsible AI (RAI) or safety reviews, which remain essential for production deployments of conversational agents. Instead, Agent Evaluation is best viewed as a complementary capability for improving agent quality during the iterative design and testing phases. (&lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;Preview docs&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Side‑by‑Side Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Copilot Studio Kit&lt;/th&gt;
&lt;th&gt;Agent Evaluation (preview)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workspace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Separate model‑driven app (Dataverse) you install&lt;/td&gt;
&lt;td&gt;Built directly into Copilot Studio authoring UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test data authoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dataverse entities; &lt;strong&gt;Excel&lt;/strong&gt; import/export&lt;/td&gt;
&lt;td&gt;Create/import CSV; reuse Test Pane; &lt;strong&gt;AI‑generate&lt;/strong&gt; prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Via &lt;strong&gt;Direct Line API&lt;/strong&gt; with cloud flows/enrichment&lt;/td&gt;
&lt;td&gt;Run directly in Studio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test methods&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Response/Attachment/Topic match; &lt;strong&gt;Generative Answers&lt;/strong&gt; ; &lt;strong&gt;Multi‑turn&lt;/strong&gt; ; &lt;strong&gt;Plan validation&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Exact/Partial match; &lt;strong&gt;Similarity/Intent&lt;/strong&gt; ; &lt;strong&gt;AI quality&lt;/strong&gt; graders&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enrichment with &lt;strong&gt;App Insights&lt;/strong&gt; + &lt;strong&gt;Dataverse&lt;/strong&gt; (topics, intent scores, latencies)&lt;/td&gt;
&lt;td&gt;Aggregate pass/fail and scores with drill‑downs in Studio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong for release gates, duplication, and re‑runs&lt;/td&gt;
&lt;td&gt;Strong for inner loop; roadmap for broader scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adjacent features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Part of a broader kit (Compliance Hub, KPIs, SharePoint sync)&lt;/td&gt;
&lt;td&gt;Focused on evaluation inside Studio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maturity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generally available toolkit (maintained by Power CAT)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Public preview&lt;/strong&gt; (subject to change)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When to Use What
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prefer &lt;strong&gt;Agent Evaluation (preview)&lt;/strong&gt; when
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;You want &lt;strong&gt;fast, in‑Studio&lt;/strong&gt; feedback while iterating on instructions, topics, or knowledge.&lt;/li&gt;
&lt;li&gt;Your scenarios are &lt;strong&gt;single‑turn&lt;/strong&gt; (FAQ‑style) and benefit from &lt;strong&gt;lexical/semantic/quality&lt;/strong&gt; scoring.&lt;/li&gt;
&lt;li&gt;You need &lt;strong&gt;AI‑generated&lt;/strong&gt; test prompts to bootstrap coverage quickly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Prefer &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; when
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;You must validate &lt;strong&gt;multi‑turn, end‑to‑end&lt;/strong&gt; flows in one conversation context.&lt;/li&gt;
&lt;li&gt;You need to verify &lt;strong&gt;generative orchestration plans&lt;/strong&gt; (correct tools/actions/connected agents).&lt;/li&gt;
&lt;li&gt;You require &lt;strong&gt;deep telemetry&lt;/strong&gt; (topics, intent scores, latencies) and &lt;strong&gt;App Insights&lt;/strong&gt; correlation.&lt;/li&gt;
&lt;li&gt;You manage &lt;strong&gt;release gates&lt;/strong&gt; with repeatable Test Runs and re‑runs of enrichment steps.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Practical Scenarios
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inner‑loop tuning for HR FAQ&lt;/strong&gt; → Use &lt;strong&gt;Agent Evaluation&lt;/strong&gt; with &lt;strong&gt;Similarity&lt;/strong&gt; and &lt;strong&gt;AI quality&lt;/strong&gt; graders; generate 25 prompts from metadata, add a few gold answers, iterate on failures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre‑release regression for tool‑using policy agent&lt;/strong&gt; → Use &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt; : &lt;strong&gt;Multi‑turn&lt;/strong&gt; + &lt;strong&gt;Plan validation&lt;/strong&gt; ; App Insights + Dataverse enrichment to diagnose routing/tool‑selection issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic drift watch&lt;/strong&gt; → Weekly &lt;strong&gt;Agent Evaluation&lt;/strong&gt; runs for &lt;strong&gt;relevance/groundedness&lt;/strong&gt; ; investigate dips by reviewing knowledge changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI pipeline across multiple agents&lt;/strong&gt; → Nightly &lt;strong&gt;Kit&lt;/strong&gt; Test Runs with Excel‑managed test sets; export failures for triage; duplicate runs after fixes; investigate latencies via App Insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Strengths &amp;amp; Limitations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Copilot Studio Kit - strengths&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Multi‑turn, plan validation, deep observability (App Insights + Dataverse), and release‑friendly run management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot Studio Kit - limitations&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Separate installation/governance; relies on Direct Line and cloud flows; AI Builder needed for AI‑based answer analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Evaluation - strengths&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Native UI, AI‑generated test sets, flexible lexical/semantic/AI graders; quick to adopt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Evaluation - limitations&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Public preview; multi‑turn on the roadmap; does not replace RAI/safety reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Guide (Quick Reference)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need fast Studio‑native checks while editing?&lt;/strong&gt; → &lt;strong&gt;Agent Evaluation (preview)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need enterprise regression, multi‑turn, plan validation, telemetry?&lt;/strong&gt; → &lt;strong&gt;Copilot Studio Kit&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mature teams:&lt;/strong&gt; Use both - Evaluation for inner loop; Kit for outer loop, and promotion gates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Helpful How‑Tos &amp;amp; Deep‑dives (Clickable Links)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Copilot Studio Kit: &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-overview" rel="noopener noreferrer"&gt;Overview&lt;/a&gt; · &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-test-capabilities" rel="noopener noreferrer"&gt;Test capabilities&lt;/a&gt; · &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-run-tests" rel="noopener noreferrer"&gt;Run tests&lt;/a&gt; · &lt;a href="https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; · Community guides: &lt;a href="https://forwardforever.com/test-your-custom-copilot-with-power-cat-copilot-studio-kit/" rel="noopener noreferrer"&gt;Forward Forever&lt;/a&gt;, &lt;a href="https://www.matthewdevaney.com/configure-ms-auth-for-test-automation-in-copilot-studio-kit/" rel="noopener noreferrer"&gt;Matthew Devaney (MS Auth setup)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agent Evaluation (preview): &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/build-smarter-test-smarter-agent-evaluation-in-microsoft-copilot-studio/" rel="noopener noreferrer"&gt;Blog announcement&lt;/a&gt; · &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;Preview docs&lt;/a&gt; · &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/whats-new-in-copilot-studio-october-2025/" rel="noopener noreferrer"&gt;What’s new&lt;/a&gt; · Community guides: &lt;a href="https://sharepoint247.com/ai/how-to-evaluate-your-copilot-studio-agent/" rel="noopener noreferrer"&gt;How‑to&lt;/a&gt;, &lt;a href="https://dev.to/balagmadhu/agent-evaluation-in-action-tips-pitfalls-and-best-practices-5cje"&gt;Tips &amp;amp; pitfalls&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Testing conversational agents in Microsoft Copilot Studio requires a nuanced approach, as development team needs can vary significantly by project stage and agent complexity. The Copilot Studio Kit and Agent Evaluation (preview) are two distinct tools that address different aspects of the testing process.&lt;/p&gt;

&lt;p&gt;The Copilot Studio Kit is designed for scenarios requiring comprehensive, repeatable, and instrumented testing. It is particularly well-suited for validating multi-turn conversations, verifying orchestration plans, and collecting detailed telemetry for analysis. Because it operates as a separate, installable solution and integrates with Dataverse and Application Insights, it is most appropriate for teams that need to enforce quality gates before production releases or require historical tracking of test results. The Kit’s support for Excel-based test management and its extensibility through source code access make it a practical choice for organizations with established DevOps practices or those managing multiple agents at scale.&lt;/p&gt;

&lt;p&gt;Agent Evaluation (preview), on the other hand, is integrated directly into the Copilot Studio authoring environment. Its primary focus is to provide rapid feedback during agent development and tuning, especially for single-turn or FAQ-style interactions. The ability to generate test prompts with AI and apply a range of grading methods (from exact match to semantic similarity and quality metrics) makes it accessible to authors who may not have a background in test automation. However, as a preview feature, its capabilities and scope are still evolving, and it does not currently address multi-turn or orchestration scenarios in depth.&lt;/p&gt;

&lt;p&gt;In practice, many teams will find value in using both tools: Agent Evaluation for quick, iterative checks during agent design, and Copilot Studio Kit for more thorough validation prior to deployment. Understanding each tool’s intended use cases, technical requirements, and current limitations will help teams select the most appropriate approach for their workflow. As the Copilot Studio platform continues to evolve, the boundaries between these tools will likely shift, but the need for systematic, context-aware testing will remain constant to deliver reliable conversational agents.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>agentevaluation</category>
      <category>testing</category>
    </item>
    <item>
      <title>Microsoft Frontier Agents: A Deep Technical Overview</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 07 Feb 2026 10:57:11 +0000</pubDate>
      <link>https://forem.com/holgerimbery/microsoft-frontier-agents-a-deep-technical-overview-5h70</link>
      <guid>https://forem.com/holgerimbery/microsoft-frontier-agents-a-deep-technical-overview-5h70</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft Frontier Agents represent a fundamental shift in enterprise automation, moving from rigid rule-based systems to reasoning-capable AI agents that can gather information across organizational boundaries, synthesize complex data, and execute multi-step workflows autonomously. By operating within your existing Microsoft 365 infrastructure and security frameworks, Frontier Agents enable organizations to reimagine how work gets accomplished without requiring separate governance infrastructure or bypassing compliance controls.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Read on&lt;/strong&gt; , if you are an enterprise architect, IT administrator, or business leader evaluating how to incorporate AI-based reasoning and automation into your organization, this article provides the technical and strategic foundation you need to understand what Frontier Agents actually do, how they differ from traditional automation approaches, and what you should consider before implementing them. You will learn the specific capabilities of Frontier Agents, the governance mechanisms that control them, and the workflow redesign required to realize genuine value - not just the technology deployment itself. Whether you are preparing to participate in the Frontier program or evaluating whether agent-based automation makes sense for your organization, this deep technical overview will help you make more informed decisions about where and how to invest in this emerging technology.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is the Frontier Program?
&lt;/h2&gt;

&lt;p&gt;The Frontier program provides organizations with a structured mechanism to participate in early evaluation of emerging artificial intelligence capabilities within their existing Microsoft 365 infrastructure. Rather than deploying untested features across production systems, Frontier allows selected teams and administrators within an organization to voluntarily work with experimental AI features in a contained manner. These experimental features include various types of AI-powered agent systems - software entities capable of autonomous or semi-autonomous reasoning and action - as well as extensions to the Copilot assistant interface and specialized agent modes tailored for specific applications. Importantly, all of these experimental capabilities operate within the same security and data boundaries as the organization’s regular production Microsoft 365 environment. This design ensures that experimental features automatically inherit the organization’s established security policies, identity and access management through Entra ID, data location requirements, and compliance obligations. Organizations do not need to build separate infrastructure or bypass their existing governance frameworks to participate in these early evaluations.&lt;/p&gt;

&lt;p&gt;Key characteristics of the Frontier program include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Tenant Integration:&lt;/strong&gt; Frontier agents operate seamlessly within your existing Microsoft 365 tenant, leveraging established security postures, Entra ID identity controls, data residency requirements, and compliance frameworks without requiring separate infrastructure or governance bypass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controlled Experimental Access:&lt;/strong&gt; Organizations can grant selective access to preview capabilities through granular admin controls, ensuring that experimental features remain isolated to designated teams while maintaining full visibility and auditability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback-Driven Iteration:&lt;/strong&gt; Microsoft actively incorporates operational data and user insights to refine, enhance, or retire Frontier capabilities, creating a responsive feedback loop that shapes agent behavior and feature maturity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Licensing and Activation Requirements:&lt;/strong&gt; Access to Frontier agents typically requires valid Microsoft 365 and Microsoft 365 Copilot licenses, with explicit administrator enablement through the Microsoft 365 Admin Center to control program participation at the organizational level.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frontier Agents in Microsoft 365 Copilot
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Are Frontier Agents?
&lt;/h3&gt;

&lt;p&gt;Frontier Agents are software systems that are built directly into Microsoft 365 applications and the Copilot experience. These systems work by gathering information from across your organization, analyzing it step by step using logical reasoning, and then executing tasks that require multiple sequential steps.&lt;/p&gt;

&lt;p&gt;When you interact with a Frontier Agent, here is what actually happens beneath the surface: The agent first receives some information or a request from you. It then searches and retrieves data from various sources across your organization, including your emails, stored documents, calendar entries, meeting transcripts, and other business systems. Rather than simply retrieving information through a traditional search, the agent analyzes and reasons with the information it has gathered. It considers how different data elements relate to one another, understands the context in which that data exists, and then breaks down your original request into smaller, sequential steps that it can execute.&lt;/p&gt;

&lt;p&gt;Two concrete examples of Frontier Agents currently available are the Researcher and Analyst agents. The Researcher agent, for instance, can accept a specific question from you and then search through the documents, emails, and other sources available within your organization to locate relevant information. After gathering this information, it synthesizes the pieces to provide a comprehensive answer to your question. The Analyst agent operates on similar principles but is designed to examine data from multiple sources and identify patterns, trends, or themes that emerge across that data, rather than being built to answer a specific user question.&lt;/p&gt;

&lt;p&gt;An important distinction to note is that these agents are not limited to a single predefined action. Rather, they can take a single complex request from a user - such as “prepare a written summary of all customer feedback that arrived via email during the last quarter” - and break it down into multiple sequential steps. The agent might begin by identifying which emails and documents actually contain customer feedback. Following that, it might classify the feedback into categories by topic or theme. After categorization, it would extract the most significant points from each category. Finally, it would compile all of this into a single coherent summary document. Instead of a human staff member manually executing each step in sequence, the agent can reason through the problem and execute them in a logical order.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-World Scenarios
&lt;/h3&gt;

&lt;p&gt;To better understand how these agents function in practical situations, consider the following concrete examples of work that they can perform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analyzing customer feedback across multiple communication channels:&lt;/strong&gt; Organizations receive customer feedback through various channels - email messages, support tickets, conversation logs, and direct messages. Rather than having a team member manually review months of accumulated customer communications to identify recurring problems, an agent can systematically examine all these disparate sources of customer input across your organization, identify which specific issues or concerns appear multiple times, and surface those recurring themes for leadership. This process would otherwise require significant manual effort to collate information from systems that were never designed to be analyzed together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Constructing comprehensive business review documents by synthesizing information from heterogeneous sources:&lt;/strong&gt; Businesses often need to prepare periodic review documents that combine information from multiple different places - strategic documents that were written and stored in your document systems, records of what was discussed and decided during meetings throughout the review period, and email conversations where important decisions and discussions occurred. An agent can gather these distinct types of information across different systems and synthesize them into a coherent narrative that presents a unified view of what occurred during the period in question.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Locating decision points that remain unresolved in organizational communications:&lt;/strong&gt; In any organization, communications - whether through email, chat systems, or meeting notes - often contain discussion about decisions that need to be made, but the outcome of those discussions remains unclear or incomplete. An agent can review your organization’s communications, identify discussions where no clear resolution was documented, and compile a summary that clarifies which decisions remain pending and who should be involved in resolving them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Evaluating patterns in historical meeting content and suggesting corresponding workflow adjustments:&lt;/strong&gt; Organizations hold many meetings over time, and the topics, themes, and concerns that emerge across multiple meetings often reveal something meaningful about where real work challenges lie. An agent can review the captured records from your organization’s past meetings, identify recurring themes and concerns across meetings and groups, and use that analysis to suggest where the organization might redirect its focus or restructure its processes to address the underlying issues that keep arising.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Microsoft Agent 365: The Control Plane Behind Frontier Agents
&lt;/h2&gt;

&lt;p&gt;Agent 365 is the administrative and governance layer that supports AI agents across an enterprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent Discovery and Inventory Management:&lt;/strong&gt; Agent 365 maintains a comprehensive record and catalog of all AI agents operating within an organization. This inventory function allows administrators and operators to understand which agents exist, where they are deployed, what purposes they serve, and their operational status at any given time. Rather than having agents scattered across the organization with no central visibility, this inventory system provides a unified view of the agent ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identity-Based Resource Access Controls:&lt;/strong&gt; Agent 365 works in conjunction with Entra ID (Azure’s identity management system) to establish which specific resources, data sources, and systems that individual agents are allowed to access. Instead of granting agents unlimited access across an organization’s systems, Agent 365 enforces granular permission boundaries, ensuring each agent can access only the data, applications, and services necessary to perform their assigned functions. This principle of least privilege prevents an agent from inadvertently or maliciously accessing sensitive data or systems outside its defined scope.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Standardized Integration Interfaces and Development Kits:&lt;/strong&gt; Agent 365 provides standardized software development kits (SDKs) and application programming interfaces (APIs) that establish consistent patterns for how new agents can be built and integrated with existing organizational systems. Rather than building each agent in isolation using different approaches, these standardized interfaces ensure that agents built by different teams or vendors can work together and communicate in predictable ways, following established patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relationship Mapping and System Visualization:&lt;/strong&gt; Agent 365 provides administrators and architects with visual representations that illustrate how different agents relate to one another, how they connect to and interact with people in the organization, what data sources they access or modify, and how they fit into broader business workflows. This visualization capability helps organizations understand the dependencies and interactions across their entire agent estate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrated Security Operations and Threat Detection:&lt;/strong&gt; Agent 365 incorporates monitoring and alerting functions that work in coordination with Microsoft Defender (security threat detection) and Microsoft Purview (data governance and compliance). This integration enables security teams to monitor agent behavior for signs of anomalous or suspicious activity and compliance teams to track whether agents are handling regulated data appropriately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SDK Components and Development Framework Support:&lt;/strong&gt; Agent 365 supplies developers with pre-built components, software libraries, and reference implementations that accelerate the process of creating new agents or extending existing ones. It also supports industry-standard protocols, such as the Model Context Protocol (MCP), which establishes common patterns for how components can exchange information and coordinate their behavior.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Role of Frontier Agents in Enterprise Transformation
&lt;/h2&gt;

&lt;p&gt;As organizations implement Frontier Agents, they are fundamentally reconsidering how work gets accomplished. Rather than viewing these systems as replacements for human workers, the most effective deployments treat agents as tools that handle specific, well-defined operational tasks while people focus on decision-making, judgment calls, and work that requires contextual understanding.&lt;/p&gt;

&lt;p&gt;This shift requires more than simply enabling a new technology. Organizations need to redesign their operational workflows to leverage agents’ capabilities. This means identifying which portions of existing processes are primarily repetitive information gathering, data synthesis, or sequential task execution - the kinds of work that agents handle effectively - and then separating those portions from the work that genuinely requires human judgment, client interaction, or strategic thinking. Once this separation occurs, you can structure a workflow where the agent handles the mechanical portions while a person reviews and takes responsibility for the decisions that matter.&lt;/p&gt;

&lt;p&gt;One practical implication worth considering: when an agent executes multiple steps in sequence to complete a task, there are often opportunities to build in checkpoints that allow a human to review the agent’s work before it moves to the next stage. This prevents compounding errors and allows people to correct the agent’s reasoning when it goes astray. Rather than treating agent output as completely reliable or completely unreliable, organizations learn to understand where agents tend to make mistakes and design their approval workflows accordingly.&lt;/p&gt;

&lt;p&gt;Another important consideration is that moving work to agents does not necessarily mean work disappears from the organization. Instead, the work’s character changes. Where someone previously spent time reading through emails to find relevant information, they now spend time reviewing the agent’s findings to verify that the needed information was captured. Where someone previously spent time copying data from one system into a report, they now spend time checking whether the agent’s synthesis of that data makes sense. This redistribution of work can free up capacity for higher-value activities, but only if organizations deliberately redirect that freed-up capacity toward them rather than simply eliminating positions and expecting the same output from fewer people.&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples of Frontier Experiences Available Today
&lt;/h2&gt;

&lt;p&gt;The Frontier ecosystem includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Experimental Copilot agents in the Agent Store.&lt;/li&gt;
&lt;li&gt;Agent Mode in Excel for the web.&lt;/li&gt;
&lt;li&gt;Agents for Dynamics 365 workloads.&lt;/li&gt;
&lt;li&gt;Experimental functionalities in Word, PowerPoint, and Copilot Chat.&lt;/li&gt;
&lt;li&gt;AI-enhanced Windows 365 Cloud PC capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Administrators Enable and Manage Frontier Agents
&lt;/h2&gt;

&lt;p&gt;Administrators must:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Confirm licensing prerequisites.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable Frontier in the Admin Center. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3u0zzfl7tkwoix67pea.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3u0zzfl7tkwoix67pea.png" alt="upgit_20260124_1769262371.png" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assign Frontier access to users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Agent 365 tools to manage lifecycle, visibility, and compliance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide SDK access for development.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why Frontier Agents Matter for Enterprise Architects
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Understanding the Shift from Rule-Based to Reasoning-Based Systems
&lt;/h3&gt;

&lt;p&gt;Traditionally, enterprise automation has relied on rule-based systems - tools that follow explicit, predetermined logic written by engineers. These systems work well when the tasks they handle are predictable and straightforward: if condition A is true, then execute action B. However, rule-based systems struggle when circumstances change unexpectedly or when situations don’t fit neatly into predefined categories. When an edge case arises that no one anticipated, the rule-based system either fails or performs the wrong action, requiring human intervention or system modification.&lt;/p&gt;

&lt;p&gt;Frontier Agents introduce a different approach. Rather than following rigid if-then-else logic, these agents can examine a situation, understand context and nuance, reason through the problem using information available to them, and adapt their approach as they encounter new information. This capability is critical for enterprise architects because it enables them to automate and streamline work that was previously too variable or context-dependent to automate. Enterprise architects should understand where agents can genuinely improve their organizations and where the investment in agent systems would not yield meaningful returns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical Reasoning Capabilities and Their Implications
&lt;/h3&gt;

&lt;p&gt;Agents can perform multi-step reasoning that mirrors how skilled humans approach complex problems. An agent can gather information from multiple sources, recognize patterns and relationships within that information, adjust its understanding as it discovers new details, and modify its approach based on what it learns. This capability matters for enterprise architects because it changes which business processes become candidates for automation. Instead of limiting automation to straightforward, fixed-sequence tasks, architects can now consider automating substantially more nuanced work. However, architects need to think carefully about which problems benefit from this capability and which would be simpler to solve through other means. Not every situation requires sophisticated reasoning - sometimes simpler solutions serve organizations better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unified Governance Without Separate Infrastructure
&lt;/h3&gt;

&lt;p&gt;One of the most significant practical benefits for enterprise architects is that Frontier Agents operate within existing organizational boundaries. Rather than requiring architects to design separate systems, approval processes, or compliance frameworks for agent-based work, agents inherit your organization’s existing security, identity, and compliance structure. This simplifies governance - architects do not need to build parallel systems. However, architects considered how to ensure that existing governance frameworks accommodate agent operations and that identity and access controls properly constrain what agents can do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid Human–Agent Workflows and Redesigned Work Processes
&lt;/h3&gt;

&lt;p&gt;Implementing agents require architects and process owners to rethink how work flows through the organization. It is not sufficient to add an agent to an existing workflow. Instead, organizations need to identify where human judgment and decision-making are essential, where human oversight is valuable but not essential, and where mechanical task execution can be delegated entirely to an agent. This often means redesigning workflows substantially. An enterprise architect should expect that workflow redesign will be more challenging and time-consuming than deploying agent technology alone and should budget time and expertise accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incremental Adoption and Learning-Based Deployment
&lt;/h3&gt;

&lt;p&gt;Rather than requiring organizations to commit to large-scale transformation immediately, agents can be introduced gradually. This allows organizations to learn how agents affect their work, identify which applications genuinely improve outcomes, develop troubleshooting skills, and build internal expertise. Enterprise architects should view this as an extended learning period in which the organization creates an understanding of what agents can do within their specific context, rather than a one-time implementation that commits the organization to a comprehensive agent-based transformation immediately. This approach carries risks - it means initial projects may not yield dramatic returns - but it also creates space for learning and course correction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Frontier program is a practical approach organizations can use to test experimental AI-based systems that perform multi-step reasoning and task execution within their existing Microsoft 365 environments. Rather than requiring organizations to establish entirely separate infrastructure, sign off on different governance frameworks, or bypass critical requirements, the Frontier program enables them to experiment while remaining within their organizations’ existing security and identity boundaries.&lt;/p&gt;

&lt;p&gt;Automate Frontier Agents - software systems built into Microsoft 365 applications that gather information from multiple sources, reason with it, and execute sequential tasks to accomplish complex objectives - provides a concrete technology that organizations can use to explore how AI-based reasoning and automation might apply to their own workflows. Agent 365, the administrative layer beneath these agents, provides the inventory management, access controls, monitoring, and coordination mechanisms organizations need to maintain visibility and control over agent operations at scale.&lt;/p&gt;

&lt;p&gt;In practice, organizations that choose to implement Frontier Agents will need to do more than enable new technology features. They will need to rethink which parts of their business processes benefit from having agents handle information gathering and task execution, how to structure approval workflows that allow human staff to focus on judgment and decision-making, and how to ensure that the technology actually produces better outcomes rather than simply shifting the character of the work without improving results. This redesign work is often more demanding than the technical implementation itself, but it is also where genuine value is created.&lt;/p&gt;

&lt;p&gt;Organizations that approach Frontier Agents as an extended learning opportunity - where initial projects serve as experiments to develop organizational understanding of what agents can accomplish within specific contexts - are likely to make more thoughtful implementation decisions than organizations that pursue large-scale deployment without this learning period. This incremental approach has both benefits and drawbacks: initial projects may not deliver dramatic returns on investment, but the organization will develop practical knowledge of where agents are genuinely helpful, where they create problems, and how to design workflows that leverage agent capabilities effectively.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>frontier</category>
      <category>microsoft365</category>
      <category>agents</category>
    </item>
    <item>
      <title>Ship Copilot Studio Agents with Confidence: Master Automated Testing with the Copilot Studio Kit</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 31 Jan 2026 12:35:30 +0000</pubDate>
      <link>https://forem.com/holgerimbery/ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit-3k7n</link>
      <guid>https://forem.com/holgerimbery/ship-copilot-studio-agents-with-confidence-master-automated-testing-with-the-copilot-studio-kit-3k7n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Shipping Copilot Studio agents without systematic, automated testing is risky: large language models (LLMs) are non‑deterministic, topic routing can drift, and integrations fail in ways casual “chat tests” won’t catch. Microsoft’s Copilot Studio Kit adds structured, repeatable testing (including multi‑turn and generative answer validation), integrates with Power Platform pipelines for gated deployments, and provides analytics and compliance tooling—so you can test as software teams do and ship with confidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why Read This Article&lt;/strong&gt; : AI agents powered by LLMs are unpredictable by nature. Casual testing—clicking through a chat interface a few times—is not enough to catch latent quality issues, topic confusion, or integration failures that emerge under real-world conditions. This guide walks you through a comprehensive, systematic approach to testing Copilot Studio agents that mirrors enterprise software practices: from defining repeatable test cases with the Copilot Studio Kit, to embedding quality gates in your deployment pipelines, to monitoring compliance and performance post-launch. If you are shipping agents to production—or planning to do so—you need structured, auditable testing to reduce risk and build stakeholder confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why agent testing is different (and non‑optional) in Copilot Studio
&lt;/h2&gt;

&lt;p&gt;Conversational agents in Copilot Studio operate under fundamentally different constraints than traditional software. Where conventional applications follow deterministic code paths and produce consistent output for given inputs, conversational agents integrate multiple layers: natural language processing, topic routing, prompt-based reasoning, and knowledge retrieval across enterprise systems. This architectural difference creates testing challenges that cannot be addressed by conventional unit testing or ad hoc verification approaches.&lt;/p&gt;

&lt;p&gt;The non-deterministic nature of these systems stems from several sources. Large language models (LLMs) introduce inherent variability in response generation even when given identical inputs. Topic routing may inconsistently classify user intents, particularly for edge cases or ambiguous queries. Knowledge retrieval systems may return different result sets depending on index state, search ranking, or temporal data updates. When agents compose multiple actions—such as querying multiple data sources, applying filters, and synthesizing responses—the combinations of these variations compound, making manual testing insufficient to validate behavior across the full range of user interactions.&lt;/p&gt;

&lt;p&gt;Microsoft’s published guidance emphasizes that this variability necessitates systematic, large-scale evaluation rather than point-in-time chat testing. A developer might test an agent through a few interactive sessions and conclude it functions correctly based on limited exposure. However, when the agent encounters hundreds or thousands of real-world queries—including paraphrased variations, context-dependent follow-ups, and edge cases—latent quality issues, topic confusion, and integration failures emerge. Automated testing frameworks assess correctness and performance across representative input sets, providing measurable confidence in agent behavior.&lt;/p&gt;

&lt;p&gt;It is important to note that while automated evaluation effectively identifies accuracy and performance issues, Microsoft explicitly documents that this approach cannot replace responsible AI reviews or the safety filters built into governance processes. Automated testing validates the agent’s logical correctness and consistency, but human review remains essential for assessing potential harms, compliance with organizational policies, and alignment with responsible AI principles. These complementary activities must both occur during the release process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The testing toolbox at a glance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Copilot Studio Kit (Power CAT)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2afr73vnfhjugbk08i03.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2afr73vnfhjugbk08i03.png" alt="upgit_20260118_1768740326.png" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-overview" rel="noopener noreferrer"&gt;Copilot Studio Kit&lt;/a&gt; is an open‑source, solution‑aware extension that adds a formal testing and analysis layer to Copilot Studio. At its core, the Kit lets you define agents, tests, and test sets and then run batch tests against your agent through the Direct Line API. Results are not limited to raw strings; they can be enriched with Dataverse conversation transcripts (to expose the exact triggered topic and intent scores) and Azure Application Insights (for telemetry and failure diagnostics). The Kit supports deterministic checks (e.g., response matching and Attachment comparisons) as well as LLM-assisted validation for Generative answers using AI Builder—a critical capability when answers are non-deterministic. For complex scenarios, you can compose multi‑turn tests to validate end‑to‑end flows in a single conversation context, and use plan validation to ensure that agents with generative orchestration select the expected tools/actions above a configured threshold. In enterprise rollouts, the Kit’s managed solution model, Excel import/export for bulk test authoring, and optional Compliance Hub significantly shorten the path to repeatable, auditable test evidence.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need scalable, repeatable test execution with artifacts you can retain and trend over time.&lt;/li&gt;
&lt;li&gt;You must validate both deterministic paths (topic routing, attachments) and non‑deterministic LLM outputs (generative answers).&lt;/li&gt;
&lt;li&gt;You want to instrument tests with Dataverse and Application Insights to explain why a test passed/failed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Agent evaluation (in‑product, preview)&lt;/strong&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefn4qsdrvtyfym9ngbz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fefn4qsdrvtyfym9ngbz4.png" alt="upgit_20260118_1768740528.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent evaluation is a built-in &lt;a href="https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-intro" rel="noopener noreferrer"&gt;preview&lt;/a&gt; feature of Copilot Studio that lets you author or generate test sets and run automated evaluations directly in the product. It is designed to measure answer quality and coverage at scale and can generate tests from your agent’s topics/knowledge or import them from a file. Evaluations are executed with a designated test user profile, which is essential when tools and knowledge sources require authentication. Because this feature is in preview, treat it as a complementary capability to the Kit: use it for fast, in-product evaluations and keep the Kit’s runs and exports for long-term retention and pipeline gating.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Early iterative cycles where makers want quick, in‑canvas evaluation runs.&lt;/li&gt;
&lt;li&gt;Seeding a broader test corpus by auto‑generating questions from topics/knowledge, then curating.&lt;/li&gt;
&lt;li&gt;Validating that a specific test identity can access the same tools/knowledge as target users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Power Platform pipelines
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46wcojdnczr8igc2khqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46wcojdnczr8igc2khqr.png" alt="upgit_20260118_1768740684.png" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Power Platform pipelines provide the native ALM path to move solution-packaged agents across Dev → Test → Prod with approvals, audit trails, and environment isolation. Critically, pipelines can be extended to run Copilot Studio Kit tests as a pre-deployment quality gate: the deployment is paused, tests execute, results are evaluated against pass thresholds, and only then does the pipeline promote to the next stage. This pattern converts “publish from dev” into a governed release with repeatable validation, versioning, and rollback via managed solutions. In practice, you configure a pipeline host environment, wire cloud flows + Dataverse events to call the Kit’s test runner, and enforce gate criteria before production.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any production‑bound agent; manual promotion without a gate is a risk.&lt;/li&gt;
&lt;li&gt;Teams that need approvals, traceability, and the ability to block a release on failed tests.&lt;/li&gt;
&lt;li&gt;Organizations standardizing on solution‑based ALM across Power Platform assets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Direct Line performance testing
&lt;/h3&gt;

&lt;p&gt;Functional correctness is not sufficient if the agent cannot meet performance objectives. Microsoft’s guidance documents how to load and performance-test Copilot Studio agents using Direct Line over WebSockets (preferred for realistic behavior) or HTTP GET polling when WebSockets are not feasible. Test harnesses should track response times at the stages that affect user experience: Generate Token, Start Conversation, Send Activity, and Receive/Get Activities. These measurements, collected under realistic concurrency and payload conditions, enable you to baseline and detect regressions as topics, tools, and knowledge sources evolve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before go‑live and on every significant change to prompts, tools, or knowledge that could affect latency.&lt;/li&gt;
&lt;li&gt;When you must validate SLA adherence (for example, first token and full‑response times).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CoE &amp;amp; Compliance
&lt;/h3&gt;

&lt;p&gt;Testing is part of a broader governance and compliance posture. Microsoft’s Phase 4 guidance emphasizes structured testing, deployment, and launch practices, including final security and compliance checks, telemetry enablement, and controlled rollout. The Copilot Studio Kit’s Compliance Hub complements this by continuously evaluating agent configurations captured via Agent Inventory against configurable thresholds, creating compliance cases, and supporting SLA-driven triage (manual review, quarantine, or delete). Together with Managed Environments, DLP policies, and CoE Starter Kit telemetry, these controls provide continuous post-deployment oversight of agents, reducing configuration drift and helping teams keep production behavior within approved boundaries.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to use&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organization‑wide programs where multiple business units ship agents and you need consistent review and enforcement.&lt;/li&gt;
&lt;li&gt;Environments with strict regulatory or data‑handling requirements that require continuous configuration posture checks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the Copilot Studio Kit for structured, repeatable tests (including multi-turn, generative-answer validation, enrichment, and exports). Use Agent evaluation (preview) for in-product, fast iteration. Enforce release quality with Power Platform pipelines by gating promotion on automated test results. Validate scalability and user-perceived latency with Direct Line performance testing. Finally, operate agents within a governed framework using Phase-4 practices and Compliance Hub to maintain compliance and configuration integrity over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase-4 testing (detailed guidance and checklist)
&lt;/h2&gt;

&lt;p&gt;Phase 4 refers to Microsoft’s “Testing, deployment, and launch” stage in the Copilot Studio governance and security best-practices sequence. It defines what must happen after build-time design and before (and immediately after) a production release. In practical terms, Phase 4 defines the quality gates, security checks, controlled rollout, and post-release monitoring you should apply to every Copilot Studio agent before it serves real users.&lt;/p&gt;

&lt;p&gt;Below is a concise, implementation-oriented summary of Phase 4 practices and their execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing and validation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Prove the agent’s functional behavior and non‑deterministic answer quality with repeatable, automated tests—not manual chats.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated scenario testing&lt;/strong&gt; : Use the Copilot Studio Kit to define tests and test sets (response/attachment/topic/generative, including multi-turn and plan validation) and run batches against the agent. Enrich results with Dataverse transcripts and Application Insights for root-cause analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD readiness&lt;/strong&gt; : Maintain test artifacts and runs as part of your release process. Microsoft’s Phase 4 guidance explicitly recommends automated testing and evaluation as a prerequisite for deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality gates in pipelines&lt;/strong&gt; : Integrate Kit test runs into Power Platform pipelines so a deployment pauses, executes tests, evaluates pass thresholds, and only then promotes to the next stage. This delivers an auditable “test-before-deploy” control.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final security and compliance checks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Ensure the production environment enforces the right data and access boundaries and that all Azure resources are approved.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data policies and RBAC&lt;/strong&gt; : Verify environment-level policies (e.g., DLP), role assignments, and connection security in the production environment—not just in Dev/Test. This prevents accidental connector drift at go-live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure resource review&lt;/strong&gt; : Confirm approvals for app registrations, networks, keys, and endpoints associated with the agent’s external dependencies. Use secure secret storage and rotate keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production knowledge sources&lt;/strong&gt; : Point the agent to the production document libraries and data sets (many teams test with separate SharePoint paths or sample data; Phase 4 requires verification of the production bindings).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Controlled production rollout
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Promote a versioned, solution‑packaged agent to production via ALM—not via ad‑hoc publish.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deploy via pipelines&lt;/strong&gt; : Package the agent in a Power Platform solution and promote Dev → Test → Prod with approvals, audit trail, and environment isolation. This is the supported, governed path for Copilot Studio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-deployment steps&lt;/strong&gt; : In the pipeline, add hooks to run Kit tests and evaluate pass rates as a quality gate before import into the target environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Launch plan&lt;/strong&gt; : Communicate availability and usage guidance to the intended audience and stakeholders as part of the release checklist.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enable monitoring and ongoing governance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt; : Operate the agent as a managed service with telemetry, compliance posture tracking, and corrective workflows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telemetry&lt;/strong&gt; : Configure Azure Application Insights for usage, performance, and error logging. This supports regression detection and incident response after go‑live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance operations&lt;/strong&gt; : Use the Copilot Studio Kit’s Compliance Hub to continuously evaluate agent configurations against policy thresholds, raise review cases, and track SLA‑bound remediation (review, quarantine, delete).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CoE integration&lt;/strong&gt; : Leverage the Power Platform CoE Starter Kit to inventory agents, watch adoption/health signals, and maintain platform‑wide governance routines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Phase 4 matters
&lt;/h3&gt;

&lt;p&gt;Phase 4 formalizes the last mile from “it works in development” to “it is safe to expose to users.” It replaces manual spot checks and one‑click publishing with automated validation, governed deployment, and observable operations—the baseline expected of enterprise‑grade AI agents. By following Phase 4 practices, teams reduce the risk of production incidents, data leaks, and compliance violations. They gain confidence that agents behave as intended under real‑world conditions and that any deviations are detected and addressed promptly. Ultimately, Phase 4 transforms Copilot Studio agents from experimental prototypes into reliable, governed components within the organizational AI landscape.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test strategy for low‑code/no‑code agents
&lt;/h3&gt;

&lt;p&gt;Test types you should plan for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conversational/functional (does the response match expectations for known intents?)&lt;/strong&gt; – Use Kit’s Response match and Topic match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generative answer validation (LLM output quality and guardrails)&lt;/strong&gt; – Use AI Builder-based Generative answers with Application Insights context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End-to-end scenarios across multiple turns and tools&lt;/strong&gt; – Use Multi-turn and Plan validation for generative orchestration to ensure the plan contains the expected tools/actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration (Dataverse, connectors, actions)&lt;/strong&gt; – Validate topic routing via Dataverse enrichment and attachment payloads (e.g., Adaptive Cards).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance &amp;amp; reliability under load&lt;/strong&gt; – Use Direct Line guidance to capture token/start/send/receive latencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety/compliance&lt;/strong&gt; – Follow governance phase guidance and consider Kit’s Compliance Hub to flag configuration policy issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to automate first&lt;/strong&gt; : high-volume intents, critical business workflows (e.g., authentication-gated actions), and generative answers that must adhere to strict constraints. Then expand to long-tail intents and exploratory questions using generated test sets (Agent evaluation) and bulk import (Kit).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementing automated testing with Copilot Studio Kit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Set up Copilot Studio Kit for automated testing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install the Kit from Marketplace or GitHub as a managed solution in your chosen environment; complete post‑deployment connection references.&lt;/li&gt;
&lt;li&gt;Configure the agent connection for testing:&lt;/li&gt;
&lt;li&gt;Set agent base configuration (name, region/token endpoint), and Direct Line channel security as applicable.&lt;/li&gt;
&lt;li&gt;Enable Dataverse enrichment to analyze conversation transcripts (topic names, intent scores).&lt;/li&gt;
&lt;li&gt;Enable Application Insights enrichment for diagnostics and negative tests (e.g., moderated/no result cases).&lt;/li&gt;
&lt;li&gt;Configure AI Builder as the LLM provider for Generative answers validation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your agent uses Microsoft Authentication, register an app and configure the Kit’s test automation so the test runner can authenticate as a user.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design test cases and test sets
&lt;/h3&gt;

&lt;p&gt;Supported test types (Kit):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response matches with string operators (equals/contains/starts/ends).&lt;/li&gt;
&lt;li&gt;Attachments comparison (JSON array) or AI Validation for structure/semantics.&lt;/li&gt;
&lt;li&gt;Topic match (requires Dataverse enrichment).&lt;/li&gt;
&lt;li&gt;Generative answers (requires AI Builder + optional Application Insights).&lt;/li&gt;
&lt;li&gt;Multi-turn (compose several tests into one conversation).&lt;/li&gt;
&lt;li&gt;Plan validation (ensure the generated plan includes expected tools/actions above a threshold).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bulk authoring/import: Use Kit’s Excel import/export to create or modify multiple tests efficiently. In-product test sets (Agent evaluation, preview): Create up to 100 test cases per set; generate questions from agent description/topics/knowledge or import from a file; run with a selected test user profile that has the right connections/authentication.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run, analyze, and iterate
&lt;/h3&gt;

&lt;p&gt;Run test sets against your agent; Kit records observed responses and latencies, and aggregates them. Enrichment adds topic routing and detailed diagnostics. Export results (CSV) for long‑term retention or integration with other tools. Use Kit’s analytics artifacts (e.g., Analyze Test Results) and conversation KPIs in Dataverse for trend analysis beyond the built‑in Copilot Studio analytics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automate “test → deploy” with pipelines
&lt;/h3&gt;

&lt;p&gt;Treat agents as solution components and move them with Power Platform pipelines; add automated Kit test runs as a pre‑deployment gate so only solutions that meet pass thresholds are deployed to Test/Prod. Microsoft’s guidance and Kit docs provide a pattern using cloud flows and Dataverse to pause the pipeline, run tests, evaluate the pass rate, and decide whether to continue or stop. This replaces the brittle “publish from dev” habit with auditable CI/CD and quality gates aligned to enterprise ALM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance and reliability testing via Direct Line
&lt;/h3&gt;

&lt;p&gt;For load testing, simulate real user behavior using Direct Line with WebSockets where possible; otherwise, use HTTP GET polling. Track and report response times for Generate Token, Start Conversation, Send Activity, and Receive/Get Activities to understand user‑perceived latency under load. Microsoft’s documentation provides detailed guidance on setting up these tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Governance: beyond the build—compliance, environments, and monitoring
&lt;/h2&gt;

&lt;p&gt;Microsoft’s governance phase recommends validating security and compliance, using ALM pipelines for controlled rollout, and enabling telemetry (Application Insights). The CoE Starter Kit and Kit’s Compliance Hub can help continuously evaluate agent posture and enforce approvals/quarantines when violations occur.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-patterns: why “chat a bit and publish” fails
&lt;/h2&gt;

&lt;p&gt;Clicking Publish pushes the current agent state to channels; it is not a deployment pipeline. Without environments, solutions, and quality gates, you lack versioning, rollback, and documented test evidence—unacceptable for enterprise operations. Use solutions, pipelines, and automated tests instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical checklist for Phase-4 testing and deployment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pre-requisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agent in a solution; environments for Dev/Test/Prod; pipeline host environment configured.&lt;/li&gt;
&lt;li&gt;Copilot Studio Kit installed and connected; App Insights + Dataverse enrichment; AI Builder available.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test design
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Identify top intents and critical flows; write Response/Topic/Attachment tests.&lt;/li&gt;
&lt;li&gt;Add Generative answers tests for non‑deterministic responses (with validation instructions and sample answers).&lt;/li&gt;
&lt;li&gt;Compose Multi‑turn scenarios for end‑to‑end paths; add Plan validation for generative orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Automation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create test sets in Kit and (optionally) in Agent evaluation to complement coverage; export results to CSV for retention.&lt;/li&gt;
&lt;li&gt;Wire pre‑deployment pipeline steps to run Kit tests and enforce pass thresholds (quality gate).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Execute load tests through Direct Line, prefer WebSockets, and track the specific latencies Microsoft recommends.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Governance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Run security/compliance checks; monitor with CoE and Compliance Hub after go‑live.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Automated testing is essential for Copilot Studio agents due to their non‑deterministic nature and complex integrations. Microsoft’s Copilot Studio Kit provides a robust framework for defining, executing, and analyzing tests, enabling teams to validate both deterministic and generative behaviors at scale. By integrating these tests into Power Platform pipelines, teams can enforce quality gates that ensure only validated agents reach production. Complementing this with performance testing via Direct Line and ongoing governance through telemetry and compliance tools establishes a comprehensive lifecycle for reliable, enterprise-grade AI agents. Adopting these practices transforms Copilot Studio agents from experimental prototypes into trusted components of the organizational AI landscape, capable of delivering consistent value while adhering to security and compliance standards.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>powerplatform</category>
      <category>agentevaluation</category>
      <category>ai</category>
    </item>
    <item>
      <title>Migrating from Agent Builder (Copilot Studio Lite) to Copilot Studio</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://forem.com/holgerimbery/migrating-from-agent-builder-copilot-studio-lite-to-copilot-studio-3bb7</link>
      <guid>https://forem.com/holgerimbery/migrating-from-agent-builder-copilot-studio-lite-to-copilot-studio-3bb7</guid>
      <description>&lt;p&gt;&lt;em&gt;Technical overview and configuration guide (January 2026)&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt; Moving your agents from Agent Builder(aka Copilot Studio lite) — the simplified creation tool embedded in Microsoft 365 Copilot — to the full-featured Copilot Studio environment represents a significant step forward in how you manage and deploy conversational AI within your organization. While Agent Builder excels at helping teams quickly prototype and launch straightforward agents for internal use, Copilot Studio provides the comprehensive tooling needed when those agents mature into reliable, enterprise-scale solutions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why read it?&lt;/strong&gt; Are you looking to scale your Copilot agents from small-team tools to enterprise-ready solutions? This article explains the compelling benefits, migration process, and critical considerations—so you can confidently elevate your agent strategy.&lt;/p&gt;

&lt;p&gt;This article uses the terms “Agent Builder” and “Copilot Studio Lite” interchangeably to refer to the same simplified agent-creation tool in Microsoft 365 Copilot. The full-featured environment is called “Copilot Studio.” Microsoft is continuously changing naming conventions, so please refer to the latest official documentation for the most current terminology.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The migration process itself is straightforward: you essentially create a copy of your existing agent in a Power Platform environment where Copilot Studio operates. However, the real value lies in what becomes available after migration. You gain access to proper version control mechanisms that let you track changes over time, stage updates across multiple environments, and roll back to previous configurations if something goes wrong. Analytics capabilities expand dramatically, giving you detailed insights into how users interact with your agents, where conversations succeed or fail, and which knowledge sources prove most valuable.&lt;/p&gt;

&lt;p&gt;From a governance perspective, the difference is substantial. Copilot Studio integrates with Power Platform’s security model, allowing you to implement role-based access controls, enforce data loss prevention policies, and maintain detailed audit trails of who changed what and when. You can configure connectors to external systems with appropriate guardrails, manage how agents access sensitive data, and ensure everything complies with your organization’s compliance requirements.&lt;/p&gt;

&lt;p&gt;The migration also opens the door to more sophisticated agent behaviors. You can implement conditional logic that routes conversations based on context, integrate with external APIs and services beyond what Agent Builder supports, and leverage Dataverse as a structured data layer for your agent’s knowledge and state management. If your organization runs separate development, testing, and production environments—as most enterprises do — Copilot Studio’s multi-environment support becomes essential for maintaining stable operations while continuously improving your agents.&lt;/p&gt;

&lt;p&gt;This article walks through the practical aspects of performing this migration: understanding which components transfer automatically and which require manual reconfiguration, meeting the licensing and environment prerequisites, and establishing a workflow that minimizes disruption to users who depend on your existing agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Migrate?
&lt;/h2&gt;

&lt;p&gt;Agent Builder is great for quickly building conversational agents within Microsoft 365 Copilot. But when you need advanced features like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version control, staged rollouts, and rollback options&lt;/li&gt;
&lt;li&gt;Deep usage analytics and dashboards&lt;/li&gt;
&lt;li&gt;Enterprise governance: role-based access, data policies, audit trails&lt;/li&gt;
&lt;li&gt;Complex customization: conditional logic, external API integrations&lt;/li&gt;
&lt;li&gt;Multi-environment management (dev, test, prod)&lt;/li&gt;
&lt;li&gt;Broader connectors and Dataverse integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copilot Studio is designed for these needs—making it the scalable, secure next step for production-ready bots.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Transfers in the Migration
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Understanding the Migration Mechanism
&lt;/h2&gt;

&lt;p&gt;When you’ve developed an agent using Agent Builder within Microsoft 365 Copilot and reach the point where you require capabilities that exist only in the complete Copilot Studio platform, Microsoft provides a direct migration path through the &lt;strong&gt;Copy to Copilot Studio&lt;/strong&gt; function.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Initiate the Copy
&lt;/h3&gt;

&lt;p&gt;You’ll find this option in the &lt;strong&gt;More options (…)&lt;/strong&gt; menu when viewing your agent in Agent Builder. The purpose of this function is to create a duplicate of your existing agent in the Power Platform environment where Copilot Studio operates, eliminating the need to recreate it manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr3wmr9jbb45r2fy6xbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr3wmr9jbb45r2fy6xbo.png" alt="upgit_20260117_1768669061.png" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Important: This Is a Copy, Not a Move
&lt;/h3&gt;

&lt;p&gt;It’s important to understand that this is a &lt;strong&gt;copy operation&lt;/strong&gt; , not a move or synchronization. Your original agent in Agent Builder remains unchanged and continues to function exactly as before.&lt;/p&gt;

&lt;p&gt;What you get is an independent snapshot of that agent’s configuration at the moment you initiated the copy process. Any subsequent changes you make to either version—the original in Agent Builder or the copy in Copilot Studio—will not affect the other.&lt;/p&gt;

&lt;p&gt;This design allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continue using the Agent Builder version while you configure and test the Copilot Studio version&lt;/li&gt;
&lt;li&gt;Ensure no disruption to users who depend on the existing agent&lt;/li&gt;
&lt;li&gt;Validate the migrated version before fully transitioning&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Gets Transferred Automatically
&lt;/h2&gt;

&lt;p&gt;The copy operation handles a specific subset of your agent’s configuration; not everything is copied over, especially uploaded files. Understanding exactly what transfers automatically helps you plan the remaining setup work after migration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftufc1lli2rv49vbal8fd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftufc1lli2rv49vbal8fd.png" alt="upgit_20260117_1768669522.png" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the target environment for your Copilot Studio agent during the copy process. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq2pmrq1mviw3c3ovb71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq2pmrq1mviw3c3ovb71.png" alt="upgit_20260117_1768669598.png" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic identification and behavior elements&lt;/strong&gt; that copy over include the agent’s name, the description that appears to users, and the core instructions that define how the agent should respond and behave. These static text fields define the foundation of your agent’s personality and purpose. The suggested prompts you configured to help users get started with the agent also transfer, maintaining that initial user experience guidance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual identity&lt;/strong&gt; is preserved through the automatic transfer of your agent’s icon, ensuring brand consistency between the Agent Builder and Copilot Studio versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge sources&lt;/strong&gt; represent a more nuanced aspect of the migration. SharePoint files, folders, and entire sites that you designated as knowledge sources in Agent Builder will transfer to Copilot Studio. In the same way, any websites you added as knowledge sources make the transition. These knowledge sources are the foundation of your agent’s ability to provide accurate, contextually relevant responses based on your organization’s specific information.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Requires Manual Reconfiguration
&lt;/h2&gt;

&lt;p&gt;Several configuration categories do not transfer automatically during the copy operation. These require deliberate action on your part once the agent exists in Copilot Studio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise knowledge accessed through Copilot connectors&lt;/strong&gt; represents a significant gap in the automatic transfer process. In contrast, basic SharePoint and website knowledge sources transfer; enterprise knowledge that relies on specialized Copilot connectors must be manually reconfigured in Copilot Studio after migration. This includes setting up the appropriate connectors and reestablishing the connection to your enterprise data sources. The underlying reason relates to how these connectors authenticate and authorize access—these security configurations don’t automatically replicate across environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scoped Copilot connectors&lt;/strong&gt; pose a limitation in the current Copilot Studio implementation. If your Agent Builder agent uses scoped connectors that restrict data access based on specific criteria, you should know that Copilot Studio doesn’t currently support this capability. You’ll need to evaluate whether alternative approaches exist for obtaining similar data scoping within Copilot Studio’s connector framework, or determine whether this is a blocking issue for your migration timeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedded files&lt;/strong&gt; you uploaded directly to Agent Builder don’t transfer during the copy operation. The system doesn’t migrate the binary file content itself; it only references external knowledge sources, such as SharePoint locations. After migration, you’ll need to upload these files again directly to Copilot Studio. This manual step lets you verify that files remain current and relevant for the Copilot Studio environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teams chats and meetings&lt;/strong&gt; that served as knowledge sources in Agent Builder require a different implementation approach in Copilot Studio. Instead of the embedded access mechanism used in Agent Builder, you’ll add the Power Platform connector for Microsoft Teams within Copilot Studio. This connector provides the access pathway to Teams content, though you may need to reconfigure specific permissions and scope settings to match your original Agent Builder configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email content from Outlook&lt;/strong&gt; follows a similar pattern to Teams integration. If your Agent Builder agent drew knowledge from emails, you’ll need to add the Power Platform connector for Outlook in Copilot Studio and reconfigure which mailboxes or folders the agent can access as knowledge sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code generation, document creation, and chart generation capabilities&lt;/strong&gt; require enabling the code interpreter functionality in Copilot Studio. While Agent Builder may have included these features by default or with simpler configuration, Copilot Studio treats the code interpreter as an explicit capability you enable in the agent’s settings. Navigate to the appropriate settings section and activate the code interpreter to restore this functionality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image generation from text prompts&lt;/strong&gt; represents a functional gap between Agent Builder and Copilot Studio. If your agent can generate images from user descriptions in Agent Builder, this capability isn’t currently available in Copilot Studio. Basic charts and graphs remain possible through the code interpreter capability, but photorealistic image generation or creative visual content creation from prompts isn’t supported. You’ll need to assess whether this limitation affects your agent’s core value proposition and plan accordingly—either by removing image generation from your agent’s advertised capabilities or by maintaining a parallel Agent Builder version for use cases that require it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise knowledge (Copilot connectors)&lt;/td&gt;
&lt;td&gt;Set up connectors after you copy your agent.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scoped Copilot connectors&lt;/td&gt;
&lt;td&gt;Not currently supported in Copilot Studio.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedded files&lt;/td&gt;
&lt;td&gt;Upload the files again in Copilot Studio.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Teams chats and meetings added as knowledge&lt;/td&gt;
&lt;td&gt;Add the Power Platform connector for Teams in Copilot Studio.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Emails added as knowledge&lt;/td&gt;
&lt;td&gt;Add the Power Platform connector for Outlook in Copilot Studio.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create documents, charts, and code&lt;/td&gt;
&lt;td&gt;Add code interpreter via agent settings in Copilot Studio.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create images from prompts&lt;/td&gt;
&lt;td&gt;Not currently supported in Copilot Studio. Basic charts and graphs are part of the code interpreter capability.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Licensing &amp;amp; Environment Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You need either a Microsoft 365 Copilot or Copilot Studio license. Trials may be available depending on tenant policies.&lt;/li&gt;
&lt;li&gt;The migration process depends on Power Platform environments: 

&lt;ul&gt;
&lt;li&gt;Must support Dataverse and be in a supported region&lt;/li&gt;
&lt;li&gt;You need appropriate security roles and environment accessibility&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Admin-managed governance, data, and connector policies are managed in Power Platform Admin Center; sharing limits may apply&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Post-Migration Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Once copied, the new agent is independent—subsequent edits in Agent Builder won’t sync&lt;/li&gt;
&lt;li&gt;Every “Copy” operation creates a new agent instance&lt;/li&gt;
&lt;li&gt;Retire or archive old agents after ensuring continuity&lt;/li&gt;
&lt;li&gt;Periodically revisit environment-specific policies and data retention setups&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The process of moving your agents from Agent Builder into the complete Copilot Studio environment represents more than a simple platform upgrade—it fundamentally changes how your organization approaches conversational AI as a strategic capability rather than a tactical tool. This transition acknowledges that what began as a straightforward internal assistant has demonstrated sufficient value to warrant the infrastructure, governance, and operational discipline that enterprise software demands.&lt;/p&gt;

&lt;p&gt;From a practical standpoint, the migration provides access to capabilities that become essential as usage scales beyond small teams. Version control mechanisms let you track exactly what changed in your agent’s configuration over time, which is critical for troubleshooting unexpected behavior and understanding how the agent evolved to meet changing business needs. The ability to maintain separate development, testing, and production instances means you can experiment with improvements without disrupting users who depend on the current working version. Staged rollouts allow you to gradually introduce changes to subsets of users, monitoring for issues before full deployment.&lt;/p&gt;

&lt;p&gt;The analytics infrastructure in Copilot Studio offers visibility not available in Agent Builder. You gain detailed metrics on conversation patterns—where users struggle, which knowledge sources are most valuable, which questions the agent can’t answer effectively, and how response quality varies across user populations. This data becomes the foundation for continuous improvement, transforming agent optimization from guesswork into an evidence-based process.&lt;/p&gt;

&lt;p&gt;Governance capabilities matter increasingly as agents handle sensitive information or influence essential business processes. Copilot Studio integrates with Power Platform’s security model, providing role-based access controls to determine who can modify agent configurations, audit trails to document every change for compliance purposes, and data loss prevention policies to prevent agents from inadvertently exposing protected information. These controls become non-negotiable when regulatory requirements or corporate policies govern how AI systems must operate.&lt;/p&gt;

&lt;p&gt;The expanded customization options available in Copilot Studio enable more sophisticated agent behaviors that address complex business scenarios. You can implement conditional logic to route conversations based on context, integrate with external systems via a comprehensive connector ecosystem, and leverage Dataverse as a structured repository for your agents’ knowledge and state management. These capabilities transform agents from simple question-answering tools into intelligent participants in broader business workflows.&lt;/p&gt;

&lt;p&gt;Understanding the migration process — what transfers automatically versus what requires manual reconfiguration—helps you avoid surprises and allocate appropriate time and resources. The copy operation handles core elements like instructions, knowledge sources from SharePoint and websites, and visual identity. Still, it leaves connector-based enterprise knowledge, embedded files, and specific advanced capabilities requiring deliberate setup in the new environment. Planning this work ensures a smooth transition without extended periods of functionality unavailability.&lt;/p&gt;

&lt;p&gt;Please remember that the migration makes an independent copy rather than moving your agent throughout this process. This design allows you to maintain operational continuity with the existing Agent Builder version while you configure, test, and validate the Copilot Studio version. After confirming that the migrated agent is working correctly and that users have transitioned successfully, would you mind retiring the original to avoid a service availability gap?&lt;/p&gt;

</description>
      <category>githubcopilot</category>
      <category>copilotstudio</category>
      <category>agentbuilder</category>
      <category>migration</category>
    </item>
    <item>
      <title>Power Fx in Copilot Studio - The Computational Backbone of Enterprise Agents</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://forem.com/holgerimbery/power-fx-in-copilot-studio-the-computational-backbone-of-enterprise-agents-25k2</link>
      <guid>https://forem.com/holgerimbery/power-fx-in-copilot-studio-the-computational-backbone-of-enterprise-agents-25k2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Building production-ready (conversational) agents in Microsoft Copilot Studio requires mastering a critical architectural principle: the LLM handles the words, Power Fx handles the computation.&lt;/strong&gt; While large language models excel at understanding natural language and generating human-like responses, they cannot guarantee the deterministic, auditable calculations that enterprise automation requires. This article explains how Power Fx functions as the computational backbone of your agents, providing repeatable calculations, precise data transformations, and enforceable business rules—while the LLM manages the conversational flow. Whether you are implementing financial workflows, compliance-sensitive processes, or complex agent flows that integrate with Dataverse and Azure AI, understanding this separation of concerns transforms experimental chatbots into reliable enterprise automation solutions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Read on&lt;/strong&gt; to explore the practical formulas, variable scoping rules, and architectural patterns that make Power Fx essential for anyone building autonomous agents in the Microsoft ecosystem.&lt;/p&gt;

&lt;p&gt;As autonomous agents and agent flows in Microsoft Copilot Studio become more capable and increasingly central to enterprise automation, understanding Power Fx is no longer optional-it is &lt;strong&gt;essential&lt;/strong&gt;. The landscape of enterprise software development has undergone a fundamental transformation in recent years, with conversational interfaces and intelligent automation moving from experimental features to core components of business operations. Within this evolving ecosystem, Power Fx serves a critical architectural role in executing at-scale formula evaluation.&lt;/p&gt;

&lt;p&gt;Power Fx acts as the deterministic backbone of your agent’s logic, providing the predictable, rule-based execution layer that complements the reasoning and language interpretation performed by the LLM. This relationship between Power Fx and the LLM represents a carefully considered architectural pattern: the language model excels at interpreting natural language, understanding context, and generating human-like responses, while Power Fx handles the precise computational work-calculations, data transformations, orchestration, and state management—that demands absolute consistency and auditability.&lt;/p&gt;

&lt;p&gt;Consider the fundamental nature of how these systems operate. When a user interacts with an agent in Copilot Studio, their message arrives as unstructured natural language. The LLM processes this input, identifying intent, extracting entities, and determining which conversational flow to activate. However, once that intent is understood, the agent must often perform specific operations, such as retrieving data from a database, calculating values based on business rules, validating input against regulatory constraints, or determining which subsequent action to take based on complex conditions. These operations cannot rely on probabilistic reasoning or approximate outputs. They require exactness, repeatability, and transparency-qualities that define Power Fx as a formula language.&lt;/p&gt;

&lt;p&gt;This distinction becomes particularly significant in enterprise contexts where agents handle sensitive operations, financial transactions, compliance-related decisions, or integration with systems of record. In such environments, every calculation must produce identical results under identical conditions, every data transformation must be traceable, and every decision point must be explicable to auditors or regulatory bodies. Power Fx provides this foundation, creating a clear separation between the uncertainty inherent in language understanding and the precision required for executing business logic.&lt;/p&gt;

&lt;p&gt;For enterprise makers-especially those working with Dynamics 365, Azure AI, Dataverse, and embedded automation—Power Fx empowers you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement precise decision logic rather than relying solely on probabilistic reasoning.&lt;/li&gt;
&lt;li&gt;Ensure repeatable behavior for compliance-sensitive processes.&lt;/li&gt;
&lt;li&gt;Manipulate data, variables, collections, and system values in a predictable low-code expression language that aligns with existing Power Platform experience.&lt;/li&gt;
&lt;li&gt;Control how topics communicate with each other, how state flows through the conversation, and how the agent orchestrates automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are building autonomous agents, approvals, orchestrations, or contextual AI inside business applications, then familiarity with Power Fx is a core competency. This article introduces Power Fx in the context of Copilot Studio agent flows, provides examples, and covers advanced concepts using Microsoft’s official materials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Power Fx in Copilot Studio - A Structured Language for Agent Logic
&lt;/h2&gt;

&lt;p&gt;Power Fx in Copilot Studio is fully integrated into the authoring canvas and can be used in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set a variable nodes&lt;/li&gt;
&lt;li&gt;Message and Question nodes&lt;/li&gt;
&lt;li&gt;Condition nodes&lt;/li&gt;
&lt;li&gt;Action nodes&lt;/li&gt;
&lt;li&gt;Adaptive Card logic&lt;/li&gt;
&lt;li&gt;Question behavior configurations&lt;/li&gt;
&lt;li&gt;and last but not least in Agents Flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows you to compute values, evaluate conditions, transform user input, and shape your agent’s responses. The placement of Power Fx within these specific nodes reflects a deliberate architectural decision: computational logic resides at the points where data enters the system, transforms during processing, or influences routing decisions. Rather than scattering imperative code throughout the conversational design, Power Fx expressions encapsulate discrete computational operations within well-defined boundaries, making the agent’s behavior more comprehensible during both initial development and subsequent maintenance.&lt;/p&gt;

&lt;p&gt;Copilot Studio supports a subset of Power Fx functions, not the complete Power Apps function library. This distinction matters for practitioners migrating existing formulas or designing complex computational logic. Many core table, text, date, and conditional functions behave identically to their Power Apps counterparts, allowing experienced Power Platform developers to transfer their existing knowledge directly into agent development. Functions governing table manipulation-such as Filter, ForAll, and AddColumns-generally operate as expected. Similarly, text processing functions, including Concatenate, Left, Right, and Text formatting operations, maintain their familiar semantics. Date and time calculations through functions such as DateAdd, DateDiff, and Now provide consistent behavior across the platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: Using Power Fx to Modify and Display Dates
&lt;/h2&gt;

&lt;p&gt;A foundational example from Microsoft’s documentation shows how Power Fx can improve user-facing output. In the tutorial, the agent receives an order number and wants to communicate an estimated delivery date.&lt;br&gt;&lt;br&gt;
Step-by-step&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create a new variable in a Set a variable value node. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gz4mbs00pk99u1b5jpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gz4mbs00pk99u1b5jpj.png" alt="upgit_20260116_1768550346.png" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Switch to the Formula tab. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp2qw3mow2u2cto10d63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp2qw3mow2u2cto10d63.png" alt="upgit_20260116_1768550424.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Insert this formula to compute the delivery date:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Text(
     DateAdd(
         Now(),
         3,
         TimeUnit.Days
     ),
     DateTimeFormat.LongDate
 )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This takes the current datetime, adds three days, and formats the result as a long date string, which is then stored in the Topic variable &lt;code&gt;Date&lt;/code&gt;: &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94kqsg0kc714b2m26ujq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94kqsg0kc714b2m26ujq.png" alt="upgit_20260116_1768550698.png" width="292" height="55"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This example highlights why Power Fx is necessary: your agent performs deterministic date transformation without involving the LLM, ensuring accuracy and consistency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Variable Prefixes in Copilot Studio
&lt;/h2&gt;

&lt;p&gt;Unlike Power Apps, variables in Copilot Studio require specific prefixes in Power Fx formulas. These prefixes are crucial for correct data access. They indicate the variable’s scope and context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Topic.&lt;/code&gt; for topic-level variables (e.g., &lt;code&gt;Topic.UserMessage&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;System.&lt;/code&gt; for system-level variables (e.g., &lt;code&gt;System.CurrentDateTime&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Global.&lt;/code&gt; for agent-level variables (e.g., &lt;code&gt;Global.AgentName&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These prefixes make the flow of state inside an agent explicit, which is essential for multi-step logic and orchestrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use literal values in a formula
&lt;/h2&gt;

&lt;p&gt;When working with Power Fx formulas, you are not restricted to referencing variables alone. The language also permits the direct inclusion of literal values-that is, fixed data values that you explicitly write into the formula itself. When you choose to embed a literal value within a formula expression, it is necessary to express that value according to the syntax conventions that correspond to its underlying data type. The table below provides a detailed enumeration of the data types supported in Power Fx, along with illustrative examples that demonstrate the specific formatting requirements for representing literal values of each type.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Format examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;"hi"&lt;/code&gt;, &lt;code&gt;"hello world!"&lt;/code&gt;, &lt;code&gt;"copilot"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boolean&lt;/td&gt;
&lt;td&gt;Only &lt;code&gt;true&lt;/code&gt; or &lt;code&gt;false&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Number&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;1&lt;/code&gt;, &lt;code&gt;532&lt;/code&gt;, &lt;code&gt;5.258&lt;/code&gt;, &lt;code&gt;-9201&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Record and Table&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;[1]&lt;/code&gt;, &lt;code&gt;[45, 8, 2]&lt;/code&gt;, &lt;code&gt;["cats", "dogs"]&lt;/code&gt;, &lt;code&gt;{ id: 1 }&lt;/code&gt;, &lt;code&gt;{ message: "hello" }&lt;/code&gt;, &lt;code&gt;{ name: "John", info: { age: 25, weight: 175 } }&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DateTime&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Time(5,0,23)&lt;/code&gt;, &lt;code&gt;Date(2022,5,24)&lt;/code&gt;, &lt;code&gt;DateTimeValue("May 10, 2022 5:00:00 PM")&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Choice&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blank&lt;/td&gt;
&lt;td&gt;Only &lt;code&gt;Blank()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Common Power Fx formulas
&lt;/h2&gt;

&lt;p&gt;The following table lists data types and Power Fx formulas you can use with each data type. Links point directly to Microsoft’s official documentation for your reference.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Power Fx formulas&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;String&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-text" rel="noopener noreferrer"&gt;Text function&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-concatenate" rel="noopener noreferrer"&gt;Concat and Concatenate functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-len" rel="noopener noreferrer"&gt;Len function&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-lower-upper-proper" rel="noopener noreferrer"&gt;Lower, Upper, and Proper functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-ismatch" rel="noopener noreferrer"&gt;IsMatch, Match, and MatchAll functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-startswith" rel="noopener noreferrer"&gt;EndsWith and StartsWith functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-find" rel="noopener noreferrer"&gt;Find function&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-replace-substitute" rel="noopener noreferrer"&gt;Replace and Substitute functions&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Boolean&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-boolean" rel="noopener noreferrer"&gt;Boolean function&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-logicals" rel="noopener noreferrer"&gt;And, Or, and Not functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-if" rel="noopener noreferrer"&gt;If and Switch functions&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Number&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-value" rel="noopener noreferrer"&gt;Decimal, Float, and Value functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-round" rel="noopener noreferrer"&gt;Int, Round, RoundDown, RoundUp, and Trunc functions&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Record and Table&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-concatenate" rel="noopener noreferrer"&gt;Concat and Concatenate functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-table-counts" rel="noopener noreferrer"&gt;Count, CountA, CountIf, and CountRows functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-forall" rel="noopener noreferrer"&gt;ForAll function&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-first-last" rel="noopener noreferrer"&gt;First, FirstN, Index, Last, and LastN functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-filter-lookup" rel="noopener noreferrer"&gt;Filter, Search, and LookUp functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-json" rel="noopener noreferrer"&gt;JSON function&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-parsejson" rel="noopener noreferrer"&gt;ParseJSON function&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DateTime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-date-time" rel="noopener noreferrer"&gt;Date, DateTime, and Time functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-datevalue-timevalue" rel="noopener noreferrer"&gt;DateValue, TimeValue, and DateTimeValue functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-datetime-parts" rel="noopener noreferrer"&gt;Day, Month, Year, Hour, Minute, Second, and Weekday functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-now-today-istoday" rel="noopener noreferrer"&gt;Now, Today, IsToday, UTCNow, UTCToday, IsUTCToday functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-dateadd-datediff" rel="noopener noreferrer"&gt;DateAdd, DateDiff, and TimeZoneOffset functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-text" rel="noopener noreferrer"&gt;Text function&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blank&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-isblank-isempty" rel="noopener noreferrer"&gt;Blank, Coalesce, IsBlank, and IsEmpty functions&lt;/a&gt;&lt;br&gt;&lt;a href="https://learn.microsoft.com/en-us/power-platform/power-fx/reference/function-iferror" rel="noopener noreferrer"&gt;Error, IfError, IsError, IsBlankOrError functions&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Power Fx (Copilot Studio) Cheat Sheet
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Numbers &amp;amp; Math
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Abs&lt;/td&gt;
&lt;td&gt;Absolute value of a number&lt;/td&gt;
&lt;td&gt;Abs(-55) → 55&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Round&lt;/td&gt;
&lt;td&gt;Round to the specified number of digits&lt;/td&gt;
&lt;td&gt;Round(3.14159, 2) → 3.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RoundDown&lt;/td&gt;
&lt;td&gt;Round down to the specified digits&lt;/td&gt;
&lt;td&gt;RoundDown(3.9, 0) → 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RoundUp&lt;/td&gt;
&lt;td&gt;Round up to the specified digits&lt;/td&gt;
&lt;td&gt;RoundUp(3.1, 0) → 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trunc&lt;/td&gt;
&lt;td&gt;Truncate a number toward zero&lt;/td&gt;
&lt;td&gt;Trunc(-3.9) → -3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Int&lt;/td&gt;
&lt;td&gt;Integer part of a number&lt;/td&gt;
&lt;td&gt;Int(7.8) → 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Value&lt;/td&gt;
&lt;td&gt;Convert text to number&lt;/td&gt;
&lt;td&gt;Value(“42”) → 42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decimal&lt;/td&gt;
&lt;td&gt;Convert string to decimal number&lt;/td&gt;
&lt;td&gt;Decimal(“12.5”) → 12.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average&lt;/td&gt;
&lt;td&gt;Average of a table expression or set of numbers&lt;/td&gt;
&lt;td&gt;Average([1,2,3,4]) → 2.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Degrees / Acos / Asin / Atan / Atan2 / Cos / Cot&lt;/td&gt;
&lt;td&gt;Trig functions (radians) and conversions&lt;/td&gt;
&lt;td&gt;Degrees(Acos(0.5)) → 60&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Text
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Concatenate / Concat&lt;/td&gt;
&lt;td&gt;Join strings or project and join from a table.&lt;/td&gt;
&lt;td&gt;Concatenate(“Hello “, “Holger”) → “Hello Holger”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;Convert values (like numbers/dates) to text with formatting.&lt;/td&gt;
&lt;td&gt;Text(1234.5, “[$-en-US]#,##0.00”) → “1,234.50”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Len&lt;/td&gt;
&lt;td&gt;Length of a string.&lt;/td&gt;
&lt;td&gt;Len(“Copilot”) → 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lower / Upper / Proper&lt;/td&gt;
&lt;td&gt;Case conversion.&lt;/td&gt;
&lt;td&gt;Proper(“microsoft copilot”) → “Microsoft Copilot”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Find&lt;/td&gt;
&lt;td&gt;Find substring position (1‑based).&lt;/td&gt;
&lt;td&gt;Find(“pilot”,”Copilot”) → 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replace / Substitute&lt;/td&gt;
&lt;td&gt;Replace text by position or substring.&lt;/td&gt;
&lt;td&gt;Substitute(“a-b-c”,”-“,”/”) → “a/b/c”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;StartsWith / EndsWith&lt;/td&gt;
&lt;td&gt;Prefix/suffix check.&lt;/td&gt;
&lt;td&gt;StartsWith(“Copilot”,”Co”) → true&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EncodeUrl / EncodeHTML&lt;/td&gt;
&lt;td&gt;URL/HTML escape text.&lt;/td&gt;
&lt;td&gt;EncodeUrl(“a b”) → “a%20b”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Char&lt;/td&gt;
&lt;td&gt;Return character from code.&lt;/td&gt;
&lt;td&gt;Char(10) → “\n” (newline)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Regex/Match&lt;/strong&gt; note: Some community posts mention gaps for Match/MatchAll support in Copilot Studio; validate in your environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logical &amp;amp; Conditional
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;And / Or / Not&lt;/td&gt;
&lt;td&gt;Boolean logic (&amp;amp;&amp;amp; also supported for And).&lt;/td&gt;
&lt;td&gt;And(5&amp;gt;3, 2&amp;lt;1) → false&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;If&lt;/td&gt;
&lt;td&gt;If/elseif/else branching. (Power Fx general)&lt;/td&gt;
&lt;td&gt;If(Score&amp;gt;=50,”pass”,”fail”) → “pass” (when Score=80)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Switch&lt;/td&gt;
&lt;td&gt;Multi-branch by matching a value. (Power Fx general)&lt;/td&gt;
&lt;td&gt;Switch(“B”,”A”,1,”B”,2,0) → 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coalesce&lt;/td&gt;
&lt;td&gt;First non‑blank value.&lt;/td&gt;
&lt;td&gt;Coalesce(Blank(), “fallback”) → “fallback”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boolean&lt;/td&gt;
&lt;td&gt;Convert text/number/dynamic to Boolean.&lt;/td&gt;
&lt;td&gt;Boolean(“true”) → true&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blank&lt;/td&gt;
&lt;td&gt;Produce a blank (NULL-like) value.&lt;/td&gt;
&lt;td&gt;Blank() → blank&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Date &amp;amp; Time
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Now / Today / UTCNow / UTCToday&lt;/td&gt;
&lt;td&gt;Current local/UTC date-time or date. (Power Fx general)&lt;/td&gt;
&lt;td&gt;Text(UTCNow(), “yyyy-mm-ddTHH:MMZ”) → e.g., “2026-01-16T13:11Z”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date / DateTime / Time&lt;/td&gt;
&lt;td&gt;Construct date/time values.&lt;/td&gt;
&lt;td&gt;Date(2026,1,16) → 2026-01-16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DateValue / TimeValue / DateTimeValue&lt;/td&gt;
&lt;td&gt;Parse date/time from text.&lt;/td&gt;
&lt;td&gt;DateTimeValue(“May 10, 2022 5:00 PM”) → 2022-05-10 17:00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Day / Month / Year / Hour / Minute / Second / Weekday&lt;/td&gt;
&lt;td&gt;Extract parts of a date-time.&lt;/td&gt;
&lt;td&gt;Year(Date(2026,1,16)) → 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DateAdd&lt;/td&gt;
&lt;td&gt;Add days/months/quarters/years.&lt;/td&gt;
&lt;td&gt;DateAdd(Date(2026,1,16), 30, Days) → 2026-02-15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DateDiff&lt;/td&gt;
&lt;td&gt;Difference between two dates in a unit.&lt;/td&gt;
&lt;td&gt;DateDiff(Date(2026,1,1), Date(2026,1,16), Days) → 15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EDate&lt;/td&gt;
&lt;td&gt;Add/subtract months, preserving day-of-month when possible.&lt;/td&gt;
&lt;td&gt;EDate(Date(2026,1,31), 1) → 2026-02-29 (leap handling varies by calendar)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TimeZoneOffset&lt;/td&gt;
&lt;td&gt;Minutes difference from UTC. (Power Fx general)&lt;/td&gt;
&lt;td&gt;TimeZoneOffset(Now()) → e.g., 60 for CET winter.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Tables, Records &amp;amp; Collections
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AddColumns&lt;/td&gt;
&lt;td&gt;Return a table with computed columns added.&lt;/td&gt;
&lt;td&gt;AddColumns([1,2,3], “Squared”, Value*Value) → [{Value:1,Squared:1}, …]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DropColumns&lt;/td&gt;
&lt;td&gt;Remove one or more columns.&lt;/td&gt;
&lt;td&gt;DropColumns(Table, “Temp”) → table without Temp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Count / CountA / CountIf / CountRows&lt;/td&gt;
&lt;td&gt;Count numbers, non‑blank, conditional, or rows.&lt;/td&gt;
&lt;td&gt;CountIf([1,2,3,4], Value&amp;gt;2) → 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filter&lt;/td&gt;
&lt;td&gt;Filter records by a predicate. (Power Fx general)&lt;/td&gt;
&lt;td&gt;Filter(Orders, Amount&amp;gt;1000) → high‑value orders&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search&lt;/td&gt;
&lt;td&gt;Full/partial text search on columns. (Power Fx general)&lt;/td&gt;
&lt;td&gt;Search(Products, “pro”, “Name”) → matching rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LookUp&lt;/td&gt;
&lt;td&gt;First record matching a condition. (Power Fx general)&lt;/td&gt;
&lt;td&gt;LookUp(Users, Email=”&lt;a href="mailto:a@b.com"&gt;a@b.com&lt;/a&gt;”) → record or blank&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distinct&lt;/td&gt;
&lt;td&gt;Unique values from a column/table.&lt;/td&gt;
&lt;td&gt;Distinct([1,1,2]) → [{Value:1},{Value:2}]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First / Last / FirstN / LastN / Index&lt;/td&gt;
&lt;td&gt;Access first/last/Nth items. (Power Fx general)&lt;/td&gt;
&lt;td&gt;Index([10,20,30],2) → 20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ForAll&lt;/td&gt;
&lt;td&gt;Iterate over a table and evaluate an expression. (Power Fx general)&lt;/td&gt;
&lt;td&gt;ForAll([1,2,3], Value*2) → [2,4,6]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AsType&lt;/td&gt;
&lt;td&gt;Treat a record reference as a specific table type.&lt;/td&gt;
&lt;td&gt;AsType(AnyRecord, MySchema) → typed record&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Column / ColumnNames&lt;/td&gt;
&lt;td&gt;Retrieve column names/values from a Dynamic value.&lt;/td&gt;
&lt;td&gt;ColumnNames(DynamicVar) → [“Col1”,”Col2”,…]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  JSON &amp;amp; Dynamic
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;Convert a value to a JSON text string. (Power Fx general)&lt;/td&gt;
&lt;td&gt;JSON({a:1}) → “{“a”:1}”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ParseJSON&lt;/td&gt;
&lt;td&gt;Parse JSON text to an untyped object. (Power Fx general)&lt;/td&gt;
&lt;td&gt;ParseJSON(“{"x":5}”) → dynamic with x=5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Colors (useful for UI-rich responses or cards)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ColorValue&lt;/td&gt;
&lt;td&gt;Parse CSS name or hex color.&lt;/td&gt;
&lt;td&gt;ColorValue(“#0078D4”) → color value&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ColorFade&lt;/td&gt;
&lt;td&gt;Fade a color by a percentage.&lt;/td&gt;
&lt;td&gt;ColorFade(ColorValue(“#000000”), 0.5) → 50% lighter&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Web/Encoding Utilities
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EncodeUrl&lt;/td&gt;
&lt;td&gt;URL-encode special characters.&lt;/td&gt;
&lt;td&gt;EncodeUrl(“a+b c”) → “a%2Bb%20c”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EncodeHTML&lt;/td&gt;
&lt;td&gt;Escape characters for HTML context.&lt;/td&gt;
&lt;td&gt;EncodeHTML(“&lt;strong&gt;”) → “&lt;b&gt;”&lt;/b&gt;&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Conversions &amp;amp; Misc
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example → Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dec2Hex&lt;/td&gt;
&lt;td&gt;Convert number to hexadecimal string.&lt;/td&gt;
&lt;td&gt;Dec2Hex(255) → “FF”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boolean&lt;/td&gt;
&lt;td&gt;Convert text/number/dynamic to true/false.&lt;/td&gt;
&lt;td&gt;Boolean(1) → true&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Copilot Studio usage tips (quick)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Variables&lt;/strong&gt; : Always prefix (System., Global., Topic.).&lt;br&gt;&lt;br&gt;
Example in a condition:&lt;br&gt;&lt;br&gt;
If(Global.TotalSales &amp;gt; 100000, “High”, “Normal”)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Literals&lt;/strong&gt; : Enter values in the proper literal format (strings in quotes, tables in [], records in { }).&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Number formatting&lt;/strong&gt; : Remember US decimal and comma separators when entering parameters and numeric literals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Power Fx Complements the LLM
&lt;/h2&gt;

&lt;p&gt;As discussed throughout this article, the architectural relationship between Power Fx and the large language model in Copilot Studio reflects a deliberate separation of concerns that addresses the distinct computational requirements of conversational agents in production environments.&lt;/p&gt;

&lt;p&gt;Power Fx provides several specific technical capabilities that prove essential when building agents that must operate reliably within enterprise constraints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic logic for calculations, data manipulations, conditional flows, and business rules.&lt;/strong&gt; When an agent needs to calculate tax amounts, apply discount rules, validate input against business constraints, or execute multi-step conditional logic, these operations must produce identical results every time the same inputs are provided. Power Fx expressions execute predictably, returning the same output for the same input state, which allows developers to reason about agent behavior with confidence. This repeatability becomes particularly important when agents handle financial calculations, regulatory compliance checks, or data transformations that feed into downstream systems, where inconsistencies could cause operational failures or data integrity issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Control over tables and collections, enabling reproducible filters, aggregations, and mappings.&lt;/strong&gt; Agents frequently work with structured data retrieved from Dataverse, SharePoint lists, SQL databases, or external APIs. Power Fx provides a functional programming model for manipulating these datasets, filtering records based on specific criteria, transforming record structures through projection operations, aggregating values across collections, and joining data from multiple sources. These operations occur within the agent runtime without requiring external service calls, reducing latency and simplifying the execution path. The table manipulation functions in Power Fx allow developers to express complex data operations concisely while maintaining clarity about what transformations occur at each step in the conversational flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A predictable mechanism for Adaptive Cards, Instructions, and Tool execution, integrating logic with user interface and automation components.&lt;/strong&gt; Adaptive Cards often display dynamic content computed from variables, user input, or system state. Power Fx expressions embedded within Adaptive Card definitions calculate values for display, control visibility of card elements, and format data appropriately for presentation. Similarly, when configuring instructions that guide the LLM’s behavior or when invoking tools and actions that connect to external systems, Power Fx expressions prepare parameters, validate preconditions, and transform results. This integration keeps computational logic close to the context where it executes, making the agent’s behavior more transparent during development and troubleshooting.&lt;/p&gt;

&lt;p&gt;This architectural division-where the language model handles natural language understanding, intent recognition, and response generation, while Power Fx handles computational operations, data manipulation, and business rule execution—establishes a foundation for building agents that are both conversationally capable and operationally reliable. The LLM component addresses the inherent ambiguity and variability of human language. In contrast, the Power Fx component ensures that the agent’s interactions with data and systems follow defined, testable, and auditable rules. This separation allows organizations to deploy conversational agents for production workloads where behavior must be consistent, outcomes must be traceable, and operations must comply with regulatory or security requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Importance in Agent Flows
&lt;/h2&gt;

&lt;p&gt;Agent flows in Copilot Studio represent the execution layer where autonomous agents interact with external systems and coordinate multi-step processes. These flows handle operations such as invoking HTTP endpoints, querying databases through connectors, triggering approval processes in Power Automate, and coordinating data exchange between Dataverse and third-party systems. Within this execution context, Power Fx functions as the computational layer that bridges conversational state and system integration requirements.&lt;/p&gt;

&lt;p&gt;Specifically, Power Fx expressions within agent flows address several technical requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data validation before action execution.&lt;/strong&gt; Before invoking an external API or updating a database record, the agent must verify that input parameters meet expected formats, fall within acceptable ranges, and satisfy business constraints. Power Fx expressions embedded in condition nodes or within action input configurations evaluate these constraints, preventing invalid data from reaching external systems. For example, validating that a submitted order total matches the sum of line items, or confirming that a date falls within an acceptable scheduling window.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parameter computation for system integrations.&lt;/strong&gt; External actions often require specific data formats, calculated identifiers, or transformed values. Power Fx formulas prepare these parameters by concatenating strings into required formats, converting data types, performing mathematical operations on numeric inputs, or extracting subsets of structured data. This transformation occurs within the agent flow itself, reducing the need for intermediate services or custom code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Routing decisions based on runtime state.&lt;/strong&gt; Agent flows frequently branch based on system variables, user permissions, data lookup results, or environmental conditions. Power Fx expressions in condition nodes evaluate these factors to determine execution paths, directing high-value transactions to approval workflows while auto-approving standard requests, routing requests to different backend systems based on geographic region, or selecting between cached and real-time data sources based on freshness requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistency across repeated executions.&lt;/strong&gt; Agent flows execute repeatedly across many user sessions, often handling similar requests with varying input parameters. Power Fx ensures that identical inputs produce identical computational results, making flow behavior predictable and testable. This consistency allows developers to validate flow logic through test cases and maintain confidence that production behavior matches development expectations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination of conversational agents, agent flows, and Power Fx creates a technical architecture in which natural-language interfaces trigger structured automation processes. Understanding Power Fx becomes necessary when building agents that must reliably interact with enterprise systems, enforce business rules during automated processes, or maintain data integrity across conversational and transactional boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As autonomous agents become increasingly central to enterprise automation, Power Fx has emerged not merely as a convenient formula language but as an essential architectural component that defines the boundary between conversational flexibility and computational precision. The relationship between Power Fx and the large language model in Copilot Studio represents a carefully considered separation of concerns: the LLM excels at interpreting natural language and managing conversational context, while Power Fx provides the deterministic execution layer for calculations, data transformations, business rule enforcement, and state management.&lt;/p&gt;

&lt;p&gt;This architectural division addresses fundamental requirements that enterprise agents must satisfy. When agents handle financial transactions, compliance-sensitive decisions, or integrations with systems of record, they require computational operations that produce identical results under identical conditions, logic that remains auditable and traceable, and behavior that remains consistent across thousands of user interactions. Power Fx delivers these guarantees through its functional programming model, explicit variable scoping with Topic, System, and Global prefixes, and comprehensive libraries for manipulating strings, dates, numbers, tables, and records.&lt;/p&gt;

&lt;p&gt;For enterprise makers working with Dynamics 365, Azure AI, Dataverse, and agent flows, Power Fx competency directly impacts the reliability, maintainability, and regulatory compliance of deployed solutions. Understanding how to implement conditional logic in condition nodes, transform data before action execution, validate input against business constraints, and prepare parameters for external system integrations determines whether conversational agents can transition from experimental prototypes to production workloads that handle critical business processes.&lt;/p&gt;

&lt;p&gt;The examples and patterns discussed throughout this article-from basic date formatting to complex table manipulations, from variable scoping rules to integration with Adaptive Cards and agent flows—form the foundation for building agents that are both conversationally capable and operationally reliable. As the Power Platform continues to evolve and autonomous agents take on increasingly sophisticated responsibilities, deep familiarity with Power Fx will remain a core competency for anyone designing enterprise-grade conversational automation within Microsoft’s ecosystem.&lt;/p&gt;

</description>
      <category>copilotstudio</category>
      <category>automation</category>
      <category>powerfx</category>
      <category>lowcode</category>
    </item>
    <item>
      <title>Building Intelligent, Agentic Applications in VS Code - A Technical Deep Dive into the AI Toolkit Extension Pack</title>
      <dc:creator>Holger Imbery</dc:creator>
      <pubDate>Sat, 27 Dec 2025 05:38:53 +0000</pubDate>
      <link>https://forem.com/holgerimbery/building-intelligent-agentic-applications-in-vs-code-a-technical-deep-dive-into-the-ai-toolkit-49ai</link>
      <guid>https://forem.com/holgerimbery/building-intelligent-agentic-applications-in-vs-code-a-technical-deep-dive-into-the-ai-toolkit-49ai</guid>
      <description>&lt;p&gt;&lt;em&gt;Technical overview and configuration guide (December 2025)&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary Lede&lt;/strong&gt; The Visual Studio Code AI Toolkit Extension Pack now delivers a complete local-to-cloud development workflow for intelligent, agentic applications—from interactive prototyping and multi-agent visualization to single-click deployment on Microsoft Foundry.&lt;br&gt;&lt;br&gt;
Recent updates expand the model catalog (including Anthropic Claude variants), introduce graph-based workflow visualization, and enable seamless conversion between declarative YAML and code-first agent implementations.&lt;br&gt;&lt;br&gt;
Combined with integrated tracing, evaluation frameworks, and MCP tool support, the toolkit transforms VS Code into a production-grade environment for building AI systems that are observable, testable, and operationally ready.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why read this article?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If you’re building AI agents or intelligent applications, this deep dive shows you how to move from local experimentation to production deployment without changing tools or rewriting workflows.&lt;/p&gt;

&lt;p&gt;You’ll learn the technical architecture behind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model selection&lt;/li&gt;
&lt;li&gt;Agent orchestration&lt;/li&gt;
&lt;li&gt;MCP tool integration&lt;/li&gt;
&lt;li&gt;Evaluation pipelines&lt;/li&gt;
&lt;li&gt;Runtime tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Complete with concrete YAML-to-code examples in Python and C#.&lt;/p&gt;

&lt;p&gt;Whether you’re prototyping your first agent or scaling multi-agent systems, this guide maps the capabilities that reduce friction, improve quality assurance, and accelerate time-to-value.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why This Matters Technologically
&lt;/h2&gt;

&lt;p&gt;Modern intelligent applications are rarely monolithic; they are composed of multiple agents, external tools, datasets, and evaluation loops. The latest update explicitly targets this reality by enabling &lt;strong&gt;local-to-cloud continuity&lt;/strong&gt; : developers can author agents, run and trace them locally in VS Code, and then deploy to Microsoft Foundry with one click—preserving workflow context for orchestration, visualization, and evaluation in the cloud.&lt;/p&gt;

&lt;p&gt;This unification reduces friction between prototyping and production, so the same artifacts (YAML workflows, agent code, test scaffolds, traces) remain coherent across environments.&lt;/p&gt;

&lt;p&gt;The Intelligent Apps documentation complements that with a detailed breakdown of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model Tools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent &amp;amp; Workflow Tools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP Workflow&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it clear how the extension organizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resources&lt;/strong&gt; (models, agents, MCP servers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt; (playground, bulk runs, evaluation, fine‑tuning, conversion, tracing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The effect is a predictable, composable development experience built on VS Code conventions, so teams can adopt AI features without retooling their entire stack.&lt;a href="https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio" rel="noopener noreferrer"&gt;Download the extension&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Business Benefits: From Experimentation to Operability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Faster Time-to-Value&lt;/strong&gt; :&lt;br&gt;&lt;br&gt;
Local debugging, hosted agent playgrounds, unified visualization, and single‑click deployment shorten the path from experimentation to a hosted, observable agent system. This reduces handoffs and rework, which are typical cost centers in AI projects transitioning to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Governance and Evaluability&lt;/strong&gt; :&lt;br&gt;&lt;br&gt;
The toolchain supports structured evaluation (metrics, tasks, datasets) and integrates tracing and graph visualization for multi‑agent workflows. This makes quality assessment and incident response more systematic, improving confidence in releases and reducing operational risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Choice and Portability&lt;/strong&gt; :&lt;br&gt;&lt;br&gt;
The model catalog spans hosted providers and local runtimes (ONNX, Ollama) and includes recent Anthropic models. Organizations can align model selection to privacy constraints, latency targets, and cost envelopes without changing development surfaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incremental Adoption&lt;/strong&gt; :&lt;br&gt;&lt;br&gt;
YAML‑based declarative workflows can be converted into code aligned with the Microsoft Agent Framework—enabling teams to start simple and progressively customize. This staged path mitigates the risk of over‑engineering early while preserving the option to extend later.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Detailed Overview of the VS Code AI Toolkit Extension
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio" rel="noopener noreferrer"&gt;extension&lt;/a&gt; exposes a structured VS Code view with distinct sections that map cleanly to an AI app’s lifecycle. Below is a technical tour of the major components.&lt;/p&gt;
&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Models&lt;/strong&gt; : Lists deployed and available models you can use in projects. It is the anchor for selecting runtime backends before entering playgrounds or builders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents&lt;/strong&gt; : Displays agents you have created or deployed through the toolkit. This centralizes agent artifacts used by downstream tools such as bulk runs and tracing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Servers&lt;/strong&gt; : Enumerates Model Context Protocol servers you’ve added, which provide tool‑use capabilities (databases, APIs, services). This turns agents from pure language generators into action‑taking systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Model Tools
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model Catalog&lt;/strong&gt; : A unified browser over models from GitHub, ONNX, Ollama, OpenAI, Anthropic, Google, and others. Engineers can compare options and evaluate tradeoffs before binding a model to an agent. The Ignite update explicitly calls out Anthropic Claude variants now accessible here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5e8w3jzmj1epmasht2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5e8w3jzmj1epmasht2v.png" alt="upgit_20251226_1766741165.png" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Playground&lt;/strong&gt; : An interactive chat/test surface for prompts, parameters, and multimodal inputs. It’s designed for rapid hypothesis testing—ideal before formalizing agent instructions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv4zenkn47c88gjevvg3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv4zenkn47c88gjevvg3.png" alt="upgit_20251226_1766741269.png" width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversion&lt;/strong&gt; : Utilities for converting, quantizing, and optimizing pretrained models for local execution (CPU/GPU/NPU). This aids portability and cost control, particularly for edge scenarios.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Fine‑tuning&lt;/strong&gt; : Workflows to adapt foundation models using custom datasets either locally (GPU) or in Azure Container Apps (GPU). This allows domain specialization without abandoning the VS Code environment.&lt;/p&gt;
&lt;h3&gt;
  
  
  Agent &amp;amp; Workflow Tools
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Agent Builder&lt;/strong&gt; : Previously known as Prompt Builder, this now emphasizes agent construction—authoring instructions, integrating tools (MCP servers), and emitting production‑ready code with structured outputs. Engineers can scaffold agents in Python or .NET, add function/tool calls, and iterate interactively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrv1g1kcx2qozbt4mpa6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrv1g1kcx2qozbt4mpa6.png" alt="upgit_20251226_1766741356.png" width="800" height="510"&gt;&lt;/a&gt; &lt;strong&gt;Bulk Run&lt;/strong&gt; : Batch execution across multiple models or prompts to compare outputs at scale. This is essential for regression testing and prompt robustness analysis.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Evaluation&lt;/strong&gt; : Dataset‑driven assessment using metrics such as relevance, similarity, coherence, and task‑specific criteria. It complements bulk runs to form a measurable confidence baseline.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tracing&lt;/strong&gt; : Runtime telemetry over reasoning steps, tool calls, and latency hotspots. When paired with multi‑agent visualization, tracing enables efficient root‑cause analysis and performance tuning.&lt;/p&gt;
&lt;h3&gt;
  
  
  MCP Workflow
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Add MCP Server / Create New MCP Server&lt;/strong&gt; : Discover featured MCP servers, validate local Node/Python environments, and scaffold new servers. This extends agents with secure, composable capabilities to query systems, invoke APIs, and manipulate external state.&lt;/p&gt;
&lt;h3&gt;
  
  
  Visualization and Round‑Trip Development
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Graph Visualizer&lt;/strong&gt; : The Ignite update introduces interactive visualization of multi‑agent workflows, illuminating node execution and connections. This makes debugging and understanding complex orchestration feasible within the editor.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Local‑to‑Cloud Roundtrip&lt;/strong&gt; : Seamless switching between VS Code and the Foundry portal for YAML workflows, playgrounds, and templates—plus single‑click deployment. This aligns dev, ops, and evaluation artifacts without format drift.&lt;/p&gt;
&lt;h2&gt;
  
  
  Development Workflow: From YAML to Code and Back
&lt;/h2&gt;

&lt;p&gt;A pragmatic pattern for teams is to begin with declarative YAML workflows in Foundry for clarity and quick iteration. When customization demands increase—e.g., custom tool‑use, advanced control flow, or instrumentation—convert YAML to Agent Framework code and continue in Agent Builder. GitHub Copilot scaffolding helps produce maintainable code with tracing hooks and test scaffolding. This reduces context switching and preserves lineage from declarative prototypes to code‑first implementations.&lt;/p&gt;
&lt;h2&gt;
  
  
  Evaluation, Quality, and Operability
&lt;/h2&gt;

&lt;p&gt;Robust AI systems demand measurable quality. The toolkit’s Evaluation module standardizes dataset‑based comparisons and metric computation, and the Tracing module surfaces runtime behaviors, tool invocations, and timing. In multi‑agent setups, the graph visualizer further supports operational debugging. Collectively, these capabilities enable repeatable tests, incident triage, and performance baselining before and after deployment.&lt;/p&gt;
&lt;h2&gt;
  
  
  Model Strategy and Local Execution
&lt;/h2&gt;

&lt;p&gt;The extension’s catalog allows mixing hosted providers with local execution via ONNX and Ollama. Local modes are valuable for privacy (data residency), latency (edge), and cost (avoiding per‑token charges for specific workloads). Where domain‑specific performance is needed, fine‑tuning in local GPUs or Azure Container Apps helps align responses to proprietary data without a wholesale platform shift. Model conversion ensures the operational footprint matches target hardware, including NPU acceleration paths.&lt;/p&gt;
&lt;h2&gt;
  
  
  Practical Onboarding
&lt;/h2&gt;

&lt;p&gt;Installation is straightforward through the Marketplace; once installed, the AI Toolkit icon appears in the Activity Bar. The Get Started walkthrough introduces you to playground usage and guides you through key views. From there, the usual order of operations is: pick a model in the Catalog, validate prompts in the Playground, construct an agent in Agent Builder (adding MCP tools where necessary), test at scale with Bulk Run, evaluate with datasets and metrics, trace runtime behavior, and finally deploy to Foundry for managed orchestration and visualization.&lt;/p&gt;
&lt;h2&gt;
  
  
  YAML to Code Example: Customer Support Agent with MCP Tools
&lt;/h2&gt;

&lt;p&gt;A minimal, illustrative example that shows how a declarative YAML workflow (as you’d author in Microsoft Foundry and open in VS Code) can be “converted” into a code-first agent implementation in Python using the same concepts—model selection, agent instructions, tool use via MCP servers, tracing, and structured outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; : The YAML schema for Foundry agent workflows can evolve; the sample below is intentionally simplified to make the mapping clear. In practice, you’d open YAML workflows in VS Code and iterate or move to code using the AI Toolkit’s Agent Builder with Copilot-assisted scaffolding. The latest update explicitly highlights YAML workflows and code-first customization, MCP tool integration, and multi-agent visualization.&lt;/p&gt;
&lt;h3&gt;
  
  
  YAML (declarative) — simplified agent workflow
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# file: customer_support_agent.yaml
version: 0.3
workflow:
  name: customer_support_agent
  description: &amp;gt;
    Respond to customer queries by retrieving account/order context
    and grounding the answer in internal knowledge and web documentation.

runtime:
  model: anthropic/claude-sonnet-4.5 # Example model from the catalog
  temperature: 0.2
  max_tokens: 1024
  trace: true

inputs:
  - name: user_query
    type: string
    required: true
  - name: user_id
    type: string
    required: true

instructions:
  role: system
  content: |
    You are a customer support agent. Always ground answers in retrieved data.
    If data is missing or uncertain, ask a clarifying question.
    Return a structured JSON with fields: `answer`, `sources`, `confidence`.

tools:
  - type: mcp
    name: mcp-postgres
    title: "Customer DB"
    actions:
      - id: fetch_user_context
        description: "Get profile, subscription status, recent orders."
        parameters:
          user_id: string
  - type: mcp
    name: mcp-knowledgebase
    title: "Internal KB"
    actions:
      - id: kb_search
        description: "Search internal support articles"
        parameters:
          query: string
  - type: http
    name: web_docs
    base_url: "https://example-docs.company.com/api/search"
    actions:
      - id: external_doc_search
        parameters:
          query: string

orchestration:
  steps:
    - id: get_context
      call: fetch_user_context
      with:
        user_id: ""
    - id: kb
      call: kb_search
      with:
        query: ""
    - id: docs
      call: external_doc_search
      with:
        query: ""
    - id: answer
      llm:
        prompt: |
          User question: 
          Context: 
          KB hits: 
          External docs: 
          Compose an actionable answer. Include citations and confidence score.
      output_schema:
        type: object
        properties:
          answer: { type: string }
          sources: { type: array, items: { type: string } }
          confidence: { type: number }

policies:
  grounding: required
  pii_handling: redaction

outputs:
  from_step: answer

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What this conveys:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model/runtime settings (model name, temperature, tracing).&lt;/li&gt;
&lt;li&gt;Inputs the agent needs (user query, user ID).&lt;/li&gt;
&lt;li&gt;System instructions guiding behavior.&lt;/li&gt;
&lt;li&gt;Tools (two MCP servers + an HTTP tool) and action signatures.&lt;/li&gt;
&lt;li&gt;Orchestration that calls tools, then composes a final LLM answer with a structured schema.&lt;/li&gt;
&lt;li&gt;Policies for grounding and PII handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These elements align with the AI Toolkit guidance on agent instructions, MCP tool use, and structured outputs; YAML workflows are explicitly highlighted in the latest update to facilitate easy editing and conversion for code-first customization.&lt;/p&gt;
&lt;h3&gt;
  
  
  Equivalent code (Python) — agent, tools, tracing, structured output
&lt;/h3&gt;

&lt;p&gt;Below is a conceptual Python implementation that mirrors the YAML’s behavior. It registers tools (including MCP-backed actions), defines agent instructions, handles orchestration, and returns a typed result. In VS Code, you’d typically scaffold this in Agent Builder and refine it with Copilot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# file: customer_support_agent.py
from dataclasses import dataclass
from typing import List, Dict, Any, Optional

# --- Model wrapper (conceptual) ---
class LLM:
    def __init__ (self, name: str, temperature: float = 0.2, max_tokens: int = 1024, trace: bool = True):
        self.name = name
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.trace = trace

    def invoke(self, system: str, prompt: str, tools_context: Dict[str, Any]) -&amp;gt; Dict[str, Any]:
        """
        Invoke the model with system instructions + composed prompt.
        Tracing hooks would capture tool calls, token usage, latency, etc.
        In AI Toolkit, tracing is integrated; here we show conceptually.
        """
        # ... integrate with your runtime (e.g., Anthropic/OpenAI via AI Toolkit)
        # Return a structured JSON per output schema.
        return {
            "answer": "Here is a grounded response with steps and references...",
            "sources": tools_context.get("sources", []),
            "confidence": 0.82
        }

# --- MCP tool interfaces (conceptual stubs) ---
@dataclass
class MCPAction:
    id: str
    description: str

class MCPServer:
    def __init__ (self, name: str, title: str, actions: List[MCPAction]):
        self.name = name
        self.title = title
        self.actions = {a.id: a for a in actions}

    def call(self, action_id: str, **kwargs) -&amp;gt; Dict[str, Any]:
        # In VS Code, you configure MCP servers in Agent Builder; the runtime validates
        # your Node/Python environment and lets agents call tools programmatically.
        # Here we simulate dispatch.
        # e.g., fetch_user_context(user_id=...), kb_search(query=...), etc.
        if action_id == "fetch_user_context":
            return {"user": {"id": kwargs["user_id"], "tier": "Pro"}, "orders": [{"id": "O-1001", "status": "shipped"}]}
        if action_id == "kb_search":
            return {"hits": [{"id": "KB-42", "title": "Reset password"}, {"id": "KB-7", "title": "Refund policy"}]}
        raise KeyError(f"Unknown action: {action_id}")

# --- HTTP tool (simplified) ---
class HttpTool:
    def __init__ (self, base_url: str):
        self.base_url = base_url
    def get(self, endpoint: str, **params) -&amp;gt; Dict[str, Any]:
        # Replace with real HTTP calls or use an MCP wrapper that performs HTTP.
        return {"hits": [{"url": "https://example-docs.company.com/articles/abc", "title": "Policy overview"}]}

# --- Structured output schema ---
@dataclass
class AgentOutput:
    answer: str
    sources: List[str]
    confidence: float

# --- Agent implementation ---
class CustomerSupportAgent:
    def __init__ (self, model_name: str = "anthropic/claude-sonnet-4.5"):
        self.llm = LLM(name=model_name, temperature=0.2, max_tokens=1024, trace=True)
        # Tool registry mirrors YAML:
        self.db = MCPServer(
            name="mcp-postgres",
            title="Customer DB",
            actions=[MCPAction(id="fetch_user_context", description="Get profile, subscription status, recent orders.")]
        )
        self.kb = MCPServer(
            name="mcp-knowledgebase",
            title="Internal KB",
            actions=[MCPAction(id="kb_search", description="Search internal support articles")]
        )
        self.web = HttpTool(base_url="https://example-docs.company.com/api/search")

        # System instructions (from YAML 'instructions'):
        self.system_instructions = (
            "You are a customer support agent. Always ground answers in retrieved data. "
            "If data is missing or uncertain, ask a clarifying question. "
            "Return a structured JSON with fields: `answer`, `sources`, `confidence`."
        )

    def run(self, user_query: str, user_id: str) -&amp;gt; AgentOutput:
        # Orchestration steps:

        # 1) fetch_user_context
        context = self.db.call("fetch_user_context", user_id=user_id)

        # 2) kb_search
        kb_hits = self.kb.call("kb_search", query=user_query)

        # 3) external_doc_search (HTTP, simplified)
        docs_hits = self.web.get("/search", query=user_query)

        # Compose prompt (YAML step 'answer.llm.prompt')
        prompt = (
            f"User question: {user_query}\n"
            f"Context: {context}\n"
            f"KB hits: {kb_hits}\n"
            f"External docs: {docs_hits}\n"
            "Compose an actionable answer. Include citations and confidence score.\n"
        )

        # Invoke the LLM with tracing enabled (per runtime.trace)
        result = self.llm.invoke(
            system=self.system_instructions,
            prompt=prompt,
            tools_context={
                "sources": [
                    *(h.get("id") for h in kb_hits.get("hits", [])),
                    *(hit.get("url") for hit in docs_hits.get("hits", []))
                ]
            }
        )

        # Validate against the expected schema
        return AgentOutput(
            answer=result["answer"],
            sources=result.get("sources", []),
            confidence=float(result.get("confidence", 0.0))
        )

# --- Example usage ---
if __name__ == " __main__":
    agent = CustomerSupportAgent()
    output = agent.run(user_query="How do I update my billing address?", user_id="U-12345")
    print(output)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;System Instructions and Runtime Configuration&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The system instructions and runtime parameters (model selection, temperature, tracing) translate directly from YAML declarations into code constructor arguments or instance state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Tool Integration&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
MCP tools defined in YAML become instantiated server objects with callable actions in your code. VS Code’s Agent Builder validates the execution environment and assists with configuring featured MCP servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestration Flow&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The orchestration steps declared in YAML map to explicit, ordered method calls in code. These calls retrieve context from tool results and ultimately compose an LLM invocation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured Output Enforcement&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The YAML output schema corresponds to a typed data structure (such as an &lt;code&gt;AgentOutput&lt;/code&gt; dataclass in Python or record in C#), ensuring type safety and schema compliance at the agent boundary.&lt;/p&gt;
&lt;h3&gt;
  
  
  How you’d do this inside VS Code AI Toolkit
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open the YAML workflow in VS Code&lt;/strong&gt; (via Foundry extension) and review steps, tools, and instructions. The latest update enables round‑trip editing and single‑click deployment between VS Code and Foundry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use Agent Builder to scaffold an agent&lt;/strong&gt; : &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paste the system instructions&lt;/li&gt;
&lt;li&gt;Set model/runtime parameters&lt;/li&gt;
&lt;li&gt;Attach MCP servers (featured MCP servers are discoverable and validated)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Translate orchestration steps&lt;/strong&gt; : &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For each YAML step, create corresponding code functions/method calls&lt;/li&gt;
&lt;li&gt;Copilot can generate the scaffolds and function signatures aligned with best‑practice patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run locally with tracing&lt;/strong&gt; and iterate in the Hosted Agents Playground: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the graph visualizer to understand multi‑agent flows&lt;/li&gt;
&lt;li&gt;Debug step execution with integrated tracing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Evaluate using datasets and metrics&lt;/strong&gt; : &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test relevance, similarity, coherence, or task‑specific criteria&lt;/li&gt;
&lt;li&gt;When the agent meets acceptance criteria, deploy to Foundry to manage orchestration and enterprise hosting&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Optional: C# skeleton for the same agent
&lt;/h3&gt;

&lt;p&gt;If your runtime preference is .NET, here’s a skeleton showing the exact mapping. In VS Code, you’d choose the language when scaffolding the agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// file: CustomerSupportAgent.cs (conceptual)
public sealed class CustomerSupportAgent
{
    private readonly Llm _llm;
    private readonly McpServer _db;
    private readonly McpServer _kb;
    private readonly HttpTool _web;
    private readonly string _systemInstructions =
        "You are a customer support agent... Return JSON with `answer`, `sources`, `confidence`.";

    public CustomerSupportAgent(string modelName = "anthropic/claude-sonnet-4.5")
    {
        _llm = new Llm(modelName, temperature: 0.2, maxTokens: 1024, trace: true);
        _db = McpServer.Create("mcp-postgres")
                       .WithAction("fetch_user_context", "Get profile, subscription status, recent orders.");
        _kb = McpServer.Create("mcp-knowledgebase")
                       .WithAction("kb_search", "Search internal support articles");
        _web = new HttpTool("https://example-docs.company.com/api/search");
    }

    public AgentOutput Run(string userQuery, string userId)
    {
        var context = _db.Call("fetch_user_context", new { user_id = userId });
        var kbHits = _kb.Call("kb_search", new { query = userQuery });
        var docs = _web.Get("/search", new { query = userQuery });

        var prompt = $@"User question: {userQuery}
Context: {Serialize(context)}
KB hits: {Serialize(kbHits)}
External docs: {Serialize(docs)}
Compose an actionable answer. Include citations and confidence score.";

        var result = _llm.Invoke(_systemInstructions, prompt, toolsContext: new {
            sources = ExtractSources(kbHits, docs)
        });

        return new AgentOutput(result.answer, result.sources, (double)result.confidence);
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mapping cheat‑sheet (YAML → Code)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;YAML Element&lt;/th&gt;
&lt;th&gt;Code Mapping&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;runtime.model&lt;/code&gt;, &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;max_tokens&lt;/code&gt;, &lt;code&gt;trace&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;LLM constructor/fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;instructions.role: system&lt;/code&gt; / &lt;code&gt;content&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Agent’s system prompt string passed as &lt;code&gt;system&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;tools.type: mcp&lt;/code&gt; and &lt;code&gt;actions&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;MCP server objects with callable actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;orchestration.steps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ordered function/method calls culminating in &lt;code&gt;llm.invoke(...)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;output_schema&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dataclass/DTO enforcing structured return types&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The VS Code AI Toolkit Extension Pack, especially with the latest update, provides a comprehensive environment for building intelligent, agentic applications that span local experimentation to cloud deployment.&lt;/p&gt;

&lt;p&gt;By supporting declarative YAML workflows alongside code-first implementations in Python and C#, it offers flexibility for teams at different maturity levels.&lt;/p&gt;

&lt;p&gt;The integration of MCP tools, tracing, evaluation, and multi-agent visualization addresses key challenges in operability and quality assurance.&lt;/p&gt;

&lt;p&gt;This deep dive illustrated how to leverage these capabilities effectively, empowering developers to create robust AI systems with confidence and efficiency.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>development</category>
      <category>vscode</category>
      <category>aitoolkit</category>
    </item>
  </channel>
</rss>
