Forem: Holger Imbery

Power Platform Governance: A Practitioner's Reference (Part 2)

Holger Imbery — Sat, 23 May 2026 00:00:00 +0000

Summary lede. Part 2 of this two-article reference picks up where last week’s foundations ended. With the platform perimeter in place — environments, Managed Environments, environment groups, DLP, tenant isolation, identity, and monitoring — this article works through the resources that live inside that perimeter and the processes that move them through it: Copilot Studio agents, agent authentication and connector governance, Application Lifecycle Management and pipelines, solution checker and quality gates, licensing and capacity governance, and change management and release rings. A closing recap then ties both articles together into a single posture.

Who this is for. Same audience as part 1 — Power Platform administrators, security and compliance officers, Center of Excellence leads, architects, and makers. Part 2 assumes you have read part 1 or have an equivalent grasp of environment design, Managed Environments, environment groups, and DLP. Where part 1 answered “what is the perimeter and how do we hold it?”, part 2 answers “what gets built inside it, and how does that work move from a maker’s hands into production safely?” The format is unchanged: each control begins with the reason it exists, moves to the how, and points you to the authoritative Microsoft Learn page.

A note on the kits (continues from part 1).

Two pieces of widely-deployed community tooling have changed status: the CoE Starter Kit is no longer actively maintained — its core capabilities are now part of the Power Platform admin center (Inventory, Usage, Monitor, Actions) — and the ALM Accelerator for Power Platform is formally deprecated, with Power Platform Pipelines as the named replacement. Sections 9.14 (agent inventory) and 11 (ALM and pipelines) reflect that. The detailed write-up of the CoE Starter Kit transition lives in part 1, section 8.6; the ALM Accelerator transition is summarised in section 11.6 of this part.

Recap: where part 1 left off

As we saw last week, platform-level governance is the foundation that everything else rests on. Part 1 walked through eight sections of foundations:

Environment strategy. Environments are the unit of governance because DLP, security roles, capacity, and compliance boundaries all follow environment lines. A topology that mirrors your delivery lifecycle (default for personal productivity, dev/test/prod for solutions, dedicated CoE and training environments) gives every later control a place to attach.
The default environment. It cannot be deleted, every licensed user is a maker in it, and Dataverse cannot be removed once added. Renaming it, attaching a restrictive DLP policy, enabling default environment routing, and converting it to a Managed Environment turn it from a shadow-IT magnet into a known, bounded surface.
Managed Environments. A premium governance layer that adds sharing limits, the weekly digest, solution checker enforcement, maker welcome content, pipelines hosting, IP firewall, extended backup, and DLP for desktop flows. Most of part 2’s controls depend on Managed Environments being on.
Environment groups and rules. Groups apply policy in bulk to Managed Environments. When a rule is published at the group level, the corresponding setting becomes read-only inside each member environment, which is what makes policy enforceable rather than advisory.
Data Loss Prevention. Connectors are sorted into Business, Non-business, and Blocked. Data cannot flow between Business and Non-business in the same resource. Endpoint filtering and connector action control let you keep useful connectors while blocking their dangerous edges.
Tenant isolation and cross-tenant controls. Restricts which external Entra ID tenants can be the identity source for connections. Defaults differ by tenant age — tenants created on or after 30 March 2026 ship with isolation enabled and both directions defaulting to deny.
Security roles and identity. Layered authorization: Entra ID at the tenant, Dataverse roles inside each environment, service principals for non-interactive work, Conditional Access and MFA on the Power Platform API.
Monitoring, analytics, and tenant inventory. Built-in admin center analytics (the Inventory , Usage , Monitor , and Actions experiences), the weekly digest, Microsoft Purview audit log, Application Insights export, and Microsoft Sentinel for SOC integration. The CoE Starter Kit is no longer actively maintained; its inventory and attestation scenarios are now Microsoft’s responsibility in the admin center, reachable through the Power Platform inventory API and the Power Platform for Admins V2 connector for any custom automation.

The minimum viable foundation from part 1 was: restrict default environment creation, attach a tenant-wide baseline DLP policy, enable default environment routing with a Managed Environments group, turn on tenant-level analytics, lean on the Power Platform admin center’s Inventory , Usage , Monitor , and Actions experiences (the CoE Starter Kit is no longer actively maintained), and verify your tenant isolation defaults. Everything in part 2 assumes that foundation is in place. The section numbering continues from part 1, starting at section 9.

9. Governing Microsoft Copilot Studio

Copilot Studio agents are Power Platform resources. They are stored in environments, secured with Dataverse, shared through the same sharing model as apps and flows, and subject to DLP policies. A Copilot Studio agent is not a separate governance domain; it is a new resource type inside the platform perimeter. The controls below are specific to agents and combine with the platform-wide ones covered in part 1.

9.1 Decide agent posture in tenant settings

Why. The most basic question — “should agents exist in this tenant at all, and where?” — is answered at the tenant level. Skipping this step makes every later control fight a current.

How. In Power Platform admin center > Settings > Tenant settings , control whether Copilot Studio is enabled, who can create agents, whether agents can be created in the default environment, and which AI features (generative answers, autonomous triggers, external models) are available.

Learn more. Tenant settings reference.

9.2 Use the Copilot Studio admin center for agent-specific controls

Why. Some agent controls live outside the Power Platform admin center because they are specific to Copilot Studio’s data model.

How. Visit admin.copilotstudio.microsoft.com to control sharing org-wide, tenant-wide data access settings for agents, and transcript retention.

Learn more. Copilot Studio admin center.

9.3 Restrict who can share agents tenant-wide

Why. “Share with my entire organization” is a one-click action that bypasses normal sharing limits. Without restriction, any maker can publish a tenant-wide agent.

How. The recent tenant setting “Choose who has permission to share agents with your entire organization” allows three modes: all users, no users, or specific groups. Restrict to a CoE or approvals group for production-grade sharing.

Learn more. Control how agents are shared.

9.4 Cap agent sharing per environment

Why. Even with tenant-wide sharing locked down, intra-tenant oversharing is still possible. Sharing limits cap blast radius per agent.

How. In a Managed Environment, apply Limit sharing to cap viewers per agent and disable sharing via security groups. Mechanics are in part 1, section 3.

Learn more. Limit sharing.

9.5 Apply group rules for agent sharing

Why. Sharing limits per environment are tedious to manage at scale. Group rules let you apply the same sharing posture across many environments at once.

How. Use the Sharing agents with Editor permissions and Sharing agents with Viewer permissions environment group rules.

Learn more. Rules for environment groups.

9.6 Set maker welcome content for Copilot Studio

Why. Most makers entering Copilot Studio will not have read your governance documentation. The welcome banner is the cheapest place to put guardrails in front of them.

How. Configure maker welcome content in Managed Environments to surface agent guidelines as makers enter Copilot Studio.

Learn more. Copilot Studio security and governance.

9.7 Run solution checker on agent solutions

Why. Static analysis catches structural problems in agents the same way it does in apps and flows — missing fallbacks, knowledge sources without authentication, references to blocked connectors.

How. Solution checker runs at solution import in Managed Environments. Mechanics are covered in section 12.

Learn more. Solution checker enforcement.

9.8 Govern AI features per environment group

Why. Generative and external AI features carry different risk profiles in different environments. Production with regulated data needs a stricter stance than a pilot environment.

How. Use the AI-powered Copilot features, Generative AI settings, External models, AI prompts, and Preview and experimental AI models rules to turn capabilities off where they should not be available.

Learn more. Rules for environment groups.

9.9 Control transcript access

Why. Conversation transcripts can contain sensitive prompts and customer data. Default-on transcript retention is a poor fit for regulated environments.

How. The Accessing transcripts from conversations in Copilot Studio agents rule turns transcript retention off for environments in regulated groups.

Learn more. Rules for environment groups.

9.10 Block high-risk agent connectors in DLP

Why. DLP on agent connectors is the only way to stop entire classes of bad behavior at design time — public agents, ad-hoc HTTP, autonomous triggers — without playing whack-a-mole.

How. Common blocks for regulated tenants:

Chat without Microsoft Entra ID authentication in Copilot Studio — prevents publishing public, unauthenticated agents.
Direct Line channels in Copilot Studio — block if only Teams, SharePoint, and Microsoft 365 Copilot channels are approved.
Skills — prevents invocation of arbitrary Bot Framework skills.
HTTP requests — prevents ad-hoc outbound calls from agents.
Event triggers — disables autonomous agents until they are approved.

Enforcement is real time; makers see validation errors while authoring.

Learn more. Configure data policies for agents.

9.11 Constrain knowledge sources

Why. Knowledge sources are how agents reach data. A loose knowledge configuration is a loose data perimeter.

How. Apply scope-aware controls to each source type:

A broad SharePoint URL grants the agent the site’s entire content tree — to makers who already have access. Grant scope at the document library or folder level where possible.
Public website knowledge is fetched at runtime; treat it as untrusted input for prompt-injection exposure.
Files uploaded as knowledge are stored in the agent’s Dataverse tables and inherit environment backup and data residency.

Use sensitivity labels and Microsoft Purview to label and classify content before it is indexed.

Learn more. Copilot Studio security and governance.

9.12 Pick an authentication mode per agent

Why. Authentication mode determines whether sharing limits apply at all. Sharing limits apply only to agents that require authentication; public agents bypass the sharing model entirely.

How. Three modes exist:

Authenticate with Microsoft (Entra ID, default) — recommended for regulated environments.
Authenticate manually — generic OAuth.
No authentication — public; not recommended for tenants with sensitive data.

Require Entra ID authentication in regulated environments by blocking the no-authentication connector in DLP. Blocking no-authentication is the prerequisite for sharing limits to be meaningful at all.

Learn more. Configure user authentication.

9.13 Approve a deployment channel matrix

Why. Each channel has its own risk profile and operational model. Without an approved matrix, channel selection becomes a maker-by-maker decision.

How. Enforce an approved channel matrix through tenant settings and DLP:

Channel	Control
Microsoft Teams	Admin-approved app catalog; manifests are solution-aware
Microsoft 365 Copilot	Tenant Copilot extension governance
SharePoint	Site permissions; viewer licensing applies at runtime
Custom website	Block for regulated environments; pair with IP allow-list
Direct Line	Block by default

Learn more. Copilot Studio security and governance.

9.14 Monitor agents with the right surface for the question

Why. Different questions have different right surfaces. Behavior questions, audit questions, and inventory questions each have a primary tool.

How. Match the surface to the question:

Agent analytics in Copilot Studio for sessions, resolution rate, and escalations.
Copilot Studio audit logs in Microsoft Purview for create, publish, share, knowledge change, and tool add events.
Application Insights export (Managed Environments) for detailed traces.
Power Platform admin center Inventory (and the Power Platform inventory API / Power Platform for Admins V2 connector) for owner, environment, authentication mode, and last publish across all agents in the tenant. This is the replacement surface for the CoE Starter Kit’s Power Platform agents inventory table, which is no longer actively maintained.

Learn more. Copilot Studio audit logs in Purview, Power Platform inventory API.

9.15 Treat autonomous agents as a higher-risk class

Why. Autonomous (event-triggered) agents run without interactive sessions, which expands the attack surface and removes the human-in-the-loop signal that interactive agents give you.

How. Keep autonomous agents in a dedicated environment, require code review on triggers, and disable them with the Event triggers DLP connector in environments that are not ready.

Learn more. Configure data policies for agents.

9.16 Govern MCP servers as connectors

Why. Model Context Protocol (MCP) servers extend agents with external tools. From a governance standpoint, each MCP server is an external connector; treating it informally creates an unaudited integration path.

How. Treat each MCP server as a custom connector for governance: review the server code or vendor, restrict outbound network paths, and block ad-hoc registration by makers.

Learn more. Agents and MCP.

10. Agent Authentication and Connector Governance

Agents built in Copilot Studio interact with data and services through connectors, knowledge sources, and tools. Connector governance for agents uses the same Power Platform DLP engine applied to Power Apps and Power Automate, but a few controls are agent-specific.

10.1 Choose an authentication mode

Why. Authentication mode determines who can use the agent and whether platform sharing controls apply at all.

How. Pick one of three modes:

Authenticate with Microsoft (default) — uses Microsoft Entra ID. The agent requires sign-in; sharing limits apply.
Authenticate manually — generic OAuth 2.0 (Okta, Auth0, custom IdP). Agent requires sign-in but identity lives outside Entra ID.
No authentication — public link or embedded web chat. Sharing limits are bypassed.

Block the connector Chat without Microsoft Entra ID authentication in Copilot Studio in the tenant baseline DLP to remove the No authentication option entirely.

Learn more. Configure user authentication.

10.2 Use Entra Agent ID for outbound calls

Why. Older agent integrations relied on stored client secrets, which are awkward to rotate and easy to leak. Entra Agent ID replaces that pattern with a managed identity model that the platform owns.

How. Each Copilot Studio agent is automatically provisioned with its own Entra Agent ID — a Microsoft Entra service principal of subtype Agent — when the agent is created. The agent presents this identity on outbound calls. Administrators do not configure trusted app registrations for “agent federation”; the Agent ID is managed by the platform and visible in the Entra ID portal under enterprise applications. Where legacy connectors still rely on stored secrets, rotate them through Azure Key Vault.

Learn more. Entra Agent ID for Copilot Studio agents.

10.3 Block high-risk connectors

Why. A handful of connectors account for most of the risk in agent tooling. Blocking them by default forces every exception to be justified.

How. Common blocks for regulated tenants:

HTTP, HTTP with Microsoft Entra ID (or endpoint-filter them to an allow-list).
Chat without Microsoft Entra ID authentication in Copilot Studio.
Skills (Bot Framework skills as tools).
Event triggers (autonomous agents).
Custom connectors unless catalog-approved.

Learn more. Configure data policies for agents.

10.4 Use action control for partial approval

Why. Some connectors are too useful to block entirely but too dangerous to allow in full. Action control gives you a middle ground.

How. Allow read actions and block write actions on high-risk connectors — for example, allow Get rows on SQL Server but block Execute a SQL query.

Learn more. Connector action control.

10.5 A scripted policy baseline

Why. A baseline that lives only in the UI is not portable across tenants and not testable in code review. PowerShell-defined baselines fix both problems.

How. A starter snippet:

$policy = New-DlpPolicy -DisplayName "Agent baseline"
Add-ConnectorToBusinessDataGroup -PolicyName $policy.Name -ConnectorName shared_sharepointonline
Add-ConnectorToBlockedGroup -PolicyName $policy.Name -ConnectorName shared_logicflows_http
Set-DlpPolicyDefaultConnectorGroup -PolicyName $policy.Name -DefaultGroup Blocked

Learn more. DLP PowerShell cmdlets.

10.6 Govern tools, prompts, and MCP

Why. Agents do more than call connectors. Prompt content and external tools are governance surfaces in their own right; ignoring them leaves a hole the connector model alone cannot close.

How. Govern each surface:

Generative orchestration with built-in prompts — control via the AI prompts and Generative AI settings rules.
Custom prompts (prompt columns, AI Builder prompts) — audit through Dataverse.
MCP servers as external tools — treat each server as a connector and inventory via the Copilot Studio admin center.

Learn more. Agents and MCP.

10.7 Constrain knowledge sources upstream

Why. Knowledge sources are the easiest way for an agent to reach data the maker should not be able to reach. Upstream labels and authentication requirements close that gap.

How. Apply layered controls:

Require authenticated SharePoint sources; block public websites in regulated environments.
Apply Microsoft Purview sensitivity labels upstream; agents honor label-based access for SharePoint and OneDrive content.
Use the Accessing transcripts from conversations in Copilot Studio agents rule to block transcript retention where conversations could contain sensitive prompts.

Learn more. Purview sensitivity labels for Copilot, Endpoint filtering.

11. Application Lifecycle Management and Pipelines

Application Lifecycle Management (ALM) for Power Platform is built on solutions. A solution is a versioned container for apps, flows, agents, tables, roles, and environment variables. Movement between environments always goes through solution export and import — manually, via Power Platform Pipelines, or via Azure DevOps or GitHub.

11.1 Treat solutions as the unit of deployment

Why. Solutions give you a versioned, traceable artifact. The Default Solution does not; deploying out of it is how teams lose change history.

How. Use one solution per deployable unit and avoid the Default Solution.

Learn more. ALM basics.

11.2 Make flows solution-aware

Why. Non-solution-aware flows do not move cleanly between environments. Teams that skip this setting end up rebuilding flows in production by hand.

How. Turn on Default to creating new cloud flows as solution-aware so every new flow lands in a solution by default.

Learn more. ALM basics.

11.3 Use environment variables and connection references

Why. Hard-coded connection strings and configuration values break promotion. Variables and references decouple configuration from solution content.

How. Use environment variables for configuration and connection references for flows and agents so connections can be rebound per environment without editing solution content.

Learn more. ALM basics.

11.4 Version solutions semantically

Why. Semantic versioning makes it possible to reason about what changed between releases. Without it, “version 7” tells you nothing.

How. Use semantic versioning (1.4.2) and tag the source control commit that produced each version.

Learn more. ALM basics.

11.5 Adopt Power Platform Pipelines for native deployment

Why. Pipelines provide a deployment surface that runs inside the Power Platform itself — no external CI/CD required. For most low-code teams, they are the right starting point, and as of 2024 Microsoft has positioned them as the strategic replacement for the now-deprecated ALM Accelerator for Power Platform (covered in 11.6).

How. Pipelines require a host environment that is a Managed Environment. Source and target environments are registered in the host. A deployment imports a solution, rebinds connection references, and runs solution checker. Approvals can be attached to deployment stages.

Learn more. Power Platform Pipelines.

11.6 Set up the pipelines host (and migrate off the ALM Accelerator)

Why. Two pieces of status to keep straight when standing up native ALM:

The ALM Accelerator for Power Platform is formally deprecated (Microsoft Learn page last updated 2024-04-24). Microsoft’s overview page is now titled “ALM Accelerator for Power Platform (Deprecated)” and carries this notice: “The ALM Accelerator is deprecated and will be removed in a future release. Use Pipelines in Power Platform to bring ALM automation capabilities to Power Platform and Dynamics 365 services. Pipelines can be used with source code integration or extended to integrate with other providers.” The accelerator was a canvas-app-plus-Azure-Pipelines reference implementation; the strategic replacement is the native Power Platform Pipelines experience described in 11.5. The accelerator continues to function for now, but no new investment is going into it and it is on track to be removed.
The pipelines host solution is not the CoE Starter Kit. They are different solutions with different purposes; conflating them leads to stalled rollouts. (The CoE Starter Kit itself is also no longer actively maintained — see part 1, section 8.6 — but that is a separate transition from the ALM Accelerator one.)

How. Install the Power Platform Pipelines managed solution in the host environment (sometimes referenced as the Deployment Pipeline Configuration app). Register environments in the Deployment Environments table. If your tenant currently runs on the ALM Accelerator canvas app and Azure DevOps templates, treat this as the trigger to plan a migration to Pipelines — start by mapping each accelerator pipeline to a Pipelines stage, then route new solutions through Pipelines and freeze the accelerator pipelines for net-new work. Source-control integration is on the Pipelines roadmap; if you need it today, fall back to Build Tools or GitHub Actions (11.10) rather than to the accelerator.

Learn more. Set up Pipelines, ALM Accelerator for Power Platform (Deprecated).

11.7 Run pipelines as service principals

Why. Pipelines should not depend on a human account that might leave the company. Service principals carry their own lifecycle and credentials.

How. Assign a service principal with System Administrator in source and target environments. Use a delegated service account or managed identity where supported to avoid storing client secrets.

Learn more. Application users.

11.8 Promote solutions with the PAC CLI

Why. Manual export and import via the UI is fine for one-off moves but not for repeatable deployments. The CLI is what survives review, automation, and offboarding.

How. A typical export/import:

pac auth create --url https://dev.crm.dynamics.com
pac solution export --name HRSolution --path ./HRSolution.zip --managed
pac auth create --url https://test.crm.dynamics.com
pac solution import --path ./HRSolution.zip --force-overwrite

Learn more. pac pipeline reference.

11.9 Trigger a pipeline deployment from the CLI

Why. Pipeline runs should be drivable from automation. The CLI provides exactly the surface needed.

How. Authenticate against the source environment first, then run:

pac pipeline deploy --solutionName HRSolution `
                    --currentVersion 1.4.1 `
                    --newVersion 1.4.2 `
                    --stageId <stageId>

The active pac auth profile selects the pipeline; deployment notes are not a CLI argument and are added through the admin UI when needed.

Learn more. pac pipeline reference.

11.10 Use Azure DevOps or GitHub when policy requires it

Why. Native pipelines are the right tool for most low-code teams, but they do not cover every release scenario — external approvals, multi-stack coordination, artifact signing.

How. Power Platform Build Tools (Azure DevOps) and Power Platform Actions (GitHub) expose the same CLI commands as pipeline tasks. Use them when the release policy requires capabilities the native pipelines do not have.

Learn more. Power Platform Build Tools, GitHub Actions for Power Platform.

11.11 Deploy managed, not unmanaged, to higher environments

Why. Unmanaged solutions in production allow layered customization that erodes traceability. Managed solutions force changes back through ALM.

How. Deploy managed solutions to test and production. Use the environment group rule Unmanaged customizations to block unmanaged changes in production groups.

Learn more. ALM basics.

12. Solution Checker and Quality Gates

Solution checker is a static-analysis engine that inspects solution contents against a Microsoft-maintained rule set. It runs on canvas apps, model-driven apps, Power Fx, Dataverse customizations, plug-ins, web resources, cloud flows, and Copilot Studio agents.

12.1 Run on demand for fast feedback

Why. Makers and reviewers want feedback before they commit, not after the fact. On-demand runs close that gap.

How. Run from the Power Platform admin center, from make.powerapps.com, or with pac solution check.

Learn more. Solution checker.

12.2 Integrate with pipelines

Why. Manual runs depend on discipline. Pipeline-integrated runs depend on configuration, which is more reliable.

How. Solution checker runs automatically before deployment in Power Platform Pipelines. No additional setup is needed beyond registering the pipeline.

Learn more. Power Platform Pipelines.

12.3 Enforce at solution import

Why. A warning that does not block is a warning that gets ignored. Enforcement turns solution checker into an actual gate.

How. In Managed Environments, set Solution checker enforcement to Warn or Block for solution import. At Block, an import containing a rule violation at or above the threshold fails with the list of offending components.

Learn more. Solution checker enforcement.

12.4 Understand the rule categories

Why. Knowing what solution checker looks for helps you read the output and prioritize fixes.

How. The rule categories are:

Security — use of Execute Fetch with user-supplied input, open redirects in web resources, plug-in secrets in traces.
Performance — non-delegable queries in canvas apps, missing indexes in Dataverse, synchronous plug-ins on create/update of high-volume tables.
Maintainability — missing connection references, hard-coded GUIDs, deprecated API calls.
Reliability — missing error handling in flows, unbounded loops, overlapping workflow triggers.

Learn more. Rule reference.

12.5 Run from the CLI and consume SARIF

Why. Output that can be parsed by the rest of your toolchain is output that gets used. SARIF is the lingua franca of code scanners.

How. Run with the CLI and consume the SARIF output:

pac solution check --path ./HRSolution.zip `
                   --geo europe `
                   --ruleSetName "Solution Checker"

The CLI returns a SARIF file that can be uploaded to GitHub code scanning or parsed in Azure DevOps. Severity is one of High, Medium, Low, Informational.

Learn more. pac solution check.

12.6 Configure enforcement thresholds explicitly

Why. Default thresholds may not match your risk tolerance. Configuration makes the threshold explicit instead of implicit.

How. In the admin center, Edit Managed Environments > Solution checker:

Enforcement: Block
Severity threshold: High
Exclusion rules: (solution-specific overrides)

Learn more. Solution checker enforcement.

12.7 Use agent-specific checks

Why. Agents have failure modes apps and flows do not — missing fallbacks, public knowledge, blocked-channel publish. Solution checker covers them.

How. Solution checker validates Copilot Studio agents for unused topics, missing fallback, knowledge sources without authentication, publish to blocked channels, and references to DLP-blocked connectors.

Learn more. Solution checker.

12.8 Add agent evaluations for behavioral quality

Why. Solution checker validates structure, not behavior. Behavioral quality — does the agent give the right answer to the right prompt — needs a separate gate.

How. Copilot Studio includes agent evaluations : define test sets of prompts and expected behaviors, then run them before publish. Evaluations are complementary to solution checker, not a replacement.

Learn more. Agent evaluations.

13. Licensing and Capacity Governance

Licensing is a governance concern because runtime usage of Power Platform and Copilot Studio is gated by assigned licenses and consumed capacity. Capacity misconfiguration is a leading cause of outages.

13.1 Pick the right Power Apps license model

Why. Per-app and per-user pricing have different break-even points. Choosing the wrong one for your usage pattern overpays for licenses or under-provisions makers.

How. Power Apps ships per user or per app. Per-app covers one or two apps per user; per-user covers unlimited apps. Pick based on the number of apps the average user consumes.

Learn more. Power Platform licensing overview.

13.2 Match Power Automate plans to flow type

Why. Per-user, per-flow, and process plans suit different scenarios. Buying the wrong plan creates a cliff: either you over-license low-volume flows or run out of capacity on high-volume ones.

How. Power Automate ships per user (user plan) or per flow (hosted plan). Process plans allocate capacity to a single flow, typically for RPA workloads.

Learn more. Power Platform licensing overview.

13.3 Allocate Dataverse capacity per environment

Why. Without explicit allocation, environments draw from the tenant pool until exhausted. When the pool is gone, every over-pool environment goes read-only at once.

How. Dataverse capacity (database, file, log) accrues per tenant and is allocated per environment. Assign capacity explicitly under Resources > Capacity > Environments.

Learn more. Capacity in admin center.

13.4 Plan Copilot Studio consumption

Why. Copilot Studio consumption is metered in credits, and there is no hard tenant-wide cap. Budget alerts, not blocks, are the primary guard against runaway spend.

How. Copilot Studio consumes Copilot Credits per message, sold as prepaid capacity or pay-as-you-go (PAYG). Microsoft 365 Copilot user licenses include a separate allowance. Each agent message consumes credits based on type (generative answers, actions, autonomous events). Constraints to plan for:

PAYG requires the Dataverse Azure subscription meter configured at the environment level; without it, PAYG is inactive.
No hard tenant-wide consumption cap exists today.
Consumption reporting is in Resources > Capacity > Copilot Studio.

Monitor trends weekly and set Azure budget alerts on the PAYG meter subscription.

Learn more. Copilot Studio licensing, Pay-as-you-go setup.

13.5 License Power Pages by site capacity

Why. Power Pages licensing is per-site, not per-user, which is easy to miss when projecting cost.

How. Power Pages is licensed by authenticated or anonymous user capacity per site. Plan capacity per site, not per identity.

Learn more. Power Platform licensing overview.

13.6 Use group-based licensing and audit monthly

Why? Direct license assignment leaves orphaned licenses behind when users leave. Group-based licensing keeps assignment in sync with employment.

How. Use Microsoft Entra ID group-based licensing. Audit monthly with PowerShell:

Connect-MgGraph -Scopes "Directory.Read.All"
Get-MgUser -All -Property AssignedLicenses,UserPrincipalName |
  Where-Object { $_.AssignedLicenses.SkuId -contains "<Power Apps Per User SKU>" } |
  Select-Object UserPrincipalName

Learn more. Power Platform licensing overview.

13.7 Account for Developer Plan environments

Why. Developer environments are free per user but count against tenant environment quotas. Without a plan, they accumulate quietly.

How. Power Apps Developer Plan provides a free, single-user environment for learning. Use default environment routing to manage developer environments centrally.

Learn more. Developer Plan.

14. Change Management and Release Rings

Power Platform receives continuous updates from Microsoft and from in-tenant makers. A change management practice that controls both streams keeps environments predictable.

14.1 Choose a release channel per environment group

Why. Microsoft pushes updates continuously. Channels let you control whether an environment gets updates as soon as they ship or after a slower validation period — a critical lever for production.

How. Power Platform environments support two channels:

Auto (default) — updates arrive as they are released to ring-by-ring deployment. Recommended for production environments where stability is achieved through Microsoft’s own gradual rollout.
Monthly — fixed monthly cadence; let’s pilot environments to preview the next set of updates before they reach Auto.

Set the channel with the environment group rule Release channel. Production groups typically stay on Auto; pilot groups use Monthly for early validation. (There is no “Semi-Annual” channel for Power Platform — that term applies to Microsoft 365 Apps update channels, not Power Platform environments.)

Learn more. Release channels.

14.2 Track Microsoft release waves

Why. Release waves announce major capability changes twice a year. They are how you anticipate behavior changes that channels alone cannot smooth over.

How. Track release waves through the Microsoft Dynamics 365 release plans site and surface relevant items in your CoE communications.

Learn more. Release waves.

14.3 Gate previews and experimental features

Why. Preview features can disappear, change behavior, or carry undocumented limits. They do not belong in regulated environments.

How. Use the Generative AI settings and Preview and experimental AI models rules at the environment group level to gate preview capabilities. Keep previews off in regulated groups; enable in a pilot group with representative data.

Learn more. Rules for environment groups.

14.4 Define in-tenant change categories

Why? Treating every deployment the same way over-controls routine changes and under-controls risky ones. Categories let you match scrutiny to risk.

How. Apply the same change categories used in traditional IT:

Standard (pre-approved) — patch increments of an existing solution, deployed via pipelines.
Normal — new apps, flows, or agents; requires CoE or architecture review.
Emergency — hotfixes; post-implementation review required.

Each category maps to a pipeline approval configuration. Approvals are implemented as Dataverse records in the pipelines host with Azure AD group-based reviewer lists.

Learn more. Power Platform Pipelines.

14.5 Use administration mode during deployments

Why. Deploying while users are active risks corrupted state and poor user experience. Administration mode blocks user traffic while the change is being made.

How. Enable administration mode with the CLI:

pac admin set-runtime-state --runtime-state AdminMode

Revert with --runtime-state Enabled.

Learn more. Administration mode.

14.6 Take a manual backup before risky changes

Why. Automatic backups follow a schedule; risky changes do not. A manual restore point taken just before a deployment is the cheapest insurance available.

How. Run pac admin backup to create a manual restore point. Set the Backup retention rule to at least 28 days for production groups.

Learn more. Manual backups.

14.7 Communicate changes through known channels

Why. Surprises drive support volume. The cheapest way to reduce surprise is to communicate changes through channels the audience already reads.

How. Layer three channels:

The Message Center in the Microsoft 365 admin center surfaces Power Platform MC alerts. Subscribe Power Platform admins and CoE members.
Use the Maker welcome content rule to publish in-product notices for upcoming behavior changes.
Maintain a tenant changelog in the CoE environment, linking MC IDs, affected environments, and owners.

Learn more. Message Center.

14.8 Plan for rollback

Why. Rollback in Power Platform is limited and asymmetric. Pretending it is straightforward leads to ugly outages when something does need to be reversed.

How. Rollback is constrained to Dataverse restore points and solution re-import of a prior version. Canvas apps retain version history; model-driven apps roll back via solution version. For Copilot Studio, earlier agent versions can be restored from the agent version list.

Learn more. Manual backups, Copilot Studio agent versions.

Conclusion: bringing both parts together

Power Platform governance is not a single product, a single policy, or a single role. Across the two articles in this reference, the cumulative result is shown to be the result of many small, well-placed controls working together. Part 1 covered the platform on which everything stands: an environment topology that matches how the organization actually builds, a default environment treated as a perimeter rather than a playground, Managed Environments and environment groups that make policy enforceable at scale, DLP and tenant isolation that bound where data can move, identity and Conditional Access that decide who is allowed to touch the platform at all, and monitoring that turns activity into evidence. Part 2 covered everything that runs on that platform: Copilot Studio controls that extend the same model to AI agents, agent authentication and connector governance that shape what agents are allowed to do, ALM and pipelines that move solutions through environments traceably, solution checker and agent evaluations that gate quality at import and publish, licensing and capacity governance that prevent runtime surprises, and change management practices that absorb both Microsoft’s rollouts and your own without disrupting users.

The two halves only work together. Part 1’s controls are not useful without something to govern; part 2’s controls do not stick without the foundations underneath them. Sharing limits matter only because Managed Environments make them enforceable. Solution checker enforcement matters only because environmental groups can apply it consistently. Entra Agent ID matters only because Conditional Access gates the identities of those agents present. The two articles are designed to be read as a single document, split for readability, not as independent pieces.

The controls described across both parts already exist in the platform today. None of them requires custom engineering, third-party tooling, or preview access to be useful. What they require is decision-making: which environments belong in which group, which connectors are business versus non-business, which agents must authenticate with Microsoft Entra ID, which release channel each group follows, who approves a production deployment, and where the lines are drawn between regulated and general-purpose work. Those decisions are governance.

Start small and iterate. A minimum viable posture across both articles is: restrict default environment creation, attach a tenant-wide baseline DLP policy, enable default environment routing with a Managed Environments group, turn on tenant-level analytics, lean on the Power Platform admin center’s Inventory , Usage , Monitor , and Actions experiences (the CoE Starter Kit is no longer actively maintained), verify your tenant isolation defaults, require authenticated agents, run solution checker at import, deploy through Power Platform Pipelines as a service principal (the strategic replacement for the now-deprecated ALM Accelerator), and set Auto channel for production with Monthly for a pilot ring. Everything else in these two articles is a refinement on that foundation. Governance that grows with the platform is governance that lasts; governance that is bolted on after an incident rarely recovers the ground it lost.

Read this reference end to end once to map the territory, then return to it section by section as decisions come up. Both articles are designed to be skimmed for vocabulary and read closely when a specific control is at issue.

Power Platform Governance: A Practitioner's Reference (Part 1)

Holger Imbery — Sat, 16 May 2026 07:15:44 +0000

Summary lede. Power Platform and Copilot Studio now let every licensed Microsoft 365 user build apps, automations, and AI agents that touch enterprise data. This first part of a two-article reference walks through the platform-level foundations that everything else rests on — environment strategy, default-environment hardening, Managed Environments, environment groups and rules, DLP, tenant isolation, identity, and monitoring — and shows how they combine into a coherent, enforceable posture. Part 2 next week will cover Copilot Studio governance, agent authentication, ALM and pipelines, solution checker, licensing, and change management.

Who this is for. Power Platform administrators, security or compliance officers responsible for a Microsoft 365 tenant, Center of Excellence leads, architects, designing a low-code or AI-agent platform, and makers who want to understand the guardrails around the tools they build with. The document replaces fragmented documentation with an ordered walkthrough of what actually exists today: CLI snippets, PowerShell examples, and direct links to authoritative Microsoft Learn pages. Nothing here is speculative, so the patterns can be applied to a tenant without waiting for a future release. If you need to defend an environment posture to auditors, onboard a new admin, set priorities for a governance rollout, or decide what "good" looks like for your tenant, this gives you the vocabulary, the examples, and the reasoning in one place.

Why governance matters

Power Platform and Copilot Studio have turned a large part of every Microsoft 365 tenant into a build surface. Employees who never wrote code are now authoring apps, flows, and AI agents that read from SharePoint, write to Dataverse, send email on behalf of users, and call external APIs. That shift changes the risk profile of the tenant in ways that are visible to three constituencies, each with a different stake.

For the business, governance is what keeps low-code and AI investments returning value instead of accumulating liability. Ungoverned environments grow a long tail of half-finished apps, orphaned flows, and agents with unknown owners — each one a support ticket, compliance finding, or incident waiting to happen. Regulators increasingly ask who authored a control, what data it touched, and how changes were approved. A documented governance posture answers those questions without forensic archaeology and lets the business say yes to new use cases because the controls are already in place.

For IT, governance is the mechanism that preserves tenant integrity while the perimeter expands. Data Loss Prevention, tenant isolation, identity controls, and pipelines are not optional add-ons; they are how IT shifts from gatekeeper to platform operator. Without them, the only available stance is blanket denial — which does not reduce risk, because work simply moves to spreadsheets, personal accounts, and shadow SaaS where IT has no visibility at all. With them, IT can set guardrails at the platform level, delegate day-to-day authoring to business units, and focus engineering time on integrations, security, and automation.

For makers, governance is the difference between building with confidence and building in the dark. Clear environment boundaries, known DLP rules, a predictable ALM path, and a Center of Excellence to turn to all reduce the cognitive load of citizen development. Makers who know what is allowed spend their time on business logic, not on working around unclear policy. Maker welcome content, solution checker, and sharing limits exist to shorten the feedback loop so that problems are caught while they are still cheap to fix.

This first article works through the platform-level foundations: environment design, the default environment, Managed Environments, environment groups, DLP, tenant isolation, identity, and monitoring. Each control begins with the reason it exists, moves to the how, and points you to the authoritative Microsoft Learn page. Part 2 will build on this base by covering the resources that live inside those environments — Copilot Studio agents, ALM, licensing, and change management.

A note on the kits.

If you are coming to this article expecting the CoE Starter Kit or the ALM Accelerator to do the governance heavy lifting: that picture has changed. Microsoft has confirmed that the CoE Starter Kit is no longer actively maintained — its core capabilities are now part of the Power Platform admin center — and the ALM Accelerator for Power Platform is formally deprecated, with Power Platform Pipelines as the named replacement. Section 8.6 covers the CoE Starter Kit transition to the admin center; section 3.5 covers the ALM Accelerator → Pipelines transition; the rest of the article reflects that shift.

1. Environment Strategy

An environment in Power Platform is a container for apps, flows, agents, connections, and Dataverse data. Environment design is the foundation of governance because DLP scope, security roles, capacity, and compliance boundaries all follow environment lines. Get the topology right and every later control has somewhere clean to attach. Get it wrong and you spend the rest of the rollout retrofitting policy onto a sprawl.

1.1 Know the environment types

Why. Each type carries different defaults for licensing, data residency, and lifecycle. Choosing the wrong type at provisioning time creates rework that is rarely cheap to undo.

How. Five types are in active use today:

Default — one per tenant, created automatically, open to every licensed user.
Production — business-critical workloads, backed by Dataverse.
Sandbox — non-production, resettable, suitable for UAT and load tests.
Developer — single-maker environments, created manually or via routing.
Trial and Teams — time-limited or chat-scoped; not recommended for production assets.

Learn more. Environments overview.

1.2 Adopt a topology that mirrors your delivery lifecycle

Why. A topology aligned with how your organization actually builds shortens the path from idea to production and gives every governance rule (DLP, sharing limits, release channel) a natural place to land.

How. A typical enterprise pattern separates by lifecycle and purpose:

Default  -> personal productivity only (restricted)
DEV-*    -> per-maker or per-team developer environments
TEST-*   -> solution UAT with production-like data volume
PROD-*   -> published solutions, scoped by region or business unit

Add a dedicated CoE environment for governance tooling (pipelines host and any remaining CoE Starter Kit components — see section 8.6 for the kit's status) and a training environment for enablement.

Learn more. Establishing an environment strategy.

1.3 Restrict who can create environments

Why. Self-service environment creation is the fastest way to lose tenant-wide visibility. Most makers do not need to create environments; the few who do should be deliberately empowered.

How. In tenant settings, configure Who can create production and sandbox environments and Who can create trial environments to limit creation to admins or a named CoE group. For repeatable provisioning, drive creation through the CLI:

pac admin create --name "DEV-Finance-EU" --type Sandbox --region europe

Learn more. Control who can create environments, pac admin reference.

1.4 Pick a region before you provision

Why. Environments are region-locked. Region determines data residency, latency, and which regional services (Application Insights endpoints, Dataverse data centers) integrate cleanly. Moving an environment between regions is not a supported operation.

How. Choose the region from the regional catalog at provisioning time. For regulated workloads, pick the region that satisfies the controlling jurisdiction even if it is not where most of your users sit.

Learn more. Environment regions.

1.5 Plan capacity before you turn on Dataverse

Why. Dataverse-enabled environments draw from a tenant-wide capacity pool. Capacity surprises are a leading cause of unplanned read-only mode in production.

How. Any Dataverse-enabled environment — production or sandbox — requires the tenant-level minimum of 1 GB of database capacity. Capacity is a tenant pool, not a per-environment allocation, so plan total tenant capacity against the number of Dataverse-enabled environments you intend to run.

Learn more. About Dataverse storage capacity.

1.6 Isolate regulated workloads

Why. DLP, security roles, and Dataverse auditing settings are scoped per environment. Mixing regulated and non-regulated workloads in the same environment forces every control to be the strictest common denominator and complicates compliance reporting.

How. Provision separate environments for HR, Finance, and other regulated workloads. Each one becomes a clean target for environment-specific DLP, role assignments, and backup retention.

Learn more. Establishing an environment strategy.

2. Governing the Default Environment

The default environment is the only environment created automatically in every Power Platform tenant. Every licensed user is a member and holds the Environment Maker role, which lets them create apps, flows, connections, and Copilot Studio agents. It is always present, always writable, and cannot be deleted or renamed — which makes it the single most common source of shadow IT on the platform. It needs explicit governance, not benign neglect.

2.1 Understand what makes the default environment different

Why. Treating the default environment like any other production environment leads to surprises. Several behaviors are unique to it and cannot be disabled.

How. Plan around four facts:

It cannot be deleted. It can be renamed and reassigned to a non-default type for display purposes, but Microsoft treats it as the fallback environment for personal productivity.
Every new tenant user inherits Environment Maker. There is no tenant setting that removes this role on the default environment.
Dataverse is optional by default, but once added it cannot be removed.
Connectors default to the same tenant-wide DLP policy as any other non-scoped environment.

Learn more. Default environment overview.

2.2 Rename and tag for clarity

Why. Makers cannot make good decisions about where to build if the default environment looks like any other production candidate. The name is the first signal.

How. Rename the default environment to something like Personal Productivity (Default) so its purpose is unambiguous. Apply a consistent solution-naming convention for any assets that genuinely belong there (such as personal flows).

Learn more. Default environment overview.

2.3 Attach a restrictive DLP policy

Why. The default environment is where ungoverned makers land first. A restrictive DLP policy turns the blast radius from "every connector Microsoft ships" into a known, defensible set.

How. Move risky connectors (HTTP, custom connectors, social media, unauthenticated agent chat) to Blocked. Keep only Microsoft 365, Dataverse, and a short approved list in Business.

Learn more. Data policies overview.

2.4 Keep Dataverse out (or contained) by default

Why. Once Dataverse is added to the default environment, it cannot be removed. Production data living next to ungoverned personal flows is a compliance problem that is hard to walk back.

How. If the default environment still lacks Dataverse, leave it off. Production data belongs in dedicated environments where capacity and backup are managed deliberately.

Learn more. Default environment overview.

2.5 Constrain sharing through Managed Environments

Why. Without sharing limits, a maker can publish an app or agent to the entire tenant with one click. The default environment is the most common origin for that mistake.

How. Convert the default environment to a Managed Environment and apply Limit sharing rules: cap the number of users an asset can be shared with and disable sharing through security groups. Mechanics are covered in section 3.

Learn more. Limit sharing in Managed Environments.

2.6 Enable default environment routing

Why. The cleanest way to keep the default environment empty is to redirect new makers somewhere else automatically. Routing replaces "the default environment is where people end up" with "the default environment is where nobody builds."

How. Turn on default environment routing under Settings > Default environment routing in the Power Platform admin center. Each routed maker receives a personal developer environment; group those environments with Environment groups and apply policies in bulk (sharing limits, maker welcome content, solution checker).

Learn more. Default environment routing, Environment groups.

2.7 Lock down agents and Copilot in the default environment

Why. Copilot Studio agents created in the default environment are immediately discoverable by all licensed users. Without explicit controls, makers can publish unauthenticated agents, call any non-blocked connector, and add knowledge sources the creator already has access to.

How. Apply layered controls: convert the default environment to a Managed Environment, block the Chat without Microsoft Entra ID authentication in Copilot Studio connector in DLP, block the HTTP and HTTP with Microsoft Entra ID connectors, and enable tenant-level Copilot Studio restrictions on who can publish.

Learn more. Data policies for Copilot Studio, Tenant settings reference.

2.8 Turn on monitoring before you need it

Why. Telemetry collected after an incident is worth less than telemetry collected before one. Get visibility on the default environment first, because that is where ungoverned activity tends to land.

How. Enable tenant-level analytics in the Power Platform admin center to capture telemetry. Use the admin center Inventory, Usage, Monitor, and Actions experiences to list apps, flows, and agents by owner — these have replaced the CoE Starter Kit inventory flows (see section 8.6). Review the Managed Environments weekly digest for inactive and top-used assets.

Learn more. Tenant-level analytics.

2.9 A starting DLP stance

The example below is not a policy you should ship verbatim, but it is a defensible starting point for a default environment that has no other governance in place:

Business group:    Microsoft 365 connectors (Outlook, SharePoint, Teams, OneDrive), Dataverse, Approvals
Non-business:      (empty)
Blocked:           HTTP, HTTP with Microsoft Entra ID, custom connectors,
                   Chat without Microsoft Entra ID authentication in Copilot Studio,
                   SMTP, FTP, social media connectors

Learn more. Create a data policy, Control environment creation.

3. Managed Environments

Managed Environments is a premium governance layer that augments a standard environment with administrative controls. Enabling it does not change the environment's region, type, or Dataverse schema; it activates a set of tenant-visible features that admins can enforce. Most of the controls that follow in this document depend on Managed Environments being on.

3.1 Limit sharing

Why. Uncontrolled sharing is one of the highest-impact governance gaps. A single misconfigured share can expose an app or agent to the entire tenant.

How. Cap the number of users an app or agent can be shared with, block sharing through security groups, and block editor sharing. Configure under Edit Managed Environments > Limit sharing. Rules apply on the next share action; existing assignments remain.

Learn more. Limit sharing.

3.2 Receive a weekly digest

Why. Admins do not have time to log into the admin center every day. A scheduled summary keeps inactive and top-used assets in front of them without manual polling.

How. Power Platform and Dynamics 365 admins automatically receive an email summary covering active users, top apps, and inactive apps and flows. Add additional recipients via PowerShell:

$t = Get-TenantSettings
$t.powerPlatform.governance |
  Add-Member -NotePropertyName additionalAdminDigestEmailRecipients `
             -NotePropertyValue 'coe@contoso.com;ops@contoso.com'
Set-TenantSettings -RequestBody $t

Learn more. Usage insights (weekly digest).

3.3 Enforce solution checker

Why. Static analysis catches problems while they are cheap to fix. Without enforcement, solution checker becomes optional and stops being run.

How. Run static analysis on solution import; warn or block. Configure under Edit Managed Environments > Solution checker (mechanics in part 2, section 12).

Learn more. Solution checker enforcement.

3.4 Show maker welcome content

Why. Makers who never see your governance guidance cannot follow it. The first experience inside the maker portals is the cheapest channel for organizational rules.

How. Configure a banner that appears for new makers in make.powerapps.com and copilotstudio.microsoft.com. Use it to point makers at internal documentation, the right environments, and approval processes.

Learn more. Managed Environments overview.

3.5 Host Power Platform Pipelines

Why. Power Platform Pipelines are the native, low-code path for promotion across environments. They require Managed Environments to run, and as of 2024 Microsoft has positioned them as the strategic replacement for the ALM Accelerator for Power Platform, which is formally deprecated and slated for removal in a future release. New ALM rollouts should start here rather than on the accelerator.

How. Designate a Managed Environment as the pipelines host (covered in part 2, section 11). Source and target environments are registered in the host. If your tenant currently runs on the ALM Accelerator canvas app + Azure DevOps templates, treat this as the trigger to plan a migration to Pipelines — the accelerator continues to function for now, but no new investment is going into it.

Learn more. Power Platform Pipelines, ALM Accelerator for Power Platform (Deprecated).

3.6 Restrict Dataverse access by IP

Why. Network-level controls reduce the attack surface even when identity controls are intact. They also satisfy auditors who expect defense in depth.

How. IP firewall restricts Dataverse access to declared IP ranges. IP cookie binding adds session-level enforcement so that a stolen token cannot be replayed from a different network.

Learn more. IP firewall.

3.7 Customer Lockbox and Customer-Managed Key

Why. Regulated tenants often need explicit approval before Microsoft engineers can access tenant data, and they need encryption keys under their own control.

How. Customer Lockbox requires customer approval for Microsoft support access to environment data. Customer-Managed Key brings tenant-controlled keys into Dataverse encryption. Both require premium tenant licensing.

Learn more. Managed Environments overview.

3.8 Extend backup retention

Why. The seven-day default backup window is short for production workloads. An incident discovered on day eight has no restore point.

How. Managed Environments support extended retention windows beyond the seven-day default. Configure retention per environment or, for groups, via the Back-up retention rule.

Learn more. Manual backups.

3.9 DLP for desktop flows

Why. Desktop (RPA) flows can interact with anything on the host machine, which makes connector-level DLP particularly important for them. The control is gated behind Managed Environments.

How. Apply DLP policies to desktop flows the same way you apply them to cloud flows; enforcement happens at design and run time.

Learn more. Data policies overview.

3.10 Use the Catalog for internal distribution

Why. Without a catalog, reusable components are shared informally — by email, by chat, by file share — and traceability collapses. A catalog gives makers an approved internal store.

How. Publish components (templates, connectors, plug-ins) to the Catalog so makers can install them with provenance and version metadata.

Learn more. Managed Environments overview.

3.11 Plan for the licensing implications

Why. Managed Environments changes how runtime licensing works for users, and that change is the most common surprise during rollout.

How. Every user who activates an app, flow, or agent in a Managed Environment must hold a premium Power Platform license (Power Apps per user/per app, Power Automate Premium, or Dynamics 365). Confirm license posture before flipping the switch.

Learn more. Power Platform licensing overview.

3.12 Enabling

Why. The toggle is per environment, so enablement is a deliberate act, not a tenant flag.

How. In the Power Platform admin center, choose the environment, then Edit Managed Environments. From the CLI:

pac admin set-governance-config --environment-id <GUID> --protection-level Standard

Learn more. Enable Managed Environments.

3.13 When to enable

Why. Managed Environments costs licensing and increases admin overhead. Enabling it everywhere is wasteful; enabling it nowhere defeats the purpose.

How. Use Managed Environments for any environment with production workloads, external sharing, or sensitive data. Apply it to personal developer environments created by default environment routing so sharing limits and solution checker apply uniformly.

Learn more. Managed Environments overview.

4. Environment Groups and Rules

Environment groups organize Managed Environments into collections so admins can apply policies in bulk. Each environment belongs to at most one group; groups cannot be nested or overlap; non-managed environments cannot be added.

4.1 Group environments by policy intent

Why. Without grouping, every environment is a one-off configuration. As the platform scales, that becomes unmaintainable. Groups let policy intent travel with environments instead of with people.

How. Build groups around policy needs rather than reporting structure: a regulated production group, a pilot group for previews, a routed personal-productivity group, an admin/CoE group. A typical layout:

Personal Productivity        -> routed per-maker developer envs; sharing disabled
Pilot                        -> experimental AI features on; release channel = Monthly
Production EU                -> AI off for regulated data; release channel = Auto
Production US                -> same policies as EU, separate region
CoE / Admin                  -> governance tooling; locked down

Learn more. Environment groups.

4.2 Apply rules at the group level

Why. When a rule is published at the group level, the corresponding setting becomes read-only inside each member environment. This makes policy enforceable rather than advisory; per-environment overrides are not supported.

How. Use the admin center to publish any of the rules currently available (generally available unless noted):

Sharing agents with Editor permissions
Sharing agents with Viewer permissions
Sharing controls for canvas apps
Sharing controls for solution-aware cloud flows
AI-powered Copilot features
Generative AI settings
External models
AI prompts
Accessing transcripts from conversations in Copilot Studio agents
Sharing data between Copilot Studio and Viva Insights
Maker welcome content
Release channel
Back-up retention
Solution checker enforcement
Unmanaged customizations
Usage insights
Power Apps component framework for canvas apps
Content security policy
Power Apps code apps
Preview and experimental AI models
Default deployment pipeline (preview)
Advanced connector policy (preview)
AI-generated descriptions (preview)

Learn more. Rules for environment groups.

4.3 Manage group membership from the CLI

Why. Group membership changes are routine — onboarding new environments, retiring old ones — and should be scriptable. Rule definitions still live in the admin center; the CLI handles membership.

How. Use the pac admin group commands:

pac admin list-groups
pac admin add-group --environment <env-id> --environment-group <group-id>

Learn more. pac admin reference.

4.4 Plan for group removal

Why. Removing an environment from a group is a half-revert: rule values stay where they were, but the environment becomes editable again locally. Without a manual reconciliation step, the environment can drift in ways that are hard to detect.

How. When you remove an environment from a group, treat the action as the start of a configuration review. Reconcile every formerly group-managed setting against your current standard before declaring the migration done.

Learn more. Environment groups.

4.5 Pair groups with default environment routing

Why. Routed personal developer environments multiply quickly. Without grouping, you end up with hundreds of environments each configured independently.

How. Auto-route new makers into personal environments, then assign those environments to a Personal Productivity group with sharing disabled and solution checker enforced. Policies follow membership automatically.

Learn more. Default environment routing.

5. Data Loss Prevention Policies

Data Loss Prevention (DLP) policies in Power Platform control which connectors can be combined inside apps, flows, and agents. Policies apply to Power Apps, Power Automate cloud flows, desktop flows (in Managed Environments), and Copilot Studio agents.

5.1 Sort connectors into three groups

Why. Data leaks happen when organizational data moves alongside non-organizational connectors in the same resource. The three-group model is how Power Platform prevents that combination at design time.

How. Place each connector into one of three groups:

Business — connectors that can exchange organizational data with each other.
Non-business — connectors that can exchange non-organizational data with each other.
Blocked — connectors that cannot be used at all.

Data cannot flow between Business and Non-business within the same resource.

Learn more. Data policies overview.

5.2 Set the default group for new connectors

Why. Microsoft adds connectors continuously. If new connectors land in a permissive group by default, your DLP posture decays without anyone editing policy.

How. Set the Default group to Blocked. New connectors stay blocked until an admin explicitly classifies them.

Learn more. Create a data policy.

5.3 Choose the right policy scope

Why. A tenant-wide policy is the right baseline for high-risk connectors, but environment-specific policies let you allow connectors where they are needed without opening them everywhere.

How. Target a policy at all environments (with optional exclusions), at specific environments, or only at the default environment. Policies aggregate: a connector blocked by any applicable policy is blocked for that environment.

Learn more. Create a data policy.

5.4 Filter by endpoint

Why. A blanket block on HTTP is often too restrictive. Endpoint filtering keeps the connector available for the URLs you actually need while denying everything else.

How. Allow or block specific URLs on HTTP-capable connectors, SQL Server, Azure Blob, and others. This is a premium-tier capability.

Learn more. Endpoint filtering.

5.5 Restrict individual connector actions

Why. Many connectors expose both safe (read) and dangerous (arbitrary execute) actions. Action control lets you keep the safe ones while blocking the dangerous ones.

How. Allow or block individual actions on a connector — for example, block Execute a SQL query on SQL Server while allowing table reads. Premium tier.

Learn more. Connector action control.

5.6 Allow-list custom connector patterns

Why. Custom connectors are an open door if left ungoverned. Pattern-based allow-listing lets you approve a class of internal endpoints without approving every custom connector individually.

How. Allow-list custom connectors by URL pattern. Pair with a tenant-wide block on unmatched custom connectors.

Learn more. Connector endpoint filtering.

5.7 A high-risk tenant baseline

The example below is a defensible starting point for a tenant where regulated data lives next to general productivity:

Business:      Microsoft 365 connectors (Outlook, Teams, SharePoint, OneDrive),
               Dataverse, Approvals, Planner
Non-business:  Bing Search, Translator, MSN Weather
Blocked:       HTTP, HTTP with Microsoft Entra ID (unless endpoint-filtered),
               custom connectors in default env, Twitter/X, Facebook, SMTP, FTP,
               Chat without Microsoft Entra ID authentication in Copilot Studio
Default group: Blocked

Learn more. Create a data policy.

5.8 Manage DLP from PowerShell

Why. UI configuration does not survive disaster recovery, audit reproduction, or environment cloning. Scripted policy management does.

How. Use the DLP cmdlets to retrieve, create, and modify policies:

Get-DlpPolicy
New-DlpPolicy -DisplayName "Tenant baseline"
# manage connector groups via Set-DlpPolicyDefaultConnectorGroup, Add-ConnectorToBusinessDataGroup, etc.

Learn more. DLP PowerShell cmdlets.

5.9 Understand enforcement timing

Why. Knowing when a policy takes effect is critical during incident response — and during planned changes that you do not want to break in production at noon.

How. Policy changes take effect within minutes for new resources and on the next save or publish for existing ones. Copilot Studio enforces DLP in real time; makers see a validation error immediately when a blocked connector is referenced.

Learn more. Data policies overview.

6. Tenant Isolation and Cross-Tenant Controls

Tenant isolation restricts which external Microsoft Entra ID tenants can be used as the identity source for connections inside your Power Platform tenant. It is a separate control from DLP and from Microsoft Entra B2B. DLP governs which connectors can be used; tenant isolation governs which tenants users can authenticate to when creating a connection.

6.1 Distinguish inbound from outbound

Why. Tenant isolation is directional. Mixing the two leads to either over-blocking (legitimate B2B work breaks) or under-blocking (data exfiltration paths remain).

How. Configure each direction explicitly:

Inbound — another tenant's users creating a connection that points into your tenant.
Outbound — your tenant's users creating a connection that points into another tenant.

Learn more. Tenant isolation.

6.2 Verify your default — it depends on tenant age

Why. The default behavior changed in 2026, and assumptions from older guidance no longer apply universally.

How. Tenants created before 30 March 2026 have both directions allowed by default; tenant isolation must be enabled explicitly. Tenants created on or after 30 March 2026 ship with tenant isolation enabled and both directions defaulting to deny — only allow-listed tenant IDs can be used. Verify the current state in your tenant rather than assuming a default.

Learn more. Restrict cross-tenant inbound/outbound.

6.3 Configure tenant isolation deliberately

Why. The control is tenant-wide; there is no per-environment override. A misconfiguration affects the entire tenant.

How. In the Power Platform admin center, under Security > Tenant isolation, set:

Tenant isolation: On
Default:          Block
Exceptions:       contoso-partner.onmicrosoft.com  (Inbound, Outbound)
                  fabrikam.onmicrosoft.com          (Outbound only)

Learn more. Tenant isolation.

6.4 Know what tenant isolation does not cover

Why. Tenant isolation is sometimes mistaken for a complete cross-tenant boundary. It is not. Pairing it with the right Entra controls is the only way to close the gap.

How. Tenant isolation applies to connectors that authenticate with Microsoft Entra ID (SharePoint, OneDrive, Dataverse, Outlook, Microsoft 365, Azure, etc.). It does not apply to:

Connectors that use API keys or generic OAuth.
Service principal and managed identity authentication, which follow Microsoft Entra cross-tenant access policies.

Pair tenant isolation with Microsoft Entra cross-tenant access settings for a complete boundary.

Learn more. Tenant isolation.

6.5 Layer Conditional Access

Why. Tenant isolation controls where identities can come from; Conditional Access controls under what conditions identity is accepted. Both layers are needed.

How. Apply Conditional Access policies to the Power Platform API application to restrict sign-in by user, device, location, and risk.

Learn more. Conditional Access for Power Platform.

6.6 Layer the IP firewall

Why. Network-level controls reduce the attack surface even when identity controls are intact. They also satisfy auditors who expect defense in depth.

How. The IP firewall (Managed Environments) restricts Dataverse access to declared IP ranges. Configure per environment.

Learn more. IP firewall.

6.7 Enable continuous access evaluation

Why. Static sign-in evaluation cannot react to changes in identity posture mid-session. CAE narrows the window between an account becoming risky and that account losing access.

How. Continuous access evaluation (CAE) on Dataverse revokes sessions when identity posture changes — for example, when a user is disabled or moves outside an allowed network.

Learn more. Conditional Access for Power Platform.

6.8 Monitor configuration changes and runtime denials

Why. Tenant isolation is most useful when its events are visible. Two surfaces matter: configuration changes (someone modifying the policy) and runtime denials (someone hitting the policy).

How. Tenant isolation events flow to the Power Platform audit log and the Microsoft Purview unified audit log. Configuration changes appear under the TenantIsolationPolicyUpdate activity; runtime denials surface as connection-creation failures rather than under a single unified filter — review both when investigating.

Learn more. Audit logs.

7. Security Roles and Identity

Power Platform authorization is layered. Microsoft Entra ID provides tenant-level identity and global roles; Dataverse provides row- and column-level security inside each environment; connectors use per-user or service-principal credentials.

7.1 Use the right tenant role

Why. Tenant-scoped governance work does not need Global Administrator. Over-privileged accounts increase blast radius without buying capability.

How. Pick from the Microsoft Entra ID roles that govern Power Platform:

Power Platform Administrator — full control of environments, DLP, tenant settings, licenses (within Power Platform scope). Sufficient for almost all governance work.
Dynamics 365 Administrator — equivalent scope for Dynamics 365 and Dataverse-backed environments.
Global Administrator — superset; not recommended for day-to-day platform work.
Fabric Administrator — required for Power BI workspace and capacity governance that overlaps Power Platform.

Assign least-privileged roles.

Learn more. Microsoft Entra roles for Power Platform.

7.2 Map environment roles to job function

Why. Environment roles control everything inside an environment, including who can hand out further roles. A misassignment here is hard to roll back without auditing.

How. Inside each environment, the standard roles are:

System Administrator — root of the environment; can assign any role.
System Customizer — schema and solution changes without user management.
Environment Maker — create apps, flows, agents, and custom connectors.
Basic User — runtime access to assigned records.
Delegated Admin — scoped admin, granted via Microsoft Entra ID groups.

Learn more. Dataverse security roles.

7.3 Build custom roles from a known baseline

Why. Privileges in Dataverse are granular; granting * privileges in a custom role is the fastest way to recreate the problems custom roles were supposed to solve.

How. Always start from a copy of an out-of-box role and remove or add the privileges you need at table, action, and column level. Avoid wildcard privileges.

Learn more. Dataverse security roles.

7.4 Prefer teams over individual sharing

Why. Individual record sharing is auditable in theory and unmaintainable in practice. Teams turn membership into access.

How. Use Owner teams or Access teams for direct ownership and sharing. For large memberships, use Microsoft Entra group teams so group membership drives row access automatically.

Learn more. Microsoft Entra group teams.

7.5 Use service principals for non-interactive work

Why. Pipelines, integrations, and admin automation should not depend on human accounts. Service principals carry their own lifecycle and credentials, which is what audit and compliance expect.

How. Register the app in Microsoft Entra ID, then assign an application user:

pac admin create-service-principal `
  --environment <env-id> `
  --name "Pipelines SPN" `
  --role "System Administrator"

Rotate secrets and prefer federated credentials or managed identities where supported.

Learn more. Application users.

7.6 Apply Conditional Access and MFA

Why. Identity is the perimeter. Conditional Access and MFA are how that perimeter actually behaves under load — risky logins, anomalous locations, compromised credentials.

How. Apply Conditional Access to the Power Platform API app and to Microsoft Dataverse. Enforce MFA for administrators and for makers who connect to sensitive data sources.

Learn more. Conditional Access.

8. Monitoring, Analytics, and Tenant Inventory

Governance without telemetry degrades quickly. Power Platform exposes several monitoring surfaces; the Inventory, Usage, Monitor, and Actions experiences in the Power Platform admin center wrap them into a tenant-wide inventory and alerting layer that Microsoft maintains as a first-party feature. The community CoE Starter Kit filled this role for years but is no longer actively maintained — section 8.6 covers what that transition means in practice.

8.1 Use admin center analytics for per-environment visibility

Why. Most operational questions ("is anyone using this app?", "are flow runs failing?") have answers in the admin center. Reaching for the CoE Starter Kit before checking the built-in analytics wastes time.

How. Per-environment charts cover Power Apps usage, Power Automate runs, Dataverse API calls, and Copilot Studio sessions. Tenant-level analytics must be enabled in tenant settings before charts populate.

Learn more. Tenant-level analytics.

8.2 Push the weekly digest to a wider audience

Why. The digest is a high-signal, low-noise summary. Only Power Platform and Dynamics 365 admins receive it by default, which is usually too narrow.

How. Add CoE leads, security reviewers, and operations partners using additionalAdminDigestEmailRecipients (mechanics in section 3.2).

Learn more. Usage insights (weekly digest).

8.3 Stream audit events to Microsoft Purview

Why. Platform-level actions — DLP changes, environment creations, sharing events — must be reviewable months after they happen. Purview is the long-term store.

How. Power Platform activities flow natively into the Microsoft Purview unified audit log. Search them in the Purview portal, with the Search-UnifiedAuditLog Exchange Online PowerShell cmdlet, or via the Microsoft 365 Audit API.

Learn more. Audit logs and Purview, Search the audit log.

8.4 Turn on Dataverse auditing per table

Why. Per-table read and change auditing is the only way to answer record-level forensic questions ("who saw this row?", "what changed and when?"). Without it, the trail stops at the API layer.

How. Enable auditing on tables that hold sensitive data. Retention follows the environment's audit retention setting; review it before assuming you have history.

Learn more. Audit logs and Purview.

8.5 Export to Application Insights

Why. Built-in analytics is summary-level. Application Insights gives you raw traces for performance and error analysis — the same signal an Azure team would expect.

How. In an environment's settings under Product > Application Insights (preview), add the connection string:

InstrumentationKey=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx;
IngestionEndpoint=https://westeurope-1.in.applicationinsights.azure.com/

Power Apps emit pageView, traceEvent, and error events; model-driven apps add form performance metrics.

Learn more. Application Insights for Power Platform.

8.6 The CoE Starter Kit and ALM Accelerator — what changed

Why. Until recently, the CoE Starter Kit and the ALM Accelerator for Power Platform were the two community-driven kits most enterprise governance setups leaned on — one for inventory and oversight, the other for solution promotion and source control. Native, in-product experiences now supersede both, and following the old guidance without qualification leads new tenants into a maintenance dead end.

How. Two formal status updates to be aware of:

CoE Starter Kit — no longer actively maintained (Microsoft, May 2026). The Microsoft Learn page now opens with a clear notice: "The Power Platform CoE Starter Kit is no longer actively maintained. Its core capabilities are part of the Power Platform admin center. Issues are no longer reviewed or addressed." The kit remains available for existing and new deployments, but it will not receive new features, and GitHub issues are not triaged. Microsoft directs admins to the native admin center experiences instead:
- Inventory — view and govern all apps, flows, and agents across the tenant.
- Usage — track adoption and identify top resources and their owners.
- Monitor — operational health of heavily used resources.
- Actions — surface risks, enforce best practices, and act on governance insights.

Beyond the UI, the same data is reachable through the Power Platform CLI, the Power Platform API, the Power Platform inventory API, and the Power Platform for Admins V2 connector — which is what to build automation against going forward.

ALM Accelerator for Power Platform — formally deprecated (Microsoft Learn page last updated 2024-04-24). Microsoft's overview page is now titled "ALM Accelerator for Power Platform (Deprecated)" and carries this notice: "The ALM Accelerator is deprecated and will be removed in a future release. Use Pipelines in Power Platform to bring ALM automation capabilities to Power Platform and Dynamics 365 services. Pipelines can be used with source code integration or extended to integrate with other providers." The accelerator was a canvas-app-plus-Azure-Pipelines reference implementation; the strategic replacement is the in-product Power Platform Pipelines experience (introduced in section 3.5 and detailed in part 2). The accelerator continues to function for now, but no new investment is being made in it, and it is on track to be removed.

Practical recommendation. If you already run either kit, keep operating it — neither is being switched off tomorrow — but stop building new dependencies on top of it. For inventory, attestation, and orphaned-asset detection, build against the admin center experiences and the inventory API. For ALM, route new solutions through Power Platform Pipelines and plan a migration off the ALM Accelerator to avoid being caught out when a removal date is announced. Tenants starting from scratch in 2026 should skip both kits and build directly on the in-product surfaces.

Learn more. CoE Starter Kit transition to Power Platform admin center, ALM Accelerator for Power Platform (Deprecated), Power Platform Pipelines, Power Platform inventory API.

8.7 Forward to Microsoft Sentinel

Why? SOCs want one queue. Forwarding Power Platform signals to Sentinel places platform incidents alongside other security telemetry, rather than in a separate, unmonitored location.

How. Forward the Power Platform activity log and Purview audit events to Sentinel via the Microsoft Power Platform administrative logs data connector. Pre-built analytic rules cover suspicious DLP changes, mass app sharing, and unauthenticated agent publishing.

Learn more. Sentinel connector.

Outlook: what part 2 covers

Part 1 has established the platform-level perimeter — the environments themselves, the policies that bind them, the identities that reach them, and the telemetry that watches them. None of those controls answers the question of what gets built inside that perimeter, or how that work moves from a maker's hands into production. That is the territory of part 2.

Next week's article picks up where this one ends and walks through six topics in the same Why → How → Learn more pattern:

Governing Microsoft Copilot Studio. Tenant- and environment-level controls for AI agents, including who can publish, what AI features are allowed where, how knowledge sources are scoped, and how transcripts are retained.
Agent authentication and connector governance. The three authentication modes, the Entra Agent ID model that replaces stored client secrets, the connectors that should be blocked in regulated tenants, and the action-level controls that turn dangerous connectors into safe-but-useful ones.
Application Lifecycle Management and pipelines. Solutions as the unit of deployment, environment variables and connection references, Power Platform Pipelines as the native promotion path (and the strategic replacement for the now-deprecated ALM Accelerator for Power Platform), and when to fall back to Azure DevOps or GitHub.
Solution checker and quality gates. Static analysis for apps, flows, and agents; how to enforce it at solution import; how to read SARIF output; and how agent evaluations complement structural checks with behavioral ones.
Licensing and capacity governance. Power Apps and Power Automate license models, Dataverse capacity allocation, the Copilot Credits consumption model and its gotchas, and how to keep license assignment in sync with employment.
Change management and release rings. The two release channels that actually exist for Power Platform (Auto and Monthly — there is no Semi-Annual), how to gate previews per environment group, in-tenant change categories, administration mode, manual backups, and rollback realities.

If you treat part 1 as the platform on which everything stands, part 2 is everything that runs on that platform. The two are designed to be read together: the controls in part 1 matter only because they shape the work covered in part 2, and the controls in part 2 stick only because the foundations in part 1 hold them up.

Conclusion (Part 1)

Platform-level governance is the foundation of everything else. It is the cumulative result of many small, well-placed controls working together: an environment topology that matches how the organization actually builds, a default environment treated as a perimeter rather than a playground, Managed Environments and environment groups that make policy enforceable at scale, DLP and tenant isolation that bound where data can move, identity and Conditional Access that decide who is allowed to touch the platform at all, and monitoring that turns activity into evidence.

The controls described in this article already exist on the platform. None of them requires custom engineering, third-party tooling, or preview access to be useful. What they require is decision-making: which environments belong in which group, which connectors are business versus non-business, which tenants are trusted for cross-tenant connections, and who has the right to administer what. Those decisions are the platform half of governance.

Start small and iterate. A minimum viable foundation is: restrict default environment creation, attach a tenant-wide baseline DLP policy, enable default environment routing with a Managed Environments group, turn on tenant-level analytics, lean on the Power Platform admin center's Inventory, Usage, Monitor, and Actions experiences (the CoE Starter Kit is no longer actively maintained — see section 8.6), and verify your tenant isolation defaults. Everything in part 1 is a refinement on that minimum viable foundation; everything in part 2 will be a refinement on top of part 1.

Read on next week for the second half — agents, ALM, licensing, and change management — and a closing recap that ties both articles together.

Microsoft Agent 365: The Control Plane for AI Agents Is Now Generally Available

Holger Imbery — Sat, 09 May 2026 00:00:00 +0000

Summary Lede

On 1 May 2026, Microsoft moved Agent 365 from preview into general availability , positioning it as the control plane that organizations need in order to observe, govern, and secure the rapidly growing population of AI agents working across their tenants. The service is licensed at fifteen US dollars per user per month as a standalone add-on, or as a component of the new Microsoft 365 E7 suite, and it integrates directly with the security and compliance products that most enterprises already operate, including Microsoft Entra , Microsoft Purview , and Microsoft Defender. Rather than introducing yet another place to build agents, Agent 365 sits above existing platforms and consolidates operational disciplines that have, until now, been distributed across many disconnected tools and dashboards.

Why read this: If your organization has begun deploying AI agents — whether built in Copilot Studio, embedded in third-party SaaS, developed in Microsoft Foundry, or installed by users on their own devices — you are facing a governance question that did not exist eighteen months ago. This article explains what Agent 365 actually is, what it changes in the operating model of an organization that has already invested in Microsoft 365 Copilot, and how to take the first practical steps after acquiring a license. It is intended to bring the moving parts into a single, coherent narrative you can use when you brief stakeholders, plan a pilot, or write your internal adoption proposal.

Introduction: Why a Control Plane for Agents Is Now a Necessity

AI agents are no longer a category of pilot project owned by a single innovation team. They are appearing inside Word, Excel, PowerPoint, Outlook, Teams, partner SaaS portals, custom-built applications, and, increasingly, on the local workstations of individual employees who have installed AI assistants on their own initiative. According to figures Microsoft cites from IDC, the worldwide population of AI agents is expected to reach approximately 1.3 billion by 2028.

Even at a small fraction of that figure, the implications for an enterprise tenant are significant. The questions IT leaders are being asked have outpaced the tools available to answer them: Which agents are running in our tenant today? Who owns each one? What data does each one touch, and on whose authority? Is it behaving as expected? If it has been compromised, when, by whom, and how would we know?

Microsoft now refers to this phenomenon openly as agent sprawl , and a newer category — the so-called shadow agent , meaning a local AI assistant installed on an endpoint without IT approval — has begun to appear on enterprise risk registers because it represents an identity-less, policy-less, telemetry-less actor with access to the same documents and chats as its human user. A control plane is the established architectural answer to this class of problem, and Agent 365 is, very deliberately, that control plane.

What Microsoft Agent 365 Actually Is

Agent 365 is a tenant-level service that gives administrators a single, authoritative view of every AI agent operating in the organization, together with the policy, identity, and security mechanisms required to manage them at scale. Microsoft summarizes its purpose using three verbs that recur throughout the official documentation: observe, govern, and secure.

To observe means that administrators have real-time visibility into the agents in their environment, what those agents are doing, and how they are performing.
To govern means that the lifecycle of each agent — from registration, through approval and assignment, to retirement — is controlled through consistent policies rather than ad-hoc decisions.
To secure means that the same enterprise-grade identity, data protection, and threat detection that already apply to users are extended, faithfully, to agents.

The service is built around five capabilities that Microsoft consistently identifies as the pillars of the offering:

Pillar	Role in the control plane
Registry	A unified inventory of every agent in the organization — including those issued an Entra Agent ID, those published in the Microsoft Teams or Agent Store, and shadow agents discovered on endpoints
Access Control	Unique Entra Agent ID for every agent, with policy templates and adaptive, risk-based access decisions enforced by Microsoft Entra
Visualization	Telemetry, dashboards, role-based reports for IT, security, and business audiences, plus an Agent Map of relationships between agents, users, and resources
Interoperability	Governs agents built with Microsoft tools, with open-source frameworks, and with ecosystem partner platforms; agents can use the same Microsoft 365 context that users do
Security	Integration with Microsoft Defender , Microsoft Entra , and Microsoft Purview for threat detection, identity protection, data loss prevention, and compliance

It is also worth being precise about what Agent 365 is not. It is not a new platform for building agents. Copilot Studio, Microsoft Foundry, and external frameworks remain the places where agents are designed, configured, and developed. Agent 365 is the layer above those platforms — the layer that ensures whatever you build, or whatever a partner ships into your tenant, becomes a managed, identified, observed, and governable entity within your organization.

The Five Pillars in Greater Detail

Registry

The Registry is, in many respects, the foundational capability, because every other discipline depends on having an authoritative inventory. Through the Registry, administrators see a unified list of agents that includes those issued an Entra Agent ID, those published in the Microsoft Teams Store or the Agent Store, and — as discovery functionality continues to roll out — shadow agents detected on managed Windows endpoints by Microsoft Defender and Microsoft Intune. From the Registry, administrators can install agents for selected audiences, block or unblock them across the organization, assign or reassign owners, publish requested agents to the store, reject submissions, or delete agents and their associated files.

Access Control through Entra Agent ID

Each agent is given a unique identity, the Entra Agent ID , which enables agents to be governed in the same identity-centric manner as human users. Administrators can apply policy templates that encode standard guardrails on day one, and Microsoft Entra enforces adaptive, risk-based access decisions that respond to real-time context, blocking agents that show signs of compromise from reaching organizational resources. The principle of least privilege, long established for human accounts and service principals, is now available in operation for agents.

Visualization

The visualization layer goes beyond conventional dashboards. In addition to Telemetry, alerts, and role-based reports tailored separately to IT, security, and business audiences, Agent 365 provides an Agent Map that displays the relationships between agents, the users on whose behalf they act, and the resources they connect to. This is particularly valuable for spotting unintended data flows or excessive privilege. Microsoft also references built-in performance measurement to help decision-makers assess return on investment. However, the granularity of those metrics will, in practice, depend on the specific agents and host applications involved.

Interoperability

The interoperability story is significant because it acknowledges that agents in a real enterprise are not all built on the same platform. Agent 365 governs agents created with Microsoft tools, with open-source frameworks, and with ecosystem partner platforms , with pre-integrated partner agents available to deploy directly from the Microsoft 365 admin center at general availability. Agents can access the same Microsoft 365 context that users do — Teams, calendars, mailboxes, SharePoint — and Microsoft has highlighted unified SDKs and consistent Model Context Protocol (MCP) interfaces for developers building agentic tools across Outlook, Teams, and SharePoint.

Security

The security pillar is delivered through tight integration with the existing Microsoft security stack rather than through a parallel set of agent-specific tools.

Microsoft Purview brings information protection, data loss prevention, sensitivity labels, eDiscovery, Insider Risk Management, and the data security posture management capability for AI (often referred to as DSPM for AI) into scope for agents.
Microsoft Defender XDR contributes agent inventory, real-time runtime protection, threat hunting, and security posture management.
Microsoft Entra contributes adaptive access enforcement and, as announced at RSAC 2026, network-level prompt-injection protection and explicit shadow-AI detection.

The cumulative effect is that agents become first-class citizens of your existing security operations rather than exceptions that need to be handled separately.

What Changes Operationally with Agent 365

For organizations that have been running Microsoft 365 Copilot for some time, the introduction of Agent 365 changes several aspects of day-to-day operations in concrete ways:

Area	Before Agent 365	With Agent 365
Identity	Agents represented by shared service accounts or app registrations	Every agent receives a unique Entra Agent ID and is governed individually
Inventory	Scattered across Teams Store, Copilot Studio, partner portals	Single Agent Registry view in the Microsoft 365 admin center
Access control	Per-agent manual permission grants	Policy-template-driven, adaptive, risk-based enforcement through Microsoft Entra
Observability	Vendor-specific dashboards	Unified telemetry, role-based reports, and an Agent Map of relationships
Security and compliance	Reaching into agent behaviour required significant effort	Native coverage through the same Defender , Entra , and Purview controls used today

The new shadow-AI dimension is also worth highlighting. Until now, an unsanctioned local AI assistant on an employee’s laptop has been more or less invisible to IT. With Agent 365, Defender, and Intune, surface those agents in the Registry so administrators can quarantine them, block them from accessing organizational resources, or, where appropriate, formally onboard them under policy. This is one of the more visibly novel capabilities of the GA release.

Licensing and Availability

Microsoft Agent 365 is licensed on a per-user basis :

Standalone price: fifteen US dollars per user per month.
Bundled option: included in the new Microsoft 365 E7 suite, which combines Microsoft 365 E5, Microsoft 365 Copilot, the Microsoft Entra Suite, and Agent 365 into a single offering aimed at enterprises that want to standardize on a fully integrated identity, productivity, and agent-governance platform.
Availability segment: Commercial cloud at general availability. Microsoft’s documentation also notes that Microsoft 365 for Government Community Cloud High and Government Community Cloud Moderate environments support agent publishing scenarios.

There are no strict product prerequisites to enable Agent 365. Still, Microsoft recommends that customers hold Microsoft Entra P1, Entra P2, or the Entra Suite , alongside Microsoft Purview Data Loss Prevention , to make full use of the governance and security benefits. A Microsoft 365 Copilot license remains necessary to use Copilot-based agents. Pricing details should always be validated with your account team, since regional adjustments, channel offers, and bundle economics can materially change the total cost of ownership.

Taking Your First Steps After Acquiring a License

One of the most pragmatic aspects of Agent 365 is that there is no infrastructure to deploy. The service is activated via licensing and configured in the Microsoft 365 admin center, so the initial steps are administrative rather than technical. The following sequence reflects the recommended path from the Microsoft Learn documentation, organized as most teams will execute it in practice.

1. Confirm Roles and Licenses

Begin by ensuring that the right people hold the right roles. Please assign the Global Administrator role and any other agent-administration roles required for your organization, in line with your least-privilege practices. At the same time, confirm that the audiences for which you intend to enable agents have appropriate licenses in place — Microsoft 365 Copilot in particular, but also any Entra and Purview SKUs that you plan to rely on for governance and protection.

2. Open the Agents Area in the Microsoft 365 Admin Center

Sign in to the Microsoft 365 admin center , expand … Show all in the left navigation, select Agents , and open All agents. This view, with Registry selected, is the canonical inventory of every agent known in your tenant. It is worth spending time here before making any operational decisions; many organizations are surprised by what already exists in their environment once it is presented in a single list.

3. Establish a Baseline Policy

Before approving anything broadly, define and apply an Agent Policy Template that captures your organization’s standards. The template determines, among other things, which connectors and data sources agents may use, what sensitivity-label boundaries apply, and which Conditional Access conditions Entra should enforce. Treat this template as a living governance artifact, deliberately version it, and review it with your security and data-protection stakeholders before it becomes the default for new approvals.

4. Activate or Install Your First Agent

To install an agent for users:

Select the agent from the Registry list.
Choose Install in the agent details pane.
Decide whether the deployment scope should be the entire organization or specific users and groups, and select Next.
Review the requested permissions, and select Grant admin consent.
Accept the requested permissions, and select Next.
Select Finish deployment.

The agent will subsequently appear in the relevant host product — Copilot, Teams, Outlook, or another Microsoft 365 surface — for the chosen audience.

5. Process Activation and Approval Requests

End users can request agents from the Microsoft Teams Store or the Agent Store, and those requests surface to administrators for review. Each request can be approved and activated, or rejected with a rationale. From a governance perspective, formally approving or rejecting requests is at least as important as installing approved agents, because it creates the decision trail that auditors and risk reviewers will look for in 12 months.

6. Turn On Observability and Security

The full value of Agent 365 only emerges when its companion services are correctly configured:

In Microsoft Purview , enable the data security posture management capability for AI, configure DLP policies that include agent identities, and bring agent interactions into your retention and eDiscovery scope.
In Microsoft Defender XDR , review the agent inventory, switch on real-time runtime protection for agents, and integrate agent telemetry into your existing threat-hunting workflows.
In Microsoft Entra , define the adaptive access policies that should apply to agent identities, and decide explicitly whether and how external parties may interact with your agents — by default, agents behave as internal identities and external access is constrained by administrative policy.

7. Use the Agent Map to Validate Reality Against Intent

Once a small number of agents are running under the new policies, open the Agent Map to inspect the relationships between agents, users, and data sources. This is the single most effective way to detect unintended privilege, unexpected agent-to-agent connections, or data flows that violate your sensitivity-label model. The Map is intended to make excessive privilege visually obvious, and it should become a regular part of your operational review cadence.

8. Maintain Lifecycle Hygiene

Agent 365 supports a comprehensive set of lifecycle actions through the Microsoft 365 admin center, including install, uninstall, block, unblock, assign a new owner, publish to the store, reject a submission, and delete. Schedule a recurring review — quarterly is a reasonable starting point — to identify orphaned, unused, or stale agents and retire them deliberately.

Practical Guidance for a Successful Rollout

A few principles consistently distinguish successful rollouts from those that struggle:

1. Begin with discovery rather than deployment. The first weeks should be spent observing what already exists in your tenant before approving anything new, because the inventory you uncover will materially shape your governance design.

2. Please assign an Agent Owner to each agent. Agents without accountable human owners tend, over time, to drift outside policy. Make ownership explicit at registration and revisit it quarterly.

3. Treat the initial Agent 365 program as a security and compliance project at least as much as a productivity project. The most defensible early wins are reducing shadow-AI exposure and producing a credible audit trail. Productivity outcomes follow naturally once stakeholders trust the guardrails.

4. Be honest about the operational overhead. A control plane only delivers value if someone is operating it. Plan for a small but explicit team responsible for policy templates, request triage, agent reviews, and lifecycle hygiene, and make sure that the team has the authority to say no when an agent does not meet your standards.

5. Evaluate the Microsoft 365 E7 bundle carefully if you are already an E5 customer. The bundle economics can become attractive once Microsoft 365 Copilot, the Entra Suite, and Agent 365 are all in scope. Still, the decision should be evaluated through a deliberate commercial analysis rather than a reflex.

Caveats and What to Watch

Several aspects of the service deserve a measured note:

Shadow-agent detection is described in Microsoft’s announcement materials as a capability that will continue to expand over time, which suggests that initial coverage will improve as the service matures.
Performance and ROI metrics are real. Still, their granularity in any given environment will depend on the specific agents and host applications you operate, so it is wise to validate them against your own reporting needs early.
End-to-end value depends on companion services. Because Agent 365 spans several Microsoft products, the value you obtain will depend on how thoroughly Entra, Purview, and Defender are configured in your tenant — partial deployments will yield partial value.

Conclusion: A Single, Coherent Place to Govern Agents

The general availability of Microsoft Agent 365 is a significant milestone, less because it introduces dramatic new capabilities than because it consolidates a set of operational disciplines that enterprises have urgently needed. Organizations have been building, buying, and deploying AI agents for some time. The absence of a unified control plane has become increasingly difficult to defend against in front of security committees, auditors, and regulators. Agent 365 closes that gap by extending the same identity, governance, and security model that already underpins your users to the agents that increasingly act alongside them.

For most Microsoft 365 customers, the practical implication is straightforward. The infrastructure question is settled, the licensing path is clear, the integration with the existing Entra, Purview, and Defender stack is real, and the first steps are administrative rather than technical. What remains is the work that always determines the success of a governance program: defining sensible policy, assigning accountable owners, observing what is happening in your tenant, and acting on what you learn. Agent 365 will not do that work for you, but it now provides — for the first time — a single, coherent place in which to do it.

If you are considering when to begin, the answer is almost certainly now. The agents are already there. The control plane is finally here.

Official Sources

Microsoft Learn — Microsoft Agent 365 overview (https://learn.microsoft.com/en-us/microsoft-agent-365/overview)
Microsoft Learn — Get started with Microsoft Agent 365 (https://learn.microsoft.com/en-us/microsoft-agent-365/get-started)
Microsoft Learn — Governance and Lifecycle actions for agents available in Microsoft 365 admin center (https://learn.microsoft.com/en-us/microsoft-365/admin/manage/agent-actions)
Microsoft 365 Blog — Microsoft Agent 365: The control plane for AI agents (Charles Lamanna, Executive Vice President, Business Applications & Agents, November 2025) (https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/)
Microsoft — Microsoft Agent 365: The Control Plane for Agents (product page, https://www.microsoft.com/en-us/microsoft-agent-365)
Microsoft Copilot Acceleration Team — Microsoft Agent 365 Resources (https://microsoft.github.io/agent-resources/agent365/)

Copilot Studio Billing – A Short Answer to a Frequent Question

Holger Imbery — Sat, 02 May 2026 08:04:40 +0000

Summary Lede

Organizations implementing Microsoft Copilot Studio often ask about the machine structure and how to monitor and analyze resource use across their deployed agents. This article addresses these common questions by examining the foundational principles of Copilot Studio's billing methodology. Specifically, the platform employs a usage-based billing model that operates through Copilot Credits, a standardized measurement unit that quantifies billable computational resources. Microsoft Copilot Studio gives you comprehensive visibility into consumption and costs through integrated Analytics. You get real-time visibility into billing metrics, consumption attribution, and capacity planning, without needing external tools or manual reconciliation.

Usage-based billing with Copilot Credits

Microsoft Copilot Studio employs a consumption-based billing model centered on Copilot Credits, a standardized unit of measurement that quantifies the billable computational resources your agent uses. We add up each agent's operational costs by summing all Copilot Credits used across your organizational tenant, providing a transparent, predictable billing framework. The volume of Copilot Credits your agent consumes is determined by multiple contributing factors, including the frequency and intensity of user interactions, the specific AI capabilities invoked during those interactions (such as natural language responses, backend action execution, or business process flows), and the overall complexity of the scenarios the agent must handle. This approach lets your costs scale with your actual usage and feature use across your organization.

Where to see billing and consumption

Copilot Studio provides a dedicated Analytics page at the agent level that shows billing‑relevant data for a selected time period. This includes:

Total billed Copilot Credits for the agent

The Analytics dashboard provides comprehensive visibility into your agent's billing metrics through several key components:

Billing trend visualization: A temporal representation of Copilot Credit consumption plotted over your selected time period, enabling stakeholders to identify consumption patterns, peak usage intervals, and seasonal fluctuations in agent utilization.
Activity-based consumption breakdown: A detailed attribution analysis that segments credit consumption by interaction type and feature category. This granular view helps identify which capabilities—such as natural language processing, action executions, or business process integrations—are the primary drivers of your organization's computational costs.
Credit allocation and remaining capacity: A dashboard element that displays your monthly credit allocation alongside actual consumption to date, providing clear visibility into remaining available credits within the current billing cycle and helping prevent unexpected cost overruns.

This analytical approach helps makers and administrators go beyond simple cost visibility to understand the factors and user behaviors that drive consumption patterns in their agents.

Near‑real‑time visibility

Please note that consumption data in the Analytics experience isn't reflected immediately. Because data collection and aggregation are distributed across Microsoft's infrastructure, there is a deliberate processing interval between when an interaction occurs in your agent and when the corresponding credit consumption metrics appear in the Analytics dashboard. Specifically, recent user activity and associated Copilot Credit charges typically require several hours to propagate through the telemetry pipeline and surface in the analytics interface. This temporal lag between actual consumption and reported metrics is a critical consideration when conducting performance monitoring, particularly in scenarios involving new agent deployments, recent architectural modifications, or optimization initiatives. Organizations implementing consumption tracking during these periods should account for this delay when interpreting analytics data and making operational decisions based on observed billing trends.

Why this matters

The relationship between billing metrics and agent design decisions is a core principle in building and running Microsoft Copilot Studio deployments. Organizations that recognize and leverage this connection are better positioned to make informed architectural decisions that balance functional requirements with cost efficiency. The Analytics experience provided within Copilot Studio serves a dual purpose that extends well beyond simple billing transparency. As a primary governance tool, it enables organizations to establish consumption baselines, define and enforce cost budgets, and implement optimization strategies at both the agent and organizational levels. This comprehensive analytical framework becomes increasingly critical as agents transition from the experimentation and proof-of-concept phases to production environments, where operational costs accumulate rapidly, and optimization opportunities become more constrained. By establishing clear visibility into consumption patterns during early development stages, teams can identify inefficient architectural patterns, optimize interaction flows, and refine AI capabilities—all before deploying agents at scale. Furthermore, the transparency provided by the Analytics dashboard facilitates organizational governance by empowering stakeholders to establish accountability for resource utilization, track consumption trends against budgetary targets, and make data-driven decisions regarding agent expansion, feature prioritization, and technology investments.

Links and resources

billing rates

Conclusion

Microsoft Copilot Studio's billing model, centered on usage-based Copilot Credits, provides a transparent and scalable framework for managing the costs associated with AI agent deployment. By leveraging the built-in Analytics experience, organizations can gain comprehensive insights into their agents' consumption patterns, enabling informed decision-making around agent design, optimization, and cost management. As organizations continue to adopt and scale their use of AI agents, understanding the nuances of billing and consumption visibility will be essential for maximizing the value of their investments in Microsoft Copilot Studio while maintaining control over operational costs.

Microsoft IQ: The New Intelligence Layer for Enterprise AI Agents

Holger Imbery — Sat, 25 Apr 2026 08:42:12 +0000

Summary Lede

Microsoft's new IQ layers - Work IQ, Fabric IQ, and Foundry IQ - are unified intelligence systems that give enterprise AI agents deep organizational context. Rather than relying solely on general knowledge, these layers ground agents in your company's real data, workflows, and knowledge domains, enabling them to make decisions with the credibility and awareness of experienced employees.

Why read this: If you're building or deploying AI agents in enterprise environments, understanding these three intelligence layers is essential. Learn what each layer does, the business value they unlock, practical developer integration steps with Copilot Studio, and real limitations to watch for - whether you're looking to protect your competitive edge or avoid common deployment pitfalls.

Introduction: Why Enterprise Context Is the New AI Differentiator

AI agents - autonomous systems that plan, reason, and act on behalf of users - are moving from experimentation to production at enterprise scale. A July 2024 Capgemini survey of 1,100 companies with over \$1 billion in annual revenue found that 82% plan to integrate AI agents within the next one to three years, with only 7% reporting no plans at all Of those surveyed, 71% expect AI agents to drive automation, and 64% expect them to free human workers from repetitive tasks so they can focus on higher-value functions. Separately, KPMG's Q2 2025 AI Quarterly Pulse Survey of 130 U.S.-based C-suite leaders (from organizations with \$1 billion or more in revenue) reports that 33% of organizations have now deployed at least some agents - a three-fold increase after two consecutive quarters at 11%. The same survey found that 82% of leaders agree their industry's competitive landscape will look fundamentally different within 24 months due to AI.

Yet deploying powerful language models alone does not guarantee business impact. Large language models ship with broad general knowledge but lack awareness of an organization's current data, internal processes, contractual obligations, and human workflow patterns. The real differentiator is no longer how smart the model is, but how well it understands your organization. At Microsoft Ignite 2025, Microsoft addressed this gap by introducing a "Unified Context Layer" comprising three tightly connected intelligence systems: Work IQ, Fabric IQ, and Foundry IQ. Together, they form "Microsoft IQ" - a unified intelligence layer spanning productivity, data, and knowledge - designed to ground AI agents with deep enterprise context so they can make reliable decisions and continuously optimize operations.

Each IQ layer addresses a distinct dimension of organizational context:

IQ Layer	Context Domain	Platform	Primary Function
Work IQ	User & work context	Microsoft 365	Captures collaboration signals - emails, meetings, chats, documents, relationships - and builds persistent memory of how people and teams work
Fabric IQ	Business & data context	Microsoft Fabric	Unifies analytical, operational, and real-time data into a governed semantic model with ontologies, graphs, and business rules
Foundry IQ	Knowledge & reasoning context	Microsoft Foundry (Azure AI Foundry)	Creates multi-source, permission-aware knowledge bases with agentic retrieval for grounded, citation-backed answers

Each workload is standalone, but they can be used together to provide a comprehensive organizational context for agents. For business decision-makers, these layers translate into faster time-to-insight, more trustworthy AI outputs, and the ability to deploy agents that act with the contextual awareness of experienced employees rather than generic assistants. For developers building with Microsoft Copilot Studio (low-code) or the Microsoft Agent Framework / Agent 365 (pro-code), the IQ layers provide ready-made intelligence services that eliminate the need to hand-build retrieval pipelines, semantic models, or user-context systems from scratch.

Work IQ - Personalizing AI with User & Collaboration Context

What It Is

Work IQ is the intelligence layer in Microsoft 365 that gives AI agents a real-time, permission-aware understanding of how people actually work. It is built on three tightly integrated layers - Data, Memory, and Inference - that work together to provide Microsoft 365 Copilot and custom agents with continuous contextual understanding of work:

Data unifies signals from files, emails, meetings, chats, and business systems across Microsoft 365 to capture how work happens across the organization.
Memory builds a persistent understanding of how people and teams work, enabling agents to stay aligned to priorities and remain consistent across tasks, apps, and sessions.
Inference brings together models, skills, and tools so agents can reason and take action while the Agent 365 control plane ensures those actions remain observable, governed, and compliant.

Work IQ connects to organizational and personal data - SharePoint files, Outlook emails, Teams meetings - and builds personalized memory based on user preferences, habits, and workflows. Conversational memory in Microsoft 365 Copilot, powered by Work IQ, enables it to retain context and important details across sessions by drawing on a user's work profile, instructions, preferences, and insights from past chats. Users stay in control and can review or delete these memories at any time.

Importantly, Work IQ does not merely retrieve information - it interprets context. This is why a Work IQ-enabled agent can answer questions like "What did we decide last week about the field project budget?" or "Summarise the latest customer escalations and draft a report". It reasons over signals, patterns, and workflows rather than searching a document library.

Strategic Value for Business

For decision-makers, Work IQ delivers personalization at scale without sacrificing security. It enables AI agents to recognize who a user works with, what they focus on, and how they typically accomplish tasks. Microsoft is now exposing the power of Work IQ through APIs, allowing developers to build AI agents targeting specific enterprise scenarios beyond what the built-in Copilot offers.

Work IQ also surfaces workflow intelligence - for example, identifying that operations teams are overloaded handling exceptions manually, observing long email chains, or spotting recurring "delay review" meetings that consume capacity. This kind of insight goes beyond analytics into organizational awareness, helping leaders understand where human effort is being consumed and where agents can provide the most value.

Tradeoff consideration: Work IQ can only be as effective as the signals it can access. It is accessed primarily through Copilot and works best when collaboration data is well-structured and consistently captured in Microsoft 365 tools. Organizations with fragmented communication (e.g., heavy use of external email, shadow IT chat tools, or poorly organized SharePoint) will see diminished returns until information management improves.

Developer Integration - Copilot Studio

Work IQ is surfaced to developers as Model Context Protocol (MCP) tools that can be attached to agents in Copilot Studio. The following step-by-step process adds the Work IQ Mail server to an agent:

Sign in to Copilot Studio and select or create your agent.
Select the Tools tab and then Add Tool.
On the Add tool page, select Model Context Protocol to see Work IQ MCP servers and other MCP servers.
Type "mail" in the search box.
Select Work IQ Mail and expand the connection dropdown to select Create New Connection.
Select Create, provide credentials, and complete the sign-in process.
Select Add and Configure to complete the process.
Test the agent - for example, prompt: "Send an email to [name] and ask how the hands-on lab is going."
When asked to allow the Work IQ tool to connect and use services, select Allow.

After configuration, the agent can read email content, understand the context, and respond accordingly. Repeat these steps for Work IQ Calendar or Work IQ Teams to extend the agent's capabilities with meeting insights, chats, and more>.

Prerequisite: A Microsoft 365 Copilot license is required to use Work IQ MCP servers.

Developer Integration - Microsoft Agent Framework (Agent 365)

For pro-code developers, the Work IQ tooling infrastructure is built into the Microsoft Agent 365 SDK and CLI, Microsoft Foundry, and Copilot Studio. Agent 365 provides a secure, centralized gateway for extending agents with enterprise-ready tools through Work IQ for Microsoft 365 services and custom tooling servers for specialized workflows. This means developers building agents programmatically - using the Agent 365 SDK in Python, C#, or other supported languages - can invoke Work IQ's capabilities (querying recent communications, documents, or calendar entries) as part of an agent's reasoning loop without manually calling Microsoft Graph APIs and interpreting raw data.

Governance and Security

IT administrators retain full control over Work IQ MCP tools through the Microsoft 365 admin center under the Agents and Tools section, where they can:

View all activated Work IQ MCP servers (Work IQ Mail, Work IQ Calendar, Work IQ Teams, and any custom servers).
Allow or block specific servers based on organizational policies.
Apply scoped permissions so agents only access what they need.

If an admin blocks a Work IQ MCP tool or MCP server, it blocks access for every user and every agent. Permissions always take precedence over configuration.

Observability is built in via Microsoft Defender. Admins can run queries in Advanced Hunting to inspect trace logs of tool calls made by agents, monitor execution details (which tools were invoked, parameters passed, and outcomes), and detect anomalies or unauthorized usage patterns.

All Work IQ MCP servers also undergo continuous evaluationto measure accuracy, latency, and reliability, ensuring production-grade robustness.

Fabric IQ - Turning Enterprise Data into Business-Meaningful Intelligence

What It Is

Fabric IQ (preview) is a workload in Microsoft Fabric that unifies data across OneLake and organizes it in the language of your business. The unified data is then exposed to analytics, AI agents, and applications with consistent semantic meaning and context. While Work IQ understands work, Fabric IQ understands data - and specifically what data means in business terms.

Fabric IQ models business data through ontologies, semantic models, and graphs so agents can reason over analytics in OneLake and Power BI. It combines the following items into one semantic intelligence workload

Fabric IQ Item	Description
Ontology (preview)	Enterprise vocabulary and semantic layer that defines entity types, relationships, properties, and condition-action rules (through Fabric Activator). Binds definitions to real data so downstream tools share the same language
Plan (preview)	Unified no-code platform for collaborative planning, reporting, analytics, data integration, and management on a single platform
Graph (preview)	Native graph storage and compute for nodes, edges, and traversals over connected data - suited to path finding, dependency analysis, and graph algorithms. Integrated with the ontology item
Data Agent (preview)	Conversational Q&A systems using generative AI that connect to the ontology to understand business concepts when answering questions
Operations Agent (preview)	AI agent to monitor real-time data and recommend business actions, aware of business terminology from the ontology
Power BI Semantic Models	Curated analytics models optimized for reporting and interactive analysis with measures, hierarchies, and relationships. Ontologies can be generated directly from them

Several of these items are shared with other Fabric workloads (e.g., Graph and Operations Agent are also part of Real-Time Intelligence; Data Agent is shared with Data Science).

Strategic Value for Business

Fabric IQ delivers six key benefits identified by Microsoft:

Unification of data - Combines data from various OneLake sources (lakehouses, eventhouses, Power BI semantic models) into a single consistent model. Can also unify external operational data using OneLake shortcuts without copying data or building ETL pipelines.
Consistent language across tools - A single definition of a concept (like Customer, Material, or Asset) drives how Power BI, notebooks, and agents interpret data.
Faster onboarding - Business concepts only need to be declared once, then new dashboards and AI experiences inherit that meaning automatically.
Governance and trust - Reduces duplication and inconsistent definitions across teams by enforcing clear semantics, while constraints improve data quality.
Cross-domain reasoning - Represents relationships between concepts with graph links, enabling traversals like Order → Shipment → Temperature Sensor → Cold Chain Breach to explain outcomes.
AI readiness and decision-ready actions - Provides structured grounding for copilots and agents so answers reflect enterprise language. Rules defined in the ontology (via Fabric Activator) enable governed, real-time actions (e.g., alerts and notifications) when conditions are met.

For finance teams, this means an AI agent can answer questions like "Why did request volume spike in the North region last month?" or "Show anomalies in field service cycle time" - grounded in what the data means, not just where it lives. The semantic model ensures that metrics such as "net profit" and "customer churn" are calculated exactly as defined by the business, so CFOs can trust the AI's output.

For operations teams, Fabric IQ's ontology can model supply chain entities, inventory levels, delivery metrics, and their relationships. When an on-time delivery percentage dips below historical norms, a Fabric IQ-informed agent can detect the anomaly, correlate it with upstream bottlenecks, and surface the issue before it escalates.

Tradeoff consideration: Fabric IQ requires upfront investment in semantic modeling. Organizations must define ontologies and business rules, which demands collaboration between data engineers and domain experts. Once agents depend on business meaning, that meaning becomes production infrastructure - semantic models must be versioned, governed, deployed, and monitored with the same rigor applied to code. For organizations already using Power BI, existing data models provide a head start: they instantly serve as a catalyst, giving agents rich, business-specific context.

Developer Integration

Developers interact with Fabric IQ primarily through the Microsoft Fabric portal, where they can create ontologies, bind them to data sources, and build Data Agents or Operations Agents that reason over the unified data model. The recommended approach for choosing the right Fabric IQ item depends on the scenario:

Item	When to Use
Ontology (preview)	Cross-domain consistency, governance, and AI/agent grounding; reasoning across processes
Graph (preview)	Relationship-heavy questions (impact chains, communities, shortest paths) dominate; GQL-style pattern matching needed
Power BI Semantic Model	Business users need trusted KPIs and fast visuals with dimensional modeling and governed datasets

Several key item relationships support combined use:

Ontology + Semantic Model: Generate or align Power BI semantic models so terminology and KPIs stay consistent across reports.
Ontology + Graph: Ontology declares which things connect and why; Graph stores and computes traversals.
Ontology + Data/Operations Agents: Ontology grounds agents in shared business semantics and rules, enabling them to retrieve context, reason across domains, and trigger governed actions.
Plan + Semantic Model: Plan connects to existing semantic models, allowing dimensions and measures to be used in planning sheets for seamless plan-versus-actuals analytics.

Foundry IQ - Unified Knowledge Retrieval and Reasoning for AI Agents

What It Is

Foundry IQ (preview) is a managed knowledge layer in Microsoft Foundry that enables the creation of configurable, multi-source knowledge bases providing agents with permission-aware, citation-backed responses based on organizational data. It tackles what many architects consider the hardest challenge in agent design: knowledge retrieval and grounding.

A Foundry IQ knowledge base consists of knowledge sources (connections to internal and external data stores) and parameters that control retrieval behavior. Multiple agents can share the same knowledge base. When an agent queries the knowledge base, Foundry IQ uses agentic retrieval - a multi-query pipeline - to process the query, retrieve relevant information, enforce user permissions, and return grounded answers with citations.

Core Capabilities

Foundry IQ offers the following capabilities:

Multi-source knowledge bases: Connect one knowledge base to multiple agents. Supported knowledge sources include Azure Blob Storage, SharePoint, OneLake, and public web data.
Automated document processing: Automate document chunking, vector embedding generation, and metadata extraction for indexed knowledge sources. Schedule recurring indexer runs for incremental data refresh.
Flexible query modes: Issue keyword, vector, or hybrid queries across indexed and remote knowledge sources.
Agentic retrieval engine: Uses a large language model to plan queries, select sources, run parallel searches, and aggregate results. The retrieval reasoning effort can be configured at three levels: minimal, low, or medium for LLM processing.
Extractive data with citations: Returns answers with source references so agents can reason over raw content and trace answers back to source documents.
Permission-aware: Synchronizes access control lists (ACLs) for supported sources and honors Microsoft Purview sensitivity labels. Enforces permissions at query time so agents return only authorized content.
Identity-based queries: Runs queries under the caller's Microsoft Entra identity for end-to-end permission enforcement.

The underlying indexing and retrieval infrastructure is powered by Azure AI Search.

Components

Component	Description
Knowledge base	Top-level resource that orchestrates agentic retrieval. Defines which knowledge sources to query and parameters that control retrieval behavior, including retrieval reasoning effort (minimal, low, or medium)
Knowledge sources	Connections to indexed or remote content. A knowledge base references one or more knowledge sources
Agentic retrieval	Multi-query pipeline that decomposes complex questions into subqueries, executes them in parallel, semantically reranks results, and returns unified responses. Uses an optional LLM from Azure OpenAI in Foundry Models for query planning

Foundry IQ knowledge bases can be used in Foundry Agent Service, Microsoft Agent Framework, or any custom application by calling the knowledge base APIs from Azure AI Search.

Strategic Value for Business

Foundry IQ is the layer that makes AI enterprise-grade for knowledge work. It enables agents to understand contracts, policies, procedures, SLAs, regulatory constraints, and unstructured documents - and to reason across them safely. The combination of permission enforcement and source citations directly addresses the two most common executive concerns about enterprise AI: data leakage and hallucination. Every assertion the agent makes can be traced to a vetted document, supporting trust and auditability.

Foundry IQ serves as a single endpoint for high-quality organizational data to maximize context for AI applications. Its knowledge retrieval engine runs over multiple data sources including Work IQ, Fabric IQ, Azure data services, custom web applications, and the web.

Tradeoff consideration: Because Foundry IQ indexes and retrieves content automatically, the quality of the knowledge base depends heavily on the quality and curation of source content. Outdated, duplicated, or poorly written documents will produce lower-quality retrieval results. Organizations should invest in content hygiene (removing obsolete documents, standardizing formatting, clarifying ownership) before connecting sources to Foundry IQ. Additionally, Foundry IQ is currently in public preview without a production service-level agreement, which means production-critical workloads should be tested thoroughly and planned around the preview constraints.

Developer Integration - Setting Up via the Microsoft Foundry Portal

The typical portal-based workflow for Foundry IQ:

Sign in to Microsoft Foundry at https://ai.azure.com. Ensure the "New Foundry" toggle is on.
Create a project or select an existing project.
From the top menu, select Build.
On the Knowledge tab:
- Create or connect to an existing search service that supports agentic retrieval.
- Create a knowledge base by adding one knowledge source at a time.
- Configure knowledge base properties for retrieval behavior.
On the Agents tab:
- Create or select an existing agent.
- Connect to your knowledge base.
- Use the playground to send messages and refine your agent.

For proof-of-concept testing, you can use the free tier for Azure AI Search and a free allocation of tokens for agentic retrieval.

Developer Integration - Connecting Foundry IQ to Agents Programmatically

For pro-code developers, the connection from an agent to a Foundry IQ knowledge base uses the Model Context Protocol (MCP) to facilitate tool calls. When invoked by the agent, the knowledge base orchestrates:

Plans and decomposes the user query into subqueries.
Processes the subqueries simultaneously using keyword, vector, or hybrid techniques.
Applies semantic reranking to identify the most relevant results.
Synthesizes the results into a unified response with source references.

SDK and API support (as of the documentation)>:

Platform	Python SDK	C# SDK	JavaScript SDK	Java SDK	REST API
Microsoft Foundry	✔️	-	-	-	✔️

Prerequisites for programmatic setup:

An Azure AI Search service with a knowledge base containing one or more knowledge sources.
A Microsoft Foundry project with an LLM deployment (such as gpt-4.1-mini).
Authentication and permissions configured on the search service and project.
Python SDK version 2.0.0 or later or the 2025-11-01-preview REST API version:

For role-based access control (RBAC):

Azure AI User role on the parent resource to access model deployments and create agents.
Azure AI Project Manager role on the parent resource to create a project connection for MCP authentication.
A system-assigned managed identity on the project for interactions with Azure AI Search.

Microsoft provides an end-to-end Python sample on GitHub - the agentic-retrieval-pipeline-example - for integrating Azure AI Search and Foundry Agent Service for knowledge retrieval.

How the Three IQ Layers Work Together - A Practical Scenario

Understanding each IQ layer individually is important; understanding how they combine is what unlocks transformative use cases. As one analysis frames it: think of a three-layer stack - Fabric IQ at the foundation (structured data intelligence), Foundry IQ in the middle (reasoning and knowledge grounding), and Work IQ on top (human workflow intelligence).

Scenario: Supply Chain Delay Management (Operations)

Consider a company building an AI agent to help manage supply chain delays:

Fabric IQ detects anomalies in delivery metrics. It sees that certain suppliers are trending late beyond historical norms, notices that on-time delivery percentages are dipping in specific regions, and correlates delays with upstream bottlenecks. This is data-driven awareness.
Foundry IQ grounds the agent in supplier contracts, SLAs, penalty clauses, and internal policies. It understands what the agreement actually says about late deliveries, interprets escalation thresholds, and knows which suppliers have stricter terms. This is contextual reasoning.
Work IQ observes that operations teams are overloaded handling these exceptions manually. It sees long email chains, recurring "delay review" meetings, and individuals spending hours every week tracking updates from vendors. It identifies patterns of reactive work consuming capacity. This is workflow intelligence.
The agent combines all three streams: it recommends which delays need escalation based on contractual impact, drafts communications to suppliers referencing the correct SLA language, suggests internal reprioritization, and surfaces issues before they become crises.

Scenario: Customer Service Knowledge Agent

A customer service team deploys an agent using:

Foundry IQ as the primary knowledge source, indexing product manuals, troubleshooting guides, FAQs, and past support ticket resolutions. When a customer asks about error code E305, the agent retrieves the relevant manual section with a citation to the source document.
Work IQ to access recent internal communications - for example, identifying that an engineering team discussed this exact error in a Teams conversation last week and already developed a workaround. The agent can surface this workaround alongside the official documentation.
Fabric IQ to check whether the error is correlated with a particular product batch or region by querying the semantic model of manufacturing and logistics data, enabling the support agent to proactively notify affected customers.

Scenario: Financial Reporting and Analysis (Finance)

A finance team connects Fabric IQ to its consolidated financial data in OneLake and Power BI:

The Ontology defines entities like Account, Transaction, Cost Center, and Region with standardized definitions for metrics like Operating Margin and Revenue Growth.
A Data Agent in Fabric IQ allows analysts to ask natural-language questions such as "What are the top five cost centers by budget variance this quarter?" - grounded in the official semantic model, ensuring the answer uses Finance's own calculation methodology.
Foundry IQ supplements this with knowledge from internal accounting policies, audit findings, and regulatory guidance documents, so the agent can explain why a variance occurred and whether it triggers any policy-based escalation.
Work IQ can surface the context of recent discussions among the finance team (e.g., "The CFO discussed this variance in Monday's meeting and requested a root-cause analysis by Friday"), ensuring the AI's recommendations are aligned with current priorities.

Practical Guidance for Copilot Studio and Agent Framework Developers

Choosing the Right Development Platform

Microsoft offers two primary paths for agent development, and both can leverage the IQ layers. The choice depends on the developer persona and scenario complexity:

Criteria	Copilot Studio	Microsoft Agent Framework (Agent 365 SDK)
Target audience	Business users, makers, power users, fusion teams	Professional developers, IT teams
Approach	Low-code / no-code with visual design canvas	Pro-code (Python, C# SDKs) with full programmatic control
IQ integration	Work IQ via MCP tools (Add Tool > Model Context Protocol); Foundry IQ via MCP connection; Fabric IQ via Fabric Data Agent	Work IQ via Agent 365 SDK; Foundry IQ via knowledge base APIs and Python SDK; Fabric IQ via APIs
Governance	Built-in through Agent 365 control plane + M365 admin center	Same Agent 365 governance + Azure RBAC
Best for	FAQ bots, task-specific assistants, business-process agents with moderate complexity	Multi-agent orchestration, complex retrieval pipelines, custom reasoning logic, enterprise-grade production agents

Copilot Studio is aimed at low-code builders while Azure AI Foundry serves pro-code developers; Agent 365 delivers a consistent, developer-friendly experience backed by rigorous evaluation for accuracy, latency, and reliability across both paths.

These platforms are not mutually exclusive. A common pattern is to use Foundry to build and fine-tune the agent's reasoning backend (including Foundry IQ knowledge bases) and Copilot Studio for the conversational front-end and deployment to Microsoft 365 channels. Microsoft also supports connecting a Foundry agent directly into Copilot Studio for organizations that want pro-code control over the backend with low-code deployment】.

Additional Developer Resources

Microsoft provides a hands-on learning experience called the IQ Series - an official GitHub repository (microsoft/iq-series) that includes video episodes, Jupyter notebooks, and Azure deployment templates spanning Foundry IQ, Work IQ, and Fabric IQ. This is a valuable starting point for developers exploring integration patterns.

For Copilot Studio, the Copilot-Studio-and-Azure GitHub repository includes a lab on Microsoft Foundry agentic retrieval (labs/2.4-microsoft-foundry-agentic-retrieval) with a notebook (foundry-IQ-agents.ipynb) that demonstrates how to connect Copilot Studio agents to Foundry IQ.

Key Best Practices

1. Curate your knowledge sources deliberately. For Foundry IQ, prioritize high-quality, authoritative content - official policy libraries, product documentation, and knowledge articles that customer service reps already use. Remove outdated or duplicated material before indexing. Please use scheduled indexing for incremental data refresh so agents always use the current information.

2. Invest in semantic modeling. For Fabric IQ, collaborate with business domain experts to design ontologies that capture actual business rules, relationships, and terminology. Start from existing Power BI semantic models when possible - they can be used as a bootstrap for ontologies, keeping language consistent across Fabric experiences.

3. Clean up collaboration data. For Work IQ, ensure that the organization's SharePoint structures are tidy, file ownership is clear, and key processes are documented. Reduce duplication and align Dataverse models with real business logic. As noted in an analysis by VisualLabs, "Copilot cannot infer intent from SharePoint chaos" - AI amplifies what already exists.

4. Combine IQ layers for maximum impact. **Foundry IQ can incorporate Work IQ and Fabric IQ as data sources, enabling custom agents to unify all three context dimensions through a single retrieval interface.

5. Govern and monitor rigorously. Use the M365 admin center for Work IQ tool governance and Azure RBAC for Foundry IQ permissions. Use Microsoft Defender's Advanced Hunting to audit all tool calls your agents make in production. For Foundry IQ, enforce permissions at query time by passing user tokens to filter results based on identity.

6. Use the "Bring Your Own Model" capability strategically. Copilot Studio supports connecting models from Microsoft Foundry's model catalog (including GPT 4.5, Llama, DeepSeek, and 11,000+ more models) to specific prompt actions within an agent. This allows you to pick the best-performing model for each task - not just use a single model globally. Governance for these model connections is managed through Power Platform admin center policies under the "Microsoft Foundry" connector.

Use Case Summary

The following table consolidates practical use cases across three domains, illustrating which IQ layers contribute and how:

Domain	Use Case	Work IQ Contribution	Fabric IQ Contribution	Foundry IQ Contribution
Customer Service	Intelligent support agent that resolves complex tickets	Surfaces recent internal discussions about the issue (Teams chats, email threads)	Correlates the issue with product batch data, defect rates, or regional patterns	Retrieves official troubleshooting guides, product manuals, and policy documents with citations
Finance	Automated financial analysis and variance reporting	Identifies which finance team members have been discussing a variance and surfaces meeting action items	Provides a governed semantic model of financial KPIs, ensuring consistent definitions (e.g., how "operating margin" is calculated)	Grounds the agent in accounting policies, audit findings, and regulatory guidance documents
Operations	Supply chain delay advisor	Detects that ops teams are overloaded with manual exception handling (long email chains, recurring meetings)	Identifies anomalies in delivery metrics, correlates delays with upstream bottlenecks, and detects regional performance dips	Retrieves supplier contracts, SLAs, and penalty clauses to determine contractual obligations and escalation thresholds
HR / Onboarding	New employee onboarding assistant	Understands who the new hire's team members are and what projects are active	N/A	Retrieves onboarding guides, IT setup instructions, benefits documentation
Compliance	Regulatory compliance advisor	Tracks which compliance officers have been communicating about a specific regulatory change	Monitors regulatory metrics and flags anomalies against defined thresholds	Retrieves the latest regulatory texts, internal policies, and past audit reports with citations

Addressing the Runtime and Governance Layer: Agent 365

No discussion of enterprise agent deployment is complete without addressing who monitors these agents and who is accountable for them. Microsoft's answer is Agent 365 - the runtime and governance layer that sits over all IQ workloads and agent interactions.

Agent 365 monitors agent decisions, tracks accuracy over time, enforces compliance boundaries, and provides deep observability into how agents behave in the real world. It gives visibility into what the agent is doing, why it is doing it, and whether it is staying within defined guardrails. This is not just logging - it is operational control: knowing when performance drifts, when policies change, and when human override is required. Without this layer, as one analysis notes, "you have experiments - clever, promising, but fragile. With it, you have enterprise-grade systems that can scale responsibly.".

For organizations evaluating AI agent initiatives, the presence of Agent 365 as a centralized governance layer is a critical factor in the build-versus-buy decision. It combines extensibility, security, and compliance to help organizations confidently scale AI agents across productivity and business systems.

Conclusion: Turning Enterprise Context into Competitive Advantage

Work IQ, Fabric IQ, and Foundry IQ represent a structural shift in how enterprises build AI agents. Rather than bolting generic AI onto existing workflows, these intelligence layers embed organizational understanding directly into the agent's reasoning process - from user collaboration patterns (Work IQ) to governed business semantics (Fabric IQ) to secure, multi-source knowledge retrieval (Foundry IQ).

For business decision-makers evaluating AI investments, the implication is clear: the value of enterprise AI is proportional to the quality and depth of context it can access. Organizations that invest in structuring their work data (M365), defining business semantics (Fabric), and curating knowledge bases (Foundry) will extract far more value from AI agents than those relying on generic models. KPMG's survey found that among organizations scaling AI, the top ROI metrics are productivity (cited by 98% of leaders), profitability (97%), and improved performance and work quality (94%) - all outcomes that depend on context-rich, trustworthy AI rather than raw model capability.

For developers, the three IQ layers provide a clear architecture and integration path - whether through Copilot Studio's visual MCP tool integration or through the Agent 365 SDK's programmatic APIs. The key is to start with well-defined data foundations (clean SharePoint, governed Fabric models, curated knowledge sources) and progressively layer in IQ capabilities as agent scenarios mature.

The organizations that will lead in the agentic era are those that recognize deployment is only the beginning - and that contextual intelligence, not model size, is the true differentiator.

Connecting Azure Application Insights to Microsoft Copilot Studio: Unlocking Deep Analytics for Agentic Systems

Holger Imbery — Sat, 18 Apr 2026 07:08:42 +0000

Summary Lede

Agentic systems demand visibility. By connecting Azure Application Insights to your Copilot Studio agents, you gain enterprise-grade monitoring that goes far beyond built-in analytics—enabling real-time diagnostics, performance optimization, and strategic business insights in a single integrated platform.

Why read this: You'll discover how to unlock faster issue detection and resolution, measure and improve user experience, demonstrate ROI to stakeholders, and establish the telemetry foundation that separates high-performing teams from those operating blindly. This guide walks through prerequisites, configuration steps, and best practices to help you implement a mature observability strategy immediately.

As agentic systems grow in complexity and autonomy, visibility becomes critical. Analytics illuminate how agents interpret user intent, make decisions, and interact with external systems—transforming a "black box" into an understandable, debuggable, and continuously improving system. In production environments, telemetry reveals performance bottlenecks, catches errors before users notice them, and provides the evidence base for optimizing agent behavior and dialog flows. Without analytics, teams operate blindly; with it, they make data-driven decisions and build trust through transparency.

Benefits of Connecting Azure Application Insights to Copilot Studio

Connecting Azure Application Insights to Microsoft Copilot Studio agents significantly extends your monitoring, diagnostics, and analytics capabilities far beyond the native tooling provided by Copilot Studio alone. Application Insights, a powerful component of the broader Azure Monitor platform, is a fully extensible Application Performance Management (APM) service designed to meet enterprise-scale requirements. This service captures granular message-level telemetry, topic trigger events, interaction latency measurements, custom domain-specific events, and comprehensive error details in near real-time, enabling immediate visibility into agent behavior. By establishing this integration, organizations gain access to both fine-grained technical observability, which empowers engineering teams to debug and optimize agent performance, and strategic usage intelligence, which informs business stakeholders about adoption patterns, user satisfaction trends, and operational efficiency metrics.

Key Benefits Overview

Benefit Area	Technical Capabilities	Business / Strategic Value
Real-Time Monitoring	Live telemetry stream of conversations; configurable Azure Monitor alerts for anomalies or thresholds	Proactive issue detection minimizes downtime; enables swift scaling responses during usage spikes
Performance Optimization	Latency and performance data per interaction; Smart Detection flags unusual performance drops automatically	Faster, more reliable agent increases user satisfaction; reduces abandonment from slow responses
Diagnostics & Error Logging	Automatic capture of exceptions with full context (stack traces, conversation state, topic/step); custom telemetry events for domain-specific tracking	Faster troubleshooting lowers support costs; higher reliability builds user trust
User Interaction Analytics	Conversation counts, active users, channels, topics triggered, session durations — queryable via KQL	Data-driven improvements to dialog design; evidence base for prioritizing development effort
Dashboards & Reporting	Pre-built Copilot Studio Dashboard (Azure Workbook) with total conversations, latency, exceptions, tool usage, and topic analytics — editable and shareable	Cross-functional visibility for technical and business stakeholders; supports ROI reporting
Ecosystem Integration	Connects to Power BI, Azure Data Lake, Azure Monitor alerts, and other Azure services	Enterprise-grade reporting pipelines; cross-system correlation between bot telemetry and business outcomes
Custom Events & Extensibility	"Log a custom telemetry event" action in Copilot Studio for domain-specific tracking; KQL for arbitrary analysis	Tailored KPI tracking (resolution rates, conversion events); structured A/B testing of agent configurations

How Telemetry Is Collected

Setting up Application Insights for your Copilot Studio agents involves configuring where and how telemetry data flows. The process is straightforward but requires understanding key configuration options and prerequisites. Here's what you need to know to get started:

Per-agent configuration: There is no tenant-wide switch. Each agent must be connected to Application Insights individually.
Setup: Add the Application Insights connection string in Settings → Advanced → Application Insights within Copilot Studio.
Azure Subscription: Required to use Application Insights.
Logging Options:
- Log activities: Logs all incoming/outgoing messages and events.
- Log sensitive Activity properties: Includes userid, name, text, and speak. Off by default due to privacy implications.

What Metrics and Data Are Available

Custom Dimensions

Telemetry records include rich metadata in the customDimensions field:

Field	Description	Sample Values
`type`	Type of activity	`message`, `conversationUpdate`, `event`, `invoke`
`channelId`	Channel identifier	`emulator`, `directline`, `msteams`, `webchat`
`fromId`	Sender identifier	`<id>`
`fromName`	Username from client	`John Bonham`, `Keith Moon`
`locale`	Client origin locale	`en-us`, `zh-cn`, `de-de`
`recipientId`	Recipient identifier	`<id>`
`recipientName`	Recipient name	`John Bonham`, `Keith Moon`
`text`	Text in message	`find a coffee shop`
`designMode`	Whether conversation occurred in the test canvas	`True` / `False`

{: .important }
Note: Data quality varies by channel. For example, unique user counts are only reliable when users are authenticated.

Built-In Copilot Studio Dashboard

Copilot Studio provides a pre-built Azure Workbook dashboard that offers immediate visibility into your agent's performance and usage patterns. This dashboard aggregates key metrics without requiring custom configuration:

Location: Application Insights → Monitoring → Workbooks → Copilot Studio Dashboard
Metrics Included: Total conversations, latency, exceptions, tool usage, topic analytics
Customizable: Add tiles using KQL, save and share dashboards with team members (requires Reader role)

KQL Querying

Use Kusto Query Language (KQL) to analyze telemetry data:

let queryStartDate = ago(14d);
let queryEndDate = now();
let groupByInterval = 1d;
customEvents
| where timestamp > queryStartDate
| where timestamp < queryEndDate
| summarize uc=dcount(user_Id) by bin(timestamp, groupByInterval)
| render timechart

To exclude test conversations:

customEvents
| extend isDesignMode = customDimensions['designMode']
| where isDesignMode == "False"

Built-In Analytics vs. Application Insights

Use Case	Built-In Analytics	Azure Application Insights
Track topic usage and completion	yes	yes (with custom events)
Understand user satisfaction	yes	yes (if instrumented)
Debug dialog transitions	no	yes
Monitor API latency or errors	no	yes
Visualize trends over time	yes (limited)	yes (custom dashboards)
Correlate with external systems	no	yes
Alerting and anomaly detection	no	yes

{: .important }
Application Insights complements — not replaces — built-in analytics.

Technical Benefits

Application Insights delivers powerful technical capabilities that transform agent monitoring and diagnostics:

Live Metrics: Real-time monitoring of bot activity.
Smart Detection: Automatic anomaly and performance issue detection.
Custom Telemetry: Log domain-specific events from within Copilot Studio.
Centralized Monitoring: Consolidate logs, metrics, and traces across agents.
Scalability: Monitor bots across multiple environments and regions.
Extensibility: Integrate with Power BI, Azure Data Lake, and more.

Business Value and Strategic Advantages

Application Insights transforms agent monitoring from a purely technical exercise into a strategic business tool. By connecting telemetry data to measurable outcomes, organizations can demonstrate ROI, accelerate innovation cycles, and build confidence in agentic systems. The following sections explore the key business advantages:

Data-Driven Decision-Making

Telemetry provides the foundation for evidence-based improvements across your agent ecosystem.

Use telemetry to understand user behavior, optimize dialog flows, and prioritize development.
Dashboards and reports provide evidence for product decisions and stakeholder communication.

Operational Efficiency

Application Insights dramatically reduces the time and effort required to maintain reliable agents in production. By automating detection and providing detailed diagnostics, teams can respond faster to issues and prevent recurring problems from consuming resources.

Reduce mean time to detect and resolve issues.
Identify systemic issues and eliminate recurring failures.

Customer Satisfaction

Application Insights enables you to measure and enhance user experience by providing visibility into agent responsiveness and identifying friction points in conversations. By understanding where users encounter delays or confusion, teams can make targeted improvements that directly impact satisfaction and retention.

Improve response times and reduce errors.
Analyze drop-offs and confusion points to refine UX.

Compliance and Auditing

In regulated industries and enterprises subject to data governance requirements, comprehensive audit trails and compliance documentation are non-negotiable. Application Insights provides the forensic capabilities needed to demonstrate regulatory compliance, investigate incidents, and maintain defensible records of agent behavior and data handling.

Maintain detailed logs for audit trails.
Support regulatory requirements with timestamped, queryable data.

Best Practices

Implementing a robust telemetry strategy requires discipline and intentionality. The following best practices will help you maximize the value of Application Insights while minimizing operational overhead and ensuring data quality:

Log Meaningful Events: When instrumenting your Copilot Studio agents, focus exclusively on capturing events that provide actionable intelligence. This includes clear indicators of user intent (such as topic invocations or explicit requests), documented dialog outcomes (successful resolutions, escalations, or abandonment), and comprehensive error context. By avoiding noise and focusing on the signal, you reduce data volume while improving analysis quality and reducing query latency.
Use Correlation IDs: Implement correlation identifiers to link related activities across multiple services, dialogs, and organizational boundaries. This practice is essential in distributed systems where a user's interaction may involve multiple agents, backend APIs, and cloud services. Correlation IDs enable end-to-end tracing of requests, making it significantly easier to diagnose complex failures and understand latency across the entire interaction pipeline.
Set Up Alerts: Configure Azure Monitor alerts on critical performance thresholds and anomalies. Rather than waiting to discover problems through manual dashboard review, proactive alerting ensures that your team is immediately notified of concerning patterns—such as sudden spikes in error rates, performance degradation, or unexpected traffic patterns. This enables rapid response before issues escalate into user-facing problems.
Review Dashboards Regularly: Establish a cadence for reviewing Application Insights dashboards with relevant stakeholders—both technical teams who investigate issues and business stakeholders who track adoption metrics. Regular review sessions transform telemetry from a passive record into an active feedback loop that informs prioritization, guides feature development, and validates hypotheses about agent behavior and user satisfaction.

Connect your Copilot Studio agent to Application Insights

Establishing a connection between your Copilot Studio agent and Azure Application Insights requires careful configuration to ensure telemetry data flows correctly to your monitoring environment. This section provides comprehensive guidance on the setup process, prerequisites, and optional configuration settings that affect which data is captured and transmitted.

Prerequisites and Initial Setup

Before you can establish a successful connection between your Copilot Studio agent and Application Insights, ensure that you have an active Azure subscription and an existing Application Insights resource. The connection process relies on authentication credentials stored in your agent's configuration so that you will need administrative access to both your Copilot Studio environment and your Azure resources.

To initiate the connection process, navigate to the Settings page for your agent within Copilot Studio. From the Settings page, locate and select the Advanced tab. This tab contains configuration options not exposed in the standard settings interface and typically reserved for operational and monitoring settings.

Configuring the Application Insights Connection String

Within the Advanced settings tab, you will find a dedicated Application Insights section. In this section, locate the Connection string field and populate it with the connection string obtained from your Azure Application Insights resource.

The connection string serves as the authentication and routing credential that enables your agent to transmit telemetry data to your specific Application Insights instance securely. Refer to the Azure Monitor documentation for comprehensive instructions on locating and retrieving your connection string from your Application Insights resource in the Azure portal.

Optional Logging Configuration

In addition to basic connectivity, Application Insights offers two optional configuration flags that let you control the scope and sensitivity of captured telemetry data. These settings provide flexibility to balance comprehensive monitoring with privacy and compliance considerations.

Log Activities: When this setting is enabled, Application Insights captures comprehensive details of all incoming and outgoing messages exchanged between users and your agent, as well as all event notifications triggered during agent operation. This option provides maximum visibility into agent behavior and user interactions, enabling detailed diagnostics and comprehensive audit trails. However, enabling this option increases telemetry volume and may have cost implications for higher-traffic agents.

Log Sensitive Activity Properties: This setting governs whether certain data fields that may contain personally identifiable information (PII) or other sensitive information are included in logged telemetry. When enabled, the following properties are captured in logs: userid, name, text, and speak (note that the text and speak properties apply exclusively to message-type activities and are not captured for other event types).

By default, this setting is disabled due to data privacy and compliance considerations. Organizations operating under strict data governance frameworks, healthcare regulations (HIPAA), financial regulations (PCI-DSS), or general privacy standards (GDPR) should exercise caution when enabling this option. If enabled, ensure that appropriate data retention policies, encryption measures, and access controls are in place to protect sensitive information logged to Application Insights.

Conclusion

Connecting your Copilot Studio agents to Azure Application Insights represents a fundamental shift in how you approach agent observability and operational excellence. By moving beyond built-in analytics, you gain enterprise-grade monitoring capabilities that illuminate every aspect of agent behavior—from granular message-level telemetry to high-level strategic insights about adoption and ROI.

The integration delivers immediate practical benefits: faster issue detection and resolution, performance optimization grounded in real data, and comprehensive audit trails that satisfy regulatory requirements. It also turns your organization's relationship with agentic systems from speculative to evidence-driven, so you can scale confidently and keep improving.

Whether your priority is reducing support costs, accelerating time-to-resolution, or demonstrating measurable business value to stakeholders, Application Insights provides the visibility and analytical power to achieve those goals. Start with the fundamentals—configure the connection string, enable appropriate logging, and establish a dashboard review cadence. As your telemetry practice matures, layer in custom events, automated alerts, and deeper correlation across your broader system ecosystem.

Telemetry infrastructure pays off right away and grows over time as your teams build data-driven habits, and your agents become more reliable, responsive, and aligned with business goals.

Building On‑Prem AI Agents with Azure Local, Foundry Local, and Microsoft Agent Framework

Holger Imbery — Sat, 11 Apr 2026 07:26:18 +0000

Summary Lede

Cloud-native architecture belongs on-premises too

On-premises and cloud-native are not contradictions — they are complementary. While enterprises have spent years building cloud-native practices in the cloud, those same principles—containerization, orchestration, API-driven integration, and infrastructure-as-code - deliver even greater value when deployed on-premises. This guide shows you how to build production AI agents that (must) run locally and using cloud native deployment schemas with Azure Local, Foundry Local, and Microsoft Agent Framework - this is proving that cloud-native excellence is not constrained by your network boundary.

If you operate in regulated industries, manage constrained connectivity, or face data residency requirements, this architecture gives you the operational consistency of the cloud without leaving your premises.

This article is the second in a series on Azure Local:

Azure Local, Foundry Local, and Microsoft 365 Local: A Comprehensive Guide for IT Architects and Decision-Makers.

Enterprise teams are moving beyond “chatbots” toward agents that can retrieve internal knowledge, call tools, orchestrate workflows, and produce outcomes aligned to real business processes. The challenge is that many agent reference designs assume always‑on cloud connectivity and cloud-hosted inference. That assumption does not hold everywhere.

In regulated industries, in plants and branches with constrained connectivity, or in environments where latency and data locality are non‑negotiable, the architecture has to follow the use case. This post describes a pragmatic design you can implement today by combining:

Azure Local as the on‑prem infrastructure substrate, managed through Azure Arc.
AKS on Azure Local as the standardized Kubernetes runtime for agent services and supporting components.
Foundry Local (preview) as the local inference runtime exposing an OpenAI‑compatible REST interface for model calls.
Microsoft Agent Framework (MAF) as the agent and workflow layer, including tool integration, session/state management, middleware, and telemetry patterns.

A critical insight: cloud-native architecture and practices are not limited to cloud deployments. The principles—containerization, orchestration, infrastructure-as-code, API-driven integration, observability, and declarative state management—are equally valuable on‑premises. In fact, they become more essential when your infrastructure cannot scale elastically or rely on the implicit redundancy of cloud regions. By applying cloud-native architecture to on‑prem agent deployments, you gain consistent operational models across locations, faster iteration, clear boundaries between layers, and the ability to treat infrastructure changes as routine rather than exceptional.

One design choice drives everything that follows: separate the agent runtime from the model runtime. You want the agent layer (routing, tools, workflows, state, observability) to evolve independently from inference, especially when local inference is in preview and can change.

The architecture in one picture (logical view)

A practical baseline pattern is “local inference, centralized orchestration.”

This separation keeps your application surface stable by establishing a clear boundary between stateless agent logic and stateful model inference. Because the agent layer and model runtime are decoupled, you can update agent code, refine routing logic, add new tools, or adjust middleware without touching the inference layer. Tools can be added safely behind constrained proxies or API gateways, allowing you to apply fine-grained network controls and audit trails at the integration boundary. Governance policies, observability hooks, and logging patterns remain consistent across agent operations regardless of where inference is placed. Simultaneously, inference becomes a managed dependency that can scale, relocate, or upgrade independently of application code. This architectural separation is particularly valuable in regulated environments where model serving and application logic often require separate operational controls, hardware isolation, or audit commitments. By decoupling these layers, you achieve the flexibility to place inference close to hardware accelerators (GPUs, NPUs) and data sources without forcing agent code to depend on infrastructure choices that are still evolving, especially when the inference runtime is in preview status and subject to API changes or performance tuning.

Where this approach fits (and where it does not)

This is a good fit when

This pattern becomes the right choice when one or more of the following constraints are fundamental to your deployment environment:

Data residency and regulatory compliance are hard boundaries. When regulations, industry standards, or organizational policy require that prompts, retrieved context, and inference results remain physically within an on‑premises boundary—whether for financial data, healthcare records, or proprietary intelligence—local inference becomes non‑negotiable. Cloud-based APIs, even with encryption and data-deletion assertions, may not satisfy audit requirements or legal obligations in certain jurisdictions. In these cases, the agent architecture must be designed to keep the full inference pipeline local while still benefiting from cloud-based observability and control planes, where appropriate, via segregated connections.

Latency is a direct measure of operational usability. In manufacturing plants, field service operations, retail branches, or other environments where agents serve human users on the shop floor or remote locations, response time is not a performance metric—it is a functional requirement that affects whether the agent is used at all. When users are waiting for a troubleshooting recommendation or a work instruction, a response that takes tens of seconds to traverse cloud networks is often abandoned. Local inference, combined with local agent orchestration, ensures that the slowest part of the response pipeline is your own internal network and compute capacity, not external connectivity.

Connectivity cannot be assumed to be always available and high-bandwidth. Many operational environments have constrained connectivity: scheduled outbound traffic windows, rate-limited connections, air-gapped subnets, or intentional network fragmentation for security isolation. The agent needs to function usefully within these constraints rather than degrade into a pass-through to cloud APIs. Azure Local supports this by enabling local execution and local state, while Arc provides a control-plane integration path when connectivity is available, rather than requiring continuous connectivity.

You want cloud-native operational practices applied on‑premises. This includes containerized deployments, Kubernetes orchestration for workload management, infrastructure-as-code for reproducibility, GitOps-driven delivery pipelines, policy enforcement at the runtime boundary, and standardized telemetry and logging. These practices are not exclusive to cloud deployments; they provide the same benefits on‑premises—clear separation of concerns, predictable deployments, and auditability—but require an infrastructure platform like Azure Local to realize them consistently.

When this pattern becomes problematic

Reconsidering a local-first architecture is warranted in several practical scenarios. If your inference workload demands elastic horizontal scaling and you cannot predict peak capacity without overprovisioning on-premises infrastructure, then chasing elastic scale with local hardware becomes economically and operationally inefficient. Building auto-scaling logic that manages standby capacity across stateful models would contradict the efficiency argument for locality. Similarly, if your operational environment requires production-grade stability guarantees from the inference API layer with minimal risk of breaking changes between deployments, the current maturity of local inference runtimes (such as Foundry Local, which remains in preview) presents a material risk. Preview components introduce uncertainty regarding backward compatibility, performance-tuning recommendations, and troubleshooting depth, which may not align with production SLAs. Finally, if the problem you are solving is fundamentally deterministic—where steps follow a fixed sequence, validation rules are static, and branching logic is known in advance—a structured workflow orchestration tool or a conventional microservice often provides clearer observability, simpler debugging, and lower operational overhead than an agent. Not every problem with tools and state management requires agentic behavior; sometimes explicit choreography is both simpler and more reliable.

These are the constraints that inform the "whether" decision. The next section moves to the "why Azure Local" specifically, grounded in use-case context rather than abstract on-premises philosophy.

Why Azure Local makes sense here (the use case drives locality)

Azure Local is not the point of the architecture. It is the platform choice that becomes rational when the agent pattern has to follow the environment: where data and tools live, what network rules allow, what latency targets are required, and what failure modes are acceptable.

1) The agent needs to live where the tools and data already are

High‑value agents are typically tool-heavy, and the distribution of that tooling directly affects where the agent runtime should run. The model call itself—the inference step that generates a response — is only one part of the agent interaction. The larger portion of the interaction involves retrieving documents from internal repositories, querying operational databases, validating business rules and constraints, and writing outcomes back to systems of record. Each of these operations carries a latency cost, integration overhead, and often data governance implications.

When the authoritative data sources and the systems that perform work live on‑premises—whether that is an ERP system, a manufacturing execution system, a document repository, or a service dispatch platform—moving the agent runtime closer to those systems becomes pragmatically necessary rather than architecturally optional. A remote agent calling back into on‑premises tools over the network incurs not only the latency of each call but also the operational complexity of maintaining secure, reliable network pathways between cloud and on‑premises infrastructure, managing retry logic for transient failures across that boundary, and reasoning about whether a failure in the agent's response came from the model inference or from a tool integration issue.

Placing the agent runtime on‑premises reduces integration complexity by collapsing tool interactions into local function calls with minimal network hops. It also materially shrinks the trust boundary. Data that would otherwise traverse a cloud service boundary—even with encryption in transit and assertions of deletion—can remain inside the perimeter where it originated. Azure Local provides a consistent, repeatable substrate on which to host the runtime within the organizational boundary while still enabling cloud-native operational practices such as containerization, orchestration, and declarative configuration management that teams have come to expect.

2) Latency is a functional requirement, not a nice-to-have

In operational scenarios, predictable response time is not an optimization target—it is a functional requirement embedded in the task itself. When a field technician, floor supervisor, or support worker invokes an agent for a troubleshooting recommendation or work instruction, they are typically performing a task that cannot proceed until they receive guidance. A response that arrives within seconds fits naturally into human workflow and decision-making; the user can act on it immediately and move forward. A response that takes tens of seconds—or worse, becomes non-deterministic depending on cloud API load—exceeds the mental context window of the task. Users abandon the agent, fall back to phone calls or manual lookups, or proceed without the agent's input entirely, making the agent operationally irrelevant regardless of its intelligence.

The latency problem compounds when the agent's response is not a single inference call. A typical operational agent orchestrates multiple steps: retrieving context from a document repository, querying a database to validate prerequisites, calling an external service to fetch the current state, and then synthesizing a response. Each of these operations incurs round-trip time. When those dependencies live on-premises, and the agent runtime lives in a cloud region thousands of kilometers away, the baseline latency floor is determined by geography and internet backbone capacity, not by the execution speed of any individual component. You cannot optimize away the speed of light or your ISP's rate limiting. Placing the agent runtime and its tooling in the same local environment—where internal networks typically offer latency in the single-digit millisecond range—ensures that the slowest element becomes your own infrastructure capacity, which you can measure, predict, and scale. This transforms latency from an externality you absorb to a variable you control.

Azure Local supports this placement strategy by providing AKS-hosted agent services and local model-serving infrastructure in a shared operational footprint. The inference engine, the agent orchestration layer, and the tool integrations all run in the same data center or facility where the authoritative systems live. This collapse of distance translates directly into a collapse of latency, which translates into usability in environments where response time affects task completion.

3) Connectivity constraints are a design input

Many environments are not "cloud-connected" as cloud reference architectures assume. The assumption embedded in most cloud-native architecture guidance is that outbound connectivity is available, reliable, and incurs acceptable latency and throughput. In practice, many operational environments operate under very different constraints. Outbound traffic to public cloud endpoints may be restricted by security policy or rate-limited by egress gateways. Connectivity may be scheduled—available only during specific windows or subject to maintenance blackouts. In other cases, network segments may be deliberately disconnected by design: operations technology networks in manufacturing facilities, isolated domains in financial institutions, or intentionally air-gapped environments in highly regulated sectors all follow this pattern. Even when connectivity exists, it may be mediated by proxies, firewalls, or VPNs, adding latency and complexity to troubleshooting when the agent's inference or tool calls fail.

Azure Local enables local execution of the agent runtime and inference engine regardless of whether upstream cloud connectivity is available. Simultaneously, it aligns with Azure's control-plane concepts and governance models via Azure Arc when connectivity is available. This dual capability means you can design and operate an agent system that functions reliably in disconnected or intermittently-connected scenarios without abandoning cloud-native operational practices. When connectivity is available, Arc can be used for centralized observability, policy enforcement, and update orchestration. When connectivity is unavailable, the local agent continues to function using local tools and data. This gives you an operational path that respects the actual constraints of your environment rather than forcing an architecture that assumes away those constraints or requires workarounds to compensate for them.

4) You can keep cloud-native operations without reinventing on‑prem deployment

Teams generally want repeatable delivery, policy enforcement, and consistent observability. The conventional tension between on-premises deployments and cloud-native operations has historically forced a false choice: either accept the operational discipline and automation of cloud platforms at the cost of moving workloads outside your perimeter, or keep infrastructure on-premises and revert to manual configuration management, bespoke deployment scripts, and fragmented observability tooling.

Azure Local plus AKS on Azure Local severs that coupling. Containerized deployments, GitOps-driven configuration management, Kubernetes namespaces, and declarative rollout strategies work identically whether your agent runtime is in a public cloud region or in your own data center. The infrastructure boundary becomes transparent to operational practices. Teams can maintain the same deployment pipelines, policy engines, and observability systems they have built for cloud workloads and apply them without modification to on-premises clusters. This continuity of tooling and process significantly reduces the operational friction that typically accompanies on-premises agent deployments. The "local" decision becomes a deployment location decision—a choice about where to run proven, familiar infrastructure patterns—rather than a return to bespoke server management, manual patching, and isolated monitoring infrastructure that would otherwise characterize traditional on-premises deployments.

5) Local inference forces you to manage capacity and hardware intentionally

If inference is local, capacity planning and acceleration hardware become first-class concerns that demand explicit decision-making rather than outsourced abstraction. When inference runs in a public cloud service, capacity is nominally infinite—or at least, the perception of infinity is maintained through multi-tenancy and auto-scaling tiers that obscure the underlying hardware realities. Costs accumulate by token count and API call frequency, but the physical infrastructure remains opaque. The tradeoff is acceptable if your workload is occasional or bursty; the cost volatility is a known variable you can budget for.

When inference runs locally, however, the hardware economics become tangible. A single GPU accelerator costs tens of thousands of dollars upfront, requires power and cooling infrastructure, and has a finite lifespan. Acquiring that hardware is no longer a usage-based charge smoothed into monthly billing; it is a capital expenditure that sits in your facility and has opportunity cost. This visibility forces intentional capacity planning: you must understand your typical inference load, peak throughput requirements, model sizes, and acceptable latency percentiles, and then purchase hardware that meets those requirements with some headroom for growth. You cannot simply add capacity by changing a tier or waiting for auto-scaling to provision more instances; you provision intentionally.

Azure Local provides a platform to run and govern those resources, allowing you to isolate inference nodes, stage updates, and enforce change control without coupling the inference lifecycle to the agent code lifecycle. You can reserve specific nodes for specific models, apply resource quotas to prevent one workload from starving another, and manage hardware refreshes independently from application deployments. This separation of concerns means you can upgrade your inference engine or swap model versions without draining the entire cluster, and you can plan hardware replacement without triggering emergency application refactorings. The operational rigor this imposes is not a burden—it is an alignment of technical decision-making with the actual cost structure of your infrastructure.

6) The architecture stays incremental and reversible

By separating agent runtime from model runtime, you establish a deployment boundary that allows you to make infrastructure decisions independently from application logic. This separation is critical in practice because it decouples two sources of change that typically move at different velocities: the agent orchestration layer (tools, workflows, routing, state management) tends to evolve rapidly as teams refine business logic and respond to operational feedback, while the inference runtime makes infrequent but high-impact decisions around model selection, hardware acceleration strategy, and inference node topology that are capital-intensive and difficult to reverse.

Starting small means you can pilot with a single inference node running a small quantized model, then grow to multiple specialized nodes—some optimized for latency-sensitive operations, others for throughput—without requiring changes to the agent code itself. The agent layer continues to interact with inference through the same OpenAI-compatible API boundary, indifferent to whether a single GPU or a distributed cluster backs that endpoint. You can keep your agent API stable while swapping models; if a new quantization or a different model family becomes available, you can stage it on a secondary node and route traffic to validate behavior before completing the migration. You can change inference node placement by adjusting scheduling constraints or moving nodes between racks without triggering a redeploy of agent services. This mobility is not possible when agent code and inference are tightly coupled—for example, when inference decisions are embedded in application code or when the agent layer depends on model-specific features or tokenization strategies.

Azure Local supports this incremental expansion by providing a consistent Kubernetes control plane and standard scheduling mechanisms that treat compute resources as fungible. Your initial pilot might span a single machine running AKS on Azure Local in a branch or regional office; as you validate the model and prove business value, you can expand to a small cluster in your primary data center. Each step remains operationally routine because you are not changing how workloads are deployed or managed—you are only changing the scale and distribution of resources. A pilot deployment and a production cluster follow the same GitOps patterns, use the same artifact promotion pipelines, and respond to the same observability signals, allowing you to graduate from proof-of-concept to production without a redesign of your delivery model or a learning curve on unfamiliar operational practices.

Practical decision test: Azure Local tends to be the right call when most of these are true: the authoritative tools/data are on‑prem, prompts and retrieved context must remain local, latency is a requirement, connectivity is constrained, and you want cloud-native operations in the same footprint.

With that context, we can move from "why" to "how".

Step‑by‑step implementation runbook

Phase 0 — Define boundaries: agent vs workflow, and what "done" means

Write the outcome in business terms.

Define success in measurable outcomes, not in model features. Examples include reduced downtime, faster triage, fewer escalations, shorter handling time, or improved compliance auditability.
Classify steps as agent or workflow.
- Use an agent for open-ended interpretation, conversational assistance, flexible tool use, and summarization.
- Use a workflow for deterministic steps, routing, approvals, checkpoints, and auditable state transitions.
Produce a tool inventory and trust boundary map.

For each tool, define authentication, authorization, validation, allowed destinations, and audit requirements.

Operational gotcha: Teams often prototype by giving agents broad access "to move fast." That security debt becomes expensive later. Start with constrained proxies and allow-lists from day one.

Phase 1 — Platform baseline: Azure Local + Arc + AKS on Azure Local

1.1 Establish baseline assumptions

Decide upfront:

Topology: pilot node, datacenter cluster, or distributed sites.
OS mix: Linux nodes, Windows nodes, or mixed.
Acceleration: CPU only vs GPU/NPU inference nodes.
Connectivity mode: connected, constrained, or partially disconnected.

Operational gotcha: Constrained connectivity changes everything about artifact flow. Treat "how will nodes pull images and models?" as a first-class requirement (private registry, artifact promotion, caching).

1.2 Build a minimal AKS baseline (repeatable)

Include:

Namespaces for separation (platform, agents, tools, observability).
Ingress and certificate strategy.
Secrets management strategy.
Logging/metrics pipeline.
Network policies and egress controls.

Example namespace baseline:

apiVersion: v1
kind: Namespace
metadata:
  name: agents
  labels:
    pod-security.kubernetes.io/enforce: "restricted"
---
apiVersion: v1
kind: Namespace
metadata:
  name: tools
  labels:
    pod-security.kubernetes.io/enforce: "restricted"

Operational gotcha: Without early namespace boundaries and baseline policies, your cluster becomes a collection of special cases that are hard to govern and hard to migrate.

Phase 2 — GitOps delivery (recommended even for pilots)

2.1 Repository layout pattern

A structure that scales:

clusters/<cluster-name>/ for cluster-specific overlays
platform/ for shared add-ons (ingress, monitoring, policy)
workloads/agents/ for agent services
workloads/tools/ for tool proxies and connectors

2.2 Kustomization pattern (example)

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: agents
  namespace: flux-system
spec:
  interval: 5m
  path: ./workloads/agents/overlays/prod
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform-repo
  timeout: 5m

Operational gotcha: GitOps only reduces drift if "kubectl apply in production" is the exception with a documented break-glass process.

Phase 3 — Foundational services: state, memory, and observability

Make state explicit and intentional:

Conversation state (threads, session context) belongs in agent stores designed for that purpose.
Business state (work items, approvals, tickets) belongs in systems of record.

Common supporting components on AKS:

Redis for caching and rate limiting
PostgreSQL (or equivalent) for durable state
A vector store if you implement local RAG
OpenTelemetry collector for traces/metrics/logs

Operational gotcha: Agent telemetry can explode. Define retention, sampling, and content redaction policies early. In regulated environments, you often cannot log raw prompts or retrieved text.

Phase 4 — Install Foundry Local (preview) on inference nodes

Treat Foundry Local as a managed runtime dependency.

4.1 Placement and isolation

Prefer dedicated inference nodes where possible.
Place them where the acceleration hardware lives.
Segment networking so AKS can reach them reliably while keeping exposure minimal.

4.2 Endpoint discovery (avoid hard-coded ports)

Prefer one of these:

Discovery service pattern: publish the current base URL into a config store that your agent services read.
Gateway pattern: place a stable internal proxy in front of Foundry Local to normalize routing and policies.

Operational gotcha: Hard-coded ports work in a lab and fail after reboots, upgrades, or runtime changes. Build discovery or stable routing into the design.

Phase 5 — Network, TLS, and identity between AKS and Foundry Local

5.1 Connectivity options

Common choices:

Direct HTTPS from agent pods to Foundry node IP/DNS
Internal L4/L7 proxy for stable routing and policy
Service mesh for mTLS and telemetry (only if you already operate one)

5.2 TLS strategy

Use your standard PKI approach, if possible, and ensure that clients validate certificates by default.

Operational gotcha: "Works with curl -k" is a warning sign, not a milestone. Fix trust chains early so insecure shortcuts do not become permanent.

Phase 6 — Implement the inference adapter in your MAF service

Design goal: agent code calls a model client abstraction, not a concrete endpoint.

6.1 Configuration pattern (ConfigMap + Secret)

apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-config
  namespace: agents
data:
  FOUNDRY_BASE_URL: "https://foundry-local.internal.example"
  FOUNDRY_MODEL: "local-chat-model"
  INFERENCE_TIMEOUT_SECONDS: "30"
  INFERENCE_MAX_RETRIES: "2"
---
apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
  namespace: agents
type: Opaque
stringData:
  FOUNDRY_API_KEY: "placeholder-if-required-by-client"

Deployment consuming it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: maf-agent-api
  namespace: agents
spec:
  replicas: 2
  selector:
    matchLabels:
      app: maf-agent-api
  template:
    metadata:
      labels:
        app: maf-agent-api
    spec:
      containers:
      - name: api
        image: registry.local/agents/maf-agent-api:1.0.0
        envFrom:
        - configMapRef:
            name: agent-config
        - secretRef:
            name: agent-secrets
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "2"
            memory: "2Gi"
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10

6.2 Client policy (timeouts, retries, circuit breakers)

Start with conservative defaults:

Timeout: 20–60s depending on model/prompt size
Retries: 1–2 for transient failures only
Circuit breaker: open after repeated failures to prevent cascading latency
Concurrency limits: protect inference nodes from overload

Operational gotcha: Without explicit backpressure, a single busy agent route can saturate inference and degrade every workload that shares the runtime.

Phase 7 — Tool integration with constrained proxies

Do not give agents direct access to sensitive systems.

Recommended approach:

Deploy tool proxy services in a dedicated namespace.
Restrict outbound connectivity to approved destinations only.
Enforce authorization, validation, and allow-lists in the proxy.
Log every invocation with correlation IDs.

A default-deny egress policy concept:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tools-default-deny-egress
  namespace: tools
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress: []

Operational gotcha: If you skip network controls early, you will discover "mystery dependencies" later when tools call endpoints that were never approved.

Phase 8 — Observability: correlate agent → tools → inference

Minimum requirements:

Correlation ID propagated across inbound request, tool calls, inference calls, and response
Latency breakdown (tool time vs inference time vs orchestration time)
Error classification by category (tool failure, inference failure, policy block, timeout)
Token/prompt size metadata if available

Operational gotcha: Decide what is safe to log. For many environments, metadata and hashes are acceptable, but raw prompts and retrieved snippets are not.

Phase 9 — Hardening: safety, governance, regression testing

Hardening checklist:

Prompt and tool regression tests for critical flows
Golden conversations for validation after runtime updates
Tool schemas and allow-lists are enforced centrally
Timeouts on every external call
Rate limits per user and per route
Graceful degradation when inference is unavailable (fallback to workflow/human)

Operational gotcha: Preview inference runtimes can introduce behavior changes that are not "errors" but still break user expectations. Without regression tests, you will find out in production.

Phase 10 — Operations: versioning, rollouts, and capacity planning

10.1 Independent update cadencess

Operate on separate cadences:

Agent services: frequent updates via CI/CD
Inference runtime: cautious updates via staged rollout
Cluster/platform: regular maintenance windows

10.2 Rollout strategy

Canary agent changes with a small traffic slice and compares latency/error rates
Pin inference runtime versions and validate with representative load before expanding rollout

10.3 Capacity planning

Define explicit SLOs:

p95 latency target for a representative prompt
maximum concurrent sessions per inference node
acceptable queueing delay under peak load

Operational gotcha: Size for peaks and recovery scenarios. A thundering herd is common when shifts start, sites reconnect, or batch processes trigger.

A practical “day‑1 to day‑30” plan

Day 1–3: Foundation

Define business outcomes and agent/workflow boundaries
Stand up AKS baseline namespaces, ingress, and GitOps scaffolding
Deploy telemetry pipeline and basic dashboards

Day 4–10: Inference integration

Install Foundry Local on inference nodes
Implement endpoint discovery and TLS trust
Add inference adapter in the MAF service with externalized configuration

Day 11–20: Tools and data

Build constrained tool proxies with allow-lists and audit logs
Implement retrieval paths that keep data inside the boundary
Add correlation IDs end-to-end

Day 21–30: Hardening and operations

Add regression tests and golden conversations
Implement rollouts, version pinning, and canary strategy
Load test and finalize a capacity plan and operational runbooks

Conclusion

This stack is not about "on‑prem versus cloud." It is about aligning the agent pattern with the constraints imposed by the use case: data locality, tool proximity, latency targets, and network realities. Azure Local provides a consistent on‑prem platform for that pattern; AKS keeps operations cloud-native; Foundry Local enables local inference; and Agent Framework provides the application layer to build agents and workflows that map to real business outcomes. By following this architecture and implementation runbook, you can deliver production‑grade AI agents that run locally, proving that cloud-native excellence is not constrained by your network boundary.

Testing Copilot Agents: When to Use Agent Evaluation vs. the Copilot Studio Kit

Holger Imbery — Sun, 05 Apr 2026 07:29:14 +0000

Microsoft's Agent Evaluation GA announcement on March 31, 2026, update to Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)

Summary Lede
Agent Evaluation and the Copilot Studio Kit are not competing tools—they represent a layered quality-assurance strategy. Agent Evaluation provides fast, AI-assisted behavioral validation embedded directly in Copilot Studio, ideal for iteration and rapid feedback. The Copilot Studio Kit delivers deterministic, enterprise-grade verification for production gates, compliance, and governance. This article breaks down what each tool does, when to use them, and how to adopt both as your agent quality matures.

Why read this
If you're building or scaling Copilot agents in your organization, you need clarity on testing strategy. This article cuts through the positioning and provides a practical decision framework for when to reach for Agent Evaluation versus the Copilot Studio Kit, with real-world scenarios showing how mature teams layer both tools across their development lifecycle.

What Microsoft shipped with Agent Evaluation (GA)

On March 31, 2026, Microsoft announced the general availability of Agent Evaluation, marking a significant milestone in Copilot Studio's testing and validation capabilities. Agent Evaluation is now generally available and built directly into Copilot Studio. Its goal is to make agent quality visible, repeatable, and scalable without requiring external tools or setup. This GA release represents the culmination of Microsoft's efforts to democratize agent quality assurance, bringing evaluation capabilities previously limited to advanced setups directly into the hands of everyday agent makers in the Copilot Studio authoring environment.

Core characteristics (as of 31. March 2026)

Integrated directly into the Copilot Studio authoring experience
Agent Evaluation is not a separate tool or external service. It lives within the Copilot Studio interface, where agents are built, allowing makers to validate their agents without context-switching or complex integrations. This tight integration reduces friction and encourages frequent validation during development cycles.

Designed to answer the production question:
"Can we trust this agent to behave correctly, consistently, and safely?"
This core question drives the entire design philosophy. Agent Evaluation focuses on behavioral confidence—whether the agent produces appropriate, consistent, and safe responses across diverse scenarios and user inputs.

Replaces unscalable manual testing and spot‑checking
Before Agent Evaluation, agent validation relied heavily on manual testing: individually testing scenarios, reviewing responses, and hoping coverage was adequate. This approach doesn't scale with agent complexity or usage volume. Agent Evaluation automates and scales this process through AI-assisted evaluation and reusable test sets.

Intended to be used before launch and continuously after changes
Agent Evaluation is not a one-time gate. It's designed for continuous validation: before initial launch, before deploying updates, and continuously as conversations flow through production. This shift from ceremonial testing to continuous validation aligns with modern DevOps practices.

Evaluation capabilities

Agent Evaluation allows makers to:

Create evaluation sets from:
- Manually added questions
- Imported test sets
- AI‑generated queries derived from agent metadata and knowledge sources
Choose flexible evaluation methods, including:
- Exact/partial match
- Semantic similarity
- Intent recognition
- Relevance and completeness scoring
Mix AI‑generated and human‑defined scenarios to balance breadth and depth
Reuse evaluations over time and run them via APIs for lifecycle testing

Key framing:
Agent Evaluation is positioned as a lightweight, AI‑assisted validation layer that fits naturally into everyday agent authoring and iteration. Unlike heavy external testing frameworks that require context switching and specialized infrastructure, Agent Evaluation operates within Copilot Studio itself, where agents are built. This embedded approach acknowledges that agent makers are iterating rapidly, testing comprehensively at each step, and need validation feedback within their authoring flow rather than as a post-production bottleneck. The AI-assisted scoring means makers don't need to hand-write every test case or define complex rubrics upfront; they can generate relevant test scenarios from their agent's own knowledge sources and metadata, then refine them. This makes evaluation accessible to makers of all skill levels and scales with agent complexity.

What the Copilot Studio Kit provides for testing

The Copilot Studio Kit (Power CAT) is a separate, solution‑based toolkit that augments Copilot Studio with enterprise‑grade testing, governance, and analytics. Developed by the Microsoft Power CAT (Patterns and Practices) team, the Kit represents a mature, production-ready framework built for organizations requiring rigorous quality assurance, regulatory compliance, and scalable CI/CD integration. While Agent Evaluation addresses everyday iteration and behavioral confidence within the authoring canvas, the Copilot Studio Kit provides the structural backbone for organizations that need deterministic verification, audit trails, multi-layer testing orchestration, and governance enforcement across large deployments.

Explicit testing capabilities

The Kit supports structured, deterministic, and multi‑layer testing, including:

Response Match (exact or conditional text comparison)
Attachment Match (Adaptive Cards/files)
Topic Match (requires Dataverse enrichment)
Generative Answer evaluation using AI Builder and rubrics
Multi‑turn tests running in a shared conversation context
Plan Validation for generative orchestration (verifying which tools/actions are invoked, not just what the agent says)

Execution and automation

Tests are executed via Copilot Studio APIs (Direct Line)
Bulk creation and maintenance via Excel import/export
Detailed run‑level telemetry:
- Pass/fail
- Latency
- Observed responses
- Aggregated metrics
Results can be enriched with:
- Azure Application Insights
- Dataverse conversation transcripts

Enterprise extensions beyond testing

The Kit also includes:

Conversation KPIs for Power BI
Prompt Advisor
Agent Inventory
Agent Review Tool
Compliance Hub with policy enforcement and SLA‑driven reviews

Key framing:
The Copilot Studio Kit is built for verification, regression testing, production gates, and governance at scale. Unlike Agent Evaluation's lightweight, AI-assisted approach that lives within the authoring canvas, the Kit functions as an enterprise testing backbone designed for organizations that require deterministic verification, full audit trails, and regulatory compliance enforcement. It bridges the gap between development-time validation and production-readiness, enabling structured quality gates that align with enterprise DevOps pipelines. The Kit's emphasis on exact response matching, topic validation, and orchestration plan verification makes it essential for mission-critical deployments where agent behavior must be predictable, traceable, and compliant.

Direct comparison

Dimension	Agent Evaluation (GA)	Copilot Studio Kit
Where it lives	Built into Copilot Studio UI	Separate Power CAT solution
Primary purpose	Behavioral validation	Functional verification
Setup effort	Minimal	Higher (Dataverse, AI Builder, App Insights optional)
Test creation	Manual, import, AI‑generated	Manual + Excel bulk
AI‑assisted scoring	Yes (core feature)	Yes (Generative Answers via AI Builder)
Deterministic checks	Limited	Strong (exact match, topic, attachments)
Multi‑turn scenarios	Not explicitly documented	Explicitly supported
Orchestration plan validation	Not documented	Explicitly supported
CI/CD & quality gates	Implicit / API‑based	Explicit pipeline integration
Governance & compliance	Not in scope	First‑class feature

How they relate (this is the key insight)

Microsoft is not replacing the Copilot Studio Kit with Agent Evaluation.
Instead, the sources show a clear layering strategy:

Agent Evaluation
→ Fast, AI‑assisted, in‑product validation
→ Ideal for early feedback, iteration, and continuous confidence
Copilot Studio Kit
→ Deep, deterministic, automatable verification
→ Ideal for release gates, regression testing, orchestration correctness, and governance

This positioning is also explicitly reflected in community and Microsoft guidance that frames Agent Evaluation as filling the gap that manual testing cannot scale, while the Kit remains the system‑level quality backbone.

Practical takeaway for enterprise teams

Based on what is explicitly documented:

When to use each tool

Agent Evaluation is best suited for:

Rapid iteration cycles during agent development
Early-stage quality validation before formal review
Continuous behavioral checks without infrastructure complexity
Scenarios where AI-assisted, semantic evaluation is sufficient
Teams prioritizing speed of feedback over deterministic guarantees
Questions like: "Is this agent generally behaving well after my last change?" → Use Agent Evaluation.

Copilot Studio Kit is best suited for:

Production release gates and formal deployment approval
Regression testing before pushing updates to production
Regulatory and compliance-driven scenarios requiring audit trails
Mission-critical agents where deterministic verification is mandatory
Complex orchestration scenarios requiring plan and tool invocation validation
Multi-turn conversations that need end-to-end correctness
Questions like: "Did we break anything? Are the topics correct? Are the tools invoked? Can this ship?" → Use the Copilot Studio Kit.

How they complement each other

In mature setups, the tools are complementary, not competitive:

Development phase: Agent Evaluation provides fast feedback loops for iteration
Pre-production phase: Copilot Studio Kit enforces deterministic verification gates
Production phase: Both tools support continuous monitoring—Agent Evaluation for behavioral trends, the Kit for functional regression detection
Governance phase: The Kit's compliance and KPI tracking provide the enterprise audit trail and policy enforcement layer

Organizations scaling from single-agent projects to enterprise deployments should expect to adopt both tools at different maturity stages, using them in sequence rather than as either/or choices.

Conclusion

Agent Evaluation and the Copilot Studio Kit represent Microsoft's thoughtful answer to the agent testing maturity curve. As organizations build, iterate, and scale agents from proof-of-concept to mission-critical systems, both tools play essential roles at different stages of the lifecycle.

Agent Evaluation brings quality validation into the authoring experience, reducing friction in everyday iteration and making behavioral confidence accessible to all agent makers. Its AI-assisted approach acknowledges the reality of rapid development cycles and the need for fast feedback loops.

The Copilot Studio Kit, by contrast, provides the deterministic backbone that enterprises require—exact verification, governance enforcement, regulatory compliance, and the audit trails necessary for mission-critical deployments.

The key insight is that these tools are not competitors but complementary. Teams should adopt them in sequence, starting with Agent Evaluation during development for rapid iteration, then layering in the Copilot Studio Kit as the agent approaches production. Organizations serious about agent quality at scale will ultimately adopt both, using them to build confidence at every stage from ideation to production and beyond.

Azure Local, Foundry Local, and Microsoft 365 Local: A Comprehensive Guide for IT Architects and Decision-Makers

Holger Imbery — Sat, 04 Apr 2026 07:56:37 +0000

Summary Lede

Cloud Capabilities Without Leaving Premises

As regulatory demands tighten, latency requirements become critical, and data sovereignty moves from a nice-to-have to a must-have, Microsoft has engineered a comprehensive answer: Sovereign Private Cloud. This three-pillar platform—Azure Local infrastructure, Microsoft 365 Local productivity, and Foundry Local AI—enables organizations to operate complete, intelligent cloud systems entirely within their boundaries. Whether you're managing classified government systems, running millisecond-critical manufacturing operations, sustaining teams in air-gapped locations, or processing sensitive AI workloads behind regulatory firewalls, this guide walks you through architectures, deployment strategies, and real-world patterns for implementing on-premises cloud at enterprise scale.

Reasons to Read This Article:

Complete Platform Understanding: Grasp all three components of this Sovereign Private Cloud approach, how they integrate, and which combination matches your operational model (connected, intermittently connected, or fully offline).
Deployment Confidence: Learn the hardware requirements, licensing models, connectivity tolerances, and planning phases required to deploy Azure Local (hyperconverged or disconnected), Microsoft 365 Local, and Foundry Local in production.
Use Case Alignment: Identify whether your organization fits one of the key scenarios—government/defense data sovereignty, manufacturing low-latency control, retail edge compute, isolated locations, or confidential AI—with architectural patterns and reference implementations.
Agentic AI on Premises: Discover how to build multi-agent AI systems using Microsoft Agent Framework + Foundry Local + Azure Local infrastructure, enabling autonomous reasoning and automation with zero cloud dependency.
Risk Mitigation and Best Practices: Understand connectivity tolerance, failover strategies, backup approaches, and testing protocols to ensure your on-premises cloud operates reliably and compliantly.
Evaluation Path: Explore trial options (60-day Azure Local eval, free Foundry Local, partner-delivered M365 Local pilots) tailored to your budget and risk profile.

Azure Local is Microsoft's distributed infrastructure solution — formerly known as Azure Stack HCI — that extends Azure capabilities to customer-owned environments. It enables local deployment of both modern and legacy applications across distributed or sovereign locations, using Azure Arc as the unifying control plane. Azure Local is the foundation of Microsoft's Sovereign Private Cloud offering, which unifies three components: Azure Local (infrastructure), Microsoft 365 Local (productivity), and Foundry Local (AI inference) to deliver a full-stack private cloud that operates at any connectivity level — connected, intermittently connected, or fully disconnected. This article provides a comprehensive overview of these offerings, their use cases, deployment options, and best practices for IT architects and decision-makers considering on-premises Azure solutions.

Why Azure Local: Hyperconverged and Disconnected

Azure Local addresses a set of business requirements for which the public cloud alone is insufficient: compute that must remain on-premises, mission-critical application resiliency, low-latency decision-making, and specific compliance mandates. Microsoft positions it as part of the adaptive cloud approach — bringing the cloud to the customer so they can build and innovate anywhere.

Hyperconverged Deployments (Connected Mode)

A hyperconverged deployment of Azure Local consists of one machine or a cluster of machines connected to Azure. Clusters support 1 to 16 physical machines with hyperconverged storage (up to 8 machines in rack-aware configurations). The architecture is built on proven technologies: Hyper-V, Storage Spaces Direct, and Failover Clustering.
In connected mode, the Azure cloud serves as the management plane. Administrators use the Azure Portal, Azure CLI, or PowerShell to view, monitor, and manage individual Azure Local instances or an entire fleet. Azure Local includes a secure-by-default configuration with more than 300 security settings, providing a consistent security baseline and a drift-control mechanism.
Connectivity tolerance: If internet connectivity is lost, all host infrastructure and existing VMs continue to run normally. However, features that directly rely on cloud services become unavailable. Azure Local must successfully sync with Azure at least once every 30 consecutive days. If that window is exceeded, the cluster enters a reduced-functionality mode — existing VMs continue running, but new VMs cannot be created until connectivity is restored.

Disconnected Operations (Full Offline Mode)

For environments where any cloud connectivity is undesired or impossible, disconnected operations bring the entire Azure control plane on-premises. Organizations can deploy and manage Azure Local instances, build VMs, and run containerized applications using select Azure Arc-enabled services from a local control plane that provides a familiar Azure Portal and Azure CLI experience — all without a connection to the Azure public cloud.
Key constraint: Disconnected mode requires extra capacity for a dedicated management cluster to host the local control plane appliance.

This management cluster has the following minimum hardware requirements:

Specification	Minimum Configuration
Number of nodes	3 nodes
Memory per node	96 GB (appliance alone needs ≥64 GB)
Cores per node	24 physical cores
Storage per node	2 TB SSD/NVMe
Boot disk drive storage	960 GB SSD/NVMe

Disconnected operations are intended for organizations that cannot connect to Azure due to connectivity issues or regulatory restrictions. To procure this capability, a valid business justification and a Microsoft Customer Agreement for Enterprises (MCA-E) (or other eligible agreement type) are required.

Connected vs. Disconnected: Decision Framework

Decision Factor	Connected (Hyperconverged)	Disconnected
Cloud dependency	Requires outbound HTTPS to Azure ≥ once per 30 days	Zero cloud dependency; local control plane
Management plane	Azure public cloud (Azure Portal, Arc)	On-premises Azure Portal and CLI replica
Hardware overhead	Workload cluster only (1–16 nodes)	Workload cluster + dedicated 3-node management cluster
Eligibility	Any Azure subscription	Requires MCA-E and business justification
Best for	Hybrid scenarios, branch offices, edge with periodic connectivity	Air-gapped facilities, classified environments, remote sites without Internet

Typical Use Cases

Azure Local, Foundry Local, and Microsoft 365 Local serve scenarios in which the traditional public cloud alone cannot meet operational, regulatory, or latency requirements. The following use cases emerge from Microsoft's documentation and partner ecosystem:

Government and Defense (Data Sovereignty)

Organizations in government, defense, and intelligence sectors require that data, operations, and control remain within organizational boundaries. Azure Local enables sovereign private clouds in which all workloads run locally. Microsoft 365 Local adds core collaboration tools — Exchange Server, SharePoint Server, and Skype for Business Server — that run entirely within the customer's sovereign operational boundary, keeping teams productive even when disconnected from the cloud. In disconnected mode, data residency and sovereign requirements are met without relying solely on public sovereign cloud controls.

Manufacturing and Industrial Operations (Low Latency & Reliability)

Azure Local targets control systems and near real-time operations with extreme latency requirements — manufacturing execution systems, industrial quality assurance, and production line operations that must continue through network outages. On-premises compute clusters enable decisions in milliseconds without cloud round-trip delays. Azure Local's integration with Azure IoT Operations (deployed on AKS clusters enabled by Azure Arc on Azure Local) provides a turnkey approach for managing and processing IoT data at the edge.

Retail and Branch Offices (Edge Compute)

Azure Local supports single-machine deployments through full clusters, making it suitable for distributed retail or branch scenarios where local AI inference at the source is needed — for example, self-checkout systems and loss-prevention applications in retail stores. The hyperconverged design ensures that even if WAN connectivity to central services drops, local operations continue uninterrupted.

Remote and Isolated Locations

Industries operating in areas with limited network infrastructure — oil rigs, mining sites, rural clinics, and vessels at sea — benefit from operating in disconnected environments. Azure Local lets them use Azure Arc services and run workloads without relying on internet connectivity. Foundry Local extends this by enabling on-device inference of AI models in offline or bandwidth-constrained environments.

Confidential AI and Data Processing

Organizations that need to run AI on sensitive data without exposing it to third-party clouds can combine Azure Local with Foundry Local. This enables local AI inferencing, where data is processed at the source. Foundry Local supports chat completions (text generation) and audio transcription (speech-to-text) through a single runtime that runs entirely on-device, with no cloud dependency for inference. Foundry Local now supports large multimodal models on Azure Local infrastructure, using the latest GPUs from partners like NVIDIA, so you can run advanced AI inference in sovereign environments.

What Is Available

Azure Local Core Infrastructure

Azure Local is a full-stack infrastructure software running on validated hardware in customer facilities. It supports VMs, containers, and select Azure services locally while maintaining Azure-consistent management through Azure Arc.

Features and architecture of hyperconverged deployments:

Feature	Description
Hardware	Validated hardware from Microsoft partners; 1–16 machines per instance (max 8 for rack-aware clusters)
Storage	Storage Spaces Direct; external SAN storage in preview for qualified opportunities
Networking	Customer-managed with physical switches and VLANs; optional software-defined networking (SDN)
Azure Local Services	VMs for general-purpose workloads; AKS enabled by Azure Arc for containerized workloads
Azure Management	Azure Policy, Azure Monitor, Microsoft Defender for Cloud, and others via Azure Arc
Observability	Metrics and logs sent to Azure Monitor and Log Analytics for infrastructure and workload resources
Management Tools	Azure Portal, CLI, ARM/Bicep/Terraform (cloud); PowerShell, Windows Admin Center, Hyper-V Manager, Failover Cluster Manager (local)
Disaster Recovery	Azure Backup, Azure Site Recovery, and non-Microsoft partners
Security	300+ security settings for consistent baseline and drift control; Trusted Launch for VMs; Microsoft Defender for Cloud integration

Common Azure services on Azure Local:

Use Case	Service
Virtual machines	Azure Local VMs enabled by Azure Arc (Windows/Linux, with Trusted Launch support)
Virtual desktops	Azure Virtual Desktop (AVD) session hosts on-premises
Container orchestration	Azure Kubernetes Service (AKS) enabled by Azure Arc
Arc-enabled services	Select Azure services for hybrid workloads via Azure Arc
High-performance databases	SQL Server on Azure Local with extra resiliency
Media analytics	Azure AI Video Indexer enabled by Azure Arc
AI chat assistants	Azure Edge RAG (Preview) — turnkey RAG solution for custom chat over private data
IoT management	Azure IoT Operations on AKS clusters on Azure Local

Disconnected operations support a subset of these services via the local control plane:

Service	Description
Azure Portal	Local portal experience similar to Azure Public
Azure Resource Manager (ARM)	Subscriptions, resource groups, ARM templates, CLI
RBAC	Role-based access control for subscriptions and resource groups
Managed identity	System-assigned managed identity for supported resource types
Arc-enabled servers	VM guest management for Azure Local VMs
Azure Local VMs	Windows or Linux VMs via disconnected operations
Arc-enabled Kubernetes (Preview)	CNCF Kubernetes cluster management on Azure Local VMs
AKS enabled by Arc (Preview)	AKS on Azure Local in disconnected mode
Azure Local device management	Create and manage instances, add/remove nodes
Azure Container Registry	Store and retrieve container images and artifacts
Azure Key Vault	Store and access secrets
Azure Policy	Enforce standards and governance on new resources

Deployment types include hyperconverged deployments, multi-rack deployments (in preview), Microsoft 365 on Local, and disconnected operations. Multi-rack deployments support larger configurations with prescriptive hardware BOMs featuring pre-integrated racks containing SAN storage, servers, and network devices; re-use of existing hardware is not supported for multi-rack at this time.

Microsoft 365 Local

Microsoft 365 Local runs Exchange Server, SharePoint Server, and Skype for Business Server on Azure Local infrastructure that is entirely customer-owned and managed. It supports both hybrid and fully disconnected deployments and provides an Azure-consistent management experience with a unified control plane.

Core capabilities:

Exchange, SharePoint, and Skype for Business: Enterprise-grade email, document management, and unified communications on-premises, addressing stringent data residency requirements.
Certified and validated solutions: Deployed on Azure Local Premier Solutions from hardware partners, guaranteeing compatibility for sovereign deployments.
Full-stack validated reference architecture: Prescriptive guidance for networking, storage, compute, and identity integration based on best practices for optimal performance and resiliency.
Sovereign Private Cloud capabilities: Azure-consistent management with enhanced security features (encryption, access controls, compliance mechanisms) aligned with local regulatory frameworks.
Hybrid or fully disconnected support: Connected mode uses Azure as the cloud control plane; disconnected mode uses a local control plane for complete isolation and air-gapped operations.

Example large-scale server role allocation (connected mode):

3 servers configured as a three-node Azure Local instance for SharePoint Server and SQL Server workloads.
4 servers each as single-node Azure Local instances for Exchange Server mailbox roles.
2 servers each as single-node Azure Local instances for Exchange Server edge transport roles.

Microsoft 365 Local is now generally available and must be deployed through a Microsoft-certified solution partner. Microsoft has committed to supporting these on-premises productivity server workloads through at least 2035.

Foundry Local

Foundry Local is an on-device AI inference solution (currently in public preview) that enables local execution of AI models through a CLI, SDK, or REST API. It provides an OpenAI-compatible REST endpoint running entirely on-device, meaning prompts and model outputs are processed locally without being sent to the cloud.

System requirements:

Requirement	Details
OS	Windows 10 (x64), Windows 11 (x64/ARM), Windows Server 2025, macOS
Minimum hardware	8 GB RAM, 3 GB free disk space
Recommended hardware	16 GB RAM, 15 GB free disk space
Optional acceleration	NVIDIA GPU (2000 series+), AMD GPU (6000 series+), AMD NPU, Intel iGPU, Intel NPU (32 GB+ memory), Qualcomm Snapdragon X Elite (8 GB+ memory), Qualcomm NPU, Apple silicon

Supported AI capabilities:

Component	Description
Foundry Local Service	An OpenAI-compatible REST server providing a standard interface for inference. The endpoint is dynamically allocated when the service starts.
ONNX Runtime	Executes optimized ONNX models on CPUs, GPUs, or NPUs; supports multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and quantized models for faster inference.
Model Management	CLI and cache system for downloading, listing, and managing AI models locally.

Key architectural components:

Foundry Local Service: An OpenAI-compatible REST server providing a standard interface for inference. The endpoint is dynamically allocated when the service starts.
ONNX Runtime: Executes optimized ONNX models on CPUs, GPUs, or NPUs; supports multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and quantized models for faster inference.
Model Management: CLI and cache system for downloading, listing, and managing AI models locally.

No Azure subscription is required to use Foundry Local on a device; it runs on local hardware with no recurring cloud costs for inference.
For sovereign environments requiring heavier AI workloads, the integration of Foundry Local with Azure Local supports large-scale models utilizing the latest GPUs from NVIDIA, with Microsoft providing comprehensive support for deployments, updates, and operational health.

Prerequisites and Planning for Deployment

Hardware and Catalog Selection

Azure Local runs exclusively on validated hardware configurations listed in the Azure Local Solutions Catalog. Hardware solutions fall into three categories: Validated Nodes, Integrated Systems, and Premier Solutions. Premier Solutions delivers deep integration and validation for a smooth end-to-end experience. For hyperconverged deployments, you can reuse existing hardware only if it matches a supported configuration in the catalog; otherwise, upgrades or new hardware are required.

Each Azure Local machine in a hyperconverged cluster must meet system requirements for CPU, memory, storage, and network. For planning, the Azure Local Catalog and available sizing tools help estimate hardware requirements for the intended workload profile. Networking must be designed for redundancy and performance—typically using 10–25 GbE or higher links, physical switches, and VLANs. Optional SDN services can be enabled for software-defined networking.

For disconnected operations, plan additional capacity for the management cluster as detailed in Section 1 (3 nodes, 96 GB RAM/node, 24 cores/node, 2 TB SSD/node, 960 GB boot disk/node).

For Microsoft 365 Local, hardware must be an Azure Local Premier Solution that specifically meets the M365 Local requirements listed in the Azure Local Solutions Catalog. Please work with your authorized Microsoft partner to size the deployment appropriately. We have reference architectures for small-, mid-, and large-scale configurations tailored to your needs.

Azure Subscription and Licensing

An Azure subscription is required for Azure Local. The billing model charges a per-physical-core fee on on-premises machines, plus consumption-based charges for any additional Azure services used. All charges roll up to the existing Azure subscription. For disconnected operations, an eligible enterprise agreement (such as MCA-E) is also needed, and qualification must be discussed with the Microsoft account team before procurement.

Additional licensing considerations include:

OS licenses for workload VMs (e.g., Windows Server)
Microsoft 365 server licenses if deploying M365 Local (Exchange, SharePoint, Skype)
Foundry Local requires no Azure subscription and has no RBAC role requirements when running solely on-device.

Network Connectivity Planning

In connected mode, each machine must have outbound HTTPS connectivity to well-known Azure endpoints at least every 30 days. If SDN is planned, review the SDN overview before deployment. Network and host requirements must be met per Microsoft's published specifications.

In disconnected mode, the local management cluster must be networked to the workload clusters within the customer's environment, but no external internet is required post-deployment (only registration data is exchanged during initial deployment, registration, and license renewal).

Assessment and Planning Phases

A structured planning process reduces risk. Microsoft and its partners typically follow phased engagement for Azure Local projects, especially for M365 Local:

Phase	Description
Assessment	Analyze organizational requirements, compliance needs, and desired outcomes
Planning	Define hardware configurations, software solutions, migration, and integration strategies
Acquisition	Procure necessary hardware, software, and licenses
Deployment	Execute the planned rollout in accordance with best practices

For disconnected operations, organizations must additionally identify workloads and application requirements for disconnected operation, and staff (or partners) with the capability to deploy and operate disconnected environments.

Deploying Azure Local: Steps and Best Practices

Cluster Installation and Registration

For hyperconverged deployments, the Azure Local operating system can be downloaded from the Azure Portal, which includes a free 60-day trial. Alternatively, pre-integrated systems from OEM partners arrive with Azure Local pre-installed. After installing the OS on each server node and configuring the cluster (using Storage Spaces Direct for storage and Failover Clustering for high availability), the cluster must be registered with Azure Arc to enable cloud management through the Azure Portal and Arc tools.

Hardware can be purchased from any Microsoft hardware partner listed in the Azure Local Catalog, and the available sizing tool can help estimate hardware requirements before purchase.

Post-Deployment Configuration

Once registered, the Azure Local instance appears in the Azure Portal as a manageable resource. Post-deployment steps include:

Enabling Arc-enabled services: Configure AKS clusters, Arc-enabled data services, or other platform services as needed for workload requirements.
Applying governance policies: Use Azure Policy to enforce compliance standards across the on-premises environment, and configure Microsoft Defender for Cloud to assess and improve security posture.
Setting up monitoring: Configure Azure Monitor and Log Analytics for metrics and log collection from both infrastructure and workloads.
Keeping the environment current: Azure Local provides Solution Updates that simplify keeping the entire stack up to date across OS, firmware, and drivers.

For disconnected deployments, these management services are configured on the local control plane appliance rather than through Azure public endpoints. The local Azure Portal and CLI provide an equivalent experience for managing policies, deploying VMs, and monitoring infrastructure within the isolated environment.

Deploying Microsoft 365 Local
M365 Local must be deployed through a Microsoft-certified solution partner. The partner follows the reference architecture to provision the required Azure Local instances and configure Exchange, SharePoint, and Skype for Business server roles. The reference architectures include prescriptive guidance for networking and security — covering virtual networks, network security groups, and load balancers to segment, isolate, and secure access to workloads. In connected mode, architectures use Azure as the cloud-connected control plane; in disconnected mode, they use a local control plane.

Organizations can contact their Microsoft account team or visit the Microsoft 365 Local General Availability sign-up page for information about authorized partners.

Testing and Validation

Thorough validation after deployment is critical:

Cluster validation: Run the built-in validation tools to confirm hardware, storage, and network configurations meet requirements.
VM and failover testing: Create test VMs, perform live migrations between nodes, and simulate node failures to verify high availability.
Connectivity resilience (connected mode): Simulate internet outages to confirm workloads continue uninterrupted and that the cluster correctly reconnects and syncs within the 30-day window.
Disconnected mode testing: Verify that the local management portal supports all required operations (VM provisioning, policy enforcement, monitoring) without any external connectivity.
Backup and recovery validation: Test backup and restore procedures using Azure Backup, Azure Site Recovery, or third-party solutions.

Planning and Deploying VM Workloads

Capacity Planning

Unlike the elastic scaling of public Azure, on-premises capacity is finite. IT architects must right-size VMs based on the physical resources available in the Azure Local cluster, while maintaining headroom for peak loads and failover overhead. Consider future growth when sizing: adding capacity requires purchasing and deploying new server nodes — a slower process than cloud scaling. The Azure Local Catalog and sizing tools assist with estimating how many VMs of given sizes a cluster configuration can support.

Creating VMs via Azure Arc

Azure Local manages VMs as Azure resources through the Azure Arc Resource Bridge. VMs can be created using the Azure Portal, Azure CLI, ARM templates, Bicep, or Terraform. The creation workflow through the Azure Portal involves:

Navigate to the Azure Local cluster resource and select + Create VM.
Specify project details: subscription, resource group.
Configure instance details: VM name, custom location (associated with the Azure Local cluster), security type (Standard or Trusted Launch), storage path, OS image, administrator account, vCPU count, memory allocation (static or dynamic — cannot be changed post-deployment).
Optionally enable Guest Management for Arc extensions integration, Domain Join for Active Directory, and additional data disks.
Configure networking: attach at least one network interface with appropriate IP allocation (DHCP or static).
Review and create.

This information is based on the documented VM deployment process for Azure Local environments.

Image management: Custom VM images (VHDs) can be uploaded or imported as templates. Preparing golden images — pre-hardened with security agents, configurations, and required software — streamlines consistent provisioning across the fleet.

Security for VM Workloads

Trusted Launch: Supported for Azure Local VMs, enabling secure boot and virtual TPM (vTPM). The vTPM state automatically transfers within a cluster, and attestation confirms whether the VM started in a known-good state.
Microsoft Defender for Cloud: Can assess and improve the security posture of both the Azure Local instance and individual VMs.
Arc guest management: Extensions can be deployed inside VMs for configuration management, monitoring, and security agent installation.

GPU Workloads

For AI or graphics-intensive workloads, Azure Local supports GPU-equipped servers. GPUs can be made accessible to VMs through direct pass-through or shared via GPU partitioning (GPU-P), which allows a single physical GPU to be divided into multiple virtual GPUs for different workloads simultaneously. This is valuable when multiple AI inference services, rendering tasks, or data processing workloads need GPU acceleration concurrently. NVIDIA GPUs (such as A-series models) are validated for Azure Local deployments.

Tryout and Evaluation Options

Option	Details
Azure Local 60-Day Trial	Download the Azure Local OS from the Azure Portal for a free 60-day evaluation for proof-of-concept deployments on your own hardware. Even a single validated server can be used to test core features. Microsoft's Azure Arc Jumpstart project provides step-by-step demo scenarios.
Foundry Local (Preview)	Free, no Azure subscription required. Install via `winget install Microsoft.FoundryLocal` (Windows) or `brew tap microsoft/foundrylocal && brew install foundrylocal` (macOS). Run a model immediately: `foundry model run qwen2.5-0.5b`. Experiment with text generation and speech-to-text on existing hardware. Alternatively, download the installer from the Foundry Local GitHub repository.
Microsoft 365 Local	No standalone trial download; engagement through Microsoft or a certified solution partner is required for proof-of-concept or pilot deployments. Contact your Microsoft account team or visit the M365 Local GA sign-up page. Hardware requirements are significant (enterprise-scale server configurations), so evaluations typically take place in partner labs or test environments.
Learning Resources	Microsoft Learn modules, tutorials, and the Azure Arc Jumpstart provide guided lab experiences. Community blogs, partner solution briefs (from Dell, HPE, Lenovo, etc.), and the Microsoft Tech Community contain implementation case studies and architectural guidance.

Tradeoff consideration for trials: The 60-day Azure Local trial enables self-service evaluation of the core hyperconverged platform and VM management. However, testing disconnected operations requires the dedicated management cluster hardware and MCA-E eligibility, which limits ad hoc experimentation. For M365 Local, the partner-delivered model ensures proper configuration, but it means organizations cannot independently test before engaging commercially. Foundry Local, by contrast, offers the lowest barrier to entry — it runs on a standard laptop or desktop with no cloud dependencies.

Appendix: Building Agentic AI Solutions with Azure Local, Microsoft Agent Framework, and Foundry Local

Conceptual Overview

Modern AI applications increasingly follow an agentic pattern — multiple specialized AI agents that reason, communicate, and act to perform complex tasks. Microsoft provides tools to develop and run these solutions entirely on local infrastructure by combining three components:

Azure Local — the on-premises infrastructure providing compute, storage, networking, and (optionally) GPU acceleration.
Foundry Local — the on-device AI inference runtime serving LLM and other models via an OpenAI-compatible API endpoint.
Microsoft Agent Framework (MAF) — an open-source framework (Python and .NET SDKs) for building, orchestrating, and deploying AI agents and multi-agent workflows.

The Agent Framework was introduced as an open-source project by Microsoft and is hosted on GitHub at microsoft/agent-framework with over 8,300 stars and 1,400 forks. The latest release at the time of research was python-1.0.0rc5 (dated 2026-03-19).

Architecture Pattern

A concrete reference implementation was published on the Microsoft Developer Community Blog, demonstrating real-world AI automation with Foundry Local and MAF — described as running with "no cloud subscription, no API keys, no internet required". The system uses four specialized agents orchestrated by MAF:

Agent	Function	Latency
PlannerAgent	Sends user commands to the Foundry Local LLM and produces a structured JSON action plan	4–45 seconds
SafetyAgent	Validates actions against workspace bounds and schema constraints	< 1 ms
ExecutorAgent	Dispatches validated actions to the target system (e.g., robotics simulator for inverse kinematics and gripper control)	< 2 seconds
NarratorAgent	Produces a template-based summary of actions taken (with optional LLM elaboration)	< 1 ms

The orchestration flow follows a sequential pipeline: User → Orchestrator → Planner → Safety → Executor → Target System, with the Narrator providing observability.

In this reference, the PlannerAgent uses Foundry Local as its AI backend, invoking a local model (e.g., qwen2.5-coder-0.5b) via the standard OpenAI Python client pointing to the Foundry Local endpoint:


from foundry_local import FoundryLocalManager
import openai

manager = FoundryLocalManager("qwen2.5-coder-0.5b")
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key,
)

This pattern — structured JSON output from an LLM, validated by a safety layer, dispatched to a domain-specific engine — generalizes beyond robotics to home automation, game AI, CAD, lab equipment, and any domain requiring safe, structured control.

Deployment Patterns on Azure Local

For production deployment of agentic AI on Azure Local infrastructure, the following layered architecture applies:

Layer 1 — AI Model Hosting: One or more Azure Local VMs (or containers) running Foundry Local to serve AI models. For small models, a standard CPU-equipped VM suffices. For large multimodal models, VMs with dedicated GPU access on Azure Local infrastructure leverage the latest NVIDIA GPUs for high-throughput inference. Foundry Local automatically selects the best execution provider (NPU > GPU > CPU) for the available hardware.
Layer 2 — Agent Orchestration: The Microsoft Agent Framework runs as a service (in a container on AKS or in a VM) and orchestrates the multi-agent pipeline. It handles agent-to-agent communication, memory management, tool integrations, and calls to the Foundry Local inference endpoint. Domain-specific engines (simulation environments, database connectors, control system APIs) can be integrated as tools that agents invoke during execution.
Layer 3 — Application Interface: A custom frontend (web application, dashboard, CLI, or API gateway) through which users submit tasks and receive results. This can be hosted on the same Azure Local cluster. All inter-layer communication occurs over the cluster's internal network, keeping data fully on-premises and latency to a minimum.

Applicable Scenarios

The combination of Azure Local + Foundry Local + MAF enables agentic AI solutions where:

Industrial automation: Agents interpret natural-language operator commands, plan machine actions, validate safety constraints, and execute robotic or process-control operations — all on the factory floor without cloud dependency.
Sovereign AI assistants: Multi-agent systems that collate local data, reason using on-device LLMs, and provide decision support in classified or regulated environments (defense, finance, healthcare) where data must never leave the premises.
Edge intelligence: IoT-connected environments where agents monitor sensor data streams, use local AI for anomaly detection and root-cause analysis, and actuate responses in real time — applicable to energy infrastructure, transportation systems, or smart facilities.
Offline automation: Field operations, shipboard systems, or disaster-response scenarios where internet connectivity is unavailable but sophisticated AI reasoning and automation are still required.

Key Advantages and Tradeoffs

Advantages: Running agentic AI entirely on Azure Local provides data sovereignty (all prompts, model outputs, and orchestration data remain local), low latency (no network hops to cloud endpoints), deterministic cost (no per-token API charges), and operational resilience (functions without internet).
Tradeoffs: On-device models are constrained by local GPU memory and compute — the largest cloud-hosted models (e.g., GPT-4 at full scale) may not be runnable locally without significant GPU investment. Model updates require manual download and deployment rather than automatic cloud-side updates. Additionally, Foundry Local remains in public preview, meaning features and supported models are still evolving and may have limitations before general availability. Organizations should evaluate whether the models available for local inference meet their quality bar for production use, and plan for a path to larger models as Foundry Local's support for large-scale models on Azure Local with NVIDIA GPUs matures.

Conclusion

Azure Local, Foundry Local, and Microsoft 365 Local together form a cohesive platform for organizations seeking sovereign, on-premises cloud capabilities without compromise. As data residency, regulatory compliance, and operational resilience become non-negotiable requirements across industries, Microsoft's investment in distributed infrastructure and local AI inference reflects a fundamental shift in how enterprises architect their digital ecosystems.

The combination of Azure Local (providing edge-aware infrastructure and hybrid compute), Microsoft 365 Local (delivering productivity and collaboration on-premises), and Foundry Local (enabling local LLM inference) addresses the long-standing tension between cloud agility and data sovereignty. Whether your organization operates in a connected, intermittently connected, or fully disconnected environment, these solutions let you innovate locally without sacrificing the governance, scale, or intelligence that cloud-native architectures offer.

For IT architects and decision-makers, the path forward is clear: evaluate your specific regulatory, latency, and data residency requirements; prototype on a small cluster or Azure Local Appliance; and progressively expand as organizational confidence and operational maturity grow. The learning curve is manageable, the economics are favorable for regulated industries, and the competitive advantage in markets demanding data sovereignty is significant.

As Foundry Local and Azure Local move toward general availability and mature their feature sets, the case for Sovereign Private Cloud becomes stronger. The future of enterprise computing is not "cloud vs. on-premises" — it is a thoughtfully designed hybrid architecture that respects both business logic and the regulatory terrain in which that logic operates.

Practical Guideline: How to Move Agents Beyond POCs and Deliver Real Enterprise Value

Holger Imbery — Sat, 21 Mar 2026 08:29:29 +0000

Summary Lede

I hear the same question repeatedly from customers exploring agent or broader AI adoption: "How do we escape the endless POC phase and actually deliver real business value?" Most organizations get stuck prototyping broadly instead of executing narrowly, trapped in cycles of experimentation that never reach production. This practical guideline distills ten core principles proven to move agents from ideation into measurable enterprise impact. Read on to discover how to anchor initiatives in real processes, maintain scope discipline, connect agents to live input channels, enforce production-grade behavior from day one, integrate with mission-critical systems early, deliver in short iteration cycles, create lightweight review processes, commit to real usage within 30 days, use multiple small agents, and plan for long-term flexibility—transforming your AI investment from experimentation into sustainable value delivery.

Organizations often get stuck in endless prototyping cycles because they experiment broadly rather than execute narrowly. This guideline distills the core principles that move agents from ideation to measurable business impact.

1. Anchor the Initiative in a Single Real Process

Begin by identifying a single operational workflow where your organization currently loses productive time on a recurring basis. This workflow should exhibit characteristics such as repetitive manual steps, rule-based decision logic, or intensive data manipulation. Avoid starting with abstract experimentation or exploratory prototypes that lack connection to actual business operations.

The economic rationale for this approach is straightforward. When you ground your agent development in a concrete, existing process, you force alignment with real data sources, actual system dependencies, and measurable business outcomes. This concrete anchoring eliminates the disconnection that is characteristic of laboratory environments and proof-of-concept work, which often remain isolated from production constraints and real-world variability.

To establish this baseline understanding, you should document the current state of the process by answering these questions. First, identify what data inputs currently exist and where they originate within your organization. Second, determine which specific steps within the workflow consume the most human effort and therefore represent the highest opportunity for efficiency gains. Third, establish what quantifiable outcome should improve as a result of agent implementation, whether measured in terms of time savings per transaction, reduction in human errors, or increase in process throughput.

2. Keep the First Agent Extremely Narrow

The most decisive factor in moving beyond proof-of-concept phases is maintaining scope discipline. Organizations frequently fail to operationalize agents because they attempt to expand functionality too broadly before achieving stable baseline performance in a narrow domain. This expansion pattern increases complexity exponentially while simultaneously distributing development resources across multiple problem dimensions.

The essential discipline requires that you define the agent's responsibilities in a single, unambiguous sentence of the form: "This agent is responsible for X and nothing beyond X." This constraint forces explicit trade-offs between capability breadth and implementation depth, ensuring that resources concentrate on achieving reliable performance in one well-defined function rather than fragmented performance across multiple functions.

Consider these concrete examples of appropriately scoped initial deployments. An agent might be chartered to look up customer pricing from a master database and return one verified result, without attempting to negotiate, modify, or recommend alternative pricing. Another agent might be restricted to extracting structured fields from incoming documents and validating them against schema requirements, without attempting interpretation or applying business rules. A third agent might be limited to classifying incoming inquiries into exactly three predefined categories, without attempting subcategories or fuzzy classifications.

This discipline against over-engineering serves multiple economic functions. It reduces the surface area for defects, shortens the time to reach measurable operational impact, and simplifies the governance model for operating the agent in production environments. By deferring expansion until baseline performance is established, organizations create a foundation of operational reliability upon which additional capabilities can be layered incrementally.

3. Connect the Agent to a Real Input Channel Early

Manual testing through a studio UI creates an isolated environment that does not reflect operational reality. The agent is evaluated against synthetic inputs, clean data structures, and predetermined response patterns—conditions that rarely occur in production systems. Real business value emerges only when the agent receives actual operational requests that contain the variability, ambiguity, and edge cases inherent in genuine work.

Operational input channels include the following mechanisms through which work currently flows into your organization: forwarded email messages containing unstructured customer inquiries, Teams chat messages that combine urgent questions with side conversations, CRM cases that reference prior interactions and incomplete context, and uploaded documents that may contain inconsistent formatting or missing required fields. Each of these channels introduces distinct data quality challenges and user expectations.

The economic rationale for early channel integration stems from the principle of revealed preference through actual behavior. When users interact with the agent through their existing workflow channels rather than a lab environment, their usage patterns reveal which capabilities create genuine value and which create friction. Synthetic testing cannot substitute for this behavioral signal. Furthermore, exposure to live variability from the first iteration accelerates learning about edge cases and failure modes that would otherwise remain hidden until full production deployment.

Action: Select one input channel that currently delivers the highest volume of work into your target process and route genuine operational requests through the agent beginning in the first development cycle. This approach ensures that early versions contend with real data distributions and authentic user patterns rather than idealized test scenarios.

4. Enforce Production‑Grade Behavior From Day One

Development environments and production environments typically operate under fundamentally different constraints and enforcement mechanisms. In many organizations, agents developed during the proof-of-concept phase operate with suspended governance controls, synthetic data, and permissive access policies that would never be acceptable in operational systems. This separation creates a structural barrier to production adoption because moving an agent from a development environment to production then requires a complete redevelopment of its data connectors, compliance controls, and operational behaviors.

The efficient approach eliminates this artificial separation by imposing production-grade constraints from the initial development phase. This requires the agent to use the actual data sources employees rely on for their daily work, rather than sanitized copies or test databases. The agent must apply existing data access restrictions and compliance controls that govern access to sensitive information within your organization, rather than running with elevated or unrestricted permissions. The agent must maintain consistent tone, content, and response handling in line with organizational standards, rather than developing ad hoc response patterns during development that would later require modification. The agent must draw on approved knowledge sources that align with organizational information governance policies, rather than accessing ad hoc files or unvetted external data.

This approach reduces the economic cost of production deployment by eliminating the need to redesign and re-implement governance controls at transition time. Additionally, exposing the agent to genuine constraints during development accelerates the identification of edge cases and failure modes that would otherwise remain hidden until production deployment, when they would be far more costly to address.

Action: Establish the expectation that the first version of the agent operates under the same governance framework and data access policies as the final production system. This mindset collapses the artificial gap between proof-of-concept development and production readiness.

5. Integrate With One Mission‑Critical System Early

From an operational perspective, a proof-of-concept implementation that remains disconnected from your organization's core systems generates negligible business value regardless of how well the agent performs in isolation. The critical transformation occurs when the agent gains the ability to read from or write to systems that directly affect your operational workflows, such as customer relationship management platforms, enterprise resource planning systems, document management repositories, or human resources information systems. At that point, the agent transitions from a theoretical capability into a practical tool that produces measurable outcomes within your existing business processes.

The economic principle underlying this requirement is straightforward: manual handoff steps between system boundaries represent a fundamental source of friction and delay. When an agent completes its analysis but requires a human to manually transfer its output into another system, you have failed to eliminate the bottleneck that prompted the agent's development in the first place. Conversely, when the agent can directly query information from, or write results to, systems where decisions take effect, the entire workflow collapses into a unified operational flow that removes intermediate steps.

Your implementation approach should prioritize identifying which single system integration would eliminate the greatest volume of repetitive manual work, and then building only the minimal version of that integration during the initial development phase. This targeted approach might manifest as the agent querying a system to retrieve structured reference data that previously required manual lookup, writing a record to a system to capture the decision the agent has reached, extracting and processing a document that originated from a system's document repository, or triggering an automated workflow in a system that would otherwise require manual initiation. Even a single integration point of this magnitude, when implemented at the outset rather than deferred until later phases, serves as a forcing function that exposes the real constraints your agent must navigate within your organization's operational environment.

6. Deliver in Short Iteration Cycles

Extended design phases pose a structural impediment to effective agent development by deferring real-world validation and prolonging the time horizon before measurable feedback becomes available. Organizations attempting comprehensive upfront design face two competing failures: either they design systems that do not align with operational reality once implemented, or they extend the pre-implementation phase so long that organizational priorities shift before deployment occurs. Agents improve most rapidly through short cycles of genuine operational usage because each cycle generates concrete evidence of performance gaps and behavioral mismatches that cannot be anticipated through laboratory analysis alone.

The organizational practice that supports this principle involves establishing a defined release rhythm of 7 to 10 days as the baseline cadence for detecting problems, gathering behavioral feedback, and incorporating refinements. This rhythm creates a predictable organizational rhythm while ensuring sufficient time for both development work and operational assessment. Within this structured cycle, the work proceeds through sequential phases: during the initial week, the focus concentrates on delivering a working version of the agent that operates within the narrowly defined scope established by principle two. During the second week, attention shifts toward integrating the agent with the single mission-critical system identified during principle five, which forces the agent to operate under genuine operational constraints. During the third week, the team prioritizes incorporating feedback-driven refinements based on direct observations of how the agent performs under real operational patterns and edge cases. By the fourth week, the agent transitions into daily-use rotation, becoming a standard component of the operational workflow rather than an experimental capability.

This structured iteration discipline transforms theoretical value into practical, measurable improvements by compressing the feedback loop between hypothesis and evidence to a manageable timeframe. Organizations that maintain shorter iteration cycles identify defects and misalignments exponentially faster than those that attempt extended design phases, resulting in a substantially faster path to production-grade performance.

7. Create a Lightweight Review and Quality Model

Heavyweight governance structures—comprehensive architecture review boards, multi-stage approval processes, and extensive documentation requirements—impose transaction costs that delay feedback cycles and create organizational friction. These formal processes were designed for environments where deployment cycles measured months and the cost of errors remained relatively stable. Agent development operates under fundamentally different constraints: deployment cycles measure days, and the cost of a minor behavioral inconsistency in an agent can compound over hundreds of interactions before detection.

The economic principle underlying lightweight review processes is that not all decisions require the same deliberative overhead. Operational decisions about individual agent behaviors—how to handle edge cases, whether a response meets quality standards, or how the agent should escalate undefined requests—benefit from frequent, lightweight validation rather than formal approval hierarchies. Conversely, decisions about expanding an agent's scope or integrating new system connectors require structured deliberation, but this deliberation should remain episodic rather than continuous.

The practical implementation of this principle involves establishing separate review cadences calibrated to the urgency of decisions. Weekly operational reviews should examine direct evidence of agent performance, specifically documented failures, observed edge cases that the agent failed to handle correctly, and user experience friction points that emerged during actual operational usage. These reviews operate without approval authority; they serve as diagnostic sessions that generate recommendations for refinement. Monthly functional expansion decisions should convene stakeholder representatives to evaluate whether the agent's scope should be widened, which integration points to add next, or whether the agent should be split into multiple specialized agents. These decisions operate with explicit approval authority because scope decisions determine resource allocation for the subsequent month's development work.

Standard templates for agent instructions, response formats, and escalation procedures ensure consistency across agents without requiring case-by-case review. A template encodes learned patterns from prior agent implementations into repeatable structures that new agents can adopt immediately, reducing both development time and the likelihood of behavioral inconsistencies between agents.

This calibrated review model reduces unnecessary transaction costs while maintaining deliberative oversight for decisions that require it, so alignment occurs without creating a delivery bottleneck.

8. Commit to Real Usage Within 30 Days

The principle of time-bound value realization addresses a fundamental problem in agent adoption: organizations frequently defer the transition from development to operational deployment indefinitely, justifying continued laboratory work with incremental improvements that never add up to genuine business impact. The economic cost of this deferral compounds over time because development resources consumed during extended proof-of-concept phases represent opportunity costs that could have been deployed toward other organizational priorities.

A disciplined commitment to a fixed time horizon solves this problem by establishing an explicit deadline for demonstrating measurable operational value. The specific timeframe of thirty days aligns with the typical organizational planning cycle, allowing early agent deployment results to inform resource allocation decisions for the subsequent planning period. This timeframe is sufficiently compressed to prevent indefinite deferral while remaining realistic for narrowly scoped agents integrated with single system connectors.

The operational rule is straightforward: if the agent is not delivering quantifiable value within 30 days of initial deployment to a real operational channel, the scope must be simplified rather than expanded. ** This is not a judgment of development competence but rather a signal that the current scope-to-resource ratio has become misaligned. Value delivery failure indicates that either the scope remains too broad to achieve stability within the available development effort, or the integration points do not connect to work patterns that generate sufficient transaction volume to demonstrate impact. In either case, the remedy is to reduce scope further rather than to invest additional effort in the current design.

This discipline creates accountability structures that prevent laboratory research from consuming indefinite organizational resources while also forcing difficult conversations about scope alignment early in the adoption cycle, before significant resource commitments have been made.

9. Use Multiple Small Agents Instead of One Overloaded One

As operational processes expand and additional requirements pile up, assigning more responsibilities to a single agent compounds performance issues and makes the governance framework needed to maintain operational consistency much harder to manage. Each additional responsibility you layer onto an existing agent increases the dimensionality of the state space the agent must navigate, exponentially expanding the set of edge cases and behavioral scenarios that must be designed for, tested against, and monitored in production.

From an economic perspective, this multifaceted complexity imposes two distinct costs. First, development velocity decreases substantially as the cognitive burden of managing interdependencies between distinct responsibilities grows. When an agent handles both classification and task execution, modifications to classification logic require careful analysis of how those changes cascade through task execution behavior. Second, operational failure modes become increasingly difficult to isolate and remediate because a performance problem observed at the system boundary may originate from any of several distinct layers of responsibility.

The principle of agent specialization addresses this problem by establishing the discipline of splitting responsibilities across multiple focused agents as operational scope expands. Rather than expanding a single agent to handle routing decisions, classification decisions, domain-specific task execution, and document processing in sequence, you would instead deploy four distinct agents, each responsible for a single function. The routing agent receives incoming work and determines which specialized agent should handle the request. The classification agent processes the routed work and assigns it to the appropriate category within a predefined taxonomy. The domain-specific task agent performs the operational work within that category, calling back-end systems and generating results. The document processing agent extracts structured information from unstructured documents and prepares it for downstream task agents.

This decomposition yields multiple benefits that justify the additional engineering required to orchestrate multiple agents. Small, specialized agents reach production stability faster because each agent operates within a constrained state space with fewer edge-case combinations. Governance remains explicit and traceable because each agent has a single defined responsibility, making it straightforward to document expected behavior and audit actual behavior against that standard. Failure isolation becomes tractable because a performance degradation can be attributed to a specific agent component rather than requiring analysis across all bundled responsibilities. When a specific agent begins exhibiting unexpected behavior, the blast radius of potential impact remains constrained to the specific function that agent performs, rather than cascading through multiple dependent responsibilities.

Over extended operational timelines, this modular architecture provides additional economic value through reduced cost of capability evolution. When organizational requirements change, you can modify or replace a single specialized agent without requiring a redesign of the entire set of responsibilities. This flexibility allows organizations to adapt their agent ecosystem as operational priorities change.

10. Plan for Long‑Term Flexibility

Long-term organizational success with agent systems depends on architectural decisions that preserve future optionality without imposing excessive upfront complexity. Adoption frameworks and industry analysis show that organizations with modular architectures, rather than monolithic designs, have significantly lower total cost of ownership over multi-year operational timelines. The economic principle underlying this requirement is that modular systems distribute change costs across smaller component boundaries, whereas monolithic systems concentrate change costs across tightly coupled dependencies.

Your agent architecture should prioritize flexibility in integrating capabilities by establishing well-defined interfaces between agents and external systems, rather than embedding system-specific logic directly into agent instructions or prompts. This approach means that when your organization adopts a new CRM platform or replaces a document management system, you can update the system integration layer without requiring redesign of agent behavior specifications. Additionally, the architecture should remain protocol-driven, meaning that agents communicate with each other and with external systems through standardized APIs and message formats rather than through proprietary connectors. This discipline ensures that as your organization's technology infrastructure evolves, your agent ecosystem can adapt without requiring wholesale redevelopment.

The practical implication of this principle is that your initial agent deployment should incorporate extensibility patterns from the outset rather than deferring architectural considerations until later phases. When you define how an agent accesses your customer database, design that access pattern to accommodate a future change in the database platform without requiring modifications to the agent's core logic. When you establish how agents communicate with business systems, use standardized protocols and well-documented interfaces that would allow additional agents to access those same systems without requiring new connector development. This forward-looking engineering discipline imposes modest additional design effort during initial implementation but eliminates expensive rearchitecting work later as organizational requirements evolve and technology infrastructure changes.

Conclusion: The Fast‑Path to Production

Organizations that successfully transition agents from proof-of-concept phases into sustained operational deployment share a consistent pattern of implementation discipline. These ten principles represent a synthesis of organizational practices that have demonstrated measurable results across diverse operational contexts.

The foundational requirement is to anchor agent development in a specific, real operational process rather than pursue abstract experimentation. This grounding in actual business workflows ensures that agent capabilities connect directly to measurable organizational problems. Building on this foundation, maintaining scope discipline through narrowly defined initial agent responsibilities creates the conditions for rapid stabilization and early demonstration of operational value. The agent should then receive genuine operational input through the channels where work currently flows into the organization, exposing the agent to real data distributions and authentic user behaviors from the initial development phase.

Throughout the development cycle, applying production-grade governance controls, data access policies, and behavioral standards from day one eliminates the artificial gap between development and production environments. Simultaneously, integrating with at least one mission-critical system early in the development process forces the agent to operate under genuine operational constraints rather than remaining isolated in a laboratory environment. The development methodology should employ short iteration cycles measured in weeks rather than months, which compresses the feedback loop between hypothesis and evidence, enabling rapid identification of misalignments between designed behavior and operational reality.

Supporting this development rhythm requires establishing lightweight review processes calibrated to the urgency of decisions, and separating continuous operational assessments from episodic capability expansion decisions. Organizations must enforce time-bound value realization through a commitment to deliver measurable operational results within thirty days, which prevents indefinite deferral of production deployment and forces disciplined conversations about scope alignment. As operational requirements expand, maintaining modular architectures that distribute capabilities across multiple specialized agents rather than accumulating responsibilities within single agents preserves development velocity and simplifies operational governance. Finally, planning for long-term flexibility through well-defined interfaces and standardized protocols enables the agent ecosystem to adapt as organizational technology infrastructure and business requirements evolve.

These principles work together to create implementation patterns that compress the transition from conception to the delivery of operational value.

Microsoft 365 E7: Why Microsoft's New License Is a Logical Step for Agent‑Driven Enterprises

Holger Imbery — Sat, 14 Mar 2026 08:13:21 +0000

Summary Lede

Microsoft's announcement of Microsoft 365 E7 in March 2026 marks a watershed moment in enterprise technology strategy. For the first time in over a decade, Microsoft introduced a new top-tier enterprise license—not to add incremental features, but to fundamentally reconceptualize how organizations govern both human workers and autonomous AI agents as integrated components of the workforce. At $99 per user per month, E7 bundles Microsoft 365 E5, Microsoft 365 Copilot Wave 3 with agentic capabilities, the Microsoft Entra Suite, and the newly introduced Agent 365 control plane. This consolidation signals that AI agents have transitioned from experimental pilots to production-grade organizational resources requiring enterprise-grade identity, access, compliance, and auditability frameworks.

Why You Should Read This

If you lead enterprise technology strategy, manage cloud infrastructure, evaluate AI adoption roadmaps, or determine software licensing budgets, E7 represents a critical inflection point in how enterprises will architect their IT operating models over the next five years. This article explains not just what E7 includes, but why Microsoft built it—addressing the architectural gaps E5 left exposed as organizations scale agent deployment from hundreds of thousands to tens of millions of instances. You'll understand the economic logic, the governance infrastructure, and the strategic positioning underlying this licensing evolution, enabling you to make informed decisions about whether E7 aligns with your organization's agent deployment trajectory and control requirements.

Introduction: From Productivity Suites to Agent Platforms

Historical Context and Evolution

Microsoft's enterprise licensing strategy has traditionally centered on supporting productivity and organizational efficiency through cloud services and security infrastructure. The Microsoft 365 E5 tier, introduced in 2015, represented the established enterprise standard, designed to address the comprehensive security, compliance, productivity, and governance requirements of large organizations during the cloud adoption phase. For the next 11 years, E5 served as the highest-tier enterprise licensing option within the Microsoft 365 portfolio.

The March 2026 Announcement

On 9 March 2026, Microsoft announced the availability of Microsoft 365 E7, designated as the Frontier Suite, representing the first introduction of a new top‑tier enterprise license since the E5 tier was originally established in 2015. This announcement signals a deliberate architectural evolution in how Microsoft structures enterprise licensing and organizational governance at scale.

Composition and Technical Structure

The Microsoft 365 E7 offering, priced at $99 per user per month, consolidates multiple previously distinct components into a unified licensing structure. This bundled approach encompasses Microsoft 365 E5, Microsoft 365 Copilot, the complete Microsoft Entra Suite, and the newly introduced Agent 365 control plane. Each component addresses specific operational and governance requirements within the modern enterprise technology infrastructure.

Fundamental Architectural Shift

The introduction of E7 should not be interpreted as a simple price adjustment or repackaging of existing capabilities. Rather, E7 represents a substantive architectural shift in Microsoft's strategic positioning and technical philosophy. Microsoft is fundamentally repositioning Microsoft 365 from a platform optimized for human-centric productivity and security to a comprehensive control plane that manages and governs an integrated, mixed workforce comprising both human workers and autonomous artificial intelligence agents. This transition reflects evolving organizational requirements as enterprises move beyond pilot implementations of AI technologies toward systematic, organization-wide agent deployment at scale.

What Microsoft 365 E7 Actually Includes

Microsoft 365 E7 consolidates four distinct product components, previously available as separate subscription offerings, into a unified licensing structure. This architectural consolidation reflects Microsoft's strategic decision to bundle interdependent capabilities that are increasingly required for enterprise-scale deployment of autonomous AI agents. The following sections provide detailed technical specifications for each included component.

Microsoft 365 E5: Foundation Layer

Microsoft 365 E5 is the foundational component of the E7 licensing tier, providing core productivity, compliance, security, and identity management capabilities. These capabilities encompass the complete Microsoft Office productivity suite, including Exchange Online for messaging infrastructure, SharePoint Online for content management and collaboration, Teams for unified communications, OneDrive for business cloud storage, Microsoft Defender for comprehensive threat protection, Microsoft Intune for mobile and device management, Microsoft Purview for data governance and compliance, and Power BI Pro for business analytics and visualization. These capabilities provide the fundamental infrastructure required for enterprise productivity, data protection, and organizational governance.

Microsoft 365 Copilot (Wave 3): Advanced AI Integration

The E7 tier includes Microsoft 365 Copilot at the Wave 3 release level, which represents a significant evolution in AI integration across the Microsoft 365 application portfolio. Copilot is embedded across Microsoft Word, Excel, PowerPoint, Outlook, Teams, and the Loop workspace collaboration platform. Beyond traditional copilot assistance functions, Wave 3 introduces expanded agentic capabilities that enable autonomous planning, decision-making, and action execution. Additionally, Wave 3 extends multi-model support to integrate with multiple language model providers, specifically OpenAI and Anthropic Claude, providing organizations with flexibility in selecting the underlying AI model infrastructure based on specific organizational requirements, performance characteristics, or policy constraints.

Microsoft Entra Suite: Identity and Access Management

The E7 offering includes the complete Microsoft Entra Suite, an expanded product tier that goes beyond the standard Entra ID P2 offering. The Entra Suite encompasses advanced identity verification, comprehensive access governance frameworks, Zero Trust network access architecture for conditional connectivity, and sophisticated conditional access policy enforcement mechanisms. These capabilities provide an enterprise-grade identity management and access control infrastructure necessary to manage both human and non-human (agent-based) organizational identities within a unified framework.

Agent 365: Governance and Control Infrastructure

Agent 365 represents a newly introduced governance and security layer specifically designed to manage autonomous AI agents at an organizational scale. Agent 365 provides centralized inventory tracking across both Microsoft-native and third-party AI agent frameworks; comprehensive observability and monitoring capabilities; policy enforcement mechanisms specific to agent behavior and resource utilization; and lifecycle management functionality, including agent provisioning, update orchestration, and controlled retirement procedures. This component addresses the operational requirement for centralized governance of non-human autonomous entities executing within enterprise systems.

Economic Analysis and Bundle Composition

When these four components are purchased individually through separate licensing arrangements after July 2026, the aggregate monthly cost per user is projected to be approximately $117 USD. The E7 consolidation bundle is offered at $99 USD per user per month, representing an aggregate cost reduction of approximately 15–17% when compared to the sum of individually purchased components. This pricing structure reflects both the operational efficiency gains from unified licensing administration and Microsoft's strategic intent to incentivize adoption of the consolidated governance framework for agent deployment.

Why E7 Exists: E5 Was Built for the Cloud Era, Not the Agentic Era

Microsoft executives have been explicit that E5 was designed "pre‑agentic".

E5 assumes:

humans are the primary actors,
automation is largely scripted,
identities map cleanly to employees.

Modern enterprises increasingly violate all three assumptions.

AI agents today:

act autonomously,
access mailboxes, calendars, files, and APIs,
execute multi‑step workflows over time,
are often created outside central IT using low‑code or no‑code tools.

Agent 365: The Missing Control Plane Enterprises Have Been Lacking

Agent 365 is the genuinely new element in E7, and the main reason E7 is more than a repackaged bundle.
Agent 365 provides:

Centralized agent inventory across Microsoft and third‑party frameworks
Identity and access controls via Entra
Security monitoring via Defender XDR
Compliance and auditability via Purview
Lifecycle management (provisioning, update, retirement)

Crucially, Agent 365 does not build or host agents. It governs them.
Compute and execution remain consumption‑based via Copilot Studio, Azure AI Foundry, or partner platforms.
This mirrors how enterprises already separate:

application development,
runtime infrastructure,
identity and governance.

Why E7 Is a Good Move for Enterprises Leveraging Agents

It Normalizes Agents as Enterprise Identities

Microsoft is treating agents as digital workers, subject to the same identity, access, and policy frameworks as humans.
This is a necessary prerequisite for scaling agents beyond experimentation.

It Reduces Architectural Fragmentation

Prior to E7, organizations had to stitch together:

E5,
Copilot add‑ons,
Entra extensions,
emerging agent governance tools.

E7 consolidates these into a single, coherent enterprise architecture aligned with Zero Trust principles.

It Shifts AI from "Assistance" to "Execution".

Wave 3 of Copilot introduces agentic capabilities that plan, act, and execute, not just summarize or draft.
E7 provides the governance layer required to allow that execution safely.
Without E7‑level controls, many organizations would be forced to block these capabilities entirely.

It Aligns Cost Models with Reality

While $99 per user appears high, E7 reflects:

the true cost of enterprise security,
identity governance for both humans and agents,
reduced overhead compared to managing multiple SKUs.

Importantly, Microsoft is signaling that agents will be licensed like users, potentially with hybrid subscription and consumption models over time.

Microsoft 365 Enterprise Comparison

E5 vs E5 + Copilot vs E7 (Frontier Suite)

Status (March 2026)

Microsoft 365 E7 GA: May 1, 2026

Pricing shown is list price (USD, per user/month)

Consumption costs for building/running agents are not included in any plan

Dimension	E5	E5 + Copilot	E7 (Frontier Suite)
What’s included	Full Microsoft 365 E5 (Office apps, Exchange/SharePoint/OneDrive, Teams*, Defender, Intune, Purview, Power BI Pro). No Copilot, no Agent 365, and no full Entra Suite.	E5 (as left) plus Microsoft 365 Copilot add‑on. No Agent 365, no full Entra Suite.	E5 + Copilot + Entra Suite + Agent 365 in one SKU; positioned as the “Frontier Suite” for agent‑at‑scale scenarios.
List price (USD/user/month)	$60 (from July 1, 2026).	$90 (E5 $60 + Copilot $30).	$99 (bundle). With Teams‑excluded option reported at $90.45.
Copilot (Wave 3) agentic capabilities	Not included.	Included (via add‑on). Multi‑model (OpenAI + Anthropic) support arrives with Wave 3.	Included by default with Wave 3 agentic features (planning, acting across Microsoft 365).
Agent 365 (agent governance & control plane)	Not included.	Not included by default (can be added at $15/user/month).	Included; GA on May 1, 2026.
Entra Suite (beyond Entra ID P2)	Not included (E5 includes Entra ID P2 but not the broader Entra Suite).	Not included.	Included (e.g., Private Access, Internet Access, ID Governance/Protection, Verified ID).
Security & compliance posture for agents	Human‑centric controls only; no unified agent inventory/observability.	Adds creation/use of Copilot agents but without centralized agent governance unless Agent 365 is added.	Unified agent inventory, policy enforcement, auditability across Defender/Entra/Purview.
Bundle economics	Baseline plan.	A la carte add‑on model. E5 + Copilot = ~$90. Adding Entra Suite (+$12) and Agent 365 (+$15) pushes to ~$117.	~$99 vs ~$117 à la carte → ~15–17% discount; simpler procurement/governance.
Availability date	Available now.	Available now (Copilot GA prior to E7; Wave 3 rolling out).	GA May 1, 2026.
Who it fits	Organizations prioritizing core productivity/security without near‑term agent scale‑out.	Teams piloting Copilot or limited agentic use cases, willing to bolt on governance later.	Enterprises standardizing on agents (cross‑department), requiring identity‑first governance, zero‑trust access, and consolidated risk controls.

Notes

Teams availability depends on regional licensing rules; E7 is also offered as a *“without Teams”** SKU in some regions.*
Agent execution costs (LLM tokens, orchestration runtime, long‑running workflows) are not included in any license and must be budgeted separately.
E7 is the first Microsoft 365 SKU designed explicitly for the agentic AI era, not just productivity.

Important Caveat: E7 Is Not the Full Cost of an Agentic Enterprise

E7 does not include:

agent execution compute,
LLM consumption,
orchestration runtime costs.

These remain variable and are billed separately via Copilot Studio, Azure AI Foundry, or partner services.

Enterprises should therefore view E7 as:

the governance and control foundation, not the entire AI budget.

Conclusion: E7 as an Architectural Statement

Microsoft 365 E7 represents a significant shift in how enterprises conceptualize their licensing strategy and operational architecture. Rather than functioning primarily as a licensing vehicle, E7 serves as a declaration that Microsoft 365 is positioned as the foundational operating system for enterprises that intend to operate within an agentic computing paradigm.

For organizations that anticipate the need to execute a comprehensive deployment strategy involving autonomous AI agents across multiple business functions, integrate these agents into mission-critical business processes, and maintain the requisite levels of security governance, compliance attestation, and auditability, the scope and depth of capabilities provided by E7 are not superfluous. Instead, these capabilities represent structural and architectural necessities that must be addressed to enable safe and controlled agent deployment at an organizational scale.

The evolution from E5 to E7 reflects a fundamental recalibration of enterprise platform design philosophy. The E5 licensing tier was optimized and engineered for the cloud-centric era, where enterprises sought to modernize their infrastructure, data management, and collaboration mechanisms through cloud-native services. E7, by contrast, is optimized for an organizational context in which the workforce composition includes both human workers and autonomous AI agents, each requiring appropriate identity governance, access controls, security monitoring, and compliance instrumentation within an integrated control plane.

This architectural shift acknowledges that managing agents as first-class organizational entities—rather than as peripheral or experimental capabilities—requires the same level of systematic governance, policy enforcement, and observability that enterprises have come to expect from their core identity and security infrastructure.

Introducing MATE: A Modular Testing Environment for AI Agents

Holger Imbery — Sat, 07 Mar 2026 08:07:02 +0000

Summary Lede

As AI agents become integral to business processes, reliable and repeatable testing is essential for confidence in deployment. This article introduces the Multi-Agent Test Environment (MATE) – an enterprise-grade framework for automated testing of AI agents across platforms and frameworks – and explains how its modular design addresses key challenges in agent testing. We explore why testing AI agents is critical, delve into MATE's architecture and features, compare MATE with alternative testing approaches, and outline MATE's roadmap including red-team testing, enhanced cloud deployment, and support for emerging agent frameworks.

Importance of Testing AI Agents

AI agents built with Microsoft Copilot Studio are powerful but complex systems. They combine natural language understanding, generative AI, and business logic, often operating in critical scenarios (customer support, data retrieval, workflow automation, etc.). Ensuring these agents work correctly and safely under diverse conditions is as important as testing traditional software – if not more so. Key reasons why rigorous agent testing is essential include:

Reliability and Consistency: Unlike deterministic software, AI agents can produce different answers to the same question due to their probabilistic nature. Without structured tests, one might only catch issues by manually typing questions and hoping for the right answer, a fragile approach. Automated testing provides consistency – the same test can be run repeatedly to ensure the agent’s behavior remains reliable after updates.
Enterprise-Grade Quality: In enterprise deployments, an untested agent can lead to incorrect or even unsafe outputs, damaging user trust or violating compliance. Ad-hoc testing that “relies on intuition instead of structured testing” doesn’t scale for enterprise needs. Organizations require repeatable, at-scale test processes to validate that agents meet quality standards (accuracy, relevance, safety) consistently before and after release.
Complex Multi-turn Interactions: Copilot Studio agents often handle multi-turn conversations, maintaining context across multiple user and agent turns. Testing these multi-step dialogues manually is time-consuming and error-prone. Automated test suites allow developers to simulate complex conversation flows (with varying user inputs, branching dialogs, tool invocations, etc.) and verify the end-to-end behavior in one run. This ensures that the agent can handle scenario-based conversations robustly, from greeting to task completion.
Nondeterministic and Generative Responses: When agents use generative AI capabilities, they might produce creative or unexpected phrasing. Verifying such responses is not as simple as exact string matching. Effective testing must evaluate responses on semantic correctness, completeness, and compliance, even if wording varies. This introduces a challenge: how do you automatically judge an AI-generated answer’s quality? We’ll see how MATE tackles this with an AI-based “judge” component.
Frequent Updates and Continuous Integration: Agents are rarely static – their underlying prompts, skills, and knowledge sources evolve. Without automation, re-testing the agent after each change or on a schedule (for example, to catch drift or regressions) would be prohibitively labor-intensive. A good agent testing framework enables continuous integration (CI) pipelines and nightly runs, so that any breaking change or quality degradation is caught early. This is crucial for scaling up the number of agents in production while keeping maintenance overhead low.
Transparency and Debugging: When a test fails, developers need insights into why. For example, did the agent retrieve the wrong data because of an intent misclassification? Or did it produce a partially correct answer that was marked as a failure due to a strict check? Good testing tools provide detailed reporting – conversation transcripts, logs, and metrics – to help pinpoint the root cause of failures. This accelerates debugging and continuous improvement of the agent.

In summary, robust testing of agents is the linchpin for trustworthy AI deployments. It allows teams to validate functionality, accuracy, robustness, and safety in a systematic way. This need has driven Microsoft to introduce solutions like the Power CAT Copilot Studio Kit (a Power Platform solution for agent testing) and, more recently, the built-in Agent Evaluation feature in Copilot Studio (now in preview). However, each solution comes with certain limitations or prerequisites, which the new MATE aims to overcome. Before comparing approaches, let’s first introduce MATE and how it works.

Introducing MATE: A Modular Testing Framework for AI Agents

Link to MATE GitHub Repository

Link to MATE Wiki

Multi-Agent Test Environment (MATE) is an internal project and framework designed to provide automated, comprehensive testing for AI agents, initially focusing on Microsoft Copilot Studio agents. MATE was created to address the challenges above by combining enterprise-grade tooling with a modular, extensible architecture. In essence, MATE allows developers and testers to connect to a running Copilot Studio agent, simulate conversations, evaluate the agent’s responses against expected outcomes using AI, and produce detailed metrics and reports – all in an automated fashion.

MATE’s approach can be seen as bringing many of the benefits of the Copilot Studio Kit into a single, code-first testing environment. Rather than a Power App solution, MATE is a pure .NET 9 application that you can run in a container stack. This design choice means MATE operates outside the constraints of the Power Platform, giving developers more flexibility in how and where they run their tests.

Let’s break down how MATE works and how it addresses key testing challenges:

Direct Line Integration (Live Agent Testing): MATE connects to the agent through its Direct Line API endpoint. This is the same interface used by real chat channels (like Teams or a custom web chat). By using Direct Line, MATE ensures it's testing the deployed agent exactly as end-users experience it. The tool can send a sequence of user messages and receive the agent’s replies in turn, thereby automating full multi-turn conversations. This addresses the challenge of multi-turn flows by allowing complex scenario scripts to be executed automatically. It’s effectively like an automated “test chat” but running dozens or hundreds of predefined conversations unattended.
Test Case Definition and Multi-turn Flows: In MATE, you can define a test case with multiple steps of user input (representing a conversation) and the expected outcomes. Expected outcomes can include:
- Expected Intent and Entities – i.e., which topic or action the agent should trigger and which key data (entities) it should extract.
- Acceptance Criteria – specific conditions that constitute a pass/fail for the test (for example, certain keywords must appear in the answer, or a certain API call must be made).
- Reference Answer – an ideal answer text or outline for comparison. Each test case can be labeled with a priority or category, useful for organizing large test suites (e.g., “P1 critical flows”, “Edge cases”, etc.). By supporting multi-step conversations in test cases, MATE ensures you can test end-to-end agent behavior, not just isolated single-turn Q&A.
“Model-as-a-Judge” Evaluations: One of MATE’s most powerful features is using an AI model to evaluate the quality of the agent’s response. Rather than relying only on hard-coded checks (exact matches or simple contained keywords), MATE sends the agent’s answer along with the reference answer and validation criteria to a Large Language Model (LLM) – for instance, an Azure OpenAI GPT-4 model – which acts as an impartial judge. This AI Judge scores the response across multiple evaluation dimensions:
- Task Success: Did the agent fulfill the user’s request or solve the user’s problem?
- Intent Match: Did the agent correctly understand what the user was asking for?
- Factuality: Is the information provided true and accurate (no hallucinations or incorrect data)?
- Helpfulness/Completeness: Is the answer complete, well-structured, and does it address the user’s need effectively?
- Safety/Compliance: Does the response avoid policy violations (no sensitive data exposure, no disallowed content)? Each of these dimensions is scored (e.g. 0.0 to 1.0), and MATE can apply configurable weightings to decide if a test passes or fails overall. For example, you may require 0.9+ on Task Success and Intent Match, tolerate a lower score on style metrics like Helpfulness, and demand a perfect score on Safety. This approach directly tackles the challenge of evaluating nondeterministic generative answers: even if the agent’s wording differs from the expected answer, the AI Judge can still determine that the answer is essentially correct and useful. Conversely, if the agent’s response is irrelevant or contains errors, the AI Judge will assign low scores, causing the test to fail. This method provides a nuanced, context-aware evaluation that traditional automated tests struggle to achieve. (Internally, the AI Judge uses prompt-based prompting of an LLM with the expected answer or criteria to get these scores.)
In Addition, MATE also supports other judge types, such as:
- RubricsJudge – A fully deterministic judge that evaluates responses using explicit rules such as Contains, NotContains, and Regex, making it ideal for compliance, safety, and reproducible pass/fail checks.
- HybridJudge – A cost‑efficient combination judge that first gates responses with deterministic rubrics and then applies an LLM for deeper qualitative scoring only where needed.
- CopilotStudioJudge – A Copilot‑Studio‑specific LLM judge that is citation‑ and grounding‑aware, aligning evaluations with Copilot Studio’s default reasoning and response patterns:
- GenericJudge – A lightweight, zero‑cost judge based on simple keyword and regex matching, intended for fast smoke tests and offline or CI scenarios
Automated Test Generation from Documentation: Authoring a comprehensive set of test cases can be labor-intensive. MATE addresses this by allowing you to upload documents (PDFs or text files) that are relevant to your agent’s domain or knowledge base. It then automatically:
- Extracts text content from the documents (using a PDF parser).
- Indexes and chunks the content for semantic analysis (using a Lucene-based index).
- Generates potential questions and answers from the content using an LLM. The outcome is a set of suggested Q&A pairs or even multi-turn conversation scenarios derived from the documentation. For example, if you upload a product FAQ PDF, MATE can generate likely customer questions and the correct answers from that PDF. These can be reviewed and added to your test suites. This feature helps broaden test coverage automatically, ensuring the agent is tested on real knowledge it’s supposed to have, and catching gaps where it might not respond correctly. It’s an intelligent way to keep tests in sync with content. (Notably, Copilot Studio Kit in the Power Platform also introduced an AI-based test generation in preview, which uses the agent’s topics and knowledge to generate example questions. MATE provides a similar capability but on external docs and with full control of the generated cases.)
Detailed Reporting and Analysis: After executing tests, MATE provides rich metrics and logs. In the Web Dashboard, you can see overall pass rates, success trends over time, and drill down into individual test runs. Each test run retains the transcript of the conversation and the scores for each evaluation dimension, so you can inspect exactly where a particular test failed. This addresses transparency: instead of just “Test 5 failed”, you can see that it failed because, say, Factuality scored low (perhaps the agent gave a wrong detail), and even read the conversation to diagnose the issue. MATE’s Runs view lets you compare results between runs – useful for spotting regressions after an update. All test data (test cases, results, transcripts, etc.) are stored in a local PostgreSQL database for quick retrieval and can be queried or exported for additional analysis.
Web UI and CLI for Different Use Cases: MATE offers two interfaces:
- A Web Application (built with ASP.NET Blazor Server) for an interactive experience. This is ideal for exploratory testing, configuring your test suites, and reviewing results. The UI includes a setup wizard for initial configuration (entering your agent’s Direct Line credentials and your AI model info) to generate the necessary settings file. Testers can use the web UI to kick off test runs on-demand, monitor progress, and view results in real time.
- A Command-Line Interface (CLI) tool for automation. The CLI allows you to run tests as part of scripts or pipelines. For example, you can incorporate dotnet run --suite "Regression Suite" into a DevOps or GitHub Actions pipeline, so that whenever the agent’s bot is updated or its content changes, the test suite runs and verifies everything still works. The CLI returns an exit code indicating success or failure (0 if all tests passed, non-zero if any test failed), which CI systems can use to pass/fail a build. This enables true CI/CD for AI agents – a failed test can halt a deployment, preventing flawed agent versions from going live.
Containerized and Extensible Architecture: MATE is designed to be run in a self-hosted manner, giving teams full control. It doesn’t require a SaaS backend or a Dataverse environment – you just need a machine that can reach the internet for calling the agent service and the AI model endpoint. This avoids many of the Power Platform licensing constraints associated with the Copilot Studio Kit (discussed later). The architecture is modular by design, with separate components (projects) for domain logic, data storage, core services, web UI, and CLI. This modularity not only enforces clean separation of concerns, but also sets the stage for supporting multiple types of agents in the future. In fact, MATE’s roadmap includes extending support to other agent frameworks beyond Copilot Studio – the core logic (test execution, AI judging, etc.) can be adapted to different agent APIs by swapping out the integration layer. Early code commits already hint at multi-agent support being developed and even an upcoming “red teaming” module for adversarial testing (there are structural hooks in the codebase for this, though the feature is not yet implemented). This means MATE is not a one-off tool but a growing platform for comprehensive AI agent testing across the board.

MATE Architecture at a Glance

Internally, MATE is built with a modern software architecture using the latest Microsoft stack:

.NET 9 with C# – providing performance and cross-platform support.
ASP.NET Core Blazor Server for the web front-end – delivering a rich interactive UI for managing tests and viewing results.
Entity Framework Core (with PostgreSQL) – for the local database that stores test cases, results, transcripts, etc., ensuring persistence without requiring an external DB server.
Azure AI OpenAI SDK – to connect to the AI Judge model hosted on Azure’s AI services (Azure OpenAI “Foundry”). This is how MATE queries an LLM for evaluation of answers.
Lucene.NET – used for full-text indexing in the document-driven test generation feature, to find relevant content in uploaded docs for question generation.
PDF processing libraries (e.g., UglyToad.PdfPig) – to extract text from PDF documents for test generation.
Serilog – for structured logging of events and errors, helping with diagnosing issues in test executions.

The solution is divided into several components (projects) reflecting a modular design: a Domain layer for core models and interfaces, a Data layer for database access, a Core services layer (implementing the judge logic, execution engine, etc.), a WebUI for the front-end, and a CLI project for the command-line interface. This modularity makes it easier to maintain and extend specific parts (for example, adding a new agent connector could be done by introducing a new service in the Core or a new API integration, without touching the UI or data layers).

Overall, MATE is engineered to be a scalable, extensible test harness for AI agents. It’s currently focused on Copilot Studio agents, but its principles apply broadly. Next, we’ll compare MATE to the Copilot Studio Kit – the established testing solution from Microsoft’s Power CAT team – to understand their differences and use cases.

Comparing MATE with Copilot Studio Kit

Microsoft’s Power CAT Copilot Studio Kit is an existing solution aimed at testing and managing Copilot Studio agents. It’s a Power Platform solution (managed package) that provides a canvas app or model-driven app interface, along with Dataverse entities and Power Automate flows, enabling test case creation, automated test runs via the Direct Line API, and analytics (such as conversation transcripts, dashboards, etc.). The Copilot Studio Kit was instrumental in early adoption of agent testing – it allowed makers to do things like bulk import test cases (via Excel), run them from a UI, and even integrate with Azure DevOps pipelines via Power Platform build tools.

However, the Copilot Studio Kit has some inherent characteristics stemming from its Power Platform foundation. Below is a comparison of MATE vs. Copilot Studio Kit across key dimensions:

Aspect	MATE (Multi-Agent Test Environment)	Copilot Studio Kit (Power CAT)
Deployment Model	.NET application (containarized). Runs locally or in cloud; launched via web UI or CLI on demand.	Power Platform managed solution. Deployed to a Dataverse environment; accessed via Power Apps interface.
Licensing & Costs	source-available - CC BY-NC 4.0 . Requires .NET runtime and an Azure OpenAI endpoint (for AI Judge) which may incur usage costs. No special Power Platform licensing needed beyond having a Copilot Studio agent to test.	Provided by Microsoft Power CAT as a sample solution (available on GitHub). However, requires Power Platform premium licenses: a Dataverse environment, and for certain features, AI Builder credits (for generative answer analysis). Usage of Dataverse and Power Automate in the kit might consume capacity or require specific licenses.
Technology Stack	Modern .NET 9 stack; Blazor Web UI, CLI tool, local PostgreSQL DB. Integrates with Azure services (OpenAI) for evaluation. Highly customizable and extendable by developers (source code available).	Low-code Power App + Dataverse. Relies on standard Power Platform tech (model-driven app or canvas app, Dataverse tables, Power Automate flows, AI Builder for some AI tasks). Customization is limited to what Power Platform allows.
Test Creation	Supports manual creation of test cases via UI or by defining JSON/CSV, etc., and auto-generation of test cases from documents using LLMs. Test cases can include multi-turn dialogues in one case. Organized into test suites for batch execution.	Supports manual test case input (through the app or via Excel import/export). Also supports multi-turn test cases and offers some AI-assisted generation of test questions from agent topics/knowledge (in Preview, via the Agent Evaluation integration). Test cases stored in Dataverse; grouping of tests supported (by test set).
Test Execution	Runs tests externally by connecting to the agent’s Direct Line channel (or Web Channel with secret & bot ID). Offers a CLI for headless execution (suitable for CI pipelines) and a web interface for interactive runs. Test results are stored locally and displayed in the web UI with analytics.	Executes tests through Copilot Studio’s Direct Line API as well, orchestrated by Power Automate flows under the hood. Typically run on-demand from the app’s interface. There is integration for pipelines via Power Platform build tools, though this is more complex to set up. Results are stored in Dataverse and can be viewed via in-app dashboards or exported.
Evaluation Methodology	AI-driven semantic evaluation: uses a GPT-based AI Judge to score responses on multiple quality dimensions (task success, intent match, factual correctness, etc.). This allows flexible, semantic comparisons rather than simple exact matches. Configurable pass thresholds provide fine-grained control. Also supports explicit pass/fail rules (acceptance criteria) where needed.	Rule-based and some AI: supports exact or partial response matching, checking for expected keywords or presence of attachments, etc. For generative answers, the kit uses AI Builder to compare the agent’s answer with a reference answer for similarity. It also retrieves telemetry from Application Insights to help explain failures. Plan validation tests examine the agent’s action plan against expected tools (for orchestration scenarios).
Modularity & Extensibility	Designed to be modular and extensible. The core can be extended to new agent types (plans to support other AI agent frameworks are in progress). The evaluation component (AI Judge) can be pointed to different models or adapted with different prompts. Being source-available, organizations can modify or extend MATE (e.g., add custom evaluation metrics, integrate with other data sources) to fit their needs.	Focused scope, limited extensibility. The Copilot Studio Kit is specific to Copilot Studio agents and deeply tied to the Power Platform environment structure. It’s not architected to test arbitrary other agents. Customizing it generally means modifying the Power App or creating new Dataverse fields/flows, which requires Power Platform expertise.
Data & Infrastructure	Stores test artifacts and results in a local database within the application. No cloud infrastructure needed to get started; data stays within the user’s environment. For scaling up, the application could be hosted on a server or in Kubernetes (containerization support is under development). Because it’s self-hosted, data sovereignty and privacy can be managed internally.	Relies on Dataverse for storing tests and results, and optionally uses other services (App Insights, SharePoint, etc.) for logs and knowledge management. This provides seamless integration if you’re already within the Microsoft ecosystem, but it requires that all data be in the Power Platform cloud.

Table: Comparison of MATE vs. Copilot Studio Kit across key aspects of testing functionality and usage.

As seen above, MATE and Copilot Studio Kit share the same goal – improving agent quality through automated testing – but they differ in implementation approach. MATE is more developer-oriented, offering flexibility, openness, and extensibility, whereas the Copilot Studio Kit is maker-friendly, integrated in the Power Platform with a ready-to-use interface but comes with platform constraints.

From a Microsoft perspective, the Copilot Studio Kit was a bridge solution to empower agent creators with testing capabilities before deeper platform features were available. Now, with Agent Evaluation built directly into Copilot Studio (currently in preview), some capabilities of the kit are being absorbed into the product itself – for instance, AI-generated test queries and built-in execution of test sets. Still, the Kit provides additional tooling (like dashboards, inventory, governance features) that are useful in complex environments.

see articles:

Ship Copilot Studio Agents with Confidence: Master Automated Testing with the Copilot Studio Kit

Testing Copilot Studio Agents: Copilot Studio Kit vs. Agent Evaluation (Preview)

MATE, on the other hand, is an independent effort to provide a robust testing harness that can evolve fast and go beyond what the closed-source product features offer. It is not limited by Power Platform’s boundaries (for example, one could imagine integrating MATE with other LLM evaluation criteria, or hooking it up to monitor backend APIs invoked by the agent). Additionally, MATE’s modular nature means it could incorporate other agent types into the same testing dashboard. For example, if you have a fleet of different AI bots – some built in Copilot Studio, some using Azure OpenAI Orchestration, some third-party – MATE could theoretically be extended to test them all in one place, whereas the Copilot Studio Kit is only for Copilot Studio agents.

When to use which? If you are a Power Platform maker or IT admin who wants a straightforward, supported way to test Copilot Studio agents and you’re already comfortable with Power Apps and Dataverse, the Copilot Studio Kit is a solid choice. It integrates nicely with the environment (and your data, logs, etc.) and doesn’t require coding to use. However, you’ll need the necessary licenses and some patience to configure the environment, and you won’t be able to easily customize how tests are evaluated beyond what Microsoft provides.

If you are a developer or dev team looking for a more flexible, code-driven approach – especially if you want to integrate agent testing into a DevOps pipeline or extend testing to specialized scenarios – MATE is very appealing. It does require .NET and some setup, but it gives you full control. You can run it locally for rapid iteration, include it in automated builds, and tweak it to your needs. There’s also no dependency on having a Power Platform environment or any particular license. You do need access to an Azure OpenAI service (or you could swap in another LLM API if desired) to leverage the AI judge, but that is relatively straightforward for most enterprise scenarios.

Roadmap and Future Enhancements in MATE

MATE is an evolving project, and there are several notable enhancements on the roadmap:

Support for Additional Agent Types: As of now, MATE supports testing Microsoft Copilot Studio agents exclusively, because it specifically uses Copilot’s Direct Line API and related assumptions (like the concept of “topics” and Dataverse knowledge base) in its current version. However, the architecture is being extended to accommodate other agent platforms. Future versions are expected to introduce modules for other agent types – for example, the ability to test Microsoft Agent Framework agents, or even agents built with entirely different frameworks. This will broaden MATE’s applicability across various “agentic AI” solutions used within Microsoft and beyond, making it a one-stop testing hub for heterogeneous AI systems.
Integrated Red Teaming: In addition to “blue team” style functional testing (checking that the agent does what it’s supposed to), MATE aims to incorporate “red teaming” capabilities. Red teaming in AI refers to attacking or stress-testing the agent with malicious or unexpected inputs to probe its defenses and safety measures. This can include testing the agent’s response to prompt injections, inappropriate content requests, or attempts to trick the agent into breaking rules. The goal is to ensure the agent is robust against misuse or adversarial users. The MATE codebase already contains the foundation for a Red Teaming module, but this is currently just a skeleton (non-functional in the current release). Once completed, this feature will allow users to run a suite of adversarial tests (perhaps using predefined malicious prompts or common attack patterns) against their agents and get a report on vulnerabilities or policy compliance issues. This is a critical part of AI system testing, especially for enterprise scenarios, and its inclusion will differentiate MATE further by offering a more comprehensive safety evaluation than what is currently possible with the Copilot Studio Kit or built-in Agent Evaluation (which, so far, focus on correctness and performance rather than adversarial robustness).
Cloud and Scalable Deployment: Presently, MATE runs as a local docker stack. Looking forward, the project plans to simplify deployment on Azure, likely via containerization. Kubernetes support is on the roadmap, meaning you might be able to deploy MATE as a set of containers (web app, background worker, etc.) in an AKS (Azure Kubernetes Service) or similar environment. This will enable team-wide usage at scale – multiple testers or developers could share a MATE instance, run tests concurrently, and store results in a central location, much like a web application service. Cloud deployment will also facilitate integration with other services (for example, connecting to Azure DevOps for automatic test triggers, or scaling out the AI Judge component).
UI/UX and Usability Improvements: As an source-available project, MATE will continue to refine its user interface and ease of use. Features on the horizon could include richer test editing experiences (perhaps a visual conversation flow editor), more analytics dashboards (trend of agent performance over time, flakiness of certain tests, etc.), and integration with agent design tools (for example, pulling in agent topics or suggesting tests based on recent real user conversations – aligning with how built-in Agent Evaluation reuses Test Pane interactions).

Conclusion

Testing AI agents is no longer optional – it’s a necessity for any organization that wants to confidently deploy AI solutions. Agent Development Solutions empower the creation of sophisticated AI Agents, but ensuring these agents function correctly, safely, and efficiently requires going beyond manual testing or one-off trials. This is where testing frameworks like Copilot Studio Kit and MATE come into play.

MATE (Multi-Agent Test Environment) represents a next-generation approach to agent testing. It addresses the limitations of earlier tools by adopting a fully modular, code-first architecture that can keep pace with the rapidly changing AI landscape. By using MATE, developers and testers gain the ability to thoroughly automate conversations with their agents, evaluate responses with the help of AI, generate tests from existing knowledge, and integrate all this into continuous delivery pipelines. The outcome is a higher degree of assurance that your Copilot Studio agent will perform as expected when it’s in production – responding correctly to user queries, using the right tools, and staying within the guardrails.

In comparison to the Power Platform-based Copilot Studio Kit, MATE offers more flexibility, extensibility, and independence. You won’t be constrained by specific licensing or environment setups, and you can tailor the tool to your needs. On the other hand, it’s a more technical solution that may require developer effort to set up and maintain, whereas the Copilot Studio Kit is more turn-key if you’re already within Microsoft’s ecosystem. It’s encouraging to see both approaches available, as they cater to different audiences.

Ultimately, MATE’s importance goes beyond just testing Copilot Studio agents. It signifies an evolving philosophy in the AI agent world: that testing and evaluation should be first-class citizens in the development lifecycle of AI systems, just as they are in traditional software development. With AI models and agents becoming increasingly central to applications, tools like MATE help ensure we can trust these systems through systematic validation. MATE’s deep integration of AI for testing (using an AI to test another AI) is an innovative approach that can significantly enhance the rigor of evaluations.

In summary, MATE enables teams to ship Copilot Studio agents (and, in the future, other AI agents) with greater confidence. It provides the means to catch issues early, improve agents iteratively based on test feedback, and guard against regressions as agents evolve. By combining the power of automation with the wisdom of AI judging, MATE exemplifies a “test smarter” strategy for the era of generative AI – ensuring that our intelligent agents are not only smart, but also reliable, safe, and effective when they go to work for us.