<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alex</title>
    <description>The latest articles on Forem by Alex (@alex_coder19283).</description>
    <link>https://forem.com/alex_coder19283</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874398%2F483685a5-f102-404b-8ac9-907d925c5bdf.png</url>
      <title>Forem: Alex</title>
      <link>https://forem.com/alex_coder19283</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alex_coder19283"/>
    <language>en</language>
    <item>
      <title>We Had Secrets in Kubernetes. Then We Got Audited.</title>
      <dc:creator>Alex</dc:creator>
      <pubDate>Mon, 20 Apr 2026 08:45:00 +0000</pubDate>
      <link>https://forem.com/alex_coder19283/we-had-secrets-in-kubernetes-then-we-got-audited-4f63</link>
      <guid>https://forem.com/alex_coder19283/we-had-secrets-in-kubernetes-then-we-got-audited-4f63</guid>
      <description>&lt;p&gt;For the first two years of running workloads on AKS we stored secrets the way most teams do when they're moving fast. We created Kubernetes secrets, base64 encoded the values, committed the manifests to a private repo and told ourselves we'd clean it up later. Then a security audit flagged it and we had four weeks to fix it.&lt;/p&gt;

&lt;p&gt;This is the story of that migration and what we learned doing it under pressure.&lt;/p&gt;

&lt;h2&gt;The Problem With Kubernetes Secrets&lt;/h2&gt;

&lt;p&gt;Kubernetes secrets are not actually secret in any meaningful security sense. They are base64 encoded, which is encoding, not encryption. Anyone with read access to the namespace can decode them in seconds. If your etcd is not encrypted at rest and someone gets access to a snapshot, they have all your secrets in plaintext. And if a developer accidentally commits a secret manifest to a repo before the gitignore catches it, the value is in git history forever.&lt;/p&gt;
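
&lt;p&gt;To make that concrete, here is the sort of one-liner anyone with read access to the namespace can run. The secret and key names are placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Base64 reverses in one pipe. Names here are placeholders.
kubectl get secret my-app-secrets \
  --namespace my-namespace \
  --output jsonpath='{.data.DB_CONNECTION_STRING}' | base64 --decode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;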

&lt;p&gt;We knew all of this. We just hadn't prioritised fixing it until the audit made it urgent.&lt;/p&gt;

&lt;p&gt;The audit finding was specific. We had database connection strings, third party API keys and internal service tokens stored as Kubernetes secrets across eight namespaces on three clusters. The auditor wanted to see secrets stored in a dedicated secrets manager with audit logging, rotation support and access policies tied to identities rather than cluster-level permissions.&lt;/p&gt;

&lt;p&gt;Azure Key Vault was the obvious answer. The question was how to get secrets from Key Vault into our pods without rebuilding how our applications consumed them.&lt;/p&gt;

&lt;h2&gt;What We Evaluated&lt;/h2&gt;

&lt;p&gt;Our applications expected secrets either as environment variables or as files mounted into the container. We didn't want to change application code as part of this migration. That constraint ruled out a few approaches.&lt;/p&gt;

&lt;p&gt;The first option was to have applications call the Key Vault SDK directly. This would work but it meant changing code in a dozen services and introducing a new dependency and failure mode into each one. Not something we wanted to do under a four-week deadline.&lt;/p&gt;

&lt;p&gt;The second option was to use the Azure Key Vault Provider for Secrets Store CSI Driver. This runs as a DaemonSet on your nodes and lets you define a SecretProviderClass resource that maps Key Vault secrets to files mounted into your pods. Optionally it can sync those secrets into Kubernetes secrets so applications that read environment variables still work without code changes. This was the right fit for us.&lt;/p&gt;

&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;p&gt;The setup has three components working together.&lt;/p&gt;

&lt;p&gt;The Secrets Store CSI Driver handles mounting secrets as volumes into pods. The Azure Key Vault Provider is the plugin that knows how to talk to Key Vault specifically. Workload Identity is how the pod authenticates to Key Vault without any credentials stored anywhere in the cluster.&lt;/p&gt;

&lt;p&gt;Workload Identity is worth pausing on. A Kubernetes service account is federated with an Azure Managed Identity. The pod receives a projected service account token, and when the workload requests access to Key Vault, Azure AD verifies the federation and issues a token for the Managed Identity, which Key Vault then authorises against the access policy attached to it. No secrets are involved in the authentication. No tokens to rotate. No credentials to leak.&lt;/p&gt;

&lt;p&gt;Setting it up looks like this.&lt;/p&gt;

&lt;p&gt;First you enable the OIDC issuer and Workload Identity on your cluster, then turn on the Key Vault provider addon, which installs the CSI driver. Note that addons are enabled with their own command, not through az aks update.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az aks update &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-oidc-issuer&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-workload-identity&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-addons&lt;/span&gt; azure-keyvault-secrets-provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you create a Managed Identity and give it access to Key Vault.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az identity create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-workload-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-rg

az keyvault set-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-keyvault &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object-id&lt;/span&gt; &amp;lt;identity-principal-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret-permissions&lt;/span&gt; get list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you federate the Kubernetes service account with the Managed Identity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az identity federated-credential create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-fed-credential &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--identity-name&lt;/span&gt; my-workload-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--issuer&lt;/span&gt; &amp;lt;oidc-issuer-url&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subject&lt;/span&gt; system:serviceaccount:my-namespace:my-service-account
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
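

&lt;p&gt;One step the CLI commands don't show is the Kubernetes side of that federation. As a sketch, reusing the names from the commands above: the service account carries the managed identity's client ID as an annotation, and pods that should use it run as that service account with the label azure.workload.identity/use: "true", which is what triggers token injection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Service account federated with the managed identity created above.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  namespace: my-namespace
  annotations:
    azure.workload.identity/client-id: &lt;managed-identity-client-id&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;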



&lt;p&gt;Then you define a SecretProviderClass that maps which Key Vault secrets you want and how they should appear in the pod.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;secrets-store.csi.x-k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SecretProviderClass&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-secrets&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-namespace&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;azure&lt;/span&gt;
  &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;usePodIdentity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false"&lt;/span&gt;
    &lt;span class="na"&gt;clientID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;managed-identity-client-id&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;keyvaultName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-keyvault&lt;/span&gt;
    &lt;span class="na"&gt;tenantID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;tenant-id&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;objects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;array:&lt;/span&gt;
        &lt;span class="s"&gt;- |&lt;/span&gt;
          &lt;span class="s"&gt;objectName: db-connection-string&lt;/span&gt;
          &lt;span class="s"&gt;objectType: secret&lt;/span&gt;
        &lt;span class="s"&gt;- |&lt;/span&gt;
          &lt;span class="s"&gt;objectName: api-key-external-service&lt;/span&gt;
          &lt;span class="s"&gt;objectType: secret&lt;/span&gt;
  &lt;span class="na"&gt;secretObjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-secrets&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
      &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;objectName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db-connection-string&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DB_CONNECTION_STRING&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;objectName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-key-external-service&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EXTERNAL_API_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The secretObjects block is what creates the Kubernetes secret from the Key Vault values. Your pod can then reference it as an environment variable the same way it always did. No application changes required.&lt;/p&gt;
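
&lt;p&gt;One subtlety worth calling out: the CSI driver only creates the synced Kubernetes secret when a pod actually mounts the volume, so the volume mount is required even if the application only reads environment variables. A minimal pod sketch, assuming the names above (the image reference is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    # Opts the pod in to Workload Identity token injection.
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: my-service-account
  containers:
    - name: my-app
      image: my-registry.azurecr.io/my-app:latest
      env:
        # Reads from the Kubernetes secret created by secretObjects.
        - name: DB_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: my-app-secrets
              key: DB_CONNECTION_STRING
      volumeMounts:
        # This mount is what triggers the fetch from Key Vault and
        # the creation of the synced secret.
        - name: secrets-store
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: my-app-secrets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;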

&lt;h2&gt;What Broke During Migration&lt;/h2&gt;

&lt;p&gt;We migrated eight namespaces across three clusters over ten days. Here is what went wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The sync delay we didn't know about&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The CSI driver re-syncs secret values from Key Vault on a polling interval. Rotation is opt-in on the AKS addon, and once enabled the default poll interval is two minutes. We discovered the lag when we rotated a Key Vault secret during testing and the running pods kept using the old value for two minutes afterwards. This is expected behaviour, but it meant our rotation procedure needed to account for it. If you have a hard dependency on immediate propagation after rotation, you need to either reduce the poll interval or plan a rolling restart of affected pods once rotation completes. Bear in mind too that secrets synced into environment variables only ever update on pod restart, because environment variables are fixed when the container starts, so a restart is part of the procedure for env consumers regardless.&lt;/p&gt;

&lt;p&gt;We settled on a rotation runbook that updates the Key Vault secret, waits for the sync interval and then triggers a rollout restart on the affected deployments. Not fully automated yet but reliable.&lt;/p&gt;
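
&lt;p&gt;For reference, the runbook is simple enough to sketch. Rotation polling is opt-in on the AKS addon, so the one-time enable step is included; the deployment and secret names are placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# One time: turn on rotation polling for the addon (off by default).
az aks addon update \
  --name my-cluster \
  --resource-group my-rg \
  --addon azure-keyvault-secrets-provider \
  --enable-secret-rotation \
  --rotation-poll-interval 2m

# Per rotation: set the new value, wait out the sync interval,
# then restart consumers so environment variables pick it up.
az keyvault secret set \
  --vault-name my-keyvault \
  --name db-connection-string \
  --value "$NEW_VALUE"

sleep 150  # poll interval plus slack

kubectl rollout restart deployment/my-app --namespace my-namespace
kubectl rollout status deployment/my-app --namespace my-namespace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;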

&lt;p&gt;&lt;strong&gt;Pods fail to start if Key Vault is unreachable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one caught us in staging. The CSI driver mounts the secret as a volume at pod startup. If Key Vault is unreachable when the pod starts the mount fails and the pod does not start at all. This is different from the old behaviour where the Kubernetes secret was already in the cluster and pod startup had no external dependency.&lt;/p&gt;

&lt;p&gt;During a brief Key Vault connectivity issue we had pods failing to restart after a node recycling event. The pods that were already running were fine. Any pod that needed to start fresh was stuck.&lt;/p&gt;

&lt;p&gt;The mitigation is to make sure your AKS clusters access Key Vault over a Private Endpoint rather than the public endpoint so the network path is more reliable and doesn't traverse the public internet. We had meant to do this from the start but hadn't gotten to it. This incident moved it up the priority list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access policy gaps we found late&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you grant a Managed Identity access to Key Vault, the classic access policy model is vault-wide: an identity with get and list on secrets can read every secret in that vault. Scoping access per workload means either the RBAC permission model, where role assignments can be scoped down to individual secrets, or separate vaults per workload. We initially granted broad vault-wide access to move fast. The auditor came back and asked us to scope permissions per workload. Going back and tightening that across all our Key Vault instances without breaking running workloads was tedious. We should have done it right the first time.&lt;/p&gt;

&lt;p&gt;The lesson is to treat Key Vault access policies like you treat Kubernetes RBAC. Least privilege from day one is much easier than retrofitting it later.&lt;/p&gt;

&lt;h2&gt;What the Audit Outcome Looked Like&lt;/h2&gt;

&lt;p&gt;Four weeks after the finding we had all secrets migrated to Key Vault, Workload Identity configured on all clusters, Private Endpoints in place for Key Vault access and audit logging enabled in Azure Monitor so we could show exactly which identity accessed which secret and when. The auditor closed the finding.&lt;/p&gt;

&lt;p&gt;The audit log piece turned out to be more useful than expected. We surfaced a service account that was pulling a secret it had no business accessing because a developer had copy-pasted a service account name from another namespace. We wouldn't have caught that without the logs.&lt;/p&gt;

&lt;h2&gt;What I'd Tell Someone Starting This Today&lt;/h2&gt;

&lt;p&gt;Don't wait for an audit. Set this up before you have secrets in Kubernetes at all. It takes a day to configure properly on a fresh cluster and it saves you the pain of migrating running workloads later.&lt;/p&gt;

&lt;p&gt;Enable Private Endpoints for Key Vault before you go to production. The public endpoint works but any network dependency at pod startup is a reliability risk you don't need.&lt;/p&gt;

&lt;p&gt;Scope access policies to specific secrets per workload from the beginning. The wildcard shortcut costs you later.&lt;/p&gt;

&lt;p&gt;Set up alerts on Key Vault diagnostic logs for denied access attempts. It's the fastest way to catch misconfigured identities and the occasional developer testing something they shouldn't be.&lt;/p&gt;
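
&lt;p&gt;The prerequisite for those alerts is getting the AuditEvent logs flowing into a Log Analytics workspace in the first place. A minimal sketch, with the resource IDs as placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Ship Key Vault audit logs to Log Analytics. IDs are placeholders.
az monitor diagnostic-settings create \
  --name kv-audit \
  --resource &lt;key-vault-resource-id&gt; \
  --workspace &lt;log-analytics-workspace-id&gt; \
  --logs '[{"category": "AuditEvent", "enabled": true}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;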

&lt;p&gt;And document your rotation procedure before you need it. The worst time to figure out how rotation works end-to-end across your clusters is when you're rotating a secret because it leaked.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>How We Set Up One Private Container Registry for 6 AKS Clusters Across 3 Regions and What Broke Along the Way</title>
      <dc:creator>Alex</dc:creator>
      <pubDate>Thu, 16 Apr 2026 03:52:30 +0000</pubDate>
      <link>https://forem.com/alex_coder19283/how-we-set-up-one-private-container-registry-for-6-aks-clusters-across-3-regions-and-what-broke-5h1j</link>
      <guid>https://forem.com/alex_coder19283/how-we-set-up-one-private-container-registry-for-6-aks-clusters-across-3-regions-and-what-broke-5h1j</guid>
      <description>&lt;p&gt;When our team started expanding from a single AKS cluster in East US to clusters across West Europe and Southeast Asia, the first thing we assumed was that container image management would be the easy part. It wasn't.&lt;/p&gt;

&lt;p&gt;This post walks through how we architected a single private container registry accessible by all six of our AKS clusters across three Azure regions. I'll cover what worked, what silently failed for weeks before we noticed, and the decisions I'd make differently today.&lt;/p&gt;

&lt;h2&gt;The Setup We Started With&lt;/h2&gt;

&lt;p&gt;In the beginning we had one AKS cluster and one Azure Container Registry sitting in the same region. The CI/CD pipeline would build an image, push it to ACR, and AKS would pull it. Simple.&lt;/p&gt;

&lt;p&gt;Then we added a second cluster in West Europe for latency reasons, and a third in Southeast Asia for compliance reasons. Suddenly we had a problem that sounds simple but has a lot of moving parts: how does every cluster pull images securely, without us managing a pile of credentials and without paying a fortune in cross-region egress fees?&lt;/p&gt;

&lt;h2&gt;What We Evaluated&lt;/h2&gt;

&lt;p&gt;We looked at three approaches before landing on our final architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 1: One central ACR, all clusters pull from it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The simplest option and the one we tried first. Every cluster pointed at the same ACR endpoint regardless of where it was running. This worked fine in testing. In production it caused two problems. First, pull latency during cluster upgrades was noticeable, since every node was pulling large images over a long network path. Second, when we had a brief connectivity blip in East US one night, clusters in other regions couldn't pull images and pod restarts started failing. A registry in one region had become a single point of failure for the entire platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Separate ACR per region&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We considered running an independent ACR in each region and pushing images to all three from the pipeline. This solved the latency and availability problems but created a worse one. Now our pipeline had to push to three registries on every build. Image promotion between environments became a mess. And keeping digests consistent across registries turned out to be harder than expected, since separate push operations under load sometimes produced manifests with slightly different metadata, and therefore different digests, depending on timing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 3: ACR Geo-Replication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is what we landed on. ACR's Premium SKU supports geo-replication where you maintain a single registry logically but Azure replicates your images automatically to replicas in whichever regions you choose. You push once and every regional replica gets the image. Clusters in each region pull from their nearest replica automatically with no changes to image references in your manifests.&lt;/p&gt;

&lt;h2&gt;The Architecture We Run Today&lt;/h2&gt;

&lt;p&gt;Here is the high level picture.&lt;/p&gt;

&lt;p&gt;Our CI/CD pipeline in Azure DevOps builds the image and pushes to a single ACR endpoint in East US. ACR handles replication to West Europe and Southeast Asia replicas in the background. Each AKS cluster is configured to authenticate using Managed Identity so there are no image pull secrets to rotate or manage. Azure handles the auth handshake between AKS and ACR natively.&lt;/p&gt;

&lt;p&gt;The command to wire up a cluster to ACR is straightforward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az aks update &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; my-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; my-rg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attach-acr&lt;/span&gt; my-registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one command grants the cluster's kubelet managed identity the AcrPull role on the registry. Do this for each cluster and you're done with credentials.&lt;/p&gt;
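
&lt;p&gt;With six clusters that's worth scripting once. A trivial sketch, with placeholder cluster and resource group names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Attach the registry to every cluster. Names are placeholders.
for cluster in eastus-1 eastus-2 weu-1 weu-2 sea-1 sea-2; do
  az aks update \
    --name "$cluster" \
    --resource-group "rg-$cluster" \
    --attach-acr my-registry
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;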

&lt;p&gt;For geo-replication you add replicas through the portal or CLI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az acr replication create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--registry&lt;/span&gt; my-registry &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; westeurope

az acr replication create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--registry&lt;/span&gt; my-registry &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; southeastasia
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this any image you push to the registry replicates to both locations within a few minutes depending on image size.&lt;/p&gt;
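
&lt;p&gt;You can confirm the replicas and their state at any point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az acr replication list \
  --registry my-registry \
  --output table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;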

&lt;h2&gt;What Broke and When&lt;/h2&gt;

&lt;p&gt;I want to be honest about the parts that didn't go smoothly because these are the things no architecture diagram ever shows you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication lag during fast rollouts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our first incident with this setup was subtle. We pushed a new image and triggered a rollout within seconds of the push completing. The East US cluster picked up the new image fine since it was pulling from the local replica. But the West Europe cluster tried to pull before replication had completed and got an image not found error. Pods went into ImagePullBackOff and we spent 20 minutes confused about why the same rollout was failing in one region and succeeding in another.&lt;/p&gt;

&lt;p&gt;The fix was to add a replication wait check in our pipeline before triggering rollouts across all regions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az acr replication show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; westeurope &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--registry&lt;/span&gt; my-registry &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"status.displayStatus"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; tsv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We poll this until it returns "Synced" before allowing the pipeline to proceed to the rollout stage. Simple but not something you'd think to add until you've been burned.&lt;/p&gt;
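
&lt;p&gt;The wait step itself is just a loop around that command. A sketch of the shape of ours, minus the pipeline plumbing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Poll the replica status with a bounded retry before rolling out.
for attempt in $(seq 1 30); do
  status=$(az acr replication show \
    --name westeurope \
    --registry my-registry \
    --query "status.displayStatus" \
    --output tsv)
  if [ "$status" = "Synced" ]; then
    break
  fi
  echo "replication status: ${status}, waiting..."
  sleep 10
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;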

&lt;p&gt;&lt;strong&gt;Digest pinning across replicas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We use digest pinning in production so that what gets deployed is exactly what got scanned and approved. The assumption was that the same image pushed to ACR would have the same digest everywhere. That assumption held in our case but it is worth explicitly verifying in your setup because some replication tools and registry configurations can rewrite manifest metadata in ways that change the digest. If your digest changes between regions your GitOps tooling will treat them as different images and you'll have a very confusing debugging session ahead of you.&lt;/p&gt;
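
&lt;p&gt;The cheap way to verify is to resolve the tag to its digest at promotion time and deploy by digest everywhere. A sketch, with placeholder repository and deployment names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Resolve the tag to its digest once, then deploy by digest.
DIGEST=$(az acr repository show \
  --name my-registry \
  --image my-app:1.2.3 \
  --query "digest" \
  --output tsv)

kubectl set image deployment/my-app \
  my-app="my-registry.azurecr.io/my-app@${DIGEST}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;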

&lt;p&gt;&lt;strong&gt;Node pool scaling pulling a lot at once&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cluster autoscaler adding multiple nodes simultaneously means many nodes pulling the same large image at the same time. During a traffic spike we scaled out 12 nodes in Southeast Asia within two minutes. Each node started pulling a 2.4GB image independently. The registry replica handled it but we saw throttling warnings in our ACR metrics that we hadn't seen before. We addressed this by implementing image pre-pulling using DaemonSets for our heaviest images and by tuning the parallel image pull settings in the kubelet config.&lt;/p&gt;
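
&lt;p&gt;The pre-pull trick is a plain DaemonSet whose only job is to warm every node's image cache ahead of demand. A minimal sketch with a placeholder image; it assumes the image has a shell to run as a no-op:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-heavy-image
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: prepull-heavy-image
  template:
    metadata:
      labels:
        app: prepull-heavy-image
    spec:
      # The init container forces the pull onto every node.
      initContainers:
        - name: prepull
          image: my-registry.azurecr.io/my-heavy-app:1.2.3
          command: ["sh", "-c", "true"]
      # pause keeps the pod alive at near-zero cost so the image
      # stays referenced in the node's cache.
      containers:
        - name: pause
          image: mcr.microsoft.com/oss/kubernetes/pause:3.9
          resources:
            requests:
              cpu: 1m
              memory: 8Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;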

&lt;p&gt;&lt;strong&gt;Egress costs were not zero&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even with geo-replication you pay for replication traffic between regions. For small teams with small images this is negligible. For us with images averaging around 800MB across a dozen services the monthly replication cost was noticeable. It didn't break the budget but it was something we hadn't accounted for in our initial cost estimates. Factor this in before you go to your engineering manager with a cost projection.&lt;/p&gt;

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;If I were starting this from scratch today here's what I'd change.&lt;/p&gt;

&lt;p&gt;First I'd instrument the registry from day one. ACR exposes metrics for pull latency, error rates and replication lag. We didn't set up alerts on these until after our first incident. Add them before you need them.&lt;/p&gt;

&lt;p&gt;Second I'd enforce digest pinning in production from the start using something like Kyverno. We retrofitted this policy later and it was painful to roll out across six clusters without disrupting running workloads.&lt;/p&gt;
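
&lt;p&gt;If you reach for Kyverno, the digest rule itself is short. A sketch, not our exact production policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-digest
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-digest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must be pinned by digest, not tag."
        pattern:
          spec:
            containers:
              # Every container image must carry a sha256 digest.
              - image: "*@sha256:*"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;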

&lt;p&gt;Third I'd test regional failover before an actual outage forces you to. Simulate a replica going dark and confirm your clusters handle it gracefully. We did this six months in and found an edge case where one of our clusters had a stale DNS cache that caused it to keep trying the failed replica instead of failing over. Finding that in a drill at 2pm is much better than finding it during an actual incident at 2am.&lt;/p&gt;

&lt;h2&gt;The Takeaway&lt;/h2&gt;

&lt;p&gt;ACR geo-replication with Managed Identity auth is a genuinely good solution for multi-region AKS setups. The operational overhead is low compared to running your own Harbor instances or managing push-to-multiple-registries pipelines. But like everything in distributed systems, the edge cases live in the timing and the assumptions.&lt;/p&gt;

&lt;p&gt;Watch your replication lag. Pin your digests. Alert on your registry metrics. And test your failover before production tests it for you.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
