<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mahra Rahimi</title>
    <description>The latest articles on Forem by Mahra Rahimi (@mahrrah).</description>
    <link>https://forem.com/mahrrah</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1006150%2F8676752b-d99f-4978-bad0-1139466f05ef.jpg</url>
      <title>Forem: Mahra Rahimi</title>
      <link>https://forem.com/mahrrah</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mahrrah"/>
    <language>en</language>
    <item>
      <title>How to Monitor the Length of Your Individual Azure Storage Queues</title>
      <dc:creator>Mahra Rahimi</dc:creator>
      <pubDate>Mon, 27 Jan 2025 13:21:47 +0000</pubDate>
      <link>https://forem.com/mahrrah/how-to-monitor-the-length-of-your-individual-azure-storage-queues-204n</link>
      <guid>https://forem.com/mahrrah/how-to-monitor-the-length-of-your-individual-azure-storage-queues-204n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;  Azure Storage Queues lack built-in metrics for individual queue lengths. However, you can use the Azure SDK to query &lt;code&gt;approximate_message_count&lt;/code&gt; and track each queue's length. Emit this data as custom metrics using OpenTelemetry. A sample project is available to automate this process with Azure Functions for reliable, scalable monitoring.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're using &lt;a href="https://learn.microsoft.com/en-us/azure/storage/queues/storage-queues-introduction" rel="noopener noreferrer"&gt;Azure Storage Queues&lt;/a&gt; and need (or simply want) to monitor the length of each queue individually, I have some bad news. 😫&lt;/p&gt;

&lt;p&gt;Azure only provides metrics for the total message count across the entire Storage Account via its &lt;a href="https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-storage-storageaccounts-queueservices-metrics" rel="noopener noreferrer"&gt;built-in metrics&lt;/a&gt; feature. Unfortunately, this makes those built-in metrics less useful if you need to track message counts for individual queues.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzkievv86cpm31sztg5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzkievv86cpm31sztg5x.png" alt="In-Build Queue Metrics" width="800" height="703"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The example above shows the built-in metrics. There are two queues at any given time, but we cannot tell how many messages are in each individual queue. The filter functionality is disabled, and there is no dedicated metric for per-queue message count, as can be seen below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3eik55zu01lhqa8q7rbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3eik55zu01lhqa8q7rbs.png" alt="In-Build Queue Metrics Types" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does monitoring individual queue lengths matter?
&lt;/h3&gt;

&lt;p&gt;Monitoring individual queue lengths can be important for several reasons. For instance, if you're managing multiple queues, you may want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Track a poison message queue&lt;/strong&gt; to avoid disruptions in your system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor the pressure&lt;/strong&gt; on specific queues to ensure they are processing messages efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manage scaling decisions&lt;/strong&gt; by watching how queues grow under different loads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you're debugging or scaling, knowing the message count for each queue helps keep your system healthy.&lt;/p&gt;

&lt;h3&gt;
  
  
  The good news 😊
&lt;/h3&gt;

&lt;p&gt;While Azure doesn’t provide this feature out of the box, there’s an easy workaround, which this blog will walk you through.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Your Metrics
&lt;/h2&gt;

&lt;p&gt;As mentioned, Azure does not provide individual Storage Queue lengths as a built-in metric. Given that people have been asking for this feature for the past five years, it's likely not a simple task for Microsoft to implement this as a standard metric. Therefore, finding a workaround might be your best option.&lt;/p&gt;

&lt;p&gt;Naturally, this leads to the question: &lt;em&gt;If standard metrics don’t provide this, is there another way to get it?&lt;/em&gt; 🤔&lt;/p&gt;

&lt;p&gt;A closer look at the &lt;a href="https://learn.microsoft.com/en-us/python/api/overview/azure/storage?view=azure-python" rel="noopener noreferrer"&gt;Azure Storage Account SDK&lt;/a&gt; reveals the &lt;a href="https://learn.microsoft.com/en-us/python/api/azure-storage-queue/azure.storage.queue.queueproperties?view=azure-python" rel="noopener noreferrer"&gt;&lt;code&gt;queue.properties&lt;/code&gt;&lt;/a&gt; attribute &lt;a href="https://learn.microsoft.com/en-us/python/api/azure-storage-queue/azure.storage.queue.queueproperties?view=azure-python#azure-storage-queue-queueproperties-approximate-message-count" rel="noopener noreferrer"&gt;&lt;code&gt;approximate_message_count&lt;/code&gt;&lt;/a&gt;, which gives you access to the information you need—just via a different method.&lt;/p&gt;

&lt;p&gt;Knowing this, wouldn’t it be great if you could use this data to track queue lengths as a metric?&lt;/p&gt;

&lt;h3&gt;
  
  
  Here’s a thought: What if you just do that? 🧠
&lt;/h3&gt;

&lt;p&gt;You can query the length of each queue, create metric gauges, and update their values on a regular basis.&lt;/p&gt;

&lt;p&gt;Let’s break it down step by step.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Get Queue Length
&lt;/h2&gt;

&lt;p&gt;Using the Python SDK, you can easily retrieve the individual length of a queue. See the snippet below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.identity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DefaultAzureCredential&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.storage.queue&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QueueClient&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;STORAGE_ACCOUNT_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;storage-account-url&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;QUEUE_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;queue-name&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;STORAGE_ACCOUNT_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;STORAGE_ACCOUNT_KEY&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nc"&gt;DefaultAzureCredential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QueueClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;STORAGE_ACCOUNT_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;queue_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;QUEUE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;credential&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_queue_properties&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;message_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;approximate_message_count&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the SDK is built on top of the REST API, similar functionality is available across other SDKs. Here are references for the REST API and SDKs in other languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/rest/api/storageservices/get-queue-metadata#response-headers" rel="noopener noreferrer"&gt;REST API - &lt;code&gt;x-ms-approximate-messages-count: int-value&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/api/azure.storage.queues.models.queueproperties.approximatemessagescount?view=azure-dotnet#azure-storage-queues-models-queueproperties-approximatemessagescount" rel="noopener noreferrer"&gt;.NET - &lt;code&gt;ApproximateMessagesCount&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/java/api/com.azure.storage.queue.models.queueproperties?view=azure-java-stable#com-azure-storage-queue-models-queueproperties-getapproximatemessagescount()" rel="noopener noreferrer"&gt;Java - &lt;code&gt;getApproximateMessagesCount()&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
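&lt;p&gt;Putting this together for every queue in the account, here is a minimal sketch. The method names (&lt;code&gt;list_queues&lt;/code&gt;, &lt;code&gt;get_queue_client&lt;/code&gt;, &lt;code&gt;get_queue_properties&lt;/code&gt;) follow the Python SDK's &lt;code&gt;QueueServiceClient&lt;/code&gt;, but the function is deliberately duck-typed, so any stand-in object with the same methods works too:&lt;/p&gt;

```python
def collect_queue_lengths(service_client):
    """Return {queue_name: approximate_message_count} for every queue.

    `service_client` is assumed to behave like the SDK's QueueServiceClient:
    `list_queues()` yields items with a `name` attribute, and
    `get_queue_client(name)` returns a client whose `get_queue_properties()`
    exposes `approximate_message_count`.
    """
    lengths = {}
    for queue in service_client.list_queues():
        queue_client = service_client.get_queue_client(queue.name)
        properties = queue_client.get_queue_properties()
        lengths[queue.name] = properties.approximate_message_count
    return lengths
```

&lt;p&gt;Because the counts are approximate and change as messages flow, treat the result as a monitoring signal, not an exact inventory.&lt;/p&gt;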

&lt;h2&gt;
  
  
  2. Create a Gauge and Emit Metrics
&lt;/h2&gt;

&lt;p&gt;Next, you create a gauge metric to track the queue length.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A &lt;a href="https://prometheus.io/docs/concepts/metric_types/#gauge" rel="noopener noreferrer"&gt;&lt;strong&gt;gauge&lt;/strong&gt;&lt;/a&gt; is a metric type that measures a value at a particular point in time, making it perfect for tracking queue lengths, which fluctuate constantly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For this, we’ll use &lt;a href="https://opentelemetry.io/docs/what-is-opentelemetry/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenTelemetry&lt;/strong&gt;&lt;/a&gt;, an open-source observability framework gaining popularity for its versatility in collecting metrics, traces, and logs.&lt;br&gt;
Below is an example of how to emit the queue length as a gauge using OpenTelemetry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Meter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_meter_provider&lt;/span&gt;

&lt;span class="n"&gt;meter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_meter_provider&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_meter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;METER_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;gauge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gauge_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gauge_description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;new_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="err"&gt;⋮&lt;/span&gt; &lt;span class="c1"&gt;# Code to get approximate_message_count and set new_length to it
&lt;/span&gt;
&lt;span class="n"&gt;gauge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another advantage of OpenTelemetry is that it integrates extremely well with various observability tools such as Prometheus, Azure Application Insights, Grafana, and more.&lt;/p&gt;
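&lt;p&gt;When monitoring several queues, rather than creating one gauge per queue, you can emit a single gauge and distinguish queues via attributes. A minimal sketch; the recorder only assumes a gauge-like object with a &lt;code&gt;set(value, attributes=...)&lt;/code&gt; method, which matches the OpenTelemetry Python gauge API:&lt;/p&gt;

```python
def record_queue_lengths(gauge, lengths):
    """Emit one data point per queue on a shared gauge.

    `gauge` is assumed to expose `set(value, attributes=...)`, as
    OpenTelemetry gauges do; `lengths` maps queue names to counts.
    """
    for queue_name, count in lengths.items():
        # The attribute lets backends filter/group per queue.
        gauge.set(count, attributes={"queue.name": queue_name})
```

&lt;p&gt;Using an attribute instead of a gauge per queue keeps the metric namespace stable even as queues come and go.&lt;/p&gt;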

&lt;h2&gt;
  
  
  3. Make It Production Ready
&lt;/h2&gt;

&lt;p&gt;While the above approach is great for experimentation, you’ll likely need a more robust solution for a production environment. That’s where resilience and scalability come into play.&lt;/p&gt;

&lt;p&gt;In production, continuously monitoring queues isn’t just about pulling metrics. You need to ensure the system is reliable, scales with demand, and handles potential failures (such as network issues or large volumes of data). For example, you wouldn’t want a failed query to halt your monitoring process.&lt;/p&gt;
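&lt;p&gt;One simple building block for that resilience is isolating failures per queue, so a single failing lookup is logged rather than halting the whole poll. A hedged sketch; the client objects are duck-typed stand-ins for the SDK queue clients used earlier:&lt;/p&gt;

```python
import logging

logger = logging.getLogger(__name__)

def poll_queues_resiliently(queue_clients):
    """Return {name: count} for the queues that could be read,
    logging (rather than raising) failures for the rest.

    `queue_clients` maps queue names to objects exposing
    `get_queue_properties()` with an `approximate_message_count`.
    """
    lengths = {}
    for name, client in queue_clients.items():
        try:
            props = client.get_queue_properties()
            lengths[name] = props.approximate_message_count
        except Exception:
            # One broken queue must not stop the others from being polled.
            logger.exception("Failed to read length of queue %s", name)
    return lengths
```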

&lt;p&gt;If you're interested in seeing how this can be made production-ready, I’ve created a sample project: &lt;a href="https://github.com/MahrRah/azure-storage-queue-monitor" rel="noopener noreferrer"&gt;azure-storage-queue-monitor&lt;/a&gt;. This project wraps everything we’ve discussed into an &lt;a href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview?pivots=programming-language-csharp" rel="noopener noreferrer"&gt;Azure Function&lt;/a&gt; that runs on a timer trigger. It handles resilience, concurrency, and scales with your queues, ensuring you can monitor them reliably over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Now that you have the steps to track individual queue lengths and emit them as custom metrics, you can set this up for your own environment. If you give this a try, feel free to share your experience or improvements—I'd love to hear your thoughts and help if you encounter any issues!&lt;/p&gt;

&lt;p&gt;Happy queue monitoring! 🎉&lt;/p&gt;

</description>
      <category>azurefunctions</category>
      <category>tutorial</category>
      <category>azure</category>
      <category>python</category>
    </item>
    <item>
      <title>How to use Azure VM metadata service to automate post-provisioning metadata configuration in your IaC for VMSS</title>
      <dc:creator>Mahra Rahimi</dc:creator>
      <pubDate>Thu, 10 Aug 2023 06:57:50 +0000</pubDate>
      <link>https://forem.com/mahrrah/how-to-use-azure-vm-metadata-service-to-automate-post-provisioning-metadata-configuration-in-your-iac-for-vmss-32g9</link>
      <guid>https://forem.com/mahrrah/how-to-use-azure-vm-metadata-service-to-automate-post-provisioning-metadata-configuration-in-your-iac-for-vmss-32g9</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR: How to use &lt;code&gt;cloud-init&lt;/code&gt; for Linux VMs and &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-windows" rel="noopener noreferrer"&gt;Azure Custom Script Extension&lt;/a&gt; for Windows VMs to create a .env file on the VM containing VM metadata from &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=windows" rel="noopener noreferrer"&gt;Azure VM metadata service&lt;/a&gt; when using Azure VM Scale Sets&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When using &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/" rel="noopener noreferrer"&gt;Virtual Machines&lt;/a&gt; or &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview" rel="noopener noreferrer"&gt;Virtual Machine Scale Sets&lt;/a&gt; on Azure, it often becomes extremely useful to have certain VM metadata accessible to your applications. This type of metadata (like ID, name, private IP, etc.) is normally generated at provisioning time, and an automated way for applications to access it comes in handy.&lt;/p&gt;

&lt;p&gt;Azure provides an amazing service called the &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=windows" rel="noopener noreferrer"&gt;Azure VM metadata service&lt;/a&gt;, which can be accessed from within a VM to retrieve all VM-specific information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; Metadata:true &lt;span class="nt"&gt;--noproxy&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt; &lt;span class="s2"&gt;"http://169.254.169.254/metadata/instance?api-version=2021-02-01"&lt;/span&gt; | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this command is useful, integrating it into your Infrastructure as Code (IaC) can automate the process and ensure scalability.&lt;/p&gt;

&lt;p&gt;In this blog, we'll explore how to package the VM metadata service call into a script, store the metadata in a file, and incorporate this process into both Windows and Linux VMs in a VMSS setup. &lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a Generalized Metadata Retrieval Script
&lt;/h2&gt;

&lt;p&gt;When looking at the VM metadata service endpoint from Azure, everything other than the IP appears to be generic. A closer reading of the Azure documentation, however, reveals that this "magic" IP is the same for &lt;strong&gt;all&lt;/strong&gt; VMs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Azure's instance metadata service is a RESTful endpoint available to all IaaS VMs created via the new Azure Resource Manager. [..] The [VM metadata service] endpoint is available at a well-known non-routable IP address (169.254.169.254) that can be accessed only from within the VM."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This allows us to easily package the call up in a script and output the metadata in our needed format. For the sake of this blog, we will simply create a file that will contain the information we need.&lt;/p&gt;

&lt;p&gt;Let's proceed with the implementation details for both Windows and Linux VMs. The full code can be found &lt;a href="https://github.com/MahrRah/vmss-vm-metatdata-retrival-sample" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
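&lt;p&gt;Independent of the OS-specific scripts below, the formatting half of the job is the same everywhere: pick fields out of the parsed IMDS &lt;code&gt;compute&lt;/code&gt; object and render them as &lt;code&gt;.env&lt;/code&gt; lines. A small sketch of that step; the helper name, the &lt;code&gt;VM_&lt;/code&gt; prefix, and the default key selection are illustrative choices, not part of the metadata service itself:&lt;/p&gt;

```python
def compute_metadata_to_env(compute, keys=("name", "vmId", "location")):
    """Render selected fields of the IMDS `compute` object as .env lines,
    e.g. VM_NAME=..., VM_VMID=..., VM_LOCATION=...

    `compute` is the parsed `compute` section of the instance metadata
    document (a dict); missing keys are simply skipped.
    """
    lines = []
    for key in keys:
        if key in compute:
            lines.append(f"VM_{key.upper()}={compute[key]}")
    return "\n".join(lines)
```

&lt;p&gt;Feeding this the JSON returned by the &lt;code&gt;curl&lt;/code&gt; call shown earlier would yield the same kind of file the scripts below produce.&lt;/p&gt;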

&lt;h3&gt;
  
  
  Windows VMs: Utilizing Azure Custom Script Extension
&lt;/h3&gt;

&lt;p&gt;For Windows VMs, the &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-windows" rel="noopener noreferrer"&gt;Azure Custom Script Extension&lt;/a&gt; is a powerful tool to execute post-provisioning scripts. Within the script, we can use the VM metadata service to retrieve the VM name and store it in a file under &lt;code&gt;C:\&lt;/code&gt; called &lt;code&gt;vm-metadata.env&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# vm-metadata.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$vmName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Invoke-RestMethod&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Headers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@{&lt;/span&gt;&lt;span class="s2"&gt;"Metadata"&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Method&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;GET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Uri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://169.254.169.254/metadata/instance/compute/name?api-version=2021-02-01&amp;amp;format=text"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"VM_NAME=&lt;/span&gt;&lt;span class="nv"&gt;$vmName&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Out-File&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FilePath&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;C:\vm-metadata.env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Append&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the IaC definition, the above script can be passed either via an Azure storage account or from GitHub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource vmss 'Microsoft.Compute/virtualMachineScaleSets@2022-03-01' = {
  name: vmssName
  location: location
  ...
  properties: {
    singlePlacementGroup: null
    platformFaultDomainCount: 1
    virtualMachineProfile: {
      extensionProfile: {
        extensions: [ {
            name: 'CustomScriptExtension'
            properties: {
              publisher: 'Microsoft.Compute'
              type: 'CustomScriptExtension'
              typeHandlerVersion: '1.10'
              settings: {
                commandToExecute: 'powershell -ExecutionPolicy Unrestricted -File vm-metadata.ps1'
                fileUris: [ '&amp;lt;link-to-file&amp;gt;' ]
              }
            }
          } ]
      }
    }
    ...
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Linux VMs: Harnessing cloud-init
&lt;/h3&gt;

&lt;p&gt;For Linux VMs, leveraging the native &lt;a href="https://cloudinit.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;&lt;code&gt;cloud-init&lt;/code&gt;&lt;/a&gt; tool simplifies the process.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: We could, however, also use the same &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-windows" rel="noopener noreferrer"&gt;Azure Custom Script Extension&lt;/a&gt; as we did for Windows here. Check out the docs for that &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-linux" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Amongst many other things, the &lt;code&gt;cloud-init&lt;/code&gt; definition allows you to specify one or more commands in the &lt;code&gt;runcmd&lt;/code&gt; section, which should run after the initial startup. Just like for the PowerShell script, the VM metadata is called and the extracted VM name is stored in the &lt;code&gt;vm-metadata.env&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#cloud-config&lt;/span&gt;
&lt;span class="na"&gt;runcmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt;  &lt;span class="s"&gt;vmName=$(curl -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance/compute/name?api-version=2021-02-01&amp;amp;format=text") &amp;amp;&amp;amp; echo "VM_NAME=${vmName}" &amp;gt;&amp;gt; vm-metadata.env&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Similar to regular VMs, the VMSS allows you to set the &lt;code&gt;customData&lt;/code&gt; property when defining your OS profile. It behaves the same way as it does for a VM deployment with &lt;code&gt;cloud-init&lt;/code&gt;, expecting the file to be passed as a base64-encoded string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;param cloudInitScript string = loadFileAsBase64('./cloud-init.yaml')

...

resource vmss 'Microsoft.Compute/virtualMachineScaleSets@2022-03-01' = {
  name: '${prefix}-vmss'
  location: location
  dependsOn: [
    vmssLB
    vmssNSG
  ]
  sku: {
    name: 'Standard_DS1_v2'
    capacity: 1
  }
  properties: {
    singlePlacementGroup: null
    platformFaultDomainCount: 1
    virtualMachineProfile: {
      osProfile: {
        computerNamePrefix: 'vmss'
        adminUsername: 'azureuser'
        adminPassword: adminPassword
        customData: cloudInitScript
      }
      ...

    }
    ...
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
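&lt;p&gt;Bicep's &lt;code&gt;loadFileAsBase64&lt;/code&gt; does the encoding for you; if you ever need to produce the same &lt;code&gt;customData&lt;/code&gt; value outside of Bicep (say, from a deployment script), it is just the standard Base64 encoding of the file's bytes. A minimal sketch:&lt;/p&gt;

```python
import base64

def to_custom_data(cloud_init_text):
    """Base64-encode a cloud-init document for the VMSS customData
    property (the equivalent of Bicep's loadFileAsBase64)."""
    return base64.b64encode(cloud_init_text.encode("utf-8")).decode("ascii")
```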



&lt;p&gt;And with that, you know how to automatically retrieve VM metadata values for your applications from a VM in your VMSS pool :)&lt;/p&gt;

</description>
      <category>azure</category>
      <category>cloudcomputing</category>
      <category>vmss</category>
      <category>azureservices</category>
    </item>
    <item>
      <title>NVIDIA GPU Monitoring on Windows VMs: Tools and Techniques</title>
      <dc:creator>Mahra Rahimi</dc:creator>
      <pubDate>Thu, 10 Aug 2023 06:54:39 +0000</pubDate>
      <link>https://forem.com/mahrrah/nvidia-gpu-monitoring-on-windows-vms-tools-and-techniques-3257</link>
      <guid>https://forem.com/mahrrah/nvidia-gpu-monitoring-on-windows-vms-tools-and-techniques-3257</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; How to get NVIDIA GPU utilization on Windows VMs according to GPU mode. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the era of Machine Learning, OpenAI, and ChatGPT, GPUs have gained significant attention. Driven by the rapid growth of machine learning and rendering projects in various industries, GPUs' usage has become increasingly common, even extending beyond the realms of IT to fields like manufacturing and other non-IT sectors.&lt;/p&gt;

&lt;p&gt;However, it's important to note that unlike greenfield projects, most of these companies already possess preexisting IT ecosystems and infrastructures. When building upon such an ecosystem, the likelihood of encountering unconventional technology constellations increases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scenario
&lt;/h2&gt;

&lt;p&gt;One such scenario is NVIDIA GPU metrics retrieval in WDDM mode on Windows machines. While NVIDIA offers tools for Linux-based machines (for instance &lt;a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/latest/index.html" rel="noopener noreferrer"&gt;DCGM&lt;/a&gt;), there are fewer comprehensive tools available for Windows-based workloads. Furthermore, these tools might not adequately cover all required use cases simultaneously.&lt;/p&gt;

&lt;p&gt;In this blog, my aim is to guide you through various methods of accessing NVIDIA GPU adapter and process-level utilization on Windows VMs. Hopefully, this can be of assistance to someone out there :)&lt;/p&gt;

&lt;h2&gt;
  
  
  NVIDIA tools for GPU Utilization
&lt;/h2&gt;

&lt;p&gt;There are two main NVIDIA tools that offer access to GPU utilization: NVAPI and NVML.&lt;br&gt;
These tools differ in the level of granularity they offer for GPU load, and some are restricted to functioning in only one of the two GPU modes.&lt;/p&gt;

&lt;p&gt;Let's begin by examining the details you can extract from each tool, and in the following section, we will explore the distinctions between the GPU mode approaches.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NVAPI&lt;/code&gt;:&lt;br&gt;
&lt;a href="https://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/index.html" rel="noopener noreferrer"&gt;&lt;code&gt;NVAPI&lt;/code&gt; (NVIDIA API)&lt;/a&gt; is NVIDIA's SDK that gives direct access to the NVIDIA GPU and driver on Windows-based platforms. However, it exclusively provides access to GPU adapter-level utilization and does not offer process-level information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NVML&lt;/code&gt;:&lt;br&gt;
&lt;a href="https://developer.nvidia.com/nvidia-management-library-nvml" rel="noopener noreferrer"&gt;&lt;code&gt;NVML&lt;/code&gt; (NVIDIA Management Library)&lt;/a&gt;, on the other hand, is a C-based API designed to access various states of the GPU and is the same tool used by &lt;a href="https://developer.nvidia.com/nvidia-system-management-interface" rel="noopener noreferrer"&gt;&lt;code&gt;nvidia-smi&lt;/code&gt;&lt;/a&gt;. Unlike &lt;code&gt;NVAPI&lt;/code&gt;, &lt;code&gt;NVML&lt;/code&gt; allows access to both adapter and process level GPU utilization, making it a more comprehensive tool for monitoring and managing GPU performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  GPU Modes
&lt;/h3&gt;

&lt;p&gt;When dealing with NVIDIA GPUs, it's crucial to be aware of the various modes they can be set to based on your requirements: WDDM and TCC. As mentioned above, not all tools are designed to handle both modes. Therefore, the next section will introduce the different approaches that can be used depending on the GPU mode.&lt;/p&gt;
&lt;h2&gt;
  
  
  TCC Mode Tools
&lt;/h2&gt;

&lt;p&gt;The TCC Mode serves as the computation mode of GPUs, enabled when the CUDA drivers are installed. In this mode, you can easily access adapter and process level GPU utilization using the common &lt;code&gt;nvml.dll&lt;/code&gt; provided by NVIDIA. You can write your own wrapper or leverage existing wrapper libraries and samples available.&lt;br&gt;
Here is a small list of &lt;code&gt;nvml&lt;/code&gt; wrappers in various languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/jcbritobr/nvml-csharp" rel="noopener noreferrer"&gt;C# Wrapper Library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/henkelmax/nvmlj" rel="noopener noreferrer"&gt;Java Wrapper Library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/pynvml/" rel="noopener noreferrer"&gt;Python Wrapper Library&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
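&lt;p&gt;As an illustration with the Python wrapper, here is a hedged sketch using &lt;code&gt;pynvml&lt;/code&gt;. The calls mirror the NVML C API (&lt;code&gt;nvmlInit&lt;/code&gt;, &lt;code&gt;nvmlDeviceGetHandleByIndex&lt;/code&gt;, &lt;code&gt;nvmlDeviceGetUtilizationRates&lt;/code&gt;); the import is deferred into the function, since the package and an NVIDIA driver may not be present on every machine:&lt;/p&gt;

```python
def format_gpu_utilization(index, gpu_percent, mem_percent):
    """Render one adapter's utilization as a log line."""
    return f"GPU {index}: {gpu_percent}% compute, {mem_percent}% memory"

def print_gpu_utilization():
    """Query adapter-level utilization via NVML (TCC mode).

    Requires the third-party `pynvml` package and an NVIDIA driver;
    the function names mirror NVML's C API.
    """
    import pynvml  # deferred: only needed when actually querying a GPU

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            # UtilizationRates exposes .gpu and .memory as percentages.
            rates = pynvml.nvmlDeviceGetUtilizationRates(handle)
            print(format_gpu_utilization(i, rates.gpu, rates.memory))
    finally:
        pynvml.nvmlShutdown()
```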
&lt;h2&gt;
  
  
  WDDM Mode Tools
&lt;/h2&gt;

&lt;p&gt;On the other hand, the WDDM mode is primarily used for rendering work on GPUs and requires installing the GRID drivers. When operating in WDDM mode, process level metrics can no longer be accessed via the &lt;code&gt;nvml.dll&lt;/code&gt;. Instead, these metrics are now routed through the Windows Performance Counter, requiring a different approach to retrieve them.&lt;/p&gt;

&lt;p&gt;In the next section, we will delve into a small example of how to retrieve GPU load at both the process and overall levels when operating in WDDM mode. This will allow you to access the PerformanceCounter from your code and retrieve GPU memory utilization. We'll focus on the two categories: &lt;code&gt;GPU Process Memory&lt;/code&gt; and &lt;code&gt;GPU Adapter Memory&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: There are, however, many more categories. If you need to access a list of them, the PerformanceCounterCategory provides a static method to retrieve them all: &lt;code&gt;PerformanceCounterCategory.GetCategories()&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;
  
  
  Adapter level metrics
&lt;/h4&gt;

&lt;p&gt;As the name &lt;code&gt;GPU Adapter Memory&lt;/code&gt; suggests, this category contains a list of adapters and their load in bytes. The code snippet below demonstrates how to retrieve the load for each adapter and print it in a log line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using System.Diagnostics;

...

var category = new PerformanceCounterCategory("GPU Adapter Memory");
var adapters = category.GetInstanceNames();

foreach (var adapter in adapters)
{
    var counters = category.GetCounters(adapter);

    foreach (var counter in counters)
    {
        if (counter.CounterName == "Total Committed")
        {
            var value = counter.NextValue();
            Console.WriteLine($"GPU Memory load on adapter {adapter} is {value} bytes.");
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Process level metrics
&lt;/h4&gt;

&lt;p&gt;As before, the category name &lt;code&gt;GPU Process Memory&lt;/code&gt; indicates that it contains a list of processes and their GPU memory load in bytes.&lt;br&gt;
Again, the code snippet will simply print each process and its respective load as a demonstration. This code can be adapted to publish metrics for collection by other tools (e.g. &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;, &lt;a href="https://opentelemetry.io/docs/collector/" rel="noopener noreferrer"&gt;OpenTelemetry collector&lt;/a&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using System.Diagnostics;

...

var performanceCounterCategory = new PerformanceCounterCategory("GPU Process Memory");
var processes = performanceCounterCategory.GetInstanceNames();
foreach (var process in processes)
{
    var counters = performanceCounterCategory.GetCounters(process);
    var totalCommittedCounter = counters.FirstOrDefault(counter =&amp;gt; counter.CounterName == "Total Committed");
    if (totalCommittedCounter != null)
    {
        var value = totalCommittedCounter.NextValue();
        Console.WriteLine($"GPU Memory load of process {process} is {value} bytes.");
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This category offers a significant advantage over &lt;code&gt;GPU Adapter Memory&lt;/code&gt;, as it provides the ability to filter the 'total load' based on specific processes. This can be particularly helpful when you want to monitor the GPU memory load of specific applications or processes.&lt;/p&gt;

&lt;p&gt;For instance, let's say you have three particular processes of interest, and you want to focus on monitoring only their GPU memory load. In this scenario, utilizing the GPU Process Memory category and applying filters for your targeted processes becomes highly valuable. This enables you to extract precise insights into the GPU memory utilization of these specific applications, allowing for more accurate performance analysis and resource allocation.&lt;/p&gt;
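The filtering described above can be sketched as follows (plain Python for brevity; the process names and byte values are hypothetical, and mapping real counter instance names, which typically encode the process ID, to friendly process names is left out):

```python
def total_committed_for(samples, processes_of_interest):
    """Sum the 'Total Committed' bytes of only the processes we care about.

    `samples` maps a friendly process name to its committed GPU memory in bytes.
    """
    return sum(v for name, v in samples.items() if name in processes_of_interest)

# Hypothetical readings taken from the "GPU Process Memory" category:
samples = {"renderer": 512_000_000, "encoder": 256_000_000, "explorer": 64_000_000}
print(total_committed_for(samples, {"renderer", "encoder"}))  # 768000000
```

The same idea carries over directly to the C# snippet above: iterate the instance names, keep only the ones you target, and sum their counter values.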

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, as GPUs continue to be a cornerstone of modern computing, understanding the nuances of their management is crucial. While challenges may arise due to the different ecosystems, the tools and techniques mentioned above should give you a head start in effectively monitoring GPU resources for Windows-based workloads.&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>gpu</category>
      <category>observability</category>
      <category>windows</category>
    </item>
    <item>
      <title>Refactoring GitOps repository to support both real-time and reconciliation window changes</title>
      <dc:creator>Mahra Rahimi</dc:creator>
      <pubDate>Fri, 13 Jan 2023 09:36:49 +0000</pubDate>
      <link>https://forem.com/mahrrah/refactoring-gitops-repository-to-support-both-real-time-and-reconciliation-window-changes-2cc</link>
      <guid>https://forem.com/mahrrah/refactoring-gitops-repository-to-support-both-real-time-and-reconciliation-window-changes-2cc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Restructuring a GitOps repository to enable multiple reconciliation types, e.g. real-time and reconciliation window changes, with the approach described in the &lt;a href="https://dev.to/mahrrah/how-to-enable-reconciliation-windows-using-flux-and-k8s-native-components-2d4i"&gt;previous part&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For some scenarios allowing only updates to be applied during a reconciliation window is not enough.&lt;br&gt;
There are cases when some application resources should be managed in real time, but others are still only allowed to change during a reconciliation window.&lt;br&gt;
The example we use here is a &lt;code&gt;nginx&lt;/code&gt; deployment to the cluster, which contains a &lt;code&gt;Deployment&lt;/code&gt;, &lt;code&gt;Service&lt;/code&gt;, and a &lt;code&gt;ConfigMap&lt;/code&gt; manifest.&lt;br&gt;
The &lt;code&gt;ConfigMap&lt;/code&gt;, which defines the &lt;code&gt;nginx.conf&lt;/code&gt;, should be manageable in real time. However, the &lt;code&gt;Deployment&lt;/code&gt; and the &lt;code&gt;Service&lt;/code&gt; should only be changed within a reconciliation window.&lt;/p&gt;

&lt;p&gt;Hence, the problem statement changes slightly from the last part:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We want to enable two ways of applying changes to a cluster using Flux:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Real-time changes:&lt;/strong&gt; Representing the default behavior of Flux when it comes to reconciling changes.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;Reconciliation windows changes:&lt;/strong&gt; Predefined time windows in which a change can be applied to the resource by Flux.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can still use the core approach shown &lt;a href="https://dev.to/mahrrah/how-to-enable-reconciliation-windows-using-flux-and-k8s-native-components-2d4i"&gt;here&lt;/a&gt; to solve our new problem. However, we need to make some adjustments to how we organize our GitOps repository, to enable real-time as well as reconciliation window changes.&lt;/p&gt;

&lt;p&gt;Even though we are only demonstrating the restructuring of this GitOps repository with two reconciliation types, this approach can easily be extended to more. Just note that for each new type of reconciliation window, a corresponding set of CronJobs is needed to manage the new windows.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IMPORTANT:&lt;/strong&gt; If you haven't already read the &lt;a href="https://dev.to/mahrrah/how-to-enable-reconciliation-windows-using-flux-and-k8s-native-components-2d4i"&gt;first part&lt;/a&gt;, go back and do so, as we will use its approach on how to enable the reconciliation window in this blog.&lt;/li&gt;
&lt;li&gt;Intermediate knowledge of &lt;a href="https://fluxcd.io/flux/" rel="noopener noreferrer"&gt;Flux&lt;/a&gt;, &lt;a href="https://kustomize.io/" rel="noopener noreferrer"&gt;Kustomize&lt;/a&gt; and &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;K8s&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Core Principles
&lt;/h2&gt;

&lt;p&gt;Before we start restructuring the repository, it might be useful to understand why we have to do so in the first place.&lt;/p&gt;

&lt;p&gt;As covered in the previous blog, to be able to control the reconciliation cycle differently for a group of resources, these resources need to be managed by an independent &lt;code&gt;Kustomization&lt;/code&gt; resource.&lt;/p&gt;

&lt;p&gt;Because of this, the goal of the following sections is:&lt;br&gt;
"Restructure the GitOps repository such that its resources can be managed by one of the N &lt;code&gt;Kustomization&lt;/code&gt; resources we will create,&lt;br&gt;
where N is the number of schedules for applying changes."&lt;/p&gt;

&lt;p&gt;As in this blog we are only interested in real-time and reconciliation window changes, N is equal to 2.&lt;/p&gt;
&lt;h2&gt;
  
  
  Set up
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Set up your applications or components
&lt;/h3&gt;

&lt;p&gt;Let's start with the smallest unit of grouping we have in our GitOps repository: &lt;code&gt;apps&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Looking at the example in &lt;a href="https://github.com/MahrRah/flux-reconciliation-windows-sample/tree/main/Sample1" rel="noopener noreferrer"&gt;this sample&lt;/a&gt;, under &lt;code&gt;apps&lt;/code&gt; we have an &lt;code&gt;nginx&lt;/code&gt; folder, which contains the &lt;code&gt;Deployment&lt;/code&gt;, a &lt;code&gt;Service&lt;/code&gt;, and a &lt;code&gt;ConfigMap&lt;/code&gt; manifest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apps
└── nginx
    ├── kustomization.yaml
    ├── deployment.yaml
    ├── service.yaml
    └── configmap.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As mentioned, we now want to make sure we can change the &lt;code&gt;nginx&lt;/code&gt; server configuration, defined in the &lt;code&gt;configmap.yaml&lt;/code&gt;, in real time, while infrastructure changes such as the deployment and the service should only happen between Monday 8 am and Thursday 5 pm.&lt;/p&gt;

&lt;p&gt;To enable this, the first step is to make sure we can split resources that can be changed in real time from resources that can only change state during a reconciliation window, from &lt;a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/" rel="noopener noreferrer"&gt;&lt;code&gt;kustomize&lt;/code&gt;&lt;/a&gt;'s point of view.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you are not familiar with how &lt;code&gt;kustomize&lt;/code&gt; is used to manage resources check out the official doc from Kubernetes on this at &lt;a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/" rel="noopener noreferrer"&gt;Overview of Kustomize&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the ways we can achieve this is by splitting all the resources for each application we have defined under &lt;code&gt;apps/&lt;/code&gt; (see &lt;a href="https://fluxcd.io/flux/guides/repository-structure/#repository-structure" rel="noopener noreferrer"&gt;default GitOps folder structure for mono repos&lt;/a&gt;) into two versions. These versions' sole purpose is to package the resources to be either managed by the real-time or the reconciliation window &lt;code&gt;Kustomization&lt;/code&gt; resource.&lt;/p&gt;

&lt;p&gt;We can then split all manifest files into these two subfolders and add the respective suffixes to the subfolders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time changes: &lt;code&gt;-rt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Reconciliation windows changes: &lt;code&gt;-rw&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Original structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apps
└── nginx
    ├── kustomization.yaml
    ├── deployment.yaml
    ├── service.yaml
    └── configmap.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enabling real-time and reconciliation window changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apps
└── nginx
    ├── nginx-rt
    │   ├── kustomization.yaml
    │   └── configmap.yaml
    └── nginx-rw
        ├── kustomization.yaml
        ├── deployment.yaml
        └── service.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see the result of this split in the sample repository &lt;a href="https://github.com/MahrRah/flux-reconciliation-windows-sample/tree/main/Sample2/apps/nginx" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
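As a sketch, the two new &lt;code&gt;kustomization.yaml&lt;/code&gt; files would each bundle only their half of the manifests (file names taken from the tree above; the exact content lives in the linked sample):

```yaml
# apps/nginx/nginx-rt/kustomization.yaml — only the real-time resources
resources:
  - configmap.yaml
---
# apps/nginx/nginx-rw/kustomization.yaml — only the window-managed resources
resources:
  - deployment.yaml
  - service.yaml
```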

&lt;h3&gt;
  
  
  2. Set up your clusters
&lt;/h3&gt;

&lt;p&gt;The next step is to restructure the clusters directory. The goal is to create two independent &lt;code&gt;Kustomization&lt;/code&gt; resources, which means we need two entry points to point each of them to.&lt;br&gt;
For that we split the previous &lt;code&gt;apps&lt;/code&gt; folder into two subfolders, &lt;code&gt;apps-rt&lt;/code&gt; and &lt;code&gt;apps-rw&lt;/code&gt;,&lt;br&gt;
where &lt;code&gt;./cluster/&amp;lt;cluster_name&amp;gt;/apps/apps-rt&lt;/code&gt; will be the entry point for the real-time &lt;code&gt;Kustomization&lt;/code&gt; resource and &lt;code&gt;./cluster/&amp;lt;cluster_name&amp;gt;/apps/apps-rw&lt;/code&gt; for the reconciliation window one.&lt;/p&gt;

&lt;p&gt;Original structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clusters/cluster-1
├── apps
│    └── nginx
└── infra
     └── reconciliation-windows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enabling real-time and reconciliation window changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clusters/cluster-1
├── apps
│   ├── apps-rw
│   │   └── nginx
│   └── apps-rt
│       └── nginx
└── infra
      └── reconciliation-windows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to add the &lt;code&gt;kustomization.yaml&lt;/code&gt; files and make sure they reference the right resources.&lt;/p&gt;

&lt;p&gt;Let's first have a look at the &lt;code&gt;kustomization.yaml&lt;/code&gt; in the &lt;code&gt;clusters/cluster-1/apps/apps-rw&lt;/code&gt; and &lt;code&gt;clusters/cluster-1/apps/apps-rt&lt;/code&gt; setup.&lt;br&gt;
Both &lt;code&gt;apps-rw&lt;/code&gt; and &lt;code&gt;apps-rt&lt;/code&gt; will have a root &lt;code&gt;kustomization.yaml&lt;/code&gt; which points to all applications deployed onto the cluster. In our example, this is only the &lt;code&gt;nginx&lt;/code&gt; app.&lt;/p&gt;

&lt;p&gt;Folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clusters/cluster-1
├── apps
│   ├── apps-rw
│   │   ├── kustomization.yaml
│   │   └── nginx
│   └── apps-rt
│       ├── kustomization.yaml
│       └── nginx
└── infra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;kustomization.yaml&lt;/code&gt; files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#clusters/cluster-1/apps/apps-rw/kustomization.yaml&lt;/span&gt;
&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#clusters/cluster-1/apps/apps-rt/kustomization.yaml&lt;/span&gt;
&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Going one level deeper, both the &lt;code&gt;nginx&lt;/code&gt; folders under &lt;code&gt;clusters/cluster-1/apps/apps-rw&lt;/code&gt; and &lt;code&gt;clusters/cluster-1/apps/apps-rt&lt;/code&gt; have a similar setup.&lt;br&gt;
To avoid going over the same thing twice, we will only look at &lt;code&gt;clusters/cluster-1/apps/apps-rt&lt;/code&gt;. To see the setup of &lt;code&gt;apps-rw&lt;/code&gt; you can check the sample &lt;a href="https://github.com/MahrRah/flux-reconciliation-windows-sample/tree/main/Sample2/clusters/cluster-1/apps/apps-rw" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clusters/cluster-1
├── apps
│   ├── apps-rw
│   └── apps-rt
│       ├── kustomization.yaml
│       └── nginx
│           ├── namespace.yaml
│           └── kustomization.yaml
└── infra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;kustomization.yaml&lt;/code&gt; files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#clusters/cluster-1/apps/apps-rt/nginx/kustomization.yaml&lt;/span&gt;
&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./../../../../../apps/nginx/nginx-rt&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./namespace.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As shown above, the application resources referenced under &lt;code&gt;clusters/cluster-1/apps/apps-rt&lt;/code&gt; are the resources we bundled up under &lt;code&gt;apps/nginx/nginx-rt&lt;/code&gt; and should now only contain resources that can be changed in real-time.&lt;/p&gt;

&lt;p&gt;And just like that you have separated all configurations to be managed by different &lt;code&gt;Kustomization&lt;/code&gt; resources!&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Set up &lt;code&gt;Kustomization&lt;/code&gt; resources
&lt;/h3&gt;

&lt;p&gt;Our GitOps repository is ready now, but how do we set up the &lt;code&gt;Kustomization&lt;/code&gt; resources?&lt;br&gt;
Let's first create a Flux &lt;code&gt;Source&lt;/code&gt; resource.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flux create &lt;span class="nb"&gt;source &lt;/span&gt;git &lt;span class="nb"&gt;source&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://github.com/&amp;lt;github-handle&amp;gt;/flux-reconciliation -windows-sample"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;username&amp;gt;&lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;PAT&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;main &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1m &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--git-implementation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;libgit2 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--silent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need two &lt;code&gt;Kustomization&lt;/code&gt; resources for the apps and one for the infra.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flux create kustomization infra &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"./clusters/cluster-1/infra"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--prune&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flux create kustomization apps-rt &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--depends-on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;infra &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"./clusters/cluster-1/apps/apps-rt"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--prune&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;flux create kustomization apps-rw &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--depends-on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; apps-rt &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"./clusters/cluster-1/apps/apps-rw"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--prune&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now this should give you something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@cluster:~&lt;span class="nv"&gt;$ &lt;/span&gt;flux get kustomization
NAME    REVISION        SUSPENDED READY MESSAGE
infra   main/7cf3aaf  False     True  Applied revision: main/7cf3aaf
apps-rt main/7cf3aaf  False     True  Applied revision: main/7cf3aaf
apps-rw main/7cf3aaf  False     True  Applied revision: main/7cf3aaf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Now that the cluster is set up, we can upgrade the &lt;code&gt;nginx&lt;/code&gt; version and change the configuration &lt;code&gt;nginx.conf&lt;/code&gt; to include the &lt;code&gt;nginx_status&lt;/code&gt; endpoint and see how one is visible right away, while the other needs a reconciliation window to open.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Initial state
&lt;/h4&gt;

&lt;p&gt;Before we make any changes, let's check the current state of the &lt;code&gt;nginx&lt;/code&gt; deployment.&lt;br&gt;
Get the public &lt;code&gt;ip&lt;/code&gt; address of the machine your cluster is running on and navigate to &lt;code&gt;http://&amp;lt;ip&amp;gt;:8080/&lt;/code&gt;; we should see something like this.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you are running it locally, you can replace the &lt;code&gt;ip&lt;/code&gt; with &lt;code&gt;localhost&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfxyhpng1unsjfd6u6nn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfxyhpng1unsjfd6u6nn.jpg" alt=" raw `Nginx` endraw  landing page" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can download the &lt;code&gt;nginx.conf&lt;/code&gt; file by clicking on it and see what configuration is currently mounted into the &lt;code&gt;nginx&lt;/code&gt; pod from the &lt;code&gt;ConfigMap&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  2. Change state
&lt;/h4&gt;

&lt;p&gt;The next step is to change the state of our application.&lt;br&gt;
To do so, we can bump the image version from &lt;code&gt;1.14.2&lt;/code&gt; to the (currently) newest image &lt;code&gt;1.23.3&lt;/code&gt; inside &lt;code&gt;apps/nginx/nginx-rw/deployment.yaml&lt;/code&gt;. In the same commit, we can add the configuration shown below to the &lt;code&gt;nginx.conf&lt;/code&gt; section in the &lt;code&gt;apps/nginx/nginx-rt/configmaps.yaml&lt;/code&gt; file to include the new status endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;location&lt;/span&gt; /&lt;span class="n"&gt;nginx_status&lt;/span&gt; {
                &lt;span class="n"&gt;stub_status&lt;/span&gt;;
                &lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt;;
            }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. See real-time changes
&lt;/h4&gt;

&lt;p&gt;Now if we go back to the browser, refresh the page and re-download the file &lt;code&gt;nginx.conf&lt;/code&gt;, we should see the new section we just added.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; It might take up to 2 minutes in the worst case for the &lt;code&gt;Source&lt;/code&gt; and then &lt;code&gt;Kustomization&lt;/code&gt; resource to reconcile&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  4. Wait for reconciliation window to open
&lt;/h4&gt;

&lt;p&gt;If we now wait until the next reconciliation window opens, the pod should be restarted, and we should be able to see the new version, either by checking the resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod  &amp;lt;nginx-podname&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, if you don't want to access the machine directly, you can go to a non-existing route in the browser, e.g. &lt;code&gt;http://&amp;lt;ip&amp;gt;:8080/settings/&lt;/code&gt;. There you should see a standard &lt;code&gt;nginx&lt;/code&gt; 404 page which contains the currently deployed version at the bottom.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Let's summarize what we did when it came to restructuring the repository.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;We separated all application resources into two sub-versions. One for resources which can be changed in real-time and one for resources that can only be changed when a reconciliation window is open.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We split the &lt;code&gt;clusters&lt;/code&gt; directory in such a way, so that we can create two independent &lt;code&gt;Kustomization&lt;/code&gt; resources, which reference either one or the other application sub-version.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After this we could create the infra and the two apps &lt;code&gt;Kustomization&lt;/code&gt; resources and start using the solution, as demonstrated.&lt;/p&gt;

&lt;p&gt;So, at its core it boils down to separating the resource definitions in such a way that each is only managed by one of the &lt;code&gt;Kustomization&lt;/code&gt; resources created. This can be done as shown above, or slightly differently to fit your needs.&lt;/p&gt;

&lt;p&gt;But hopefully, after this second part, you should be good to go with using these reconciliation windows and know how to tweak the setup to fit your use case :)&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to enable reconciliation windows using Flux and K8s native components</title>
      <dc:creator>Mahra Rahimi</dc:creator>
      <pubDate>Fri, 13 Jan 2023 09:35:29 +0000</pubDate>
      <link>https://forem.com/mahrrah/how-to-enable-reconciliation-windows-using-flux-and-k8s-native-components-2d4i</link>
      <guid>https://forem.com/mahrrah/how-to-enable-reconciliation-windows-using-flux-and-k8s-native-components-2d4i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How to enable reconciliation windows for a GitOps Setup using the suspension feature of the flux &lt;code&gt;Kustomize&lt;/code&gt; resource and K8s CronJobs.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When using &lt;a href="https://fluxcd.io/flux/" rel="noopener noreferrer"&gt;Flux&lt;/a&gt; to manage a K8s cluster, every new change in your repository will be immediately applied to the cluster’s state. In some use cases, the newest changes to a GitOps repository should only be applied to the cluster within a designated time window. For example, the cluster should reconcile to the newest changes of the GitOps repository only between Monday 8 am and Thursday 5 pm. Any change coming into the GitOps repository on Friday or the weekend will have to wait till Monday 8 am to be applied.&lt;/p&gt;

&lt;p&gt;What are the scenarios this could be used for in real life?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sometimes the cluster is connected to external systems, which need to be in maintenance mode before updates can be applied.&lt;/li&gt;
&lt;li&gt;You want to be able to determine a designated time window in which the next changes go into production, so that in case of issues you are able to react quickly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So our problem in short:&lt;br&gt;
&lt;em&gt;We want to be able to predefine time windows to deploy all new changes to a cluster that is managed by Flux.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To make things easier, let's call these time windows "reconciliation windows" and dig right into how to solve the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Intermediate knowledge of &lt;a href="https://fluxcd.io/flux/" rel="noopener noreferrer"&gt;Flux&lt;/a&gt;, &lt;a href="https://kustomize.io/" rel="noopener noreferrer"&gt;Kustomize&lt;/a&gt; and &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;K8s&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Core principles
&lt;/h2&gt;

&lt;p&gt;Now how do we create such reconciliation windows using Flux and K8s native resources?&lt;br&gt;
To get there, we first need to understand how the Flux &lt;a href="https://fluxcd.io/flux/components/kustomize/" rel="noopener noreferrer"&gt;&lt;code&gt;Kustomization&lt;/code&gt;&lt;/a&gt; and Flux &lt;a href="https://fluxcd.io/flux/components/source/" rel="noopener noreferrer"&gt;&lt;code&gt;Source&lt;/code&gt;&lt;/a&gt; resources work, and how we can leverage them to solve our problem.&lt;/p&gt;

&lt;p&gt;When setting up a cluster with Flux there will always be a &lt;code&gt;Source&lt;/code&gt; resource that reconciles the changes from the GitOps repository into the cluster.&lt;br&gt;
After that, the &lt;code&gt;Kustomization&lt;/code&gt; resource will poll the newest changes from the &lt;code&gt;Source&lt;/code&gt; resource and apply them to the cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijxhxd2g7br5szq7l8gb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijxhxd2g7br5szq7l8gb.gif" alt="How Flux controls the cluster using the  raw `Source` endraw  and  raw `Kustomization` endraw  resource"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now interestingly enough both of the reconciliations of these resources can be suspended.&lt;/p&gt;

&lt;p&gt;Suspend &lt;code&gt;Source&lt;/code&gt;/&lt;code&gt;Kustomization&lt;/code&gt; resource from reconciling&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

flux &lt;span class="nb"&gt;suspend source&lt;/span&gt; &amp;lt;name&amp;gt;
flux &lt;span class="nb"&gt;suspend &lt;/span&gt;kustomization &amp;lt;name&amp;gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Resume reconciling of &lt;code&gt;Source&lt;/code&gt;/&lt;code&gt;Kustomization&lt;/code&gt; resource&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

flux resume &lt;span class="nb"&gt;source&lt;/span&gt; &amp;lt;name&amp;gt;
flux resume kustomization &amp;lt;name&amp;gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Suspending the &lt;code&gt;Kustomization&lt;/code&gt; resource means no changes are applied to the cluster:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e39738wltv8ph51r1l9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e39738wltv8ph51r1l9.gif" alt="Suspending a  raw `Kustomization` endraw  resource"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since our goal is to suspend the reconciliation of the cluster state, suspending just the &lt;code&gt;Kustomization&lt;/code&gt; resource is enough. The &lt;code&gt;Source&lt;/code&gt; resource can continue syncing content at the predefined interval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schedule opening and closing of reconciliation windows
&lt;/h2&gt;

&lt;p&gt;So far so good. But how do we automate this?&lt;br&gt;
Well, K8s has already native ways to support scheduling of jobs, which are &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/" rel="noopener noreferrer"&gt;&lt;code&gt;CronJob&lt;/code&gt; resources&lt;/a&gt;, so why not use them?&lt;/p&gt;

&lt;p&gt;With Cron Jobs we can create an &lt;code&gt;open-reconciliation-window-job&lt;/code&gt; and a &lt;code&gt;close-reconciliation-window-job&lt;/code&gt; which will use the Flux CLI and a &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/" rel="noopener noreferrer"&gt;&lt;code&gt;ServiceAccount&lt;/code&gt;&lt;/a&gt; to resume/suspend the kustomizations.&lt;br&gt;
Let's use the “No-deployment Friday” example: a reconciliation window that opens every Monday at 8:00 am and closes every Thursday at 5:00 pm. This is how the jobs would look.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: The &lt;code&gt;ServiceAccount&lt;/code&gt; and the corresponding &lt;code&gt;RoleBinding&lt;/code&gt; and &lt;code&gt;Role&lt;/code&gt; are needed to give the job the right access to perform operations on the cluster resources. For more information, see the &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/" rel="noopener noreferrer"&gt;K8s docs on configuring service accounts&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
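&lt;p&gt;As a sketch, the RBAC setup for these jobs could look roughly like this. The names and namespaces are illustrative and need to match the ones referenced in the CronJobs below; the Flux CLI suspends and resumes by patching the &lt;code&gt;Kustomization&lt;/code&gt; objects, hence the &lt;code&gt;patch&lt;/code&gt; verb:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# rbac.yaml (illustrative names and namespaces)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-job-runner
  namespace: jobs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kustomization-suspender
  namespace: flux-system
rules:
  # Allow reading and patching Flux Kustomization resources,
  # which is what `flux suspend/resume kustomization` does
  - apiGroups: ["kustomize.toolkit.fluxcd.io"]
    resources: ["kustomizations"]
    verbs: ["get", "list", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kustomization-suspender
  namespace: flux-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kustomization-suspender
subjects:
  - kind: ServiceAccount
    name: sa-job-runner
    namespace: jobs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;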

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="c1"&gt;# open-reconciliation-window-job.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;open-reconciliation-window&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jobs&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;8&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;MON"&lt;/span&gt;
  &lt;span class="na"&gt;suspend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;jobTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sa-job-runner&lt;/span&gt;
          &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello&lt;/span&gt;
              &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/fluxcd/flux-cli:v0.36.0&lt;/span&gt;
              &lt;span class="na"&gt;imagePullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
              &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/sh"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
              &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;flux resume kustomization infra -n flux-system;&lt;/span&gt;
                  &lt;span class="s"&gt;flux resume kustomization apps -n flux-system;&lt;/span&gt;
          &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="c1"&gt;# close-reconciliation-window-job.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;close-reconciliation-window&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jobs&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;17&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;THU"&lt;/span&gt;
  &lt;span class="na"&gt;suspend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;jobTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sa-job-runner&lt;/span&gt;
          &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello&lt;/span&gt;
              &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/fluxcd/flux-cli:v0.36.0&lt;/span&gt;
              &lt;span class="na"&gt;imagePullPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IfNotPresent&lt;/span&gt;
              &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/sh"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
              &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;flux suspend kustomization infra -n flux-system;&lt;/span&gt;
                  &lt;span class="s"&gt;flux suspend kustomization apps -n flux-system;&lt;/span&gt;
          &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: you can customize the window times by adjusting the cron expression set in &lt;code&gt;spec.schedule&lt;/code&gt;. There are a few online tools to help you understand how these cron expressions work, e.g. &lt;a href="https://crontab.guru/" rel="noopener noreferrer"&gt;crontab guru&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Scale by managing reconciliation windows with GitOps
&lt;/h2&gt;

&lt;p&gt;At this point, we have the capability to resume and suspend reconciliation, but we still need to create the &lt;code&gt;CronJobs&lt;/code&gt; manually for each cluster.&lt;/p&gt;

&lt;p&gt;Imagine we have a GitOps repository that manages 10+ clusters. These clusters probably won't all have their reconciliation windows at the same time. You also don't want to create these jobs manually, let alone maintain them if, for example, more &lt;code&gt;Kustomization&lt;/code&gt; resources get added to a cluster.&lt;/p&gt;

&lt;p&gt;Not to worry, there is also a solution for that ;)&lt;/p&gt;

&lt;p&gt;We are already using GitOps, so why not put the definition of the jobs into the repository as part of our infrastructure?&lt;br&gt;
And why not use Kustomize's &lt;a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#customizing" rel="noopener noreferrer"&gt;patch functionality&lt;/a&gt; to overwrite the CronJobs' cron expressions, so the reconciliation window times can be customized for each cluster?&lt;/p&gt;
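&lt;p&gt;For example, a per-cluster overlay could patch only the schedules of the shared CronJobs. The base path and the schedule values here are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# clusters/cluster-a/kustomization.yaml (illustrative paths and schedules)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/reconciliation-window
patches:
  # This cluster opens its window on Monday 6:00 am ...
  - target:
      kind: CronJob
      name: open-reconciliation-window
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 6 * * MON"
  # ... and closes it on Thursday 6:00 pm
  - target:
      kind: CronJob
      name: close-reconciliation-window
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 18 * * THU"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;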

&lt;p&gt;If that sounds interesting, check out the &lt;a href="https://github.com/MahrRah/flux-reconciliation-windows-sample/tree/main/Sample1" rel="noopener noreferrer"&gt;full sample&lt;/a&gt;.&lt;br&gt;
Now, instead of having to manually create the &lt;code&gt;ClusterRole&lt;/code&gt;, &lt;code&gt;RoleBinding&lt;/code&gt;, &lt;code&gt;ServiceAccount&lt;/code&gt;, and &lt;code&gt;CronJobs&lt;/code&gt;, Flux takes care of that for us.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9w776id1qpykc8vpqqk.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9w776id1qpykc8vpqqk.gif" alt="Reconciliation windows"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This is how we can leverage Flux and K8s-native approaches to restrict the application of changes to a cluster to reconciliation windows.&lt;br&gt;
This approach has a few advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For clusters running on the edge, if connectivity goes down during a reconciliation window, simple changes will still reconcile normally, because the &lt;code&gt;Source&lt;/code&gt; resource has already pulled the newest changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Careful, this only works for image tag changes if there is a local container registry (e.g. a local ACR). Otherwise, the new images need to be pre-downloaded to the device.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;The GitOps repository reflects the desired state of the cluster after a reconciliation window.&lt;/li&gt;
&lt;li&gt;No need to maintain a custom gateway or similar: all components used are open-source, and no custom logic is required.&lt;/li&gt;
&lt;li&gt;During the reconciliation windows, changes are applied just as we are used to from Flux.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we are not solving with this, however, is scheduling fine-grained changes. As you might have noticed, the granularity ends at the &lt;code&gt;Kustomization&lt;/code&gt; resources the CronJobs suspend and resume: everything managed by such a resource is applied together, so individual configurations cannot be scheduled separately with this approach.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Did that not solve your problem because your cluster needs real-time changes as well as changes within a reconciliation window? Not to worry, I've got you ;) Check out the &lt;a href="https://dev.to/mahrrah/refactoring-gitops-repository-to-support-both-real-time-and-reconciliation-window-changes-2cc"&gt;next part&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>flux</category>
      <category>gitops</category>
      <category>kubernetes</category>
    </item>
  </channel>
</rss>
