<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: MediaMarktSaturn Technology</title>
    <description>The latest articles on Forem by MediaMarktSaturn Technology (@mms-tech).</description>
    <link>https://forem.com/mms-tech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F8133%2F58ce7453-58eb-404d-85cb-8b743e249c98.png</url>
      <title>Forem: MediaMarktSaturn Technology</title>
      <link>https://forem.com/mms-tech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mms-tech"/>
    <language>en</language>
    <item>
      <title>Find Product Variants</title>
      <dc:creator>Fabian Eder</dc:creator>
      <pubDate>Fri, 23 Jan 2026 11:22:11 +0000</pubDate>
      <link>https://forem.com/mms-tech/find-product-variants-5043</link>
      <guid>https://forem.com/mms-tech/find-product-variants-5043</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;by &lt;a href="https://www.linkedin.com/in/fabian-eder-5a621517b/" rel="noopener noreferrer"&gt;Fabian Eder&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How can you identify product variants in product data? You don't need to search for them manually. We have a pipeline that finds them for us, using a multi-step approach that combines hard constraints, an XGBoost model and a graph clustering algorithm.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What are product variants?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Product variants&lt;/strong&gt; are products from the same manufacturer that differ only slightly, for example in color or size. Because they fulfill the same customer needs, MediaMarktSaturn shows them next to each other on the product detail page.&lt;br&gt;
The definition of product variants relies purely on product attributes, not on customer interaction. For this reason, we maintain variants in our product information management system. We put products into &lt;strong&gt;variant groups&lt;/strong&gt;: products that are variants of each other are part of the same variant group. Each product can belong to at most one variant group.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do we need to improve on finding product variants?
&lt;/h2&gt;

&lt;p&gt;To ensure that all product variants are visible on the product detail page, we need complete variant groups for our whole assortment. In each of the 11 countries we operate in, our assortment comprises tens of thousands of products.&lt;br&gt;
In each country, content managers maintain variant groups &lt;strong&gt;manually&lt;/strong&gt;: for example, manufacturers provide spreadsheets with products for which variant groups need to be created, or content managers search the product management system for similar products and create new variant groups. As you can imagine, this process is cumbersome and time-consuming. As a consequence, our variant groups are never complete, and they are usually not reviewed or revised after they are created.&lt;/p&gt;

&lt;h2&gt;
  
  
  An automated solution to find product variants
&lt;/h2&gt;

&lt;p&gt;To make product variant management easier, we have created a solution leveraging machine learning techniques. It reviews all products and all variant groups in our whole assortment to&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;suggest new variant groups&lt;/li&gt;
&lt;li&gt;identify products that should be added to existing variant groups&lt;/li&gt;
&lt;li&gt;suggest changes to existing variant groups (split a group / merge groups)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution comprises the following processing steps:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw1z0feaprfm8upk22zo.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw1z0feaprfm8upk22zo.jpeg" alt="Processing steps to get variant group proposals" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identify &lt;em&gt;potential&lt;/em&gt; product variant pairs using constraints: only products from the same manufacturer and from the same product category can be variants of each other.&lt;/li&gt;
&lt;li&gt;Use an &lt;strong&gt;XGBoost&lt;/strong&gt; model to estimate, for each &lt;em&gt;potential&lt;/em&gt; product variant pair, the probability that the two products are variants of each other (see below for further details)&lt;/li&gt;
&lt;li&gt;Create a graph from the probabilities for each manufacturer and product category&lt;/li&gt;
&lt;li&gt;Identify subgraphs in each graph using a &lt;strong&gt;graph clustering&lt;/strong&gt; algorithm: each subgraph represents a &lt;em&gt;proposed&lt;/em&gt; variant group (see below for further details)&lt;/li&gt;
&lt;li&gt;Compare the &lt;em&gt;proposed&lt;/em&gt; variant groups from the graph clustering algorithm with the &lt;em&gt;existing&lt;/em&gt; variant groups from the product information management system&lt;/li&gt;
&lt;/ol&gt;
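
&lt;p&gt;To illustrate step 1, here is a minimal Kotlin sketch. It is not the production pipeline: the &lt;code&gt;Product&lt;/code&gt; class, its fields and the &lt;code&gt;candidatePairs&lt;/code&gt; function are hypothetical names chosen for this example. It groups products by manufacturer and category and pairs them only within each group:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Hypothetical product record; the field names are assumptions for illustration.
data class Product(val id: String, val manufacturer: String, val category: String)

// Step 1 sketch: only products that share manufacturer and category form candidate pairs.
fun candidatePairs(products: List&amp;lt;Product&amp;gt;) =
    products
        .groupBy { it.manufacturer to it.category }
        .values
        .flatMap { group -&amp;gt;
            // Pair each product with every later product of the same group.
            group.indices.flatMap { i -&amp;gt;
                (i + 1 until group.size).map { j -&amp;gt; group[i] to group[j] }
            }
        }

fun main() {
    val products = listOf(
        Product("p1", "Acme", "Headphones"),
        Product("p2", "Acme", "Headphones"),
        Product("p3", "Acme", "Speakers"),
        Product("p4", "Other", "Headphones"),
    )
    // Only p1 and p2 share both manufacturer and category.
    println(candidatePairs(products).map { it.first.id to it.second.id }) // [(p1, p2)]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Restricting pairs to one manufacturer and category keeps the number of candidates far below the number of all possible product pairs, which is what makes scoring every candidate feasible.&lt;/p&gt;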

&lt;p&gt;After step 5, we show the results to assortment managers and content managers. If they approve the new product variant groups and the suggested variant group changes, the new data can be imported into the product management system and distributed across the whole company from there.&lt;/p&gt;

&lt;h3&gt;
  
  
  The XGBoost model to assign probabilities to product pairs
&lt;/h3&gt;

&lt;p&gt;XGBoost (&lt;a href="https://xgboost.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;eXtreme Gradient Boosting&lt;/a&gt;) is a supervised learning algorithm that needs labelled and structured data as input. That means we need tabular data with a label column and feature columns. In our case, we have a binary classification problem: two products either are variants of each other (positive class) or are not (negative class). Each potential product variant pair (the result of step 1) is one observation.&lt;/p&gt;

&lt;p&gt;To create the label column, we make use of the existing variant groups: product pairs from the same variant group belong to the positive class, and product pairs from different variant groups belong to the negative class. Products without a variant group are not used for model training and model evaluation.&lt;/p&gt;
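
&lt;p&gt;The labelling rule can be sketched in a few lines of Kotlin. This is an illustration under assumptions, not the production code: &lt;code&gt;groupOf&lt;/code&gt; is a hypothetical lookup from product ID to variant group ID, and the sketch interprets the rule above as skipping every pair in which at least one product has no variant group:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Label for one candidate pair: 1 = variants, 0 = not variants,
// null = skip the pair because at least one product has no variant group.
// groupOf is a hypothetical lookup from product ID to variant group ID.
fun label(groupOf: Map&amp;lt;String, String&amp;gt;, a: String, b: String): Int? {
    val groupA = groupOf[a] ?: return null
    val groupB = groupOf[b] ?: return null
    return if (groupA == groupB) 1 else 0
}

fun main() {
    val groupOf = mapOf("p1" to "g1", "p2" to "g1", "p3" to "g2")
    println(label(groupOf, "p1", "p2")) // 1
    println(label(groupOf, "p1", "p3")) // 0
    println(label(groupOf, "p1", "p9")) // null
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;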

&lt;p&gt;For creating the feature columns, we have sophisticated feature engineering in place. For example, we consider the number of product attributes that the products have in common, the similarity of their product names, their similarity with respect to system IDs and system timestamps, etc.&lt;/p&gt;
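
&lt;p&gt;As one possible example of such a feature, the following Kotlin sketch computes a token-based Jaccard similarity between two product names. The function name and the choice of Jaccard similarity are assumptions made for illustration; the features used in the actual pipeline are not spelled out here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Token-level Jaccard similarity between two product names (0.0 to 1.0).
// Illustrative only: one of many ways to measure name similarity.
fun nameSimilarity(a: String, b: String): Double {
    val tokensA = a.lowercase().split(" ").filter { it.isNotBlank() }.toSet()
    val tokensB = b.lowercase().split(" ").filter { it.isNotBlank() }.toSet()
    val union = tokensA union tokensB
    if (union.isEmpty()) return 0.0
    return (tokensA intersect tokensB).size.toDouble() / union.size
}

fun main() {
    // The names differ only in the color token: 3 shared tokens out of 5 distinct ones.
    println(nameSimilarity("Acme Buds Pro black", "Acme Buds Pro white")) // 0.6
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;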

&lt;p&gt;After we have trained, validated and evaluated the classifier on training, validation and test sets, we apply it to all potential product variant pairs from the whole assortment. For each product pair, we get the prediction probability of the binary classifier. The prediction probability is a value between 0 and 1, and a higher value means that the two products are more likely to be variants of each other. We use the prediction probabilities to build graphs (see below).&lt;/p&gt;

&lt;h3&gt;
  
  
  Graph clustering to propose variant groups
&lt;/h3&gt;

&lt;p&gt;The following image shows an example graph that we created from the prediction probabilities of the XGBoost model. Each node represents one product, and each edge shows the probability that the two products are variants of each other. Shorter and thicker edges represent a higher probability. Probabilities below 0.9 are not shown.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu8z6342lz1vz4pdgbmro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu8z6342lz1vz4pdgbmro.png" alt="A graph clustering example" width="707" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://scikit-network.readthedocs.io/en/latest/reference/clustering.html#louvain" rel="noopener noreferrer"&gt;Louvain algorithm&lt;/a&gt; has split up the graph into subgraphs, as indicated by the colors. Nodes that are strongly connected are put into the same subgraph. Each subgraph represents a &lt;em&gt;proposed&lt;/em&gt; variant group. In other words, the graph clustering algorithm puts products that are very likely to be variants of each other into the same variant group.&lt;/p&gt;

&lt;p&gt;For each manufacturer and each product category, we create such a graph and apply the graph clustering algorithm to it. Per country, the algorithm proposes roughly 800 to 3,000 new variant groups covering 2,000 to 9,000 products.&lt;/p&gt;
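
&lt;p&gt;To give a feeling for how a probability graph turns into group proposals, here is a deliberately simplified Kotlin sketch. Note that it does &lt;em&gt;not&lt;/em&gt; implement the Louvain algorithm: it keeps only edges at or above a cutoff and returns the connected components, which is a much cruder technique. The &lt;code&gt;Edge&lt;/code&gt; class, the function name and the cutoff of 0.9 are assumptions made for this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Edge between two product IDs with the predicted variant probability (hypothetical type).
data class Edge(val a: String, val b: String, val probability: Double)

// Simplified stand-in for graph clustering (not Louvain): keep edges at or above
// the threshold and return the connected components of the remaining graph.
fun proposeGroups(edges: List&amp;lt;Edge&amp;gt;, threshold: Double): List&amp;lt;Set&amp;lt;String&amp;gt;&amp;gt; {
    val neighbors = mutableMapOf&amp;lt;String, MutableSet&amp;lt;String&amp;gt;&amp;gt;()
    for (e in edges.filter { it.probability &amp;gt;= threshold }) {
        neighbors.getOrPut(e.a) { mutableSetOf() }.add(e.b)
        neighbors.getOrPut(e.b) { mutableSetOf() }.add(e.a)
    }
    val seen = mutableSetOf&amp;lt;String&amp;gt;()
    val groups = mutableListOf&amp;lt;Set&amp;lt;String&amp;gt;&amp;gt;()
    for (start in neighbors.keys) {
        if (!seen.add(start)) continue // already part of an earlier component
        val component = sortedSetOf(start)
        val queue = ArrayDeque(listOf(start))
        // Breadth-first search collects every product reachable from start.
        while (queue.isNotEmpty()) {
            for (next in neighbors.getValue(queue.removeFirst())) {
                if (seen.add(next)) {
                    component.add(next)
                    queue.add(next)
                }
            }
        }
        groups.add(component)
    }
    return groups
}

fun main() {
    val edges = listOf(
        Edge("p1", "p2", 0.97),
        Edge("p2", "p3", 0.93),
        Edge("p3", "p4", 0.40), // below the threshold, so p4 stays ungrouped
        Edge("p5", "p6", 0.99),
    )
    println(proposeGroups(edges, threshold = 0.9)) // [[p1, p2, p3], [p5, p6]]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Connected components merge two groups as soon as a single strong edge links them. A modularity-based method such as Louvain looks at how densely nodes are connected overall, so it can still separate such weakly linked groups.&lt;/p&gt;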

&lt;h2&gt;
  
  
  Put everything together
&lt;/h2&gt;

&lt;p&gt;To execute all steps above in a coordinated manner and on a regular schedule, we have implemented a Kubeflow pipeline which we run via &lt;a href="https://cloud.google.com/vertex-ai/docs/pipelines/introduction" rel="noopener noreferrer"&gt;Vertex AI Pipelines&lt;/a&gt; on the Google Cloud Platform. A &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; app shows the variant group proposals of the algorithm to the assortment and content managers, who either approve or reject the proposals. Finally, we have set up a &lt;a href="https://cloud.google.com/run/docs/overview/what-is-cloud-run" rel="noopener noreferrer"&gt;Cloud Run Job&lt;/a&gt; that supports the import of approved variants into the product management system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Data maintenance tasks no longer need to be done purely by hand. Machine learning offers many opportunities to support and improve business processes.&lt;/p&gt;

&lt;p&gt;In our case, we have successfully built a pipeline on Google Cloud Platform that suggests new product variant groups and that improves existing variant groups for thousands of products. It leverages a multi-step approach using hard constraints, an XGBoost model, and a graph clustering algorithm.&lt;/p&gt;

&lt;p&gt;What about you? Do you face similar challenges in your business and use data science and machine learning approaches to make your life easier? Feel free to reach out to us and share your experience!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>automation</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How to Do MongoDB Indexes</title>
      <dc:creator>Bernd Stübinger</dc:creator>
      <pubDate>Fri, 01 Aug 2025 08:24:30 +0000</pubDate>
      <link>https://forem.com/mms-tech/how-to-do-mongodb-indexes-2ki6</link>
      <guid>https://forem.com/mms-tech/how-to-do-mongodb-indexes-2ki6</guid>
      <description>&lt;p&gt;&lt;em&gt;You just finished building an awesome feature that involves some new database queries. You have added unit tests, you carefully validated it in your test environment, and you deployed it to production, waiting for cheers and happy feedback.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Except that... your database suddenly implodes, because one of your new queries is extremely slow, taking down the whole database cluster (and your application) with it as well.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sound familiar? Here's how we avoid it.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Happened?
&lt;/h2&gt;

&lt;p&gt;But first of all: What was the issue, actually? You tested everything, even in a real environment, right? Right, but not with a real &lt;em&gt;data volume&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In this case, your query was using a collection scan because it was not able to use an index. And while this worked locally and in your test environment, data volume on production was orders of magnitude larger. Since a collection scan has to examine every document, this resulted in significantly higher load on the database and much slower response times.&lt;/p&gt;

&lt;p&gt;Sure, you could copy data from production to your test environment. But not only does that come with a certain price tag, you would also have to consider data anonymization and data access, as test environments are usually less restricted but still must not leak production data.&lt;/p&gt;

&lt;p&gt;So what's the better way?&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;To follow our approach, you need to maintain your MongoDB indexes in your application code. This feels like a natural choice when you are already maintaining your queries and data model there, and helps to avoid other problems as well. But more on that later.&lt;/p&gt;

&lt;p&gt;Let's assume you have a simple &lt;code&gt;DataMongoRepository&lt;/code&gt; to manage a collection of &lt;code&gt;Data&lt;/code&gt; documents (yes, naming things is one of the two challenges in software engineering):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;data&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;database&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;collection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getCollection&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"datas"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insertOne&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;fetchById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Filters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;firstOrNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In that case, a typical test could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepositoryTest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;mongoContainer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoDbTestContainer&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;also&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;mongoClient&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;mongoContainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReplicaSetUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;repository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;`test&lt;/span&gt; &lt;span class="n"&gt;simple&lt;/span&gt; &lt;span class="nf"&gt;insert`&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Given&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// When&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;fetched&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// Then&lt;/span&gt;
            &lt;span class="nf"&gt;assertEquals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetched&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It starts an actual MongoDB via &lt;a href="https://testcontainers.com/" rel="noopener noreferrer"&gt;testcontainers&lt;/a&gt;, inserts some test data and then fetches it to see if it is still the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using an Index
&lt;/h2&gt;

&lt;p&gt;Now, to ensure that this query works with higher loads, let's add an index in our &lt;code&gt;DataMongoRepository&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="nf"&gt;init&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Indexes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ascending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we run the test again, it still succeeds (as expected, of course).&lt;/p&gt;

&lt;p&gt;But what if we forgot to add this index? Can we somehow make this more robust?&lt;br&gt;
Yes, we can!&lt;/p&gt;
&lt;h2&gt;
  
  
  Ensuring Index Usage
&lt;/h2&gt;

&lt;p&gt;The magic lies in MongoDB's &lt;a href="https://www.mongodb.com/docs/manual/reference/parameters/#mongodb-parameter-param.notablescan" rel="noopener noreferrer"&gt;&lt;code&gt;notablescan&lt;/code&gt; parameter&lt;/a&gt;. When enabled, it prevents queries from running collection scans, and returns an error instead.&lt;/p&gt;

&lt;p&gt;The parameter can be configured using the &lt;a href="https://www.mongodb.com/docs/manual/reference/command/setParameter/#mongodb-dbcommand-dbcmd.setParameter" rel="noopener noreferrer"&gt;&lt;code&gt;setParameter&lt;/code&gt;&lt;/a&gt; command on the admin database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="n"&gt;mongoClient&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"admin"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{ setParameter: 1, notablescan: 1 }"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our test setup (here simplified as &lt;code&gt;@BeforeTest&lt;/code&gt;) we enable this as default for all tests, which allows us to quickly catch new queries that would try to do a collection scan, and react accordingly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepositoryTest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="nd"&gt;@BeforeTest&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;setUp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"admin"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;runCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{ setParameter: 1, notablescan: 1 }"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's see how this works by adding a query by name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;fetchAllByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
        &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Filters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toList&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a corresponding test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepositoryTest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;`test&lt;/span&gt; &lt;span class="n"&gt;fetch&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="nf"&gt;name`&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Given&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data1&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data2&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// When&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;fetched&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchAllByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// Then&lt;/span&gt;
            &lt;span class="nf"&gt;assertEquals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;fetched&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the new test fails with an exception:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mongodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MongoQueryException&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt; &lt;span class="n"&gt;failed&lt;/span&gt; &lt;span class="n"&gt;with&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="mi"&gt;291&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;NoQueryExecutionPlans&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice, isn't it?&lt;/p&gt;

&lt;p&gt;To make that test work, we could of course add an index for the &lt;code&gt;name&lt;/code&gt; field. But that's not always desired, so let's see how we can disable validation for a particular test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepositoryTest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;`test&lt;/span&gt; &lt;span class="n"&gt;fetch&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="nf"&gt;name`&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Allow table scan&lt;/span&gt;
            &lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"admin"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;runCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{ setParameter: 1, notablescan: 0 }"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// Given&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data1&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data2&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// When&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;fetched&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchAllByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// Then&lt;/span&gt;
            &lt;span class="nf"&gt;assertEquals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;fetched&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Et voilà.&lt;/p&gt;

&lt;p&gt;By using a small Kotlin helper function that takes the test body as a lambda, we can make this even more concise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepositoryTest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="c1"&gt;// Runs the given [block] with `notablescan` disabled&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;allowTableScan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="nc"&gt;CoroutineScope&lt;/span&gt;&lt;span class="p"&gt;.()&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Unit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"admin"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{ setParameter: 1, notablescan: 0 }"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
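            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;block&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// Re-enable the check even if the block throws&lt;/span&gt;
                &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{ setParameter: 1, notablescan: 1 }"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;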
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;`test&lt;/span&gt; &lt;span class="n"&gt;fetch&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="nf"&gt;name`&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;allowTableScan&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Given&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data1&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data2&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// When&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;fetched&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchAllByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// Then&lt;/span&gt;
            &lt;span class="nf"&gt;assertEquals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;fetched&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can be useful, for example, for complex search queries with multiple parameters, where you either cannot or don't want to create an index for every possible combination of fields. Indexes are not free, after all - they consume memory and need to be maintained when fields are updated, incurring additional overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unique Constraint Validation
&lt;/h2&gt;

&lt;p&gt;As mentioned, indexes are not only created for performance reasons. Let's get back to our example: Isn't an &lt;code&gt;id&lt;/code&gt; actually supposed to be unique?&lt;/p&gt;

&lt;p&gt;The following test runs without error, so obviously there is no guarantee:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepositoryTest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;`test&lt;/span&gt; &lt;span class="n"&gt;duplicate&lt;/span&gt; &lt;span class="nf"&gt;insert`&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In fact, we now have two &lt;code&gt;Data&lt;/code&gt; documents in our collection with &lt;code&gt;id=1&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  Document{{_id=688b83c7bc39ec12e9b2a15a, id=1, name=data1}},
  Document{{_id=688b83c7bc39ec12e9b2a15b, id=1, name=data1}}
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is especially dangerous because &lt;code&gt;fetchById&lt;/code&gt; will simply ignore all except the first document, so it may take a while to actually notice that issue.&lt;/p&gt;

&lt;p&gt;Let's fix the problem by making our index unique using &lt;code&gt;IndexOptions().unique(true)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;database&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mongoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;collection&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getCollection&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="s"&gt;"datas"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;init&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Indexes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ascending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nc"&gt;IndexOptions&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the second insert fails with a duplicate key error, so we can adapt the test to validate the expected error code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataMongoRepositoryTest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;`test&lt;/span&gt; &lt;span class="n"&gt;duplicate&lt;/span&gt; &lt;span class="nf"&gt;insert`&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;runBlocking&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Given&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;// When/Then&lt;/span&gt;
            &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;exception&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;assertFailsWith&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;MongoWriteException&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="nf"&gt;assertEquals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ErrorCategory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DUPLICATE_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ErrorCategory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromErrorCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We now have an easy way of validating database indexes, and an automated reminder that breaks our build if we accidentally add a new query without a corresponding index.&lt;/p&gt;

&lt;p&gt;Naturally, this cannot ensure that all queries use an &lt;em&gt;optimal&lt;/em&gt; index. But while choosing the optimal index for a given query &lt;a href="https://www.mongodb.com/community/forums/t/using-index-but-still-very-slow/120797" rel="noopener noreferrer"&gt;can have a significant impact on performance&lt;/a&gt;, it is not always trivial, and you've already come a long way when you can at least guarantee that a query uses &lt;em&gt;any&lt;/em&gt; index in order to avoid worst-case scenarios.&lt;/p&gt;

&lt;p&gt;And sometimes, all it takes is a little reminder, so there is really no reason not to include index validation in your build - especially if the implementation is this trivial.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>kotlin</category>
      <category>mongodb</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>An Elegant Way to Solve Multi-Tenancy</title>
      <dc:creator>Bernd Stübinger</dc:creator>
      <pubDate>Tue, 30 Jan 2024 16:14:46 +0000</pubDate>
      <link>https://forem.com/mms-tech/an-elegant-way-to-solve-multi-tenancy-41m8</link>
      <guid>https://forem.com/mms-tech/an-elegant-way-to-solve-multi-tenancy-41m8</guid>
      <description>&lt;p&gt;&lt;em&gt;In this article, I will show you an elegant way to introduce multi-tenancy in an already established code base for a Kotlin/Koin stack. And while the examples are specific to this stack, the general idea should be applicable to any dependency injection framework that supports qualifiers.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  History
&lt;/h2&gt;

&lt;p&gt;Our product initially started below the radar to support the migration of MediaMarktSaturn's webshop to the new platform. Back then, we had to replace an existing legacy promotion system by the beginning of the Black Friday season, which meant that we already had a fixed (business) scope &lt;em&gt;and&lt;/em&gt; a fixed deadline.&lt;/p&gt;

&lt;p&gt;Naturally, we had to cut corners on some of the more technical aspects of our code base. One of those aspects was multi-tenancy, in our case different countries and sales lines - Media Markt Germany would be a different tenant than Saturn Germany or Media Markt Austria.&lt;/p&gt;

&lt;p&gt;Of course, we knew that we had to deal with multiple tenants &lt;em&gt;at some point&lt;/em&gt; but since the whole migration targeted mediamarkt.de only, we could easily postpone that topic until after the first go-live. We already included the relevant parts in our data model and in our API, though, so that we wouldn't have to perform a data migration and consumers of our system could keep their existing integrations.&lt;/p&gt;

&lt;p&gt;For our implementation we used the much easier "ignorance" strategy, i.e. we didn't deal with tenants anywhere - and in the (few) places where we &lt;em&gt;needed&lt;/em&gt; to work with a tenant, we simply hardcoded it to Media Markt Germany.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initial Setup
&lt;/h2&gt;

&lt;p&gt;We are using a pure Kotlin reactive stack with &lt;a href="https://ktor.io/"&gt;Ktor&lt;/a&gt; as the web framework and &lt;a href="https://insert-koin.io/"&gt;Koin&lt;/a&gt; for dependency injection and application configuration. Our product manages coupons and calculates discounts for customers, i.e. it assesses their baskets with respect to pre-configured promotion rules.&lt;/p&gt;

&lt;p&gt;In its essence, our setup looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;serviceModule&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;module&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;single&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;OutletRepository&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;single&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;CalculationService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;single&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;PromotionRepository&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;single&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;CouponRepository&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can think of a &lt;code&gt;module&lt;/code&gt; as a space to collect all your Koin-managed components, similar to "beans" in Spring or CDI. &lt;code&gt;single&lt;/code&gt; declares a singleton component that will be instantiated only once and re-used for every injection point.&lt;/p&gt;

&lt;p&gt;In our routing layer we receive basket assessment requests and forward them to our main service, the &lt;code&gt;CalculationService&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nf"&gt;routing&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;calculationService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CalculationService&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"basket-assessments"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;request&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;receive&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CalculationRequest&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
        &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculateBasket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;CalculationService&lt;/code&gt; then fetches active promotions from the &lt;code&gt;PromotionRepository&lt;/code&gt; and applies coupons using a &lt;code&gt;CouponRepository&lt;/code&gt;. Naturally, the parts of this setup have different requirements with respect to multi-tenancy: while coupons and active promotions differ between countries and sales lines, the core calculation logic always stays the same, and outlet master data is identical across all tenants. Coupons are identified by their code and thus even need to be separated at the persistence level, so that multiple tenants can use identical codes.&lt;/p&gt;

&lt;p&gt;Inside our &lt;code&gt;CalculationService&lt;/code&gt; we use &lt;code&gt;get&lt;/code&gt; to let Koin eagerly resolve and inject the managed instance(s) of other services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CalculationService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;promotionRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;PromotionRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;couponRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CouponRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;outletRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;OutletRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  From Simple...
&lt;/h2&gt;

&lt;p&gt;When we eventually added support for multi-tenancy, we already had an established code base, so we were facing a few challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We couldn't afford (and didn't want) to rebuild everything from scratch&lt;/li&gt;
&lt;li&gt;We had lots of functions that needed to know which tenant they were working with&lt;/li&gt;
&lt;li&gt;We also had lots of tests that didn't know anything about tenants yet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we were looking for a solution that would allow us to keep as much of our current implementation as possible without forcing us to introduce changes everywhere. We decided to separate all data at the database level so that our tenants could work independently of each other - I've already mentioned coupon codes above, but this can be seen as good practice in general.&lt;/p&gt;

&lt;p&gt;The simple approach was to determine a tenant from the request, and then add a &lt;code&gt;tenant&lt;/code&gt; parameter to each subsequent function call. Of course, that wasn't exactly... &lt;em&gt;elegant&lt;/em&gt;.&lt;/p&gt;
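
&lt;p&gt;To make the drawback concrete, here is a minimal sketch of that parameter-passing style. The classes are simplified stand-ins invented for illustration (hypothetical method names and a made-up tenant id), not our actual implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Hypothetical, simplified types - for illustration only
data class Tenant(val id: String)
data class Basket(val items: List&amp;lt;String&amp;gt;)

class PromotionRepository {
    // The repository needs the tenant to pick the right data
    fun activePromotions(tenant: Tenant): List&amp;lt;String&amp;gt; =
        listOf("promo-${tenant.id}")
}

class CalculationService(private val promotionRepository: PromotionRepository) {
    // ...so every function on the call path has to accept and forward it
    fun calculateBasket(tenant: Tenant, basket: Basket): List&amp;lt;String&amp;gt; =
        promotionRepository.activePromotions(tenant)
}

fun main() {
    val service = CalculationService(PromotionRepository())
    // In a real application the tenant would be derived from the request
    val tenant = Tenant("mm-de")
    println(service.calculateBasket(tenant, Basket(listOf("tv"))))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Even in this tiny example the &lt;code&gt;tenant&lt;/code&gt; argument shows up in every signature, and the same would apply to every test that calls these functions.&lt;/p&gt;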

&lt;p&gt;We had some places that experimented with the &lt;a href="https://kotlinlang.org/docs/coroutine-context-and-dispatchers.html"&gt;&lt;code&gt;CoroutineContext&lt;/code&gt;&lt;/a&gt; to transfer tenant information but there were still a lot of places (including tests) that needed to be adapted, and back then we were just starting to understand how coroutines work. Also, we had to rely on the presence of a tenant during runtime, and actually wanted to have a bit more compile-time safety.&lt;/p&gt;
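
&lt;p&gt;For illustration, passing a tenant through the coroutine context could look roughly like the following sketch. &lt;code&gt;TenantContext&lt;/code&gt; and &lt;code&gt;currentTenantId&lt;/code&gt; are hypothetical names, and the sketch assumes &lt;code&gt;kotlinx-coroutines-core&lt;/code&gt; on the classpath. Note that a missing tenant only surfaces at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext
import kotlinx.coroutines.currentCoroutineContext
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical context element carrying the tenant id
class TenantContext(val tenantId: String) : AbstractCoroutineContextElement(TenantContext) {
    companion object Key : CoroutineContext.Key&amp;lt;TenantContext&amp;gt;
}

// Fails at runtime if no tenant was put into the context
suspend fun currentTenantId(): String =
    currentCoroutineContext()[TenantContext]?.tenantId
        ?: error("No tenant in coroutine context")

fun main() = runBlocking&amp;lt;Unit&amp;gt; {
    // Everything called inside this block can look up the tenant
    withContext(TenantContext("mm-de")) {
        println(currentTenantId())
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;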

&lt;h2&gt;
  
  
  ...to Elegant
&lt;/h2&gt;

&lt;p&gt;So instead, we opted for "tenant-aware" services. We divided our existing service implementations into those that would provide common functionality and those that needed a tenant. For the latter we introduced a &lt;code&gt;TenantAware&lt;/code&gt; interface with a single property:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;TenantAware&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Tenant&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All services that needed a tenant would now implement this interface and receive a &lt;code&gt;tenant&lt;/code&gt; in their constructor. So our module configuration changed to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;serviceModule&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;module&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;single&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;OutletRepository&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nc"&gt;SupportedTenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;tenant&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="nf"&gt;single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;named&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;CalculationService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;named&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;PromotionRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;named&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;CouponRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, we had e.g. an instance of &lt;code&gt;PromotionRepository&lt;/code&gt; for MediaMarkt Germany and one for Saturn Germany, and within that instance we had reliable access to the &lt;code&gt;tenant&lt;/code&gt; property whenever we needed it - for example, when referring to other tenant-aware services that would now resolve the tenant-specific instance from Koin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CalculationService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TenantAware&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;promotionRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;PromotionRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;named&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;couponRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CouponRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;named&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;outletRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;OutletRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allowed us to limit all new tenant functionality to our routing layer, and we didn't have to change anything in our business code because every tenant-aware instance would only ever talk to other instances for the same tenant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nf"&gt;routing&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"basket-assessments"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;request&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;receive&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CalculationRequest&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;calculationService&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CalculationService&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="nf"&gt;named&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculateBasket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And by adding a tenant suffix to the names of our database collections (we are using &lt;a href="https://www.mongodb.com/"&gt;MongoDB&lt;/a&gt;), we could easily achieve data separation in our persistence layer as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromotionRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TenantAware&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;collectionName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"promotions.${tenant.id}"&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
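
&lt;p&gt;To make the effect of the suffix concrete, here is a minimal, self-contained sketch. The simplified &lt;code&gt;Tenant&lt;/code&gt; types and the tenant ids &lt;code&gt;mm-de&lt;/code&gt; and &lt;code&gt;sat-de&lt;/code&gt; are assumptions made up for illustration, and no database is involved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Simplified stand-ins for the types used above; the ids are hypothetical.
interface Tenant { val id: String }

enum class SupportedTenant(override val id: String) : Tenant {
    MEDIAMARKT_DE("mm-de"),
    SATURN_DE("sat-de")
}

interface TenantAware { val tenant: Tenant }

class PromotionRepository(override val tenant: Tenant) : TenantAware {
    // Each tenant-specific instance derives its own collection name.
    val collectionName = "promotions.${tenant.id}"
}

fun main() {
    SupportedTenant.values().forEach { tenant -&amp;gt;
        println(PromotionRepository(tenant).collectionName)
    }
    // prints:
    // promotions.mm-de
    // promotions.sat-de
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;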



&lt;p&gt;In our tests we only had to adapt our injection points and didn't need to touch any of the actual test methods - voila! If needed, we could easily test multiple tenants by simply injecting different instances of a service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extension Functions
&lt;/h2&gt;

&lt;p&gt;As a final polish, we used Kotlin's extension functions to simplify our module definition and dependency resolution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qualifier&lt;/span&gt;
    &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;named&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;inline&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;reified&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;singleByTenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;noinline&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;.(&lt;/span&gt;&lt;span class="nc"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
    &lt;span class="nc"&gt;SupportedTenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;tenant&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="nf"&gt;single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qualifier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;inline&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;reified&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qualifier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;serviceModule&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;module&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;single&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;OutletRepository&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;singleByTenant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;CalculationService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;singleByTenant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;PromotionRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;singleByTenant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;CouponRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CalculationService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TenantAware&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;promotionRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;PromotionRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;couponRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CouponRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;outletRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;OutletRepository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;routing&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"basket-assessments"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;request&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;receive&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CalculationRequest&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;calculationService&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;CalculationService&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculateBasket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
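
&lt;p&gt;If you want to play with the idea without pulling in Koin, the following sketch mimics qualifier-based resolution with a toy registry. It is not how Koin works internally: the &lt;code&gt;Registry&lt;/code&gt; class, its eager instantiation and the tenant ids are assumptions made up for illustration. It only shows why every tenant-aware instance ends up talking to instances of the same tenant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlin.reflect.KClass

// Simplified stand-ins for the types used above; the ids are hypothetical.
interface Tenant { val id: String }

enum class SupportedTenant(override val id: String) : Tenant {
    MEDIAMARKT_DE("mm-de"),
    SATURN_DE("sat-de")
}

// Toy registry, NOT Koin: instances are keyed by type plus an optional qualifier.
// Unlike Koin's lazily created singles, instances are created on registration.
class Registry {
    private val instances = mutableMapOf&amp;lt;Pair&amp;lt;KClass&amp;lt;*&amp;gt;, String?&amp;gt;, Any&amp;gt;()

    fun &amp;lt;T : Any&amp;gt; register(type: KClass&amp;lt;T&amp;gt;, qualifier: String?, instance: T) {
        instances[type to qualifier] = instance
    }

    @Suppress("UNCHECKED_CAST")
    fun &amp;lt;T : Any&amp;gt; resolve(type: KClass&amp;lt;T&amp;gt;, qualifier: String?): T {
        val instance = instances[type to qualifier]
            ?: error("No ${type.simpleName} registered for qualifier $qualifier")
        return instance as T
    }

    inline fun &amp;lt;reified T : Any&amp;gt; single(qualifier: String? = null, create: () -&amp;gt; T) =
        register(T::class, qualifier, create())

    inline fun &amp;lt;reified T : Any&amp;gt; get(qualifier: String? = null): T =
        resolve(T::class, qualifier)
}

// The two helpers from above, rewritten against the toy registry.
inline fun &amp;lt;reified T : Any&amp;gt; Registry.singleByTenant(create: (Tenant) -&amp;gt; T) =
    SupportedTenant.values().forEach { tenant -&amp;gt; single(tenant.id) { create(tenant) } }

inline fun &amp;lt;reified T : Any&amp;gt; Registry.get(tenant: Tenant): T = get&amp;lt;T&amp;gt;(tenant.id)

class PromotionRepository(val tenant: Tenant)

class CalculationService(val tenant: Tenant, registry: Registry) {
    // Resolves the repository that was registered for the same tenant.
    val promotionRepository: PromotionRepository = registry.get(tenant)
}

fun main() {
    val registry = Registry()
    // Repositories first, because this toy registry instantiates eagerly.
    registry.singleByTenant { PromotionRepository(it) }
    registry.singleByTenant { CalculationService(it, registry) }

    val service = registry.get&amp;lt;CalculationService&amp;gt;(SupportedTenant.SATURN_DE)
    println(service.promotionRepository.tenant) // prints SATURN_DE
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;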



&lt;p&gt;During the initial design, we spent some days tweaking our multi-tenancy implementation, but since then we've had no issues and are still using it without major changes.&lt;/p&gt;

&lt;p&gt;And as our understanding of coroutines has grown, we have also been able to use our &lt;code&gt;TenantAware&lt;/code&gt; interface for elegant tenant-aware authorization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;inline&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;TenantAware&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;requireClaim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Claim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Check if current user has required `claim` for current `tenant`&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CouponService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;tenant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Tenant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;TenantAware&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;createCoupon&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;requireClaim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Claim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;COUPON_CREATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
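
&lt;p&gt;One way such a check could be filled in is sketched below. It assumes that the caller's claims travel in the coroutine context. The &lt;code&gt;UserClaims&lt;/code&gt; element, the &lt;code&gt;MissingClaimException&lt;/code&gt; and the tenant ids are hypothetical names invented for this example, and it needs &lt;code&gt;kotlinx-coroutines-core&lt;/code&gt; on the classpath:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.coroutineContext

// Simplified stand-ins for the types used above; the ids are hypothetical.
interface Tenant { val id: String }

enum class SupportedTenant(override val id: String) : Tenant {
    MEDIAMARKT_DE("mm-de"),
    SATURN_DE("sat-de")
}

interface TenantAware { val tenant: Tenant }

enum class Claim { COUPON_CREATE }

// Hypothetical context element: the caller's claims, grouped by tenant id.
class UserClaims(val claimsByTenant: Map&amp;lt;String, Set&amp;lt;Claim&amp;gt;&amp;gt;) :
    AbstractCoroutineContextElement(UserClaims) {
    companion object Key : CoroutineContext.Key&amp;lt;UserClaims&amp;gt;
}

class MissingClaimException(message: String) : RuntimeException(message)

suspend inline fun &amp;lt;T&amp;gt; TenantAware.requireClaim(claim: Claim, block: () -&amp;gt; T): T {
    // Look up the claims granted for this instance's tenant in the coroutine context.
    val granted = coroutineContext[UserClaims]?.claimsByTenant?.get(tenant.id).orEmpty()
    if (claim !in granted) {
        throw MissingClaimException("Missing $claim for tenant ${tenant.id}")
    }
    return block()
}

class CouponService(override val tenant: Tenant) : TenantAware {
    suspend fun createCoupon(): String = requireClaim(Claim.COUPON_CREATE) {
        "coupon for ${tenant.id}"
    }
}

fun main(): Unit = runBlocking {
    // The user may create coupons for MediaMarkt Germany only.
    val claims = UserClaims(mapOf("mm-de" to setOf(Claim.COUPON_CREATE)))
    withContext(claims) {
        println(CouponService(SupportedTenant.MEDIAMARKT_DE).createCoupon())
        try {
            CouponService(SupportedTenant.SATURN_DE).createCoupon()
        } catch (e: MissingClaimException) {
            println(e.message)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;requireClaim&lt;/code&gt; is a suspending function, every function calling it, like &lt;code&gt;createCoupon&lt;/code&gt; here, has to be a &lt;code&gt;suspend&lt;/code&gt; function as well.&lt;/p&gt;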



&lt;p&gt;Tell me what you think, and especially whether you see even more potential for improvement!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>kotlin</category>
      <category>architecture</category>
    </item>
    <item>
      <title>One Chart to rule them all</title>
      <dc:creator>Florian Heubeck</dc:creator>
      <pubDate>Tue, 30 Jan 2024 09:59:30 +0000</pubDate>
      <link>https://forem.com/mms-tech/one-chart-to-rule-them-all-4hdb</link>
      <guid>https://forem.com/mms-tech/one-chart-to-rule-them-all-4hdb</guid>
      <description>&lt;p&gt;&lt;em&gt;Describing Kubernetes deployments that follow best practices is quite verbose and maintenance-intensive. Doing that for multiple applications in various stages by different teams results in lots of yam(l)mer. Helm charts provide a convenient solution. For the best deployment experience, we're maintaining and using a generic application chart.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources over Resources
&lt;/h2&gt;

&lt;p&gt;Operating applications on Kubernetes beyond basic tutorials requires a few resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;em&gt;Deployment&lt;/em&gt; itself&lt;/li&gt;
&lt;li&gt;a &lt;em&gt;ServiceAccount&lt;/em&gt; most of the time&lt;/li&gt;
&lt;li&gt;one to many &lt;em&gt;ConfigMaps&lt;/em&gt; as well as &lt;em&gt;Secrets&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Service&lt;/em&gt; and &lt;em&gt;Ingress&lt;/em&gt; if someone shall interact with your deployment&lt;/li&gt;
&lt;li&gt;often a &lt;em&gt;[Horizontal&amp;amp;vert;Vertical]PodAutoscaler&lt;/em&gt; referring the deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As we've been practicing GitOps for quite some time with great success - you may have heard that 😇 - the manifests of all those resources have to be literally written into a Git repository.&lt;/p&gt;

&lt;p&gt;Mentioning GitOps... for automating rollouts there are even more (custom) resources required in order to instruct our GitOps controllers (Flux &amp;amp; Friends):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;em&gt;ImageRepository&lt;/em&gt; targeting the container registry&lt;/li&gt;
&lt;li&gt;an &lt;em&gt;ImagePolicy&lt;/em&gt; declaring version patterns to automate&lt;/li&gt;
&lt;li&gt;and eventually a &lt;em&gt;Canary&lt;/em&gt; as no one wants to observe rollouts manually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Observation - another great topic: To announce applications to the Prometheus monitoring stack, a &lt;em&gt;ServiceMonitor&lt;/em&gt; resource comes on top.&lt;/p&gt;

&lt;p&gt;Well... over time you'll probably add a &lt;em&gt;PodDisruptionBudget&lt;/em&gt; or want to have a service mesh in place, adding further resources, for instance &lt;em&gt;DestinationRules&lt;/em&gt; and &lt;em&gt;VirtualServices&lt;/em&gt; in case of Istio.&lt;br&gt;
Wait - with Istio in place, the &lt;em&gt;Ingress&lt;/em&gt; resource gets replaced by a &lt;em&gt;VirtualService&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;As we're constantly evolving, our added service mesh may get superseded by the Kubernetes Gateway API, introducing &lt;em&gt;Gateway&lt;/em&gt; or &lt;em&gt;HTTPRoute&lt;/em&gt; resources in favor of the Istio CRDs.&lt;/p&gt;

&lt;p&gt;Guess you got the point. Writing all those manifests isn't even the hardest part. They need to be maintained constantly. Kubernetes APIs come and go, with some version lifecycle in between.&lt;/p&gt;
&lt;h2&gt;
  
  
  Scaling pains
&lt;/h2&gt;

&lt;p&gt;Of course, some team has written all those resource manifests for the first application that was deployed. And they quickly put them into a Helm chart when it was about replication to multiple stages. The Helm chart could also be reused for further applications of that team, because - honestly - all applications ("microservices") are looking very similar.&lt;/p&gt;

&lt;p&gt;To ensure that changes to templated manifests don't break everything immediately, the team linted their Helm chart using different tools that hint at upcoming Kubernetes API changes or recommend best practices.&lt;/p&gt;

&lt;p&gt;That's all nice and would work out well, if there weren't other teams doing exactly the same, but differently. After a while, there were countless charts around that all served the same purpose and differed only slightly.&lt;/p&gt;

&lt;p&gt;Like everywhere else, reuse is practiced by duplication, but we weren't able to simply copy solutions between teams because of the minor differences in how our applications were described.&lt;/p&gt;
&lt;h2&gt;
  
  
  Combine and generalize
&lt;/h2&gt;

&lt;p&gt;So we did what was the obvious solution: We took the best from all those Helm charts and brewed a generic &lt;a href="https://github.com/MediaMarktSaturn/helm-charts/tree/main/charts/application" rel="noopener noreferrer"&gt;&lt;em&gt;application&lt;/em&gt; chart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This chart was even our first open-sourced component. It describes a typical application workload, incorporating the requirements of many, many product teams - and guess what: they're all nearly the same.&lt;/p&gt;

&lt;p&gt;Our application chart is tested against the latest seven Kubernetes versions (1.23 to 1.29 at the time of writing) in lots of variations using the &lt;a href="https://github.com/helm/chart-testing" rel="noopener noreferrer"&gt;Helm chart-testing project&lt;/a&gt;. Each permutation is also validated using a specialized Kubernetes manifest linter for best practices and security hardening.&lt;/p&gt;

&lt;p&gt;Every time someone requires something not yet possible, the chart is improved for the benefit of all. By now, there's quite an impressive feature list, which you can easily study by reading its &lt;a href="https://github.com/MediaMarktSaturn/helm-charts/blob/main/charts/application/values.yaml" rel="noopener noreferrer"&gt;values.yaml&lt;/a&gt;, which also serves as the chart's documentation. Some highlights are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Init container and sidecar configuration without hassle&lt;/li&gt;
&lt;li&gt;Google Kubernetes Engine (GKE) support on workload identity and ingress config&lt;/li&gt;
&lt;li&gt;Automated canary deployment support using &lt;a href="https://flagger.app/" rel="noopener noreferrer"&gt;Flagger&lt;/a&gt;, implemented using service meshes (Istio or Linkerd) as well as the Gateway API&lt;/li&gt;
&lt;li&gt;Various configuration, secret and volume mount options&lt;/li&gt;
&lt;li&gt;Job execution before rollout for preparation tasks like database schema updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All in all, the following [custom] resource manifests are managed by the chart (at the time of writing):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;flagger.app&lt;/em&gt; &lt;strong&gt;Canary&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;image.toolkit.fluxcd.io&lt;/em&gt; &lt;strong&gt;ImagePolicy&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;image.toolkit.fluxcd.io&lt;/em&gt; &lt;strong&gt;ImageRepository&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;cloud.google.com&lt;/em&gt; &lt;strong&gt;BackendConfig&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;networking.istio.io&lt;/em&gt; &lt;strong&gt;DestinationRule&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;networking.istio.io&lt;/em&gt; &lt;strong&gt;VirtualService&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ConfigMap&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;apps&lt;/em&gt; &lt;strong&gt;Deployment&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;autoscaling&lt;/em&gt; &lt;strong&gt;HorizontalPodAutoscaler&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;gateway.networking.k8s.io&lt;/em&gt; &lt;strong&gt;HTTPRoute&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;networking.k8s.io&lt;/em&gt; &lt;strong&gt;NetworkPolicy&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;policy&lt;/em&gt; &lt;strong&gt;PodDisruptionBudget&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;batch&lt;/em&gt; &lt;strong&gt;Job&lt;/strong&gt; (as &lt;code&gt;pre-install,pre-upgrade&lt;/code&gt; Helm hook)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secret&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ServiceAccount&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Service&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;monitoring.coreos.com&lt;/em&gt; &lt;strong&gt;ServiceMonitor&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Needless to say, the highest available API version is also respected, so full compatibility with all supported Kubernetes releases is ensured.&lt;/p&gt;
&lt;h2&gt;
  
  
  Sample time
&lt;/h2&gt;

&lt;p&gt;As stated earlier, this chart, which we're using for a wide spectrum of different types of applications on different Kubernetes versions, is available to everyone.&lt;/p&gt;

&lt;p&gt;The chart repository can be added using the Helm CLI&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add mms-tech https://helm-charts.mms.tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or Flux&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;source.toolkit.fluxcd.io/v1beta2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmRepository&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mms-tech&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;120m&lt;/span&gt;
  &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://helm-charts.mms.tech&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To give you an impression of its usage, this is how our (well-known and beloved) GitHub app &lt;a href="https://github.com/MediaMarktSaturn/technolinator" rel="noopener noreferrer"&gt;Technolinator&lt;/a&gt; is configured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;helm.toolkit.fluxcd.io/v2beta2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmRelease&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;technolinator&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;chart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;chart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;application&lt;/span&gt;
      &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~1"&lt;/span&gt;
      &lt;span class="na"&gt;sourceRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmRepository&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chart-repo&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60m&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
  &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/mediamarktsaturn/technolinator&lt;/span&gt;
      &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.48.11&lt;/span&gt; &lt;span class="c1"&gt;# {"$imagepolicy": "app:technolinator:tag"}&lt;/span&gt;
      &lt;span class="na"&gt;tagSemverRange&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~1"&lt;/span&gt;
    &lt;span class="na"&gt;secretEnvFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;technolinator-config&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
        &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10Gi&lt;/span&gt;
      &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6"&lt;/span&gt;
        &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;35Gi&lt;/span&gt;
    &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/q/health/live&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/q/health/ready&lt;/span&gt;
    &lt;span class="na"&gt;monitoring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;serviceMonitor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;metricsPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/q/metrics&lt;/span&gt;
    &lt;span class="na"&gt;podSecurityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;201&lt;/span&gt;
      &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;101&lt;/span&gt;
      &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;101&lt;/span&gt;
      &lt;span class="na"&gt;fsGroupChangePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data&lt;/span&gt;
        &lt;span class="na"&gt;pvcName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;technolinator-data&lt;/span&gt;
        &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data&lt;/span&gt;
    &lt;span class="na"&gt;configuration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;APP_ANALYSIS_TIMEOUT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;120M&lt;/span&gt;
      &lt;span class="na"&gt;APP_PROCESS_LOGLEVEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DEBUG&lt;/span&gt;
      &lt;span class="na"&gt;APP_PULL_REQUESTS_CONCURRENCY_LIMIT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again: All features are explorable in the &lt;a href="https://github.com/MediaMarktSaturn/helm-charts/blob/main/charts/application/values.yaml" rel="noopener noreferrer"&gt;values.yaml&lt;/a&gt;.&lt;br&gt;
We would be happy if this chart helps others as well, and we welcome contributions of any kind for further improvements and use cases that we have not yet considered.&lt;/p&gt;

&lt;h3&gt;
  
  
  The end
&lt;/h3&gt;

&lt;p&gt;There's a very divisive discussion about such generic (monster/mega) charts. Sure, unique applications provided by their own projects come with their own charts, and it makes no sense to squeeze them into a generic one.&lt;/p&gt;

&lt;p&gt;But in our opinion it's different for internal business applications of a company. Using our generic chart is a matter of hardening, reuse and easy knowledge sharing on application deployment configuration. The first one demanding a certain setup crafts it once, for everyone's benefit.&lt;/p&gt;

&lt;p&gt;Especially when it comes to repetitive complex configurations like sidecars used for integration with databases or database schema update jobs, there's no need to reinvent the wheel over and over.&lt;/p&gt;

&lt;p&gt;Reach out (for instance using GitHub &lt;a href="https://github.com/MediaMarktSaturn/helm-charts/issues" rel="noopener noreferrer"&gt;issues&lt;/a&gt; or &lt;a href="https://github.com/MediaMarktSaturn/helm-charts/discussions" rel="noopener noreferrer"&gt;discussions&lt;/a&gt;), if you'd like to discuss with us.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>helm</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>gitops</category>
    </item>
    <item>
      <title>Software Supply Chain Awareness at Scale</title>
      <dc:creator>Florian Heubeck</dc:creator>
      <pubDate>Tue, 30 Jan 2024 09:57:22 +0000</pubDate>
      <link>https://forem.com/mms-tech/software-supply-chain-awareness-at-scale-150j</link>
      <guid>https://forem.com/mms-tech/software-supply-chain-awareness-at-scale-150j</guid>
      <description>&lt;p&gt;&lt;em&gt;Securing the software supply chain is hip nowadays.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About Dependencies
&lt;/h2&gt;

&lt;p&gt;You know these jokes about Windows users fearing updates, macOS disciples paying for updates and Linux nerds awaiting updates?&lt;br&gt;
I've always loved updates, even long before I became a Linux user. New versions ship new features, fix bugs, look better (sometimes), run faster (hopefully) and recently: provide security.&lt;/p&gt;

&lt;p&gt;Some eye-opening examples from recent history raised awareness, be it well-intended features becoming exploitable, as with "log4shell", or hijacked npm packages like "coa".&lt;br&gt;
It's obviously not only necessary to keep dependencies up-to-date, but also to observe them for potential vulnerabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Software Bill of Materials
&lt;/h2&gt;

&lt;p&gt;To achieve supply chain security, we first of all need to take inventory. We need to know which components we depend on.&lt;br&gt;
That's what the so-called SBOM is about. It lists the ingredients a system is made of, with lots of metadata like checksums or licenses attached.&lt;br&gt;
There are two major formats out there, "&lt;a href="https://spdx.dev/" rel="noopener noreferrer"&gt;SPDX&lt;/a&gt;" driven by the Linux Foundation, and "&lt;a href="https://cyclonedx.org/" rel="noopener noreferrer"&gt;CycloneDX&lt;/a&gt;" by &lt;a href="https://owasp.org/" rel="noopener noreferrer"&gt;OWASP&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  OWASP Projects
&lt;/h2&gt;

&lt;p&gt;The CycloneDX standard shines with better tool support, mainly because OWASP provides lots of tooling around it. There are plugins for nearly every programming language or build framework to create an SBOM file. And there's a kind of orchestration tool named &lt;a href="https://github.com/CycloneDX/cdxgen" rel="noopener noreferrer"&gt;cdxgen&lt;/a&gt; that recently joined the official CycloneDX umbrella.&lt;/p&gt;

&lt;p&gt;Cdxgen wraps around ~30 different &lt;em&gt;&lt;a href="https://github.com/CycloneDX/cdxgen#supported-languages-and-package-format" rel="noopener noreferrer"&gt;things&lt;/a&gt;&lt;/em&gt; (at the time of writing) to create an SBOM for nearly every project with a single command.&lt;br&gt;
That's very useful for a heterogeneous project zoo like in an enterprise environment.&lt;/p&gt;

&lt;p&gt;Another great tool provided by OWASP is &lt;a href="https://dependencytrack.org/" rel="noopener noreferrer"&gt;Dependency-Track&lt;/a&gt; which consumes SBOMs, visualizes projects and dependencies and makes them searchable.&lt;br&gt;
But that's not all: Dependency-Track also consumes vulnerability databases from public as well as commercial services and matches CVEs to the dependencies of our projects, AND scans artifact repositories for updates of our dependencies 🤯.&lt;/p&gt;

&lt;p&gt;That provides us with risk assessment at a glance together with precise pointers where to take action.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Ends
&lt;/h2&gt;

&lt;p&gt;Building a holistic inventory of all dependencies for an entire organization would require each and every project to create and upload an SBOM. While this is of course possible, it's not very efficient and comes with some management overhead of distributing Dependency-Track API keys.&lt;/p&gt;
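&lt;p&gt;To make that overhead tangible, here is a minimal sketch of what each repository would have to carry, written as a GitHub Actions workflow. The workflow name, the branch name &lt;code&gt;main&lt;/code&gt; and the secrets &lt;code&gt;DTRACK_URL&lt;/code&gt; and &lt;code&gt;DTRACK_API_KEY&lt;/code&gt; are assumptions for illustration, not part of our setup. It runs cdxgen via &lt;code&gt;npx&lt;/code&gt; and uploads the result to the Dependency-Track BOM upload endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical per-repository workflow; all names and secrets are placeholders
name: sbom
on:
  push:
    branches: [main]
jobs:
  sbom:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Create SBOM with cdxgen
        run: npx --yes @cyclonedx/cdxgen -o bom.json
      - name: Upload SBOM to Dependency-Track
        env:
          DTRACK_URL: ${{ secrets.DTRACK_URL }}
          DTRACK_API_KEY: ${{ secrets.DTRACK_API_KEY }}
        run: |
          curl -sf -X POST "${DTRACK_URL}/api/v1/bom" \
            -H "X-Api-Key: ${DTRACK_API_KEY}" \
            -F "autoCreate=true" \
            -F "projectName=${GITHUB_REPOSITORY}" \
            -F "projectVersion=${GITHUB_REF_NAME}" \
            -F "bom=@bom.json"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every repository would need this file plus access to an API key, and every change to the process would have to be rolled out to all of them.&lt;/p&gt;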

&lt;p&gt;That's where our GitHub app &lt;a href="https://github.com/MediaMarktSaturn/technolinator" rel="noopener noreferrer"&gt;Technolinator&lt;/a&gt; comes in. GitHub apps can be installed on organization level and instrument all contained repositories.&lt;br&gt;
Technolinator wraps around cdxgen and gets notified on GitHub &lt;em&gt;push&lt;/em&gt; events. For every update on a repository's default branch, a fresh SBOM is created and uploaded to Dependency-Track:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgh3xc2k539wfmt3e86va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgh3xc2k539wfmt3e86va.png" alt="Image description" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback about the process is provided as commit status:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik25tg1jhzcbqcrxu5ff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik25tg1jhzcbqcrxu5ff.png" alt="Image description" width="549" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using this approach we are able to efficiently create an inventory of all our dependencies, and in addition provide all teams some insights on potential risks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hello Open-Source World
&lt;/h2&gt;

&lt;p&gt;Since our Dependency-Track installation shows thousands of open-source projects, and this solution of course also makes use of awesome projects, we are happy to announce that Technolinator became the first open-source software project we provide as MediaMarktSaturn Technology, together with a public &lt;a href="https://github.com/MediaMarktSaturn/helm-charts" rel="noopener noreferrer"&gt;Helm chart&lt;/a&gt; repository that contains our Dependency-Track configuration as a publicly available chart.&lt;/p&gt;

&lt;p&gt;Just have a look into Technolinator - whose name has no meaning beyond its beautiful sound. It's easy to adapt to your needs, and its documentation already contains everything to get started. If you have questions, just start a discussion or file an issue.&lt;br&gt;
We hope to help other organizations solve software supply chain issues using Technolinator as well, and of course welcome any contribution.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>sbom</category>
      <category>github</category>
      <category>softwaresupplychain</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Reliable Application Deployments in a GitOps Setup with Flagger</title>
      <dc:creator>Florian Heubeck</dc:creator>
      <pubDate>Tue, 30 Jan 2024 09:51:06 +0000</pubDate>
      <link>https://forem.com/mms-tech/reliable-application-deployments-in-a-gitops-setup-with-flagger-5hjm</link>
      <guid>https://forem.com/mms-tech/reliable-application-deployments-in-a-gitops-setup-with-flagger-5hjm</guid>
      <description>&lt;p&gt;&lt;em&gt;The ultimate goal of every GitOps setup is complete automation. For a reliable, headless application rollout, the perfect supplement to our Flux managed Kubernetes resources is &lt;a href="https://flagger.app" rel="noopener noreferrer"&gt;Flagger&lt;/a&gt;. In this blog, you will learn how to not worry about application deployments at any time, even on Fridays.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is the second of two articles accompanying my talk about this topic at the &lt;a href="https://www.mastering-gitops.de/veranstaltung-15616-se-0-die-gitops-delivery-pipeline-ueberwachen-und-haerten-mit-flux-%26-flagger.html" rel="noopener noreferrer"&gt;Mastering GitOps&lt;/a&gt; conference.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Having continuous delivery in place means that new versions (that passed the fully automated continuous integration (CI), of course) automatically get deployed - ideally also to production.&lt;br&gt;
&lt;a href="https://dev.to/mms-tech/monitoring-and-hardening-the-gitops-delivery-pipeline-with-flux-1gk"&gt;In the related blog article about how to monitor and harden the GitOps setup itself&lt;/a&gt;, our goal was to reduce risk of failure and build alerting for actual errors, so that no one has to actively monitor the rollout of changes.&lt;/p&gt;

&lt;p&gt;All of this already makes operations more relaxed, but we may still be forced to react immediately to mitigate customer impact in case of an incident.&lt;br&gt;
For this reason, we want to protect our business applications even better against failures caused by changes.&lt;br&gt;
There are lots of deployment strategies out there to ensure smooth changes and early problem discovery - and for applying them in our Kubernetes GitOps setup in an automated way, we're using Flagger.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deployment sources of error
&lt;/h2&gt;

&lt;p&gt;A typical business application may fail in production for mainly two reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuration errors&lt;/li&gt;
&lt;li&gt;Software problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first is obvious: Configuration is different for each environment. And most of the time, an application will face the production config first when hitting production. When GitOps is adopted properly, your application configuration is held under version control and passes the four-eyes principle before being applied - this reduces the risk of error considerably.&lt;/p&gt;

&lt;p&gt;Software problems should be discovered long before production, you would say. True, for sure. But it is just as true that no place is comparable to production. Data, stored and (possibly) migrated over years or decades. Traffic profiles or event series, impossible to create artificially during CI. And of course: Users. Creative and ingenious. Say no more, we're all doing our best to ensure quality, but we have to handle the unforeseeable.&lt;/p&gt;

&lt;p&gt;When doing changes to our systems, we should apply them carefully, maybe run some validations and possibly even give them a circumspect try before rolling out completely.&lt;/p&gt;
&lt;h2&gt;
  
  
  Application shadows
&lt;/h2&gt;

&lt;p&gt;So far, I probably haven't told you anything new. The relevant part is that we don't want to do any of that manually. Why invest in universal automation, but risk that the most important step fails and requires manual remediation?&lt;/p&gt;

&lt;p&gt;Let's have a look at a typical setup in our context:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkztqm3dthikhwmwvht0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkztqm3dthikhwmwvht0u.png" alt="Image description" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's a deployment wrapping our business application, ConfigMap and Secret are mounted for injecting configuration. Most likely there's a HorizontalPodAutoscaler (HPA) or a ScaledObject (SO) definition as well.&lt;br&gt;
The application publishes metrics, scraped by Prometheus, and of course there are metrics available from the ingress and Kubernetes itself as well.&lt;/p&gt;

&lt;p&gt;On every update of the deployment declaration, Kubernetes runs a rolling update by default. That means new pods are created, and when they become ready, old pods are terminated. Assurance of the software's well-being has to be built into the health probes.&lt;/p&gt;
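&lt;p&gt;As a minimal sketch of such a rolling update with health probes, a Deployment could look roughly like this. The name &lt;code&gt;my-app&lt;/code&gt;, the image, the replica count and the probe paths are placeholders chosen for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: name, image, replicas and probe paths are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one additional pod during the update
      maxUnavailable: 0  # old pods are only removed once replacements are ready
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.2.3
          ports:
            - containerPort: 8080
          readinessProbe:  # gates traffic and the progress of the rollout
            httpGet:
              path: /q/health/ready
              port: 8080
          livenessProbe:   # restarts the container when it keeps failing
            httpGet:
              path: /q/health/live
              port: 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this setup, the readiness probe is the only gate a new version has to pass, so it can only catch what the probe endpoint actually checks.&lt;/p&gt;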

&lt;p&gt;When configuration changes, how the change takes effect depends on the way the configuration is consumed. Environment variables only change on pod creation or restart. Mounted files are updated on the fly, but the application also needs to re-read them.&lt;/p&gt;
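&lt;p&gt;The two ways of consuming configuration can be sketched in a pod template like this. The ConfigMap names &lt;code&gt;my-app-env&lt;/code&gt; and &lt;code&gt;my-app-files&lt;/code&gt;, the image and the mount path are assumptions for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Fragment of a Deployment; ConfigMap names, image and paths are placeholders
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.2.3
          envFrom:
            - configMapRef:
                name: my-app-env   # values are read once when the container starts
          volumeMounts:
            - name: config
              mountPath: /config   # files are refreshed in the running pod
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: my-app-files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that files mounted with &lt;code&gt;subPath&lt;/code&gt; do not receive such updates, so those behave like environment values in this respect.&lt;/p&gt;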

&lt;p&gt;Since our application manifests are packaged as a Helm chart anyway, we can trigger a re-deployment on configuration change by annotating checksums of ConfigMap and Secret into the pod template of our deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;checksum/config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include (print $.Template.BasePath "/configmap.yaml") . | sha256sum&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
        &lt;span class="na"&gt;checksum/secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;include (print $.Template.BasePath "/secret.yaml") . | sha256sum&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we're prepared for introducing Flagger. Flagger is configured by a so-called &lt;a href="https://docs.flagger.app/usage/how-it-works#canary-resource" rel="noopener noreferrer"&gt;&lt;code&gt;Canary&lt;/code&gt; CRD&lt;/a&gt; that instructs it to take care of a defined target, in our case the deployment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd1lyg50zfe1su1sfrgd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd1lyg50zfe1su1sfrgd.png" alt="Image description" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basically, Flagger duplicates this target, its mounted resources and autoscaler objects referenced by the Canary CR.&lt;br&gt;
Once the copy is ready, Flagger instructs the Ingress to route the traffic to the copy, the so-called "primary" deployment.&lt;br&gt;
The last step of this initialization phase is to scale down the original deployment, now called "canary".&lt;/p&gt;
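&lt;p&gt;A minimal &lt;code&gt;Canary&lt;/code&gt; resource for such a target might look like the following sketch. The name &lt;code&gt;my-app&lt;/code&gt;, the port and the timings are placeholders, and the provider is set to &lt;code&gt;kubernetes&lt;/code&gt; as in Flagger's blue/green tutorial. Our actual provider and Ingress wiring are not shown here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: names, port and timings are placeholders
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  provider: kubernetes
  targetRef:              # the deployment Flagger takes care of
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  autoscalerRef:          # optional, gets duplicated for the primary as well
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: my-app
  service:
    port: 8080
  analysis:
    interval: 1m          # time between two checks
    threshold: 5          # failed checks tolerated before rolling back
    iterations: 10        # number of checks before promotion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;From a definition like this, Flagger creates the &lt;code&gt;my-app-primary&lt;/code&gt; deployment and the accompanying services on its own.&lt;/p&gt;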

&lt;p&gt;This seems to be very complicated, what do we gain from it?&lt;br&gt;
First of all some psychological safety 😉 - all the traffic targeting our application hits a copy of our original definition. Changing the definitions has exactly zero immediate impact on our users. And don't worry, the complexity will disappear as it's transparently handled by Flagger.&lt;/p&gt;
&lt;h2&gt;
  
  
  The cycle of deployment life
&lt;/h2&gt;

&lt;p&gt;Do you know how heavily frequented road or rail bridges are replaced? A new bridge is built alongside the old one, and the traffic is shifted over once it's complete.&lt;br&gt;
The additional bridge might be a temporary one, to free up the old bridge for its reconstruction.&lt;br&gt;
This is exactly how Flagger works.&lt;/p&gt;

&lt;p&gt;On changes to the Kubernetes manifest of our Deployment, Flagger reacts by scaling it up, waiting for it to become ready and running predefined &lt;a href="https://docs.flagger.app/usage/webhooks" rel="noopener noreferrer"&gt;actions&lt;/a&gt;.&lt;br&gt;
These actions are basically HTTP calls along the process of rollout. With that, Flagger can trigger for instance automated tests, validating the new deployment. Flagger not only fires these webhooks, it also evaluates the response code. It will not proceed with the next step of the rollout if configured webhooks did not succeed.&lt;br&gt;
We're using those webhooks for request generation on non-production environments to simulate an application upgrade under load.&lt;/p&gt;
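&lt;p&gt;As a sketch of how such webhooks are declared, the following fragment of a Canary spec runs a check before the rollout and generates load during the analysis. It assumes Flagger's load tester is installed and reachable at &lt;code&gt;flagger-loadtester.test&lt;/code&gt;. That URL, the service name &lt;code&gt;my-app-canary.my-namespace&lt;/code&gt; and the commands are placeholders, not our actual configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Fragment of a Canary spec; URLs, names and commands are placeholders
spec:
  analysis:
    webhooks:
      - name: acceptance-test
        type: pre-rollout   # has to succeed before the analysis starts
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sf http://my-app-canary.my-namespace:8080/q/health/ready"
      - name: load-test
        type: rollout       # called on every iteration of the analysis
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.my-namespace:8080/"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Any HTTP endpoint can take the place of the load tester, as long as it answers with a success status code when the rollout may continue.&lt;/p&gt;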

&lt;p&gt;Of course, not every application only serves requests; some do batch processing or connect to event sources. These kinds of deployments can also be rolled out progressively using Flagger. As requests cannot be routed by modifying Ingress rules in this case, Flagger's webhooks have to be used for guiding the application through the deployment lifecycle.&lt;/p&gt;

&lt;p&gt;Eventually, Flagger will synchronize its copy of the Kubernetes resources, completing the rollout, and scale down our original deployment again. This deployment strategy is referred to as &lt;a href="https://docs.flagger.app/tutorials/kubernetes-blue-green" rel="noopener noreferrer"&gt;blue/green&lt;/a&gt; and already lets us sleep very well, since any issue with the change being rolled out will abort the rollout, and no users will be impacted.&lt;/p&gt;

&lt;p&gt;Flagger provides metrics reporting the state of all Canary definitions, which we can use for alerting: &lt;code&gt;flagger_canary_status&lt;/code&gt;.&lt;br&gt;
We've configured a PrometheusRule that triggers a chat message containing some useful information about the failed deployment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focrdqla8fjl2zei2oueo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focrdqla8fjl2zei2oueo.png" alt="Image description" width="497" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, a notification from Flagger itself is included as well, as it can not only trigger webhooks, but also &lt;a href="https://docs.flagger.app/usage/alerting" rel="noopener noreferrer"&gt;"alert"&lt;/a&gt; humans.&lt;br&gt;
The "CanaryRollback" rule is defined as follows and uses some custom metrics of our application to provide more details (yeah, we're proud of those PromQL queries 😇):&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PrometheusRule&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CanaryRollback&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flagger_canary_status &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Canary&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;failed"&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
              &lt;span class="s"&gt;Canary deployment of version&lt;/span&gt;
              &lt;span class="s"&gt;{{ with query (printf "max_over_time(promotion_service_major_version{job=\"%s-canary\"}[2h])" $labels.name) }}{{ . | first | value | humanize }}{{ end }}.{{ with query (printf "max_over_time(promotion_service_minor_version{job=\"%s-canary\"}[2h])" $labels.name) }}{{ . | first | value | humanize }}{{ end }}.{{ with query (printf "max_over_time(promotion_service_patch_version{job=\"%s-canary\"}[2h])" $labels.name) }}{{ . | first | value | humanize }}{{ end }}&lt;/span&gt;
              &lt;span class="s"&gt;to {{ $labels.name }}.{{ $labels.exported_namespace }} failed.&lt;/span&gt;
            &lt;span class="na"&gt;canaryVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(printf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"max_over_time(promotion_service_major_version{job=\"%s-canary\"}[2h])"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}.{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(printf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"max_over_time(promotion_service_minor_version{job=\"%s-canary\"}[2h])"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span 
class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}.{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(printf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"max_over_time(promotion_service_patch_version{job=\"%s-canary\"}[2h])"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span 
class="s"&gt;end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}'&lt;/span&gt;
            &lt;span class="na"&gt;primaryVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(printf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"max(promotion_service_major_version{job=\"%s-primary\"})"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}.{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(printf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"max(promotion_service_minor_version{job=\"%s-primary\"})"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; 
&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}.{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(printf&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"max(promotion_service_patch_version{job=\"%s-primary\"})"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;humanize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span 
class="s"&gt;}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  The proof of the pudding is in the eating
&lt;/h2&gt;

&lt;p&gt;Our updated application seems to feel quite comfortable in production by now, but why throw it into the cold water of production traffic?&lt;br&gt;
As you may have guessed from the CRD name "Canary", Flagger's main purpose is the automated progressive rollout of applications.&lt;br&gt;
To make the best use of that functionality, let's give Flagger fine-grained traffic control by adding a service mesh:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7efjft6apd2ld20n5toe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7efjft6apd2ld20n5toe.png" alt="Image description" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Flagger supports all major service mesh implementations and can also perform progressive rollouts using the most common ingress gateways. I created a &lt;a href="https://github.com/heubeck/cloudland-canary" rel="noopener noreferrer"&gt;playground setup&lt;/a&gt; using &lt;a href="https://docs.flagger.app/tutorials/linkerd-progressive-delivery" rel="noopener noreferrer"&gt;Linkerd&lt;/a&gt; some time ago, which offers an easy first hands-on experience.&lt;/p&gt;

&lt;p&gt;With a service mesh or an ingress gateway supporting weighted routing, Flagger can route a small share of the traffic to our updated canary deployment and increase that share stepwise according to the &lt;a href="https://docs.flagger.app/usage/deployment-strategies#rollout-weights" rel="noopener noreferrer"&gt;Canary definition&lt;/a&gt;. But Flagger not only opens the floodgates slowly, giving our new deployment a chance to warm up, it also cares about our application's well-being.&lt;/p&gt;

&lt;p&gt;Flagger &lt;a href="https://docs.flagger.app/usage/metrics" rel="noopener noreferrer"&gt;analyses metrics&lt;/a&gt; provided by the service mesh or the ingress gateway to decide whether to continue rolling out the new application or to roll it back. Many metric definitions are already built in and selected according to the chosen mesh or ingress implementation, but custom metrics can easily be added as well.&lt;br&gt;
If there are issues with the changed canary version, only a small number of requests is affected, and depending on the analysis configuration, the change is rolled back quickly.&lt;/p&gt;
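
&lt;p&gt;As a minimal sketch of such a stepwise traffic shift combined with built-in metric checks, the analysis section of a Canary could look like the following. All numbers are illustrative assumptions, not the values we run in production:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: flagger.app/v1beta1
kind: Canary
spec:
  analysis:
    # Time between two analysis runs and weight increases (assumed value)
    interval: 1m
    # Number of failed metric checks before the canary is rolled back (assumed value)
    threshold: 5
    # Shift another 10 percent of the traffic to the canary per step ...
    stepWeight: 10
    # ... until the canary receives half of the traffic
    maxWeight: 50
    metrics:
      # Built-in metric: share of successful requests in percent
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      # Built-in metric: request duration in milliseconds
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;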

&lt;p&gt;The key is to express the relevant aspects properly in metrics: from the application itself, but also from infrastructure components like databases (e.g. open connections) and especially from service mesh or ingress components, which can provide the client's view of your application.&lt;br&gt;
Depending on what you consider a well-working application, business metrics from the application could make sense, but so could comparative metrics like "the updated application must not be slower than the old one", within some tolerance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flagger.app/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MetricTemplate&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;request-duration-factor-pfifty&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus&lt;/span&gt;
        &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://prometheus-operated.monitoring:9090&lt;/span&gt;
    &lt;span class="c1"&gt;# Factor of P50 request duration canary to primary&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;histogram_quantile(0.50,&lt;/span&gt;
          &lt;span class="s"&gt;sum(&lt;/span&gt;
            &lt;span class="s"&gt;irate(&lt;/span&gt;
              &lt;span class="s"&gt;istio_request_duration_milliseconds_bucket{&lt;/span&gt;
                &lt;span class="s"&gt;reporter="destination",&lt;/span&gt;
                &lt;span class="s"&gt;destination_workload_namespace="{{ namespace }}",&lt;/span&gt;
                &lt;span class="s"&gt;destination_workload="{{ target }}"&lt;/span&gt;
            &lt;span class="s"&gt;}[{{ interval }}]&lt;/span&gt;
        &lt;span class="s"&gt;)) by (le))&lt;/span&gt;
        &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="s"&gt;histogram_quantile(0.50,&lt;/span&gt;
          &lt;span class="s"&gt;sum(&lt;/span&gt;
            &lt;span class="s"&gt;irate(&lt;/span&gt;
              &lt;span class="s"&gt;istio_request_duration_milliseconds_bucket{&lt;/span&gt;
                &lt;span class="s"&gt;reporter="destination",&lt;/span&gt;
                &lt;span class="s"&gt;destination_workload_namespace="{{ namespace }}",&lt;/span&gt;
                &lt;span class="s"&gt;destination_workload="{{ target }}-primary"&lt;/span&gt;
            &lt;span class="s"&gt;}[{{ interval }}]&lt;/span&gt;
        &lt;span class="s"&gt;)) by (le))&lt;/span&gt;
&lt;span class="s"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flagger.app/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Canary&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;analysis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;p50-factor&lt;/span&gt;
        &lt;span class="na"&gt;templateRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;request-duration-factor-pfifty&lt;/span&gt;
        &lt;span class="na"&gt;thresholdRange&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Everything is better in graphics
&lt;/h2&gt;

&lt;p&gt;In the beginning, I said that no one wants to watch changes being rolled out, but would rather rely on automation and alerting.&lt;br&gt;
Call me a pretender, but even with all of this in place, and the confidence that nothing bad will happen, I still like to watch changes being introduced to production:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gzwgnqwhj4xwf3t6csv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gzwgnqwhj4xwf3t6csv.png" alt="Image description" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8eo8hsbgasy4phpr8l9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8eo8hsbgasy4phpr8l9u.png" alt="Image description" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These Grafana panels visualize how traffic gets shifted from the primary deployment (Flagger's copy of the previous version), shown above the zero line, to the updated canary deployment, shown below the zero line, and back.&lt;/p&gt;

&lt;p&gt;We were asked why the roles of primary and canary aren't simply swapped after the traffic has been shifted completely. We're not aware of an official statement by the Flagger team, but my guess is: for simplicity. For the canary to become the primary after a successful analysis, another abstraction would be required. The way Flagger works allows us to just work with the plain Kubernetes app resources.&lt;/p&gt;

&lt;p&gt;As you can see in the traffic weight panel above, we shift 100% of the traffic to the canary before Flagger synchronizes its primary copy. This is not necessary, as the synchronization can be done at any weight, but it gives us some benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No sudden traffic impact on the newly created pods of the updated primary deployment&lt;/li&gt;
&lt;li&gt;Slow warm up of the pods, either implicit (built-in) or explicit (using Flagger webhooks)&lt;/li&gt;
&lt;li&gt;Better resource utilization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some words about the latter: Let's assume our application has a high resource demand because it's heavily scaled out. During a canary rollout, this demand would double, which might be a waste of resources or not even possible because of a restricted cluster size. With separate autoscaling configurations (HPA/SO), the canary can extend its resource demand during the traffic shift, whereas the primary can shrink. This way, the overall resource demand doesn't exceed that of a standard Kubernetes rolling upgrade.&lt;/p&gt;
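
&lt;p&gt;A hedged sketch of how such separate autoscaling bounds might be declared with an HPA: it assumes a Flagger version that supports &lt;code&gt;primaryScalerReplicas&lt;/code&gt; in the &lt;code&gt;autoscalerRef&lt;/code&gt;, and the name &lt;code&gt;my-app&lt;/code&gt; as well as the replica counts are placeholders, not our actual configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: flagger.app/v1beta1
kind: Canary
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app # placeholder name
  autoscalerRef:
    # The HPA referenced here scales the canary deployment
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: my-app # placeholder name
    # Replica bounds for the HPA copy that Flagger maintains for the primary
    primaryScalerReplicas:
      minReplicas: 2
      maxReplicas: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;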

&lt;h2&gt;
  
  
  Sleep well
&lt;/h2&gt;

&lt;p&gt;When starting with Flagger, you need to be aware that the new and the old versions of your application will run in parallel for some time and that both will handle traffic or process data.&lt;br&gt;
This can cause issues with database schema updates or server-side client state. Flagger supports conditional traffic routing, as used for A/B testing, but you might be restricted in the extent to which you can do progressive rollouts.&lt;br&gt;
Regarding database or API schema changes: during a default Kubernetes rolling upgrade, new and old application versions also run in parallel, which raises the exact same issues, albeit in a shorter time frame.&lt;br&gt;
Another variation of progressive delivery, especially for frontend applications where randomly serving the new and old application versions may have user impact, can be achieved using &lt;a href="https://docs.flagger.app/usage/deployment-strategies#canary-release-with-session-affinity" rel="noopener noreferrer"&gt;session affinity&lt;/a&gt;. In this configuration, Flagger uses cookies to stick to the new application version once a user has hit it.&lt;br&gt;
So far, there's no support for StatefulSets in Flagger, but it is being considered. This might extend the opportunities for adopting Flagger to pickier workloads like databases.&lt;/p&gt;
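
&lt;p&gt;For the session affinity variant mentioned above, a minimal sketch of the relevant Canary analysis section might look like this. The cookie name and all numbers are illustrative assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: flagger.app/v1beta1
kind: Canary
spec:
  analysis:
    interval: 1m
    threshold: 5
    stepWeight: 10
    maxWeight: 50
    sessionAffinity:
      # Cookie that pins a user to the canary once they were routed to it
      cookieName: flagger-cookie
      # Lifetime of the cookie in seconds (assumed value)
      maxAge: 21600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;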

&lt;p&gt;Retrospectively, Flagger has already saved us from some outages, and there are lots of ideas to explore on how to adopt it for workloads that aren't classically request/response driven. In any case, we are reckless enough to commit changes on a Friday evening.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>gitops</category>
      <category>canary</category>
      <category>continuousdelivery</category>
    </item>
    <item>
      <title>Monitoring and Hardening the GitOps Delivery Pipeline with Flux</title>
      <dc:creator>Florian Heubeck</dc:creator>
      <pubDate>Tue, 30 Jan 2024 09:50:24 +0000</pubDate>
      <link>https://forem.com/mms-tech/monitoring-and-hardening-the-gitops-delivery-pipeline-with-flux-1gk</link>
      <guid>https://forem.com/mms-tech/monitoring-and-hardening-the-gitops-delivery-pipeline-with-flux-1gk</guid>
      <description>&lt;p&gt;&lt;em&gt;The ultimate goal of every GitOps setup is complete automation. For being able to operate a system hands-off, its monitoring and alerting has to be reliable and comprehensive. In this blog, you will learn how to monitor a &lt;a href="https://fluxcd.io/" rel="noopener noreferrer"&gt;FluxCD&lt;/a&gt; operated GitOps setup on Kubernetes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is the first of two accompanying articles to my talk about this topic at the &lt;a href="https://www.mastering-gitops.de/veranstaltung-15616-se-0-die-gitops-delivery-pipeline-ueberwachen-und-haerten-mit-flux-%26-flagger.html" rel="noopener noreferrer"&gt;Mastering GitOps&lt;/a&gt; conference.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update July 2025: Article was reworked for my &lt;a href="https://meine.doag.org/events/cloudland/2025/agenda/#agendaId.5843" rel="noopener noreferrer"&gt;CloudLand Festival talk&lt;/a&gt; and is up-to-date for Flux 2.6&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About automation
&lt;/h2&gt;

&lt;p&gt;With GitOps we try to eliminate any manual task from infrastructure handling and application operation. (Virtual) infrastructure is described declaratively and rolled out by &lt;a href="https://medium.com/mediamarktsaturn-tech-blog/terraforming-mediamarktsaturn-951b2a73551d" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt; or similar tools. Application manifests are templated and bundled as &lt;a href="https://helm.sh/" rel="noopener noreferrer"&gt;Helm&lt;/a&gt; charts or &lt;a href="https://kustomize.io/" rel="noopener noreferrer"&gt;Kustomizations&lt;/a&gt;. And when using GitOps operators like Flux, all of these are sourced from VCS repositories (mostly Git) and pulled directly or indirectly (e.g. via Helm repositories or OCI registries) to Kubernetes.&lt;/p&gt;

&lt;p&gt;There's nothing left for us to do but declare and observe, is there?&lt;br&gt;
Well, when everything around configuration and application delivery is automated, so that no one has to get their hands dirty, there's no reason to stop automating after the Kubernetes resources have (potentially) been created.&lt;/p&gt;

&lt;p&gt;In our early GitOps days back in 2019, using Flux 1, we spent a lot of time screening logs to find the reason for "nothing happened after change". There was &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt; in place, then as now, but we were not happy with the provided default metrics. So we created a Kubernetes-Operator/Prometheus-Exporter thing that attached to the log stream of chosen workloads and derived metrics from patterns applied to the log text. That was quite fun to build, but absolutely not our core business.&lt;br&gt;
Fortunately the Flux stack nowadays has monitoring and alerting capabilities built in, exceeding the imagination of our former self.&lt;/p&gt;

&lt;p&gt;Back to the topic: When we're on call, there should be no reason to do anything else but carry our mobile phones around. And during daily business, when providing new application versions or configuration changes, it must not be necessary to observe their rollout. Any issue has to be reported, or even mitigated, by the system proactively. Silence means "everything's fine", not "no idea what's happening".&lt;/p&gt;

&lt;p&gt;Since with GitOps there's no classic delivery pipeline that may fail, but rather a continuous reconciliation of current and desired state, the main objectives of monitoring and hardening the setup are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validate desired state changes upfront&lt;/li&gt;
&lt;li&gt;Detect and report any persistent deviation of current from desired state&lt;/li&gt;
&lt;li&gt;Reduce the possible blast radius of problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first topic refers to automated checking and linting of changes targeting the desired state. That's vital, but concrete implementations are very much dependent on the setup itself. On pull/merge-requests, we use a combination of generic YAML linting, validation of Kubernetes manifests utilizing &lt;a href="https://www.checkov.io" rel="noopener noreferrer"&gt;checkov&lt;/a&gt; and Helm &lt;a href="https://github.com/helm/chart-testing/blob/main/doc/ct.md" rel="noopener noreferrer"&gt;chart-testing&lt;/a&gt;, and where possible real installations on &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the following, we'd like to focus on monitoring and alerting for the time after configuration changes have been applied.&lt;/p&gt;
&lt;h2&gt;
  
  
  Flux resources and their status
&lt;/h2&gt;

&lt;p&gt;Let's start with a brief overview on Flux' &lt;a href="https://fluxcd.io/flux/components/" rel="noopener noreferrer"&gt;components&lt;/a&gt;, its custom resource definitions (CRDs) and their relations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F238cc7ao8nrs16q7spvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F238cc7ao8nrs16q7spvo.png" alt="Flux components and CRDs" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As is very common, Kubernetes operators are configured by their respective custom resources (CRs). As Flux, aka the "GitOps Toolkit", is composed of multiple operators, they even exchange configuration via CRs. Not all relationships are visualized in this diagram, as that's not required for its purpose.&lt;/p&gt;

&lt;p&gt;One of the many things that I really like about Flux is its consistent and verbose reflection of errors as part of the Kubernetes resource status and events. All operators write back the result of their actions to the respective CR.&lt;/p&gt;

&lt;p&gt;A properly fetched Helm chart may look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8c39w6cmypjlq55sa2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8c39w6cmypjlq55sa2g.png" alt="kubectl get helmchart" width="800" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whereas a failed Helm release shows details on its status:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbjfgaempsf8nycjnttt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbjfgaempsf8nycjnttt.png" alt="Corrupt Helm release" width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition to the built-in readiness of Flux' CRDs, more health checks can be included in the main CR of a Flux setup, the &lt;a href="https://fluxcd.io/flux/components/kustomize/kustomization/" rel="noopener noreferrer"&gt;Kustomization&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kustomize.toolkit.fluxcd.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;examiner&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# (...)&lt;/span&gt;
  &lt;span class="na"&gt;healthChecks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;helm.toolkit.fluxcd.io/v1&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmRelease&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;examiner&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;examiner&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On reconciliation, a Kustomization becomes ready only if all resources referenced in its healthChecks are ready; otherwise, the errors are reported in its status:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbk5fyss8l4m5h6rvxs9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbk5fyss8l4m5h6rvxs9.png" alt="Unhealthy Kustomization" width="800" height="38"&gt;&lt;/a&gt;&lt;/p&gt;
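
&lt;p&gt;How long a Kustomization waits for its health checks can be tuned as well. The following is a minimal sketch, assuming the &lt;code&gt;timeout&lt;/code&gt; and &lt;code&gt;wait&lt;/code&gt; fields of the Kustomization spec; the durations are illustrative and not taken from our setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: examiner
spec:
  # (...)
  interval: 10m
  # Stop waiting for apply and health checks after two minutes
  # and report the Kustomization as not ready (assumed value)
  timeout: 2m
  # Alternative to an explicit healthChecks list:
  # health-check every resource applied by this Kustomization
  wait: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;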

&lt;p&gt;The health checks are evaluated on every reconciliation of the Kustomization, so the Kustomization becomes un-ready if one of its fosterlings gets ill. That's also reflected in its status, for instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kustomize.toolkit.fluxcd.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;conditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;lastTransitionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-06-24T10:35:59Z"&lt;/span&gt;
    &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Applied&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;revision:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;main@sha1:2f41be248656587d52d0ddd4ead813df50cb5ada'&lt;/span&gt;
    &lt;span class="na"&gt;observedGeneration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReconciliationSucceeded&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;True"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ready&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;lastTransitionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-06-24T10:35:59Z"&lt;/span&gt;
    &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Health check passed in 5.066648877s&lt;/span&gt;
    &lt;span class="na"&gt;observedGeneration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Succeeded&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;True"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Healthy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Updating the diagram from above, there's a status on every CR, possibly propagated among them, helping us in analyzing issues:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zw09ojz80pmmbq7q7u3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zw09ojz80pmmbq7q7u3.png" alt="CRD status" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring and reporting resource statuses
&lt;/h2&gt;

&lt;p&gt;Given this knowledge about resource statuses, we can automate the feedback on problems.&lt;/p&gt;

&lt;p&gt;Our usual monitoring setup is made of the Kubernetes Prometheus stack for metrics collection and alerting rules, and Grafana for visualization.&lt;br&gt;
Both of them are &lt;a href="https://fluxcd.io/flux/guides/monitoring/" rel="noopener noreferrer"&gt;fed by Flux&lt;/a&gt; with resource status information and reconciliation events.&lt;/p&gt;

&lt;p&gt;There's a bunch of metrics exposed by Flux' components themselves, like &lt;code&gt;gotk_reconcile_duration_seconds_bucket&lt;/code&gt;, which holds the reconciliation latency by resource kind. They are used to observe Flux' health and its connectivity to sources like Git or OCI services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa00vku5sfefyjz0bgsff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa00vku5sfefyjz0bgsff.png" alt="prometheus reconcile duration query" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;
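
&lt;p&gt;To keep such a latency query at hand, it could be stored as a recording rule. This is only a sketch, assuming the Prometheus Operator is in use; the rule name, the five minute window and the 99th percentile are my own illustrative choices:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-reconcile-duration # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: reconciliation
      rules:
        # P99 reconciliation latency in seconds per Flux resource kind
        - record: flux:reconcile_duration_seconds:p99
          expr: 'histogram_quantile(0.99, sum(rate(gotk_reconcile_duration_seconds_bucket[5m])) by (le, kind))'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;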

&lt;p&gt;Metrics about the custom resource statuses were formerly exposed by Flux itself as well, but have meanwhile been externalized to components like the kube-state-metrics exporter.&lt;/p&gt;

&lt;p&gt;The repository &lt;a href="https://github.com/fluxcd/flux2-monitoring-example" rel="noopener noreferrer"&gt;flux2-monitoring-example&lt;/a&gt; holds a configuration for the kube-prometheus-stack as well as Grafana dashboards to collect and visualize everything around Flux' CRDs. I'm not replicating the config here, but the samples shown are using it like-for-like without further &lt;a href="https://fluxcd.io/flux/monitoring/custom-metrics/" rel="noopener noreferrer"&gt;customization&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e1d79zdav12dwv3givl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e1d79zdav12dwv3givl.png" alt="reconcile_condition not ready" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This information can easily be included in your own Grafana dashboards, as shown in the &lt;a href="https://github.com/fluxcd/flux2-monitoring-example/tree/main/monitoring/configs/dashboards" rel="noopener noreferrer"&gt;dashboards provided by Flux&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr6y99r5t584djjy8wnn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr6y99r5t584djjy8wnn.png" alt="resource grafana panel" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's for observability, but we're even more interested in alerts in case of any reconciliation issues. Using the &lt;a href="https://prometheus-operator.dev/" rel="noopener noreferrer"&gt;Prometheus Operator&lt;/a&gt; for managing our Prometheus components, it may look like this (in its simplest form):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PrometheusRule&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-resources&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readiness&lt;/span&gt;
      &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ResourceNotReady&lt;/span&gt;
          &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gotk_resource_info{ready!="",ready!="True"}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
          &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.customresource_kind&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;namespace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.namespace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ready'&lt;/span&gt;
          &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2m&lt;/span&gt;
          &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During the reconciliation process, this alert will become "pending", so the &lt;code&gt;for&lt;/code&gt; duration has to be chosen slightly longer than the reconciliation of your resources takes. For quicker reactions, multiple rules with different selection criteria may be necessary.&lt;/p&gt;
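
&lt;p&gt;As a minimal sketch of such a split, the following variant of the rule above uses the &lt;code&gt;customresource_kind&lt;/code&gt; label to give Git sources a shorter &lt;code&gt;for&lt;/code&gt; duration than Helm releases. The rule names, the selected kinds, the durations and the severities are assumptions for illustration and have to be adapted to your own reconciliation times:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-resources-by-kind   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: readiness-by-kind
      rules:
        # Git sources are assumed to reconcile quickly, so alert early
        - alert: SourceNotReady
          expr: 'gotk_resource_info{customresource_kind="GitRepository",ready!="",ready!="True"} == 1'
          for: 1m    # assumed duration
          labels:
            severity: warning
        # Helm releases are assumed to need more time to roll out
        - alert: HelmReleaseNotReady
          expr: 'gotk_resource_info{customresource_kind="HelmRelease",ready!="",ready!="True"} == 1'
          for: 10m   # assumed duration
          labels:
            severity: critical
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;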

&lt;p&gt;The labels of the &lt;code&gt;gotk_resource_info&lt;/code&gt; metric provide all information required for determining the severity and maybe also the notification channel of a certain alert:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2fmimh6v88ln9qs7oc2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2fmimh6v88ln9qs7oc2.png" alt="prometheus alert helmrelease not ready" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Usually, we use different escalation paths depending on the severity of an alert. Everything of interest that doesn't indicate an active incident is routed to a team-internal chat, sometimes separated by technical or business relevance. Alerts that indicate customer impact and potentially affect multiple systems are routed to Opsgenie (our on-call management system), causing service-degradation announcements and notifying the person on call.&lt;/p&gt;
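
&lt;p&gt;With Prometheus Alertmanager, such a split by severity could be sketched roughly as follows. The receiver names, the webhook URL and the key file path are hypothetical placeholders. Only the &lt;code&gt;severity&lt;/code&gt; label is taken from the rule above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;route:
  receiver: team-chat              # default path: everything that is not an incident
  routes:
    - matchers:
        - severity="critical"      # alerts labeled as critical go to on-call
      receiver: on-call
receivers:
  - name: team-chat
    webhook_configs:
      - url: http://chat-bridge.monitoring.svc/alerts   # hypothetical chat integration
  - name: on-call
    opsgenie_configs:
      - api_key_file: /etc/alertmanager/secrets/opsgenie/api-key   # hypothetical path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;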

&lt;p&gt;Prometheus Alertmanager offers many integrations, for example for Opsgenie, and whatever isn't supported natively can be connected using third-party components, for instance &lt;a href="https://github.com/prometheus-msteams/prometheus-msteams" rel="noopener noreferrer"&gt;Prometheus MS Teams&lt;/a&gt;. Alerts from the &lt;code&gt;PrometheusRule&lt;/code&gt; example above dispatched to MS Teams can look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdukita8dg3xcv84bz9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdukita8dg3xcv84bz9s.png" alt="Image description" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrated observability
&lt;/h2&gt;

&lt;p&gt;Although this already provides us with most of what we need to react to problems, there are even more great features that give further insights, integrated into the tools we're using anyway.&lt;/p&gt;

&lt;p&gt;Alongside the other Flux components, there's one that emits detailed information about what's happening with the Flux-managed resources, the notification controller:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmj736h63oc13jrjhpsxw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmj736h63oc13jrjhpsxw.png" alt="overview incl notfication controller" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is configured using &lt;code&gt;Provider&lt;/code&gt; and &lt;code&gt;Alert&lt;/code&gt; CRDs and supports a wide range of integrations. I'd like to show you my favorites.&lt;/p&gt;

&lt;p&gt;Every change in our system originates from Git commits. The effects of a commit can be reported back to the Git management system as a commit status. The following example targets GitHub, but &lt;a href="https://fluxcd.io/flux/components/notification/provider/#git-commit-status-updates" rel="noopener noreferrer"&gt;all major providers are supported&lt;/a&gt;.&lt;br&gt;
Because the provider expects a different secret structure, an additional access token that is allowed to write commit statuses is needed. This token is contained in the referenced secret. Please see the Flux docs for more details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notification.toolkit.fluxcd.io/v1beta3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github&lt;/span&gt;
  &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://github.com/MediaMarktSaturn/software-supply-chain-security-gitops&lt;/span&gt;
  &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notification.toolkit.fluxcd.io/v1beta3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Alert&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;commit-status&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;providerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github&lt;/span&gt;
  &lt;span class="na"&gt;eventSeverity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;info&lt;/span&gt;
  &lt;span class="na"&gt;eventSources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mms-system&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
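
&lt;p&gt;For illustration, the secret referenced by the &lt;code&gt;Provider&lt;/code&gt; could look like the following sketch. It assumes that the GitHub provider reads the access token from a key named &lt;code&gt;token&lt;/code&gt;, as described in the Flux docs. The value is a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Secret
metadata:
  name: github               # matches secretRef.name of the Provider
  namespace: flux-system     # same namespace as the Provider
stringData:
  # placeholder value, the token has to be allowed to write commit statuses
  token: REPLACE_WITH_GITHUB_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;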



&lt;p&gt;With this configuration, a commit status is attached for every Kustomization listed in the &lt;code&gt;eventSources&lt;/code&gt;. The &lt;code&gt;Alert&lt;/code&gt; CRD is used for all kinds of notifications, so it contains the superset of features of the notification controller. For commit status notifications, only &lt;code&gt;Kustomization&lt;/code&gt; events can be used, as only those carry the commit reference.&lt;/p&gt;

&lt;p&gt;This example configuration results in commit statuses in GitHub that look as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyup677e8igvxeocq926.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyup677e8igvxeocq926.png" alt="gh commit status" width="659" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The description contains a summary of the Kustomization status and may already point to error causes.&lt;/p&gt;

&lt;p&gt;Another very nice feature makes use of Grafana annotations. With a different &lt;a href="https://fluxcd.io/flux/components/notification/provider/#grafana" rel="noopener noreferrer"&gt;provider&lt;/a&gt;, the notification controller creates annotations via the Grafana API that can be included in any dashboard, indicating what has changed at a certain point in time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notification.toolkit.fluxcd.io/v1beta3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana&lt;/span&gt;
  &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://prometheus-operator-grafana.monitoring/api/annotations"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notification.toolkit.fluxcd.io/v1beta3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Alert&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;providerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana&lt;/span&gt;
  &lt;span class="na"&gt;eventSeverity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;info&lt;/span&gt;
  &lt;span class="na"&gt;eventSources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GitRepository&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmRelease&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmChart&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, all event sources are supported. Depending on the concrete source, different metadata is shown: commit hashes for &lt;code&gt;GitRepository&lt;/code&gt; or &lt;code&gt;Kustomization&lt;/code&gt;, and the Helm chart version for &lt;code&gt;HelmRelease&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Including annotations (that are basically just time markers) in any dashboard is quite easy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pmunfbjrc13cau88cmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pmunfbjrc13cau88cmw.png" alt="Image description" width="800" height="741"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yet it results in impressive vertical red lines in all panels with a time axis:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ju7vbd2yu1lqt3c2w45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ju7vbd2yu1lqt3c2w45.png" alt="Image description" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This way, all visualizations are directly connected to the change events of the Kubernetes resources, so causal relationships will no longer be overlooked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Push notifications
&lt;/h2&gt;

&lt;p&gt;There's a long list of "providers" for delivering Flux events, including a generic webhook that enables any kind of custom implementation.&lt;/p&gt;
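
&lt;p&gt;A minimal sketch of such a generic webhook provider, assuming a hypothetical endpoint of your own that accepts the event payload via HTTP POST. The name and the address are made up for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: custom-webhook       # hypothetical name
  namespace: monitoring
spec:
  type: generic
  # hypothetical receiver endpoint of a custom implementation
  address: https://events.example.com/flux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;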

&lt;p&gt;Just to pick another common example, this is how to stay informed via MS Teams about everything that happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notification.toolkit.fluxcd.io/v1beta3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;msteams&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;msteams&lt;/span&gt;
  &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;msteams-webhook&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notification.toolkit.fluxcd.io/v1beta3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Alert&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;msteams&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;providerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;msteams&lt;/span&gt;
  &lt;span class="na"&gt;eventSeverity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;info&lt;/span&gt;
  &lt;span class="na"&gt;eventSources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GitRepository&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flux-system&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmRelease&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ImagePolicy&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmRelease&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dtrack&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmChart&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dtrack&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OCIRepository&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mms-system&lt;/span&gt;
  &lt;span class="na"&gt;exclusionList&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Health&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;passed.*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in messages like these:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3h7tnmjn72g69oc59i85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3h7tnmjn72g69oc59i85.png" alt="Image description" width="800" height="206"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Indicating an error&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;or &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruaa21i7ajylqvs5xsbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruaa21i7ajylqvs5xsbo.png" alt="Image description" width="800" height="177"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Indicating a successful change&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;These were just some examples of how to enhance observability and monitoring of a GitOps setup using Flux. I strongly recommend browsing the &lt;a href="https://fluxcd.io/flux/" rel="noopener noreferrer"&gt;Flux documentation&lt;/a&gt; and &lt;a href="https://github.com/fluxcd/flux2/releases" rel="noopener noreferrer"&gt;release notes&lt;/a&gt; from time to time, as there's a lot of movement in this stack.&lt;/p&gt;

&lt;p&gt;All of this eases operation of our GitOps setup, but for our business applications we need to go further, so read on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not hard to handle
&lt;/h2&gt;

&lt;p&gt;How can we design our GitOps setup in a way that makes it as robust as possible? There are many aspects to consider.&lt;br&gt;
First, there's the human part: We have to make careless mistakes as hard as possible.&lt;br&gt;
As stated at the very beginning, proposed changes to the desired state should be validated in an automated way for basic correctness. To enforce this, no changes to the single source of truth - meaning the Git branch Flux pulls from - can be allowed without a pull/merge request. Having automated checks on this pull/merge request in addition to the four-eyes principle should already catch the worst careless errors.&lt;br&gt;
Less obvious, but just as prone to breaking something, are merge strategies other than fast-forward. Auto-merging the files involved generates results that could not be checked beforehand. Although this rarely causes problems in usual source code merges, isn't it always the same places that change in Kubernetes manifests - causing a higher probability of semantically wrong merge results?&lt;/p&gt;

&lt;p&gt;Another strong opinion of mine, which I won't discuss in detail right now, is about the right branching model for multi-stage GitOps repositories.&lt;br&gt;
Yes, Kustomize allows for really sophisticated reuse of manifests. Even if my developer heart screams DRY (don't repeat yourself), I absolutely prefer one-separate-branch-per-stage repositories.&lt;br&gt;
I know all the arguments against it, and I'm also aware of how to handle single-branch setups - but the potential benefits of a single branch simply don't justify the risk of breaking production with changes only intended for a non-prod environment. Besides that, I argue that pull/merge requests targeting production can very easily be treated differently in terms of mandatory reviews or additional validations, compared to a single-branch setup.&lt;br&gt;
There are many ways of making one-branch-per-stage handy, like templating differences as &lt;a href="https://fluxcd.io/flux/components/kustomize/kustomization/#variable-substitution" rel="noopener noreferrer"&gt;variable substitutions&lt;/a&gt;, packaging commonalities into Helm charts or extracting reusable Kustomizations into common repositories.&lt;br&gt;
In fact, we're able to cherry-pick even complex changes from stage to stage, which feels easier and safer to me than patching Kustomizations.&lt;/p&gt;
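
&lt;p&gt;To illustrate the variable substitution approach, here is a minimal sketch of a Flux &lt;code&gt;Kustomization&lt;/code&gt; as it could live on a production branch. The names, the path and the variables are hypothetical. The branch of another stage would carry the same file with different values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                 # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps               # hypothetical path in the repository
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system        # hypothetical source
  postBuild:
    substitute:
      # stage-specific values, the only part that differs between branches
      stage: "prod"
      replicas: "3"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The manifests below &lt;code&gt;./apps&lt;/code&gt; can then reference &lt;code&gt;${stage}&lt;/code&gt; and &lt;code&gt;${replicas}&lt;/code&gt;, so cherry-picking them between stage branches does not touch any stage-specific value.&lt;/p&gt;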

&lt;h2&gt;
  
  
  Consistency and atomicity
&lt;/h2&gt;

&lt;p&gt;Second, there's the technical part: it's about avoiding inconsistencies and side effects.&lt;br&gt;
Flux's top-level resource is the &lt;code&gt;Kustomization&lt;/code&gt;. It provides us with many configuration options to reduce the blast radius of errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kustomizations should declare their dependencies on each other using &lt;a href="https://fluxcd.io/flux/components/kustomize/kustomization/#kustomization-dependencies" rel="noopener noreferrer"&gt;&lt;code&gt;dependsOn&lt;/code&gt;&lt;/a&gt;. This way, we can ensure that nothing happens to downstream Kustomizations if there are errors with CRD creation or mandatory infrastructure changes. In combination with the &lt;a href="https://fluxcd.io/flux/components/kustomize/kustomization/#health-assessment" rel="noopener noreferrer"&gt;&lt;code&gt;readiness&lt;/code&gt;&lt;/a&gt; of a Kustomization, which allows for indirect checks (for example, that an installed operator came to life), changes fail early and don't propagate to healthy components. Also consider fresh bootstraps - without defined dependencies, newly set-up systems may not be able to complete their installation.&lt;/li&gt;
&lt;li&gt;Critical resources can be protected from deletion using the &lt;a href="https://fluxcd.io/flux/components/kustomize/kustomization/#garbage-collection" rel="noopener noreferrer"&gt;&lt;code&gt;prune&lt;/code&gt;&lt;/a&gt; setting. In general, recreation of resources is no issue - in fact, we're relying on it: Some application doesn't recover from an error? Delete it and let Flux recreate it - "Turn it off and on again". But this is not true for everything. For instance, externally managed services like load balancers, IP addresses or SSL certificates would probably change when they get reconfigured implicitly. So we collect all resources of this kind in a Kustomization that is made durable using &lt;code&gt;prune: false&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Another rare but breaking case is exactly the opposite: Resources that should be recreated but are immutable, like &lt;code&gt;Jobs&lt;/code&gt;. Kustomizations with &lt;code&gt;force: true&lt;/code&gt; have us covered, but this should be used with care so that it doesn't obscure unexpected issues.&lt;/li&gt;
&lt;/ul&gt;
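
&lt;p&gt;The three settings from the list above can be sketched as follows. All names and paths are hypothetical, and &lt;code&gt;wait: true&lt;/code&gt; is just one assumed way of tying the readiness of a Kustomization to the health of the resources it applies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps
  prune: true
  wait: true                 # ready only once the applied resources are healthy
  dependsOn:
    - name: operators        # hypothetical upstream Kustomization (CRDs, operators)
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: external-services
  namespace: flux-system
spec:
  interval: 10m
  path: ./external-services
  prune: false               # never delete load balancers, IP addresses, certificates
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: jobs
  namespace: flux-system
spec:
  interval: 10m
  path: ./jobs
  prune: true
  force: true                # recreate immutable resources like Jobs, use with care
  sourceRef:
    kind: GitRepository
    name: flux-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;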

&lt;p&gt;Avoiding error propagation is one thing, but Helm provides us with another very useful feature: atomic updates. With the &lt;code&gt;--atomic&lt;/code&gt; flag, &lt;code&gt;helm upgrade&lt;/code&gt; either properly installs the complete package, meaning all rendered manifests, or nothing at all, rolling back all changes to the last revision of that Helm release.&lt;br&gt;
But Flux goes even further. Detailed instructions &lt;a href="https://fluxcd.io/flux/components/helm/helmreleases/#configuring-failure-remediation" rel="noopener noreferrer"&gt;can be configured for handling failures&lt;/a&gt;. The installation or upgrade of a Helm release can be retried a given number of times, and remediation actions can be defined, such as uninstall on a failed installation or rollback on a failed upgrade - contrary to the default behavior of leaving the Helm release in a failed state.&lt;br&gt;
Depending on the content of a Helm chart, it can be necessary to automatically roll back failed upgrades in production - but leave them failed in non-prod environments to analyze the issue.&lt;/p&gt;
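
&lt;p&gt;As a rough sketch of such a failure remediation, a production &lt;code&gt;HelmRelease&lt;/code&gt; might be configured as follows. The release, chart and repository names are made up for illustration, and the API version depends on the Flux release in use.&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: my-app, my-charts and the apps namespace are hypothetical.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-app
  namespace: apps
spec:
  interval: 10m
  chart:
    spec:
      chart: my-app
      version: "1.x"
      sourceRef:
        kind: HelmRepository
        name: my-charts
  install:
    remediation:
      retries: 3                  # a failed install is uninstalled before each retry
  upgrade:
    remediation:
      retries: 3
      strategy: rollback          # roll back to the last revision between attempts
      remediateLastFailure: true  # also roll back after the final failed attempt
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For a non-prod environment, leaving out the &lt;code&gt;upgrade.remediation&lt;/code&gt; block keeps the default behavior, so a failed release stays in place for analysis.&lt;/p&gt;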

&lt;p&gt;What we actually benefit from is the most beloved but also most hated feature: templating. If rendering the manifests fails for any reason, the Helm upgrade fails and nothing is applied at all. Using plain manifests could result in an inconsistent state - even if this risk is much lower without any templating at all 😉.&lt;/p&gt;

&lt;p&gt;But Helm isn't just templating and versioning of manifests, it's a package manager, and during the installation of a package, aka chart, there are many checkpoints we can &lt;a href="https://helm.sh/docs/topics/charts_hooks" rel="noopener noreferrer"&gt;hook&lt;/a&gt; into. Using these hooks, we can prepare for an application update (like updating database schemas) and implement dedicated tests, but also validate the release and, if necessary, force a Helm upgrade to fail and roll back before the entire system is damaged further.&lt;/p&gt;
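
&lt;p&gt;To illustrate the hook idea, here is a minimal sketch of a chart template that runs a database migration before the application itself is installed or upgraded. The file name, image and arguments are assumptions; only the &lt;code&gt;helm.sh/hook&lt;/code&gt; annotations are Helm's own.&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;# templates/db-migrate-job.yaml (hypothetical file inside the chart)
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-db-migrate"
  annotations:
    # Run before the chart's other manifests are installed or upgraded
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    # Remove the previous hook Job before creating a new one, and clean up on success
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 0                 # fail fast, no retries inside the Job
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/my-app-migrations:1.2.3  # hypothetical image
          args: ["migrate", "up"]                              # hypothetical command
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the Job fails, the Helm operation fails as well, and any configured remediation such as a rollback can take over before the new application version is rolled out.&lt;/p&gt;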

&lt;p&gt;What I want to say: Don't just throw your manifests in Git and let Flux reconcile until it succeeds or gets stuck, but carefully design your GitOps repository considering your systems' special needs. I admit, it's science as much as art, but as always with software: It's never finished or perfect, so it should be continuously revised.&lt;/p&gt;

&lt;h2&gt;
  
  
  Headless deployments
&lt;/h2&gt;

&lt;p&gt;Eventually, we made it. All our infrastructure components and configurations are reliable and well observable, and we're actively notified about issues that have to be taken care of.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/mms-tech/reliable-application-deployments-in-a-gitops-setup-with-flagger-5hjm"&gt;In the next article we will elaborate on how to ensure the reliable, headless (business) application deployment using Flagger, happy reading.&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>gitops</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>Blogging at MMS Technology</title>
      <dc:creator>Florian Heubeck</dc:creator>
      <pubDate>Tue, 30 Jan 2024 09:28:40 +0000</pubDate>
      <link>https://forem.com/mms-tech/blogging-at-mms-technology-3cal</link>
      <guid>https://forem.com/mms-tech/blogging-at-mms-technology-3cal</guid>
      <description>&lt;p&gt;&lt;em&gt;Daddy? How are the little blog entries made?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About Communities
&lt;/h2&gt;

&lt;p&gt;Blogging about blogging - I love this kind of meta stuff.&lt;br&gt;
But what I love even more are communities. Passionate, enthusiastic people, sharing an area of interest. It doesn't matter how junior or senior someone is, what experience they have, or if they are just at the beginning of their journey. The key aspect is only passion.&lt;/p&gt;

&lt;p&gt;We have always had strong communities at MediaMarktSaturn. Be it interdisciplinary departments passionate about their solution, matrix-organized chapters excited about their stack, like backend developers or frontend hippies, or cross-functional renegades rehearsing the uprising to establish &lt;em&gt;GitOps&lt;/em&gt;.&lt;br&gt;
But recently we lifted this community thing to a new (meta) level.&lt;/p&gt;

&lt;p&gt;Officially founded by engineers and supported by top management, we proclaimed a company-wide &lt;em&gt;Engineering Community&lt;/em&gt;.&lt;br&gt;
Over 600 people have joined of their own accord, helping each other and discussing topics.&lt;br&gt;
As always, far fewer folks are very active, but enough to build at least some crews like "API", "Backend Engineering", "Cloud" or "Frontend something".&lt;/p&gt;

&lt;h2&gt;
  
  
  Outreach Activities
&lt;/h2&gt;

&lt;p&gt;A thing that became clear to us, at the latest when we got renamed to "MMS &lt;em&gt;Technology&lt;/em&gt;": There's a lot we have to offer.&lt;br&gt;
So we aligned and established a lean process for different kinds of outreach activities, like speaking at meetups or conferences.&lt;br&gt;
But preparing a conference talk takes a lot of effort and is, not only for that reason, a rare event.&lt;/p&gt;

&lt;p&gt;What about all the goodies that happen to us and that we achieve day by day? Those &lt;em&gt;little&lt;/em&gt; things that literally drive us nuts for hours and days, until we figure out how easy they are once you know how.&lt;br&gt;
Those solutions we built for problems that, peculiarly, seem to have hit us as the first people ever, at least according to the known internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finally: Blogging
&lt;/h2&gt;

&lt;p&gt;Guess I'm not the only one who sometimes thinks: I have to talk about... or at least write about &lt;em&gt;this&lt;/em&gt;.&lt;br&gt;
And guess what? Of course there are lots of people by my side who have value to share.&lt;/p&gt;

&lt;p&gt;You think retail business is boring? Fun is what we make of it. And we &lt;em&gt;make&lt;/em&gt; great solutions using the hottest technology, in nearly every discipline available on the free market.&lt;/p&gt;

&lt;h2&gt;
  
  
  Processes
&lt;/h2&gt;

&lt;p&gt;Long story short: At MMS Technology every person is encouraged to write about tech. There's a very lean approval process that is (of course) Git based. You take the template, fill in your text and create a pull request.&lt;br&gt;
A group of outreach reviewers - ordinary people who share a passion for tech and are familiar with our communication guidelines - validates that you don't disclose (real) secrets and consults with you in case of obscurities.&lt;br&gt;
After the pull request has been approved and merged, a little &lt;em&gt;GitHub Action&lt;/em&gt; publishes the article to &lt;em&gt;medium.com&lt;/em&gt; and you're done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Beginning
&lt;/h2&gt;

&lt;p&gt;Why am I telling you all of this? Because I love communities. And communities share and profit from each other. I'm absolutely convinced that everyone has valuable things to share - except for our security guys, they have vulnerabilities to share 😉.&lt;br&gt;
Blogs do not need to be novels - in fact, short and precise findings or insights are the best content to provide - not like this text here.&lt;/p&gt;

&lt;p&gt;So - happy to read you soon.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>blogging</category>
      <category>technology</category>
      <category>people</category>
      <category>community</category>
    </item>
    <item>
      <title>LET’S GO!</title>
      <dc:creator>Florian Heubeck</dc:creator>
      <pubDate>Mon, 29 Jan 2024 19:02:51 +0000</pubDate>
      <link>https://forem.com/mms-tech/lets-go-1b23</link>
      <guid>https://forem.com/mms-tech/lets-go-1b23</guid>
      <description>&lt;p&gt;&lt;em&gt;Here we go&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Blog Launch
&lt;/h2&gt;

&lt;p&gt;We, the engineering community of MediaMarktSaturn Technology, the IT partner of the electronics retailer MediaMarkt and Saturn, will share our passion for tech with you.&lt;/p&gt;

&lt;p&gt;You can expect any kind of tech content popping up here; we are a strong team of hundreds of passionate engineers covering all disciplines of software engineering, infrastructure management, cloud operation and much more.&lt;/p&gt;

&lt;p&gt;And we will write about it... stay tuned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;get to know us 👉 &lt;a href="https://mms.tech" rel="noopener noreferrer"&gt;https://mms.tech&lt;/a&gt; 👈&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>letsgo</category>
      <category>technology</category>
      <category>engineering</category>
      <category>people</category>
    </item>
  </channel>
</rss>
