<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Verifa crew</title>
    <description>The latest articles on Forem by Verifa crew (@verifacrew).</description>
    <link>https://forem.com/verifacrew</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1093696%2Fd2528407-862c-4253-9c9a-b1310221b224.jpg</url>
      <title>Forem: Verifa crew</title>
      <link>https://forem.com/verifacrew</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/verifacrew"/>
    <language>en</language>
    <item>
      <title>How to use the AWS Load Balancer Controller to connect multiple EKS clusters with existing Application Load Balancers</title>
      <dc:creator>Verifa crew</dc:creator>
      <pubDate>Thu, 24 Oct 2024 14:05:33 +0000</pubDate>
      <link>https://forem.com/verifacrew/how-to-use-the-aws-load-balancer-controller-to-connect-multiple-eks-clusters-with-existing-application-load-balancers-51ac</link>
      <guid>https://forem.com/verifacrew/how-to-use-the-aws-load-balancer-controller-to-connect-multiple-eks-clusters-with-existing-application-load-balancers-51ac</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally posted on &lt;a href="https://verifa.io/blog/aws-load-balancer-controller-with-existing-alb/" rel="noopener noreferrer"&gt;Verifa's blog&lt;/a&gt;, written by Jacob Lärfors.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exposing services in AWS EKS clusters via Load Balancers can be done in many ways. In this post we explore using the AWS Load Balancer Controller to dynamically bind nodes to existing Application Load Balancers.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;In a project I am working on we manage the AWS infrastructure with Terraform (Application Load Balancers, Elastic Kubernetes Service clusters, Security Groups, etc.). We also have one requirement: the Application Load Balancers (ALBs) need to be treated like pets, primarily because another team manages the DNS records. This is why we cannot use Kubernetes controllers to dynamically manage the ALBs and update DNS records. The problem statement can thus be summarised as: how do we manage AWS EKS clusters and ALBs with Terraform, and attach EKS nodes to ALB TargetGroups?&lt;/p&gt;

&lt;h3&gt;
  
  
  Connect nodes to ALBs using Terraform
&lt;/h3&gt;

&lt;p&gt;Our initial implementation used Terraform to attach the AWS AutoScalingGroups to TargetGroups.&lt;/p&gt;

&lt;p&gt;When using &lt;strong&gt;self-managed node groups&lt;/strong&gt; you can pass a list of &lt;code&gt;target_group_arns&lt;/code&gt; so that any nodes in the AutoScalingGroup automatically register themselves as targets of the given TargetGroups. Nice and easy. &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#target_group_arns" rel="noopener noreferrer"&gt;Check the docs&lt;/a&gt;. We use the community &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest" rel="noopener noreferrer"&gt;AWS EKS Terraform module&lt;/a&gt;, which supports &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/submodules/self-managed-node-group#input_target_group_arns" rel="noopener noreferrer"&gt;this argument&lt;/a&gt;.&lt;/p&gt;
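
&lt;p&gt;As a minimal sketch of the self-managed case, the plain &lt;code&gt;aws_autoscaling_group&lt;/code&gt; resource accepts the same argument. The resource names, sizes, and the referenced launch template, subnets and TargetGroup are hypothetical placeholders, not taken from our setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical names throughout; assumes the launch template, subnets
# and TargetGroup are defined elsewhere in the same configuration.
resource "aws_autoscaling_group" "workers" {
  name                = "example-workers"
  min_size            = 1
  max_size            = 3
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.workers.id
    version = "$Latest"
  }

  # Every instance launched by this group is registered as a target.
  target_group_arns = [aws_lb_target_group.ingress.arn]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;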

&lt;p&gt;When using &lt;strong&gt;EKS managed node groups&lt;/strong&gt;, the option to pass &lt;code&gt;target_group_arns&lt;/code&gt; is not available, and EKS will dynamically generate AutoScalingGroups based on your EKS node group definitions. The side effect is that the AutoScalingGroup IDs are &lt;strong&gt;not known&lt;/strong&gt; until they have been created, which becomes a problem when working with Terraform: you have to run &lt;code&gt;terraform apply&lt;/code&gt; with the &lt;code&gt;-target&lt;/code&gt; option to provision the EKS node group first, before you can reference the AutoScalingGroup and attach it to a TargetGroup.&lt;/p&gt;

&lt;p&gt;Here’s a lovely GitHub issue with more details: &lt;a href="https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1539" rel="noopener noreferrer"&gt;https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1539&lt;/a&gt;&lt;/p&gt;
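
&lt;p&gt;To make the dependency concrete, here is a sketch of the kind of reference involved, assuming a hypothetical &lt;code&gt;aws_eks_node_group.workers&lt;/code&gt; and &lt;code&gt;aws_lb_target_group.ingress&lt;/code&gt; (neither name comes from our code). The AutoScalingGroup name is an attribute that EKS computes while creating the node group:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical sketch of the workaround; resource names are placeholders.
resource "aws_autoscaling_attachment" "managed_nodes" {
  # EKS generates this AutoScalingGroup name, so it is only known after
  # the node group has been created.
  autoscaling_group_name = aws_eks_node_group.workers.resources[0].autoscaling_groups[0].name
  lb_target_group_arn    = aws_lb_target_group.ingress.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;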

&lt;p&gt;This is the core problem we wanted to address: how to use EKS managed node groups without a hacky solution. The solution we chose was the &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/" rel="noopener noreferrer"&gt;AWS Load Balancer Controller&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Load Balancer Controller
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/" rel="noopener noreferrer"&gt;AWS Load Balancer Controller&lt;/a&gt; is a Kubernetes controller that can manage the lifecycle of AWS Load Balancers, TargetGroups, Listeners (and Rules), and connect them with nodes (and pods) in your Kubernetes cluster. Check out the &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/how-it-works/" rel="noopener noreferrer"&gt;how it works&lt;/a&gt; page for details on the design.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxtzx8atrwhtl990clpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxtzx8atrwhtl990clpe.png" alt="Image description" width="721" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking back at our use case, we want to use an existing Application Load Balancer that is managed by Terraform. If you search for this online, you will most certainly find another lovely GitHub issue: &lt;a href="https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/228" rel="noopener noreferrer"&gt;https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/228&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s a long issue, with lots of suggestions. Personally, I was not concerned with how much of the infrastructure we manage with Terraform vs Kubernetes; the primary goal was a simple solution that did not require too much customisation, and I found the idea of the &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/targetgroupbinding/targetgroupbinding/" rel="noopener noreferrer"&gt;TargetGroupBinding&lt;/a&gt; Custom Resource Definition (CRD) quite appealing.&lt;/p&gt;

&lt;h3&gt;
  
  
  TargetGroupBinding Custom Resource Definition
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/targetgroupbinding/targetgroupbinding/" rel="noopener noreferrer"&gt;TargetGroupBinding&lt;/a&gt; is a CRD that the AWS Load Balancer Controller installs. If you follow the most common use case and manage your ALBs with the AWS Load Balancer controller, it will create TargetGroupBindings under the hood even if you do not interact with them directly. Good news; it is a core feature of the AWS LB Controller, not an extension for people with an edge case. That gave me some confidence.&lt;/p&gt;

&lt;p&gt;It requires an existing TargetGroup and IAM policies to look up the TargetGroup and attach/detach targets. The required IAM policies are mentioned &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/deploy/installation/#option-b-attach-iam-policies-to-nodes" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here’s a sample &lt;code&gt;TargetGroupBinding&lt;/code&gt; manifest adapted from the docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;elbv2.k8s.aws/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TargetGroupBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-nginx-binding&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# The service we want to connect to.&lt;/span&gt;
  &lt;span class="c1"&gt;# We use the Ingress Nginx Controller so let's point to that service.&lt;/span&gt;
  &lt;span class="c1"&gt;# NOTE: it was necessary for this TargetGroupBinding to be in the same namespace as the service.&lt;/span&gt;
  &lt;span class="na"&gt;serviceRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-nginx&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="c1"&gt;# NOTE: need the ARN of the TargetGroup... Bit of a PITA.&lt;/span&gt;
  &lt;span class="c1"&gt;# Would be nice to use tags to look up the TargetGroup, for example.&lt;/span&gt;
  &lt;span class="na"&gt;targetGroupARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/eks-abcdef/73e2d6bc24d8a067&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For our case, this means managing the ALBs, Listeners, Rules and TargetGroups with Terraform. The AWS Load Balancer Controller would only be responsible for attaching nodes to the specified TargetGroups. This sounds like a clean separation of concerns.&lt;/p&gt;
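
&lt;p&gt;On the Terraform side, that split could look roughly like the sketch below. All names are hypothetical, and it assumes an existing &lt;code&gt;aws_lb.shared&lt;/code&gt; and a &lt;code&gt;var.vpc_id&lt;/code&gt; defined elsewhere. Note that no targets are registered here; that part is left to the controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical names; assumes an existing aws_lb.shared and var.vpc_id.
resource "aws_lb_target_group" "ingress" {
  name        = "example-ingress"
  port        = 80
  protocol    = "HTTP"
  target_type = "instance" # targets are EC2 instances, i.e. the EKS nodes
  vpc_id      = var.vpc_id
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.shared.arn
  port              = 80
  protocol          = "HTTP"

  # Forward all traffic on this listener to the TargetGroup above.
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ingress.arn
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;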

&lt;h3&gt;
  
  
  Multiple EKS clusters, same ALB
&lt;/h3&gt;

&lt;p&gt;Expanding on our particular use case, we manage multiple EKS clusters that share Application Load Balancers. We already use ArgoCD &lt;a href="https://argocd-applicationset.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;ApplicationSets&lt;/a&gt; to manage applications across clusters. We have a “root” cluster that runs core services, like ArgoCD, which connects to multiple other clusters. The below diagram is a high-level simplified illustration of the setup we want to achieve. It will be ArgoCD’s job to deploy the AWS Load Balancer Controller and &lt;code&gt;TargetGroupBinding&lt;/code&gt; manifests to the different clusters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9556i2k2896xm6vmllxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9556i2k2896xm6vmllxh.png" alt="Image description" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s look at how we implemented this with the AWS Load Balancer Controller next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Terraform
&lt;/h3&gt;

&lt;p&gt;We use Terraform to manage (amongst other things) the EKS clusters, ALBs and TargetGroups. To implement the AWS Load Balancer Controller, all we needed to do was create the necessary IAM role that can be assumed by a Kubernetes ServiceAccount. The following snippet shows this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Create IAM role policy granting the kubernetes service account AssumeRoleWithWebIdentity&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"aws_lb_controller"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nx"&gt;principals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Federated"&lt;/span&gt;
      &lt;span class="nx"&gt;identifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.cluster_oidc_issuer}"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;condition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;test&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"StringEquals"&lt;/span&gt;
      &lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${local.cluster_oidc_issuer}:sub"&lt;/span&gt;
      &lt;span class="nx"&gt;values&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"system:serviceaccount:aws-lb-controller:aws-lb-controller"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Create an AWS IAM role that will be assumed by our kubernetes service account&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"aws_lb_controller"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${local.cluster_name}-aws-lb-controller"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
  &lt;span class="nx"&gt;inline_policy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${local.cluster_name}-aws-lb-controller"&lt;/span&gt;
    &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;"Version"&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"Statement"&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"Action"&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
              &lt;span class="s2"&gt;"ec2:DescribeVpcs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"ec2:DescribeSecurityGroups"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"ec2:DescribeInstances"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"elasticloadbalancing:DescribeTargetGroups"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"elasticloadbalancing:DescribeTargetHealth"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"elasticloadbalancing:ModifyTargetGroup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"elasticloadbalancing:ModifyTargetGroupAttributes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"elasticloadbalancing:RegisterTargets"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"elasticloadbalancing:DeregisterTargets"&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s2"&gt;"Effect"&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;"Resource"&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has some dependencies so cannot be run “as is”. But if you need help configuring IAM Roles for Service Accounts (IRSA) then I already wrote a post on the topic which you can find &lt;a href="https://dev.to/blog/how-to-assume-an-aws-iam-role-from-a-service-account-in-eks-with-terraform/"&gt;here&lt;/a&gt;.&lt;/p&gt;
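
&lt;p&gt;For orientation, the references used in the snippet above could be defined roughly as follows. This is a sketch under assumptions: the cluster is read through an &lt;code&gt;aws_eks_cluster&lt;/code&gt; data source, an IAM OIDC provider already exists for the cluster, and the cluster name is a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumed definitions for the references used in the IAM role snippet.
data "aws_caller_identity" "current" {}

data "aws_eks_cluster" "this" {
  name = local.cluster_name
}

locals {
  # Placeholder value for illustration only.
  cluster_name = "example-cluster"

  # The issuer URL without its scheme, which is the form used in the
  # OIDC provider ARN and in the ":sub" condition key.
  cluster_oidc_issuer = replace(data.aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;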

&lt;p&gt;The &lt;code&gt;TargetGroupBinding&lt;/code&gt; Kubernetes Custom Resource we need to create requires the TargetGroup ARN, which is non-deterministic. In our setup, we use Terraform to create the &lt;a href="https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#clusters" rel="noopener noreferrer"&gt;Kubernetes secret&lt;/a&gt; that informs ArgoCD about connected clusters. Within that secret we can attach additional labels that can be accessed by the ArgoCD &lt;a href="https://argocd-applicationset.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;ApplicationSets&lt;/a&gt;, which means we have a very primitive way of passing data from Terraform to ArgoCD without any extra tools. Note that Kubernetes labels have some restrictions on the &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set" rel="noopener noreferrer"&gt;syntax and character set&lt;/a&gt; that can be used, so we can’t just pass in arbitrary data.&lt;/p&gt;

&lt;p&gt;Here is how we create the Kubernetes secret that essentially “registers” a cluster with ArgoCD, and how we pass the TargetGroup name and ID via labels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="c1"&gt;#&lt;/span&gt;
 &lt;span class="c1"&gt;# Extract the target group name and ID to use in ArgoCD secret&lt;/span&gt;
 &lt;span class="c1"&gt;#&lt;/span&gt;
 &lt;span class="nx"&gt;aws_lb_controller&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;targetgroup_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_target_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn_suffix&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;targetgroup_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_target_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn_suffix&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Get cluster TargetGroup ARNs&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_lb_target_group"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"madeupname-${local.cluster_name}"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Create Kubernetes secret in root cluster where ArgoCD is running.&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# The secret tells ArgoCD about a cluster and how to connect (e.g. credentials).&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"kubernetes_secret"&lt;/span&gt; &lt;span class="s2"&gt;"argocd_cluster"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;kubernetes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;

  &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cluster-${local.cluster_name}"&lt;/span&gt;
    &lt;span class="nx"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"argocd"&lt;/span&gt;
    &lt;span class="nx"&gt;labels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# Tell ArgoCD that this secret defines a new cluster&lt;/span&gt;
      &lt;span class="s2"&gt;"argocd.argoproj.io/secret-type"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cluster"&lt;/span&gt;
      &lt;span class="s2"&gt;"environment"&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;
      &lt;span class="s2"&gt;"aws-lb-controller/enabled"&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enabled&lt;/span&gt;
      &lt;span class="c1"&gt;# Kubernetes labels do not allow ARN values, so pass the name and ID separately&lt;/span&gt;
      &lt;span class="s2"&gt;"aws-lb-controller/targetgroup-name"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;targetgroup_name&lt;/span&gt;
      &lt;span class="s2"&gt;"aws-lb-controller/targetgroup-id"&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;targetgroup_id&lt;/span&gt;
   &lt;span class="p"&gt;...&lt;/span&gt;
   &lt;span class="p"&gt;...&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cluster_name&lt;/span&gt;
    &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpoint&lt;/span&gt;
    &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;awsAuthConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;clusterName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="c1"&gt;# Provide the rolearn that was created for this cluster, and which the&lt;/span&gt;
        &lt;span class="c1"&gt;# root ArgoCD role should be able to assume&lt;/span&gt;
        &lt;span class="nx"&gt;roleARN&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;argocd_access&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;tlsClientConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;insecure&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="nx"&gt;caData&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificate_authority&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Opaque"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
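
&lt;p&gt;The reason for splitting is that an ARN contains &lt;code&gt;:&lt;/code&gt; and &lt;code&gt;/&lt;/code&gt;, neither of which is allowed in a label value, so whatever reads the labels has to put the ARN back together. The sketch below only illustrates the split and one possible way to reassemble it; the local names, region and account ID are made-up examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;locals {
  # Example value in the shape of arn_suffix: "targetgroup/NAME/ID".
  example_arn_suffix = "targetgroup/eks-abcdef/73e2d6bc24d8a067"
  example_parts      = split("/", local.example_arn_suffix)

  example_name = local.example_parts[1] # "eks-abcdef"
  example_id   = local.example_parts[2] # "73e2d6bc24d8a067"

  # Region and account ID are placeholders.
  example_arn = format(
    "arn:aws:elasticloadbalancing:%s:%s:targetgroup/%s/%s",
    "eu-west-1", "123456789012", local.example_name, local.example_id
  )
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;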



&lt;p&gt;That’s it for the Terraform config. We use these snippets in Terraform modules that get called for each cluster we create, keeping things DRY.&lt;/p&gt;

&lt;h3&gt;
  
  
  ArgoCD
&lt;/h3&gt;

&lt;p&gt;Let’s first look at the directory structure and then work through the relevant files (note that in our code this exists alongside many other applications inside a Git repository).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="c"&gt;.
&lt;/span&gt;&lt;span class="go"&gt;├── appset-aws-lb-controller.yaml
├── appset-targetgroupbindings.yaml
├── chart
│   ├── Chart.yaml
│   ├── README.md
│   ├── templates
│   │   └── targetgroupbindings.yaml
│   └── values.yaml
└── kustomization.yaml

2 directories, 7 files
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use &lt;a href="https://kustomize.io/" rel="noopener noreferrer"&gt;Kustomize&lt;/a&gt; to connect our ArgoCD applications together (minimising the number of “app of apps” connections needed) and that’s what the &lt;code&gt;kustomization.yaml&lt;/code&gt; file is for, and here it contains the two top-level &lt;code&gt;appset-*.yaml&lt;/code&gt; files.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;appset-aws-lb-controller.yaml&lt;/code&gt; file contains the AWS Load Balancer Controller ApplicationSet which uses the &lt;a href="https://github.com/kubernetes-sigs/aws-load-balancer-controller/tree/main/helm/aws-load-balancer-controller" rel="noopener noreferrer"&gt;Helm chart&lt;/a&gt; to install the controller on each of our clusters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# File: appset-aws-lb-controller.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ApplicationSet&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-lb-controller&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;generators&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# This is a little trick we use for nearly all our apps to control which apps&lt;/span&gt;
  &lt;span class="c1"&gt;# should be installed on which clusters. Terraform creates the secret that&lt;/span&gt;
  &lt;span class="c1"&gt;# contains these labels, so that's where the logic is controlled for setting&lt;/span&gt;
  &lt;span class="c1"&gt;# these labels to true/false.&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;clusters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;aws-lb-controller/enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;syncPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;preserveResourcesOnDeletion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-lb-controller-{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;madeupname&lt;/span&gt;
      &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-lb-controller&lt;/span&gt;
      &lt;span class="na"&gt;syncPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;automated&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;prune&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;selfHeal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;syncOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;CreateNamespace=true&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PruneLast=true&lt;/span&gt;

      &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://aws.github.io/eks-charts"&lt;/span&gt;
        &lt;span class="na"&gt;chart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-load-balancer-controller&lt;/span&gt;
        &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.4.4&lt;/span&gt;
        &lt;span class="na"&gt;helm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;releaseName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-lb-controller&lt;/span&gt;
          &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;clusterName: {{ name }}&lt;/span&gt;
            &lt;span class="s"&gt;serviceAccount:&lt;/span&gt;
              &lt;span class="s"&gt;create: true&lt;/span&gt;
              &lt;span class="s"&gt;annotations:&lt;/span&gt;
                &lt;span class="s"&gt;"eks.amazonaws.com/role-arn": "arn:aws:iam::123456789012:role/{{ name }}-aws-lb-controller"&lt;/span&gt;
              &lt;span class="s"&gt;name: aws-lb-controller&lt;/span&gt;
            &lt;span class="s"&gt;# We won't be using ingresses with this controller.&lt;/span&gt;
            &lt;span class="s"&gt;createIngressClassResource: false&lt;/span&gt;
            &lt;span class="s"&gt;disableIngressClassAnnotation: true&lt;/span&gt;

            &lt;span class="s"&gt;resources:&lt;/span&gt;
              &lt;span class="s"&gt;limits:&lt;/span&gt;
                &lt;span class="s"&gt;cpu: 100m&lt;/span&gt;
                &lt;span class="s"&gt;memory: 128Mi&lt;/span&gt;
              &lt;span class="s"&gt;requests:&lt;/span&gt;
                &lt;span class="s"&gt;cpu: 100m&lt;/span&gt;
                &lt;span class="s"&gt;memory: 128Mi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next up we have the &lt;code&gt;appset-targetgroupbindings.yaml&lt;/code&gt; ApplicationSet, which creates the TargetGroupBinding on each of our clusters. For this we needed to template some values per cluster, and the most straightforward way I have found to do that with ArgoCD is to create a minimal Helm chart. That is what the &lt;code&gt;chart/&lt;/code&gt; directory is for; it contains a single template file, &lt;code&gt;targetgroupbindings.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let’s first look at the ApplicationSet which installs the Helm Chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# File: appset-targetgroupbindings.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ApplicationSet&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;targetgroupbindings&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;generators&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;clusters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;aws-lb-controller/enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
  &lt;span class="na"&gt;syncPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;preserveResourcesOnDeletion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;targetgroupbindings-{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;madeupname&lt;/span&gt;
      &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
        &lt;span class="c1"&gt;# TargetGroupBindings need to be in the same namespace as the service&lt;/span&gt;
        &lt;span class="c1"&gt;# they bind to&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-nginx&lt;/span&gt;
      &lt;span class="na"&gt;syncPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;automated&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;prune&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;selfHeal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;syncOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;CreateNamespace=true&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PruneLast=true&lt;/span&gt;

      &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;url-of-this-repo&amp;gt;"&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apps/aws-lb-controller/chart"&lt;/span&gt;
        &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;master"&lt;/span&gt;
        &lt;span class="na"&gt;helm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;releaseName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;targetgroupbinding&lt;/span&gt;
          &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;# AWS ARNs are not valid Kubernetes label values, so we had to split the ARN up and glue it back together here.&lt;/span&gt;
            &lt;span class="s"&gt;targetGroupArn: "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/{{ metadata.labels.aws-lb-controller/targetgroup-name }}/{{ metadata.labels.aws-lb-controller/targetgroup-id }}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
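
&lt;p&gt;To make the label lookups above concrete, here is a rough sketch of what an ArgoCD cluster secret carrying these labels could look like. In our setup Terraform creates this secret; the cluster name, server URL and label values below are hypothetical placeholders, and the connection details that normally go under &lt;code&gt;config&lt;/code&gt; are left out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: name, server and label values are made-up placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-dev-1
  namespace: argocd
  labels:
    # Marks the secret as an ArgoCD cluster definition.
    argocd.argoproj.io/secret-type: cluster
    # Matched by the matchLabels selector in both ApplicationSets.
    aws-lb-controller/enabled: "true"
    # The two parts of the TargetGroup ARN that are valid label values.
    aws-lb-controller/targetgroup-name: dev-1-ingress
    aws-lb-controller/targetgroup-id: "0123456789abcdef"
type: Opaque
stringData:
  # This value is what {{ name }} resolves to in the ApplicationSet templates.
  name: cluster-dev-1
  server: https://&amp;lt;cluster-api-endpoint&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Quoting the label values keeps YAML from interpreting an all-digit ID or a bare &lt;code&gt;true&lt;/code&gt; as a number or boolean, since Kubernetes labels must be strings.&lt;/p&gt;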



&lt;p&gt;Next we can look at the single template in our minimal Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# File: targetgroupbindings.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;elbv2.k8s.aws/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TargetGroupBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-nginx&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;targetType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;instance&lt;/span&gt;
  &lt;span class="na"&gt;serviceRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-nginx-controller&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="na"&gt;targetGroupARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;required "targetGroupArn required" .Values.targetGroupArn&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;
  &lt;span class="c1"&gt;# By default, add all nodes to the cluster unless they have the label&lt;/span&gt;
  &lt;span class="c1"&gt;# exclude-from-lb-targetgroups set&lt;/span&gt;
  &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;exclude-from-lb-targetgroups&lt;/span&gt;
        &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DoesNotExist&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
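
&lt;p&gt;Helm only renders a template if the chart directory also contains a &lt;code&gt;Chart.yaml&lt;/code&gt;. A minimal sketch of that file follows; the chart name, description and version are hypothetical, and it assumes &lt;code&gt;targetgroupbindings.yaml&lt;/code&gt; sits under &lt;code&gt;chart/templates/&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# File: Chart.yaml (sketch; name, description and version are placeholders)
apiVersion: v2
name: targetgroupbindings
description: Binds the ingress-nginx service to an existing ALB TargetGroup
type: application
version: 0.1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;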



&lt;p&gt;It’s not an ideal situation to use Helm charts for this purpose, but the inspiration came from the ArgoCD app of apps &lt;a href="https://github.com/argoproj/argocd-example-apps/tree/master/helm-guestbook" rel="noopener noreferrer"&gt;example repository&lt;/a&gt;. Anyway, I am fairly happy with this implementation and it works dynamically for any new clusters that we add to ArgoCD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post we looked at a fairly specific problem: binding EKS cluster nodes to existing Application Load Balancers using the &lt;code&gt;TargetGroupBinding&lt;/code&gt; CRD from the AWS Load Balancer Controller. The motivation for this write-up came from the number of people asking about this on GitHub, and I think this is quite a simple and elegant approach.&lt;/p&gt;

&lt;p&gt;A point worth noting is that using the AWS Load Balancer Controller decouples your node management from your cluster management. Let’s say we wanted to use &lt;a href="https://karpenter.sh/" rel="noopener noreferrer"&gt;Karpenter&lt;/a&gt; for autoscaling instead of the de facto cluster-autoscaler. Karpenter will not use AWS AutoScalingGroups but will instead create standalone EC2 instances based on the &lt;a href="https://karpenter.sh/v0.16.2/provisioner/" rel="noopener noreferrer"&gt;Provisioners&lt;/a&gt; you define. This means our previous approach of attaching AutoScalingGroups to TargetGroups will not work, as the EC2 instances Karpenter manages will not belong to an AutoScalingGroup and therefore will not be automatically attached to the TargetGroup. The AWS Load Balancer Controller doesn’t care how the nodes are created, only that they belong to the cluster and match the label selectors defined. We will probably look into Karpenter again in the near future for our project now that it &lt;a href="https://github.com/aws/karpenter/issues/942" rel="noopener noreferrer"&gt;supports pod anti-affinity&lt;/a&gt;, as this was previously a blocker for us.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>tutorial</category>
      <category>eks</category>
    </item>
    <item>
      <title>How to secure Terraform code with Trivy</title>
      <dc:creator>Verifa crew</dc:creator>
      <pubDate>Wed, 14 Aug 2024 07:10:39 +0000</pubDate>
      <link>https://forem.com/verifacrew/how-to-secure-terraform-code-with-trivy-3m9m</link>
      <guid>https://forem.com/verifacrew/how-to-secure-terraform-code-with-trivy-3m9m</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally posted on &lt;a href="https://verifa.io/blog/how-to-secure-terraform-trivy/" rel="noopener noreferrer"&gt;Verifa's blog&lt;/a&gt;, written by Mike Vainio.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this blog post we will look at securing an AWS Terraform configuration using Trivy to check for known security issues. We will explore different ways of using Trivy, integrating it into your CI pipelines, practical issues you might face and solutions to those issues to get you started with improving the security of your IaC codebases.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Terraform is a powerful tool with a thriving community that makes it easy to find ready-made modules and providers for practically any cloud platform or service that exposes an API. Many companies also maintain a large number of internal modules. One of the strengths of Terraform is that modules provide an abstraction. You don’t have to worry about what is underneath the module’s variables (its interface); you provide the necessary values and off you go. This can lead to trouble security-wise, though. Especially on public cloud platforms, it’s easy to expose a VM, load balancer or object storage bucket to the internet, and when using an abstraction this can happen without you noticing it. If you are familiar with Terraform, you might say, “Well, I will just review the plan before applying”. But if the module presents you with a plan that creates or modifies 200 resources, are you really confident you can eyeball that information and catch a misconfiguration that would expose your infrastructure to an attacker?&lt;/p&gt;

&lt;p&gt;At the time of writing there are around 16,000 modules available from &lt;a href="https://registry.terraform.io/browse/modules" rel="noopener noreferrer"&gt;HashiCorp’s public registry&lt;/a&gt;. In this post we will pick a couple of AWS modules and check for insecure configurations. For this “check”, we will use an open source tool called &lt;a href="https://trivy.dev/" rel="noopener noreferrer"&gt;Trivy&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Scanners for Terraform
&lt;/h2&gt;

&lt;p&gt;One of the big upsides of maintaining your infrastructure using an IaC approach is that it can be analysed by static analysis tools, since your infrastructure is defined in plain text files. We can analyse the infrastructure before creating any resources to get quick feedback on the security posture and fix any issues before deployment. The only problem is that there are so many tools! After trying a few alternatives, however, I have settled on a favourite that is both easy to use and effective at finding issues with built-in checks. In the past this favourite tool was &lt;code&gt;tfsec&lt;/code&gt;, but quite recently the development efforts of the Tfsec project have been migrated into the Trivy project. Thus, it’s time to move over to Trivy, although it’s not specialised for Terraform like Tfsec was.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!NOTE]&lt;br&gt;
I noticed a few differences between Tfsec and Trivy when comparing their results and I will make note of these later in the hands-on section. Based on the GitHub issues and PRs, both open and closed, I am confident Trivy will eventually match and surpass Tfsec in features and accuracy as the development team looks very keen on closing gaps between the two tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is worth noting that there are some great open-source alternatives to Trivy, but overall we have found Trivy to be both easy to use locally and to integrate into build pipelines.&lt;/p&gt;

&lt;p&gt;There’s of course nobody stopping you from using multiple tools, and when you automate the checks, that might not be a big deal to implement in the end. However, the goal of this blog post is not to focus on comparing different tools. What we want to focus on is that &lt;strong&gt;you should use a tool like this to perform security checks on your Terraform code,&lt;/strong&gt; and it’s quite trivial to accomplish in the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to Trivy
&lt;/h2&gt;

&lt;p&gt;Trivy is a Swiss army knife type of tool for security scanning of various types of artifacts and code. It can scan different targets such as your local filesystem or a container image from a container registry. It can also check for many kinds of security issues such as known vulnerabilities, exposed secrets and, most relevant to this blog post, misconfigurations.&lt;/p&gt;

&lt;p&gt;At the time of writing Trivy supports scanning of various IaC configurations such as Terraform, &lt;a href="https://aws.amazon.com/cloudformation/" rel="noopener noreferrer"&gt;CloudFormation&lt;/a&gt; and &lt;a href="https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/overview" rel="noopener noreferrer"&gt;Azure Resource Manager&lt;/a&gt;. So even if your organisation uses different tools across teams, Trivy might just be the right tool. Trivy comes with built-in checks for various cloud platforms and in this blog post we will only use the built-in checks, but you can also define your own &lt;a href="https://aquasecurity.github.io/trivy/latest/docs/scanner/misconfiguration/custom/" rel="noopener noreferrer"&gt;custom checks/policies&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Trivy can also scan for secrets, which you should also do in the IaC context, but this is not really specific to the Terraform use-case. I suggest looking into the &lt;a href="https://aquasecurity.github.io/trivy/latest/docs" rel="noopener noreferrer"&gt;Trivy documentation&lt;/a&gt; to discover all of its power beyond what I have covered here, as this will naturally evolve over time.&lt;/p&gt;

&lt;p&gt;Now it’s time to get our hands dirty and look at an example of how Trivy can save you from doing things that might put your organisation in jeopardy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Trivy
&lt;/h2&gt;

&lt;p&gt;For installation I suggest checking out the &lt;a href="https://aquasecurity.github.io/trivy/latest/getting-started/installation/" rel="noopener noreferrer"&gt;installation guide in the documentation&lt;/a&gt; that covers all supported platforms. But for a quick start, here are a few commands that work for most folks. With Homebrew:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install trivy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Debian/Ubuntu, after adding the Trivy apt repository as described in the installation guide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apt install trivy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Windows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;choco install trivy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are also pre-built packages available for various Linux distros, or grab the binary from GitHub releases: &lt;a href="https://github.com/aquasecurity/trivy/releases" rel="noopener noreferrer"&gt;https://github.com/aquasecurity/trivy/releases&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I highly suggest &lt;a href="https://aquasecurity.github.io/trivy/latest/getting-started/signature-verification/" rel="noopener noreferrer"&gt;verifying the signature&lt;/a&gt; when installing, especially when you are using Trivy in your production build pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scanning an Example Terraform Module
&lt;/h2&gt;

&lt;p&gt;Let’s create an example Terraform root module so that we have something to point Trivy at. As I mentioned earlier, there are many open-source modules for Terraform that we can use to quickly build infrastructure. The &lt;a href="https://registry.terraform.io/namespaces/terraform-aws-modules" rel="noopener noreferrer"&gt;AWS modules&lt;/a&gt; are especially popular, so let’s write an example that uses a couple of these modules with mostly their default configuration. Here’s what I came up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;#main.tf&lt;/span&gt;
&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 5.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;common_tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Terraform&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt;
    &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/vpc/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"5.0.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-vpc"&lt;/span&gt;
  &lt;span class="nx"&gt;cidr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;

  &lt;span class="nx"&gt;azs&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"eu-west-1a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"eu-west-1b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"eu-west-1c"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.0.1.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.2.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.3.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnets&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.0.101.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.102.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.103.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;enable_nat_gateway&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;common_tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_ami"&lt;/span&gt; &lt;span class="s2"&gt;"amazon_linux"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;most_recent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;owners&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"amazon"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"name"&lt;/span&gt;

    &lt;span class="nx"&gt;values&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;"amzn2-ami-hvm-*-x86_64-gp2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"owner-alias"&lt;/span&gt;

    &lt;span class="nx"&gt;values&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;"amazon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_ami&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amazon_linux&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t3.nano"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"alb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/alb/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"8.7.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-alb"&lt;/span&gt;

  &lt;span class="nx"&gt;load_balancer_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"application"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;subnets&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnets&lt;/span&gt;

  &lt;span class="nx"&gt;target_groups&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;name_prefix&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"pref-"&lt;/span&gt;
      &lt;span class="nx"&gt;backend_protocol&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"HTTP"&lt;/span&gt;
      &lt;span class="nx"&gt;backend_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
      &lt;span class="nx"&gt;target_type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"instance"&lt;/span&gt;
      &lt;span class="nx"&gt;targets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;my_ec2&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;target_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
          &lt;span class="nx"&gt;port&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;http_tcp_listeners&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;port&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
      &lt;span class="nx"&gt;protocol&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"HTTP"&lt;/span&gt;
      &lt;span class="nx"&gt;target_group_index&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;common_tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration is ~100 LoC and it will create a VPC, an EC2 instance and an ALB. Naturally, the ALB also targets the EC2 instance. After creating the file, let’s initialise Terraform to download the external modules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you are working with local modules, there is no need to run &lt;code&gt;terraform init&lt;/code&gt; before the scan because all files are already present, but remote modules must be fetched into the &lt;code&gt;.terraform&lt;/code&gt; folder before a scan.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The simplest way to run a Trivy misconfiguration scan is to point it at your current folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy config &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As mentioned earlier, we can also scan for secrets at the same time with Trivy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy fs &lt;span class="nt"&gt;--scanners&lt;/span&gt; misconfig,secret &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the focus here is on Terraform, I’ll use the &lt;code&gt;config&lt;/code&gt; subcommand for the rest of the blog post, but in a CI pipeline I would definitely run secret scanning across the whole repository as well, not only in the IaC folders.&lt;/p&gt;

&lt;p&gt;Before showing the full results, note that Trivy picks up some example configurations from the remote modules, such as this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;HIGH: IAM policy document uses sensitive action &lt;span class="s1"&gt;'logs:CreateLogStream'&lt;/span&gt; on wildcarded resource &lt;span class="s1"&gt;'*'&lt;/span&gt;
═══════════════════════════════════════════════════════════════════════════════════════════════════════
You should use the principle of least privilege when defining your IAM policies.
This means you should specify each exact permission required without using wildcards,
as this could cause the granting of access to certain undesired actions, resources and principals.

See https://avd.aquasec.com/misconfig/avd-aws-0057
───────────────────────────────────────────────────────────────────────────────────────────────────────
 modules/vpc/vpc-flow-logs.tf:112
   via modules/vpc/vpc-flow-logs.tf:100-113 &lt;span class="o"&gt;(&lt;/span&gt;data.aws_iam_policy_document.vpc_flow_log_cloudwatch[0]&lt;span class="o"&gt;)&lt;/span&gt;
    via modules/vpc/vpc-flow-logs.tf:97-114 &lt;span class="o"&gt;(&lt;/span&gt;data.aws_iam_policy_document.vpc_flow_log_cloudwatch[0]&lt;span class="o"&gt;)&lt;/span&gt;
     via modules/vpc/examples/complete/main.tf:25-82 &lt;span class="o"&gt;(&lt;/span&gt;module.vpc&lt;span class="o"&gt;)&lt;/span&gt;
───────────────────────────────────────────────────────────────────────────────────────────────────────
  97   data &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"vpc_flow_log_cloudwatch"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  ..
 112 &lt;span class="o"&gt;[&lt;/span&gt;     resources &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
 ...
 114   &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you look closely, you will notice that the source of the finding is a &lt;code&gt;main.tf&lt;/code&gt; file in the examples folder of the VPC module: &lt;code&gt;via modules/vpc/examples/complete/main.tf:25-82 (module.vpc)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is not really our configuration, and we should not include these files in the scan. This is also a difference between Tfsec and Trivy: Tfsec does not pick up the examples folder.&lt;/p&gt;

&lt;p&gt;However, we can easily resolve this by skipping all files under &lt;code&gt;examples&lt;/code&gt; folders, and then we should get a proper report of findings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy config &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--skip-dirs&lt;/span&gt; &lt;span class="s1"&gt;'**/examples'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are the results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;.terraform/modules/alb/main.tf &lt;span class="o"&gt;(&lt;/span&gt;terraform&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;==========================================&lt;/span&gt;
Tests: 3 &lt;span class="o"&gt;(&lt;/span&gt;SUCCESSES: 1, FAILURES: 2, EXCEPTIONS: 0&lt;span class="o"&gt;)&lt;/span&gt;
Failures: 2 &lt;span class="o"&gt;(&lt;/span&gt;UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 2, CRITICAL: 0&lt;span class="o"&gt;)&lt;/span&gt;

HIGH: Application load balancer is not &lt;span class="nb"&gt;set &lt;/span&gt;to drop invalid headers.
════════════════════════════════════════════════════════════════════════════════════════════════
Passing unknown or invalid headers through to the target poses a potential risk of compromise.

By setting drop_invalid_header_fields to &lt;span class="nb"&gt;true&lt;/span&gt;, anything that doe not conform to well known,
defined headers will be removed by the load balancer.

See https://avd.aquasec.com/misconfig/avd-aws-0052
────────────────────────────────────────────────────────────────────────────────────────────────
 .terraform/modules/alb/main.tf:23
   via .terraform/modules/alb/main.tf:5-63 &lt;span class="o"&gt;(&lt;/span&gt;aws_lb.this[0]&lt;span class="o"&gt;)&lt;/span&gt;
────────────────────────────────────────────────────────────────────────────────────────────────
   5   resource &lt;span class="s2"&gt;"aws_lb"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="nb"&gt;.&lt;/span&gt;
  23 &lt;span class="o"&gt;[&lt;/span&gt;   drop_invalid_header_fields                  &lt;span class="o"&gt;=&lt;/span&gt; var.drop_invalid_header_fields
  ..
  63   &lt;span class="o"&gt;}&lt;/span&gt;
────────────────────────────────────────────────────────────────────────────────────────────────

HIGH: Load balancer is exposed publicly.
════════════════════════════════════════════════════════════════════════════════════════════════
There are many scenarios &lt;span class="k"&gt;in &lt;/span&gt;which you would want to expose a load balancer to the wider internet,
but this check exists as a warning to prevent accidental exposure of internal assets.
You should ensure that this resource should be exposed publicly.

See https://avd.aquasec.com/misconfig/avd-aws-0053
────────────────────────────────────────────────────────────────────────────────────────────────
 .terraform/modules/alb/main.tf:12
   via .terraform/modules/alb/main.tf:5-63 &lt;span class="o"&gt;(&lt;/span&gt;aws_lb.this[0]&lt;span class="o"&gt;)&lt;/span&gt;
────────────────────────────────────────────────────────────────────────────────────────────────
   5   resource &lt;span class="s2"&gt;"aws_lb"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="nb"&gt;.&lt;/span&gt;
  12 &lt;span class="o"&gt;[&lt;/span&gt;   internal           &lt;span class="o"&gt;=&lt;/span&gt; var.internal
  ..
  63   &lt;span class="o"&gt;}&lt;/span&gt;
────────────────────────────────────────────────────────────────────────────────────────────────

.terraform/modules/vpc/main.tf &lt;span class="o"&gt;(&lt;/span&gt;terraform&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;==========================================&lt;/span&gt;
Tests: 1 &lt;span class="o"&gt;(&lt;/span&gt;SUCCESSES: 0, FAILURES: 1, EXCEPTIONS: 0&lt;span class="o"&gt;)&lt;/span&gt;
Failures: 1 &lt;span class="o"&gt;(&lt;/span&gt;UNKNOWN: 0, LOW: 0, MEDIUM: 1, HIGH: 0, CRITICAL: 0&lt;span class="o"&gt;)&lt;/span&gt;

MEDIUM: VPC Flow Logs is not enabled &lt;span class="k"&gt;for &lt;/span&gt;VPC
════════════════════════════════════════════════════════════════════════════════════════════════
VPC Flow Logs provide visibility into network traffic that traverses the VPC and can be used to
detect anomalous traffic or insight during security workflows.

See https://avd.aquasec.com/misconfig/avd-aws-0178
────────────────────────────────────────────────────────────────────────────────────────────────
 .terraform/modules/vpc/main.tf:29-52
────────────────────────────────────────────────────────────────────────────────────────────────
  29 ┌ resource &lt;span class="s2"&gt;"aws_vpc"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  30 │   count &lt;span class="o"&gt;=&lt;/span&gt; local.create_vpc ? 1 : 0
  31 │
  32 │   cidr_block          &lt;span class="o"&gt;=&lt;/span&gt; var.use_ipam_pool ? null : var.cidr
  33 │   ipv4_ipam_pool_id   &lt;span class="o"&gt;=&lt;/span&gt; var.ipv4_ipam_pool_id
  34 │   ipv4_netmask_length &lt;span class="o"&gt;=&lt;/span&gt; var.ipv4_netmask_length
  35 │
  36 │   assign_generated_ipv6_cidr_block     &lt;span class="o"&gt;=&lt;/span&gt; var.enable_ipv6 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;var.use_ipam_pool ? &lt;span class="nb"&gt;true&lt;/span&gt; : null
  37 └   ipv6_cidr_block                      &lt;span class="o"&gt;=&lt;/span&gt; var.ipv6_cidr
  ..
────────────────────────────────────────────────────────────────────────────────────────────────

main.tf &lt;span class="o"&gt;(&lt;/span&gt;terraform&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;===================&lt;/span&gt;
Tests: 3 &lt;span class="o"&gt;(&lt;/span&gt;SUCCESSES: 1, FAILURES: 2, EXCEPTIONS: 0&lt;span class="o"&gt;)&lt;/span&gt;
Failures: 2 &lt;span class="o"&gt;(&lt;/span&gt;UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 2, CRITICAL: 0&lt;span class="o"&gt;)&lt;/span&gt;

HIGH: Instance does not require IMDS access to require a token
════════════════════════════════════════════════════════════════════════════════════════════════

IMDS v2 &lt;span class="o"&gt;(&lt;/span&gt;Instance Metadata Service&lt;span class="o"&gt;)&lt;/span&gt; introduced session authentication tokens which improve
security when talking to IMDS.
By default &amp;lt;code&amp;gt;aws_instance&amp;lt;/code&amp;gt; resource sets IMDS session auth tokens to be optional.
To fully protect IMDS you need to &lt;span class="nb"&gt;enable &lt;/span&gt;session tokens by using &amp;lt;code&amp;gt;metadata_options&amp;lt;/code&amp;gt;
block and its &amp;lt;code&amp;gt;http_tokens&amp;lt;/code&amp;gt; variable &lt;span class="nb"&gt;set &lt;/span&gt;to &amp;lt;code&amp;gt;required&amp;lt;/code&amp;gt;.

See https://avd.aquasec.com/misconfig/avd-aws-0028
────────────────────────────────────────────────────────────────────────────────────────────────
 main.tf:55-59
────────────────────────────────────────────────────────────────────────────────────────────────
  55 ┌ resource &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  56 │   ami           &lt;span class="o"&gt;=&lt;/span&gt; data.aws_ami.amazon_linux.id
  57 │   instance_type &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t3.nano"&lt;/span&gt;
  58 │   subnet_id     &lt;span class="o"&gt;=&lt;/span&gt; element&lt;span class="o"&gt;(&lt;/span&gt;module.vpc.private_subnets, 0&lt;span class="o"&gt;)&lt;/span&gt;
  59 └ &lt;span class="o"&gt;}&lt;/span&gt;
────────────────────────────────────────────────────────────────────────────────────────────────

HIGH: Root block device is not encrypted.
════════════════════════════════════════════════════════════════════════════════════════════════
Block devices should be encrypted to ensure sensitive data is held securely at rest.

See https://avd.aquasec.com/misconfig/avd-aws-0131
────────────────────────────────────────────────────────────────────────────────────────────────
 main.tf:55-59
────────────────────────────────────────────────────────────────────────────────────────────────
  55 ┌ resource &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  56 │   ami           &lt;span class="o"&gt;=&lt;/span&gt; data.aws_ami.amazon_linux.id
  57 │   instance_type &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t3.nano"&lt;/span&gt;
  58 │   subnet_id     &lt;span class="o"&gt;=&lt;/span&gt; element&lt;span class="o"&gt;(&lt;/span&gt;module.vpc.private_subnets, 0&lt;span class="o"&gt;)&lt;/span&gt;
  59 └ &lt;span class="o"&gt;}&lt;/span&gt;
────────────────────────────────────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For brevity I shortened some of the long lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspecting the Results
&lt;/h2&gt;

&lt;p&gt;Unfortunately, Trivy does not print a summary at the end like &lt;code&gt;tfsec&lt;/code&gt; does; that summary makes it convenient to read the output from the bottom up. Trivy does offer different ways to modify the resulting report, but for the needs of this blog I quickly used &lt;code&gt;grep&lt;/code&gt; to pull out a one-line summary of each finding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HIGH: Application load balancer is not set to drop invalid headers.
HIGH: Load balancer is exposed publicly.
MEDIUM: VPC Flow Logs is not enabled for VPC
HIGH: Instance does not require IMDS access to require a token
HIGH: Root block device is not encrypted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking at the list, I think we want to change the configuration before deploying the resources. Next, let’s look at our options for resolving these findings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resolving the Issues
&lt;/h2&gt;

&lt;p&gt;We have two choices when it comes to resolving these issues so that we can have a nice clean report (until we change the configuration again). We can either resolve an issue by modifying our configuration, or accept the finding as not relevant to our requirements and ignore it in future scans.&lt;/p&gt;

&lt;p&gt;Since we use the public AWS modules, we cannot easily change anything besides the inputs without forking the source module. However, we can change the EC2 instance, which is defined directly in &lt;code&gt;main.tf&lt;/code&gt;. So let’s look into the issues related to the EC2 instance first.&lt;/p&gt;

&lt;p&gt;One finding concerns the AWS Instance Metadata Service (IMDS): the instance should use IMDSv2 instead of the legacy IMDSv1. Looking at the full report above, you can see that Trivy explicitly tells us what the problem is and how to resolve it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IMDS v2 (Instance Metadata Service) introduced session authentication tokens which improve
security when talking to IMDS.
By default &amp;lt;code&amp;gt;aws_instance&amp;lt;/code&amp;gt; resource sets IMDS session auth tokens to be optional.
To fully protect IMDS you need to enable session tokens by using &amp;lt;code&amp;gt;metadata_options&amp;lt;/code&amp;gt;
block and its &amp;lt;code&amp;gt;http_tokens&amp;lt;/code&amp;gt; variable set to &amp;lt;code&amp;gt;required&amp;lt;/code&amp;gt;.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can read more about the security benefits and the scenarios where this configuration matters in the &lt;a href="https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/" rel="noopener noreferrer"&gt;IMDSv2 announcement blog post by AWS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To resolve this, we are going to change the configuration as the description suggests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="p"&gt;resource "aws_instance" "this" {
&lt;/span&gt;   ami           = data.aws_ami.amazon_linux.id
   instance_type = "t3.nano"
   subnet_id     = element(module.vpc.private_subnets, 0)
&lt;span class="gi"&gt;+
+  metadata_options {
+    http_tokens = "required"
+  }
&lt;/span&gt; }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you run the scan again you will notice the finding is gone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy config &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--skip-dirs&lt;/span&gt; &lt;span class="s1"&gt;'**/examples'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s move on to the next issue, which is that the root disk of this EC2 instance is unencrypted. In reality I would configure encryption because it is a common compliance requirement and AWS makes it very easy, but for the sake of the example let’s ignore the finding instead and say that we are OK with running an unencrypted root disk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gi"&gt;+#trivy:ignore:avd-aws-0131
&lt;/span&gt; resource "aws_instance" "this" {
   ami           = data.aws_ami.amazon_linux.id
   instance_type = "t3.nano"
   subnet_id     = element(module.vpc.private_subnets, 0)
&lt;span class="err"&gt;
&lt;/span&gt;   metadata_options {
     http_tokens = "required"
   }
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my opinion, the inline method is the most intuitive way to ignore findings. It might be familiar to you if you have worked with just about any code linter in the past.&lt;/p&gt;
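
&lt;p&gt;For comparison, fixing the finding instead of ignoring it takes only a few lines. The following is a minimal sketch, assuming the AWS provider’s &lt;code&gt;root_block_device&lt;/code&gt; block with &lt;code&gt;encrypted = true&lt;/code&gt; and no customer-managed KMS key; it is not part of the example configuration used in this post:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "aws_instance" "this" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.nano"
  subnet_id     = element(module.vpc.private_subnets, 0)

  metadata_options {
    http_tokens = "required"
  }

  # Encrypt the root EBS volume. With no kms_key_id set, AWS falls back to
  # the account's default EBS encryption key.
  root_block_device {
    encrypted = true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a block like this in place, the inline ignore comment for &lt;code&gt;avd-aws-0131&lt;/code&gt; should no longer be needed.&lt;/p&gt;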

&lt;p&gt;Now the only remaining issues are related to the AWS modules, which we did not author. Unfortunately, right now Trivy can’t figure out that the remote modules are downloaded under a different path (&lt;code&gt;.terraform/modules&lt;/code&gt;) than the one declared as the source of the modules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"alb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/alb/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"8.7.0"&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, this is not an issue when using local modules, since the paths in the findings report match the declarations in the Terraform configuration. This issue is actively being &lt;a href="https://github.com/aquasecurity/trivy/discussions/5872" rel="noopener noreferrer"&gt;discussed in the Trivy GitHub repository&lt;/a&gt;, so when you read this it might be fixed and I should update this blog.&lt;/p&gt;

&lt;p&gt;Luckily Trivy has a cure for this even without us waiting for a fix. I quickly brewed a solution using an &lt;a href="https://aquasecurity.github.io/trivy/v0.48/docs/configuration/filtering/#by-open-policy-agent" rel="noopener noreferrer"&gt;advanced filtering mechanism&lt;/a&gt; in Trivy that uses the &lt;a href="https://www.openpolicyagent.org/docs/latest/policy-language/" rel="noopener noreferrer"&gt;Rego&lt;/a&gt; language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="ow"&gt;package&lt;/span&gt; &lt;span class="n"&gt;trivy&lt;/span&gt;

&lt;span class="ow"&gt;import&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trivy&lt;/span&gt;

&lt;span class="ow"&gt;default&lt;/span&gt; &lt;span class="n"&gt;ignore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="n"&gt;ignore_avdid&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"AVD-AWS-0052"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"AVD-AWS-0053"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;ignore_severities&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"LOW"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"MEDIUM"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;ignore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AVDID&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ignore_avdid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;ignore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Severity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ignore_severities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will resolve the issue with the VPC module because all &lt;code&gt;MEDIUM&lt;/code&gt; findings are ignored, and the findings in the ALB module are ignored explicitly by their finding IDs.&lt;/p&gt;

&lt;p&gt;Now we can run a scan and include this policy from a local file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy config &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--skip-dirs&lt;/span&gt; &lt;span class="s1"&gt;'**/examples'&lt;/span&gt; &lt;span class="nt"&gt;--ignore-policy&lt;/span&gt; custom-policy.rego
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now all the findings should be resolved (of course if you try this yourself, there might be new built-in checks and you get a different list of findings).&lt;/p&gt;

&lt;p&gt;While authoring the custom ignore policy, I couldn’t figure out how to connect the source of the finding (the ALB module) to the AVDID for a more fine-grained rule, but that does not seem like a big deal to me. After all, you should not place all your IaC configuration into a single root module. In production I would separate the VPC creation from this Terraform root module and use an existing VPC that is managed in a different root module.&lt;/p&gt;
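
&lt;p&gt;Where a module exposes an input for the flagged setting, changing that input is an alternative to ignoring the finding. The scan results above show the ALB module passing &lt;code&gt;var.drop_invalid_header_fields&lt;/code&gt; straight to the &lt;code&gt;aws_lb&lt;/code&gt; resource, so a sketch of that approach could look like the following. It assumes the rest of the module block stays as it is, and whether Trivy then reports AVD-AWS-0052 as resolved depends on how it evaluates the remote module, so re-run the scan to confirm:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;module "alb" {
  source  = "terraform-aws-modules/alb/aws"
  version = "8.7.0"

  # ... other inputs unchanged ...

  # Drop headers that do not conform to the HTTP specification (AVD-AWS-0052).
  drop_invalid_header_fields = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;internal&lt;/code&gt; input behind AVD-AWS-0053 is left alone here because this load balancer is deliberately placed in the public subnets, which makes that finding a better candidate for an explicit ignore.&lt;/p&gt;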

&lt;h2&gt;
  
  
  Scanning Terraform Plans
&lt;/h2&gt;

&lt;p&gt;Another way to run the scan is to first create a plan and then convert the plan from the default binary format to JSON, and then point Trivy to scan this plan that contains a list of changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform plan &lt;span class="nt"&gt;--out&lt;/span&gt; tf.plan
terraform show &lt;span class="nt"&gt;-json&lt;/span&gt; tf.plan &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; tfplan.json
trivy config tfplan.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I noticed that there’s one additional finding when running Trivy against the plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CRITICAL: Listener for application load balancer does not use HTTPS.
═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Plain HTTP is unencrypted and human-readable. This means that if a malicious actor was to eavesdrop on your connection,
they would be able to see all of your data flowing back and forth.

You should use HTTPS, which is HTTP over an encrypted (TLS) connection, meaning eavesdroppers cannot read your traffic.

See https://avd.aquasec.com/misconfig/avd-aws-0054
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 main.tf:36
   via main.tf:34-48 (aws_lb_listener.frontend_http_tcp_ffdb4db32d85be4b5cd7539e4d3c6d16)
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  ..
  36 [  protocol = "HTTP"
  ..
  48   }
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I found it odd that this was not reported in the previous scan. After some testing, it seems to be caused by using remote modules (again!). I also noticed that running &lt;code&gt;tfsec&lt;/code&gt; instead of Trivy will catch this issue, and that with local modules Trivy picks up the finding right away. Right now it seems necessary to scan both your plan and your configuration for the best accuracy, but I might revisit this in the future and see if the issue is resolved, given the active discussion around remote module support in Trivy.&lt;/p&gt;
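
&lt;p&gt;As for the finding itself, one possible way to address it is to terminate TLS on the ALB and turn the plain HTTP listener into a redirect. The sketch below assumes the &lt;code&gt;https_listeners&lt;/code&gt; input and the redirect action of the 8.x ALB module, plus a hypothetical &lt;code&gt;var.certificate_arn&lt;/code&gt; variable pointing at an existing certificate. Neither is part of the example configuration in this post, so check the module documentation for your version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;module "alb" {
  source  = "terraform-aws-modules/alb/aws"
  version = "8.7.0"

  # ... other inputs unchanged ...

  # Serve traffic over TLS. var.certificate_arn is a hypothetical variable.
  https_listeners = [
    {
      port               = 443
      protocol           = "HTTPS"
      certificate_arn    = var.certificate_arn
      target_group_index = 0
    }
  ]

  # Keep port 80 open only to redirect clients to HTTPS.
  http_tcp_listeners = [
    {
      port        = 80
      protocol    = "HTTP"
      action_type = "redirect"
      redirect = {
        port        = "443"
        protocol    = "HTTPS"
        status_code = "HTTP_301"
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After a change like this, scan the plan again to see how Trivy treats the redirecting listener.&lt;/p&gt;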

&lt;p&gt;Luckily, we can also work around this by scanning everything at once: generate the plan and then scan the entire folder, so that Trivy processes both the configuration and the plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform plan &lt;span class="nt"&gt;--out&lt;/span&gt; tf.plan
terraform show &lt;span class="nt"&gt;-json&lt;/span&gt; tf.plan &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; tfplan.json
trivy config &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--skip-dirs&lt;/span&gt; &lt;span class="s1"&gt;'**/examples'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trivy is smart enough to not duplicate the findings - awesome! With Tfsec it was not possible to scan Terraform plans, so this is great if scanning the plan fits your workflow better.&lt;/p&gt;

&lt;p&gt;As we saw above, scanning the Terraform plan is more accurate than scanning just the files, but the downside is that you need to generate a plan for each Terraform root module and it takes much more time to generate a plan prior to running Trivy. You also need to be able to connect and authenticate to the providers for Terraform to generate the plan, although that’s not typically an issue.&lt;/p&gt;

&lt;p&gt;One more thing to note about plans is that your inline &lt;code&gt;#trivy:ignore&lt;/code&gt; comments will be ignored since that information will not make it into the plan, so if you are using plans primarily for your scanning, then you might need to get comfortable defining the Rego ignore policies instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Trivy in CI
&lt;/h2&gt;

&lt;p&gt;Including Trivy scans in your IaC repositories’ CI pipelines is a must. If you don’t have CI pipelines for your IaC… Well, you should! Trivy offers integrations with many CI/CD tools, IDEs and other systems; see the &lt;a href="https://aquasecurity.github.io/trivy/latest/ecosystem/" rel="noopener noreferrer"&gt;documentation for an up-to-date list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When using Trivy in CI it’s wise to use a &lt;a href="https://aquasecurity.github.io/trivy/latest/docs/references/configuration/config-file/" rel="noopener noreferrer"&gt;configuration file&lt;/a&gt; instead of command line flags. This makes it easy to reproduce the scan locally with the same configuration if you need to investigate new findings. If you are using GitHub Actions, there’s an &lt;a href="https://github.com/aquasecurity/trivy-action/tree/master" rel="noopener noreferrer"&gt;official Action&lt;/a&gt; that you can use to integrate Trivy into your CI pipeline. Here’s a simple example which uses a configuration file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout code&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run Trivy vulnerability scanner in fs mode&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@master&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;scan-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fs'&lt;/span&gt;
        &lt;span class="na"&gt;scan-ref&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.'&lt;/span&gt;
        &lt;span class="na"&gt;trivy-config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trivy.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As seen in the example above, in CI you likely want to run the &lt;code&gt;fs&lt;/code&gt; scan, which by default includes all the scanners, meaning Trivy will also scan for secrets and vulnerabilities, not only for misconfigurations.&lt;/p&gt;

&lt;p&gt;However, keep in mind &lt;a href="https://verifa.io/blog/keep-your-pipelines-simple" rel="noopener noreferrer"&gt;this excellent blog post by my colleague Thierry&lt;/a&gt;. The Trivy action does little more than map the GitHub workflow YAML inputs to CLI flags. If you are using another CI/CD system, you can simply install and invoke the CLI yourself, which makes the transition between CI/CD tools extremely simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus for GitHub Users
&lt;/h2&gt;

&lt;p&gt;If you have an open-source project on GitHub or you pay GitHub for the advanced security features, then you can also upload the Trivy scan results to GitHub code scanning, which you should be using if you are not already. Refer to the &lt;a href="https://github.com/aquasecurity/trivy-action?tab=readme-ov-file#using-trivy-with-github-code-scanning" rel="noopener noreferrer"&gt;Trivy Action’s README&lt;/a&gt; for a sample configuration that uploads the results. This will help you track the findings in addition to gatekeeping with a pipeline that must always pass before merging code (or however your team works).&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this blog post we rolled up our sleeves and looked into how to secure Terraform configuration by using static analysis. As an example and recommended tool we explored Trivy, but honestly the tool choice isn’t as important as the principle of integrating such checks into your workflow. I hope you can see the value of running a simple scan over your configuration. Thanks to the extensive built-in checks in Trivy you can get actionable findings without spending time reviewing the configuration manually and magically knowing all the security intricacies of AWS infrastructure.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>terraform</category>
      <category>tutorial</category>
      <category>security</category>
    </item>
    <item>
      <title>Demystifying Service Level acronyms and Error Budgets</title>
      <dc:creator>Verifa crew</dc:creator>
      <pubDate>Wed, 12 Jun 2024 13:31:46 +0000</pubDate>
      <link>https://forem.com/verifacrew/demystifying-service-level-acronyms-and-error-budgets-1m57</link>
      <guid>https://forem.com/verifacrew/demystifying-service-level-acronyms-and-error-budgets-1m57</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally posted on &lt;a href="https://verifa.io/blog/demystifying-service-level-acronyms/" rel="noopener noreferrer"&gt;Verifa's blog&lt;/a&gt;, written by Lauri Suomalainen&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Availability, fault tolerance, reliability, resilience. These are some of the terms that pop up when delivering digital services to users at scale. Acronyms related to Service Levels tend to pop up as well. Most developers have at least seen SLA, SLO and SLI and some even know what they mean. However, based on personal experience, not many people who work at the intersection of writing, delivering and maintaining software know how to make use of them in their software delivery process.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this fundamentals-level blog post I will explain what the different Service Level concepts mean and how to use them effectively in the software delivery process.&lt;/p&gt;

&lt;p&gt;I also did a talk at DevOps Finland on this topic, &lt;a href="https://verifa.io/blog/service-levels-error-budgets-devops-finland-talk/" rel="noopener noreferrer"&gt;Service Levels, Error budgets, and why your dev teams should care.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Service Level and why does it matter?
&lt;/h2&gt;

&lt;p&gt;Depending on the source, I have seen claims that anywhere from 40% to a &lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610582/" rel="noopener noreferrer"&gt;whopping 90% of a software system’s lifetime costs&lt;/a&gt; consist of operational and maintenance costs, making the development costs of the software pale in comparison. &lt;a href="https://assets.new.siemens.com/siemens/assets/api/uuid:3d606495-dbe0-43e4-80b1-d04e27ada920/dics-b10153-00-7600truecostofdowntime2022-144.pdf" rel="noopener noreferrer"&gt;Additionally, the costs of even short service breaks and unplanned downtime are significant&lt;/a&gt; and getting more expensive still. This goes to show that being able to maintain your service availability, and preferably being able to react preemptively to service degradation, is not just a matter of convenience, but carries a very real price tag with business consequences.&lt;/p&gt;

&lt;p&gt;Service Level embodies the overall performance of your software system. It consists of goals that, when met, indicate that your system is performing at the desired level, and measurements which tell if those goals are being met, exceeded or if the system is underperforming. There are three distinct concepts associated with the Service Level: Service Level &lt;em&gt;Agreement&lt;/em&gt; (SLA), Service Level &lt;em&gt;Objective&lt;/em&gt; (SLO) and Service Level &lt;em&gt;Indicator&lt;/em&gt; (SLI).&lt;/p&gt;

&lt;h3&gt;
  
  
  Service Level Agreements
&lt;/h3&gt;

&lt;p&gt;Service Level Agreements define the base level of performance and functionality you promise to your users, be they paying customers or developers using your internal tooling platform and databases. Typically SLAs are seen to relate to service availability and are expressed as a percentage like 99.9% (colloquially ‘three-nines’, with ‘four-nines’ for 99.99% and so on), but soon we will see that this is a simplification. Especially in public cloud computing, failing to meet a set SLA carries a contractual penalty for the provider and compensation for their clients, such as refunds or discounts, so it is in a provider’s best interest to react preemptively when the software system shows signs of degradation. That brings us to Service Level Objectives (SLOs).&lt;/p&gt;

&lt;h3&gt;
  
  
  Service Level Objectives
&lt;/h3&gt;

&lt;p&gt;Service Level Objectives are goal values you set for your software system. They are not contractually bound like SLA values, but they still define the minimum baseline for your software system to be considered functional. It is good practice to set SLOs slightly stricter than the thresholds defined in your SLAs; if your SLA promises 99.5% availability, a 99.7% SLO gives you some leeway to fix problems in your software before they manifest for your users and start incurring sanctions. Obviously, you want to detect the symptoms before you start violating your SLOs, and you do that by monitoring and measuring your Service Level Indicators (SLIs).&lt;/p&gt;

&lt;h3&gt;
  
  
  Service Level Indicators
&lt;/h3&gt;

&lt;p&gt;Service Level Indicators are metrics you collect about your software system’s health. Although often lumped together under the general term ‘availability’, SLIs are specific technical measurements the system produces. However, what constitutes availability and unavailability varies from system to system. Straightforward simplifications of availability such as ‘my server is on and reachable 99.9% of the time’ can easily hide symptoms of a badly behaving system. SLIs should be values that actually matter to the users and their experience with the software system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8x51rio1t6mb182d66o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8x51rio1t6mb182d66o.png" alt="Image description" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What to measure and how?
&lt;/h2&gt;

&lt;p&gt;So, general ‘availability’ is not a good enough metric. Why?&lt;/p&gt;

&lt;p&gt;A very heavy-handed example would be an Industrial Control System (ICS) that is only used during the day when the work is done on the factory floor. However, there’s a bug in this ICS that causes random disconnections and freezes when a certain load is reached in the system. This happens frequently during the day, but never outside working hours. If you only monitored server health or network connectivity (instead of HTTP error codes, for example), your metrics would never reveal the problem affecting your users. In this scenario it does not matter whether your SLO is met, as it gives you no insight into how your users interact with the system and how they experience using it. In the worst-case scenario your SLI is just plain wrong, but even a good SLI may hide bad behaviour if the measurement window is too wide.&lt;/p&gt;

&lt;p&gt;A simplistic way to measure availability is to look at the ‘good time’ your system experiences divided by total time. In the example above, you could have a health check or a liveness probe periodically checking on the server health and everything would look fine based on it. A more refined approach to measuring availability would be to measure the &lt;a href="https://sre.google/workbook/implementing-slos/" rel="noopener noreferrer"&gt;ratio of good interactions against the total number of interactions.&lt;/a&gt; Metrics like latency, error ratio, throughput and correctness might matter more to your users than just raw liveness. Server availability is the basic requirement; serving requests correctly and in a timely manner is what brings value.&lt;/p&gt;
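
&lt;p&gt;To make the difference concrete, here is a minimal Python sketch that compares the two approaches for the ICS example. The probe results and request counts are made-up numbers for illustration only, and the function names are hypothetical rather than part of any monitoring tool.&lt;/p&gt;

```python
def time_based_availability(probe_results):
    """Share of liveness probes that succeeded ('good time' / total time)."""
    return sum(probe_results) / len(probe_results)


def request_based_availability(good_requests, total_requests):
    """Share of user interactions that were served correctly and on time."""
    return good_requests / total_requests


# Assumption: one liveness probe per minute for 24 hours.
# The server itself never goes down, so every probe succeeds.
probes = [True] * 1440

# Assumption: 10,000 requests during the working day,
# 800 of which failed or froze because of the load bug.
total = 10_000
good = total - 800

print(f"time-based:    {time_based_availability(probes):.1%}")
print(f"request-based: {request_based_availability(good, total):.1%}")
```

&lt;p&gt;With these assumed numbers the probe-based figure looks perfect while the request-based figure reveals that a noticeable share of user interactions failed, which is exactly the kind of problem a liveness check alone would hide.&lt;/p&gt;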

&lt;p&gt;As always with complex systems, there is no silver bullet to choosing correct SLIs. In some cases we could for example tolerate some number of false positives or incomplete data as long as we get it fast whereas in other cases we could be willing to tolerate a system with notable latency or subpar throughput if we can be sure that the data we get is always correct.&lt;/p&gt;

&lt;p&gt;When you have identified your SLIs, you have to set SLOs and SLAs. As a rule of thumb, every system breaks somehow sometime. Even if you managed to build an infallible system, external forces like network congestion and hardware failure could hinder your performance. That’s why it is unrealistic to aim for 100% availability. The goals you set for your system are also not static. When developing and launching new software, you probably want to set your goals modestly for starters. As you gain more data on user interactions and loads your system experiences, you can re-evaluate the goals while you keep improving the software. This brings us to our next topic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why should you care about service level in software development?
&lt;/h2&gt;

&lt;p&gt;I have said it before and I will say it again: software development is a customer service job. No commercial software system exists for its own sake. There are end users, your clients, who get something out of the software you build and your software should meet their needs constantly if you want to succeed. While user research might tell you what features you should build next, your service level tells how the features you already built are performing. With the ‘you build it, you run it’ approach becoming more prevalent within the industry, maintaining existing products increasingly becomes an exercise in the realms of software development processes rather than just an operational task. Best of all, monitoring your service levels allows you to make data-driven decisions when working on your software system.&lt;/p&gt;

&lt;p&gt;I had an interesting discussion about service levels with a colleague who is managing a software team in a product company. I asked if they had SLAs and SLOs in place and he assured me they do. I also queried about their working practices and he told me they work in two week sprints building new features, but every now and then, usually after major releases, they have so-called ‘cooldown’ sprints where they work on improving the existing code base, refactoring and erasing technical debt. I said that’s just great, fantastic even. Technical debt will stifle the productivity and development speed in the long run, so I applaud any formal efforts taken trying to fight it.&lt;/p&gt;

&lt;p&gt;Then I asked a few harder questions that revealed some room for improvement. The first one was: “What do you do if your SLOs are not being met?” He told me that their SLOs were regarded more like key performance indicators: something they should strive for, but not something that is actively acted upon. The second question I asked was: “How do you determine when to have a cooldown sprint?” From the answer I deduced that the decision was made somewhat on a whim, when the feature backlog was not actively bursting at the seams with high-priority stuff.&lt;/p&gt;

&lt;p&gt;My main gripe with these answers is that breaking an SLO should always warrant action. If there are no procedures tied to it, an SLO becomes hollow fluff. That does not mean you should treat all SLO violations as major incidents; it is as unrealistic to expect 100% availability as it is to meet SLOs 100% of the time. Failing to meet an SLO should at least cause the software development team to stop and consider whether they should reprioritise their work or, say, have a cooldown sprint.&lt;/p&gt;

&lt;p&gt;Enter error budgets. It took a while to get here from the title. One could say that error budgets are a tool and indicator on how much you can… muck around before you have to start finding out. But what are they and how do they work?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9lnf1wwoq0e5g4vznzh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9lnf1wwoq0e5g4vznzh.png" alt="Image description" width="506" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use error budgets in software development?
&lt;/h2&gt;

&lt;p&gt;Suppose your service has an availability SLO of 95% tied to a monthly aggregated SLI (which, as a side note, allows for terribly long outages of &lt;a href="https://availability.sre.xyz/" rel="noopener noreferrer"&gt;1.52 days per month&lt;/a&gt;!). Now you are doing some top-notch software development and consistently manage to achieve 97% availability (your software is uncooperative only some 43 minutes each day…). That means you have a 97%-95%=2% budget to do risky stuff that can break your software &lt;strong&gt;before&lt;/strong&gt; you break your SLO. In minutes, that is an additional 28.8 per day on top of the current downtime.&lt;/p&gt;
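
&lt;p&gt;The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation, using the 95% SLO and 97% measured availability from the example; the helper name is hypothetical.&lt;/p&gt;

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_day(availability):
    """Daily downtime implied by an availability ratio such as 0.97."""
    return (1 - availability) * MINUTES_PER_DAY


slo = 0.95        # availability objective from the example
measured = 0.97   # availability the team currently achieves

allowed = downtime_minutes_per_day(slo)        # downtime the SLO tolerates
current = downtime_minutes_per_day(measured)   # downtime happening today
budget = allowed - current                     # what is left to spend

print(f"allowed by SLO: {allowed:.1f} min/day")
print(f"current:        {current:.1f} min/day")
print(f"error budget:   {budget:.1f} min/day")
```

&lt;p&gt;The three values correspond to the 72, 43 and 28.8 minutes per day discussed in this section.&lt;/p&gt;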

&lt;p&gt;Now, talking about doing risky things in the software development context might evoke thoughts about deploying very experimental features, prototypes or even untested changes (and if you considered that, it’s OK. It is called an intrusive thought and everyone has them), but these are quite extreme examples. One should bear in mind that in software development any change carries inherent risk in complex interconnected systems. You can use error budgets to release more frequently and with more confidence. If you do canary deployments or A/B testing, you can roll out new features faster to a wider audience because your error budget gives you this leeway. You could plan and perform maintenance breaks knowing you will not violate your SLOs. I think one of the most important things is that you get a data-driven indicator which allows you to make informed choices when balancing system reliability against innovating new features.&lt;/p&gt;

&lt;p&gt;Building on the previous example, suppose you introduce a new change to your software system. Everything seems fine until, a couple of days later, you find out that your daily downtime has gone from 43 minutes to some 58 minutes. You realise that the feature you shipped has caused some extra instability in your system and that this single feature just made a dent in your availability: from 97% to 96%. You are still not violating the SLO, but this new feature alone is now taking 50% of your error budget, leaving you with less freedom to develop new features. If your outage time had gone over 72 minutes per day, the error budget would show that you will run out of it before the end of the month: time to immediately switch over to maintenance mode before your end users start complaining!&lt;/p&gt;
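
&lt;p&gt;As a rough sketch of how budget consumption could be tracked, the snippet below computes the share of the daily error budget used by extra downtime. The function name is hypothetical, the 80-minute case is an extra made-up data point, and the baseline uses 43.2 minutes (exactly 3% of a day) rather than the rounded 43, so the first result comes out slightly above the rounded 50% quoted above.&lt;/p&gt;

```python
MINUTES_PER_DAY = 24 * 60


def budget_used(slo, baseline_downtime, new_downtime):
    """Fraction of the daily error budget consumed by extra downtime (minutes)."""
    allowed = (1 - slo) * MINUTES_PER_DAY   # 72 minutes for a 95% SLO
    budget = allowed - baseline_downtime    # headroom before the change
    return (new_downtime - baseline_downtime) / budget


# Daily downtime after the change: the observed 58 minutes, the 72-minute
# SLO limit, and a hypothetical 80 minutes that would break the SLO.
for downtime in (58, 72, 80):
    used = budget_used(0.95, 43.2, downtime)
    print(f"{downtime} min/day uses {used:.0%} of the error budget")
```

&lt;p&gt;Anything at or above 100% means the budget is spent and the SLO will be violated if the downtime persists for the rest of the measurement window.&lt;/p&gt;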

&lt;p&gt;Now you are sitting there with your error (and in a sense, development) budget cut in half, gnashing your teeth, even realising that maybe a 95% SLO is not that high and some improvements must be made. What can be done before you spend the rest of your budget? That is when you should realise that the error budget is there for you to spend! You look at your gutted budget and realise that even if you did not optimise the feature you just shipped, you could still afford 14 minutes and 24 seconds a day, or a whopping 7.2 hours per month, of downtime without breaking your SLOs. Encouraged, you and your team of developers and operations people (hopefully a somewhat overlapping group) can schedule a safe and informed downtime where you perform some much needed reliability improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  In conclusion
&lt;/h2&gt;

&lt;p&gt;When building and serving software you care about evolving it, but also about its availability and reliability. Focusing too much on the former can result in software that is rich in features but brittle in architecture and maintainability, eventually slowing down development as the majority of time is spent firefighting yet another failure. Focusing too much on the latter grinds development to a halt, as the best way to ensure reliability is to avoid making changes.&lt;/p&gt;

&lt;p&gt;Using Service Level Agreements, Objectives, Indicators and Error Budgets effectively in your software development process enables you to strike the right balance between change and stability. They define common goals for your developers and operations, promoting co-operation and data-driven decision making. They give your teams more ownership and agency over the products they build and make it easier to react to problems before they can take effect.&lt;/p&gt;

</description>
      <category>softwaredevelopment</category>
      <category>sre</category>
      <category>monitoring</category>
      <category>cicd</category>
    </item>
    <item>
      <title>How to assume an AWS IAM role from a Service Account in EKS with Terraform</title>
      <dc:creator>Verifa crew</dc:creator>
      <pubDate>Wed, 15 May 2024 12:45:15 +0000</pubDate>
      <link>https://forem.com/verifacrew/how-to-assume-an-aws-iam-role-from-a-service-account-in-eks-with-terraform-28gd</link>
      <guid>https://forem.com/verifacrew/how-to-assume-an-aws-iam-role-from-a-service-account-in-eks-with-terraform-28gd</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally posted on &lt;a href="https://verifa.io/blog/how-to-assume-an-aws-iam-role-from-a-service-account-in-eks-with-terraform/" rel="noopener noreferrer"&gt;Verifa's blog&lt;/a&gt;, written by Jacob Lärfors&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When working with AWS Elastic Kubernetes Service (EKS) clusters, your pods will likely want to interact with other AWS services and possibly other EKS clusters. In a recent project we were setting up&lt;/strong&gt; &lt;a href="https://argo-cd.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;&lt;strong&gt;ArgoCD&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;with multiple EKS clusters and our goal was to use Kubernetes Service Accounts to assume an AWS IAM role to authenticate with other EKS clusters. This led to some learning and discovering that we'd like to share with you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When running workloads in EKS, the running pods will operate under a service account which allows us to enforce &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/" rel="noopener noreferrer"&gt;RBAC within a Kubernetes cluster&lt;/a&gt;. We are not going to talk more about that in this post; instead, we want to talk about how we can do things &lt;em&gt;outside&lt;/em&gt; of our cluster and interact with other AWS services. The &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt; for this is fairly good if you want a reference point. There is also a &lt;a href="https://www.eksworkshop.com/beginner/110_irsa/preparation/" rel="noopener noreferrer"&gt;workshop&lt;/a&gt; on this topic which might be useful to run through.&lt;/p&gt;

&lt;p&gt;In our case, we were trying to communicate across EKS clusters to allow ArgoCD to manage multiple clusters and there is a pretty mammoth &lt;a href="https://github.com/argoproj/argo-cd/issues/2347" rel="noopener noreferrer"&gt;GitHub issue&lt;/a&gt; with people struggling (and succeeding!) with this. That GitHub issue partly inspired this blog post - if it was an easy topic people would not struggle and a blog would not be necessary ;)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Plan
&lt;/h2&gt;

&lt;p&gt;The simple breakdown of what we need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An EKS cluster with an &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html" rel="noopener noreferrer"&gt;IAM OIDC provider&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A Kubernetes Service Account in the EKS cluster&lt;/li&gt;
&lt;li&gt;An AWS IAM role which we are going to &lt;em&gt;assume&lt;/em&gt; (meaning we can do whatever that role is able to do)&lt;/li&gt;
&lt;li&gt;An AWS IAM role policy that allows our Service Account (2.) to assume our AWS IAM role (3.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We won't bore you with creating an EKS cluster and an IAM OIDC provider... Pick your poison for how you want to do this... We personally use Terraform and the &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest" rel="noopener noreferrer"&gt;awesome EKS module&lt;/a&gt; that has a convenient input &lt;code&gt;enable_irsa&lt;/code&gt; which creates the OIDC provider for us.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Deployment with Terraform
&lt;/h2&gt;

&lt;p&gt;Before we create the Service Account and the IAM role we need to define the names of these as there's a bit of a cyclic dependency - the Service Account needs to know the role ARN, and the role policy needs to know the Service Account name and namespace (if we want to limit scope, which we do!).&lt;/p&gt;

&lt;h3&gt;
  
  
  Locals
&lt;/h3&gt;

&lt;p&gt;So let's define some locals to keep things simple and DRY.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# locals.tf&lt;/span&gt;

&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;k8s_service_account_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"iam-role-test"&lt;/span&gt;
  &lt;span class="nx"&gt;k8s_service_account_namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt;

  &lt;span class="c1"&gt;# Get the EKS OIDC Issuer without https:// prefix&lt;/span&gt;
  &lt;span class="nx"&gt;eks_oidc_issuer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;trimprefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;oidc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;issuer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"https://"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  IAM
&lt;/h3&gt;

&lt;p&gt;And let's define the Terraform code that creates the IAM role with a policy allowing the service account to assume that role.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# iam.tf&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Get the caller identity so that we can get the AWS Account ID&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_caller_identity"&lt;/span&gt; &lt;span class="s2"&gt;"current"&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Get the EKS cluster we want to target&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_eks_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"eks"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;cluster-name&amp;gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Create the IAM role that will be assumed by the service account&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"iam_role_test"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"iam-role-test"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;iam_role_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Create IAM policy allowing the k8s service account to assume the IAM role&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"iam_role_test"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nx"&gt;principals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Federated"&lt;/span&gt;
      &lt;span class="nx"&gt;identifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer}"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Limit the scope so that only our desired service account can assume this role&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;test&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"StringEquals"&lt;/span&gt;
      &lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${local.eks_oidc_issuer}:sub"&lt;/span&gt;
      &lt;span class="nx"&gt;values&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"system:serviceaccount:${local.k8s_service_account_namespace}:${local.k8s_service_account_name}"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Account and Pod
&lt;/h3&gt;

&lt;p&gt;Then we need to create some Kubernetes resources. When working with Terraform it can make a lot of sense to use the &lt;a href="https://registry.terraform.io/providers/hashicorp/kubernetes/latest" rel="noopener noreferrer"&gt;Terraform Kubernetes Provider&lt;/a&gt; to apply our Kubernetes resources, especially as Terraform knows the ARN of the role and we can reuse our locals. However, if you don't want yet another provider dependency in Terraform you can easily do this with vanilla Kubernetes.&lt;/p&gt;

&lt;h4&gt;
  
  
  Terraform Kubernetes
&lt;/h4&gt;

&lt;p&gt;NOTE: you will need to configure the Kubernetes provider if you want to do this via Terraform.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# kubernetes.tf&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Create the Kubernetes service account which will assume the AWS IAM role&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"kubernetes_service_account"&lt;/span&gt; &lt;span class="s2"&gt;"iam_role_test"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k8s_service_account_name&lt;/span&gt;
    &lt;span class="nx"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k8s_service_account_namespace&lt;/span&gt;
    &lt;span class="nx"&gt;annotations&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# This annotation is needed to tell the service account which IAM role it&lt;/span&gt;
      &lt;span class="c1"&gt;# should assume&lt;/span&gt;
      &lt;span class="s2"&gt;"eks.amazonaws.com/role-arn"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;iam_role_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Deploy Kubernetes Pod with the Service Account that can assume an AWS IAM role&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"kubernetes_pod"&lt;/span&gt; &lt;span class="s2"&gt;"iam_role_test"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"iam-role-test"&lt;/span&gt;
    &lt;span class="nx"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k8s_service_account_namespace&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;spec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;service_account_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k8s_service_account_name&lt;/span&gt;
    &lt;span class="nx"&gt;container&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"iam-role-test"&lt;/span&gt;
      &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"amazon/aws-cli:latest"&lt;/span&gt;
      &lt;span class="c1"&gt;# Sleep so that the container stays alive&lt;/span&gt;
      &lt;span class="c1"&gt;# #continuous-sleeping&lt;/span&gt;
      &lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/bin/bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"--"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;args&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"while true; do sleep 5; done;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Vanilla Kubernetes
&lt;/h4&gt;

&lt;p&gt;And now the same as above with vanilla Kubernetes YAML.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iam-role-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# TODO: replace ACCOUNT_ID with your account id&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::&amp;lt;ACCOUNT_ID&amp;gt;:role/iam-role-test&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iam-role-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iam-role-test&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iam-role-test&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amazon/aws-cli:latest&lt;/span&gt;
      &lt;span class="c1"&gt;# Sleep so that the container stays alive&lt;/span&gt;
      &lt;span class="c1"&gt;# #continuous-sleeping&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/bash"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;while&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;true;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;do&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sleep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;done;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
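
&lt;p&gt;A minimal sketch of applying the manifest, assuming it was saved as &lt;code&gt;iam-role-test.yaml&lt;/code&gt; (the file name and the timeout are only illustrative choices):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create the ServiceAccount and the Pod from the manifest above
# (iam-role-test.yaml is an assumed file name)
kubectl apply -f iam-role-test.yaml

# Block until the Pod reports Ready, or give up after two minutes
kubectl wait --for=condition=Ready pod/iam-role-test --timeout=120s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;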



&lt;h2&gt;
  
  
  Verify the setup
&lt;/h2&gt;

&lt;p&gt;We can describe the pod (i.e. &lt;code&gt;kubectl describe pod iam-role-test&lt;/code&gt;) and check the volumes, mounts and environment variables attached to the pod, but seeing as we launched a pod with the AWS CLI, let's just get in there and check! Exec into the running container and execute the &lt;code&gt;aws&lt;/code&gt; CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Exec into the running pod&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; iam-role-test &lt;span class="nt"&gt;--&lt;/span&gt; /bin/bash

&lt;span class="c"&gt;# Check the AWS Security Token Service identity&lt;/span&gt;
bash-4.2# aws sts get-caller-identity
&lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"UserId"&lt;/span&gt;: &lt;span class="s2"&gt;"AROA46FON4H773JH4MPJD:botocore-session-1637837863"&lt;/span&gt;,
    &lt;span class="s2"&gt;"Account"&lt;/span&gt;: &lt;span class="s2"&gt;"123456789101"&lt;/span&gt;,
    &lt;span class="s2"&gt;"Arn"&lt;/span&gt;: &lt;span class="s2"&gt;"arn:aws:sts::123456789101:assumed-role/iam-role-test/botocore-session-1637837863"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Check the AWS environment variables&lt;/span&gt;
bash-4.2# &lt;span class="nb"&gt;env&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"AWS_"&lt;/span&gt;
&lt;span class="nv"&gt;AWS_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;arn:aws:iam::&amp;lt;ACCOUNT_ID&amp;gt;:role/iam-role-test
&lt;span class="nv"&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/var/run/secrets/eks.amazonaws.com/serviceaccount/token
&lt;span class="nv"&gt;AWS_DEFAULT_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eu-west-1
&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the AWS Security Token Service (STS) confirms that we have successfully assumed the role we wanted to! And if we check our environment variables we can see that these were injected when the pod started. The file that &lt;code&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/code&gt; points to is the sensitive part: it holds the service account token and is mounted into the container at startup.&lt;/p&gt;
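
&lt;p&gt;To make the role of those two variables a bit more concrete, here is a rough sketch of doing the token exchange by hand from inside the pod. The AWS CLI normally does this for us; the session name &lt;code&gt;manual-check&lt;/code&gt; is just a made-up value, and the &lt;code&gt;--query&lt;/code&gt; filter is there so that the temporary credentials are not printed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run inside the pod: exchange the mounted service account token for
# temporary credentials, and print only the ARN of the assumed role.
# "manual-check" is an arbitrary session name chosen for this example.
aws sts assume-role-with-web-identity \
  --role-arn "$AWS_ROLE_ARN" \
  --role-session-name manual-check \
  --web-identity-token "file://$AWS_WEB_IDENTITY_TOKEN_FILE" \
  --query 'AssumedRoleUser.Arn' \
  --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;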

&lt;h2&gt;
  
  
  Alternative Approaches
&lt;/h2&gt;

&lt;p&gt;If we remove the service account from the pod and use the default service account (which exists per namespace), we can see who AWS STS thinks we are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Exec into the running pod&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; iam-role-test &lt;span class="nt"&gt;--&lt;/span&gt; /bin/bash

&lt;span class="c"&gt;# Check the AWS Security Token Service identity&lt;/span&gt;
bash-4.2# aws sts get-caller-identity
&lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"UserId"&lt;/span&gt;: &lt;span class="s2"&gt;"AROA46FON4H72Q3SPL6SC:i-0d0aff479cf2e2405"&lt;/span&gt;,
    &lt;span class="s2"&gt;"Account"&lt;/span&gt;: &lt;span class="s2"&gt;"123456789101"&lt;/span&gt;,
    &lt;span class="s2"&gt;"Arn"&lt;/span&gt;: &lt;span class="s2"&gt;"arn:aws:sts::&amp;lt;ACCOUNT_ID&amp;gt;:assumed-role/&amp;lt;cluster-name&amp;gt;XXXXXXXXX/i-&amp;lt;node-instance-id&amp;gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Check the AWS environment variables&lt;/span&gt;
bash-4.2# &lt;span class="nb"&gt;env&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"AWS_"&lt;/span&gt;
&lt;span class="c"&gt;# ... it's empty!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because we created the cluster with the EKS Terraform module, the module has already created a role for our autoscaling node group. What we could do is allow this role to assume another role which grants us the access that we might need.&lt;/p&gt;

&lt;p&gt;However, I find managing Service Accounts in Kubernetes much easier than assigning roles to node groups, and based on searches online it seems to be the recommended approach. I still thought the alternative was worth mentioning here.&lt;/p&gt;
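
&lt;p&gt;For completeness, a sketch of what the node-role alternative might look like from inside a pod that uses the default service account. It assumes that a second role exists (the &lt;code&gt;&amp;lt;other-role&amp;gt;&lt;/code&gt; placeholder is hypothetical), that its trust policy allows the node role to assume it, and that the node role is allowed to call &lt;code&gt;sts:AssumeRole&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run inside a pod without the annotated service account, so that the AWS CLI
# falls back to the node's instance role.
# &amp;lt;ACCOUNT_ID&amp;gt; and &amp;lt;other-role&amp;gt; are placeholders; node-role-test is an
# arbitrary session name.
aws sts assume-role \
  --role-arn "arn:aws:iam::&amp;lt;ACCOUNT_ID&amp;gt;:role/&amp;lt;other-role&amp;gt;" \
  --role-session-name node-role-test \
  --query 'AssumedRoleUser.Arn' \
  --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;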

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Now that you can run a Pod with a service account that can assume an IAM role, just give that IAM role the permissions it needs to do what you want.&lt;/p&gt;

&lt;p&gt;In our case, we needed the role to access another EKS cluster, so the role does not need &lt;em&gt;any&lt;/em&gt; &lt;em&gt;more policies&lt;/em&gt; in AWS, but it does need to be added to the &lt;code&gt;aws-auth&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html" rel="noopener noreferrer"&gt;ConfigMap that controls the RBAC&lt;/a&gt; of the target cluster. But let's not delve into that here; there are already plenty of posts about it :)&lt;/p&gt;

&lt;p&gt;To add a cluster in ArgoCD you can either use the &lt;code&gt;argocd&lt;/code&gt; CLI, or do it with Kubernetes Secrets. Of course we did it with Kubernetes Secrets, and of course we did that with Terraform right after creating the cluster!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Get the target cluster details to use in our secret&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_eks_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"target"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;cluster_name&amp;gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# Create a secret that represents a new cluster in ArgoCD.&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# ArgoCD will use the provided config to connect and configure the target cluster&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"kubernetes_secret"&lt;/span&gt; &lt;span class="s2"&gt;"cluster"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"argocd-cluster-name"&lt;/span&gt;
    &lt;span class="nx"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"argocd"&lt;/span&gt;
    &lt;span class="nx"&gt;labels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# Tell ArgoCD that this secret defines a new cluster&lt;/span&gt;
      &lt;span class="s2"&gt;"argocd.argoproj.io/secret-type"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cluster"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# Just a display name&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpoint&lt;/span&gt;
    &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;awsAuthConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;clusterName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="c1"&gt;# NOTE: roleARN not needed as ArgoCD will already assume the role that&lt;/span&gt;
        &lt;span class="c1"&gt;# has access to the target cluster (added to aws-auth ConfigMap)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;tlsClientConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;insecure&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="nx"&gt;caData&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_eks_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificate_authority&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Opaque"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
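
&lt;p&gt;A quick way to check the result, sketched under the assumption that ArgoCD runs in the &lt;code&gt;argocd&lt;/code&gt; namespace (as in the secret above) and that the &lt;code&gt;argocd&lt;/code&gt; CLI is already logged in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the secrets that ArgoCD treats as cluster definitions
kubectl -n argocd get secrets -l argocd.argoproj.io/secret-type=cluster

# Ask ArgoCD which clusters it knows about (requires a logged-in argocd CLI)
argocd cluster list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;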



&lt;p&gt;And boom! We can now create EKS clusters with Terraform and register them with ArgoCD using our Service Account that can assume an AWS IAM Role that is added to the target cluster RBAC... Sometimes it's confusing just to write this stuff, but happy Terraform, Kubernetes and AWS'ing (and GitOps'ing with ArgoCD perhaps)!&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful Links
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;AWS EKS IAM Roles for Service Accounts (IRSA): &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Workshop on IAM Roles for Service Accounts (IRSA): &lt;a href="https://www.eksworkshop.com/beginner/110_irsa/preparation/" rel="noopener noreferrer"&gt;https://www.eksworkshop.com/beginner/110_irsa/preparation/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create AWS IAM OIDC Provider for EKS: &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Managing users or IAM roles in EKS: &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS EKS Terraform Module: &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest" rel="noopener noreferrer"&gt;https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ArgoCD: &lt;a href="https://argo-cd.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;https://argo-cd.readthedocs.io/en/stable/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Related GitHub issue on ArgoCD: &lt;a href="https://github.com/argoproj/argo-cd/issues/2347" rel="noopener noreferrer"&gt;https://github.com/argoproj/argo-cd/issues/2347&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>terraform</category>
    </item>
    <item>
      <title>How to Debug Failing Build Agent Pods in Kubernetes-enabled Jenkins</title>
      <dc:creator>Verifa crew</dc:creator>
      <pubDate>Fri, 03 May 2024 08:59:31 +0000</pubDate>
      <link>https://forem.com/verifa/how-to-debug-failing-build-agent-pods-in-kubernetes-enabled-jenkins-2eae</link>
      <guid>https://forem.com/verifa/how-to-debug-failing-build-agent-pods-in-kubernetes-enabled-jenkins-2eae</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally posted on &lt;a href="https://verifa.io/blog/how-to-debug-failing-build-agent-pods-in-kubernetes-enabled-jenkins/"&gt;Verifa's blog&lt;/a&gt;, written by Andreas Lärfors.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running Jenkins in a Kubernetes cluster is a great way to enable auto-scaling of the infrastructure hosting your Jenkins Build Agents. However, when the Build Agent Pods fail to start correctly, it can be difficult to troubleshoot. In this short article we look at some simple things you can do to figure out what's going wrong.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our Jenkins system is deployed on Kubernetes, with both Jenkins Master and the Jenkins Build Agents running as Pods. We recently had an issue where Build Agent Pods were not starting correctly. Because of our auto-scaling setup, this led to hundreds or thousands of Build Agent Pods being created in the cluster, and the infrastructure (Nodes) scaling up accordingly. This was first discovered when we reached our Cost Cap limit on our EKS cluster. Cleaning up required the deletion of around 7,000 Pods.&lt;/p&gt;

&lt;p&gt;It was clear that we had to identify the cause of this issue. The cause itself is not that interesting, but the method of debugging is, and it might help you if you are facing similar issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 0: Inspect your Jenkins Build Logs
&lt;/h3&gt;

&lt;p&gt;You've probably already done this, but you should of course start by looking at the build logs in Jenkins. Here's what ours were saying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;15:58:40 Started by user andreasverifa

15:58:40 Running in Durability level: MAX_SURVIVABILITY

15:58:40 [Pipeline] Start of Pipeline

15:58:41 [Pipeline] podTemplate

15:58:41 [Pipeline] {

15:58:41 [Pipeline] node

15:58:47 Created Pod: kubernetes staging/example-projects-andreas-test-1-x34f1-qzgqg-3cggn

15:58:56 Still waiting to schedule task

15:58:56 All nodes of label 'Example-Projects_andreas-test_1-x34f1' are offline

15:58:57 Created Pod: kubernetes staging/example-projects-andreas-test-1-x34f1-qzgqg-1yvt8

15:59:07 Created Pod: kubernetes staging/example-projects-andreas-test-1-x34f1-qzgqg-l3t8l

15:59:17 Created Pod: kubernetes staging/example-projects-andreas-test-1-x34f1-qzgqg-w784y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oh dear, a new Pod created every 10 seconds. It's easy to see why our cluster was filling up with failed Pods.&lt;/p&gt;
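
&lt;p&gt;To get a feel for how bad things are, something along these lines can help. It is only a sketch: it assumes the Build Agent Pods live in the &lt;code&gt;staging&lt;/code&gt; namespace (as the log lines above suggest) and that the STATUS column is the third column of the default &lt;code&gt;kubectl get pods&lt;/code&gt; output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Count the Pods in the namespace, grouped by their STATUS column
kubectl get pods -n staging --no-headers | awk '{print $3}' | sort | uniq -c

# List the Pods ordered by creation time, newest last
kubectl get pods -n staging --sort-by=.metadata.creationTimestamp

# Delete Pods whose phase is Failed. Pods that still have a running
# container are not in the Failed phase and will not match this selector.
kubectl delete pods -n staging --field-selector=status.phase=Failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;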

&lt;h3&gt;
  
  
  Step 1: Inspect your Pods
&lt;/h3&gt;

&lt;p&gt;So why are the Pods failing to start? Let's (kubectl) describe them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl describe pod example-projects-andreas-test-...

Containers:
  jnlp:
    Container ID:   ...
    Image:          jenkins/inbound-agent:4.3-4-jdk11
    Image ID:       ...
    Port:           &amp;lt;none&amp;gt;
    Host Port:      &amp;lt;none&amp;gt;
    State:          Failed
      Started:      ...
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  256Mi
    Environment:
      ...
    Mounts:
      ...
  build:
    Container ID:   ...
    Image:          ...
    Image ID:       ...
    Port:           &amp;lt;none&amp;gt;
    Host Port:      &amp;lt;none&amp;gt;
    State:          Running
      Started:      ...
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        600m
      memory:     768Mi
    Environment:  &amp;lt;none&amp;gt;
    Mounts:
      ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our Pipeline defines a Pod that contains two Containers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"jnlp" - the container which runs the Jenkins Agent&lt;/li&gt;
&lt;li&gt;"build" - the container in which we build our software and run the pipeline steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the description above we can see that the jnlp container has Failed, so let's get the logs and investigate why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl logs example-projects-andreas-test-... jnlp

INFO: Protocol JNLP4-connect encountered an unexpected exception
java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name:
...
Jun 18, 2021 12:04:41 PM hudson.remoting.jnlp.Main&lt;span class="nv"&gt;$CuiListener&lt;/span&gt; error
SEVERE: The server rejected the connection: None of the protocols were accepted
java.lang.Exception: The server rejected the connection: None of the protocols were accepted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From these logs we can see that the connection from the jnlp Agent Container to the Jenkins Master failed. And the reason for the failure is that the server (Master) rejected the connection.&lt;/p&gt;
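
&lt;p&gt;If the describe output is too noisy, a narrower view of the same information can be pulled out as sketched below. The &lt;code&gt;staging&lt;/code&gt; namespace is taken from the build log, and &lt;code&gt;&amp;lt;pod-name&amp;gt;&lt;/code&gt; is a placeholder for one of the Build Agent Pods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Placeholder: set this to the name of one of the Build Agent Pods
POD="&amp;lt;pod-name&amp;gt;"

# Print name, readiness and restart count for each container in the Pod
kubectl get pod "$POD" -n staging \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'

# Show the events recorded for this Pod (scheduling, image pulls, failures)
kubectl get events -n staging --field-selector involvedObject.name="$POD"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;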

&lt;p&gt;The next step in this process is in hindsight unnecessary, but it documents our troubleshooting process and might be useful for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Manually Override Pods (Optional)
&lt;/h3&gt;

&lt;p&gt;We've established that the jnlp Container fails to connect to the Jenkins Master. One of the issues with containerized systems is that we often cannot debug what is going wrong on startup without modifying the container. That's what we're going to do now.&lt;/p&gt;

&lt;p&gt;Our Jenkins build containers are defined as inline-YAML in our pipeline scripts. Let's take an example pipeline and make the jnlp container sleep for 10000 seconds instead of completing the normal startup routine (where the connection would usually fail):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight groovy"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;kubernetes&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;yaml&lt;/span&gt; &lt;span class="s1"&gt;'''
apiVersion: v1
kind: Pod
metadata:
  name: andreas-test
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:4.3-4-jdk11
    command: ["sleep", "10000"]
'''&lt;/span&gt;

        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;stages&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Say Hello World'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"jnlp"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Hello World!"&lt;/span&gt;
                &lt;span class="o"&gt;}&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On startup, the jnlp container does not register with the Jenkins Master (as we've overridden the default startup/entrypoint with the &lt;strong&gt;command&lt;/strong&gt; value). So the Jenkins Job finds no available Nodes and continues spawning Pods... So let's cancel the Jenkins Job.&lt;/p&gt;

&lt;p&gt;But we are now left with at least one Pod which is in state "Running" instead of "Failed". This allows us to debug the connection process.&lt;br&gt;
Let's exec into the running container and try running the entrypoint. We can determine the entrypoint by looking at the Dockerfile for this image, for example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ENTRYPOINT ["/usr/local/bin/jenkins-agent"]&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; example-projects-andreas-test-... &lt;span class="nt"&gt;--&lt;/span&gt; /bin/sh

&lt;span class="nv"&gt;$ &lt;/span&gt;sh &lt;span class="nt"&gt;-x&lt;/span&gt; /usr/local/bin/jenkins-agent
...
&lt;span class="nb"&gt;exec&lt;/span&gt; /usr/local/openjdk-11/bin/java &lt;span class="nt"&gt;-cp&lt;/span&gt; /usr/share/jenkins/agent.jar hudson.remoting.jnlp.Main &lt;span class="nt"&gt;-headless&lt;/span&gt; &lt;span class="nt"&gt;-tunnel&lt;/span&gt; jenkins-agent:50000 &lt;span class="nt"&gt;-url&lt;/span&gt; http://jenkins:8080/ &lt;span class="nt"&gt;-workDir&lt;/span&gt; /home/jenkins/agent ... example-projects-andreas-test-...
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ok, there's our connect command! We can now run this manually to test the connection.&lt;/p&gt;

&lt;p&gt;We have seen that the server is refusing the connection because of "Unknown client name", which suggests the server is not expecting the client (jnlp Agent) to connect.&lt;/p&gt;

&lt;p&gt;We manually added a Node in the Jenkins GUI with the name of our Pod. Once this was done, running the entrypoint/Java connect command succeeded in connecting the jnlp Agent to the Jenkins Master!&lt;/p&gt;

&lt;p&gt;So we have now deduced that the Agent is failing to connect to the Master because the Master is not expecting the Agent. This suggests that the Kubernetes plugin is failing to "register" the Pod with the Master.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Inspect Kubernetes Plugin Logs
&lt;/h3&gt;

&lt;p&gt;We probably should have jumped straight from Step 1 to Step 3. However, Step 2 helped confirm that nothing external is stopping the Agent from connecting to the Master. The issue is caused by conditions inside Jenkins itself.&lt;/p&gt;

&lt;p&gt;Suspecting the Kubernetes plugin, we want to see the logs of the Kubernetes plugin. The Jenkins plugin page for the Kubernetes plugin contains the following information:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For more detail, configure a new Jenkins log recorder for &lt;strong&gt;org.csanchez.jenkins.plugins.kubernetes&lt;/strong&gt; at &lt;strong&gt;ALL&lt;/strong&gt; level.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once done, we could see the following output in the new log recorder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Created Pod: kubernetes staging/example-projects-andreas-test-...
... WARNING org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
Error in provisioning; agent=KubernetesSlave name: example-projects-andreas-test-...
java.lang.NoSuchMethodError: 'java.lang.Object io.fabric8.kubernetes.client.dsl.PodResource.watch(java.lang.Object)'
  at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:170)
  at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:294)
  at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
  at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:829)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aha! "&lt;strong&gt;NoSuchMethodError&lt;/strong&gt;" is typical of version incompatibilities between plugin dependencies!&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Skipping over some details, we concluded that the error was caused by a version conflict: the &lt;a href="https://plugins.jenkins.io/kubernetes"&gt;"kubernetes" plugin&lt;/a&gt; has a dependency on the &lt;a href="https://plugins.jenkins.io/kubernetes-client-api/"&gt;"kubernetes-client-api" plugin&lt;/a&gt;. We are using fixed version numbers for our plugins, and our kubernetes plugin version was fixed at 1.29.2. Looking at the pom.xml in the source for this version, we could see that it declares a dependency on &lt;strong&gt;version 4.13.2-1&lt;/strong&gt; of the kubernetes-client-api plugin.&lt;/p&gt;

&lt;p&gt;Our Jenkins image had recently been updated and this seemed to have updated all the non-fixed versions of plugins, i.e. the transitive dependencies. Upon inspection, we were running &lt;strong&gt;version 5.4.1 of the kubernetes-client-api&lt;/strong&gt; plugin!&lt;/p&gt;
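
&lt;p&gt;One way to do such an inspection is with the Jenkins CLI, roughly as sketched here. The Jenkins URL is the one seen in the agent's connect command earlier; the user and API token are placeholders, and &lt;code&gt;jenkins-cli.jar&lt;/code&gt; is assumed to have been downloaded from the Jenkins instance already:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# jenkins-cli.jar can be downloaded from &amp;lt;jenkins-url&amp;gt;/jnlpJars/jenkins-cli.jar
# &amp;lt;user&amp;gt; and &amp;lt;api-token&amp;gt; are placeholders for real credentials.
# List the installed plugins with their versions and keep the kubernetes ones.
java -jar jenkins-cli.jar -s http://jenkins:8080/ -auth "&amp;lt;user&amp;gt;:&amp;lt;api-token&amp;gt;" list-plugins | grep kubernetes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;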

&lt;p&gt;The quick solution was to upgrade the kubernetes plugin to the latest version, 1.30.0. This immediately resolved the issue and jobs began running as normal again.&lt;/p&gt;

&lt;p&gt;The long-term solution to this problem, to prevent it from occurring again, is to build our own tagged &amp;amp; versioned Jenkins image. Each change then means incrementing the tag/version, and so we can more easily roll back changes. This is, in other words, a "complete" Jenkins image, which is bundled with the specific plugin versions we want to use.&lt;/p&gt;
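
&lt;p&gt;As a rough sketch of that idea, the plugin versions could be pinned in a file and baked into the image with the &lt;code&gt;jenkins-plugin-cli&lt;/code&gt; tool that ships with the official Jenkins image. The base image tag, the image name &lt;code&gt;my-registry/jenkins:1.0.0&lt;/code&gt; and the file names are assumptions for illustration, and a real plugin list would contain every plugin in use; the two versions shown are the ones we ended up running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Pin every plugin explicitly, including the transitive dependencies
cat &amp;gt; plugins.txt &amp;lt;&amp;lt;'EOF'
kubernetes:1.30.0
kubernetes-client-api:5.4.1
EOF

# Bake the pinned plugins into our own image.
# The base tag is an assumption; pin an exact Jenkins version in practice.
cat &amp;gt; Dockerfile &amp;lt;&amp;lt;'EOF'
FROM jenkins/jenkins:lts-jdk11
COPY plugins.txt /usr/share/jenkins/ref/plugins.txt
RUN jenkins-plugin-cli --plugin-file /usr/share/jenkins/ref/plugins.txt
EOF

# Tag each build with a new version so that a rollback is just a tag change
docker build -t my-registry/jenkins:1.0.0 .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;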

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>containers</category>
      <category>jenkins</category>
    </item>
  </channel>
</rss>
