<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Derek Berger</title>
    <description>The latest articles on Forem by Derek Berger (@derekberger).</description>
    <link>https://forem.com/derekberger</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1198543%2Fc921e844-6606-447c-b26b-0bba2391fe1c.jpeg</url>
      <title>Forem: Derek Berger</title>
      <link>https://forem.com/derekberger</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/derekberger"/>
    <language>en</language>
    <item>
      <title>Swapping out microservices gracefully with the help of AWS</title>
      <dc:creator>Derek Berger</dc:creator>
      <pubDate>Mon, 07 Oct 2024 15:10:58 +0000</pubDate>
      <link>https://forem.com/derekberger/swapping-out-microservices-gracefully-with-the-help-of-aws-4n7d</link>
      <guid>https://forem.com/derekberger/swapping-out-microservices-gracefully-with-the-help-of-aws-4n7d</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html" rel="noopener noreferrer"&gt;AWS load balancer controller&lt;/a&gt; is a key enabler for running services in &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html" rel="noopener noreferrer"&gt;Amazon EKS&lt;/a&gt;, using AWS APIs to provision load balancer resources.  But this controller can help with more than just everyday management of load balancers.  For instance, it greatly simplified how my team released APIs during a major project to rewrite our services in a new language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Our application follows a &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.6/how-it-works/" rel="noopener noreferrer"&gt;common pattern&lt;/a&gt; for running microservices in EKS.  Outside requests come into our clusters through application load balancers (ALBs).  The ALBs’ &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html" rel="noopener noreferrer"&gt;target groups&lt;/a&gt; forward requests according to &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-listeners.html#path-conditions" rel="noopener noreferrer"&gt;path-based rules&lt;/a&gt; that correspond to services’ endpoints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1g0gxjh0nnuqpccxooo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1g0gxjh0nnuqpccxooo.png" alt="Image description" width="502" height="336"&gt;&lt;/a&gt;ALBs fronting EKS services&lt;/p&gt;

&lt;p&gt;The load balancer controller manages our ALBs based on Ingress resources defined in services’ Helm manifests.  We keep these manifests in our version control system, and deploy them through pull requests. &lt;/p&gt;

&lt;p&gt;Here’s an abbreviated example from one of our services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt; &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;alb&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;alb.ingress.kubernetes.io/group.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;login&lt;/span&gt;
    &lt;span class="na"&gt;alb.ingress.kubernetes.io/subnets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;subnet-a,subnet-b,subnet-c'&lt;/span&gt;
&lt;span class="na"&gt;alb.ingress.kubernetes.io/healthcheck-path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/help-i-am-alive'&lt;/span&gt;
    &lt;span class="na"&gt;alb.ingress.kubernetes.io/success-codes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;200,404'&lt;/span&gt;
&lt;span class="na"&gt;alb.ingress.kubernetes.io/target-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ip'&lt;/span&gt;
&lt;span class="na"&gt;alb.ingress.kubernetes.io/backend-protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HTTPS'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an Ingress is deployed, the controller provisions the ALB, applies the path-based rules, and creates the target group that points to the service’s pods. It handles additional behaviors for  &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/guide/ingress/annotations/#certificate-arn" rel="noopener noreferrer"&gt;certificates&lt;/a&gt;, &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/guide/ingress/annotations/#load-balancer-attributes" rel="noopener noreferrer"&gt;ELB access logs&lt;/a&gt;, &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.5/guide/ingress/annotations/#healthcheck-path" rel="noopener noreferrer"&gt;health checks&lt;/a&gt; and more through annotations.  If we deploy updates to an ingress, the controller keeps the ALB in sync with its definition.&lt;/p&gt;
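
&lt;p&gt;For orientation, the Helm values above ultimately render to a standard Kubernetes Ingress resource.  Here is a minimal sketch of what the controller consumes; the resource name, backend Service name, and port are illustrative assumptions, not our actual manifest:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: login                        # hypothetical name
  annotations:
    alb.ingress.kubernetes.io/group.name: login
    alb.ingress.kubernetes.io/target-type: 'ip'
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /api/login/
            pathType: Prefix
            backend:
              service:
                name: login-service  # assumed backend Service
                port:
                  number: 443
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;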

&lt;h2&gt;
  
  
  In with the new (but not quite out with the old)
&lt;/h2&gt;

&lt;p&gt;During the project to rewrite our microservices, we continued to define service and ingress resources in Helm manifests.  A new challenge was running old and new services side by side while we incrementally rewrote and released individual APIs.  We wanted requests for rewritten APIs forwarded to the new service, while requests for all other APIs continued to flow to its older counterpart.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/ingress/annotations/#ingressgroup" rel="noopener noreferrer"&gt;Ingress Group&lt;/a&gt; feature made this possible in part by consolidating old and new Ingress resources under the same ALB with the original &lt;code&gt;group.name&lt;/code&gt; annotation.  When the team released an API, we just added a &lt;code&gt;pathType: Exact&lt;/code&gt; rule for that endpoint and deployed its ingress.&lt;/p&gt;

&lt;p&gt;Here is an excerpt from a new service’s ingress, with some &lt;code&gt;pathType: Exact&lt;/code&gt; path-based rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;alb&lt;/span&gt;
 &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;alb.ingress.kubernetes.io/group.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;login&lt;/span&gt;
&lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/api/login/path1'&lt;/span&gt;
   &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exact&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/api/login/path2'&lt;/span&gt;
   &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exact&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here again is the original service’s ingress, which has a single &lt;code&gt;pathType: Prefix&lt;/code&gt; rule, catching anything that does not match the new service’s path-based rules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;alb&lt;/span&gt;
 &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;alb.ingress.kubernetes.io/group.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;login&lt;/span&gt;
&lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/api/login/'&lt;/span&gt;
   &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because we defined both ingresses with &lt;code&gt;alb.ingress.kubernetes.io/group.name: login&lt;/code&gt;, the controller would apply both sets of rules to the original ALB, letting the new service steal requests from the original service, or so we hoped.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not so fast
&lt;/h2&gt;

&lt;p&gt;The problem was that the &lt;code&gt;pathType: Prefix&lt;/code&gt; rule would match &lt;em&gt;every&lt;/em&gt; request to &lt;code&gt;/api/login/&lt;/code&gt;, including &lt;code&gt;/api/login/path1&lt;/code&gt; and &lt;code&gt;/api/login/path2&lt;/code&gt;. We had no guarantee that requests for those would be forwarded to the new service.&lt;/p&gt;

&lt;p&gt;To solve this, we could have just replaced the &lt;code&gt;Prefix&lt;/code&gt; path with &lt;code&gt;Exact&lt;/code&gt; paths for all the APIs we still wanted forwarded to the old service.  That would have spared us from creating a new ALB, but it would have added complexity and friction to our releases, requiring changes to two ingresses with every release.&lt;/p&gt;
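
&lt;p&gt;That rejected approach would have meant enumerating every remaining endpoint on the old service’s ingress, along these lines (a sketch; the path names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;paths:
  - path: '/api/login/path3'   # every API still served by the old service,
    pathType: Exact            # updated on every release
  - path: '/api/login/path4'
    pathType: Exact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;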

&lt;h2&gt;
  
  
  Help from AWS
&lt;/h2&gt;

&lt;p&gt;We found a more elegant solution in a subtle but powerful controller feature called &lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/ingress/annotations/#group.order" rel="noopener noreferrer"&gt;group.order&lt;/a&gt;.  By assigning a smaller order number to the new service’s ingress, we ensured the controller would match its path rules first.&lt;/p&gt;

&lt;p&gt;Here's the new service's Ingress again, now with &lt;code&gt;alb.ingress.kubernetes.io/group.order&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;alb&lt;/span&gt;
 &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;alb.ingress.kubernetes.io/group.name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;login&lt;/span&gt;
   &lt;span class="na"&gt;alb.ingress.kubernetes.io/group.order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/api/login/rewritten-path1'&lt;/span&gt;
   &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exact&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/api/login/rewritten/path2'&lt;/span&gt;
   &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exact&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, we could set a higher &lt;code&gt;group.order&lt;/code&gt; value for the original Ingress and leave it alone until every endpoint had been rewritten. Then we simply replaced all the &lt;code&gt;pathType: Exact&lt;/code&gt; rules in the new service’s manifest with a single &lt;code&gt;pathType: Prefix&lt;/code&gt; rule and deleted the old service.  The same approach worked for all of our services with Ingress resources.&lt;/p&gt;
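
&lt;p&gt;For completeness, here is a sketch of the new service’s end state, which mirrors the original service’s single catch-all rule:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;ingress:
  ingressClassName: alb
  enabled: true
  annotations:
    alb.ingress.kubernetes.io/group.name: login
  paths:
    - path: '/api/login/'
      pathType: Prefix
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;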

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The AWS load balancer controller's &lt;code&gt;group.order&lt;/code&gt; feature has made it trivial for my team to release new APIs. The experience reminds me that maintaining infrastructure as code provides benefits beyond everyday management of infrastructure. Features like &lt;code&gt;group.order&lt;/code&gt; allow engineers to spend more time on features and less time managing the infrastructure that they run on.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cloudskills</category>
      <category>aws</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Taking Your Releases Into Overdrive with GitHub Actions</title>
      <dc:creator>Derek Berger</dc:creator>
      <pubDate>Wed, 14 Aug 2024 20:04:36 +0000</pubDate>
      <link>https://forem.com/devsatasurion/taking-your-releases-into-overdrive-with-github-actions-3edj</link>
      <guid>https://forem.com/devsatasurion/taking-your-releases-into-overdrive-with-github-actions-3edj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;GitHub Actions’ seamless integration with version control simplifies creating and executing operations and infrastructure workflows. Two key features of Actions for building efficient workflows are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/en/actions/creating-actions/creating-a-composite-action" rel="noopener noreferrer"&gt;Composite actions&lt;/a&gt;.  Composite actions let you create combinations of steps that you can reuse across different kinds of workflows.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/defining-outputs-for-jobs" rel="noopener noreferrer"&gt;Job outputs&lt;/a&gt;.  Outputs make values derived from one job's steps available to downstream jobs' steps.&lt;/li&gt;
&lt;/ul&gt;
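
&lt;p&gt;As a rough illustration of the first feature, a composite action lives in its own action.yml, declaring inputs and a sequence of steps.  The sketch below is hypothetical, not our actual action; the input names and the script it runs are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# .github/actions/dns-change/action.yml (hypothetical sketch)
name: dns-change
description: Reroute or restore DNS traffic for a region
inputs:
  dns:
    required: true
  region:
    required: true
  action:
    required: true    # e.g. stop or restore
runs:
  using: composite
  steps:
    - name: Apply the DNS change
      shell: bash
      run: ./scripts/update-dns.sh "${{ inputs.dns }}" "${{ inputs.region }}" "${{ inputs.action }}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A workflow step can then call it with &lt;code&gt;uses: './.github/actions/dns-change'&lt;/code&gt;.&lt;/p&gt;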

&lt;p&gt;In this article, I’ll share how Actions’ integration with version control, composite actions, and job outputs helped my team advance our use of Actions to automate production deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mostly manual automation
&lt;/h2&gt;

&lt;p&gt;The workflow that &lt;a href="https://dev.to/devsatasurion/optimizing-devops-automation-in-the-aws-cloud-with-github-actions-42o8"&gt;my team built for disaster recovery&lt;/a&gt; uses a composite action that cuts off DNS traffic to the impaired region. It lets us execute failover in just one step, and the same workflow can be used for any operation that requires rerouting production traffic for an extended time.  We trigger it manually, exactly as we would during a failover, specifying which DNS to change and which region to cut off.&lt;/p&gt;

&lt;p&gt;While that can be helpful for major changes like infrastructure upgrades, major changes are not common.  More often we simply deploy application or configuration updates via pull requests, following &lt;a href="https://github.blog/enterprise-software/devops/applying-gitops-principles-to-your-operations" rel="noopener noreferrer"&gt;GitOps practices&lt;/a&gt;.  To keep our uptime as high as possible and avoid disrupting customers, we tried using the failover workflow to reroute DNS during these everyday changes.&lt;/p&gt;

&lt;p&gt;That at least rerouted traffic how we wanted, but it required multiple manual steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trigger failover workflow to change DNS.&lt;/li&gt;
&lt;li&gt;Merge pull request to deploy application change.&lt;/li&gt;
&lt;li&gt;Verify pods roll out.&lt;/li&gt;
&lt;li&gt;Trigger failover workflow to restore DNS.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Since the workflow also required deciding which DNS needed to change, then manually selecting the right DNS and region options, the procedure was really more like five steps. Despite the automation for applying DNS changes and verifying pods, deploying anything became a drudgery.&lt;/p&gt;

&lt;p&gt;Even worse, whenever we wanted to apply the change to multiple clusters, we’d have to repeat every step multiple times. &lt;/p&gt;

&lt;p&gt;This was counterproductive, inefficient, and discouraged us from deploying as frequently as we could have.&lt;/p&gt;

&lt;h2&gt;
  
  
  Eradicating the toil
&lt;/h2&gt;

&lt;p&gt;What we needed was a workflow to deploy everyday changes in one step, not &lt;em&gt;four or five&lt;/em&gt;, so we set out to build a new workflow that uses the same composite actions as the failover workflow, but makes better overall use of GitHub Actions' automation capabilities.&lt;/p&gt;

&lt;p&gt;First, since all production changes are deployed via pull request, instead of &lt;code&gt;workflow_dispatch&lt;/code&gt; we use &lt;a href="https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#pull_request" rel="noopener noreferrer"&gt;pull request triggers&lt;/a&gt; based on branch, path, and event type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Production deployment&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
   &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/region1/releases/namespace1/"&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/region1/releases/namespace2/"&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/region2/releases/namespace1/"&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/region2/releases/namespace2/"&lt;/span&gt;
   &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;closed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We broke the new workflow down into four jobs, which correspond to the old manual steps.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Determine what changed.&lt;/li&gt;
&lt;li&gt;Make DNS change to stop traffic.&lt;/li&gt;
&lt;li&gt;Verify services’ pods roll out successfully.&lt;/li&gt;
&lt;li&gt;Make DNS change to restore traffic.&lt;/li&gt;
&lt;/ol&gt;
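
&lt;p&gt;Sketched as a skeleton, the jobs form a simple dependency chain through &lt;code&gt;needs&lt;/code&gt; (abbreviated; the &lt;code&gt;job4&lt;/code&gt; line is an assumption):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  job1:                          # determine what changed; exposes outputs
  job2:                          # stop traffic
    needs: [ job1 ]
  job3:                          # verify pods roll out
    needs: [ job1, job2 ]
  job4:                          # restore traffic
    needs: [ job1, job2, job3 ]  # assumed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;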

&lt;p&gt;In the first job, we start by ensuring the workflow only proceeds upon &lt;em&gt;merged&lt;/em&gt; pull requests, and not just any pull request closed event, by adding this &lt;code&gt;if&lt;/code&gt; condition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;job1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(github.event_name == 'pull_request') &amp;amp;&amp;amp; (github.event.pull_request.merged == &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unlike the failover workflow, the new workflow must determine the right region and DNS to cut off automatically. The &lt;a href="https://github.com/dorny/paths-filter" rel="noopener noreferrer"&gt;dorny paths-filter action&lt;/a&gt; gave us exactly that.&lt;/p&gt;

&lt;p&gt;We first use it to determine region based on change path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Determine region&lt;/span&gt;
       &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dorny/paths-filter@v3&lt;/span&gt;
       &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;region-filter&lt;/span&gt;
       &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
           &lt;span class="s"&gt;region-1: 'path/to/cluster/account/region1/**'&lt;/span&gt;
           &lt;span class="s"&gt;region-2: 'path/to/cluster/account/region2/**'&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we have a filter for DNS, which is important to get right because the DNS we want to reroute will vary depending on which services change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Determine DNS to change&lt;/span&gt;
       &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dorny/paths-filter@v3&lt;/span&gt;
       &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dns-filter&lt;/span&gt;
       &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
           &lt;span class="s"&gt;dns1: &lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/1'&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/2'&lt;/span&gt;
           &lt;span class="s"&gt;dns2:&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/3'&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/4'&lt;/span&gt;
           &lt;span class="s"&gt;dns3:&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/1'&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/2'&lt;/span&gt;
           &lt;span class="s"&gt;dns4:&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/1'&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/2'&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/5'&lt;/span&gt;
             &lt;span class="s"&gt;- 'path/to/services/6'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final filter determines which pods to validate, also based on which services have changed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Determine services&lt;/span&gt;
       &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dorny/paths-filter@v3&lt;/span&gt;
       &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service-filter&lt;/span&gt;
       &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
           &lt;span class="s"&gt;service-1:&lt;/span&gt;
             &lt;span class="s"&gt;- path/to/services/1&lt;/span&gt;
           &lt;span class="s"&gt;service-2:&lt;/span&gt;
             &lt;span class="s"&gt;- path/to/services/2&lt;/span&gt;
           &lt;span class="s"&gt;service-3:&lt;/span&gt;
             &lt;span class="s"&gt;- path/to/services/3&lt;/span&gt;
           &lt;span class="s"&gt;service-4:&lt;/span&gt;
             &lt;span class="s"&gt;- path/to/services/4 &lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, &lt;a href="https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/defining-outputs-for-jobs" rel="noopener noreferrer"&gt;job outputs&lt;/a&gt; make all these filtered values available to downstream jobs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;dns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.dns-filter.outputs.changes }}&lt;/span&gt;
     &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.service-filter.outputs.changes }}&lt;/span&gt;
     &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.region-filter.outputs.region-1 == 'true' &amp;amp;&amp;amp; 'region-1' || 'region-2' }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second job applies the DNS change, but includes a condition to only proceed if it finds values in the outputs for DNS and region.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt; &lt;span class="na"&gt;job2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;job1&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
   &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ needs.job1.outputs.dns != '[]' &amp;amp;&amp;amp; needs.job1.outputs.dns != '' &amp;amp;&amp;amp; needs.job1.outputs.region != '[]' &amp;amp;&amp;amp; needs.job1.outputs.region != '' }}&lt;/span&gt;
   &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;dns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJSON(needs.job1.outputs.dns) }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It calls the same composite action as the failover procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Stop traffic to ${{ needs.job1.outputs.region }} ${{ matrix.dns }}&lt;/span&gt;
       &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./.github/actions/dns-change'&lt;/span&gt;
       &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;dns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.dns }}&lt;/span&gt;
         &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stop&lt;/span&gt;
         &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ needs.job1.outputs.region }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The third job verifies pods roll out, applying the values from the &lt;code&gt;services&lt;/code&gt; output to a matrix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;job3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;job1&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;job2&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
   &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ needs.job1.outputs.services != '[]' &amp;amp;&amp;amp; needs.job1.outputs.services != '' }}&lt;/span&gt;
   &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Validate pods&lt;/span&gt;
   &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJSON(needs.job1.outputs.services) }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It then calls the composite action that executes the pod validation steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
     &lt;span class="s"&gt;- name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;validate deployments and pods&lt;/span&gt;
       &lt;span class="s"&gt;uses&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./.github/actions/pod-validation'&lt;/span&gt;
       &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;deployment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.service }}&lt;/span&gt;
         &lt;span class="na"&gt;cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ needs.job1.outputs.region == region-1' &amp;amp;&amp;amp; '1' || '2' }}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If every pod rolls out successfully, the workflow proceeds to restore the traffic cut off in job 2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt; &lt;span class="na"&gt;job4&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restore traffic to ${{ needs.determine-changes.outputs.region }} in ${{ matrix.dns }}&lt;/span&gt;
   &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;dns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJSON(needs.job1.outputs.dns) }}&lt;/span&gt;
   &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restore traffic to ${{ needs.job1.outputs.region }} ${{ matrix.dns }}&lt;/span&gt;
       &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./.github/actions/dns-change'&lt;/span&gt;
       &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;dns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.dns }}&lt;/span&gt;
         &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start&lt;/span&gt;
         &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ needs.job1.outputs.region }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any step fails, the workflow fails and traffic remains routed away from the failed cluster while the team investigates.  If necessary, we can open a pull request to revert the change. Merging it will trigger the workflow again, effectively validating the rollback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Success!
&lt;/h2&gt;

&lt;p&gt;By combining the composite actions we created for the failover workflow with path filters and job outputs, we built a new workflow that reduces deployments to a single manual step: merging a pull request.&lt;/p&gt;

&lt;p&gt;The workflow takes over from there, automatically making the proper DNS changes, verifying impacted pods roll out, restoring DNS traffic, and notifying us of results.&lt;/p&gt;

&lt;p&gt;Unburdened by manual steps, our team deployed 32 changes to production in the first month of using the workflow, up from 17 the month before.  The results so far have been promising, and we'll continue looking for ways to make our release practices even better with Actions.&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>automation</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Optimizing DevOps automation in the AWS cloud with GitHub Actions</title>
      <dc:creator>Derek Berger</dc:creator>
      <pubDate>Fri, 09 Feb 2024 13:26:35 +0000</pubDate>
      <link>https://forem.com/devsatasurion/optimizing-devops-automation-in-the-aws-cloud-with-github-actions-42o8</link>
      <guid>https://forem.com/devsatasurion/optimizing-devops-automation-in-the-aws-cloud-with-github-actions-42o8</guid>
      <description>&lt;p&gt;DevOps is not just about automation, but automation &lt;em&gt;is&lt;/em&gt; core to an effective DevOps practice, driving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Greater consistency and repeatability.&lt;/li&gt;
&lt;li&gt;Faster and more efficient workflows.&lt;/li&gt;
&lt;li&gt;Increased traceability and visibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AWS CLI is an essential automation tool for DevOps tasks in AWS cloud environments.  In this article you'll see how it automates DevOps tasks, and how it becomes even more capable when combined with GitHub Actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Power of the AWS CLI
&lt;/h2&gt;

&lt;p&gt;As I’ve written &lt;a href="https://dev.to/devsatasurion/building-a-multi-region-highly-available-identity-provider-with-the-aws-cloud-and-ory-hydra-5c5e"&gt;previously&lt;/a&gt;, my team at Asurion handles disaster recovery/failover with the &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/creating-disaster-recovery-mechanisms-using-amazon-route-53/" rel="noopener noreferrer"&gt;Secondary Takes Over Primary (STOP)&lt;/a&gt; pattern, following a typical STOP implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DNS records are associated with Route 53 health checks, which are associated with specific S3 objects. &lt;/li&gt;
&lt;li&gt;Health check status is controlled by uploading (or deleting) its associated S3 object.&lt;/li&gt;
&lt;li&gt;A failing health check cuts off requests to its DNS target. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The AWS CLI is the obvious tool for executing the failover procedure. The &lt;code&gt;s3api&lt;/code&gt; commands that trigger DNS changes under the STOP failover pattern are &lt;code&gt;put-object&lt;/code&gt; and &lt;code&gt;delete-object&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To upload the object and cut off traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &amp;lt;bucket-name&amp;gt; &lt;span class="nt"&gt;--key&lt;/span&gt; &amp;lt;object-name&amp;gt; &lt;span class="nt"&gt;--body&lt;/span&gt; &amp;lt;file-to-upload&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To reverse the operation and restore DNS traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3api delete-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &amp;lt;your-bucket-name&amp;gt; &lt;span class="nt"&gt;--key&lt;/span&gt; &amp;lt;your-object-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the response from the S3 API is &lt;code&gt;OK&lt;/code&gt;, it’s probably safe to assume that the health check change has been triggered. But Route 53 also has a rich API, which lets us extend the script to verify the health check status has flipped.&lt;/p&gt;

&lt;p&gt;The commands for Route 53 aren’t as simple as those for S3, but the steps are straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Query for the relevant health check ID with the &lt;code&gt;aws route53 list-health-checks&lt;/code&gt; command.&lt;/li&gt;
&lt;li&gt;Get all &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-determining-health-of-endpoints.html#dns-failover-determining-health-of-endpoints-monitor-endpoint" rel="noopener noreferrer"&gt;Route 53 health checkers&lt;/a&gt; for that health check ID with &lt;code&gt;aws route53 get-health-check-status&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Repeat step 2 until every health checker has the expected status, &lt;code&gt;HEALTHY&lt;/code&gt; or &lt;code&gt;UNHEALTHY&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
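
&lt;p&gt;Those three steps can be sketched as a short shell function. The health check's FQDN and the matching on status text beginning with "Success" or "Failure" are assumptions for illustration; Route 53 reports a free-text status per checker rather than a bare HEALTHY/UNHEALTHY value.&lt;/p&gt;

```shell
# Succeed only when every line of stdin contains the expected token.
all_have_status() { ! grep -qv "$1"; }

# Poll Route 53 until every health checker reports the expected status.
# Usage: wait_for_status api.example.com Failure   (names are illustrative)
wait_for_status() {
  local fqdn="$1" expected="$2" hc_id
  # Step 1: look up the health check ID by the FQDN it monitors.
  hc_id=$(aws route53 list-health-checks \
    --query "HealthChecks[?HealthCheckConfig.FullyQualifiedDomainName=='${fqdn}'].Id" \
    --output text)
  # Steps 2-3: re-query until all checkers agree on the expected status.
  until aws route53 get-health-check-status --health-check-id "$hc_id" \
          --query 'HealthCheckObservations[].StatusReport.Status' --output text \
        | tr '\t' '\n' | all_have_status "$expected"; do
    sleep 10
  done
}
```

&lt;p&gt;Calling &lt;code&gt;wait_for_status api.example.com Failure&lt;/code&gt; after the upload would block until every checker agrees that traffic is cut off.&lt;/p&gt;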

&lt;p&gt;This approach works anytime we want to stop traffic to an endpoint, whether for disaster recovery, testing major infrastructure changes without disrupting users, or regular patching cycles.&lt;/p&gt;

&lt;p&gt;Automating these steps in a shell script checked into version control alleviates drudgery and provides consistency and repeatability for our failover procedure.  The procedures become even more efficient and transparent when executed with GitHub Actions.   &lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Actions in action
&lt;/h2&gt;

&lt;p&gt;What makes GitHub Actions unique, compared to other platforms for building, testing, and deploying infrastructure, is its seamless integration with version control. This simplifies triggering jobs to execute tasks, like our scripted failover procedure. &lt;/p&gt;

&lt;p&gt;One very powerful feature of GitHub Actions is its &lt;a href="https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs" rel="noopener noreferrer"&gt;matrix strategy&lt;/a&gt;, which makes it trivial to execute the same procedure with different configurations. In our scenario, multiple health check changes can be triggered and verified in parallel with a single job.&lt;/p&gt;

&lt;p&gt;First, following the &lt;a href="https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch" rel="noopener noreferrer"&gt;example from GitHub's own documentation&lt;/a&gt;, the workflow is defined with inputs and &lt;code&gt;workflow_dispatch&lt;/code&gt;, which let the job be triggered with the GitHub API, CLI, or browser interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS CLI Execution&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;config1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Config&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;boolean'&lt;/span&gt;
        &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="na"&gt;config2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Config&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2'&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;boolean'&lt;/span&gt;
        &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="na"&gt;config3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Config&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3'&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;boolean'&lt;/span&gt;
        &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The workflow has two jobs. The first job creates a matrix of configurations based on the inputs; any combination of the three inputs can be added to the matrix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;create-matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-18.04&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.config.outputs.matrix }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;config&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="s"&gt;echo "matrix=[$(echo '${{ inputs.config1 &amp;amp;&amp;amp; '"config1"' || '' }}&lt;/span&gt;
          &lt;span class="s"&gt;${{ inputs.config2 &amp;amp;&amp;amp; '"config2"' || '' }} &lt;/span&gt;
          &lt;span class="s"&gt;${{ inputs.config3 &amp;amp;&amp;amp; '"config3"' || '' }} &lt;/span&gt;
          &lt;span class="s"&gt;| sed 's/ *$//; s/^ *//; s/  */,/g')]" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second job executes the script with the 1, 2, or 3 configurations added to the matrix in the first job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;execute-script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-18.04&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prod&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create-matrix&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.create-matrix.outputs.config) }}&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Executing script&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Executing script with ${{ matrix.config }}&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./.github/actions/script-name'&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.config }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The AWS CLI is a powerful tool for automating DevOps tasks in the AWS cloud. GitHub Actions provides a platform for executing workflows seamlessly and transparently from version control, all without the overhead of a separate build system. &lt;/p&gt;

&lt;p&gt;Combined, they enable transparent, frequent, incremental, reversible changes in production at scale.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>productivity</category>
      <category>automation</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Building a multi-region highly available identity provider with the AWS cloud and Ory Hydra</title>
      <dc:creator>Derek Berger</dc:creator>
      <pubDate>Tue, 07 Nov 2023 15:35:22 +0000</pubDate>
      <link>https://forem.com/devsatasurion/building-a-multi-region-highly-available-identity-provider-with-the-aws-cloud-and-ory-hydra-5c5e</link>
      <guid>https://forem.com/devsatasurion/building-a-multi-region-highly-available-identity-provider-with-the-aws-cloud-and-ory-hydra-5c5e</guid>
      <description>&lt;p&gt;AsurionID is an OpenID Connect (OIDC) compatible identity provider.  It allows Asurion developers to easily integrate identity and access management into their applications using a standard protocol (OIDC) and open-source libraries.  Our team worked from specific requirements, including custom user experience and low cost, so we decided to build a homegrown solution instead of using an off-the-shelf solution.  We built AsurionID on AWS using open-source Ory Hydra and custom microservices.&lt;/p&gt;

&lt;h2&gt;
  
  
  High availability using multi-AZ in a single region
&lt;/h2&gt;

&lt;p&gt;As shown in the diagram below, AsurionID's initial architecture ran its microservices on Amazon Elastic Kubernetes Service (EKS) across three Availability Zones (AZs) in a single region.  Amazon ElastiCache for Redis, used for storing temporary session data, was deployed across two AZs (primary in one, replica in another).  We used Amazon Aurora multi-AZ features to protect the database against AZ-level failures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmv9o8ac6tlhmzh3oars.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmv9o8ac6tlhmzh3oars.jpg" alt="Multi-AZ high availability" width="800" height="644"&gt;&lt;/a&gt;Multi-AZ high availability&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
This provided AsurionID with up to three nines (99.9%) of availability in a single region.  As more and more applications adopted AsurionID for identity and access management, it became increasingly critical to our business.  We wanted to protect AsurionID against region-level service disruptions, which are less frequent but can be more impactful.  That’s what led us to a multi-region architecture. &lt;/p&gt;

&lt;h2&gt;
  
  
  Designed for protection against regional service disruptions
&lt;/h2&gt;

&lt;p&gt;In our latest architecture, all microservices now run in active-active mode, in two EKS clusters, across two AWS regions. With active-active, both regions' services are always live and taking traffic, and we use Route 53 weighted routing to distribute customer traffic between the two regions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfecub6240h2j2eh4gja.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfecub6240h2j2eh4gja.jpg" alt="Multi-region, active-active microservices" width="800" height="451"&gt;&lt;/a&gt;Multi-region, active-active microservices&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
We leverage Route 53 inverted health checks, following the &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/creating-disaster-recovery-mechanisms-using-amazon-route-53/" rel="noopener noreferrer"&gt;Secondary Takes Over Primary (STOP)&lt;/a&gt; pattern, to handle failover if microservices encounter region-level disruption.&lt;/p&gt;

&lt;p&gt;In our implementation of STOP, we associate the weighted DNS records with the inverted health checks, and those health checks with S3 objects. We invoke health check failure for a particular DNS record by uploading its associated object. The failing health check stops Route 53 from forwarding requests to its associated regional ALB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhecymf47njj83v486z2f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhecymf47njj83v486z2f.jpg" alt="STOP pattern for failing over microservices" width="800" height="516"&gt;&lt;/a&gt;STOP pattern for failing over microservices&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
With this approach, we have achieved &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/static-stability.html" rel="noopener noreferrer"&gt;static stability&lt;/a&gt; and independence from the Route 53 &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/control-planes-and-data-planes.html" rel="noopener noreferrer"&gt;control plane&lt;/a&gt; for failing over our microservices, which has resulted in higher availability for AsurionID microservices, up to four nines (99.99%).&lt;/p&gt;

&lt;p&gt;We have taken a slightly different approach for the caching layer.  Since we cache only ephemeral data like one-time passcodes (OTP), we aren’t replicating this data to the secondary region.  But we have another ElastiCache for Redis cluster always running in the secondary region, and in case our caching layer is impaired by an AWS regional service interruption, we would invoke failover using STOP, just like our microservices. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07pzye0qa6vsgu3mqkk2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07pzye0qa6vsgu3mqkk2.jpg" alt="Multi-region caching architecture" width="800" height="405"&gt;&lt;/a&gt;Multi-region caching architecture&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
This new architecture has helped us achieve static stability and control plane independence for the caching layer as well as the application layer. &lt;/p&gt;

&lt;p&gt;For the database, we are using &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database.html" rel="noopener noreferrer"&gt;Aurora Global database&lt;/a&gt; with a read replica in the secondary region.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t7d314mdtns0mx3tyun.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t7d314mdtns0mx3tyun.jpg" alt="Aurora Global database" width="800" height="383"&gt;&lt;/a&gt;Aurora Global database&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
In case of a region-level Aurora impairment, we would promote the secondary region's cluster to primary.&lt;/p&gt;
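
&lt;p&gt;A minimal sketch of that promotion, assuming detach-and-promote via &lt;code&gt;remove-from-global-cluster&lt;/code&gt; (one approach for an unplanned regional outage; all identifiers below are illustrative):&lt;/p&gt;

```shell
# Detach the secondary cluster from the global database, which promotes it
# to a standalone, writable cluster. All identifiers here are illustrative.
promote_secondary() {
  aws rds remove-from-global-cluster \
    --region us-west-2 \
    --global-cluster-identifier asurionid-global \
    --db-cluster-identifier arn:aws:rds:us-west-2:123456789012:cluster:asurionid-secondary
}
```

&lt;p&gt;Promotion alone doesn't move traffic; the CNAME update described in the next section is still required.&lt;/p&gt;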

&lt;h2&gt;
  
  
  Future Enhancements
&lt;/h2&gt;

&lt;p&gt;We now strive for the same static stability and control-plane independence in the database layer as we have for our microservices and caching layers.  In our current database architecture, the promotion of the read replica triggers a Lambda that updates Route 53 CNAME values (a control plane function) to route all application traffic to the new primary database cluster.  We are looking for new approaches to database failover that use data plane operations.&lt;/p&gt;

&lt;p&gt;One potential option is &lt;a href="https://aws.amazon.com/route53/application-recovery-controller/" rel="noopener noreferrer"&gt;AWS Route 53 Application Recovery Controller (ARC)&lt;/a&gt;.  Route 53 ARC works with Route 53 health checks to enable failover using the data plane, with the extra capability of checking the standby database to ensure it is ready for failover.  ARC can also fail over an entire application stack in one operation, making it expandable to our cache and microservice layers. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we have walked you through how AsurionID started out with a multi-AZ approach to high availability and how we further improved availability with a multi-region architecture. Our architecture protects AsurionID against regional AWS service disruptions, achieves static stability, and uses data plane functions for failing over the microservices and caching layers.&lt;/p&gt;

&lt;p&gt;While the primary goals of our multi-region architecture were improved availability and resiliency, the architecture has provided the team with even more benefits.  We can now perform releases and infrastructure upgrades during business hours without impacting customers by routing traffic to one region while performing tasks in the other.  The ability to perform critical operations during the day has improved the quality of life for the engineers.  Of course, we could have realized these capabilities with a single-region architecture, but for us, they became additional benefits of a multi-region architecture.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.asurion.com/" rel="noopener noreferrer"&gt;Asurion&lt;/a&gt; is a leading tech care company that provides device protection, tech support, repair, and replacement services to 300 million customers worldwide.  It partners with mobile carriers, retailers, and device manufacturers to deliver innovative solutions for smartphones, tablets, computers, and home appliances in over 20 countries worldwide.  Asurion is headquartered in Nashville, TN.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>resiliency</category>
      <category>cloudskills</category>
      <category>highavailability</category>
    </item>
  </channel>
</rss>
