<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sebastian Mincewicz</title>
    <description>The latest articles on Forem by Sebastian Mincewicz (@sebolabs).</description>
    <link>https://forem.com/sebolabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F827973%2F3a4ce389-c3ad-4222-b06b-dd1ca7baafb6.png</url>
      <title>Forem: Sebastian Mincewicz</title>
      <link>https://forem.com/sebolabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sebolabs"/>
    <language>en</language>
    <item>
      <title>Reusing CloudFront, ALB, and API Gateway in a Serverless Platform</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Tue, 17 Mar 2026 22:00:00 +0000</pubDate>
      <link>https://forem.com/aws-builders/reusing-cloudfront-alb-and-api-gateway-in-a-serverless-platform-1boi</link>
      <guid>https://forem.com/aws-builders/reusing-cloudfront-alb-and-api-gateway-in-a-serverless-platform-1boi</guid>
      <description>&lt;p&gt;Modern serverless platforms often need to support a mix of public-facing entry points - web applications and APIs - while keeping internal communication private and tightly controlled. Non-functional requirements around security, compliance, and operational isolation typically drive this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxanvn5a04iewc7tzl6fz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxanvn5a04iewc7tzl6fz.png" alt="Post Banner" width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the same time, such platforms are increasingly built around multiple domain services, each representing a bounded context and owned by a small team. Independent deployments, limited blast radius, and clear ownership are usually explicit architectural goals.&lt;/p&gt;

&lt;p&gt;These goals, however, often collide with another equally important requirement: speed. In fast-paced development environments, the ability to spin up feature environments on demand, test changes quickly, and tear everything down without friction is critical to maintaining delivery velocity.&lt;/p&gt;

&lt;p&gt;This post explores how architectural choices at the edge - specifically around CloudFront, Application Load Balancers (ALB), and API Gateway - can either enable or severely constrain that speed. It looks at how careful reuse of these components can support isolated, per-service deployments and on-demand environments without driving up cost or operational complexity, and where the trade-offs start to appear.&lt;/p&gt;

&lt;h2&gt;Architectural choices&lt;/h2&gt;

&lt;h3&gt;Frontend&lt;/h3&gt;

&lt;p&gt;In many serverless platforms, the frontend acts as the primary public entry point. A common pattern is to place CloudFront at the edge and route requests to different backend origins depending on their nature.&lt;/p&gt;

&lt;p&gt;Static assets are typically served from S3, while dynamic requests are forwarded to compute workloads that render pages or handle frontend-driven API calls. These dynamic workloads may run as Lambda functions or containers, often inside a VPC to meet security and compliance requirements.&lt;/p&gt;

&lt;p&gt;At this point, there is an architectural choice to make: whether dynamic frontend traffic should be handled by the &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/choosing-the-right-aws-service-for-your-microservice-endpoints/services-comparison.html" rel="noopener noreferrer"&gt;API Gateway or an Application Load Balancer (ALB)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From a non-functional requirements perspective, separating public access from private execution is key. While API Gateway is well-suited for exposing managed APIs, frontend traffic often benefits from being terminated at an ALB, with requests forwarded to compute that remains fully private within a VPC. &lt;strong&gt;CloudFront supports private origins through VPC-origin integrations with an internal ALB, whereas private API Gateways cannot be used directly as CloudFront origins&lt;/strong&gt;. This makes ALB a more natural fit when frontend compute is intentionally kept off the public internet.&lt;/p&gt;

&lt;p&gt;Placing &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/lambda-functions.html" rel="noopener noreferrer"&gt;Lambda functions behind an ALB&lt;/a&gt; and inside a VPC also addresses increasingly common security requirements in regulated environments. When compute is fronted by an internal ALB, CloudFront becomes the only public ingress path by design. The ALB is not internet-facing and cannot be accessed directly, which ensures that all external traffic is consistently terminated, inspected, and controlled at the edge.&lt;/p&gt;

&lt;p&gt;This pattern significantly reduces the risk of accidental or intentional bypass of edge-level controls. Combined with VPC-based execution, it enables tighter governance of inbound and outbound traffic through WAF, security groups, routing, and egress filtering. Together, these measures provide stronger guarantees than relying solely on managed service boundaries or publicly reachable endpoints.&lt;/p&gt;
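
&lt;p&gt;As a rough illustration, the CloudFront-to-internal-ALB wiring described above can be sketched in Terraform. This is a minimal sketch, not a complete distribution: the resource names and the &lt;code&gt;aws_lb.internal&lt;/code&gt; reference are assumptions for this example, and exact argument names may vary with the AWS provider version.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# VPC origin pointing at an internal (non-internet-facing) ALB
resource "aws_cloudfront_vpc_origin" "frontend" {
  vpc_origin_endpoint_config {
    name                   = "frontend-internal-alb"
    arn                    = aws_lb.internal.arn
    http_port              = 80
    https_port             = 443
    origin_protocol_policy = "https-only"

    origin_ssl_protocols {
      items    = ["TLSv1.2"]
      quantity = 1
    }
  }
}

# Referenced from the distribution's origin block, e.g.:
#   origin {
#     domain_name = aws_lb.internal.dns_name
#     origin_id   = "frontend-alb"
#     vpc_origin_config {
#       vpc_origin_id = aws_cloudfront_vpc_origin.frontend.id
#     }
#   }
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With this in place, the ALB never needs to accept traffic from the public internet; CloudFront reaches it through the managed VPC-origin path.&lt;/p&gt;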

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pvx6i2nzflns66bs19w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pvx6i2nzflns66bs19w.png" alt="CF-ALB-Frontend" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From an operational perspective, ALB offers lower latency for HTTP workloads, native support for both host-based and path-based routing, and seamless integration with both Lambda and container-based runtimes such as ECS. These characteristics make it particularly well-suited for frontend and backend-for-frontend (BFF) use cases, where routing flexibility, performance, and network controls often outweigh API-centric features such as request validation, usage plans, or request transformation.&lt;/p&gt;

&lt;p&gt;This does not diminish the role of API Gateway in the overall architecture. API Gateway remains a strong choice for internal and external APIs, especially where API lifecycle management, authentication, throttling, and contract enforcement are primary concerns. Using ALB at the frontend edge simply allows each component to be applied where it fits best, rather than forcing a single service to satisfy fundamentally different requirements.&lt;/p&gt;

&lt;h3&gt;Internal API&lt;/h3&gt;

&lt;p&gt;Internally, platforms often expose functionality through APIs consumed by other services or internal clients. A key architectural decision is whether to operate a single internal API Gateway or multiple gateways aligned with services, domains, or teams.&lt;/p&gt;

&lt;p&gt;There is no universal answer. Smaller platforms or early-stage systems often benefit from a single internal API, as it simplifies discovery and reduces operational overhead. As platforms grow, multiple APIs aligned with domains or "two-pizza teams" can become more appropriate, particularly when ownership boundaries, release cadence, or non-functional requirements diverge.&lt;/p&gt;

&lt;p&gt;What matters more than the number of gateways, however, is recognising that &lt;strong&gt;API shape and deployment shape are not the same thing&lt;/strong&gt;. A single logical API does not require a single infrastructure deployment unit, just as multiple services do not automatically justify multiple gateways. Treating these concerns separately allows teams to optimise for independent releases, blast-radius reduction, and clearer ownership, while keeping the external API surface coherent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febepn857ewyrgoupjiwv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febepn857ewyrgoupjiwv.png" alt="SingleVsMultipleAPIGs" width="800" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This distinction becomes especially important in high-paced environments, where the ability to evolve services independently should not be constrained by how APIs are presented or grouped at the edge.&lt;/p&gt;

&lt;h3&gt;Backend services and infrastructure-as-code&lt;/h3&gt;

&lt;p&gt;From an infrastructure-as-code (IaC) perspective, backend services are often deployed as independent units to support isolation, targeted releases, and reduced blast radius. This aligns naturally with domain-driven design, where each service represents a bounded context and can evolve independently.&lt;/p&gt;

&lt;p&gt;When multiple backend services are exposed through &lt;strong&gt;a single API Gateway&lt;/strong&gt;, this model becomes more nuanced. API Gateway configuration changes - such as routes, integrations, or method settings - are only made effective through an explicit &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-deploy-api.html" rel="noopener noreferrer"&gt;deployment step&lt;/a&gt;. As a result, while service-specific configuration can be defined independently, the API Gateway itself still has a shared deployment lifecycle.&lt;/p&gt;

&lt;p&gt;This does not make per-service backend stacks incompatible with a single API Gateway, but it does require additional structure. In practice, this means decomposing API Gateway configuration into distinct concerns: a base or core API configuration, per-service or per-domain route and integration definitions, and a dedicated deployment component responsible for applying changes to the gateway. That deployment component must be notified when any service updates its portion of the API configuration, without introducing tight coupling between services or requiring knowledge of how many services exist.&lt;/p&gt;

&lt;p&gt;Without this explicit separation, a shared API Gateway can easily become a point of implicit coupling, where independent service deployments are forced to coordinate around a central API deployment. With the right decomposition and signalling in place, however, a single logical API can still support independent service lifecycles and parallel provisioning.&lt;/p&gt;
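
&lt;p&gt;A minimal Terraform sketch of this decomposition might look as follows. The &lt;code&gt;orders&lt;/code&gt; service and the &lt;code&gt;service_config_versions&lt;/code&gt; variable are hypothetical, and in a real split each concern would live in its own stack, referencing the shared API via outputs or data sources; the signalling mechanism will differ per setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Core stack: the shared API itself
resource "aws_api_gateway_rest_api" "platform" {
  name = "internal-platform-api"
}

# Per-service stack: routes and integrations for one bounded context
resource "aws_api_gateway_resource" "orders" {
  rest_api_id = aws_api_gateway_rest_api.platform.id
  parent_id   = aws_api_gateway_rest_api.platform.root_resource_id
  path_part   = "orders"
}

resource "aws_api_gateway_method" "orders_any" {
  rest_api_id   = aws_api_gateway_rest_api.platform.id
  resource_id   = aws_api_gateway_resource.orders.id
  http_method   = "ANY"
  authorization = "NONE"
}

# Deployment stack: re-deploys the gateway whenever any service
# signals a change to its portion of the API configuration
resource "aws_api_gateway_deployment" "platform" {
  rest_api_id = aws_api_gateway_rest_api.platform.id

  triggers = {
    redeployment = sha1(jsonencode(var.service_config_versions))
  }

  lifecycle {
    create_before_destroy = true
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The key property is that services only publish a version signal; the deployment component consumes it without knowing how many services exist.&lt;/p&gt;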

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxdvnz4dealdxtlsbqq2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxdvnz4dealdxtlsbqq2.png" alt="TFsplit" width="800" height="916"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In contrast, using an &lt;strong&gt;API Gateway per backend service&lt;/strong&gt; naturally scopes configuration and deployment to the service itself, avoiding the need for cross-service coordination at the gateway level. The trade-off is increased operational overhead and a more fragmented API surface. The choice between these approaches is therefore not about feasibility, but about how much complexity is absorbed by infrastructure design versus operational management, and how much flexibility is required as the platform continues to evolve.&lt;/p&gt;

&lt;h2&gt;On-demand feature environments&lt;/h2&gt;

&lt;h3&gt;Why they matter&lt;/h3&gt;

&lt;p&gt;Fast feedback is essential. Feature environments allow teams to validate changes in realistic conditions, unblock reviews, and catch integration issues early. While tools like &lt;a href="https://www.localstack.cloud/" rel="noopener noreferrer"&gt;LocalStack&lt;/a&gt; can help with local development, they have limitations - especially for integration testing, CI pipelines, and workflows involving multiple services.&lt;/p&gt;

&lt;p&gt;On-demand environments bridge that gap, but naïvely provisioning a full stack per feature quickly becomes expensive and, above all, too slow.&lt;/p&gt;

&lt;h3&gt;Reuse as an enabler&lt;/h3&gt;

&lt;p&gt;Not all infrastructure needs to be duplicated. Components like CloudFront, ALB, and API Gateway are surprisingly well-suited to reuse, provided routing is designed with that goal in mind.&lt;/p&gt;

&lt;p&gt;This is not just a cost consideration. Creating or modifying edge components such as CloudFront distributions or load balancers can easily take ten minutes or more, which makes them a poor fit for fast, iterative workflows. When feature environments depend on provisioning or reconfiguring these components, feedback loops slow down dramatically.&lt;/p&gt;

&lt;p&gt;The key, therefore, is deciding what varies per environment and what stays shared. By reusing long-lived edge infrastructure and shifting environment-specific concerns into routing, configuration, and backend resources, platforms can keep costs predictable and provisioning times low. The trade-off is that this requires upfront discipline in routing design, naming conventions, and configuration boundaries, but that investment pays off quickly as the environment count and delivery pace increase.&lt;/p&gt;

&lt;h3&gt;Ephemeral on-demand environments&lt;/h3&gt;

&lt;p&gt;Ephemeral environments, the holy grail of this approach, aim for maximum speed. They rely on shared edge infrastructure and differentiate environments through routing.&lt;/p&gt;

&lt;p&gt;This approach typically requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared CloudFront configuration with a common domain name&lt;/li&gt;
&lt;li&gt;Path-based routing at ALB&lt;/li&gt;
&lt;li&gt;Frontend support for runtime &lt;code&gt;basePath&lt;/code&gt; and &lt;code&gt;assetPrefix&lt;/code&gt; changes&lt;/li&gt;
&lt;li&gt;Multiple API Gateway stages mapped to environment identifiers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9c9hypc1z8yyq7fh6k09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9c9hypc1z8yyq7fh6k09.png" alt="EphmeralEnvs" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Internal API design is irrelevant here.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When these conditions are met, spinning up a new environment becomes little more than adding routing rules and deploying service-specific resources. Cleanup is trivial, and costs remain largely constant regardless of the number of environments. Neither CloudFront nor the ALB needs to be aware of how many environments exist or may exist in the future.&lt;/p&gt;
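
&lt;p&gt;Those routing rules can be sketched in Terraform roughly as follows. Variable names such as &lt;code&gt;env_id&lt;/code&gt; and the shared listener/API references are assumptions for this example, not a prescribed layout.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Path-based ALB listener rule for one ephemeral environment
resource "aws_lb_listener_rule" "feature_env" {
  listener_arn = var.shared_listener_arn
  priority     = var.env_priority

  condition {
    path_pattern {
      values = ["/${var.env_id}/*"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.env.arn
  }
}

# API Gateway stage mapped to the same environment identifier
resource "aws_api_gateway_stage" "env" {
  rest_api_id   = var.shared_rest_api_id
  deployment_id = var.deployment_id
  stage_name    = var.env_id
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Teardown is then the inverse: delete the rule, the target group, and the stage, leaving the shared edge untouched.&lt;/p&gt;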

&lt;h3&gt;Semi-static on-demand environments&lt;/h3&gt;

&lt;p&gt;Some frontends (e.g., Next.js) require environment-specific configuration at build time and cannot easily adapt to runtime path changes. In these cases, a semi-static approach is often necessary.&lt;/p&gt;

&lt;p&gt;Here, each environment may have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared CloudFront configuration with alternate domain names and static Route 53 records&lt;/li&gt;
&lt;li&gt;Shared ALB with host-based routing&lt;/li&gt;
&lt;li&gt;Any API Gateway configuration implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt35u9wu8jd44aaboi12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt35u9wu8jd44aaboi12.png" alt="SemiStaticEnvs" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Internal API design is irrelevant here.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In contrast to the ephemeral environments option, this model requires CloudFront and Route 53 to be prepared upfront. Each environment must be explicitly represented through distribution configuration and DNS records, making environments longer-lived and less dynamic by design.&lt;/p&gt;
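
&lt;p&gt;The host-based variant can be sketched in Terraform like this. The &lt;code&gt;example.com&lt;/code&gt; domain and variable names are placeholders, and the alias points at the shared CloudFront distribution, which must already list the environment hostname as an alternate domain name with a matching certificate.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Host-based ALB listener rule for one semi-static environment
resource "aws_lb_listener_rule" "env_host" {
  listener_arn = var.shared_listener_arn
  priority     = var.env_priority

  condition {
    host_header {
      values = ["${var.env_id}.example.com"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.env.arn
  }
}

# Static DNS record pointing the environment hostname at CloudFront
resource "aws_route53_record" "env" {
  zone_id = var.zone_id
  name    = "${var.env_id}.example.com"
  type    = "A"

  alias {
    name                   = var.cloudfront_domain_name
    zone_id                = "Z2FDTNDATAQYW2" # CloudFront's fixed hosted zone ID
    evaluate_target_health = false
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;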

&lt;h3&gt;Practical limits and quotas&lt;/h3&gt;

&lt;p&gt;In practice, hard service quotas are rarely the first blocker. More often, provisioning time and operational friction become limiting factors as environments scale.&lt;/p&gt;

&lt;p&gt;That said, quotas do influence how far reuse patterns can go, and it matters whether a given limit can be raised or must be designed around. API Gateway REST APIs support a limited number of stages per API (10 by default, a soft limit), which constrains stage-based environment reuse. Private API Gateway custom domain names are also subject to soft limits. On the load-balancing side, an ALB supports up to 100 listener rules (a soft limit) and 100 target groups (a hard limit), the latter requiring an architectural change once reached.&lt;/p&gt;

&lt;p&gt;For example, reusing a single ALB across ten environments consumes listener rules at a manageable pace, while a single API Gateway with one stage per environment hits its default stage limit almost immediately. These constraints don't prevent reuse, but they highlight the need to understand early which limits can be adjusted and which require rethinking the design.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The patterns described in this post are not a single fixed architecture, but a &lt;strong&gt;set of reuse-oriented edge and routing strategies&lt;/strong&gt; for serverless platforms built on CloudFront, ALB, and API Gateway. Their value lies in being applied selectively.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;lower environments&lt;/strong&gt; - feature, preview, and integration stages - reusing edge components and differentiating environments through routing enables fast provisioning, easy teardown, and predictable costs. These environments benefit most from shared CloudFront distributions, shared ALBs, and carefully structured API Gateway reuse, where speed and feedback outweigh the need for strict isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upper environments are different&lt;/strong&gt;. Staging, pre-production, and production typically require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standalone, fully isolated infrastructure&lt;/li&gt;
&lt;li&gt;Right-sized capacity and scaling characteristics&lt;/li&gt;
&lt;li&gt;Stable DNS and edge configuration&lt;/li&gt;
&lt;li&gt;The ability to run meaningful load, stress, and performance tests against an environment that truly behaves like production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those environments, reuse at the edge is usually inappropriate. Dedicated CloudFront distributions, ALBs, and API Gateways are not an optimisation failure - they are a requirement.&lt;/p&gt;

&lt;p&gt;Supporting both approaches within the same platform does introduce additional complexity, particularly in infrastructure-as-code. Designing IaC that remains DRY while allowing some environments to share edge infrastructure and others to be fully isolated requires clear layering, strong conventions, and explicit ownership boundaries. This is more demanding than a uniform setup, but it is entirely achievable with deliberate design.&lt;/p&gt;

&lt;p&gt;These patterns optimise for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast feedback and experimentation where environment churn is high&lt;/li&gt;
&lt;li&gt;Independent service evolution with controlled blast radius&lt;/li&gt;
&lt;li&gt;Predictable cost and provisioning time at scale&lt;/li&gt;
&lt;li&gt;The ability to evolve toward stricter isolation when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are not optimised for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum isolation across all environments&lt;/li&gt;
&lt;li&gt;Treating edge infrastructure as immutable per environment&lt;/li&gt;
&lt;li&gt;Eliminating all shared components&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Being explicit about where reuse applies - and where it must not - allows the platform to move quickly without compromising correctness. The goal is not to reuse everything, but to reuse the &lt;em&gt;right&lt;/em&gt; things, in the &lt;em&gt;right&lt;/em&gt; places, for the &lt;em&gt;right&lt;/em&gt; reasons.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>architecture</category>
      <category>cloud</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS CodeBuild-powered GitHub Actions self-hosted runners — without webhooks</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Tue, 03 Feb 2026 13:45:26 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-codebuild-powered-github-actions-self-hosted-runners-without-webhooks-je4</link>
      <guid>https://forem.com/aws-builders/aws-codebuild-powered-github-actions-self-hosted-runners-without-webhooks-je4</guid>
      <description>&lt;p&gt;This topic may sound familiar, but this post intentionally goes beyond what you’ll find in AWS documentation or official blog posts.&lt;/p&gt;

&lt;p&gt;The goal is to &lt;strong&gt;avoid webhooks&lt;/strong&gt; and instead achieve maximum flexibility when using GitHub Actions for CI/CD with AWS CodeBuild-powered, ephemeral runners &lt;strong&gt;only when they are actually needed&lt;/strong&gt;, while continuing to rely on GitHub-hosted runners for everything else.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftlhntb5fwpmutnybiz3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftlhntb5fwpmutnybiz3i.png" alt="post image" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Webhooks — one-way door&lt;/h2&gt;

&lt;p&gt;I’m not saying webhooks are wrong. If you have a clear reason to use them, they can be a good fit. However, in practice, they often become a one-way route that reduces flexibility — and I strongly prefer fit-for-purpose solutions.&lt;/p&gt;

&lt;p&gt;Relevant AWS documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/action-runner-overview.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/codebuild/latest/userguide/action-runner-overview.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/devops/aws-codebuild-managed-self-hosted-github-action-runners/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/devops/aws-codebuild-managed-self-hosted-github-action-runners/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main issue with the webhook option is that you subscribe to specific GitHub event types, which effectively pushes you toward using CodeBuild for all jobs — unless you introduce increasingly complex filters and workflow logic.&lt;/p&gt;

&lt;p&gt;At that point, your GitHub Actions configuration starts encoding infrastructure decisions, which is rarely ideal.&lt;/p&gt;

&lt;h2&gt;On-demand &amp;amp; ad-hoc runners — flexibility&lt;/h2&gt;

&lt;p&gt;Instead, there’s a way to make CodeBuild-powered self-hosted runners available explicitly for the workflows that need them — and only when they’re actually required.&lt;/p&gt;

&lt;p&gt;The idea is to start a CodeBuild project &lt;strong&gt;on demand&lt;/strong&gt;, scoped to a specific GitHub Actions workflow run, and configured so it can only be used by that run. This avoids clashes, ghost runs, or unintended usage across repositories within your GitHub organisation.&lt;/p&gt;

&lt;p&gt;Consider a setup where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;some workflows only build artifacts,&lt;/li&gt;
&lt;li&gt;others run unit tests or static analysis,&lt;/li&gt;
&lt;li&gt;others perform deployments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these can run perfectly fine on GitHub-hosted runners. However, there are cases where that breaks down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;running tests against IP-allowlist-protected public endpoints,&lt;/li&gt;
&lt;li&gt;running tests against private endpoints accessible only from within a custom VPC.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the first case, some teams attempt to add GitHub runners’ public IP ranges to allowlists. This creates a false sense of security, as those endpoints remain accessible to a large, shared address space.&lt;/p&gt;

&lt;p&gt;In the second case — private endpoints — it’s simply a hard stop.&lt;/p&gt;

&lt;p&gt;So how do you bring both worlds together and cover all of these use cases?&lt;/p&gt;

&lt;h3&gt;Architecture overview&lt;/h3&gt;

&lt;p&gt;In this architecture, a GitHub App is created with its associated private key, enabling secure authentication with one or more repositories. AWS CodeBuild leverages this identity to generate temporary tokens and register itself dynamically as an ephemeral GitHub Actions runner.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This approach builds on the standard GitHub -&amp;gt; AWS OIDC integration, as documented &lt;a href="https://docs.github.com/en/actions/how-tos/secure-your-work/security-harden-deployments/oidc-in-aws" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yl74flpq4w105monscg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yl74flpq4w105monscg.png" alt="Flow HLD" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;GitHub App&lt;/h3&gt;

&lt;p&gt;A key part of making this pattern robust and production-ready is how the GitHub runner registration token is obtained. While many examples rely on a Personal Access Token (PAT), using a &lt;strong&gt;GitHub App&lt;/strong&gt; is the more appropriate choice for team and organisation-level setups.&lt;/p&gt;

&lt;p&gt;A GitHub App can be installed on &lt;strong&gt;specific repositories only&lt;/strong&gt;, which aligns well with the idea of tightly scoped, purpose-built runners. This ensures that a CodeBuild-provisioned runner can only ever register against repositories you explicitly allow, reducing both blast radius and the risk of accidental reuse across the organisation.&lt;/p&gt;

&lt;p&gt;The permission model is also much cleaner than with PATs. At a minimum, the GitHub App needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read access to code and metadata&lt;/li&gt;
&lt;li&gt;read and write access to administration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No organisation-wide permissions are required, and the short-lived installation access tokens generated by the App naturally fit the ephemeral runner lifecycle.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;For &lt;strong&gt;personal projects&lt;/strong&gt;, prototypes, or one-off experiments, a &lt;strong&gt;fine-grained or classic PAT&lt;/strong&gt; can still be a perfectly acceptable and much simpler option. It avoids the additional setup overhead of a GitHub App and is often “good enough” when the scope and risk are limited.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;AWS CodeBuild project &amp;amp; buildspec&lt;/h3&gt;

&lt;p&gt;From the CodeBuild project perspective, it really doesn’t get any simpler. The project does not need to be configured with a source repository at all — the buildspec can remain fully generic and work for any repository within your GitHub organisation.&lt;/p&gt;
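
&lt;p&gt;Such a project can be defined in Terraform along these lines. This is a hedged sketch: the resource names, variables, and especially the image identifier are assumptions, and you should pick a current Graviton (ARM) CodeBuild image for your region.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_codebuild_project" "gha_runner" {
  name         = "gha-ephemeral-runner"
  service_role = aws_iam_role.codebuild.arn

  # No source repository: the generic buildspec is supplied inline
  source {
    type      = "NO_SOURCE"
    buildspec = file("${path.module}/buildspec.yml")
  }

  artifacts {
    type = "NO_ARTIFACTS"
  }

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    type         = "ARM_CONTAINER"
    # Indicative Graviton image name; verify against current offerings
    image        = "aws/codebuild/amazonlinux2-aarch64-standard:3.0"

    environment_variable {
      name  = "GITHUB_ORG"
      value = var.github_org
    }
  }

  # Place the runner inside the VPC to reach private endpoints
  vpc_config {
    vpc_id             = var.vpc_id
    subnets            = var.private_subnet_ids
    security_group_ids = [aws_security_group.runner.id]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The remaining inputs (&lt;code&gt;GITHUB_REPO&lt;/code&gt;, &lt;code&gt;GITHUB_ACTIONS_RUNNER_NAME&lt;/code&gt;) are supplied per build by the workflow, keeping the project itself repository-agnostic.&lt;/p&gt;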

&lt;p&gt;The snippet below assumes the CodeBuild project is configured to use Amazon Linux running on Graviton-powered infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# REQUIRED INPUT:
# - GITHUB_ORG (from IaC)
# - GITHUB_APP_ID (from IaC)
# - GITHUB_APP_INSTALLATION_ID (from IaC)
# - GITHUB_APP_PK_ASM_SECRET_ARN (from IaC)
# - GITHUB_REPO (from GHA workflow)
# - GITHUB_ACTIONS_RUNNER_NAME (from GHA workflow)

version: 0.2

env:
  variables:
    USER_HOME_DIR: "/home/runneruser"
  secrets-manager:
    GITHUB_APP_PK: "${GITHUB_APP_PK_ASM_SECRET_ARN}"

phases:
  install:
    commands:
      - |
        echo "&amp;gt; Creating runner user..."
        useradd -m runneruser -d $USER_HOME_DIR
        echo "runneruser ALL = NOPASSWD:/usr/bin/yum" &amp;gt;&amp;gt; /etc/sudoers # &amp;lt;- only needed if jobs must install dependencies on the fly

        echo "&amp;gt; Downloading the lastest runner installation package..."
        cd $USER_HOME_DIR
        RUNNER_VERSION=$(curl -s https://api.github.com/repos/actions/runner/tags | jq -r '.[0].name' | sed 's/^v//')
        echo "Latest tag: $RUNNER_VERSION"
        curl -Ls https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-arm64-${RUNNER_VERSION}.tar.gz -o actions-runner.tar.gz
        mkdir actions-runner &amp;amp;&amp;amp; tar xzf actions-runner.tar.gz -C actions-runner
        chown -R runneruser:runneruser .

  build:
    commands:
      - |
        echo "&amp;gt; Generating GitHub App JWT (header + payload)..."
        cd $USER_HOME_DIR/actions-runner
        now=$(date +%s)
        iat=$((${now} - 60))  # Issued 1 minute in the past
        exp=$((${now} + 600)) # Expires 10 minutes in the future

        b64enc() { openssl base64 | tr -d '=' | tr '/+' '_-' | tr -d '\n'; }

        header_json='{
            "typ":"JWT",
            "alg":"RS256"
        }'
        # Header encode
        header=$(echo -n "${header_json}" | b64enc)

        payload_json="{
            \"iat\":${iat},
            \"exp\":${exp},
            \"iss\":\"${GITHUB_APP_ID}\"
        }"
        # Payload encode
        payload=$( echo -n "${payload_json}" | b64enc )

        # Signature
        header_payload="${header}"."${payload}"
        signature=$(
            openssl dgst -sha256 -sign &amp;lt;(echo -n "${GITHUB_APP_PK}") \
            &amp;lt;(echo -n "${header_payload}") | b64enc
        )
        # Generate an installation token for the app
        JWT="${header_payload}"."${signature}"

        echo "&amp;gt; Requesting GitHub App installation access token..."
        INSTALLATION_TOKEN=$(curl --request POST \
            --url "https://api.github.com/app/installations/${GITHUB_APP_INSTALLATION_ID}/access_tokens" \
            --header "Accept: application/vnd.github+json" \
            --header "Authorization: Bearer ${JWT}" \
            --header "X-GitHub-Api-Version: 2022-11-28"  \
          | jq -r '.token'
        )

        echo "&amp;gt; Requesting ephemeral runner registration token..."
        GITHUB_RUNNER_TOKEN=$(curl --request POST \
            --url "https://api.github.com/repos/${GITHUB_ORG}/${GITHUB_REPO}/actions/runners/registration-token" \
            --header "Accept: application/vnd.github+json" \
            --header "Authorization: Bearer ${INSTALLATION_TOKEN}" | jq -r '.token'
        )

        echo "&amp;gt; Configuring GitHub Actions runner for ${GITHUB_ORG}/${GITHUB_REPO} ..."
        su runneruser -c "./config.sh \
          --url https://github.com/${GITHUB_ORG}/${GITHUB_REPO} \
          --token ${GITHUB_RUNNER_TOKEN} \
          --unattended --ephemeral \
          --name ${GITHUB_ACTIONS_RUNNER_NAME} \
          --labels self-hosted,${GITHUB_ACTIONS_RUNNER_NAME}"

        echo "&amp;gt; Starting runner..."
        su runneruser -c "./run.sh"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
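&lt;p&gt;The JWT assembly in the &lt;code&gt;build&lt;/code&gt; phase can be exercised locally with a throwaway key before wiring it into CodeBuild. Below is a minimal, self-contained sketch; the app ID is a dummy value, and the key is generated on the fly rather than pulled from a real GitHub App.&lt;/p&gt;

```shell
# Standalone version of the buildspec's JWT assembly, using a throwaway RSA key.
# GITHUB_APP_ID is a placeholder here; in CodeBuild it comes from the environment.
set -e
GITHUB_APP_ID="12345"
openssl genrsa -out /tmp/gh-app-test.pem 2048

# base64url encoding: strip padding, swap '+/' for '-_', drop newlines
b64enc() { openssl base64 | tr -d '=' | tr '/+' '_-' | tr -d '\n'; }

now=$(date +%s)
header=$(printf '{"typ":"JWT","alg":"RS256"}' | b64enc)
payload=$(printf '{"iat":%s,"exp":%s,"iss":"%s"}' "$((now - 60))" "$((now + 600))" "$GITHUB_APP_ID" | b64enc)
signature=$(printf '%s.%s' "$header" "$payload" | openssl dgst -sha256 -sign /tmp/gh-app-test.pem | b64enc)
JWT="$header.$payload.$signature"
echo "$JWT"
```

&lt;p&gt;The printed token should have exactly three dot-separated base64url segments; decoding the first two should round-trip the header and payload JSON.&lt;/p&gt;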



&lt;p&gt;If you wish, you can build a custom image and eliminate the &lt;code&gt;install&lt;/code&gt; phase entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Actions workflow
&lt;/h3&gt;

&lt;p&gt;Below is a workflow snippet that completes the configuration and shows how everything fits together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jobs:
  start-cb-runner:
    runs-on: ubuntu-latest
    outputs:
      runner_name: ${{ steps.start-cb-project.outputs.runner_name }}
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v5
        with:
          audience: sts.amazonaws.com
          aws-region: ${{ env.AWS_REGION }}
          role-to-assume: ${{ env.AWS_ROLE_ARN }}
          role-session-name: GithubActionsSession

      - name: Start CodeBuild project
        id: start-cb-project
        run: |
          RUNNER_NAME="cb-runner-${GITHUB_RUN_ID}-${GITHUB_SHA::8}"
          CODEBUILD_PROJECT_NAME="${{ env.CODEBUILD_PROJECT_NAME }}"
          echo "runner_name=$RUNNER_NAME" &amp;gt;&amp;gt; $GITHUB_OUTPUT
          PAYLOAD=$(jq -n \
            --arg project "$CODEBUILD_PROJECT_NAME" \
            --arg gh_repo "${{ github.event.repository.name }}" \
            --arg gha_runner_name "$RUNNER_NAME" \
            '{
              projectName: $project,
              environmentVariablesOverride: [
                {name: "GITHUB_REPO", value: $gh_repo, type: "PLAINTEXT"},
                {name: "GITHUB_ACTIONS_RUNNER_NAME", value: $gha_runner_name, type: "PLAINTEXT"}
              ]
            }')
          aws codebuild start-build --cli-input-json "$PAYLOAD" &amp;gt; /dev/null

  run-tests-on-cb-runner:
    needs: start-cb-runner
    runs-on: [self-hosted, "${{ needs.start-cb-runner.outputs.runner_name }}"]
    steps:
      - uses: actions/checkout@v6
      - name: Introduction message
        run: |
          echo "Testing ..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From a performance perspective, bringing a runner online should take &lt;strong&gt;no more than ~30 seconds&lt;/strong&gt;, after which it is ready to pick up the queued job.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9z0ng2kuccxz7lw3uo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl9z0ng2kuccxz7lw3uo4.png" alt="GHA view" width="800" height="134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the less obvious but critical elements is the &lt;code&gt;RUNNER_NAME&lt;/code&gt;. In teams where multiple developers and testers run workflows in parallel, there is always a risk of workflows competing for runners. By generating a unique runner name per workflow run and passing it into the CodeBuild project, you guarantee that the runner you spin up is used exclusively for that specific execution and cannot be accidentally picked up by another job.&lt;/p&gt;

&lt;p&gt;Finally, the CodeBuild project output should look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Configuring and starting runner for sebolabs/my-repo ...
--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------
# Authentication
√ Connected to GitHub
# Runner Registration
√ Runner successfully added
# Runner settings
√ Settings Saved.
√ Connected to GitHub
Current runner version: '2.331.0'
2026-01-19 20:34:07Z: Listening for Jobs
2026-01-19 20:34:09Z: Running job: run-tests-on-cb-runner
2026-01-19 20:34:17Z: Job run-tests-on-cb-runner completed with result: Succeeded
√ Removed .credentials
√ Removed .runner
Runner listener exit with 0 return code, stop the service, no retry needed.
Exiting runner...
[Container] 2026/01/19 20:34:17.683704 Phase complete: BUILD State: SUCCEEDED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;Using AWS CodeBuild–powered, ephemeral GitHub Actions self-hosted runners without webhooks gives you precise, on-demand control over when and why you leave the GitHub-hosted runner pool. You retain full workflow flexibility, keep logs and execution visibility inside the GitHub Actions UI, and only incur additional infrastructure when a workflow genuinely requires network proximity to protected or private endpoints.&lt;/p&gt;

&lt;p&gt;This model avoids over-coupling your CI/CD design to webhook-driven automation, reduces the risk of unintended runner usage, and scales cleanly even when many workflows are triggered in parallel. Treating CodeBuild runners as a specialised, opt-in execution environment — rather than a default — keeps both architecture and blast radius under control.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Historically, self-hosted GitHub Actions runners were effectively free from GitHub’s billing perspective — you only paid for the infrastructure you ran them on. That changes on &lt;strong&gt;March 1, 2026&lt;/strong&gt;, when GitHub will introduce a &lt;strong&gt;$0.002 per-minute GitHub Actions cloud platform charge&lt;/strong&gt; for self-hosted runner usage, with those minutes counting toward your plan.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>aws</category>
      <category>githubactions</category>
      <category>cicd</category>
      <category>devops</category>
    </item>
    <item>
      <title>AWS Anywhere - a route to EKS Hybrid Nodes</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Fri, 03 Jan 2025 13:16:24 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-anywhere-a-route-to-eks-hybrid-nodes-4h6d</link>
      <guid>https://forem.com/aws-builders/aws-anywhere-a-route-to-eks-hybrid-nodes-4h6d</guid>
      <description>&lt;p&gt;In an era where cloud and on-premises environments increasingly converge, the ability to seamlessly integrate these ecosystems has never been more critical. This story explores what I like to call &lt;em&gt;"AWS Anywhere"&lt;/em&gt; - an overarching concept encompassing a suite of AWS capabilities that enable seamless hybrid operations. From establishing a &lt;strong&gt;Site-to-Site VPN&lt;/strong&gt; to bridging your on-premises network with AWS, configuring &lt;strong&gt;Route 53 Inbound Resolvers&lt;/strong&gt; to enable private connections to &lt;strong&gt;VPC Endpoint Interfaces&lt;/strong&gt;, leveraging &lt;strong&gt;IAM Roles Anywhere&lt;/strong&gt; for secure identity management, and setting up the &lt;strong&gt;SSM agent&lt;/strong&gt; for streamlined operations, this journey culminates in deploying &lt;strong&gt;EKS Hybrid Nodes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This builds on my earlier story, &lt;a href="https://dev.to/aws-builders/aws-landing-zone-hybrid-networking-ele"&gt;AWS Landing Zone 3: Hybrid Networking&lt;/a&gt;, where I explored hybrid networking fundamentals. This time, I'm going a step further by providing source code and guides, making it easier for anyone to replicate the setup and put these concepts into action.&lt;/p&gt;

&lt;p&gt;This story takes a practical approach, using a &lt;strong&gt;Raspberry Pi&lt;/strong&gt; as the on-premises node to simulate real-world scenarios &lt;strong&gt;at home&lt;/strong&gt;. By the end, you'll understand how these AWS services combine to create a unified hybrid infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  EKS Hybrid Nodes TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://medium.com/r/?url=https%3A%2F%2Faws.amazon.com%2Feks%2Fhybrid-nodes%2F" rel="noopener noreferrer"&gt;Amazon EKS Hybrid Nodes&lt;/a&gt;, introduced at AWS re:Invent 2024, allow businesses to run Kubernetes workloads seamlessly across on-premises, edge, and cloud environments. This solution simplifies Kubernetes management by offloading control plane availability and scalability to AWS while integrating with services like centralized logging and monitoring. It enables organizations to maximize existing infrastructure while modernizing deployments with AWS cloud capabilities. This unified approach reduces operational complexity and accelerates application modernization.&lt;br&gt;
Before we get there, let me take you through the foundational steps that must be set up first.&lt;/p&gt;
&lt;h2&gt;
  
  
  Source code &amp;amp; guides
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Terraform
&lt;/h3&gt;

&lt;p&gt;The Terraform configuration files are available via the link below. These files include switches - disabled by default - represented by boolean variables that enable specific functionalities, all of which are detailed in the sections below.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/sebolabs/aws-anywhere-tf" rel="noopener noreferrer"&gt;https://github.com/sebolabs/aws-anywhere-tf&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Guides.md
&lt;/h3&gt;

&lt;p&gt;I've also utilized a combination of &lt;code&gt;templatefile()&lt;/code&gt; and &lt;code&gt;local_file&lt;/code&gt; resources to generate markdown guides. These guides provide step-by-step instructions and commands to configure the on-premises side of the setup, whether you're using a Raspberry Pi or another machine. Once a functionality is enabled and applied, a tailored guide file is generated, complete with references and values specific to your environment.&lt;/p&gt;
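&lt;p&gt;As a sketch of that mechanism - resource names, template path, and variables below are illustrative, not the repository's actual ones - a guide can be rendered like this:&lt;/p&gt;

```terraform
# Hypothetical example: render a per-environment markdown guide from a template
resource "local_file" "s2s_vpn_guide" {
  count    = var.on_prem_s2s_vpn_enabled ? 1 : 0
  filename = "${path.module}/guides/S2S_VPN.md"
  content = templatefile("${path.module}/templates/s2s_vpn.md.tpl", {
    tunnel1_address       = aws_vpn_connection.on_prem[0].tunnel1_address
    tunnel1_preshared_key = aws_vpn_connection.on_prem[0].tunnel1_preshared_key
  })
}
```

&lt;p&gt;Each switch gates its own &lt;code&gt;local_file&lt;/code&gt;, so a guide only appears once the corresponding functionality is enabled and applied.&lt;/p&gt;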
&lt;h2&gt;
  
  
  AWS Anywhere
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Raspberry Pi
&lt;/h3&gt;

&lt;p&gt;For my setup, I used my Raspberry Pi - the same one I used a few years ago to simulate hybrid networking, as mentioned in the introduction. It runs Ubuntu 24.04, ensuring compatibility with all necessary installations to make this integration work seamlessly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvnqoiyba00nbxej0k3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvnqoiyba00nbxej0k3d.png" alt="My Raspberry Pi" width="800" height="600"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ssh pi                                                                                                                                                                                                                         255 ✘ │ 14:50:42 

Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 6.8.0-1017-raspi aarch64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 * Strictly confined Kubernetes makes edge and IoT secure. Learn how MicroK8s
   just raised the bar for easy, resilient and secure K8s cluster deployment.

   https://ubuntu.com/engage/secure-kubernetes-at-the-edge

Last login: Sat Dec 28 14:37:17 2024 from 192.168.101.10

seb@pi:~$ neofetch
            .-/+oossssoo+/-.               seb@pi
        `:+ssssssssssssssssss+:`           ------
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 24.04.1 LTS aarch64
    .ossssssssssssssssssdMMMNysssso.       Host: Raspberry Pi 4 Model B Rev 1.4
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.8.0-1017-raspi
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 3 days, 5 hours, 24 mins
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 810 (dpkg)
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.2.21
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Terminal: /dev/pts/0
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   CPU: (4) @ 1.800GHz
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Memory: 399MiB / 7802MiB
+sssshhhyNMMNyssssssssssssyNMMMysssssss+
.ssssssssdMMMNhsssssssssshNMMMdssssssss.
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/
  +sssssssssdmydMMMMMMMMddddyssssssss+
   /ssssssssssshdmNNNNmyNMMMMhssssss/
    .ossssssssssssssssssdMMMNysssso.
      -+sssssssssssssssssyyyssss+-
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it's up and running, we can move on to the next step.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS VPC defaults
&lt;/h3&gt;

&lt;p&gt;By default, apart from the Transit Gateway responsible for integrating different networks, the module configures a Transit VPC. This VPC hosts VPC Interface Endpoints and can also act as a centralized egress point in your landing zone setup. Here's what you get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckbwbsqbi8fhiahlijd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckbwbsqbi8fhiahlijd8.png" alt="HLD: Part 1 - Transit Gateway + Transit VPC" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Site-to-Site VPN
&lt;/h3&gt;

&lt;p&gt;AWS Site-to-Site VPN is a flexible and cost-effective solution for securely connecting on-premises networks to AWS. While AWS Direct Connect offers a more reliable, dedicated connection with lower latency, Site-to-Site VPN is often chosen for its quicker, simpler setup or when Direct Connect isn't available. The VPN connection uses IPsec tunnels over the Internet, ensuring secure communication between local networks and AWS resources.&lt;/p&gt;

&lt;p&gt;On the AWS side, this is a managed service. On my Pi, I configured the VPN connection using &lt;strong&gt;StrongSwan&lt;/strong&gt;, an open-source IPsec-based VPN solution. StrongSwan provides flexible configuration and integrates seamlessly with diverse network setups. By pairing StrongSwan with AWS's managed service, I maintain granular control over the VPN configuration while benefiting from AWS's operational simplicity.&lt;/p&gt;
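&lt;p&gt;For orientation, a single-tunnel StrongSwan configuration typically looks something like the fragment below. All addresses, CIDRs, and proposal sets are placeholders - the generated guide contains the values matching your actual VPN connection.&lt;/p&gt;

```conf
# /etc/ipsec.conf - hypothetical single AWS tunnel
conn aws-vpn-tunnel1
    auto=start
    type=tunnel
    keyexchange=ikev1
    authby=secret
    left=%defaultroute
    leftid=203.0.113.10          # on-prem public IP (customer gateway)
    leftsubnet=192.168.101.0/24  # home network
    right=198.51.100.20          # AWS tunnel 1 outside IP
    rightsubnet=10.0.0.0/16      # VPC CIDR
    ike=aes128-sha1-modp1024!
    esp=aes128-sha1-modp1024!
    ikelifetime=8h
    lifetime=1h
    dpdaction=restart
```

&lt;p&gt;The tunnel's pre-shared key goes into &lt;code&gt;/etc/ipsec.secrets&lt;/code&gt;.&lt;/p&gt;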

&lt;p&gt;To get it configured, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;on_prem_s2s_vpn_enabled = true
on_prem_props           = { ... }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and upon a successful Terraform apply you'll get a guide on how to configure StrongSwan. For testing purposes, feel free to limit it to a single VPN tunnel. When done, here's what you get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzgy32e2nobv8akxhxr9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzgy32e2nobv8akxhxr9.png" alt="HLD: Part 2- Site-to-Site VPN" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The private subnet is included in case you wish to spin up an EC2 instance for testing purposes, such as confirming successful connectivity to and from a real server running in the VPC.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7eb2sg7y8mjhz6t0sceq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7eb2sg7y8mjhz6t0sceq.png" alt="Site-to-Site VPN connection detail" width="800" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fok22xmmr5x3mmvtvaam0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fok22xmmr5x3mmvtvaam0.png" alt="Site-to-Site VPN connection tunnels" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid DNS and private access to VPC Interface Endpoints
&lt;/h3&gt;

&lt;p&gt;Hybrid DNS and private access to VPC Interface Endpoints, alongside AWS PrivateLink, enable secure, private connectivity to AWS services. By integrating a local &lt;strong&gt;Bind&lt;/strong&gt; DNS server with &lt;strong&gt;Route 53 Inbound Resolver&lt;/strong&gt; over the configured Site-to-Site VPN, DNS queries for AWS services are routed privately within AWS. This setup allows the use of VPC Interface Endpoints to connect to AWS APIs. PrivateLink ensures that traffic to services such as S3, Systems Manager, or EKS remains within the AWS network, avoiding exposure to the public Internet and enhancing both security and performance.&lt;/p&gt;
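&lt;p&gt;As an illustration - the region and resolver endpoint IPs below are placeholders, the generated guide has the real ones - the Bind side boils down to forward zones pointing at the inbound resolver endpoints:&lt;/p&gt;

```conf
// Hypothetical named.conf fragment: forward AWS service lookups
// to the Route 53 inbound resolver endpoints over the VPN
zone "eu-west-1.amazonaws.com" {
    type forward;
    forward only;
    forwarders { 10.0.1.10; 10.0.2.10; };
};
```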

&lt;p&gt;To get it configured, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;r53_inbound_resolver_enabled = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and upon a successful Terraform apply you'll get a guide on how to configure Bind. When done, here's what you get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx9nlugaaqlx38n3ordr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx9nlugaaqlx38n3ordr.png" alt="HLD: Part 3: Route 53 Inbound Resolver + Bind + VPC Interface Endpoints" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ibn6tqzuk8czj1i09no.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ibn6tqzuk8czj1i09no.png" alt="Route 53 Inbound Resolver" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  IAM Roles Anywhere
&lt;/h3&gt;

&lt;p&gt;With the Site-to-Site VPN in place, I can securely extend my on-premises network to AWS, providing seamless communication between local infrastructure and AWS resources. Integrating IAM Roles Anywhere enables secure and temporary access to AWS services from on-prem systems. This leverages the VPN connection and IAM roles for secure authentication. Additionally, with private access to VPC Interface Endpoints, on-prem systems can resolve AWS service API addresses to private IPs, ensuring traffic remains within the AWS network for enhanced security, and avoiding public Internet exposure.&lt;/p&gt;
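&lt;p&gt;On the client side, AWS's IAM Roles Anywhere credential helper (&lt;code&gt;aws_signing_helper&lt;/code&gt;) can expose the certificate-based identity as a normal AWS profile via &lt;code&gt;credential_process&lt;/code&gt; - roughly like this, where all ARNs and file paths are placeholders:&lt;/p&gt;

```ini
# Hypothetical ~/.aws/config profile backed by IAM Roles Anywhere
[profile pi-anywhere]
credential_process = aws_signing_helper credential-process --certificate /etc/pki/pi.crt --private-key /etc/pki/pi.key --trust-anchor-arn arn:aws:rolesanywhere:eu-west-1:111111111111:trust-anchor/EXAMPLE --profile-arn arn:aws:rolesanywhere:eu-west-1:111111111111:profile/EXAMPLE --role-arn arn:aws:iam::111111111111:role/pi-anywhere
```

&lt;p&gt;Any tool that understands AWS profiles - the AWS CLI, SDKs, or the SSM Agent - can then obtain temporary credentials through that profile.&lt;/p&gt;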

&lt;p&gt;To get it configured, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;iam_roles_anywhere_enabled = true

# NOTE: Terraform must be re-run once the CA cert is uploaded to SSM PS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and upon a successful Terraform apply you'll get a guide on how to configure a local CA, generate certificates, and leverage IAM to access AWS services. When done, here's what you get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9teh11npbomcr4hlb7dt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9teh11npbomcr4hlb7dt.png" alt="HLD: Part 4 - IAM Roles Anywhere + CA" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fi5ucd27xpvcdrcgx4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fi5ucd27xpvcdrcgx4w.png" alt="CloudTrail CreateSession event from Pi" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  SSM Agent
&lt;/h3&gt;

&lt;p&gt;With the Site-to-Site VPN in place, private access to AWS services via VPC Interface Endpoints configured, and IAM Roles Anywhere enabling secure role-based access, the next step is integrating the SSM Agent. The SSM Agent facilitates secure, managed access to instances in AWS, including on-premises servers, via AWS Systems Manager. By leveraging IAM roles, the SSM Agent ensures that commands and configurations are executed securely, enabling full management of infrastructure across both on-premises and AWS environments. Additionally, communication between the agent and the service remains private, as all traffic flows through established private connections to AWS services, avoiding the public Internet.&lt;/p&gt;
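&lt;p&gt;Against the hybrid activation created by Terraform, the node registration then amounts to a couple of commands. The activation code, ID, and region below are placeholders taken from the generated guide:&lt;/p&gt;

```shell
# Hypothetical registration of the Pi as an SSM managed instance
sudo systemctl stop amazon-ssm-agent
sudo amazon-ssm-agent -register -code "ACTIVATION_CODE" -id "ACTIVATION_ID" -region "eu-west-1"
sudo systemctl enable --now amazon-ssm-agent
```

&lt;p&gt;Once registered, the node shows up in Fleet Manager with an &lt;code&gt;mi-&lt;/code&gt; prefixed instance ID.&lt;/p&gt;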

&lt;p&gt;To get it configured, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssm_hybrid_activation_registred     = true
ssm_advanced_instances_tier_enabled = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and upon a successful Terraform apply you'll get a guide on how to configure the SSM Agent. When done, here's what you get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv00ecmsjqr04z9whp1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv00ecmsjqr04z9whp1m.png" alt="HLD: Part 5 - SSM Agent" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gnoe04ayg0cainoydp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gnoe04ayg0cainoydp2.png" alt="ISSM Fleet Manager managed nodes" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb5rxbmsaes83o3xyi7p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb5rxbmsaes83o3xyi7p.png" alt="SSM Session Manager connection to Pi" width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  EKS Hybrid Nodes continued
&lt;/h2&gt;

&lt;p&gt;The functionalities described and configured above provide a robust foundation for integrating on-premises systems with AWS. They also cover the key prerequisites for connecting an on-premises Kubernetes node to an EKS cluster in AWS.&lt;/p&gt;

&lt;p&gt;The source code does not include specific EKS cluster configuration, as such setups vary by use case. Instead, it assumes an EKS cluster with hybrid node support enabled is already running in a separate VPC. The provided configuration focuses on integrating the cluster with the Transit Gateway and the necessary IAM-related resources.&lt;/p&gt;

&lt;p&gt;AWS provides comprehensive guidance for configuring everything from scratch, especially the CNI-related aspects, which can be found &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fdocs.aws.amazon.com%2Feks%2Flatest%2Fuserguide%2Fhybrid-nodes-overview.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; while most of the non-EKS-specific prerequisites have already been covered in this post.&lt;/p&gt;

&lt;p&gt;And don't forget, we're only simulating this at home!&lt;/p&gt;
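&lt;p&gt;For reference, per AWS's hybrid-nodes flow, the node-side bootstrap revolves around &lt;code&gt;nodeadm&lt;/code&gt; driven by a small NodeConfig file. The sketch below is indicative only - cluster name, region, and ARNs are placeholders:&lt;/p&gt;

```yaml
# Hypothetical nodeConfig.yaml for an IAM Roles Anywhere-backed hybrid node
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-eks-cluster
    region: eu-west-1
  hybrid:
    iamRolesAnywhere:
      nodeName: pi
      trustAnchorArn: arn:aws:rolesanywhere:eu-west-1:111111111111:trust-anchor/EXAMPLE
      profileArn: arn:aws:rolesanywhere:eu-west-1:111111111111:profile/EXAMPLE
      roleArn: arn:aws:iam::111111111111:role/eks-hybrid-node
```

&lt;p&gt;It is then applied with &lt;code&gt;nodeadm install &amp;lt;k8s-version&amp;gt; --credential-provider iam-ra&lt;/code&gt; followed by &lt;code&gt;nodeadm init -c file://nodeConfig.yaml&lt;/code&gt;.&lt;/p&gt;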

&lt;p&gt;To get it configured, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eks_hybrid_nodes_enabled = true
eks_props                = { ... }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and upon a successful Terraform apply you'll get a guide on how to configure your node (Pi). When done, here's what you'll most likely get (depending on your individual EKS cluster setup):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyva9y3fqjrqog6apcjpq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyva9y3fqjrqog6apcjpq.png" alt="HLD: Part 6- EKS Hybrid Nodes" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhihc5gfmxps94xm9tx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhihc5gfmxps94xm9tx0.png" alt="EKS cluster info &amp;amp; nodes" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz55jnhp3tgel2ssg0zae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz55jnhp3tgel2ssg0zae.png" alt="EKS Pi Node details" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;What started as a personal Proof of Concept turned into a hands-on guide for building a hybrid infrastructure that connects on-premises systems with AWS. Using a Raspberry Pi as a simulated on-prem node, this journey explored key AWS services and functionalities required to establish a route to EKS Hybrid Nodes.&lt;/p&gt;

&lt;p&gt;The lessons learned here - from setting up a Site-to-Site VPN to enabling private API access and configuring a hybrid Kubernetes node - showcase practical steps that can inform professional designs. These configurations pave the way for securely running Kubernetes workloads across hybrid environments, leveraging AWS for control plane management while maintaining local resources.&lt;/p&gt;

&lt;p&gt;The benefits of this approach are clear: enhanced security with private connectivity, simplified operations through managed AWS services, and the ability to experiment and learn on a small scale while gaining insights applicable to enterprise-grade scenarios. This PoC not only highlights the potential of hybrid cloud architectures but also demonstrates how such integrations can help modernize on-prem systems and provide flexibility for diverse business needs.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>Databricks on AWS</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Fri, 03 Jan 2025 12:14:50 +0000</pubDate>
      <link>https://forem.com/aws-builders/databricks-on-aws-1hnp</link>
      <guid>https://forem.com/aws-builders/databricks-on-aws-1hnp</guid>
      <description>&lt;p&gt;This is to share my experience around building an enterprise data platform powered by Databricks on AWS. It’s not about the data side of things but purely about the platform architecture and wider configuration aspects.&lt;/p&gt;

&lt;p&gt;While Databricks provides its customers with many capabilities and features to fulfill a spectrum of needs, there are still common elements and considerations that Databricks users must decide on, depending on their individual requirements.&lt;/p&gt;

&lt;p&gt;Just look out for 💡 down below - the information they highlight may prove relevant or even give you a head start with using Databricks on AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Account and workspaces
&lt;/h2&gt;

&lt;p&gt;It’s not about AWS accounts and Amazon Workspaces. When working with Databricks on AWS, you should quickly learn to be precise when discussing architectures and configurations to avoid unnecessary confusion about what’s what. You’ll see why… mark my words!&lt;/p&gt;

&lt;p&gt;This time it’s about Databricks accounts and workspaces. With the Enterprise Edition (E2) model, Databricks introduced a highly scalable multi-tenant environment, a successor to previous deployment options that have been deprecated. By the way, it’s running on Kubernetes.&lt;/p&gt;

&lt;p&gt;Here’s what it looks like on a very high level:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfynbm5nzmf0osvdvdh3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfynbm5nzmf0osvdvdh3.png" alt="Image description" width="720" height="365"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Databricks&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;💡 The account console is hosted in the US West (Oregon) AWS region, while you choose which AWS region (15 currently) you want to have your workspaces deployed into.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Databricks account&lt;/strong&gt; is used to manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users and their access to objects and resources,&lt;/li&gt;
&lt;li&gt;Workspaces and cloud resources,&lt;/li&gt;
&lt;li&gt;Metastores — the top-level containers for catalogs in Unity Catalog (💡 There can only be a single metastore per account per region),&lt;/li&gt;
&lt;li&gt;Other account-level settings like SSO, SCIM, security controls, various optional features, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;Databricks workspace&lt;/strong&gt;, on the other hand, is a Databricks deployment that can be considered an environment for data engineers to access Databricks assets. To configure a workspace, the following cloud-relevant information must be provided:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Credentials ~ an IAM role&lt;/li&gt;
&lt;li&gt;Storage ~ an S3 bucket&lt;/li&gt;
&lt;li&gt;Storage CMK ~ a KMS key&lt;/li&gt;
&lt;li&gt;Network ~ a VPC with subnets and security group(s)&lt;/li&gt;
&lt;/ul&gt;
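&lt;p&gt;As a rough sketch, those four inputs end up as identifiers in the request body sent to the Account API when creating a workspace (you register the IAM role, S3 bucket, and VPC first, then reference them by ID). The field names below are my assumption based on the API reference, so verify them before relying on this:&lt;/p&gt;

```python
# Sketch: mapping the cloud-relevant inputs to a workspace-creation request
# body for the Databricks Account API. Field names are assumptions based on
# the API reference; check the current docs before using them.

def workspace_payload(name, region, credentials_id, storage_config_id, network_id):
    """Build the JSON body referencing the previously registered cloud resources."""
    return {
        "workspace_name": name,
        "aws_region": region,
        "credentials_id": credentials_id,               # registered IAM role
        "storage_configuration_id": storage_config_id,  # registered S3 bucket
        "network_id": network_id,                       # registered VPC/subnets/SGs
    }

payload = workspace_payload("dev", "eu-west-2", "cred-123", "stor-456", "net-789")
print(sorted(payload))
```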

&lt;p&gt;Now, it’s up to you how you want to manage the workspaces, but we went with a workspace per environment, meaning every SDLC environment is represented by a distinct Databricks workspace and its corresponding AWS account, with all the necessary services and resources configured accordingly. All that configuration was covered with IaC.&lt;/p&gt;

&lt;h3&gt;
  
  
  Databricks REST APIs
&lt;/h3&gt;

&lt;p&gt;Databricks exposes two APIs, the &lt;a href="https://docs.databricks.com/api/account/introduction" rel="noopener noreferrer"&gt;Account API&lt;/a&gt; and the &lt;a href="https://docs.databricks.com/api/workspace/introduction" rel="noopener noreferrer"&gt;Workspace API&lt;/a&gt;, so depending on which resource you’re configuring, you interact with one or the other. Many of the methods are listed as Public Preview; however, when you go with IaC, you must simply treat them as Public/GA, as you have no choice but to use and rely on them.&lt;/p&gt;

&lt;p&gt;💡 Not everything makes perfect sense when it comes to what is managed through which API, but maybe that’s just me. An example can be the &lt;a href="https://docs.databricks.com/api/workspace/artifactallowlists" rel="noopener noreferrer"&gt;Artifacts Allowlist&lt;/a&gt;. Namely, it is configurable with the Workspace API while that configuration is considered global as it’s linked to the Unity Catalog, i.e., it applies to all workspaces in your account. Now, having a workspace per environment, you may have different artifacts per environment that need to be whitelisted. In that case, you must bring them together and contain them as a single list. Moreover, when interacting with the Workspace API, you must provide a workspace host URL. Say you don’t have any workspaces yet or, just like in our case, you have workspaces representing distinct environments, and you want to set up the allow list — which workspace URL would you use to configure that global setting?&lt;/p&gt;
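&lt;p&gt;Bringing those per-environment lists together can be as simple as a union that your IaC or tooling computes before calling the Workspace API. A minimal sketch (the artifact paths are purely illustrative):&lt;/p&gt;

```python
# Sketch: because the Artifacts Allowlist is account-global (it applies to all
# workspaces via Unity Catalog), per-environment artifact lists must be merged
# into a single, de-duplicated list before submitting it to the Workspace API.

def merge_allowlists(*env_lists):
    """Union the per-environment allowlists, preserving first-seen order."""
    merged = []
    for env in env_lists:
        for artifact in env:
            if artifact not in merged:
                merged.append(artifact)
    return merged

dev = ["s3://artifacts/dev/init.sh", "s3://artifacts/shared/agent.sh"]
prd = ["s3://artifacts/prd/init.sh", "s3://artifacts/shared/agent.sh"]
print(merge_allowlists(dev, prd))
```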

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;On a high level, the following diagram visualizes what the core components are and how they are spread across the AWS accounts, where one belongs to Databricks and another one belongs to the customer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowpkt56yos8o3f8t8eco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowpkt56yos8o3f8t8eco.png" alt="Image description" width="720" height="851"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Databricks&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you can imagine, making both sides interact with each other securely relies on trust between both AWS accounts, which is fulfilled with the use of a cross-account IAM role that is granted permissions to launch clusters or manage data in S3 in the customer AWS account. It’s always the same &lt;code&gt;414351767826&lt;/code&gt; AWS account and &lt;code&gt;arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL&lt;/code&gt; IAM role that will keep popping up in Security Hub findings if not properly marked as trusted. When using serverless compute, there’s another combination of an AWS account ID and IAM role.&lt;br&gt;
Then, at the network layer, there are whitelisting mechanisms, covered in the sections below.&lt;/p&gt;
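&lt;p&gt;On the customer side, that trust is expressed as a standard IAM trust policy on the cross-account role. A minimal sketch, assuming the usual pattern of trusting the Databricks AWS account with your Databricks account ID as the external ID (the exact policy Databricks requires may differ, so treat this as the general shape rather than the authoritative document):&lt;/p&gt;

```python
import json

# Sketch of the cross-account trust relationship: the customer role trusts the
# Databricks AWS account, scoped with an external ID. The exact policy
# Databricks requires may differ; this shows only the general IAM pattern.

DATABRICKS_AWS_ACCOUNT = "414351767826"

def cross_account_trust_policy(databricks_account_id):
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{DATABRICKS_AWS_ACCOUNT}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": databricks_account_id}},
        }],
    }

print(json.dumps(cross_account_trust_policy("00000000-0000-0000-0000-000000000000"), indent=2))
```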

&lt;h3&gt;
  
  
  Authentication and access control
&lt;/h3&gt;

&lt;p&gt;While Databricks comes with its user store, it’s a common pattern to leverage SSO and configure Databricks with an IdP that is already used in your organization. The &lt;strong&gt;unified login&lt;/strong&gt; option that is now enabled by default allows you to manage one SSO configuration in your account that is used for the account and Databricks workspaces.&lt;/p&gt;

&lt;p&gt;💡 The unified login does not yet support workspaces with public access completely disabled; the only way around that is to contact Databricks to get unified login disabled for your account and then configure SSO on a per-workspace basis.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.databricks.com/en/admin/users-groups/scim/index.html" rel="noopener noreferrer"&gt;SCIM provisioning&lt;/a&gt; feature allows for syncing groups and users from your IdP, like Microsoft Entra ID, and using them in Databricks to grant permissions while following the least-privilege principle. Here, make sure you use groups and not individual user accounts to grant permissions.&lt;br&gt;
Service principals, on the other hand, must be managed locally in Databricks, while they can still be members of the SCIM-synced groups.&lt;/p&gt;

&lt;p&gt;Service principals can use personal access tokens (PATs) or OAuth for automation, and it’s recommended to go with the latter wherever you can. Make sure you verify the available &lt;a href="https://docs.databricks.com/en/dev-tools/auth/index.html" rel="noopener noreferrer"&gt;authentication types&lt;/a&gt; based on the use case.&lt;/p&gt;

&lt;p&gt;💡 For example, currently, Qlik does not support OAuth for Databricks hosted in AWS, while it does for Databricks hosted in Azure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Networking
&lt;/h2&gt;

&lt;p&gt;It’s important to understand that in the E2 Databricks deployment model, the control plane is in the hands of Databricks while you use their web applications to access the account console and your workspaces. That architecture requires all those endpoints to be accessible over the public Internet or rather publicly resolvable.&lt;/p&gt;

&lt;p&gt;For the &lt;strong&gt;Databricks account&lt;/strong&gt; endpoint, there’s only a single option for restricting access at the network level, and that is the &lt;strong&gt;IP Access List&lt;/strong&gt;.&lt;br&gt;
For the &lt;strong&gt;workspace&lt;/strong&gt; endpoints, apart from the IP access list, you can leverage &lt;strong&gt;AWS PrivateLink&lt;/strong&gt;. Whether you use it for front-end access is up to your requirements, but you should definitely use it for back-end access, i.e., secure cluster connectivity. We went for both!&lt;/p&gt;

&lt;h3&gt;
  
  
  Private-only connectivity
&lt;/h3&gt;

&lt;p&gt;The following diagram nicely visualizes the concept.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1c8b4h4uuepxb0vlte1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1c8b4h4uuepxb0vlte1.png" alt="Image description" width="720" height="888"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: Databricks&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can see that Databricks exposes two APIs privately (SCC Relay API and Workspace API) with the use of the &lt;strong&gt;VPC Endpoint Services&lt;/strong&gt; for you to connect with the &lt;strong&gt;VPC Interface Endpoints&lt;/strong&gt; you configure in your VPCs and whitelist in the Databricks account console.&lt;/p&gt;

&lt;p&gt;💡 The Workspace endpoint is not only used for the front-end access but also REST API and ODBC/JDBC connections; hence, you must realize that by closing the front-end door, you’re restricting all types of connectivity to only whitelisted networks.&lt;/p&gt;

&lt;p&gt;💡 VPC Interface Endpoints don’t have to live in the VPC you have your workspace configured with. This means, as long as your VPC Interface Endpoint is whitelisted (by ID) it can belong to any AWS account and any VPC which enables their &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/centralized-access-to-vpc-private-endpoints.html" rel="noopener noreferrer"&gt;centralization&lt;/a&gt;. However, you can control which endpoints can be used to access a given workspace.&lt;/p&gt;

&lt;p&gt;💡 No matter whether you disable public access to your workspace or not, the &lt;strong&gt;access verification&lt;/strong&gt; process is &lt;strong&gt;always&lt;/strong&gt; the same and &lt;strong&gt;starts with authentication&lt;/strong&gt;. Yes, the IP access list and AWS PrivateLink go second, while the IP access list only applies when accessing workspaces from public IP addresses. All that is influenced by the Databricks control plane architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Delta-sharing
&lt;/h3&gt;

&lt;p&gt;Delta Sharing is an open-source approach to sharing data across data, analytics, and AI tools, developed and implemented by Databricks. Among other things, it allows for sharing data between Databricks customers that have their own accounts and workspaces.&lt;/p&gt;

&lt;p&gt;💡 Delta Sharing happens privately over the AWS backbone and the Databricks AWS account, which means that going fully private, i.e., disabling public access to your workspaces, does not affect the ability to use that feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analytics tools
&lt;/h3&gt;

&lt;p&gt;Nowadays, when every tool has its cloud-based option, so do various analytics tools like Power BI or Qlik Sense. I don’t know all of them, but at least these two provide a solution that allows for establishing private connectivity between them and Databricks: &lt;strong&gt;gateways&lt;/strong&gt; — Power BI Gateway and Qlik Data Gateway, respectively. Such a gateway must be deployed in a VPC in the customer AWS account in a way that it can connect to Databricks privately through the front-end VPC Interface Endpoint, while the connection from the gateway to the given analytics tool is established using encryption and whitelisting mechanisms, so no endpoint on the AWS side is exposed publicly.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Graviton
&lt;/h2&gt;

&lt;p&gt;Why pay more if you can pay less?&lt;/p&gt;

&lt;p&gt;💡 AWS Graviton-powered EC2 instances for Databricks clusters are supported; however, &lt;a href="https://docs.databricks.com/en/compute/configure.html#graviton-limitations" rel="noopener noreferrer"&gt;limitations&lt;/a&gt; may make it useless in some use cases so make sure you know what they are before you calculate your ARR based on the assumption you can use them everywhere for anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logging and monitoring with CloudWatch
&lt;/h2&gt;

&lt;p&gt;To make sure all relevant logs go to CloudWatch where they can be retained and analyzed, especially when cluster nodes are usually transient, but also to make use of non-default EC2 performance metrics, you can leverage &lt;a href="https://docs.databricks.com/en/init-scripts/cluster-scoped.html" rel="noopener noreferrer"&gt;cluster-scoped init scripts&lt;/a&gt; to install and configure the &lt;strong&gt;CloudWatch agent&lt;/strong&gt;.&lt;br&gt;
Those init scripts must be whitelisted with the aforementioned Artifacts Allowlist before they can be used.&lt;/p&gt;
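&lt;p&gt;As an illustration, such an init script typically renders a CloudWatch agent configuration before starting the agent. A minimal sketch of generating one; the file path and log group name are illustrative assumptions, not Databricks defaults:&lt;/p&gt;

```python
import json

# Sketch: the kind of CloudWatch agent configuration a cluster-scoped init
# script might render before starting the agent. The log file path and log
# group name are illustrative, not Databricks defaults.

def agent_config(cluster_id, log_group="/databricks/cluster-logs"):
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [{
                        "file_path": "/databricks/driver/logs/*.log",
                        "log_group_name": log_group,
                        "log_stream_name": f"{cluster_id}/driver",
                    }]
                }
            }
        }
    }

print(json.dumps(agent_config("0101-123456-abcdefgh"), indent=2))
```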

&lt;p&gt;💡 Considering the compute plane can be configured with an instance profile, one can leverage that fact to configure a custom logger and, with properly set up IAM policies, stream logs directly to custom log groups.&lt;/p&gt;

&lt;p&gt;Databricks also produces &lt;strong&gt;audit logs&lt;/strong&gt; that can be stored in &lt;strong&gt;S3&lt;/strong&gt;. From there they can be pushed to CloudWatch (or elsewhere) for analysis and event management purposes.&lt;/p&gt;
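&lt;p&gt;A minimal sketch of what that downstream processing might look like: flattening an audit log record into the fields you’d typically index. The field names follow the audit log schema as I recall it (&lt;code&gt;serviceName&lt;/code&gt;, &lt;code&gt;actionName&lt;/code&gt;, &lt;code&gt;userIdentity&lt;/code&gt;), so verify them against the current reference:&lt;/p&gt;

```python
import json

# Sketch: flattening a Databricks audit log record (as delivered to S3) into
# a few fields for downstream analysis. Field names are assumptions based on
# the documented audit log schema; verify before relying on them.

def summarize(record):
    return {
        "time": record.get("timestamp"),
        "service": record.get("serviceName"),
        "action": record.get("actionName"),
        "user": (record.get("userIdentity") or {}).get("email"),
    }

raw = ('{"timestamp": 1700000000000, "serviceName": "accounts", '
       '"actionName": "login", "userIdentity": {"email": "someone@example.com"}}')
print(summarize(json.loads(raw)))
```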

&lt;h2&gt;
  
  
  Caveats 💡
&lt;/h2&gt;

&lt;p&gt;Here are some things it’s good to know sooner rather than later to avoid headaches or other surprises when using Databricks (in general):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It’s constantly evolving, so pay attention to the &lt;a href="https://docs.databricks.com/en/release-notes/product/index.html" rel="noopener noreferrer"&gt;release notes&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For the same reason, documentation is sometimes unclear, as it tries to cover both the new things and features that have since been deprecated but that some customers still use.&lt;/li&gt;
&lt;li&gt;Not every feature is available in all regions so make sure you know the &lt;a href="https://docs.databricks.com/en/resources/feature-region-support.html" rel="noopener noreferrer"&gt;coverage&lt;/a&gt; before picking one. For example, currently, serverless compute features are not yet available in the London region.&lt;/li&gt;
&lt;li&gt;Make sure you know the difference between roles, entitlements, permissions, and grants — it can become confusing sometimes.&lt;/li&gt;
&lt;li&gt;Using SQL to grant a given principal permissions to Databricks objects requires running compute resources, as every statement must be run somewhere; don’t expect to run queries directly against the control plane, as it’s not Kubernetes.&lt;/li&gt;
&lt;li&gt;Follow &lt;a href="https://docs.databricks.com/en/lakehouse-architecture/security-compliance-and-privacy/best-practices.html" rel="noopener noreferrer"&gt;Databricks security best practices&lt;/a&gt; and analyze carefully any weaknesses in your architecture that allow for data exfiltration. For example, make sure any egress traffic is controlled and inspected by using AWS Network Firewall or any other next-generation firewall solution.&lt;/li&gt;
&lt;li&gt;In case of issues or doubts, don’t hesitate to contact Databricks and get a hold of a Solution Architect who can help you understand things and even give a hint on how other customers go about some aspects.&lt;/li&gt;
&lt;li&gt;When using Terraform, refer to the &lt;a href="https://registry.terraform.io/providers/databricks/databricks/latest/docs" rel="noopener noreferrer"&gt;guides&lt;/a&gt; available in the registry, just don’t use them blindly and adapt to your design and standards.&lt;/li&gt;
&lt;li&gt;It’s not always a good idea to manage Databricks classic compute clusters themselves with IaC unless their configuration can remain unchanged. Instead, it’s worth considering keeping their definition in source control and getting them deployed with CI/CD running the Databricks CLI, while still managing all their dependencies in IaC.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, I’m not planning to keep this post updated with feature availability changes, so everything here can only be considered relevant at the time of its publication 😉&lt;/p&gt;

&lt;h2&gt;
  
  
  Lastly…
&lt;/h2&gt;

&lt;p&gt;There’s more to comprehend for a platform solution architect than you can imagine. To make it right, I’d suggest having an AWS expert working alongside a data architect, as that’s the best way to get all the different requirements and best practices considered and implemented properly.&lt;/p&gt;

&lt;p&gt;To help with that journey, you can have a look at the &lt;a href="https://www.databricks.com/learn/training/certification" rel="noopener noreferrer"&gt;Databricks certification program&lt;/a&gt; and &lt;a href="https://www.databricks.com/learn/training/login" rel="noopener noreferrer"&gt;Partner Academy&lt;/a&gt; to help you learn Databricks and optionally even get certified or achieve an accreditation.&lt;/p&gt;

</description>
      <category>databricks</category>
      <category>data</category>
      <category>aws</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Re-platforming to AWS Lambda container images</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Fri, 26 May 2023 20:00:00 +0000</pubDate>
      <link>https://forem.com/aws-builders/re-platforming-to-aws-lambda-container-images-1hn2</link>
      <guid>https://forem.com/aws-builders/re-platforming-to-aws-lambda-container-images-1hn2</guid>
<description>&lt;p&gt;This is an attempt at showing how event-driven backend processing services can be migrated, with minimum changes, from a container orchestration service to AWS Lambda powered by container images.&lt;br&gt;
The driver for this story was one of my recent migration projects, where a fleet of services running on an old EKS version had to be migrated to a brand-new environment running the latest versions of everything. During that journey, one of the weaknesses revealed in the architecture of the system in question was insufficient resistance to data loss due to the lack of loose coupling. That made me think… firstly, how a particular backend processing service could be redesigned accordingly, and secondly, whether we need EKS for this at all.&lt;br&gt;
For the sake of this story, let’s say the former has already been solved, so I’m going to focus on the latter, as one of the exercises I did was checking how difficult &lt;strong&gt;re-platforming a containerized service from Amazon ECS or Amazon EKS to AWS Lambda with container image support&lt;/strong&gt; is.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vr0t9ua2t9j3qoe3uop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vr0t9ua2t9j3qoe3uop.png" alt="AWS Lambda container images" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  EKS, ECS, and Lambda
&lt;/h2&gt;

&lt;p&gt;It’s important to emphasize again that I’m focusing on an event-driven backend processing service which is why this comparison is focused on a specific, but still common, use case rather than a holistic capabilities overview of these services. EKS, ECS, and Lambda are not mutually exclusive and each of these services is better at one thing while less optimal at the other. It’s usually a game of trade-offs anyway but depending on a scenario, workload type, number of services in an environment, etc., to come up with the best architecture those services from AWS can be combined or implemented interchangeably. My observations show that Lambda is barely considered whenever containerized workloads are in use, if at all. These days, containers usually equal Kubernetes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Event-driven architecture
&lt;/h3&gt;

&lt;p&gt;Conceptually, I’m a big fan of this type of architecture. Do only when there’s something to do and upon getting notified about it. Once done, turn off the lights and wait for another alarm to be woken up. Repeat forever. It’s efficient, sustainable, and cost-effective.&lt;br&gt;
While Lambda, along with other serverless services, is designed to be at the heart of event-driven solutions, both EKS and ECS can be used to deal with such requirements; however, not always efficiently enough and not without additional overhead or complexity.&lt;/p&gt;
&lt;h3&gt;
  
  
  Scaling
&lt;/h3&gt;

&lt;p&gt;One of the crucial capabilities of a service that is used to support event-driven architecture is dynamic and adaptive scaling. Lambda is known for its native scaling capabilities and handling of in-flight requests in parallel that can be controlled and supported with the use of the concurrency settings that were comprehensively described &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. A Lambda function can be triggered by the majority of relevant AWS services, if not all of them, and therefore perfectly adapts to unpredictable traffic events.&lt;br&gt;
In the world of Kubernetes, with EKS, there are add-ons (autoscalers) like &lt;em&gt;Keda&lt;/em&gt; that enable similar capabilities with the use of supported &lt;a href="https://keda.sh/docs/latest/scalers/" rel="noopener noreferrer"&gt;scalers&lt;/a&gt; and allow for pod scaling based on external metrics.&lt;br&gt;
ECS on the other hand supports so-called &lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-autoscaling-stepscaling.html" rel="noopener noreferrer"&gt;step scaling&lt;/a&gt; that leverages CloudWatch alarms and adjusts the number of service tasks in steps based on the size of the alarm breach. Interestingly, that type of scaling is not recommended as per the ECS Developer Guide, which is why I decided to give it a try as part of a PoC for this story. An alternative approach that I’ve seen people implement is a Lambda function, triggered periodically, that watches a given external metric and based on its value manipulates the number of running ECS tasks. Even then, it’s neither flexible nor fully automatic, as thresholds must be predefined.&lt;/p&gt;
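&lt;p&gt;That alternative approach boils down to logic along these lines; the messages-per-task ratio and the task bounds are illustrative assumptions, and the real function would then apply the result via the ECS API (e.g., an &lt;code&gt;update_service&lt;/code&gt; call):&lt;/p&gt;

```python
import math

# Sketch: the core sizing logic of a periodically triggered Lambda that scales
# an ECS service from queue depth. The ratio and bounds are illustrative; a
# real implementation would then update the service's desired count via ECS.

def desired_tasks(queue_depth, msgs_per_task=100, min_tasks=0, max_tasks=10):
    if queue_depth <= 0:
        return min_tasks
    return max(min_tasks, min(max_tasks, math.ceil(queue_depth / msgs_per_task)))

print(desired_tasks(0), desired_tasks(250), desired_tasks(5000))  # → 0 3 10
```

This is exactly the inflexibility mentioned above: the thresholds live in code (or configuration) and don’t adapt to traffic on their own.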
&lt;h2&gt;
  
  
  Migration, modernization, re-platforming
&lt;/h2&gt;

&lt;p&gt;While migrating more and more workloads to the cloud remains high on the list of initiatives organizations want to make progress on, according to &lt;a href="https://www.flexera.com/about-us/press-center/flexera-2023-state-of-the-cloud-report" rel="noopener noreferrer"&gt;Flexera 2023 State of the Cloud Report&lt;/a&gt; cost savings is the top one for the seventh year in a row. Moreover, 71% of heavy cloud users want to optimize the existing use of the cloud.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl6hm5d7hxgr4dyp8q21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl6hm5d7hxgr4dyp8q21.png" alt="Flexera1" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx14ajwuc7t7d2daso16h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx14ajwuc7t7d2daso16h.png" alt="Flexera2" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the same time, cloud spend along with security and expertise are the top three cloud challenges recognized.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8yelqlfq1u84sdjq8m5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk8yelqlfq1u84sdjq8m5.png" alt="Flexera3" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Therefore, a potential path leading to cost optimization that simultaneously can elevate security and doesn’t necessarily require extensive expertise is modernizing with the use of AWS Lambda.&lt;br&gt;
Depending on where you are with your cloud adoption journey and what the key business drivers are, you should choose the most appropriate migration strategy for your workloads. &lt;strong&gt;Modernization&lt;/strong&gt; is something you can go for straight away or consider as the next step after re-hosting first.&lt;br&gt;
&lt;strong&gt;Re-platforming&lt;/strong&gt; can turn out to be a golden mean as it allows for reshaping by leveraging cloud-native capabilities &lt;strong&gt;without modifying the application source code or its core architecture&lt;/strong&gt;. That way you can benefit from increased flexibility and resilience with reduced time and financial investments. Obviously, it doesn’t mean it is the way and any decision on implementing any strategy should be supported by relevant analysis.&lt;/p&gt;
&lt;h2&gt;
  
  
  PoC: ECS to Lambda
&lt;/h2&gt;

&lt;p&gt;Let’s imagine a scenario where the goal is to re-platform from ECS to Lambda with zero or minimal changes to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application code (&lt;em&gt;Python&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;container image build process (&lt;em&gt;Dockerfile&lt;/em&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All that to reduce the risks associated with the platform change itself.&lt;br&gt;
Why ECS and not EKS? Because for some reason I haven’t got much experience with it, hence I considered this a chance to see what it’s capable of in the given scenario and let myself expand my horizons.&lt;/p&gt;
&lt;h3&gt;
  
  
  HLD
&lt;/h3&gt;

&lt;p&gt;It’s about re-platforming from this…&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4adg4jw48elqz5ge3pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4adg4jw48elqz5ge3pk.png" alt="ECS HLD" width="800" height="587"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;to this…&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6n68tygbpyboxcn397s0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6n68tygbpyboxcn397s0.png" alt="Lambda HLD" width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application code&lt;/strong&gt;&lt;br&gt;
What this sample code below does is it receives messages from a given SQS queue and deletes them immediately. That’s it, just enough for this PoC.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;QUEUE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQS_QUEUE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MAX_NUM_MSGS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQS_MAX_NUM_MSGS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;VISIBILITY_TO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQS_VISIBILITY_TO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;WAIT_SECONDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQS_WAIT_SECONDS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;sqs_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;get_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;QUEUE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;MaxNumberOfMessages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MAX_NUM_MSGS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;VisibilityTimeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VISIBILITY_TO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;WaitTimeSeconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WAIT_SECONDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

    &lt;span class="n"&gt;messages_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Number of messages received: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;messages_length&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Message body: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;del_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;QUEUE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReceiptHandle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ReceiptHandle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deletion status code: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;del_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ResponseMetadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HTTPStatusCode&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dockerfile &amp;amp; RIC
&lt;/h3&gt;

&lt;p&gt;The only change the existing Dockerfile needed for Lambda to work with the container image was installing the &lt;strong&gt;AWS runtime interface client&lt;/strong&gt; (RIC) for Python: one additional line that still keeps the image generic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM python:3.10

WORKDIR /app

COPY requirements.txt  .
RUN pip install -r requirements.txt

RUN pip install awslambdaric

COPY app.py .

CMD ["python", "app.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
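&lt;p&gt;For comparison, if keeping the image generic were not a goal, the RIC could be baked in directly instead of relying on Lambda-side overrides. A hypothetical Lambda-only variant (assuming the handler function lives in &lt;em&gt;app.py&lt;/em&gt; as &lt;em&gt;handler&lt;/em&gt;) might look like this:&lt;/p&gt;

```dockerfile
# Hypothetical Lambda-only variant, NOT the generic image used in this post:
# the RIC becomes the entrypoint and the handler is passed as the command.
FROM python:3.10

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt awslambdaric

COPY app.py .

# awslambdaric implements the Lambda runtime API; "app.handler" is module.function
ENTRYPOINT ["python", "-m", "awslambdaric"]
CMD ["app.handler"]
```

&lt;p&gt;The trade-off is that such an image no longer runs as-is on ECS, which is exactly why the post keeps the overrides on the Lambda side.&lt;/p&gt;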



&lt;p&gt;RIC is an implementation of the Lambda runtime API: it extends a base image to make it Lambda-compatible by exposing an interface for receiving requests from, and sending responses to, the AWS Lambda service. While it can be installed for Python, Node.js, Go, Java, .NET, and Ruby, AWS also provides Amazon Linux-based container images for those runtimes &lt;a href="https://gallery.ecr.aws/lambda/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Even though they might be bigger than standard (&lt;em&gt;slim&lt;/em&gt;) base images, they are proactively cached by the Lambda service, so they don’t have to be pulled in their entirety every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda function configuration
&lt;/h3&gt;

&lt;p&gt;From the Lambda function configuration perspective, to instruct Lambda how to run the application code, both the &lt;em&gt;ENTRYPOINT&lt;/em&gt; and &lt;em&gt;CWD&lt;/em&gt; settings must be overridden as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2toq1p6n3sc2st0d4lt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2toq1p6n3sc2st0d4lt.png" alt="Lambda config" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that’s it. It’s ready to be invoked.&lt;/p&gt;
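&lt;p&gt;The same overrides can also be applied programmatically through the function’s image configuration. Below is a minimal boto3 sketch; the function name and paths are assumptions matching the Dockerfile above, and the actual API call is left commented out:&lt;/p&gt;

```python
# Sketch: overriding ENTRYPOINT and CWD for a container-image Lambda function.
# The function name and paths are hypothetical; adjust to your image layout.
image_config = {
    # Run the runtime interface client installed in the image...
    "EntryPoint": ["/usr/local/bin/python", "-m", "awslambdaric"],
    # ...and point it at the handler (module.function inside the working dir).
    "Command": ["app.handler"],
    "WorkingDirectory": "/app",
}

# import boto3
# lambda_client = boto3.client("lambda")
# lambda_client.update_function_configuration(
#     FunctionName="sqs-consumer",  # hypothetical function name
#     ImageConfig=image_config,
# )
```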

&lt;h3&gt;
  
  
  Parallel testing
&lt;/h3&gt;

&lt;p&gt;For that purpose, I used the SQS queue seeder Python code below and ran it from Lambda. Based on the settings provided, it generates random strings and sends them as messages to as many queues as you want; in this case there were two, one for ECS and one for Lambda.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# sqs_seeder.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# SQS queues
&lt;/span&gt;&lt;span class="n"&gt;SQS_QUEUE_URLS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# queue for ECS,
&lt;/span&gt;    &lt;span class="c1"&gt;# queue for Lambda,
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Settings
&lt;/span&gt;&lt;span class="n"&gt;NUMBER_OF_MSG_BATCHES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;MAX_MSGS_PER_BATCH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;MIN_SECONDS_BETWEEN_MSG_BATCHES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;MAX_SECONDS_BETWEEN_MSG_BATCHES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;

&lt;span class="n"&gt;sqs_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sqs_send_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msgs_count&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;MSGS_COUNT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msgs_count&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[INFO] Number of messages in batch: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MSGS_COUNT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;MSGS_COUNT&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;MESSAGE_CHAR_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;MESSAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ascii_uppercase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MESSAGE_CHAR_SIZE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;queue_url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;SQS_QUEUE_URLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[INFO] Sending message &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MESSAGE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;queue_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;queue_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;MessageBody&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MESSAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[INFO] Message sent, ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MessageId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ERROR] Queue URL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;MSGS_COUNT&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;NUMBER_OF_MSG_BATCHES&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[INFO] Number of message batches: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;NUMBER_OF_MSG_BATCHES&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;BATCH_NUMBER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;NUMBER_OF_MSG_BATCHES&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[INFO] Batch number #&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BATCH_NUMBER&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;NUMBER_OF_MSGS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MAX_MSGS_PER_BATCH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;sqs_send_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NUMBER_OF_MSGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;BATCH_NUMBER&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;NUMBER_OF_MSG_BATCHES&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;NUMBER_OF_MSG_BATCHES&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;SLEEP_TIME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MIN_SECONDS_BETWEEN_MSG_BATCHES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MAX_SECONDS_BETWEEN_MSG_BATCHES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[INFO] Sleeping &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;SLEEP_TIME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SLEEP_TIME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are the ECS service step scaling policies I configured. They didn’t do what I initially thought they would, but more on that later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwytixr8tulnn46w9rhbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwytixr8tulnn46w9rhbf.png" alt="ECS Autoscaling config" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;
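&lt;p&gt;For reference, a step scaling policy like the scale-out one above can be defined through the Application Auto Scaling API. The sketch below only assembles the request parameters — the cluster name, service name, and adjustment values are assumptions — and leaves the call itself commented out:&lt;/p&gt;

```python
# Sketch: an ECS step scaling policy adding 5 tasks when the backlog alarm fires.
# Resource names and adjustment values are hypothetical.
scale_out_policy = {
    "PolicyName": "sqs-backlog-scale-out",
    "ServiceNamespace": "ecs",
    "ResourceId": "service/demo-cluster/sqs-consumer",  # format: service/cluster-name/service-name
    "ScalableDimension": "ecs:service:DesiredCount",
    "PolicyType": "StepScaling",
    "StepScalingPolicyConfiguration": {
        # "ChangeInCapacity" corresponds to the Add action,
        # "ExactCapacity" to the Set action mentioned later in the post.
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [
            # From the alarm threshold upwards: add 5 tasks per alarm evaluation.
            {"MetricIntervalLowerBound": 0, "ScalingAdjustment": 5},
        ],
        "Cooldown": 60,
    },
}

# import boto3
# boto3.client("application-autoscaling").put_scaling_policy(**scale_out_policy)
```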

&lt;p&gt;The Lambda function was configured with reserved concurrency set to 1 to try to give ECS a head start.&lt;/p&gt;

&lt;p&gt;The screenshot below shows relevant metrics illustrating how messages were being loaded, received, and deleted from the two individual queues.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnix9hg62lrer5me5a864.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnix9hg62lrer5me5a864.png" alt="ECS vs. Lambda" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clearly, something wasn’t right with ECS step scaling; it just wasn’t what I was hoping for. I thought it would keep adding 5 tasks as the number of messages in the queue grew, but that wasn’t the case. Based on the &lt;strong&gt;&lt;em&gt;scale-out&lt;/em&gt;&lt;/strong&gt; action, I expected 15 tasks by the second run, yet the DesiredTaskCount remained at 5, so it looks like a one-off scaling operation that does not adapt over time. I admit I was misled by the Set/Add actions available there; I should have read more carefully and managed my expectations. Now I see why it wasn’t recommended.&lt;br&gt;
Either way, the metrics make it crystal clear that Lambda scaling is seamless and fast, while ECS needs time before the CloudWatch alarm kicks in. That is because the minimum period for AWS metrics is 1 minute, so the reaction is never immediate.&lt;br&gt;
Finally, for reasons unknown, the &lt;strong&gt;&lt;em&gt;scale-in&lt;/em&gt;&lt;/strong&gt; action didn’t work: it did not set the number of tasks to 0 even though the alarm history showed the action executed successfully. I had to do it by hand.&lt;/p&gt;

&lt;p&gt;When it comes to SQS itself, I learned one thing that seems important from a performance and efficiency point of view: even with &lt;em&gt;MaxNumberOfMessages&lt;/em&gt; set to 10 (the maximum), a ReceiveMessage API call will still return a single message most of the time when the queue doesn’t hold many messages. More on that &lt;a href="https://github.com/boto/boto3/issues/324" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
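&lt;p&gt;One way to work around that behaviour is to keep polling until the queue stops returning messages, rather than relying on a single call. Here is a small sketch, deliberately decoupled from boto3 so the polling logic stands on its own; &lt;em&gt;receive_fn&lt;/em&gt; stands in for a &lt;em&gt;sqs_client.receive_message&lt;/em&gt; call with your queue settings:&lt;/p&gt;

```python
def drain_queue(receive_fn, max_polls=10):
    """Aggregate messages across repeated ReceiveMessage calls, since a
    single call often returns fewer messages than MaxNumberOfMessages."""
    collected = []
    for _ in range(max_polls):
        messages = receive_fn().get("Messages", [])
        if not messages:
            break  # the queue looks empty; stop polling
        collected.extend(messages)
    return collected


# Example with a fake receiver returning two short batches and then nothing:
batches = [{"Messages": [{"Body": "A"}]}, {"Messages": [{"Body": "B"}]}, {}]
fake_receive = lambda it=iter(batches): next(it, {})
print(len(drain_queue(fake_receive)))  # 2
```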

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;While I had no intention of going deep into ECS step scaling and demystifying it, even if it had worked the way I expected, it would still perform worse than Lambda. However, &lt;strong&gt;this is not about judging but about knowing your options and their constraints&lt;/strong&gt;. It is also about recognizing the caveats of the integration points and their limitations. While one service may seem fit for purpose, something else might eventually have a decisive impact on the final design.&lt;br&gt;
The takeaway is that, thanks to container portability, such a comparison doesn’t have to be difficult to execute or require much effort before it yields conclusions that help in decision-making. A Lambda function powered by a container image is probably one of the easiest things to configure, so when there are no obvious obstacles, and especially when there are indications of potential improvement, why wouldn’t you give it a chance? The worst that can happen is that you’ll end up with additional arguments for your decision.&lt;br&gt;
Again, know your options! Don’t hesitate to try things out by running PoCs to find the best solution, and remember that modernization is a continuous improvement process that never ends. Technology, business requirements, functionalities, and KPIs keep changing over time, so continuously assess your solutions against the AWS Well-Architected Framework and strive for optimization.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>containers</category>
      <category>devops</category>
    </item>
    <item>
      <title>Amazon VPC Lattice — feasibility study</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Thu, 04 May 2023 19:36:24 +0000</pubDate>
      <link>https://forem.com/aws-builders/amazon-vpc-lattice-feasibility-study-d1i</link>
      <guid>https://forem.com/aws-builders/amazon-vpc-lattice-feasibility-study-d1i</guid>
      <description>&lt;p&gt;Amazon VPC Lattice has now become generally available (March 2023) and finally, I managed to give it a try and see whether it would meet the expectations it had aroused back at AWS re:Invent 2022. There were quite a few of them, e.g. having the ability to avoid using VPC peerings or VPC service endpoints to facilitate cross-account, cross-VPC applications communication while separating the core networking management from individual services configuration across the estate, as well as easily defining and attaching services to a wider application networking mesh.&lt;/p&gt;

&lt;p&gt;Just in case…&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Amazon VPC Lattice is a fully managed application networking service that you use to connect, secure, and monitor all of your services across multiple accounts and virtual private clouds (VPCs).” ~ AWS&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Need to know more? Check it out &lt;a href="https://docs.aws.amazon.com/vpc-lattice/latest/ug/what-is-vpc-service-network.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;; the rest of this story assumes you know the basics and focuses on presenting the results of my early experimentation with VPC Lattice, as well as sharing my findings and opinions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof of concept
&lt;/h2&gt;

&lt;p&gt;The idea was simple — I wanted to test out VPC Lattice functional capabilities as much as possible with just a light touch on the non-functional ones. And so I ended up building this…&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatlf92ghqpkiqyagtlur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatlf92ghqpkiqyagtlur.png" alt="Image description" width="800" height="1057"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What you can see above is a set-up containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;two AWS accounts being a part of the same AWS organization&lt;/li&gt;
&lt;li&gt;two VPC Lattice service networks where &lt;strong&gt;Test #1&lt;/strong&gt; is shared through RAM&lt;/li&gt;
&lt;li&gt;two VPC Lattice services where &lt;strong&gt;test-svc-1&lt;/strong&gt; is shared through RAM for association with the &lt;strong&gt;Test #2&lt;/strong&gt; service network&lt;/li&gt;
&lt;li&gt;three VPCs associated with service networks where &lt;strong&gt;VPC .103&lt;/strong&gt; is client-only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;test-svc-1&lt;/strong&gt; service with several different target types&lt;/li&gt;
&lt;/ul&gt;
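&lt;p&gt;The RAM sharing mentioned above can be scripted too. A sketch assembling a resource share for the &lt;strong&gt;Test #1&lt;/strong&gt; service network is shown below; the ARN and account ID are placeholders, and the actual call is left commented out:&lt;/p&gt;

```python
# Sketch: sharing a VPC Lattice service network through AWS RAM.
# The ARN and the principal account ID are hypothetical placeholders.
resource_share = {
    "name": "test-1-service-network-share",
    "resourceArns": [
        "arn:aws:vpc-lattice:eu-west-1:111111111111:servicenetwork/sn-0123456789abcdef0"
    ],
    "principals": ["222222222222"],  # the consumer account
    "allowExternalPrincipals": False,  # keep the share within the AWS organization
}

# import boto3
# boto3.client("ram").create_resource_share(**resource_share)
```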

&lt;p&gt;Note: the Lambda-powered curlers in all three VPCs are there to facilitate sending custom HTTP requests to any VPC Lattice service for testing purposes. Need that Lambda function source code to try it yourself? It’s not sophisticated, but it does the job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import http.client
import ssl
from urllib.parse import urljoin

###################################################################################
REQ_PROTOCOL = '&amp;lt;???&amp;gt;'                            # options: 'http', 'https'
REQ_HOST     = '&amp;lt;???&amp;gt;'                            # e.g. myservice.mydomain.aws
REQ_PATH     = '/'                                # e.g. '/lambda80', '/lambda443'
REQ_HEADERS  = {                                  # a map of custom headers
    # "My-Header": "lambda-hh",                       
    "Content-Type": "application/json"
}
###################################################################################

def get(protocol, host, path, headers, event):
    """GET request"""

    if protocol == "https":
        conn = http.client.HTTPSConnection(host, context = ssl._create_unverified_context())
    elif protocol == "http":
        conn = http.client.HTTPConnection(host)
    else:
        return "[ERR] Unknown protocol provided!"

    try:
        conn.request('GET', path, json.dumps(event), headers)
        res = conn.getresponse()
        location_header = res.getheader("location")

        if location_header is not None:
            location = urljoin(path, location_header)
            # print(location)
            return get(protocol, host, location, headers, event)

        data = res.read()
    except Exception as error:
        return f"[ERR] {str(error)}"

    return data

def lambda_handler(event, context):
    """Lambda handler"""

    response = get(
        REQ_PROTOCOL,
        REQ_HOST,
        REQ_PATH,
        REQ_HEADERS,
        event
    )

    return {
        "statusCode": 200,
        "body": response,
        "headers": {
            "Content-Type": "application/json"
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  VPC Lattice service
&lt;/h2&gt;

&lt;p&gt;That &lt;strong&gt;test-svc-1&lt;/strong&gt; is the main element, though, and most of the focus was put on the VPC Lattice service and service network configuration aspects, as well as cross-account and cross-VPC communication to services. It privately exposes several microservices under different combinations of protocols, ports, and paths, backed by various AWS compute services and configured with the following target groups (TGs):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 in ASG&lt;/li&gt;
&lt;li&gt;ALB with EC2 in ASG&lt;/li&gt;
&lt;li&gt;ALB with ECS powered by Fargate&lt;/li&gt;
&lt;li&gt;Lambda functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apart from one Lambda function, everything else runs in private subnets across multiple availability zones.&lt;/p&gt;

&lt;p&gt;Guess what!? It all worked very nicely!&lt;br&gt;
But hey! Doesn’t that configuration and its elements look familiar?&lt;br&gt;
To me, it did, hence I’m going to risk the following statement…&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Amazon VPC Lattice service is an implementation of a private Application Load Balancer with cross-account- and auth-related features in mind.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  VPC Lattice service vs. ALB
&lt;/h3&gt;

&lt;p&gt;First of all, VPC Lattice is meant to satisfy application-layer load balancing with weighted targets and blue/green (B/G) deployment support.&lt;br&gt;
Moreover, let’s have a look at the target group’s target types available in both cases:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tvm0xk6bkixr5nojk77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tvm0xk6bkixr5nojk77.png" alt="VPC Lattice service TG vs. EC2 (ALB) TG configuration options" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, both TGs support pretty much the same target types, with only small differences, which doesn’t necessarily mean a given option is missing. E.g. even though &lt;strong&gt;Amazon EC2 Auto Scaling&lt;/strong&gt; is not explicitly mentioned in the Instances section, I managed to successfully attach an ASG as per the diagram above, because that option simply exists:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivlknqmbcqf8syck4v4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivlknqmbcqf8syck4v4i.png" alt="Image description" width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yes, that same ASG can be attached to an ALB and a VPC Lattice service at the same time. Nothing surprising here, as a given service may need to be accessible not only privately to another service but also to clients over a public network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffv7hh6vdyk8ysnkbmpck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffv7hh6vdyk8ysnkbmpck.png" alt="Image description" width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moreover, the VPC Lattice service routing configuration consists of listeners and rules, just like in the case of an ALB; however, the only available action is forwarding, with no request manipulation whatsoever.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyn7lvcbqettpp0mhglx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyn7lvcbqettpp0mhglx.png" alt="Image description" width="800" height="803"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, the cost of just running a VPC Lattice service vs. an ALB is almost identical. For example, in the Ireland (eu-west-1) region, it’s $0.0275 vs. $0.0252 per hour, respectively.&lt;/p&gt;
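&lt;p&gt;As a rough sketch of what those hourly rates mean per month (assuming ~730 hours a month and ignoring per-GB data processing charges, which differ between the two services):&lt;/p&gt;

```shell
# Rough monthly base cost at the eu-west-1 hourly rates quoted above
# (~730 hours/month; data processing charges are excluded).
lattice=$(awk 'BEGIN { printf "%.2f", 0.0275 * 730 }')
alb=$(awk 'BEGIN { printf "%.2f", 0.0252 * 730 }')
echo "VPC Lattice service: ~\$${lattice}/month"
echo "ALB: ~\$${alb}/month"
```

So the base-cost difference is in the order of a couple of dollars per month.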
&lt;h3&gt;
  
  
  EKS integration
&lt;/h3&gt;

&lt;p&gt;I believe the Gateway API and Amazon EKS form a more specific use case that deserves separate treatment, therefore it’s not in the scope of this study.&lt;br&gt;
For those interested though, I can say that there is an &lt;strong&gt;AWS Gateway API controller&lt;/strong&gt; that is meant to let you connect services across different EKS clusters by leveraging Amazon VPC Lattice, and more info on how to do that can be found &lt;a href="https://www.gateway-api-controller.eks.aws.dev/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Admin vs. Developer
&lt;/h2&gt;

&lt;p&gt;A key selling point when VPC Lattice was announced was that it finally enables a separation of duties between network administrators and developers, who can now freely define and manage services themselves.&lt;br&gt;
This reminds me of the time shortly after the introduction of AWS Lambda and the foreshadowing of a no-Ops era. In this case too, time will tell whether configuring VPC Lattice services with no admin involvement is doable or not.&lt;br&gt;
Either way, the idea is that developers simply get a VPC in which to develop and define their services, and then share those services with admins who control VPC Lattice service networks by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;associating services and VPCs to networks based on requirements,&lt;/li&gt;
&lt;li&gt;sharing service networks with AWS accounts or organizations,&lt;/li&gt;
&lt;li&gt;enforcing authentication on service network access.&lt;/li&gt;
&lt;/ul&gt;
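&lt;p&gt;As a sketch of what that admin workflow could look like with the AWS CLI (all names and identifiers below are made up for illustration):&lt;/p&gt;

```shell
# Hypothetical sketch; all names and IDs are placeholders.

# Create a service network that enforces IAM-based auth on access.
aws vpc-lattice create-service-network \
  --name platform-sn \
  --auth-type AWS_IAM

# Associate a developer-owned service with the service network.
aws vpc-lattice create-service-network-service-association \
  --service-network-identifier sn-0123456789abcdef0 \
  --service-identifier svc-0123456789abcdef0

# Associate a consumer VPC (with its security group) with the service network.
aws vpc-lattice create-service-network-vpc-association \
  --service-network-identifier sn-0123456789abcdef0 \
  --vpc-identifier vpc-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0
```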
&lt;h3&gt;
  
  
  Sharing
&lt;/h3&gt;

&lt;p&gt;Both VPC Lattice services and service networks can be shared with the use of AWS Resource Access Manager (RAM). While services are shared so they can be associated with different service networks, service networks are shared so that other principals can associate their VPCs and communicate with the services associated with those networks.&lt;br&gt;
To better understand what can and cannot be done as a shared resource owner and/or consumer, see the “&lt;a href="https://docs.aws.amazon.com/vpc-lattice/latest/ug/sharing.html#sharing-perms" rel="noopener noreferrer"&gt;Responsibilities and permissions for shared resources&lt;/a&gt;” section in the docs, as the information there can become very helpful when designing more complex, enterprise-grade architectures and strategies.&lt;/p&gt;
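&lt;p&gt;For example, sharing a service network with another account through RAM could look roughly like this (the resource ARN and the account ID below are placeholders):&lt;/p&gt;

```shell
# Hypothetical sketch; the resource ARN and the principal are placeholders.
aws ram create-resource-share \
  --name lattice-service-network-share \
  --resource-arns arn:aws:vpc-lattice:eu-west-1:111122223333:servicenetwork/sn-0123456789abcdef0 \
  --principals 444455556666
```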
&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;The rules defining allowed network communication between individual services living in VPCs are applied with the use of &lt;strong&gt;security groups&lt;/strong&gt; (SGs) configured for every VPC-to-service-network association. The VPC Lattice IPv4 and IPv6 managed prefix lists, on the other hand, simplify the other part of that set-up: the security rules for clients and targets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkp969znxv2bu2f9tjzk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkp969znxv2bu2f9tjzk.png" alt="Image description" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another element of the wider security picture are &lt;strong&gt;auth policies&lt;/strong&gt;, which can be applied at both the service network and the service level. They are represented by IAM policy documents and are meant to control, in a more granular way, which principal has access to which service or group of services.&lt;br&gt;
Both SGs and auth policies are optional but recommended.&lt;/p&gt;
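&lt;p&gt;As a hedged sketch, applying an auth policy to a single service could look like this (the service identifier and the role ARN are placeholders; the policy allows only one specific role to invoke the service):&lt;/p&gt;

```shell
# Hypothetical sketch; the service ID and the role ARN are placeholders.
aws vpc-lattice put-auth-policy \
  --resource-identifier svc-0123456789abcdef0 \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/consumer-role" },
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "*"
    }]
  }'
```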
&lt;h3&gt;
  
  
  Logging and monitoring
&lt;/h3&gt;

&lt;p&gt;While all the out-of-the-box features are nicely described in the docs, one thing worth emphasizing is that both VPC Lattice service and service network access logs can be streamed concurrently to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch Log Group&lt;/li&gt;
&lt;li&gt;S3 bucket&lt;/li&gt;
&lt;li&gt;Kinesis Data Firehose delivery stream&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Awesome! I would love to see that for every AWS service, including the ALB.&lt;/p&gt;
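&lt;p&gt;Each destination is configured as a separate access log subscription, so delivering service network logs to all three concurrently could be sketched as follows (all ARNs below are placeholders):&lt;/p&gt;

```shell
# Hypothetical sketch; all ARNs below are placeholders.
SN_ARN=arn:aws:vpc-lattice:eu-west-1:111122223333:servicenetwork/sn-0123456789abcdef0

aws vpc-lattice create-access-log-subscription \
  --resource-identifier "$SN_ARN" \
  --destination-arn arn:aws:logs:eu-west-1:111122223333:log-group:/vpclattice/test-sn

aws vpc-lattice create-access-log-subscription \
  --resource-identifier "$SN_ARN" \
  --destination-arn arn:aws:s3:::my-lattice-logs-bucket

aws vpc-lattice create-access-log-subscription \
  --resource-identifier "$SN_ARN" \
  --destination-arn arn:aws:firehose:eu-west-1:111122223333:deliverystream/lattice-logs
```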

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3mji0h2f58al0tiokzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3mji0h2f58al0tiokzq.png" alt="[test-svc-1] CloudWatch Logs Insights query output" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  VPC Lattice wrap-up
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;As the Amazon VPC Lattice service is still new, there are some caveats and limitations one should know about. One of them, claimed to be temporary, is that the console only allows creating listener rules with an exact-match path condition (case insensitive). To configure an HTTP header match condition for my PoC, I had to use the following AWS CLI command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ aws vpc-lattice create-rule \
--name lambda-hh \
--service-identifier svc-0ce77bf32833f5b5b \
--listener-identifier listener-0f287b2d41f2cb905 \
--action '{ "forward": { "targetGroups": [ { "targetGroupIdentifier": "tg-0096d77adfb1bcd29", "weight": 1 } ] } }' \
--match '{ "httpMatch": { "headerMatches": [ { "caseSensitive": false, "match": { "contains": "lambda-hh" }, "name": "My-Header" } ] } }' \
--priority 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By the way, don’t expect to match the Host header with such a rule; it’s not like an ALB with host-based routing.&lt;br&gt;
However, creating that rule through the API resulted in the associated &lt;strong&gt;listener no longer being editable&lt;/strong&gt; from the console (see the screenshot above; the “Edit listener” button is greyed out).&lt;/p&gt;

&lt;p&gt;With the abilities delivered by VPC Lattice, your appetite for more complex and sophisticated scenarios may grow, so don’t forget that &lt;strong&gt;every VPC can be associated with only a single service network at a time&lt;/strong&gt;! Therefore, it’s important to design your architecture accordingly.&lt;/p&gt;

&lt;p&gt;When working with &lt;strong&gt;Lambda functions as targets&lt;/strong&gt;, as long as your function doesn’t require access to your custom VPC resources, there’s no need to configure it with a VPC. Even when you send a request via a VPC Lattice service to a Lambda function set up with a VPC, it won’t use the associated ENI in that VPC for communication. Instead, it will always communicate with your function via the Lambda API in the region where the function is located, which is clearly visible in the CW Logs Insights query output screenshot above (see the entries without a destinationVpcId).&lt;br&gt;
Need more details on that? Have a look at my “&lt;a href="https://medium.com/faun/aws-lambda-security-paradox-3002475dbe97" rel="noopener noreferrer"&gt;Lambda security paradox&lt;/a&gt;” story from 2019.&lt;/p&gt;

&lt;p&gt;There is also one thing that I found very easy to miss. Even though the docs say you should have rules allowing traffic from clients to VPC Lattice, one may whitelist only the clients on the other side of the network and forget about the &lt;strong&gt;&lt;em&gt;local&lt;/em&gt; clients living in the VPC associated with the service network&lt;/strong&gt;. In other words, when configuring a client (like the &lt;strong&gt;test-svc-curler&lt;/strong&gt; Lambda function in &lt;strong&gt;VPC .103&lt;/strong&gt;) you must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[Lambda SG] allow outbound traffic to the VPC Lattice prefix list&lt;/li&gt;
&lt;li&gt;[VPC SG] allow inbound traffic from the Lambda SG&lt;/li&gt;
&lt;/ul&gt;
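&lt;p&gt;Those two bullets translate roughly into the following CLI calls (all IDs, including the prefix list ID, are placeholders; the actual VPC Lattice prefix list ID can be found among the managed prefix lists in your region):&lt;/p&gt;

```shell
# Hypothetical sketch; all IDs below are placeholders.

# [Lambda SG] allow outbound HTTPS to the VPC Lattice managed prefix list.
aws ec2 authorize-security-group-egress \
  --group-id sg-0aaaabbbbccccdddd \
  --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,PrefixListIds=[{PrefixListId=pl-0123456789abcdef0}]'

# [VPC association SG] allow inbound HTTPS from the Lambda SG.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0eeeeffff00001111 \
  --protocol tcp --port 443 \
  --source-group sg-0aaaabbbbccccdddd
```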

&lt;p&gt;Last but not least, make sure you know what the default &lt;strong&gt;service quotas&lt;/strong&gt; are and which ones can be adjusted upon request.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1stwfnqpg6afpv5ziee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1stwfnqpg6afpv5ziee.png" alt="Image description" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Expectations
&lt;/h3&gt;

&lt;p&gt;After watching the re:Invent introductory video, I expected a bit more when it comes to path-based routing. Namely, I thought it would allow defining the entire application routing at the VPC Lattice service level regardless of the target type. With forwarding as the only available action, when using EC2 or ECS as targets the path is always passed through to the backend, which must take that information into account and know how to handle such requests without throwing a 404.&lt;br&gt;
While that is not a problem when using Lambda or the Gateway API for EKS, I thought it would be great to avoid having to keep route mappings in sync between infrastructure and application code, especially when they may live in different repositories and have independent deployment pipelines.&lt;/p&gt;

&lt;p&gt;And hey, where’s private Amazon API Gateway target support!?&lt;/p&gt;

&lt;h3&gt;
  
  
  The good, the great, and the awesome!
&lt;/h3&gt;

&lt;p&gt;Either way, I have no doubt that Amazon VPC Lattice is a superb improvement over what previously had to be put in place to satisfy similar private, cross-account, cross-VPC service-to-service comms. While there’s always room for enhancement and more features, which I’m sure AWS will introduce over time based on customers’ feedback, it has already made things easier by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simplifying service-to-service cross-VPC comms,&lt;/li&gt;
&lt;li&gt;mitigating the IP overlap issue,&lt;/li&gt;
&lt;li&gt;enhancing service-to-service comms security,&lt;/li&gt;
&lt;li&gt;facilitating B/G deployments or A/B testing,&lt;/li&gt;
&lt;li&gt;supporting migration and modernization activities,&lt;/li&gt;
&lt;li&gt;and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, a great thing about VPC Lattice is that services can be associated with many service networks at the same time, which provides maximum flexibility and extensibility. I can’t wait to start using it on future projects!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>networking</category>
      <category>aws</category>
      <category>devops</category>
    </item>
    <item>
      <title>Amazon EKS with Terraform and GitOps in minutes</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Mon, 28 Nov 2022 11:17:27 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-eks-with-terraform-and-gitops-in-minutes-2ncp</link>
      <guid>https://forem.com/aws-builders/aws-eks-with-terraform-and-gitops-in-minutes-2ncp</guid>
      <description>&lt;p&gt;This one is simply a result of a need that I had and that was about getting a fully functional, flexible, and secure Amazon EKS cluster set up in under half an hour to be able to test anything asap. For that, I did not want to spend too much time developing IaC myself as there are so many great sources out there that are worth supporting rather than reinventing the wheel. The force is there in the community and as an AWS Community Builder I came across something that met my expectations hence I’m sharing my experience hoping you may find it helpful too.&lt;/p&gt;

&lt;p&gt;It is meant to get you your EKS cluster while you can go buy yourself a coffee ☕️&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwwn67eykcgq89837je9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwwn67eykcgq89837je9.png" alt="Image description" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This time I will start the other way around and go straight to the solution, while context and other details can be found down below.&lt;br&gt;
The only thing to reveal at this stage is that I’m leveraging &lt;a href="http://aws-ia.github.io/terraform-aws-eks-blueprints" rel="noopener noreferrer"&gt;Amazon EKS Blueprints for Terraform&lt;/a&gt; 🚀&lt;/p&gt;
&lt;h2&gt;
  
  
  MVP
&lt;/h2&gt;

&lt;p&gt;While one can use the flexibility of the EKS Blueprints solution to set things up in many different ways depending on individual requirements, I’ve got a minimal/initial configuration I start with, which consists of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the control plane with &lt;em&gt;whitelisted&lt;/em&gt; public access,&lt;/li&gt;
&lt;li&gt;the data plane (spot EC2 instances) communicating with the control plane privately,&lt;/li&gt;
&lt;li&gt;all EKS-managed add-ons enabled and using the most recent versions,&lt;/li&gt;
&lt;li&gt;ArgoCD publicly accessible (&lt;em&gt;whitelisted&lt;/em&gt;) through an ALB configured with a Route53 domain and an ACM certificate,&lt;/li&gt;
&lt;li&gt;a set of additional add-ons deployed with the use of ArgoCD and following the GitOps approach.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following extra add-ons are enabled by default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cluster autoscaler&lt;/li&gt;
&lt;li&gt;AWS load balancer controller&lt;/li&gt;
&lt;li&gt;External DNS&lt;/li&gt;
&lt;li&gt;FluentBit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7ce75r4kf0av8isbtnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7ce75r4kf0av8isbtnl.png" alt="Image description" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s the code that sets everything up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sebolabs/eks-tf-gitops/tree/release-1-0-0" rel="noopener noreferrer"&gt;EKS-TF-GITOPS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s opinionated, however, I believe it’s a perfect starting point where you have a fully functional Kubernetes cluster with GitOps support and can immediately start deploying and testing anything you want.&lt;/p&gt;

&lt;p&gt;The Terraform code (/terraform) in this repo consists of three components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;account&lt;/strong&gt; (optional) — covers S3 bucket for storing logs, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;core&lt;/strong&gt; — covers networking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;k8s&lt;/strong&gt; — covers EKS cluster configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, there’s a K8s configuration (/k8s) covering add-ons that ArgoCD periodically reads to keep things set up as declared in Git, the GitOps way. If you decide to use that code, simply look for TODOs and provide values relevant to your set-up.&lt;/p&gt;
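&lt;p&gt;Assuming the component layout described above, bringing everything up boils down to applying the components in order; a minimal sketch:&lt;/p&gt;

```shell
# Sketch of the apply order; configure backends/variables first,
# and skip the optional "account" component if you don't need it.
for component in account core k8s; do
  (
    cd "terraform/${component}" || exit 1
    terraform init
    terraform apply -auto-approve
  )
done
```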

&lt;p&gt;Finally, after running Terraform and then going to get your well-deserved coffee…&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9u3gtdk16bi3j1t568xo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9u3gtdk16bi3j1t568xo.png" alt="Image description" width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;… it’s there, up and running!&lt;/p&gt;

&lt;p&gt;It needs a couple of minutes to deploy the add-ons automatically, including the &lt;strong&gt;AWS Load Balancer controller&lt;/strong&gt; and &lt;strong&gt;External DNS&lt;/strong&gt; responsible for exposing the ArgoCD UI publicly.&lt;br&gt;
Then, you just have to retrieve ArgoCD's initial admin password…&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl -n argocd get secret argocd-initial-admin-secret \
    -o jsonpath=”{.data.password}” | base64 -d 

n20x3mwZoapDv9JC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
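&lt;p&gt;If the decoding step looks unfamiliar: the secret value comes back base64-encoded, and &lt;code&gt;base64 -d&lt;/code&gt; simply reverses that. A quick local roundtrip with an example string (no cluster required):&lt;/p&gt;

```shell
# Local demo of the decoding step; "example-password" stands in
# for the real secret value.
encoded=$(printf '%s' "example-password" | base64)
printf '%s' "$encoded" | base64 -d   # prints: example-password
```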



&lt;p&gt;…and you can log in 😎&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpk7bmyv6x4volrot78l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpk7bmyv6x4volrot78l.png" alt="Image description" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  App of Apps
&lt;/h2&gt;

&lt;p&gt;Now there’s that first ArgoCD application, called add-ons, to which the enabled core K8s controllers belong.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuhgwciu2ianc0fszx7f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuhgwciu2ianc0fszx7f.png" alt="Image description" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To learn more about the App of Apps pattern in ArgoCD check this &lt;a href="https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logging
&lt;/h3&gt;

&lt;p&gt;Apart from the cluster logs that were enabled, the logs from all the pods also get nicely delivered to CloudWatch and can be easily queried with Logs Insights. See some examples below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqk0141v6lgxu60cqae3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqk0141v6lgxu60cqae3.png" alt="Image description" width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdu6n51ftxan6v2f9anz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdu6n51ftxan6v2f9anz.png" alt="Image description" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  EKS Blueprints
&lt;/h2&gt;

&lt;p&gt;Now, let’s get to the roots of the solution…&lt;/p&gt;

&lt;p&gt;EKS Blueprints helps you compose complete EKS clusters that are fully bootstrapped with the operational software that is needed to deploy and operate workloads. With EKS Blueprints, you describe the configuration for the desired state of your EKS environment, such as the control plane, worker nodes, and Kubernetes add-ons, as an IaC blueprint.&lt;/p&gt;

&lt;p&gt;Looks like a sponsored advertisement? Maybe, but it’s not!&lt;br&gt;
What I was after personally is something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;follows best practices&lt;/li&gt;
&lt;li&gt;is flexible and extensible&lt;/li&gt;
&lt;li&gt;is being actively supported&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First of all, EKS Blueprints turned out not to be an open-source project supported only by some K8s and AWS enthusiasts. It’s the result of cooperation between AWS representatives, their partners, and others, as an answer to customers’ needs. That, I believe, has defined the direction and shaped the foundation of what the project represents. One of its pillars is that it follows AWS Well-Architected Framework best practices and therefore lets you focus more on the functional side of your set-up.&lt;br&gt;
Secondly, it supports a wide and constantly growing range of Kubernetes &lt;a href="http://aws-ia.github.io/terraform-aws-eks-blueprints/add-ons/" rel="noopener noreferrer"&gt;add-ons&lt;/a&gt;, and it can be deployed either with Terraform or with the AWS CDK, probably the two most popular tools out there.&lt;br&gt;
Then, it implements the so-called &lt;a href="https://aws-ia.github.io/terraform-aws-eks-blueprints/add-ons/#gitops-bridge" rel="noopener noreferrer"&gt;GitOps bridge&lt;/a&gt; that takes care of configuring resources (e.g. IAM roles and service accounts) to satisfy the add-ons’ functional requirements.&lt;br&gt;
Lastly, the already mentioned growing community (people using it for real, professionals battle-testing it on real projects, in many different use cases and for other purposes) made me realize it is not an ephemeral thing and that enough quality is there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caveats
&lt;/h2&gt;

&lt;p&gt;Things one should be aware of when using EKS Blueprints…&lt;/p&gt;

&lt;p&gt;There are multiple module calls happening behind the scenes, due to the fact that EKS Blueprints supports quite a wide range of various controllers/add-ons, which ultimately makes the Terraform initialization last a bit longer (~3 minutes).&lt;/p&gt;

&lt;p&gt;When configuring private connectivity between the data plane and the API server endpoint, sometimes things don’t work at the beginning. AWS recommends that if your endpoint does not resolve to a private IP address within the VPC, you enable public access and then disable it again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html" rel="noopener noreferrer"&gt;Amazon EKS cluster endpoint access control&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deleting K8s namespaces created with Terraform is not that straightforward so check the link below to get yourself unblocked just in case.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/eks-terminated-namespaces/" rel="noopener noreferrer"&gt;Troubleshoot terminated Amazon EKS namespaces&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beware! Terraform doesn’t know about AWS resources provisioned by K8s controllers running on the cluster. Make sure you tidy up after running &lt;code&gt;terraform destroy&lt;/code&gt; or know what you should do to make the controllers delete relevant resources before destroying your infrastructure. The last thing you want is to have a couple of ALBs hanging there and costing you $16–$20 per month each.&lt;/p&gt;
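&lt;p&gt;A sketch of that tidy-up, assuming the ALBs were created by the AWS Load Balancer controller from Ingress resources:&lt;/p&gt;

```shell
# Sketch: let the K8s controllers deprovision their AWS resources first.
# Deleting the Ingress objects makes the AWS Load Balancer controller
# remove the ALBs it created; give it time before destroying the infra.
kubectl delete ingress --all --all-namespaces
sleep 120
terraform destroy
```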

</description>
      <category>blockchain</category>
      <category>cryptocurrency</category>
    </item>
    <item>
      <title>AWS Landing Zone: Hybrid networking</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Thu, 07 Jul 2022 08:32:27 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-landing-zone-hybrid-networking-ele</link>
      <guid>https://forem.com/aws-builders/aws-landing-zone-hybrid-networking-ele</guid>
      <description>&lt;p&gt;Previously, I fleshed out the core aspects of AWS Control Tower managed landing zone and brought closer how to approach accounts baselining to maintain consistency and elevate the security level across the estate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://faun.pub/aws-landing-zone-1-expanding-control-tower-managed-estate-d6dd52e5b06c" rel="noopener noreferrer"&gt;AWS Landing Zone #1: Expanding Control Tower managed estate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://faun.pub/aws-landing-zone-2-control-tower-account-factory-and-baselining-debd72d2167" rel="noopener noreferrer"&gt;AWS Landing Zone #2: Control Tower Account Factory and baselining&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This one is more of a try-out of how hybrid networking could potentially be set up, including hybrid DNS. Why so? Because it really depends on the individual requirements of an organisation. Such requirements can revolve around various aspects like security, scalability, performance, etc., so the final architecture should be carefully considered to make sure the correct model is applied. Otherwise, one may end up with a configuration that won’t fit in the long term, while modifications to such a fundamental matter can turn out to be very costly in many ways.&lt;br&gt;
I decided to define objectives that could fit a wide range of possible use cases and set things up myself, to gain even more experience with the recent AWS network services as well as share my thoughts, as usual.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid networking
&lt;/h2&gt;

&lt;p&gt;Hybrid networking is nothing more than connecting on-premises networks with those in the cloud in a secure and performant way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivdete7r5l6iho67iuck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivdete7r5l6iho67iuck.png" alt="Image description" width="713" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To establish hybrid network connectivity three elements are required:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AWS hybrid connectivity service (Virtual Private Gateway, Transit Gateway, Direct Connect Gateway)&lt;/li&gt;
&lt;li&gt;Hybrid network connection (AWS managed or software VPN, Direct Connect)&lt;/li&gt;
&lt;li&gt;On-prem customer gateway&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Objectives
&lt;/h3&gt;

&lt;p&gt;As my on-prem network I simply chose my home network, so there was only one set of building blocks I could use, and these were respectively:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AWS Transit Gateway (TGW)&lt;/li&gt;
&lt;li&gt;AWS Managed Site-to-Site VPN (S2S VPN)&lt;/li&gt;
&lt;li&gt;StrongSwan @ RaspberryPi&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The functional objectives were to get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;centralised TGW in the Networking account shared across the Organisation with the use of AWS Resource Access Manager (RAM)&lt;/li&gt;
&lt;li&gt;centralised egress routing via NAT Gateway (NGW) and Internet Gateway (IGW) living in the Networking account&lt;/li&gt;
&lt;li&gt;hybrid DNS&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hybrid DNS
&lt;/h3&gt;

&lt;p&gt;DNS is a critical component of every network. I wanted to make sure that I can resolve my local/home DNS domain (&lt;strong&gt;sebolabs.home&lt;/strong&gt;), hosted on my Synology NAS, from AWS accounts across my Organization, and at the same time be able to resolve the Route53 (R53) Private Hosted Zones’ records configured in those accounts.&lt;br&gt;
The challenge here was the decision itself, which had to be made so as to get it all set up in the least complicated way while keeping in mind the associated costs, given that the R53 resolver endpoints are not the cheapest services out there. They cannot be avoided though, as the VPC’s native R53 resolver (.2) is not reachable from outside of AWS. The concept of using the Inbound/Outbound resolvers is pretty straightforward per se; however, things get more complicated in a multi-account set-up.&lt;/p&gt;
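&lt;p&gt;To make the outbound direction concrete: a forwarding rule attached to the outbound resolver endpoint sends queries for the on-prem domain towards the on-prem DNS server, roughly like this (the endpoint ID and the target IP are placeholders):&lt;/p&gt;

```shell
# Hypothetical sketch; the endpoint ID and the target IP are placeholders.
aws route53resolver create-resolver-rule \
  --creator-request-id sebolabs-home-rule-001 \
  --name sebolabs-home \
  --rule-type FORWARD \
  --domain-name sebolabs.home \
  --resolver-endpoint-id rslvr-out-0123456789abcdef0 \
  --target-ips Ip=192.168.1.10,Port=53
```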

&lt;p&gt;One other objective that I set was to be able to provide flexibility and autonomy to manage the R53 Private Hosted Zones (PHZ) within individual accounts but under one condition. That condition was that those hosted zones must overlap with the root hosted zone living in the Networking account along with the resolvers, namely they must represent subdomains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Networking AWS account root R53 PHZ: &lt;strong&gt;&lt;em&gt;sebolabs.aws&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sandbox AWS account R53 PHZ: sandbox.&lt;strong&gt;&lt;em&gt;sebolabs.aws&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;another AWS account R53 PHZ: any.&lt;strong&gt;&lt;em&gt;sebolabs.aws&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apart from the overlapping domain namespaces, one other requirement is that all the R53 PHZs across the Organization’s accounts that want to benefit from the hybrid DNS must be associated with the VPC that the root PHZ is associated with and where the R53 resolvers are located. At the same time, the Outbound R53 resolver rule must be shared through RAM so that it can be associated with all the other VPCs. The alternative of centralising multiple hosted zones in a shared AWS account didn’t feel appealing to me, so that was the idea I went with.&lt;/p&gt;
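&lt;p&gt;The cross-account PHZ association itself is a two-step process; a sketch (the hosted zone ID and the VPC ID are placeholders):&lt;/p&gt;

```shell
# Hypothetical sketch; the zone ID and the VPC ID are placeholders.

# 1) In the account owning the PHZ (e.g. Sandbox): authorize the association.
aws route53 create-vpc-association-authorization \
  --hosted-zone-id Z0123456789ABCDEFGHIJ \
  --vpc VPCRegion=eu-west-1,VPCId=vpc-0123456789abcdef0

# 2) In the account owning the VPC (Networking): complete the association.
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z0123456789ABCDEFGHIJ \
  --vpc VPCRegion=eu-west-1,VPCId=vpc-0123456789abcdef0
```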

&lt;h2&gt;
  
  
  The solution
&lt;/h2&gt;

&lt;p&gt;Just to make it crystal clear, the solution presented below is a sort of MVP that in a real-world scenario would have to be expanded, at the very least to introduce enough resiliency and performance. I will touch upon that matter later in this section. Bear with me…&lt;/p&gt;

&lt;h3&gt;
  
  
  High-level design
&lt;/h3&gt;

&lt;p&gt;For my PoC, I set everything up just as explained above, leveraging my AWS Organization’s Networking account and a Sandbox one.&lt;br&gt;
My on-prem network, on the other hand, is represented by a single &lt;strong&gt;Raspberry Pi 4&lt;/strong&gt; running &lt;strong&gt;Ubuntu 20.04&lt;/strong&gt; and &lt;strong&gt;strongSwan 1.9.4&lt;/strong&gt;, as well as a &lt;strong&gt;Synology DS218+&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;To keep the diagram below as clean as possible and highlight the concept, the traffic lines are drawn flowing through the TGW, while the associated NICs are shown merely to indicate that they physically exist; in fact, all that traffic is handled by them. For the same reason, no availability zones are visualised even though the entire set-up is Multi-AZ, and local routes were omitted from the routing tables.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0vx69rt8l047z686mzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0vx69rt8l047z686mzy.png" alt="Image description" width="765" height="1935"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Transit Gateway
&lt;/h3&gt;

&lt;p&gt;As you can see, the Transit Gateway routing has been simplified, as the use case above is not complex. Normally, there would be multiple TGW routing tables assigned to attachments, depending on the individual connectivity requirements of a particular VPC.&lt;/p&gt;

&lt;p&gt;Centralised egress to the Internet is, apart from being a way of reducing the cost of running NAT and Internet gateways in each VPC that requires them, also an opportunity to introduce security appliances (“bump-in-the-wire”) combined with AWS Gateway Load Balancer for traffic inspection, or to make use of AWS Network Firewall.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49sjjp5152188wt906ij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49sjjp5152188wt906ij.png" alt="Image description" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  DNS resolution
&lt;/h3&gt;

&lt;p&gt;Support for overlapping domain names, which is the core concept of the proposed set-up, was introduced in late 2019 and made it easy to distribute permissions for managing private hosted zones across the organisation.&lt;br&gt;
At the same time, it allows the R53 resolver to route traffic based on the most specific match: if there is no hosted zone that exactly matches the domain name in the request, the R53 resolver checks for a hosted zone whose name is the parent of the domain name in the request.&lt;/p&gt;
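&lt;p&gt;To illustrate the most-specific-match behaviour, here is a tiny Python sketch (the zone and record names are hypothetical, and the real resolver obviously does far more than this):&lt;/p&gt;

```python
def most_specific_zone(zones, qname):
    """Pick the hosted zone whose name matches the longest suffix of qname,
    mimicking how the R53 resolver chooses between overlapping PHZs."""
    candidates = [
        z for z in zones
        if qname == z or qname.endswith("." + z)
    ]
    # The longest matching zone name is the most specific one.
    return max(candidates, key=len) if candidates else None

zones = ["sebolabs.aws", "sandbox.sebolabs.aws"]
print(most_specific_zone(zones, "db.sandbox.sebolabs.aws"))  # sandbox.sebolabs.aws
print(most_specific_zone(zones, "api.sebolabs.aws"))         # sebolabs.aws
```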

&lt;p&gt;As PHZs are global constructs rather than regional ones, they are also a perfect means of supporting DR scenarios in a multi-region solution. The same goes for R53 Inbound/Outbound resolvers: another pair of resolvers can be configured in a second region to fail over to in case of a primary-region failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;The above is obviously just a foundation. Things get complicated once you start considering how your workloads will run across your Organisation’s managed accounts and how the services they host will be exposed.&lt;br&gt;
Now, with centralised egress in place, what about exposing your services to the Internet? Wait, wasn’t one of the main ideas behind centralising egress to disallow the creation of Internet Gateways in managed accounts through SCPs? In that case, you probably either centralise your ingress as well, or perhaps disallow the creation of NAT Gateways and the association of public IP addresses instead.&lt;br&gt;
I just wanted to emphasise how one decision can drive another and eventually influence the shape of the target solution. In the end, every Organisation wants to end up with patterns and procedures for doing things, doesn’t it?&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Going back to my initial statement, there are multiple ways such a hybrid network can be designed and implemented, depending on requirements. Each individual piece of functionality must be carefully thought through.&lt;br&gt;
Related aspects I came across while working with companies in their cloud-enablement phase included, among other things, considerations around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a centralised VPC with subnet sharing through RAM, along with centrally managed R53 PHZs per share&lt;/li&gt;
&lt;li&gt;a single, centralised ingress ALB/NLB with multiple rules passing traffic to internal ALBs/NLBs, with a firewall in between&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not all of those ideas turned out to be good choices; therefore, thinking globally and making your target solution as flexible as possible is the way to go. That, of course, requires a lot of experience within the team.&lt;/p&gt;

&lt;p&gt;Now that AWS offers an enormous range of services and options to deliver very complex solutions, and as organisations decide to migrate to the cloud, we’re back to centralising things, just in a different place. The reason is that with hundreds of AWS accounts running a large number of workloads, organisations want to retain some control and raise the level of security, which is probably the most important factor for them when deciding to migrate to the cloud. The times when individual projects were treated independently seem to be over. We’re back doing the same work around networking, only in someone else’s data centre :)&lt;/p&gt;

&lt;p&gt;As all these things are not always easy to comprehend, especially as AWS services evolve, I strongly suggest following the &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/" rel="noopener noreferrer"&gt;Networking and Content Delivery Blog&lt;/a&gt; from AWS, where you can find many useful clues and solutions, or at least get your head around what’s going on in that world.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>hybridnetwork</category>
      <category>landingzone</category>
      <category>devops</category>
    </item>
    <item>
      <title>Auth Portal powered by AWS/AzureAD and built with CDKs</title>
      <dc:creator>Sebastian Mincewicz</dc:creator>
      <pubDate>Fri, 10 Jun 2022 08:39:28 +0000</pubDate>
      <link>https://forem.com/aws-builders/auth-portal-powered-by-awsazuread-and-built-with-cdks-2jhk</link>
      <guid>https://forem.com/aws-builders/auth-portal-powered-by-awsazuread-and-built-with-cdks-2jhk</guid>
<description>&lt;p&gt;This one aims to bring together all the pieces required to build and deploy an authentication portal in AWS, leveraging Azure AD as the IdP. It’s a pattern that has recently been used more and more often across AWS projects, and this time I thought I would go about it a bit differently to try out new things and thereby gain more insight into some tech I haven’t had a chance to master yet. As usual, I’m sharing some thoughts and experiences along the way with whoever’s interested.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;p&gt;From the stack outlined below, I already have extensive experience with AWS and Terraform, which should come as no surprise if you have read my previous &lt;a href="https://medium.com/@sebolabs" rel="noopener noreferrer"&gt;publications&lt;/a&gt;. However, this time I wanted to shift my focus to some other popular, open-source tooling I had limited knowledge of and put it into the mix.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt; — to host the Portal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CDK&lt;/strong&gt; (v. 2.26.0) — for developing the Portal (&lt;em&gt;TypeScript&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure AD&lt;/strong&gt; — as an identity service provider (federated authentication)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDKtf&lt;/strong&gt; (v. 0.11.0) — for configuring Azure AD (&lt;em&gt;TypeScript&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React/Amplify&lt;/strong&gt; — for a bit of frontend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As in my job I have to decide from time to time what tooling should be used to deliver a solution, I badly wanted to give &lt;strong&gt;&lt;em&gt;AWS CDK&lt;/em&gt;&lt;/strong&gt; a full end-to-end go and see whether the framework can easily be used for something more than just a PoC. And since there was &lt;strong&gt;&lt;em&gt;CDKtf&lt;/em&gt;&lt;/strong&gt; too, I decided to give that one a chance as well.&lt;br&gt;
Regarding the frontend bit, don’t expect too much ;) It’s not something I normally do; I only set it up to round out the entire infrastructure stack this story is mainly about and to show a tangible result.&lt;/p&gt;
&lt;h2&gt;
  
  
  Design
&lt;/h2&gt;

&lt;p&gt;The high-level diagram below represents the spectrum of services composing the Auth Portal in AWS and its integration with Azure AD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fya4hrjly1yb70u9fhwvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fya4hrjly1yb70u9fhwvn.png" alt="Image description" width="800" height="480"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;In case you want to deepen your understanding of the SAML user pool IdP authentication flow please navigate to this page: &lt;a href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pools-saml-idp-authentication.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pools-saml-idp-authentication.html&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Goals
&lt;/h3&gt;

&lt;p&gt;Apart from the fact that I wanted to get some hands-on experience with CDKs I also set several goals for myself in terms of what I wanted to try out. These were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cognito hosted UI (with customisations)&lt;/li&gt;
&lt;li&gt;Cognito custom domain&lt;/li&gt;
&lt;li&gt;Pre token generation Lambda trigger&lt;/li&gt;
&lt;li&gt;Cognito required claims&lt;/li&gt;
&lt;li&gt;Azure AD additional/custom claims&lt;/li&gt;
&lt;li&gt;Azure AD → Cognito claims mappings&lt;/li&gt;
&lt;li&gt;Azure App access restrictions based on security group membership&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While I abandoned the last one, as I wanted to keep the &lt;em&gt;FREE&lt;/em&gt; Azure subscription plan that only allows access restrictions on a per-user basis, the rest turned out not to be difficult to configure.&lt;/p&gt;
&lt;h3&gt;
  
  
  Source code
&lt;/h3&gt;

&lt;p&gt;The source code for the entire stack can be found below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sebolabs/auth-portal" rel="noopener noreferrer"&gt;https://github.com/sebolabs/auth-portal&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use it, you’ll have to export the required environment variables.&lt;br&gt;
Additionally, it expects you to have your own public domain and a valid ACM certificate in the N. Virginia region (&lt;em&gt;CloudFront and Cognito requirement&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;Because the frontend part is optional, I decided to keep it separate from the AWS portal stack code. If you want to test the authentication flow in the simplest possible way, just set the &lt;code&gt;S3_DUMMY_PAGE_DEPLOY&lt;/code&gt; environment variable to &lt;code&gt;true&lt;/code&gt;, surf to the address below, sign in, and then check things with Developer Tools in the browser and &lt;a href="https://jwt.io" rel="noopener noreferrer"&gt;jwt.io&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;https://&amp;lt;cognito custom domain&amp;gt;/login?response_type=code&amp;amp;client_id=&amp;lt;cognito client id&amp;gt;&amp;amp;redirect_uri=&amp;lt;portal site url&amp;gt;&lt;/code&gt;&lt;/p&gt;
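&lt;p&gt;As a minimal sketch, the hosted UI login URL above can be assembled programmatically; the domain, client ID, and redirect URI below are made-up placeholders, so substitute your own values:&lt;/p&gt;

```python
from urllib.parse import urlencode

# Hypothetical values -- substitute your own Cognito custom domain,
# app client ID, and portal site URL.
auth_domain = "auth.test.example.com"
params = {
    "response_type": "code",
    "client_id": "example1234567890",
    "redirect_uri": "https://portal.test.example.com",
}
# urlencode percent-encodes the redirect URI as the query string is built.
login_url = f"https://{auth_domain}/login?{urlencode(params)}"
print(login_url)
```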

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F391ehz6fazfe7f1t7bbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F391ehz6fazfe7f1t7bbp.png" alt="Image description" width="800" height="405"&gt;&lt;/a&gt; &lt;/p&gt;
&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;The Cognito &lt;strong&gt;custom domain&lt;/strong&gt; feature assumes that if you expose the Cognito authentication endpoint at &lt;code&gt;auth.test.example.com&lt;/code&gt;, then your landing page is &lt;code&gt;test.example.com&lt;/code&gt;. If that's not the case, as in my example, then Cognito expects you to have a resolvable A record configured for &lt;code&gt;test.example.com&lt;/code&gt; in order to perform some verification. Moreover, on the Azure side, you must also let your root domain &lt;code&gt;example.com&lt;/code&gt; be verified by configuring a TXT record with a provided value.&lt;/p&gt;

&lt;p&gt;Cognito &lt;strong&gt;required claims&lt;/strong&gt; can only be set up at the user pool provisioning stage and cannot be modified later. This means that if you change your mind, you’ll have to delete your Cognito user pool and create it again. It also means your user pool ID changes, along with the client application ID, so both the Azure AD and frontend configurations must be updated with the new values. What can turn out to be even more disruptive is that you lose all user profiles previously created in Cognito. Luckily, users can now be imported from CSV.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Pre token generation Lambda function&lt;/strong&gt; hasn’t got any logic customising the claims and was set up only to see what useful information it can produce &lt;em&gt;out of the box&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  AWS &amp;amp; CDK
&lt;/h2&gt;

&lt;p&gt;With CDK, the stack gets synthesized and translated into a CloudFormation template. Depending on the features you decide to leverage in your code, additional custom resources may be spun up for you to satisfy requirements.&lt;br&gt;
A cool feature here is that any changes to IAM resources require your attention and approval before getting deployed, which is useful from a security/audit perspective.&lt;br&gt;
One thing to keep in mind when bootstrapping a CDK project is the &lt;code&gt;qualifier&lt;/code&gt; option, which helps you avoid resource name clashes when provisioning multiple bootstrap stacks in the same AWS account.&lt;/p&gt;
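&lt;p&gt;As a minimal sketch (the account ID, region, and qualifier value below are made up), bootstrapping with a custom qualifier looks roughly like this:&lt;/p&gt;

```shell
# Bootstrap the environment with a non-default qualifier so the toolkit
# resources don't clash with other bootstrap stacks in the same account.
cdk bootstrap --qualifier portal01 aws://111111111111/eu-central-1

# The app must then synthesize against the same qualifier, e.g. via the
# "@aws-cdk/core:bootstrapQualifier" context key in cdk.json.
```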

&lt;p&gt;Must read: &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/best-practices.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cdk/v2/guide/best-practices.html&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cdk deploy
✨  Synthesis time: 3.59s

PortalStack: deploying...
[0%] start: Publishing 483ae06ed27ef8ca76e011264d772420593a6cfe8544759c306ef3b98c9e25be:XXXXXXXXXXXX-eu-central-1
[...]
[100%] success: Published ca2e471276d39c586eae61d73c2e253eb08b4a648a8676f2000f81271b73a405:XXXXXXXXXXXX-eu-central-1

PortalStack: creating CloudFormation changeset...
✅  PortalStack
✨  Deployment time: 393.82s

Outputs:
PortalStack.cognitoDomainName = https://auth.test.sebolabs.net
PortalStack.cognitoUserPoolClientId = 72661fo2r4bgob1ateqskfhicd
PortalStack.cognitoUserPoolId = eu-central-1_XXXXXXXXX
PortalStack.frontendS3BucketName = sebolabs-test-portal-XXXXXXXXXXXX-eu-central-1
PortalStack.portalSiteUrl = https://portal.test.sebolabs.net

Stack ARN:
arn:aws:cloudformation:eu-central-1:XXXXXXXXXXXX:stack/PortalStack/e9974180-e322-11ec-ab92-02e74f818fb0

✨  Total time: 397.4s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3an9jvtpkpkderkoyky7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3an9jvtpkpkderkoyky7.png" alt="Image description" width="800" height="475"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;One of the fundamentals and a huge advantage of using CDK is the fact that infrastructure code can be tested just like application code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ npm test
&amp;gt; portal@0.1.0 test
&amp;gt; jest
 PASS  test/cf.test.ts
 PASS  test/cognito.test.ts
 PASS  test/s3.test.ts
 PASS  test/lambda.test.ts
Test Suites: 4 passed, 4 total
Tests:       9 passed, 9 total
Snapshots:   0 total
Time:        3.852 s, estimated 4 s
Ran all test suites.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Azure AD &amp;amp; CDKtf
&lt;/h2&gt;

&lt;p&gt;With CDKtf, the stack gets synthesized and translated into a (zipped) Terraform configuration that is then executed either locally or remotely.&lt;br&gt;
When using Terraform Cloud as your backend, you can have it store your state and, optionally, run Terraform for you, keeping track of all your runs in one place. Another nice feature of that remote backend is that it also versions states and highlights changes (a diff) between consecutive Terraform runs.&lt;br&gt;
Furthermore, CDKtf supports most of Terraform’s well-known commands, e.g. &lt;code&gt;output&lt;/code&gt;, which can become very useful, for example, to pass certain values between stages in a CI/CD pipeline, but also &lt;code&gt;locals&lt;/code&gt;, &lt;code&gt;remote state&lt;/code&gt;, and other fundamental Terraform features.&lt;/p&gt;

&lt;p&gt;Must read: &lt;a href="https://www.terraform.io/cdktf" rel="noopener noreferrer"&gt;https://www.terraform.io/cdktf&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cdktf deploy
sebolabs-aad-auth-portal  Initializing the backend...
sebolabs-aad-auth-portal  Initializing provider plugins...
                          - Reusing previous version of hashicorp/azuread from the dependency lock file
sebolabs-aad-auth-portal  - Using previously-installed hashicorp/azuread v2.22.0
sebolabs-aad-auth-portal  Terraform has been successfully initialized!

azuread_claims_mapping_policy.portal_cmp (portal_cmp): Refreshing state... [id=a733375e-b67c-49de-9142-12af23de9afa]
azuread_group.portal_users (portal_users): Refreshing state... [id=e426314e-14fb-4415-9fd8-e170981b378e]
azuread_application.portal_app (portal_app): Refreshing state... [id=764cc3ae-2cf0-4cd5-9521-091b7bd3bece]
azuread_service_principal.portal_sp (portal_sp): Refreshing state... [id=a4d4926d-c5cc-402c-96c4-016dc396325f]
azuread_service_principal_claims_mapping_policy_assignment.portal_cmpa (portal_cmpa): Refreshing state... [id=a4d4926d-c5cc-402c-96c4-016dc396325f/claimsMappingPolicy/a733375e-b67c-49de-9142-12af23de9afa]

No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

sebolabs-aad-auth-portal
  app_id = 6e8b9c9d-715b-4371-810f-4avc07a27c2x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4i1dgjgwhcc2k7v9zssk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4i1dgjgwhcc2k7v9zssk.png" alt="Image description" width="800" height="1025"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;Likewise, CDKtf also enables you to run unit tests against your code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ npm test
&amp;gt; portal-azure@1.0.0 test
&amp;gt; jest
PASS  __tests__/main-test.ts
  Terraform
    ✓ check if the produced terraform configuration is valid
    ✎ todo check if this can be planned
  AzureAD configuration
    ✎ todo should contain an application
    ✎ todo should contain a service principal
    ✎ todo should contain a claims mapping policy
    ✎ todo should contain a service principal claims mapping policy assignment
Test Suites: 1 passed, 1 total
Tests:       5 todo, 1 passed, 6 total
Snapshots:   0 total
Time:        4.438 s, estimated 5 s
Ran all test suites.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Outcome
&lt;/h2&gt;

&lt;p&gt;And finally, here’s the result of putting all the things together…&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4u11iwr6fyobq3m2rgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4u11iwr6fyobq3m2rgr.png" alt="Image description" width="800" height="424"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjues1xw0mnwxj6kx0kdk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjues1xw0mnwxj6kx0kdk.png" alt="Image description" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ef4po1ljvl695gcajhu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ef4po1ljvl695gcajhu.png" alt="Image description" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gi8g5w1xzzueqswwdcc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gi8g5w1xzzueqswwdcc.png" alt="Image description" width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnvm11n2tt8bprgf0b8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnvm11n2tt8bprgf0b8.png" alt="Image description" width="800" height="522"&gt;&lt;/a&gt;     &lt;/p&gt;

&lt;p&gt;💡 In case some user information was not retrieved or mapped as expected, it’s worth comparing the SAML response (&lt;strong&gt;&lt;em&gt;idpresponse&lt;/em&gt;&lt;/strong&gt; payload) with the ID JWT token (&lt;strong&gt;&lt;em&gt;token&lt;/em&gt;&lt;/strong&gt; preview) using Developer Tools from within a browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  CloudWatch Logs Insights
&lt;/h3&gt;

&lt;p&gt;The aforementioned Lambda function, which is meant to be used for customising ID token claims, can also be used simply to log certain information carried by the tokens. Especially since Cognito is a black box that reveals very little, such data can become very useful, e.g. when the authentication flow must be debugged or when generating user-activity statistics. Moreover, project teams working on the AWS side of things very often have no access to the Azure AD application sign-in logs, so this way they can gain some insight.&lt;/p&gt;
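&lt;p&gt;As an example, a simple CloudWatch Logs Insights query against the trigger’s log group could look like the sketch below; the filter term depends entirely on what the function actually logs:&lt;/p&gt;

```
fields @timestamp, @message
| filter @message like /claims/
| sort @timestamp desc
| limit 20
```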

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c5hd3o07r7mi1nw6459.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c5hd3o07r7mi1nw6459.png" alt="Image description" width="800" height="298"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS &amp;amp; Azure AD
&lt;/h3&gt;

&lt;p&gt;First of all, setting up an end-to-end authentication flow using Amazon Cognito and Azure AD is fairly simple. Obviously, there are some caveats here and there, plus things one must be aware of, as both are managed cloud services working on the basis of certain assumptions. Therefore, it is worth spending some time reading the relevant documentation in advance.&lt;/p&gt;

&lt;p&gt;The authentication mechanism configured as part of this story is just a beginning, though. Going further, there is logic one may want to implement with Lambda triggers, as well as the entire authorisation flow when integrating such a frontend solution with backend services. Regarding the latter, there are decisions to be made, e.g. what type of authoriser should be used when integrating with API Gateway, and even more around access scopes. Either way, having a token carrying the correct claims is a good start.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS CDK &amp;amp; CDKtf
&lt;/h3&gt;

&lt;p&gt;On the CDK side of things, I must admit both frameworks have turned out to be very appealing and promising! During my try-out, I certainly enjoyed the fact that things simply happen automatically, without me having to worry about the things I used to worry about when working with cloud infrastructure orchestrators natively. There were several issues I came across, but they were related either to the underlying orchestrators or to the cloud providers themselves; nothing that would make me want to ditch either of the CDKs.&lt;/p&gt;

&lt;p&gt;In terms of AWS CDK, I’m not a big fan of some of the concepts CloudFormation is based on, e.g. the fact you can’t delete a resource manually in the console and then rerun &lt;code&gt;cdk deploy&lt;/code&gt; to calculate what’s missing and reprovision that resource. Instead, you get a resource not found exception.&lt;/p&gt;

&lt;p&gt;In terms of CDKtf, having configured only five resources is probably not representative enough to draw big conclusions, however, it just worked and literally took me minutes to have a working deployment mechanism.&lt;/p&gt;

&lt;p&gt;Just like other frameworks, e.g. the Serverless Framework, CDKs can massively simplify developers' lives. With my AWS Security Specialty hat on, however, I want to emphasise the security aspect of infrastructure being provisioned as part of application development. I’ve already seen many times how insecure such infrastructure can become when it’s developed by individuals without enough security-in-the-cloud awareness, especially when networking is involved. Therefore, providing compliant constructs or modules is definitely something that should be considered by organisations and project teams who care about more than just application features.&lt;/p&gt;

&lt;p&gt;Finally, both CDKs require you to figure out how you want to go about environments and how you are going to provide environment-specific values when deploying stacks. It’s not as straightforward as with Terraform environment files, but once you get it right, you can’t go wrong with CDKs. Both frameworks can still be considered quite new, and I strongly believe that with time they will become even more robust.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>cdk</category>
      <category>cdktf</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
