<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Khachatur Ashotyan</title>
    <description>The latest articles on Forem by Khachatur Ashotyan (@lanycrost).</description>
    <link>https://forem.com/lanycrost</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3937478%2F8fa8de9d-1526-4267-a8a3-a5e77f7e416f.jpeg</url>
      <title>Forem: Khachatur Ashotyan</title>
      <link>https://forem.com/lanycrost</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lanycrost"/>
    <language>en</language>
    <item>
      <title>Spot instances as GitHub Actions runners</title>
      <dc:creator>Khachatur Ashotyan</dc:creator>
      <pubDate>Sat, 23 May 2026 09:04:32 +0000</pubDate>
      <link>https://forem.com/lanycrost/spot-instances-as-github-actions-runners-h19</link>
      <guid>https://forem.com/lanycrost/spot-instances-as-github-actions-runners-h19</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/lanycrost/jenkins-as-a-code-or-how-i-stopped-clicking-around-in-the-ui-1nko"&gt;Part 1&lt;/a&gt; covered Jenkins as Code with ephemeral workers. &lt;a href="https://dev.to/lanycrost/macos-workers-or-how-i-built-my-own-mac-cloud-bhm"&gt;Part 2&lt;/a&gt; covered macOS workers. This one is about moving a chunk of the CI workload off Jenkins entirely, onto GitHub Actions, with EC2 spot instances as the runner fleet.&lt;/p&gt;

&lt;p&gt;This isn't a "Jenkins is dead, use GitHub Actions" post. Jenkins still handles the heavy builds: macOS, Windows, anything that runs for hours or needs custom orchestration. GitHub Actions runs alongside it for a narrower class of workload where it fits better.&lt;/p&gt;

&lt;p&gt;What follows is the self-hosted spot runner pattern: how to point GitHub Actions at your own ephemeral EC2 fleet, and the things that bite once you do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother
&lt;/h2&gt;

&lt;p&gt;GitHub's managed runners are fine for small teams. There are a few reasons to switch:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Cost at volume.&lt;/strong&gt; GitHub bills its managed Linux runners at $0.008/minute ($0.48/hour). Fine for a few builds a day. Last month we ran 80,887 runner-minutes across 29,347 jobs (~1,350 hours). On managed runners that would have been ~$647. Our actual EC2 bill for the runner fleet was ~$160: about $130 on spot plus $28 of EC2-Other (EBS, ENIs, data transfer). That's roughly 4x cheaper, and the gap widens the more you run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Instance shape.&lt;/strong&gt; Managed runners come in fixed sizes. Builds that need 16 vCPUs and 64 GB of RAM, a GPU, or arm64 either pay for the largest tier or don't fit at all. Self-hosted lets you pick whatever EC2 instance type the build actually needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Network access.&lt;/strong&gt; Builds that talk to private resources (internal artifact registries, RDS, anything behind a VPC) are awkward on managed runners. Self-hosted runners live inside your VPC, so they hit those resources directly without proxies or tunnels.&lt;/p&gt;

&lt;p&gt;Cost is what got us to try it. The VPC access and instance flexibility came along for free.&lt;/p&gt;
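
&lt;p&gt;For anyone who wants to rerun the cost comparison with their own numbers, here is a minimal Python sketch of the arithmetic. The constants are the figures quoted above; the function names are made up for illustration.&lt;/p&gt;

```python
# Figures quoted in point 1 above; nothing here is measured independently.
MANAGED_RATE_PER_MINUTE = 0.008   # USD, GitHub managed Linux runner
RUNNER_MINUTES = 80_887           # last month's total runner-minutes
SPOT_COST = 130.0                 # USD, approximate spot line item
EC2_OTHER_COST = 28.0             # USD, approximate EBS, ENIs, data transfer


def managed_cost(minutes: int, rate: float = MANAGED_RATE_PER_MINUTE) -> float:
    """What the same runner-minutes would cost on managed runners."""
    return minutes * rate


def savings_ratio(minutes: int, self_hosted_cost: float) -> float:
    """How many times cheaper the self-hosted fleet is for those minutes."""
    return managed_cost(minutes) / self_hosted_cost


if __name__ == "__main__":
    fleet_cost = SPOT_COST + EC2_OTHER_COST
    print(f"managed:     ${managed_cost(RUNNER_MINUTES):.2f}")
    print(f"self-hosted: ${fleet_cost:.2f}")
    print(f"ratio:       {savings_ratio(RUNNER_MINUTES, fleet_cost):.1f}x")
```

&lt;p&gt;Swap in your own runner-minutes and EC2 line items. The ratio only tells you something once the fleet cost includes EC2-Other, not just the spot line.&lt;/p&gt;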

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9pggar8py9dq0ahs306c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9pggar8py9dq0ahs306c.jpg" alt="github actions and cost explorer charts" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What a "self-hosted spot runner" is
&lt;/h2&gt;

&lt;p&gt;A self-hosted GitHub Actions runner is a small agent. It registers with a GitHub repo or org, polls for jobs matching its labels, runs them, and reports results back. Anything that can run the binary works as a host (bare metal, VM, container, whatever).&lt;/p&gt;

&lt;p&gt;It can be either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent: registered once, sits there, picks up jobs as they come.&lt;/li&gt;
&lt;li&gt;Ephemeral: single-use registration token, picks up one job, de-registers, shuts down.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We went with ephemeral. Long-lived self-hosted runners combine the operational burden of managing a host with the build-pollution risk of a shared agent and a security blast radius that never closes.&lt;/p&gt;

&lt;p&gt;Every GitHub Actions job gets its own EC2 spot instance, freshly launched from a Packer-baked AMI. The job runs, then the instance is terminated. It's the same one-build-per-worker lifecycle as our Jenkins workers, on a different control plane.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;No single service does this end-to-end. Either you wire it together yourself, or you reach for one of the open-source modules that already does. I went with &lt;a href="https://github.com/github-aws-runners/terraform-aws-github-runner" rel="noopener noreferrer"&gt;terraform-aws-github-runner&lt;/a&gt;. It's the most mature module in this space and fits cleanly into a Terraform-managed AWS account. (If you remember the project under its old name, &lt;code&gt;philips-labs/terraform-aws-github-runner&lt;/code&gt;, it's the same code, moved to the &lt;code&gt;github-aws-runners&lt;/code&gt; org.)&lt;/p&gt;

&lt;p&gt;When someone opens a PR that triggers a workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GitHub fires a &lt;code&gt;workflow_job&lt;/code&gt; webhook the moment a job is queued.&lt;/li&gt;
&lt;li&gt;API Gateway plus a webhook Lambda check the HMAC, filter for relevant runner labels, and push a message onto an SQS queue.&lt;/li&gt;
&lt;li&gt;A scale-up Lambda drains that queue. For each queued job it launches an EC2 spot instance from a specific AMI, with user-data carrying a single-use registration token.&lt;/li&gt;
&lt;li&gt;The instance comes up. Cloud-init runs; the runner binary registers itself with GitHub and starts polling for jobs.&lt;/li&gt;
&lt;li&gt;Because the runner's labels match, GitHub schedules the queued job onto it.&lt;/li&gt;
&lt;li&gt;The workflow runs whatever your YAML says: checkout, build, test, push artifacts.&lt;/li&gt;
&lt;li&gt;The runner is registered as &lt;code&gt;--ephemeral&lt;/code&gt;, so the agent exits after one job. A scheduled scale-down Lambda cleans up anything left over.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Lambda code, the SQS queues, and the IAM glue all live inside the module. You don't write any of that yourself. What you do write is the Terraform configuration that declares which runners exist, which AMI they use, and which instance types are eligible.&lt;/p&gt;
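
&lt;p&gt;To make step 2 less of a black box, here is a rough Python sketch of what "check the HMAC" means for a GitHub webhook. It is not the module's code (the module ships its own Lambdas), and the function name is hypothetical. What it relies on is GitHub's documented scheme: the &lt;code&gt;X-Hub-Signature-256&lt;/code&gt; header carries &lt;code&gt;sha256=&lt;/code&gt; followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret.&lt;/p&gt;

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a GitHub webhook body against its X-Hub-Signature-256 header."""
    expected = "sha256=" + hmac.new(
        secret.encode("utf-8"), body, hashlib.sha256
    ).hexdigest()
    # compare_digest runs in constant time, so the check does not leak
    # how many leading characters of the signature matched.
    return hmac.compare_digest(expected, signature_header)


if __name__ == "__main__":
    secret = "example-secret"        # hypothetical value, not a real secret
    body = b'{"action": "queued"}'   # trimmed-down stand-in for a workflow_job payload
    header = "sha256=" + hmac.new(
        secret.encode("utf-8"), body, hashlib.sha256
    ).hexdigest()
    print(signature_is_valid(secret, body, header))         # True
    print(signature_is_valid(secret, body + b" ", header))  # False
```

&lt;p&gt;The signature has to be computed over the raw bytes of the request, before any JSON parsing. Re-serializing the payload can change whitespace and break the match.&lt;/p&gt;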

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ro13504c4qpe7xway6m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ro13504c4qpe7xway6m.jpg" alt="spot github runners architecture diagram" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-tier runners
&lt;/h2&gt;

&lt;p&gt;This setup gets useful in practice when you split runners into tiers distinguished by labels. Workflows pick a tier by setting &lt;code&gt;runs-on:&lt;/code&gt; in the workflow YAML.&lt;/p&gt;

&lt;p&gt;We run three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small (&lt;code&gt;t3.medium&lt;/code&gt; / &lt;code&gt;m5.large&lt;/code&gt;): linters, formatters, doc builds, anything that doesn't really stress a CPU. Instances spawn fast, and spot capacity at this size has never been a problem for us.&lt;/li&gt;
&lt;li&gt;Large (&lt;code&gt;m5.xlarge&lt;/code&gt; / &lt;code&gt;c5.xlarge&lt;/code&gt;): the typical build-and-test workflow that wants some CPU but doesn't hammer it.&lt;/li&gt;
&lt;li&gt;Compute-intensive (&lt;code&gt;c7a.4xlarge&lt;/code&gt; / &lt;code&gt;c8a.8xlarge&lt;/code&gt;): compile-heavy builds, large test suites, anything that scales with cores.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tier is its own call to the same Terraform module with different labels and instance-type lists. Sanitized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"github-runners"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"github-aws-runners/github-runner/aws//modules/multi-runner"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.0"&lt;/span&gt;

  &lt;span class="nx"&gt;multi_runner_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"linux-x64-small"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;runner_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;runner_extra_labels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"linux,x64,small"&lt;/span&gt;
        &lt;span class="nx"&gt;instance_types&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;default_instances&lt;/span&gt;
        &lt;span class="nx"&gt;ami_filter&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*ci-runner-x64*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;enable_ephemeral_runners&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="nx"&gt;enable_spot_instances&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="s2"&gt;"linux-x64-compute-intensive"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;runner_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;runner_extra_labels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"linux,x64,compute-intensive"&lt;/span&gt;
        &lt;span class="nx"&gt;instance_types&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compute_intensive&lt;/span&gt;
        &lt;span class="nx"&gt;ami_filter&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*ci-runner-x64*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;enable_ephemeral_runners&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="nx"&gt;enable_spot_instances&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# ...and so on for the other tiers&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# Common stuff: webhook secret, GitHub app credentials, VPC config, etc.&lt;/span&gt;
  &lt;span class="nx"&gt;github_app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;webhook_secret&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;random_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhook_secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hex&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnets&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Picking the tier in a workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;self-hosted&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;linux&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;x64&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;compute-intensive&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;make build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub matches the &lt;code&gt;runs-on&lt;/code&gt; array against runner labels and picks any registered runner that has them all. The Lambda only spawns instances on demand, so an unused tier costs nothing.&lt;/p&gt;
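
&lt;p&gt;The matching rule described above is a subset check, which a few lines of Python can illustrate. This is a simplified model of the behaviour, not GitHub's implementation; the function name and label sets are assumptions for the example.&lt;/p&gt;

```python
def runner_matches(runs_on: list[str], runner_labels: set[str]) -> bool:
    """A runner qualifies only if it carries every label the job asks for."""
    return set(runs_on).issubset(runner_labels)


if __name__ == "__main__":
    compute = {"self-hosted", "linux", "x64", "compute-intensive"}
    small = {"self-hosted", "linux", "x64", "small"}
    job = ["self-hosted", "linux", "x64", "compute-intensive"]

    print(runner_matches(job, compute))  # True
    print(runner_matches(job, small))    # False
```

&lt;p&gt;One consequence: a job that asks only for &lt;code&gt;[self-hosted, linux, x64]&lt;/code&gt; matches every tier, so workflows should always name the tier label explicitly.&lt;/p&gt;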

&lt;h2&gt;
  
  
  Scheduling - warm pool during business hours, off at night
&lt;/h2&gt;

&lt;p&gt;Pure on-demand scaling sounds ideal in theory. Zero idle runners, pay per job, the Lambda spawns instances only when GitHub queues something. Two patterns spoil it in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The morning-rush problem.&lt;/strong&gt; The first few PRs of the day queue around the time people log in. On pure on-demand, every one of those jobs eats the full cold-start latency, somewhere between 60 and 120 seconds from queue to running. A dozen developers pushing at 9am turns into a visible backlog.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 3am problem.&lt;/strong&gt; Even on spot, idle runners cost something. There's the EBS volume attached to each warm instance, plus the always-on orchestration Lambdas. Outside business hours the queue is mostly empty, so there's no reason to keep capacity hot.&lt;/p&gt;

&lt;p&gt;The runner module addresses both with idle pools and scheduled scaling.&lt;/p&gt;

&lt;p&gt;What works for us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;During business hours (weekdays 08:00 to 20:00 in our primary timezone), we keep a warm pool of N runners per tier, sitting registered with GitHub and ready to grab the first matching job. When one claims a job, the scale-up Lambda spawns a replacement, so the pool stays at N. Cold start for the user effectively disappears.&lt;/li&gt;
&lt;li&gt;Outside that window, the pool size drops to zero. Late-night and weekend jobs still run; they just pay the cold-start tax. Most of what runs at those hours is scheduled batch work that doesn't care about an extra minute.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Terraform for it (sanitized, per-tier):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;runner_config&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# ... labels, instance_types, ami_filter as before ...&lt;/span&gt;

  &lt;span class="nx"&gt;enable_ephemeral_runners&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_spot_instances&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="c1"&gt;# Warm pool - kept at this size while the cron expression below matches.&lt;/span&gt;
  &lt;span class="nx"&gt;idle_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# Six fields, seconds first. The module treats this as a window, not a trigger:&lt;/span&gt;
      &lt;span class="c1"&gt;# it matches 08:00 to 19:59 on weekdays, and outside it the pool drops to zero.&lt;/span&gt;
      &lt;span class="nx"&gt;cron&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"* * 8-19 * * 1-5"&lt;/span&gt;
      &lt;span class="nx"&gt;timeZone&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Europe/Berlin"&lt;/span&gt;
      &lt;span class="nx"&gt;idleCount&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few notes about this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Warm-pool runners are still single-use, just pre-launched. Each one picks up one job and dies, with the pool replenished by fresh instances rather than reused ones. That keeps the byte-identical state property from the Jenkins side intact.&lt;/li&gt;
&lt;li&gt;Pool size is a tuning decision. Too small and you still cold-start during the rush; too big and you're burning money on idle capacity. We tune per tier based on the morning queue depth we actually observe. The compute-intensive tier gets a smaller pool because those jobs are rarer.&lt;/li&gt;
&lt;li&gt;Spot eviction during pool-idle time is fine. If AWS reclaims a pool runner before it ever picks up a job, the scale-up Lambda just launches a replacement. The pool size is a target, not a fixed set of instances.&lt;/li&gt;
&lt;li&gt;Holidays are a remaining problem. The cron schedule doesn't know about public holidays, so on a Monday holiday the pool still ramps up at 08:00 to serve nobody. The cost is small enough that nobody's been motivated to build a calendar-aware scheduler.&lt;/li&gt;
&lt;/ul&gt;
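
&lt;p&gt;The "pool size is a target" idea from the notes above comes down to one line of arithmetic. A minimal Python sketch, with a hypothetical function name; the module's real bookkeeping is more involved than this.&lt;/p&gt;

```python
def replacements_needed(target_idle: int, current_idle: int) -> int:
    """Number of fresh instances to launch so the idle pool reaches its target."""
    return max(0, target_idle - current_idle)


if __name__ == "__main__":
    print(replacements_needed(3, 3))  # 0: pool is full, nothing to do
    print(replacements_needed(3, 2))  # 1: one runner claimed a job or was evicted
    print(replacements_needed(0, 2))  # 0: outside the window, never launch for the pool
```

&lt;p&gt;A runner that claimed a job and a runner that AWS reclaimed look the same to this calculation, which is why spot eviction during idle time needs no special handling.&lt;/p&gt;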

&lt;p&gt;During working hours the developer experience is roughly as fast as managed runners. Outside those hours the bill is close to nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AMI is still a Packer image
&lt;/h2&gt;

&lt;p&gt;As with the Jenkins workers, anything the runner needs to do its job (language runtimes, build tools, Docker, cached dependencies) gets baked into a Packer AMI ahead of time. The AMI is versioned, lives in your AWS account, and is referenced by the runner's &lt;code&gt;ami_filter&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For GitHub Actions the image is usually lighter than the Jenkins worker AMIs - GHA workflows install most of their tooling at runtime via &lt;code&gt;setup-node&lt;/code&gt;, &lt;code&gt;setup-python&lt;/code&gt;, &lt;code&gt;setup-java&lt;/code&gt;. So the base image just needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ubuntu (or whatever OS).&lt;/li&gt;
&lt;li&gt;The GitHub Actions runner binary, pre-downloaded.&lt;/li&gt;
&lt;li&gt;Docker (for &lt;code&gt;docker build&lt;/code&gt; / &lt;code&gt;docker run&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;AWS CLI (most workflows hit S3 or ECR).&lt;/li&gt;
&lt;li&gt;Basic build deps: &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;curl&lt;/code&gt;, &lt;code&gt;jq&lt;/code&gt;, &lt;code&gt;unzip&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A stable runtime or two we don't want to redownload every build (Node LTS, Python 3.x).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The image stays around 5 to 10 GB, small enough that pulling and booting fits comfortably inside the cold-start budget we already have.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spot interruptions
&lt;/h2&gt;

&lt;p&gt;The thing everyone asks the first time they look at spot for CI is what happens when AWS reclaims the instance mid-build.&lt;/p&gt;

&lt;p&gt;The build fails and GitHub re-queues it. A fresh instance picks it up. There's no recovery logic to write, because the runner is ephemeral and the workflow ought to be idempotent anyway. One partial build is lost, the retry starts clean.&lt;/p&gt;

&lt;p&gt;Spot interruption notices give you two minutes of warning before AWS pulls the plug. The runner listens for that signal and de-registers from GitHub cleanly before shutdown, which the module handles for you. Without that, GitHub briefly shows a "runner went offline mid-job" error before the retry. Annoying, but not fatal.&lt;/p&gt;
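
&lt;p&gt;For a sense of what that signal looks like: EC2 publishes the notice in instance metadata under &lt;code&gt;spot/instance-action&lt;/code&gt; as a small JSON document with an action and a UTC deadline. The Python sketch below only parses such a document and works out how much time is left. Fetching it from the metadata service is left out so the example runs anywhere, and the function name and sample values are assumptions, not the module's code.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Parse a spot instance-action document and return the time left in seconds."""
    notice = json.loads(notice_body)
    # The "time" field is UTC, formatted as ISO 8601 with a trailing Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return (deadline - now).total_seconds()


if __name__ == "__main__":
    # Hypothetical notice, shaped like the metadata service's response.
    sample = '{"action": "terminate", "time": "2026-05-23T09:06:32Z"}'
    now = datetime(2026, 5, 23, 9, 4, 32, tzinfo=timezone.utc)
    print(seconds_until_interruption(sample, now))  # 120.0
```

&lt;p&gt;Two minutes is enough to de-register a runner cleanly. It is not enough to finish most builds, so the retry still has to happen.&lt;/p&gt;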

&lt;p&gt;In practice the interruption rate I see is around 1-3% of jobs on the small and large tiers, and a bit higher on compute-intensive because the larger instance types have less spot capacity per AZ. For most workloads that's a fine trade for the savings. For workflows that genuinely can't tolerate a retry (release builds, deploys with side effects), I either flip &lt;code&gt;enable_spot_instances = false&lt;/code&gt; for that tier or send the job over to Jenkins, where the lifecycle is more tightly controlled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs vs. Jenkins workers
&lt;/h2&gt;

&lt;p&gt;"Should this run on Jenkins or GitHub Actions?" comes up a lot. How I think about it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload shape&lt;/th&gt;
&lt;th&gt;Where it goes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PR-triggered, short, idempotent&lt;/td&gt;
&lt;td&gt;GitHub Actions on spot. Quick spin-up, cheap, no Jenkins overhead.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-running build (1h+)&lt;/td&gt;
&lt;td&gt;Jenkins. Spot interruption risk is too high for long jobs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;macOS / Windows builds&lt;/td&gt;
&lt;td&gt;Jenkins. The worker setup from Parts 1 and 2 lives there.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom orchestration (matrix sharding, dynamic parallelism, gated promotion)&lt;/td&gt;
&lt;td&gt;Jenkins. Groovy DSL handles this more flexibly than the GHA matrix.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploys / releases with side effects&lt;/td&gt;
&lt;td&gt;Jenkins on dedicated workers, or GHA on on-demand. No spot.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open-source / contributor-facing repos&lt;/td&gt;
&lt;td&gt;GitHub Actions. Don't expose Jenkins to contributors.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Builds that need ephemeral access to a specific cloud service&lt;/td&gt;
&lt;td&gt;Whichever is in the right VPC. Usually GHA for the small stuff.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two systems cover different shapes of work. Moving everything to GitHub Actions would have been a mistake, but moving the small PR-scoped jobs off Jenkins freed up real capacity for the big builds that remain.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm still figuring out
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cold-start latency outside the warm pool window. The pool hides it during business hours, but outside those hours every job eats the full 60-120 seconds. We're fine with this trade-off most of the time, though it occasionally annoys people working evenings.&lt;/li&gt;
&lt;li&gt;Spot capacity in specific AZs. The compute-intensive tier sometimes can't get spot capacity in our preferred AZ and the queue backs up. The module's multi-AZ fallback helps but doesn't eliminate it. On genuinely bursty days we fall back to on-demand temporarily.&lt;/li&gt;
&lt;li&gt;Holiday-aware pool scheduling. The cron schedule doesn't know about public holidays, so we burn a small amount of money ramping up on holidays nobody is working. Low impact, but it's the kind of thing that bothers you every time you remember.&lt;/li&gt;
&lt;li&gt;AMI sprawl. Every architecture (x64, arm64) and base-image variant is its own AMI lineage. We rebuild them on a schedule via Packer the same way we do the Jenkins worker AMIs, but the operational overhead is a real cost.&lt;/li&gt;
&lt;li&gt;Cost attribution. Spot instances inherit tags from the launch template, but not every downstream resource (EBS volumes, ENIs) picks up the right cost-attribution tags automatically. That's a separate problem and I'm not opening it here.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;All three posts in this series end up at the same place. Ephemeral workers, baked images, everything orchestrated from git, secrets pulled from a vault at runtime. Jenkins is one way to wire that pattern together, and GitHub Actions on self-hosted spot is another. Nothing says you pick one and only one.&lt;/p&gt;

&lt;p&gt;The worker lifecycle is the part you can't compromise on: don't keep workers between builds. Once that's in place, everything else (Jenkins versus GHA, spot versus on-demand, Tart versus vSphere) is swappable, and you can change your mind later without burning the platform down.&lt;/p&gt;

&lt;p&gt;That wraps the series for now. If any of it saves you a week of figuring it out yourself, this was worth writing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix - tools mentioned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/github-aws-runners/terraform-aws-github-runner" rel="noopener noreferrer"&gt;terraform-aws-github-runner&lt;/a&gt; - the Terraform module that wires up the whole thing. (Formerly &lt;code&gt;philips-labs/terraform-aws-github-runner&lt;/code&gt;.)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/en/actions/hosting-your-own-runners" rel="noopener noreferrer"&gt;GitHub Actions self-hosted runners&lt;/a&gt; - official docs.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.packer.io/" rel="noopener noreferrer"&gt;HashiCorp Packer&lt;/a&gt; - bakes the runner AMIs.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt; - calls the module above.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/ec2/spot/" rel="noopener noreferrer"&gt;AWS EC2 spot&lt;/a&gt; - cheap interruptible compute.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; and &lt;a href="https://aws.amazon.com/sqs/" rel="noopener noreferrer"&gt;SQS&lt;/a&gt; - the queueing/orchestration glue (managed by the module).&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Part 3 of My CI/CD Odyssey. Thanks for reading. If you run self-hosted CI differently, I'd be curious to hear about it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>devops</category>
      <category>cicd</category>
      <category>aws</category>
    </item>
    <item>
      <title>MacOS Workers, or how I built my own Mac cloud</title>
      <dc:creator>Khachatur Ashotyan</dc:creator>
      <pubDate>Fri, 22 May 2026 08:14:07 +0000</pubDate>
      <link>https://forem.com/lanycrost/macos-workers-or-how-i-built-my-own-mac-cloud-bhm</link>
      <guid>https://forem.com/lanycrost/macos-workers-or-how-i-built-my-own-mac-cloud-bhm</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/lanycrost/jenkins-as-a-code-or-how-i-stopped-clicking-around-in-the-ui-1nko"&gt;Part 1&lt;/a&gt; I laid out the Jenkins-as-a-Code setup (JCasC, Job DSL, ephemeral workers, Packer images), and said macOS workers deserved a separate post. This is that post.&lt;/p&gt;

&lt;p&gt;For anyone who's never run macOS builds in CI: most things that are easy on Linux turn out to be hard on macOS, often for reasons that don't apply anywhere else. Apple's licensing rules mean you can't just spin up a Mac in AWS the way you do an Ubuntu box. Then there's the keychain, the signing tooling, and the Xcode versioning. The typical answer at most companies is a few Mac minis under someone's desk that everybody SSHes into, and that works for a single team right up until the company depends on it.&lt;/p&gt;

&lt;p&gt;I wanted the same setup on macOS that I had for Linux and Windows: a fresh worker per build, destroyed when the build finishes. Getting there took a while.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xqnnp6n4ktbuh6v0dxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xqnnp6n4ktbuh6v0dxc.png" alt="mac minis on rack" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why macOS is hard in the first place
&lt;/h2&gt;

&lt;p&gt;A few things to keep in mind first, because they explain why the architecture below looks the way it does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The cloud Mac story is awkward.&lt;/strong&gt; &lt;a href="https://aws.amazon.com/ec2/instance-types/mac/" rel="noopener noreferrer"&gt;EC2 Mac instances&lt;/a&gt; exist: real Mac hardware in AWS data centers that you can rent. But they're dedicated hosts with a 24-hour minimum allocation (Apple's licensing, not AWS being weird), and per-hour pricing is brutal next to Linux. If your worker lives for 30 minutes and you pay for 24 hours, the per-build math is rough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Apple's EULA only allows macOS to run on Apple hardware.&lt;/strong&gt; Which means you can't legally virtualize macOS on a Linux box. Real Mac hardware has to be in the loop somewhere - yours, rented, or in someone else's rack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. macOS virtualization is its own ecosystem.&lt;/strong&gt; On Intel Macs the answer used to be VMware (vSphere or Fusion) or VirtualBox. On Apple Silicon, neither works the same way. Everything goes through Apple's Virtualization.framework now, and the tooling around it is still young.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Signing and notarization credentials fight you on ephemeral VMs.&lt;/strong&gt; Developer ID certificates, app-specific passwords, the keychain - none of it was designed for "fresh VM every build". It assumes a developer's laptop. Making it work in CI is its own rabbit hole.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. macOS images are huge.&lt;/strong&gt; A baked Packer image with Xcode is 60-80 GB. Pulling that from cold storage is slow, so caching matters a lot more here than on Linux.&lt;/p&gt;

&lt;p&gt;All five show up repeatedly in the rest of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I evaluated and didn't pick
&lt;/h2&gt;

&lt;p&gt;There aren't many serious players in macOS CI. I evaluated the commercial options seriously before landing on what's below. None of them were bad. The economics just didn't line up for us.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://veertu.com/anka-build/" rel="noopener noreferrer"&gt;Veertu Anka&lt;/a&gt;&lt;/strong&gt; - the most mature paid platform for macOS CI virtualization. Roughly what Tart does, plus a polished UI, enterprise support, more features. Licensing is per-host or per-VM, which adds up fast once you have a real fleet. Credible if you've got budget and want a vendor to call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.macstadium.com/" rel="noopener noreferrer"&gt;MacStadium&lt;/a&gt;&lt;/strong&gt; - managed-Mac hosting. You rent physical Macs in their data center, optionally with their orchestration layer (Orka). Good if you don't want to rack Macs yourself. Per-host, per-month pricing fits a steady-state fleet; it's a worse deal if your volume is spiky or you already own hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://aws.amazon.com/ec2/instance-types/mac/" rel="noopener noreferrer"&gt;AWS EC2 Mac instances&lt;/a&gt;&lt;/strong&gt; - see above. Worth it for very low-volume work, where avoiding ops outweighs the per-hour bill. The 24-hour minimum kills it for high-volume ephemeral CI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions managed macOS runners&lt;/strong&gt; - fine for OSS projects and small teams. Per-minute pricing gets painful at real volume. And the image is fixed - the moment you need anything past stock Xcode, you're stuck.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What sold me on Tart was the licensing more than the technology. &lt;a href="https://tart.run/licensing/" rel="noopener noreferrer"&gt;Tart's license&lt;/a&gt; costs nothing for personal and small-scale use, and the paid tier doesn't scale linearly with fleet size the way Anka's per-host model does. It's affordable at our build volume, and at one or two Macs you pay nothing at all.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://cirruslabs.org/" rel="noopener noreferrer"&gt;As of April 2026 - when Cirrus Labs joined OpenAI - the licensing got better still: Tart, Vetu and Orchard have been relicensed under a more permissive license and the commercial fees dropped entirely.&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The rest of the &lt;a href="https://cirruslabs.org/" rel="noopener noreferrer"&gt;Cirrus Labs&lt;/a&gt; toolchain holds up too. &lt;a href="https://github.com/cirruslabs/orchard" rel="noopener noreferrer"&gt;Orchard&lt;/a&gt; sits on top of Tart for fleet orchestration, and &lt;a href="https://github.com/cirruslabs/cirrus-cli" rel="noopener noreferrer"&gt;Cirrus CLI&lt;/a&gt; lets you run CI tasks locally against a Tart VM. Being able to reproduce a Jenkins job on my laptop has saved hours of debugging CI-only failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three ways I ended up provisioning macOS workers
&lt;/h2&gt;

&lt;p&gt;No single tool covered everything I needed, so I ended up with three provisioners for three shapes of Mac fleet. All three follow the same pattern as in Part 1 (a Jenkins job invokes the provisioner, the worker comes up from a Packer image, runs the build, and gets destroyed), but the layer underneath each one is different.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A - Tart, for Apple Silicon
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/cirruslabs/tart" rel="noopener noreferrer"&gt;Tart&lt;/a&gt; is a small open-source CLI from Cirrus Labs that wraps Apple's &lt;code&gt;Virtualization.framework&lt;/code&gt;. Hand it an OCI-compatible image (basically a tarball with the macOS VM disk) and it boots a VM on Apple Silicon in seconds. Images are reusable and layerable - it's the closest thing to "docker but for macOS VMs" I've come across.&lt;/p&gt;

&lt;p&gt;How it fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardware: a fleet of Mac minis (or Studios) we own or rent, sitting in a rack.&lt;/li&gt;
&lt;li&gt;Each Mac runs Tart on the host.&lt;/li&gt;
&lt;li&gt;A Jenkins job grabs an available host, &lt;code&gt;tart clone&lt;/code&gt;s from a known image tag, &lt;code&gt;tart run&lt;/code&gt;s it, registers the VM as a Jenkins agent, then &lt;code&gt;tart delete&lt;/code&gt;s when the build finishes.&lt;/li&gt;
&lt;li&gt;Packer's &lt;a href="https://github.com/cirruslabs/packer-plugin-tart" rel="noopener noreferrer"&gt;&lt;code&gt;tart-cli&lt;/code&gt; source&lt;/a&gt; builds the images. Xcode, Homebrew, signing tools, language runtimes - all baked in at image-build time.&lt;/li&gt;
&lt;/ul&gt;
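
&lt;p&gt;The clone / run / delete lifecycle above can be sketched as a few shell functions driven from the Jenkins side over SSH. This is a minimal sketch, not the actual pipeline: the function names, host, image reference, and VM name are hypothetical, agent registration is left out, and it assumes &lt;code&gt;tart&lt;/code&gt; is on the PATH of non-interactive SSH sessions on the Mac host.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

# Clone a VM from a pinned image and boot it headless.
# Arguments (all placeholders): Mac host, image reference, VM name.
# Call this directly, not inside $(...): the backgrounded ssh keeps
# its stdout open for as long as the VM runs.
start_vm() {
  local host="$1" image="$2" vm="$3"
  ssh "$host" tart clone "$image" "$vm"
  # tart run blocks while the VM is up, so -f sends ssh to the background.
  ssh -f "$host" tart run --no-graphics "$vm"
}

# Print the guest IP, waiting up to 120 seconds for the VM to boot.
vm_ip() {
  local host="$1" vm="$2"
  ssh "$host" tart ip --wait 120 "$vm"
}

# Stop and remove the clone so the next build starts from the image again.
destroy_vm() {
  local host="$1" vm="$2"
  ssh "$host" tart stop "$vm" || true
  ssh "$host" tart delete "$vm"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A job would call &lt;code&gt;start_vm&lt;/code&gt;, read the address with &lt;code&gt;vm_ip&lt;/code&gt; to connect the agent, and put &lt;code&gt;destroy_vm&lt;/code&gt; in a cleanup step that runs even when the build fails, so a broken build never leaves a clone behind.&lt;/p&gt;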

&lt;p&gt;The good: spin-up is fast - &lt;code&gt;tart clone&lt;/code&gt; to "agent connected" is under a minute. The image is a snapshot, so every build starts from byte-identical state.&lt;/p&gt;

&lt;p&gt;The not-so-good: you still need to own or rent the Macs. The "real Apple machines in a rack" problem doesn't go away; you just orchestrate around it. And the Tart ecosystem is young - expect to write glue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B - vSphere / VCSA, for the older Intel fleet
&lt;/h3&gt;

&lt;p&gt;Before Apple Silicon, the Mac fleet was a stack of Intel Mac minis hooked into a &lt;a href="https://www.vmware.com/products/vsphere.html" rel="noopener noreferrer"&gt;vSphere&lt;/a&gt; cluster. macOS VMs were managed as ESXi guests, the same way any other VMware VM would be.&lt;/p&gt;

&lt;p&gt;How it fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ESXi on each Mac mini (the only OS Apple's licensing lets you install on a Mac and still host macOS guests).&lt;/li&gt;
&lt;li&gt;A golden macOS VM template lives in vSphere, baked by Packer's &lt;a href="https://developer.hashicorp.com/packer/integrations/hashicorp/vsphere" rel="noopener noreferrer"&gt;vSphere ISO builder&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Jenkins runs Terraform with the &lt;a href="https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs" rel="noopener noreferrer"&gt;vSphere provider&lt;/a&gt; to clone the template (linked clones are faster), bring up the VM, register it as an agent, tear it down after.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup predates Apple Silicon and it still works, but it's the heaviest of the three. Linked clones help with spawn time, but it's still slower than Tart, and vSphere itself is a chunky thing to operate on top of that.&lt;/p&gt;

&lt;p&gt;It's the responsible-enterprise path. If you already run VMware in your org it slots in fine, but nobody starting fresh in 2026 would pick it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option C - Orchard, for pooled / remote Macs
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/cirruslabs/orchard" rel="noopener noreferrer"&gt;Orchard&lt;/a&gt; is also from Cirrus Labs, in the same family as Tart. Instead of orchestrating individual Mac hosts yourself, Orchard sits as a controller in front of a pool of workers and you request a VM through its API. It handles scheduling, queuing, and lifecycle for you.&lt;/p&gt;

&lt;p&gt;How it fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pool of Macs (yours, or from a managed provider like MacStadium), and you don't want individual Jenkins jobs picking physical machines.&lt;/li&gt;
&lt;li&gt;Jenkins calls Orchard's API for a VM with a given image and resource profile, runs the build, releases the VM.&lt;/li&gt;
&lt;li&gt;Capacity is the real constraint: with 20 builds queued and 5 Macs free, Orchard queues the rest and schedules them as hosts free up.&lt;/li&gt;
&lt;/ul&gt;
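
&lt;p&gt;From the job's point of view the request / release contract can be sketched like this. It assumes the &lt;code&gt;orchard&lt;/code&gt; CLI is installed and already pointed at the controller, and that the &lt;code&gt;create vm&lt;/code&gt; and &lt;code&gt;delete vm&lt;/code&gt; subcommands behave as in the Orchard README; check them against the version you run. The function names, image reference, and VM name are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

# Ask the controller for a VM built from a given image.
# No host is named: Orchard decides which worker runs it.
request_vm() {
  local image="$1" vm="$2"
  orchard create vm --image "$image" "$vm"
}

# Hand the VM back once the build is done.
release_vm() {
  local vm="$1"
  orchard delete vm "$vm"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The difference from raw Tart is what's missing: no host argument anywhere, which is the separation between provisioning and scheduling in practice.&lt;/p&gt;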

&lt;p&gt;The good: Jenkins doesn't need to know where the Mac physically lives, which is a clean separation between provisioning and scheduling.&lt;/p&gt;

&lt;p&gt;The not-so-good: it's yet another piece of infrastructure to run, which only pays off past a certain fleet size. With two or three Macs, raw Tart is simpler.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzjcy9yawd1fyh7xqr47.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzjcy9yawd1fyh7xqr47.png" alt="tart vs vSphere vs orchard" width="799" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What gets baked into the Packer image
&lt;/h2&gt;

&lt;p&gt;The principle stays the same: bake everything we can into the image so the build itself doesn't pay any setup time. The macOS image ends up being the heaviest in our fleet by a wide margin.&lt;/p&gt;

&lt;p&gt;What goes into a typical macOS worker image:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OS:&lt;/strong&gt; pinned macOS point release. Xcode compatibility is brittle - chasing "latest" is a bad idea.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Xcode:&lt;/strong&gt; pinned version + command line tools. Xcode alone is 30+ GB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Homebrew + packages:&lt;/strong&gt; every brewed tool the build needs, pre-installed and pre-warmed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language runtimes:&lt;/strong&gt; Node, Python, Ruby - pinned to match production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build tools:&lt;/strong&gt; CMake, Ninja, Conan, whatever the project actually uses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signing tools:&lt;/strong&gt; &lt;code&gt;codesign&lt;/code&gt;, &lt;code&gt;notarytool&lt;/code&gt;, &lt;code&gt;xcrun&lt;/code&gt; - ship with Xcode, but worth confirming they're on PATH.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-warmed caches:&lt;/strong&gt; Conan, npm, brew - anything that would otherwise download on the first build.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Packer template itself is short. Most of the work is in a chain of shell scripts that run after the base macOS install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="s2"&gt;"tart-cli"&lt;/span&gt; &lt;span class="s2"&gt;"macos"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vm_base_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ghcr.io/cirruslabs/macos-monterey-base:latest"&lt;/span&gt;
  &lt;span class="nx"&gt;vm_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"macos-ci-${var.image_version}"&lt;/span&gt;
  &lt;span class="nx"&gt;cpu_count&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
  &lt;span class="nx"&gt;memory_gb&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
  &lt;span class="nx"&gt;disk_size_gb&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
  &lt;span class="nx"&gt;ssh_username&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"admin"&lt;/span&gt;
  &lt;span class="nx"&gt;ssh_password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"admin"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;build&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;sources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"source.tart-cli.macos"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;provisioner&lt;/span&gt; &lt;span class="s2"&gt;"shell"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;scripts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;"scripts/post-install.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"scripts/brew-setup.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"scripts/xcode.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"scripts/nodejs-setup.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"scripts/deps.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"scripts/prewarm-caches.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full template is maybe 30 lines. The scripts it calls live in the same repo and go through the same PR review as the rest of the infra.&lt;/p&gt;

&lt;p&gt;A baked image is around 60-80 GB. Storage matters, but cache locality matters more. Pulling a fresh 70 GB image from the registry on every first boot would crater throughput across the fleet, so we pre-cache base images on each host out of band.&lt;/p&gt;
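
&lt;p&gt;One way to do that out-of-band pre-caching is a scheduled job that runs &lt;code&gt;tart pull&lt;/code&gt; on every host after a new image is published. A rough sketch, assuming SSH access to each host; the function name, host names, and image reference are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

# Pull one image onto every host so the first build after a release
# clones from the local Tart cache instead of the registry.
# Usage: precache_image IMAGE HOST [HOST...]
precache_image() {
  local image="$1"
  shift
  local host failed=0
  for host in "$@"; do
    echo "pre-caching ${image} on ${host}"
    if ! ssh "$host" tart pull "$image"; then
      # Keep going so one unreachable Mac doesn't block the others.
      echo "pull failed on ${host}"
      failed=1
    fi
  done
  return "$failed"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The function returns non-zero if any host failed, so the scheduled job goes red without skipping the remaining hosts.&lt;/p&gt;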

&lt;h2&gt;
  
  
  The signing-on-ephemeral-VMs problem
&lt;/h2&gt;

&lt;p&gt;Signing eats more first-setup time than anything else on this list, which is why it gets its own section.&lt;/p&gt;

&lt;p&gt;Apple's signing pipeline assumes a developer machine with a persistent keychain - you unlock it once and sign apps for the rest of the day. With ephemeral CI VMs that breaks: every VM is brand new, no keychain, no saved password.&lt;/p&gt;

&lt;p&gt;What we landed on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Developer ID cert + private key live in a secrets manager (AWS Secrets Manager, Vault, whatever). Never in the image, never in git.&lt;/li&gt;
&lt;li&gt;At job start, the pipeline pulls the cert + key and imports them into a temporary keychain it creates on the VM.&lt;/li&gt;
&lt;li&gt;That keychain has a random password just for this build. It dies with the VM.&lt;/li&gt;
&lt;li&gt;Notarization credentials (an app-specific password or an App Store Connect API key) come from the same secrets manager. They're passed straight to &lt;code&gt;notarytool&lt;/code&gt;, so no keychain is needed.&lt;/li&gt;
&lt;li&gt;Build ends, VM is destroyed, keychain goes with it. Same lifecycle as the worker.&lt;/li&gt;
&lt;/ol&gt;
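
&lt;p&gt;Step 2 could look roughly like this with AWS Secrets Manager. This is a sketch under assumptions, not our exact script: it assumes the &lt;code&gt;.p12&lt;/code&gt; bundle is stored base64-encoded in the secret's string value, the function name and secret ID are hypothetical, and the &lt;code&gt;base64&lt;/code&gt; flags are the macOS ones.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

# Fetch the base64-encoded .p12 from AWS Secrets Manager and decode it
# into a private temp directory. Prints the path of the decoded file.
fetch_signing_cert() {
  local secret_id="$1"
  local workdir
  # mktemp -d creates a directory only the current user can read.
  workdir="$(mktemp -d)"
  aws secretsmanager get-secret-value \
    --secret-id "$secret_id" \
    --query SecretString \
    --output text \
    | base64 --decode -o "$workdir/developer-id.p12"
  echo "$workdir/developer-id.p12"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The printed path is what a keychain import expects as its input file, for example &lt;code&gt;DEVELOPER_ID_CERT="$(fetch_signing_cert ci/macos/developer-id)"&lt;/code&gt;. The file lives on the VM's disk, so it disappears with the VM like everything else.&lt;/p&gt;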

&lt;p&gt;Trimmed-down version of the keychain-bootstrap script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;KEYCHAIN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ci-build.keychain"&lt;/span&gt;
&lt;span class="nv"&gt;KEYCHAIN_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 24&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Create a brand-new keychain just for this build.&lt;/span&gt;
security create-keychain &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
security set-keychain-settings &lt;span class="nt"&gt;-lut&lt;/span&gt; 21600 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
security unlock-keychain &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Add it to the search list so codesign can find it.&lt;/span&gt;
security list-keychains &lt;span class="nt"&gt;-d&lt;/span&gt; user &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;security list-keychains &lt;span class="nt"&gt;-d&lt;/span&gt; user | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'"'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Import the cert + key from the secrets-manager-provided files.&lt;/span&gt;
security import &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DEVELOPER_ID_CERT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CERT_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-T&lt;/span&gt; /usr/bin/codesign

&lt;span class="c"&gt;# Grant codesign permission to use the key without prompting.&lt;/span&gt;
security set-key-partition-list &lt;span class="nt"&gt;-S&lt;/span&gt; apple-tool:,apple: &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$KEYCHAIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Things that bit us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;set-key-partition-list&lt;/code&gt; is mandatory on modern macOS. Without it, &lt;code&gt;codesign&lt;/code&gt; pops a UI password prompt that nothing will ever answer on a headless VM, and the build hangs indefinitely.&lt;/li&gt;
&lt;li&gt;The keychain must be in the search list. A keychain that exists but isn't searched is invisible to &lt;code&gt;codesign&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Notarization is asynchronous. &lt;code&gt;notarytool submit --wait&lt;/code&gt; does block until it's done, but "done" can be several minutes away, so make sure your build timeouts account for it.&lt;/li&gt;
&lt;li&gt;Forgetting to staple fails silently. Notarization succeeds and the artifact ships, but users who are offline on first launch can still see a Gatekeeper warning, because without a stapled ticket Gatekeeper has to check with Apple online. Run &lt;code&gt;xcrun stapler staple &amp;lt;artifact&amp;gt;&lt;/code&gt; after notarization.&lt;/li&gt;
&lt;/ul&gt;
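
&lt;p&gt;The last two points fit in one small function. A minimal sketch, assuming an App Store Connect API key and a &lt;code&gt;.dmg&lt;/code&gt; or &lt;code&gt;.pkg&lt;/code&gt; artifact (a plain &lt;code&gt;.zip&lt;/code&gt; can be notarized but not stapled). The function name, the &lt;code&gt;NOTARY_*&lt;/code&gt; variable names, and the 45-minute timeout are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

# Submit, wait for Apple's verdict, then staple the ticket.
# NOTARY_KEY_PATH, NOTARY_KEY_ID and NOTARY_ISSUER_ID are placeholder
# names for an App Store Connect API key pulled from the secrets manager.
notarize_and_staple() {
  local artifact="$1"
  xcrun notarytool submit "$artifact" \
    --key "$NOTARY_KEY_PATH" \
    --key-id "$NOTARY_KEY_ID" \
    --issuer "$NOTARY_ISSUER_ID" \
    --wait --timeout 45m
  # Attach the ticket so Gatekeeper can verify the artifact offline.
  xcrun stapler staple "$artifact"
  # Fail the build if the ticket did not end up on the artifact.
  xcrun stapler validate "$artifact"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the timeout inside the function means a slow day at Apple shows up as a clear notarization failure rather than as a generic job timeout.&lt;/p&gt;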

&lt;p&gt;None of this is deep magic, but first-time setup tends to eat a week of debugging on most teams. Budget for that, and get the keychain bootstrap script working before you write the rest of the pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs - which one should you pick?
&lt;/h2&gt;

&lt;p&gt;Probably more than one, depending on what fleet you've inherited. But if I were starting from scratch in 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If you have...&lt;/th&gt;
&lt;th&gt;Pick&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A small pool of Apple Silicon Macs you own&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Tart&lt;/strong&gt;, directly. Free at this scale, nothing extra to run.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A larger fleet of Apple Silicon Macs, mixed ownership / remote&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Tart + Orchard&lt;/strong&gt; - same licensing, proper scheduling on top.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;An existing vSphere installation and Intel Macs&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;vSphere / VCSA&lt;/strong&gt;. Don't rebuild what works.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need enterprise support, budget isn't tight&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Veertu Anka&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Don't want to rack Macs, want a managed fleet&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MacStadium&lt;/strong&gt; (with their orchestration, or your own).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No physical Macs, very low volume&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;EC2 Mac&lt;/strong&gt;. The 24-hour minimum stings, but sometimes the operational simplicity wins.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open-source project, low volume&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;GitHub-hosted macOS runners&lt;/strong&gt;. Free for OSS, nothing to host.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No physical Macs, high volume&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MacStadium&lt;/strong&gt; or similar. EC2 Mac economics break at this scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The approach I'd push back on is "let's just use Mac minis under someone's desk". It works for a single team, but the moment every iOS release across the company depends on it, you've got a bottleneck nobody owns.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm still figuring out
&lt;/h2&gt;

&lt;p&gt;A few open problems I haven't fully solved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image freshness. Xcode updates land every few weeks, and keeping the Packer image current without breaking everyone's build is constant work. We rebuild on a schedule and pin each job to a specific image version. The rebuild itself is a 90-minute job.&lt;/li&gt;
&lt;li&gt;Cost. Mac hardware is expensive whether you own it or rent it. Above a certain build volume the math works; below that, per-build cost stings.&lt;/li&gt;
&lt;li&gt;Apple Silicon transition for older code. Some of our C++ code still has Intel-only deps that haven't been ported. Those builds run on the vSphere/Intel fleet, which is shrinking. "Rewrite all the legacy build deps for arm64" is its own multi-quarter project.&lt;/li&gt;
&lt;li&gt;Notarization queue times. Apple's notarization service has bad days where submissions take 20+ minutes. Nothing to do from our side - macOS builds just have a longer tail than everything else.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;macOS CI doesn't get clean. There's no "just run a pod in EKS" equivalent: you'll have physical hardware in the loop, probably more than one hypervisor, and a signing problem that doesn't exist on any other platform. What's worked for us is treating macOS the way we treat everything else: ephemeral workers from a baked image, triggered by a job in git, with secrets pulled from a vault at runtime. Once the contract matches what Linux and Windows do, macOS stops being the part of CI that nobody wants to own.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix - tools mentioned in this post
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cirrus Labs toolchain&lt;/strong&gt; (the one I ended up on)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/cirruslabs/tart" rel="noopener noreferrer"&gt;Tart&lt;/a&gt; - macOS VMs on Apple Silicon via &lt;code&gt;Virtualization.framework&lt;/code&gt;. Free for small-scale; &lt;a href="https://tart.run/licensing/" rel="noopener noreferrer"&gt;licensing&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cirruslabs/orchard" rel="noopener noreferrer"&gt;Orchard&lt;/a&gt; - controller for pooled Tart hosts.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cirruslabs/cirrus-cli" rel="noopener noreferrer"&gt;Cirrus CLI&lt;/a&gt; - run CI tasks locally against a Tart VM using a &lt;code&gt;.cirrus.yml&lt;/code&gt; config.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cirruslabs/packer-plugin-tart" rel="noopener noreferrer"&gt;Packer Tart plugin&lt;/a&gt; - Packer builder for Tart images.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Commercial alternatives I evaluated&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://veertu.com/anka-build/" rel="noopener noreferrer"&gt;Veertu Anka&lt;/a&gt; - paid platform for macOS CI virtualization, polished, enterprise support.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.macstadium.com/" rel="noopener noreferrer"&gt;MacStadium&lt;/a&gt; - managed Mac hosting + optional &lt;a href="https://www.macstadium.com/orka" rel="noopener noreferrer"&gt;Orka&lt;/a&gt; orchestration.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/ec2/instance-types/mac/" rel="noopener noreferrer"&gt;AWS EC2 Mac instances&lt;/a&gt; - real Apple hardware in AWS, 24-hour minimum allocation.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners" rel="noopener noreferrer"&gt;GitHub-hosted macOS runners&lt;/a&gt; - fine for OSS / small scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Other&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.vmware.com/products/vsphere.html" rel="noopener noreferrer"&gt;vSphere&lt;/a&gt; and the &lt;a href="https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs" rel="noopener noreferrer"&gt;Terraform vSphere provider&lt;/a&gt; - for the older Intel fleet.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.packer.io/" rel="noopener noreferrer"&gt;HashiCorp Packer&lt;/a&gt; - bakes all the worker images.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.apple.com/documentation/security/notarizing_macos_software_before_distribution/customizing_the_notarization_workflow" rel="noopener noreferrer"&gt;Apple's notarytool&lt;/a&gt; - the modern notarization CLI.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This is Part 2 of My CI/CD Odyssey. Follow me here on dev.to if you want to get pinged when Part 3 drops. And if you're doing macOS CI differently, I'd love to hear about it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>macos</category>
      <category>cicd</category>
      <category>devops</category>
      <category>jenkins</category>
    </item>
    <item>
      <title>Jenkins as a Code, or how I stopped clicking around in the UI</title>
      <dc:creator>Khachatur Ashotyan</dc:creator>
      <pubDate>Mon, 18 May 2026 16:35:13 +0000</pubDate>
      <link>https://forem.com/lanycrost/jenkins-as-a-code-or-how-i-stopped-clicking-around-in-the-ui-1nko</link>
      <guid>https://forem.com/lanycrost/jenkins-as-a-code-or-how-i-stopped-clicking-around-in-the-ui-1nko</guid>
      <description>&lt;p&gt;I've been running Jenkins for years now. Different companies, different team sizes, but the same story keeps repeating, and at some point I couldn't take it anymore. So I decided to write some of it down. This is Part 1 of what I'm calling &lt;strong&gt;My CI/CD Odyssey&lt;/strong&gt; - ideas I tried, things that blew up in my face, and stuff I still use today.&lt;/p&gt;

&lt;p&gt;Later chapters get into the painful stuff: building macOS workers without losing your mind, &lt;strong&gt;spot instances as GitHub Actions runners&lt;/strong&gt; to cut costs, plus a few other rabbit holes. First, the beginning. That's where most of the pain came from.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "before" picture, and why it hurts
&lt;/h2&gt;

&lt;p&gt;Anyone who's worked with Jenkins for a while knows this scene. Somebody opens the Jenkins UI, clicks "New Item", picks a freestyle or pipeline job, fills in twenty-something fields, scrolls past a wall of plugin options, hits Save. A month later somebody else has to figure out why a job behaves differently in &lt;code&gt;stage&lt;/code&gt; than in &lt;code&gt;prod&lt;/code&gt;, and the answer is "because Arthur clicked a different checkbox in February and nobody remembers".&lt;/p&gt;

&lt;p&gt;That was my world for a long time. Multi-tier environments (&lt;code&gt;stage&lt;/code&gt;, &lt;code&gt;prod&lt;/code&gt;, sometimes more), and on top of that, sometimes more than one Jenkins instance per tier. Each one configured by hand: plugins installed manually, pipelines copy-pasted from one Jenkins to another and edited in place, credentials added by hand, workers attached one at a time. Then one day you wake up and realize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nobody remembers what plugins are installed where.&lt;/li&gt;
&lt;li&gt;The "stage" Jenkins doesn't match prod anymore. You find out when a pipeline breaks in prod.&lt;/li&gt;
&lt;li&gt;A Friday afternoon plugin update kills a build. Rolling it back is a human clicking buttons under stress.&lt;/li&gt;
&lt;li&gt;A new team member joins, and you burn three days explaining tribal knowledge that should live in a repo.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one was what finally pushed me. Tribal knowledge is fine in a team of two sharing a desk, but past that it costs weeks of onboarding for every new hire.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea: treat Jenkins like any other piece of code
&lt;/h2&gt;

&lt;p&gt;So I started reading. Jenkins is infrastructure. We already do infrastructure-as-code for everything else - Terraform for cloud, Helm for Kubernetes, Ansible for hosts - so why is Jenkins the one piece still managed by hand? Controller, jobs, credentials wiring, workers - all of it should come from a git repo.&lt;/p&gt;

&lt;p&gt;What I wrote down for myself:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I want a Jenkins where I can throw away the VM, the cluster, the config, run a pipeline, and ten minutes later have the same Jenkins back. And &lt;code&gt;stage&lt;/code&gt; should be code-to-code identical to &lt;code&gt;prod&lt;/code&gt;, so when I test a plugin upgrade in stage I know how it'll behave in prod.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Anyone who's been burned by a "but it worked in stage" deploy knows why this matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  The building blocks
&lt;/h2&gt;

&lt;p&gt;When I started designing this, it broke into a handful of moving pieces. None of them are revolutionary on their own, but wiring them together is where the value lives.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1tg081zv8f8le07bs74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1tg081zv8f8le07bs74.png" alt="A three-column architecture diagram: git repo on the left, Jenkins controller and operator in the middle, ephemeral Linux pods + cloud VMs + macOS VMs on the right." width="799" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. JCasC - Jenkins Configuration as Code
&lt;/h3&gt;

&lt;p&gt;This is the foundation. &lt;a href="https://plugins.jenkins.io/configuration-as-code/" rel="noopener noreferrer"&gt;&lt;strong&gt;JCasC&lt;/strong&gt;&lt;/a&gt; is a Jenkins plugin that defines the controller config in YAML - system settings, security realm, authorization strategy, clouds, credentials, tools, global libraries. The controller reads the YAML on boot and configures itself.&lt;/p&gt;

&lt;p&gt;The first time I rebuilt a controller from a YAML file, I stopped clicking through the UI for good. The controller only knows about things in the YAML, so anything else might as well not exist.&lt;/p&gt;

&lt;p&gt;Minimal example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jenkins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;systemMessage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Managed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;by&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;JCasC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;do&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;edit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;UI"&lt;/span&gt;
  &lt;span class="na"&gt;numExecutors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EXCLUSIVE&lt;/span&gt;
  &lt;span class="na"&gt;securityRealm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;clientID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${GITHUB_CLIENT_ID}&lt;/span&gt;
      &lt;span class="na"&gt;clientSecret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${GITHUB_CLIENT_SECRET}&lt;/span&gt;
  &lt;span class="na"&gt;clouds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kubernetes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eks"&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jenkins"&lt;/span&gt;
        &lt;span class="na"&gt;jenkinsUrl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://jenkins.jenkins.svc.cluster.local:8080"&lt;/span&gt;
&lt;span class="na"&gt;unclassified&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;globalLibraries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;libraries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ci-libs"&lt;/span&gt;
        &lt;span class="na"&gt;defaultVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt;
        &lt;span class="na"&gt;retriever&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;modernSCM&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;scm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;git&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;remote&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://github.com/&amp;lt;org&amp;gt;/ci-libs.git"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;About two dozen lines of YAML, and that's most of the controller.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Job DSL - jobs from a git repo
&lt;/h3&gt;

&lt;p&gt;JCasC handles the controller but not the jobs. For that I used the &lt;a href="https://plugins.jenkins.io/job-dsl/" rel="noopener noreferrer"&gt;&lt;strong&gt;Job DSL plugin&lt;/strong&gt;&lt;/a&gt;. Jobs live as Groovy files in a git repo, and a small "seeder" job in Jenkins polls the repo and rebuilds jobs from the DSL files on each run. With the seeder's removed-job action set to delete, removing a job from git removes it from Jenkins on the next poll; changing a parameter in git rolls forward the same way.&lt;/p&gt;

&lt;p&gt;The Jenkins UI ends up effectively read-only from a configuration perspective. Any edit made to a job in the UI is overwritten by the next seeder run, which is by design.&lt;/p&gt;

&lt;p&gt;The available DSL methods are listed in the &lt;a href="https://jenkinsci.github.io/job-dsl-plugin/" rel="noopener noreferrer"&gt;Job DSL API reference&lt;/a&gt;.&lt;/p&gt;
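
&lt;p&gt;A minimal sketch of how a seeder like this could be declared, assuming the Job DSL plugin is installed so that JCasC accepts a top-level &lt;code&gt;jobs&lt;/code&gt; block. The job name, the repo URL, the &lt;code&gt;jobs/**/*.groovy&lt;/code&gt; layout, and the polling schedule are placeholders for illustration, not the real setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  - script: |
      // Seeder job: polls the jobs repo and regenerates jobs from the DSL files.
      job('seeder') {
        scm {
          git {
            // Hypothetical repo that holds the Job DSL files.
            remote { url('https://github.com/&amp;lt;org&amp;gt;/ci-jobs.git') }
            branch('main')
          }
        }
        triggers {
          // Poll the repo roughly every five minutes (assumed schedule).
          scm('H/5 * * * *')
        }
        steps {
          dsl {
            // Hypothetical layout: every Groovy file under jobs/ defines jobs.
            external('jobs/**/*.groovy')
            // Delete Jenkins jobs whose DSL definition no longer exists in git.
            removeAction('DELETE')
          }
        }
      }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;removeAction('DELETE')&lt;/code&gt; line is what makes git the source of truth for deletions; without it, jobs removed from the repo would linger in Jenkins.&lt;/p&gt;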

&lt;h3&gt;
  
  
  3. Helm + Kubernetes for the controller
&lt;/h3&gt;

&lt;p&gt;I run the Jenkins controller in &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;. The deployment uses the &lt;a href="https://github.com/jenkinsci/helm-charts" rel="noopener noreferrer"&gt;official Helm chart&lt;/a&gt;, with a persistent volume for the home directory and a sidecar that injects JCasC config from a &lt;code&gt;ConfigMap&lt;/code&gt;. Upgrading Jenkins is a chart version bump; rolling back is the same chart at the previous version. The plugin list sits in &lt;code&gt;values.yaml&lt;/code&gt;, version-pinned and reviewed in a PR like any other code change.&lt;/p&gt;

&lt;p&gt;This is when plugin upgrades stopped feeling like Friday-night events. Each upgrade goes through &lt;code&gt;stage&lt;/code&gt; in a PR and gets the same review as application code.&lt;/p&gt;
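
&lt;p&gt;As a rough illustration, a &lt;code&gt;values.yaml&lt;/code&gt; fragment for the official chart might look like the sketch below. The plugin selection, the &lt;code&gt;&amp;lt;pinned-version&amp;gt;&lt;/code&gt; placeholders, the &lt;code&gt;welcome-message&lt;/code&gt; key, and the volume size are assumptions for the example; it is a fragment, not a complete values file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;controller:
  # Plugins are pinned as name:version so an upgrade is a reviewable diff.
  # Replace each placeholder with the exact version you have tested.
  installPlugins:
    - kubernetes:&amp;lt;pinned-version&amp;gt;
    - configuration-as-code:&amp;lt;pinned-version&amp;gt;
    - job-dsl:&amp;lt;pinned-version&amp;gt;
    - workflow-aggregator:&amp;lt;pinned-version&amp;gt;
  JCasC:
    configScripts:
      # Each key becomes a JCasC file that the chart's sidecar loads.
      welcome-message: |
        jenkins:
          systemMessage: "Managed by JCasC - do not edit in the UI"
persistence:
  # Persistent volume for the Jenkins home directory.
  enabled: true
  size: 50Gi  # hypothetical size
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;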

&lt;blockquote&gt;
&lt;p&gt;Side note: if you don't want to deal with Helm, the community maintains a &lt;a href="https://github.com/jenkinsci/kubernetes-operator" rel="noopener noreferrer"&gt;&lt;strong&gt;Jenkins Kubernetes Operator&lt;/strong&gt;&lt;/a&gt; that's CRD-first. I went with Helm because the upgrade story is simpler, but the operator is fine if you're already deep in operators.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  4. Packer for worker images
&lt;/h3&gt;

&lt;p&gt;Then there are the workers, the machines that actually run builds. I went all-in on &lt;a href="https://www.packer.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Packer&lt;/strong&gt;&lt;/a&gt;. Every worker image is baked from a Packer template in git, with the base OS, language runtimes, SDKs, and build tools pre-installed. Each image has a version, and the worker config pins to a specific one.&lt;/p&gt;

&lt;p&gt;Before Packer, every worker was a slightly different snowflake, hand-installed and slowly drifting. After Packer, every worker booted from &lt;code&gt;v1.2.3&lt;/code&gt; is byte-for-byte identical to every other one. When a dependency upgrade breaks something, you know which image introduced it, and pinning back to the previous version is a one-line PR.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Ephemeral workers - born, used, destroyed
&lt;/h3&gt;

&lt;p&gt;The ephemeral worker piece is what ties everything together, and it's the part I'm proudest of. Workers in this setup are strictly ephemeral: a new worker per build, never a long-lived agent we reboot once a week. A pipeline asks Jenkins for a worker; a dedicated job spins one up from a known Packer image, the build runs on it, and the worker gets destroyed when the build finishes. Every build starts on a fresh machine.&lt;/p&gt;

&lt;p&gt;The spin-up mechanism varies by platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux builds: the &lt;a href="https://plugins.jenkins.io/kubernetes/" rel="noopener noreferrer"&gt;Jenkins Kubernetes plugin&lt;/a&gt; schedules a pod in EKS from a container image we baked. Build finishes, pod is deleted. Lifecycle is seconds to minutes.&lt;/li&gt;
&lt;li&gt;AWS EC2 / Azure VMs (Linux and Windows): a dedicated job runs &lt;a href="https://developer.hashicorp.com/terraform" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt; to provision and de-provision instances from Packer templates.&lt;/li&gt;
&lt;li&gt;macOS VMs: the same idea, but macOS virtualization is its own ecosystem. Each build boots a fresh macOS VM from a Packer-baked image (Tart on Apple Silicon hosts, vSphere for the older fleet, or &lt;a href="https://github.com/cirruslabs/orchard" rel="noopener noreferrer"&gt;Orchard&lt;/a&gt; for pooled remote Macs), runs on it, and tears the VM down afterwards. macOS deserves its own post (Part 2), but the contract is the same as on the other platforms: one VM per build.&lt;/li&gt;
&lt;/ul&gt;
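
&lt;p&gt;For the Linux case, the pod that the Kubernetes plugin schedules can be described with an ordinary pod spec, passed to the plugin through the &lt;code&gt;yaml&lt;/code&gt; option of a pod template. The sketch below is illustrative only: the registry, image name, container name, and resource requests are assumptions, and the plugin adds its own agent container alongside the one defined here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
spec:
  containers:
    # Build container from a baked, version-pinned image (hypothetical name).
    - name: build
      image: registry.example.com/ci/linux-build:v1.2.3
      # Keep the container alive so pipeline steps can exec into it.
      command: ["sleep"]
      args: ["infinity"]
      resources:
        requests:
          cpu: "2"      # assumed sizing
          memory: 4Gi   # assumed sizing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pinning the image by tag in the pod spec is what ties a build back to one specific baked image; bumping the tag is the whole rollout.&lt;/p&gt;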

&lt;p&gt;Every build starts from byte-identical state. Not "mostly the same", not "the same except for &lt;code&gt;~/.cache&lt;/code&gt;". If the image tag is &lt;code&gt;v1.2.3&lt;/code&gt;, every build on it starts from the exact filesystem snapshot Packer produced. There's no operator history sitting on the disk.&lt;/p&gt;

&lt;p&gt;That eliminates a whole class of bugs: leftover state on the agent, the weird &lt;code&gt;~/.cache&lt;/code&gt; nobody cleaned up, a disk full of artifacts from three weeks ago, the Friday-only flake from a leak that's been growing since Monday. None of it survives, because the worker doesn't live long enough to accumulate it.&lt;/p&gt;

&lt;p&gt;It also makes "build is non-reproducible" investigations faster. If two builds against the same commit produce different artifacts, the cause is almost never the worker, since both ran on a fresh one.&lt;/p&gt;

&lt;p&gt;Security gets simpler too. Secrets pulled onto a worker disappear with the worker, so no long-lived agent holds old tokens. If a credential ever leaks into a build environment, the worker is gone within minutes and the copy on its disk goes with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65jvcvpkb9y00w4vxxdo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65jvcvpkb9y00w4vxxdo.png" alt="One build, one worker, three platforms, same contract" width="800" height="742"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Terraform / Terragrunt for everything else
&lt;/h3&gt;

&lt;p&gt;Everything that isn't Jenkins itself (VPCs, IAM, secret stores, the EKS cluster, image galleries) lives in &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;, wrapped with &lt;a href="https://terragrunt.gruntwork.io/" rel="noopener noreferrer"&gt;Terragrunt&lt;/a&gt; so the same modules get reused across &lt;code&gt;stage&lt;/code&gt; and &lt;code&gt;prod&lt;/code&gt; with different inputs. That's why &lt;code&gt;stage&lt;/code&gt; ends up code-to-code identical to &lt;code&gt;prod&lt;/code&gt;: the same modules at the same versions, just with different variables.&lt;/p&gt;

&lt;p&gt;To check how prod will behave under a change, you run the same Terraform with &lt;code&gt;ENV=stage&lt;/code&gt; instead of &lt;code&gt;ENV=prod&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it all clicks together
&lt;/h2&gt;

&lt;p&gt;The flow ends up looking like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Somebody opens a PR - new job, plugin bump, JCasC tweak, new Packer image, whatever.&lt;/li&gt;
&lt;li&gt;CI validates: YAML lint, Groovy compile checks, &lt;code&gt;terraform plan&lt;/code&gt;, Packer build for any changed images.&lt;/li&gt;
&lt;li&gt;PR gets reviewed and merged.&lt;/li&gt;
&lt;li&gt;On merge, &lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt; applies infra via Terraform. The Jenkins seeder picks up new DSL files on its next poll.&lt;/li&gt;
&lt;li&gt;The next build that needs a worker pulls the new image.&lt;/li&gt;
&lt;/ol&gt;
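
&lt;p&gt;Step 4 could be wired up with a workflow along these lines. This is a sketch under stated assumptions: the workflow name, the &lt;code&gt;infra/stage&lt;/code&gt; directory, and the &lt;code&gt;main&lt;/code&gt; branch are placeholders, it calls plain &lt;code&gt;terraform&lt;/code&gt; rather than going through Terragrunt, and cloud credentials are left out entirely.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;name: apply-infra  # hypothetical workflow name

on:
  push:
    branches: [main]  # runs once a PR is merged into main

jobs:
  apply:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/stage  # hypothetical directory layout
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Cloud credentials (for example via OIDC) are omitted from this sketch.
      - run: terraform init -input=false
      - run: terraform apply -auto-approve -input=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;terraform plan&lt;/code&gt; from step 2 would live in a separate workflow triggered on pull requests, so reviewers see the diff before anything is applied.&lt;/p&gt;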

&lt;p&gt;The Jenkins UI becomes a view onto what the repo says should be running, while the repo itself holds the truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this fixed for me
&lt;/h2&gt;

&lt;p&gt;What changed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8w99edkczsvh6cuxit8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8w99edkczsvh6cuxit8.png" alt="Before / After - Jenkins as Code" width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We stopped seeing "works on stage, breaks on prod" bugs. Because stage runs the same code as prod with the same modules at the same versions, when it works in stage it works in prod (modulo data).&lt;/li&gt;
&lt;li&gt;Plugin upgrades aren't Friday-night events anymore. A bad one gets reverted like any other change.&lt;/li&gt;
&lt;li&gt;Onboarding got much faster. New engineers read the repo instead of getting a Jenkins UI tour and a Slack thread of secrets.&lt;/li&gt;
&lt;li&gt;Disaster recovery actually works. If I lost the controller, the EKS cluster, or even the whole account, the repo alone would be enough to rebuild it.&lt;/li&gt;
&lt;li&gt;We get an audit trail without writing one. Every pipeline change is a git commit with an author, a timestamp, and a PR description.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm still figuring out
&lt;/h2&gt;

&lt;p&gt;This isn't a finished story. A few things still keep me up at night:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;macOS workers are the hardest piece. AWS does offer Mac instances, but the 24-hour minimum allocation and bare-metal model make them nothing like spinning up a Linux VM, and the hypervisor, licensing, and hardware constraints push the whole macOS story onto its own track. Part 2 covers it: &lt;a href="https://github.com/cirruslabs/tart" rel="noopener noreferrer"&gt;Tart&lt;/a&gt;, virtualization on Apple Silicon, the trade-offs between self-hosted and cloud-mac providers, and the signing and notarization pain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub Actions costs add up at scale. You can offload heavier workloads to spot-instance runners cheaply, though spot brings its own trade-offs. Part 3 walks through that.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;If you're still managing Jenkins through the UI, it's rarely about laziness. The cost shows up in places that don't make it onto any dashboard: the engineer who leaves and takes the only working configuration in their head, the 2am plugin-upgrade breakage, the customer-facing deploy that fails because stage and prod had quietly drifted apart for six months. Jenkins as Code doesn't make those costs disappear, but it surfaces them as PRs I can see and review, which for me has been worth the work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix - tools and plugins I leaned on
&lt;/h2&gt;

&lt;p&gt;For anyone who wants to skip straight to the implementations, here's what's wired up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jenkins plugins&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://plugins.jenkins.io/configuration-as-code/" rel="noopener noreferrer"&gt;Configuration as Code (JCasC)&lt;/a&gt;: the controller config in YAML.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://plugins.jenkins.io/job-dsl/" rel="noopener noreferrer"&gt;Job DSL&lt;/a&gt;: jobs defined in Groovy in a git repo.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://plugins.jenkins.io/kubernetes/" rel="noopener noreferrer"&gt;Kubernetes plugin&lt;/a&gt;: ephemeral pod agents in EKS.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://plugins.jenkins.io/workflow-cps-global-lib/" rel="noopener noreferrer"&gt;Pipeline: Shared Groovy Libraries&lt;/a&gt;: the global libraries that hold reusable pipeline code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/jenkinsci/helm-charts" rel="noopener noreferrer"&gt;Jenkins official Helm chart&lt;/a&gt;: what I use to deploy the controller.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/jenkinsci/kubernetes-operator" rel="noopener noreferrer"&gt;Jenkins Kubernetes Operator&lt;/a&gt;: the CRD-based alternative, if you prefer operators over Helm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Image building&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.packer.io/" rel="noopener noreferrer"&gt;HashiCorp Packer&lt;/a&gt;: bakes all the worker images (Linux, Windows, macOS).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;: everything outside Jenkins (VPCs, IAM, secrets, EKS, image galleries).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://terragrunt.gruntwork.io/" rel="noopener noreferrer"&gt;Terragrunt&lt;/a&gt;: keeps the same modules DRY across &lt;code&gt;stage&lt;/code&gt; and &lt;code&gt;prod&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; / &lt;a href="https://aws.amazon.com/eks/" rel="noopener noreferrer"&gt;Amazon EKS&lt;/a&gt;: where the Jenkins controller lives.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://helm.sh/" rel="noopener noreferrer"&gt;Helm&lt;/a&gt;: package manager for the Kubernetes side.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;: applies Terraform on merge.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Coming up in later parts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/cirruslabs/tart" rel="noopener noreferrer"&gt;Tart&lt;/a&gt;: macOS VMs on Apple Silicon (Part 2).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cirruslabs/orchard" rel="noopener noreferrer"&gt;Orchard&lt;/a&gt;: Tart cluster orchestration for macOS fleets (Part 2).&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This is Part 1 of My CI/CD Odyssey. Follow me here on dev.to if you want to be pinged when Part 2 drops. And if you're doing Jenkins as Code differently, I'd love to hear about it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>jenkins</category>
      <category>devops</category>
      <category>cicd</category>
      <category>gitops</category>
    </item>
  </channel>
</rss>
