Forem: Khachatur Ashotyan

Spot instances as GitHub Actions runners

Khachatur Ashotyan — Sat, 23 May 2026 09:04:32 +0000

Part 1 covered Jenkins as Code with ephemeral workers. Part 2 covered macOS workers. This one is about moving a chunk of the CI workload off Jenkins entirely, onto GitHub Actions, with EC2 spot instances as the runner fleet.

This isn't a "Jenkins is dead, use GitHub Actions" post. Jenkins still handles the heavy builds: macOS, Windows, anything that runs for hours or needs custom orchestration. GitHub Actions runs alongside it for a narrower class of workload where it fits better.

What follows is the self-hosted spot runner pattern: how to point GitHub Actions at your own ephemeral EC2 fleet, and the things that bite once you do.

Why bother

GitHub's managed runners are fine for small teams. There are a few reasons to switch to self-hosted ones:

1. Cost at volume. GitHub bills its managed Linux runners at $0.008/minute ($0.48/hour). Fine for a few builds a day. Last month we ran 80,887 runner-minutes across 29,347 jobs (~1,350 hours). On managed runners that would have been ~$647. Our actual EC2 bill for the runner fleet was ~$160: $130 on spot plus $28 of EC2-Other (EBS, ENIs, data transfer). That's roughly 4x cheaper, and the absolute saving grows with every extra runner-minute.

2. Instance shape. Managed runners come in fixed sizes. Builds that need 16 vCPUs and 64 GB of RAM, a GPU, or arm64 either pay for the largest tier or don't fit at all. Self-hosted lets you pick whatever EC2 instance type the build actually needs.

3. Network access. Builds that talk to private resources (internal artifact registries, RDS, anything behind a VPC) are awkward on managed runners. Self-hosted runners live inside your VPC, so they hit those resources directly without proxies or tunnels.

Cost is what got us to try it. The VPC access and instance flexibility came along for free.
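To make the cost comparison easy to re-run with your own numbers, here is a minimal Python sketch of the arithmetic. The figures are the ones quoted above; the function and variable names are made up for illustration.

```python
# Cost comparison using the figures quoted in this post.
# All names are illustrative; nothing here comes from a billing API.

MANAGED_RATE_PER_MIN = 0.008   # USD per minute, managed Linux runner
RUNNER_MINUTES = 80_887        # runner-minutes last month
SPOT_COST = 130.0              # USD, spot instances
EC2_OTHER_COST = 28.0          # USD, EBS, ENIs, data transfer


def managed_cost(minutes: int, rate_per_min: float = MANAGED_RATE_PER_MIN) -> float:
    """What the same runner-minutes would cost on managed runners."""
    return minutes * rate_per_min


def savings_ratio(managed: float, self_hosted: float) -> float:
    """How many times cheaper the self-hosted fleet is."""
    return managed / self_hosted


managed = managed_cost(RUNNER_MINUTES)
self_hosted = SPOT_COST + EC2_OTHER_COST
ratio = savings_ratio(managed, self_hosted)

print(f"managed:     ${managed:,.2f}")
print(f"self-hosted: ${self_hosted:,.2f}")
print(f"ratio:       {ratio:.1f}x")
print(f"effective self-hosted rate: ${self_hosted / RUNNER_MINUTES:.4f}/minute")
```

The ratio is fixed by the two per-minute rates, so it is the dollar difference, not the multiplier, that grows with volume.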

What a "self-hosted spot runner" is

A self-hosted GitHub Actions runner is a small agent. It registers with a GitHub repo or org, polls for jobs matching its labels, runs them, and reports results back. Anything that can run the binary works as a host (bare metal, VM, container, whatever).

It can be either:

Persistent: registered once, sits there, picks up jobs as they come.
Ephemeral: single-use registration token, picks up one job, de-registers, shuts down.

We went with ephemeral. Long-lived self-hosted runners combine the operational burden of managing a host with the build-pollution risk of a shared agent and a security blast radius that never closes.

Every GitHub Actions job gets its own EC2 spot instance, freshly launched from a Packer-baked AMI. The job runs, then the instance is terminated. It's the same one-build-per-worker lifecycle as our Jenkins workers, on a different control plane.

The architecture

No single service does this end-to-end. Either you wire it together yourself, or you reach for one of the open-source modules that already does. I went with terraform-aws-github-runner. It's the most mature module in this space and fits cleanly into a Terraform-managed AWS account. (If you remember the project under its old name, philips-labs/terraform-aws-github-runner, it's the same code, moved to the github-aws-runners org.)

When someone opens a PR that triggers a workflow:

GitHub fires a workflow_job webhook the moment a job is queued.
API Gateway plus a webhook Lambda check the HMAC, filter for relevant runner labels, and push a message onto an SQS queue.
A scale-up Lambda drains that queue. For each queued job it launches an EC2 spot instance from a specific AMI, with user-data carrying a single-use registration token.
The instance comes up. Cloud-init runs; the runner binary registers itself with GitHub and starts polling for jobs.
The runner comes up with matching labels, so GitHub schedules the queued job onto it.
The workflow runs whatever your YAML says: checkout, build, test, push artifacts.
The runner is registered as --ephemeral, so the agent exits after one job. A scheduled scale-down Lambda cleans up anything left over.

The Lambda code, the SQS queues, and the IAM glue all live inside the module. You don't write any of that yourself. What you do write is the Terraform configuration that declares which runners exist, which AMI they use, and which instance types are eligible.
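For a feel of what the webhook step does, here is a minimal Python sketch of the two checks: verifying GitHub's X-Hub-Signature-256 header and filtering on the job's labels. This is not the module's code (the module ships its own Lambda); the function names, the secret and the payload are hypothetical, and the label filter is a simplification.

```python
import hashlib
import hmac
import json


def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header with our own HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest is designed to resist timing analysis.
    return hmac.compare_digest(expected, signature_header)


def should_enqueue(event: dict, tier_labels: set[str]) -> bool:
    """Accept only newly queued jobs whose requested labels this tier can serve."""
    if event.get("action") != "queued":
        return False
    wanted = {label.lower() for label in event["workflow_job"]["labels"]}
    return wanted <= (tier_labels | {"self-hosted"})


# Hypothetical secret and payload, just to exercise the two checks.
secret = b"not-a-real-secret"
body = json.dumps({
    "action": "queued",
    "workflow_job": {"labels": ["self-hosted", "linux", "x64", "small"]},
}).encode()
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

if signature_is_valid(secret, body, header) and should_enqueue(
    json.loads(body), {"linux", "x64", "small"}
):
    print("would push this job onto the SQS queue")
```

The signature has to be computed over the raw request bytes, not a re-serialized copy of the JSON, or the digests won't match.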

Multi-tier runners

This setup gets useful in practice when you split runners into tiers distinguished by labels. Workflows pick a tier by setting runs-on: in the workflow YAML.

We run three:

Small (t3.medium / m5.large): linters, formatters, doc builds, anything that doesn't really stress a CPU. Spawns fast, and spot capacity at this size has never been a problem for us.
Large (m5.xlarge / c5.xlarge): the typical build-and-test workflow that wants some CPU but doesn't hammer it.
Compute-intensive (c7a.4xlarge / c8a.8xlarge): compile-heavy builds, large test suites, anything that scales with cores.

Each tier is its own call to the same Terraform module with different labels and instance-type lists. Sanitized:

module "github-runners" {
  source  = "github-aws-runners/github-runner/aws//modules/multi-runner"
  version = "~> 6.0"

  multi_runner_config = {
    "linux-x64-small" = {
      runner_config = {
        runner_extra_labels      = "linux,x64,small"
        instance_types           = local.default_instances
        ami_filter               = { name = ["*ci-runner-x64*"] }
        enable_ephemeral_runners = true
        enable_spot_instances    = true
      }
    }

    "linux-x64-compute-intensive" = {
      runner_config = {
        runner_extra_labels      = "linux,x64,compute-intensive"
        instance_types           = local.compute_intensive
        ami_filter               = { name = ["*ci-runner-x64*"] }
        enable_ephemeral_runners = true
        enable_spot_instances    = true
      }
    }

    # ...and so on for the other tiers
  }

  # Common stuff: webhook secret, GitHub app credentials, VPC config, etc.
  github_app     = { ... }
  webhook_secret = random_id.webhook_secret.hex
  vpc_id         = module.vpc.vpc_id
  subnet_ids     = module.vpc.private_subnets
}

Picking the tier in a workflow:

jobs:
  build:
    runs-on: [self-hosted, linux, x64, compute-intensive]
    steps:
      - uses: actions/checkout@v4
      - run: make build

GitHub matches the runs-on array against runner labels and picks any registered runner that has them all. The Lambda only spawns instances on demand, so an unused tier costs nothing.
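The matching rule is simple enough to model in a few lines. This is a sketch of the behavior described above, not GitHub's implementation: it treats labels as case-insensitive (an assumption) and models only GitHub's side, not whatever filtering the module's webhook does before scaling up. The function name is made up.

```python
def runner_matches(runs_on: list[str], runner_labels: list[str]) -> bool:
    """True when the runner carries every label the job asks for."""
    wanted = {label.lower() for label in runs_on}
    offered = {label.lower() for label in runner_labels}
    return wanted <= offered


# Labels as a runner in each tier would register them (tier names from the config above).
small = ["self-hosted", "linux", "x64", "small"]
compute = ["self-hosted", "linux", "x64", "compute-intensive"]

job = ["self-hosted", "linux", "x64", "compute-intensive"]
print(runner_matches(job, small))    # False
print(runner_matches(job, compute))  # True

# A job that leaves out the tier label matches both tiers.
vague_job = ["self-hosted", "linux", "x64"]
print(runner_matches(vague_job, small), runner_matches(vague_job, compute))  # True True
```

The last case is the one to watch: a runs-on list without a tier label is satisfied by any tier, so under this model the job can land on whichever runner happens to be free.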

Scheduling - warm pool during business hours, off at night

Pure on-demand scaling sounds ideal in theory. Zero idle runners, pay per job, the Lambda spawns instances only when GitHub queues something. Two patterns spoil it in practice.

The morning-rush problem. The first few PRs of the day queue around the time people log in. On pure on-demand, every one of those jobs eats the full cold-start latency, somewhere between 60 and 120 seconds from queue to running. A dozen developers pushing at 9am turns into a visible backlog.

The 3am problem. Even on spot, idle runners cost something: the instance-hours themselves, the EBS volumes attached to the warm instances, plus the always-on orchestration Lambdas. Outside business hours the queue is mostly empty, so there's no reason to keep capacity hot.

The runner module addresses both with idle pools and scheduled scaling.

What works for us:

During business hours (weekdays 08:00 to 20:00 in our primary timezone), we keep a warm pool of N runners per tier, sitting registered with GitHub and ready to grab the first matching job. When one claims a job, the scale-up Lambda spawns a replacement, so the pool stays at N. Cold start for the user effectively disappears.
Outside that window, the pool size drops to zero. Late-night and weekend jobs still run; they just pay the cold-start tax. Most of what runs at those hours is scheduled batch work that doesn't care about an extra minute.

The Terraform for it (sanitized, per-tier):

runner_config = {
  # ... labels, instance_types, ami_filter as before ...

  enable_ephemeral_runners = true
  enable_spot_instances    = true

  # Warm pool - kept at this size while the cron window below matches.
  # The module reads `cron` as a window (with a leading seconds field),
  # not as a trigger time. Outside the window the idle count is zero.
  idle_config = [
    {
      cron      = "* * 8-19 * * 1-5"  # weekdays, 08:00 to 19:59
      timeZone  = "Europe/Berlin"
      idleCount = 3
    },
  ]
}

A few notes about this pattern:

Warm-pool runners are still single-use, just pre-launched. Each one picks up one job and dies, with the pool replenished by fresh instances rather than reused ones. That keeps the byte-identical state property from the Jenkins side intact.
Pool size is a tuning decision. Too small and you still cold-start during the rush; too big and you're burning money on idle capacity. We tune per tier based on the morning queue depth we actually observe. The compute-intensive tier gets a smaller pool because those jobs are rarer.
Spot eviction during pool-idle time is fine. If AWS reclaims a pool runner before it ever picks up a job, the scale-up Lambda just launches a replacement. The pool size is a target, not a fixed set of instances.
Holidays are a remaining problem. The cron schedule doesn't know about public holidays, so on a Monday holiday the pool still ramps up at 08:00 to serve nobody. The cost is small enough that nobody's been motivated to build a calendar-aware scheduler.

During working hours the developer experience is roughly as fast as managed runners. Outside those hours the bill is close to nothing.
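The schedule logic itself fits in a few lines. Below is a minimal Python sketch of "N warm runners on weekdays from 08:00 to 20:00, zero otherwise", plus the "pool size is a target" rule. The function names are made up, and the holidays parameter is a hypothetical extension that the module does not provide; it only shows where a calendar-aware check would slot in.

```python
from datetime import date, datetime, time

BUSINESS_START = time(8, 0)
BUSINESS_END = time(20, 0)


def target_pool_size(now_local: datetime, warm_size: int,
                     holidays: frozenset[date] = frozenset()) -> int:
    """Desired number of idle runners at `now_local` (already in the pool's timezone)."""
    is_weekday = now_local.weekday() < 5                    # Mon=0 .. Fri=4
    in_hours = BUSINESS_START <= now_local.time() < BUSINESS_END
    if is_weekday and in_hours and now_local.date() not in holidays:
        return warm_size
    return 0


def replacements_needed(target: int, idle_now: int) -> int:
    """The pool is a target, so only the shortfall gets launched."""
    return max(0, target - idle_now)


monday_morning = datetime(2026, 5, 25, 9, 0)
print(target_pool_size(monday_morning, warm_size=3))                        # 3
print(target_pool_size(monday_morning, 3, frozenset({date(2026, 5, 25)})))  # 0
print(replacements_needed(target=3, idle_now=1))                            # 2
```

The same shortfall rule covers both cases from the notes above: a runner that claimed a job and a runner that spot reclaimed before it ever ran one.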

The AMI is still a Packer image

As with the Jenkins workers, anything the runner needs to do its job (language runtimes, build tools, Docker, cached dependencies) gets baked into a Packer AMI ahead of time. The AMI is versioned, lives in your AWS account, and is referenced by the runner's ami_filter.

For GitHub Actions the image is usually lighter than the Jenkins worker AMIs - GHA workflows install most of their tooling at runtime via setup-node, setup-python, setup-java. So the base image just needs:

Ubuntu (or whatever OS).
The GitHub Actions runner binary, pre-downloaded.
Docker (for docker build / docker run).
AWS CLI (most workflows hit S3 or ECR).
Basic build deps: git, curl, jq, unzip.
A stable runtime or two we don't want to redownload every build (Node LTS, Python 3.x).

The image stays around 5 to 10 GB, small enough that pulling and booting fits comfortably inside the cold-start budget we already have.

Spot interruptions

The thing everyone asks the first time they look at spot for CI is what happens when AWS reclaims the instance mid-build.

The build fails and GitHub re-queues it. A fresh instance picks it up. There's no recovery logic to write, because the runner is ephemeral and the workflow ought to be idempotent anyway. One partial build is lost, the retry starts clean.

Spot interruption notices give you two minutes of warning before AWS pulls the plug. The runner listens for that signal and de-registers from GitHub cleanly before shutdown, which the module handles for you. Without that, GitHub briefly shows a "runner went offline mid-job" error before the retry. Annoying, but not fatal.

In practice the interruption rate I see is around 1-3% of jobs on the small and large tiers, and a bit higher on compute-intensive because the larger instance types have less spot capacity per AZ. For most workloads that's a fine trade for the savings. For workflows that genuinely can't tolerate a retry (release builds, deploys with side effects), I either flip enable_spot_instances = false for that tier or send the job over to Jenkins, where the lifecycle is more tightly controlled.
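To put the 1-3% figure next to the savings, here is a small Python sketch of the retry overhead. It assumes every interrupted job is retried until it succeeds and that each attempt is interrupted independently at the same rate, which is a simplification. The function name is made up.

```python
def expected_attempts(interruption_rate: float) -> float:
    """Mean number of runs per job when interrupted attempts are retried until one succeeds."""
    if not 0 <= interruption_rate < 1:
        raise ValueError("interruption_rate must be in [0, 1)")
    # Mean of a geometric distribution with success probability (1 - rate).
    return 1 / (1 - interruption_rate)


for rate in (0.01, 0.03):
    attempts = expected_attempts(rate)
    print(f"{rate:.0%} interrupted -> {attempts:.4f} attempts per job "
          f"({attempts - 1:.1%} extra runs)")
```

Under those assumptions a 3% interruption rate means roughly 3% extra runs, and that is an upper bound on wasted compute because an interrupted attempt never runs to completion.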

Trade-offs vs. Jenkins workers

"Should this run on Jenkins or GitHub Actions?" comes up a lot. How I think about it:

Workload shape	Where it goes
PR-triggered, short, idempotent	GitHub Actions on spot. Quick spin-up, cheap, no Jenkins overhead.
Long-running build (1h+)	Jenkins. Spot interruption risk is too high for long jobs.
macOS / Windows builds	Jenkins. The worker setup from Parts 1 and 2 lives there.
Custom orchestration (matrix sharding, dynamic parallelism, gated promotion)	Jenkins. Groovy DSL handles this more flexibly than the GHA matrix.
Deploys / releases with side effects	Jenkins on dedicated workers, or GHA on on-demand. No spot.
Open-source / contributor-facing repos	GitHub Actions. Don't expose Jenkins to contributors.
Builds that need ephemeral access to a specific cloud service	Whichever is in the right VPC. Usually GHA for the small stuff.

The two systems cover different shapes of work. Moving everything to GitHub Actions would have been a mistake, but moving the small PR-scoped jobs off Jenkins freed up real capacity for the big builds that remain.

What I'm still figuring out

Cold-start latency outside the warm pool window. The pool hides it during business hours, but outside those hours every job eats the full 60-120 seconds. We're fine with this trade-off most of the time, though it occasionally annoys people working evenings.
Spot capacity in specific AZs. The compute-intensive tier sometimes can't get spot capacity in our preferred AZ and the queue backs up. The module's multi-AZ fallback helps but doesn't eliminate it. On genuinely bursty days we fall back to on-demand temporarily.
Holiday-aware pool scheduling. The cron schedule doesn't know about public holidays, so we burn a small amount of money ramping up on holidays nobody is working. Low impact, but it's the kind of thing that bothers you every time you remember.
AMI sprawl. Every architecture (x64, arm64) and base-image variant is its own AMI lineage. We rebuild them on a schedule via Packer the same way we do the Jenkins worker AMIs, but the operational overhead is a real cost.
Cost attribution. Spot instances inherit tags from the launch template, but not every downstream resource (EBS volumes, ENIs) picks up the right cost-attribution tags automatically. That's a separate problem and I'm not opening it here.

Closing thought

All three posts in this series end up at the same place. Ephemeral workers, baked images, everything orchestrated from git, secrets pulled from a vault at runtime. Jenkins is one way to wire that pattern together, and GitHub Actions on self-hosted spot is another. Nothing says you pick one and only one.

The worker lifecycle is the part you can't compromise on: don't keep workers between builds. Once that's in place, everything else (Jenkins versus GHA, spot versus on-demand, Tart versus vSphere) is swappable, and you can change your mind later without burning the platform down.

That wraps the series for now. If any of it saves you a week of figuring it out yourself, this was worth writing.

Appendix - tools mentioned

terraform-aws-github-runner - the Terraform module that wires up the whole thing. (Formerly philips-labs/terraform-aws-github-runner.)
GitHub Actions self-hosted runners - official docs.
HashiCorp Packer - bakes the runner AMIs.
Terraform - calls the module above.
AWS EC2 spot - cheap interruptible compute.
AWS Lambda and SQS - the queueing/orchestration glue (managed by the module).

Part 3 of My CI/CD Odyssey. Thanks for reading. If you run self-hosted CI differently, I'd be curious to hear about it in the comments.

MacOS Workers, or how I built my own Mac cloud

Khachatur Ashotyan — Fri, 22 May 2026 08:14:07 +0000

In Part 1 I laid out the Jenkins-as-a-Code setup (JCasC, Job DSL, ephemeral workers, Packer images), and said macOS workers deserved a separate post. This is that post.

For anyone who's never run macOS builds in CI: most things that are easy on Linux turn out to be hard on macOS, often for reasons that don't apply anywhere else. Apple's licensing rules mean you can't just spin up a Mac in AWS the way you do an Ubuntu box. Then there's the keychain, the signing tooling, and the Xcode versioning. The typical answer at most companies is a few Mac minis under someone's desk that everybody SSHes into, and that works for a single team right up until the company depends on it.

I wanted the same setup on macOS that I had for Linux and Windows: a fresh worker per build, destroyed when the build finishes. Getting there took a while.

Why macOS is hard in the first place

A few things to keep in mind first, because they explain why the architecture below looks the way it does.

1. The cloud Mac story is awkward. EC2 Mac instances exist - real Mac hardware in AWS data centers, you can rent one. But they're dedicated hosts with a 24-hour minimum allocation (Apple's licensing, not AWS being weird) and per-hour pricing is brutal next to Linux. If your worker lives for 30 minutes and you pay for 24 hours, the per-build math is rough.

2. Apple's EULA only allows macOS to run on Apple hardware, which means you can't legally virtualize macOS on a generic Linux box. Real Mac hardware has to be in the loop somewhere - yours, rented, or in someone else's rack.

3. macOS virtualization is its own ecosystem. On Intel Macs the answer used to be VMware (vSphere or Fusion) or VirtualBox. On Apple Silicon, neither works the same way. Everything goes through Apple's Virtualization.framework now, and the tooling around it is still young.

4. Signing and notarization credentials fight you on ephemeral VMs. Developer ID certificates, app-specific passwords, the keychain - none of it was designed for "fresh VM every build". It assumes a developer's laptop. Making it work in CI is its own rabbit hole.

5. macOS images are huge. A baked Packer image with Xcode is 60-80 GB. Pulling that from cold storage is slow, so caching matters a lot more here than on Linux.

All five show up repeatedly in the rest of this post.
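The per-build math from point 1 is easy to sketch. The Python below assumes a single host running builds back to back, and the hourly rate of 1.0 is a placeholder unit, not an AWS price. The function names are made up.

```python
def billed_hours(used_hours: float, min_hours: float = 24) -> float:
    """Hours you pay for when the host has a minimum allocation period."""
    return max(used_hours, min_hours)


def cost_per_build(hourly_rate: float, builds: int, minutes_per_build: float,
                   min_hours: float = 24) -> float:
    """Per-build cost on one host that runs `builds` builds sequentially."""
    used_hours = builds * minutes_per_build / 60
    return hourly_rate * billed_hours(used_hours, min_hours) / builds


RATE = 1.0  # placeholder unit per hour, NOT a real price

print(cost_per_build(RATE, builds=1, minutes_per_build=30))   # 24.0
print(cost_per_build(RATE, builds=48, minutes_per_build=30))  # 0.5
```

A single 30-minute build pays for the full 24 hours, 48 times what the build itself used. The minimum only stops hurting once the host is busy around the clock.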

What I evaluated and didn't pick

There aren't many serious players in macOS CI. I evaluated the commercial options seriously before landing on what's below. None of them were bad. The economics just didn't line up for us.

Veertu Anka - the most mature paid platform for macOS CI virtualization. Roughly what Tart does, plus a polished UI, enterprise support, more features. Licensing is per-host or per-VM, which adds up fast once you have a real fleet. Credible if you've got budget and want a vendor to call.
MacStadium - managed-Mac hosting. You rent physical Macs in their DC, optionally with their orchestration layer (Orka). Good if you don't want to rack Macs yourself. Per-host per-month pricing fits a steady-state fleet; spiky volume or existing hardware makes it worse.
AWS EC2 Mac instances - see above. Worth it for very low-volume work, where avoiding ops outweighs the per-hour bill. The 24-hour minimum kills it for high-volume ephemeral CI.
GitHub Actions managed macOS runners - fine for OSS projects and small teams. Per-minute pricing gets painful at real volume. And the image is fixed - the moment you need anything past stock Xcode, you're stuck.

What sold me on Tart was the licensing more than the technology. The commercial license is free for personal and small-scale use, and the paid tier doesn't scale linearly with fleet size the way Anka's per-host model does. It's affordable at our build volume, and at one or two Macs you pay nothing at all.

As of April 2026 - when Cirrus Labs joined OpenAI - the licensing got better still: Tart, Vetu and Orchard have been relicensed under a more permissive license and the commercial fees dropped entirely.

The rest of the Cirrus Labs toolchain holds up too. Orchard sits on top of Tart for fleet orchestration, and Cirrus CLI lets you run CI tasks locally against a Tart VM. Being able to reproduce a Jenkins job on my laptop has saved hours of debugging CI-only failures.

Three ways I ended up provisioning macOS workers

No single tool covered everything I needed, so I ended up with three provisioners for three shapes of Mac fleet. All three follow the same pattern as in Part 1 (a Jenkins job invokes the provisioner, the worker comes up from a Packer image, runs the build, and gets destroyed), but the layer underneath each one is different.

Option A - Tart, for Apple Silicon

Tart is a small open-source CLI from Cirrus Labs that wraps Apple's Virtualization.framework. Hand it an OCI-compatible image (basically a tarball with the macOS VM disk) and it boots a VM on Apple Silicon in seconds. Images are reusable and layerable - it's the closest thing to "docker but for macOS VMs" I've come across.

How it fits:

Hardware: a fleet of Mac minis (or Studios) we own or rent, sitting in a rack.
Each Mac runs Tart on the host.
A Jenkins job grabs an available host, runs tart clone from a known image tag, boots the VM with tart run, registers it as a Jenkins agent, then runs tart delete when the build finishes.
Packer's tart-cli source builds the images. Xcode, Homebrew, signing tools, language runtimes - all baked in at image-build time.
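The clone / run / delete lifecycle can be sketched as a thin Python wrapper around the tart CLI. This is an illustration, not the actual Jenkins job: the function names, the image tag and the VM name are hypothetical, and the build callback stands in for "connect the agent and run the job".

```python
import subprocess
from typing import Callable


def tart_commands(image: str, vm_name: str) -> dict[str, list[str]]:
    """The tart invocations for one throwaway VM, in the order they are used."""
    return {
        "clone": ["tart", "clone", image, vm_name],
        "run": ["tart", "run", "--no-graphics", vm_name],
        "stop": ["tart", "stop", vm_name],
        "delete": ["tart", "delete", vm_name],
    }


def run_build_in_fresh_vm(image: str, vm_name: str,
                          build: Callable[[str], None]) -> None:
    """Clone a VM from the image, run one build in it, then always delete it."""
    cmds = tart_commands(image, vm_name)
    subprocess.run(cmds["clone"], check=True)
    vm = subprocess.Popen(cmds["run"])   # `tart run` stays in the foreground while the VM is up
    try:
        build(vm_name)                   # hypothetical: register the agent, run the job
    finally:
        subprocess.run(cmds["stop"], check=False)
        try:
            vm.wait(timeout=60)
        except subprocess.TimeoutExpired:
            vm.kill()
        subprocess.run(cmds["delete"], check=True)


# Print the plan for a hypothetical image tag without touching tart.
for step, cmd in tart_commands("ghcr.io/example/macos-ci:1.2.3", "build-42").items():
    print(f"{step:>6}: {' '.join(cmd)}")
```

The try/finally is the part that matters: the VM is deleted whether the build passes, fails or throws, which is what keeps the next build starting from the same image.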

The good: spin-up is fast - tart clone to "agent connected" is under a minute. The image is a snapshot, so every build starts from byte-identical state.

The not-so-good: you still need to own or rent the Macs. The "real Apple machines in a rack" problem doesn't go away, you just orchestrate around it. And the Tart ecosystem is young - expect to write glue.

Option B - vSphere / VCSA, for the older Intel fleet

Before Apple Silicon, the Mac fleet was a stack of Intel Mac minis hooked into a vSphere cluster. macOS VMs were managed as ESXi guests, the same way any other VMware VM would be.

How it fits:

ESXi on each Mac mini. The hypervisor has to run on Apple hardware for the macOS guests to stay within Apple's licensing.
A golden macOS VM template lives in vSphere, baked by Packer's vSphere ISO builder.
Jenkins runs Terraform with the vSphere provider to clone the template (linked clones are faster), bring up the VM, register it as an agent, tear it down after.

This setup predates Apple Silicon and it still works, but it's the heaviest of the three. Linked clones help with spawn time, but it's still slower than Tart, and vSphere itself is a chunky thing to operate on top of that.

It's the responsible-enterprise path. If you already run VMware in your org it slots in fine, but nobody starting fresh in 2026 would pick it.

Option C - Orchard, for pooled / remote Macs

Orchard is also from Cirrus Labs, in the same family as Tart. Instead of orchestrating individual Mac hosts yourself, Orchard sits as a controller in front of a pool of workers and you request a VM through its API. It handles scheduling, queuing, and lifecycle for you.

How it fits:

A pool of Macs (yours, or from a managed provider like MacStadium), and you don't want individual Jenkins jobs picking physical machines.
Jenkins calls Orchard's API for a VM with a given image and resource profile, runs the build, releases the VM.
Capacity is the real constraint - 20 builds queued, 5 Macs free, Orchard handles the rest.

The good: Jenkins doesn't need to know where the Mac physically lives, which is a clean separation between provisioning and scheduling.

The not-so-good: it's yet another piece of infrastructure to run, which only pays off past a certain fleet size. With two or three Macs, raw Tart is simpler.

What gets baked into the Packer image

The principle stays the same: bake everything we can into the image so the build itself doesn't pay any setup time. The macOS image ends up being the heaviest in our fleet by a wide margin.

What goes into a typical macOS worker image:

OS: pinned macOS point release. Xcode compatibility is brittle - chasing "latest" is a bad idea.
Xcode: pinned version + command line tools. Xcode alone is 30+ GB.
Homebrew + packages: every brewed tool the build needs, pre-installed and pre-warmed.
Language runtimes: Node, Python, Ruby - pinned to match production.
Build tools: CMake, Ninja, Conan, whatever the project actually uses.
Signing tools: codesign, notarytool, xcrun - ship with Xcode, but worth confirming they're on PATH.
Pre-warmed caches: Conan, npm, brew - anything that would otherwise download on the first build.

The Packer template itself is short. Most of the work is in a chain of shell scripts that run after the base macOS install:

source "tart-cli" "macos" {
  vm_base_name = "ghcr.io/cirruslabs/macos-monterey-base:latest"
  vm_name      = "macos-ci-${var.image_version}"
  cpu_count    = 4
  memory_gb    = 8
  disk_size_gb = 120
  ssh_username = "admin"
  ssh_password = "admin"
}

build {
  sources = ["source.tart-cli.macos"]

  provisioner "shell" {
    scripts = [
      "scripts/post-install.sh",
      "scripts/brew-setup.sh",
      "scripts/xcode.sh",
      "scripts/nodejs-setup.sh",
      "scripts/deps.sh",
      "scripts/prewarm-caches.sh",
    ]
  }
}

The template is maybe 30 lines. The real work is in the shell scripts, which live in the same repo and go through the same PR review as the rest of the infra.

A baked image is around 60-80 GB. Storage matters, but cache locality matters more. Pulling a fresh 70 GB image from the registry on every first boot would crater throughput across the fleet, so we pre-cache base images on each host out of band.
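A back-of-the-envelope Python sketch shows why the cold pull hurts. It computes only the raw transfer time and ignores registry throttling, decompression and disk writes, so real pulls are slower. The link speeds are hypothetical examples and the function name is made up.

```python
def pull_seconds(image_gb: float, link_gbps: float) -> float:
    """Lower bound on transfer time: image size in gigabits divided by link speed."""
    return image_gb * 8 / link_gbps


for gbps in (1, 10):
    seconds = pull_seconds(70, gbps)
    print(f"70 GB over {gbps} Gbps: {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

At 1 Gbps the transfer alone takes more than nine minutes, which dwarfs the under-a-minute spin-up Tart gives you once the image is already on the host.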

The signing-on-ephemeral-VMs problem

Signing eats more first-setup time than anything else on this list, which is why it gets its own section.

Apple's signing pipeline assumes a developer machine with a persistent keychain - you unlock it once and sign apps for the rest of the day. With ephemeral CI VMs that breaks: every VM is brand new, no keychain, no saved password.

What we landed on:

Developer ID cert + private key live in a secrets manager (AWS Secrets Manager, Vault, whatever). Never in the image, never in git.
At job start, the pipeline pulls the cert + key and imports them into a temporary keychain it creates on the VM.
That keychain has a random password just for this build. It dies with the VM.
Notarization credentials (app-specific password or notarytool API key) come from the same secrets manager. Used directly - no keychain needed.
Build ends, VM is destroyed, keychain goes with it. Same lifecycle as the worker.

Trimmed-down version of the keychain-bootstrap script:

#!/usr/bin/env bash
set -euo pipefail

KEYCHAIN="ci-build.keychain"
KEYCHAIN_PASSWORD="$(openssl rand -base64 24)"

# Create a brand-new keychain just for this build.
security create-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN"
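# -l locks on sleep; -u -t 21600 auto-locks only after 6 hours, so it stays unlocked for the build.
security set-keychain-settings -lut 21600 "$KEYCHAIN"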
security unlock-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN"

# Add it to the search list so codesign can find it.
security list-keychains -d user -s "$KEYCHAIN" $(security list-keychains -d user | tr -d '"')

# Import the cert + key from the secrets-manager-provided files.
security import "$DEVELOPER_ID_CERT" -k "$KEYCHAIN" -P "$CERT_PASSWORD" -T /usr/bin/codesign

# Grant codesign permission to use the key without prompting.
security set-key-partition-list -S apple-tool:,apple: -s -k "$KEYCHAIN_PASSWORD" "$KEYCHAIN" >/dev/null

Things that bit us:

set-key-partition-list is mandatory on modern macOS. Without it, codesign pops a UI password prompt that nothing will ever answer on a headless VM, and the build hangs indefinitely.
The keychain must be in the search list. A keychain that exists but isn't searched is invisible to codesign.
Notarization is asynchronous. notarytool submit --wait does block until it's done, but "done" can be several minutes away, so make sure your build timeouts account for it.
Stapling fails silently if you forget it. Notarization succeeds and the artifact ships, but end users still see a Gatekeeper warning because the ticket isn't stapled. Run xcrun stapler staple <artifact> after notarization.

None of this is deep magic, but first-time setup tends to eat a week of debugging on most teams. Budget for that, and get the keychain bootstrap script working before you write the rest of the pipeline.
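The last two gotchas (notarization can take a while, and stapling is a separate step) can be sketched as a Python wrapper that always runs both commands in order with an explicit time budget. It assumes App Store Connect API key credentials; the function names, file names and the 30-minute default are hypothetical.

```python
import subprocess


def notarize_and_staple_commands(artifact: str, key_path: str,
                                 key_id: str, issuer_id: str) -> list[list[str]]:
    """Submit for notarization and wait, then staple the ticket to the artifact."""
    return [
        ["xcrun", "notarytool", "submit", artifact,
         "--key", key_path, "--key-id", key_id, "--issuer", issuer_id,
         "--wait"],
        ["xcrun", "stapler", "staple", artifact],
    ]


def notarize_and_staple(artifact: str, key_path: str, key_id: str,
                        issuer_id: str, timeout_s: float = 1800.0) -> None:
    """Run both steps; a failure or a timeout in either one fails the build."""
    for cmd in notarize_and_staple_commands(artifact, key_path, key_id, issuer_id):
        # The timeout applies to each step, so slow notarization days fail loudly
        # instead of hanging the job.
        subprocess.run(cmd, check=True, timeout=timeout_s)


# Print the plan for hypothetical inputs without calling xcrun.
for cmd in notarize_and_staple_commands("App.dmg", "AuthKey.p8", "KEYID", "ISSUER"):
    print(" ".join(cmd))
```

Keeping the staple step in the same function as the submit step makes it hard to ship a notarized artifact without its ticket.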

Trade-offs - which one should you pick?

Probably more than one, depending on what fleet you've inherited. But if I were starting from scratch in 2026:

If you have...	Pick
A small pool of Apple Silicon Macs you own	Tart, directly. Free at this scale, nothing extra to run.
A larger fleet of Apple Silicon Macs, mixed ownership / remote	Tart + Orchard - same licensing, proper scheduling on top.
An existing vSphere installation and Intel Macs	vSphere / VCSA. Don't rebuild what works.
Need enterprise support, budget isn't tight	Veertu Anka.
Don't want to rack Macs, want a managed fleet	MacStadium (with their orchestration, or your own).
No physical Macs, very low volume	EC2 Mac. The 24-hour minimum stings, but sometimes the operational simplicity wins.
Open-source project, low volume	GitHub-hosted macOS runners. Free for OSS, nothing to host.
No physical Macs, high volume	MacStadium or similar. EC2 Mac economics break at this scale.

The approach I'd push back on is "let's just use Mac minis under someone's desk". It works for a single team, but the moment every iOS release across the company depends on it, you've got a bottleneck nobody owns.

What I'm still figuring out

A few open problems I haven't fully solved:

Image freshness - Xcode updates land every few weeks, and keeping the Packer image current without breaking everyone's build is constant work. We rebuild on a schedule and pin each job to a specific image version. The rebuild itself is a 90-minute job.
Cost. Mac hardware is expensive whether you own it or rent it. Above a certain build volume the math works; below that, per-build cost stings.
Apple Silicon transition for older code. Some of our C++ code still has Intel-only deps that haven't been ported. Those builds run on the vSphere/Intel fleet, which is shrinking. "Rewrite all the legacy build deps for arm64" is its own multi-quarter project.
Notarization queue times. Apple's notarization service has bad days where submissions take 20+ minutes. Nothing to do from our side - macOS builds just have a longer tail than everything else.

Closing thought

macOS CI doesn't get clean. There's no "just run a pod in EKS" equivalent, you'll have physical hardware in the loop, probably more than one hypervisor, and a signing problem that doesn't exist on any other platform. What's worked for us is treating macOS the way we treat everything else: ephemeral workers from a baked image, triggered by a job in git, with secrets pulled from a vault at runtime. Once the contract matches what Linux and Windows do, macOS stops being the part of CI that nobody wants to own.

Appendix - tools mentioned in this post

Cirrus Labs toolchain (the one I ended up on)

Tart - macOS VMs on Apple Silicon via Virtualization.framework. Free for small-scale use; see its licensing terms.
Orchard - controller for pooled Tart hosts.
Cirrus CLI - run CI tasks locally against a Tart VM using a .cirrus.yml config.
Packer Tart plugin - Packer builder for Tart images.

Commercial alternatives I evaluated

Veertu Anka - paid platform for macOS CI virtualization, polished, enterprise support.
MacStadium - managed Mac hosting + optional Orka orchestration.
AWS EC2 Mac instances - real Apple hardware in AWS, 24-hour minimum allocation.
GitHub-hosted macOS runners - fine for OSS / small scale.

Other

vSphere and the Terraform vSphere provider - for the older Intel fleet.
HashiCorp Packer - bakes all the worker images.
Apple's notarytool - the modern notarization CLI.

This is Part 2 of My CI/CD Odyssey. Follow me here on dev.to if you want to get pinged when Part 3 drops. And if you're doing macOS CI differently, I'd love to hear about it in the comments.

Jenkins as a Code, or how I stopped clicking around in the UI

Khachatur Ashotyan — Mon, 18 May 2026 16:35:13 +0000

I've been running Jenkins for years now. Different companies, different team sizes, but the same story keeps repeating, and at some point I couldn't take it anymore. So I decided to write some of it down. This is Part 1 of what I'm calling My CI/CD Odyssey - ideas I tried, things that blew up in my face, and stuff I still use today.

Later chapters get into the painful stuff: building macOS workers without losing your mind, spot instances as GitHub Actions runners to cut costs, plus a few other rabbit holes. First, the beginning. That's where most of the pain came from.

The "before" picture, and why it hurts

Anyone who's worked with Jenkins for a while knows this scene. Somebody opens the Jenkins UI, clicks "New Item", picks a freestyle or pipeline job, fills in twenty-something fields, scrolls past a wall of plugin options, hits Save. A month later somebody else has to figure out why a job behaves differently in stage than in prod, and the answer is "because Arthur clicked a different checkbox in February and nobody remembers".

That was my world for a long time. Multi-tier environments (stage, prod, sometimes more), and on top of that, sometimes more than one Jenkins instance per tier. Each one configured by hand: plugins installed manually, pipelines copy-pasted from one Jenkins to another and edited in place, credentials added by hand, workers attached one at a time. Then one day you wake up and realize:

Nobody remembers what plugins are installed where.
The "stage" Jenkins doesn't match prod anymore. You find out when a pipeline breaks in prod.
A Friday afternoon plugin update kills a build. Rolling it back is a human clicking buttons under stress.
A new team member joins, and you burn three days explaining tribal knowledge that should live in a repo.

That last one was what finally pushed me. Tribal knowledge is fine in a team of two sharing a desk, but past that it costs weeks of onboarding for every new hire.

The idea: treat Jenkins like any other piece of code

So I started reading. Jenkins is infrastructure. We already do infrastructure-as-code for everything else - Terraform for cloud, Helm for Kubernetes, Ansible for hosts - so why is Jenkins the one piece still managed by hand? Controller, jobs, credentials wiring, workers: all of it should be built from a git repo.

What I wrote down for myself:

I want a Jenkins where I can throw away the VM, the cluster, the config, run a pipeline, and ten minutes later have the same Jenkins back. And stage should be code-to-code identical to prod, so when I test a plugin upgrade in stage I know how it'll behave in prod.

Anyone who's been burned by a "but it worked in stage" deploy knows why this matters.

The building blocks

When I started designing this, it broke into a handful of moving pieces. None of them are revolutionary on their own, but wiring them together is where the value lives.

1. JCasC - Jenkins Configuration as Code

This is the foundation. JCasC is a Jenkins plugin that lets you define the controller config in YAML - system settings, security realm, authorization strategy, clouds, credentials, tools, global libraries. The controller reads the YAML on boot and configures itself.

The first time I rebuilt a controller from a YAML file, I stopped clicking through the UI for good. The controller only knows about things in the YAML, so anything else might as well not exist.

Minimal example:

jenkins:
  systemMessage: "Managed by JCasC - do not edit in the UI"
  numExecutors: 0
  mode: EXCLUSIVE
  securityRealm:
    github:
      clientID: ${GITHUB_CLIENT_ID}
      clientSecret: ${GITHUB_CLIENT_SECRET}
  clouds:
    - kubernetes:
        name: "eks"
        namespace: "jenkins"
        jenkinsUrl: "http://jenkins.jenkins.svc.cluster.local:8080"
unclassified:
  globalLibraries:
    libraries:
      - name: "ci-libs"
        defaultVersion: "main"
        retriever:
          modernSCM:
            scm:
              git:
                remote: "https://github.com/<org>/ci-libs.git"

Twenty-odd lines of YAML, and that's most of the controller.

2. Job DSL - jobs from a git repo

JCasC handles the controller but not the jobs. For that I used the Job DSL plugin. Jobs live as Groovy files in a git repo, and a small "seeder" job in Jenkins polls the repo and rebuilds jobs from the DSL files on each run. Deleting a job from git removes it from Jenkins on the next poll; changing a parameter in git rolls forward the same way.

The Jenkins UI ends up effectively read-only from a configuration perspective. Anyone who tries to edit a job in the UI gets overwritten by the next seeder run, which is by design.
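A minimal sketch of what such a seeder can look like, assuming the Job DSL plugin is installed next to JCasC (which enables a top-level jobs: key in the same YAML). The repo URL, the jobs/ folder layout, and the five-minute poll interval are placeholders, not the values from my setup:

```yaml
# Hypothetical seeder job declared through JCasC; requires the Job DSL plugin.
jobs:
  - script: |
      job('seed') {
        scm {
          git {
            // Placeholder repo that holds the DSL files
            remote { url('https://github.com/<org>/jenkins-jobs.git') }
            branch('main')
          }
        }
        triggers {
          // Poll the repo roughly every five minutes
          scm('H/5 * * * *')
        }
        steps {
          dsl {
            // Process every DSL file under jobs/ (assumed layout)
            external('jobs/**/*.groovy')
            // Jobs that disappear from git get deleted from Jenkins
            removeAction('DELETE')
          }
        }
      }
```

The removeAction('DELETE') line is what makes deletions in git propagate. Without it, Job DSL leaves jobs it no longer generates in place.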

For the full declarative API, see the Job DSL plugin's API reference.

3. Helm + Kubernetes for the controller

I run the Jenkins controller in Kubernetes. The deployment uses the official Helm chart, with a persistent volume for the home directory and a sidecar that injects JCasC config from a ConfigMap. Upgrading Jenkins is a chart version bump, rolling back is the same chart at the previous version. The plugin list sits in values.yaml, version-pinned and reviewed in a PR like any other code change.
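As a rough illustration, a values.yaml for the official chart could carry the pieces described above. The keys are ones the chart exposes as far as I know, but check them against the chart version you pin. The plugin versions are left as placeholders and the volume size is an arbitrary assumption:

```yaml
# Sketch of values.yaml for the official Jenkins Helm chart (values are placeholders).
controller:
  # Version-pinned plugin list, changed only through PRs
  installPlugins:
    - "configuration-as-code:<pinned-version>"
    - "job-dsl:<pinned-version>"
    - "kubernetes:<pinned-version>"
    - "workflow-aggregator:<pinned-version>"
  sidecars:
    configAutoReload:
      # Sidecar that feeds JCasC config from ConfigMaps into the controller
      enabled: true
  JCasC:
    configScripts:
      # Each entry is rendered into a ConfigMap and loaded as JCasC YAML
      base: |
        jenkins:
          systemMessage: "Managed by JCasC - do not edit in the UI"
          numExecutors: 0
persistence:
  # Persistent volume for the Jenkins home directory
  enabled: true
  size: 50Gi
```

With this layout a plugin upgrade is a diff on one line of installPlugins, which is what makes it reviewable.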

This is when plugin upgrades stopped feeling like Friday-night events. Each upgrade goes through stage in a PR and gets the same review as application code.

Side note: if you don't want to deal with Helm, the community maintains a Jenkins Kubernetes Operator that's CRD-first. I went with Helm because the upgrade story is simpler, but the operator is fine if you're already deep in operators.

4. Packer for worker images

Then there's the workers, the machines that actually run builds. I went all-in on Packer. Every worker image is baked from a Packer template in git, with the base OS, language runtimes, SDKs, and build tools pre-installed. Each image has a version, and the worker config pins to a specific one.

Before Packer, every worker was a slightly different snowflake, hand-installed and slowly drifting. After Packer, every worker booted from v1.2.3 is byte-for-byte identical to every other one. When a dependency upgrade breaks something, you know which image introduced it, and pinning back to the previous version is a one-line PR.

5. Ephemeral workers - born, used, destroyed

The ephemeral worker piece is what ties everything together, and it's the part I'm proudest of. Workers in this setup are strictly ephemeral: a new worker per build, never a long-lived agent we reboot once a week. A pipeline asks Jenkins for a worker; a dedicated job spins one up from a known Packer image, the build runs on it, and the worker gets destroyed when the build finishes. Every build starts on a fresh machine.

The spin-up mechanism varies by platform:

Linux builds: the Jenkins Kubernetes plugin schedules a pod in EKS from a container image we baked. Build finishes, pod is deleted. Lifecycle is seconds to minutes.
AWS EC2 / Azure VMs (Linux and Windows): a dedicated job runs terraform to provision and de-provision instances from Packer templates.
macOS VMs: the same idea, but macOS virtualization is its own ecosystem. A fresh macOS VM gets booted from a Packer-baked image on each build (Tart on Apple Silicon hosts, vSphere for the older fleet, or Orchard for pooled remote Macs), the build runs, the VM is torn down. macOS deserves its own post (Part 2), but the lifecycle is the same: provisioned for one build, then torn down.

Every build starts from byte-identical state. Not "mostly the same", not "the same except for ~/.cache". If the image tag is v1.2.3, every build on it starts from the exact filesystem snapshot Packer produced. There's no operator history sitting on the disk.
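For the Linux pod case, the pin can live in the same JCasC file as the cloud definition. This is a sketch under assumptions: the template name, label, and image path are made up for illustration, and the image is assumed to already contain the Jenkins agent:

```yaml
# Hypothetical pod template for the "eks" cloud, pinned to one baked image tag.
jenkins:
  clouds:
    - kubernetes:
        name: "eks"
        namespace: "jenkins"
        templates:
          - name: "linux-build"
            label: "linux-build"
            containers:
              - name: "jnlp"
                # Changing this tag is the one-line PR that moves builds to a new image
                image: "<registry>/ci-worker:v1.2.3"
```

A pipeline that requests the linux-build label then gets a fresh pod from exactly that image, and rolling back means reverting the tag.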

That eliminates a whole class of bugs: leftover state on the agent, the weird ~/.cache nobody cleaned up, a disk full of artifacts from three weeks ago, the Friday-only flake from a leak that's been growing since Monday. None of it survives, because the worker doesn't live long enough to accumulate it.

It also makes "build is non-reproducible" investigations faster. If two builds against the same commit produce different artifacts, the cause is almost never the worker, since both ran on a fresh one.

Security gets simpler too. Secrets pulled onto a worker disappear with the worker, so no long-lived agent holds old tokens. If a credential ever leaks into a build environment, the worker is gone within minutes and the leak goes with it.

6. Terraform / Terragrunt for everything else

Everything that isn't Jenkins itself (VPCs, IAM, secret stores, the EKS cluster, image galleries) lives in Terraform, wrapped with Terragrunt so the same modules get reused across stage and prod with different inputs. That's why stage ends up code-to-code identical to prod: the same modules at the same versions, just with different variables.

To check how prod will behave under a change, you run the same Terraform with ENV=stage instead of ENV=prod.

How it all clicks together

The flow ends up looking like this:

Somebody opens a PR - new job, plugin bump, JCasC tweak, new Packer image, whatever.
CI validates: YAML lint, Groovy compile checks, terraform plan, Packer build for any changed images.
PR gets reviewed and merged.
On merge, GitHub Actions applies infra via Terraform. The Jenkins seeder picks up new DSL files on its next poll.
The next build that needs a worker pulls the new image.
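The Terraform half of that flow can be sketched as a single GitHub Actions workflow. This is an assumption-heavy outline, not my actual pipeline: the infra directory is hypothetical, cloud credentials are omitted, and it calls terraform directly instead of going through Terragrunt:

```yaml
# Sketch: plan on pull requests, apply on merge to main.
name: infra
on:
  pull_request:
  push:
    branches: [main]
jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        # Hypothetical directory holding the Terraform code
        working-directory: infra
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      - name: Plan on pull requests
        if: github.event_name == 'pull_request'
        run: terraform plan -input=false
      - name: Apply on merge to main
        if: github.event_name == 'push'
        run: terraform apply -input=false -auto-approve
```

Keeping plan and apply in one workflow means the reviewer sees the same init and the same code path that will run on merge.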

The Jenkins UI becomes a view onto what the repo says should be running, while the repo itself holds the truth.

What this fixed for me

What changed:

We stopped seeing "works on stage, breaks on prod" bugs. Because stage runs the same code as prod with the same modules at the same versions, when it works in stage it works in prod (modulo data).
Plugin upgrades aren't Friday-night events anymore. A bad one gets reverted like any other change.
Onboarding got much faster. New engineers read the repo instead of getting a Jenkins UI tour and a Slack thread of secrets.
Disaster recovery actually works. If I lost the controller VM, the EKS cluster, or even the whole account, the repo alone is enough to rebuild it.
We get an audit trail without writing one. Every pipeline change is a git commit with an author, a timestamp, and a PR description.

What I'm still figuring out

This isn't a finished story. A few things still keep me up at night:

macOS workers are the hardest piece. AWS does offer Mac instances, but the 24-hour minimum allocation and bare-metal model make them nothing like spinning up a Linux VM, and the hypervisor, licensing, and hardware constraints push the whole macOS story onto its own track. Part 2 covers it: Tart, virtualization on Apple Silicon, the trade-offs between self-hosted and cloud-mac providers, and the signing and notarization pain.
GitHub Actions costs add up at scale. You can offload heavier workloads to spot-instance runners cheaply, though spot brings its own trade-offs. Part 3 walks through that.

Closing thought

If you're still managing Jenkins through the UI, it's rarely about laziness. The cost shows up in places that don't make it onto any dashboard: the engineer who leaves and takes the only working configuration in their head, the 2am plugin-upgrade breakage, the customer-facing deploy that fails because stage and prod had quietly drifted apart for six months. Jenkins as Code doesn't make those costs disappear, but it surfaces them as PRs I can see and review, which for me has been worth the work.

Appendix - tools and plugins I leaned on

For anyone who wants to skip straight to the implementations, here's what's wired up:

Jenkins plugins

Configuration as Code (JCasC): the controller config in YAML.
Job DSL: jobs defined in Groovy in a git repo.
Kubernetes plugin: ephemeral pod agents in EKS.
Pipeline: Shared Groovy Libraries: the global libraries that hold reusable pipeline code.

Deployment

Jenkins official Helm chart: what I use to deploy the controller.
Jenkins Kubernetes Operator: the CRD-based alternative, if you prefer operators over Helm.

Image building

HashiCorp Packer: bakes all the worker images (Linux, Windows, macOS).

Infrastructure

Terraform: everything outside Jenkins (VPCs, IAM, secrets, EKS, image galleries).
Terragrunt: keeps the same modules DRY across stage and prod.
Kubernetes / Amazon EKS: where the Jenkins controller lives.
Helm: package manager for the Kubernetes side.
GitHub Actions: applies Terraform on merge.

Coming up in later parts

Tart: macOS VMs on Apple Silicon (Part 2).
Orchard: Tart cluster orchestration for macOS fleets (Part 2).

This is Part 1 of My CI/CD Odyssey. Follow me here on dev.to if you want to be pinged when Part 2 drops. And if you're doing Jenkins as Code differently, I'd love to hear about it in the comments.