<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Laszlo Fogas</title>
    <description>The latest articles on Forem by Laszlo Fogas (@laszlocph).</description>
    <link>https://forem.com/laszlocph</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F40654%2Fb3b70a23-ce6d-4278-9f26-e7dbe69e69c4.jpg</url>
      <title>Forem: Laszlo Fogas</title>
      <link>https://forem.com/laszlocph</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/laszlocph"/>
    <language>en</language>
    <item>
      <title>The how and why we built our SaaS platform on Hetzner and Kubernetes</title>
      <dc:creator>Laszlo Fogas</dc:creator>
      <pubDate>Mon, 13 Mar 2023 10:45:42 +0000</pubDate>
      <link>https://forem.com/laszlocph/options-for-kubernetes-pod-autoscaling-2mf5</link>
      <guid>https://forem.com/laszlocph/options-for-kubernetes-pod-autoscaling-2mf5</guid>
      <description>&lt;p&gt;Hello everyone, Laszlo here, the founder of Gimlet.io 👋. In this blog post I try to address various aspects of how and why we built our SaaS platform on Hetzner and Kubernetes and not on one of the hyperscaler cloud providers.&lt;/p&gt;

&lt;p&gt;It is an interesting time. While we were building our platform on Hetzner, David Heinemeier Hansson, the founder of 37Signals, also broke ground on their project to leave AWS in favor of a managed data center. His posts on the topic had a good run on social media. While DHH is a contrarian and sometimes good at starting trends, we are not here to start or join any movement. In this post I simply want to share our choices and experiences with Hetzner with the curious minded: the tools we used, the trade-offs we had to consider, and the pros and cons, in early 2023, of not using one of the big cloud providers. So let's get started, shall we?&lt;/p&gt;

&lt;p&gt;First things first, why Hetzner?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Hetzner?
&lt;/h2&gt;

&lt;p&gt;I don't want to drag it out too long: it is price.&lt;/p&gt;

&lt;p&gt;We built our compute nodes on the EX43 model, the newest staple machine of Hetzner. The hexa-core, Alder Lake CPU with 64GB RAM and 2x512GB NVMe SSD comes in at 52.1 EUR in one of Hetzner's German datacenters. No VAT in our case.&lt;/p&gt;

&lt;p&gt;Comparing this to&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS's &lt;code&gt;m6a.2xlarge&lt;/code&gt; and &lt;code&gt;m6i.2xlarge&lt;/code&gt; instances costing $281 and $312 respectively in &lt;code&gt;eu-west-1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;and Google Cloud's &lt;code&gt;c3-standard-8&lt;/code&gt; and &lt;code&gt;n2-standard-8&lt;/code&gt; instances come in at $277 and $311 in &lt;code&gt;europe-west4&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;there is a significant price advantage on Hetzner's side.&lt;/p&gt;

&lt;p&gt;And that's only comparing eight virtual CPU cores. If we factor in the number of disk IOPS we get from a bare metal server, at least double the memory, and the fact that we are getting not eight virtual CPUs but six real CPU cores with twelve threads, saying Hetzner is 5 times cheaper is an understatement.&lt;/p&gt;

&lt;p&gt;But not all things are equal: Hetzner being five times cheaper comes at a price. We are missing out on features that the hyperscalers spent a decade building.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are we losing by not hosting on one of the big clouds?
&lt;/h2&gt;

&lt;p&gt;We only need a few things to run our platform, and among those, the things we miss the most on Hetzner are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First and foremost we need Kubernetes; we would use managed Kubernetes if it were available.&lt;/li&gt;
&lt;li&gt;We need a highly available SQL database, but neither RDS nor CloudSQL is available.&lt;/li&gt;
&lt;li&gt;Virtual networks and security groups. We had to work even for basic network and host security on Hetzner.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Addressing the missing managed Kubernetes
&lt;/h3&gt;

&lt;p&gt;Managing Kubernetes is not something we prefer doing, and we usually rely on the managed Kubernetes offerings, but on Hetzner that is just not possible.&lt;/p&gt;

&lt;p&gt;Even though we manage Kubernetes ourselves, we tried to minimize the moving parts. Simplicity is key for us, therefore we chose to use the k3s project. K3s is a fully compliant Kubernetes distribution with a few notable simplifications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;k3s uses an SQL database as storage, so we don't have to deal with etcd at all.&lt;/li&gt;
&lt;li&gt;It is packaged as a single binary and has a matching configuration experience. All cluster operations, like managing certificates, are reduced to either an argument of the k3s binary or a CLI command.&lt;/li&gt;
&lt;li&gt;It is secure by default, with reasonable defaults for lightweight environments.&lt;/li&gt;
&lt;/ul&gt;
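
&lt;p&gt;To give a feel for that configuration experience, here is a minimal sketch of pointing k3s at an external SQL datastore via its &lt;code&gt;/etc/rancher/k3s/config.yaml&lt;/code&gt; (the hostname and credentials are placeholders, not our real setup):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# k3s reads this file on startup; every key mirrors a CLI flag
datastore-endpoint: postgres://k3s:secret@db.internal:5432/k3s
# extra hostname to include in the API server certificate
tls-san:
  - kubernetes.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;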

&lt;p&gt;We took one more notable architectural decision that eased our Kubernetes cluster building: we keep our state in SQL databases, so we did not have to install and maintain a clustered filesystem for persistent volumes.&lt;/p&gt;

&lt;p&gt;Even though we cut many of the difficult parts short in our setup, we expect a few days of maintenance, sometimes requiring immediate action, in the coming year related to our self-managed Kubernetes.&lt;/p&gt;

&lt;p&gt;But nodes die on managed Kubernetes offerings too, and disks fill up. Rebuilding a node will perhaps take longer in our case (starting a new node, running Ansible scripts, etc.) than on the hyperscalers, but we don't expect the number or severity of issues to become unmanageable. Famous last words, right? A follow-up blog post is due in 12 months.&lt;/p&gt;

&lt;h3&gt;
  
  
  No RDS, what now?
&lt;/h3&gt;

&lt;p&gt;A managed database is really something that I happily pay a premium for. High availability and point-in-time restores at the click of a button are not easy to replicate.&lt;/p&gt;

&lt;p&gt;Postgresql is also critical for Gimlet's SaaS platform, as we keep all state in Postgresql databases. Not just client data: the Kubernetes control plane is also stored in an SQL database. So we needed to build a highly available, secure Postgresql cluster and handle proper off-site backups.&lt;/p&gt;

&lt;p&gt;What gave us confidence in the process is that we had experience running replicated Postgres clusters back in the pre-Postgres-RDS days. It feels like a lifetime ago, but Postgres on RDS is not yet ten years old today.&lt;/p&gt;

&lt;h3&gt;
  
  
  Networking and security, the big one
&lt;/h3&gt;

&lt;p&gt;We spent most of our time on the networking setup and security considerations.&lt;/p&gt;

&lt;p&gt;By default, Hetzner gives you root access over SSH; there are no virtual networks or security groups. If you start a server process on your node, it is instantly accessible from the internet. Not a friendly default, and the lack of VPCs and security groups has been the biggest pain we had to solve.&lt;/p&gt;

&lt;p&gt;This was also where we involved external help. A review of our setup by two independent consultants increased our confidence in the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building infrastructure from the ground up on bare metal in 2023
&lt;/h2&gt;

&lt;p&gt;In this section I list the tooling and steps it took to build our SaaS platform on Hetzner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure as code
&lt;/h3&gt;

&lt;p&gt;While the year is 2023, we wrote our infrastructure as code setup in Ansible. Since most of the configuration happens inside a node, there is not much for Terraform to do. Our team also had confidence in Ansible from the virtual machine days.&lt;/p&gt;

&lt;h3&gt;
  
  
  Node security
&lt;/h3&gt;

&lt;p&gt;The first Ansible playbooks we wrote were the node hardening ones: unattended upgrades, fail2ban for SSH, and tens of best practices applied automatically on all our nodes.&lt;/p&gt;
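
&lt;p&gt;As an illustration of what such a playbook contains, here is a sketch of two hardening tasks (the package names are the Debian ones, and this is far from our full list):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;- name: Install unattended-upgrades and fail2ban
  ansible.builtin.apt:
    name:
      - unattended-upgrades
      - fail2ban
    state: present
    update_cache: true

- name: Make sure fail2ban guards SSH from boot
  ansible.builtin.service:
    name: fail2ban
    state: started
    enabled: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;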

&lt;h3&gt;
  
  
  Network security
&lt;/h3&gt;

&lt;p&gt;By default Hetzner assigns a public IP to each node and hands out root SSH access for you to start using it. Every port you open is open to the internet: no private networks or security groups by default. So we started by building our own.&lt;/p&gt;

&lt;p&gt;How we built our network:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is basic VLAN support in Hetzner, so we could connect our nodes with an unmetered, internal network connection.&lt;/li&gt;
&lt;li&gt;We also built a Wireguard-based VPN mesh on this internal network, essentially encrypting all traffic between our nodes.&lt;/li&gt;
&lt;li&gt;We installed a firewall on all nodes, and bound only ports 80 and 443 to the public IP addresses.&lt;/li&gt;
&lt;li&gt;We front all web traffic with a Cloudflare load balancer and Web Application Firewall. Traffic is only accepted from Cloudflare's servers.&lt;/li&gt;
&lt;/ul&gt;
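
&lt;p&gt;The firewall step above can be sketched as Ansible tasks using the &lt;code&gt;community.general.ufw&lt;/code&gt; module (the interface name and the choice of ufw are illustrative; any host firewall works the same way):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;- name: Allow web traffic on the public interface
  community.general.ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop: ["80", "443"]

- name: Trust the Wireguard mesh interface
  community.general.ufw:
    rule: allow
    interface: wg0
    direction: in

- name: Deny everything else and enable the firewall
  community.general.ufw:
    state: enabled
    policy: deny
    direction: incoming
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;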

&lt;h3&gt;
  
  
  Kubernetes
&lt;/h3&gt;

&lt;p&gt;We use k3s. Its single-binary distribution and configuration experience made the job easier; furthermore, we could back it with an SQL database, which removed etcd from the mix. We don't know etcd, so that was amazing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Postgresql
&lt;/h3&gt;

&lt;p&gt;We built a streaming-replication-based, active-passive Postgresql cluster.&lt;/p&gt;
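
&lt;p&gt;For the curious, the core of such a setup is only a handful of settings; a minimal sketch (hostnames and the replication user are placeholders), assuming the standby is seeded with &lt;code&gt;pg_basebackup -R&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# on the primary: postgresql.conf
wal_level = replica
max_wal_senders = 5

# on the standby: primary_conninfo is written by pg_basebackup -R,
# and an empty standby.signal file keeps the node in recovery mode
primary_conninfo = 'host=primary.internal user=replicator sslmode=require'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;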

&lt;p&gt;To keep things simple we built the cluster outside of Kubernetes and containers. Not that it would have been a big issue otherwise, but we would have pinned the Postgres pods onto specific nodes anyway. We didn't opt for Kubernetes Postgres operators, or HA tools like Patroni, just yet.&lt;/p&gt;

&lt;p&gt;There is one significant shortcut that we took here. Failover is designed to be manual at this point. This could be a considerable source of downtime, and we may improve this practice in further iterations of our platform.&lt;/p&gt;

&lt;p&gt;It is important to note why we took this risk: to enable automatic failover we would have to write a bulletproof failover script that maximizes availability and minimizes data consistency risks. With a bug in the failover automation, we could risk data consistency issues that are potentially more difficult to handle than downtime. Basically, we optimized for operational simplicity and good enough uptime.&lt;/p&gt;

&lt;p&gt;Now, judging what counts as good enough uptime is up to the reader, as Gimlet does not provide an SLA at this point, but let me leave you with two thoughts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A 99.5% availability, an industry-wide standard for SaaS platforms, means 1.83 days of downtime a year. A 99% availability means 3.65 days of downtime a year. The latter practically means that our whole team could be on vacation in the jungles of Brazil, travel back to Europe, open their laptops and do the database failover, and we would still be faster than three days. It goes without saying that we are not planning to travel to the jungles of Brazil without anyone being on call.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is good to look up at this point what &lt;a href="https://aws.amazon.com/rds/sla/"&gt;SLA&lt;/a&gt; Amazon's RDS database provides. They are kind of mushy on the topic: &lt;em&gt;"AWS will use commercially reasonable efforts"&lt;/em&gt; and if they fail, they give your money back in credits. Ten percent if availability is between 99% and 99.95%, 25% if it is between 95% and 99%, and all the money otherwise. In practical terms, they can be down for 18 days a year and you only get back one fourth of your money.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gitops: drinking our own champagne
&lt;/h3&gt;

&lt;p&gt;Each user that signs up gets a new Gimlet instance, identical to the latest release of the open-source version. Each user's configuration is stored in gitops, and we run Gimlet to manage those Gimlet instances.&lt;/p&gt;

&lt;p&gt;Besides Gimlet instances, we also manage our cluster components with gitops.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backups
&lt;/h3&gt;

&lt;p&gt;We store state in a single location only: our Postgresql cluster.&lt;/p&gt;

&lt;p&gt;For everything else, the source of truth is our infrastructure as code repositories: Ansible and gitops.&lt;/p&gt;

&lt;p&gt;We back up our Postgresql cluster, then encrypt and ship the backups to an off-site location.&lt;/p&gt;
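
&lt;p&gt;A backup job of this shape can be sketched as a single Ansible cron task (the paths, host, and GPG recipient are placeholders, and &lt;code&gt;pg_dumpall&lt;/code&gt; stands in for whichever base-backup tool you prefer):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;- name: Nightly encrypted off-site Postgres backup
  ansible.builtin.cron:
    name: pg-offsite-backup
    minute: "0"
    hour: "2"
    user: postgres
    job: &amp;gt;
      pg_dumpall | gzip
      | gpg --encrypt --recipient backups@example.com
      | ssh backup.internal 'cat &amp;gt; /backups/pg-latest.sql.gz.gpg'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;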

&lt;h3&gt;
  
  
  Disaster recovery
&lt;/h3&gt;

&lt;p&gt;Our disaster recovery strategy builds on two pillars: our backups and infrastructure as code repositories.&lt;/p&gt;

&lt;p&gt;During our platform building efforts we rebuilt the whole stack numerous times from code and before launching the early access program, we rebuilt everything from scratch. We also ran synthetic database restore tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Encryption
&lt;/h3&gt;

&lt;p&gt;We encrypt data in numerous places.&lt;/p&gt;

&lt;p&gt;In transit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We encrypt all traffic on our internal Wireguard based VPN mesh.&lt;/li&gt;
&lt;li&gt;We use SSL between our applications and Postgresql.&lt;/li&gt;
&lt;li&gt;Between our services.&lt;/li&gt;
&lt;li&gt;Between our applications and our ingress controller.&lt;/li&gt;
&lt;li&gt;Between our ingress controller and Cloudflare.&lt;/li&gt;
&lt;li&gt;Between Cloudflare and endusers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At rest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We encrypt our node disks.&lt;/li&gt;
&lt;li&gt;We encrypt our backups.&lt;/li&gt;
&lt;li&gt;Gimlet instances encrypt sensitive fields in the database.&lt;/li&gt;
&lt;li&gt;K3s encrypts its secrets that are stored in Postgresql.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Monitoring
&lt;/h3&gt;

&lt;p&gt;We use a Prometheus / Grafana stack for monitoring, UptimeRobot for uptime monitoring, and PagerDuty for our on-call.&lt;/p&gt;

&lt;h2&gt;
  
  
  How has Hetzner been so far?
&lt;/h2&gt;

&lt;p&gt;We used Hetzner a couple of years back. It was stable enough back then, and it still is today.&lt;/p&gt;

&lt;p&gt;We use stock node types and only expect linear scaling. Provisioning a node manually takes around fifteen minutes until we can log in. We heard from friends that this is not the case for custom machine types, but we will cross that bridge when we get there.&lt;/p&gt;

&lt;p&gt;An improvement since our previous ride with Hetzner is the VLAN feature. It was straightforward to set up based on the documentation and it has been stable so far. One thing we could not achieve though: connecting our dedicated nodes with Hetzner Cloud VMs. We use dedicated nodes, but we could spin up VMs for small tasks if the VLAN worked between the two.&lt;/p&gt;

&lt;p&gt;That's it for now. This is a snapshot of where we are currently; expect a follow-up post in 12 months, or earlier if something changes significantly.&lt;/p&gt;

&lt;p&gt;As always, if you want to deploy to Kubernetes using the best of open-source tools out of the box, our self-serve SaaS early access is open on &lt;a href="https://gimlet.io/signup"&gt;https://gimlet.io/signup&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Onwards!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloud</category>
      <category>aws</category>
    </item>
    <item>
      <title>Options for Kubernetes pod autoscaling</title>
      <dc:creator>Laszlo Fogas</dc:creator>
      <pubDate>Thu, 26 Jan 2023 09:28:25 +0000</pubDate>
      <link>https://forem.com/laszlocph/options-for-kubernetes-pod-autoscaling-7jn</link>
      <guid>https://forem.com/laszlocph/options-for-kubernetes-pod-autoscaling-7jn</guid>
      <description>&lt;p&gt;Kubernetes autoscaling was supposed to be easy. Even though one of the selling points of Kubernetes is scaling, the built-in autoscaling support is basic at best. You can only scale based on CPU or memory consumption; anything more advanced requires additional tooling that is often not trivial.&lt;/p&gt;

&lt;p&gt;The Gimlet.io team put together this blog post to show common use cases of autoscaling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;based on CPU&lt;/li&gt;
&lt;li&gt;custom Prometheus metrics&lt;/li&gt;
&lt;li&gt;and RabbitMQ queue length&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Furthermore, we aim to clear up the differences between the Horizontal Pod Autoscaler (HPA), the Prometheus Adapter, and KEDA.&lt;/p&gt;

&lt;p&gt;Let's get into it, shall we?&lt;/p&gt;

&lt;p&gt;First, about the Horizontal Pod Autoscaler (HPA).&lt;/p&gt;

&lt;h2&gt;
  
  
  First, about the Horizontal Pod Autoscaler (HPA)
&lt;/h2&gt;

&lt;p&gt;The Horizontal Pod Autoscaler, or HPA in short, is a Kubernetes resource that allows you to scale your application based on resource utilization such as CPU and memory.&lt;/p&gt;

&lt;p&gt;To be more precise, HPA is a general purpose autoscaler, but by default only CPU and memory metrics are available for it to scale on.&lt;/p&gt;

&lt;p&gt;Its data source is the Kubernetes Metrics API (which, by the way, also powers the &lt;code&gt;kubectl top&lt;/code&gt; command), backed by data provided by the &lt;code&gt;metrics-server&lt;/code&gt; component. This component runs on your cluster and is installed by default on GKE, AKS, CIVO and k3s clusters, but it needs to be manually installed on many others, like Digital Ocean, EKS and Linode.&lt;/p&gt;

&lt;p&gt;The HPA resource is moderately well documented in the Kubernetes documentation. Some confusion arises from the fact that there are blog posts out there showcasing different Kubernetes API versions: keep in mind that &lt;code&gt;autoscaling/v2&lt;/code&gt; is not backwards compatible with v1!&lt;/p&gt;
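
&lt;p&gt;For reference, a minimal &lt;code&gt;autoscaling/v2&lt;/code&gt; resource that scales a hypothetical &lt;code&gt;test-app-deployment&lt;/code&gt; on CPU looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;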

&lt;p&gt;More headaches arise when you try to scale on metrics other than CPU and memory. In order to scale pods based on, say, the number of HTTP requests or a queue length, you need to make the Kubernetes API aware of these metrics first. Luckily there are open-source metrics backends available, and the best known is the Prometheus Adapter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prometheus Adapter
&lt;/h2&gt;

&lt;p&gt;Prometheus Adapter is a Kubernetes Custom Metrics API implementation which exposes selected Prometheus metrics through the Kubernetes API for the Horizontal Pod Autoscaler (HPA) to scale on.&lt;/p&gt;

&lt;p&gt;Essentially, you configure the Prometheus Adapter to read your desired metric from Prometheus, and it will serve it to the HPA to scale on. This can be an HTTP request rate, a RabbitMQ queue length, or any other metric from Prometheus.&lt;/p&gt;

&lt;p&gt;Prometheus Adapter does the job, but in our experience its configuration is cryptic. While there are several blog posts out there explaining its configuration syntax, we could not make it work reliably enough for our custom metrics scaling needs.&lt;/p&gt;

&lt;p&gt;That is essentially why we have brought you here today, to share our experience with a Prometheus Adapter alternative, called KEDA.&lt;/p&gt;

&lt;p&gt;So, what exactly is KEDA, and why do we prefer it?&lt;/p&gt;

&lt;h2&gt;
  
  
  KEDA
&lt;/h2&gt;

&lt;p&gt;KEDA is a Kubernetes operator that handles a user-friendly custom yaml resource where you define your scaling needs.&lt;/p&gt;

&lt;p&gt;In KEDA, you create a &lt;code&gt;ScaledObject&lt;/code&gt; custom resource with the necessary information about the deployment you want to scale, then define the trigger event, which can be based on CPU and memory usage or on custom metrics. It has premade triggers for most anything you may want to scale on, with a yaml structure that we think the Kubernetes API should have had in the first place.&lt;/p&gt;

&lt;p&gt;KEDA does two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it exposes the selected metrics to the Kubernetes Custom Metrics API - just like Prometheus Adapter&lt;/li&gt;
&lt;li&gt;and it creates the Horizontal Pod Autoscaler resource. Ultimately this HPA does the scaling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you have an overview, let's take a step further and show how you can autoscale with KEDA!&lt;/p&gt;

&lt;h2&gt;
  
  
  Autoscaling example based on CPU usage
&lt;/h2&gt;

&lt;p&gt;In order to autoscale your application with KEDA, you need to define a &lt;code&gt;ScaledObject&lt;/code&gt; resource.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keda.sh/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaledObject&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu-based-scaledobject&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;minReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;1&lt;/span&gt;                                 
  &lt;span class="na"&gt;maxReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;scaleTargetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-app-deployment&lt;/span&gt;
  &lt;span class="na"&gt;triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;
    &lt;span class="na"&gt;metricType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Utilization&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;scaleTargetRef&lt;/code&gt; is where you refer to your deployment, and &lt;code&gt;triggers&lt;/code&gt; is where you define the metrics and threshold that will trigger the scaling.&lt;/p&gt;

&lt;p&gt;In this sample we trigger based on CPU usage: the &lt;code&gt;ScaledObject&lt;/code&gt; will manage the number of replicas automatically for you and maintain an average of 50% CPU utilization per pod.&lt;/p&gt;

&lt;p&gt;As usual with Kubernetes custom resources, you can &lt;code&gt;kubectl get&lt;/code&gt; and &lt;code&gt;kubectl describe&lt;/code&gt; the resource once you have deployed it to the cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl get scaledobject
NAME                    SCALETARGETKIND      SCALETARGETNAME      MIN   MAX   TRIGGERS  READY   ACTIVE
cpu-based-scaledobject  apps/v1.Deployment   test-app-deployment   1     10    cpu      True    True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To understand in depth what is happening in the background, you can check the logs of the KEDA operator pod, and you can also &lt;code&gt;kubectl describe&lt;/code&gt; the HPA resource that KEDA created.&lt;/p&gt;
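
&lt;p&gt;KEDA names the generated HPA after the ScaledObject (&lt;code&gt;keda-hpa-&amp;lt;name&amp;gt;&lt;/code&gt; by default), so it is easy to find; the replica numbers below are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl get hpa
NAME                              REFERENCE                        MINPODS   MAXPODS   REPLICAS
keda-hpa-cpu-based-scaledobject   Deployment/test-app-deployment   1         10        1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;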

&lt;h2&gt;
  
  
  Autoscaling example based on custom metrics
&lt;/h2&gt;

&lt;p&gt;To use custom metrics, you need to make changes to the &lt;code&gt;triggers&lt;/code&gt; section.&lt;/p&gt;

&lt;p&gt;Scaling example based on custom Prometheus metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;serverAddress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://&amp;lt;prometheus-host&amp;gt;:9090&lt;/span&gt;
    &lt;span class="na"&gt;metricName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http_requests_total&lt;/span&gt; &lt;span class="c1"&gt;# Note: name to identify the metric, generated value would be `prometheus-http_requests_total`&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum(rate(http_requests_total{deployment="my-deployment"}[2m]))&lt;/span&gt; &lt;span class="c1"&gt;# Note: query must return a vector/scalar single element response&lt;/span&gt;
    &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;100.50'&lt;/span&gt;
    &lt;span class="na"&gt;activationThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;5.5'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scaling example based on RabbitMQ queue length:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rabbitmq&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amqp://localhost:5672/vhost&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;QueueLength&lt;/span&gt; &lt;span class="c1"&gt;# QueueLength or MessageRate&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100"&lt;/span&gt; &lt;span class="c1"&gt;# message backlog or publish/sec. target per instance&lt;/span&gt;
    &lt;span class="na"&gt;queueName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testqueue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the &lt;a href="https://keda.sh/docs/2.9/scalers/"&gt;KEDA&lt;/a&gt; official website to see all the scalers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing words
&lt;/h2&gt;

&lt;p&gt;When we found KEDA, our pains with the Prometheus Adapter were solved instantly. KEDA's simple install experience and ready-made scalers cover our autoscaling needs, while its straightforward yaml syntax communicates the scaling intent well.&lt;/p&gt;

&lt;p&gt;We don't just use KEDA ourselves, we also recommend it to our clients and friends. So much so that we integrated KEDA into our preferred stack at Gimlet.&lt;/p&gt;

&lt;p&gt;Onwards!&lt;/p&gt;




&lt;p&gt;Originally published by Youcef Guichi and Laszlo Fogas at &lt;a href="https://gimlet.io/blog/options-for-kubernetes-pod-autoscaling"&gt;https://gimlet.io&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloud</category>
      <category>microservices</category>
    </item>
  </channel>
</rss>
