<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Gahl Saraf</title>
    <description>The latest articles on Forem by Gahl Saraf (@gsaraf).</description>
    <link>https://forem.com/gsaraf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1116873%2F1cbd31cb-cc2a-450e-9546-55d6684c33ca.png</url>
      <title>Forem: Gahl Saraf</title>
      <link>https://forem.com/gsaraf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gsaraf"/>
    <language>en</language>
    <item>
      <title>Creating Kube or Fake with ChatGPT</title>
      <dc:creator>Gahl Saraf</dc:creator>
      <pubDate>Tue, 01 Aug 2023 16:37:03 +0000</pubDate>
      <link>https://forem.com/gsaraf/creating-kube-or-fake-with-chatgpt-1noe</link>
      <guid>https://forem.com/gsaraf/creating-kube-or-fake-with-chatgpt-1noe</guid>
      <description>&lt;p&gt;&lt;strong&gt;Guest post by Assaf Avital, the mastermind who developed &lt;a href="//kube-or-fake.raftt.io"&gt;Kube or Fake!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hi all! In this blog post, I’ll guide you through creating your own &lt;em&gt;Kube or Fake?&lt;/em&gt; mini-game using ChatGPT. For those of you who joined late, &lt;em&gt;Kube or Fake?&lt;/em&gt; is a Kubernetes/ChatGPT mini-game created by Raftt, where the player must distinguish between real Kubernetes terms and fake ChatGPT-generated ones (and it all happens live 💪). If you haven’t already tried it, &lt;a href="https://kube-or-fake.raftt.io"&gt;kick back and enjoy&lt;/a&gt;.&lt;br&gt;
First, we will get familiar with the ChatGPT API and see how we can use it to generate text. We will then see how we can integrate it into a small funky app, wrap it nicely enough, and publish it for the whole world! (or a couple of friends).&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1 - Using ChatGPT to Generate K8s Terms
&lt;/h2&gt;

&lt;p&gt;The most important part of our mini-game is having ChatGPT generate Kubernetes terms (either real or fake). Since we want to use ChatGPT to power our app, let’s have it return the result in a structured syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"term"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;generated&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;term&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"isReal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;*True*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;term&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;real&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;*False*&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;description&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;term&amp;gt;;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Attempt #1: Making ChatGPT flip a coin
&lt;/h3&gt;

&lt;p&gt;Our initial approach was to let ChatGPT decide whether the generated Kubernetes term should be real or fake, using a prompt like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt;
&lt;span class="err"&gt;‍&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""You are a Kubernetes expert, and also a game master.
You are 50% likely to respond with a real Kubernetes term (either a resource kind or field name), and 50% likely to make up a fake term which resembles a real one, in order to confuse the player.
Your response syntax is:
{
  "term": your generated term,
  "isReal": true if the term is real, else false,
  "description": a short description of the term. If it’s fake, don’t mention it
}"""&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-3.5-turbo-16k-0613"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the API call &lt;code&gt;openai.ChatCompletion.create(...)&lt;/code&gt; which requires choosing a specific GPT model to use, and the structure of messages we pass to it.&lt;br&gt;
When getting the response back, we retrieve its content as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="err"&gt;‍&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'choices'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'message'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;term&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Since we instructed ChatGPT to respond with a JSON string
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
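&lt;p&gt;One caveat worth guarding against: ChatGPT isn’t guaranteed to reply with valid JSON, even when the prompt demands it. A minimal defensive sketch around the parse (the &lt;code&gt;parse_term&lt;/code&gt; helper is hypothetical, not part of Kube or Fake’s actual code):&lt;/p&gt;

```python
import json


def parse_term(content):
    """Parse the model's reply, tolerating the occasional non-JSON answer.

    Illustrative helper; not part of the original game code.
    """
    try:
        term = json.loads(content)
    except json.JSONDecodeError:
        return None  # the caller can simply retry the ChatCompletion call
    if not isinstance(term, dict):
        return None
    # Reject replies that parsed but are missing the fields the game relies on
    if not all(key in term for key in ("term", "isReal", "description")):
        return None
    return term


print(parse_term('{"term": "Pod", "isReal": true, "description": "Smallest deployable unit"}'))
print(parse_term("Sorry, as an AI language model I cannot..."))  # not JSON -> None
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; on a bad reply lets the caller retry instead of crashing mid-game.&lt;/p&gt;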



&lt;p&gt;Overall, this attempt worked pretty well; however, the probability of generating real terms vs. fake ones wasn’t actually 50/50. After running the same prompt a few times, it became pretty obvious that ChatGPT was biased towards real Kubernetes terms. We can overcome this by either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tuning the probabilities in the prompt (e.g. 30% real / 70% fake) such that the actual probability is closer to 50/50.&lt;/li&gt;
&lt;li&gt;Extracting the coin flip into our code, and writing a different prompt for each side of the coin.&lt;/li&gt;
&lt;/ul&gt;
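&lt;p&gt;The first option can be sketched by parameterizing the stated odds in the system prompt, so the split can be tuned empirically (this helper and its percentages are illustrative, not what Kube or Fake actually ships):&lt;/p&gt;

```python
def build_prompt(real_pct: int) -> str:
    """Build the coin-flip system prompt with a tunable stated probability.

    If the model over-produces real terms, lowering real_pct nudges the
    observed distribution back toward 50/50.
    """
    return f"""You are a Kubernetes expert, and also a game master.
You are {real_pct}% likely to respond with a real Kubernetes term (either a resource kind or field name), and {100 - real_pct}% likely to make up a fake term which resembles a real one, in order to confuse the player.
Your response syntax is:
{{
  "term": your generated term,
  "isReal": true if the term is real, else false,
  "description": a short description of the term. If it's fake, don't mention it
}}"""


print(build_prompt(30))
```

&lt;p&gt;Re-running the game loop at different values of &lt;code&gt;real_pct&lt;/code&gt; is a quick way to measure how far the model drifts from the odds you ask for.&lt;/p&gt;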

&lt;h3&gt;
  
  
  Attempt #2: Flip the coin before invoking ChatGPT API
&lt;/h3&gt;

&lt;p&gt;We’ll need two different prompts for this approach, depending on the coin flip result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt #1&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a Kubernetes expert.
Generate a real Kubernetes term, which is either a resource kind or a field name.
Response syntax:
{
  "term": your generated term,
  "isReal": true,
  "description": a short description of the term
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prompt #2&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a Kubernetes expert.
Generate a random, fake Kubernetes term, which resembles a resource kind or a field name.
An average software developer should not be able to immediately realize this is a fake term.
Make sure that this is, in fact, not a real Kubernetes term.
Response syntax:
{
  "term": your generated term,
  "isReal": false,
  "description": a short description of the term, don’t mention it is fake
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll make the necessary adjustments to our code, and add the coin flip logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;random&lt;/span&gt;
&lt;span class="err"&gt;‍&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;REAL_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FAKE_PROMPT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"gpt-3.5-turbo-16k-0613"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach worked &lt;em&gt;much&lt;/em&gt; better, and the resulting probability was a lot closer to 50/50 😛&lt;/p&gt;

&lt;h3&gt;
  
  
  Attempt #3: Adding a 3rd side to the coin
&lt;/h3&gt;

&lt;p&gt;The previous approach was pretty good output-wise, and met our requirements almost perfectly. Now, the only thing missing from this game is the “oh-my-god-chatgpt-is-so-random” funny part!&lt;br&gt;
To spice it up just a bit, let’s add a &lt;em&gt;third&lt;/em&gt; prompt and let ChatGPT run wild:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a Kubernetes expert.
Generate a random, obviously fake Kubernetes term, which resembles a resource kind or a field name.
An average software developer should be able to immediately realize this is a fake term.
Make sure that this is, in fact, not a real Kubernetes term.
Extra points for coming up with a funny term.
Response syntax:
{
  "term": your generated term,
  "isReal": false,
  "description": a short description of the term, don’t mention it is fake
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can either keep the prompts equally likely (~33% chance for each), or play with the probability values as we see fit. “Prompt engineering” often leads to add-on sentences like “Make sure that this is, in fact, not a real Kubernetes term”: the model makes many mistakes, and explicit reminders help keep it on track.&lt;/p&gt;
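&lt;p&gt;If we’d rather not keep the prompts equally likely, the &lt;code&gt;random.shuffle&lt;/code&gt; trick can be swapped for a weighted draw with &lt;code&gt;random.choices&lt;/code&gt; (the weights below are illustrative, not the ones the game uses):&lt;/p&gt;

```python
import random

# Stand-ins for the three actual prompt strings
REAL_PROMPT = "real"
FAKE_PROMPT = "fake"
OBVIOUSLY_FAKE_PROMPT = "obviously fake"


def pick_prompt(weights=(40, 40, 20)):
    """Draw a single prompt with the given relative weights.

    The (40, 40, 20) default is illustrative, not the split Kube or Fake uses.
    """
    prompts = [REAL_PROMPT, FAKE_PROMPT, OBVIOUSLY_FAKE_PROMPT]
    return random.choices(prompts, weights=weights, k=1)[0]


print(pick_prompt())
```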

&lt;h2&gt;
  
  
  Deploying the term generator as an AWS Lambda
&lt;/h2&gt;

&lt;p&gt;We’ll soon get to the code of the game itself, but first let’s publish our term-generating code so it is publicly available. For this purpose, let’s use AWS Lambda.&lt;/p&gt;

&lt;p&gt;We want the Lambda handler to do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flip a (3-sided 🙃) coin (using &lt;code&gt;random&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Depending on the flip result, either:

&lt;ul&gt;
&lt;li&gt;Generate a real Kubernetes term&lt;/li&gt;
&lt;li&gt;Generate a not-so-obviously-fake Kubernetes term, to make the game a bit more difficult&lt;/li&gt;
&lt;li&gt;Generate an obviously-fake Kubernetes term, to make the game a bit funnier&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Respond with the JSON generated by ChatGPT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll be using Python as my Lambda runtime, but this is easily achievable with other runtimes as well.&lt;/p&gt;

&lt;p&gt;The general structure of our Lambda code is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;random&lt;/span&gt;

&lt;span class="err"&gt;‍&lt;/span&gt;
&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OPENAI_MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OPENAI_MODEL_NAME"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="err"&gt;‍&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;REAL_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FAKE_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OBVIOUSLY_FAKE_PROMPT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'choices'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'message'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'statusCode'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'headers'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"X-Requested-With"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'*'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"Access-Control-Allow-Headers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Requested-With'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"Access-Control-Allow-Origin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'*'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"Access-Control-Allow-Methods"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'GET, OPTIONS'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"Access-Control-Allow-Credentials"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# Required for cookies, authorization headers with HTTPS 
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="s"&gt;'body'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll cover two easy ways to deploy the lambda code to AWS. Use whichever is more convenient for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 1: AWS Console
&lt;/h3&gt;

&lt;p&gt;One approach to creating the Lambda function is via the AWS Console. This approach doesn’t bind the ChatGPT logic to our game repo, which lets us quickly change the Lambda without having to commit, push, and re-deploy our website.&lt;/p&gt;

&lt;h4&gt;
  
  
  Creating the OpenAI Lambda Layer
&lt;/h4&gt;

&lt;p&gt;AWS Lambda Layers are a way to share common code and libraries across multiple Lambda functions: a layer packages code and dependencies so they can be reused by any function that attaches it.&lt;/p&gt;

&lt;p&gt;But why do we need them? Well, unfortunately, AWS Lambda’s Python runtimes do not include the &lt;code&gt;openai&lt;/code&gt; package by default. Hence, we must provide it as a layer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the &lt;code&gt;openai&lt;/code&gt; package locally, in a folder called ‘python’:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;python
pip &lt;span class="nb"&gt;install &lt;/span&gt;openai &lt;span class="nt"&gt;-t&lt;/span&gt; python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;After the installation is complete, zip the folder:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;zip &lt;span class="nt"&gt;-r&lt;/span&gt; openai.zip python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Create the Lambda layer by going to the Lambda console → Layers → Create Layer. Provide a name for the layer, and select “Upload a .zip file”. Under “Compatible runtimes”, select Python 3.9. The final screen should look like this:
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jOqkiYNa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://uploads-ssl.webflow.com/627bf36ecef36c976239c7b6/64c8f484e7f4e8715fd420f5_layer-configuration.png" alt="" width="768" height="761"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Creating the Lambda Function
&lt;/h4&gt;

&lt;p&gt;Now we’re ready to create the actual Lambda! Go to Functions → Create Function, and choose Python 3.9 as your runtime. The code editor should come up shortly.&lt;br&gt;
To use our newly created layer, click Layers → Add a layer. Choose “Custom layers” as the source and pick the layer from the dropdown. When finished, click “Add”.&lt;/p&gt;

&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g1Jb-Nsl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://uploads-ssl.webflow.com/627bf36ecef36c976239c7b6/64c8f49506d27603cb236f99_layers.png" alt="" width="786" height="396"&gt;

&lt;p&gt;Next, we’ll define our environment variables. Go to Configuration → Environment variables and set the following env vars:&lt;/p&gt;

&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2-hIoOwv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://uploads-ssl.webflow.com/627bf36ecef36c976239c7b6/64c8f4ae76f82867df1ff32c_env-vars.jpg" alt="" width="800" height="227"&gt;

&lt;p&gt;You can use any model you’d like; I’ve chosen &lt;strong&gt;gpt-3.5-turbo-16k-0613&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now we’re ready to code! Go to “Code”, open up the web IDE, and paste the code snippet above. Click “Deploy” to publish the function. You can set up a function URL by going to Configuration → Function URL.&lt;/p&gt;
&lt;h3&gt;
  
  
  Method 2: SAM
&lt;/h3&gt;

&lt;p&gt;Another approach to AWS Lambdas is using &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html"&gt;SAM&lt;/a&gt; (the AWS Serverless Application Model), which lets us run Lambdas locally or deploy them to our AWS account.&lt;/p&gt;
&lt;h4&gt;
  
  
  Writing the Lambda handler
&lt;/h4&gt;

&lt;p&gt;We’ll create a file &lt;code&gt;lambda/lambda.py&lt;/code&gt; in our repository and put the handler code there.&lt;/p&gt;
&lt;h4&gt;
  
  
  Creating a local Lambda layer
&lt;/h4&gt;

&lt;p&gt;To create the &lt;code&gt;openai&lt;/code&gt; layer locally, we install the dependencies into the layer folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt &lt;span class="nt"&gt;-t&lt;/span&gt; libs/python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Creating a CloudFormation template
&lt;/h4&gt;

&lt;p&gt;To deploy the Lambda, we first must set up a &lt;a href="https://aws.amazon.com/cloudformation/"&gt;CloudFormation&lt;/a&gt; template in our project root directory, named &lt;code&gt;template.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AWSTemplateFormatVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2010-09-09'&lt;/span&gt;
&lt;span class="na"&gt;Transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless-2016-10-31&lt;/span&gt;

&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;OpenAILambdaLayer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::LayerVersion&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;LayerName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
      &lt;span class="na"&gt;ContentUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;libs&lt;/span&gt;
  &lt;span class="na"&gt;GenerateKubernetesTermFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nl"&gt;&amp;amp;lt&lt;/span&gt;&lt;span class="s"&gt;;&amp;amp;lt; replace &amp;amp;gt;&amp;amp;gt;&lt;/span&gt;
          &lt;span class="na"&gt;OPENAI_MODEL_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nl"&gt;&amp;amp;lt&lt;/span&gt;&lt;span class="s"&gt;;&amp;amp;lt; replace &amp;amp;gt;&amp;amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambda/&lt;/span&gt;
      &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambda.lambda_handler&lt;/span&gt;
      &lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python3.10&lt;/span&gt;
      &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;FunctionUrlConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;AuthType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NONE&lt;/span&gt;
      &lt;span class="na"&gt;Events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;GenerateTerm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Api&lt;/span&gt;
          &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/generate&lt;/span&gt;
            &lt;span class="na"&gt;Method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get&lt;/span&gt;
      &lt;span class="na"&gt;Layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;OpenAILambdaLayer&lt;/span&gt;

&lt;span class="na"&gt;Outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;GenerateKubernetesTermFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;GenerateKubernetesTermFunction.Arn&lt;/span&gt;
  &lt;span class="na"&gt;GenerateKubernetesTermFunctionIAMRole&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;GenerateKubernetesTermFunctionRole.Arn&lt;/span&gt;
  &lt;span class="na"&gt;GenerateKubernetesTermFunctionURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;GenerateKubernetesTermFunctionUrl.FunctionUrl&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The template above defines the resources that will be deployed as part of the CloudFormation stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda function + layer&lt;/li&gt;
&lt;li&gt;IAM Role associated with the Lambda function&lt;/li&gt;
&lt;li&gt;API Gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From here we can deploy the Lambda function either locally, or to AWS.&lt;/p&gt;

&lt;h4&gt;
  
  
  Local Lambda deployment
&lt;/h4&gt;

&lt;p&gt;The Lambda can be run locally with &lt;code&gt;sam&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam &lt;span class="nb"&gt;local &lt;/span&gt;start-api
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command starts a server at &lt;code&gt;localhost:3000&lt;/code&gt;. The command output should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mounting GenerateKubernetesTermFunction at &amp;amp;lt;http://127.0.0.1:3000/generate&amp;amp;gt; [GET]
You can now browse to the above endpoints to invoke your functions. You do not need to restart/reload SAM CLI while working on your functions, changes will be reflected instantly/automatically. If you used sam
build before running local commands, you will need to re-run sam build for the changes to be picked up. You only need to restart SAM CLI if you update your AWS SAM template
2023-07-20 11:58:51 WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:3000
2023-07-20 11:58:51 Press CTRL+C to quit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the Lambda is invoked via &lt;code&gt;localhost:3000/generate&lt;/code&gt;, some more logs are shown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Invoking lambda.lambda_handler (python3.10)
OpenAILambdaLayer is a local Layer in the template
Local image is up-to-date
Building image.....................
Using local image: samcli/lambda-python:3.10-x86_64-b22538ac72603f4028703c3d1.
Mounting kube-or-fake/lambda as /var/task:ro,delegated, inside runtime container
START RequestId: b1c733b3-8449-421b-ae6a-fe9ac2c86022 Version: $LATEST
END RequestId: b1c733b3-8449-421b-ae6a-fe9ac2c86022
REPORT RequestId: b1c733b3-8449-421b-ae6a-fe9ac2c86022
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; You may be asked to provide your local machine credentials to allow &lt;code&gt;sam&lt;/code&gt; to interact with your local Docker daemon.&lt;/p&gt;

&lt;p&gt;Be aware that the Lambda Docker image is rebuilt on each invocation, and that there is no need to re-run &lt;code&gt;sam local start-api&lt;/code&gt; when making changes to the Lambda code (changes to &lt;code&gt;template.yml&lt;/code&gt; &lt;em&gt;do&lt;/em&gt; require a re-run, though).&lt;/p&gt;
&lt;h4&gt;
  
  
  Deploying to AWS
&lt;/h4&gt;

&lt;p&gt;We do this using &lt;code&gt;sam&lt;/code&gt; as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the command's output to see where your new Lambda is created (&lt;code&gt;GenerateKubernetesTermFunctionURL&lt;/code&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bqAKrchf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://uploads-ssl.webflow.com/627bf36ecef36c976239c7b6/64c8f89c6c53b95f97bc8692_cloudformation-output.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bqAKrchf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://uploads-ssl.webflow.com/627bf36ecef36c976239c7b6/64c8f89c6c53b95f97bc8692_cloudformation-output.png" alt="" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2 - Getting JSy With It
&lt;/h2&gt;

&lt;p&gt;Our generator’s already up; all that’s left is to create the game’s UX.&lt;br&gt;
There are many ways to implement this, but for the purposes of this guide I’ll focus on the raw JS script we’ll be using.&lt;/p&gt;
&lt;h3&gt;
  
  
  Generate a Term by Invoking Lambda
&lt;/h3&gt;

&lt;p&gt;We want the browser to make the call to our Lambda and generate a term. This can be done with &lt;code&gt;fetch&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;GENERATOR_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;{your generated function url}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;generateWord&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;GENERATOR_URL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Implement “Guess” Logic
&lt;/h3&gt;

&lt;p&gt;We want to show a result according to the user’s choice (“KUBE” or “FAKE”).&lt;/p&gt;

&lt;p&gt;Here’s the relevant code, assuming we’ve created the buttons in the HTML already:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;checkGuess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;guess&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;correctGuess&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;guess&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isReal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;correctGuess&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// user is correct - show good message&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// user is wrong - show bad message&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;‍&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isReal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// show description&lt;/span&gt;
    &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;description&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;description&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;term&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; is an AI hallucination.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;‍&lt;/span&gt;
&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;game-board&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;targetClass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;classList&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;targetClass&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kube-button&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;checkGuess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;checkGuess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And… that’s basically it! Of course, you can add a lot more (score-keeping, animations, share buttons, etc.), but that’s completely up to you.&lt;/p&gt;

&lt;p&gt;This may be the place to note that there are all kinds of fancy UI libraries for the web: React, Angular, Vue… As you may have been able to tell from the website itself, we know none of them. 😄&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3 - Deploy to GitHub Pages
&lt;/h2&gt;

&lt;p&gt;We want the game to be publicly accessible, right? Let’s use GitHub Pages to make that happen. In your repo, go to Settings → Pages, select a branch to deploy from, choose your root folder as the source, and click “Save”. A workflow called “pages build and deployment” should start instantly.&lt;br&gt;
This workflow will now be triggered by every push to your selected branch!&lt;/p&gt;
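&lt;p&gt;For reference, the UI setup above roughly corresponds to a workflow like the following. This is a hedged sketch only - GitHub generates the real workflow for you, so treat the branch name and action versions here as assumptions:&lt;/p&gt;

```yaml
# Sketch of a Pages deployment workflow; GitHub generates the real one
# for you, so the branch name and action versions are assumptions.
name: pages build and deployment
on:
  push:
    branches: ["main"]          # assumption: deploying from main
permissions:
  contents: read
  pages: write
  id-token: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
    steps:
      - uses: actions/checkout@v4
      - uses: actions/upload-pages-artifact@v3
        with:
          path: "."             # the root folder selected as the source
      - uses: actions/deploy-pages@v4
```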

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;This was a blast to create, and [I hope] to play with. Don’t forget to share your results - we’d love to see forks of the game with changed UX or logic 🙂&lt;br&gt;
Check out all the code in the repo - &lt;a href="https://github.com/rafttio/kube-or-fake"&gt;https://github.com/rafttio/kube-or-fake&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>chatgpt</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>Kube or Fake: A Kubernetes Minigame</title>
      <dc:creator>Gahl Saraf</dc:creator>
      <pubDate>Wed, 26 Jul 2023 12:15:02 +0000</pubDate>
      <link>https://forem.com/gsaraf/kube-or-fake-a-kubernetes-minigame-52jf</link>
      <guid>https://forem.com/gsaraf/kube-or-fake-a-kubernetes-minigame-52jf</guid>
      <description>&lt;p&gt;Raftt is excited to announce the launch of &lt;em&gt;Kube or Fake?&lt;/em&gt;, our latest web mini-game. In this game, you will be presented with a series of Kubernetes terms generated in real-time by ChatGPT, and will need to determine whether they are real or AI hallucinations. Be warned, the descriptions accompanying each term may be tricky.&lt;/p&gt;

&lt;p&gt;We built &lt;em&gt;Kube or Fake?&lt;/em&gt; with ChatGPT API, AWS Lambda, and some JavaScript. Our goal was to create a fun and interactive way to test your knowledge of Kubernetes terminology. This game is perfect for Kubernetes enthusiasts and newcomers alike.&lt;/p&gt;

&lt;p&gt;So, how do you play? Simply visit &lt;a href="https://kube-or-fake.raftt.io"&gt;the game&lt;/a&gt; and start playing! Try to correctly identify each of the terms as either “Kube” or “Fake”. Challenge your friends and see who can get the highest score!&lt;/p&gt;

&lt;p&gt;We hope you enjoy playing &lt;em&gt;Kube or Fake?&lt;/em&gt; as much as we enjoyed building it. Let us know what you think, and stay tuned for the Making Of, where we dig into how we built this and why :).&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>gpt3</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Develop Right on Top of Your Kubernetes Cluster</title>
      <dc:creator>Gahl Saraf</dc:creator>
      <pubDate>Mon, 24 Jul 2023 07:20:00 +0000</pubDate>
      <link>https://forem.com/gsaraf/develop-right-on-top-of-your-kubernetes-cluster-2id0</link>
      <guid>https://forem.com/gsaraf/develop-right-on-top-of-your-kubernetes-cluster-2id0</guid>
      <description>&lt;p&gt;In recent years, Kubernetes has emerged as the de facto standard for container orchestration, providing an efficient and scalable platform for deploying and managing applications. As developers strive to streamline their workflows and optimize productivity, an emerging trend is developing applications directly on top of the Kubernetes cluster itself. In this blog post, we will explore the benefits of developing right on top of your Kubernetes cluster, and dive into the challenges that make this a difficult reality to achieve.&lt;/p&gt;

&lt;p&gt;Starting with the benefits - &lt;/p&gt;

&lt;h3&gt;
  
  
  Reduced env variance
&lt;/h3&gt;

&lt;p&gt;By developing directly on your Kubernetes cluster, you eliminate the need for a separate development environment or local setup. This streamlines the development process, reducing the overhead of replicating the cluster setup locally, and therefore the time spent on setting up and maintaining multiple environments.&lt;/p&gt;

&lt;p&gt;In a complex project there may be multiple flavors of production environments along with a host of staging, test, and preview environments. Having to maintain yet another environment type (which is used by the entire dev team!) is a huge time and attention sink for both developers and devops.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consistent development and production envs
&lt;/h3&gt;

&lt;p&gt;Catch and resolve issues early on, and reduce the risk of surprises during deployment by using the same tools, libraries, and configurations as the production environment. Developing on top of your Kubernetes cluster ensures that your development environment is aligned with the production environment. This consistency eliminates the "works on my machine" problem, where code behaves differently in different environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Seamless integration with cluster services
&lt;/h3&gt;

&lt;p&gt;Developing on top of your Kubernetes cluster gives you direct access to cluster services and features. You can easily integrate with monitoring tools, logging systems, service meshes, and other cluster-specific resources. This allows you to develop applications that are tightly coupled with the underlying infrastructure, leveraging the full capabilities of Kubernetes and maximizing the efficiency of your applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Improved collaboration and teamwork
&lt;/h3&gt;

&lt;p&gt;Kubernetes provides powerful features for collaboration, allowing multiple developers to work simultaneously on the same cluster. With development happening directly on the cluster, team members can easily share links to environments (using Kubernetes Ingress objects). This enhances teamwork and promotes knowledge sharing, as developers can observe and learn from each other's work.&lt;/p&gt;
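&lt;p&gt;For illustration, sharing a link to an environment typically boils down to an Ingress along these lines (a minimal sketch - the host, namespace, and service names are hypothetical, not from this post):&lt;/p&gt;

```yaml
# Hypothetical example: exposing one developer's environment through an
# Ingress so teammates can open it from a shared link. All names and
# the host are assumptions for illustration.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dev-env-alice
  namespace: dev-alice
spec:
  rules:
    - host: alice.dev.example.com    # per-developer hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend       # the env's frontend Service
                port:
                  number: 80
```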

&lt;h3&gt;
  
  
  Simplified testing
&lt;/h3&gt;

&lt;p&gt;Developing directly on your Kubernetes cluster simplifies the testing process. Since you are working on the same environment where your application will run, you can easily reproduce and analyze issues that might arise during deployment. This reduces the time spent on recreating issues locally and provides a more accurate representation of how the application behaves in the production environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--O0_gzFR1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3oqqnt3nti584nmohxgq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--O0_gzFR1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3oqqnt3nti584nmohxgq.png" alt="Clusters in the cloud" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So why isn’t everyone developing on top of their Kubernetes cluster?
&lt;/h2&gt;

&lt;p&gt;Unfortunately, there are several significant challenges that make developing on top of a cluster difficult - &lt;/p&gt;

&lt;h3&gt;
  
  
  Container image immutability
&lt;/h3&gt;

&lt;p&gt;Containers are designed to be immutable, meaning they cannot be modified once built. While this immutability promotes scalability and consistency, it poses a big challenge for developers - it is no longer possible to make code changes and see fast feedback! Instead, developers rely on their CI pipeline to deploy new images, which extends the feedback cycle from seconds to 30+ minutes.&lt;/p&gt;

&lt;p&gt;Long cycles lead to bundling many changes together, which in turn lowers each cycle’s success rate, and dev speed inevitably slows significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inability to debug
&lt;/h3&gt;

&lt;p&gt;To make matters worse, it is difficult to set up proper debugging for code running in a Kubernetes cluster. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The container image does not contain the required debugging toolchain (&lt;code&gt;debugpy&lt;/code&gt; for Python, &lt;code&gt;dlv&lt;/code&gt; for Go, etc.).&lt;/li&gt;
&lt;li&gt;The container may not have the required security configuration (read only root FS / missing capabilities / …)&lt;/li&gt;
&lt;li&gt;Ports need to be opened between the IDE and the container running in the cluster.&lt;/li&gt;
&lt;li&gt;The IDE needs to be configured with the correct debugging configuration.&lt;/li&gt;
&lt;/ul&gt;
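&lt;p&gt;To make the list above concrete, here is a minimal sketch of what a debug-friendly pod spec might look like for a Python service. The image, command, and port are hypothetical assumptions, not part of any real deployment:&lt;/p&gt;

```yaml
# Hypothetical sketch of the pod-spec changes the bullets above imply,
# for a Python service debugged with debugpy. Image, command, and port
# are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: my-service-debug
spec:
  containers:
    - name: app
      image: my-service:debug   # image rebuilt to include debugpy
      command: ["python", "-m", "debugpy", "--listen", "0.0.0.0:5678", "app.py"]
      ports:
        - containerPort: 5678   # debugger port the IDE attaches to
      securityContext:
        readOnlyRootFilesystem: false   # the debugger may need to write temp files
```

&lt;p&gt;Opening the port between the IDE and the container can then be done with &lt;code&gt;kubectl port-forward&lt;/code&gt;.&lt;/p&gt;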

&lt;p&gt;All of this means that, in practice, developers stop debugging once their env is running on Kubernetes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complexity
&lt;/h3&gt;

&lt;p&gt;Kubernetes is a highly sophisticated and complex system. While some developers have a strong grasp of the basic Kubernetes concepts, the learning curve can be steep and it can slow down developers significantly. &lt;/p&gt;

&lt;h3&gt;
  
  
  Lack of access
&lt;/h3&gt;

&lt;p&gt;In most organizations, access to Kubernetes clusters (even those intended for development) is not available to all developers. For good reason too - it is easy to make mistakes and cause damage that takes time and resources to repair, and the teams maintaining the clusters are wise to take proactive measures to minimize the chance of that. The larger the organization, and the more complex the deployment, the higher the chance of limitations being in place for its clusters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost and environment management
&lt;/h3&gt;

&lt;p&gt;Managing the cost of a Kubernetes cluster is &lt;a href="https://dev.to/gsaraf/reducing-cloud-costs-on-kubernetes-dev-envs-318"&gt;an art in itself&lt;/a&gt;. Significantly increasing the number of environments due to their use in development can increase costs significantly. Managing the large number of dev environments can add a lot of overhead to already overextended DevOps or platform teams. Existing tooling intended to provide visibility into a limited number of environments may not scale well to the higher usage. And differences in configuration between environment types can cause overhead and false-positives.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SbsOlh5H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6lc9uem8j43segam4btd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SbsOlh5H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6lc9uem8j43segam4btd.png" alt="Touch the cluster" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait - how is &lt;strong&gt;anyone&lt;/strong&gt; developing on their Kubernetes cluster?
&lt;/h2&gt;

&lt;p&gt;Luckily, there are all kinds of tools that can be used to manage and develop on dev environments on Kubernetes. Everything from Spinnaker to ArgoCD and FluxCD for orchestration, and various solutions for the developer experience layer.&lt;/p&gt;

&lt;p&gt;We use our own product (&lt;a href="https://raftt.io"&gt;Raftt&lt;/a&gt;), which handles all of this complexity [and more]. Raftt adapts the Kubernetes environment for development, enabling out of the box hot-reloading of code and debugging, connected directly to your local machine and IDE. Raftt’s optional cluster-level controller handles environment provisioning and lifecycle, minimizing maintenance and additional costs.&lt;/p&gt;

&lt;p&gt;You are left with a great development experience (including code hot reloading, debugging, collaboration…) that is easy to manage and deploy. You can try it yourself - deploy Raftt for development on your Kubernetes cluster, with &lt;a href="https://docs.raftt.io/basics/tutorials/connect_mode"&gt;our demo project&lt;/a&gt; or with &lt;a href="https://docs.raftt.io/basics/onboard_project"&gt;your own code&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloudnative</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Reducing Cloud Costs on Kubernetes Dev Envs</title>
      <dc:creator>Gahl Saraf</dc:creator>
      <pubDate>Wed, 19 Jul 2023 06:57:09 +0000</pubDate>
      <link>https://forem.com/gsaraf/reducing-cloud-costs-on-kubernetes-dev-envs-318</link>
      <guid>https://forem.com/gsaraf/reducing-cloud-costs-on-kubernetes-dev-envs-318</guid>
      <description>&lt;p&gt;At &lt;a href="http://raftt.io"&gt;Raftt&lt;/a&gt;, we’ve gone to great lengths to reduce the cloud costs for environments running on our development cluster, both for our internal usage and for our customers. In this blog post I’ll cover all the adaptations we made, so you can apply them to your own infrastructure. For each, I’ll add the approximate savings, and in the end I’ve attached a summary. &lt;/p&gt;

&lt;p&gt;We primarily use AWS and EKS for our managed cloud clusters, so certain parts of this post will be a bit more relevant if you are using those (though the concepts carry over to any other cloud provider). We’ve covered the most significant factors, and through these were able to save over 95%, but there is lots more you can do - for instance, reduce persistent volume sizes or trim some of the CloudWatch logging. I’ll start with infrastructure-level modifications - sharing clusters, autoscaling, right-sizing nodes, and using spot instances:&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure Adaptations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Shared clusters
&lt;/h3&gt;

&lt;p&gt;This one may be a bit obvious, but it has to be said - while our production runs on its own Kubernetes cluster, we do not want a cluster for each preview or dev environment. This is because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It takes a long time to create Kubernetes clusters. In EKS, this is 10-15 minutes, including the time it takes for nodes to spin up. Other distributions are faster, but it still slows down development.&lt;/li&gt;
&lt;li&gt;There are a lot of static costs associated with each cluster. For our EKS setup, this ends up being around $100 / month, including the EKS backend (~$70), an NLB load balancer (~$21), and some CloudWatch logs (~$10).&lt;/li&gt;
&lt;li&gt;Different clusters cannot share the same nodes, so resource sharing is impossible, and we end up spending much more on EC2 instances.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead, we will create a single long-lived cluster, and deploy our application in different namespaces. There are a bunch of ways to do that - see &lt;a href="https://argoproj.github.io/cd"&gt;ArgoCD&lt;/a&gt;, &lt;a href="https://fluxcd.io/"&gt;Flux&lt;/a&gt;, custom internal tooling, or other solutions (we use our own product). That way, we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only set up the cluster and infra once, and only incur the static costs once.&lt;/li&gt;
&lt;li&gt;Are able to share the underlying resources (more on that below).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Autoscaling
&lt;/h3&gt;

&lt;p&gt;The most significant single factor in the cost of most Kubernetes clusters is the compute powering the cluster’s nodes. Several factors affect their cost, and the first we will cover is autoscaling. In cloud-based clusters with different levels of utilization, autoscaling is a must. It can reduce costs by an order of magnitude. Specifically for a cluster used for development purposes, and assuming we have infrastructure that can bring up and scale down environments as needed, this means cloud instances can be taken down (automatically):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Over the weekends, saving 48 hours a week&lt;/li&gt;
&lt;li&gt;Outside of working hours, saving 14 hours a day for working days (another 70 hours a week)&lt;/li&gt;
&lt;li&gt;Holidays, and other off days - another 15 days a year&lt;/li&gt;
&lt;li&gt;Since we are talking about environments used for developments, we can also scale down on days where people are on personal leave - another 20ish days a year.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All told, we get to &lt;code&gt;(365-(52*2)-15-20)=226&lt;/code&gt; working days per engineer, and with 10ish hours of work per day - around 2260 hours, or around &lt;code&gt;2260/(365*24)=1/4&lt;/code&gt; of the total yearly time.&lt;/p&gt;

&lt;p&gt;Autoscaling over EKS can be accomplished using either the &lt;a href="https://github.com/kubernetes/autoscaler"&gt;cluster-autoscaler&lt;/a&gt; project or &lt;a href="https://karpenter.sh/"&gt;Karpenter&lt;/a&gt;. If you want to use Spot instances, consider using Karpenter, as it has better integrations with AWS for optimizing spot pricing and availability, minimizing interruptions, and falling back to on-demand nodes if no spot instances are available.&lt;/p&gt;
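&lt;p&gt;As a minimal sketch of what makes the off-hours savings automatic: with Karpenter’s v1alpha5 Provisioner API, reclaiming idle capacity is a single field (the one-minute value here is an assumption, tune it to your workloads):&lt;/p&gt;

```yaml
# Sketch: reclaim empty nodes automatically so idle capacity scales
# down without manual intervention (Karpenter v1alpha5 Provisioner).
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsAfterEmpty: 60   # remove a node one minute after it empties
```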

&lt;h3&gt;
  
  
  Node selection
&lt;/h3&gt;

&lt;p&gt;Autoscaling applies to the node types we have chosen, and there are several important factors to consider when choosing them:&lt;/p&gt;

&lt;p&gt;The most important consideration is the node size. Kubernetes works well with clusters that have many smaller nodes, rather than a few large ones. This provides finer-grained autoscaling and reduces the impact of single nodes becoming unavailable. However, there is a set overhead per node consisting of daemons running on the VM itself (Kubelet, containerd, …), and DaemonSet pods that run on each node. Make sure to take those into account, and choose a node such that those services won’t be a significant percentage of its compute. We recommend working with nodes in the &lt;code&gt;[cm].large-2xlarge&lt;/code&gt; range.&lt;/p&gt;

&lt;p&gt;It is possible to further optimize your node size by creating multiple node groups. If your workloads have diverse resource requirements (some need high memory while others need high CPU), this can be a worthwhile optimization, allowing flexibility and maximum resource utilization. For example, if you have a workload that requests 8 GiB memory and 1 vCPU, it may make sense to use &lt;code&gt;r&lt;/code&gt;-type (high memory) instances. That said, for smaller clusters (fewer than 10 nodes), this can be a hassle to maintain.&lt;/p&gt;

&lt;p&gt;When using a rare node type (for example, X1 super high memory instances), you may occasionally run into problems when scaling up. This is especially true if you are limited in your availability zones, or trying to use Spot instances. Our solution was to specify a wide range of similar instances, and allow Karpenter to choose between them. For different clusters we use the &lt;code&gt;c&lt;/code&gt; or &lt;code&gt;m&lt;/code&gt; instance families, and provide Karpenter with a list such as: &lt;code&gt;c5a, c5, c5ad, c5d, c5n, c6a, c6i, c6id, and c6in&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Finally, since dev environments are bursty and can tolerate CPU disruption, you can choose burstable node types (in AWS, the t3/t4g families) and save some more. At the time of writing, in AWS’s Frankfurt region, a &lt;code&gt;t3a.2xlarge&lt;/code&gt; costs $252 per month, while an &lt;a href="https://instances.vantage.sh/aws/ec2/m5a.2xlarge"&gt;&lt;code&gt;m5a.2xlarge&lt;/code&gt;&lt;/a&gt; costs $303 - a 17% discount. This is not hugely significant and has some real downsides, so we will not assume you’ve made this jump.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spot Instances
&lt;/h3&gt;

&lt;p&gt;One of the best ways to save money on cloud instances is to use your cloud provider’s Spot (interruptible) program. This can save anywhere from 45% to 85% of the cost, without requiring any commitment.&lt;/p&gt;

&lt;p&gt;Not all workloads perform well on spot instances, since the chance of a node needing to be replaced is significantly higher. It is well worth checking, though, since the savings are so significant.&lt;/p&gt;

&lt;p&gt;We use Karpenter with a Provisioner that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;karpenter.sh/v1alpha5&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provisioner&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;;requirements&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;; &amp;amp;nbsp;- key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node.kubernetes.io/instance-type&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;; &amp;amp;nbsp; &amp;amp;nbsp;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;; &amp;amp;nbsp; &amp;amp;nbsp;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# List of node types as described above&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;; &amp;amp;nbsp;- key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;karpenter.sh/capacity-type&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;; &amp;amp;nbsp; &amp;amp;nbsp;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="na"&gt;; &amp;amp;nbsp; &amp;amp;nbsp;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="s"&gt;; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;- "spot"&lt;/span&gt;
 &lt;span class="nl"&gt;&amp;amp;nbsp&lt;/span&gt;&lt;span class="s"&gt;; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;- "on-demand"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows Karpenter to choose the most suitable instances for us, and to fall back to on-demand if no Spot instances are available. Since Karpenter itself should not run on the instances it manages, we spin up a small &lt;code&gt;t3.medium&lt;/code&gt; instance for Karpenter and a select few other services that don’t tolerate interruptions well.&lt;/p&gt;
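&lt;p&gt;As a sketch, pinning Karpenter (and other interruption-sensitive services) to that static node can be done with a &lt;code&gt;nodeSelector&lt;/code&gt; on its deployment. The &lt;code&gt;node-role: management&lt;/code&gt; label here is an assumed example, not something Karpenter defines - use whatever label you apply to the static node:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;spec:
  template:
    spec:
      nodeSelector:
        node-role: management  # example label applied to the static t3.medium node
&lt;/code&gt;&lt;/pre&gt;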

&lt;p&gt;We have seen an average reduction in EC2 cost of about 60%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Application Adaptations
&lt;/h2&gt;

&lt;p&gt;When using the same Infrastructure as Code (IaC) definitions for production, staging, preview and dev (highly recommended!) it is important to remember to modify them for dev environments. There are three main differences between production and dev that are relevant for our purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dev environments are mostly idle&lt;/li&gt;
&lt;li&gt;We don’t necessarily need the entire environment&lt;/li&gt;
&lt;li&gt;We don’t need to be as resilient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the following sections we’ll discuss applying these assumptions to the number of replicas, the resource requests, and partial environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workload Replicas
&lt;/h3&gt;

&lt;p&gt;Since we don’t need to be resilient, and the environment is expected to be mostly idle, we can scale down deployments to a single replica. This is true both for regular deployments and for stateful sets, where the scale-down translates to storage savings as well.&lt;/p&gt;

&lt;p&gt;Note that for stateful sets in particular, there may be business logic that needs to change to handle the different replica count, or edge cases that won’t reproduce with fewer replicas. Think of things like consistent routing of requests to the same stateful service.&lt;/p&gt;
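&lt;p&gt;In Helm, this can be as simple as a dev values override applied on top of the production values. A minimal sketch, with hypothetical service names and value keys:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# values-dev.yaml (service names are illustrative)
api:
  replicaCount: 1
worker:
  replicaCount: 1
postgres:
  replicas: 1  # for a StatefulSet this also means fewer PVCs, so storage savings
&lt;/code&gt;&lt;/pre&gt;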

&lt;h3&gt;
  
  
  Workload Resource Requests
&lt;/h3&gt;

&lt;p&gt;The main difference is that dev environments are idle 99% of the time. In our case, for one of our services, we are seeing the pods use 1-5 mCPU and 50MiB memory, down from about 100 mCPU and 160 MiB memory - so around 1/20th the CPU and 1/3 the memory. In this case, we will be able to pack three times as many dev environments onto the same nodes if we modify the resource requests accordingly.&lt;/p&gt;

&lt;p&gt;One of the difficulties with developer environments is that their usage can be “bursty” - idle 99% of the time, but suddenly busy for short stretches while the developer is actively working with them. These spikes increase both CPU and memory usage. Because CPU is a compressible resource, nothing significantly bad will happen if there is some contention, though things may run a bit more slowly. Memory, however, is a different matter. If your application takes a lot more memory when in use, you may need to increase the requests to match, or be susceptible to OOMs and evictions.&lt;/p&gt;
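&lt;p&gt;Applied to the example service above, a dev resource override might look like the following sketch. The numbers are illustrative, derived from the measurements mentioned earlier:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;resources:
  requests:
    cpu: 10m      # down from 100m - CPU is compressible, contention just slows things down
    memory: 64Mi  # down from 160Mi - raise this if in-use bursts cause OOMs
  limits:
    memory: 256Mi # headroom for the occasional burst
&lt;/code&gt;&lt;/pre&gt;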

&lt;p&gt;Kubernetes is reasonably good at spreading pods across nodes, but it can still happen that some nodes end up with the “burstier” pods while others stay quiet. You could try to bind each environment to a single node so all nodes behave similarly, though that adds more complexity to the deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partial environments
&lt;/h3&gt;

&lt;p&gt;A final possibility for resource reduction is to intelligently choose a subset of the environment to bring up for dev purposes. For example, instead of bringing up all 20 microservices, you could bring up only the part relevant to the task at hand. There are two main ways to accomplish that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically identify which code was changed in the branch, which services it belongs to, and bring up those services plus their dependencies. Unfortunately, this is very hard to do well, and mistakes mean broken environments.&lt;/li&gt;
&lt;li&gt;Pre-define several environment subsets, and bring those up depending on certain criteria or developer request. This is easier, but requires maintenance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From our experience, investing in partial environments only makes sense if it is easy to do for your system, and you have a relatively large number of services - let’s say more than 20. If you do go this route, it should be possible to save an additional 30-60% of the resources in the environment.&lt;/p&gt;
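&lt;p&gt;The pre-defined-subset approach can be as simple as a config file mapping profile names to service lists, consumed by whatever brings up the environment. A sketch with hypothetical profile and service names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# env-profiles.yaml (names are illustrative)
profiles:
  full:     [gateway, auth, billing, web, worker, postgres, redis]
  backend:  [gateway, auth, billing, worker, postgres, redis]
  frontend: [gateway, auth, web]
&lt;/code&gt;&lt;/pre&gt;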

&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;We’ve gone through seven complementary strategies for reducing the cloud resource cost of running developer environments. Together, they can drastically reduce your developer-related cloud bill. Let’s bring this down to actual money. Say we have a project with 15 microservices, 3 replicas in production, and an overall production resource utilization of 80GiB memory and 20 CPUs. This costs us (assuming AWS in Frankfurt):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt; &lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Subtotal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EKS Cluster&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$70&lt;/td&gt;
&lt;td&gt;$70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load Balancer&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$21&lt;/td&gt;
&lt;td&gt;$21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Static management node (t3.medium)&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$35&lt;/td&gt;
&lt;td&gt;$35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main EC2 Nodes (m5a.2xlarge)&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$303&lt;/td&gt;
&lt;td&gt;$909&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt; &lt;/td&gt;
&lt;td&gt; &lt;/td&gt;
&lt;td&gt;$1045&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Our baseline is deploying a full copy of this infrastructure for each environment. If you already have some of these optimizations in place, adjust the comparison accordingly. Let’s take a look at the savings we get from applying all the above techniques.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Shared cluster - so we have 0 incremental costs for each environment. Assuming 10 environments on average using the cluster, let’s amortize the cost - &lt;code&gt;(70+21+10+35)/10&lt;/code&gt; = $13.6&lt;/li&gt;
&lt;li&gt;Autoscaling - Instead of keeping our 3 main nodes up all the time, we enable autoscaling and reduce their cost to &lt;code&gt;(3*303)/4&lt;/code&gt; = $227.25&lt;/li&gt;
&lt;li&gt;Node selection - as discussed in that section, we could save another ~17% by switching to t3 instances, but we’ll skip this optimization for now.&lt;/li&gt;
&lt;li&gt;Spot instances - implementing spot instances, and using the up to date pricing, we reduce our cost per node to $134.46, and taking into account the autoscaling and node count, a total of &lt;code&gt;(3*134.46)/4&lt;/code&gt; = $100.845&lt;/li&gt;
&lt;li&gt;Workload replicas - reducing our replica count to 1 cuts our resource usage to one-third, and leaves us with 1 instead of 3 nodes. At this point it may make sense to choose smaller nodes, but since pricing is generally linear with node resources, let’s ignore that. Our updated price for the EC2 instances is &lt;code&gt;134.46/4&lt;/code&gt; = $33.6&lt;/li&gt;
&lt;li&gt;Workload resource requests - cutting our resource requests to around one-third translates directly to fewer EC2 instances, so - &lt;code&gt;33.6/3&lt;/code&gt; = $11.2&lt;/li&gt;
&lt;li&gt;Partial environments - if we can easily cut our environment and bring up only what is necessary, we can save another ~50%, and get to &lt;code&gt;11.2/2&lt;/code&gt; = $5.6&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To sum up, we have $13.6 static costs and $5.6 usage costs per env. We’ve gone down more than 50x, from $1045 for our production setup to $19.2 for each dev env. And since our largest remaining portion is the cluster static costs, we scale really well - increasing this to 20 environments means the price per env drops to $12.4.&lt;/p&gt;
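&lt;p&gt;As a sanity check, the chain of savings above can be reproduced in a few lines of Python (figures copied from the steps; treat them as rough approximations):&lt;/p&gt;

```python
# Rough sketch of the per-environment cost math above (USD / month).
static = 70 + 21 + 10 + 35   # EKS cluster + load balancer + CloudWatch + mgmt node
static_per_env = static / 10 # amortized over ~10 environments

spot_node = 134.46           # m5a.2xlarge spot price per node
ec2 = 3 * spot_node / 4      # 3 nodes, autoscaled up ~1/4 of the time
ec2 /= 3                     # 1 replica instead of 3 -> 1 node instead of 3
ec2 /= 3                     # ~1/3 the resource requests
ec2 /= 2                     # partial environments, ~50% savings

per_env = static_per_env + ec2
print(f"{static_per_env:.1f} + {ec2:.1f} = {per_env:.1f} per env")
```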

&lt;p&gt;This is, of course, a rough approximation, and doesn’t account for things like EBS volumes, increased CloudWatch usage or less-than-optimal node utilization. But even if we are off by 20%, we still end up over 40x cheaper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sounds great, right? What do I need to do to make this happen?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Make sure you have some declarative way to define the infrastructure - Terraform, CloudFormation, CDK, Pulumi - whatever. This is crucial because making incremental improvements is &lt;em&gt;significantly&lt;/em&gt; easier if you can see the exact effects of the changes you are making and roll back as needed. &lt;/li&gt;
&lt;li&gt;Next, adopt some kind of environment orchestration solution - while we don’t use Argo or Flux internally, I’ve heard great things about both of them.&lt;/li&gt;
&lt;li&gt;Once you have a working setup, start with the infrastructure changes, and adopt them one by one - autoscaling, right-sizing nodes, and switching to spot instances.&lt;/li&gt;
&lt;li&gt;Next, modify your application deployment (through Helm / Kustomize / whatever you are using) to reduce replicas and resource requests, and create partial configurations if possible.&lt;/li&gt;
&lt;li&gt;Finally, wait a day and go to your AWS Cost Explorer page and see how far you got 🙂.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To actually be able to use these environments for development, you will want a solution that gives developers access to these environments, and provides fast iterations, hot reloading and debugging. For that, (or if all of this seems like a &lt;em&gt;lot&lt;/em&gt; of work), &lt;a href="http://raftt.io"&gt;let’s get in touch&lt;/a&gt; 😉.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
