<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: CAST AI</title>
    <description>The latest articles on Forem by CAST AI (@cast_ai).</description>
    <link>https://forem.com/cast_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F6601%2F9282c510-e9cc-4580-8e76-e453518fb4a6.jpeg</url>
      <title>Forem: CAST AI</title>
      <link>https://forem.com/cast_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cast_ai"/>
    <language>en</language>
    <item>
      <title>Solving the Reserved Instance Resale Ban With K8s Automation</title>
      <dc:creator>CAST AI</dc:creator>
      <pubDate>Mon, 22 Jan 2024 16:58:22 +0000</pubDate>
      <link>https://forem.com/cast_ai/solving-the-reserved-instance-resale-ban-with-k8s-automation-agi</link>
      <guid>https://forem.com/cast_ai/solving-the-reserved-instance-resale-ban-with-k8s-automation-agi</guid>
      <description>&lt;p&gt;AWS Reserved Instance (RI) Resale has been banned. How do you keep maximum savings while staying flexible and risk-free?&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick summary of AWS’s ban on RI resale
&lt;/h2&gt;

&lt;p&gt;AWS prohibited the marketplace resale of Reserved Instances (RIs) acquired at a discount as of January 15, 2024. This means that the “flexible RI reseller market” has effectively lost the ability to provide flexibility and risk-free RI coverage.&lt;/p&gt;

&lt;p&gt;According to estimates, the RI “flexibility/trading” market results in annual savings of over $1 billion in cloud costs. These savings are no longer available. Any cloud user that fails to actively address this issue runs a significant risk of overcommitment, waste, and vendor lock-in.&lt;/p&gt;

&lt;p&gt;The CAST AI platform offers a solution that allows for higher savings with full flexibility, zero risk, and zero vendor lock-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  CAST AI offers a solution: automation
&lt;/h2&gt;

&lt;p&gt;CAST AI achieves significant savings without lock-in by applying AI to continuously ensure efficiency at all levels of the Kubernetes cloud environment. &lt;/p&gt;

&lt;p&gt;It’s a continuous process that involves bin packing pods into nodes, selecting the most cost-efficient instances, and scaling them up and down in line with actual application demand. CAST AI runs these processes automatically, 24/7.&lt;/p&gt;

&lt;p&gt;Instead of covering a non-optimized environment with groups of non-flexible and typically wasteful 1- or 3-year commitments, CAST AI makes sure every K8s environment is as efficient and flexible as possible, automatically and with practically zero set-up time.&lt;/p&gt;

&lt;p&gt;In the case of existing commitments, CAST AI will help customers make the most of their existing RIs by prioritizing the usage of reserved capacity.&lt;/p&gt;

&lt;p&gt;With CAST AI, you can help your customers drive more savings, maintain resource type flexibility, and avoid expensive long-term commitments. Here’s how.&lt;/p&gt;

&lt;h2&gt;
  
  
  CAST AI features
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxh0ar9x0eczegl6mehu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxh0ar9x0eczegl6mehu.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A well-optimized cloud environment keeps the following principles balanced:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Availability&lt;/strong&gt; and compliance – the cornerstones of every environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; – being able to grow and meet the workload’s needs, but also to scale down when not needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost efficiency&lt;/strong&gt; – instance types, amount of infra provisioned, and the leveraging of different discounting mechanisms offered by the cloud providers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous compliance&lt;/strong&gt; with all of the above in an ever-changing environment.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With CAST AI, your customers get all of the above for their K8s environments, and the best part: it’s fully automated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Autoscaling
&lt;/h3&gt;

&lt;p&gt;CAST AI automatically scales up and down (thousands of nodes in minutes if needed) while continuously making sure to provision only the resources needed. &lt;/p&gt;

&lt;h3&gt;
  
  
  Bin packing
&lt;/h3&gt;

&lt;p&gt;The platform bin packs pods into nodes to help your customers achieve optimal resource utilization. &lt;/p&gt;

&lt;h3&gt;
  
  
  Automated instance selection
&lt;/h3&gt;

&lt;p&gt;CAST AI selects the most cost-efficient and performance-optimized instance types that stay compliant and are always available.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workload rightsizing
&lt;/h3&gt;

&lt;p&gt;On top of that, CAST AI will identify pods that request more resources than the workload actually needs and, if allowed, will automatically manage a feedback cycle that will continuously rightsize the pods themselves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spot instance automation
&lt;/h3&gt;

&lt;p&gt;Lastly, if applicable for a specific workload, CAST AI can run selected workloads on &lt;a href="https://cast.ai/blog/how-to-reduce-cloud-costs-by-90-spot-instances-and-how-to-use-them/"&gt;spot instances&lt;/a&gt; with an automated fallback to on-demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to achieve optimal resource utilization, including RIs
&lt;/h2&gt;

&lt;p&gt;To make the most of RIs, only commit to what you’re guaranteed to use. &lt;/p&gt;

&lt;p&gt;CAST AI will continuously optimize your customers’ environments and keep them available, all the while improving performance and reducing costs dramatically. All of this is fully automated, without the need to commit to the cloud provider for long-term consumption.&lt;/p&gt;

&lt;p&gt;It takes about 5 minutes to install the CAST AI agent and get an analysis of your potential savings. Your customers also get access to &lt;a href="https://cast.ai/cloud-cost-monitoring/"&gt;K8s cost monitoring&lt;/a&gt; with dashboards to report on costs by cluster, workload, namespace, node, and more.&lt;/p&gt;

&lt;p&gt;Once your customers are ready to enable automation, CAST AI offers a swift and free POC that will generate actual savings and showcase the automation and performance of the platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Partner with the #1 Kubernetes automation platform
&lt;/h2&gt;

&lt;p&gt;CAST AI automates Kubernetes cost, performance, and security management in one platform, achieving over 60% cost savings for its users. &lt;a href="https://cast.ai/partner-program/"&gt;Become a partner!&lt;/a&gt; &lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>aws</category>
      <category>reservedinstances</category>
      <category>ris</category>
    </item>
    <item>
      <title>Kubernetes DaemonSet: Practical Guide to Monitoring in Kubernetes</title>
      <dc:creator>CAST AI</dc:creator>
      <pubDate>Fri, 01 Dec 2023 11:38:51 +0000</pubDate>
      <link>https://forem.com/cast_ai/kubernetes-daemonset-practical-guide-to-monitoring-in-kubernetes-46n6</link>
      <guid>https://forem.com/cast_ai/kubernetes-daemonset-practical-guide-to-monitoring-in-kubernetes-46n6</guid>
      <description>&lt;p&gt;As teams moved their deployment infrastructure to containers, monitoring and logging methods changed a lot. Storing logs in containers or VMs just doesn’t make sense – they’re both way too ephemeral for that. This is where solutions like Kubernetes DaemonSet come in.&lt;/p&gt;

&lt;p&gt;Since pods are ephemeral as well, managing Kubernetes logs is challenging. That’s why it makes sense to collect logs from every node and send them to some sort of central location outside the Kubernetes cluster for persistence and later analysis.&lt;/p&gt;

&lt;p&gt;A DaemonSet pattern lets you easily implement node-level monitoring agents in Kubernetes. This approach doesn’t force you to apply any changes to your application and consumes minimal resources.&lt;/p&gt;

&lt;p&gt;Dive into the world of DaemonSets to see how they work on a practical example of network traffic monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Kubernetes DaemonSet? Intro to node-level monitoring in Kubernetes
&lt;/h2&gt;

&lt;p&gt;A DaemonSet in Kubernetes is a specific kind of workload controller that ensures a copy of a pod runs on either all or some specified nodes within the cluster. It automatically adds pods to new nodes and removes pods from removed nodes. &lt;/p&gt;

&lt;p&gt;This makes DaemonSet ideal for tasks like monitoring, logging, or running a network proxy on every node. &lt;/p&gt;

&lt;h3&gt;
  
  
  DaemonSet vs. Deployment
&lt;/h3&gt;

&lt;p&gt;While a Deployment ensures that a specified number of pod replicas run and are available across the nodes, a DaemonSet makes sure that a copy of a pod runs on all (or some) nodes in the cluster. It’s a more targeted approach that guarantees that specific services run everywhere they’re needed.&lt;/p&gt;
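&lt;p&gt;To make the contrast concrete, here is a minimal Deployment sketch: it asks for a fixed replica count, and the scheduler decides which nodes the pods land on. A DaemonSet spec, by comparison, has no &lt;code&gt;replicas&lt;/code&gt; field at all – the pod count is derived from the number of matching nodes:&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3          # fixed count; unrelated to the number of nodes
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
```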

&lt;p&gt;DaemonSets provide a unique advantage in scenarios where consistent functionality across every node is crucial. This is particularly important for node-level monitoring within Kubernetes. &lt;/p&gt;

&lt;p&gt;By deploying a monitoring agent via DaemonSet, you can guarantee that every node in your cluster is equipped with the tools necessary for monitoring its performance and health. This level of monitoring is vital for early detection of issues, load balancing, and maintaining overall cluster efficiency.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An alternative approach – which involves manually deploying these agents or using other types of workload controllers like Deployments – could lead to inconsistencies and gaps in monitoring. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, without a DaemonSet, a newly added node might remain unmonitored until it’s manually configured. This gap could pose a risk to both the performance and security of the entire cluster. &lt;/p&gt;

&lt;h3&gt;
  
  
  The benefits of DaemonSets
&lt;/h3&gt;

&lt;p&gt;DaemonSets automate this process, ensuring that each node is brought under the monitoring umbrella without any manual intervention as soon as it joins the cluster.&lt;/p&gt;

&lt;p&gt;Furthermore, DaemonSets aren’t just about deploying the monitoring tools. They also manage the lifecycle of these tools on each node. When a node is removed from the cluster, the DaemonSet ensures that the associated monitoring tools are also cleanly removed, keeping your cluster neat and efficient.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In essence, Kubernetes DaemonSets simplify the process of maintaining a high level of operational awareness across all nodes. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They provide a hands-off, automated solution that ensures no node goes unmonitored, enhancing the reliability and performance of Kubernetes clusters. This makes DaemonSets an indispensable tool in the arsenal of Kubernetes cluster administrators, particularly for tasks like node-level monitoring that require uniform deployment across all nodes.&lt;/p&gt;
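&lt;p&gt;One practical detail: tainted nodes (such as control-plane nodes) reject ordinary pods, so monitoring DaemonSets typically add tolerations to keep every node covered. A minimal sketch of a pod template fragment, with a placeholder agent image:&lt;/p&gt;

```yaml
# Fragment of a DaemonSet pod template (spec.template.spec).
# The image name is a placeholder for your monitoring agent.
spec:
  tolerations:
  # Allow scheduling onto control-plane nodes, which carry this taint by default
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  containers:
  - name: monitoring-agent
    image: example.com/monitoring-agent:1.0  # placeholder image
```

&lt;p&gt;Without the toleration, the DaemonSet controller still creates a pod for the tainted node, but the scheduler won’t place it there.&lt;/p&gt;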

&lt;p&gt;Head over to the Kubernetes docs for details about the DaemonSet feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do DaemonSets work?
&lt;/h2&gt;

&lt;p&gt;A DaemonSet is a Kubernetes object that is actively controlled by a controller. You can define whatever state you wish for it – for example, declare that a specific pod should be present on all nodes.&lt;/p&gt;

&lt;p&gt;The reconciliation control loop compares the intended state to the cluster’s current state. If a matching pod doesn’t exist on a monitored node, the DaemonSet controller will create one for you. This automated approach applies to both existing and newly added nodes.&lt;/p&gt;

&lt;p&gt;By default, a DaemonSet creates pods on all nodes. You can use a node selector to limit the nodes it targets: the DaemonSet controller will only create pods on nodes that match the &lt;code&gt;nodeSelector&lt;/code&gt; field set in the YAML file.&lt;/p&gt;

&lt;p&gt;Here’s a DaemonSet example that creates nginx pods only on nodes that have the &lt;code&gt;disktype=ssd&lt;/code&gt; label:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DaemonSet&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-daemonset&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-pod&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-pod&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:latest&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-container&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;disktype&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ssd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you add a new node to the cluster, that pod is also added to the new node. When a node is removed (or the cluster shrinks), Kubernetes automatically garbage-collects that pod.&lt;/p&gt;

&lt;h2&gt;
  
  
  Network traffic monitoring with DaemonSet
&lt;/h2&gt;

&lt;p&gt;In the ever-evolving landscape of network management, understanding and overseeing network traffic is pivotal. &lt;/p&gt;

&lt;p&gt;Network traffic essentially refers to the amount and type of data moving across your network – this could be anything from user requests to data transfers. It’s the lifeblood of any digital environment, influencing the performance, security, and overall health of your network.&lt;/p&gt;

&lt;h3&gt;
  
  
  The role of DaemonSets in traffic monitoring
&lt;/h3&gt;

&lt;p&gt;How do you keep an eye on this in a Kubernetes environment? This is where DaemonSets come into play.&lt;/p&gt;

&lt;p&gt;As you already know, DaemonSets are a Kubernetes feature that allows you to deploy a pod on every node in your cluster.&lt;/p&gt;

&lt;p&gt;Why is that important for network traffic monitoring? &lt;/p&gt;

&lt;p&gt;Well, each node in your Kubernetes cluster can be involved in different kinds of network activities. By deploying a monitoring agent on every node, you get a comprehensive view of what’s happening across your entire cluster.&lt;/p&gt;

&lt;p&gt;You might be wondering now:&lt;/p&gt;

&lt;p&gt;Why not just use a Deployment and adjust the number of replicas to run on one or maybe two nodes to monitor the traffic of all nodes? &lt;/p&gt;

&lt;p&gt;It sounds simpler, but here’s the catch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security and isolation:&lt;/strong&gt; In Kubernetes, each node operates in its own isolated environment. This means that a pod on one node can’t directly monitor or access the network traffic of another node due to the security policies and Linux namespaces. These security measures are crucial for maintaining the integrity of your cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accurate and localized data&lt;/strong&gt;: By having a monitoring agent on each node, you get precise, localized data about the traffic. This level of granularity is essential for effective monitoring, as it helps in identifying specific issues and bottlenecks that might occur on individual nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability and reliability&lt;/strong&gt;: Using DaemonSets ensures that your monitoring setup scales with your cluster. As nodes are added or removed, the DaemonSet automatically adjusts, deploying or removing pods as needed. This dynamic scalability is a core requirement for maintaining a robust monitoring system in a growing or changing environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As you can see, using DaemonSets for network traffic monitoring in a Kubernetes cluster isn’t just a matter of convenience; it’s a necessity for accurate, secure, and scalable network analysis. &lt;/p&gt;

&lt;p&gt;Each node has its own unique traffic patterns and potential issues, and DaemonSets ensure you don’t miss out on these critical insights. They empower you to maintain a high-performing and secure Kubernetes environment by providing a bird’s-eye view of your network traffic, node by node.&lt;/p&gt;
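&lt;p&gt;A node-level traffic agent of the kind described above usually needs access to the node’s network namespace and its conntrack tables. A minimal DaemonSet sketch – the agent image is a placeholder for illustration, not an actual product manifest:&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: traffic-monitor
spec:
  selector:
    matchLabels:
      name: traffic-monitor
  template:
    metadata:
      labels:
        name: traffic-monitor
    spec:
      hostNetwork: true          # share the node's network namespace
      containers:
      - name: agent
        image: example.com/traffic-agent:1.0   # placeholder image
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]   # required to read conntrack records via Netlink
```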

&lt;h2&gt;
  
  
  Simplifying network traffic monitoring in Kubernetes
&lt;/h2&gt;

&lt;p&gt;When it comes to keeping tabs on network traffic in your Kubernetes cluster, the road can be complex and challenging. &lt;/p&gt;

&lt;p&gt;Those keen on DIY approaches might consider building a custom solution. This could involve leveraging tools like &lt;code&gt;conntrack&lt;/code&gt; to monitor each pod’s traffic, crafting intricate logic to process and store data, and continuously tackling a variety of potential issues that might arise along the way. &lt;/p&gt;

&lt;p&gt;While this approach offers flexibility, it’s often resource-intensive and riddled with complexities.&lt;/p&gt;

&lt;h3&gt;
  
  
  A streamlined alternative to network monitoring
&lt;/h3&gt;

&lt;p&gt;Alternatively, what if you could bypass these hurdles and jump straight to an efficient, ready-to-use solution? &lt;/p&gt;

&lt;p&gt;That’s exactly what our open-source egressd tool offers. It’s designed to simplify network traffic monitoring in Kubernetes, providing a comprehensive and hassle-free approach.&lt;/p&gt;

&lt;p&gt;egressd consists of two main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collector – a DaemonSet pod responsible for monitoring network traffic on nodes.&lt;/li&gt;
&lt;li&gt;Exporter – a Deployment pod that fetches traffic data from each collector and exports logs to HTTP or Prometheus.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s what our solution brings to the table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5uzold4wupxp49ercyca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5uzold4wupxp49ercyca.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Efficient conntrack monitoring
&lt;/h4&gt;

&lt;p&gt;egressd retrieves conntrack entries for pods on each node at a configured interval, defaulting to every 5 seconds.&lt;/p&gt;

&lt;p&gt;If you’re using Cilium, it fetches conntrack records directly from eBPF maps located in the host’s &lt;code&gt;/sys/fs/bpf&lt;/code&gt; directory, which are created by Cilium.&lt;/p&gt;

&lt;p&gt;For setups using the Linux Netfilter Conntrack module, it leverages Netlink to obtain these records.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Intelligent data reduction
&lt;/h4&gt;

&lt;p&gt;The records are then streamlined, focusing on key parameters like source IP, destination IP, and protocol to provide a clear picture of network interactions.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Enhanced with Kubernetes context
&lt;/h4&gt;

&lt;p&gt;We enrich the data by adding Kubernetes-specific context. This includes information about source and destination pods, nodes, node zones, and IP addresses, giving you a comprehensive view of your cluster’s network traffic.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Flexible export options
&lt;/h4&gt;

&lt;p&gt;The exporter in our solution is designed to be versatile, offering the capability to send logs either to an HTTP endpoint or to Prometheus for detailed analysis and alerting.&lt;/p&gt;
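&lt;p&gt;For the Prometheus path, a standard scrape job pointed at the exporter’s metrics endpoint is enough. A sketch, assuming the exporter is exposed through a Service named &lt;code&gt;egressd-exporter&lt;/code&gt; (the service name is illustrative):&lt;/p&gt;

```yaml
# prometheus.yml fragment (illustrative service name)
scrape_configs:
- job_name: egressd
  kubernetes_sd_configs:
  - role: endpoints          # discover scrape targets from Service endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: egressd-exporter  # keep only the exporter's endpoints
    action: keep
```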

&lt;h2&gt;
  
  
  Sidestep the complexity of building and maintaining a custom solution with egressd
&lt;/h2&gt;

&lt;p&gt;You get a solid, ready-to-deploy system that seamlessly integrates into your Kubernetes environment, providing detailed, real-time insights into your network traffic. This means you can focus more on strategic tasks and less on the intricacies of monitoring infrastructure.&lt;/p&gt;

&lt;p&gt;Additionally, egressd provides you with two options: &lt;/p&gt;

&lt;p&gt;egressd can be installed as a standalone tool that will track your network traffic movements within the cluster, which you can then visualize in Grafana to get a better picture of your network:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywnomik4h1z2qnay0cjk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywnomik4h1z2qnay0cjk.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, if you’re a CAST AI user, you can connect egressd to your dashboard to get all the benefits of our fancy cost reports. &lt;/p&gt;

&lt;p&gt;This way, you can see not only the amount of traffic within the cluster but also get more insights about workload-to-workload communication – and how much you pay for that traffic, since pricing differs across providers, regions, and zones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwbr9xtwk71ipv293h1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwbr9xtwk71ipv293h1b.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0wcssbjll6ju9wxy8rr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0wcssbjll6ju9wxy8rr.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cast.ai/blog/how-we-reduced-egress-cost-by-70-using-cast-ai/" rel="noopener noreferrer"&gt;Check out how we used the network cost report to reduce egress costs by 70%.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;Kubernetes DaemonSets come in handy for logging and monitoring purposes, but this is just the tip of the iceberg. You can also use them to tighten security and achieve compliance by running CIS Benchmarks on each node and deploying security agents like intrusion detection systems or vulnerability scanners on nodes that handle data subject to PCI or PII compliance requirements.&lt;/p&gt;

&lt;p&gt;And if you’re looking for more cost optimization opportunities, get started with a free cost monitoring report that has been fine-tuned to match the needs of Kubernetes teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Breakdown of costs per cluster, workload, label, namespace, allocation group, and more.&lt;/li&gt;
&lt;li&gt;Workload efficiency metrics, with CPU and memory hours wasted per workload.&lt;/li&gt;
&lt;li&gt;Available savings report that shows how much you stand to save if you move your workloads to more cost-optimized nodes.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>daemonset</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>Kubernetes Lens: How To Enhance Your Kubernetes Cluster</title>
      <dc:creator>CAST AI</dc:creator>
      <pubDate>Wed, 19 Jul 2023 06:42:16 +0000</pubDate>
      <link>https://forem.com/cast_ai/kubernetes-lens-how-to-enhance-your-kubernetes-cluster-194l</link>
      <guid>https://forem.com/cast_ai/kubernetes-lens-how-to-enhance-your-kubernetes-cluster-194l</guid>
      <description>&lt;p&gt;You probably know this feeling really well: one day you’re managing clusters like a pro, and another day you face a tornado of errors and bugs attacking you everywhere. We all love Kubernetes, but saying that it makes things a bit complicated would be an understatement. There’s a reason why solutions that make DevOps lives easier – with &lt;strong&gt;Kubernetes Lens&lt;/strong&gt; among them – are popping up all over the place.&lt;/p&gt;

&lt;p&gt;Kubernetes gives us a lot of good stuff – portability, extensibility, openness to automation, and an easier time managing containerized applications. But it has &lt;strong&gt;a lot of moving parts and tricky areas around scaling clusters, orchestrating storage, &lt;a href="https://cast.ai/blog/batch-processing-4-tactics-to-make-it-cost-efficient-and-reliable/"&gt;batch processing&lt;/a&gt;&lt;/strong&gt;, and many others.&lt;/p&gt;

&lt;p&gt;Another problem with Kubernetes is the use of &lt;strong&gt;command-line interfaces&lt;/strong&gt; (CLIs), which may overwhelm anyone used to the clarity of modern GUIs.&lt;/p&gt;

&lt;p&gt;No wonder the market is full of tools that solve such Kubernetes-specific issues. One of them is Kubernetes Lens, a solution that has gained a lot of traction recently. What problems does it help DevOps solve, and how do you make it work? Keep reading and learn more about Kubernetes Lens.&lt;/p&gt;

&lt;h2 id="h-what-is-kubernetes-lens"&gt;What is Kubernetes Lens?&lt;/h2&gt;

&lt;p&gt;Kubernetes Lens is an open-source integrated development environment (IDE) for Kubernetes. It simplifies K8s management by letting cloud-native developers manage and monitor clusters in real time. &lt;/p&gt;

&lt;p&gt;In 2020, &lt;strong&gt;Mirantis&lt;/strong&gt; purchased Lens from &lt;strong&gt;Kontena&lt;/strong&gt; and later open-sourced it, making it freely available (&lt;a href="https://github.com/lensapp/lens" rel="noreferrer noopener"&gt;here’s the repo&lt;/a&gt;). A number of Kubernetes and cloud-native ecosystem pioneers, including Apple, Rakuten, Zendesk, Adobe, and Google, support it.&lt;/p&gt;

&lt;p&gt;A stand-alone tool that works on macOS, Windows, and various Linux distributions, Lens lets developers connect to and manage multiple Kubernetes clusters. &lt;/p&gt;

&lt;p&gt;It comes with a clear, user-friendly &lt;strong&gt;graphical interface&lt;/strong&gt; that helps you deploy and manage clusters directly from the console. Lens also provides &lt;strong&gt;dashboards&lt;/strong&gt; that deliver &lt;strong&gt;helpful metrics&lt;/strong&gt; and insights into everything that happens on a cluster, such as installations, configurations, networking, storage, and access control.&lt;/p&gt;

&lt;p&gt;The latest and most significant release of Lens is Lens 6. It has the following new features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can launch a local Minikube development environment, complete with a single-node Kubernetes cluster operating on a local virtual machine (VM). &lt;/li&gt;



&lt;li&gt;Container Security, which provides security reports on Common Vulnerabilities and Exposures (CVE) right from the Lens desktop.&lt;/li&gt;



&lt;li&gt;Built-in technical help chat (available only with a premium membership).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new version introduces a different subscription model with the following tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lens Personal subscription&lt;/strong&gt; – this package is meant for personal use, education, and organizations with annual revenue or funding of less than $10 million.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Lens Pro subscription &lt;/strong&gt;– for professional usage in large companies, $19.90 per month or $199 per year per user.&lt;/li&gt;



&lt;li&gt;The licensing for the community version, &lt;strong&gt;OpenLens&lt;/strong&gt;, remains unchanged, as do the upstream projects utilized by Lens Desktop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id="h-what-problems-does-kubernetes-lens-solve"&gt;What problems does Kubernetes Lens solve?&lt;/h2&gt;

&lt;p&gt;Cloud-native DevOps teams develop applications by iterating locally and using version control and CI/CD to push code to a sequence of Kubernetes clusters for testing, staging, and production. Operators build and scale clusters with the same tools. The coordination of cluster management often becomes an issue. &lt;/p&gt;

&lt;p&gt;Small, task-specific clusters are &lt;strong&gt;prioritized above big clusters&lt;/strong&gt; in many companies. As a result, teams end up managing a large number of clusters. The issue here is that the CLIs that interface with clusters rely on a sprawling set of configuration files, making it difficult to handle the complex and diverse set of access methods and contexts. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When scaling up apps and clusters, managing infrastructure via the command line is sluggish and error-prone. Configurations may also differ, making tracking more challenging. An IDE combines the tools and information required to deal with various settings and jobs. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kubernetes Lens helps in dealing with these issues by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reducing the complexity of setting cluster access and enabling you to automatically add clusters. &lt;/li&gt;



&lt;li&gt;Discovering local kubeconfig files automatically and allowing you to manage clusters across practically any infrastructure. &lt;/li&gt;



&lt;li&gt;Organizing large numbers of Kubernetes clusters to cope with cluster sprawl.&lt;/li&gt;



&lt;li&gt;Managing multiple kubectl versions – Lens installs the version required by each cluster. &lt;/li&gt;



&lt;li&gt;Restricting interactions automatically according to RBAC requirements, so users only access the resources they are authorized to see. &lt;/li&gt;



&lt;li&gt;Installing Prometheus instances automatically in any namespace to deliver metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id="h-key-features-of-kubernetes-lens"&gt;Key features of Kubernetes Lens&lt;/h2&gt;

&lt;h3 id="h-cluster-management"&gt;Cluster management&lt;/h3&gt;

&lt;p&gt;Adding a Kubernetes cluster to Lens is really simple: point Lens at the local or remote kubeconfig file, and it will detect the cluster and connect to it. Lens allows you to view all of the resources operating within your cluster, from simple pods and deployments to the custom resource types provided by your apps.&lt;/p&gt;

&lt;p&gt;With Kubernetes Lens, you can work on many clusters while preserving context with each of them. &lt;strong&gt;It organizes and reveals the complete working system in the cluster while delivering analytics&lt;/strong&gt;, allowing you to set up, change, and reroute clusters with a single click. With this knowledge, you can make changes quickly and confidently.&lt;/p&gt;

&lt;p&gt;Lens Connector is one cool feature of Lens: a built-in terminal that employs a kubectl version API-compatible with your cluster. The terminal identifies your cluster version automatically and then assigns or downloads the appropriate version in the background. As you transition from one cluster to another, you keep the right kubectl version and context.&lt;/p&gt;

&lt;h3 id="h-user-friendly-gui"&gt;User-friendly GUI &lt;/h3&gt;

&lt;p&gt;Since managing many clusters across diverse platforms requires interpreting multiple access contexts, modes, and techniques for structuring the infrastructure, Lens provides a way to administer Kubernetes via a GUI. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solving all of these via the command line would otherwise be complex, time-consuming, and prone to error.&lt;/strong&gt; This is due, in part, to the ever-increasing number of clusters and applications, as well as their configurations and requirements.&lt;/p&gt;

&lt;p&gt;Using Kubernetes Lens GUI, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;manually add clusters by browsing through their kubeconfigs,&lt;/li&gt;



&lt;li&gt;quickly find kubeconfig files on your own system,&lt;/li&gt;



&lt;li&gt;organize clusters into workgroups based on how you interact with them,&lt;/li&gt;



&lt;li&gt;visualize the state of objects in your cluster, such as pods, deployments, namespaces, network, storage, and even custom resources – making it simple to detect and debug any cluster issues.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And if you still enjoy using the CLI, you can use Lens’s built-in terminal to run your preferred kubectl commands.&lt;/p&gt;

&lt;h3 id="h-metrics-and-visualization"&gt;Metrics and visualization &lt;/h3&gt;

&lt;p&gt;Kubernetes Lens has a Prometheus configuration with multi-user functionality that provides role-based access control (RBAC) for each user. This implies that users may only view visualizations for which they have authorization.&lt;/p&gt;

&lt;p&gt;When you configure a Prometheus instance in Lens, it offers &lt;strong&gt;cluster metrics and visualizations&lt;/strong&gt;: Lens autodetects Prometheus for that cluster after installation and starts charting its metrics. Lens also lets you preview Kubernetes manifests before applying them.&lt;/p&gt;

&lt;p&gt;Real-time graphs, resource utilization charts, and usage data such as CPU, RAM, network, and requests are available with Prometheus and become part of the Lens dashboard. You’ll see these metrics displayed in the context of the specific cluster in real time.&lt;/p&gt;

&lt;h3 id="h-handy-integrations"&gt;Handy integrations &lt;/h3&gt;

&lt;p&gt;Lens smoothly integrates with a wide range of Kubernetes tools. One good example is Helm, which helps you make &lt;a href="https://cast.ai/blog/cluster-autoscaler-helm-chart-how-to-improve-your-eks-cluster/"&gt;Helm charts&lt;/a&gt; and releases simple to deploy and manage in Kubernetes.&lt;/p&gt;

&lt;p&gt;You can access available Helm repositories from the Artifact Hub; by default, Lens adds the Bitnami repository if no other repositories are specified. Other repositories can be added manually via the command line if necessary. &lt;/p&gt;
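&lt;p&gt;For example, a repository can be registered from the built-in terminal with the standard Helm CLI (using the publicly documented Bitnami chart repository URL):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Register the Bitnami chart repository and refresh the local index
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update&lt;/code&gt;&lt;/pre&gt;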

&lt;h3 id="h-lens-extensions"&gt;Lens Extensions&lt;/h3&gt;

&lt;p&gt;Kubernetes Lens Extensions enable you to add new and custom features and visualizations to expedite the development processes for all &lt;strong&gt;Kubernetes-integrated technologies and services&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Kubernetes Lens also allows you to use the Lens APIs to script your own extensions. They let you add additional object information, create custom pages, add status bar items, and make other UI changes. Extensions published to npm as tarballs can then be installed from the Kubernetes Lens install screen via their tarball link.&lt;/p&gt;

&lt;h3 id="h-collaboration-and-teamwork-via-kubernetes-lens-spaces"&gt;Collaboration and teamwork via Kubernetes Lens Spaces&lt;/h3&gt;

&lt;p&gt;Kubernetes Lens encourages teamwork and collaboration with its Spaces feature. It's basically a location for cloud-native development teams and projects to collaborate. &lt;/p&gt;

&lt;p&gt;You can easily organize and access your team clusters from anywhere with a Lens space: EKS, GKE, AKS, on-premises, or a local dev cluster. Users can quickly access and securely share all clusters in one space.&lt;/p&gt;

&lt;h2 id="h-kubernetes-lens-alternatives-to-make-your-k8s-ride-even-smoother"&gt;Kubernetes Lens alternatives to make your K8s ride even smoother&lt;/h2&gt;

&lt;p&gt;Kubernetes Lens is a powerful tool for everyone looking to get things done quickly and move on to more impactful activities. The K8s ecosystem holds even more solutions that can streamline your work further. &lt;/p&gt;

&lt;p&gt;CAST AI is an autonomous Kubernetes management platform that automates a lot of the heavy lifting around cloud infra management to make engineers more efficient. Once you onboard CAST AI, you’ll end up using Lens even less – and put &lt;a href="https://cast.ai/blog/cloud-automation-the-new-normal-in-the-tech-industry/"&gt;cloud automation&lt;/a&gt; in place to do a lot of things for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check out the &lt;a href="https://docs.cast.ai/docs/getting-started"&gt;docs&lt;/a&gt; to learn more about CAST AI’s capabilities around autoscaling, instance rightsizing, automated instance provisioning and decommissioning, and more.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>Node Affinity, Node Selector, and Other Ways to Better Control Kubernetes Scheduling</title>
      <dc:creator>CAST AI</dc:creator>
      <pubDate>Fri, 26 May 2023 08:14:37 +0000</pubDate>
      <link>https://forem.com/cast_ai/node-affinity-node-selector-and-other-ways-to-better-control-kubernetes-scheduling-3no</link>
      <guid>https://forem.com/cast_ai/node-affinity-node-selector-and-other-ways-to-better-control-kubernetes-scheduling-3no</guid>
      <description>&lt;p&gt;Assigning pods to nodes is one of the most critical tasks of Kubernetes cluster management. While the default process can prove too generic, you can adjust it with advanced features like &lt;strong&gt;node affinity&lt;/strong&gt;.   &lt;/p&gt;

&lt;p&gt;The way the Kubernetes scheduler distributes pods across worker nodes impacts performance and resources and, therefore, your costs. It's then essential to understand how the process works and how to keep it in check. &lt;/p&gt;

&lt;p&gt;This article outlines basic Kubernetes scheduling concepts, including node selector, node affinity and anti-affinity, and pod affinity and anti-affinity. It also includes an example of how combining node affinity and automation can improve your workload's availability and fault tolerance. &lt;/p&gt;

&lt;h2 id="h-how-kubernetes-scheduling-works"&gt;How Kubernetes scheduling works&lt;/h2&gt;

&lt;p&gt;Kubernetes scheduling is about selecting a suitable node to run pods. &lt;em&gt;Kube-scheduler&lt;/em&gt; is part of the control plane; it selects nodes for newly created or not-yet-scheduled pods, by default trying to spread them evenly. &lt;/p&gt;

&lt;p&gt;Containers in pods can have different requirements, so the Kubernetes scheduler filters out any nodes that don't match the pod's specific needs. &lt;/p&gt;

&lt;p&gt;The Kubernetes scheduler identifies and scores all feasible nodes for your pod. It then picks the one with the highest score and notifies the API server about this decision. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0P5yHrEf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cast.ai/wp-content/uploads/2023/04/des-397-new-stack-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0P5yHrEf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cast.ai/wp-content/uploads/2023/04/des-397-new-stack-1.png" alt="Node affinity, node selector and other ways to better control Kubernetes scheduling" width="800" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Several factors impact the scheduler's decisions, such as resource requirements, hardware and software constraints, etc. &lt;/p&gt;

&lt;p&gt;The Kubernetes scheduler is fast, thanks to automation. However, it can be expensive, as you may end up paying for resources that aren't well matched to the needs of your different environments. &lt;/p&gt;

&lt;p&gt;And as there's no easy way to track your costs in Kubernetes, teams must find other ways to keep their expenses in check.   &lt;/p&gt;

&lt;h2 id="how-to-control-the-scheduler-s-choices"&gt;How to control the scheduler’s choices &lt;/h2&gt;

&lt;p&gt;In a nutshell, you can control where your pods go with &lt;a href="https://cast.ai/blog/kubernetes-labels-expert-guide-with-10-best-practices/"&gt;Kubernetes labels&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Labels are key/value pairs you can manually attach to objects like pods and nodes. By using them, you can specify identifying attributes, organize, or select subsets of objects. &lt;/p&gt;
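&lt;p&gt;As a quick sketch, attaching a label to a node takes a single kubectl command (the node name and label below are hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Label a node so that scheduling rules can match it later
kubectl label nodes worker-node-1 disktype=ssd&lt;/code&gt;&lt;/pre&gt;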

&lt;p&gt;The simplest way to constrain the Kubernetes scheduler is to use a node selector. &lt;/p&gt;

&lt;h3 id="how-does-a-node-selector-work"&gt;How does a node selector work? &lt;/h3&gt;

&lt;p&gt;Adding the &lt;em&gt;nodeSelector&lt;/em&gt; field to your pod specification with a key-value pair lets you indicate the labels you wish the target node to have. &lt;/p&gt;

&lt;p&gt;Kubernetes will only schedule pods onto the nodes matching the labels you specify. &lt;/p&gt;
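&lt;p&gt;A minimal pod spec using a node selector might look like this (the &lt;em&gt;disktype&lt;/em&gt; label is a hypothetical example):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  # Only nodes carrying the label disktype=ssd are eligible
  nodeSelector:
    disktype: ssd
  containers:
  - name: nginx
    image: nginx:1.24.0&lt;/code&gt;&lt;/pre&gt;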

&lt;p&gt;&lt;strong&gt;The node selector is sufficient in small clusters but is usually unsuitable for complex cases. &lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, you may have an app that needs to run in separate availability zones. Or you may want to keep the API and database separate, e.g., when you don’t have many replicas.  &lt;/p&gt;

&lt;p&gt;That’s where the concept of affinity comes in handy.  &lt;/p&gt;

&lt;h3 id="moving-beyond-the-node-selector-with-affinity"&gt;Moving beyond the node selector with affinity&lt;/h3&gt;

&lt;p&gt;Affinity and anti-affinity expand the types of constraints you can add and give you more control over the selection logic. &lt;/p&gt;

&lt;p&gt;Using them, you can create "hard" and "soft" rules for different conditions; with soft rules, Kubernetes will schedule the pod even if there are no perfectly matching nodes. They also let you match the labels of pods running on the same nodes and specify the location of new pods more precisely. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s essential to keep in mind that there are two types of affinity:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node affinity&lt;/strong&gt; refers to impacting how pods get matched to nodes.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Pod affinity&lt;/strong&gt; specifies how pods can be scheduled based on the labels of pods already running on that node.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s now discuss both of them to highlight the difference. &lt;/p&gt;

&lt;h2 id="node-affinity-what-is-it-and-how-does-it-work"&gt;Node affinity: what is it, and how does it work? &lt;/h2&gt;

&lt;p&gt;Similar to node selector, node affinity also lets you use labels to specify to which nodes Kube-scheduler should schedule your pods. &lt;/p&gt;

&lt;p&gt;You can specify it by adding the &lt;em&gt;.spec.affinity.nodeAffinity&lt;/em&gt; field in your pod. &lt;/p&gt;

&lt;p&gt;Remember that if you specify &lt;em&gt;nodeSelector&lt;/em&gt; and &lt;em&gt;nodeAffinity&lt;/em&gt;, both must be met for the pod to be scheduled. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There are two types of node affinity:   &lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/strong&gt; – when using this one, the scheduler will only schedule the pod if the node meets the rule.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;preferredDuringSchedulingIgnoredDuringExecution&lt;/strong&gt; – in this scenario, the scheduler will try to find a node matching the rule, but it will still schedule the pod even if it doesn't find anything suitable. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latter lets you assign each rule a weight between 1 and 100. &lt;/p&gt;

&lt;p&gt;For each node that meets all of your unscheduled pod’s requirements, Kube-scheduler iterates through every preferred rule the node matches and adds that rule’s weight to a sum. &lt;/p&gt;

&lt;p&gt;The Kubernetes scheduler then adds this sum to the final score, impacting your pod's final node decision. &lt;/p&gt;
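&lt;p&gt;As a sketch, a weighted preferred rule inside a pod spec could look like the snippet below (the weight and zone values are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # Nodes in eu-central-1a get 80 extra points in the scoring phase,
    # but the pod still schedules elsewhere if no node matches
    - weight: 80
      preference:
        matchExpressions:
        - key: "topology.kubernetes.io/zone"
          operator: In
          values:
          - eu-central-1a&lt;/code&gt;&lt;/pre&gt;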

&lt;h2 id="what-is-pod-affinity"&gt;What is pod affinity?&lt;/h2&gt;

&lt;p&gt;Working along similar lines, this concept focuses on impacting the Kubernetes scheduler based on the labels on the pods already running on a given node. &lt;/p&gt;

&lt;p&gt;You can also specify it within the affinity section using the podAffinity and podAntiAffinity fields in the pod spec.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pod affinity&lt;/strong&gt; schedules a given pod to a specific location only if a pod meeting particular conditions is already running there. &lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Pod anti-affinity &lt;/strong&gt;offers the opposite functionality, preventing pods from running on the same node as pods matching particular criteria. &lt;/li&gt;
&lt;/ul&gt;
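&lt;p&gt;For illustration, the following pod anti-affinity snippet keeps replicas labeled &lt;em&gt;app: nginx&lt;/em&gt; from landing on the same node (the label value is a hypothetical example):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    # Refuse any node that already runs a pod labeled app=nginx
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - nginx
      topologyKey: "kubernetes.io/hostname"&lt;/code&gt;&lt;/pre&gt;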

&lt;p&gt;There is a separate post diving into &lt;a href="https://cast.ai/blog/kubernetes-scheduler-how-to-make-it-work-with-inter-pod-affinity-and-anti-affinity/"&gt;inter-pod affinity and anti-affinity.&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;That’s why, for now, let’s focus on one practical application of node affinity.&lt;/p&gt;

&lt;h2 id="node-affinity-in-action-high-availability-and-fault-tolerance"&gt;Node affinity in action: high availability and fault tolerance&lt;/h2&gt;

&lt;p&gt;Availability is the holy grail of migrating to the cloud, and you can also boost it with node affinity. &lt;/p&gt;

&lt;p&gt;By spreading pods across several different nodes, you can ensure that your application remains available even if one or more of those nodes fail. &lt;/p&gt;

&lt;p&gt;With node affinity, you can instruct the Kubernetes scheduler to choose nodes in different availability zones, data centers, or regions. By doing so, your app can continue running even if your AZ or data center experiences an outage.&lt;/p&gt;

&lt;p&gt;If you then add &lt;a href="https://cast.ai/"&gt;Kubernetes automation&lt;/a&gt;, you can ensure that pods get scheduled in the preferred zones even if they’re not present in your cluster. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here is an example deployment of an on-demand instance on AWS with affinity set for a single zone:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-cross-single-az
  labels:
    app: nginx-cross-single-az
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx-single-az
  template:
    metadata:
      labels:
        app: nginx-single-az
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "eu-central-1a"
      containers:
      - name: nginx
        image: nginx:1.24.0
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 2&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this case, the node selector will pick nodes with the label "&lt;strong&gt;topology.kubernetes.io/zone&lt;/strong&gt;" set to "&lt;strong&gt;eu-central-1a&lt;/strong&gt;".  &lt;/p&gt;
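&lt;p&gt;To verify which zone label each node carries, you can list nodes with an extra label column:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Show every node alongside its topology.kubernetes.io/zone label
kubectl get nodes --label-columns topology.kubernetes.io/zone&lt;/code&gt;&lt;/pre&gt;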

&lt;p&gt;&lt;strong&gt;For comparison, here’s an example of node affinity set for multi-zone pod scheduling:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-cross-az
  labels:
    app: nginx-cross-az
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx-cross-az
  template:
    metadata:
      labels:
        app: nginx-cross-az
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "topology.kubernetes.io/zone"
                operator: In
                values:
                - eu-central-1a
                - eu-central-1b
                - eu-central-1c
      containers:
      - name: nginx
        image: nginx:1.24.0
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 2&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this scenario, CAST AI will create nodes across multiple AWS zones that match your requirements. It will use on-demand instances, and all provisioning will happen automatically.  &lt;/p&gt;

&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;

&lt;p&gt;Kubernetes affinity is an important feature allowing you to control your pod scheduling better. &lt;/p&gt;

&lt;p&gt;Pod and node affinity and anti-affinity give you more say over where your pods get scheduled. By specifying these rules, you gain finer-grained control over scheduling. &lt;/p&gt;

&lt;p&gt;Add automation to ensure that your pods get distributed across the most suitable nodes at all times and to easily keep tabs on all related costs. &lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>Docker Hub Alternatives: On the Lookout for a Container Image Repository for OSS</title>
      <dc:creator>CAST AI</dc:creator>
      <pubDate>Fri, 24 Mar 2023 09:21:54 +0000</pubDate>
      <link>https://forem.com/cast_ai/docker-alternatives-on-the-lookout-for-a-container-image-repository-for-oss-33ib</link>
      <guid>https://forem.com/cast_ai/docker-alternatives-on-the-lookout-for-a-container-image-repository-for-oss-33ib</guid>
      <description>&lt;p&gt;As a leading containerization service, Docker has been particularly popular across the open-source ecosystem. However, its recent moves have proved controversial enough for some teams to seek Docker alternatives. &lt;/p&gt;

&lt;p&gt;Its Q1 2023 announcement of sunsetting the free version of Docker Hub has caused quite a stir in the OSS ecosystem. The change would affect numerous projects and companies relying on them, including CAST AI.&lt;/p&gt;

&lt;p&gt;While Docker has clarified its plans, the dust has yet to settle completely. Read on to learn more about the recent change and potential alternatives to Docker Hub.&lt;/p&gt;

&lt;h2 id="h-why-is-docker-hub-significant"&gt;Why is Docker Hub significant?&lt;/h2&gt;

&lt;p&gt;Docker Hub&lt;sup&gt;1&lt;/sup&gt; is the world's largest container image repository. It comes with various content sources, such as container community developers, open source projects, and software vendors. &lt;/p&gt;

&lt;p&gt;Docker Hub's free version – Free Team – lets users store, share, and access container images across public repositories. Premium plans also enable creating private repos and restricting content to specific user groups. &lt;/p&gt;

&lt;p&gt;The Free Team plan has been particularly popular for open source images because of its business model. OSS initiatives are usually strapped for funds, so they appreciated the ability to upload public images for free while the downloader covered the costs. &lt;/p&gt;

&lt;p&gt;As a result, Docker Hub was a popular image repository choice for open source initiatives. While large OSS projects can strike a deal with enterprise image registries or get support from the likes of Red Hat or Oracle, Free Team also attracted smaller projects.&lt;/p&gt;

&lt;h2 id="what-happened-to-docker-hub-in-2023"&gt;What happened to Docker Hub in 2023?&lt;/h2&gt;

&lt;p&gt;In March 2023, Docker abruptly announced plans to discontinue and delete all organizations using its repository's free version. The way the company communicated this message caused quite an outcry.&lt;/p&gt;

&lt;p&gt;Docker sent emails warning users that their accounts would be deleted after 30 days unless they switched to one of the paid subscriptions. This could potentially affect most open-source projects using Docker to host their images – and the deadline was nigh (April 14, 2023).&lt;/p&gt;

&lt;p&gt;While the company has a provision for such projects under its open-source program&lt;sup&gt;2&lt;/sup&gt;, it has reportedly not been of much help. The sign-up process isn’t fast, and projects must satisfy a range of criteria. According to some accounts, processing applications can take more than a year. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The announcement practically meant that OSS wishing to save their images would have to go from paying nothing to a few hundred dollars annually&lt;/strong&gt;. With many OS initiatives having little to no funding and mainly depending on voluntary contributions, such short deadlines for switching simply didn't add up. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Moreover, as many OSS projects have published images to Docker Hub in this way for years, the danger of someone squatting on an abandoned image name and publishing malicious content became real. &lt;/p&gt;

&lt;p&gt;A few days later, Docker apologized&lt;sup&gt;3&lt;/sup&gt; for the lack of clarity in its message. It promised to only remove images if their maintainers decide to delete them and committed to assigning more staff to review requests for OSS support. The company also stated that users with a Free Team organization could migrate to a Personal user account. &lt;/p&gt;

&lt;p&gt;While this message reassured many teams that they didn't need to take immediate action, the discussion about Docker alternatives is still ongoing. &lt;/p&gt;

&lt;h2 id="searching-for-docker-alternatives"&gt;Searching for Docker alternatives &lt;/h2&gt;

&lt;p&gt;The team behind CAST AI depends on many OSS images, so we didn't wish to go dark on April 14. Like many other teams, we started thinking about workarounds. &lt;/p&gt;

&lt;p&gt;Luckily, in light of Docker's announcement and its subsequent '&lt;em&gt;mea culpa&lt;/em&gt;', we understood that we wouldn't need to take any action. Phew! &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;However, we found interesting Docker alternatives in this write-up from OpenFaaS' Alex Ellis&lt;/strong&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;strong&gt;.&lt;/strong&gt; One of the potential workarounds he outlines is to completely delete your organization on Docker Hub and recreate it as a free personal account. This step should suffice to prevent hostile takeovers of your name. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;While large projects can't delete their organizations, smaller initiatives that can tolerate some downtime could try the following steps&lt;/strong&gt;. First, create a new personal account, and use it to mirror all images from the organization. They could then delete the organization and rename the personal account accordingly.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Another actionable idea is to start publishing to GitHub's Container Registry&lt;/strong&gt;&lt;sup&gt;5&lt;/sup&gt;, which offers free storage for public images. GitHub is, of course, far from perfect, but recent developments, including Actions and GHCR, have made it easier to publish images. &lt;/p&gt;

&lt;p&gt;Other registries, such as GitLab and Quay, also offer free hosting for open source projects. Not to mention that you could also host your own registry. &lt;/p&gt;

&lt;p&gt;When migrating images, the crane tool from Google's open source office can mirror them much more efficiently than docker pull, tag, and push. Another helpful solution is CNCF's Harbor&lt;sup&gt;6&lt;/sup&gt;, an open source registry that offers a mirroring (replication) capability.&lt;/p&gt;
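&lt;p&gt;As a sketch, mirroring an image with crane is a single command (the organization and image names below are hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Copy an image, with all its layers and tags intact, between registries
crane copy docker.io/example-org/app:1.0 ghcr.io/example-org/app:1.0&lt;/code&gt;&lt;/pre&gt;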

&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;

&lt;p&gt;Docker Hub is a useful repository for images that support open-source projects. However, the recent sunsetting announcement has unsettled the OSS community, and the discussion about Docker alternatives will likely continue. &lt;/p&gt;

&lt;p&gt;Open source contributors already do their best to serve their communities, so they will surely go the extra mile to ensure their projects continue working as required. One way or another, they will find a way forward – and we hope the above list will also be helpful.&lt;/p&gt;

&lt;h4 id="references"&gt;References&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;sup&gt;[1] - &lt;a href="https://www.docker.com/products/docker-hub/" rel="noreferrer noopener"&gt;Docker Hub&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;



&lt;li&gt;&lt;sup&gt;[2] - &lt;a href="https://www.docker.com/community/open-source/application/?ref=its-foss-news" rel="noreferrer noopener"&gt;Docker-Sponsored Open Source Program&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;



&lt;li&gt;&lt;sup&gt;[3] - &lt;a href="https://www.docker.com/blog/we-apologize-we-did-a-terrible-job-announcing-the-end-of-docker-free-teams/?ref=its-foss-news" rel="noreferrer noopener"&gt;Docker: We apologize.&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;



&lt;li&gt;&lt;sup&gt;[4] - &lt;a href="https://blog.alexellis.io/docker-is-deleting-open-source-images/" rel="noreferrer noopener"&gt;Alex Ellis&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;



&lt;li&gt;&lt;sup&gt;[5] - &lt;a href="https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry?ref=its-foss-news" rel="noreferrer noopener"&gt;GitHub Container Registry&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;



&lt;li&gt;&lt;sup&gt;[6] - &lt;a href="https://goharbor.io/" rel="noreferrer noopener"&gt;Harbor&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cloud</category>
      <category>docker</category>
      <category>containers</category>
      <category>devops</category>
    </item>
    <item>
      <title>The Truth About CloudWatch Pricing</title>
      <dc:creator>CAST AI</dc:creator>
      <pubDate>Fri, 10 Mar 2023 11:43:03 +0000</pubDate>
      <link>https://forem.com/cast_ai/the-truth-about-cloudwatch-pricing-1pgk</link>
      <guid>https://forem.com/cast_ai/the-truth-about-cloudwatch-pricing-1pgk</guid>
      <description>&lt;p&gt;Amazon CloudWatch pricing can get tricky. CloudWatch's pricing is based on three parts, just like any other observability tool: a data ingest pipeline, a place to store data, and a management console. &lt;/p&gt;

&lt;p&gt;When using it, you’ll face costs related to ingesting data into the data store, retaining data there, and using the visualization and management tools that help you derive insights from data.&lt;/p&gt;

&lt;p&gt;But how exactly does CloudWatch pricing work? And how can you keep tabs on it so the charges don’t spiral out of control? &lt;/p&gt;

&lt;p&gt;Keep reading to find out—or jump directly to CloudWatch pricing best practices.&lt;/p&gt;

&lt;h2 id="h-what-is-amazon-cloudwatch"&gt;What is Amazon CloudWatch?&lt;/h2&gt;

&lt;p&gt;Amazon CloudWatch is a built-in AWS tool that keeps an eye on all of your AWS resources and applications. &lt;/p&gt;

&lt;p&gt;Head over to the CloudWatch home page, and you’ll see metrics for every single AWS service you’re using. You can also build custom dashboards to display stats about your custom apps and metric groups.&lt;/p&gt;

&lt;p&gt;CloudWatch lets you set up alarms that notify you when a threshold is exceeded - or even adjust the monitored resources automatically (via EC2 Auto Scaling and Amazon Simple Notification Service actions).&lt;/p&gt;

&lt;p&gt;For example, you can use CloudWatch to monitor different aspects of your &lt;a href="https://cast.ai/blog/400-ec2-instance-types-the-good-the-bad-and-the-ugly/" rel="noreferrer noopener"&gt;EC2 instances&lt;/a&gt; such as CPU utilization and disk reads and writes. Once you gather enough data, you can decide whether to deploy more instances to handle the increasing load or stop underutilized instances to save money.&lt;/p&gt;
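&lt;p&gt;For instance, pulling average CPU utilization for a single instance can be done with the AWS CLI (the instance ID and time range below are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Fetch average CPU utilization in 5-minute buckets for one EC2 instance
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2023-03-01T00:00:00Z \
  --end-time 2023-03-02T00:00:00Z \
  --period 300 \
  --statistics Average&lt;/code&gt;&lt;/pre&gt;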

&lt;h3 id="h-how-do-you-access-amazon-cloudwatch"&gt;How do you access Amazon CloudWatch?&lt;/h3&gt;

&lt;p&gt;Head over to your AWS account and use one of the following methods to access CloudWatch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://console.aws.amazon.com/cloudwatch/" rel="noreferrer noopener"&gt;Amazon CloudWatch console&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html" rel="noreferrer noopener"&gt;AWS Command Line Interface&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="http://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/Welcome.html" rel="noreferrer noopener"&gt;CloudWatch API&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="http://aws.amazon.com/tools" rel="noreferrer noopener"&gt;AWS SDKs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id="h-how-does-cloudwatch-work"&gt;How does CloudWatch work?&lt;/h3&gt;

&lt;p&gt;Amazon CloudWatch works like a metrics repository: services such as EC2 add their metrics, and you can retrieve statistics based on those metrics at any time. The following diagram helps to illustrate this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UvGGppWr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/T7wIBqXfDutIYeQUww_JDxX5IXUxqq9vTulvUa5hMIIr-CRUzXMhaB7Q_IdBn1o5qQIjhPnQ20x9rYxIZh2rxlRraw89Zwzi-8V_YMfHFazdFO8kFmj-ZWlDrTJqXgRYpIP4CZWSMW-1qwJbpJCAlRo" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UvGGppWr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/T7wIBqXfDutIYeQUww_JDxX5IXUxqq9vTulvUa5hMIIr-CRUzXMhaB7Q_IdBn1o5qQIjhPnQ20x9rYxIZh2rxlRraw89Zwzi-8V_YMfHFazdFO8kFmj-ZWlDrTJqXgRYpIP4CZWSMW-1qwJbpJCAlRo" alt="" width="604" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_architecture.html" rel="noreferrer noopener"&gt;AWS documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CloudWatch stores metrics separately per region, but thanks to the cross-region feature, it lets you aggregate statistics from different regions. &lt;/p&gt;

&lt;p&gt;Now that we've gotten the fundamentals out of the way, let’s focus on pricing.&lt;/p&gt;

&lt;h2 id="h-cloudwatch-pricing-tiers"&gt;CloudWatch pricing tiers&lt;/h2&gt;

&lt;p&gt;In general, CloudWatch charges for its Metrics service based on the number of metrics submitted to it and the frequency with which the API is called to transmit or fetch a metric. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The more metrics you provide to CloudWatch, and the more frequently you access the API, the higher your cost. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is vital to remember: the more metrics you track, the easier it is to diagnose specific problems in your system. And the more frequently you submit data, the more precisely you can troubleshoot service issues. &lt;/p&gt;

&lt;p&gt;In other words, CloudWatch pricing is largely driven by the granularity of the data it collects and stores. The more granular the data, the more CloudWatch costs.&lt;/p&gt;

&lt;h3 id="h-free"&gt;Free&lt;/h3&gt;

&lt;p&gt;The CloudWatch free tier is applied to your account automatically, so you only start receiving charges once you exceed its allowances. It comes with small allowances for every CloudWatch service, such as Metrics, Logs, and Dashboards. &lt;/p&gt;

&lt;p&gt;AWS offers three types of free tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always free - this option doesn’t have an expiry date and is available to all users.&lt;/li&gt;



&lt;li&gt;12 months free - available for 12 months from sign-up.&lt;/li&gt;



&lt;li&gt;Trial - trials are short-term offers counted from the moment of activating the service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Amazon CloudWatch free tier gives you access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic monitoring metrics&lt;/li&gt;



&lt;li&gt;10 detailed monitoring metrics&lt;/li&gt;



&lt;li&gt;1 million API requests&lt;/li&gt;



&lt;li&gt;10 alarm metrics&lt;/li&gt;



&lt;li&gt;5 GB of log data ingestion and archive&lt;/li&gt;



&lt;li&gt;3 dashboards with up to 50 metrics per month&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The free plan gives you a great opportunity to try out CloudWatch and check if the paid plan is a worthwhile investment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Still, don’t forget that basic monitoring metrics are, well, pretty basic. We’re talking about a few core metrics per service to make sure that you can monitor a service for availability and high-level performance. Most AWS services like EC2, EBS, RDS, or S3 offer basic monitoring, and none of the metrics tracked at this level will be billed to your account. &lt;/p&gt;

&lt;h3 id="h-paid"&gt;Paid&lt;/h3&gt;

&lt;p&gt;The CloudWatch paid tier charges differ by region and are subject to change. To check the pricing for your region, go to the &lt;a href="https://aws.amazon.com/cloudwatch/pricing/" rel="noreferrer noopener"&gt;CloudWatch pricing page&lt;/a&gt; or use the &lt;a href="https://calculator.aws/#/createCalculator/CloudWatch" rel="noreferrer noopener"&gt;AWS pricing calculator&lt;/a&gt; to check the costs for your unique use case.&lt;/p&gt;

&lt;p&gt;In general, CloudWatch pricing will be calculated based on the features you use, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics&lt;/li&gt;



&lt;li&gt;Alarms&lt;/li&gt;



&lt;li&gt;Dashboards&lt;/li&gt;



&lt;li&gt;Events&lt;/li&gt;



&lt;li&gt;Logs&lt;/li&gt;



&lt;li&gt;Contributor insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that every feature is priced differently - and some are way pricier than others.&lt;/p&gt;

&lt;p&gt;Consider this example:&lt;/p&gt;

&lt;p&gt;Pricing for the US (East) Ohio region is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First 10,000 metrics – $0.30 per metric per month&lt;/li&gt;



&lt;li&gt;Next 240,000 metrics – $0.10 per metric per month&lt;/li&gt;



&lt;li&gt;Next 750,000 metrics – $0.05 per metric per month&lt;/li&gt;



&lt;li&gt;Over 1,000,000 metrics – $0.02 per metric per month&lt;/li&gt;



&lt;li&gt;API – $0.01 per 1,000 metrics requested&lt;/li&gt;



&lt;li&gt;Dashboards – $3.00 per dashboard per month&lt;/li&gt;



&lt;li&gt;Alarm – $0.10 per alarm metric per month at a standard resolution of 60 seconds&lt;/li&gt;



&lt;li&gt;Logs – $0.50 per GB of data ingested; $0.03 per GB of data archived per month&lt;/li&gt;
&lt;/ul&gt;
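&lt;p&gt;As a rough sketch of how these tiers stack, the short helper below (a hypothetical function, not part of any AWS SDK) computes a monthly metric charge at the US East (Ohio) rates listed above:&lt;/p&gt;

```python
def metric_charge(n_metrics):
    """Monthly charge for n_metrics detailed/custom metrics,
    using the tiered US East (Ohio) rates quoted above."""
    tiers = [(10_000, 0.30), (240_000, 0.10), (750_000, 0.05),
             (float("inf"), 0.02)]
    cost, remaining = 0.0, n_metrics
    for tier_size, rate in tiers:
        billed = min(remaining, tier_size)  # metrics falling in this tier
        cost += billed * rate
        remaining -= billed
    return round(cost, 2)

print(metric_charge(74))       # 22.2 (all in the first tier)
print(metric_charge(500_000))  # 39500.0
```

&lt;p&gt;Note how steeply the per-metric rate drops at scale, yet the absolute bill still grows quickly – a reason to prune unneeded metrics rather than rely on tier discounts.&lt;/p&gt;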

&lt;h2 id="h-4-best-practices-for-cloudwatch-costs"&gt;4 best practices for CloudWatch costs &lt;/h2&gt;

&lt;h2 id="h-1-monitor-ec2-like-a-pro"&gt;1. Monitor EC2 like a pro&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remember about memory&lt;/strong&gt; - The basic monitoring metrics for an EC2 instance cover CPU load, disk I/O, and network I/O. What about memory? Memory utilization isn’t included – you need to publish it as a custom metric (for example, via the CloudWatch agent) to track it. &lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose detailed monitoring when needed &lt;/strong&gt;- By default, Amazon EC2 sends data to CloudWatch in 5-minute intervals. If this level of monitoring isn’t enough for you, go for detailed monitoring, which delivers metrics in 1-minute intervals to help you act faster. Note: from a pricing standpoint, detailed monitoring bills all of the basic monitoring metrics at the custom metric rate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For instance, let's say you run 12 EC2 instances and set them up for detailed monitoring. You’ll pay $22.20 per month for CloudWatch metrics. The common EC2 instance types have 7 built-in metrics tracked for them by default. So:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;7 metrics per instance * 12 instances = 84 metrics in total &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You get 10 detailed metrics for free as part of the CloudWatch free tier, so AWS will charge you for 74 metrics. And for the first 10,000 metrics, the charge per metric is $0.30/month: 74 metrics * $0.30 = $22.20 per month.&lt;/p&gt;
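&lt;p&gt;The arithmetic above can be sanity-checked in a couple of lines (the numbers mirror the example; nothing here queries AWS):&lt;/p&gt;

```python
instances, builtin_metrics = 12, 7
free_tier_metrics = 10   # free detailed metrics in the free tier
rate = 0.30              # $/metric/month, first 10,000-metric tier

billable = instances * builtin_metrics - free_tier_metrics  # 84 - 10 = 74
print(round(billable * rate, 2))  # 22.2
```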

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don’t get fooled by custom metrics &lt;/strong&gt;- they’re billed differently from built-in metrics: you pay both for the number of custom metrics tracked and for the number of API calls made. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, if you want to monitor memory use at a resolution faster than 1 minute, your API call volume will grow too. For example, tracking memory use at a 15-second resolution quadruples your API expenses, because the API request is made four times every minute instead of once. &lt;/p&gt;
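&lt;p&gt;To see how resolution drives API cost, the sketch below estimates the monthly charge for one custom metric at the $0.01-per-1,000-requests rate quoted earlier (a 30-day month and one metric per API request are simplifying assumptions):&lt;/p&gt;

```python
def monthly_api_cost(resolution_seconds, price_per_1k=0.01):
    """Cost of submitting one custom metric at the given resolution
    over a 30-day month, one API request per sample."""
    calls = (60 // resolution_seconds) * 60 * 24 * 30
    return round(calls / 1000 * price_per_1k, 3)

print(monthly_api_cost(60))  # 0.432  -> 43,200 calls/month
print(monthly_api_cost(15))  # 1.728  -> 172,800 calls/month, 4x the cost
```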

&lt;h2 id="h-2-consider-dimensions"&gt;2. Consider dimensions&lt;/h2&gt;

&lt;p&gt;Every metric has unique qualities that define it, and dimensions work like categories for those traits. A metric might be CPU usage, while a dimension could be CPU core. &lt;/p&gt;

&lt;p&gt;On multi-core devices, this might result in CloudWatch tracking a large number of CPU metrics. Dimensions are part of a metric's unique identifier; adding a unique name/value combination to one of your metrics creates a new variant of that metric. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Even though the metrics share the same name, CloudWatch treats each unique combination of dimensions as a separate metric. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When deciding whether to track certain dimensions of a metric, it’s critical to consider the dimension's cardinality. High-cardinality dimensions, like an IP address or a unique identifier, can cause the number of CloudWatch metrics collected to balloon – and your AWS costs along with it.&lt;/p&gt;
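&lt;p&gt;A quick back-of-the-envelope illustration of that multiplication effect (the dimension names and values below are made up for the example):&lt;/p&gt;

```python
from itertools import product

# Hypothetical dimensions for a single "Latency" metric name.
services = ["api", "worker"]
regions = ["us-east-1", "us-west-2"]
client_ips = [f"10.0.0.{i}" for i in range(100)]  # high cardinality!

# CloudWatch bills each unique dimension combination as its own metric.
low_cardinality = len(list(product(services, regions)))
high_cardinality = len(list(product(services, regions, client_ips)))
print(low_cardinality, high_cardinality)  # 4 400
```

&lt;p&gt;Adding one 100-value dimension turned 4 billable metrics into 400 – which is why dropping dimensions like client IP is often the single biggest metric-cost win.&lt;/p&gt;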

&lt;h2 id="h-3-remove-custom-metrics-you-don-t-need"&gt;3. Remove custom metrics you don’t need&lt;/h2&gt;

&lt;p&gt;Metrics strike a balance between data precision and cost. The more data you gather, the more information you’ll have that will come in handy when you need to understand what went wrong when a service fails. &lt;/p&gt;

&lt;p&gt;Internal services and less essential external services can withstand some amount of service disruption. Prioritizing the importance and impact of a service correctly will help you to choose which services need additional monitoring expenditures. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Mission-critical services should record the most metrics with the highest resolution, while less important services can record the fewest metrics and have the lowest resolution to keep costs down.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Don’t forget that Custom Metrics and EC2 Detailed Monitoring are premium monitoring options. Excessive or overly comprehensive monitoring won’t improve your system's performance, but it will drive up costs.&lt;/p&gt;

&lt;p&gt;Before monitoring a metric, consider whether it will help diagnose a specific issue. If a metric is already being recorded, check whether it has actually been useful in the past and is worth keeping. Could Basic Monitoring be sufficient if a service is not a high priority?&lt;/p&gt;

&lt;h2 id="h-4-lower-the-resolution-of-metrics-when-you-don-t-need-high-resolution"&gt;4. Lower the resolution of metrics when you don’t need high resolution &lt;/h2&gt;

&lt;p&gt;Address the metric resolution too! High-resolution metrics might be useful during an analysis to discover a needle in a haystack, but they come at a cost. &lt;/p&gt;

&lt;p&gt;If a service is so critical that you need to learn about a problem within seconds, use a high-resolution metric. If a 5-minute delay in metric delivery and coarser data are acceptable for the service, lowering the resolution of its metrics will save you money on CloudWatch costs.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
