A YAML-to-VM conversion app I wrote for myself.
Introduction
In my daily work, a recurring task involves estimating the hardware requirements for on-premise Kubernetes or, more frequently, OpenShift clusters tailored to specific application needs. These estimations typically target bare-metal server deployments, where direct access to physical resources is assumed. However, scenarios arise where the hosting platform is not bare-metal but a virtualized environment. In such cases, the estimation focus shifts from physical server capacity to determining the appropriate virtual machine capacity to adequately support the cluster and its applications.
To streamline this often repetitive process, I set out to automate the initial steps with a simple script. The script takes Kubernetes configuration inputs, particularly the resource requests defined for applications, and translates them into an estimated VM capacity. What follows is an oversimplified version of this automation effort, intended to illustrate the fundamental logic behind converting Kubernetes workload demands into virtual machine resource requirements.
The code
The core concept behind this endeavor is to translate the declared resource needs of target application deployments within a Kubernetes platform into corresponding virtual machine specifications. By analyzing the deployment specifications, typically captured in a YAML file, the goal is to transpose the application’s CPU, memory, and potentially other resource demands into metrics suitable for provisioning virtual machines. This allows for a more informed approach to sizing the underlying VM infrastructure that will host the Kubernetes cluster and its workloads.
To illustrate this translation process, consider the following YAML file. This file serves as a concrete example of how application deployment specifications are defined within a Kubernetes environment. By examining the resource requests outlined within this YAML, we can begin to understand the raw input that our automation script will process to generate estimations for the necessary virtual machine capacity.
workerNodes: 3
deployments:
  - apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: web
              image: nginx:latest
              resources:
                requests:
                  cpu: "0.5"
                  memory: "512Mi"
            - name: app
              image: my-app:v1
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
statefulSets:
  - apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: postgres
              image: postgres:13
              resources:
                requests:
                  cpu: "2"
                  memory: "4Gi"
daemonSets:
  - apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluentd
    spec:
      selector:
        matchLabels:
          app: fluentd
      template:
        metadata:
          labels:
            app: fluentd
        spec:
          containers:
            - name: fluentd
              image: fluent/fluentd:v1.18
              resources:
                requests:
                  cpu: "0.25"
                  memory: "256Mi"
And below is the Go code that performs the conversion.
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Simplified Kubernetes Configuration struct
type KubernetesConfig struct {
	WorkerNodes  int           `yaml:"workerNodes"`
	Deployments  []Deployment  `yaml:"deployments"`
	StatefulSets []StatefulSet `yaml:"statefulSets"`
	DaemonSets   []DaemonSet   `yaml:"daemonSets"`
}

type Deployment struct {
	Spec struct {
		Replicas int     `yaml:"replicas"`
		Template PodSpec `yaml:"template"`
	} `yaml:"spec"`
}

type StatefulSet struct {
	Spec struct {
		Replicas int     `yaml:"replicas"`
		Template PodSpec `yaml:"template"`
	} `yaml:"spec"`
}

type DaemonSet struct {
	Spec struct {
		Template PodSpec `yaml:"template"`
	} `yaml:"spec"`
}

type PodSpec struct {
	Spec struct {
		Containers []Container `yaml:"containers"`
	} `yaml:"spec"`
}

type Container struct {
	Resources Resources `yaml:"resources"`
}

type Resources struct {
	Requests ResourceList `yaml:"requests"`
}

type ResourceList struct {
	CPU    string `yaml:"cpu"`
	Memory string `yaml:"memory"`
}
func main() {
	if len(os.Args) != 2 {
		fmt.Println("Usage: go run main.go <kubernetes-config.yaml>")
		return
	}

	yamlFile, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Printf("Error reading YAML file: %v\n", err)
		return
	}

	var kubeConfig KubernetesConfig
	err = yaml.Unmarshal(yamlFile, &kubeConfig)
	if err != nil {
		fmt.Printf("Error unmarshaling YAML: %v\n", err)
		return
	}

	totalRequestedCPU := float64(0)
	totalRequestedMemory := int64(0)

	// Helper function to process Pods and aggregate resources
	processPods := func(pods []PodSpec, replicas int) {
		for i := 0; i < replicas; i++ {
			for _, pod := range pods {
				for _, container := range pod.Spec.Containers {
					cpuCores, err := parseCPU(container.Resources.Requests.CPU)
					if err != nil {
						fmt.Printf("Error parsing CPU request '%s': %v\n", container.Resources.Requests.CPU, err)
						continue
					}
					totalRequestedCPU += cpuCores

					memoryBytes, err := parseMemory(container.Resources.Requests.Memory)
					if err != nil {
						fmt.Printf("Error parsing memory request '%s': %v\n", container.Resources.Requests.Memory, err)
						continue
					}
					totalRequestedMemory += memoryBytes
				}
			}
		}
	}

	// Aggregate resources from Deployments and StatefulSets
	for _, deployment := range kubeConfig.Deployments {
		processPods([]PodSpec{deployment.Spec.Template}, deployment.Spec.Replicas)
	}
	for _, statefulSet := range kubeConfig.StatefulSets {
		processPods([]PodSpec{statefulSet.Spec.Template}, statefulSet.Spec.Replicas)
	}

	// Aggregate resources from DaemonSets (running on each worker node)
	for _, daemonSet := range kubeConfig.DaemonSets {
		processPods([]PodSpec{daemonSet.Spec.Template}, kubeConfig.WorkerNodes)
	}

	fmt.Printf("Total requested CPU across all Pods: %.2f cores\n", totalRequestedCPU)
	fmt.Printf("Total requested memory across all Pods: %d bytes (%.2f GB)\n", totalRequestedMemory, float64(totalRequestedMemory)/(1024*1024*1024))
	fmt.Printf("Number of worker nodes specified: %d\n", kubeConfig.WorkerNodes)

	// The number of required VMs for worker nodes is at least the number of worker nodes.
	fmt.Printf("Minimum number of required VMs for worker nodes: %d\n", kubeConfig.WorkerNodes)

	// Further considerations:
	// - Capacity of each VM: How much CPU and RAM should each VM have?
	// - Oversubscription: Will you allow scheduling more requests than available capacity on a node?
	// - Control plane nodes: You'll also need VMs for the Kubernetes control plane.
}
// parseCPU parses a Kubernetes CPU request given as a plain core count
// (e.g. "0.5", "2"). Millicore notation such as "500m" is not handled here.
func parseCPU(cpu string) (float64, error) {
	var cores float64
	_, err := fmt.Sscan(cpu, &cores)
	if err != nil {
		return 0, fmt.Errorf("invalid CPU format: %w", err)
	}
	return cores, nil
}

// parseMemory parses a Kubernetes memory request given as a number followed by
// a binary suffix (Ki, Mi, Gi, Ti). Unrecognized suffixes fall through to a
// multiplier of 1, and a bare number without any suffix fails the Sscanf call.
func parseMemory(memory string) (int64, error) {
	var amount float64
	var unit string
	_, err := fmt.Sscanf(memory, "%f%s", &amount, &unit)
	if err != nil {
		return 0, fmt.Errorf("invalid memory format: %w", err)
	}
	multiplier := int64(1)
	switch unit {
	case "Ki":
		multiplier = 1024
	case "Mi":
		multiplier = 1024 * 1024
	case "Gi":
		multiplier = 1024 * 1024 * 1024
	case "Ti":
		multiplier = 1024 * 1024 * 1024 * 1024
	}
	// Apply the multiplier before truncating so fractional amounts like "1.5Gi" are kept.
	return int64(amount * float64(multiplier)), nil
}
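One caveat about the two helpers above: they only cover the formats used in the sample YAML, plain core counts for CPU and binary suffixes for memory. If your manifests also use millicores ("500m") or decimal suffixes ("K", "M", "G"), hypothetical drop-in replacements (same names and signatures; this is a sketch, not part of the original script, and assumes "strconv" and "strings" are added to the import block) could look like this:

// parseCPU, extended: accepts plain core counts ("0.5", "2") and millicores ("500m").
func parseCPU(cpu string) (float64, error) {
	if strings.HasSuffix(cpu, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(cpu, "m"), 64)
		if err != nil {
			return 0, fmt.Errorf("invalid millicore value %q: %w", cpu, err)
		}
		return milli / 1000, nil
	}
	cores, err := strconv.ParseFloat(cpu, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid CPU value %q: %w", cpu, err)
	}
	return cores, nil
}

// parseMemory, extended: accepts binary (Ki/Mi/Gi/Ti) and decimal (K/M/G/T)
// suffixes as well as plain byte counts.
func parseMemory(memory string) (int64, error) {
	multipliers := map[string]float64{
		"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40,
		"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
		"": 1,
	}
	num := strings.TrimRight(memory, "KMGTi") // strip the unit characters from the right
	unit := strings.TrimPrefix(memory, num)   // whatever was stripped is the unit
	amount, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid memory value %q: %w", memory, err)
	}
	mult, ok := multipliers[unit]
	if !ok {
		return 0, fmt.Errorf("unknown memory unit %q", unit)
	}
	return int64(amount * mult), nil
}

If the script grows beyond a toy, the resource.Quantity parser from k8s.io/apimachinery handles all of these formats and would be the more robust choice.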
Run the code:
go mod init my-kubernetes-translator
go mod tidy
go run main.go my-deployment.yaml
The output:
Total requested CPU across all Pods: 5.75 cores
Total requested memory across all Pods: 8321499136 bytes (7.75 GB)
Number of worker nodes specified: 3
Minimum number of required VMs for worker nodes: 3
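Those totals are easy to verify by hand: the Deployment contributes 2 × (0.5 + 1) = 3 cores and 2 × (512Mi + 1Gi) = 3Gi, the StatefulSet adds 1 × 2 = 2 cores and 4Gi, and the DaemonSet, scheduled once per worker node, adds 3 × 0.25 = 0.75 cores and 3 × 256Mi = 768Mi. That sums to 5.75 cores and 7.75Gi, matching the output above.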
If you’re not comfortable with Go, the same script in Python would look like the following.
import sys

import yaml
# These classes mirror the Go structs for reference; the parsing logic below
# works directly on the dictionaries returned by yaml.safe_load and does not
# instantiate them.
class KubernetesConfig:
    def __init__(self, workerNodes=0, deployments=None, statefulSets=None, daemonSets=None):
        self.workerNodes = workerNodes
        self.deployments = deployments if deployments is not None else []
        self.statefulSets = statefulSets if statefulSets is not None else []
        self.daemonSets = daemonSets if daemonSets is not None else []


class Deployment:
    def __init__(self, spec=None):
        self.spec = spec if spec is not None else {"replicas": 1, "template": {"spec": {"containers": []}}}


class StatefulSet:
    def __init__(self, spec=None):
        self.spec = spec if spec is not None else {"replicas": 1, "template": {"spec": {"containers": []}}}


class DaemonSet:
    def __init__(self, spec=None):
        self.spec = spec if spec is not None else {"template": {"spec": {"containers": []}}}


class PodSpec:
    def __init__(self, spec=None):
        self.spec = spec if spec is not None else {"containers": []}


class Container:
    def __init__(self, resources=None):
        self.resources = resources if resources is not None else {"requests": {"cpu": "0", "memory": "0"}}


class Resources:
    def __init__(self, requests=None):
        self.requests = requests if requests is not None else {"cpu": "0", "memory": "0"}


class ResourceList:
    def __init__(self, cpu="0", memory="0"):
        self.cpu = cpu
        self.memory = memory
def parse_cpu(cpu_str):
    # Plain core counts only (e.g. "0.5", "2"); anything else, such as "500m",
    # falls back to 0.0.
    try:
        return float(cpu_str)
    except ValueError:
        return 0.0


def parse_memory(memory_str):
    # Split the value into a numeric part and a unit, then apply the binary
    # multiplier (Ki/Mi/Gi/Ti). Unknown or missing units are treated as bytes.
    memory_str = memory_str.lower()
    amount_str = ""
    unit = ""
    for char in memory_str:
        if char.isdigit() or char == '.':
            amount_str += char
        else:
            unit += char
    try:
        amount = float(amount_str)
    except ValueError:
        return 0
    unit = unit.strip()
    if unit == "ki":
        return int(amount * 1024)
    elif unit == "mi":
        return int(amount * 1024 * 1024)
    elif unit == "gi":
        return int(amount * 1024 * 1024 * 1024)
    elif unit == "ti":
        return int(amount * 1024 * 1024 * 1024 * 1024)
    else:
        # Assume bytes
        return int(amount)


def process_pods(pods_data, replicas=1, current_cpu=0.0, current_memory=0):
    # Add the per-container requests of each Pod template to the running totals,
    # once per replica.
    total_cpu = current_cpu
    total_memory = current_memory
    for _ in range(replicas):
        for pod_data in pods_data:
            if 'spec' in pod_data and 'containers' in pod_data['spec']:
                for container_data in pod_data['spec']['containers']:
                    resources_data = container_data.get('resources', {}).get('requests', {})
                    cpu_str = resources_data.get('cpu', '0')
                    memory_str = resources_data.get('memory', '0')
                    total_cpu += parse_cpu(cpu_str)
                    total_memory += parse_memory(memory_str)
    return total_cpu, total_memory
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python main.py <kubernetes-config.yaml>")
        sys.exit(1)

    yaml_file_path = sys.argv[1]
    try:
        with open(yaml_file_path, 'r') as file:
            kube_config_data = yaml.safe_load(file)
    except FileNotFoundError:
        print(f"Error: File not found at {yaml_file_path}")
        sys.exit(1)
    except yaml.YAMLError as e:
        print(f"Error parsing YAML file: {e}")
        sys.exit(1)

    worker_nodes = kube_config_data.get('workerNodes', 0)
    deployments_data = kube_config_data.get('deployments', [])
    stateful_sets_data = kube_config_data.get('statefulSets', [])
    daemon_sets_data = kube_config_data.get('daemonSets', [])

    total_requested_cpu = 0.0
    total_requested_memory = 0

    for deployment_data in deployments_data:
        replicas = deployment_data.get('spec', {}).get('replicas', 1)
        template = deployment_data.get('spec', {}).get('template', {})
        if 'spec' in template:
            total_requested_cpu, total_requested_memory = process_pods(
                [template], replicas, total_requested_cpu, total_requested_memory)

    for stateful_set_data in stateful_sets_data:
        replicas = stateful_set_data.get('spec', {}).get('replicas', 1)
        template = stateful_set_data.get('spec', {}).get('template', {})
        if 'spec' in template:
            total_requested_cpu, total_requested_memory = process_pods(
                [template], replicas, total_requested_cpu, total_requested_memory)

    # DaemonSets run one Pod per worker node
    for daemon_set_data in daemon_sets_data:
        template = daemon_set_data.get('spec', {}).get('template', {})
        if 'spec' in template:
            total_requested_cpu, total_requested_memory = process_pods(
                [template], worker_nodes, total_requested_cpu, total_requested_memory)

    print(f"Total requested CPU across all Pods: {total_requested_cpu:.2f} cores")
    print(f"Total requested memory across all Pods: {total_requested_memory / (1024 * 1024 * 1024):.2f} GB")
    print(f"Number of worker nodes specified: {worker_nodes}")
    print(f"Minimum number of required VMs for worker nodes: {worker_nodes}")
    print("Further considerations:")
    print("- Capacity of each VM: How much CPU and RAM should each VM have?")
    print("- Oversubscription: Will you allow scheduling more requests than available capacity on a node?")
    print("- Control plane nodes: You'll also need VMs for the Kubernetes control plane.")
And running it produces the same result.
pip install pyyaml
python main.py my-deployment.yaml
Total requested CPU across all Pods: 5.75 cores
Total requested memory across all Pods: 7.75 GB
Number of worker nodes specified: 3
Minimum number of required VMs for worker nodes: 3
Further considerations:
- Capacity of each VM: How much CPU and RAM should each VM have?
- Oversubscription: Will you allow scheduling more requests than available capacity on a node?
- Control plane nodes: You'll also need VMs for the Kubernetes control plane.
The Outcome
In essence, these scripts offer a foundational simplification for transposing the mental exercise of bare-metal server estimation to the realm of virtual machines. Instead of directly envisioning physical CPU cores and RAM modules, the scripts automate the initial aggregation of application-level resource requests as defined within Kubernetes configurations. This provides a quantifiable starting point — a total demand in vCPUs and memory — which can then be mapped onto appropriate VM instance types and counts.
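For instance, a minimal sketch of that mapping, using the totals from the run above together with assumed per-node reserves and a headroom factor (the reserve and headroom values are placeholders for illustration, not something the script computes):

package main

import (
	"fmt"
	"math"
)

func main() {
	// Totals reported by the script for the sample YAML
	totalCPU := 5.75    // cores
	totalMemGiB := 7.75 // GiB
	workerNodes := 3.0

	// Assumed values: per-node reserve for the OS, kubelet and agents,
	// plus growth/burst headroom on top of the declared requests.
	reserveCPU := 1.0
	reserveMemGiB := 2.0
	headroom := 1.25

	vcpuPerVM := math.Ceil(totalCPU/workerNodes*headroom + reserveCPU)
	memPerVM := math.Ceil(totalMemGiB/workerNodes*headroom + reserveMemGiB)

	fmt.Printf("Suggested worker VM size: %.0f vCPU, %.0f GiB RAM (x%.0f VMs)\n",
		vcpuPerVM, memPerVM, workerNodes)
}

With these assumptions each of the three worker VMs lands at roughly 4 vCPU and 6 GiB of RAM, before accounting for control plane nodes, storage, or an oversubscription policy.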
While the scripts presented are oversimplified and lack the nuances of real-world capacity planning (such as storage, network, overhead, and oversubscription), they automate the crucial first step of summarizing application resource needs. This reduces the manual effort of parsing Kubernetes configurations and doing the basic arithmetic, making the move from application requirements to an estimate of virtual machine capacity more data-driven and less error-prone. In that sense, it bridges the gap between bare-metal thinking and VM provisioning.