<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Prabhu Jayakumar</title>
    <description>The latest articles on Forem by Prabhu Jayakumar (@prabhujayakumar).</description>
    <link>https://forem.com/prabhujayakumar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F413775%2Ff84bfe9a-7e49-4bbd-aa31-f206b9081e63.jpeg</url>
      <title>Forem: Prabhu Jayakumar</title>
      <link>https://forem.com/prabhujayakumar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/prabhujayakumar"/>
    <language>en</language>
    <item>
      <title>Firewall for Applications in Kubernetes</title>
      <dc:creator>Prabhu Jayakumar</dc:creator>
      <pubDate>Fri, 19 Feb 2021 20:13:45 +0000</pubDate>
      <link>https://forem.com/prabhujayakumar/firewall-for-applications-in-kubernetes-59bi</link>
      <guid>https://forem.com/prabhujayakumar/firewall-for-applications-in-kubernetes-59bi</guid>
      <description>&lt;p&gt;Originally published at &lt;a href="https://www.prabhujayakumar.dev"&gt;https://www.prabhujayakumar.dev&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this blog, we will look at a Kubernetes feature intended to improve the security of applications running in a cluster.&lt;/p&gt;

&lt;p&gt;In a Kubernetes cluster, we can run many applications with multiple replicas of each. By default, any pod can talk to any other pod running in the same cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AH0t240h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5ct8c2ijsooutlppmwmv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AH0t240h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5ct8c2ijsooutlppmwmv.jpg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But leaving this behaviour in place is not recommended. If an intruder gains access to a pod, they can reach every other pod from inside the compromised one. So we need a firewall for applications that decides whether traffic (both ingress and egress) inside the cluster should be allowed or denied.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2BgTCmsi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g23i43vtl65nea2lmw5z.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2BgTCmsi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g23i43vtl65nea2lmw5z.jpg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here comes the saviour, &lt;code&gt;NetworkPolicy&lt;/code&gt;, which helps to create a firewall for applications running in a Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;Let's understand the need for such a firewall, and how network policies address it, with some examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guestbook
&lt;/h3&gt;

&lt;p&gt;Consider an application &lt;code&gt;Guestbook&lt;/code&gt; having 3 different components as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;guestbook-ui (frontend)&lt;/li&gt;
&lt;li&gt;guestbook-api (backend)&lt;/li&gt;
&lt;li&gt;guestbook-db (db)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The expected communication between these components is: the UI talks to the API, and the API talks to the DB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WhIjyUVV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kwglz8f0pd3grdq6njwx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WhIjyUVV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kwglz8f0pd3grdq6njwx.jpg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But when all these components of the guestbook application are running in a Kubernetes cluster, the UI component can technically communicate with the DB by default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rvPeU2z_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xdag0szl8d1g1rx6z4hf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rvPeU2z_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xdag0szl8d1g1rx6z4hf.jpg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use a network policy to set up a firewall for applications in the cluster
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;As network policy is a feature implemented by the network plugin, ensure that your network plugin supports the NetworkPolicy resource. Creating a NetworkPolicy resource without such a plugin will have no effect. Calico, Cilium, Kube-router, Romana and Weave Net are some of the network plugins that support network policies.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a first step, disable the default behaviour of allowing all communication between pods by creating the following network policy in every namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deny-all&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now none of the pods can communicate with any other pod running in the cluster. From here, allow ingress/egress traffic for pods based on your requirements.&lt;/p&gt;
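&lt;p&gt;A quick way to roll this out, assuming the manifest above is saved as &lt;code&gt;deny-all.yaml&lt;/code&gt;, is to loop over every namespace:&lt;/p&gt;

```shell
# Apply the deny-all policy to each namespace in the cluster.
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  kubectl apply -n "$ns" -f deny-all.yaml
done
```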

&lt;p&gt;For the guestbook application, to allow ingress traffic to &lt;code&gt;guestbook-api&lt;/code&gt; pods only from &lt;code&gt;guestbook-ui&lt;/code&gt; pods, create a network policy as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-ui-to-api&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;guestbook-api&lt;/span&gt;
      &lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;   
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;guestbook-ui&lt;/span&gt;
          &lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to allow ingress traffic to &lt;code&gt;guestbook-db&lt;/code&gt; pods only from &lt;code&gt;guestbook-api&lt;/code&gt; pods, create a network policy as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-api-to-db&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;guestbook-db&lt;/span&gt;
      &lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db&lt;/span&gt;   
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;guestbook-api&lt;/span&gt;
          &lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final setup with these network policies allows only the valid communication paths between the pods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VFcGLkSx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/24tvvo7m39xdtvq2r4ft.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VFcGLkSx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/24tvvo7m39xdtvq2r4ft.jpg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  More about network policy
&lt;/h3&gt;

&lt;h4&gt;
  
  
  NamespaceSelector
&lt;/h4&gt;

&lt;p&gt;In some scenarios, we have to allow ingress from any pod in a namespace. For example, all application pods should allow ingress from all pods in the &lt;code&gt;monitoring&lt;/code&gt; namespace. A namespace selector can be used to achieve this setup easily, as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-monitoring-namespace&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt; &lt;span class="c1"&gt;## applies to all pods in the namespace&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;team&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring&lt;/span&gt; &lt;span class="c1"&gt;## labels of the monitoring namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
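&lt;p&gt;Note that this policy matches on namespace labels, so the monitoring namespace must actually carry the label used in the selector. Assuming the namespace is named &lt;code&gt;monitoring&lt;/code&gt;, it can be labelled with:&lt;/p&gt;

```shell
kubectl label namespace monitoring team=monitoring
# Confirm the label is present:
kubectl get namespace monitoring --show-labels
```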



&lt;h4&gt;
  
  
  IP Block
&lt;/h4&gt;

&lt;p&gt;Network policies are not limited to labels; they also allow configuration using IP blocks. The following network policy allows ingress from 10.72.x.x addresses but blocks traffic from 10.72.10.x.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-using-ip&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;ipBlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.72.0.0/16&lt;/span&gt;
        &lt;span class="na"&gt;except&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.72.10.0/8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It is good to have security at all levels of a Kubernetes cluster. Adding a firewall for the application pods increases the level of security by reducing the attack surface, and hence is highly recommended.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>security</category>
    </item>
    <item>
      <title>Save Your Kubernetes Cluster From DoS</title>
      <dc:creator>Prabhu Jayakumar</dc:creator>
      <pubDate>Mon, 06 Jul 2020 17:43:06 +0000</pubDate>
      <link>https://forem.com/prabhujayakumar/save-your-kubernetes-cluster-from-dos-4j8h</link>
      <guid>https://forem.com/prabhujayakumar/save-your-kubernetes-cluster-from-dos-4j8h</guid>
      <description>&lt;p&gt;Originally published at &lt;a href="https://www.prabhujayakumar.dev"&gt;https://www.prabhujayakumar.dev&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Kubernetes has become a market leader in container orchestration tools. According to the &lt;a href="https://www.cncf.io/wp-content/uploads/2020/03/CNCF_Survey_Report.pdf"&gt;2019 CNCF survey&lt;/a&gt;, "78% of respondents are using Kubernetes in production, a huge jump from 58% last year"&lt;/p&gt;

&lt;p&gt;While containers have become the norm for deploying applications, Kubernetes has become the norm for managing containers.&lt;/p&gt;

&lt;h2&gt;
  
  
  DoS in Kubernetes cluster
&lt;/h2&gt;

&lt;p&gt;Consider a Kubernetes cluster with 3 worker nodes having 10GB of memory each. Suppose developers mistakenly deploy pods that consume almost all the CPU and memory available on a node, causing resource contention in the other application pods that serve consumer traffic. The same could happen when attackers intentionally inject high resource consuming pods into the system.&lt;/p&gt;

&lt;p&gt;To understand this situation better, assume a node with 10GB of memory allocatable to containers and 5 application pods sharing it. Each application pod consumes 1GB of memory on average, which makes the node run at 50% to 60% memory utilization. Now deploying a pod that consumes 10GB of memory on the same node causes a noisy neighbor problem. The resulting resource contention in the other application pods either increases the latency of those services or, in the worst case, makes them unavailable.&lt;/p&gt;

&lt;p&gt;This might also happen when a developer introduces a bug that leaks memory, steadily increasing memory usage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vI6hrlJh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/43x1sj346ezv3d917ujd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vI6hrlJh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/43x1sj346ezv3d917ujd.jpg" alt="resource-contention"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Effective Resource Management in Kubernetes
&lt;/h2&gt;

&lt;p&gt;This is why developers should be conscious of the resources required by the services they develop and configure the minimum required memory and CPU when deploying them in the Kubernetes cluster. Resource requests can be configured accordingly, so that pods are scheduled onto nodes that can guarantee that minimum.&lt;/p&gt;

&lt;p&gt;For example, if a service requires 2 CPUs and 3GB of memory at minimum, then configure the resource requests in the pod specification as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: demo-service
  namespace: demo
spec:
  containers:
  - name: demo-container
    image: demo-service-image
    resources:
      requests:
        cpu: "2"
        memory: "3G"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;It is also important that developers configure the maximum resources their service may consume. This helps avoid the noisy neighbor problem in other service pods.&lt;/p&gt;

&lt;p&gt;Let's say the same &lt;code&gt;demo-service&lt;/code&gt; should not consume more than 4 CPUs and 5GB of memory. In that case, add resource limits to the above pod specification.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: demo-service
  namespace: demo
spec:
  containers:
  - name: demo-container
    image: demo-service-image
    resources:
      requests:
        cpu: "2"
        memory: "3G"
      limits:
        cpu: "4"
        memory: "5G"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;It is strongly recommended to configure the resource requests and limits in the pods deployed in the Kubernetes cluster as per the service needs.&lt;/p&gt;

&lt;p&gt;Having said that, to be on the safer side, we need a policy in the Kubernetes cluster to ensure that pods do not over-utilize resources and cause the noisy neighbor problem, and hence to save the apps deployed in the cluster from denial of service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Save your Kubernetes cluster from DoS
&lt;/h2&gt;

&lt;p&gt;One solution to the above problem is setting up a quota on the resources that can be configured as requests and limits in pods. This ensures that no one over-configures the resource requests and limits, and hence avoids over-utilization of the node's resources beyond some threshold.&lt;/p&gt;

&lt;p&gt;Kubernetes provides ResourceQuota to achieve this; it is enforced by an admission controller in the validating phase. Most Kubernetes distributions support ResourceQuota by default. Otherwise, it can be enabled by adding &lt;code&gt;ResourceQuota&lt;/code&gt; to the &lt;code&gt;--enable-admission-plugins&lt;/code&gt; flag of kube-apiserver.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kube-apiserver --enable-admission-plugins=ResourceQuota,..
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;After enabling the ResourceQuota admission plugin, add a quota for each namespace as needed. For example, to allocate only 10GB of memory to the teams that deploy their services in a namespace, create a resource quota as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ResourceQuota
metadata:
  name: demo-resourcequota
spec:
  hard:
    limits.memory: 10G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This resource quota ensures that the sum of the memory limits of all pods does not exceed 10GB. Any create or modify request will be validated by the ResourceQuota admission controller in the validating phase; if it does not comply with the existing ResourceQuota, the request will be rejected.&lt;/p&gt;

&lt;p&gt;Now deploy a pod with a memory limit of 2GB as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! &amp;amp;&amp;amp; sleep 3600']
    resources:
      limits:
        memory: 2G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And by running &lt;code&gt;kubectl describe resourcequota demo-resourcequota&lt;/code&gt;, see that 2GB of the 10GB memory limit quota is used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b1AFiwP---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ewwnmx5l14tzh0wo2a5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b1AFiwP---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ewwnmx5l14tzh0wo2a5l.png" alt="used-resource-quota"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When another pod with a memory limit of 9GB is deployed, it will fail, as it would exceed the 10GB quota.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FaSOetLb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/63vfdz35fuo45d42jf2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FaSOetLb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/63vfdz35fuo45d42jf2f.png" alt="exceed-resource-quota"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cluster administrators should configure the values in ResourceQuota so that workloads do not over-utilize the nodes.&lt;/p&gt;

&lt;p&gt;In a Kubernetes cluster, specifying resource requests and limits in the pod spec is not mandatory. If they are not specified, there is no limit on the resources the pod can consume; it can use as much as is available on the node. Hence it is not recommended to skip the resource requests and limits in the pod spec.&lt;/p&gt;

&lt;p&gt;With a ResourceQuota in place, any pod creation or update will be rejected if a resource request or limit tracked by the quota is missing. For example, the above resource quota will not allow creating or updating pods that have no memory limit configured.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--p7_bYQcZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tke4wkzuklqiqrdgmt5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--p7_bYQcZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tke4wkzuklqiqrdgmt5b.png" alt="pod-with-no-memory-limits"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To ensure that all pods have some default resource request or limit, we can create a LimitRange in the namespace.&lt;/p&gt;

&lt;p&gt;LimitRanger is the corresponding admission controller, which is involved in the mutating phase. If a pod spec does not have resource requests or limits, it adds the resource request and limit defined in the LimitRange of the same namespace.&lt;/p&gt;

&lt;p&gt;Create a LimitRange in a namespace as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests-limits
  namespace: demo
spec:
  limits:
  - default:
      memory: 1Gi
    defaultRequest:
      memory: 500Mi
    type: Container
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then create a pod without resource requests and limits in the same namespace as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: myapp-no-resources-pod
  namespace: demo
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! &amp;amp;&amp;amp; sleep 3600']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, get the YAML of the above pod running in the K8s cluster. The resource requests and limits defined in the LimitRange &lt;code&gt;default-requests-limits&lt;/code&gt; have been added to &lt;code&gt;.spec.resources&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
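&lt;p&gt;For example (assuming the pod was created in the &lt;code&gt;demo&lt;/code&gt; namespace):&lt;/p&gt;

```shell
# Fetch the live pod spec, including any defaults added by admission control.
kubectl get pod myapp-no-resources-pod -n demo -o yaml
```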

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
spec:
  containers:
  - command:
    - sh
    - -c
    - echo The app is running! &amp;amp;&amp;amp; sleep 3600
    image: busybox:1.28
    imagePullPolicy: IfNotPresent
    name: myapp-container
    resources:
      limits:
        memory: 1Gi
      requests:
        memory: 500Mi
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once again run &lt;code&gt;kubectl describe resourcequota demo-resourcequota&lt;/code&gt; and see that roughly 3GB of the memory limit quota is now used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F6LysnfR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/f5exbgph3111qvk5y446.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F6LysnfR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/f5exbgph3111qvk5y446.png" alt="used-resource-quota"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Thus, by configuring ResourceQuota and LimitRange based on requirements, all the applications in the Kubernetes cluster can be saved from the noisy neighbor problem and denial of service.&lt;/p&gt;

</description>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Container From Scratch</title>
      <dc:creator>Prabhu Jayakumar</dc:creator>
      <pubDate>Tue, 23 Jun 2020 19:47:20 +0000</pubDate>
      <link>https://forem.com/prabhujayakumar/container-from-scratch-23c4</link>
      <guid>https://forem.com/prabhujayakumar/container-from-scratch-23c4</guid>
      <description>&lt;p&gt;Originally published at &lt;a href="https://www.prabhujayakumar.dev"&gt;https://www.prabhujayakumar.dev&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;This is my first blog, and I want to share what I have learned about containers.&lt;/p&gt;

&lt;p&gt;Container adoption for running enterprise software in production has been increasing drastically, and most container deployments use Docker. Docker has become the de facto technology for running containerized applications. But what is Docker built on? How does it containerize applications? I will try to answer these questions in this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Need for the container
&lt;/h2&gt;

&lt;p&gt;Before getting into containers, let's clearly understand what a process is.&lt;/p&gt;

&lt;p&gt;A process is created when a program is put into execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--svZ68I5P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xok64id6qg9tn088eb6g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--svZ68I5P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xok64id6qg9tn088eb6g.jpg" alt="Process"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How can we execute a program? What are the requirements to execute a program?&lt;/p&gt;

&lt;p&gt;A program needs libraries, environment settings and resources before it can be put into execution and become a process. For example, to execute a Python script, we need the Python binary, some Python modules, Python environment settings and resources like CPU, memory and disk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--frTaL-Y4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rr6nj64q4mgv2kxcd34x.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--frTaL-Y4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/rr6nj64q4mgv2kxcd34x.jpg" alt="Process Execution Requirements"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let us consider a web application consisting of many microservices running on various languages and various versions. At the backend, each of these microservices is nothing but a process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F5doDMiM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vf08hgemtw5zy515kpzb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F5doDMiM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vf08hgemtw5zy515kpzb.jpg" alt="Web Application"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, that being said, imagine running all these services on one physical machine… That's not an easy task; it's barely possible.&lt;/p&gt;

&lt;p&gt;But why? What is the challenge in running all these services of the same web application on a single physical machine?&lt;/p&gt;

&lt;p&gt;Here is the problem: suppose the application has two Java services using different Java versions; what should the value of JAVA_HOME be? There can't be two system-wide JAVA_HOME values set on a single physical machine.&lt;/p&gt;
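&lt;p&gt;Strictly speaking, each process carries its own environment, so two processes can see different JAVA_HOME values; the real pain is that machine-wide defaults (shell profiles, system config) can hold only one, and keeping per-service environments straight by hand does not scale. A trivial illustration of per-process environments (the paths are hypothetical):&lt;/p&gt;

```shell
# Each invocation gets its own environment, overriding any machine default.
JAVA_HOME=/opt/java8 sh -c 'echo "service-a sees $JAVA_HOME"'
JAVA_HOME=/opt/java11 sh -c 'echo "service-b sees $JAVA_HOME"'
```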

&lt;p&gt;If we have N services to run on a machine, we have to assign each service a port such that no port collisions occur. Think about running two versions of Postgres: pg10 on port 5432 and pg9.6 on port 5433. Every service then has to be aware of the port it is running on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wUj9MzT---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zx86daxfhvomlr974bp3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wUj9MzT---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/zx86daxfhvomlr974bp3.jpg" alt="Web Application in a host"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To sum up the above challenges in a single phrase: there is NO ISOLATION.&lt;/p&gt;

&lt;p&gt;Since different versions of libraries, and hence different environments, are required, each service needs to be isolated from the others.&lt;/p&gt;

&lt;p&gt;In order to isolate the processes, people started using Virtual Machines. Let's see how Virtual Machines solve this problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8rHUabJS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sagdqlf2d0eeod0xl5ke.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8rHUabJS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sagdqlf2d0eeod0xl5ke.jpg" alt="Virtual Machine"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A virtual machine is a separate guest OS created by a hypervisor running on a host machine. This separate guest OS helps achieve isolated libraries, environments and resources.&lt;/p&gt;

&lt;p&gt;But there are some challenges in using Virtual Machines in this case.&lt;/p&gt;

&lt;p&gt;Think about running all the above services (processes) on a physical machine by creating the required virtual machines (one VM per service).&lt;/p&gt;

&lt;p&gt;You can evidently see the performance overhead on a physical machine running 10+ virtual machines. The reason is that the guest OS in each VM has its own memory management, network management and so on.&lt;/p&gt;

&lt;p&gt;Not just that, proper resource utilization becomes a tough task while using Virtual Machines.&lt;/p&gt;

&lt;p&gt;The main reason behind this overhead in the virtual machine approach is that the hypervisor virtualizes the hardware by creating a guest OS for each VM.&lt;/p&gt;

&lt;p&gt;All that we want is something which can isolate the libraries, environments and resources without having to create a separate OS. Why can't we use the resource management of the host OS itself, instead of virtualizing the hardware, which results in overhead?&lt;/p&gt;

&lt;p&gt;Yes, we have something called a “CONTAINER” which is capable of doing exactly that.&lt;/p&gt;

&lt;p&gt;A container is nothing but an isolated process, implemented using Linux technologies like namespaces and cgroups.&lt;/p&gt;

&lt;p&gt;Now let us dive deep into containers: how they provide isolation to a process, what namespaces and cgroups are, and how they are used.&lt;/p&gt;

&lt;p&gt;A container is a group of processes running on a host machine, isolated by namespaces.&lt;/p&gt;

&lt;p&gt;It provides OS-level virtualization. Hence we can call it a “lightweight VM”.&lt;/p&gt;

&lt;p&gt;We now have a basic understanding of what containers are. The next step is: how do we create one? Many of us use Docker to create containers with the docker run command. But is that the only option? No, there are a few other tools like lxc, podman, etc. How do these tools create containers? What happens behind the scenes?&lt;/p&gt;

&lt;p&gt;To understand that, let us see how to create a container from scratch using Linux technologies like namespaces and cgroups.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple Container in Golang
&lt;/h2&gt;

&lt;p&gt;Let's create a simple Go program which takes a command as an argument and executes it by creating a new process. Think of this Go program as a minimal Docker. To execute a command with Docker, we use “docker run”; similarly, here we use “go run container.go run”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "fmt"
    "os"
    "os/exec"
)

// go run container.go run &amp;lt;cmd&amp;gt; &amp;lt;args&amp;gt;
// docker run &amp;lt;cmd&amp;gt; &amp;lt;args&amp;gt;
func main() {
    switch os.Args[1] {
    case "run":
        run()
    default:
        panic("invalid command!!")
    }
}

func run() {
    fmt.Printf("Running %v as PID %d \n", os.Args[2:], os.Getpid())

    cmd := exec.Command(os.Args[2], os.Args[3:]...)

    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.Run()
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The above program executes the given arguments as a command. As you see below, “go run container.go run echo hello container” executes the command “echo hello container”. It executes the command by creating a new process, which can be considered a container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aabvXsns--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sq4xkq9sr69v0oc2jeiy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aabvXsns--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sq4xkq9sr69v0oc2jeiy.gif" alt="container-run"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly, let's create a process using &lt;code&gt;/bin/bash&lt;/code&gt; and assign a dedicated hostname to that container. But changing the hostname inside the container changes the hostname of the host machine as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NAxWjyae--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/c33mm0rvazeqxjo4k9mc.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NAxWjyae--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/c33mm0rvazeqxjo4k9mc.gif" alt="container-hostname"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This happens because there is no isolation of the hostname for this container. To isolate the hostname, we can assign a new UTS namespace to the container. In Go, we can do this by using the &lt;code&gt;syscall&lt;/code&gt; package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func run() {
    fmt.Printf("Running %v as PID %d \n", os.Args[2:], os.Getpid())

    cmd := exec.Command(os.Args[2], os.Args[3:]...)

    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS,
    }

    cmd.Run()
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, if you change the hostname of the container, it will not affect the hostname of the host machine because the container has its own UTS namespace.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wopm8ifI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/szob7qvzfb92tqaqyo5z.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wopm8ifI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/szob7qvzfb92tqaqyo5z.gif" alt="container-hostname-fixed"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But I want the Go program to assign the hostname to the container automatically, using the syscall &lt;code&gt;syscall.Sethostname([]byte("container-demo"))&lt;/code&gt;. Where can we place this call in the above program? The process is created on &lt;code&gt;cmd.Run()&lt;/code&gt; and exits on the same line, so there is no point at which we can run our own code inside it. Hence, let's fork a child process and set the hostname inside that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "fmt"
    "os"
    "os/exec"
    "syscall"
)

// go run container.go run &amp;lt;cmd&amp;gt; &amp;lt;args&amp;gt;
// docker run &amp;lt;cmd&amp;gt; &amp;lt;args&amp;gt;
func main() {
    switch os.Args[1] {
    case "run":
        run()
    case "child":
        child()
    default:
        panic("invalid command!!")
    }
}

func run() {
    fmt.Printf("Running %v as PID %d \n", os.Args[2:], os.Getpid())

    args := append([]string{"child"}, os.Args[2:]...)
    cmd := exec.Command("/proc/self/exe", args...)

    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS,
    }

    cmd.Run()
}

func child() {
    fmt.Printf("Running %v as PID %d \n", os.Args[2:], os.Getpid())

    syscall.Sethostname([]byte("container-demo"))
    cmd := exec.Command(os.Args[2], os.Args[3:]...)

    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.Run()
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gT1Nzf7o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m4x1qws01yqay7ag8ghh.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gT1Nzf7o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m4x1qws01yqay7ag8ghh.gif" alt="container-hostname-set"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another catch here is that the container is able to see all the processes running on the host machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a0LoEnW5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3hw227ph38gcgk0weepm.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a0LoEnW5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3hw227ph38gcgk0weepm.gif" alt="container-process-one"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A container should be able to see only the processes running inside it, which can be achieved by using the PID namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
    }
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1TDQBoYI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/h4jgusde03y9gn7u4lww.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1TDQBoYI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/h4jgusde03y9gn7u4lww.gif" alt="container-process-two"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even then, the container is able to see the processes of the host machine. The reason is &lt;code&gt;/proc&lt;/code&gt;: the container is still using the same root filesystem as the host machine. Hence, a different root filesystem has to be used for the container, with &lt;code&gt;/proc&lt;/code&gt; mounted into it.&lt;br&gt;
The &lt;code&gt;/containerfs&lt;/code&gt; directory contains the files of an operating system, including a few binaries like python and the core Linux utilities. Mounting this directory as the container's root filesystem makes the container self-sufficient for Linux utilities, with no dependency on the host machine for binaries. It also provides a separate environment for this container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func child() {
    fmt.Printf("Running %v as PID %d \n", os.Args[2:], os.Getpid())

    syscall.Sethostname([]byte("container-demo"))
    cmd := exec.Command(os.Args[2], os.Args[3:]...)

    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    syscall.Chroot("/containerfs")
    os.Chdir("/")
    syscall.Mount("proc", "proc", "proc", 0, "")

    cmd.Run()
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ukn4jDWA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9au9o8lk5m6xt0kz80ef.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ukn4jDWA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/9au9o8lk5m6xt0kz80ef.gif" alt="container-process-isolated"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we have achieved process ID isolation using the PID namespace. Similarly, we can isolate networking and users using the network and user namespaces.&lt;/p&gt;

&lt;p&gt;Basically, a namespace controls what you can see inside the container. It allows us to create restricted views of system resources like the process tree, network interfaces, mounts and users. These are the various namespaces available to provide isolation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UTS (UNIX Time-Sharing) namespace: hostname and domain name&lt;/li&gt;
&lt;li&gt;PID namespace: process numbers&lt;/li&gt;
&lt;li&gt;Mount namespace: mount points&lt;/li&gt;
&lt;li&gt;IPC namespace: Inter-Process Communication resources&lt;/li&gt;
&lt;li&gt;Network namespace: network resources&lt;/li&gt;
&lt;li&gt;User namespace: user and group ID numbers&lt;/li&gt;
&lt;/ul&gt;
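&lt;p&gt;On Linux, you can inspect the namespaces a process belongs to under &lt;code&gt;/proc&lt;/code&gt;. A small Go sketch (Linux-only):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"os"
)

// namespaceLinks lists the namespaces of the current process. Each entry in
// /proc/self/ns is a symlink such as "uts:[4026531838]"; two processes share
// a namespace exactly when these inode numbers match. (Linux only.)
func namespaceLinks() map[string]string {
	links := map[string]string{}
	entries, err := os.ReadDir("/proc/self/ns")
	if err != nil {
		return links // not Linux, or /proc not mounted
	}
	for _, e := range entries {
		if target, err := os.Readlink("/proc/self/ns/" + e.Name()); err == nil {
			links[e.Name()] = target
		}
	}
	return links
}

func main() {
	for name, target := range namespaceLinks() {
		fmt.Printf("%s  %s\n", name, target)
	}
}
```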

&lt;p&gt;Now, let us see how resource management works in a container. I have a python script &lt;code&gt;hungry.py&lt;/code&gt; which consumes 10 MB of memory every 0.5 seconds. Running this python script using the container.go program allows the container process to consume all of the memory available on the host machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RMkrhAlt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gz2nksa7vlmfp0awt823.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RMkrhAlt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gz2nksa7vlmfp0awt823.gif" alt="container-memory-hungry"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To manage resources like memory, CPU and disk blocks, we can use cgroups.&lt;br&gt;
Every system has control groups under &lt;code&gt;/sys/fs/cgroup/&lt;/code&gt;, and the memory controller uses the default values from &lt;code&gt;/sys/fs/cgroup/memory&lt;/code&gt;. You can see that the value in &lt;code&gt;/sys/fs/cgroup/memory/memory.limit_in_bytes&lt;/code&gt; is very large, allowing a process to consume as much memory as is available on the host machine.&lt;/p&gt;

&lt;p&gt;Here, in this Go program, I am creating a control group &lt;code&gt;prabhu&lt;/code&gt;, setting 100 MB as the maximum memory limit, disabling swap, and assigning the process ID of the container to the tasks file of the cgroup &lt;code&gt;prabhu&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func child() {
    fmt.Printf("Running %v as PID %d \n", os.Args[2:], os.Getpid())

    syscall.Sethostname([]byte("container-demo"))
    controlgroup()

    cmd := exec.Command(os.Args[2], os.Args[3:]...)

    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    syscall.Chroot("/containerfs")
    os.Chdir("/")
    syscall.Mount("proc", "proc", "proc", 0, "")

    cmd.Run()
}

// controlgroup additionally needs the "io/ioutil", "path/filepath"
// and "strconv" imports.
func controlgroup() {

    cgPath := filepath.Join("/sys/fs/cgroup/memory", "prabhu")
    os.Mkdir(cgPath, 0755)

    ioutil.WriteFile(filepath.Join(cgPath, "memory.limit_in_bytes"), []byte("100000000"), 0700)

    ioutil.WriteFile(filepath.Join(cgPath, "memory.swappiness"), []byte("0"), 0700)

    ioutil.WriteFile(filepath.Join(cgPath, "tasks"), []byte(strconv.Itoa(os.Getpid())), 0700)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nGzl68_z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/0tspf356bjp37r5b8pi9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nGzl68_z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/0tspf356bjp37r5b8pi9.gif" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the cgroup &lt;code&gt;prabhu&lt;/code&gt; assigned to the container process allows only 100 MB of memory, the python process gets killed once it tries to exceed that memory limit.&lt;/p&gt;

&lt;p&gt;By using cgroups, system administrators gain fine-grained control over allocating, prioritizing, denying, managing and monitoring system resources. Hardware resources can be appropriately divided up among tasks and users, increasing overall efficiency. Hence we can use cgroups for resource management in the container ecosystem.&lt;/p&gt;

&lt;p&gt;Here, I have created a simple container with isolation of the hostname, mounts (&lt;code&gt;/proc&lt;/code&gt;) and the process tree using the corresponding namespaces, and also managed memory for the container using cgroups.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Containers are just isolated groups of processes running on a single host, and that isolation leverages several underlying technologies built into the Linux kernel, like namespaces, cgroups and chroot.&lt;/p&gt;

&lt;p&gt;This is how Docker containerises applications, along with many other features like storing and transferring filesystems in the form of Docker images.&lt;/p&gt;

&lt;p&gt;And Docker is not the only technology that helps run containers. There are other options like Podman from Red Hat, LXC (Linux Containers) and rkt from CoreOS (project now ended).&lt;/p&gt;

&lt;p&gt;Inspired from &lt;a href="https://www.youtube.com/watch?v=Utf-A4rODH8"&gt;Building a container from scratch in Go - Liz Rice (Microscaling Systems)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>devops</category>
      <category>go</category>
    </item>
  </channel>
</rss>
