<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ham</title>
    <description>The latest articles on Forem by Ham (@hatr0).</description>
    <link>https://forem.com/hatr0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1207273%2Faa317d9e-25d9-4b46-b3c2-3b368bd92b60.png</url>
      <title>Forem: Ham</title>
      <link>https://forem.com/hatr0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hatr0"/>
    <language>en</language>
    <item>
      <title>Getting Cilium to work on Ubuntu Cloud Image</title>
      <dc:creator>Ham</dc:creator>
      <pubDate>Fri, 03 Jan 2025 08:07:56 +0000</pubDate>
      <link>https://forem.com/hatr0/getting-cilium-to-work-on-ubuntu-cloud-image-3843</link>
      <guid>https://forem.com/hatr0/getting-cilium-to-work-on-ubuntu-cloud-image-3843</guid>
      <description>&lt;p&gt;How to get Cilium working on Ubuntu Cloud Images Focal (20.04) or Jammy (22.04). &lt;/p&gt;

&lt;p&gt;If you are running one of the Ubuntu Cloud Images and trying to install Cilium as the CNI network plugin on your Kubernetes cluster, you might notice your Cilium pods stuck in CrashLoopBackOff when you run &lt;code&gt;kubectl get pods -n kube-system&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvg27jf7brg2ti2ktn8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvg27jf7brg2ti2ktn8r.png" alt="kubectl get pods output" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Troubleshooting the problem pod further with &lt;code&gt;kubectl logs cilium-jgcdm -n kube-system&lt;/code&gt;, you might see messages like the ones below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9rbdwivnnhovfviy3nv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9rbdwivnnhovfviy3nv.png" alt="cilium pod logs display" width="800" height="34"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1tv334m2wroi6f519im.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1tv334m2wroi6f519im.png" alt="cilium pod logs daemon qdisk" width="800" height="32"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is a very good chance you are missing some kernel configuration options. Have a look at &lt;a href="https://docs.cilium.io/en/stable/operations/system_requirements/#linux-kernel" rel="noopener noreferrer"&gt;https://docs.cilium.io/en/stable/operations/system_requirements/#linux-kernel&lt;/a&gt; for more information. &lt;/p&gt;

&lt;p&gt;The base requirements are:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9us5gkt3ni8yf68nvjhc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9us5gkt3ni8yf68nvjhc.png" alt="cilium required kernel options" width="468" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On an Ubuntu system running 20.04, you can check your kernel configuration with:&lt;br&gt;
&lt;code&gt;cat /lib/modules/$(uname -r)/build/.config&lt;/code&gt; - Note: if you are not running as root, prepend &lt;code&gt;sudo&lt;/code&gt; to the command. &lt;/p&gt;

&lt;p&gt;You can look for specific config options by piping the output to grep:&lt;br&gt;
&lt;code&gt;cat /lib/modules/$(uname -r)/build/.config | grep -i config_bpf_jit&lt;/code&gt;. &lt;/p&gt;
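&lt;p&gt;The one-liner above checks a single option. As a rough sketch, you can loop over several required options at once. The sample config file here is only for illustration - on a real node, point CONFIG_FILE at /lib/modules/$(uname -r)/build/.config instead - and the option list is a small subset; see the Cilium docs linked below for the full set.&lt;/p&gt;

```shell
# Demo config file standing in for /lib/modules/$(uname -r)/build/.config.
CONFIG_FILE=$(mktemp)
printf 'CONFIG_BPF=y\nCONFIG_BPF_SYSCALL=y\n# CONFIG_BPF_JIT is not set\n' > "$CONFIG_FILE"

# A few of the options Cilium requires; the docs list more.
REQUIRED="CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_BPF_JIT CONFIG_NET_CLS_BPF"
MISSING=""
for opt in $REQUIRED; do
  # Built-in (=y) or module (=m) both count as enabled.
  if ! grep -q "^${opt}=[ym]" "$CONFIG_FILE"; then
    MISSING="$MISSING $opt"
  fi
done
echo "missing:$MISSING"
```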

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;After checking, you will notice that certain required options are not enabled. To fix this, let's replace our kernel variant with the 'generic' version. If not running as root, prepend &lt;code&gt;sudo&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;apt update &amp;amp;&amp;amp; apt install linux-generic&lt;/code&gt; or &lt;code&gt;sudo apt update &amp;amp;&amp;amp; sudo apt install linux-generic -y&lt;/code&gt; if not running as root. &lt;/p&gt;

&lt;p&gt;Now let's build the initramfs for the kernel. Use the generic kernel version that was just installed; the version appears in the install output. You can also check the '/boot' directory with the command &lt;code&gt;ls /boot&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;Note: at the time of this writing, '5.4.0-204-generic' is the installed kernel version. &lt;br&gt;
&lt;code&gt;update-initramfs -u -k 5.4.0-204-generic&lt;/code&gt; or &lt;code&gt;sudo update-initramfs -u -k 5.4.0-204-generic&lt;/code&gt; for non-root user. &lt;/p&gt;

&lt;p&gt;Note: If you want to remove 'linux-kvm' variant, issue&lt;br&gt;
&lt;code&gt;sudo apt purge linux-kvm&lt;/code&gt;&lt;br&gt;
If you plan to keep the 'linux-kvm' kernel around, skip the apt purge command and instead move any files ending in '-kvm' out of the /boot directory into a different folder.&lt;br&gt;
&lt;code&gt;mkdir /boot/kvm-kernel&lt;/code&gt;&lt;br&gt;
&lt;code&gt;mv /boot/*-kvm /boot/kvm-kernel/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now, let's update grub&lt;br&gt;
&lt;code&gt;update-grub&lt;/code&gt; or &lt;code&gt;sudo update-grub&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Reboot the system&lt;br&gt;
&lt;code&gt;reboot&lt;/code&gt; or &lt;code&gt;sudo reboot&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Repeat the above steps for each of the nodes you have in the cluster. &lt;/p&gt;
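&lt;p&gt;Putting the steps together, here is a rough per-node script. This is a sketch, not a drop-in tool: with DRY_RUN=1 it only records the commands, and the kernel-version detection assumes the newest '-generic' kernel under /boot is the one you just installed.&lt;/p&gt;

```shell
# Sketch of the per-node fix. With DRY_RUN=1 the commands are only
# collected into LOG; set DRY_RUN=0 on a real node to execute them.
DRY_RUN=1
LOG=""
run() {
  LOG="$LOG+ $*
"
  [ "$DRY_RUN" = 1 ] || sudo "$@"
}

run apt update
run apt install -y linux-generic

# Newest generic kernel under /boot; the fallback is only for this demo.
KVER=$(ls /boot/vmlinuz-*-generic 2>/dev/null | sed 's|.*vmlinuz-||' | sort -V | tail -n 1)
KVER=${KVER:-5.4.0-204-generic}

run update-initramfs -u -k "$KVER"
run update-grub
run reboot
printf '%s' "$LOG"
```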

&lt;p&gt;After installing the 'generic' kernel and rebooting, check your cilium pods again; they should all be running. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get pods -n kube-system -owide&lt;/code&gt; - the '-owide' option shows which node each pod is running on. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4d0adb8ckdz157mz824.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4d0adb8ckdz157mz824.png" alt="kubectl info with cilium pod running state" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cilium</category>
      <category>ubuntu</category>
      <category>cloud</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Fixing Expired Certificates In Kubernetes</title>
      <dc:creator>Ham</dc:creator>
      <pubDate>Fri, 03 Jan 2025 06:20:32 +0000</pubDate>
      <link>https://forem.com/hatr0/fixing-expired-certificates-in-kubernetes-1oe2</link>
      <guid>https://forem.com/hatr0/fixing-expired-certificates-in-kubernetes-1oe2</guid>
      <description>&lt;p&gt;By default, when you set up your Kubernetes cluster, the certificates expire after one year. &lt;/p&gt;

&lt;p&gt;If it's been a while since you set up your Kubernetes cluster, you may issue kubectl commands and run into connection refused errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwluxgnetlc2tgm1fdyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwluxgnetlc2tgm1fdyz.png" alt="kubectl connection refused" width="800" height="83"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The commands used require 'root' privileges, so if you are logged in as a non-root user, you will need to prepend &lt;code&gt;sudo&lt;/code&gt; to the commands. &lt;/p&gt;

&lt;p&gt;Troubleshooting further, you may notice that your kubelet service is failing to start (&lt;code&gt;systemctl status kubelet&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Upon checking the logs related to kubelet (&lt;code&gt;journalctl | grep kubelet&lt;/code&gt;), you may see the error messages below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsqer7i1zyou0monp1dl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsqer7i1zyou0monp1dl.png" alt="journalctl kubelet errors" width="800" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To verify, issue &lt;code&gt;kubeadm certs check-expiration&lt;/code&gt;. &lt;/p&gt;
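&lt;p&gt;If &lt;code&gt;kubeadm&lt;/code&gt; isn't handy, you can also inspect any of the certificates directly with openssl. The sketch below generates a throwaway self-signed certificate so it can run anywhere; on a control-plane node you would point it at a real cert such as /etc/kubernetes/pki/apiserver.crt instead.&lt;/p&gt;

```shell
# Throwaway self-signed cert standing in for /etc/kubernetes/pki/apiserver.crt.
CERT=$(mktemp)
KEY=$(mktemp)
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$KEY" -out "$CERT" \
  -days 365 -subj "/CN=demo" 2>/dev/null

# Print the expiry date...
ENDDATE=$(openssl x509 -enddate -noout -in "$CERT")
echo "$ENDDATE"

# ...and exit 0 only if the cert is still valid in 24 hours (86400 s).
if openssl x509 -checkend 86400 -noout -in "$CERT"; then
  echo "certificate valid for more than a day"
fi
```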

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhfhm4xbtel0x098gbm8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhfhm4xbtel0x098gbm8.png" alt="kubeadm certs check" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bingo! &lt;/p&gt;

&lt;p&gt;Let's manually renew certificates to fix our issues. &lt;/p&gt;

&lt;p&gt;If your cluster has more than one control-plane node, be sure to run the following commands on all control-plane nodes in the cluster.&lt;/p&gt;

&lt;p&gt;First, let's backup our certificates just in case we need them.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cp -R /etc/kubernetes/pki  /etc/kubernetes/pki.backup&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let's renew our certificates with&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubeadm certs renew all&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify new certificate installed correctly. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubeadm certs check-expiration&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Restart &lt;code&gt;kubelet service&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;systemctl restart kubelet&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the renewal process is complete, we will need to restart all the control plane pods. One way is to move each pod's manifest file out of the '/etc/kubernetes/manifests' folder, wait about 20 seconds, then move the file back in. This will recreate the pod so it uses the new certificates. &lt;/p&gt;
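&lt;p&gt;A rough sketch of that manifest shuffle is below. It uses throwaway demo directories and dummy files so it is safe to run anywhere; on a real control-plane node you would set MANIFESTS=/etc/kubernetes/manifests, run it with root privileges, and keep the pause at around 20 seconds.&lt;/p&gt;

```shell
# Demo directories; use MANIFESTS=/etc/kubernetes/manifests on a real node.
MANIFESTS=$(mktemp -d)
PARKED=$(mktemp -d)
PAUSE=1   # use about 20 on a real node so kubelet notices the removal

# Dummy files standing in for the real static pod manifests.
touch "$MANIFESTS/kube-apiserver.yaml" "$MANIFESTS/kube-controller-manager.yaml" \
      "$MANIFESTS/kube-scheduler.yaml" "$MANIFESTS/etcd.yaml"

for f in "$MANIFESTS"/*.yaml; do
  mv "$f" "$PARKED/"                            # kubelet tears the pod down
  sleep "$PAUSE"
  mv "$PARKED/$(basename "$f")" "$MANIFESTS/"   # kubelet recreates it
done
ls "$MANIFESTS" | wc -l   # all four manifests are back in place
```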

&lt;p&gt;Copy the renewed administrator kubeconfig &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sudo chown $(id -u):$(id -g) $HOME/.kube/config&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This should get your cluster running again. You can verify by running &lt;code&gt;kubectl get pods&lt;/code&gt; and confirming it no longer returns an error.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Things To Do If The Steps Above Don't Work
&lt;/h2&gt;

&lt;p&gt;If for some reason the above doesn't work, you will need to do the following. First, make a backup of /etc/kubernetes/pki/&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cp -R /etc/kubernetes/pki /etc/kubernetes/pki-backup&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Delete the following files from /etc/kubernetes/pki&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;rm apiserver.crt \
apiserver-etcd-client.key \
apiserver-kubelet-client.crt \
front-proxy-ca.crt \
front-proxy-client.crt \
front-proxy-client.key \
front-proxy-ca.key \
apiserver-kubelet-client.key \
apiserver.key \
apiserver-etcd-client.crt&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remove all .crt and .key files from /etc/kubernetes/pki/etcd&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;rm /etc/kubernetes/pki/etcd/*.crt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rm /etc/kubernetes/pki/etcd/*.key&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
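&lt;p&gt;Since these deletions are destructive, a small guard that refuses to delete anything unless the backup exists is cheap insurance. The sketch below uses throwaway demo directories; substitute /etc/kubernetes/pki and /etc/kubernetes/pki-backup on a real node.&lt;/p&gt;

```shell
# Demo directories standing in for /etc/kubernetes/pki and its backup.
PKI=$(mktemp -d)
BACKUP="$PKI.backup"
touch "$PKI/apiserver.crt" "$PKI/apiserver.key"

cp -R "$PKI" "$BACKUP"        # back up first
if [ ! -d "$BACKUP" ]; then   # refuse to delete without a backup
  echo "no backup found, aborting"
  exit 1
fi
rm "$PKI"/*.crt "$PKI"/*.key
ls "$BACKUP" | wc -l          # the backup still holds both files
```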

&lt;p&gt;Then create certs with&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubeadm init phase certs all --apiserver-advertise-address &amp;lt;IP&amp;gt;&lt;/code&gt; -- substitute your control-plane node's IP address.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Back up/move the conf files from /etc/kubernetes&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;mkdir /etc/kubernetes/conf-backup&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mv /etc/kubernetes/admin.conf \
  /etc/kubernetes/controller-manager.conf \
  /etc/kubernetes/kubelet.conf \
  /etc/kubernetes/scheduler.conf /etc/kubernetes/conf-backup/&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Create new conf files&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubeadm init phase kubeconfig all&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, restart the kubelet service or reboot the system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;systemctl restart kubelet&lt;/code&gt; or &lt;code&gt;reboot&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
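&lt;p&gt;For reference, the whole fallback sequence can be collected into one script. This is a sketch: with DRY_RUN=1 it only records the commands, the file deletions are abbreviated to one representative file each, and APISERVER_IP is a placeholder you must replace with your own address.&lt;/p&gt;

```shell
# Sketch of the fallback sequence. DRY_RUN=1 only records each command in
# LOG; set DRY_RUN=0 and run as root on a real control-plane node.
DRY_RUN=1
APISERVER_IP="REPLACE_ME"   # placeholder: your control-plane node's IP
LOG=""
run() {
  LOG="$LOG+ $*
"
  [ "$DRY_RUN" = 1 ] || "$@"
}

run cp -R /etc/kubernetes/pki /etc/kubernetes/pki-backup
run rm /etc/kubernetes/pki/apiserver.crt    # plus the other files listed above
run rm /etc/kubernetes/pki/etcd/server.crt  # plus the other etcd .crt/.key files
run kubeadm init phase certs all --apiserver-advertise-address "$APISERVER_IP"
run mkdir /etc/kubernetes/conf-backup
run mv /etc/kubernetes/admin.conf /etc/kubernetes/conf-backup/
run kubeadm init phase kubeconfig all
run systemctl restart kubelet
printf '%s' "$LOG"
```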

</description>
      <category>kubernetes</category>
      <category>certificates</category>
    </item>
  </channel>
</rss>
