<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Daniel Kim</title>
    <description>The latest articles on Forem by Daniel Kim (@lazyplatypus).</description>
    <link>https://forem.com/lazyplatypus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F371344%2F3fecc61d-140e-4cad-bfa3-9e27e11b3a62.jpeg</url>
      <title>Forem: Daniel Kim</title>
      <link>https://forem.com/lazyplatypus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lazyplatypus"/>
    <language>en</language>
    <item>
      <title>How to identify and troubleshoot common Kubernetes errors</title>
      <dc:creator>Daniel Kim</dc:creator>
      <pubDate>Tue, 25 Apr 2023 18:16:44 +0000</pubDate>
      <link>https://forem.com/newrelic/how-to-identify-and-troubleshoot-common-kubernetes-errors-27jk</link>
      <guid>https://forem.com/newrelic/how-to-identify-and-troubleshoot-common-kubernetes-errors-27jk</guid>
      <description>&lt;p&gt;&lt;em&gt;To read this full New Relic article, &lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-three?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;click here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Debugging Kubernetes can be stressful and time-consuming. This blog post walks through some common errors at the container, pod, and node level so you can apply these tips to help your clusters run smoothly. Whether you’re new to Kubernetes or have been using it for a while, you’ll learn a comprehensive set of tools and methods for debugging Kubernetes issues. &lt;/p&gt;

&lt;p&gt;This post is part three in a Monitoring Kubernetes series that explains everything you need to quickly set up your Kubernetes clusters and monitor them with New Relic.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://newrelic.com/blog/best-practices/monitoring-kubernetes-part-one?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;part one&lt;/a&gt;, you learned that Kubernetes automates the mundane operational tasks of managing containers.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-two?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;part two&lt;/a&gt;, you learned how to optimize Kubernetes for your application's needs.&lt;/p&gt;

&lt;p&gt;We’ll build on the previous parts, diving into a wide range of topics, from understanding the basics of pods and containers to advanced troubleshooting techniques.&lt;/p&gt;


&lt;h2&gt;
  
  
  Troubleshooting every level of your Kubernetes deployment
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frl8l4r1vqnvao7jxc9n9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frl8l4r1vqnvao7jxc9n9.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting your nodes
&lt;/h2&gt;

&lt;p&gt;To begin this troubleshooting guide, let’s start at the top layer. If you want to know the health of your entire Kubernetes cluster, you’ll want to look at how the nodes in the cluster are working, at what capacity, the number of applications running on each node, and the resource utilization of the entire cluster. &lt;/p&gt;

&lt;p&gt;You can get all of these metrics and more by running this command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl top node


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The output of &lt;code&gt;kubectl top node&lt;/code&gt; includes metrics on the number of CPU cores and memory utilized, as well as overall CPU and memory utilization. This is a critical first step in troubleshooting as it allows you to see the current state of your infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting your pods
&lt;/h2&gt;

&lt;p&gt;In Kubernetes, a pod is the smallest and simplest unit in the object model. It represents a single process running in a cluster. While a pod is the smallest unit in the Kubernetes object model, it can hold one or more containers, and these containers share the same network namespace, meaning they can communicate with each other using &lt;code&gt;localhost&lt;/code&gt;. Pods also have shared storage volumes, so all containers in a pod can access the same data.&lt;/p&gt;

&lt;p&gt;Pods have various states, including running, pending, failed, succeeded, and unknown. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The running state means that the pod's containers are running and healthy. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The pending state means that the pod has been created but one or more of its containers are not yet running. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The failed state means that one or more of the pod's containers has terminated with an error. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The succeeded state means that all containers in the pod have terminated successfully. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The unknown state means that the pod's state cannot be determined.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To get the state of your pods, use the &lt;code&gt;kubectl get pods&lt;/code&gt; command in your terminal. The output displays the current state of all pods in the current namespace. By default, it shows the pod name, the current state (for example, running, pending, and so on), the number of ready containers, and the age of the pod.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl get pods


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can use the &lt;code&gt;-o wide&lt;/code&gt; option to get even more information on each pod, such as the IP address and hostname.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl get pods -o wide


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pods also generate events that can provide valuable information about their status. You can view these events by using the &lt;code&gt;kubectl describe&lt;/code&gt; command, which returns more detailed information about the pod, including its current state, its IP address, and the status of its containers.&lt;/p&gt;

&lt;p&gt;This is useful for understanding why a pod is in a particular state. For example, an event might indicate that a pod was evicted due to a lack of resources, or that a container failed to start due to an error in the application code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl describe pod &amp;lt;pod-name&amp;gt;



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can also filter pods based on their state. For example, to check all the running pods, use this command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl get pods --field-selector=status.phase=Running



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Additionally, you can use the &lt;code&gt;kubectl top pod&lt;/code&gt; command to get resource usage statistics for the pods.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl top pod



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Are you currently running a Kubernetes cluster? If so, try using some of these commands to see what output you get. &lt;/p&gt;

&lt;h2&gt;
  
  
  Common pod errors
&lt;/h2&gt;

&lt;p&gt;Now that you know the current state of your pods, this section introduces common issues with Kubernetes resources (Pods, Services, or StatefulSets). We’ll cover how to make sense of, troubleshoot, and resolve each of these common issues.&lt;/p&gt;

&lt;p&gt;Although there are more issues you can encounter when working with Kubernetes, the list in this section covers the instances you’re most likely to run into.&lt;/p&gt;

&lt;p&gt;Use this list to jump directly to the section about these common issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-three#toc-crashloopbackoff-error?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;CrashLoopBackOff error&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-three#toc-imagepullbackoff-errimagepull-error?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;ImagePullBackOff/ErrImagePull error&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-three#toc-oomkilled-error?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;OOMKilled error&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-three#toc-createcontainerconfigerror-and-createcontainererror?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;CreateContainerConfigError and CreateContainerError&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-three#toc-pods-are-stuck-in-pending-or-waiting?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;Pods are stuck in pending or waiting&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CrashLoopBackOff error
&lt;/h2&gt;

&lt;p&gt;One of the most common errors you’ll encounter while working with Kubernetes is the &lt;code&gt;CrashLoopBackOff&lt;/code&gt; error. This error occurs in Kubernetes environments typically when a container in a pod crashes and the pod's restart policy is set to &lt;code&gt;Always&lt;/code&gt;. In this scenario, Kubernetes will keep trying to restart the container, but if it continues to crash, the pod will enter a &lt;code&gt;CrashLoopBackOff&lt;/code&gt; state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquhumkkd97uu7hfj01dj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquhumkkd97uu7hfj01dj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to identify a CrashLoopBackOff error
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;kubectl get pods&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Check the output to see if:
&lt;ul&gt;
&lt;li&gt;The pod’s status has a &lt;code&gt;CrashLoopBackOff&lt;/code&gt; error.&lt;/li&gt;
&lt;li&gt;There is more than one restart.&lt;/li&gt;
&lt;li&gt;Pods aren't identified as ready.&lt;/li&gt;
&lt;/ul&gt;
This example shows 0/1 as ready, 2 restarts, and a status of &lt;code&gt;CrashLoopBackOff&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ kubectl get pods 
NAME                 READY   STATUS               RESTARTS   AGE
example-pod           0/1    CrashLoopBackOff     2          4m26s


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  What causes a CrashLoopBackOff error?
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;CrashLoopBackOff&lt;/code&gt; status in the &lt;code&gt;STATUS&lt;/code&gt; column isn't the root cause of the problem—it simply indicates that the pod is experiencing a crash loop. To effectively troubleshoot and fix the issue, you'll need to identify and address the underlying error that’s causing the containers to crash.&lt;/p&gt;

&lt;p&gt;There are several possible causes for the &lt;code&gt;CrashLoopBackOff&lt;/code&gt; error:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The container might be running out of memory or CPU resources. You can verify this by checking the resource usage of the container and pod using &lt;code&gt;kubectl&lt;/code&gt; commands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The container might be unable to start due to an issue with the image or configuration. For example, the image might be missing a required dependency, or the container might not have the necessary permissions to access certain resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The container might be crashing due to a bug in the application code. In this case, the logs of the container might provide more information about the cause of the crash.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The container might be crashing due to a network issue, for example, the container might be unable to connect to a required service.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Troubleshooting a CrashLoopBackOff error
&lt;/h2&gt;

&lt;p&gt;Once you identify the particular pod that is showing the &lt;code&gt;CrashLoopBackOff&lt;/code&gt; error, follow these steps to identify the root cause.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run this command:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl describe pod [name]



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="2"&gt;
&lt;li&gt;If the pod is failing due to a liveness probe failure or a back-off restarting failed container error, the command will provide valuable insights.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

From       Message
-----      -----
kubelet    Liveness probe failed: cat: can’t open ‘/tmp/healthy’: No such file or directory
kubelet    Back-off restarting failed container


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="3"&gt;
&lt;li&gt;If you get &lt;code&gt;back-off restarting failed container&lt;/code&gt;, and &lt;code&gt;Liveness probe failed&lt;/code&gt; error messages, it’s likely that the pod is experiencing a temporary resource overload caused by a spike in activity.
To resolve this issue, you can adjust the &lt;code&gt;periodSeconds&lt;/code&gt; or &lt;code&gt;timeoutSeconds&lt;/code&gt; parameters to give the application more time to respond. This allows the pod to recover.&lt;/li&gt;
&lt;/ol&gt;
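&lt;p&gt;To see where those parameters fit, here’s a minimal sketch of a liveness probe in a pod spec with more forgiving timing (the pod name, image, and values are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: example/app:1.0
    livenessProbe:
      exec:
        command: ["cat", "/tmp/healthy"]
      initialDelaySeconds: 10
      periodSeconds: 20    # probe less often
      timeoutSeconds: 5    # give the app more time to respond
      failureThreshold: 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;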

&lt;h2&gt;
  
  
  ImagePullBackOff/ErrImagePull error
&lt;/h2&gt;

&lt;p&gt;In a Kubernetes cluster, there’s an agent on each node called the kubelet that’s responsible for running containers on that node. If a container image doesn’t already exist on a node, the kubelet will instruct the container runtime to pull it. &lt;/p&gt;

&lt;p&gt;When a Kubernetes pod encounters an issue with pulling an image, it will initially generate an &lt;code&gt;ErrImagePull&lt;/code&gt; error. The system will then retry a few times to download the image before ultimately backing off and scheduling another attempt. With each unsuccessful attempt, the delay between retries increases exponentially, up to a maximum delay of five minutes.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ImagePullBackOff&lt;/code&gt; and &lt;code&gt;ErrImagePull&lt;/code&gt; errors in Kubernetes environments typically occur when the Kubernetes node is unable to pull the specified image from the container registry. This can happen for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The image might not exist in the specified container registry, or the image name might be misspelled in the pod definition.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The image might be private, and the pod doesn’t have the necessary credentials to pull it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The pod's network might not have access to the container registry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The pod might not have enough permissions to pull the image from the container registry.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to identify an ImagePullBackOff error
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Run &lt;code&gt;kubectl get pods&lt;/code&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the output to see if the pod’s status has an &lt;code&gt;ImagePullBackOff&lt;/code&gt; error:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ kubectl get pods 
NAME                 READY   STATUS             RESTARTS   AGE
example-pod           0/1    ImagePullBackOff   0          4m26s


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  How to troubleshoot an ImagePullBackOff error
&lt;/h2&gt;

&lt;p&gt;To troubleshoot the &lt;code&gt;ImagePullBackOff&lt;/code&gt; error, first run &lt;code&gt;kubectl describe&lt;/code&gt;. Review the specific error under events. Take these recommended actions for each of the errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Repository does not exist or no pull access&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This means that the repository specified in the pod doesn’t exist in the Docker registry the cluster is using.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By default, images are pulled from Docker Hub, but your cluster might be using one or more private registries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The error might occur because the pod doesn’t specify the correct repository name, or doesn’t specify the correct fully qualified image name (for example, username/imagename).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Another possible cause is that Docker Hub or another container registry’s rate limits prevent the kubelet from fetching the image.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Manifest not found&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This means that the specific version of the requested image was not found. If you specified a tag, the tag was incorrect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To resolve it, double-check that the tag in the pod specification is correct and that it exists in the repository. Keep in mind that tags in the repo might have changed. If you didn’t specify a tag, check whether the image has a &lt;strong&gt;latest&lt;/strong&gt; tag; images without a &lt;strong&gt;latest&lt;/strong&gt; tag won’t be returned unless you specify a valid tag.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Authorization failed&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In this case, the credentials you provided can’t access the container registry or the specific image you requested.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To resolve this, create a Kubernetes Secret with the appropriate credentials and reference it in the pod specification. If you already have a Secret with credentials, ensure those credentials have permission to access the required image, or grant access in the container registry.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
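&lt;p&gt;For the &lt;code&gt;Authorization failed&lt;/code&gt; case, a minimal sketch of creating such a Secret and referencing it from a pod spec might look like this (the registry, credential, and image values are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Create a Secret holding the registry credentials
kubectl create secret docker-registry regcred \
  --docker-server=&amp;lt;your-registry&amp;gt; \
  --docker-username=&amp;lt;your-username&amp;gt; \
  --docker-password=&amp;lt;your-password&amp;gt;

# Reference it from the pod specification
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: &amp;lt;your-registry&amp;gt;/app:1.0
  imagePullSecrets:
  - name: regcred
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;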
&lt;h2&gt;
  
  
  OOMKilled error
&lt;/h2&gt;

&lt;p&gt;In Kubernetes, kubelets running on your virtual machines (VMs) have something called a Memory Manager that tracks memory usage for various processes, including out-of-memory (OOM) issues. When the VM comes close to running out of memory, the Memory Manager kills as few pods as necessary to free up enough memory to prevent the entire system from crashing. &lt;/p&gt;

&lt;p&gt;There are two different scenarios that cause an &lt;code&gt;OOMKilled&lt;/code&gt; error.&lt;/p&gt;
&lt;h2&gt;
  
  
  1. The pod was terminated because a container limit was reached.
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;Container Limit Reached&lt;/code&gt; error is specific to a single pod. When Kubernetes detects that a pod is using more memory than its set limit, it will terminate the pod with the error message &lt;code&gt;OOMKilled - Container Limit Reached&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv32h27nx1efpa4rpqds8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv32h27nx1efpa4rpqds8.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To troubleshoot this error, it's important to check the application logs to understand why the pod was using more memory than its set limit. This could be due to a spike in traffic, a long-running Kubernetes job, or a memory leak in the application.&lt;/p&gt;

&lt;p&gt;Investigate! If you find that the application is running as expected and simply requires more memory to operate, consider increasing the values for the request and limit for that pod. Monitoring the resource usage and performance of the pod, and of the cluster, can also help you identify the problem, and find a way to prevent it in the future.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. The pod was terminated because the node was “overcommitted”
&lt;/h2&gt;

&lt;p&gt;In this scenario, the pods scheduled on the node, taken together, request more memory than is available on that node. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;OOMKilled: Limit Overcommit&lt;/code&gt; error can occur when the aggregate memory requirements of all pods on a node exceed the available memory on that node. You might recall seeing this issue in &lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-two?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;Part 2 of this series&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbjpu1lxo6g94f7pgxy8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbjpu1lxo6g94f7pgxy8.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, let’s imagine you have a node with 5 GB of memory, and you have five pods running on that node, each with a memory limit of 1 GB. The total allowed memory usage would be 5 GB, exactly the node’s capacity. However, if one of those pods is configured with a higher limit of, say, 1.5 GB, the total memory usage can exceed the available memory, leading to an &lt;code&gt;OOMKilled&lt;/code&gt; error. This can happen when the pod experiences a spike in traffic or an unexpected memory leak, causing Kubernetes to terminate pods to reclaim memory.&lt;/p&gt;

&lt;p&gt;It's important to check the host itself and ensure that there are no other processes running outside of Kubernetes that could be consuming memory resources, leaving less for the pods. It's also important to monitor memory usage and adjust the limits of pods accordingly.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to identify an OOMKilled error
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Run &lt;code&gt;kubectl get pods&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the output to see if the pod’s status has an &lt;code&gt;OOMKilled&lt;/code&gt; error.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ kubectl get pods 
NAME                 READY   STATUS           RESTARTS   AGE
example-pod           0/1    OOMKilled        0          4m26s



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  How to troubleshoot an OOMKilled error
&lt;/h2&gt;

&lt;p&gt;How you respond to an &lt;code&gt;OOMKilled&lt;/code&gt; error depends on why the pod was terminated. It might have been terminated because of a container limit or an overcommitted node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If the pod was terminated because of a container limit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To resolve the &lt;code&gt;OOMKilled: Container Limit Reached&lt;/code&gt; error, it's important to first determine whether the application truly requires more memory. If the application is facing increased load or use, it might require more memory than was originally allocated. In this scenario, you can increase the memory limit for the container in the pod specification to address the error. To check whether this is the case, run &lt;code&gt;kubectl logs &amp;lt;pod-name&amp;gt;&lt;/code&gt; for the particular pod and determine if there is a noticeable spike in requests.&lt;/p&gt;

&lt;p&gt;But if the memory usage unexpectedly increases and doesn’t appear to be related to application demand, it could indicate that the application is experiencing a memory leak. In this case, you need to debug the application and identify the source of the leak. Simply increasing the memory limit without addressing the underlying issue just consumes more resources without solving the problem. It’s important to address the root cause of the leak to prevent it from happening again.&lt;/p&gt;
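&lt;p&gt;If you do conclude that the application legitimately needs more memory, the request and limit are set per container in the pod specification; here’s a minimal sketch with hypothetical names and values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: example/app:1.0
    resources:
      requests:
        memory: "256Mi"   # what the scheduler reserves for the container
      limits:
        memory: "512Mi"   # the ceiling; exceeding it triggers OOMKilled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;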

&lt;p&gt;&lt;strong&gt;If the pod was terminated because of an overcommitted node&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pods are scheduled on a node based on their memory request value compared to the available memory on the node. But this can result in overcommitment of memory. To troubleshoot and resolve &lt;code&gt;OOMKilled&lt;/code&gt; errors caused by overcommitment, it's important to understand why Kubernetes terminated the pod. Then you can adjust the memory limits and requests to ensure that the node is not overcommitted.&lt;/p&gt;

&lt;p&gt;To prevent these issues from happening, it is important to monitor your environment constantly, understand the memory behavior of pods and containers, and regularly check your settings. This approach can help you identify potential issues early on and take appropriate action to prevent them from escalating. Having a good understanding of the memory behavior of your pods and containers—and knowing the settings you have configured—allows you to easily diagnose and resolve Kubernetes memory issues.&lt;/p&gt;
&lt;h2&gt;
  
  
  CreateContainerConfigError and CreateContainerError
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;CreateContainerConfigError&lt;/code&gt; and &lt;code&gt;CreateContainerError&lt;/code&gt; errors in Kubernetes typically occur when there’s a problem creating the container configuration for a pod. Some common causes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An invalid image name or tag: Make sure that the image name and tag specified in the pod definition are valid and can be pulled from the specified container registry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Missing image pull secrets: If the image is in a private registry, make sure that the necessary image pull secrets are defined in the pod definition.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Insufficient permissions: Ensure that the service account used by the pod has the necessary permissions to pull the specified image from the registry.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How to identify a CreateContainerConfigError or CreateContainerError
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;kubectl get pods&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Check the output to see if the pod’s status is &lt;code&gt;CreateContainerConfigError&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ kubectl get pods 
NAME                 READY   STATUS                       RESTARTS   AGE
example-pod           0/1    CreateContainerConfigError   0          4m26s


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  How to troubleshoot a CreateContainerConfigError or CreateContainerError
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Check the pod definition for any errors or typos in the image name or tag. If the image doesn’t exist in the specified container registry, you’ll get this error.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make sure that the specified image pull secrets are valid and exist in the namespace. Also make sure that the service account has the necessary permissions to pull the specified image from the registry. You can run the &lt;code&gt;kubectl auth can-i&lt;/code&gt; command to check if a service account has the necessary permissions to perform a specific action. For example, use this command to check if a service account named &lt;code&gt;my-service-account&lt;/code&gt; can pull the NGINX image from the NGINX repository:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl auth can-i pull nginx --as=system:serviceaccount:&amp;lt;namespace&amp;gt;:my-service-account


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Replace &lt;code&gt;&amp;lt;namespace&amp;gt;&lt;/code&gt; with the namespace where the service account is located. If the command returns &lt;code&gt;yes&lt;/code&gt;, the service account has the necessary permissions to pull the NGINX image. If it returns &lt;code&gt;no&lt;/code&gt;, the service account doesn't have the necessary permissions.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;p&gt;Check the Kubernetes logs for more information about the error.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the command &lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt;&lt;/code&gt; to get more details about the pod and check for any error messages.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Pods are stuck in pending or waiting
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-two?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;part 2&lt;/a&gt;, we discussed how to “rightsize” your workloads with requests and limits. But what happens if you don’t rightsize correctly? Your pods’ status might be stuck in pending or waiting—because they aren’t able to be scheduled onto a node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pods are stuck in pending
&lt;/h2&gt;

&lt;p&gt;Look at the Events section of the &lt;code&gt;kubectl describe pod&lt;/code&gt; output for messages that indicate why the pod couldn’t be scheduled.&lt;/p&gt;

&lt;p&gt;Examples include: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The cluster might have insufficient CPU or memory resources. This means you’ll need to delete some pods, add resources on your nodes, or add more nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The pod might be difficult to schedule due to specific resource requirements. See if you can relax some of those requirements to make the pod eligible for scheduling on additional nodes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
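&lt;p&gt;For example, a pod that can’t be scheduled for lack of CPU typically produces an event like this in the &lt;code&gt;kubectl describe pod&lt;/code&gt; output (the exact wording and node counts will vary with your cluster):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12s   default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;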

&lt;h2&gt;
  
  
  Pods are stuck in waiting
&lt;/h2&gt;

&lt;p&gt;If a pod’s status is waiting, it has been scheduled on a node but is unable to run. Run &lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt;&lt;/code&gt; and look in the Events section for reasons the pod can’t run.&lt;/p&gt;

&lt;p&gt;Most often, pods are stuck in waiting status because of an error when fetching the image. Check for these issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ensure the image name in the pod manifest is correct.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure the image is actually available in the repository.&lt;br&gt;
Test manually to see if you can retrieve the image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run a &lt;code&gt;docker pull&lt;/code&gt; command on your local machine to ensure that you have the appropriate permissions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
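&lt;p&gt;For instance, to rule out a typo or a missing tag, you might try pulling the image manually (substitute your own registry, image, and tag):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Log in first if the registry is private
docker login &amp;lt;registry&amp;gt;

# If this fails locally, the pod will fail to fetch the image too
docker pull &amp;lt;registry&amp;gt;/&amp;lt;image&amp;gt;:&amp;lt;tag&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;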

&lt;h2&gt;
  
  
  Kubernetes troubleshooting with New Relic
&lt;/h2&gt;

&lt;p&gt;The troubleshooting process in Kubernetes is complex. Without the right tools, debugging can be stressful, ineffective, and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong—simply because it can. &lt;/p&gt;

&lt;p&gt;You can use New Relic as a single source of truth for all of your observability data. It collects all of your metrics, logs, and traces from every part of your Kubernetes stack, from the applications themselves, to the Kubernetes components, all the way down to the infrastructure metrics of your VMs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;To read this full New Relic article, &lt;a href="https://newrelic.com/blog/how-to-relic/monitoring-kubernetes-part-three?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;click here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Not an existing New Relic user? &lt;a href="https://newrelic.com/signup?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=amer-fy-24-q1-devto-post" rel="noopener noreferrer"&gt;Sign up for a free account&lt;/a&gt; to get started! 👨‍💻&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to optimize Kubernetes resource configurations for cost and performance</title>
      <dc:creator>Daniel Kim</dc:creator>
      <pubDate>Tue, 17 Jan 2023 19:47:44 +0000</pubDate>
      <link>https://forem.com/newrelic/how-to-optimize-kubernetes-resource-configurations-for-cost-and-performance-ea4</link>
      <guid>https://forem.com/newrelic/how-to-optimize-kubernetes-resource-configurations-for-cost-and-performance-ea4</guid>
      <description>&lt;p&gt;Kubernetes, often abbreviated as K8s, automates the mundane operational tasks of managing the containers that make up the necessary software to run an application. With built-in commands for deploying applications, Kubernetes rolls out changes to your applications, scales your applications up and down to fit changing needs, monitors your applications, and more. Kubernetes orchestrates your containers wherever they run, which makes it easier to deploy across multiple cloud environments and migrate between infrastructure platforms. In short, &lt;strong&gt;Kubernetes makes it easier to manage applications.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A properly configured Kubernetes system saves time and money, but configuring your Kubernetes clusters can be difficult. Improper configuration can lead to problems with application availability, performance, or resilience, or to overspending. Here in part two of this Kubernetes guide, you'll learn how to choose appropriate parameter configurations for any cluster you're working with now or in the future. You'll learn about requests and limits, measuring CPU utilization, and how to optimize Kubernetes resource allocation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7o18pmgrts5eu1zie0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7o18pmgrts5eu1zie0z.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Rightsizing your workloads with requests and limits
&lt;/h2&gt;

&lt;p&gt;In an ideal world, your Kubernetes pods would use exactly the amount of resources you requested. But, in the real world, resource usage isn’t predictable. If you have a large application on a node with limited resources, the node might run out of CPU or memory and things can break. And if you’ve been working as an engineer long enough, you know that things breaking in your architecture means frantic messages in the middle of the night and lost revenue for your organization. &lt;/p&gt;

&lt;p&gt;On the flip side, if you allocate too much CPU and memory, those resources remain reserved on the node and go to waste. When utilization is lower than the requested value, the difference is slack cost. When you design and configure a tech stack, the goal is to use the lowest-cost resources that still meet the technical specifications of a specific workload.&lt;/p&gt;

&lt;p&gt;To &lt;em&gt;rightsize&lt;/em&gt; workloads by optimizing the use of resources, it is important to know the historical usage and workload patterns of your system. With this knowledge, you can make informed cost-saving decisions. For instance, let’s say your average CPU utilization is only 40%, and on your highest-traffic day in the last two years CPU utilization spiked to only 60%. Your initially provisioned level of compute is too high! A simple change in configuration can result in large cost savings by reducing underutilized compute resources.&lt;/p&gt;

&lt;p&gt;Applying accurate resource requests and limits to deployments helps prevent both overprovisioning (allocating extra resources, which leads to underutilization and higher cluster costs) and underprovisioning (allocating fewer resources than required, which can lead to errors such as out-of-memory (OOM) events).&lt;/p&gt;

&lt;p&gt;Kubernetes uses requests and limits to control resources like CPU and memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foddjlkimctnk6azyrrcx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foddjlkimctnk6azyrrcx.png" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Requests&lt;/em&gt; are resources a container is guaranteed to get. If a container requests a resource, the Kubernetes scheduler (kube-scheduler) will ensure the container is placed on a node that can accommodate it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Limits&lt;/em&gt; make sure a container never uses more than its allotted maximum.&lt;/p&gt;

&lt;p&gt;You can set requests and limits per container. Each container in the pod can have its own limit and request, but you can also set the values for limits and requests at the pod or namespace level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory allocation and utilization
&lt;/h2&gt;

&lt;p&gt;Memory resources are defined in bytes. You can express memory as a plain integer or a fixed-point number with one of these suffixes: &lt;code&gt;E, P, T, G, M, K, Ei, Pi, Ti, Gi, Mi, Ki&lt;/code&gt;. For example, the following represent approximately the same value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;128974848, 129e6, 129M, 123Mi

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Memory is not a compressible resource and there is no way to throttle memory. If a container goes past its memory limit, it will be killed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory limits and memory utilization per pod
&lt;/h2&gt;

&lt;p&gt;When specified, a memory limit represents the maximum amount of memory a node will allocate to a container. Here are &lt;a href="https://docs.newrelic.com/docs/query-your-data/nrql-new-relic-query-language/get-started/introduction-nrql-new-relics-query-language/" rel="noopener noreferrer"&gt;NRQL&lt;/a&gt; examples of querying memory limits.&lt;/p&gt;

&lt;p&gt;NRQL that targets a New Relic metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT latest(cpuUsedCores/cpuLimitCores) FROM K8sContainerSample FACET podName TIMESERIES SINCE 1 day ago 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;NRQL that targets a Prometheus metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT rate(sum(container_cpu_usage_seconds_total), 1 SECONDS) FROM Metric SINCE 1 MINUTES AGO UNTIL NOW FACET pod TIMESERIES LIMIT 20

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a limit is not provided in the manifest and there is no overall configured default, a pod could use the entirety of a node’s available memory. A node might be oversubscribed: the sum of the limits for all pods running on a node might be greater than that node’s total allocatable memory. This is possible because scheduling is based on requests, which must be below the limits. The node’s kubelet will reduce resource allocation to individual pods that use more than they request, as long as each pod still receives at least the amount it requested.&lt;/p&gt;

&lt;p&gt;Tracking pods’ actual memory usage in relation to their specified limits is particularly important because memory is a non-compressible resource. In other words, if a pod uses more memory than its defined limit, the kubelet can’t throttle its memory allocation, so it terminates the processes running on that pod instead. If this happens, the pod will show a status of OOMKilled.&lt;/p&gt;
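&lt;p&gt;One quick way to confirm that a container was OOM-killed is to inspect its last terminated state; this sketch assumes the container of interest is the first in the pod:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Prints OOMKilled if the container was terminated for exceeding its memory limit
kubectl get pod &amp;lt;pod-name&amp;gt; -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;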

&lt;p&gt;Comparing your pods’ memory usage to their configured limits will alert you to whether they are at risk of being killed because they are out of memory (OOM), as well as whether their limits make sense. If a pod’s limit is too close to its standard memory usage, the pod may get terminated due to an unexpected spike. On the other hand, you may not want to set a pod’s limit significantly higher than its typical usage because that can lead to poor scheduling decisions. &lt;/p&gt;

&lt;p&gt;For example, a pod with a memory request of 1 gibibyte (GiB) and a limit of 4 GiB can be scheduled on a node with 2 GiB of allocatable memory (more than sufficient to meet its request). But if the pod suddenly needs 3 GiB of memory, it will be killed even though it’s well below its memory limit. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc283u226ms6tudmx1u9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc283u226ms6tudmx1u9k.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory requests and allocatable memory per node
&lt;/h2&gt;

&lt;p&gt;Memory requests are the minimum amounts of memory a node’s kubelet will assign to a container. &lt;/p&gt;

&lt;p&gt;If a request is not provided, it will default to whatever the value is for the container’s limit (which, if also not set, could be all memory on the node). Allocatable memory reflects the amount of memory on a node that is available for pods. Specifically, it takes the overall capacity and subtracts memory requirements for OS and Kubernetes system processes to ensure they won’t compete with user pods for resources.&lt;/p&gt;

&lt;p&gt;Although node memory capacity is a static value, its allocatable memory (the amount of compute resources that are available for pods) is not. Maintaining an awareness of the sum of pod memory requests on each node, versus each node’s allocatable memory, is important for capacity planning. These metrics will inform you if your nodes have enough capacity to meet the memory requirements of all current pods and if the kube-scheduler is able to assign new pods to nodes. To learn more about the difference between node allocatable memory and node capacity, see &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/" rel="noopener noreferrer"&gt;Reserve Compute Resources for System Daemons&lt;/a&gt; in the Kubernetes documentation. &lt;/p&gt;
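&lt;p&gt;You can see capacity versus allocatable resources for any node with &lt;code&gt;kubectl describe node&lt;/code&gt;. The output includes sections along these lines (the values here are illustrative and will differ on your cluster):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl describe node &amp;lt;node-name&amp;gt;

Capacity:
  cpu:     4
  memory:  16374356Ki
Allocatable:
  cpu:     3920m
  memory:  15223380Ki
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;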

&lt;p&gt;The kube-scheduler uses several levels of criteria to determine if it can place a pod on a specific node. One of the initial tests is whether a node has enough allocatable memory to satisfy the sum of the requests of all the pods running on that node, plus the new pod. To learn more about the scheduling process criteria, see the &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/#kube-scheduler-implementation" rel="noopener noreferrer"&gt;node selection section&lt;/a&gt; of the Kubernetes scheduler documentation.&lt;/p&gt;

&lt;p&gt;Comparing memory requests to capacity metrics can also help you troubleshoot problems when launching and running the number of pods that you want to run across your cluster. If you notice that your cluster’s count of current pods is significantly less than the number of pods you want, these metrics might show you that your nodes don’t have the resource capacity to host new pods. One straightforward remedy for this issue is to provision more nodes for your cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring CPU utilization
&lt;/h2&gt;

&lt;p&gt;One CPU core is equivalent to 1000m (one thousand millicpu or one thousand millicores). If your container needs one full core to run, specify a value of 1000m or just 1. If your container needs 1⁄4 of a core, specify a value of 250m.&lt;/p&gt;

&lt;p&gt;CPU is a compressible resource. If your container starts hitting your CPU limits, it will be throttled. CPU will be restricted and performance will degrade. But it won’t be killed.&lt;/p&gt;

&lt;p&gt;To get important insight into cluster performance, you’ll need to track two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track the amount of CPU your pods are using compared to their configured requests and limits.&lt;/li&gt;
&lt;li&gt;Track the CPU utilization at the node level. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Much like a pod exceeding its CPU limits, a lack of available CPU at the node level can lead to the node throttling the amount of CPU allocated to each pod.&lt;/p&gt;

&lt;p&gt;Measuring actual utilization compared to requests and limits per pod will help determine whether these are configured appropriately and your pods are requesting enough CPU to run properly. Conversely, consistently higher-than-expected CPU usage might point to problems with the pod that need to be identified and addressed.&lt;/p&gt;

&lt;p&gt;Here's an NRQL query that shows CPU usage as a percentage of CPU requests per node. Try it on your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT
filter(sum(`node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate`), where true) /
filter(sum(kube_pod_container_resource_requests), WHERE (resource = 'cpu') and job = 'kube-state-metrics') * 100 as 'CPU Request Commitment'
FROM Metric FACET node since 1 minute ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s an NRQL query that shows CPU usage as a percentage of CPU limits per pod. Try it on your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT sum(`node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate`) / filter(sum(kube_pod_container_resource_limits), WHERE (resource = 'cpu') and job = 'kube-state-metrics') * 100 as 'CPU Limit Commitment' FROM Metric FACET pod since 1 minute ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to optimize Kubernetes resource allocation
&lt;/h2&gt;

&lt;p&gt;To optimize your resource allocation, you’ll need to define pod specs, resource quotas, and limit range.&lt;/p&gt;

&lt;h3&gt;
  
  
  Define pod specs
&lt;/h3&gt;

&lt;p&gt;Here is a typical pod spec for resources: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc7z03rjxjnh1r1sl04y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxc7z03rjxjnh1r1sl04y.png" alt="Image description" width="559" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each container in the pod can set its own requests and limits, which are all additive. So in this example, the pod has a total request of 64 mebibytes (MiB) of memory and a total limit of 128 MiB. Keep in mind that if you set a CPU request above the core count of your biggest node, your pod will never be scheduled. Unless your application is specifically architected to take advantage of multiple cores, it is generally good to keep your CPU request below 1 and leverage replicas to scale horizontally. &lt;/p&gt;
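&lt;p&gt;For reference, a single-container pod spec along the lines of the screenshot above might look like this (the pod name and image are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;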

&lt;h2&gt;
  
  
  Define resource quotas
&lt;/h2&gt;

&lt;p&gt;Without guardrails, developers can allocate any amount of resources to their applications running on Kubernetes. When several teams share a cluster with a fixed number of nodes, this becomes a problem. Kubernetes allows administrators to set hard limits for resource usage in namespaces with &lt;code&gt;ResourceQuotas&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnw8tpyxl7up23pnd1h0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnw8tpyxl7up23pnd1h0.png" alt="Image description" width="292" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you apply this file to a namespace, you’ll set the following requirements for all the containers of the namespace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The sum of all the CPU requests can’t be higher than 0.5 cores.&lt;/li&gt;
&lt;li&gt;The sum of all the CPU limits can’t be higher than 0.8 cores.&lt;/li&gt;
&lt;li&gt;The sum of all the memory requests can’t be higher than 200 MiB.&lt;/li&gt;
&lt;li&gt;The sum of all the memory limits can’t be higher than 500 MiB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means you could have 50 containers with 4 MiB requests, five containers with 40 MiB requests, or even a single container with a 200 MiB request. &lt;/p&gt;
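&lt;p&gt;A &lt;code&gt;ResourceQuota&lt;/code&gt; manifest matching the values above would look roughly like this (the quota name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-quota
spec:
  hard:
    requests.cpu: 500m
    requests.memory: 200Mi
    limits.cpu: 800m
    limits.memory: 500Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Apply it to a namespace with &lt;code&gt;kubectl apply -f quota.yaml -n &amp;lt;namespace&amp;gt;&lt;/code&gt;.&lt;/p&gt;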

&lt;h3&gt;
  
  
  Define limit range
&lt;/h3&gt;

&lt;p&gt;You can also create a LimitRange for a namespace. Instead of looking at the namespace as a whole, a LimitRange applies to individual containers. &lt;/p&gt;

&lt;p&gt;Here’s an example of what a LimitRange might look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpomdqgl6xx0tgeqx462p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpomdqgl6xx0tgeqx462p.png" alt="Image description" width="474" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;default&lt;/code&gt; section sets the default limits for a container in a pod. If you use the values in the &lt;code&gt;LimitRange&lt;/code&gt;, any containers that don’t set limits themselves will be assigned the default values.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;defaultRequest&lt;/code&gt; section sets the default requests for a container in a pod. If you use the values in the &lt;code&gt;LimitRange&lt;/code&gt;, any containers that don’t set requests themselves will be assigned the default values.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;max&lt;/code&gt; section sets the maximum limits that a container in a pod can set. The &lt;code&gt;default&lt;/code&gt; section and the limits set on a container cannot be higher than this value. One thing to note: if the &lt;code&gt;max&lt;/code&gt; value is set and the &lt;code&gt;default&lt;/code&gt; is not, any containers that don’t set these values themselves will be assigned the &lt;code&gt;max&lt;/code&gt; value as the limit. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;min&lt;/code&gt; section sets the minimum requests that a container in a pod can set. The &lt;code&gt;defaultRequest&lt;/code&gt; section and the requests set on a container cannot be lower than this value. One thing to note: if this value is set and the &lt;code&gt;defaultRequest&lt;/code&gt; is not, the &lt;code&gt;min&lt;/code&gt; value becomes the &lt;code&gt;defaultRequest&lt;/code&gt; value.&lt;/p&gt;
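&lt;p&gt;Putting the four sections together, a &lt;code&gt;LimitRange&lt;/code&gt; manifest has this general shape (the specific values here are illustrative, not taken from the screenshot):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
spec:
  limits:
  - type: Container
    default:            # default limits
      cpu: 500m
      memory: 256Mi
    defaultRequest:     # default requests
      cpu: 250m
      memory: 128Mi
    max:                # highest limit a container may set
      cpu: "1"
      memory: 512Mi
    min:                # lowest request a container may set
      cpu: 100m
      memory: 64Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;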

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Now that you’ve learned the basics of Kubernetes and why it needs monitoring in part one, and taken a deep dive into resource configuration here in part two, you might want to try out a few things on your own.&lt;/p&gt;

&lt;p&gt;A growing number of tools and frameworks are dedicated to helping visualize Kubernetes infrastructure efficiency. Here are two examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubecost provides real-time cost visibility and insights for teams using Kubernetes, helping you continuously reduce your cloud costs.&lt;/li&gt;
&lt;li&gt;OpenCost is a vendor-neutral open source project for measuring and allocating infrastructure and container costs in real time. (New Relic is a founding contributor.)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>beginners</category>
      <category>devops</category>
    </item>
    <item>
      <title>What is Kubernetes and how should you monitor it?</title>
      <dc:creator>Daniel Kim</dc:creator>
      <pubDate>Tue, 20 Dec 2022 19:13:30 +0000</pubDate>
      <link>https://forem.com/newrelic/what-is-kubernetes-and-how-should-you-monitor-it-bld</link>
      <guid>https://forem.com/newrelic/what-is-kubernetes-and-how-should-you-monitor-it-bld</guid>
      <description>&lt;p&gt;In this blog post, you'll learn what Kubernetes is and what components you’ll need for complete observability. It's the first part in a Monitoring Kubernetes series.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Kubernetes?
&lt;/h2&gt;

&lt;p&gt;Kubernetes, often abbreviated as “K8s,” is an open source platform that has established itself as the de facto standard for container orchestration. Usage of Kubernetes has risen globally, particularly in large organizations, with the CNCF in 2021 reporting that there are 5.6 million developers using Kubernetes worldwide, representing 31% of all backend developers.&lt;/p&gt;

&lt;p&gt;As a container orchestration system, it automatically schedules, scales, and maintains the containers that make up the infrastructure of any modern application. The project is the flagship project of the Cloud Native Computing Foundation (CNCF). It’s backed by key players like Google, AWS, Microsoft, IBM, Intel, Cisco, and Red Hat.&lt;/p&gt;

&lt;h2&gt;
  
  
  What can Kubernetes do?
&lt;/h2&gt;

&lt;p&gt;Kubernetes automates the mundane operational tasks of managing the containers that make up the necessary software to run an application. With built-in commands for deploying applications, Kubernetes rolls out changes to your applications, scales your applications up and down to fit changing needs, monitors your applications, and more. Kubernetes orchestrates your containers wherever they run, which facilitates multi-cloud deployments and migrations between infrastructure platforms. In short, Kubernetes makes it easier to manage applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated health checks
&lt;/h3&gt;

&lt;p&gt;Kubernetes continuously runs health checks against your services. For cloud-native apps, this means consistent container management. Using automated health checks, Kubernetes restarts containers that fail or have stalled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated operations
&lt;/h3&gt;

&lt;p&gt;You can automate mundane sysadmin tasks using Kubernetes since it comes with built-in commands that take care of a lot of the labor-intensive aspects of application management. Kubernetes can ensure that your applications are always running as specified in your configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure abstraction
&lt;/h3&gt;

&lt;p&gt;Kubernetes handles the compute, networking, and storage on behalf of your workloads. This allows developers to focus on applications and not worry about the underlying environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Kubernetes changes your monitoring strategy
&lt;/h2&gt;

&lt;p&gt;If you ever meet someone who tells you that Kubernetes is easy to understand, most would agree they’re lying to you!&lt;/p&gt;

&lt;p&gt;Kubernetes requires a new approach to monitoring, especially when you are migrating away from traditional hosts like VMs or on-prem servers. &lt;/p&gt;

&lt;p&gt;Containers may live for only a few minutes at a time, since they are deployed and redeployed as usage demand changes. How can you troubleshoot them if they no longer exist?&lt;/p&gt;

&lt;p&gt;These containers are also spread out across several hosts on physical servers worldwide. It can be hard to connect a failing process to the affected application without the proper context for the metrics you are collecting.&lt;/p&gt;

&lt;p&gt;To monitor a large number of short-lived containers, Kubernetes has built-in tools and APIs that help you understand the performance of your applications. A monitoring strategy that takes advantage of Kubernetes will give you a bird's eye view of your entire application’s performance, even if containers running your applications are continuously moving between hosts or being scaled up and down. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdki4yg25wi4cytmvt6nu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdki4yg25wi4cytmvt6nu.png" alt="Image description" width="501" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Increased monitoring responsibilities
&lt;/h2&gt;

&lt;p&gt;To get full visibility into your stack, you need to monitor your infrastructure. Modern tech stacks have made the relationship between applications and their infrastructure more complicated than in the past. &lt;/p&gt;

&lt;h3&gt;
  
  
  Traditional infrastructure
&lt;/h3&gt;

&lt;p&gt;In a traditional infrastructure environment, you only have two things to monitor: your applications and the hosts (servers or VMs) running them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8xesranmo0jtypyeznr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8xesranmo0jtypyeznr.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The introduction of containers
&lt;/h3&gt;

&lt;p&gt;In 2013, Docker introduced containerization to the world. Containers are used to package and run an application, along with its dependencies, in an isolated, predictable, and repeatable way. This adds a layer of abstraction between your infrastructure and your applications. Containers are similar to traditional hosts, in that they run workloads on behalf of the application. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gnnyoxlqmwiqgzf66wd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gnnyoxlqmwiqgzf66wd.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes
&lt;/h3&gt;

&lt;p&gt;With Kubernetes, full visibility into your stack means collecting telemetry data on containers that are constantly and automatically being spun up and torn down, while also collecting telemetry data on Kubernetes itself. Gone are the days of checking a few lights on the server sitting in your garage!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gnszu8q4a025zkja2nj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gnszu8q4a025zkja2nj.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are four distinct components that need to be monitored in a Kubernetes environment, each with its own specifics and challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure (worker nodes)&lt;/li&gt;
&lt;li&gt;Containers&lt;/li&gt;
&lt;li&gt;Applications &lt;/li&gt;
&lt;li&gt;Kubernetes clusters (control plane)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Correlating application metrics with infrastructure metrics with metadata
&lt;/h2&gt;

&lt;p&gt;While making it easier to build scalable applications, Kubernetes has blurred the lines between application and infrastructure. If you are a developer, your primary focus is on the application and not the cluster's performance, but the cluster's underlying components can have a direct effect on how well your application performs. For example, a bug in a Kubernetes application might be caused by an issue with the physical infrastructure, but it could also result from a configuration mistake or coding problem. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqgt7mvv4c4tgqs6ktsd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqgt7mvv4c4tgqs6ktsd.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;When using Kubernetes, monitoring your application isn’t optional, it’s a necessity! &lt;/p&gt;

&lt;p&gt;Most Application Performance Monitoring (APM) language agents don’t care where an application is running. It could be running on an ancient Linux server in a forgotten rack or on the latest Amazon Elastic Compute Cloud (Amazon EC2) instance. However, when monitoring applications managed by an orchestration layer, infrastructure context is very useful for debugging and troubleshooting: it lets you relate an application error trace, for example, to the container, pod, or host it’s running on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring labels in Kubernetes
&lt;/h2&gt;

&lt;p&gt;Kubernetes automates the creation and deletion of containers with varying lifespans. This entire process needs to be monitored. With so many moving pieces, a clear organization-wide labeling policy needs to be in place in order to match metrics to a corresponding application, pod, namespace, node, etc.&lt;/p&gt;
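&lt;p&gt;Labels live in an object's metadata. As a minimal sketch (the pod and label names here are illustrative), a production pod might be labeled like this:&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: router-worker
  labels:
    name: prod       # environment label, matched by: kubectl get pods -l name=prod
    team: backend    # owning team
    region: emea     # deployment region
spec:
  containers:
    - name: router-worker
      image: example.com/router-worker:latest
```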

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlgnvk1ie7jwb3uvxs83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlgnvk1ie7jwb3uvxs83.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Consistent labeling of objects in your K8s cluster&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;By attaching consistent labels across different objects, you can easily query your Kubernetes cluster for those objects. For example, suppose you get a call from your developers asking if the production environment is down. If the production pods carry a “prod” label, you can run the following kubectl command to list them all:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -l name=prod
NAME                             READY   STATUS         RESTARTS   AGE
router-worker-6db6999875-b8t8m   0/1     ErrImagePull   0          1d4h
router-worker-6db6999875-7fn7z   1/1     Running        0          47s
router-worker-6db6999875-8rl9b   1/1     Running        3          10h
router-worker-6db6999875-c7q2d   1/1     Running        2          11h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, you might spot that one of the prod pods has an issue pulling its image and provide that information to the developers who use it. If you didn’t have labels, you would have to manually grep the output of &lt;code&gt;kubectl get pods&lt;/code&gt;. &lt;/p&gt;
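&lt;p&gt;Label selectors work with most kubectl commands, not just &lt;code&gt;get&lt;/code&gt;, so the same label really can fetch the logs themselves, for example:&lt;/p&gt;

```shell
# Stream logs from every container in every pod carrying the prod label,
# prefixing each line with its pod name
kubectl logs -l name=prod --all-containers --prefix
```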

&lt;h2&gt;
  
  
  Common labeling conventions
&lt;/h2&gt;

&lt;p&gt;In the example above, you saw an instance in which pods are labeled “prod” to identify their use by environment. Every team operates differently, but the following naming conventions are common regardless of the team you work on:&lt;/p&gt;

&lt;h3&gt;
  
  
  Labels by environment
&lt;/h3&gt;

&lt;p&gt;You can label entities by the environment they belong to. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;env: production
env: qa
env: development
env: staging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Labels by team
&lt;/h3&gt;

&lt;p&gt;Creating tags for team names can be helpful to understand which team, group, department, or region was responsible for a change that led to a performance issue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;### Team tags

team: backend
team: frontend
team: db

### Role tags

roles: architecture
roles: devops
roles: pm

### Region tags

region: emea
region: america
region: asia
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Labels by Kubernetes recommended labels
&lt;/h3&gt;

&lt;p&gt;Kubernetes provides a list of recommended labels that allow a baseline grouping of resource objects. The app.kubernetes.io prefix distinguishes between the labels recommended by Kubernetes and the custom labels that you may separately add using a company.com prefix. Some of the most popular recommended Kubernetes labels are listed below.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app.kubernetes.io/name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Name of the application (such as &lt;code&gt;redis&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app.kubernetes.io/instance&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Unique name for this specific instance of the application (such as &lt;code&gt;redis-department-a&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app.kubernetes.io/component&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A descriptive identifier of what the component is for (such as &lt;code&gt;login-cache&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app.kubernetes.io/part-of&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The higher-level application using this resource (such as &lt;code&gt;company-auth&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
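&lt;p&gt;These recommended labels slot straight into any object's metadata alongside your custom ones. A minimal sketch, with illustrative names:&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-department-a
  labels:
    app.kubernetes.io/name: redis
    app.kubernetes.io/instance: redis-department-a
    app.kubernetes.io/component: login-cache
    app.kubernetes.io/part-of: company-auth
    # Custom labels keep their own prefix so they never clash with the recommended set
    company.com/env: production
```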

&lt;p&gt;With all of your Kubernetes objects labeled, you can query your observability data to get a bird’s eye view of your infrastructure and applications. You can examine every layer in your stack by filtering your metrics. And, you can drill into more granular details to find the root cause of an issue.&lt;/p&gt;

&lt;p&gt;Therefore, having a clear, standardized strategy for creating easy-to-understand labels and selectors should be an important part of your monitoring and alerting strategy for Kubernetes. Ultimately, health and performance metrics can only be aggregated by labels that you set. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So far, we’ve covered what Kubernetes is, what it can do, why it requires monitoring, and best practices on how to set up proper Kubernetes monitoring. &lt;/p&gt;

&lt;p&gt;In part two of this multi-part series, we’ll take a deep dive into Kubernetes architecture.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;To read this full blog post from New Relic, click &lt;a href="https://newrelic.com/blog/best-practices/monitoring-kubernetes-part-one?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=global-fy23-q3-devto_k8" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/em&gt; 📚&lt;/p&gt;

</description>
      <category>ai</category>
      <category>watercooler</category>
    </item>
    <item>
      <title>Instrumenting Your Node.js Apps with OpenTelemetry</title>
      <dc:creator>Daniel Kim</dc:creator>
      <pubDate>Thu, 24 Jun 2021 16:21:26 +0000</pubDate>
      <link>https://forem.com/newrelic/instrumenting-your-node-js-apps-with-opentelemetry-5flb</link>
      <guid>https://forem.com/newrelic/instrumenting-your-node-js-apps-with-opentelemetry-5flb</guid>
      <description>&lt;p&gt;When I first joined New Relic, I really didn't understand the importance of observability, because I came from a frontend background. As I began learning about why observability was valuable for developers, I started digging deeper into the open source ecosystem and learning what makes it possible for modern apps to maintain uptime. I learned more about OpenTelemetry, a popular open source tool for monitoring your apps and websites, but it was intimidating because I couldn't find any introductory tutorials online guiding me through the process of instrumentation.&lt;/p&gt;

&lt;p&gt;It wasn't until I began instrumenting my own apps using the &lt;a href="https://opentelemetry.io/docs/" rel="noopener noreferrer"&gt;OpenTelemetry documentation&lt;/a&gt; that I realized how easy it was to get started. I collaborated with &lt;a href="https://freecodecamp.org" rel="noopener noreferrer"&gt;freeCodeCamp.org&lt;/a&gt; to create a beginner-friendly resource for anyone to begin instrumenting apps with OpenTelemetry. I worked with an amazing technical content creator named Ania Kubów to bring this one-hour video course to life. The course teaches you how to use OpenTelemetry and covers related concepts like microservices, observability, and tracing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instrumenting your Node.js apps with OpenTelemetry
&lt;/h2&gt;

&lt;p&gt;As systems become increasingly complex, it’s ever more important to get visibility into their inner workings to improve performance and reliability. Distributed tracing shows how each request passes through the application, giving developers the context to resolve incidents by showing which parts of their system are slow or broken. &lt;/p&gt;

&lt;p&gt;A single trace shows the path a request makes, from the browser or mobile device down to the database. By looking at traces as a whole, developers can quickly discover which parts of their application have the biggest impact on performance and on their users’ experiences.&lt;/p&gt;

&lt;p&gt;That’s pretty abstract, right? So let’s zero in on a specific example to help clarify things. We’ll use OpenTelemetry to generate and view traces from a small sample application.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/r8UvWSX3KA8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Spinning up our Movies App
&lt;/h2&gt;

&lt;p&gt;We have written a simple application consisting of two microservices, movies and dashboard. The &lt;code&gt;movies&lt;/code&gt; service provides the names of movies and their genres in JSON format, while the &lt;code&gt;dashboard&lt;/code&gt; service returns the results from the movies service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/lazyplatypus/Open-Telemetry-Demo" rel="noopener noreferrer"&gt;👉 Clone the repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To spin up the app, run&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ npm i
$ node dashboard.js
$ node movies.js


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Notice the variable &lt;code&gt;delay&lt;/code&gt; built into the &lt;code&gt;movies&lt;/code&gt; microservice, which causes a random delay before returning the JSON.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="p"&gt;const express = require('express')
const app = express()
const port = 3000
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="p"&gt;app.get('/movies', async function (req, res) {
&lt;/span&gt;   res.type('json')
&lt;span class="gi"&gt;+  var delay = Math.floor( ( Math.random() * 2000 ) + 100);
+  setTimeout((() =&amp;gt; {
&lt;/span&gt;      res.send(({movies: [
         { name: 'Jaws', genre: 'Thriller'},
         { name: 'Annie', genre: 'Family'},
         { name: 'Jurassic Park', genre: 'Action'},
      ]}))
&lt;span class="gi"&gt;+  }), delay)
&lt;/span&gt;})
&lt;span class="err"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
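&lt;p&gt;The delay expression always lands between 100 and 2099 milliseconds: &lt;code&gt;Math.random()&lt;/code&gt; returns a value in [0, 1), so the scaled value is in [0, 2000), and flooring after adding 100 yields an integer in [100, 2099]. A quick standalone check:&lt;/p&gt;

```javascript
// Same expression the movies service uses for its artificial latency
function randomDelay() {
  return Math.floor((Math.random() * 2000) + 100);
}

// Sample it many times and verify the bounds hold
const samples = Array.from({ length: 5000 }, randomDelay);
const min = Math.min(...samples);
const max = Math.max(...samples);

console.log(`min=${min} max=${max}`); // min is always >= 100, max always <= 2099
if (min < 100 || max > 2099) throw new Error("delay out of expected range");
```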
&lt;h2&gt;
  
  
  Tracing HTTP Requests with OpenTelemetry
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry traces incoming and outgoing HTTP requests by attaching trace IDs to them. To do this, we need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instantiate a trace provider&lt;/strong&gt; to get data flowing. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure that trace provider with an exporter&lt;/strong&gt; to send telemetry data to another system where you can view, store, and analyze it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install OpenTelemetry plugins&lt;/strong&gt; to automatically instrument specific Node.js modules and frameworks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need to have Docker on your machine to run a Zipkin instance. If you don't have Docker yet, it's easy to install. As for Zipkin, it's an open-source distributed tracing system created by Twitter that helps gather timing data needed to troubleshoot latency problems in service architectures. The OpenZipkin volunteer organization currently runs it. Finally, if you want to export your OpenTelemetry data to New Relic in Step 4,  sign up to analyze, store, and use your telemetry data for free, forever.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Create our trace provider and configure it with an exporter
&lt;/h3&gt;

&lt;p&gt;To create a trace provider, you need to install the following package:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ npm install @opentelemetry/node


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  OpenTelemetry auto-instrumentation package for Node.js
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;@opentelemetry/node&lt;/code&gt; module provides auto-instrumentation for Node.js applications; it automatically identifies frameworks (Express), common protocols (HTTP), databases, and other libraries within your application. The module relies on community-contributed plugins to instrument your application so that it produces spans and provides end-to-end tracing with just a few lines of code.&lt;/p&gt;
&lt;h4&gt;
  
  
  OpenTelemetry Plugins
&lt;/h4&gt;

&lt;p&gt;Install the plugins:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ npm install @opentelemetry/plugin-http
$ npm install @opentelemetry/plugin-express


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When Node.js’s HTTP module handles any API requests, the &lt;code&gt;@opentelemetry/plugin-http&lt;/code&gt; plugin generates trace data. The &lt;code&gt;@opentelemetry/plugin-express&lt;/code&gt; plugin generates trace data from requests sent through the Express framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Adding the Trace Provider and the Span Processor
&lt;/h3&gt;

&lt;p&gt;After tracers are added to an application, they record timing and metadata about the operations that take place (for example, a web server records exactly when it receives a request and when it sends a response).&lt;/p&gt;

&lt;p&gt;Add this code snippet to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;create a trace provider&lt;/li&gt;
&lt;li&gt;add a span processor to the trace provider&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This code gets data from your local application and prints it to the terminal:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NodeTracerProvider&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ConsoleSpanExporter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;SimpleSpanProcessor&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/tracing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NodeTracerProvider&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;consoleExporter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ConsoleSpanExporter&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;spanProcessor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SimpleSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;consoleExporter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spanProcessor&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you want to learn more about this code, check out the &lt;a href="https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#tracer" rel="noopener noreferrer"&gt;OpenTelemetry docs on tracers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once we add this code snippet, whenever we reload &lt;code&gt;http://localhost:3001/dashboard&lt;/code&gt;, we should see something like this: beautiful span data in the terminal. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffklulsur2tviyd32wlnf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffklulsur2tviyd32wlnf.gif" alt="giphy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Use Docker to install Zipkin and start tracing your application
&lt;/h3&gt;

&lt;p&gt;You instrumented OpenTelemetry in the previous step. Now you move the data that you collected to a running Zipkin instance.&lt;/p&gt;

&lt;p&gt;Let's spin up a Zipkin instance with the &lt;a href="https://hub.docker.com/r/openzipkin/zipkin/" rel="noopener noreferrer"&gt;Docker Hub image&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ docker run -d -p 9411:9411 openzipkin/zipkin


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;and you’ll have a Zipkin instance up and running. You’ll be able to load it by pointing your web browser to &lt;a href="http://localhost:9411" rel="noopener noreferrer"&gt;http://localhost:9411&lt;/a&gt;. You’ll see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs7db7lo5gck3i2ty12e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs7db7lo5gck3i2ty12e.png" alt="Screenshot of Zipkin"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Exporting to Zipkin
&lt;/h4&gt;

&lt;p&gt;Although neat, spans in a terminal window are a poor way to gain visibility into a service. You’re not going to want to scroll through JSON data in your terminal. Instead, it’s a lot easier to see a visualization in a dashboard. Let's work on that now. In the previous step, we added a console exporter to the system. Now you ship this data to Zipkin.&lt;/p&gt;

&lt;p&gt;In this code snippet, we are instantiating a Zipkin exporter, and then adding it to the trace provider. &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="p"&gt;const { NodeTracerProvider } = require('@opentelemetry/node')
const { ConsoleSpanExporter, SimpleSpanProcessor } = require('@opentelemetry/tracing')
&lt;/span&gt;&lt;span class="gi"&gt;+ const { ZipkinExporter } = require('@opentelemetry/exporter-zipkin')
&lt;/span&gt;&lt;span class="p"&gt;const provider = new NodeTracerProvider()
const consoleExporter = new ConsoleSpanExporter()
const spanProcessor = new SimpleSpanProcessor(consoleExporter)
provider.addSpanProcessor(spanProcessor)
provider.register()
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gi"&gt;+ const zipkinExporter = new ZipkinExporter({
+  url: 'http://localhost:9411/api/v2/spans',
+  serviceName: 'movies-service'
&lt;/span&gt;})
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gi"&gt;+ const zipkinProcessor = new SimpleSpanProcessor(zipkinExporter)
+ provider.addSpanProcessor(zipkinProcessor)
&lt;/span&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After you make these changes, start the application back up, request some URLs, and then visit our Zipkin instance at &lt;code&gt;localhost:9411&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscuaqg2k6y3zwwzospt4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscuaqg2k6y3zwwzospt4.png" alt="Screen Shot 2021-06-23 at 3.54.32 PM"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Using the OpenTelemetry Collector to export the data into New Relic
&lt;/h3&gt;

&lt;p&gt;What happens if you want to send your OpenTelemetry data to another backend, so that you don't have to manage all of your own telemetry data? &lt;/p&gt;

&lt;p&gt;Well, the amazing contributors to OpenTelemetry have come up with a solution for this!  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2o98se1kb01bsjni2j6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2o98se1kb01bsjni2j6.png" alt="Group 1792"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The OpenTelemetry Collector is a way for developers to receive, process, and export telemetry data to multiple backends. This collector acts as the intermediary, getting data from the instrumentation and sending it to multiple backends to store, process, and analyze the data.&lt;/p&gt;

&lt;p&gt;It supports multiple open source observability data formats, like Zipkin, Jaeger, Prometheus, and Fluent Bit, and can send data to one or more open source or commercial backends.&lt;/p&gt;
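&lt;p&gt;To make this concrete, here is a minimal sketch of a collector configuration that receives spans in Zipkin format and forwards them to New Relic. The &lt;code&gt;newrelic&lt;/code&gt; exporter and its &lt;code&gt;apikey&lt;/code&gt; field are from the collector-contrib distribution of that era; treat the exact names as assumptions and check the repo below for the working config.&lt;/p&gt;

```yaml
# Minimal OpenTelemetry Collector config (sketch)
receivers:
  zipkin:
    endpoint: 0.0.0.0:9411   # same port the apps already report spans to

exporters:
  newrelic:
    apikey: ${NEW_RELIC_API_KEY}

service:
  pipelines:
    traces:
      receivers: [zipkin]
      exporters: [newrelic]
```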

&lt;h4&gt;
  
  
  New Relic
&lt;/h4&gt;

&lt;p&gt;New Relic is a platform for you to analyze, store, and use your telemetry data for Free, forever. &lt;a href="https://newrelic.com/signup?utm_campaign=fy20-q1-amer-obsv-video-free_code_camp-video-&amp;amp;utm_medium=video&amp;amp;utm_source=free_code_camp&amp;amp;utm_content=video&amp;amp;fiscal_year=fy20&amp;amp;quarter=q1&amp;amp;program=obsv&amp;amp;audience=none&amp;amp;creative=none&amp;amp;placement=none&amp;amp;targeting=none&amp;amp;ad_type=none&amp;amp;geo=amer" rel="noopener noreferrer"&gt;Sign up now!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F692fiew3ync926gcfuw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F692fiew3ync926gcfuw1.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Configuring the OpenTelemetry Collector with New Relic
&lt;/h4&gt;

&lt;p&gt;Clone the &lt;a href="https://github.com/lazyplatypus/OpenTelemetry-NR-Exporter" rel="noopener noreferrer"&gt;OpenTelemetry Collector with New Relic Exporter&lt;/a&gt; and spin up the Docker container, making sure to export the New Relic API key. &lt;/p&gt;

&lt;p&gt;To get a key, go to the New Relic One dashboard and choose API keys from the dropdown menu in the upper right.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmo09xbtn40wowmno4rp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmo09xbtn40wowmno4rp.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, from the API keys window, click the Create a key button.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h6l8j2bvt0tfmzemw2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h6l8j2bvt0tfmzemw2a.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When creating the key, make sure you choose the &lt;strong&gt;Ingest - License&lt;/strong&gt; key type. Then click Create a key to generate the key.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthkhp0cqaour3deyab71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthkhp0cqaour3deyab71.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After you have an API key, you need to replace &lt;code&gt;&amp;lt;INSERT-API-KEY-HERE&amp;gt;&lt;/code&gt; in the code snippet below with your API key.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

export NEW_RELIC_API_KEY=&amp;lt;INSERT-API-KEY-HERE&amp;gt;
docker-compose -f docker-compose.yaml up


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Make sure to change the reporting URL from &lt;code&gt;http://localhost:9411/api/v2/spans&lt;/code&gt; to &lt;code&gt;http://localhost:9411/&lt;/code&gt; in both &lt;code&gt;dashboard.js&lt;/code&gt; and &lt;code&gt;movies.js&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="p"&gt;const zipkinExporter = new ZipkinExporter({&lt;br&gt;
&lt;/span&gt;&lt;span class="gd"&gt;- url: '&lt;a href="http://localhost:9411/api/v2/spans" rel="noopener noreferrer"&gt;http://localhost:9411/api/v2/spans&lt;/a&gt;',&lt;br&gt;
&lt;/span&gt;&lt;span class="gi"&gt;+ url: '&lt;a href="http://localhost:9411" rel="noopener noreferrer"&gt;http://localhost:9411&lt;/a&gt;',&lt;br&gt;
&lt;/span&gt;  serviceName: 'movies-service'&lt;br&gt;
})&lt;br&gt;
&lt;span class="err"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Step 5: Look at your ✨ beautiful data ✨
&lt;/h3&gt;

&lt;p&gt;Navigate to the "Explorer" tab on &lt;a href="https://one.newrelic.com" rel="noopener noreferrer"&gt;New Relic One&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim53j8pdag4sl034us77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim53j8pdag4sl034us77.png" alt="New Relic One Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you click on the service, you should be able to see some ✨beautiful✨ traces!&lt;/p&gt;

&lt;p&gt;The trace in the dashboard is transmitting data about the random delay that was added to the API calls:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffati4q1acf3pqn675yhz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffati4q1acf3pqn675yhz.png" alt="OTel Traces"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Instrumenting your app with OpenTelemetry makes it easy to figure out what is going wrong when parts of your application are slow, broken, or both. With the collector, you can forward your data anywhere, so you are never locked into a vendor. You can choose to spin up an open source backend, use a proprietary backend like New Relic, or just roll your own backend! Whatever you choose, I wish you well in your journey to instrument EVERYTHING! &lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;You can try out New Relic One with OpenTelemetry by signing up for our always free tier today.&lt;/p&gt;

&lt;p&gt;To learn more about OpenTelemetry, look for our upcoming Understand OpenTelemetry blog series.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>opentelemetry</category>
      <category>observability</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
