<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Netdata</title>
    <description>The latest articles on Forem by Netdata (@netdata).</description>
    <link>https://forem.com/netdata</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F3293%2F0fc83944-0e3d-438e-bc6f-7d187e7562d7.png</url>
      <title>Forem: Netdata</title>
      <link>https://forem.com/netdata</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/netdata"/>
    <language>en</language>
    <item>
      <title>How to extend the Geth-Netdata integration</title>
      <dc:creator>Odysseas Lamtzidis</dc:creator>
      <pubDate>Mon, 02 Aug 2021 15:28:49 +0000</pubDate>
      <link>https://forem.com/netdata/how-to-extend-the-geth-netdata-integration-4o68</link>
      <guid>https://forem.com/netdata/how-to-extend-the-geth-netdata-integration-4o68</guid>
      <description>&lt;h1&gt;
  
  
  How to extend the Geth collector
&lt;/h1&gt;

&lt;p&gt;This is the second and last post in a two-part series on Netdata and Geth. If you missed the first, be sure to check it out &lt;a href=""&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Geth is short for Go-Ethereum; it is the official Go implementation of the Ethereum client. It is currently one of the most widely used implementations and a core piece of infrastructure for the Ethereum ecosystem. &lt;/p&gt;

&lt;p&gt;With this proof of concept, I wanted to showcase how easy it really is to gather data from any Prometheus endpoint and visualize it in Netdata. This has the added benefit of leveraging all the other features of Netdata, namely its per-second data collection, automatic deployment and configuration, and superb system monitoring. &lt;/p&gt;

&lt;p&gt;The most challenging aspect is making sense of the metrics and organizing them into meaningful charts. In other words, it takes expertise to understand what each metric means and whether it makes sense to surface it for the user. &lt;/p&gt;

&lt;p&gt;Note that different metrics matter to different users. We want to surface &lt;strong&gt;all metrics that make sense&lt;/strong&gt;. When developing an application, you need much lower-level metrics (e.g. &lt;a href="https://containerjournal.com/topics/container-management/using-ebpf-monitoring-to-know-what-to-measure-and-why/"&gt;eBPF&lt;/a&gt;) than when operating it.&lt;/p&gt;

&lt;p&gt;Let's get down to it. &lt;/p&gt;

&lt;h3&gt;
  
  
  A note on collectors
&lt;/h3&gt;

&lt;p&gt;First, let's do a very brief intro to what a collector is. &lt;/p&gt;

&lt;p&gt;In Netdata, every collector is composed of a plugin and a module. The plugin is an orchestrator process responsible for running jobs; each job is an instance of a module. &lt;/p&gt;

&lt;p&gt;When we are "creating" a collector, in essence we select a plugin and we develop a module for that plugin. &lt;/p&gt;

&lt;p&gt;For Geth, since we are using the Prometheus endpoint, it's easiest to use our Go plugin (go.d.plugin), as it has internal libraries for gathering data from Prometheus endpoints. &lt;/p&gt;

&lt;p&gt;The following image illustrates the plugin/module architecture:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PuFSqLHQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://aws1.discourse-cdn.com/business5/uploads/netdata2/original/1X/3cc1ef3cb489e7d3146d73bedefb812e49631cc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PuFSqLHQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://aws1.discourse-cdn.com/business5/uploads/netdata2/original/1X/3cc1ef3cb489e7d3146d73bedefb812e49631cc3.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to dive into the Netdata Collector framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://community.netdata.cloud/docs?topic=1189"&gt;FAQ: What are collectors and how do they work?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.netdata.cloud/docs/agent/collectors/plugins.d"&gt;External plugins overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Geth collector structure
&lt;/h3&gt;

&lt;p&gt;So, in essence, the Geth collector is the Geth module of go.d.plugin.&lt;/p&gt;

&lt;p&gt;As you can see on &lt;a href="https://github.com/netdata/go.d.plugin/tree/master/modules/geth"&gt;GitHub&lt;/a&gt;, the module is composed of four files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;charts.go&lt;/code&gt;: Chart definitions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;collect.go&lt;/code&gt;: Actual data collection, using the metric variables defined in &lt;code&gt;metrics.go&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;geth.go&lt;/code&gt;: Main structure, mostly boilerplate. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;metrics.go&lt;/code&gt;: Maps metric variables to the corresponding Prometheus metric names&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  How to extend the Geth collector with a new metric
&lt;/h3&gt;

&lt;p&gt;It's very simple, really. &lt;/p&gt;

&lt;p&gt;Open your Prometheus endpoint and find the metrics that you want to visualize with Netdata. &lt;/p&gt;

&lt;p&gt;e.g. &lt;code&gt;p2p_ingress_eth_65_0x08&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;metrics.go&lt;/code&gt; and define a new variable:&lt;/p&gt;

&lt;p&gt;e.g. &lt;code&gt;const p2pIngressEth650x08 = "p2p_ingress_eth_65_0x08"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;collect.go&lt;/code&gt; and create a new function, modeled on the ones that already exist. Although it doesn't make much difference in our case, we strive to organize the metrics into sensible functions (e.g. gathering all &lt;code&gt;p2pEth65&lt;/code&gt; metrics in one function). This is the function where we perform any computation on the raw values we gather. &lt;/p&gt;

&lt;p&gt;Note that Netdata automatically takes care of units such as &lt;code&gt;bytes&lt;/code&gt; and shows the most human-readable unit on the dashboard (e.g. MB, GB, etc.).&lt;/p&gt;

&lt;p&gt;e.g.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (v *Geth) collectP2pEth65(mx map[string]float64, pms prometheus.Metrics) {
    pms = pms.FindByNames(
        p2pIngressEth650x08
    )
    v.collectEth(mx, pms)
    mx[p2pIngressEth650x08] = mx[p2pIngressEth650x08] + 1234

}

func (v *Geth) collectEth(mx map[string]float64, pms prometheus.Metrics) {
    for _, pm := range pms {
        mx[pm.Name()] += pm.Value
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also need to add the function to the central function that the module calls at the defined interval.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Geth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;collectGeth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pms&lt;/span&gt; &lt;span class="n"&gt;prometheus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collectChainData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collectP2P&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collectTxPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collectRpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collectP2pEth65&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mx&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lastly, now that we have the value inside the module, we need to create the chart for that value. We do that in &lt;code&gt;charts.go&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;chartReorgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chart&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"reorgs_executed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Executed Reorgs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Units&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"reorgs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Fam&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;"reorgs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Ctx&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;"geth.reorgs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Dims&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dims&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reorgsExecuted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"executed"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;chartReorgsBlocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chart&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"reorgs_blocks"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Blocks Added/Removed from Reorg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Units&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"blocks"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Fam&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;"reorgs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Ctx&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;"geth.reorgs_blocks"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;Line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;Dims&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dims&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reorgsAdd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"added"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Algorithm&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"absolute"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reorgsDropped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"dropped"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's explain the fields of the structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ID&lt;/code&gt;: The unique identifier for the chart.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Title&lt;/code&gt;: A human-readable title for the front-end.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Units&lt;/code&gt;: The units for the dimension. Notice that Netdata can automatically scale certain units, so that the raw collector value stays in &lt;code&gt;bytes&lt;/code&gt; but the user sees &lt;code&gt;Megabytes&lt;/code&gt; on the dashboard. You can find a list of supported "automatically scaled" units on this &lt;a href="https://github.com/netdata/dashboard/blob/068bbbb975db7871920406be56af5a641c79a08e/src/utils/units-conversion.ts"&gt;file&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Fam&lt;/code&gt;: The submenu title, used to group multiple charts together.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Ctx&lt;/code&gt;: The context of the chart, an identifier much like the ID. Use the convention &lt;code&gt;&amp;lt;collector_name&amp;gt;.&amp;lt;chart_id&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Type&lt;/code&gt;: &lt;code&gt;Line&lt;/code&gt; (default), &lt;code&gt;Area&lt;/code&gt;, or &lt;code&gt;Stacked&lt;/code&gt;. &lt;code&gt;Area&lt;/code&gt; is best used with dimensions that signify "bandwidth". Use &lt;code&gt;Stacked&lt;/code&gt; when it makes sense to visually observe the &lt;code&gt;sum&lt;/code&gt; of the dimensions (e.g. the &lt;code&gt;system.ram&lt;/code&gt; chart is stacked).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Dims&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ID&lt;/code&gt;: The variable name for that dimension.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Name&lt;/code&gt;: A human-readable name for the dimension.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Algorithm&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;absolute&lt;/code&gt;: The default (if omitted). Netdata shows the value exactly as it gets it from the collector. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;incremental&lt;/code&gt;: Netdata shows the per-second rate of the value. It automatically takes the delta between two data collections and converts it to a per-second value. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;percentage&lt;/code&gt;: Netdata shows the percentage of the dimension in relation to the &lt;code&gt;sum&lt;/code&gt; of all the dimensions of the chart. If four dimensions each have value &lt;code&gt;1&lt;/code&gt;, each shows as &lt;code&gt;25%&lt;/code&gt;. &lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Mul&lt;/code&gt;: Multiply the collected value by this integer.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Div&lt;/code&gt;: Divide the collected value by this integer.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A final note on extending Geth
&lt;/h3&gt;

&lt;p&gt;The Prometheus endpoint is not the only way to monitor Geth, but it is the simplest. &lt;/p&gt;

&lt;p&gt;If you feel adventurous, you can try to implement a collector that also uses Geth's RPC endpoint to pull data (e.g. showing charts about specific contracts in real time), or even Geth's logs. &lt;/p&gt;

&lt;p&gt;To use Geth's RPC endpoint with Golang, take a look at &lt;a href="https://geth.ethereum.org/docs/dapp/native"&gt;Geth's documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To monitor Geth's logs, you can use our &lt;a href="https://github.com/netdata/go.d.plugin/tree/ec9980149c3d32e4a90912826edd344dfb0413ac/modules/weblog"&gt;weblog collector&lt;/a&gt; as a template. It monitors Apache and NGINX servers by parsing their logs. &lt;/p&gt;

&lt;h3&gt;
  
  
  Add alerts to Geth charts
&lt;/h3&gt;

&lt;p&gt;Now that we have defined the new charts, we may want to define alerts for them. The full alert syntax is out-of-scope for this tutorial, but it shouldn't be difficult once you get the hang of it. &lt;/p&gt;

&lt;p&gt;For example, here is a simple alarm that tells me if Geth is synced or not, based on whether &lt;code&gt;header&lt;/code&gt; and &lt;code&gt;block&lt;/code&gt; values are the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  1 #chainhead_header is expected momenterarily to be ahead. If its considerably ahead (e.g more than 5 blocks), then the node is definetely out of sync.
  2  template: geth_chainhead_diff_between_header_block
  3        on: geth.chainhead
  4     class: Workload
  5      type: ethereum_node
  6 component: geth
  7     every: 10s
  8      calc: $chain_head_block -  $chain_head_header
  9     units: blocks
 10      warn: $this != 0
 11      crit: $this &amp;gt; 5
 12     delay: up 5s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;You can read the above example as follows:&lt;/strong&gt;&lt;br&gt;
On the charts that have the context &lt;code&gt;geth.chainhead&lt;/code&gt; (thus all the Geth nodes that we may monitor with a single Netdata Agent), every 10s, calculate the difference between the dimensions &lt;code&gt;chain_head_header&lt;/code&gt; and &lt;code&gt;chain_head_block&lt;/code&gt;. If it's not 0, raise a &lt;code&gt;warn&lt;/code&gt; alert. If it's more than 5, escalate to &lt;code&gt;critical&lt;/code&gt;.  &lt;/p&gt;


&lt;p&gt;Note that if you create an alert and it works for you, a great idea is to submit a PR to the main &lt;code&gt;netdata/netdata&lt;/code&gt; &lt;a href="https://github.com/netdata/netdata"&gt;repository&lt;/a&gt;. That way, the alert definition will ship with every Netdata installation, and you will help countless other Geth users. &lt;/p&gt;

&lt;p&gt;Here are some useful resources to create new alerts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=aWYj9VT8I5A"&gt;Youtube - Creating your first health alarm in Netdata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.netdata.cloud/docs/monitor/configure-alarms"&gt;Docs - Configure health alert
&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.netdata.cloud/docs/agent/health/reference"&gt;Docs - alert configuration reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.netdata.cloud/docs/monitor/enable-notifications"&gt;Docs - Enable alert notifications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Extend Geth collector for other clients
&lt;/h2&gt;

&lt;p&gt;The beauty of this solution is that it's &lt;strong&gt;trivial&lt;/strong&gt; to duplicate the collector and gather metrics from all Ethereum clients that support the Prometheus endpoint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.nethermind.io/nethermind/ethereum-client/metrics/setting-up-local-metrics-infrastracture"&gt;Nethermind&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://besu.hyperledger.org/en/stable/HowTo/Monitor/Metrics/"&gt;Besu&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ledgerwatch/erigon"&gt;Erigon&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only difference between a Geth collector and a &lt;a href="https://nethermind.io/client"&gt;Nethermind&lt;/a&gt; collector is that they might expose different metrics, or the same metrics under different Prometheus metric names. So we just need to change the Prometheus metric names in the &lt;code&gt;metrics.go&lt;/code&gt; source file and propagate any changes to the other source files. &lt;/p&gt;

&lt;p&gt;The logic that I described above stays exactly the same. &lt;/p&gt;

&lt;h2&gt;
  
  
  In conclusion
&lt;/h2&gt;

&lt;p&gt;Extending Geth for more metrics is trivial. &lt;/p&gt;

&lt;p&gt;As you may suspect, this guide applies to any data source that exposes its metrics in the Prometheus format. &lt;/p&gt;

</description>
      <category>ethereum</category>
      <category>go</category>
      <category>devops</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Introduction to StatsD</title>
      <dc:creator>Odysseas Lamtzidis</dc:creator>
      <pubDate>Mon, 15 Feb 2021 14:00:14 +0000</pubDate>
      <link>https://forem.com/netdata/introduction-to-statsd-1ci9</link>
      <guid>https://forem.com/netdata/introduction-to-statsd-1ci9</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9Qgh66Uk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5160ztstkjwl8ng46eoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9Qgh66Uk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5160ztstkjwl8ng46eoj.png" alt="Intro image"&gt;&lt;/a&gt;StatsD is an industry-standard technology stack for monitoring applications and instrumenting any piece of software to deliver custom metrics. The StatsD architecture is based on delivering the metrics via UDP packets from any application to a central statsD server. Although the original StatsD server was written in Node.js, there are many implementations today, with Netdata being one of them.&lt;/p&gt;

&lt;p&gt;StatsD makes it easier for you to instrument your applications, delivering value around three main pillars: open-source, control, and modularity. That’s a real windfall for full-stack developers who need to code quickly, troubleshoot application issues on the fly, and often don’t have the necessary background knowledge to use complex monitoring platforms.&lt;/p&gt;

&lt;p&gt;First and foremost, StatsD is an open-source standard, meaning that vendor lock-in is simply not possible. With most of the monitoring solutions offering a StatsD server, you know that your instrumentation will play nicely with any solution you might want to use in the future.&lt;/p&gt;

&lt;p&gt;The second is that you have absolute control over the data you send, since the StatsD server just listens for metrics. You can choose how, when, or why to send data from any application you build, whether it’s in aggregate or as highly cardinal data points. You also don’t need to spend any time configuring the StatsD server, since it will accept any metrics in any form you choose via your instrumentation.&lt;/p&gt;

&lt;p&gt;Finally, there is a complete decoupling of each component of the stack. The client doesn’t care about the implementation of the server, and the server is agnostic about the backend. You can mix and match any combination of client, server, and backend that works best for you, or migrate between them as your needs change.&lt;/p&gt;

&lt;p&gt;Historically, it has always been easier to measure and collect metrics about systems and networks than about applications. In 2011, Erik Kastner developed StatsD while working at Etsy, to collect metrics from instrumented code. The original implementation, in Node.js, listened on a UDP port for incoming metrics data, extracted it, and periodically sent batches of metrics to Graphite. Since then, countless applications have implemented StatsD and can be configured to send their metrics to any StatsD server, while the number of available libraries makes it trivial to use the protocol in any language.&lt;/p&gt;

&lt;h1&gt;
  
  
  How does StatsD work?
&lt;/h1&gt;

&lt;p&gt;The architecture of StatsD is divided into 3 main pieces: client, server, and backend. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;client&lt;/strong&gt; is what creates and delivers metrics. In most cases, this is a StatsD library, added to your application, that pushes metrics at specific points where you add the relevant code.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;server&lt;/strong&gt; is a daemon process responsible for listening for metric data as it’s pushed from the client, batching them, and sending them to the backend.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;backend&lt;/strong&gt;, which is where metrics data is stored for analysis and visualization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;StatsD typically uses UDP because the client and server usually reside on the same host, where packet loss is minimal and you get maximum throughput with the least overhead. TCP is also an option when the client and server reside on different hosts and deliverability of metrics is a primary concern; in that case, metrics collection will be slower due to the overhead of TCP.&lt;/p&gt;

&lt;p&gt;In case you are wondering about the difference between TCP and UDP, this image is most illustrative:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AkgNvpTd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://ydevern.files.wordpress.com/2018/09/tcp-vs-udp.png%3Fw%3D809" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AkgNvpTd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://ydevern.files.wordpress.com/2018/09/tcp-vs-udp.png%3Fw%3D809" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ydevern.wordpress.com/2018/09/26/ccna-udp-vs-tcp/"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More often than not, an HTTP-based connection is used to send the metrics from the server to the backend, and because the backend is used for long-term analysis and storage, it often resides on a different host than the server and clients.&lt;/p&gt;

&lt;h1&gt;
  
  
  StatsD in &lt;a href="https://netdata.cloud"&gt;Netdata&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Netdata is a fully featured StatsD server, meaning it collects formatted metrics from any application that you instrumented with your library of choice. Netdata is also its own backend implementation, as it offers instant visualization and long-term storage using the embedded time-series database (TSDB). When you install Netdata, you immediately get a fully functional StatsD implementation running on port 8125.&lt;/p&gt;

&lt;p&gt;Since StatsD uses UDP or TCP to send instrumented metrics, either across localhost or between separate nodes, you’re free to deploy your application in whatever way works best for you, and it can still connect to Netdata’s server implementation. As soon as your application exposes metrics and starts sending packets on port 8125, Netdata turns the incoming metrics into charts and visualizes them in a meaningful fashion. &lt;/p&gt;

&lt;p&gt;Your applications can be deployed in a variety of ways and still easily surface monitoring data to Netdata. Since there are a myriad of different setups, Netdata also offers a robust server implementation that can be configured to organize the metrics into charts that make sense, so you can easily improve the default visualization with a few simple modifications. &lt;/p&gt;

&lt;p&gt;Because StatsD is a robust, mature technology, developers have built libraries to easily instrument applications in most popular languages.&lt;/p&gt;

&lt;p&gt;Python:  &lt;a href="https://github.com/jsocol/pystatsd"&gt;https://github.com/jsocol/pystatsd&lt;/a&gt;&lt;br&gt;
Python Django: &lt;a href="https://github.com/WoLpH/django-statsd"&gt;https://github.com/WoLpH/django-statsd&lt;/a&gt;&lt;br&gt;
Java: &lt;a href="https://github.com/tim-group/java-statsd-client"&gt;https://github.com/tim-group/java-statsd-client&lt;/a&gt;&lt;br&gt;
Clojure: &lt;a href="https://github.com/pyr/clj-statsd"&gt;https://github.com/pyr/clj-statsd&lt;/a&gt;&lt;br&gt;
Nodes/Javascript: &lt;a href="https://github.com/sivy/node-statsd"&gt;https://github.com/sivy/node-statsd&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Taking the example from python-statsd, you only need a reachable Netdata Agent (locally or over the internet) and a couple of lines of code. This hello_world example illustrates just how simple it is to send any metric you care about to Netdata and instantly visualize it. &lt;/p&gt;

&lt;p&gt;Even with no configuration at all, Netdata automatically creates charts for you. Netdata, being a robust monitoring agent, is also capable of organizing incoming metrics in any way you find most meaningful.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;statsd&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statsd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatsClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'localhost'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8125&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'foo'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Increment the 'foo' counter.
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100000000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'bar'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'foo'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'bar'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'stats.timed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;320&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Record a 320ms 'stats.timed'.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
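&lt;p&gt;Under the hood there is no magic in the client library: StatsD is a plain-text protocol over UDP, and each call simply sends a tiny datagram such as &lt;code&gt;foo:1|c&lt;/code&gt;. Here is a minimal, dependency-free sketch of the same idea using only Python's standard library (the host and port are Netdata's defaults; the class and helper names are our own illustration, not part of any library):&lt;/p&gt;

```python
import socket

class TinyStatsd:
    """Minimal StatsD client: formats metrics as 'name:value|type' datagrams."""

    def __init__(self, host="localhost", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _format(self, name, value, kind):
        # StatsD wire format: the metric name, a colon, the value, a pipe,
        # and a type code ("c" for counters, "ms" for timers).
        return f"{name}:{value}|{kind}"

    def _send(self, payload):
        # UDP is fire-and-forget: no connection, no acknowledgement,
        # which is why instrumenting hot paths with StatsD is so cheap.
        self.sock.sendto(payload.encode("ascii"), self.addr)

    def incr(self, name):
        self._send(self._format(name, 1, "c"))      # counter increment

    def timing(self, name, ms):
        self._send(self._format(name, ms, "ms"))    # timer, in milliseconds

c = TinyStatsd()
c.incr("foo")                 # sends the datagram "foo:1|c"
c.timing("stats.timed", 320)  # sends the datagram "stats.timed:320|ms"
```

&lt;p&gt;Because the protocol is this simple, any language that can open a UDP socket can feed metrics to Netdata's StatsD server.&lt;/p&gt;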



&lt;p&gt;Netdata’s StatsD server is also highly performant, which means you can monitor applications where they run without worrying about bottlenecks or resource limits:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Netdata StatsD is fast. It can collect more than 1,200,000 metrics per second on modern hardware, sustaining more than 200 Mbps of StatsD traffic, using a single CPU core.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Netdata does this on top of gathering metrics from other data sources. Netdata monitors an application’s full stack, from hardware to operating system to underlying services, and every available metric is automatically organized into meaningful categories on a single dashboard.&lt;/p&gt;

&lt;p&gt;Ready to get started?&lt;br&gt;
In the next part of the StatsD series, we'll illustrate how to configure Netdata to organize the metrics of any application, using k6 as our use case. &lt;/p&gt;

&lt;p&gt;If you can’t wait until then, join our Community Forums where we have kickstarted a discussion around StatsD.&lt;/p&gt;

&lt;p&gt;Here are a couple of interesting resources to get you started with StatsD:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  StatsD GitHub &lt;a href="https://github.com/statsd/statsd"&gt;repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://medium.com/@DoorDash/scaling-statsd-84d456a7cc2a"&gt;Scaling StatsD at DoorDash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;  Netdata StatsD reference &lt;a href="https://learn.netdata.cloud/docs/agent/collectors/statsd.plugin"&gt;documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>netdata</category>
      <category>statsd</category>
      <category>monitoring</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Deploy real-time monitoring with Netdata and Ansible</title>
      <dc:creator>Joel Hans</dc:creator>
      <pubDate>Tue, 17 Nov 2020 14:40:25 +0000</pubDate>
      <link>https://forem.com/netdata/deploy-real-time-monitoring-with-netdata-and-ansible-3d49</link>
      <guid>https://forem.com/netdata/deploy-real-time-monitoring-with-netdata-and-ansible-3d49</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ncoe0ykf5n2fj2r51na.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ncoe0ykf5n2fj2r51na.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Hello, Joel here! I'm working with &lt;a href="https://dev.to/netdata"&gt;Netdata&lt;/a&gt; to help more people deploy real-time system and application monitoring. I hope this Ansible guide helps a few of you build some extraordinary infrastructure.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Netdata's &lt;a href="https://learn.netdata.cloud/docs/get" rel="noopener noreferrer"&gt;one-line kickstart&lt;/a&gt; is zero-configuration, highly adaptable, and compatible with tons of different operating systems and Linux distributions. You can use it on bare metal, VMs, containers, and everything in-between.&lt;/p&gt;

&lt;p&gt;But what if you're trying to bootstrap an infrastructure monitoring solution as quickly as possible? What if you need to deploy Netdata across an entire infrastructure with many nodes? What if you want to make this deployment reliable, repeatable, and idempotent? What if you want to write and deploy your infrastructure or cloud monitoring system like code?&lt;/p&gt;

&lt;p&gt;Enter &lt;a href="https://ansible.com" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt;, a popular system provisioning, configuration management, and infrastructure as code (IaC) tool. Ansible uses &lt;strong&gt;playbooks&lt;/strong&gt; to glue many standardized operations together with a simple syntax, then run those operations over standard and secure SSH connections. There's no agent to install on the remote system, so all you have to worry about is your application and your monitoring software. &lt;/p&gt;

&lt;p&gt;Ansible has some competition from the likes of &lt;a href="https://puppet.com/" rel="noopener noreferrer"&gt;Puppet&lt;/a&gt; or &lt;a href="https://www.chef.io/" rel="noopener noreferrer"&gt;Chef&lt;/a&gt;, but Ansible's most valuable feature is that every operation is &lt;strong&gt;idempotent&lt;/strong&gt;. From the &lt;a href="https://docs.ansible.com/ansible/latest/reference_appendices/glossary.html" rel="noopener noreferrer"&gt;Ansible glossary&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An operation is idempotent if the result of performing it once is exactly the same as the result of performing it repeatedly without any intervening actions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Idempotency means you can run an Ansible playbook against your nodes any number of times without affecting how they operate. When you deploy Netdata with Ansible, you're also deploying &lt;em&gt;monitoring as code&lt;/em&gt;.&lt;/p&gt;
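&lt;p&gt;Idempotency is easy to see in miniature. The sketch below is our own illustration, not Ansible code: it mirrors what a task like &lt;code&gt;lineinfile&lt;/code&gt; does, changing the system only when the desired state is absent, so running it once or a hundred times produces the same result:&lt;/p&gt;

```python
def ensure_line(lines, wanted):
    """Append 'wanted' only if it is missing; a no-op when the state already holds."""
    if wanted in lines:
        return lines, False          # changed=False, like Ansible reporting "ok"
    return lines + [wanted], True    # changed=True, like Ansible reporting "changed"

config = ["bind to = 127.0.0.1"]
config, changed_first = ensure_line(config, "web mode = none")
config, changed_again = ensure_line(config, "web mode = none")

# The first run changes the state; every rerun leaves it untouched.
print(changed_first, changed_again)
```

&lt;p&gt;Ansible applies the same principle to packages, services, files, and users, which is what makes rerunning a playbook safe.&lt;/p&gt;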

&lt;p&gt;In this guide, we'll walk through the process of using an &lt;a href="https://github.com/netdata/community/tree/main/netdata-agent-deployment/ansible-quickstart" rel="noopener noreferrer"&gt;Ansible playbook&lt;/a&gt; to automatically deploy the Netdata Agent to any number of distributed nodes, manage the configuration of each node, and claim them to your Netdata Cloud account. You'll go from a handful of unmonitored nodes to an infrastructure monitoring solution in a matter of minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  A Netdata Cloud account. &lt;a href="https://app.netdata.cloud" rel="noopener noreferrer"&gt;Sign in and create one&lt;/a&gt; if you don't have one already.&lt;/li&gt;
&lt;li&gt;  An administration system with &lt;a href="https://www.ansible.com/" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt; installed.&lt;/li&gt;
&lt;li&gt;  One or more nodes that your administration system can access via &lt;a href="https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key" rel="noopener noreferrer"&gt;SSH public keys&lt;/a&gt; (preferably password-less).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Download and configure the playbook
&lt;/h2&gt;

&lt;p&gt;First, clone the repository containing the &lt;a href="https://github.com/netdata/community/tree/main/netdata-agent-deployment/ansible-quickstart" rel="noopener noreferrer"&gt;playbook&lt;/a&gt;, move the playbook directory into the current directory, and remove the rest of the clone, as it's not required for using the Ansible playbook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/netdata/community.git
&lt;span class="nb"&gt;mv &lt;/span&gt;community/netdata-agent-deployment/ansible-quickstart &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; community
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, &lt;code&gt;cd&lt;/code&gt; into the Ansible directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;ansible-quickstart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Edit the &lt;code&gt;hosts&lt;/code&gt; file
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;hosts&lt;/code&gt; file contains the list of IP addresses or hostnames that Ansible will run the playbook against. The &lt;code&gt;hosts&lt;/code&gt; file that comes with the repository contains two example IP addresses, which you should replace with the addresses or hostnames of your own nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="m"&gt;203&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;113&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;  &lt;span class="n"&gt;hostname&lt;/span&gt;=&lt;span class="n"&gt;node&lt;/span&gt;-&lt;span class="m"&gt;01&lt;/span&gt;
&lt;span class="m"&gt;203&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;113&lt;/span&gt;.&lt;span class="m"&gt;1&lt;/span&gt;  &lt;span class="n"&gt;hostname&lt;/span&gt;=&lt;span class="n"&gt;node&lt;/span&gt;-&lt;span class="m"&gt;02&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also set the &lt;code&gt;hostname&lt;/code&gt; variable, which appears both on the local Agent dashboard and Netdata Cloud, or you can omit the &lt;code&gt;hostname=&lt;/code&gt; string entirely to use the system's default hostname.&lt;/p&gt;

&lt;h4&gt;
  
  
  Set the login user (optional)
&lt;/h4&gt;

&lt;p&gt;If you SSH into your nodes as a user other than &lt;code&gt;root&lt;/code&gt;, you need to configure &lt;code&gt;hosts&lt;/code&gt; according to those user names. Use the &lt;code&gt;ansible_user&lt;/code&gt; variable to set the login user. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="m"&gt;203&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;113&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;  &lt;span class="n"&gt;hostname&lt;/span&gt;=&lt;span class="n"&gt;ansible&lt;/span&gt;-&lt;span class="m"&gt;01&lt;/span&gt;  &lt;span class="n"&gt;ansible_user&lt;/span&gt;=&lt;span class="n"&gt;example&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Set your SSH key (optional)
&lt;/h4&gt;

&lt;p&gt;If you use an SSH key other than &lt;code&gt;~/.ssh/id_rsa&lt;/code&gt; for logging into your nodes, you can set that on a per-node basis in the &lt;code&gt;hosts&lt;/code&gt; file with the &lt;code&gt;ansible_ssh_private_key_file&lt;/code&gt; variable. For example, to log into two Lightsail instances using different SSH keys supplied by AWS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="m"&gt;203&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;113&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;  &lt;span class="n"&gt;hostname&lt;/span&gt;=&lt;span class="n"&gt;ansible&lt;/span&gt;-&lt;span class="m"&gt;01&lt;/span&gt;  &lt;span class="n"&gt;ansible_ssh_private_key_file&lt;/span&gt;=~/.&lt;span class="n"&gt;ssh&lt;/span&gt;/&lt;span class="n"&gt;LightsailDefaultKey&lt;/span&gt;-&lt;span class="n"&gt;us&lt;/span&gt;-&lt;span class="n"&gt;west&lt;/span&gt;-&lt;span class="m"&gt;2&lt;/span&gt;.&lt;span class="n"&gt;pem&lt;/span&gt;
&lt;span class="m"&gt;203&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;113&lt;/span&gt;.&lt;span class="m"&gt;1&lt;/span&gt;  &lt;span class="n"&gt;hostname&lt;/span&gt;=&lt;span class="n"&gt;ansible&lt;/span&gt;-&lt;span class="m"&gt;02&lt;/span&gt;  &lt;span class="n"&gt;ansible_ssh_private_key_file&lt;/span&gt;=~/.&lt;span class="n"&gt;ssh&lt;/span&gt;/&lt;span class="n"&gt;LightsailDefaultKey&lt;/span&gt;-&lt;span class="n"&gt;us&lt;/span&gt;-&lt;span class="n"&gt;east&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt;.&lt;span class="n"&gt;pem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Edit the &lt;code&gt;vars/main.yml&lt;/code&gt; file
&lt;/h3&gt;

&lt;p&gt;In order to claim your node(s) to your Space in Netdata Cloud, and see all their metrics in real-time in &lt;a href="https://learn.netdata.cloud/docs/visualize/overview-infrastructure" rel="noopener noreferrer"&gt;composite charts&lt;/a&gt; or perform &lt;a href="https://learn.netdata.cloud/docs/cloud/insights/metric-correlations" rel="noopener noreferrer"&gt;Metric Correlations&lt;/a&gt;, you need to set the &lt;code&gt;claim_token&lt;/code&gt; and &lt;code&gt;claim_rooms&lt;/code&gt; variables.&lt;/p&gt;

&lt;p&gt;To find your &lt;code&gt;claim_token&lt;/code&gt; and &lt;code&gt;claim_room&lt;/code&gt;, go to Netdata Cloud, then click on your Space's name in the top navigation, then click on &lt;strong&gt;Manage your Space&lt;/strong&gt;. Click on the &lt;strong&gt;Nodes&lt;/strong&gt; tab in the panel that appears, which displays a script with &lt;code&gt;token&lt;/code&gt; and &lt;code&gt;room&lt;/code&gt; strings. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27q5537s5twu0l6pq7jf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27q5537s5twu0l6pq7jf.gif" alt="Animated GIF of finding the claiming script and the token and room strings" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy those strings into the &lt;code&gt;claim_token&lt;/code&gt; and &lt;code&gt;claim_rooms&lt;/code&gt; variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;claim_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;XXXXX&lt;/span&gt;
&lt;span class="na"&gt;claim_rooms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;XXXXX&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change the &lt;code&gt;dbengine_multihost_disk_space&lt;/code&gt; variable if you want to change the metrics retention policy by allocating more or less disk space for storing metrics. The default is 2048 MiB, or 2 GiB. &lt;/p&gt;
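&lt;p&gt;For example, to double the retention allocation, you would set the variable in &lt;code&gt;vars/main.yml&lt;/code&gt; like so (the value is illustrative):&lt;/p&gt;

```yaml
# vars/main.yml -- allocate 4 GiB for metrics storage instead of the default 2 GiB
dbengine_multihost_disk_space: 4096
```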

&lt;p&gt;Because we're claiming this node to Netdata Cloud, and will view its dashboards there instead of via the IP address or hostname of the node, the playbook disables that local dashboard by setting &lt;code&gt;web_mode&lt;/code&gt; to &lt;code&gt;none&lt;/code&gt;. This gives a small security boost by not allowing any unwanted access to the local dashboard.&lt;/p&gt;

&lt;p&gt;You can read more about this decision, or other ways you might lock down the local dashboard, in our &lt;a href="https://learn.netdata.cloud/docs/configure/secure-nodes" rel="noopener noreferrer"&gt;node security doc&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Curious about why Netdata's dashboard is open by default? Read our &lt;a href="https://www.netdata.cloud/blog/netdata-agent-dashboard/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; on that zero-configuration design decision.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Run the playbook
&lt;/h2&gt;

&lt;p&gt;Time to run the playbook from your administration system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; hosts tasks/main.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ansible first connects to your node(s) via SSH, then &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_vars_facts.html#ansible-facts" rel="noopener noreferrer"&gt;collects facts&lt;/a&gt; about the system. This playbook doesn't use these facts, but you could expand it to provision specific types of systems based on the makeup of your infrastructure.&lt;/p&gt;

&lt;p&gt;Next, Ansible makes changes to each node according to the &lt;code&gt;tasks&lt;/code&gt; defined in the playbook, and &lt;a href="https://docs.ansible.com/ansible/latest/reference_appendices/common_return_values.html#changed" rel="noopener noreferrer"&gt;reports&lt;/a&gt; whether each task resulted in a change, failed, or was skipped entirely.&lt;/p&gt;

&lt;p&gt;The task to install Netdata will take a few minutes per node, so be patient! Once the playbook reaches the claiming task, your nodes start populating your Space in Netdata Cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;Go use Netdata!&lt;/p&gt;

&lt;p&gt;If you need a bit more guidance for how you can use Netdata for health monitoring and performance troubleshooting, see our &lt;a href="https://learn.netdata.cloud/docs" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. It's designed like a comprehensive guide, based on what you might want to do with Netdata, so use those categories to dive in.&lt;/p&gt;

&lt;p&gt;Some of the best places to start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://learn.netdata.cloud/docs/collect/enable-configure" rel="noopener noreferrer"&gt;Enable or configure a collector&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://learn.netdata.cloud/docs/agent/collectors/collectors" rel="noopener noreferrer"&gt;Supported collectors list&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://learn.netdata.cloud/docs/visualize/overview-infrastructure" rel="noopener noreferrer"&gt;See an overview of your infrastructure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://learn.netdata.cloud/docs/visualize/interact-dashboards-charts" rel="noopener noreferrer"&gt;Interact with dashboards and charts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://learn.netdata.cloud/docs/store/change-metrics-storage" rel="noopener noreferrer"&gt;Change how long Netdata stores metrics&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're looking for more deployment and configuration management strategies, whether via Ansible or other provisioning/infrastructure-as-code software, such as Chef or Puppet, in Netdata's &lt;a href="https://github.com/netdata/community" rel="noopener noreferrer"&gt;community repo&lt;/a&gt;. Anyone can fork the repo and submit a PR, whether to improve this playbook, extend it, or create an entirely new experience for deploying Netdata across an entire infrastructure.&lt;/p&gt;

</description>
      <category>ansible</category>
      <category>monitoring</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Introduction to community repository: Consul, Ansible, ML</title>
      <dc:creator>Odysseas Lamtzidis</dc:creator>
      <pubDate>Mon, 16 Nov 2020 16:46:02 +0000</pubDate>
      <link>https://forem.com/netdata/introduction-to-community-repository-consul-ansible-ml-4hma</link>
      <guid>https://forem.com/netdata/introduction-to-community-repository-consul-ansible-ml-4hma</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The post was originally posted on the &lt;a href="https://www.netdata.cloud/blog/welcome-to-netdatas-community-repository-consul-ansible-ml/"&gt;Netdata blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QYd9Z7L8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/csg9ef98iqi6g78y785g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QYd9Z7L8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/csg9ef98iqi6g78y785g.png" alt="Cover Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On our journey to democratize monitoring, we are proud to have open source at the core of both our products and our company values. What started as a project born out of frustration with the lack of existing alternatives (see &lt;a href="https://www.rexfeng.com/blog/2016/01/anger-driven-development/"&gt;anger-driven development&lt;/a&gt;) quickly became one of the most starred open-source projects on all of GitHub. &lt;/p&gt;

&lt;p&gt;Fast-forward a couple of years, and the Netdata Agent, our open-source monitoring agent, is maturing into the best single-node monitoring experience, offering unparalleled efficiency and thousands of metrics collected every second. At the same time, we have gathered a considerable community on our &lt;a href="https://github.com/netdata/netdata"&gt;GitHub repository&lt;/a&gt; and new forums.&lt;/p&gt;

&lt;p&gt;As the community grows, and considering our belief that extensibility is key to adoption, it was only natural to start brainstorming a way to share code and sample applications that supercharge the user experience and the Netdata Agent’s capabilities. &lt;/p&gt;

&lt;p&gt;Thus, without further ado, please say hello to our &lt;a href="https://github.com/netdata/community"&gt;Community Repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z1WkbeYr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.netdata.cloud/wp-content/uploads/2020/11/netdata-community.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z1WkbeYr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.netdata.cloud/wp-content/uploads/2020/11/netdata-community.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although still in its infancy, we expect this repository to be filled by community members who want to share their experience of running Netdata in a production environment or integrated into a technological stack. At the moment, the repository will be used to house all sample applications, which are divided into categories, depending on the use case.&lt;/p&gt;

&lt;p&gt;Currently, there are three example applications, all contributed by the Netdata team, which were originally developed for internal use. Let’s take a look at them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration Management
&lt;/h2&gt;

&lt;p&gt;The first sample application is one I built that focuses on the issue of configuration management for an arbitrary number of Netdata Agents. More specifically, I opted to use &lt;a href="https://www.consul.io/"&gt;Consul&lt;/a&gt;, an amazing open-source project by HashiCorp, to dynamically manage the configuration of a Netdata Agent. The keyword is “dynamically”: whenever I change a configuration variable, the Netdata Agent restarts automatically so that it can pick up the change from the configuration files.&lt;/p&gt;

&lt;p&gt;Consul, per their documentation, is a “service mesh solution providing a full-featured control plane with service discovery, configuration, and segmentation functionality”. As such, Consul is already routinely used in cloud-native applications, and it’s ideal as a simple key/value store to house the configuration variables we wish to change dynamically. Since Netdata can’t pick up configuration from a RESTful interface, we use consul-template, another open-source tool by HashiCorp, which watches a Consul node for a specific set of keys, picks up changes to their values, and inserts them into templates, generating the updated configuration files in the process.&lt;/p&gt;
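&lt;p&gt;To make that concrete, a consul-template template is just the target configuration file with placeholders that read from Consul's key/value store. A minimal sketch might look like this (the KV path and file name are hypothetical):&lt;/p&gt;

```text
# netdata.conf.ctmpl -- consul-template re-renders this file whenever the
# watched key changes, and Netdata restarts to pick up the new value
[global]
    history = {{ key "netdata/config/history" }}
```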

&lt;p&gt;The code and documentation for this sample application can be found in the specific &lt;a href="https://github.com/netdata/community/tree/main/configuration-management/consul-quickstart"&gt;consul-quickstart directory&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Machine Learning and Netdata Agent’s API
&lt;/h2&gt;

&lt;p&gt;The second contribution came from &lt;a href="https://www.netdata.cloud/author/amaguire/"&gt;Andrew Maguire&lt;/a&gt;, who contributed a few examples built on the &lt;a href="https://registry.my-netdata.io/swagger/#/default/get_data"&gt;Netdata Agent’s API&lt;/a&gt;. The API offers anyone the ability to extract data from the Netdata Agent in an extremely efficient way and build real-time applications on top of it. He leveraged his in-house &lt;a href="https://github.com/netdata/netdata-pandas/tree/master/"&gt;Python library&lt;/a&gt; to automatically extract data, load it into pandas DataFrames, and enable live ML capabilities, such as anomaly detection.&lt;/p&gt;

&lt;p&gt;You can find the examples in the &lt;a href="https://github.com/netdata/community/tree/main/netdata-agent-api/netdata-pandas"&gt;appropriate directory&lt;/a&gt; of the community repository and open them in Google Colab. We suggest Google Colab not only because it’s free, but also because it spins up a VM and installs all the required dependencies, making it the fastest way to try out the examples and play with the API. To open a notebook in Google Colab, simply open it on GitHub and click the Open in Colab button.&lt;/p&gt;
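&lt;p&gt;The API itself is plain HTTP returning JSON: a data query responds with a &lt;code&gt;labels&lt;/code&gt; row naming the dimensions and a &lt;code&gt;data&lt;/code&gt; matrix of per-second values. The sketch below parses a hand-written sample of that shape into per-dimension averages using only the standard library (the sample numbers are invented for illustration, and the response shape is our reading of the API docs):&lt;/p&gt;

```python
import json

# A hand-written sample in the shape returned by the Agent's /api/v1/data
# endpoint with format=json (the values are invented for illustration).
sample = json.loads("""
{
  "labels": ["time", "user", "system"],
  "data": [
    [1605540000, 4.0, 1.0],
    [1605540001, 6.0, 3.0],
    [1605540002, 5.0, 2.0]
  ]
}
""")

def dimension_averages(payload):
    """Average each dimension column, skipping the leading 'time' column."""
    names = payload["labels"][1:]
    rows = payload["data"]
    return {
        name: sum(row[i + 1] for row in rows) / len(rows)
        for i, name in enumerate(names)
    }

print(dimension_averages(sample))  # {'user': 5.0, 'system': 2.0}
```

&lt;p&gt;The netdata-pandas library automates exactly this kind of fetch-and-reshape work across many charts and hosts at once.&lt;/p&gt;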

&lt;h2&gt;
  
  
  Automatic provisioning of Netdata Agents
&lt;/h2&gt;

&lt;p&gt;Last but not least, &lt;a href="https://www.netdata.cloud/author/joel/"&gt;Joel Hans&lt;/a&gt; pulled together the scripts he had created to automatically provision and claim any number of Netdata Agents on remote servers. The sample application is enabled by Ansible, a popular system provisioning, configuration management, and infrastructure-as-code tool. The user defines a set of steps in a &lt;code&gt;.yaml&lt;/code&gt; file, called a playbook, and Ansible is then responsible for running this playbook against a number of hosts, with SSH as the only requirement. &lt;/p&gt;

&lt;p&gt;With &lt;a href="https://www.ansible.com/"&gt;Ansible&lt;/a&gt;, Joel can install and claim any number of Netdata Agents automatically, so that he can access and monitor his nodes in a matter of minutes, through Netdata Cloud. It’s that easy. You can learn more in the &lt;a href="https://learn.netdata.cloud/guides/deploy/ansible"&gt;guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Now, it’s your turn
&lt;/h2&gt;

&lt;p&gt;The repository is up and running, but we need you to participate. If you are using any of the aforementioned tools and platforms and feel that we could have done something in a better way, please do let us know and make a pull request with your suggestions. &lt;/p&gt;

&lt;p&gt;If, on the other hand, you are using Netdata with another application in a way that greatly improves the experience, please do create a README about the project and submit a PR to the appropriate category. The value of this repository is of a compounding nature: the more examples we gather, the more value our users (like you) receive, and the repository's growing popularity will in turn invite even more sample applications.&lt;/p&gt;

&lt;p&gt;See you all on our repo!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>monitoring</category>
      <category>machinelearning</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
