<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ken Tune</title>
    <description>The latest articles on Forem by Ken Tune (@kentune).</description>
    <link>https://forem.com/kentune</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F421310%2Fd824a4f1-bfeb-4fae-aefa-30888156ffc1.png</url>
      <title>Forem: Ken Tune</title>
      <link>https://forem.com/kentune</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kentune"/>
    <language>en</language>
    <item>
      <title>Aerospike &amp; IoT using MQTT</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Fri, 11 Nov 2022 16:40:31 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-iot-using-mqtt-29on</link>
      <guid>https://forem.com/aerospike/aerospike-iot-using-mqtt-29on</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://mqtt.org/"&gt;MQTT&lt;/a&gt; (Message Queuing Telemetry Transport) is a widely used messaging protocol for the Internet of Things (IoT). It is ideal for communicating with small remote devices with limited power and network bandwidth. MQTT is used in a wide variety of industries, such as automotive, manufacturing, telecommunications, oil and gas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aerospike.com/"&gt;Aerospike&lt;/a&gt; is a high performance distributed database, particularly well suited for real time transactional processing. It is aimed at institutions and use-cases that need high throughput (100k tps+), with low latency (95% completion in &amp;lt;1ms), while managing large amounts of data (Tb+) with 100% uptime, scalability and low cost.&lt;/p&gt;

&lt;p&gt;This article, based on example code in the &lt;a href="https://github.com/aerospike-examples/mqtt-aerospike-example"&gt;aerospike-examples/mqtt-aerospike-example&lt;/a&gt; GitHub repository, describes how to achieve end-to-end data flow between a small device and Aerospike, with the data being stored in Aerospike as queryable time series. Although the example is small in scope, the decoupled MQTT architecture and high performance Aerospike database allows the approach to be scaled to accommodate thousands of devices, storing data over a period of years if necessary.&lt;/p&gt;

&lt;p&gt;More specifically, the example simulates the generation of data from an IoT sensor and tracks how that can be sent to a specific topic on an MQTT Broker. The data simulator could &lt;a href="http://www.steves-internet-guide.com/using-arduino-pubsub-mqtt-client"&gt;quite easily be replaced with an actual sensor&lt;/a&gt;, communicating with an MQTT Broker.&lt;/p&gt;

&lt;p&gt;On the receiving side we describe how to subscribe to the above topic and how the data can be serialized to the Aerospike database using our Community &lt;a href="https://github.com/aerospike-community/aerospike-time-series-client"&gt;Time Series Client&lt;/a&gt;, which can also be used to query the data.&lt;/p&gt;

&lt;p&gt;The net result of this is the ability to source data in a scalable fashion from IoT devices and store it as queryable time series data within Aerospike.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating the data
&lt;/h2&gt;

&lt;p&gt;The data simulation in the example works as follows. Successive calls to the simulator produce data points, which are &lt;em&gt;(timestamp,value)&lt;/em&gt; pairs. The average time between &lt;em&gt;timestamps&lt;/em&gt; is specified at the outset, as is a &lt;em&gt;percentage variability&lt;/em&gt; in the timestamps, to make the simulation more realistic. The &lt;em&gt;ratio&lt;/em&gt; between successive &lt;em&gt;values&lt;/em&gt; is normally distributed; the mean and variance of this distribution are also specified before the simulation is started. So we have four parameters governing our simulation. In addition, an initial timestamp and value must be specified, and the simulation must be given a name. The simulator constructor reflects this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;TimeSeriesSimulator&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
  &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;simulatorName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;startTime&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;initialValue&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;observationIntervalMilliSeconds&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
  &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;observationIntervalVariabilityPct&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
  &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;dailyDriftPct&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;dailyVolatilityPct&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We obtain successive data points by calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="nf"&gt;getNextDataPoint&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
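&lt;p&gt;Putting the two calls together, the algorithm described above can be sketched in a self-contained way as follows. This is a simplified illustration - class and method names mirror the repository, but the real &lt;code&gt;TimeSeriesSimulator&lt;/code&gt; differs in detail.&lt;/p&gt;

```java
import java.util.Random;

// Simplified, self-contained sketch of the simulator described above.
// The real TimeSeriesSimulator in the example repository differs in detail.
public class SimulatorSketch {
    // A data point is a (timestamp, value) pair
    public record DataPoint(long timestamp, double value) {}

    public static class SimpleSimulator {
        private final Random random = new Random();
        private final long intervalMs;               // average time between observations
        private final double intervalVariabilityPct; // +/- percentage jitter on the interval
        private final double meanRatio;              // mean of the ratio between successive values
        private final double ratioStdDev;            // standard deviation of that ratio
        private long timestamp;
        private double value;

        public SimpleSimulator(long startTimestamp, double initialValue, long intervalMs,
                               double intervalVariabilityPct, double meanRatio, double ratioStdDev) {
            this.timestamp = startTimestamp;
            this.value = initialValue;
            this.intervalMs = intervalMs;
            this.intervalVariabilityPct = intervalVariabilityPct;
            this.meanRatio = meanRatio;
            this.ratioStdDev = ratioStdDev;
        }

        public DataPoint getNextDataPoint() {
            // Advance the clock by the average interval, jittered by the variability percentage
            double jitter = 1 + (random.nextDouble() * 2 - 1) * intervalVariabilityPct / 100;
            timestamp += (long) (intervalMs * jitter);
            // The ratio between successive values is normally distributed
            value *= meanRatio + random.nextGaussian() * ratioStdDev;
            return new DataPoint(timestamp, value);
        }
    }

    public static void main(String[] args) {
        // Roughly hourly observations, 5% interval jitter, ~2% volatility per step
        SimpleSimulator simulator = new SimpleSimulator(0L, 10000.0, 3_600_000L, 5, 1.0, 0.02);
        for (int i = 0; i < 10; i++) {
            DataPoint dataPoint = simulator.getNextDataPoint();
            System.out.printf("t=%dms value=%.6f%n", dataPoint.timestamp(), dataPoint.value());
        }
    }
}
```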



&lt;p&gt;The following output shows the kind of content we expect to see when simulating a sensor polling approximately hourly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sampling Engine-001-RPM-Sensor at time 2022-10-14 01:00:00.000. Found value 10000.000000. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 02:00:25.920. Found value 10470.777590. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 02:57:30.240. Found value 11123.240496. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 03:57:35.280. Found value 11066.086321. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 04:55:18.840. Found value 10599.837433. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 05:57:19.800. Found value 10268.800822. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 06:56:12.120. Found value 10256.870171. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 07:55:04.800. Found value 10329.697112. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 08:57:12.600. Found value 10307.305881. 
Sampling Engine-001-RPM-Sensor at time 2022-10-14 09:57:15.840. Found value 10436.093769. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Sending the data to an MQTT Broker
&lt;/h2&gt;

&lt;p&gt;The MQTT paradigm assumes we have many disparate small devices. In order to collect information, these devices publish to a &lt;em&gt;topic&lt;/em&gt; on an MQTT Broker. You can think of a broker as a centralized depot for the receipt and distribution of messages, which provides for scalability. &lt;em&gt;Topics&lt;/em&gt; allow the messages to be separated into distinct collections. &lt;em&gt;Subscribers&lt;/em&gt; can independently subscribe to a topic and receive updates to it via push notifications.&lt;/p&gt;

&lt;p&gt;The following code shows the signature of a &lt;em&gt;Sensor Observer&lt;/em&gt; object. We provide a simulator to watch, a topic to publish to, and parameters governing the frequency and number of observations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;RunnableMQTTSensorObserver&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="nc"&gt;ITimeSeriesSimulator&lt;/span&gt; &lt;span class="n"&gt;timeSeriesSimulator&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
 &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;millisecondsBetweenObservations&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;observationCount&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
 &lt;span class="nc"&gt;MqttTopic&lt;/span&gt; &lt;span class="n"&gt;publicationTopic&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MQTT publication topic is obtained by connecting to a networked resource, &lt;code&gt;MQTT_BROKER_URL&lt;/code&gt;, using a publisher id &lt;code&gt;MQTT_PUBLISHER_ID&lt;/code&gt;. To keep things simple, in this example we use the &lt;em&gt;public&lt;/em&gt; MQTT server &lt;code&gt;tcp://test.mosquitto.org:1883&lt;/code&gt;. This is an open resource and can be used by anybody. No special setup is required, but your data is potentially public. For this example that is not an issue, but you will ultimately need your own broker to take things further.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;IMqttClient&lt;/span&gt; &lt;span class="n"&gt;mqttPublisher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MqttClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MQTT_BROKER_URL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;MQTT_PUBLISHER_ID&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;mqttPublisher&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;connect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;standardMqttConnectOptions&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="nc"&gt;MqttTopic&lt;/span&gt; &lt;span class="n"&gt;mqttTopic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mqttPublisher&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTopic&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MQTT_TOPIC_NAME&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we are using the &lt;a href="https://www.eclipse.org/paho/"&gt;Eclipse Paho&lt;/a&gt; implementation of the MQTT API.&lt;/p&gt;
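&lt;p&gt;The &lt;code&gt;standardMqttConnectOptions()&lt;/code&gt; helper is not shown above. A typical Paho configuration might look like the sketch below - the specific option values are assumptions for illustration; see the repository for the settings actually used.&lt;/p&gt;

```java
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

// Illustrative connect options only - the values below are assumptions,
// not necessarily those used in the example repository.
public class MqttOptionsSketch {
    public static MqttConnectOptions standardMqttConnectOptions() {
        MqttConnectOptions options = new MqttConnectOptions();
        options.setAutomaticReconnect(true); // reconnect if the broker connection drops
        options.setCleanSession(true);       // do not resume state from previous sessions
        options.setConnectionTimeout(10);    // seconds to wait for the initial connection
        return options;
    }
}
```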

&lt;p&gt;When the observer is run, the following code is executed &lt;code&gt;observationCount&lt;/code&gt; times, each time resulting in the data point being sent to the publication topic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="n"&gt;dataPoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timeSeriesSimulator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCurrentDataPoint&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MQTTUtilities&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;encodeForMQTT&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;timeSeriesSimulator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSimulatorName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;&lt;span class="n"&gt;dataPoint&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getBytes&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="nc"&gt;MqttMessage&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MqttMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;publicationTopic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;publish&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the first line, we obtain a data point from the simulator.&lt;/p&gt;

&lt;p&gt;In the second line, we encode the data point so it can be sent as a message. The encoding function has the following signature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;encodeForMQTT&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="n"&gt;dataPoint&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It uses a very simple colon-separated serialization - &lt;code&gt;timeSeriesName:dataPoint.getTimestamp():dataPoint.getValue()&lt;/code&gt;. See the function &lt;code&gt;MQTTUtilities.encodeForMQTT&lt;/code&gt; in the &lt;a href="https://github.com/aerospike-examples/mqtt-aerospike-example"&gt;aerospike-examples/mqtt-aerospike-example&lt;/a&gt; repository for full details.&lt;/p&gt;

&lt;p&gt;In the third line we construct an &lt;code&gt;MqttMessage&lt;/code&gt; and finally, in the fourth line, publish it to the publication topic.&lt;/p&gt;
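&lt;p&gt;The colon-separated encoding and its inverse can be sketched as follows. This is a simplified stand-in - see &lt;code&gt;MQTTUtilities&lt;/code&gt; in the repository for the actual implementation.&lt;/p&gt;

```java
// Simplified sketch of the colon-separated encoding described above.
// The real MQTTUtilities implementation may differ in detail.
public class EncodingSketch {
    public static String encodeForMQTT(String timeSeriesName, long timestamp, double value) {
        return timeSeriesName + ":" + timestamp + ":" + value;
    }

    public static String timeSeriesNameFromMessage(String message) {
        return message.split(":")[0];
    }

    public static long timestampFromMessage(String message) {
        return Long.parseLong(message.split(":")[1]);
    }

    public static double valueFromMessage(String message) {
        return Double.parseDouble(message.split(":")[2]);
    }

    public static void main(String[] args) {
        String encoded = encodeForMQTT("Engine-001-RPM-Sensor", 1668184831000L, 10000.0);
        System.out.println(encoded); // Engine-001-RPM-Sensor:1668184831000:10000.0
    }
}
```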

&lt;h2&gt;
  
  
  Subscribing to an MQTT Broker
&lt;/h2&gt;

&lt;p&gt;As in the section above, we connect to the MQTT Broker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;IMqttClient&lt;/span&gt; &lt;span class="n"&gt;mqttSubscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MqttClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MQTT_BROKER_URL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;MQTT_SUBSCRIBER_ID&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;mqttSubscriber&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;connect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;standardMqttConnectOptions&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also create a listener object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;IMqttMessageListener&lt;/span&gt; &lt;span class="n"&gt;mqttDataListener&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MQTTAerospikeDataPersister&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asTimeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The listener class implements the &lt;code&gt;IMqttMessageListener&lt;/code&gt; interface, which consists of a single method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;messageArrived&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;MqttMessage&lt;/span&gt; &lt;span class="n"&gt;mqttMessage&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will see that our implementation of &lt;code&gt;IMqttMessageListener&lt;/code&gt;, &lt;code&gt;MQTTAerospikeDataPersister&lt;/code&gt;, requires an Aerospike Time Series Client when constructed. Now, we subscribe to the topic using the listener object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;mqttSubscriber&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MQTT_TOPIC_NAME&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mqttDataListener&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Inside the messageArrived function
&lt;/h2&gt;

&lt;p&gt;Whenever a message is received, the &lt;code&gt;messageArrived&lt;/code&gt; function of the listener is invoked. The following is our implementation of that function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;mqttMessageAsString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mqttMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getPayload&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;Constants&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MQTT_DEFAULT_CHARSET&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MQTTUtilities&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeSeriesNameFromMQTTMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mqttMessageAsString&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="n"&gt;dataPoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MQTTUtilities&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;dataPointFromMQTTMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mqttMessageAsString&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dataPoint&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First we obtain the message as a string. In lines 2 and 3 we extract the time series name and data point (i.e. the timestamp and value). Finally we add the value to the Aerospike database using the &lt;code&gt;put&lt;/code&gt; call of the &lt;code&gt;timeSeriesClient&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running the demonstration
&lt;/h2&gt;

&lt;p&gt;Get the source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/aerospike-examples/mqtt-aerospike-example.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example requires an Aerospike database accessible via the &lt;em&gt;localhost&lt;/em&gt; address, listening on port 3000. These values can be altered in the code using the &lt;code&gt;MQTTPersistenceDemo.AEROSPIKE_SEED_HOST&lt;/code&gt; and &lt;code&gt;MQTTPersistenceDemo.AEROSPIKE_SERVICE_PORT&lt;/code&gt; parameters. The easiest way to obtain Aerospike is to install Docker Desktop and run an Aerospike Community container, e.g.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; aerospike aerospike/aerospike-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can run &lt;code&gt;MQTTPersistenceDemo.main()&lt;/code&gt; in your favorite IDE or build at the command line from the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mvn clean compile assembly:single
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the demonstration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;java &lt;span class="nt"&gt;-jar&lt;/span&gt; target/aerospike-mqtt-example-1.0-SNAPSHOT-jar-with-dependencies.jar 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your output should be similar to &lt;a href="https://github.com/aerospike-examples/mqtt-aerospike-example/tree/main/resources/sample-output.txt"&gt;this sample output&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying the data from Aerospike using the Community Time Series Client
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;MQTTPersistenceDemo.main&lt;/code&gt; validates the end-to-end pipeline by requesting the data for our time series - &lt;em&gt;Engine-001-RPM-Sensor&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;MQTTUtilities&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printTimeSeries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asTimeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="no"&gt;SENSOR_NAME&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The body of the above function is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Get the basic time series details&lt;/span&gt;
&lt;span class="nc"&gt;TimeSeriesInfo&lt;/span&gt; &lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TimeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTimeSeriesDetails&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// and output them&lt;/span&gt;
&lt;span class="n"&gt;outputMessageWithPara&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="c1"&gt;// use the time series client to get all the available points for our series with name timeSeriesName&lt;/span&gt;
&lt;span class="c1"&gt;// We use the timeSeriesInfo object to get the start and end date times for the series &lt;/span&gt;
&lt;span class="c1"&gt;// so we can request all the points available&lt;/span&gt;
&lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;dataPoints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getPoints&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Date&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStartDateTimestamp&lt;/span&gt;&lt;span class="o"&gt;()),&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEndDateTimestamp&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
&lt;span class="c1"&gt;// Header for the output&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Timestamp,Value"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// For each point print out t formatted version of the point&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="n"&gt;dataPoint&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dataPoints&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;outputMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s,%.6f"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataPointDateToString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataPoint&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dataPoint&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typical output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieving data for time series Engine-001-RPM-Sensor from Aerospike database:

Name : Engine-001-RPM-Sensor Start Date : 2022-10-15 01:00:00 End Date 2022-10-15 09:52:44 Data point count : 10

Timestamp,Value
2022-10-15 01:00:00.000,10000.000000
2022-10-15 01:58:54.480,10197.212074
2022-10-15 02:57:50.040,10579.313417
2022-10-15 03:59:18.240,10025.330483
2022-10-15 04:56:36.600,10013.730374
2022-10-15 05:56:40.920,10188.447442
2022-10-15 06:58:32.880,10145.885126
2022-10-15 07:55:53.400,10350.374583
2022-10-15 08:54:05.400,10533.135383
2022-10-15 09:52:44.040,10326.813161
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you scroll back to the beginning of the article, you will see this is exactly the data initially emitted by our mock sensor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This example shows how the &lt;a href="https://aerospike.com/"&gt;Aerospike&lt;/a&gt; database can be used, simply and at scale, to store industrial time series data made available by the &lt;a href="https://mqtt.org/"&gt;MQTT&lt;/a&gt; ecosystem. Aerospike plus its Community &lt;a href="https://github.com/aerospike-community/aerospike-time-series-client"&gt;Time Series Client&lt;/a&gt; streamlines the storage and retrieval of the data, supporting both writes and reads of millions of data points per second if required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Directions
&lt;/h2&gt;

&lt;p&gt;This demonstration could easily be scaled to show data being harvested from multiple sensors in parallel and saved to Aerospike. It would also be interesting to replace the simulation with an actual device - something &lt;a href="http://www.steves-internet-guide.com/using-arduino-pubsub-mqtt-client"&gt;Arduino-based&lt;/a&gt;, for example.&lt;/p&gt;

</description>
      <category>iot</category>
      <category>mqtt</category>
    </item>
    <item>
      <title>Aerospike Time Series API</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Wed, 20 Apr 2022 09:22:06 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-time-series-api-2ii2</link>
      <guid>https://forem.com/aerospike/aerospike-time-series-api-2ii2</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pam8nla4sa8cdf9kxej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pam8nla4sa8cdf9kxej.png" alt="Image description" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Aerospike is a high performance distributed database, particularly well suited for real time transactional processing. It is aimed at institutions and use-cases that need high throughput (100k tps+), with low latency (95% completion in &amp;lt;1ms), while managing large amounts of data (TB+) with 100% uptime, scalability and low cost.&lt;/p&gt;

&lt;p&gt;Conceptually, Aerospike is most readily categorised as a key value database. In reality, however, it has a number of bespoke features that make it capable of supporting a much wider set of use cases. A good example is our &lt;a href="https://aerospike.com/blog/aerospike-document-api/"&gt;document API&lt;/a&gt; which builds on our &lt;a href="https://docs.aerospike.com/guide/data-types/cdt"&gt;collection data types&lt;/a&gt; in order to provide &lt;a href="https://goessner.net/articles/JsonPath/"&gt;JsonPath&lt;/a&gt; support for documents.&lt;/p&gt;

&lt;p&gt;Another general use case we can consider is support for time series. The combination of &lt;a href="https://docs.aerospike.com/architecture/storage#ssdflash"&gt;buffered writes&lt;/a&gt; and efficient &lt;a href="https://docs.aerospike.com/guide/data-types/cdt-map"&gt;map operations&lt;/a&gt; allows us to optimise for both read and write of time series data. The &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client"&gt;Aerospike Time Series API&lt;/a&gt; leverages these features to provide a general purpose interface for efficient reading and writing of time series data at scale. Also included is a &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client#benchmarking"&gt;benchmarking&lt;/a&gt; tool allowing performance to be measured.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time Series Data
&lt;/h2&gt;

&lt;p&gt;Time series data can be thought of as a sequence of observations associated with a given property of a single subject. An observation is a quantity comprising two elements - a timestamp and a value. A property is a measurable attribute such as speed, temperature, pressure or price. We can see then that examples of time series might be the speed of a given vehicle; temperature readings at a fixed location; pressures recorded by an industrial sensor or the price of a stock on a given exchange. In each case the series consists of the evolution of these properties over time.&lt;/p&gt;
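&lt;p&gt;An observation can therefore be modelled directly as a (timestamp, value) pair. The following is a minimal sketch of such a model - the actual &lt;code&gt;DataPoint&lt;/code&gt; class in the Time Series API may carry additional detail.&lt;/p&gt;

```java
// Minimal model of a time series observation - a (timestamp, value) pair.
// The DataPoint class in the Aerospike Time Series API may differ in detail.
public record Observation(long timestampMillis, double value) {
    public static void main(String[] args) {
        // e.g. an RPM reading taken at a given epoch millisecond
        Observation rpmReading = new Observation(1668184831000L, 10000.0);
        System.out.println(rpmReading);
    }
}
```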

&lt;p&gt;A time series API in its most basic form needs to consist of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A function allowing the writing of time series observations&lt;/li&gt;
&lt;li&gt;A function allowing the retrieval of time series observations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Additional conveniences might include&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The ability to write data in bulk (batch writes)&lt;/li&gt;
&lt;li&gt;The ability to query the data, e.g. calculating the average, maximum or minimum&lt;/li&gt;
&lt;/ol&gt;
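&lt;p&gt;Aggregates of this kind can, in the simplest case, be computed client-side from a series of retrieved observation values, as the sketch below shows. This is illustrative only, not the Time Series API's own query mechanism.&lt;/p&gt;

```java
import java.util.stream.DoubleStream;

// Client-side aggregate sketch over a series of observed values.
// Illustrative only - not the Time Series API's own query mechanism.
public class AggregateSketch {
    public static double average(double[] values) {
        return DoubleStream.of(values).average().orElse(Double.NaN);
    }

    public static double max(double[] values) {
        return DoubleStream.of(values).max().orElse(Double.NaN);
    }

    public static double min(double[] values) {
        return DoubleStream.of(values).min().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] values = {10000.0, 10470.8, 11123.2};
        System.out.printf("avg=%.1f max=%.1f min=%.1f%n",
            average(values), max(values), min(values));
    }
}
```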
&lt;h2&gt;
  
  
  Aerospike Time Series API
&lt;/h2&gt;

&lt;p&gt;The Aerospike Time Series API provides the above via the TimeSeriesClient object. The API is as follows&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Store a single data point for a named time series&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="n"&gt;dataPoint&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Store a batch of data points for a named time series&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;dataPoints&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Retrieve all data points observed between startDateTime and endDateTime for a named time series&lt;/span&gt;
&lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="nf"&gt;getPoints&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;startDateTime&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;endDateTime&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Retrieve the observation made at time dateTime for a named time series&lt;/span&gt;
&lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="nf"&gt;getPoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;dateTime&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Execute TimeSeriesClient.QueryOperation versus the observations recorded for a named time series&lt;/span&gt;
&lt;span class="c1"&gt;// recorded between startDateTime and endDateTime&lt;/span&gt;
&lt;span class="c1"&gt;// The operations may be any of COUNT, AVG, MAX, MIN or VOL (volatility)&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="nf"&gt;runQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TimeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;QueryOperation&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;fromDateTime&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A DataPoint is a simple object representing an observation and the time at which it was made, constructed as follows. The Java Date timestamp allows times to be specified to millisecond accuracy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;dateTime&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Simple Example
&lt;/h2&gt;

&lt;p&gt;The code example below shows us inserting a series of 24 temperature readings, taken in Trafalgar Square, London, on the 14th February 2022. We give the time series a meaningful and precise name by concatenating subject, property and units.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Let's store some temperature readings taken in Trafalgar Square, London. Readings are Centigrade.&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;timeSeriesName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"TrafalgarSquare-Temperature-Centigrade"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// The readings were taken on the 14th Feb, 2022&lt;/span&gt;
&lt;span class="nc"&gt;Date&lt;/span&gt; &lt;span class="n"&gt;observationDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SimpleDateFormat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"yyyy-MM-dd"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;parse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"2022-02-14"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// ... and here they are&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;hourlyTemperatureObservations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="o"&gt;[]{&lt;/span&gt;&lt;span class="mf"&gt;2.7&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;2.3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.9&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.8&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.8&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.7&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.7&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.4&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.7&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.9&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;9.9&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;9.3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
               &lt;span class="mf"&gt;9.6&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;9.7&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.4&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.4&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.8&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.5&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.4&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.2&lt;/span&gt;&lt;span class="o"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// To store, create a time series client object. Requires AerospikeClient object and Aerospike namespace name&lt;/span&gt;
&lt;span class="c1"&gt;// new TimeSeriesClient(AerospikeClient asClient, String asNamespaceName)&lt;/span&gt;
&lt;span class="nc"&gt;TimeSeriesClient&lt;/span&gt; &lt;span class="n"&gt;timeSeriesClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TimeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asClient&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;asNamespaceName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Insert our hourly temperature readings&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;hourlyTemperatureObservations&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++){&lt;/span&gt;
  &lt;span class="c1"&gt;// The datapoint consists of the base date + the required number of hours&lt;/span&gt;
  &lt;span class="nc"&gt;DataPoint&lt;/span&gt; &lt;span class="n"&gt;dataPoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Utilities&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementDateUsingSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observationDate&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;hourlyTemperatureObservations&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]);&lt;/span&gt;
  &lt;span class="c1"&gt;// Which we then 'put'&lt;/span&gt;
  &lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dataPoint&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a diagnostic, we can get some basic information about the time series&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;TimeSeriesInfo&lt;/span&gt; &lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TimeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTimeSeriesDetails&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which will give&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Name : TrafalgarSquare-Temperature-Centigrade Start Date : 2022-02-14 00:00:00.000 End Date 2022-02-14 23:00:00.000 Data point count : 24

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another diagnostic allows the time series to be printed to the command line&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printTimeSeries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;gives&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Timestamp,Value
2022-02-14 00:00:00.000,2.70000
2022-02-14 01:00:00.000,2.30000
2022-02-14 02:00:00.000,1.90000
...
2022-02-14 22:00:00.000,4.30000
2022-02-14 23:00:00.000,4.20000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally we can run a basic query&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
  &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Maximum temperature is %.3f"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;runQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;TimeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;QueryOperation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MAX&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStartDateTime&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;&lt;span class="n"&gt;timeSeriesInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEndDateTime&lt;/span&gt;&lt;span class="o"&gt;())));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This gives&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Maximum temperature is 9.900
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that we could alternatively have used the batch put operation, which 'puts' all the points in a single call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Create an array of DataPoints&lt;/span&gt;
&lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;dataPoints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;hourlyTemperatureObservations&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;];&lt;/span&gt;
&lt;span class="c1"&gt;// Add our observations to the array&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;hourlyTemperatureObservations&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// The datapoint consists of the base date + the required number of hours&lt;/span&gt;
  &lt;span class="n"&gt;dataPoints&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DataPoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Utilities&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementDateUsingSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observationDate&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;hourlyTemperatureObservations&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// Put the points in a single call&lt;/span&gt;
&lt;span class="n"&gt;timeSeriesClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeSeriesName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dataPoints&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;There are two key implementation concepts to grasp. Firstly, rather than storing each data point as a separate object, data points are inserted into Aerospike maps. This minimises network traffic at write time (we only 'send' the new point) and means large numbers of points can be retrieved as a single object at read time. It also helps minimise memory usage, as Aerospike has a fixed (64 byte) cost for each object. Schematically, each time series object looks something like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    timestamp001 : value001,
    timestamp002 : value002,
    ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The maps must not be allowed to grow indefinitely, so the API ensures that each map will not grow beyond a specified maximum size. By default this limit is 1000 points, although it can be altered (see &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client#additional-control"&gt;additional control&lt;/a&gt;). The README also discusses the &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client#sizing"&gt;sizing&lt;/a&gt; and &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client#performance"&gt;performance&lt;/a&gt; considerations associated with this setting.&lt;/p&gt;
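&lt;p&gt;The capped map behaviour can be sketched in plain Java. This is an illustration of the idea only, not the API's actual internals - the class and field names are my own:&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of the capped-block idea: points accumulate in a sorted map
// until the current block reaches its maximum size, at which point a
// fresh block is started. Hypothetical names, not the API's internals.
public class CappedBlockSketch {
    static final int MAX_ENTRIES_PER_BLOCK = 1000; // default block size limit

    final List<TreeMap<Long, Double>> blocks = new ArrayList<>();

    void put(long timestamp, double value) {
        if (blocks.isEmpty()
                || blocks.get(blocks.size() - 1).size() >= MAX_ENTRIES_PER_BLOCK) {
            blocks.add(new TreeMap<>()); // current block full - start a new one
        }
        blocks.get(blocks.size() - 1).put(timestamp, value);
    }

    public static void main(String[] args) {
        CappedBlockSketch series = new CappedBlockSketch();
        // 2500 points => two full blocks of 1000 plus one block of 500
        for (int i = 0; i < 2500; i++) series.put(i * 3600L, 20.0 + i);
        System.out.println(series.blocks.size()); // 3
    }
}
```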

&lt;p&gt;The second implementation point follows on from the first. As there is a limit to the number of points that can be stored in a block, we need to have some mechanism for creating new blocks and keeping track of existing blocks for each time series. This is done, on a per time series basis, by maintaining an index of all blocks created. Conceptually this looks something like the following&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    TimeSeriesName : "MyTimeSeries",
  ListOfDataBlocks : {
        StartTimeForBlock1 : {EndTime: &amp;lt;lastTimeStampForBlock1&amp;gt;, EntryCount: &amp;lt;entriesInBlock1&amp;gt;},
        StartTimeForBlock1 : {EndTime: &amp;lt;lastTimeStampForBlock1&amp;gt;, EntryCount: &amp;lt;entriesInBlock1&amp;gt;},
    ...
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
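&lt;p&gt;The index makes locating the block for a given timestamp cheap - the candidate block is the one with the greatest start time not exceeding that timestamp. A sketch of the lookup idea (again hypothetical, not the API's internals):&lt;/p&gt;

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of per-series block lookup. Blocks are keyed by start timestamp,
// so the block that may contain a given timestamp is the one with the
// greatest start time <= the timestamp. Hypothetical names only.
public class BlockIndexSketch {
    // blockStartTime -> blockEndTime
    final TreeMap<Long, Long> index = new TreeMap<>();

    void registerBlock(long startTime, long endTime) {
        index.put(startTime, endTime);
    }

    // Return the start time of the block containing the timestamp, or -1 if none.
    long findBlock(long timestamp) {
        Map.Entry<Long, Long> candidate = index.floorEntry(timestamp);
        if (candidate == null || timestamp > candidate.getValue()) return -1;
        return candidate.getKey();
    }

    public static void main(String[] args) {
        BlockIndexSketch ix = new BlockIndexSketch();
        ix.registerBlock(0L, 999L);
        ix.registerBlock(1000L, 1999L);
        System.out.println(ix.findBlock(1500L)); // 1000
    }
}
```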



&lt;h2&gt;
  
  
  Benchmarking
&lt;/h2&gt;

&lt;p&gt;The Time Series API ships with a benchmarking tool. Three modes of operation are provided - real time insert, batch insert and query. For details of how to download and run see the &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client#benchmarking"&gt;benchmarking&lt;/a&gt; section of the README.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Time Benchmarking
&lt;/h3&gt;

&lt;p&gt;As a simple example, let's insert 10 seconds of data for a single time series, with observations being made once per second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./timeSeriesBenchmarker.sh &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;AEROSPIKE_HOST_IP&amp;gt;  &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;AEROSPIKE_NAMESPACE&amp;gt; &lt;span class="nt"&gt;-m&lt;/span&gt; realTimeWrite &lt;span class="nt"&gt;-p&lt;/span&gt; 1 &lt;span class="nt"&gt;-c&lt;/span&gt; 1 &lt;span class="nt"&gt;-d&lt;/span&gt; 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Aerospike Time Series Benchmarker running in real time insert mode

Updates per second : 1.000
Updates per second per time series : 1.000

Run time : 0 sec, Update count : 1, Current updates/sec : 1.029, Cumulative updates/sec : 1.027
Run time : 1 sec, Update count : 2, Current updates/sec : 1.000, Cumulative updates/sec : 1.013
Run time : 2 sec, Update count : 2, Current updates/sec : 0.000, Cumulative updates/sec : 0.672
...
Run time : 8 sec, Update count : 9, Current updates/sec : 1.000, Cumulative updates/sec : 1.003
Run time : 9 sec, Update count : 10, Current updates/sec : 1.000, Cumulative updates/sec : 1.003

Run Summary

Run time : 10 sec, Update count : 10, Cumulative updates/sec : 0.997

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use another utility, ./timeSeriesReader.sh, to view the output. This can be run for a named time series or, alternatively, will select a time series at random.&lt;/p&gt;

&lt;p&gt;Here is sample output for our simple example&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./timeSeriesReader.sh -h &amp;lt;AEROSPIKE_HOST_IP&amp;gt;  -n &amp;lt;AEROSPIKE_NAMESPACE&amp;gt;

Running TimeSeriesReader

No time series specified - selecting series AFNJFKSKDV

Name : AFNJFKSKDV Start Date : 2022-02-22 12:17:13.294 End Date 2022-02-22 12:17:23.185 Data point count : 11

Timestamp,Value
2022-02-22 12:17:13.294,97.37854
2022-02-22 12:17:14.247,97.34929
2022-02-22 12:17:15.263,97.33103
...
2022-02-22 12:17:22.212,97.31197
2022-02-22 12:17:23.185,97.29315

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that sample points were generated over a ten second period, with the series given a random name.&lt;/p&gt;

&lt;p&gt;The benchmarker can be run at greater scale using the -c (time series count) flag. You may also wish to make use of the -z (thread count) flag in order to achieve the required throughput; the benchmarker will warn you if the required throughput is not being achieved.&lt;/p&gt;

&lt;p&gt;Another real time option is acceleration via the -a flag, which runs the simulation at an accelerated rate. For instance, if you wished to insert points every 30 seconds over a 1 hour period (120 points), you could shorten the run by supplying '-a 30'. This speeds the simulation up by a factor of 30, so it takes only 120 seconds; higher factors are also possible. The benchmarker will indicate the actual update rates. For example&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./timeSeriesBenchmarker.sh -h &amp;lt;AEROSPIKE_HOST&amp;gt;  -n &amp;lt;AEROSPIKE_NAMESPACE&amp;gt; -m realTimeWrite -c 5 -p 10 -a 10 -d 10
Aerospike Time Series Benchmarker running in real time insert mode

Updates per second : 5.000
Updates per second per time series : 1.000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
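&lt;p&gt;The effect of the -a flag is straightforward arithmetic. Here is a sketch of the example above (the method names are my own, not part of the benchmarker):&lt;/p&gt;

```java
// Arithmetic behind the -a (acceleration) flag: the simulated period is
// compressed by the acceleration factor, so the run finishes sooner while
// emitting the same number of points.
public class AccelerationSketch {
    static long pointCount(long simulatedPeriodSeconds, int observationIntervalSeconds) {
        return simulatedPeriodSeconds / observationIntervalSeconds;
    }

    static long wallClockSeconds(long simulatedPeriodSeconds, int accelerationFactor) {
        return simulatedPeriodSeconds / accelerationFactor;
    }

    public static void main(String[] args) {
        // 1 hour of 30-second observations, run with -a 30
        System.out.println(pointCount(3600, 30));       // 120 points
        System.out.println(wallClockSeconds(3600, 30)); // completes in 120 seconds
    }
}
```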



&lt;h3&gt;
  
  
  Batch Insertion
&lt;/h3&gt;

&lt;p&gt;A disadvantage of the 'real time' benchmarker is precisely that - the loading occurs in real time. You may wish to build your sample time series as quickly as possible. The batch insert mode is provided for this purpose.&lt;/p&gt;

&lt;p&gt;In this mode, data points are loaded a block at a time - effectively as fast as the benchmarker will run. The invocation below, for example, will create 10 sample series (-c flag) over a period of 1 year (-r flag), with 30 seconds between each observation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./timeSeriesBenchmarker.sh -h &amp;lt;AEROSPIKE_HOST_IP&amp;gt;  -n &amp;lt;AEROSPIKE_NAMESPACE&amp;gt;  -m batchInsert -c 10 -p 30 -r 1Y 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Here is a run at greater scale, together with its output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./timeSeriesBenchmarker.sh -h $HOST  -n test  -m batchInsert -c 1000 -p 30 -r 1Y -z 100 

Aerospike Time Series Benchmarker running in batch insert mode

Inserting 1051200 records per series for 1000 series, over a period of 31536000 seconds

Run time : 0 sec, Data point insert count : 0, Effective updates/sec : 0.000. Pct complete 0.000%
Run time : 1 sec, Data point insert count : 1046000, Effective updates/sec : 870216.306. Pct complete 0.100%
Run time : 2 sec, Data point insert count : 2568000, Effective updates/sec : 1146363.231. Pct complete 0.244%
Run time : 3 sec, Data point insert count : 4196000, Effective updates/sec : 1308796.007. Pct complete 0.399%
Run time : 4 sec, Data point insert count : 5806000, Effective updates/sec : 1372576.832. Pct complete 0.552%
...
Run time : 577 sec, Data point insert count : 1051077000, Effective updates/sec : 1820986.414. Pct complete 99.988%
Run time : 578 sec, Data point insert count : 1051158000, Effective updates/sec : 1817977.108. Pct complete 99.996%

Run Summary

Run time : 578 sec, Data point insert count : 1051200000, Effective updates/sec : 1816538.588. Pct complete 100.000%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
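&lt;p&gt;The record counts in this output are easy to check - a year contains 31,536,000 seconds, so 30 second intervals give 1,051,200 points per series, and just over a billion points across 1000 series. A quick sketch of the arithmetic:&lt;/p&gt;

```java
// Verifying the volumes reported by the batch insert benchmarker run.
public class BatchVolumeSketch {
    static long pointsPerSeries(long periodSeconds, int intervalSeconds) {
        return periodSeconds / intervalSeconds;
    }

    public static void main(String[] args) {
        long secondsPerYear = 365L * 24 * 3600;               // 31536000, as reported
        long perSeries = pointsPerSeries(secondsPerYear, 30); // -p 30 => 1051200 records per series
        long total = perSeries * 1000;                        // -c 1000 => 1051200000 data points
        System.out.println(perSeries + " per series, " + total + " in total");
    }
}
```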



&lt;h3&gt;
  
  
  Query Benchmarking
&lt;/h3&gt;

&lt;p&gt;Having two different methods for generating data now puts us in the position where we can consider query benchmarking. This is the third and final aspect of the benchmarking toolkit.&lt;/p&gt;

&lt;p&gt;Query benchmarking can be invoked via the 'query' mode. We choose how long to run the benchmarker for (-d flag) and the number of threads to use (-z flag).&lt;/p&gt;

&lt;p&gt;At runtime, the benchmarker scans the database to determine the time series available. Each iteration selects a series at random and calculates its average value. This necessitates pulling all the data points for the series to the client side and performing the calculation there, so it is a good test of the query capability. We can ensure the queries are consistent in terms of data point count by using the batch insert mode of the benchmarker, which gives all series the same number of data points.&lt;/p&gt;

&lt;p&gt;Sample invocation and output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./timeSeriesBenchmarker.sh -h $HOST -n test -m query -z 1 -d 120 

Aerospike Time Series Benchmarker running in query mode

Time series count : 1000, Average data point count per query 1051200

Run time : 0 sec, Query count : 0, Current queries/sec 0.000, Current latency 0.000s, Avg latency 0.000s, Cumulative queries/sec 0.000
Run time : 1 sec, Query count : 1, Current queries/sec 1.003, Current latency 0.604s, Avg latency 0.604s, Cumulative queries/sec 0.999
Run time : 2 sec, Query count : 3, Current queries/sec 2.002, Current latency 0.585s, Avg latency 0.591s, Cumulative queries/sec 1.499
Run time : 3 sec, Query count : 5, Current queries/sec 2.000, Current latency 0.515s, Avg latency 0.561s, Cumulative queries/sec 1.666
Run time : 4 sec, Query count : 7, Current queries/sec 2.000, Current latency 0.583s, Avg latency 0.567s, Cumulative queries/sec 1.750
...
Run time : 120 sec, Query count : 241, Current queries/sec 2.000, Current latency 0.471s, Avg latency 0.496s, Cumulative queries/sec 2.008

Run Summary

Run time : 120 sec, Query count : 242, Cumulative queries/sec 2.016, Avg latency 0.496s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Simulation
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client"&gt;Aerospike Time Series API&lt;/a&gt; contains a realistic simulator, which the benchmarker makes use of.&lt;/p&gt;

&lt;p&gt;Many time series, over short periods at least, follow a &lt;a href="https://en.wikipedia.org/wiki/Brownian_motion"&gt;Brownian motion&lt;/a&gt;, and the &lt;em&gt;TimeSeriesSimulator&lt;/em&gt; allows this to be simulated. The idea is that if we look at the &lt;em&gt;relative change&lt;/em&gt; in the observed value, the &lt;em&gt;expected&lt;/em&gt; mean change is proportional to the time between observations, and the &lt;em&gt;expected variance&lt;/em&gt; is similarly proportional to that period. Formally, let X(τ) be the observation of the subject property X at time τ, and let its value after a further time t be X(τ+t). The simulation distributes (X(τ+t) - X(τ)) / X(τ), i.e. the relative change in X, as a normal distribution with mean μt and variance σ&lt;sup&gt;2&lt;/sup&gt;t.&lt;/p&gt;

&lt;center&gt;(X(τ + t) - X(τ)) / X(τ) ~ N(μt, σ&lt;sup&gt;2&lt;/sup&gt;t)&lt;/center&gt;  
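&lt;p&gt;A minimal simulator along these lines might look as follows. This is my own illustrative sketch, not the &lt;em&gt;TimeSeriesSimulator&lt;/em&gt; source - each step applies a relative change drawn from a normal distribution with mean μ·dt and variance σ&lt;sup&gt;2&lt;/sup&gt;·dt:&lt;/p&gt;

```java
import java.util.Random;

// Minimal Brownian-motion style simulator (an illustration only, not the
// actual TimeSeriesSimulator source). Each observation applies a relative
// change drawn from N(mu * dt, sigma^2 * dt) to the previous value.
public class BrownianSketch {
    static double[] simulate(double initialValue, double annualDrift, double annualVolatility,
                             double intervalSeconds, int steps, long seed) {
        Random random = new Random(seed);
        double dt = intervalSeconds / (365.0 * 24 * 3600); // interval as a fraction of a year
        double[] series = new double[steps + 1];
        series[0] = initialValue;
        for (int i = 1; i <= steps; i++) {
            double relativeChange =
                annualDrift * dt + annualVolatility * Math.sqrt(dt) * random.nextGaussian();
            series[i] = series[i - 1] * (1 + relativeChange);
        }
        return series;
    }

    public static void main(String[] args) {
        // 120 observations at 30 second intervals, 10% annual drift, 20% annual volatility
        double[] series = simulate(100.0, 0.1, 0.2, 30, 120, 42L);
        System.out.println(series.length); // 121 values including the starting point
    }
}
```

&lt;p&gt;Setting the volatility to zero leaves pure drift, which is a useful sanity check on the implementation.&lt;/p&gt;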
  

&lt;p&gt;More detail is available in the &lt;a href="https://github.com/aerospike-examples/aerospike-time-series-client#simulation"&gt;simulation&lt;/a&gt; section of the README, but it is useful to see that the net effect of the above is to produce sample series such as the one shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmsr5agwbnlv3a7qmd5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmsr5agwbnlv3a7qmd5r.png" alt="Image description" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see it looks very much like the sort of graph we might see for a stock price.&lt;/p&gt;

&lt;p&gt;More complex time series e.g. those seen for temperatures might be simulated by concatenating several series together, with different drifts and volatilities, allowing values to trend both up and down. Mean reverting series can be simulated by setting the drift to zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Life Performance
&lt;/h2&gt;

&lt;p&gt;As a test, performance was examined on an Aerospike cluster deployed on 3 i3en.2xlarge AWS instances. This instance type was selected because the &lt;a href="https://docs.aerospike.com/operations/plan/ssd/ssd_certification"&gt;ACT&lt;/a&gt; rating of its drives is 300k, making the arithmetic simple.&lt;/p&gt;

&lt;h3&gt;
  
  
  Writes
&lt;/h3&gt;

&lt;p&gt;In simple terms, this cluster can then sustain 100k writes/sec (see Performance Considerations) * 1.5KB * 3 (number of instances) = 450MB/sec of write throughput.&lt;/p&gt;

&lt;p&gt;We know our average write is ~8KB, and we assume replication factor 2 for resilience purposes. Sustainable updates per second is then 450MB / 2 (replication factor) / 8KB ≈ 28,000.&lt;/p&gt;
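&lt;p&gt;Laying the arithmetic out explicitly (a sketch using the figures above; the ~28,000 in the text is this value rounded down):&lt;/p&gt;

```java
// Write throughput arithmetic for the 3 node i3en.2xlarge cluster.
public class ThroughputSketch {
    // Sustainable updates/sec =
    //   writes/sec/node * write size (KB) * nodes / replication factor / update size (KB)
    static long sustainableUpdatesPerSec(long writesPerSecPerNode, double actWriteKb,
                                         int nodes, int replicationFactor, double updateSizeKb) {
        double clusterKbPerSec = writesPerSecPerNode * actWriteKb * nodes; // 450,000 KB/s = 450MB/s
        return Math.round(clusterKbPerSec / replicationFactor / updateSizeKb);
    }

    public static void main(String[] args) {
        // 100k writes/sec per node, 1.5KB ACT write size, 3 nodes,
        // replication factor 2, ~8KB average update
        System.out.println(sustainableUpdatesPerSec(100_000, 1.5, 3, 2, 8.0)); // 28125
    }
}
```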

&lt;p&gt;In practice a &lt;em&gt;50k&lt;/em&gt; update rate was easily sustained using the real time benchmarker. The reason the value is higher is that larger writes do not necessarily have a larger penalty than small writes. Also, the ACT rating guarantees operations are sub 1ms in latency 95% of the time, a guarantee not necessarily needed for time series inserts. &lt;/p&gt;

&lt;p&gt;The cost of such a cluster would be $23k per year using on-demand pricing ($0.90 / hour / instance) or $16k per year ($0.61 / hour / instance) using a reserved pricing plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reads
&lt;/h3&gt;

&lt;p&gt;Queries retrieving &lt;strong&gt;1 million points per query&lt;/strong&gt; (1 year of observations every 30 seconds) were able to run at the rate of two per second, with &lt;strong&gt;end to end latency of ~0.5 seconds&lt;/strong&gt; for a sustained period using the benchmarking tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Directions
&lt;/h2&gt;

&lt;p&gt;At the time of writing, this is an initial release of the API, so further developments should be expected. Possible future iterations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data compression following the &lt;a href="https://www.vldb.org/pvldb/vol8/p1816-teller.pdf"&gt;Gorilla&lt;/a&gt; approach which potentially allows data footprint to be reduced by 90%&lt;/li&gt;
&lt;li&gt;Labelling of data to support easy retrieval of multiple properties for a subject. For example, several sensors may be attached to an industrial machine - it may be convenient to retrieve all of these series simultaneously for analysis purposes.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop"&gt;REPL&lt;/a&gt; (read-eval-print loop) capability to support interactive analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Download
&lt;/h2&gt;

&lt;p&gt;The Time Series Client is available at Maven Central - &lt;a href="https://search.maven.org/artifact/io.github.aerospike-examples/aero-time-series-client"&gt;aero-time-series-client&lt;/a&gt;. You can download it directly or add the snippet below to your pom.xml file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.github.aerospike-examples&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;aero-time-series-client&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;LATEST&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;Images courtesy of Unsplash - left to right&lt;br&gt;
&lt;a href="https://unsplash.com/@joshmillerdp"&gt;https://unsplash.com/@joshmillerdp&lt;/a&gt;&lt;br&gt;
&lt;a href="https://unsplash.com/@markusspiske"&gt;https://unsplash.com/@markusspiske&lt;/a&gt;&lt;br&gt;
&lt;a href="https://unsplash.com/@publicpowerorg"&gt;https://unsplash.com/@publicpowerorg&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Aerospike Document API</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Thu, 17 Jun 2021 09:45:01 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-document-api-3e5</link>
      <guid>https://forem.com/aerospike/aerospike-document-api-3e5</guid>
      <description>&lt;p&gt;&lt;a href="https://www.aerospike.com"&gt;Aerospike&lt;/a&gt; is a high performance, distributed, scalable, key value database. Aerospike leverages SSD technology to achieve levels of throughput and low latency exceeding even those obtained with in memory products. This allows hardware costs to be reduced 10x or more and data density to be increased 10x or more versus any other high performance solution.&lt;/p&gt;

&lt;p&gt;Aerospike has a significant number of distinguishing characteristics versus competitor products. Here we focus on the &lt;a href="https://www.aerospike.com/docs/guide/cdt.html"&gt;Collection Data Type (CDT) API&lt;/a&gt;. The CDT API facilitates list and map oriented operations within objects, thereby reducing network overhead and client side computation. It is highly efficient, adding little overhead to read or write calls, and composable, allowing construction of complex overall operations.&lt;/p&gt;

&lt;p&gt;The CDT API contains within it all the primitives required to build a sophisticated document API along the lines suggested by Stefan Gössner in his well-known &lt;a href="https://goessner.net/articles/JsonPath/"&gt;JsonPath&lt;/a&gt; proposal, which is in turn modelled on the &lt;a href="https://en.wikipedia.org/wiki/XPath"&gt;XPath&lt;/a&gt; standard for XML. For those not familiar with it, XPath supports CRUD operations via a filesystem-navigation-like syntax. JsonPath is essentially a port of this idea to JSON.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/aerospike/aerospike-document-lib"&gt;Aerospike Document API&lt;/a&gt; provides such an API, using the CDT API. Below we demonstrate the way in which CRUD operations can be executed at arbitrary points within a JSON document using the API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Document Creation
&lt;/h2&gt;

&lt;p&gt;First we need a JSON document to work with. Consider the below, expressing some subjective (and, for space reasons, selective) highlights of &lt;a href="https://en.wikipedia.org/wiki/Tommy_Lee_Jones"&gt;Tommy Lee Jones&lt;/a&gt;' career.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"forenames"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Tommy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Lee"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"surname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jones"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date_of_birth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"month"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1946&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_filmography"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"2012"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"Lincoln"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Men In Black 3"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"2007"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"No Country For Old Men"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"2002"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"Men in Black 2"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"1997"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"Men in Black"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Volcano"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"1994"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"Natural Born Killers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Cobb"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"1991"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"JFK"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"1980"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"Coal Miner's Daughter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Barn Burning"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"imdb_rank"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"https://www.imdb.com/list/ls050274118/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rank"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;51&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"best_films_ranked"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://www.rottentomatoes.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"films"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"The Fugitive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"No Country For Old Men"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Men In Black"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Coal Miner's Daughter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Lincoln"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"https://medium.com/the-greatest-films-according-to-me/10-greatest-films-of-tommy-lee-jones-97426103e3d6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"films"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"The Three Burials of Melquiades Estrada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"The Homesman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"No Country for Old Men"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"In the Valley of Elah"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Coal Miner's Daughter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the document API, we add this to our database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Create a document client via an existing aerospikeClient&lt;/span&gt;
&lt;span class="nc"&gt;AerospikeDocumentClient&lt;/span&gt; &lt;span class="n"&gt;documentClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AerospikeDocumentClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Convert JSON string to a JsonNode&lt;/span&gt;
&lt;span class="nc"&gt;JsonNode&lt;/span&gt; &lt;span class="n"&gt;jsonNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JsonConverters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;convertStringToJsonNode&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesJson&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Construct an appropriate key&lt;/span&gt;
&lt;span class="nc"&gt;Key&lt;/span&gt; &lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;AEROSPIKE_NAMESPACE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;AEROSPIKE_SET&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"src/test/resources/tommy-lee-jones.json"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Add to database using a named bin&lt;/span&gt;
&lt;span class="n"&gt;documentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentBinName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonNode&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reads
&lt;/h2&gt;

&lt;p&gt;We can find out the name of Jones' best film according to 'Rotten Tomatoes' using the path &lt;code&gt;$.best_films_ranked[0].films[0]&lt;/code&gt;. Hopefully the use of keys and list indexes is immediately intuitive when considered in the context of the above document.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;documentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentBinName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"$.best_films_ranked[0].films[0]"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;will return&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"The Fugitive"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are not limited to retrieving primitives. An expression such as&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"$.selected_filmography.1980"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;will retrieve a list&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Coal Miner's Daughter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"Barn Burning"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Updates
&lt;/h2&gt;

&lt;p&gt;We can add to the document. The snippet below will add a 2019 element to the filmography.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_2019Films&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;();&lt;/span&gt;
&lt;span class="n"&gt;_2019Films&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Ad Astra"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;documentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentBinName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"$.selected_filmography.2019"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="n"&gt;_2019Films&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can update document nodes directly - this syntax updates Jones' &lt;a href="https://www.imdb.com/"&gt;IMDb&lt;/a&gt; ranking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;documentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentBinName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"$.imdb_rank.rank"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Append operations may also be used. For example, we can append to 'Rotten Tomatoes' list of best films using the reference &lt;code&gt;$.best_films_ranked[0].films&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;documentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentBinName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"$.best_films_ranked[0].films"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"Rolling Thunder"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;documentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentBinName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"$.best_films_ranked[0].films"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"The Three Burials"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deletion
&lt;/h2&gt;

&lt;p&gt;We can similarly use JsonPath-like expressions to support node deletion. The line below will remove the Medium rankings from the document.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;documentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tommyLeeJonesDBKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documentBinName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"$.best_films_ranked[1]"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Availability
&lt;/h2&gt;

&lt;p&gt;The library has been published to Maven Central as &lt;em&gt;aerospike-document-api&lt;/em&gt;. Add&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
   &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.aerospike&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
   &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;aerospike-document-api&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
   &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.2&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to your &lt;em&gt;pom.xml&lt;/em&gt; to use.&lt;/p&gt;

&lt;p&gt;For full details of the API see &lt;a href="https://github.com/aerospike/aerospike-document-lib"&gt;aerospike-document-lib&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Tie Breaker Functionality for Aerospike Multi-Site Clustering</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Fri, 06 Nov 2020 09:58:19 +0000</pubDate>
      <link>https://forem.com/aerospike/tie-breaker-functionality-for-aerospike-multi-site-clustering-36kd</link>
      <guid>https://forem.com/aerospike/tie-breaker-functionality-for-aerospike-multi-site-clustering-36kd</guid>
      <description>&lt;p&gt;Aerospike's &lt;a href="https://www.aerospike.com/lp/exploring-data-consistency-aerospike-enterprise-edition/" rel="noopener noreferrer"&gt;Strong Consistency (SC)&lt;/a&gt; and &lt;a href="https://www.aerospike.com/docs/architecture/rack-aware.html" rel="noopener noreferrer"&gt;rack awareness&lt;/a&gt; features allow construction of &lt;a href="http://pages.aerospike.com/rs/229-XUE-318/images/Aerospike_Solution_Brief_Multi-site_Clustering.pdf" rel="noopener noreferrer"&gt;multi-site database clusters&lt;/a&gt; in which each site has a full copy of the data and all sites are perfectly synchronized.&lt;/p&gt;

&lt;p&gt;This allows database operation to continue, without loss of data, in the event of a network partition between sites or the loss of a site.&lt;/p&gt;

&lt;p&gt;The gold standard configuration is to use three data centres, as represented in the diagram below. In this diagram, each data centre corresponds to a physical rack, and a replication factor of three has been chosen. As we have three racks, rack awareness will distribute the records so that there is exactly one copy of each record on each rack. Clients can be configured to read either from the master records, which are distributed uniformly across the three racks, or from the local version, be it master or replica, using &lt;a href="https://www.aerospike.com/blog/aerospike-4-5-2-relaxed-consistency-for-increased-availability/" rel="noopener noreferrer"&gt;relaxed reads&lt;/a&gt;. In the diagram the relaxed read approach is shown, which optimises read latency. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgdvqdryqayj4t7g8zcs7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgdvqdryqayj4t7g8zcs7.png" alt="3-dc-sc"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the event of one of the three data centres becoming unavailable, there will be no data loss as each of the three data centres contains a full copy of the data. Aerospike's resilience features will ensure that, in the event of failure, client applications automatically move to reading and writing from available data centres. &lt;/p&gt;

&lt;p&gt;For those interested in the performance of such a configuration, my colleague &lt;a href="https://medium.com/@jagjeet.singh" rel="noopener noreferrer"&gt;Jagjeet Singh&lt;/a&gt; has written an excellent &lt;a href="https://medium.com/aerospike-developer-blog/multi-site-clustering-and-rack-awareness-c987e3abe0a6?source=friends_link&amp;amp;sk=9fb729a6a1e21c5a596eeb4efde9b233" rel="noopener noreferrer"&gt;blog&lt;/a&gt; detailing results for a cluster that spans the east and west coasts of the USA.&lt;/p&gt;
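&lt;p&gt;For reference, a client opts into the rack-aware, relaxed reads described above roughly as follows. This is a minimal sketch assuming the Aerospike Java client; the host name and rack id are placeholders. It needs a running cluster, so it is illustrative rather than self-verifying.&lt;/p&gt;

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Host;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.Replica;

public class RackAwareReadsSketch {
    public static void main(String[] args) {
        ClientPolicy clientPolicy = new ClientPolicy();
        clientPolicy.rackAware = true; // declare which rack (data centre) this client sits in
        clientPolicy.rackId = 1;       // must match the rack-id configured on the local nodes
        // Relaxed reads: prefer this rack's copy (master or replica), fall back otherwise
        clientPolicy.readPolicyDefault.replica = Replica.PREFER_RACK;

        AerospikeClient client =
                new AerospikeClient(clientPolicy, new Host("aerospike-dc1.example.com", 3000));
        // ... reads issued with the default policy now favour the local rack
        client.close();
    }
}
```

&lt;p&gt;With &lt;code&gt;PREFER_RACK&lt;/code&gt;, the client reads from whichever copy sits in its own rack, falling back to other racks only if the local copy is unavailable.&lt;/p&gt;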

&lt;p&gt;An alternate arrangement is to run across two data centres as shown below. In this diagram, each data centre again corresponds to a physical rack, and a replication factor of two has been chosen. As we have two racks, rack awareness will distribute the records so that there is exactly one copy of each record on each rack. For reasons explained below, we have an odd/even split across the two DCs. Only the red nodes are initially active. The green node is on standby and is not part of the cluster. Again, the clients are shown operating in 'relaxed read' mode where they have a 'preferred' rack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkv305ck5zc1kvocg5p2e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkv305ck5zc1kvocg5p2e.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once more, in the event of one of the two data centres becoming unavailable, there will be no data loss as each of the two data centres contains a full copy of the data. &lt;/p&gt;

&lt;p&gt;Should the minority data centre (DC2) fail, client applications will automatically move to reading and writing from the available data centre (DC1).&lt;/p&gt;

&lt;p&gt;Should the majority data centre (DC1) fail, the minority cluster (DC2) will block writes until a &lt;a href="https://www.aerospike.com/docs/operations/configure/consistency/index.html#create-the-roster" rel="noopener noreferrer"&gt;'roster'&lt;/a&gt; command is issued, indicating that the minority cluster (DC2) should take over as the definitive master. The standby (green) node is also added to the cluster at this point for capacity purposes.&lt;/p&gt;

&lt;p&gt;The odd/even arrangement is necessary because, were the two data centres to contain an equal number of nodes, our &lt;a href="https://www.aerospike.com/docs/architecture/consistency.html#roster-of-nodes" rel="noopener noreferrer"&gt;strong consistency rules&lt;/a&gt; would have the effect of 50% of partitions being active in each data centre, which is unlikely to be the desired outcome.&lt;/p&gt;

&lt;p&gt;Two trade-offs are being made here in order to guarantee consistency. The first is the need for potential operator intervention, and the second is the uneven balance of the two sides. Automation can be used to deal with the first, and a 'spare' node might well be considered a reasonable price to pay for consistency. &lt;/p&gt;

&lt;p&gt;An alternative is available, however, via a small change made in a recent server release - &lt;a href="https://www.aerospike.com/blog/aerospike5-2/" rel="noopener noreferrer"&gt;5.2&lt;/a&gt;. It allows us to add a &lt;a href="https://www.aerospike.com/docs/reference/configuration/#stay-quiesced" rel="noopener noreferrer"&gt;permanently quiesced&lt;/a&gt; node to the cluster. A quiesced node is one that does not master data, but may still participate in partition availability decisions. We can use such a node as a 'tie-break node', as shown in the arrangement below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fl97jsdj6ouv85q9ftaco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fl97jsdj6ouv85q9ftaco.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the event of either DC1 or DC2 becoming unavailable, there will be no data loss, as each of the two data centres contains a full copy of the data. If this event were a network partition making DC1 unreachable from DC2 and DC3, the cluster would automatically reconfigure to serve all writes and reads from DC2, thanks to the extra vote from the tie-break node, which makes DC2+DC3 a majority cluster. Similar behaviour occurs if DC2 is unreachable from DC1 and DC3. Finally, the unavailability of DC3 would have no consequence, as DC1+DC2 forms a majority.&lt;/p&gt;

&lt;p&gt;We have eliminated the need for operator intervention in the majority failure case in the above scenario as well as avoiding the need for a fully specified 'spare' node (needed previously to accommodate necessary migrations to ensure full replication of data). This is because our 'tie break node' has no capacity requirements associated with it - it is there solely for decision making purposes. &lt;/p&gt;

&lt;p&gt;The trade-off is the need for a third data centre. It can be seen however that this still offers an advantage over the 'gold standard' in that we reduce our inventory by 1/3. An additional iteration might be to double the number of tie-breaker nodes in DC3. Although not strictly necessary this might assuage any concerns around single point of failure.&lt;/p&gt;

&lt;p&gt;Let's see how this works in practice. In the diagram below, I distribute my cluster across three AWS availability zones. The data nodes are in us-east-1a/b, with the tie-breaker in us-east-1c.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpntolw7bq93brliv3jba.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpntolw7bq93brliv3jba.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Aerospike admin tool can be used to show cluster state. The IP addresses of each node are highlighted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frjevmtbgb2hj263clzxa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frjevmtbgb2hj263clzxa.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next we add some data - 1m objects in this case - and again inspect state. We can see below (red highlight) that we have 1m master records and 1m replica records, and the green highlight shows how our servers have been separated into three racks, corresponding to the availability zones. The tie-break node, 10.0.0.68, in us-east-1c, is in a rack of its own and is not managing any data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fir9m9c01sv7l2ozkhnot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fir9m9c01sv7l2ozkhnot.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can simulate a network partition by completely firewalling off us-east-1a. Let's say we do this while we have active reads and writes taking place. The screenshot below shows this happening at approx 13:51:25. We can see that we get no errors on the read side (because replicas are tried in the event of timeouts or failures) and some write timeouts (these are the writes in flight at the time of the network partition). We also see (last 5 lines) the client removing the unavailable nodes from its internal node list, after which normal operation resumes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmvj3agi2p59kqwcabn4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmvj3agi2p59kqwcabn4g.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the server side, we can see below that we lose all the nodes in rack 100001, corresponding to those with the 10.0.1.* addresses. The number of master records stays as expected (green highlight), while new prole (replica) records must be created (blue highlight) because immediately after the network partition we hold only one copy of each record. This is reflected in the migration statistics (purple highlight).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fx1oolhr69a69fxz6bdiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fx1oolhr69a69fxz6bdiu.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the migrations are complete (purple highlight), we have a full complement of master and prole (replica) objects (green highlight).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fybul9lbpncjthyb0qrxi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fybul9lbpncjthyb0qrxi.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We can use the new tie-breaker capability to build fully resilient distributed Aerospike clusters, while minimising hardware usage.&lt;/p&gt;




&lt;p&gt;Title image : &lt;a href="https://unsplash.com/@arnok" rel="noopener noreferrer"&gt;https://unsplash.com/@arnok&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Automated Aerospike All Flash Setup</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Wed, 28 Oct 2020 14:15:42 +0000</pubDate>
      <link>https://forem.com/aerospike/automated-aerospike-all-flash-setup-3ho6</link>
      <guid>https://forem.com/aerospike/automated-aerospike-all-flash-setup-3ho6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Aerospike is a key-value database that makes the most of SSD/flash technology in order to offer best-in-class throughput and latency at petabyte scale.&lt;/p&gt;

&lt;p&gt;Standard Aerospike usage will have the primary key index in DRAM and the data on SSD. Although Aerospike's usage of DRAM is very low at 64 bytes per object, for very large numbers of objects (100bn+) users might wish to consider the all-flash mode in which the primary key index is also placed on disk. More detail at &lt;a href="https://www.aerospike.com/docs/operations/configure/namespace/index/index.html#flash-index"&gt;all flash usage&lt;/a&gt;.&lt;/p&gt;
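
&lt;p&gt;A quick back-of-envelope calculation shows why this matters at that scale (a sketch using the 64 bytes per object figure above, before any replication):&lt;/p&gt;

```python
# DRAM needed for the primary key index in a standard (index-in-memory) deployment
objects = 100_000_000_000           # 100bn objects
bytes_per_entry = 64                # primary key index entry size in bytes
index_dram = objects * bytes_per_entry
print(index_dram / 10**12)          # 6.4 - i.e. 6.4TB of DRAM for the index alone
```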

&lt;p&gt;There are a number of non-trivial steps to go through to set up all flash. For that reason I've extended &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt; to allow automation of this process. This article walks through the automated process. It's envisaged that this will be useful for those evaluating the feature, or looking to get up and running with it quickly.&lt;/p&gt;

&lt;p&gt;A working knowledge of &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt; is assumed. This &lt;a href="https://dev.to/aerospike/ansible-for-aerospike-43ln"&gt;introductory article&lt;/a&gt; may also be useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  All Flash Calculations
&lt;/h2&gt;

&lt;p&gt;In order to correctly configure a system for all flash, you need to know the number of &lt;a href="https://www.aerospike.com/docs/reference/configuration/#partition-tree-sprigs"&gt;partition-tree-sprigs&lt;/a&gt; appropriate for the object count your database will hold. You can think of a partition tree sprig as a mini primary key index - sprigs keep the primary key tree shallow, allowing record locations to be looked up more rapidly. More detail at &lt;a href="https://discuss.aerospike.com/t/faq-what-are-sprigs/4936"&gt;sprigs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This matters for all-flash because we size the system so that each sprig fits inside a single disk block, minimising read and write overhead.&lt;/p&gt;

&lt;p&gt;You can find details of the calculation &lt;a href="https://www.aerospike.com/docs/operations/plan/capacity/index.html#all-flash-index-device-space"&gt;here&lt;/a&gt;, but to make life easier a spreadsheet can be found in &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt; at  &lt;code&gt;assets/all-flash-calculator.xlsx&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frnec3phngppeza78e67p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frnec3phngppeza78e67p.png" alt="all-flash-calculator.xlsx" width="782" height="950"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Populate the yellow cells - # of objects, replication factor and object size.&lt;/p&gt;

&lt;p&gt;The spreadsheet will calculate required partition-tree-sprigs.&lt;/p&gt;

&lt;p&gt;It will also determine the fraction of available disk space that should be given over to the primary key index, based on the object size. In the screenshot, we can see that for 100m records, replication factor 2, average record size 1024 bytes, the overhead per record is 172 bytes and the overall record footprint is 2220 bytes, so approx 1/13 of the disk space should be allocated to the index.&lt;/p&gt;
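
&lt;p&gt;The disk split can be reproduced in a couple of lines - a sketch using the spreadsheet's own outputs for the worked example above:&lt;/p&gt;

```python
# reproduce the spreadsheet's index:data disk split
index_overhead = 172     # index bytes per record (spreadsheet output)
footprint = 2220         # total disk bytes per record (spreadsheet output)

# one device partition in every 'partitions_per_device' is given to the index
partitions_per_device = round(footprint / index_overhead)
print(partitions_per_device)  # 13
```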

&lt;h2&gt;
  
  
  Using Aerospike-Ansible
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;vars/cluster-config.yml&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set &lt;code&gt;partitions_per_device&lt;/code&gt; to the value given in the spreadsheet - 13 in the example. The first partition on each device is used for the all flash index to ensure the correct index:data disk space ratio.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;partition_free_sprigs: YOUR_VALUE&lt;/code&gt; - YOUR_VALUE would be 1024 for this example&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You will also need to &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set &lt;code&gt;all_flash: true&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;enterprise: true&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Provide a path to a valid Aerospike feature key using &lt;code&gt;feature_key: /your/path/to/key&lt;/code&gt;. You must therefore be either a licensed Aerospike customer, or running an Aerospike trial.&lt;/li&gt;
&lt;/ul&gt;
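
&lt;p&gt;Putting the above together, the relevant section of &lt;code&gt;vars/cluster-config.yml&lt;/code&gt; would look something like this (a sketch using the example values; the feature key path is a placeholder):&lt;/p&gt;

```yaml
# vars/cluster-config.yml - all flash settings for the worked example
all_flash: true
enterprise: true
feature_key: /your/path/to/key
partitions_per_device: 13        # from the spreadsheet
partition_free_sprigs: 1024      # from the spreadsheet
```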

&lt;p&gt;Having done that&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ansible-playbook aws-setup-plus-aerospike-install.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;You should check that the aggregate disk space across your cluster exceeds the amount recommended in the spreadsheet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification
&lt;/h2&gt;

&lt;p&gt;Once the setup process is complete, log into one of your cluster nodes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./scripts/cluster-quick-ssh.sh 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then start &lt;code&gt;asadm&lt;/code&gt; (the admin tool) and run the &lt;code&gt;info&lt;/code&gt; command&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgj56tha4n6sh9dnksl7x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgj56tha4n6sh9dnksl7x.png" alt="asadm" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The index type comes up as 'flash' as per the highlight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Load
&lt;/h2&gt;

&lt;p&gt;You can follow the instructions in &lt;a href="https://github.com/aerospike-examples/aerospike-ansible#using-the-benchmarking-client"&gt;benchmarking&lt;/a&gt; to quickly load some data into the new configuration.&lt;/p&gt;

&lt;p&gt;As before, we can use asadm to examine the (highlighted) disk footprint of the primary key index - in this case for 10m records (20m including replicas). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe885gviz51aq1goug5ov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe885gviz51aq1goug5ov.png" alt="asadm-2" width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt; tooling makes it easy to set up all flash for Aerospike and benefit from the DRAM saving it offers.&lt;/p&gt;




&lt;p&gt;Cover image &lt;a href="https://unsplash.com/@kreyatif"&gt;Michał Mancewicz&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>nosql</category>
    </item>
    <item>
      <title>Using Aerospike Connect For Spark</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Wed, 21 Oct 2020 15:08:21 +0000</pubDate>
      <link>https://forem.com/aerospike/using-aerospike-connect-for-spark-3poi</link>
      <guid>https://forem.com/aerospike/using-aerospike-connect-for-spark-3poi</guid>
      <description>&lt;p&gt;&lt;a href="https://www.aerospike.com"&gt;Aerospike&lt;/a&gt; is a highly scalable key value database offering best in class performance. It is typically deployed into real-time environments managing terabyte to petabyte data volumes. &lt;/p&gt;

&lt;p&gt;Aerospike will typically run alongside other scalable distributed software, such as Kafka for system coupling, or Spark for analytics. The &lt;a href="https://www.aerospike.com/docs/connect/"&gt;Aerospike Connect&lt;/a&gt; product line makes integration as easy as possible.&lt;/p&gt;

&lt;p&gt;This article looks at how Aerospike Spark Connect works in practice by offering a comprehensive and easily reproduced end-to-end example using &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Database Cluster Setup
&lt;/h2&gt;

&lt;p&gt;First take a look at &lt;a href="https://dev.to/aerospike/ansible-for-aerospike-43ln"&gt;Ansible for Aerospike&lt;/a&gt; which explains how to use &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this example I set &lt;code&gt;cluster_instance_type&lt;/code&gt; to c5d.18xlarge in &lt;code&gt;vars/cluster-config.yml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Follow the instructions up to and including &lt;a href="https://github.com/aerospike-examples/aerospike-ansible#one-touch-setup"&gt;one touch setup&lt;/a&gt;. You'll get as far as&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-playbook aws-setup-plus-aerospike-install.yml
ansible-playbook aerospike-java-client-setup.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which will give you a 3 node cluster by default, plus a client instance with relevant software installed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spark Cluster Setup
&lt;/h2&gt;

&lt;p&gt;This is done via&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ansible-playbook spark-cluster-setup.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this example, prior to running, I set &lt;code&gt;spark_instance_type&lt;/code&gt; to c5d.4xlarge in &lt;code&gt;vars/cluster-config.yml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This playbook creates a 3 node Spark cluster, of the given instance type, with Spark installed and running. It also installs Aerospike Spark Connect.&lt;/p&gt;

&lt;p&gt;Note you will need to set &lt;code&gt;enterprise: true&lt;/code&gt; and provide a path to a valid Aerospike feature key using &lt;code&gt;feature_key: /your/path/to/key&lt;/code&gt; in &lt;code&gt;vars/cluster-config.yml&lt;/code&gt;. You must therefore be either a licensed Aerospike customer, or running an Aerospike trial.&lt;/p&gt;

&lt;p&gt;Near the end of the process you will see&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TASK [Spark master IP &amp;amp; master internal url] ************************************************************************************************************************************************************************
ok: [localhost] =&amp;gt; {
    "msg": "Spark master is 3.88.237.103. Spark master internal url is spark://10.0.2.122:7077."
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make a note of the Spark master internal url - it is needed later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Load Data
&lt;/h2&gt;

&lt;p&gt;Our example makes use of 20m records from the &lt;a href="https://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/"&gt;1bn NYC Taxi ride&lt;/a&gt; corpus, available in compressed form at &lt;a href="https://aerospike-ken-tune.s3.amazonaws.com/nyc-taxi-data/trips_xaa.csv.gz"&gt;https://aerospike-ken-tune.s3.amazonaws.com/nyc-taxi-data/trips_xaa.csv.gz&lt;/a&gt;. We load to Aerospike using &lt;a href="https://github.com/aerospike/aerospike-loader"&gt;aerospike loader&lt;/a&gt;, which is installed on the client machine set up above. First of all we get the addresses of the hosts in the Aerospike cluster - these are needed later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; ./scripts/cluster-ip-address-list.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Adds cluster ips to this array- AERO_CLUSTER_IPS
Use as ${ AERO_CLUSTER_IPS[index]}
There are 3 entries

##########################################################

cluster IP addresses : Public : 3.87.14.39, Private : 10.0.2.58
cluster IP addresses : Public : 3.89.113.231, Private : 10.0.0.234
cluster IP addresses : Public : 23.20.193.64, Private : 10.0.1.95
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/aerospike/aerospike-loader"&gt;aerospike loader&lt;/a&gt; requires a config file to load the data into Aerospike. This maps csv column positions to named and typed bins. A sample entry looks like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pkup_datetime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"column_position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"encoding"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"yyyy-MM-dd hh:mm:ss"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"dst_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is provided in the repo at &lt;code&gt;recipes/aerospike-spark-demo/nyc-taxi-data-aero-loader-config.json&lt;/code&gt;. We upload this to the client instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source ./scripts/client-ip-address-list.sh 
scp -i .aws.pem ./recipes/aerospike-spark-demo/nyc-taxi-data-aero-loader-config.json ec2-user@${AERO_CLIENT_IPS[0]}:~
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next get the data onto the client machine. There's more than one way to do this, but you need to plan ahead as the dataset is 7.6GB when uncompressed. I used the commands below, but the details will depend on your drives and filesystem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./scripts/client-quick-ssh.sh # to log in, followed by

sudo mkfs.ext4 /dev/nvme1n1
sudo mkdir /data
sudo mount -t ext4 /dev/nvme1n1 /data
sudo chmod 777 /data

wget -P /data https://aerospike-ken-tune.s3.amazonaws.com/nyc-taxi-data/trips_xaa.csv.gz
gunzip /data/trips_xaa.csv.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally we load our data in, using the config file we uploaded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd ~/aerospike-loader
./run_loader -h 10.0.0.234 -p 3000 -n test -c ~/nyc-taxi-data-aero-loader-config.json /data/trips_xaa.csv 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note we're using one of the cluster IP addresses we recorded earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Spark
&lt;/h2&gt;

&lt;p&gt;Log into one of the Spark nodes. Via &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt; there is a utility script for this&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./scripts/spark-quick-ssh.sh 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start up a Spark shell, using the Spark master URL we saw when running the Spark cluster setup playbook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/spark/bin/spark-shell --master spark://10.0.2.122:7077
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Import relevant libraries&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.apache.spark.sql.&lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nc"&gt;SQLContext&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SparkSession&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SaveMode&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.apache.spark.SparkConf&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Date&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.text.SimpleDateFormat&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supply the Aerospike configuration - note we supply the cluster IP used previously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="nv"&gt;spark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;conf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aerospike.seedhost"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"10.0.0.234"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;spark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;conf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aerospike.namespace"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"test"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define a view and a function we will be using&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;sqlContext&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;spark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;sqlContext&lt;/span&gt;
&lt;span class="nv"&gt;sqlContext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;udf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;register&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"getYearFromSeconds"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SimpleDateFormat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"yyyy"&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="py"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;taxi&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;sqlContext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;read&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.aerospike.spark.sql"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;option&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aerospike.set"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"nyc-taxi-data"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;load&lt;/span&gt;
&lt;span class="nv"&gt;taxi&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;createOrReplaceTempView&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"taxi"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally we run our queries&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Journeys grouped by cab type&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;sqlContext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT cab_type,count(*) count FROM taxi GROUP BY cab_type"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;show&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;

&lt;span class="o"&gt;+--------+--------+&lt;/span&gt;                                                             
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;cab_type&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+--------+--------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="n"&gt;green&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;20000000&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+--------+--------+&lt;/span&gt;

&lt;span class="c1"&gt;// Average fare based on different passenger count&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;sqlContext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT passenger_cnt, round(avg(total_amount),2) avg_amount FROM taxi GROUP BY passenger_cnt ORDER BY passenger_cnt"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;show&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;

&lt;span class="o"&gt;+-------------+----------+&lt;/span&gt;                                                      
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;passenger_cnt&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;avg_amount&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+-------------+----------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;10.86&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;14.63&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;15.75&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;15.87&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;15.85&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;14.76&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;15.42&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;23.74&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;19.52&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;      &lt;span class="mf"&gt;34.9&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+-------------+----------+&lt;/span&gt;

&lt;span class="c1"&gt;// No of journeys for different numbers of passengers&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;sqlContext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT passenger_cnt,getYearFromSeconds(pkup_datetime) trip_year, count(*) count FROM taxi GROUP BY passenger_cnt, getYearFromSeconds(pkup_datetime) order by passenger_cnt"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nv"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;show&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;

&lt;span class="o"&gt;+-------------+---------+--------+&lt;/span&gt;                                              
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;passenger_cnt&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;trip_year&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;   &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+-------------+---------+--------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;4106&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;16557518&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1473578&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;507862&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;160714&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;939276&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;355846&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;492&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;494&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;114&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+-------------+---------+--------+&lt;/span&gt;

&lt;span class="c1"&gt;// Number of trips for each passenger count/distance combination&lt;/span&gt;
&lt;span class="c1"&gt;// Ordered by trip count, descending&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;sqlContext&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT passenger_cnt,getYearFromSeconds(pkup_datetime) trip_year,round(trip_distance) distance,count(*) trips FROM taxi GROUP BY passenger_cnt,getYearFromSeconds(pkup_datetime),round(trip_distance) ORDER BY trip_year,trips desc"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;show&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;

&lt;span class="o"&gt;+-------------+---------+--------+-------+&lt;/span&gt;                                      
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;passenger_cnt&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;trip_year&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="n"&gt;trips&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+-------------+---------+--------+-------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;5321230&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;3500458&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;2166462&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="mi"&gt;1418494&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;918460&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;868210&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;6.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;653646&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;7.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;488416&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;433746&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;8.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;345728&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;305578&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;302120&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;9.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;226278&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;199968&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;199522&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;163928&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;145580&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;137152&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;122714&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;            &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;2014&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mf"&gt;11.0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;117570&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+-------------+---------+--------+-------+&lt;/span&gt;
&lt;span class="n"&gt;only&lt;/span&gt; &lt;span class="n"&gt;showing&lt;/span&gt; &lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This shows how quickly you can get up and running with a large data corpus. The example used 20m rows, but the approach extends readily to the full dataset. It also shows how little effort is needed to provision the underlying cluster with the &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt; tooling.&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>spark</category>
    </item>
    <item>
      <title>Aerospike on EKS (AWS K8s)</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Mon, 27 Jul 2020 10:12:18 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-on-eks-aws-k8s-m5b</link>
      <guid>https://forem.com/aerospike/aerospike-on-eks-aws-k8s-m5b</guid>
      <description>&lt;p&gt;Much like Java, Kubernetes offers the promise of 'write once, run anywhere'. The wry riposte (a little unfairly) in the early days of Java, was 'write once, debug everywhere'. To a certain extent this is the position we are in today with the various flavours of Kubernetes out there  -  getting up and running is different on Google Cloud Platform vs AWS vs Azure vs local (e.g. Minikube), and a handful of well placed pointers can be a great time saver.&lt;/p&gt;

&lt;p&gt;This article is about getting Aerospike up and running on Amazon's Kubernetes Service  -  &lt;a href="https://aws.amazon.com/eks/"&gt;EKS&lt;/a&gt;. As it happens, there is little that is EKS-specific to say about Aerospike itself; the article is instead about the things you need to do to get EKS up and running so you can start to run Aerospike on top of it. For users of Aerospike, we just want to make it as easy as possible, wherever it is you'd like to run it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pre-requisites
&lt;/h2&gt;

&lt;p&gt;There are four things you need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://eksctl.io/"&gt;eksctl&lt;/a&gt; - the AWS command line utility allowing you to administer (e.g. setup/teardown) your AWS Kubernetes cluster. We use it here to set up the EKS cluster itself. Details of how to install can be found at &lt;a href="https://github.com/weaveworks/eksctl#installation"&gt;https://github.com/weaveworks/eksctl#installation&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://helm.sh/"&gt;helm&lt;/a&gt; - the package manager for Kubernetes applications. We use it here to deploy our Aerospike cluster. Details of how to install at &lt;a href="https://helm.sh/docs/intro/install/"&gt;https://helm.sh/docs/intro/install/&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;kubectl - the Kubernetes client allowing you to deploy and administer Kubernetes applications. We use it here to perform ad-hoc Kubernetes operations. Installation details at &lt;a href="https://kubernetes.io/docs/tasks/tools/install-kubectl/"&gt;https://kubernetes.io/docs/tasks/tools/install-kubectl/&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href="https://aws.amazon.com/cli/"&gt;AWS CLI&lt;/a&gt;, allowing command line management of AWS services. We use it here for credential management. Installation details at &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html"&gt;https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Below, this is scripted in full for CentOS - the script can readily be adapted for use in other environments. You need to run it using sudo or as root.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
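&lt;p&gt;In outline, the script covers the four installs above. A minimal sketch for CentOS follows - it writes the steps to a file for review rather than executing them directly, and the download URLs are illustrative, so check each project's installation page for the current instructions.&lt;/p&gt;

```shell
# Write the prerequisite installs to a script for review (illustrative URLs).
cat <<'EOF' > install-eks-prereqs.sh
#!/bin/bash
set -euo pipefail

# 1. eksctl - administers the EKS cluster itself
curl -sL "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_Linux_amd64.tar.gz" \
  | tar xz -C /usr/local/bin

# 2. helm - deploys the Aerospike chart
curl -s https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# 3. kubectl - ad-hoc Kubernetes operations
curl -sLo /usr/local/bin/kubectl \
  "https://dl.k8s.io/release/$(curl -sL https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x /usr/local/bin/kubectl

# 4. AWS CLI v2 - credential management
curl -s "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip -q awscliv2.zip && ./aws/install
EOF
chmod +x install-eks-prereqs.sh
```

&lt;p&gt;Review the generated script, then run it using sudo or as root.&lt;/p&gt;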


&lt;h2&gt;
  
  
  Credentials
&lt;/h2&gt;

&lt;p&gt;In order to use &lt;em&gt;eksctl&lt;/em&gt; you will need to set up an AWS user whose credentials &lt;em&gt;eksctl&lt;/em&gt; will run under. This is a slightly involved topic, and I have therefore split it out into a separate article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aerospike/aws-credentials-for-eks-3a2i"&gt;https://dev.to/aerospike/aws-credentials-for-eks-3a2i&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article shows you how to set up the &lt;a href="https://aws.amazon.com/iam/"&gt;IAM&lt;/a&gt; policy you need as well as user and group configuration. It concludes with use of &lt;em&gt;aws configure&lt;/em&gt; to store credentials so &lt;em&gt;eksctl&lt;/em&gt; can make use of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aerospike Up and Running
&lt;/h2&gt;

&lt;p&gt;Now we're in a position to launch our Aerospike cluster. First we need to create our Kubernetes cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eksctl create cluster &lt;span class="nt"&gt;--name&lt;/span&gt; aero-k8s-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be warned, this takes something of the order of 30 minutes to complete. Once complete, type &lt;em&gt;kubectl config get-contexts&lt;/em&gt; to verify correct setup. A &lt;em&gt;context&lt;/em&gt; contains Kubernetes access parameters - the user, cluster and default namespace if set. Your output should be similar to&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo9dmdwo6eynzwzf8v32d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo9dmdwo6eynzwzf8v32d.png" alt="Alt Text" width="800" height="48"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we retrieve the helm chart that supports cluster setup, and add it to our local repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add aerospike https://aerospike.github.io/aerospike-kubernetes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are now able to install our Aerospike cluster. We will also enable the monitoring capability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm &lt;span class="nb"&gt;install &lt;/span&gt;demo aerospike/aerospike &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;enableAerospikeMonitoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--set&lt;/span&gt; rbac.create&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see something like&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyvsb39yxu99tlq03l36m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyvsb39yxu99tlq03l36m.png" alt="Alt Text" width="800" height="614"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the last line in the screenshot suggests, we can take a look at our Kubernetes objects via&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get all &lt;span class="nt"&gt;--namespace&lt;/span&gt; default &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="s2"&gt;"release=demo, chart=aerospike-5.0.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1kz7d8ta6gvtcq18c0b4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1kz7d8ta6gvtcq18c0b4.png" alt="Alt Text" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All your pods should be in the running state before proceeding  -  if not, you can use the command below to wait until they are.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Access Grafana Monitoring 
&lt;/h2&gt;

&lt;p&gt;In the screenshot above, you can see that the Grafana monitoring service is running on port 80. We will make that available locally on port 8080 via port forwarding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward service/demo-grafana 8080:80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can connect to this using a browser - &lt;em&gt;&lt;a href="http://yourhostname:8080"&gt;http://yourhostname:8080&lt;/a&gt;&lt;/em&gt;. The credentials are admin/admin; note that Grafana will make you change the password on first login. Go to Home -&amp;gt; Cluster Overview at top left and you should see something like&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnjdsak8fugb2a25p09p1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnjdsak8fugb2a25p09p1.png" alt="Alt Text" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that you can do the above port forward in any environment, not just where you did the creation. You will need to make sure you have the same credentials in &lt;em&gt;~/.aws/credentials&lt;/em&gt; and the required kubectl context. Below I have added my eks credentials to my aws credentials file under the &lt;em&gt;eks&lt;/em&gt; heading.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbabylk66g3vzdd9k716b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbabylk66g3vzdd9k716b.png" alt="Alt Text" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;eksctl&lt;/em&gt; then has a neat utility allowing you to get your K8s context into your alternative environment. I'm taking care to run the command under the eks profile (-p flag) and in the region where the K8s cluster was created (-r flag - get this from the cluster name if you are not sure).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eksctl utils write-kubeconfig &lt;span class="nt"&gt;--cluster&lt;/span&gt; aero-k8s-demo &lt;span class="nt"&gt;-p&lt;/span&gt; eks &lt;span class="nt"&gt;-r&lt;/span&gt; &amp;lt;REGION&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using Aerospike
&lt;/h2&gt;

&lt;p&gt;We'll demonstrate use of Aerospike via our benchmarking software. From the &lt;em&gt;kubernetes-aws&lt;/em&gt; project&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; aero-client-deployment.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;will create a single pod with a container with our benchmarking software installed. You can see the Dockerfile for the images used in the aerospike-java-client-build directory.&lt;/p&gt;

&lt;p&gt;The commands below respectively:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve the name of the container running the java client&lt;/li&gt;
&lt;li&gt;Run the run_benchmarks command against that container.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CONTAINER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get pod &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="s2"&gt;"app=aerospike-java-client"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"{.items[0].metadata.name}"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nv"&gt;$CONTAINER&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
/aerospike-client-java/benchmarks/run_benchmarks &lt;span class="nt"&gt;-h&lt;/span&gt; demo-aerospike
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output will be similar to&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj7ggowebuuv6p05n5m1z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj7ggowebuuv6p05n5m1z.png" alt="Alt Text" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you go back to Grafana, change 'Cluster Overview' to 'Namespace view' at top left, change the time period (top right) to 'Last 5 minutes' and wait a couple of minutes, your output will be similar to&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdmnq76r6jjpq4fbc5rta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdmnq76r6jjpq4fbc5rta.png" alt="Alt Text" width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tidying Up
&lt;/h2&gt;

&lt;p&gt;To remove the Java client&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete &lt;span class="nt"&gt;-f&lt;/span&gt; aero-client-deployment.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To uninstall the Aerospike stack&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm uninstall demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To delete your EKS cluster&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl delete cluster aero-k8s-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The above gets you up and running with Aerospike on EKS, the AWS Kubernetes service. Fundamentally it is about the steps you need to take to get your EKS cluster running. From that point on (&lt;em&gt;helm repo add&lt;/em&gt; onwards) you can use the same steps against any Kubernetes cluster, be it EKS, GCP or Minikube based.&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>kubernetes</category>
      <category>eks</category>
    </item>
    <item>
      <title>AWS Credentials for EKS</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Mon, 27 Jul 2020 10:06:27 +0000</pubDate>
      <link>https://forem.com/aerospike/aws-credentials-for-eks-3a2i</link>
      <guid>https://forem.com/aerospike/aws-credentials-for-eks-3a2i</guid>
      <description>&lt;p&gt;&lt;em&gt;What you need to know before you can create AWS Kubernetes clusters using the command line&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://eksctl.io/" rel="noopener noreferrer"&gt;eksctl&lt;/a&gt; is the AWS command line utility allowing you to administer (e.g. setup/teardown) your AWS Kubernetes cluster. This article details how you configure the credentials you need to use the service. This article is useful as this is not detailed on the eksctl &lt;a href="https://eksctl.io/" rel="noopener noreferrer"&gt;website&lt;/a&gt; and is non-trivial.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM Overview
&lt;/h2&gt;

&lt;p&gt;Credentials in AWS are managed using &lt;a href="https://aws.amazon.com/iam/" rel="noopener noreferrer"&gt;IAM&lt;/a&gt;  -  AWS Identity and Access Management. Broadly speaking, you create &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_access-management.html" rel="noopener noreferrer"&gt;policies&lt;/a&gt; which are granular aggregations of permissions on AWS objects. You associate these with &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_groups.html" rel="noopener noreferrer"&gt;groups&lt;/a&gt; to which you add &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html" rel="noopener noreferrer"&gt;users&lt;/a&gt;. If a user has been created for programmatic access use, the user will have an &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html" rel="noopener noreferrer"&gt;access key id&lt;/a&gt; and a secret access key which can be stored on disk for use in conjunction with the &lt;a href="https://aws.amazon.com/cli/" rel="noopener noreferrer"&gt;AWS command line&lt;/a&gt; interface. The same mechanism is used by eksctl.&lt;/p&gt;
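&lt;p&gt;For orientation, a policy document is a JSON object of the following general shape. This skeleton is illustrative only - the actions shown are examples, and the full set of permissions you actually need is in the &lt;em&gt;eks.iam.policy.template&lt;/em&gt; file discussed below.&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["eks:CreateCluster", "eks:DescribeCluster"],
      "Resource": "*"
    }
  ]
}
```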

&lt;p&gt;In this article we set up the eksctl account in accordance with the principle of 'least privilege' - the account should have sufficient privileges to execute actions as needed, but no more.&lt;/p&gt;

&lt;p&gt;Below we go through the steps in the above process in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM Policy Setup
&lt;/h2&gt;

&lt;p&gt;The eksctl &lt;a href="https://eksctl.io/" rel="noopener noreferrer"&gt;website&lt;/a&gt; does not detail the set of IAM privileges needed to run eksctl, and trial and error is not recommended. Guidance can, however, be found in issue 204 below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fryzow0wq12z6h8ax6v01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fryzow0wq12z6h8ax6v01.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As this is still somewhat complicated (and incomplete) I'm going to make use of this, but simplify the process for you.&lt;/p&gt;

&lt;p&gt;First of all pull down &lt;a href="https://github.com/aerospike-examples/kubernetes-aws" rel="noopener noreferrer"&gt;https://github.com/aerospike-examples/kubernetes-aws&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 bash
git clone https://github.com/aerospike-examples/kubernetes-aws
cd kubernetes-aws


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The policy you need is in &lt;em&gt;eks.iam.policy.template&lt;/em&gt;. Some permissions, however, are account specific - if you search for the text &lt;em&gt;account-id&lt;/em&gt; in the template you will see a placeholder that needs replacing with your own account id.&lt;/p&gt;

&lt;p&gt;Find your account id by logging into the AWS console. Select 'My Account'.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9vzvjpa5h4lexbhztrey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9vzvjpa5h4lexbhztrey.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will see your account id in the next screen. Copy this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8nvrwt4us6o9x6mrkwkx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8nvrwt4us6o9x6mrkwkx.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the &lt;em&gt;kubernetes-aws&lt;/em&gt; project you just cloned, run&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
./make-policy.sh YOUR_ACCOUNT_ID


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The result will be saved as &lt;em&gt;eks.iam.policy&lt;/em&gt;.&lt;/p&gt;
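
&lt;p&gt;For reference, the substitution the script performs can be sketched as follows. This is a hedged, stand-alone illustration, not the script itself: it assumes the placeholder in the template is the literal text &lt;em&gt;account-id&lt;/em&gt;, and it writes a tiny stand-in template rather than the real one.&lt;/p&gt;

```python
from pathlib import Path

account_id = "123456789012"  # replace with your own 12-digit account id

# tiny stand-in for eks.iam.policy.template (illustrative content only)
Path("eks.iam.policy.template").write_text(
    '{ "Resource": "arn:aws:iam::account-id:role/eks-example" }\n'
)

# replace every occurrence of the placeholder with the account id
template = Path("eks.iam.policy.template").read_text()
Path("eks.iam.policy").write_text(template.replace("account-id", account_id))

print(Path("eks.iam.policy").read_text())
```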

&lt;p&gt;Copy the contents of &lt;em&gt;eks.iam.policy&lt;/em&gt; to the clipboard.&lt;/p&gt;

&lt;p&gt;Select the IAM Service in the AWS console (Services-&amp;gt;IAM) and click 'Policies'.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fktnpjnbpr1f7ijxlwvtk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fktnpjnbpr1f7ijxlwvtk.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, 'Create Policy'. Select 'JSON' rather than 'Visual Editor', remove the JSON you see, and replace it with the contents of &lt;em&gt;eks.iam.policy&lt;/em&gt;. Your screen should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjvr2gayz0sg38rwb9jy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjvr2gayz0sg38rwb9jy4.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now click 'Review Policy'. Give your policy a name e.g. &lt;em&gt;EKS&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp04drubj149ma7x9gqc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp04drubj149ma7x9gqc0.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally click 'Create Policy', bottom right of the above screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM Group Setup
&lt;/h2&gt;

&lt;p&gt;In this section we create an IAM group and add the EKS policy to it. &lt;/p&gt;

&lt;p&gt;Select 'Groups' from the left-hand IAM menu.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2armwrcdn6ykawvgqtra.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2armwrcdn6ykawvgqtra.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click 'Create New Group'. Give your group a name e.g. EKS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5044o6udwkmfa0ysuf2j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5044o6udwkmfa0ysuf2j.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click 'Next Step'. Search for the policy you created and select.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1egnlk56iu864fe5g6r6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1egnlk56iu864fe5g6r6.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click 'Next Step', followed by 'Create Group'. You should see your new group, EKS, appear in the group listing screen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8gjker6mus3vxr2cn7i0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8gjker6mus3vxr2cn7i0.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create IAM User
&lt;/h2&gt;

&lt;p&gt;Now we create a user and associate it with the EKS group. Select 'Users' from the left-hand menu above.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9hxa6xu39kxdruric8cq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9hxa6xu39kxdruric8cq.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click 'Add User'. Give your user a name e.g. EKS and check the 'programmatic access' access type.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqt9df1rhlmavjqppyqk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqt9df1rhlmavjqppyqk9.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click 'Next: Permissions'. 'Add User To Group' will be selected by default. Check the 'EKS' group.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw94qhn3upq4lu56119uv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw94qhn3upq4lu56119uv.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click 'Next: Tags', then 'Next: Review', and finally 'Create User'. You will see the screen below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdh2uiguztwfnq439g1r6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdh2uiguztwfnq439g1r6.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep this screen in your browser - you will need it for the steps below.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS CLI Credential Setup
&lt;/h2&gt;

&lt;p&gt;We are now in a position to cache our credentials on disk so they can be used by the AWS CLI or eksctl.&lt;/p&gt;

&lt;p&gt;You will need the AWS CLI. Installation details may be found at &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the environment in which you will be using the AWS CLI / eksctl, type &lt;em&gt;aws configure&lt;/em&gt; and fill in the access key and secret access key, which you can obtain from the screen above. You will also be asked for the default AWS region you wish to use. If you are curious, your credentials are stored in &lt;em&gt;~/.aws/credentials&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fotjejo7tfahoy4fa5jjn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fotjejo7tfahoy4fa5jjn.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have pixelated my keys as a matter of good practice. I could equally have left them visible, deleted the user immediately after taking the screenshot, and then recreated it; the new secret key would have been completely different.&lt;/p&gt;

&lt;p&gt;Note that you will need to click 'show' to see the secret access key in the screen above. &lt;em&gt;You are only able to do this once&lt;/em&gt;. If you do not record the key for use in the &lt;em&gt;aws configure&lt;/em&gt; step, you will need to request another one. This is not a big problem; see the next section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Access Key / Secret Key access
&lt;/h2&gt;

&lt;p&gt;IAM makes it easy to rotate keys and manage accounts. Having created your user above, you can access it via 'Users' in the IAM menu.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxw6dpy2bfyz4du5m955e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxw6dpy2bfyz4du5m955e.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we select 'EKS' we see the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8mwjrx24k4zltvediy3l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8mwjrx24k4zltvediy3l.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have tabbed to 'Security Credentials' above.&lt;/p&gt;

&lt;p&gt;Note that you can make a set of credentials inactive via 'Make Inactive' and request a new set via 'Create Access Key', which again gives you one-time access to the new secret key. Together these operations support key rotation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fccu7wgpj77k7j6eqdgo6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fccu7wgpj77k7j6eqdgo6.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we showed you how to set up credentials for eksctl in accordance with the best practice of least privilege. In &lt;a href="https://dev.to/aerospike/aerospike-on-eks-aws-k8s-m5b"&gt;https://dev.to/aerospike/aerospike-on-eks-aws-k8s-m5b&lt;/a&gt; we make use of this when detailing how to set up an Aerospike cluster on EKS.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Multi Record Transactions for Aerospike</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Thu, 16 Jul 2020 11:55:35 +0000</pubDate>
      <link>https://forem.com/aerospike/multi-record-transactions-for-aerospike-49ab</link>
      <guid>https://forem.com/aerospike/multi-record-transactions-for-aerospike-49ab</guid>
      <description>&lt;p&gt;Aerospike is a high performance key-value database. It’s aimed at institutions and use-cases that need high throughput ( 100k tps+), with low latency (95% completion in &amp;lt;1ms), while managing large amounts of data (Tb+) with 100% uptime, scalability and low cost.&lt;/p&gt;

&lt;p&gt;Because that’s what we offer, we don’t, at the server level, support multi-record transactions (MRT). If that’s what you need, you will require two-phase locking and two-phase commit (assuming a distributed system). This will slow your system down and will scale non-linearly: if you double your transaction volume, your time spent waiting for locks will more than double. If Aerospike did that it wouldn’t be a high performance database. In fact it would be just like any number of other databases, and what would be the point in that?&lt;/p&gt;

&lt;p&gt;Furthermore, believe it or not, despite the fact that our customers ask for many things, multi-record transactions are not high on the list. This is because, outside of a relational database, they’re less necessary in practice than people think. In an RDBMS you do need MRT, because you shard your insert/update across tables in a non-natural way. In a key-value or key-object database, an insert/update that might span multiple tables is actually a single record change.&lt;/p&gt;

&lt;p&gt;Even the textbook example of a change that supposedly has to be atomic, the transfer of money between two bank accounts, is not actually atomic in the real world. You can see this for yourself if you contemplate the fact that bank transfers are not instantaneous — far from it. That is because bank transfers cross system boundaries — and the credits and debits are not co-ordinated as a distributed transaction.&lt;/p&gt;

&lt;p&gt;This article is however concerned with multi-record transactions and specifically executing them using Aerospike. Although the server will not natively support them, they can be achieved in software via use of the capabilities Aerospike offers.&lt;/p&gt;

&lt;p&gt;Although, as discussed above, the need for MRT in key value databases is more limited than you might expect, there may well still be use cases that demand it. An example might be a workflow system - taking items off a queue and dispatching to other queues. The transfer of a work item needs to be atomic — you don’t want a work item to potentially be processed twice, or not at all, due to a transactional failure.&lt;/p&gt;

&lt;p&gt;To that end, I’ve put together &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn"&gt;multi-record-txn&lt;/a&gt;, a package that supports atomic multi record updates in Aerospike. At the heart of this is our ability to create locks using the primitives we offer. A record can be created with a CREATE_ONLY flag and this is used for locking purposes — if a lock record for an object already exists, CREATE_ONLY will fail.&lt;/p&gt;
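
&lt;p&gt;To make the idea concrete, here is a toy in-memory sketch of the create-only lock pattern, with a dict standing in for the lock record set. The names are illustrative; this is not the multi-record-txn API itself:&lt;/p&gt;

```python
# Toy sketch of CREATE_ONLY locking: creating the lock "record" fails if
# it already exists, so only one writer can hold it. A dict stands in for
# Aerospike; names are illustrative, not the multi-record-txn API.
class LockExists(Exception):
    pass

locks = {}  # stand-in for the lock record set

def acquire_lock(record_key, txn_id):
    if record_key in locks:           # a CREATE_ONLY put would fail here
        raise LockExists(record_key)
    locks[record_key] = txn_id

def release_lock(record_key):
    locks.pop(record_key, None)

acquire_lock("account:123", "txn-1")
try:
    acquire_lock("account:123", "txn-2")   # second writer is refused
except LockExists:
    print("lock already held")
release_lock("account:123")
```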

&lt;p&gt;We build on this by storing the state of our records prior to update in a transaction record, which is deleted once the updates are complete. Following that, we release our locks.&lt;/p&gt;

&lt;p&gt;Rollback, if needed, is accomplished by restoring the stored values. We supply calls for rolling back records of more than a certain age, and for releasing locks, similarly of a supplied longevity. Full details can be found in the project &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn"&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If it’s that easy, why don’t you put it into the product? Good question. The API comes with &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn#caveats"&gt;caveats&lt;/a&gt;, the principal one being that dirty reads are possible, i.e. reads of values that may be rolled back. Specifically, we are not offering isolation. The API does, however, offer the ability to lock records prior to update and to check locks, so in principle, if you were to insist on all gets being preceded by locks, full isolation would be achieved. That said, you wouldn’t have a high performance database any more. The correct thing to do, then, is to use it judiciously, as you would any tool. Also offered is the ability to do optimistic locking by supplying the generation (see the &lt;a href="https://www.aerospike.com/docs/guide/FAQ.html"&gt;Aerospike FAQ&lt;/a&gt;) of the record, with transaction failure occurring if the generation count does not match what is expected. This keeps you safe when other database users may non-transactionally update records, without incurring the overhead of locking.&lt;/p&gt;
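
&lt;p&gt;The generation check can be pictured with another in-memory sketch: refuse the write when the caller's expected generation no longer matches the record's current one. Again, the structure is illustrative, not the real API:&lt;/p&gt;

```python
# Illustrative optimistic-locking check: each write must quote the
# generation it last read; a stale writer is rejected. An in-memory dict
# stands in for Aerospike and its per-record generation counter.
class GenerationMismatch(Exception):
    pass

db = {}  # key: (value, generation)

def put_with_generation(key, value, expected_gen):
    current_gen = db.get(key, (None, 0))[1]
    if current_gen != expected_gen:
        raise GenerationMismatch((key, current_gen, expected_gen))
    db[key] = (value, current_gen + 1)

put_with_generation("acct", 100, expected_gen=0)
put_with_generation("acct", 150, expected_gen=1)
try:
    put_with_generation("acct", 999, expected_gen=1)  # stale writer
except GenerationMismatch:
    print("stale generation rejected")
```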

&lt;p&gt;It is worth saying at this point, that the value of this API is much greater if using Aerospike Enterprise ( rather than Community Edition ) and in particular making use of &lt;a href="https://www.aerospike.com/docs/architecture/consistency.html"&gt;Strong Consistency&lt;/a&gt;. Strong consistency gives you the guarantee that duplicate records in your database ( necessary for resilience purposes ) will not ever experience divergence. If you do not have this guarantee ( which very few databases in our performance range offer ) then there is potential for this to occur in the event of network partitions and process crashes. Divergence of records here would mean locks or transaction records being lost in a sub-cluster experiencing a partition event ( or process crash ). Strong Consistency gives you a guarantee this will not happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Multi record put that will succeed or fail atomically&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6xjx66lxmeqc2gqo72m6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6xjx66lxmeqc2gqo72m6.png" alt="Alt Text" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Atomic multi-record incorporating generation check&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fd99squw9ory892yxpyk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fd99squw9ory892yxpyk6.png" alt="Alt Text" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So what happens if my transaction fails part way through?&lt;/p&gt;

&lt;h2&gt;
  
  
  Rollback
&lt;/h2&gt;

&lt;p&gt;If they can, transactions will unwind themselves. This happens on a lock-acquire or generation-check exception.&lt;/p&gt;

&lt;p&gt;This may not be possible in the event of, for example, a network failure. For that we have the rollback call. This allows (see below) rollback of all transactions older than a user-specified age, together with any orphan locks, i.e. locks not associated with transactions (the absence of a transaction record means the transaction completed, but the transaction process failed while unwinding the locks).&lt;/p&gt;
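
&lt;p&gt;Structurally, the age-based rollback can be sketched like this (in-memory stand-ins throughout; the real call operates against Aerospike sets, and the field names here are invented for illustration):&lt;/p&gt;

```python
# Sketch of age-based rollback: for each transaction record older than the
# cutoff, restore the saved prior values and drop its locks. Dicts stand in
# for Aerospike sets; field names are illustrative.
import time

def rollback_expired(records, txns, locks, max_age_seconds, now=None):
    now = time.time() if now is None else now
    for txn_id, txn in list(txns.items()):
        if now - txn["started"] > max_age_seconds:
            records.update(txn["saved"])      # restore pre-update values
            for key in txn["saved"]:
                locks.pop(key, None)          # release the locks
            del txns[txn_id]

records = {"a": 42}                                   # partially updated
txns = {"t1": {"started": 0.0, "saved": {"a": 7}}}    # pre-update state
locks = {"a": "t1"}
rollback_expired(records, txns, locks, max_age_seconds=60, now=120.0)
print(records)
# prints: {'a': 7}
```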

&lt;p&gt;&lt;em&gt;Rollback of expired transactions / locks&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpnuitcyvq0i30gb7zvq0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpnuitcyvq0i30gb7zvq0.png" alt="Alt Text" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;But how do I know it’s sound? Test it, of course. The &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn"&gt;README&lt;/a&gt; goes into some detail on this point, and the test classes even more so. If you think something has been missed, let me know.&lt;/p&gt;

&lt;h2&gt;
  
  
  More Detail
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn/blob/master/FAQ.md"&gt;FAQ&lt;/a&gt; covers a number of questions asked to date. Please do read it, along with the &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn#caveats"&gt;caveats&lt;/a&gt; section of the &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn"&gt;README&lt;/a&gt; (in fact, all of the README), if you are considering using this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further direction
&lt;/h2&gt;

&lt;p&gt;An obvious next step to take would be to incorporate two-phase locking i.e. supporting shared read locks as well as exclusive write locks in order to reduce contention.&lt;/p&gt;

&lt;p&gt;Another possibility is that the locks are on the records themselves, rather than being separate records — this might optimize single record use.&lt;/p&gt;

&lt;p&gt;Finally, in this &lt;a href="http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html"&gt;post&lt;/a&gt;, Martin Kleppmann notes that this method may be problematic if there is the possibility that transactions can pause for unexpectedly long periods (exceeding the period you would reasonably expect the transaction to complete in). If you are considering using this API, you should consider the points he raises. His suggestion of ‘fencing tokens’ is an option for incorporating into this API if there is further interest.&lt;/p&gt;

&lt;p&gt;To that end, please let me know if you use this API, and then I’ll know if there’s appetite for more.&lt;/p&gt;

&lt;p&gt;Any questions/comments — please feed back through the GitHub &lt;a href="https://github.com/aerospike-examples/atomicMultiRecordTxn/issues"&gt;issues&lt;/a&gt; facility.&lt;/p&gt;

</description>
      <category>aerospike</category>
    </item>
    <item>
      <title>Record Aggregation in Aerospike For Performance and Economy</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Fri, 10 Jul 2020 07:37:29 +0000</pubDate>
      <link>https://forem.com/aerospike/record-aggregation-in-aerospike-for-performance-and-economy-4lk6</link>
      <guid>https://forem.com/aerospike/record-aggregation-in-aerospike-for-performance-and-economy-4lk6</guid>
      <description>&lt;p&gt;A strong differentiator for Aerospike vs other key value databases is its DRAM economy. Every object has a 64 byte DRAM footprint no matter what the size of the actual object is. You can manage a billion objects using only 128Gb of DRAM, allowing for 2x replication.&lt;/p&gt;

&lt;p&gt;Great news! A billion is a pretty big number and 3 * 512GB nodes gets me to 12bn. Within reason I can have as many objects as I like. I should start making them right away with no further thought required.&lt;/p&gt;

&lt;p&gt;Hold your horses, cowboy. It might not be as simple as that.&lt;/p&gt;

&lt;p&gt;For instance, what if your objects are very small? Worst case, they’re so small that they’re of the order of 64 bytes, so now your memory footprint is similar to your disk footprint. It might even be the case that your memory to disk ratio is such that your DRAM is full when your disk is half empty. In a bare metal situation, buy less disk / more DRAM for sure, but you might be in the cloud where you’re stuck with certain DRAM / storage ratios. Or maybe these machines were handed to you for re-purposing.&lt;/p&gt;

&lt;p&gt;A technique informally known as blocking can help you. You store your objects within larger objects. Your API turns this into an implementation detail. Blocking can reduce your memory footprint, helping with the small object use case.&lt;/p&gt;

&lt;p&gt;Aerospike lets you do this by offering a comprehensive &lt;a href="https://www.aerospike.com/docs/guide/cdt-list-ops.html"&gt;List&lt;/a&gt; and &lt;a href="https://www.aerospike.com/docs/guide/cdt-map-ops.html"&gt;Map&lt;/a&gt; API which allows you to place objects within objects as well as retrieving them in an efficient manner. List structures can be used for storing structured data such as time series of a regular frequency while maps can be used to reduce your key space. The API offered is a distinguishing feature of Aerospike when contrasted with other key value databases.&lt;/p&gt;
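
&lt;p&gt;The idea can be illustrated with a toy in-memory sketch, where a nested dict stands in for an Aerospike record holding a Map bin. The function names here are illustrative, not part of the Aerospike API:&lt;/p&gt;

```python
# Toy illustration of "blocking": many small logical objects stored as
# entries of one larger container object. A nested dict stands in for an
# Aerospike record with a Map bin; names are illustrative only.
containers = {}

def put_device(container_key, device_id, metadata):
    containers.setdefault(container_key, {})[device_id] = metadata

def get_device(container_key, device_id):
    # analogous to a server-side map getByKey: only the requested entry
    # comes back, not the whole container
    return containers[container_key][device_id]

put_device(17, "dev-001", {"firmware": "1.2.0"})
put_device(17, "dev-002", {"firmware": "1.1.9"})
print(get_device(17, "dev-001"))
```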

&lt;p&gt;Let’s look again at our example. Suppose your key space is composed of device ids, and these are fundamentally &lt;a href="https://en.wikipedia.org/wiki/Universally_unique_identifier"&gt;UUIDs&lt;/a&gt; — 128 bit numbers or 32 digit hexadecimal numbers. Let’s say you anticipate you may need to store as many as 15bn of these, but each record is only around 200 bytes. Your DRAM requirement with Aerospike would be of the order of&lt;/p&gt;

&lt;p&gt;&lt;em&gt;64(bytes) * 15bn * 2(replication) = ~2Tb&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Not the end of the world, but you could do better by working smarter.&lt;/p&gt;

&lt;p&gt;Assume also that we want to keep our physical object size below 128kb — a good &lt;a href="https://discuss.aerospike.com/t/faq-write-block-size/681"&gt;starting point&lt;/a&gt; for optimal block size on a flash device, which is recommended for Aerospike. We can fit 655 of our 200-byte objects into 128kb.&lt;br&gt;
If each physical object contains 655 actual objects, we require 15bn / 655 = 22.9m container object keys. The question then is how we map from a device id to the container (physical) object key, and how we reliably look up a logical object inside the container object. The answer is that we do this using &lt;a href="https://en.wikipedia.org/wiki/Mask_(computing)"&gt;bit-masking&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A 128 bit key space can be converted into a key space of size 2, 4, …, 65536, …, 2²⁰, … keys by AND-ing the key with a binary number composed of 1, 2, …, 16, …, 20, … leading ones followed by trailing zeros. For our example we need a bit mask whose key space is the first power of two above our required number of keys, which can be calculated as&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ceiling(log(22.9 *10⁶) / log(2)) = 25 bits&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This gives us a key space of size 2²⁵ = ~33.5m so we’ve got our maths correct.&lt;/p&gt;

&lt;p&gt;Let’s look at how we make use of this in Aerospike&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;This function stores our device metadata inside a physical object. As described, the physical object key is derived using bit masking. Note this is efficient from a network capacity point of view — only the metadata gets sent across the network, not the full physical object.&lt;/p&gt;

&lt;p&gt;We also need to see how to retrieve our object&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The construction of the physical key is as before. This time we use the &lt;em&gt;getByKey&lt;/em&gt; operation to retrieve the device metadata.&lt;/p&gt;

&lt;p&gt;An important point to note is that only the metadata requested is transmitted across the network not the entire physical object. This consideration applies in general to the calls offered by the List/Map API. This is what we mean by ‘economy’ in the article title.&lt;/p&gt;

&lt;p&gt;Finally, a code snippet showing how to calculate the bit mask using the ‘natural’ inputs.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
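
&lt;p&gt;As a hedged sketch, the sizing arithmetic and the device-id-to-container-key mapping described above can be written as follows. The helper name is illustrative, and integer division by a power of two stands in for the AND-and-shift:&lt;/p&gt;

```python
import math
import uuid

total_objects = 15_000_000_000   # 15bn logical objects
objects_per_block = 655          # 200-byte objects per 128kb block

# container keys needed, and the first power of two that covers them
container_keys_needed = math.ceil(total_objects / objects_per_block)
mask_bits = math.ceil(math.log2(container_keys_needed))
print(container_keys_needed, mask_bits, 2 ** mask_bits)
# prints: 22900764 25 33554432

def container_key(device_id, bits=mask_bits):
    # keep the leading `bits` bits of the 128-bit UUID; dividing by
    # 2 ** (128 - bits) is equivalent to AND-ing with a mask of `bits`
    # leading ones and then dropping the trailing zeros
    return device_id.int // 2 ** (128 - bits)

print(container_key(uuid.UUID("123e4567-e89b-12d3-a456-426614174000")))
```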


&lt;p&gt;The net benefit of all the above is that the memory footprint will be reduced in this case to &lt;/p&gt;

&lt;p&gt;&lt;em&gt;2²⁵ (keys) * 64 (DRAM cost per record) * 2 (rep factor) = 4Gb&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;from &lt;/p&gt;

&lt;p&gt;&lt;em&gt;15bn * 64 (DRAM cost per record) * 2 (rep factor) = ~1.8TB&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The reduction factor is 447, slightly less than the '655' quoted above as on average, our 128kb blocks will not be completely filled.&lt;/p&gt;

&lt;p&gt;Before we close, worth noting our Enterprise only &lt;a href="https://www.aerospike.com/docs/operations/configure/namespace/index/index.html#flash-index"&gt;‘all flash’&lt;/a&gt; capability which allows both index and data to be placed on disk thus reducing DRAM usage to very low levels. This was developed specifically with the use cases of small objects and/or very large numbers of objects (~10¹² = 1 trillion ) in mind. It will engender higher levels of latency (~5ms vs ~1ms at the 95th percentile ) but it’s still competitive vs any other database out there.&lt;/p&gt;

&lt;p&gt;The above solution is a good example of a differentiating feature, our &lt;a href="https://www.aerospike.com/docs/guide/cdt-list-ops.html"&gt;List&lt;/a&gt; and &lt;a href="https://www.aerospike.com/docs/guide/cdt-map-ops.html"&gt;Map&lt;/a&gt; API , providing a distinguishing optimisation under constraints. The technique of ‘blocking’ can also be made use of for time series data which I hope to explore in a future article.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cover image with thanks to &lt;a href="https://unsplash.com/@nananadolgo"&gt;Nana Smirnova&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aerospike</category>
    </item>
    <item>
      <title>Ansible For Aerospike</title>
      <dc:creator>Ken Tune</dc:creator>
      <pubDate>Wed, 01 Jul 2020 07:59:31 +0000</pubDate>
      <link>https://forem.com/aerospike/ansible-for-aerospike-43ln</link>
      <guid>https://forem.com/aerospike/ansible-for-aerospike-43ln</guid>
      <description>&lt;p&gt;Working as a Solution Architect for Aerospike I often have occasion to create Aerospike clusters on the fly. These might be for benchmarking, demonstration purposes or investigation of a query from a prospect or customer.&lt;/p&gt;

&lt;p&gt;Although Aerospike is straightforward to &lt;a href="https://www.aerospike.com/docs/operations/index.html"&gt;configure and install&lt;/a&gt;, the relatively small number of steps you have to go through does begin to add up in aggregate. This becomes more significant when you might add in the need to configure TLS, encryption on disk, strong consistency, java benchmarking client or rack awareness to name a few of the options available.&lt;/p&gt;

&lt;p&gt;One possibility is to make use of &lt;a href="https://github.com/aerospike/aerospike-kubernetes/tree/master/helm"&gt;Aerospike Kubernetes&lt;/a&gt; but if what you’re doing demands high levels of performance, casual use will steer you towards using VMs.&lt;/p&gt;

&lt;p&gt;To that end I’ve put together &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;aerospike-ansible&lt;/a&gt; a collection of Ansible scripts allowing configurable automation of the build of Aerospike clusters. The focus is on doing this on AWS as that’s the platform that Ansible best supports, but with a little &lt;a href="https://docs.ansible.com/ansible/2.3/intro_inventory.html"&gt;inventory&lt;/a&gt; nous you can leverage these scripts on bare metal or alternate cloud provider environments*.&lt;/p&gt;

&lt;p&gt;The scripts go beyond simply building clusters. The configurable deployment of Aerospike &lt;a href="https://github.com/aerospike/aerospike-client-java/tree/master/benchmarks"&gt;Java benchmarking&lt;/a&gt; clients and the Prometheus/Grafana based &lt;a href="https://github.com/aerospike/aerospike-monitoring"&gt;Aerospike Monitoring&lt;/a&gt; is also handled by the repository.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/aerospike-examples/aerospike-ansible"&gt;README&lt;/a&gt; goes into full detail, but key supported options are&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instance type&lt;/li&gt;
&lt;li&gt;Hosts per AZ&lt;/li&gt;
&lt;li&gt;Community/Enterprise&lt;/li&gt;
&lt;li&gt;Encryption at rest (*)&lt;/li&gt;
&lt;li&gt;TLS&lt;/li&gt;
&lt;li&gt;Strong Consistency — including roster setup with rack awareness (*)&lt;/li&gt;
&lt;li&gt;Prometheus/Grafana monitoring stack&lt;/li&gt;
&lt;li&gt;Aerospike Version&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(*) Aerospike Enterprise Only&lt;/p&gt;

&lt;p&gt;There’s also a &lt;a href="https://youtu.be/fWKACehyJHc"&gt;video&lt;/a&gt; showing end to end setup of the full Aerospike cluster/client/monitoring stack on a fresh Vagrant instance. It’s around 25 minutes long but it also goes through possible wrinkles relating to installation of Ansible plus &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html"&gt;IAM&lt;/a&gt; setup. Colleagues tell me these scripts take as little as 5 minutes to use for the first time, even with zero knowledge of Ansible.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/fWKACehyJHc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I’m envisaging this being helpful to those looking to get Aerospike up and running for the first time. It allows easy setup for proof of concept and development work, while providing a production grade facility. You can tear it down just as easily as you stood it up, keeping costs low.&lt;/p&gt;

&lt;p&gt;Providing additional recipes for operational procedures such as rolling upgrades or cluster migration is very much on the cards, so I hope to get further assets and blog posts out in this area.&lt;/p&gt;

&lt;p&gt;Any questions/comments — please feed back through the GitHub &lt;a href="https://github.com/aerospike-examples/aerospike-ansible/issues"&gt;issues&lt;/a&gt; facility.&lt;/p&gt;

&lt;p&gt;[*] See the README for Google Cloud Platform details&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>ansible</category>
      <category>devops</category>
      <category>nosql</category>
    </item>
  </channel>
</rss>
