<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tim F</title>
    <description>The latest articles on Forem by Tim F (@tim-aero).</description>
    <link>https://forem.com/tim-aero</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1063270%2Fdfac4428-f701-4876-99f9-dfca04be77bc.png</url>
      <title>Forem: Tim F</title>
      <link>https://forem.com/tim-aero</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tim-aero"/>
    <language>en</language>
    <item>
      <title>Using Aerospike policies (correctly!)</title>
      <dc:creator>Tim F</dc:creator>
      <pubDate>Thu, 29 Feb 2024 23:28:25 +0000</pubDate>
      <link>https://forem.com/aerospike/using-aerospike-policies-correctly-j06</link>
      <guid>https://forem.com/aerospike/using-aerospike-policies-correctly-j06</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TcIUXDEg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/tim-faulkes/using-aerospike-policies%2520copy_1709244867238.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TcIUXDEg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/tim-faulkes/using-aerospike-policies%2520copy_1709244867238.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Developers familiar with Aerospike will know that the vast majority of the API calls take a &lt;a href="https://aerospike.com/docs/server/guide/policies.html"&gt;“policy”&lt;/a&gt; as a parameter. This policy is optional but can have a significant impact on the application. There are a number of parameters in policies that can be confusing to &lt;a href="https://aerospike.com/developer/"&gt;developers&lt;/a&gt;, and even creating the policies is frequently done incorrectly. This blog will attempt to clear up some of this confusion.&lt;/p&gt;

&lt;h2&gt;What are policies?&lt;/h2&gt;

&lt;p&gt;Each API call in Aerospike typically takes a policy as a parameter. For example, consider a simple put in Java:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"testSet"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; 
           &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Bin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Tim"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"age"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;312&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first parameter here is the policy; in this case, it is an instance of the &lt;code&gt;WritePolicy&lt;/code&gt; class. If &lt;code&gt;null&lt;/code&gt; is specified, then the default policy is used. Policies typically contain fields that fall into one of three categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Networking control&lt;/strong&gt;: How should the API call behave if something goes wrong? For example, if the server cannot be reached, should the call be retried on a different server?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Application behavior&lt;/strong&gt;: Parameters that control how the API call should behave in various application-driven scenarios. For example, Aerospike does an “upsert” operation by default on a &lt;code&gt;put&lt;/code&gt;, selecting to either update or insert the record depending on whether it already exists or not. However, the application might require the record to be inserted only, failing if the record already exists in the database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Filtering&lt;/strong&gt;: Aerospike supports filters on operations, allowing almost all operations to become conditional. For example, an operation can be specified to retrieve a record if the &lt;code&gt;amount&lt;/code&gt; bin in that record is greater than a thousand.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These three categories are all in the same policy object, with the networking control defined on the &lt;code&gt;Policy&lt;/code&gt; class, which is a superclass of other policies, such as &lt;code&gt;WritePolicy&lt;/code&gt;. Note that the &lt;code&gt;Policy&lt;/code&gt; class is used for read operations without a subclass; there is no &lt;code&gt;ReadPolicy&lt;/code&gt; class.&lt;/p&gt;

&lt;h2&gt;Networking control policies&lt;/h2&gt;

&lt;p&gt;The policy superclass defines a number of parameters related to networking. These are critical to optimizing application behavior but are often poorly understood. Let’s take a look at these and the interaction between them.&lt;/p&gt;

&lt;p&gt;These networking fields are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;connectTimeout&lt;/li&gt;
&lt;li&gt;maxRetries&lt;/li&gt;
&lt;li&gt;replica&lt;/li&gt;
&lt;li&gt;sleepBetweenRetries&lt;/li&gt;
&lt;li&gt;socketTimeout&lt;/li&gt;
&lt;li&gt;timeoutDelay&lt;/li&gt;
&lt;li&gt;totalTimeout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A diagram is helpful in understanding the interaction between these fields:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kLVBFb02--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/tim-faulkes/using-aerospike-policies-correctly-api-call-sequence_1709244867669.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kLVBFb02--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/tim-faulkes/using-aerospike-policies-correctly-api-call-sequence_1709244867669.png" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: API call sequence&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;The diagram shows a timeline of what happens when an API call is submitted to the Aerospike Client Library (ACL) and can be read from left to right. &lt;/p&gt;

&lt;p&gt;The first thing that happens is the ACL needs a connection to the server which holds the data. If this is a simple get or put, this will be a connection to a single server, but if it’s a complex operation such as a batch or query, multiple servers might be involved. The ACL has connection pools to each server node, so acquiring this connection is typically a matter of borrowing one from the pool, which is almost instantaneous. However, if the pool has no available connections, a new connection must be established, which can take some time. This is especially true if the connection is over TLS as the handshake between the client and server on a TLS connection can be slow. &lt;/p&gt;

&lt;p&gt;This connection establishment process has its own timeout field, &lt;code&gt;connectTimeout&lt;/code&gt;. This field is in milliseconds and covers both the connection establishment and user authentication (if any). By default, it is set to zero, which is usually a reasonable default – this uses the lesser of the &lt;code&gt;socketTimeout&lt;/code&gt; and &lt;code&gt;totalTimeout&lt;/code&gt;, or 2,000 milliseconds if both are zero. If the application can tolerate extra time when a connection is established, especially when using TLS, then this can be set to a non-zero value. For example, if you want a large &lt;code&gt;connectTimeout&lt;/code&gt; and a small &lt;code&gt;socketTimeout&lt;/code&gt; to allow for protracted TLS handshakes but fast application responses, setting this value to something like 5,000 milliseconds works well.&lt;/p&gt;
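&lt;p&gt;The resolution rule above can be sketched as a small helper function. This is an illustrative model only – the real logic lives inside the client library, and the class and method names here are hypothetical:&lt;/p&gt;

```java
// Sketch of the connectTimeout resolution rule described above.
// All values are in milliseconds; zero means "not set".
public class ConnectTimeoutRule {
    public static int effectiveConnectTimeout(int connectTimeout, int socketTimeout, int totalTimeout) {
        if (connectTimeout > 0) {
            return connectTimeout;              // an explicit value wins
        }
        if (socketTimeout == 0 && totalTimeout == 0) {
            return 2000;                        // fallback when both are zero
        }
        if (socketTimeout == 0) return totalTimeout;
        if (totalTimeout == 0) return socketTimeout;
        return Math.min(socketTimeout, totalTimeout); // lesser of the two
    }
}
```

&lt;p&gt;For example, with zeroes everywhere, the effective connect timeout resolves to 2,000 milliseconds.&lt;/p&gt;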

&lt;p&gt;Once a connection has been retrieved, the API call is sent to the server. Two timeouts come into play at this point: the &lt;code&gt;socketTimeout&lt;/code&gt; and the &lt;code&gt;totalTimeout&lt;/code&gt;. Both are in milliseconds. The &lt;code&gt;socketTimeout&lt;/code&gt; controls how long the ACL waits for a response to a single attempt before retrying or failing, while the &lt;code&gt;totalTimeout&lt;/code&gt; specifies the maximum time the whole call can take, including retries. If the time remaining in the &lt;code&gt;totalTimeout&lt;/code&gt; is less than the &lt;code&gt;socketTimeout&lt;/code&gt;, then the remaining time is used as the &lt;code&gt;socketTimeout&lt;/code&gt; for that attempt. &lt;/p&gt;

&lt;p&gt;In Figure 1, the &lt;code&gt;socketTimeout&lt;/code&gt; is less than the &lt;code&gt;totalTimeout&lt;/code&gt;, so the ACL waits until one of the following occurs: &lt;code&gt;socketTimeout&lt;/code&gt; milliseconds have elapsed, the connection returns a timeout, or the call succeeds. If the call was unsuccessful in this period, the ACL checks the &lt;code&gt;maxRetries&lt;/code&gt; setting to see if any retries remain. For reads, this value defaults to two, giving it a total of three tries – the initial attempt plus the two retries. For writes, this value defaults to zero, so no retries are attempted.&lt;/p&gt;

&lt;p&gt;Figure 1 depicts a read with the default two retries, so once the original &lt;code&gt;socketTimeout&lt;/code&gt; expires, the API call does not return to the application. Instead, the ACL waits for &lt;code&gt;sleepBetweenRetries&lt;/code&gt; milliseconds and then submits the call again. &lt;/p&gt;

&lt;p&gt;This second call may or may not go to the same server. This is controlled by the &lt;code&gt;replica&lt;/code&gt; policy setting, which defaults to &lt;code&gt;SEQUENCE&lt;/code&gt;. With &lt;code&gt;SEQUENCE&lt;/code&gt;, the first attempt is made to the node containing the master record, and on failure, the retry is made to a replica. This can be very useful in a read-sensitive use case. For example, if the &lt;code&gt;socketTimeout&lt;/code&gt; is set to a short interval (say, 20 milliseconds) and retries are enabled with a replica policy of &lt;code&gt;SEQUENCE&lt;/code&gt;, then even if the master node is busy and cannot respond in time, or has fallen over but the cluster has not yet detected this, the retry will go to the replica and may still return data within the application’s SLA.&lt;/p&gt;
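&lt;p&gt;The &lt;code&gt;SEQUENCE&lt;/code&gt; behavior can be pictured as indexing into the partition's node list by attempt number. This is a deliberately simplified model – the real client also accounts for node health and the current partition map:&lt;/p&gt;

```java
// Simplified model of Replica.SEQUENCE node selection: the first attempt
// targets the master node, and each retry moves to the next replica,
// wrapping around if retries outnumber the replicas.
public class SequenceReplica {
    // nodes[0] holds the master copy of the partition; the rest are replicas.
    public static String nodeForAttempt(String[] nodes, int attempt) {
        return nodes[attempt % nodes.length];
    }
}
```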

&lt;p&gt;If this second call also times out after &lt;code&gt;socketTimeout&lt;/code&gt; milliseconds, the ACL again sleeps for &lt;code&gt;sleepBetweenRetries&lt;/code&gt; milliseconds and then makes the second retry (the third attempt overall).&lt;/p&gt;
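&lt;p&gt;Putting these pieces together, the retry timeline can be simulated to see how many attempts actually fit inside the &lt;code&gt;totalTimeout&lt;/code&gt;. The sketch below is a simplified model (hypothetical names) that assumes every attempt times out:&lt;/p&gt;

```java
// Simplified simulation of the retry timeline: each attempt waits up to
// min(socketTimeout, time remaining in totalTimeout), and the call fails
// once totalTimeout elapses, even if retries remain.
public class RetryTimeline {
    public static int attemptsBeforeTotalTimeout(int socketTimeout, int totalTimeout,
                                                 int sleepBetweenRetries, int maxRetries) {
        int elapsed = 0;
        int attempts = 0;
        while (attempts < maxRetries + 1 && elapsed < totalTimeout) {
            attempts++;
            elapsed += Math.min(socketTimeout, totalTimeout - elapsed); // per-attempt wait
            if (elapsed >= totalTimeout) {
                break;                        // total budget exhausted
            }
            elapsed += sleepBetweenRetries;   // pause before the next retry
        }
        return attempts;
    }
}
```

&lt;p&gt;With a &lt;code&gt;socketTimeout&lt;/code&gt; of 100 ms, a &lt;code&gt;totalTimeout&lt;/code&gt; of 250 ms and two retries, only the first two attempts get the full 100 ms; the third is cut short at the 250 ms mark.&lt;/p&gt;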

&lt;p&gt;Let’s take a look at where this leaves us in Figure 1:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v9THsB90--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/tim-faulkes/using-aerospike-policies-correctly-figure-1_1709244868414.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v9THsB90--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/tim-faulkes/using-aerospike-policies-correctly-figure-1_1709244868414.png" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ACL is about to start the second retry, but at this point, the time remaining in the &lt;code&gt;totalTimeout&lt;/code&gt; is less than the &lt;code&gt;socketTimeout&lt;/code&gt;. Hence, the API call will time out when the remaining portion of the &lt;code&gt;totalTimeout&lt;/code&gt; elapses, before a full &lt;code&gt;socketTimeout&lt;/code&gt; has passed. &lt;/p&gt;

&lt;p&gt;Note that once the &lt;code&gt;totalTimeout&lt;/code&gt; has elapsed, the client API call fails, even if further retries are available. &lt;/p&gt;

&lt;h2&gt;Timeout delay&lt;/h2&gt;

&lt;p&gt;The only timeout setting not covered in the above is the &lt;code&gt;timeoutDelay&lt;/code&gt; setting. This is not strictly related to the API behavior, but rather to the efficiency of the system. As the above shows, when the ACL submits the call to the server node and no response is received within &lt;code&gt;socketTimeout&lt;/code&gt; milliseconds, either the call is abandoned or a retry is performed. But what happens to the connection on which that original request was placed?&lt;/p&gt;

&lt;p&gt;By default, that connection is closed rather than being placed back in the connection pool. This is because it is possible that the server simply did not respond in time and will respond some time later. If the connection were reused instead of being closed, that late response could be mistaken for the response to whichever call reuses the connection next, corrupting that call’s communication. &lt;/p&gt;

&lt;p&gt;However, as discussed above, connection establishment can be expensive. Closing connections typically means that this connection must be re-established later, which can impact the application. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;timeoutDelay&lt;/code&gt; parameter specifies a time interval in milliseconds for those connections to be kept alive. If the server responds within that timeout, the socket is drained and then returned to the connection pool. If the timeout is exceeded, then the connection is closed. &lt;/p&gt;

&lt;p&gt;Note that there is a cost of using the &lt;code&gt;timeoutDelay&lt;/code&gt; as the ACL must keep track of which connections have timed out. It is useful in situations where timeouts may occur frequently (for example, aggressive read settings) and connection establishment is expensive.&lt;/p&gt;

&lt;h2&gt;Default policies&lt;/h2&gt;

&lt;p&gt;Now that the timeout settings are understood, we can discuss default policies. The ACL defines default values for the various policy attributes, and users can customize these defaults once to suit their application instead of repeatedly setting the same values throughout the application code. Default policies are set up when the ACL is created and are used whenever &lt;code&gt;null&lt;/code&gt; is passed as the policy parameter in an API call. Most often, the default policies are customized with the network settings discussed above. &lt;/p&gt;

&lt;p&gt;Here is an example which creates a default &lt;code&gt;writePolicy&lt;/code&gt; with a &lt;code&gt;socketTimeout&lt;/code&gt; of 2,000 milliseconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;WritePolicy&lt;/span&gt; &lt;span class="n"&gt;writePolicyDefault&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WritePolicy&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;writePolicyDefault&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;socketTimeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;ClientPolicy&lt;/span&gt; &lt;span class="n"&gt;clientPolicy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ClientPolicy&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;clientPolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writePolicyDefault&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;writePolicyDefault&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="nc"&gt;IAerospikeClient&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AerospikeClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clientPolicy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"127.0.0.1"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the &lt;code&gt;writePolicyDefault&lt;/code&gt; is set on the &lt;code&gt;ClientPolicy&lt;/code&gt; instance before the ACL is created. The ACL constructor takes a copy of the default policies, so they cannot be changed after the client has been created.&lt;/p&gt;

&lt;p&gt;After executing this code, passing &lt;code&gt;null&lt;/code&gt; to a write API call will result in the ACL retrieving and using this &lt;code&gt;writePolicyDefault&lt;/code&gt;. Hence, the following two calls are identical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"joe"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getWritePolicyDefault&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"joe"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Creating policies&lt;/h2&gt;

&lt;p&gt;Timeout settings are typically set on default policies and reused multiple times across the application. However, application-level settings are done as needed by the application on a per-call basis. Consider a check-and-set scenario where a record has been read from the database, changes made, and then those changes pushed back to the database, but only if the record has not been changed since it was read:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Record&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="nc"&gt;WritePolicy&lt;/span&gt; &lt;span class="n"&gt;writePolicy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getWritePolicyDefault&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generation&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generationPolicy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerationPolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;EXPECT_GEN_EQUAL&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"joe"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code attempts to meet the requirement, getting the &lt;code&gt;writePolicyDefault&lt;/code&gt; from the ACL and then setting the &lt;code&gt;generation&lt;/code&gt; and &lt;code&gt;generationPolicy&lt;/code&gt; correctly. However, this code is very dangerous and will create issues in the application. There is just one &lt;code&gt;writePolicyDefault&lt;/code&gt; on the ACL, and the &lt;code&gt;getWritePolicyDefault()&lt;/code&gt; call returns a reference to that object, not a copy of it. Hence, setting the generation values on that policy will affect all calls on all threads that use the default policy, almost certainly causing many of them to fail with an exception.&lt;/p&gt;

&lt;p&gt;Another common pattern to solve this problem is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;WritePolicy&lt;/span&gt; &lt;span class="n"&gt;writePolicy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WritePolicy&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generation&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generationPolicy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerationPolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;EXPECT_GEN_EQUAL&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"joe"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this code is an improvement on the previous snippet, it, too, has a problem. In this case, a unique &lt;code&gt;WritePolicy&lt;/code&gt; instance has been created, preventing this call from affecting other calls. However, &lt;code&gt;new WritePolicy()&lt;/code&gt; does not pick up the settings of the default write policy, so settings customized on that default policy, such as the &lt;code&gt;socketTimeout&lt;/code&gt;, will not apply to this new policy.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;correct&lt;/strong&gt; way to create a new policy to override fields is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;WritePolicy&lt;/span&gt; &lt;span class="n"&gt;writePolicy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WritePolicy&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getWritePolicyDefault&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generation&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;generationPolicy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerationPolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;EXPECT_GEN_EQUAL&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;writePolicy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Bin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"joe"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, a new &lt;code&gt;WritePolicy&lt;/code&gt; is created, but it gets a copy of all the settings off the default write policy. This object is local to this API call, so there is no contention with other calls.&lt;/p&gt;
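&lt;p&gt;The difference between the three patterns can be demonstrated with a toy policy class that mimics the copy-constructor idiom of the real &lt;code&gt;WritePolicy&lt;/code&gt;. The class and field values here are illustrative, not the actual client classes:&lt;/p&gt;

```java
// Toy policy class mimicking the copy-constructor idiom used by the
// Aerospike Java client's policy classes. Field values are illustrative.
public class ToyWritePolicy {
    public int socketTimeout = 30000; // stand-in for a library default
    public int generation = 0;

    public ToyWritePolicy() {}

    // Copy constructor: the pattern the real WritePolicy copy constructor follows.
    public ToyWritePolicy(ToyWritePolicy other) {
        this.socketTimeout = other.socketTimeout;
        this.generation = other.generation;
    }
}
```

&lt;p&gt;Mutating a shared default instance leaks the change to every caller; &lt;code&gt;new ToyWritePolicy()&lt;/code&gt; silently drops any customized defaults; only the copy constructor keeps the defaults while confining further changes to the one call.&lt;/p&gt;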

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;Aerospike’s &lt;code&gt;Policy&lt;/code&gt; classes are both powerful and flexible, allowing fine-grained, per-call control over both network and application-level settings. However, it is important to understand what the settings do to correctly control the API call. Network settings are often overlooked during the application development phase and are frequently poorly understood. A thorough understanding of them is critical to developing applications that perform well not just in the happy-day scenario but also in the face of network issues or server failures.&lt;/p&gt;

&lt;p&gt;Developers who are new to Aerospike should internalize this simple pattern for creating new policies. It ensures the API call executes correctly without affecting other concurrent calls or causing unintended surprises.&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>policies</category>
    </item>
    <item>
      <title>Comparing Aerospike clusters using "queryPartitions"</title>
      <dc:creator>Tim F</dc:creator>
      <pubDate>Thu, 29 Jun 2023 20:04:09 +0000</pubDate>
      <link>https://forem.com/aerospike/comparing-aerospike-clusters-using-querypartitions-4n3e</link>
      <guid>https://forem.com/aerospike/comparing-aerospike-clusters-using-querypartitions-4n3e</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Vn5cUWPl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/jason-dent-JVD3XPqjLaQ-unsplash_1687371689588.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Vn5cUWPl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/jason-dent-JVD3XPqjLaQ-unsplash_1687371689588.jpg" width="457" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: Photo by Jason Dent on &lt;a href="https://unsplash.com"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Aerospike is renowned as a very fast, very scalable database capable of storing billions or trillions of records, as well as being able to replicate the data to multiple remote database clusters. Hence, a common question that arises is: "How can I validate that two clusters are in sync?" This used to be a difficult problem, but new API calls in Aerospike v5.6 make this task substantially easier. In this blog, we will look at one of these new API calls and use it to develop some code showing how a cluster comparator could be written.&lt;/p&gt;

&lt;h2&gt;Premise&lt;/h2&gt;

&lt;p&gt;Let's assume we have 2 clusters called "A" and "B", each with 10 billion records, and the records are replicated bi-directionally. We want to efficiently see if the clusters contain exactly the same set of records. There are a couple of obvious ways to do this: by comparing record counts, and by using a primary index query with batch gets. Both of these approaches have flaws, however. Let's take a look at them.&lt;/p&gt;

&lt;h3&gt;Comparing record counts&lt;/h3&gt;

&lt;p&gt;The first way we could check if the clusters have exactly the same set of records in them is by comparing record counts. Aerospike provides easy ways to get the total number of records in either a namespace or a set, so we could get a count of all the records in both clusters "A" and "B" per set and compare the 2 values. This relies only on metadata which the cluster keeps updated and so is blazingly fast. However, this can only determine that the clusters are different when the counts don't match. &lt;/p&gt;

&lt;p&gt;What this approach lacks is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The ability to determine &lt;em&gt;which&lt;/em&gt; records are missing. Let's say cluster "A" has 10,000,000,000 records and cluster "B" has 10,000,000,005. Which 5 records are missing from cluster "A"? This cannot be determined by this method, although it might be possible to narrow the difference down to the set(s) containing the missing records.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even if the record counts match, there is no guarantee that they both contain the same set of records. Consider a set with records 1,2,3,4,5 with the corresponding set on the other cluster having records 4,5,6,7,8. Both clusters would report 5 records, but the sets clearly do not contain the same records.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Using a Primary Index Query with Batch Gets&lt;/h3&gt;

&lt;p&gt;A different approach without the above limitations would be to do a primary index (PI) query (previously called a scan) on all the records in cluster "A". Then, for each record returned by the PI query, do a read of the record from cluster "B". If the record is not found, this is a difference between the clusters. We could optimize this by doing a batch read on cluster "B" with a reasonable number of keys (for example 1000) to minimize calls from the client to the server. This would complicate the logic a little but it's still not bad.&lt;/p&gt;

&lt;p&gt;However, this approach suffers from the drawback that it will only detect missing records in cluster "B". If a record exists in cluster "B" and not in cluster "A", this approach will not detect it. The process would need to be run again, with "B" as the source cluster and then batch reading the records from cluster "A".&lt;/p&gt;

&lt;h2&gt;Using QueryPartitions&lt;/h2&gt;

&lt;p&gt;Since version 5.6, when Aerospike introduced PI queries that guarantee correctness even in the face of node or rack failures, another option has existed: &lt;code&gt;queryPartitions&lt;/code&gt;. (Note: in versions prior to 6.0, the API call for this is &lt;code&gt;scanPartitions&lt;/code&gt;; it still exists today but is deprecated.) Before we can understand what this call does, we need to understand what a partition in Aerospike is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partitions
&lt;/h3&gt;

&lt;p&gt;Each Aerospike namespace is divided into 4096 logical partitions, which are evenly distributed between the cluster nodes. Each partition contains approximately 1/4096&lt;sup&gt;th&lt;/sup&gt; of the data in the cluster, and an entire partition is always on a single node. When a node is added to or removed from the cluster, the ownership of some partitions changes, and the cluster rebalances by moving the data associated with those partitions between cluster nodes. During this migration of a partition, the node receiving the partition cannot serve that data, as it does not yet have a full copy of the partition.&lt;/p&gt;

&lt;p&gt;Data within a partition is typically stored on Flash (SSD) storage and each partition stores the primary indexes for that partition in a map of balanced red-black trees:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T8e-Al0q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-19at201PM_1687204855119.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T8e-Al0q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-19at201PM_1687204855119.png" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Primary indexes and storage&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each tree is referred to as a "sprig" and the map holds pointers to the sprigs for that partition. There are 256 sprigs per partition by default, but this is changeable via the &lt;a href="https://docs.aerospike.com/reference/configuration#partition-tree-sprigs"&gt;partition-tree-sprigs&lt;/a&gt; config parameter.&lt;/p&gt;

&lt;h3&gt;
  
  
  QueryPartitions
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://javadoc.io/static/com.aerospike/aerospike-client/6.1.10/com/aerospike/client/AerospikeClient.html#queryPartitions-com.aerospike.client.policy.QueryPolicy-com.aerospike.client.query.Statement-com.aerospike.client.query.PartitionFilter-"&gt;queryPartitions&lt;/a&gt; method allows one or more of these partitions to be traversed &lt;em&gt;in digest order&lt;/em&gt;. A digest is a unique object identifier created by hashing the record's key and the red-black trees depicted above use the digest to identify the location of a node in the tree. &lt;/p&gt;
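For reference, the partition that owns a record is itself derived from this digest. A small sketch, assuming (as in the Java client's partition calculation) that the low 12 bits of the digest, read little-endian from its first bytes, select one of the 4096 partitions:

```java
public class PartitionIdSketch {
    static final int N_PARTITIONS = 4096;

    // The digest is a 20-byte hash of the set name and user key. Taking its
    // low 12 bits (little-endian) yields a partition ID in 0..4095, so
    // records spread evenly across the partitions.
    static int partitionId(byte[] digest) {
        return ((digest[0] & 0xFF) | ((digest[1] & 0xFF) << 8)) & (N_PARTITIONS - 1);
    }

    public static void main(String[] args) {
        byte[] digest = new byte[20];
        digest[0] = 0x34;
        digest[1] = 0x12;
        System.out.println(partitionId(digest)); // 0x1234 & 0xFFF = 0x234 = 564
    }
}
```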

&lt;p&gt;When a partition is scanned, a single thread on the server that owns the partition traverses each sprig in order, and within each sprig traverses all the nodes of the tree in order. This means not only that the records are always returned in the same order, but also that 2 different namespaces - even on different clusters - which contain records with the same primary keys and the same set names will always be traversed in exactly the same order.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing a Comparator
&lt;/h2&gt;

&lt;p&gt;Let's look at how we can use this to implement a comparator between 2 clusters. Let's assume there is only one partition for now. We know that the records in this partition will be returned in digest order, so we can effectively treat it as a very large sorted list. (If there are ~10 billion records in the cluster, there are 4096 partitions so each partition should have about 2.4 million records).&lt;/p&gt;

&lt;p&gt;To illustrate how we will do this, let's consider 2 simple clusters each with 12 records in partition 1:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3IexUs1H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1021AM_1687364217559.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3IexUs1H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1021AM_1687364217559.png" width="800" height="83"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2 simple partitions and their data&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;(Note that the numbers represent the digest of the records).&lt;/p&gt;

&lt;p&gt;If we were to query this partition on cluster 1 and cluster 2 &lt;em&gt;at the same time&lt;/em&gt;, each time we retrieved a record from the queries we would have one of three situations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The record would exist in only cluster 1&lt;/li&gt;
&lt;li&gt;The record would exist in only cluster 2&lt;/li&gt;
&lt;li&gt;The record would exist in both clusters.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first time we retrieve a record from each cluster we will get record 1 on each:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---QyCx1Vs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1040AM_1687365057300.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---QyCx1Vs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1040AM_1687365057300.png" width="800" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;After retrieving the first record on each cluster&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The current record on cluster 1 (Current1) and the current record on cluster 2 (Current2) both have a digest of 1, so there is nothing to do. We read the next record on both clusters to advance to digest 2, which is again the same. We read digest 3 (same on both) so we read the next record on both which gives:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UBJHt49U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1038AM_1687365688577.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UBJHt49U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1038AM_1687365688577.png" width="800" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Situation after reading next record after digest 3&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When we compare digests here Current1 has digest 5 but Current2 has digest 4. We need to flag the record which Current2 refers to (4) as missing from Cluster 1 and then advance Current2, but &lt;strong&gt;not&lt;/strong&gt; Current1:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QgUiGGW9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1033AM_1687366610437.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QgUiGGW9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1033AM_1687366610437.png" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;After flagging record 4 as missing from Cluster 1 and advancing Current2&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We're now back in the situation where the 2 values are equal (5) so they progress again as before. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T532fS9q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1104AM_1687369654810.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T532fS9q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-06-21at1104AM_1687369654810.png" width="800" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Situation after reading next record after digest 5&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here Current1 has 6 but Current2 has 7. We need to flag the record which Current1 refers to (6) as missing from Cluster 2 then advance Current1.&lt;/p&gt;

&lt;p&gt;This process is repeated over and over until the entire partition has been scanned and all the missing records identified.&lt;/p&gt;

&lt;p&gt;The code for this would look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;RecordSet&lt;/span&gt; &lt;span class="n"&gt;recordSet1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryPartitions&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryPolicy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filter1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;RecordSet&lt;/span&gt; &lt;span class="n"&gt;recordSet2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryPartitions&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryPolicy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filter2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;side1Valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getNextRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recordSet1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;side2Valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getNextRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recordSet2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;side1Valid&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;side2Valid&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

        &lt;span class="nc"&gt;Key&lt;/span&gt; &lt;span class="n"&gt;key1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;side1Valid&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;recordSet1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getKey&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="nc"&gt;Key&lt;/span&gt; &lt;span class="n"&gt;key2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;side2Valid&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;recordSet2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getKey&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compare&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
             &lt;span class="c1"&gt;// The digests go down as we go through the partition, so if side 2 is &amp;gt; side 1&lt;/span&gt;
            &lt;span class="c1"&gt;// it means that side 1 has missed this one and we need to advance side2&lt;/span&gt;
             &lt;span class="n"&gt;missingRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;partitionId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Side&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SIDE_1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
             &lt;span class="n"&gt;side2Valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getNextRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recordSet2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
             &lt;span class="n"&gt;missingRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;partitionId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Side&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SIDE_2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
             &lt;span class="n"&gt;side1Valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getNextRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recordSet1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// The keys are equal, move on.&lt;/span&gt;
            &lt;span class="n"&gt;side1Valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getNextRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recordSet1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;side2Valid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getNextRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recordSet2&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;recordSet1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;recordSet2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
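The snippet above assumes a `compare` helper that orders two current records the way the traversal does, and that keeps the loop draining whichever side still has records once the other is exhausted: it should return a negative value when side 1 has run out (so side 2's remaining records are flagged as missing from cluster 1) and a positive value when side 2 has. A sketch operating directly on the keys' digests, assuming digests compare as unsigned byte strings (`null` stands for an exhausted `RecordSet`):

```java
public class DigestCompareSketch {
    // Orders two digests as unsigned byte strings (an assumption - verify the
    // traversal order against your client/server version). A null digest
    // means that side's query is exhausted.
    static int compare(byte[] d1, byte[] d2) {
        if (d1 == null) {
            return (d2 == null) ? 0 : -1; // side 1 exhausted: drain side 2
        }
        if (d2 == null) {
            return 1; // side 2 exhausted: drain side 1
        }
        for (int i = 0; i < Math.min(d1.length, d2.length); i++) {
            int cmp = Integer.compare(d1[i] & 0xFF, d2[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(d1.length, d2.length);
    }

    public static void main(String[] args) {
        byte[] low = {0x01, 0x02};
        byte[] high = {(byte) 0xF0, 0x02};
        System.out.println(compare(low, high) < 0); // true
        System.out.println(compare(low, null) > 0); // true: side 2 exhausted
        System.out.println(compare(low, low) == 0); // true
    }
}
```

In the real comparator the digests would come from `Key.digest` on the keys returned by each `RecordSet`.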



&lt;h4&gt;
  
  
  Enhancements
&lt;/h4&gt;

&lt;p&gt;This process will identify missing records between the 2 clusters, but it does not validate that the contents of the records are the same on both clusters. If all we need is to identify missing records, the contents are irrelevant - we just need the digest, which is contained in the record metadata. Hence, when we set up the scans we can set the query policy to not include the bin data by using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;queryPolicy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;includeBinData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This avoids reading the records off storage at all and sends a much smaller amount of data per record to the client.&lt;/p&gt;

&lt;p&gt;Should we wish to compare record contents, &lt;code&gt;includeBinData&lt;/code&gt; must be set to &lt;code&gt;true&lt;/code&gt; (the default). Then, when the digests are the same, we can iterate through all the bins and compare their contents. This is obviously slower - Aerospike must now read the records off the drive and transmit them to the client, which then iterates over them to discover differences. However, it is a much more thorough comparison.&lt;/p&gt;
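The bin comparison itself is straightforward. A sketch using plain maps to stand in for each record's bins (in the Java client, `Record.bins` exposes them as a `Map<String, Object>`); the helper name is illustrative:

```java
import java.util.*;

public class BinCompareSketch {
    // Returns the names of bins whose values differ between two records with
    // the same digest, including bins present on only one side.
    static Set<String> differingBins(Map<String, Object> bins1,
                                     Map<String, Object> bins2) {
        Set<String> diffs = new TreeSet<>();
        Set<String> allNames = new HashSet<>(bins1.keySet());
        allNames.addAll(bins2.keySet());
        for (String name : allNames) {
            // Objects.equals handles bins missing on one side (null) too.
            if (!Objects.equals(bins1.get(name), bins2.get(name))) {
                diffs.add(name);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Map<String, Object> r1 = Map.of("name", "Tim", "age", 39L);
        Map<String, Object> r2 = Map.of("name", "Tim", "age", 40L, "city", "Denver");
        System.out.println(differingBins(r1, r2)); // [age, city]
    }
}
```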

&lt;h3&gt;
  
  
  Handling multiple partitions
&lt;/h3&gt;

&lt;p&gt;This process works great for one partition, but how do we cover all partitions in a namespace? We could simply iterate through all 4096 partitions, applying the same logic. To get concurrency, we could have a pool of threads, each working on one partition at a time; when a thread finishes a partition, it moves on to the next one which has not yet been processed. This is simple, efficient, and allows concurrency to be controlled by the client. Each client-side thread uses just one server-side thread per cluster, so the effect on the clusters is predictable.&lt;/p&gt;
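That thread pool can be sketched with an `AtomicInteger` handing out partition IDs. Here `comparePartition` is a stub standing in for the single-partition merge described above (it just records which partitions were processed); in real code it would run the two `queryPartitions` calls for that partition ID:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionPoolSketch {
    static final int N_PARTITIONS = 4096;
    static final Set<Integer> processed = ConcurrentHashMap.newKeySet();

    // Stand-in for the single-partition comparison loop described above.
    static void comparePartition(int id) {
        processed.add(id);
    }

    static void compareAllPartitions(int threads) {
        AtomicInteger next = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                int id;
                // Each thread claims the next unprocessed partition until
                // none remain, so exactly `threads` server-side query threads
                // are in use per cluster at any moment.
                while ((id = next.getAndIncrement()) < N_PARTITIONS) {
                    comparePartition(id);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        compareAllPartitions(8);
        System.out.println(processed.size()); // 4096
    }
}
```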

&lt;p&gt;It would be possible to have the server scan multiple partitions at once with a single call from the client. However, this is more complex as the results would contain records over multiple partitions, so the client would need to keep state per-partition and identify the partition for each record.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;A full open source implementation of this algorithm can be found &lt;a href="https://github.com/aerospike-examples/cluster-comparator"&gt;here&lt;/a&gt;. This includes many features such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optional comparison of record contents&lt;/li&gt;
&lt;li&gt;Logging the output to a CSV file&lt;/li&gt;
&lt;li&gt;The ability to "touch" records which are missing or different to allow XDR to re-transmit those records&lt;/li&gt;
&lt;li&gt;Selection of start and end partitions&lt;/li&gt;
&lt;li&gt;Selection of start and end last-update-times&lt;/li&gt;
&lt;li&gt;Metadata comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whilst it offers many more features than are covered in this blog, it is fundamentally based on the simple algorithm described here, all made possible by the nature of &lt;code&gt;queryPartitions&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>nosql</category>
      <category>database</category>
      <category>developer</category>
    </item>
    <item>
      <title>Optimizing Server Resources using Uniform Balance</title>
      <dc:creator>Tim F</dc:creator>
      <pubDate>Wed, 19 Apr 2023 22:21:01 +0000</pubDate>
      <link>https://forem.com/aerospike/optimizing-server-resources-using-uniform-balance-4d58</link>
      <guid>https://forem.com/aerospike/optimizing-server-resources-using-uniform-balance-4d58</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vzi--0hY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/patrick-fore-JBghIzjbuLs-unsplash_1681239020273.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vzi--0hY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/patrick-fore-JBghIzjbuLs-unsplash_1681239020273.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: Photo by Patrick Fore on &lt;a href="https://unsplash.com"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Aerospike is known for incredible speed and scalability. As a bonus, people using Aerospike often see a far lower Total Cost of Ownership (TCO) compared with other technologies. Optimizing the distribution of data between servers contributes to this low TCO, and Aerospike's uniform balance feature allows an almost perfectly even distribution of data across the servers, resulting in better resource utilization and easier capacity planning. This blog post examines how this feature works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Clustering
&lt;/h2&gt;

&lt;p&gt;To fully understand the benefits of how uniform balance works, let's start by reviewing the normal Aerospike partitioning scheme.&lt;/p&gt;

&lt;p&gt;In Aerospike, each node in a cluster must have an ID. This can be allocated through the &lt;code&gt;node-id&lt;/code&gt; configuration parameter, or the system will allocate one, but every node must have a distinct ID. For this explanation, assume there is a 4-node cluster with &lt;code&gt;node-id&lt;/code&gt;s of A, B, C, and D. &lt;/p&gt;

&lt;p&gt;When the cluster forms, once all members agree that they can see all other nodes, a deterministic algorithm runs to form a partition map. Aerospike has 4096 partitions, and the partition map dictates how these partitions are assigned to nodes. For example, a normal partition table might look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5FqMq2-g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1013AM_1681316308681.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5FqMq2-g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1013AM_1681316308681.png" width="683" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Partition Table with RF = 2 and 4 nodes&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Forming the partition table
&lt;/h3&gt;

&lt;p&gt;The algorithm used to form this table is fairly simple: the partition ID is appended to the node ID, the result is hashed, and the hashes are sorted. For example, for partition 1 in this table we might have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hash("A:1") =&amp;gt; 12
hash("B:1") =&amp;gt; 96
hash("C:1") =&amp;gt; 43
hash("D:1") =&amp;gt; 120
sort({12, "A:1"}, {96, "B:1"}, {43, "C:1"}, {120, "D:1"}) 
  =&amp;gt; {12, "A:1"}, {43, "C:1"}, {96, "B:1"}, {120, "D:1"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hence node A would be the leader for partition 1, followed by C then B then D. With 2 copies of the data (replication factor = 2), A is the leader and C is the replica.&lt;/p&gt;

&lt;p&gt;This algorithm is applied across all 4096 partitions to form the partition table. This is a deterministic algorithm - the same 4 &lt;code&gt;node-id&lt;/code&gt;s will always form the same partition table.&lt;/p&gt;
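This construction is easy to model. The sketch below uses MD5 purely as a stand-in for Aerospike's real hash function (an assumption - the server's actual hash differs), but it demonstrates the two properties that matter: the table is deterministic, and removing a node leaves the relative order of the remaining nodes unchanged:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

public class SuccessionListSketch {
    // Succession list for one partition: node IDs sorted by hash("node:partition").
    // The first entry is the leader; with RF = 2, the second is the replica.
    static List<String> successionList(List<String> nodeIds, int partition) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            Map<String, BigInteger> hashes = new HashMap<>();
            for (String node : nodeIds) {
                byte[] h = md.digest((node + ":" + partition)
                        .getBytes(StandardCharsets.UTF_8));
                hashes.put(node, new BigInteger(1, h));
            }
            List<String> order = new ArrayList<>(nodeIds);
            order.sort(Comparator.comparing(hashes::get));
            return order;
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        List<String> full = successionList(Arrays.asList("A", "B", "C", "D"), 1);
        List<String> withoutC = successionList(Arrays.asList("A", "B", "D"), 1);

        // Removing C does not reorder the survivors: the new list is the old
        // one with C deleted, so only partitions where C was the leader or
        // replica see any data movement.
        List<String> expected = new ArrayList<>(full);
        expected.remove("C");
        System.out.println(full);
        System.out.println(withoutC.equals(expected)); // true
    }
}
```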

&lt;h3&gt;
  
  
  Partitioning design goal
&lt;/h3&gt;

&lt;p&gt;When this partitioning algorithm was created, the goal was to minimize data movement in the event of a node loss or addition. For example, consider what happens if node C fails. All the nodes in the cluster are sending heartbeats to one another and after a number of missed heartbeats (controlled by the &lt;code&gt;timeout&lt;/code&gt; configuration parameter in the &lt;code&gt;heartbeat&lt;/code&gt; section of the configuration file), the node is considered "dead" and ejected from the cluster. &lt;/p&gt;

&lt;p&gt;Immediately after this, the cluster creates a new partition map using the above algorithm, but without C in it. Given the sorting, the practical effect is to "close" the holes left by the removal of C: everything that was to the right of C in the table shifts one place to the left.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ByqxWjqe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1036AM_1681318332707.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ByqxWjqe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1036AM_1681318332707.png" width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Partition Table after Node C has been removed&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As Figure 2 shows, the only partitions where data movement occurs are those where C was either the leader or the replica, with these results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On partition 1, C was the replica and B is promoted to replica with A migrating a copy of the data to B. &lt;/li&gt;
&lt;li&gt;On partition 2, C was the leader, so B is promoted to leader (no data movement necessary as it has a full copy of the data) and D is promoted to replica. B migrates a copy of the data onto D. &lt;/li&gt;
&lt;li&gt;On partitions 0 and 4095, no data movement is necessary as C was neither the leader nor the replica.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Should C be added back into the cluster, it returns to its previous location in the partition map, again minimizing the amount of data required to be migrated. Additionally, if a new node E is introduced to the cluster, only partitions where E is either the leader or the replica will require data migrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue with Design
&lt;/h3&gt;

&lt;p&gt;This algorithm clearly minimizes the amount of data which is migrated between nodes in the case of node removal or addition. However, this is not always optimal from a data volume perspective. Consider a simplified partition map with just 4 partitions in it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F76UkgyQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1147AM_1681319145857.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F76UkgyQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1147AM_1681319145857.png" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: Unbalanced Partition Table&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The hashing algorithm described above could easily produce a table like the one in Figure 3. This distribution is unbalanced: &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node&lt;/th&gt;
&lt;th&gt;Leader Partitions&lt;/th&gt;
&lt;th&gt;Follower Partitions&lt;/th&gt;
&lt;th&gt;Total Partitions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This particular distribution highlights 2 problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data volume inconsistency&lt;/strong&gt; : Node A contains 1 partition, node B contains 3 partitions, so node B will have about 3x the data volume of node A. This can make capacity planning very difficult.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Processing load inconsistency&lt;/strong&gt; : Aerospike reads and writes from the leader node by default. Writes are replicated by the leader to the follower, but in most cases all reads are served by the leader. So for writes in this example, node B will be involved as a leader or replica on 3 partitions (0, 2, 3) but node A will only be involved on 1 partition (1). The result is that node B will be processing about 3 times as much write traffic as node A. For read traffic, node B will serve traffic from 2 partitions (0, 3) but node D is the leader of no partitions and hence will serve no read traffic.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a contrived example based on only 4 partitions. In practice, the larger number of partitions Aerospike uses (4096) smooths this out to some degree, but production customers still saw variances of up to 50% between the nodes in their clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Uniform balance
&lt;/h2&gt;

&lt;p&gt;This variance between nodes, both in data volume and processing requests, caused difficulty in capacity planning and node optimization. Aerospike clusters typically have homogeneous nodes with regard to processors, network, memory and storage, so having a heterogeneous distribution of partitions to nodes can result in sub-optimal configurations.&lt;/p&gt;

&lt;p&gt;The solution, introduced in Aerospike Enterprise version &lt;code&gt;4.3.0.2&lt;/code&gt;, was to tweak the algorithm. The majority of the partition table still follows the original algorithm, but the last 128 partitions are adjusted to even out the number of leader partitions each node has. Consider the contrived table again, with the last row tweaked:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6IiMXkFU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1108AM_1681320861763.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6IiMXkFU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1108AM_1681320861763.png" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Tweaking the Algorithm&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If this table were tweaked in this fashion, every node would end up being the leader of 1 partition and the replica of 1 partition - an optimal distribution. Note that the algorithm doesn't guarantee that the replicas will be perfectly balanced; however, they typically are.&lt;/p&gt;

&lt;p&gt;The tweaking of the last 128 rows in real Aerospike clustering is again done in a deterministic manner, meaning that each node can compute the partition table independently. The same invariant holds: when a node leaves and re-enters the cluster, it will appear in the same location in the partition map for each partition. However, due to this tweaking it is possible that more data might have to be migrated than before on these last 128 partitions. In practice, this extra data migration is small, and the benefits of even partition distribution typically far outweigh this extra network cost.&lt;/p&gt;
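The server's actual tweak is internal, but the idea can be illustrated with a toy greedy pass (purely illustrative - not Aerospike's real algorithm): for each of the final partitions, hand leadership to whichever node currently leads the fewest partitions:

```java
import java.util.*;

public class UniformBalanceSketch {
    // leaders[p] is the leader node for partition p, as produced by the plain
    // hashing algorithm. Reassign the last `tweaked` entries so that leader
    // counts even out (a toy sketch, not the real server code).
    static void evenOutLeaders(String[] leaders, List<String> nodes, int tweaked) {
        Map<String, Integer> counts = new HashMap<>();
        for (String n : nodes) {
            counts.put(n, 0);
        }
        // Count the leaders fixed by the untweaked part of the table.
        for (int p = 0; p < leaders.length - tweaked; p++) {
            counts.merge(leaders[p], 1, Integer::sum);
        }
        // Greedily give each remaining partition to the least-loaded node.
        for (int p = leaders.length - tweaked; p < leaders.length; p++) {
            String least = Collections.min(nodes,
                    Comparator.comparingInt(counts::get));
            leaders[p] = least;
            counts.merge(least, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        // A skewed 8-partition table over 4 nodes; tweak the last 4 slots.
        String[] leaders = {"B", "B", "B", "C", "B", "C", "A", "A"};
        evenOutLeaders(leaders, Arrays.asList("A", "B", "C", "D"), 4);
        System.out.println(Arrays.toString(leaders)); // [B, B, B, C, A, D, A, C]
    }
}
```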

&lt;p&gt;Note that clusters with rack awareness enabled might experience higher migrations under &lt;code&gt;prefer-uniform-balance&lt;/code&gt;. For more information, see this support article about &lt;a href="https://support.aerospike.com/s/article/Is-it-expected-to-have-additional-migrations-after-node-removal-when-prefer-uniform-balance-is-true"&gt;expected migrations after node removal&lt;/a&gt; and &lt;a href="https://support.aerospike.com/s/article/How-does-Rack-Aware-and-Prefer-Uniform-Balance-interact"&gt;How does Rack Aware and Prefer Uniform Balance interact&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;prefer-uniform-balance&lt;/code&gt; configuration flag dictates whether this tweak to the partitioning algorithm is enabled or disabled. It is an Enterprise Edition-only feature; the Community Edition behaves as described above, without tweaking the last 128 rows.&lt;/p&gt;
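&lt;p&gt;For reference, the flag lives in the namespace context of the server configuration (the namespace name &lt;code&gt;test&lt;/code&gt; below is just a placeholder):&lt;/p&gt;

```
# aerospike.conf fragment (Enterprise Edition; default is true from server 4.7)
namespace test {
    prefer-uniform-balance true
}
```

&lt;p&gt;The setting is also dynamic, so it can be changed on a running node, for example via &lt;code&gt;asinfo -v "set-config:context=namespace;id=test;prefer-uniform-balance=true"&lt;/code&gt;; the new value takes effect at the next cluster rebalance.&lt;/p&gt;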

&lt;h2&gt;Actual Results&lt;/h2&gt;

&lt;p&gt;Testing a complex algorithm like this in controlled conditions is good, but the real proof comes when it is deployed by customers. One customer with a large, multi-node cluster was kind enough to send us a screenshot of the transactions per second (TPS) per node of a production cluster, showing the period when &lt;code&gt;prefer-uniform-balance&lt;/code&gt; was set to &lt;code&gt;true&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OKG652lK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1153AM_1681321257119.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OKG652lK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/369902727/Screenshot2023-04-12at1153AM_1681321257119.png" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Multi-node cluster before and after turning on &lt;code&gt;prefer-uniform-balance&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the initial state had a wide variance in TPS between the nodes, ranging from ~15k TPS to ~30k TPS. Once &lt;code&gt;prefer-uniform-balance&lt;/code&gt; was enabled, this quiesced to a steady state with all nodes at roughly the same level of ~22k TPS, with less than 5% variance per node. (Note that node maintenance was underway when this graph was captured, resulting in some nodes having no TPS for short periods of time.)&lt;/p&gt;

&lt;p&gt;The results from customer deployments of the uniform-balance algorithm were so effective, and the downsides so minimal, that &lt;code&gt;prefer-uniform-balance&lt;/code&gt; was defaulted to &lt;code&gt;true&lt;/code&gt; in Aerospike Server version 4.7.&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>nosql</category>
      <category>database</category>
      <category>scale</category>
    </item>
  </channel>
</rss>
