<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ronen Botzer</title>
    <description>The latest articles on Forem by Ronen Botzer (@rbotzer).</description>
    <link>https://forem.com/rbotzer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F317448%2F57fc4f56-b4c5-4dd7-a05d-db63a1dca068.jpeg</url>
      <title>Forem: Ronen Botzer</title>
      <link>https://forem.com/rbotzer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rbotzer"/>
    <language>en</language>
    <item>
      <title>In-memory database improvements with Database 7</title>
      <dc:creator>Ronen Botzer</dc:creator>
      <pubDate>Wed, 15 Nov 2023 07:50:18 +0000</pubDate>
      <link>https://forem.com/aerospike/in-memory-database-improvements-with-database-7-3bdf</link>
      <guid>https://forem.com/aerospike/in-memory-database-improvements-with-database-7-3bdf</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eFgI79NP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/78458717/improving-in-memory-performance-aerospike-database-7_1700028415335.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eFgI79NP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/78458717/improving-in-memory-performance-aerospike-database-7_1700028415335.webp" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Welcome to Aerospike Database 7, and the initial server 7.0 release that sets the foundation for big developer API additions in server 7.1 and beyond. Server 7.0 also overhauls in-memory namespaces in significant ways.&lt;/p&gt;

&lt;p&gt;Aerospike releases have always been a continuum, with preceding minor releases providing prerequisite work upon which the next major release stands. For example, server 5.6 added a data structure for set indexes later used for secondary indexes (SI). Server 5.7 overhauled secondary index garbage collection and cut down SI memory consumption by 60%. Aerospike Database 6 was built on these essential improvements. Similarly, server 6.4 removed support for &lt;a href="https://docs.aerospike.com/reference/configuration#single-bin"&gt;&lt;code&gt;single-bin&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.aerospike.com/reference/configuration#data-in-index"&gt;&lt;code&gt;data-in-index&lt;/code&gt;&lt;/a&gt; namespaces, freeing up primary index (PI) space needed for upcoming server 7.1 features.&lt;/p&gt;

&lt;p&gt;Though major server releases have distinct themes, work on specific subsystems doesn’t end when our main focus shifts. Just as server versions 6.1 and 6.4 delivered significant throughput improvements to &lt;a href="https://aerospike.com/products/features/cross-datacenter-replication-xdr/"&gt;cross-datacenter replication&lt;/a&gt; (XDR) following the XDR rewrite theme of Aerospike Database 5, secondary index improvements will continue in Aerospike Database 7 releases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unified storage format revolutionizes in-memory namespaces
&lt;/h2&gt;

&lt;p&gt;Server 7.0 overhauls in-memory namespaces by consolidating all three storage engines onto the same efficient flat format previously used only by &lt;a href="https://docs.aerospike.com/server/operations/configure/namespace/storage"&gt;namespaces that persist data&lt;/a&gt; on SSD or Intel Optane™ Persistent Memory (PMem). This consolidation yields multiple operational benefits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Faster restarts for in-memory namespaces
&lt;/h3&gt;

&lt;p&gt;In Aerospike Enterprise Edition (EE) and Standard Edition (SE), the new &lt;code&gt;memory&lt;/code&gt; storage engine places its data in shared memory (shmem) rather than in volatile process memory. This means that in-memory namespaces can now &lt;a href="https://docs.aerospike.com/server/operations/manage/aerospike/fast_start"&gt;fast restart&lt;/a&gt; (AKA warmstart) after clean shutdowns.&lt;/p&gt;

&lt;p&gt;An in-memory namespace without persistence shares this fast restart capability. More interestingly, cold restarts, during which the Aerospike daemon (&lt;code&gt;asd&lt;/code&gt;) rebuilds its indexes, run much faster, as record data is read from shared memory rather than from a storage device.&lt;/p&gt;

&lt;p&gt;An in-memory namespace with storage-backed persistence benefits from faster cold restarts when certain configuration parameters are adjusted (for example, &lt;a href="https://docs.aerospike.com/server/reference/configuration#partition-tree-sprigs"&gt;&lt;code&gt;partition-tree-sprigs&lt;/code&gt;&lt;/a&gt;). Only after a crash will Aerospike read from storage-backed persistence to repopulate record data into memory, along with rebuilding the indexes.&lt;/p&gt;

&lt;p&gt;Adding to its existing capability of backing up index shmem segments to disk, the Aerospike Shared Memory Tool (&lt;a href="https://docs.aerospike.com/tools/asmt"&gt;ASMT&lt;/a&gt;) can now be used after Aerospike shuts down to back up in-memory namespace data ahead of restarting the host machine.&lt;/p&gt;

&lt;p&gt;To summarize, in-memory namespaces without storage-backed persistence don’t lose their data on restarts of asd, and all in-memory namespaces benefit from faster restarts (both warmstart and faster coldstart).&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression for in-memory namespaces
&lt;/h3&gt;

&lt;p&gt;Now you can configure an in-memory namespace to use storage compression, such as ZStandard (zstd), LZ4, or Snappy. Customers already using storage compression in the persistence layer of an in-memory namespace (or data on SSD or PMem) will achieve the same compression ratio for their data, regardless of the storage engine, at the same CPU cost.&lt;/p&gt;
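
&lt;p&gt;As an illustration, a namespace stanza enabling compression on the new in-memory storage engine might look like the sketch below. The parameter names follow my reading of the server 7.0 configuration reference, and the namespace name, sizes, and compression level are hypothetical:&lt;/p&gt;

```
namespace demo {
    replication-factor 2
    storage-engine memory {
        data-size 16G          # pre-allocated in-memory data storage
        compression zstd       # alternatives: lz4, snappy
        compression-level 3    # zstd tuning knob; illustrative value
    }
}
```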

&lt;h3&gt;
  
  
  Stability and performance improvements for in-memory namespaces
&lt;/h3&gt;

&lt;p&gt;As in-memory and on-device storage use the exact same storage format in server 7.0, an in-memory namespace is mirrored to its persistence layer. This means that its write-block defragmentation happens much faster in memory and no longer requires device reads. The same continuous write-block defrag mechanism eliminates heap fragmentation encountered in the former jemalloc-based in-memory storage. Similarly, tomb raiding of durable-delete tombstones happens fully in memory and requires no device reads. This removes back-pressure generated by the persistence layer’s devices on in-memory namespace operations.&lt;/p&gt;

&lt;p&gt;The unified format's single-pointer and contiguous record storage improve performance in two other ways. First, reading the entire record costs a single memory access, as opposed to the older scattered bins approach requiring multiple independent reads from RAM. Second, since write-blocks are mirrored to the persistence layer, write operations save the CPU previously consumed by separate serialization to a second storage format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capacity planning for in-memory namespaces
&lt;/h3&gt;

&lt;p&gt;The first thing to note about in-memory namespaces with storage-backed persistence in server 7.0 is that the persistence layer is exactly the same as the in-memory storage; the two are mirrors of each other, which has a capacity planning implication. For previous versions, we had recommended that persistent storage be a multiple of &lt;a href="https://docs.aerospike.com/server/reference/configuration#memory-size"&gt;&lt;code&gt;memory-size&lt;/code&gt;&lt;/a&gt; (the max memory allocation for the namespace), typically 4x. Starting with server 7.0, persistent storage needs to be at a 1:1 ratio to the memory you wish to dedicate to your in-memory namespace.&lt;/p&gt;

&lt;p&gt;The second thing to be aware of is that in-memory data storage is now static: server 7.0 pre-allocates it up front, rather than growing it progressively up to the limit set by the (now obsolete) &lt;code&gt;memory-size&lt;/code&gt; configuration parameter.&lt;/p&gt;
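
&lt;p&gt;To make the 1:1 sizing concrete, here is a hedged sketch of an in-memory namespace with storage-backed persistence; the namespace name, device path, and sizes are hypothetical, and the persistence device must hold at least as much as the pre-allocated data size:&lt;/p&gt;

```
namespace demo {
    storage-engine memory {
        data-size 64G          # pre-allocated in-memory data storage
        device /dev/nvme0n1    # persistence layer; mirrors the data at a 1:1 ratio
    }
}
```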

&lt;p&gt;Capacity planning for indexes has not changed. In server 7.0, indexes continue to start small and grow in increments defined by the configuration parameters &lt;a href="https://docs.aerospike.com/server/reference/configuration#index-stage-size"&gt;&lt;code&gt;index-stage-size&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.aerospike.com/server/reference/configuration#sindex-stage-size"&gt;&lt;code&gt;sindex-stage-size&lt;/code&gt;&lt;/a&gt;. &lt;a href="https://docs.aerospike.com/server/operations/plan/capacity#calculating-set-index-storage"&gt;Set indexes&lt;/a&gt; grow in 4KiB increments after an initial pre-allocation. After upgrading a cluster node, the namespace indexes consume the same amount of memory as before, out of the system memory not pre-allocated for namespace data storage.&lt;/p&gt;

&lt;p&gt;We no longer have diverging capacity planning formulas for data in-memory versus data on persistent storage. All storage-engine configurations use the same storage format and a single formula in the &lt;a href="https://docs.aerospike.com/server/operations/plan/capacity"&gt;capacity planning guide&lt;/a&gt;. Treating capacity planning of in-memory data storage the same as you would data storage on an SSD is helpful. In both cases, you are aiming to size pre-allocated storage.&lt;/p&gt;

&lt;p&gt;The memory consumed by your indexes (and room for them to grow), plus the pre-allocated namespace data storage, should fit within your host machine’s RAM and within the previously declared &lt;code&gt;memory-size&lt;/code&gt;. Read the &lt;a href="https://docs.aerospike.com/server/operations/upgrade/special_upgrades/70_upgrade"&gt;special upgrade instructions for Server 7.0&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;Support for single-bin namespaces was removed in server 6.4. Aerospike users without single-bin namespaces in their cluster may upgrade to server 7.0 through a regular rolling upgrade. Otherwise, please consult the &lt;a href="https://docs.aerospike.com/server/operations/upgrade/special_upgrades/64_upgrade"&gt;special upgrade instructions for Server 6.4&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Due to the unified storage format, a &lt;a href="https://docs.aerospike.com/server/reference/configuration#write-block-size"&gt;&lt;code&gt;write-block-size&lt;/code&gt;&lt;/a&gt; limit of 8MiB applies to in-memory namespaces (with or without persistence). Aerospike users who depend on the former 128MiB record size limit of in-memory namespaces without persistence will need to break up their records. Customers may choose to delay upgrading until server 7.1, which will enable an easier transition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration and monitoring
&lt;/h2&gt;

&lt;p&gt;The newly released &lt;a href="https://aerospike.com/download/#aerospike-monitoring-stack"&gt;Aerospike Observability Stack 3.0&lt;/a&gt; and &lt;a href="https://aerospike.com/download/#aerospike-tools"&gt;Tools 10.0&lt;/a&gt; support metrics and configuration for both Aerospike 6 and Aerospike 7 and are designed to ease your transition to server 7.0. If you are not familiar with &lt;a href="https://aerospike.com/products/observability-management/"&gt;Aerospike Observability and Management&lt;/a&gt; (O&amp;amp;M), we have a short video, &lt;a href="https://aerospike.com/blog/architect/manage-what-matters-aerospike-observability-stack/"&gt;blog&lt;/a&gt;, and webinar available on our site.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration parameter and metric changes in detail
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aerospike.com/server/operations/configure/namespace"&gt;Namespace configuration&lt;/a&gt;, regardless of &lt;a href="https://docs.aerospike.com/server/operations/configure/namespace/storage"&gt;storage engine&lt;/a&gt; choice, is simpler in server 7.0.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aerospike.com/server/operations/configure/namespace/retention"&gt;Stop-writes and eviction thresholds&lt;/a&gt; are controlled by the storage-engine configuration parameters &lt;a href="https://docs.aerospike.com/server/reference/configuration#stop-writes-used-pct"&gt;&lt;code&gt;stop-writes-used-pct&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://docs.aerospike.com/server/reference/configuration#stop-writes-avail-pct"&gt;&lt;code&gt;stop-writes-avail-pct&lt;/code&gt;&lt;/a&gt;, and &lt;a href="https://docs.aerospike.com/server/reference/configuration#evict-used-pct"&gt;&lt;code&gt;evict-used-pct&lt;/code&gt;&lt;/a&gt;, which are relative to the namespace data storage size. There is also a pair of thresholds relative to system memory: &lt;a href="https://docs.aerospike.com/server/reference/configuration#stop-writes-sys-memory-pct"&gt;&lt;code&gt;stop-writes-sys-memory-pct&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.aerospike.com/server/reference/configuration#evict-sys-memory-pct"&gt;&lt;code&gt;evict-sys-memory-pct&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
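
&lt;p&gt;A hedged configuration sketch tying these thresholds together (all percentages and sizes here are illustrative, not recommendations):&lt;/p&gt;

```
namespace demo {
    stop-writes-sys-memory-pct 90    # relative to total system memory
    evict-sys-memory-pct 85
    storage-engine memory {
        data-size 32G
        stop-writes-used-pct 70      # relative to namespace data storage size
        stop-writes-avail-pct 5
        evict-used-pct 60
    }
}
```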

&lt;p&gt;Data storage metrics have been simplified to &lt;code&gt;data_used_bytes&lt;/code&gt;, &lt;code&gt;data_total_bytes&lt;/code&gt;, &lt;code&gt;data_used_pct&lt;/code&gt;, and &lt;code&gt;data_avail_pct&lt;/code&gt; for all storage engine types.&lt;/p&gt;

&lt;p&gt;Index metrics are also simpler. Set indexes have &lt;code&gt;set_index_used_bytes&lt;/code&gt;. Primary and secondary indexes have &lt;code&gt;index_used_bytes&lt;/code&gt; and &lt;code&gt;sindex_used_bytes&lt;/code&gt;, whether they’re stored in shared memory, PMem, or an SSD. If they’re in persistent storage, they also have &lt;code&gt;index_mounts_used_pct&lt;/code&gt; and &lt;code&gt;sindex_mounts_used_pct&lt;/code&gt;, relative to the &lt;code&gt;mounts-budget&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-tenancy
&lt;/h2&gt;

&lt;p&gt;Many Aerospike customers deploy their database clusters as a multi-tenant service, with distinct users separated by &lt;a href="https://docs.aerospike.com/server/architecture/data-model"&gt;sets&lt;/a&gt; within a namespace. Multi-tenancy leans on Aerospike enterprise features, such as scoped role-based access control (&lt;a href="https://docs.aerospike.com/server/operations/configure/security/access-control"&gt;RBAC&lt;/a&gt;), &lt;a href="https://docs.aerospike.com/server/guide/security/rate_quotas"&gt;rate quotas&lt;/a&gt;, and &lt;a href="https://docs.aerospike.com/server/operations/manage/sets#capping-the-size-of-a-set"&gt;set quotas&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Server 7.0 makes multi-tenant deployment easier with several new features. The limit of 64K unique bin names per namespace was removed, so operators no longer need to advise developers to restrict how many bin names their applications write into the namespace. As a result, the &lt;code&gt;bins&lt;/code&gt; info command and the &lt;code&gt;available_bin_names&lt;/code&gt; namespace statistic were removed.&lt;/p&gt;

&lt;p&gt;The limit on unique set names per namespace was raised from 1023 to 4095, allowing for set-level segregation of more tenants on the same Aerospike cluster.&lt;/p&gt;

&lt;p&gt;Finally, an operator can now assign a unique set-level &lt;code&gt;default-ttl&lt;/code&gt; as an override of the namespace &lt;code&gt;default-ttl&lt;/code&gt;.&lt;/p&gt;
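
&lt;p&gt;Sketched in configuration terms, a per-set override might look like the following; the set-level syntax is my assumption from the configuration reference, and the names and TTLs are hypothetical:&lt;/p&gt;

```
namespace demo {
    default-ttl 30d
    set tenant-a {
        default-ttl 7d    # overrides the namespace default-ttl for this set
    }
}
```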

&lt;h2&gt;
  
  
  New developer API features
&lt;/h2&gt;

&lt;p&gt;Server 7.0 adds the capability to &lt;a href="https://docs.aerospike.com/server/guide/query"&gt;index and query&lt;/a&gt; bytes data (BLOBs).&lt;/p&gt;

&lt;p&gt;Application developers may now choose to persist key-ordered Map indexes, trading extra storage for improved performance. The new &lt;code&gt;MapPolicy&lt;/code&gt; option can be applied when creating a new Map bin or with the Map &lt;code&gt;set_type&lt;/code&gt; operation.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MapPolicy(MapOrder order, int flags, boolean persistIndex)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Persisted Map indexes should only be used by an application once all the nodes in the Aerospike cluster have been upgraded to version 7.0 or later.&lt;/p&gt;
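
&lt;p&gt;With the Java client, using this policy might look like the sketch below; the client object, key, bin name, and values are hypothetical, and only the &lt;code&gt;MapPolicy&lt;/code&gt; signature comes from this post:&lt;/p&gt;

```java
// Key-ordered map whose index is persisted alongside the map data.
MapPolicy policy = new MapPolicy(MapOrder.KEY_ORDERED, MapWriteFlags.DEFAULT, true);

// Apply on write: creates the bin as a persisted-index map if absent.
client.operate(null, key,
    MapOperation.put(policy, "profile", Value.get("city"), Value.get("Oakland")));

// Or retrofit an existing map bin via the set_type operation.
client.operate(null, key, MapOperation.setMapPolicy(policy, "profile"));
```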

&lt;h2&gt;
  
  
  Dropping support for old OS versions
&lt;/h2&gt;

&lt;p&gt;Server 6.4 added support for Amazon Linux 2023 and Debian 12. As I &lt;a href="https://developer.aerospike.com/blog/aerospike-database-64-improved-query-and-data-distribution"&gt;previously warned&lt;/a&gt;, server 7.0 removes support for Red Hat Enterprise Linux 7 and its variants, including CentOS 7, Amazon Linux 2, and Oracle Linux 7, and no longer provides el7 builds. Similarly, server 7.0 is not available on Debian 10.&lt;/p&gt;

&lt;p&gt;Aerospike engineering takes performance seriously, and Aerospike users choose to build mission-critical applications on our database because it has the best cost-performance in its field. Performance is hindered by running new Aerospike server versions on ancient lower-performing OS kernels, such as the 3.10 Linux kernel packaged with RHEL 7. In a recent announcement, the Linux kernel team declared they will no longer offer six years of LTS support. This announcement validated our perspective on the stability and performance cost of running Aerospike on old kernels.&lt;/p&gt;

&lt;p&gt;Both Debian 10 and RHEL 7 reach their end of life (EOL) in June 2024 (5 and 10 years after their initial releases, respectively). Each new major and minor Aerospike server version gets two years of bug fixes and security vulnerability support. Going forward, new server versions will not be offered on OS distro versions scheduled to expire during this support period. Subsequent patch releases (hotfixes) will continue to be built and tested on the same OS distro versions as when they were first released.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Aerospike 7.0
&lt;/h2&gt;

&lt;p&gt;Persistence, fast restart capability, and built-in compression all make the new Aerospike in-memory namespaces appealing for in-memory database use cases.&lt;/p&gt;

&lt;p&gt;Consult the &lt;a href="https://docs.aerospike.com/reference/platform-support"&gt;platform compatibility&lt;/a&gt; page and the &lt;a href="https://developer.aerospike.com/client/client_matrix#minimum-usable-client-versions"&gt;minimum usable client versions&lt;/a&gt; table. For more details, read the &lt;a href="https://docs.aerospike.com/reference/release_notes/server/7.0-server-release-notes"&gt;Server 7.0 release notes&lt;/a&gt;. You can &lt;a href="https://aerospike.com/download/"&gt;download&lt;/a&gt; Aerospike EE and run it as a single-node evaluation, or get started with a 60-day multi-node trial at our &lt;a href="https://aerospike.com/lp/try-now/"&gt;Try Now&lt;/a&gt; page.&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>inmemory</category>
      <category>speed</category>
    </item>
    <item>
      <title>Aerospike Database 6.4: Improved query and data distribution</title>
      <dc:creator>Ronen Botzer</dc:creator>
      <pubDate>Fri, 04 Aug 2023 15:51:24 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-database-64-improved-query-and-data-distribution-31bo</link>
      <guid>https://forem.com/aerospike/aerospike-database-64-improved-query-and-data-distribution-31bo</guid>
      <description>&lt;p&gt;In this blog post, I’ll cover new features and changes in the generally available (GA) release of Aerospike 6.4.This release concludes our secondary index storage offerings with the introduction of secondary index on flash. &lt;/p&gt;

&lt;h2&gt;
  
  
  Secondary index on Flash
&lt;/h2&gt;

&lt;p&gt;Before &lt;a href="https://aerospike.com/products/database/"&gt;Aerospike Database 6&lt;/a&gt;, &lt;a href="https://docs.aerospike.com/server/architecture/secondary-index"&gt;secondary indexes&lt;/a&gt; could only live in the Aerospike daemon (&lt;code&gt;asd&lt;/code&gt;) process memory (RAM). This prevented Aerospike Database Enterprise Edition (EE) clusters from being able to &lt;a href="https://docs.aerospike.com/server/operations/manage/aerospike/fast_start"&gt;fast restart&lt;/a&gt; (AKA warmstart) when secondary indexes were used on a namespace.&lt;br&gt;
One focus of our secondary index work has been to add storage types that can warmstart.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The shared memory (&lt;code&gt;shmem&lt;/code&gt;) sindex-type was added in server 6.1.&lt;/li&gt;
&lt;li&gt;The Intel Optane™ Persistent Memory (&lt;code&gt;pmem&lt;/code&gt;) storage type was added in server 6.3.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://docs.aerospike.com/reference/release_notes/server/6.4-server-release-notes"&gt;Server 6.4&lt;/a&gt; adds the ability to have all secondary indexes in a specified namespace stored on an NVMe flash device. Not only do secondary indexes on flash persist and enable their namespace to warmstart, they also consume no RAM. &lt;a href="https://docs.aerospike.com/server/operations/plan/capacity/secondary_indexes"&gt;Secondary index capacity planning&lt;/a&gt; is simply the calculated memory space cast to disk storage.&lt;/p&gt;
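
&lt;p&gt;A hedged configuration sketch of a namespace with its secondary indexes on flash; the mount point and size are hypothetical, and the exact parameter names should be checked against the server 6.4 configuration reference:&lt;/p&gt;

```
namespace demo {
    sindex-type flash {
        mount /mnt/sindex          # NVMe-backed filesystem mount
        mounts-size-limit 64G      # disk budget for the namespace's secondary indexes
    }
}
```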

&lt;p&gt;Though secondary indexes on flash affect only writes and queries, they carry a significant extra device IO cost under heavy write workloads, which should be taken into account in capacity planning. That said, Aerospike should continue to perform well compared to other databases in mixed workloads, where many reads and queries happen alongside writes that modify the secondary index. Choose the appropriate secondary index type for your use case.&lt;/p&gt;

&lt;p&gt;Write operations take a latency and throughput hit when they must adjust secondary indexes. For each secondary index on a record bin whose value changed, the server walks down the B-tree, at worst incurring one device IO per B-tree layer. Because B-tree layers fan out widely, the roots should be cached by &lt;code&gt;mmap&lt;/code&gt; if the secondary index is used often, so we expect 1-3 device IOs per write-related adjustment. When the secondary index is very big, the cost per adjustment might be 2-6 device IOs. Having enough memory for &lt;code&gt;mmap&lt;/code&gt; to cache the first couple of layers of the B-tree avoids paging and drives down the number of extra IOPS.&lt;/p&gt;

&lt;p&gt;The impact on query performance is use case dependent. In general, the B-tree structure is very efficient. For example, a secondary index with 8 million entries has a B-tree three layers deep. The top two layers only cost a few MiBs of &lt;code&gt;mmap&lt;/code&gt; cache, so traversing to the lowest layer may only cost a single device IO. &lt;/p&gt;
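
&lt;p&gt;As a rough sanity check on the example above, a B-tree's depth is about the logarithm of the entry count in the per-node fanout. Assuming a fanout on the order of 256 entries per node (an illustrative figure, not stated in this post), 8 million entries do fit in a three-layer tree:&lt;/p&gt;

```python
import math

def btree_depth(entries, fanout):
    """Estimate the number of B-tree layers needed to index `entries` keys."""
    return math.ceil(math.log(entries) / math.log(fanout))

# Assumed fanout of 256; 8 million entries as in the example above.
print(btree_depth(8_000_000, 256))  # 3
```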

&lt;p&gt;Adjacent values are packed in the same node, so range queries (using &lt;code&gt;BETWEEN&lt;/code&gt;) will get multiple values per device IO. Secondary index on flash will also be efficient for long queries which return many records.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aerospike.com/server/guide/queries"&gt;Short queries&lt;/a&gt; – queries that consistently return a small number of records – will perform noticeably worse on secondary indexes on flash, relative to secondary indexes in memory (shared memory or PMem). In situations where your short queries return no records or a single record, you should model your data differently. Using a secondary index as a &lt;code&gt;UNIQUE INDEX&lt;/code&gt;, regardless of whether the index is in memory or on flash, is an anti-pattern in a NoSQL database where single-key lookups have very low latency.&lt;/p&gt;

&lt;p&gt;The performance difference between a secondary index on flash and a secondary index in memory is directly tied to the latency gap between memory access and device IO, as well as to drives having a finite number of IOPS. &lt;a href="https://docs.aerospike.com/server/operations/configure/namespace/secondary_index"&gt;Tuning&lt;/a&gt; how much memory is given to &lt;code&gt;mmap&lt;/code&gt; has a bigger positive impact with secondary indexes on flash than with the primary index on flash. This is because the access patterns aren’t random: the B-tree roots are effectively cached, and with more RAM available to the &lt;code&gt;mmap&lt;/code&gt; cache, the extra device IO cost decreases. When choosing to store the namespace secondary indexes on flash, you need enough aggregate device IO capacity to cover the increased IOPS cost at your desired latency and throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Secret Agent: Integrating with secrets management services
&lt;/h2&gt;

&lt;p&gt;Starting with server 6.4, Aerospike EE can be configured to tell the newly released &lt;a href="https://docs.aerospike.com/server/operations/configure/security/secrets"&gt;Aerospike Secret Agent&lt;/a&gt; to fetch secrets from an external secrets management service. &lt;/p&gt;

&lt;p&gt;Secret Agent runs as an independent process, either on the same host as an Aerospike cluster node or on a dedicated instance serving multiple nodes. The agent wraps the service provider’s native library and handles authentication against the service. The Aerospike server is agnostic to the destination service, and with a simple common configuration it can be connected to a variety of providers.&lt;/p&gt;

&lt;p&gt;This initial release of Secret Agent fetches secrets from &lt;a href="https://aws.amazon.com/secrets-manager/"&gt;AWS Secrets Manager&lt;/a&gt;. The independent design makes it easy to rapidly add new integrations on a separate release cycle from the server, with Google Cloud Secret Manager, Azure Key Vault, and HashiCorp Vault planned next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Improved Cross-Datacenter Replication (XDR) throughput
&lt;/h2&gt;

&lt;p&gt;In server 6.4, an optimization previously used in server 6.1 to enhance XDR throughput in recovery mode or when rewinding a namespace was applied to stable-state XDR shipping. Ship requests are distributed across service threads using partition affinity. This highly efficient approach uses fewer service threads to ship more data, while creating significantly fewer socket connections.&lt;/p&gt;

&lt;p&gt;As a result, the &lt;a href="https://docs.aerospike.com/reference/configuration#max-used-service-threads"&gt;&lt;code&gt;max-used-service-threads&lt;/code&gt;&lt;/a&gt; configuration parameter was removed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Newly supported operating system distributions
&lt;/h2&gt;

&lt;p&gt;Server 6.4 adds support for &lt;a href="https://aws.amazon.com/linux/amazon-linux-2023/"&gt;Amazon Linux 2023&lt;/a&gt;. While Amazon Linux 2 is based on, and mostly compatible with, CentOS 7, AL2023 is a &lt;a href="https://docs.aws.amazon.com/linux/al2023/ug/compare-with-al2.html"&gt;distinct flavor of Linux&lt;/a&gt;. As such, the Aerospike Database 6.4 server and tools packages have an &lt;code&gt;amzn2023&lt;/code&gt; RPM distinct from the &lt;code&gt;el7&lt;/code&gt; one used for Amazon Linux 2.&lt;/p&gt;

&lt;p&gt;We recommend that users of Amazon Linux 2 upgrade their OS to AL2023 because CentOS 7 and Amazon Linux 2 will lose support in Aerospike Database 7.&lt;/p&gt;

&lt;p&gt;Server 6.4 also adds support for Debian 12. Users of Debian 10 are encouraged to upgrade in order to enjoy the benefits of the newer OS kernel.&lt;/p&gt;

&lt;h2&gt;
  
  
  End of the line for single-bin namespaces
&lt;/h2&gt;

&lt;p&gt;After many years as an optional &lt;a href="https://docs.aerospike.com/server/operations/configure/namespace/storage"&gt;storage configuration&lt;/a&gt;, support for single-bin namespaces was removed in server 6.4. Single-bin namespaces provided savings of several bytes per record, an appealing feature when Aerospike Database was a new product, flash storage was much slower, and SSD and NVMe drives were much smaller. Today, common modern flash drives are significantly faster and larger, Aerospike’s storage formats are far more efficient than they were when single-bin was introduced, and Aerospike EE has a compression feature that helps reduce disk storage consumption.&lt;/p&gt;

&lt;p&gt;The removal of the &lt;a href="https://docs.aerospike.com/reference/configuration#single-bin"&gt;&lt;code&gt;single-bin&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.aerospike.com/reference/configuration#data-in-index"&gt;&lt;code&gt;data-in-index&lt;/code&gt;&lt;/a&gt; configuration parameters paves the way for many exciting new features in the upcoming Aerospike Database 7, including a major overhaul of the in-memory storage engine. Users of single-bin namespaces may choose to delay their upgrade until server 7.0 in order to gain the benefit of these new features while avoiding the pain of upgrading to a server version which no longer allows for these storage options.&lt;/p&gt;

&lt;p&gt;Aerospike users without single-bin namespaces in their cluster may upgrade to server 6.4 through a regular rolling upgrade. Otherwise, please consult the &lt;a href="https://docs.aerospike.com/server/operations/upgrade/special_upgrades/64_upgrade"&gt;special upgrade instructions for Server 6.4&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Miscellaneous
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Query performance - sharing partition reservations provides enhanced query performance for equality (point) and &lt;code&gt;BETWEEN&lt;/code&gt; (range) short queries, scans, and set index queries.&lt;/li&gt;
&lt;li&gt;Improved defragmentation - Server 6.4 avoids superfluous defragmentation of blocks with indexed records.&lt;/li&gt;
&lt;li&gt;Batch operations - The configuration parameter &lt;a href="https://docs.aerospike.com/reference/configuration#batch-max-requests"&gt;&lt;code&gt;batch-max-requests&lt;/code&gt;&lt;/a&gt; was removed in server 6.4. Also, the benchmarks &lt;code&gt;{ns}-batch-sub-read&lt;/code&gt;, &lt;code&gt;{ns}-batch-sub-write&lt;/code&gt;, &lt;code&gt;{ns}-batch-sub-udf&lt;/code&gt; are now auto-enabled and exposed by the &lt;a href="https://docs.aerospike.com/reference/info#latencies"&gt;&lt;code&gt;latencies&lt;/code&gt;&lt;/a&gt; info command.&lt;/li&gt;
&lt;li&gt;Hot-key logging - the new &lt;code&gt;key-busy&lt;/code&gt; &lt;a href="https://docs.aerospike.com/server/operations/configure/logging/aerospike-logs"&gt;logging context&lt;/a&gt; provides the digest, user IP, and type of operation when an error code 14 (&lt;code&gt;KEY_BUSY&lt;/code&gt;) is returned to the client.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Breaking Changes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://docs.aerospike.com/reference/configuration#scheduler-mode"&gt;&lt;code&gt;scheduler-mode&lt;/code&gt;&lt;/a&gt; configuration parameter was removed. &lt;/li&gt;
&lt;li&gt;The deprecated scan &lt;a href="https://docs.aerospike.com/reference/info"&gt;info commands&lt;/a&gt; (&lt;code&gt;scan-show&lt;/code&gt;, &lt;code&gt;scan-abort&lt;/code&gt;, &lt;code&gt;scan-abort-all&lt;/code&gt;) have been removed. Use the equivalent query commands (&lt;code&gt;query-show&lt;/code&gt;, &lt;code&gt;query-abort&lt;/code&gt;, &lt;code&gt;query-abort-all&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more details, read the &lt;a href="https://docs.aerospike.com/reference/release_notes/server/6.4-server-release-notes"&gt;Server 6.4 release notes&lt;/a&gt;. You can &lt;a href="https://aerospike.com/download/"&gt;download&lt;/a&gt; Aerospike EE and run it as a single-node evaluation, or get started with a 60-day multi-node trial at our &lt;a href="https://aerospike.com/lp/try-now/"&gt;Try Now&lt;/a&gt; page.&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>database</category>
    </item>
    <item>
      <title>Aerospike Database 6: Partitioned Secondary Index Queries, Batch Anything, and JSON Document Models</title>
      <dc:creator>Ronen Botzer</dc:creator>
      <pubDate>Thu, 28 Apr 2022 21:46:12 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-database-6-partitioned-secondary-index-queries-batch-anything-and-json-document-models-20hk</link>
      <guid>https://forem.com/aerospike/aerospike-database-6-partitioned-secondary-index-queries-batch-anything-and-json-document-models-20hk</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bHrmTawD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/ronen-botzer/diagram-Aerospike-Real-time-Data-Platform-1900w_1651181763224.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bHrmTawD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://developer-hub.s3.us-west-1.amazonaws.com/ronen-botzer/diagram-Aerospike-Real-time-Data-Platform-1900w_1651181763224.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Aerospike Database 6: Partitioned Secondary Index Queries, Batch Anything, and JSON Document Models
&lt;/h1&gt;

&lt;p&gt;Aerospike is proud to announce Aerospike Database 6, our newest database server release, now generally available (GA). This version of Aerospike is full of exciting developer features, which open up new capabilities for application developers to build on.&lt;/p&gt;

&lt;p&gt;Launched two years ago, Aerospike Database 5 delivered a vastly improved Cross-Datacenter Replication (XDR) subsystem. It enabled our customers to create high-performance, geo-distributed applications with fine-grained control over the distribution of their data. Our newest database product release builds on these features and reflects our increased focus on queries. Combined with Aerospike Connect for Spark and Aerospike Connect for Presto (Trino), the Aerospike Data Platform enables our customers to serve both low-latency transactional and analytical workloads against their large data sets. &lt;/p&gt;

&lt;p&gt;We will cover three primary capabilities in this blog post: first, the new query capabilities in our 6.0 release; second, support for document data models through the Document API and our secondary index work; and finally, the completion of batch operations, including batch writes, for greater efficiency through pipelined operations.&lt;/p&gt;

&lt;p&gt;Server version 6.0 comes after seven release candidates, and is the culmination of 14 months of engineering effort. This is a major release, and includes breaking changes. Please review the &lt;a href="https://enterprise.aerospike.com/enterprise/download/server/notes.html"&gt;release notes&lt;/a&gt; and the &lt;a href="https://docs.aerospike.com/server/operations/upgrade/special_upgrades"&gt;special upgrade instructions&lt;/a&gt; related to the new storage format and new secondary index query capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Partitioned Secondary Index Queries
&lt;/h2&gt;

&lt;p&gt;The path to the new query subsystem started in the last two releases of Aerospike Database 5.&lt;/p&gt;

&lt;p&gt;Server version 5.6 added &lt;a href="https://docs.aerospike.com/server/architecture/set-indexes"&gt;set indexes&lt;/a&gt;, an optional index type that &lt;a href="https://aerospike.com/blog/set-index-performance/"&gt;improves performance&lt;/a&gt; for a special kind of query. Using set indexes enables low latency access to all the records of a small set that lives inside a large namespace. Like the primary index, set indexes support &lt;a href="https://docs.aerospike.com/server/operations/manage/aerospike/fast_start"&gt;fast restart&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Server version 5.7 delivered a 60% reduction in the memory consumed by secondary indexes, and a brand new, highly efficient garbage collection system. Query performance and throughput were also improved.&lt;/p&gt;

&lt;p&gt;Aerospike Database 6 builds on these changes with a new architectural approach aligned with the design of the &lt;a href="https://docs.aerospike.com/server/architecture/primary-index"&gt;primary index&lt;/a&gt;. The data in each Aerospike namespace is evenly distributed across 4096 logical partitions, which in turn are evenly distributed across the cluster nodes. The data for each partition is stored and indexed locally in multiple primary index sub-trees called &lt;em&gt;sprigs&lt;/em&gt;. This enables primary index (PI) queries (formerly known as ‘scans’) to be massively parallelized. &lt;/p&gt;

&lt;p&gt;A PI query can target all the partitions, a set of partitions, or a single data partition. Leveraging this capability, the Spark and Presto (Trino) connectors can split a PI query into hundreds or thousands of partitioned queries, feeding data to many thousands of workers in parallel and attacking the job of rapidly processing terabytes of data through horizontal scaling. This approach fits well with the architecture of these analytics systems. The combination creates a next-level distributed computing data platform.&lt;/p&gt;
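&lt;p&gt;The partition-splitting idea can be sketched in a few lines of plain Python. This is an illustrative computation only (the 4096-partition constant comes from the paragraph above; the worker count is hypothetical, and no client API is involved):&lt;/p&gt;

```python
# Divide Aerospike's 4096 logical partitions into contiguous
# (begin, count) ranges, one per parallel worker (e.g. Spark tasks).
N_PARTITIONS = 4096

def split_partitions(workers):
    """Return (begin, count) ranges covering every partition exactly once."""
    base, extra = divmod(N_PARTITIONS, workers)
    ranges, begin = [], 0
    for i in range(workers):
        count = base + (1 if i < extra else 0)
        ranges.append((begin, count))
        begin += count
    return ranges

# 1000 workers -> each gets 4 or 5 partitions; none sits idle.
ranges = split_partitions(1000)
assert sum(count for _, count in ranges) == N_PARTITIONS
```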

&lt;p&gt;Before version 6.0, secondary index queries could only be parallelized at the node level. This meant that if a cluster had 40 nodes, the best parallelization the Spark and Presto (Trino) connectors could possibly use was 40 workers. As our customers make production use of Spark clusters with thousands of cores, having the vast majority of them sit idle was unacceptable, so these connectors did not implement support for secondary index queries.&lt;/p&gt;

&lt;p&gt;In version 6.0, secondary indexes have been re-architected to separately index each partition. This enables massively parallelizing secondary index (SI) queries, as well as supporting pagination, similar to PI queries. Furthermore, SI queries in version 6.0 are tolerant of &lt;a href="https://docs.aerospike.com/server/architecture/data-distribution#automatic-rebalancing"&gt;rebalancing&lt;/a&gt;, unaffected by the automatic data migration that occurs when the cluster size changes. As a result, the Spark and Presto (Trino) connectors will implement SI query support in the same way that they currently do PI queries. This opens the door for operators of Aerospike to optionally trade memory for performance improvements. By adding secondary indexes to sets that have the right cardinality, SI queries can run orders of magnitude faster than equivalent PI queries.&lt;/p&gt;

&lt;p&gt;The change in the secondary index architecture is reflected in the server’s query subsystem, which now unifies both types of queries - primary index and secondary index ones. This change goes deep into a common execution layer; into metrics, which have been merged and renamed; into the client API, which deprecates the Scan class, and provides the same rich functionality to both PI and SI queries from a single Query class.&lt;/p&gt;

&lt;p&gt;Partitioned queries are achieved through client-server coordination, and require a new client version, such as Java client 6.0.0, C client 6.0.0, Go client 6.0.0, C# client 5.0.0 or Python client 7.0.0. Applications using the previous release of these clients may run against server 6.0, but will not benefit from the rebalance tolerance. Similarly, the new clients can talk to both server 5.x and server 6.0 nodes, but will need the cluster upgrade to be completed to unlock the new features.&lt;/p&gt;
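&lt;p&gt;As a sketch of what this looks like from the application side, the Python client expresses a partitioned query through a &lt;code&gt;partition_filter&lt;/code&gt; entry in the query policy. The field names below follow the Aerospike Python client 7.x documentation; verify them against your client version before relying on them:&lt;/p&gt;

```python
# Build a query policy targeting a contiguous partition range.
# The 'partition_filter' dict with 'begin'/'count' keys follows the
# Aerospike Python client 7.x docs; treat the exact shape as an
# assumption to check against your client version.
def partition_policy(begin, count):
    assert 0 <= begin < 4096 and 0 < count <= 4096 - begin
    return {"partition_filter": {"begin": begin, "count": count}}

policy = partition_policy(0, 1024)  # first quarter of the keyspace

# Usage against a connected client (not run here):
# query = client.query("test", "users")
# records = query.results(policy=policy)
```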

&lt;h2&gt;
  
  
  Upcoming Query Features in Aerospike Database 6
&lt;/h2&gt;

&lt;p&gt;Aerospike has delivered better query performance, a lower memory footprint for indexes, query stability, and higher query throughput. Subsequent releases of Aerospike Database 6 will add more functionality and operational improvements to queries.&lt;/p&gt;

&lt;p&gt;In Aerospike Database Community Edition (CE), the primary index and secondary indexes are stored in process memory, which means that they must be rebuilt upon restart in a relatively lengthy cold restart. In Aerospike Database Enterprise Edition (EE), the primary index is kept in shared memory by default, or optionally in persistent memory or on a flash device. This enables an Aerospike EE server to go through a warm restart, which is significantly faster than a cold restart. Server version 6.1 will add the ability to store secondary indexes in shared memory, allowing warm restarts of the Aerospike daemon (asd) when they are present. Later versions will allow secondary indexes to be stored in persistent memory and even on flash devices.&lt;/p&gt;

&lt;p&gt;Currently, secondary indexes can be built over the top-level keys of a Map data structure. This is typically employed to index the top-level fields of JSON documents, which are stored in Aerospike as Maps. Server version 6.1 will add the ability to index elements nested at any depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing, Indexing and Querying JSON Documents
&lt;/h2&gt;

&lt;p&gt;Since the introduction of Map and List Collection Data Types (CDTs), developers have been storing JSON documents in key-ordered Maps, and using Aerospike as a document database. Developers use the rich Map and List APIs in multi-operation transactions to query and manipulate document data atomically on the server-side. Documents (Maps) are stored in a space-efficient &lt;a href="https://msgpack.org/index.html"&gt;MessagePack&lt;/a&gt; binary serialization, facilitating fast access.&lt;/p&gt;

&lt;p&gt;The Aerospike Document API library (introduced mid-2021) added the ability to store, modify and query documents using the popular JSONPath query language. The Document API splits each query, executing what it can on the server side with the native Map API and processing the remainder with a JSONPath library.&lt;/p&gt;
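&lt;p&gt;As a toy illustration of the client-side half of that split, the snippet below resolves a simple dotted path over a document held as nested maps and lists. Real JSONPath syntax (&lt;code&gt;$.address.city&lt;/code&gt;, filters, wildcards) is far richer, and the function here is made up for the example:&lt;/p&gt;

```python
# Toy stand-in for JSONPath resolution over a document stored as
# nested maps/lists (Aerospike stores such documents as CDT Maps).
def get_path(doc, path):
    """Resolve a dotted path like 'address.city' or 'phones.0'."""
    node = doc
    for part in path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node

doc = {"name": "Ana", "address": {"city": "Oslo"}, "phones": ["555-0100"]}
assert get_path(doc, "address.city") == "Oslo"
assert get_path(doc, "phones.0") == "555-0100"
```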

&lt;p&gt;The Document API is currently available as a wrapper to the Java client, and as an interface in the Aerospike gateway (also known as the REST client). The Document API library will be ported to other programming languages that have an Aerospike client.&lt;/p&gt;

&lt;p&gt;Together with the upcoming capability to index deeply nested elements, Aerospike Database 6 enhances the development of applications that use a document model approach. Combined with features such as Strong Consistency and Aerospike’s ability to scale up to petabytes of data and hundreds of billions of objects while maintaining sub-millisecond transaction latencies, this results in a document database at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Batch Anything
&lt;/h2&gt;

&lt;p&gt;Since the beginning, the client has had support for a simple batch &lt;code&gt;get&lt;/code&gt; command, allowing multiple records (or bins within them) to be retrieved together based on a list of keys. Similarly, a batch &lt;code&gt;exists&lt;/code&gt; command checks the existence of multiple keys all at once, from a specified list of keys.&lt;/p&gt;

&lt;p&gt;Later, the client added the ability to execute the same multi-operation transaction against a list of keys in parallel, using the batch &lt;code&gt;operate&lt;/code&gt; command, but limited the operations in the transaction to read-only ones.&lt;/p&gt;

&lt;p&gt;With server 6.0, the addition of batch write commands (&lt;code&gt;put&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;operate&lt;/code&gt; transactions without restrictions on write operations) completes the ability of a developer to &lt;a href="https://docs.aerospike.com/server/guide/batch"&gt;batch anything&lt;/a&gt; in their application - reads, writes, updates, deletes or UDFs. Logically related operations can be sent all at once to the database cluster.&lt;/p&gt;

&lt;p&gt;Batch writes are more efficient than asynchronously launching a series of commands at the server. Using batch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces the round-trip time (RTT) needed to complete all the operations, lowering the overall latency&lt;/li&gt;
&lt;li&gt;Reduces network traffic, using fewer connections, and combining operations into fewer IP packets&lt;/li&gt;
&lt;li&gt;Improves parallelization, supporting faster data ingest&lt;/li&gt;
&lt;/ul&gt;
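&lt;p&gt;Conceptually, a batch is just an ordered list of per-key requests that the client groups by owning node and ships in a handful of packets. The sketch below models that composition with plain Python data structures; the real clients use dedicated batch classes (names vary by client), so nothing here is actual API:&lt;/p&gt;

```python
# A mixed batch: a read, a write and a delete for different keys,
# submitted together instead of as three separate round trips.
batch = [
    {"key": ("test", "users", "u1"), "op": "read", "bins": ["segments"]},
    {"key": ("test", "users", "u2"), "op": "write", "bins": {"visits": 3}},
    {"key": ("test", "users", "u3"), "op": "delete"},
]

# The client would group requests per cluster node; here we just
# group by operation type to show the mixed composition.
ops_by_type = {}
for req in batch:
    ops_by_type.setdefault(req["op"], []).append(req["key"])

assert sorted(ops_by_type) == ["delete", "read", "write"]
```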

&lt;p&gt;Developers of applications with write-heavy or mixed workloads should consider converting from async writes to batch writes, for better performance and a more stable cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Enhancements
&lt;/h2&gt;

&lt;p&gt;Server 6.0 adds three new granular privileges for role-based &lt;a href="https://docs.aerospike.com/server/operations/configure/security/access-control"&gt;access control&lt;/a&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;sindex-admin&lt;/code&gt; privilege grants a user the ability to add and drop secondary indexes. &lt;/li&gt;
&lt;li&gt;The &lt;code&gt;udf-admin&lt;/code&gt; privilege grants a user the ability to add and remove UDF modules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Previously these privileges were only available through the &lt;code&gt;data-admin&lt;/code&gt; privilege, which some users were reluctant to grant widely.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;truncate&lt;/code&gt; privilege is now a standalone privilege, and no longer part of the &lt;code&gt;write&lt;/code&gt; privilege. Users representing applications that perform truncates should have the &lt;code&gt;truncate&lt;/code&gt; privilege granted to one of their roles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Breaking Changes
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, make sure to read the &lt;a href="https://enterprise.aerospike.com/enterprise/download/server/notes.html"&gt;release notes&lt;/a&gt; and the &lt;a href="https://docs.aerospike.com/server/operations/upgrade/special_upgrades"&gt;upgrade instructions&lt;/a&gt;. The breaking changes in server version 6.0 include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A storage format change (the addition of a 4 byte end marker to each record) requires that persistent storage devices (with the exception of PMEM) be erased as part of the upgrade. The header (first 8MiB) of raw SSD devices must be zeroized. See &lt;a href="https://docs.aerospike.com/server/operations/plan/ssd/ssd_init"&gt;SSD Initialization&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Several configuration parameters have been renamed or removed:

&lt;ul&gt;
&lt;li&gt;A small number of configuration parameters have been renamed.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;scan-max-done&lt;/code&gt; to &lt;code&gt;query-max-done&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scan-threads-limit&lt;/code&gt; to &lt;code&gt;query-threads-limit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;background-scan-max-rps&lt;/code&gt; to &lt;code&gt;background-query-max-rps&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;single-scan-threads&lt;/code&gt; to &lt;code&gt;single-query-threads&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;The following query configuration parameters were removed.

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;query-threads&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-worker-threads&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-microbenchmark&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-batch-size&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-in-transaction-thread&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-long-q-max-size&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-priority&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-priority-sleep-us&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-rec-count-bound&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-req-in-query-thread&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-short-q-max-size&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-threshold&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;query-untracked-time-ms&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;batch-without-digests&lt;/code&gt; configuration parameter was removed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;truncate&lt;/code&gt; privilege needs to be granted to applications using truncates. It is no longer part of the &lt;code&gt;write&lt;/code&gt; privilege.&lt;/li&gt;
&lt;li&gt;The long deprecated Predicate Filtering (PredExp) was removed. Use &lt;a href="https://docs.aerospike.com/server/guide/expressions"&gt;Filter Expressions&lt;/a&gt; instead.&lt;/li&gt;
&lt;li&gt;The ‘scan’ module of the &lt;code&gt;jobs:&lt;/code&gt; info command has been removed. Use the ‘query’ module instead.&lt;/li&gt;
&lt;li&gt;Be aware that scan and query related &lt;a href="https://docs.aerospike.com/reference/metrics"&gt;metrics&lt;/a&gt; have changed. We will publish a separate blog to detail these changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deprecation Notice
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;jobs:&lt;/code&gt; info command, initially deprecated in server 5.7, is scheduled to be removed after 6 more months. Use &lt;code&gt;query-show&lt;/code&gt; instead.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;scan-show&lt;/code&gt; info command is now deprecated. Use &lt;code&gt;query-show&lt;/code&gt; instead.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;scan-abort&lt;/code&gt; info command is now deprecated. Use &lt;code&gt;query-abort&lt;/code&gt; instead.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;scan-abort-all&lt;/code&gt; command is now deprecated. Use &lt;code&gt;query-abort-all&lt;/code&gt; instead.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Learn more
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://developer.aerospike.com/blog/config-metric-info-changes-in-database-6"&gt;Get details about Config, Metrics, and Info Changes in Aerospike Database 6.0&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>secondaryindex</category>
      <category>batch</category>
      <category>update</category>
    </item>
    <item>
      <title>Aerospike Modeling: User Profile Store</title>
      <dc:creator>Ronen Botzer</dc:creator>
      <pubDate>Sun, 26 Apr 2020 04:19:55 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-modeling-user-profile-store-4k8k</link>
      <guid>https://forem.com/aerospike/aerospike-modeling-user-profile-store-4k8k</guid>
      <description>&lt;h4&gt;
  
  
  Audience Segmentation for Personalization
&lt;/h4&gt;

&lt;p&gt;I recently published the article &lt;a href="https://dev.to/aerospike/aerospike-modeling-iot-sensors-453a"&gt;“Aerospike Modeling: IoT Sensors”&lt;/a&gt; to highlight a different modeling approach when using Aerospike versus Cassandra in IoT use cases. There is a similar juxtaposition of Cassandra’s column-oriented ‘many tiny records’ to Aerospike’s row-oriented ‘fewer, larger, records’ when modeling user profiles.&lt;/p&gt;

&lt;h4&gt;
  
  
  tl;dr
&lt;/h4&gt;

&lt;p&gt;Cassandra databases, including derivatives such as ScyllaDB, have a needle-in-a-haystack problem that affects performance when you need low latency key-value operations. Aerospike is uniquely capable of delivering speed at scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fz929cgtgt9n1ofpuzio4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fz929cgtgt9n1ofpuzio4.jpg" alt="Laptop with ads" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://unsplash.com/@bugsster?utm_content=creditCopyText"&gt;Taras Shypka&lt;/a&gt; on &lt;a href="https://unsplash.com/?utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;



&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;If you haven’t heard the terms &lt;a href="https://www.aerospike.com/solutions/technology/user-profile-store/"&gt;user profile store&lt;/a&gt; or audience segmentation, I recommend Google Cloud’s article on &lt;a href="https://cloud.google.com/solutions/infrastructure-options-for-serving-advertising-workloads#unique_user_profile_store"&gt;digital advertising&lt;/a&gt;, and adpushup’s &lt;a href="https://www.adpushup.com/blog/audience-segmentation/"&gt;What is Audience Segmentation?&lt;/a&gt; Aerospike was first used heavily in the &lt;a href="https://www.aerospike.com/solutions/industry/adtech/"&gt;Ad Tech&lt;/a&gt; ecosystem, so it is not surprising that it’s an effective solution for storing user profiles.&lt;/p&gt;

&lt;p&gt;Audience segmentation for real time bidding (RTB) is a special case of user profile stores. It’s a form of personalization that happens tens of millions of times per second, nonstop, as ads are served in real time around the world to people using apps, visiting web pages, and watching streaming content. It’s simple to describe, and generally applicable to other forms of online personalization.&lt;/p&gt;

&lt;p&gt;A user is typically identified by a cookie containing a unique ID. Associated with that unique ID is a set of audience segment IDs that a Data Management Platform (DMP) deduced the user to be in. This includes demographic, psychographic, behavioral, and geographic segments.&lt;/p&gt;

&lt;p&gt;The user is supplied to an RTB ad exchange by a Supply Side Platform (SSP). The exchange matches eyeballs to advertisers by providing the user ID to Demand Side Platforms (DSPs). These DSPs store DMP-provided profiles, and pull up the segments known to match this user ID from their user profile store. With this information, the DSP determines if it has an ad programmed to target the user’s audience segments. The DSP can then choose to bid in the ad exchange for the right to serve an ad to the user. If it wins the bid, it serves the ad, and the entire process lasts around 150ms end-to-end. Of that, just 20ms is used to decide whether to serve an ad to the user. Aerospike evolved to deliver speed at very large scales, as the database used by many successful companies in this ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Modeling
&lt;/h3&gt;

&lt;p&gt;There may be millions of distinct segments, each segment ID an integer value. An average user may have a thousand audience segments they occupy at each point in time. When a user is assigned a segment, that association is given a specific time-to-live (TTL) value. As the user profile store is continuously refreshed from DMP data, temporary associations (such as a location change due to travel or a short term interest in something) will expire, while strong associations will have their segment’s TTL extended.&lt;/p&gt;

&lt;h4&gt;
  
  
  Modeling in Cassandra
&lt;/h4&gt;

&lt;p&gt;Let’s consider how you would potentially model this kind of user profile store in Cassandra.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;userspace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_segments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;segment_id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attr1&lt;/span&gt; &lt;span class="nb"&gt;smallint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attr2&lt;/span&gt; &lt;span class="nb"&gt;smallint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;segment_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You would now upsert &lt;code&gt;user_id, segment_id&lt;/code&gt; data pairs with a TTL, each pair a distinct row.&lt;/p&gt;
&lt;h4&gt;
  
  
  Modeling in Aerospike
&lt;/h4&gt;

&lt;p&gt;In Aerospike we would keep all of a user’s audience segmentation data in a single record, whose key is the user ID. Just a reminder: a record in an Aerospike &lt;a href="https://www.aerospike.com/docs/architecture/data-model.html"&gt;namespace&lt;/a&gt; is uniquely identified by the tuple &lt;code&gt;(namespace, set, user-key)&lt;/code&gt;. The Aerospike client hashes the pair &lt;code&gt;set, user-key&lt;/code&gt; through RIPEMD-160 &lt;a href="https://www.aerospike.com/docs/architecture/data-distribution.html"&gt;into a 20 byte digest&lt;/a&gt;, which is the actual primary index identifier of the record in the namespace. This means that if you keep to the default key policy of &lt;code&gt;KEY_DIGEST&lt;/code&gt;, storage is saved, as the set (table) name and the 36-character UUID are hashed into 20 bytes of digest.&lt;/p&gt;
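&lt;p&gt;The digest also determines which partition the record lands in. In the open-source server, the partition ID is derived from the first two bytes of the digest, read little-endian and reduced to the 4096-partition space; the sketch below mirrors that computation, with a made-up digest for illustration:&lt;/p&gt;

```python
N_PARTITIONS = 4096

def partition_id(digest: bytes) -> int:
    """Partition for a 20-byte RIPEMD-160 digest: first two bytes,
    little-endian, reduced modulo 4096 (mirrors the server's
    as_partition_getid; treat as illustrative)."""
    assert len(digest) == 20
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS

# A made-up digest; only the first two bytes matter here.
digest = bytes([0x34, 0x12]) + bytes(18)
assert partition_id(digest) == 564  # 0x1234 = 4660; 4660 % 4096 = 564
```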

&lt;p&gt;We will store the user’s segments in a map with a segment ID acting as map key, and the tuple &lt;code&gt;[segment-TTL, {attr1, attr2}]&lt;/code&gt; as the map value.&lt;/p&gt;

&lt;p&gt;Depending on the precision we desire for the segment TTL, we can use a smaller numeric value than the 8 bytes needed to hold a Unix epoch timestamp. Let’s assume the precision of segment TTLs is hourly, and our application has a local epoch of January 1st 2019, dating to when it was first deployed. The value of the &lt;code&gt;segment-TTL&lt;/code&gt; would be an enumeration of the hours since that epoch. For example, December 20th, 2019 at 10am is 8482 hours since the epoch.&lt;/p&gt;
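&lt;p&gt;The hour arithmetic is easy to check with the standard library; the epoch below is the application’s January 1st 2019 local epoch from above:&lt;/p&gt;

```python
from datetime import datetime

APP_EPOCH = datetime(2019, 1, 1)  # the application's local epoch

def hours_since_epoch(ts: datetime) -> int:
    """Segment-TTL value: whole hours elapsed since the app epoch."""
    return int((ts - APP_EPOCH).total_seconds() // 3600)

assert hours_since_epoch(datetime(2019, 12, 20, 10)) == 8482
```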

&lt;p&gt;So, each user has a map of {segmentID: [segment-TTL, {attr1, attr2}]}, for example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mi"&gt;8457&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8889&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}],&lt;/span&gt;
  &lt;span class="mi"&gt;12845&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8889&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}],&lt;/span&gt;
  &lt;span class="mi"&gt;42199&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8889&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}],&lt;/span&gt;
  &lt;span class="mi"&gt;43696&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8889&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;a href="https://www.aerospike.com/docs/guide/cdt-map.html#element-ordering"&gt;map ordering&lt;/a&gt; options are &lt;code&gt;UNORDERED&lt;/code&gt;, &lt;code&gt;K-ORDERED&lt;/code&gt; and &lt;code&gt;KV-ORDERED&lt;/code&gt;. All map operations can be applied to a map, regardless of its ordering, but it does affect the &lt;a href="https://www.aerospike.com/docs/guide/cdt-map-performance.html"&gt;performance of map operations&lt;/a&gt;. I’ll follow the tip that in general, when a namespace is stored on SSD, choosing &lt;code&gt;K-ORDERED&lt;/code&gt; gives the best performance.&lt;/p&gt;
&lt;h4&gt;
  
  
  Storage Space
&lt;/h4&gt;

&lt;p&gt;If you read my article on modeling IoT use cases, you already know that Aerospike’s MessagePack serialization will reduce the storage space needed to store these maps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyl64gvuys0p2ii7ewfrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyl64gvuys0p2ii7ewfrz.png" alt="CE 500K records" width="800" height="180"&gt;&lt;/a&gt;&lt;/p&gt;
CE-4.8.0, 500K users, each with 1000 segments





&lt;p&gt;Using the Aerospike Enterprise Edition &lt;a href="https://www.aerospike.com/products/features/compression/"&gt;Compression&lt;/a&gt; feature further compacts the segmentation data. When I compared running my sample code on CE and EE (using Zstandard level 1 compression), I saw a 0.69 compression ratio, saving about 30% on storage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fi88d4buul6d0twdre7gn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fi88d4buul6d0twdre7gn.png" alt="EE 500K records" width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;
EE-4.8.0, 500K users, each with 1000 segments





&lt;p&gt;Aerospike customers modeling similar data saw significantly better compression ratios of 0.25–0.20. As usual, the compression ratio depends on the codec, as well as the data being compressed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe8cra2qmtbxp18argzwo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fe8cra2qmtbxp18argzwo.png" alt="storage compression stats" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;
Aerospike Enterprise Edition has namespace level statistics for storage compression





&lt;h4&gt;
  
  
  Advantages
&lt;/h4&gt;

&lt;p&gt;This map structure has several advantages. We can use the &lt;a href="https://www.aerospike.com/docs/guide/cdt-map.html#map-apis"&gt;&lt;code&gt;remove_by_value_interval&lt;/code&gt;&lt;/a&gt; map operation to trim expired segments. We can use &lt;code&gt;get_by_value_interval&lt;/code&gt; to filter segments that have a specific ‘freshness’. We can easily upsert new user segments into the map as they are processed.&lt;/p&gt;
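&lt;p&gt;Modeled with a plain Python dict standing in for the stored CDT Map, those operations look like this. The server-side map operations achieve the same effect without reading the record into the client; the helper names below are made up for illustration:&lt;/p&gt;

```python
# {segment_id: [segment_ttl_hours, {attrs}]} -- the map from above.
segments = {8457: [8889, {}], 12845: [8400, {}], 42199: [8889, {}]}

def trim_expired(seg_map, now_hours):
    """Drop segments whose TTL hour has passed (mirrors the
    server-side remove_by_value_interval map operation)."""
    return {k: v for k, v in seg_map.items() if v[0] >= now_hours}

def upsert_segment(seg_map, seg_id, ttl_hours, attrs=None):
    """Insert or refresh a segment (a map put on the server)."""
    seg_map[seg_id] = [ttl_hours, attrs or {}]

live = trim_expired(segments, now_hours=8482)
assert 12845 not in live and 8457 in live
```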

&lt;p&gt;Mainly, this allows for orders of magnitude faster retrieval of a user’s segments from the user profile store. In Cassandra finding a single user requires a query pulling a small number of records from a much larger partition. In Aerospike this is a single record read, which is always low latency, regardless of the number of records in the cluster.&lt;/p&gt;

&lt;p&gt;An ad tech user profile store may have tens of billions of profiles, each with a large number of segments. This is because there are hundreds of millions, even billions, of distinct devices, users utilize incognito modes to browse, and browsers and operating systems anonymize through further means. There are many more ad tech cookies out there than there are humans.&lt;/p&gt;

&lt;p&gt;Let’s consider a real use case where Aerospike was chosen over Scylla to replace a petabyte scale Apache Cassandra user profile store. With &lt;strong&gt;50 billion&lt;/strong&gt; users, and an average of 1000 segments per user, the C* store had 50 trillion rows. It took many seconds to retrieve a given user from this profile store, leading to an approach of ‘pre-warming’ whole segments of users in advance into a smaller front-end Aerospike cluster. This was very costly in terms of cluster resources. For Aerospike, there would only be 50 billion records, one per user, and fetching any one of them takes around 1ms.&lt;/p&gt;
&lt;h4&gt;
  
  
  Summit Talk
&lt;/h4&gt;

&lt;p&gt;Matt Cochran, Director of Data Engineering at The Trade Desk, gave a talk about migrating petabytes of data from Cassandra to Aerospike using this modeling approach:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/lA8MXNZ9uY4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;
&lt;h3&gt;
  
  
  Code Sample
&lt;/h3&gt;

&lt;p&gt;The code sample I wrote is located in &lt;a href="https://github.com/aerospike-examples/modeling-user-segmentation"&gt;aerospike-examples/modeling-user-segmentation&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  Loading Data
&lt;/h4&gt;

&lt;p&gt;I started by running &lt;a href="https://github.com/aerospike-examples/modeling-user-segmentation/blob/master/run_workers.sh"&gt;&lt;code&gt;run_workers.sh&lt;/code&gt;&lt;/a&gt;, which launched ten Python &lt;a href="https://github.com/aerospike-examples/modeling-user-segmentation/blob/master/populate_user_profile.py#L119-L147"&gt;&lt;code&gt;populate_user_profile.py&lt;/code&gt;&lt;/a&gt; workers at a time, until a few minutes later I had a profile store containing 500K users, each with 1000 random segment IDs. Segment IDs are integers in the range 0 to 81999.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./run_workers.sh 
Generating users 1 to 5001
Generating users 5001 to 10001
Generating users 10001 to 15001
:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Next I ran &lt;a href="https://github.com/aerospike-examples/modeling-user-segmentation/blob/master/update_query_user_profile.py"&gt;&lt;code&gt;update_query_user_profiles.py&lt;/code&gt;&lt;/a&gt;, which has an &lt;code&gt;--interactive&lt;/code&gt; mode to make it easier to see the operations and their results.&lt;/p&gt;
&lt;h4&gt;
  
  
  Inserting and Updating Segments
&lt;/h4&gt;

&lt;p&gt;I upserted a single segment into a user’s segment map, within a transaction that shows the state of that segment ID before and after.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
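&lt;p&gt;To make the semantics concrete without a cluster, here is a plain-Python sketch of the single-segment upsert; the dict stands in for the record’s segment-map bin, and the function name is illustrative (the actual code uses the client’s &lt;code&gt;map_put&lt;/code&gt; operation inside one &lt;code&gt;operate()&lt;/code&gt; call):&lt;/p&gt;

```python
# Plain-dict sketch of the single-segment upsert (illustrative only).
# In the real code this is a map_put inside one operate() transaction;
# the dict stands in for the record's segment-map bin.

def upsert_segment(segments, segment_id, ttl, data=None):
    """Insert or update one segment; the map value is [ttl, extra-data]."""
    before = segments.get(segment_id, [])        # [] when absent, as in the demo output
    segments[segment_id] = [ttl, data or {}]
    return before, segments[segment_id]

segments = {}                                    # user u1's segment map (empty here)
before, after = upsert_segment(segments, 64955, 10581)
print("before:", before)   # []
print("after:", after)     # [10581, {}]
print("count:", len(segments))
```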



&lt;p&gt;The original 1000 segments for this user are random, so there’s a chance that this code segment produces an update rather than an insert.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python update_query_user_profiles.py --interactive
Upsert segment 64955 =&amp;gt; [10581] to user u1
Segment value before: []
Number of segments after upsert: 1001
Segment value after: [64955, [10581, {}]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Similarly, I upserted 8 more segments at once using &lt;code&gt;map_put_items&lt;/code&gt;.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
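&lt;p&gt;The multi-segment upsert behaves like a dict merge; here is an illustrative sketch of its semantics (the real operation is the client’s &lt;code&gt;map_put_items&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch of map_put_items semantics: upsert several segments at once.
# Segment IDs and the TTL value 10581 follow the demo output.

def upsert_segments(segments, items):
    """Merge a {segment_id: [ttl, data]} batch into the segment map."""
    segments.update(items)
    return len(segments)

segments = {64955: [10581, {}]}
batch = {sid: [10581, {}]
         for sid in (537, 5484, 12735, 21894, 23223, 24124, 40680, 66659)}
total = upsert_segments(segments, batch)
print(total)  # 9
```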



&lt;p&gt;To fetch the 9 most recent segments out of the user’s 1009, I used &lt;code&gt;map_get_by_value&lt;/code&gt; to search for any map value matching a list that looks like &lt;code&gt;[10581, *]&lt;/code&gt;, with the &lt;a href="https://www.aerospike.com/docs/guide/cdt-ordering.html#wildcard"&gt;‘Wildcard’&lt;/a&gt; glob. See the &lt;a href="https://www.aerospike.com/docs/guide/cdt-ordering.html#comparison"&gt;ordering rules&lt;/a&gt; for more on how Aerospike compares between list values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Updating multiple segments for user u1
{ 537: [10581, {}],
  5484: [10581, {}],
  12735: [10581, {}],
  21894: [10581, {}],
  23223: [10581, {}],
  24124: [10581, {}],
  40680: [10581, {}],
  66659: [10581, {}]}
Show all segments with TTL 10581:
[64955, [10581, {}], 537, [10581, {}], 5484, [10581, {}], 
 12735, [10581, {}], 21894, [10581, {}], 23223, [10581, {}], 
 24124, [10581, {}], 40680, [10581, {}], 66659, [10581, {}]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Next I updated a segment’s TTL. As I mentioned in my article &lt;a href="https://dev.to/aerospike/operations-on-nested-data-types-in-aerospike-1d5l"&gt;Operations on Nested Data Types in Aerospike&lt;/a&gt;, to operate on the embedded list holding the segment TTL and associated data, I needed to provide the context of how to get to that list element.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
The context is map key 5484 =&amp;gt; index 0 of the list stored as that map key’s value
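&lt;p&gt;In plain-Python terms, the context-based update navigates to map key 5484 and then to index 0 of its &lt;code&gt;[ttl, data]&lt;/code&gt; list; this sketch (names illustrative) mirrors that:&lt;/p&gt;

```python
# Sketch of the context-based update: navigate to map key 5484, then to
# index 0 of its [ttl, data] list, and increment the TTL by 5 (hours).
# The real call pairs a cdt_ctx map-key context with a list increment.

def add_to_segment_ttl(segments, segment_id, hours):
    segments[segment_id][0] += hours             # index 0 holds the segment TTL
    return [segment_id, segments[segment_id]]

segments = {5484: [10581, {}]}
result = add_to_segment_ttl(segments, 5484, 5)
print(result)  # [5484, [10586, {}]]
```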





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Add 5 hours to the TTL of user u1's segment 5484
[5484, [10586, {}]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Reading a User’s Segment Data
&lt;/h4&gt;

&lt;p&gt;I demonstrated how to get just the segments that will not expire today.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Only get segments outside the specified segment TTL range.
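&lt;p&gt;A sketch of the inverted-range semantics, with a dict standing in for the segment map and TTLs expressed in the same hour units as the demo (names illustrative):&lt;/p&gt;

```python
# Sketch of the inverted get_by_value_interval: keep only segments whose
# TTL is NOT in [0, start_of_today), i.e. segments that outlive today.

def fresh_segments(segments, start_of_today):
    return {sid: v for sid, v in segments.items()
            if not (0 <= v[0] < start_of_today)}

segments = {1: [10570, {}], 2: [10581, {}], 3: [10600, {}]}
fresh = fresh_segments(segments, 10581)
print(fresh)  # segments 2 and 3 survive
```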



&lt;p&gt;I used the map &lt;a href="https://www.aerospike.com/docs/guide/cdt-map.html#map-apis"&gt;&lt;code&gt;get_by_value_interval&lt;/code&gt;&lt;/a&gt; operation to find all the segments whose expiration is between &lt;code&gt;[0, NIL]&lt;/code&gt; and &lt;code&gt;[start-of-today, NIL]&lt;/code&gt; and specified that I wanted all elements &lt;em&gt;not in that range&lt;/em&gt;. Notice the &lt;code&gt;True&lt;/code&gt; argument designating the inverse of this range for the Python client’s &lt;a href="https://aerospike-python-client.readthedocs.io/en/latest/aerospike_helpers.operations.html#aerospike_helpers.operations.map_operations.map_get_by_value_range"&gt;&lt;code&gt;map_get_by_value_range()&lt;/code&gt;&lt;/a&gt; helper method.&lt;/p&gt;

&lt;p&gt;To showcase another capability of the map API, I counted how many segments the user had in a range of segment IDs.&lt;/p&gt;

&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Using the ‘count’ map result type for the map &lt;a href="https://www.aerospike.com/docs/guide/cdt-map.html#map-apis"&gt;get_by_key_interval&lt;/a&gt; operation
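&lt;p&gt;The counting behavior can be sketched in plain Python; this sketch assumes an interval inclusive of the start key and exclusive of the end key (see the ordering rules for the exact server semantics):&lt;/p&gt;

```python
# Sketch of get_by_key_interval with a 'count' result type: count the
# segments whose ID falls in the interval [8000, 9000).

def count_segments_in_range(segments, lo, hi):
    return sum(1 for sid in segments if lo <= sid < hi)

segments = {7999: [10581, {}], 8000: [10581, {}],
            8500: [10581, {}], 9000: [10581, {}]}
print(count_segments_in_range(segments, 8000, 9000))  # 2
```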





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Count how many segments u1 has in the segment ID range 8000-9000
15
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In the case of fetching a user’s segments, a simple read operation (&lt;code&gt;get&lt;/code&gt;) may be preferred because it is the fastest. My code sample is meant to show the expressiveness of Aerospike’s native map and list operations.&lt;/p&gt;
&lt;h4&gt;
  
  
  Trimming Stale Segments
&lt;/h4&gt;

&lt;p&gt;Most read operations in the list and map APIs have a complementary remove operation.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
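&lt;p&gt;Sketching the trim in plain Python (names illustrative): remove every segment whose TTL falls in &lt;code&gt;[0, now)&lt;/code&gt; and report the counts, which is the shape of the server’s &lt;code&gt;remove_by_value_interval&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch of remove_by_value_interval: drop every segment whose TTL lies
# in [0, now), and report how many were trimmed vs. how many remain.

def trim_stale_segments(segments, now):
    stale = [sid for sid, v in segments.items() if 0 <= v[0] < now]
    for sid in stale:
        del segments[sid]
    return len(stale), len(segments)

segments = {1: [10500, {}], 2: [10581, {}], 3: [10600, {}]}
trimmed, remaining = trim_stale_segments(segments, 10581)
print(trimmed, remaining)  # 1 2
```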



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Clean the stale segments for user u1
User u1 had 860 stale segments trimmed, with 149 segments remaining
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As this operation has to inspect whether every segment is inside the specified range, it’s not one to add ahead of every read operation. Instead, it can be called periodically (once an hour, once a day) to perform the cleanup. As of &lt;a href="https://www.aerospike.com/enterprise/download/server/notes.html#4.7.0.2"&gt;version 4.7&lt;/a&gt; (both Community Edition and Enterprise Edition), this operation can be attached to a background scan, to be applied to all records in a namespace or set.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;A row-oriented modeling approach, leveraging the map and list data types, gives Aerospike an advantage in key-value operations over C* implementations, including an advanced C-based one such as ScyllaDB.&lt;/p&gt;

&lt;p&gt;Combined with unique optimizations around NVMe drives, and without a dependence on large amounts of DRAM, Aerospike provides much higher performance for user profile stores, with far less hardware, whether the scale is measured in gigabytes, terabytes, or petabytes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published on Medium (Aerospike Developer Blog), February 16 2020&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>database</category>
      <category>data</category>
    </item>
    <item>
      <title>Aerospike Modeling: IoT Sensors</title>
      <dc:creator>Ronen Botzer</dc:creator>
      <pubDate>Sat, 25 Apr 2020 11:08:28 +0000</pubDate>
      <link>https://forem.com/aerospike/aerospike-modeling-iot-sensors-453a</link>
      <guid>https://forem.com/aerospike/aerospike-modeling-iot-sensors-453a</guid>
      <description>&lt;h4&gt;
  
  
  Rethinking a ScyllaDB Benchmark as an Aerospike Developer
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmpcsjhq4ajimed1a5uuh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmpcsjhq4ajimed1a5uuh.jpg" alt="Alt Text" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://unsplash.com/@danlefeb?utm_content=creditCopyText"&gt;Dan LeFebvre&lt;/a&gt; on &lt;a href="https://unsplash.com/?utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;



&lt;p&gt;In November, the folks over at Scylla announced that an IoT benchmark of theirs achieved a “1 billion rows per second” scan. In this article I’ll show how you would model this use case differently with Aerospike, in a way that achieves higher performance with less hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;The Scylla benchmark involved 1 million sensors, logging a temperature measurement once a minute for a year, so each sensor logs 1440 measurements per day, 525,600 per year. Last I checked (2019–12–02) this announcement wasn’t published as a detailed benchmark on the Scylla website, but rather as a (2019–11–05) press release, which sketches out their claims. The lack of details is unfortunate, but I’ll work with that.&lt;/p&gt;

&lt;h4&gt;
  
  
  A Cassandra Modeling Approach
&lt;/h4&gt;

&lt;p&gt;In a Cassandra derivative like ScyllaDB, you would typically have a temperature column containing a single numeric value, with the row representing an instance of measurement. Accordingly, the Scylla benchmark talks about a dataset of 526 billion rows. In this modeling approach, each row is extremely sparse, holding just one data point for a distinct sensor-timestamp pair.&lt;/p&gt;

&lt;p&gt;According to the Scylla press release, to retrieve the entire year’s worth of data, you need to scan the dataset. To get 3 months of sensor information you need to scan the dataset. But the same holds for getting a day’s data for all the sensors, a year’s data for a single sensor, or a single day’s data for an individual sensor — you scan the dataset.&lt;/p&gt;

&lt;p&gt;Cassandra is a column-oriented database, and as such is more suitable for ad-hoc analytics than key-value operations. Like Aerospike, ScyllaDB is written in C (rather than Java), and Scylla have worked to optimize network and disk IO. The Scylla folks are good engineers, and they’ve put in hard work to be better than Apache Cassandra and DataStax. However, when it comes to transactional workloads (rather than analytics), a row-oriented NoSQL database like Aerospike has the upper hand. Modeling this use case in Aerospike will utilize fewer, larger records, with multiple data points in each.&lt;/p&gt;

&lt;h4&gt;
  
  
  The ScyllaDB Cluster Capacity
&lt;/h4&gt;

&lt;p&gt;How big is this dataset anyway? Let’s assume that the row’s primary key is &lt;code&gt;(sensor-id &amp;lt;int (4B)&amp;gt;, timestamp &amp;lt;timestamp (8B)&amp;gt;)&lt;/code&gt; and it has a temperature column with a single &lt;code&gt;int (4B)&lt;/code&gt; value. A rough estimate, with a single copy of the data, would be &lt;code&gt;526 * 10^9 * 16 / 1024^4 = 7.65TiB&lt;/code&gt; . The cluster described by Scylla has 83 x n2.xlarge bare-metal servers from Packet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fm5lpwj11zii8vuswro0m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fm5lpwj11zii8vuswro0m.png" alt="Packet n2.xlarge" width="800" height="538"&gt;&lt;/a&gt;&lt;/p&gt;
Packet n2.xlarge server specs [2019–12–02]



&lt;p&gt;In total, the cluster has a combined capacity of &lt;strong&gt;286TiB&lt;/strong&gt; of NVMe Flash, &lt;strong&gt;31TiB&lt;/strong&gt; of RAM, 2324 physical cores, and 3,320Gbps of aggregate network bandwidth. By my estimate the dataset is &lt;strong&gt;7.65TiB&lt;/strong&gt;. Yes, the disparity between the cluster capacity and the size of this benchmark’s dataset should raise eyebrows.&lt;/p&gt;

&lt;h3&gt;
  
  
  An Aerospike Modeling Approach
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Data Model
&lt;/h4&gt;

&lt;p&gt;For the same use case described in the overview, we will collect each day’s data from one sensor into a single record. Every minute the application will log a temperature reading as a tuple &lt;code&gt;[timestamp, temperature]&lt;/code&gt; and append it to a list of such tuples. At the end of the day this list will have 1440 tuples, and our application will roll over to a new day’s record for the sensor.&lt;/p&gt;

&lt;p&gt;The choice of setting the &lt;a href="https://www.aerospike.com/docs/guide/cdt-list.html#ordered-lists"&gt;list's type&lt;/a&gt; to &lt;code&gt;UNORDERED&lt;/code&gt; versus &lt;code&gt;ORDERED&lt;/code&gt; depends on whether we want to optimize for faster writes or faster in-record read operations, such as getting a range of values from the record. The list’s type does not limit the operations that can be performed, but it does change their &lt;a href="https://www.aerospike.com/docs/guide/cdt-list-performance.html"&gt;performance characteristics&lt;/a&gt;. Since time increases monotonically, I’m choosing to use an unordered list, for which the append operation has O(1) performance.&lt;/p&gt;
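&lt;p&gt;Sketching the per-minute write path in plain Python (the dict stands in for one sensor-day record, and the names are illustrative; in the actual application this is a single list append operation on the record’s bin):&lt;/p&gt;

```python
# Sketch of the per-minute write path: append a [timestamp, temperature]
# tuple to the day's unordered list. A new record starts for each
# sensor-day; the list reaches 1440 tuples by the end of the day.

def log_reading(day_record, timestamp, temperature):
    day_record.setdefault("readings", []).append([timestamp, temperature])
    return len(day_record["readings"])

record = {}                       # one sensor-day record
log_reading(record, 0, 61.8)      # midnight
n = log_reading(record, 1, 61.9)  # one minute later
print(n)  # 2
```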

&lt;h4&gt;
  
  
  Storage Requirements
&lt;/h4&gt;

&lt;p&gt;Let’s consider &lt;a href="https://www.aerospike.com/docs/operations/plan/capacity/data_sizes.html"&gt;how much space&lt;/a&gt; is used for this data structure. Aerospike lists are serialized using &lt;a href="https://msgpack.org/index.html"&gt;MessagePack&lt;/a&gt;, which compacts the data. An 8 byte timestamp and an 8 byte float will end up taking 15 bytes.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F96xzyg8dtj43j88x4tk3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F96xzyg8dtj43j88x4tk3.png" alt="msgpack basic" width="800" height="251"&gt;&lt;/a&gt;&lt;br&gt;
However, there’s no need to use a full &lt;a href="https://www.unixtimestamp.com/"&gt;Unix epoch timestamp&lt;/a&gt; here. Each record represents a day, so we can instead use midnight as the local epoch and enumerate the minutes since midnight. Midnight would be minute 0, 00:01 would be minute 1, etc. Similarly, the temperature does not need the precision of a float. If the sensor reports temperatures with a precision of one decimal point, we can multiply the value by 10 and store that as a (small) integer. MessagePack can now compact this tuple into 7 bytes.&lt;br&gt;
&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fask95eiy4snl4oblpryh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fask95eiy4snl4oblpryh.png" alt="Alt Text" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;
A reading from 23:22, with a temperature of 62.1 degrees
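&lt;p&gt;The encoding can be sketched as a pair of small helper functions (illustrative names): minutes since local midnight replace the Unix timestamp, and the temperature times ten replaces the float:&lt;/p&gt;

```python
# Sketch of the compact tuple encoding described above: minute-of-day
# instead of a Unix timestamp, and temperature * 10 as a small integer
# instead of a float, so MessagePack can pack the pair tightly.

def encode_reading(hour, minute, temperature):
    return [hour * 60 + minute, round(temperature * 10)]

def decode_reading(encoded):
    minute_of_day, temp10 = encoded
    return divmod(minute_of_day, 60) + (temp10 / 10,)

enc = encode_reading(23, 22, 62.1)
print(enc)                  # [1402, 621]
print(decode_reading(enc))  # (23, 22, 62.1)
```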




&lt;h4&gt;
  
  
  Leveraging Compression
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.aerospike.com/products/features/compression/"&gt;Compression&lt;/a&gt; is an Aerospike Enterprise Edition feature that can be used to further compact the IoT data we are collecting.&lt;br&gt;
First, let’s consider the storage requirements without compression. At the end of the day we expect a record to take roughly 10K (1440 measurement tuples, plus some overhead), so a year’s data for one sensor is ~ 3.5MB across 365 records. The primary index costs 64 bytes per record, so a year of records for one sensor uses &lt;code&gt;365 * 64 / 1024 = 22.8KiB&lt;/code&gt; of index memory. In my example, I populated an Aerospike database with a year of measurements from one thousand sensors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft3k7g41s8bnqtx6p32pi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft3k7g41s8bnqtx6p32pi.png" alt="CE one year uncompressed" width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;
A year of uncompressed data from 1000 sensors in Aerospike Community Edition 4.6





&lt;p&gt;The example above is running on a modest 2 core VM with 16GiB storage allocated and 2GiB of RAM.&lt;/p&gt;

&lt;p&gt;Next, I upgraded to Aerospike Enterprise Edition 4.7 and added compression to my namespace configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt; {
  &lt;span class="n"&gt;high&lt;/span&gt;-&lt;span class="n"&gt;water&lt;/span&gt;-&lt;span class="n"&gt;disk&lt;/span&gt;-&lt;span class="n"&gt;pct&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="n"&gt;high&lt;/span&gt;-&lt;span class="n"&gt;water&lt;/span&gt;-&lt;span class="n"&gt;memory&lt;/span&gt;-&lt;span class="n"&gt;pct&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="n"&gt;replication&lt;/span&gt;-&lt;span class="n"&gt;factor&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="n"&gt;memory&lt;/span&gt;-&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="m"&gt;2048&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;
  &lt;span class="n"&gt;storage&lt;/span&gt;-&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt; {
    &lt;span class="n"&gt;file&lt;/span&gt; /&lt;span class="n"&gt;opt&lt;/span&gt;/&lt;span class="n"&gt;aerospike&lt;/span&gt;/&lt;span class="n"&gt;data&lt;/span&gt;/&lt;span class="n"&gt;test&lt;/span&gt;.&lt;span class="n"&gt;dat&lt;/span&gt;
    &lt;span class="n"&gt;filesize&lt;/span&gt; &lt;span class="m"&gt;16834&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;
    &lt;span class="n"&gt;compression&lt;/span&gt; &lt;span class="n"&gt;zstd&lt;/span&gt;
    &lt;span class="n"&gt;compression&lt;/span&gt;-&lt;span class="n"&gt;level&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Compression is applied on a record-by-record basis, so none of the existing data is automatically compressed after restarting the upgraded (EE-4.7) &lt;code&gt;asd&lt;/code&gt;. To change this, I wrote a &lt;a href="https://github.com/aerospike-examples/modeling-iot-sensors/blob/master/ttl.lua"&gt;simple UDF&lt;/a&gt; that just touches a record, and applied it with a background scan to all the records in my namespace. This caused the records to be updated with a new TTL value, and stored in a compressed state, as defined by the namespace configuration.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Aerospike&lt;/span&gt; &lt;span class="n"&gt;Query&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="k"&gt;Version&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;C&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt; &lt;span class="k"&gt;Version&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;
&lt;span class="n"&gt;Copyright&lt;/span&gt; &lt;span class="mi"&gt;2012&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2019&lt;/span&gt; &lt;span class="n"&gt;Aerospike&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;All&lt;/span&gt; &lt;span class="n"&gt;rights&lt;/span&gt; &lt;span class="n"&gt;reserved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;aql&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;register&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="s1"&gt;'./ttl.lua'&lt;/span&gt;
&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="n"&gt;added&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;aql&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;show&lt;/span&gt; &lt;span class="n"&gt;modules&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------------------------------------+-------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;                                       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------------------------------------+-------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"ttl.lua"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"9614a68daf5353109372d96517d3d4f863e64ec1"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"LUA"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;-----------+--------------------------------------------+-------+&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;002&lt;/span&gt; &lt;span class="n"&gt;secs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;aql&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;execute&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;touchttl&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sensor_data&lt;/span&gt;
&lt;span class="n"&gt;OK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13120129825472024600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;aql&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;show&lt;/span&gt; &lt;span class="n"&gt;scans&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;----------------+--------+------------+--------------+-------------+--------------+----------------+--------------------+------------------------+--------------+---------------+----------+------------------+--------+----------------+--------------------+------------+----------+--------------+--------+-----------------+----------------+-----------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ns&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;udf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;udf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;recs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;failed&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;udf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;recs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;succeeded&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;recs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bins&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;trid&lt;/span&gt;                   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;progress&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;recs&lt;/span&gt;&lt;span 
class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;throttled&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;recs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;rps&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;since&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;                  &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;----------------+--------+------------+--------------+-------------+--------------+----------------+--------------------+------------------------+--------------+---------------+----------+------------------+--------+----------------+--------------------+------------+----------+--------------+--------+-----------------+----------------+-----------------------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"0"&lt;/span&gt;            &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"test"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"0"&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"ttl"&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"0"&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"touchttl"&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"365000"&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"0"&lt;/span&gt;                &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"13120129825472024600"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"100.00"&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"sensor_data"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"0"&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"background-udf"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"scan"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"365000"&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"0"&lt;/span&gt;                &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"done(ok)"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"73279"&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"30"&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"5000"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"1814"&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"10000"&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"192.168.141.2:54138"&lt;/span&gt; 
&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;----------------+--------+------------+--------------+-------------+--------------+----------------+--------------------+------------------------+--------------+---------------+----------+------------------+--------+----------------+--------------------+------------+----------+--------------+--------+-----------------+----------------+-----------------------+&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;001&lt;/span&gt; &lt;span class="n"&gt;secs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Looking at the cluster with &lt;a href="https://www.aerospike.com/docs/tools/asadm/index.html"&gt;asadm&lt;/a&gt;, I saw that the dataset compressed to 32.8% of its original size (&lt;a href="https://www.aerospike.com/docs/reference/metrics/index.html#device_compression_ratio"&gt;&lt;code&gt;device_compression_ratio 0.328&lt;/code&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjljq34az3s2nwpk04usa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjljq34az3s2nwpk04usa.png" alt="EE 4.7 with compression" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;
Same dataset with a 0.328 compression ratio thanks to Zstandard level 1





&lt;p&gt;Compression doesn’t just save space; it usually also lowers latency, since it’s faster to read a 3KiB object than a 10KiB one, even though the server has to decompress it (zstd decompresses far faster than the storage can read).&lt;/p&gt;
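&lt;p&gt;As a rough illustration of how a ratio like this is measured, here is a self-contained sketch using zlib from the Python standard library as a stand-in for zstd; the sample data and names are illustrative, not from the benchmark:&lt;/p&gt;

```python
import json
import random
import zlib

random.seed(7)
# A day of hourly [minute_offset, temperature] pairs, similar in shape
# to the sensor data in this article. Names and values are illustrative.
day = [[m, 500 + random.randint(-40, 40)] for m in range(0, 1440, 60)]
raw = json.dumps(day).encode()

# Level 1 trades ratio for speed, like the Zstandard level 1 setting above.
compressed = zlib.compress(raw, 1)
ratio = len(compressed) / len(raw)  # analogous to device_compression_ratio
```

&lt;p&gt;Repetitive numeric payloads like this typically land well below a 1.0 ratio even at the fastest compression level.&lt;/p&gt;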
&lt;h4&gt;
  
  
  Code Sample
&lt;/h4&gt;

&lt;p&gt;The code I used is located in the repo &lt;a href="https://github.com/aerospike-examples/modeling-iot-sensors"&gt;aerospike-examples/modeling-iot-sensors&lt;/a&gt;. I started by running &lt;a href="https://github.com/aerospike-examples/modeling-iot-sensors/blob/master/run_sensors.sh"&gt;&lt;code&gt;run_sensors.sh&lt;/code&gt;&lt;/a&gt;, which launched ten Python &lt;a href="https://github.com/aerospike-examples/modeling-iot-sensors/blob/master/populate_sensor_data.py"&gt;&lt;code&gt;populate_sensor_data.py&lt;/code&gt;&lt;/a&gt; workers at a time; a few minutes later I had a year of data from a thousand sensors. The sensor data is based on a year of hourly temperature data points I downloaded from NOAA.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;query_iot_data.py&lt;/code&gt; script has an &lt;code&gt;--interactive&lt;/code&gt; mode, which makes it easier to see what it’s doing. Even on my underpowered VM it runs fast over the example data.&lt;/p&gt;

&lt;p&gt;To fetch three hours of data from one sensor I used the list &lt;a href="https://www.aerospike.com/docs/guide/cdt-list.html#list-api"&gt;&lt;code&gt;get_by_value_interval&lt;/code&gt;&lt;/a&gt; operation to find all the values between &lt;code&gt;[480, NIL]&lt;/code&gt; and &lt;code&gt;[660, NIL]&lt;/code&gt;. See the &lt;a href="https://www.aerospike.com/docs/guide/cdt-ordering.html#comparison"&gt;ordering rules&lt;/a&gt; for more on how Aerospike compares between values in the list.&lt;/p&gt;
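&lt;p&gt;The embedded gist doesn’t render in this feed, so here is a pure-Python sketch of the interval comparison the server performs; the sample data is illustrative, and Python’s element-wise list comparison stands in for Aerospike’s ordering rules (a bare &lt;code&gt;[480]&lt;/code&gt; boundary plays the role of &lt;code&gt;[480, NIL]&lt;/code&gt;, since NIL orders before any value):&lt;/p&gt;

```python
# Each entry is [minute_offset, reading]; the list bin is value-ordered.
hourly = [[m, 500 + m % 120] for m in range(1440)]

def get_by_value_interval(values, lo, hi):
    """Mimic the server-side list get_by_value_interval operation:
    keep entries at or above lo and below hi (end boundary exclusive)."""
    return [v for v in values if hi > v >= lo]

# Minutes 480 through 659: three hours of one sensor's data.
three_hours = get_by_value_interval(hourly, [480], [660])
```

&lt;p&gt;Because &lt;code&gt;[480]&lt;/code&gt; compares below &lt;code&gt;[480, x]&lt;/code&gt; for any x, and &lt;code&gt;[660]&lt;/code&gt; compares below &lt;code&gt;[660, x]&lt;/code&gt;, the result is exactly minutes 480 through 659.&lt;/p&gt;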


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Three hours of a single sensor’s data for 2018–12–31





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;480&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;520&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;481&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;521&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;482&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;521&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;657&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;614&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;658&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;614&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;659&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;614&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Getting a day’s data for any sensor is a single read. The size of the dataset doesn’t matter; the latency of this operation is the same regardless of which day we fetch or how many records the dataset holds.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Gets a single sensor’s data for 2018–04–02



&lt;p&gt;To get a year of data for an individual sensor we do not need to scan the entire dataset. The way we’ve modeled our data allows us to build an array of 365 keys, then use it to make a single &lt;a href="https://www.aerospike.com/docs/guide/batch.html"&gt;batch-read&lt;/a&gt; operation. If the batch were bigger, it might be more efficient to iterate through a few sub-batches. For information on tuning batch operations see the knowledge base article &lt;a href="https://discuss.aerospike.com/t/faq-batch-index-tuning-parameters/4842"&gt;FAQ — batch index tuning parameters&lt;/a&gt;.&lt;/p&gt;
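&lt;p&gt;A minimal sketch of building the year’s worth of batch keys and splitting them into sub-batches; the &lt;code&gt;(namespace, set, user-key)&lt;/code&gt; tuple shape matches the Python client’s batch reads, but the &lt;code&gt;sensor-date&lt;/code&gt; user-key scheme here is an assumption, not necessarily the repo’s exact format:&lt;/p&gt;

```python
from datetime import date, timedelta

def day_keys(sensor_id, year):
    """One (namespace, set, user-key) tuple per day of the year."""
    start = date(year, 1, 1)
    days = (date(year + 1, 1, 1) - start).days
    return [
        ("test", "sensor_data", f"{sensor_id}-{start + timedelta(d)}")
        for d in range(days)
    ]

def chunked(keys, size):
    """Split a large batch into sub-batches, e.g. to stay within
    the server's batch-max-requests limit."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

keys = day_keys("sensor-0042", 2018)  # 365 keys for a non-leap year
batches = chunked(keys, 100)          # each sub-batch holds at most 100 keys
```

&lt;p&gt;With the Python client you would then issue one batch read per sub-batch (for example via &lt;code&gt;client.get_many()&lt;/code&gt;).&lt;/p&gt;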

&lt;p&gt;Getting the data from all the sensors on a specific day is just as simple. If the batch were bigger than &lt;a href="https://www.aerospike.com/docs/reference/configuration/#batch-max-requests"&gt;&lt;code&gt;batch-max-requests&lt;/code&gt;&lt;/a&gt;, we would need either to tune that configuration parameter or to iterate through several smaller batches.&lt;/p&gt;

&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
A day of data from all the sensors, no scan necessary



&lt;p&gt;Finally, we can scan the entire dataset and filter out the records we want using a &lt;a href="https://www.aerospike.com/docs/guide/predicate.html"&gt;predicate filter&lt;/a&gt;. In the example below, I am doing a modulo operation on the record’s digest. This operation executes on the primary index metadata (in memory), without needing to read the records, so it’s very fast. A predicate filter can be as complex as you need it to be, with the conditional logic applied on the server side.&lt;/p&gt;
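&lt;p&gt;The gist itself isn’t rendered here, but the digest-modulo idea can be sketched in pure Python; SHA-1 stands in for Aerospike’s 20-byte RIPEMD-160 record digest, and the key names are illustrative:&lt;/p&gt;

```python
import hashlib

def digest_selects(user_key, modulus, remainder):
    """Mimic a digest-modulo predicate filter: keep only records whose
    digest, taken modulo `modulus`, equals `remainder`."""
    digest = hashlib.sha1(user_key.encode()).digest()
    return int.from_bytes(digest, "big") % modulus == remainder

# Keep roughly one record in ten out of 10,000 keys.
sample = [
    key
    for key in (f"sensor-{i}" for i in range(10000))
    if digest_selects(key, 10, 0)
]
```

&lt;p&gt;Because the digest is effectively uniform, modulo selection yields a near-even sample, which is how a scan over the full dataset can return about a thousand of the records.&lt;/p&gt;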

&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Scans the entire dataset with a predicate filter, returning about 1000 records



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Aerospike is better suited than ScyllaDB to the type of IoT timeseries data collection and retrieval described in the original benchmark. You can see intuitively how a row-oriented modeling approach that leverages Aerospike’s strengths delivers better performance than a C* database, using far less hardware.&lt;/p&gt;

&lt;p&gt;Aerospike does not yet provide full timeseries database functionality, such as applying aggregate functions over a range of data. It is, however, a high-performance, highly scalable database that provides the functionality needed for time-slicing datasets.&lt;/p&gt;

&lt;p&gt;An application can compute its own aggregates over data retrieved quickly and efficiently from Aerospike. Similarly, you can do such computations in Spark, using &lt;a href="https://www.aerospike.com/docs/connectors/enterprise/spark/index.html"&gt;Aerospike Connect for Spark&lt;/a&gt; to fetch data from Aerospike into Spark. The connector library knows when to use batch reads and when to scan (with or without a predicate filter), following a pattern similar to my code sample. This approach is already used in production by enterprise and community users.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published on Medium (Aerospike Developer Blog), December 3 2019&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>database</category>
      <category>data</category>
    </item>
    <item>
      <title>Operations on Nested Data Types in Aerospike</title>
      <dc:creator>Ronen Botzer</dc:creator>
      <pubDate>Sat, 25 Apr 2020 05:07:52 +0000</pubDate>
      <link>https://forem.com/aerospike/operations-on-nested-data-types-in-aerospike-1d5l</link>
      <guid>https://forem.com/aerospike/operations-on-nested-data-types-in-aerospike-1d5l</guid>
      <description>&lt;h4&gt;
  
  
  A document store modeling approach
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frilr9orhz59e82xdm9xc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Frilr9orhz59e82xdm9xc.jpg" alt="Alt Text" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://unsplash.com/@yingchih_hao?utm_content=creditCopyText"&gt;Yingchih&lt;/a&gt; on &lt;a href="https://unsplash.com/?utm_content=creditCopyText"&gt;Unsplash&lt;/a&gt;



&lt;p&gt;&lt;a href="https://www.aerospike.com/blog/aerospike-4-6-enhancing-security-developer-features/"&gt;Aerospike version 4.6&lt;/a&gt; (released in August 2019) added the ability to apply &lt;a href="https://www.aerospike.com/docs/guide/cdt-list.html"&gt;list&lt;/a&gt; and &lt;a href="https://www.aerospike.com/docs/guide/cdt-map.html"&gt;map&lt;/a&gt; operations to elements nested at an arbitrary depth. In this post we'll see how this works. I'll start with an overview, so if you're familiar with Aerospike you can skip the following section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;Aerospike is a high performance, row-oriented, distributed database. Objects in Aerospike are called &lt;a href="https://www.aerospike.com/docs/architecture/data-model.html#records"&gt;records&lt;/a&gt;. They are similar to rows in relational databases. Records are uniquely identified by the 3-tuple &lt;code&gt;(namespace, set, user-key)&lt;/code&gt;. A &lt;a href="https://www.aerospike.com/docs/architecture/data-model.html"&gt;namespace&lt;/a&gt; combines the database and tablespace concepts of a relational database. The &lt;a href="https://www.aerospike.com/docs/architecture/data-model.html#sets"&gt;set&lt;/a&gt; in Aerospike is similar to a schema-less table. The &lt;em&gt;user-key&lt;/em&gt; is simply the unique identifier for a record in this set from the perspective of the application. This is similar to how a primary key uniquely identifies a single row of a table in a relational database. The entire record is stored contiguously in the storage medium defined by the namespace (on SSD, in persistent memory, or in DRAM).&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvlw4drcwjzng41ds9lmk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvlw4drcwjzng41ds9lmk.png" alt="Alt Text" width="800" height="132"&gt;&lt;/a&gt;&lt;br&gt;
An Aerospike record keeps its data in one or more &lt;em&gt;bins&lt;/em&gt;, which are similar to the columns of a row in a relational database, just without a schema. Each bin holds a value of a supported &lt;a href="https://www.aerospike.com/docs/guide/data-types.html"&gt;data type&lt;/a&gt;: integer, double, string, bytes, list, map, geospatial. Each data type has an API of atomic, server-side operations. For example, the &lt;a href="https://www.aerospike.com/docs/guide/data-types.html#integer"&gt;integer&lt;/a&gt; data type has an increment operation, which can be used to implement counters in the record.&lt;/p&gt;

&lt;p&gt;The list and map data types are particularly interesting and flexible. As storage units they can embed other data types inside them, including nesting other lists and maps. Both have extensive APIs.&lt;/p&gt;

&lt;p&gt;The Aerospike database supports single-record transactions: multiple operations against a single record can be executed efficiently under a record lock, atomically and in isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tracking High Scores
&lt;/h3&gt;

&lt;p&gt;In this example, we’ll track the high scores for classic video games using a nested data structure &lt;code&gt;{ player: [score, {attribute map}] }&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Scores can be added individually or in bulk using &lt;a href="https://aerospike-python-client.readthedocs.io/en/latest/aerospike_helpers.operations.html#aerospike_helpers.operations.map_operations.map_put_items"&gt;&lt;code&gt;map_put_items()&lt;/code&gt;&lt;/a&gt;, the Python client’s implementation of the Aerospike map API’s &lt;a href="https://www.aerospike.com/docs/guide/cdt-map-ops.html"&gt;&lt;code&gt;add_items()&lt;/code&gt;&lt;/a&gt; operation.&lt;/p&gt;
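&lt;p&gt;The embedded gist isn’t rendered in this feed; as a minimal pure-Python sketch of the bulk-insert behavior, a plain dict stands in for the server-side map bin (the sample entries mirror the example output):&lt;/p&gt;

```python
scores = {}  # the map bin: { player: [score, {attribute map}] }

def map_put_items(bin_map, items):
    """Emulate the map add_items() operation: upsert several
    key/value pairs in one call; returns the map's new size."""
    bin_map.update(items)
    return len(bin_map)

map_put_items(scores, {
    "ACE": [34500, {"dt": "1979-04-01 09:46:28", "ts": 291807988156}],
    "CPU": [9800, {"dt": "2017-12-05 01:01:11", "ts": 1512435671573}],
})
map_put_items(scores, {
    "ETC": [9200, {"dt": "2018-05-01 13:47:26", "ts": 1525182446891}],
})
```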

&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Each video game is tracked in a different record



&lt;p&gt;At this point the scores can be retrieved by rank. The rank is established based on the &lt;a href="https://www.aerospike.com/docs/guide/cdt-ordering.html"&gt;ordering rules&lt;/a&gt; for the values of this map, which in this case are all lists with the tuple structure &lt;code&gt;[score, {attribute map}]&lt;/code&gt;.&lt;/p&gt;
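&lt;p&gt;A pure-Python sketch of rank retrieval: Aerospike compares list values element by element, so ordering the &lt;code&gt;[score, {attribute map}]&lt;/code&gt; tuples by their leading score emulates the map’s rank order (the data is a subset of the example output):&lt;/p&gt;

```python
scores = {
    "ACE": [34500, {"ts": 291807988156}],
    "CPU": [9800, {"ts": 1512435671573}],
    "SOS": [24700, {"ts": 1515114071923}],
}

# Rank is determined by the map values; [score, {...}] lists order
# primarily by score, so sorting on the leading element suffices here.
by_rank = sorted(scores.items(), key=lambda kv: kv[1][0])
top_player, top_entry = by_rank[-1]  # rank -1 is the highest score
```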

&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Returns all the map elements by ascending rank. Due to Aerospike’s ordering rules, the list values of this map get ordered primarily by the value of their first element (the score)





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ETC&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-05-01 13:47:26&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1525182446891&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2017-12-05 01:01:11&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1512435671573&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CFO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;17400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2017-11-19 15:22:38&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1511104958197&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;EIR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;18400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-03-18 18:44:12&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1521398652483&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SOS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;24700&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-01-05 01:01:11&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1515114071923&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ACE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;34500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1979-04-01 09:46:28&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;291807988156&lt;/span&gt;&lt;span class="p"&gt;}]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Before Aerospike version 4.6 there was no way to apply map operations on the attribute map nested inside the list values of the &lt;code&gt;scores&lt;/code&gt; map. The &lt;a href="https://www.aerospike.com/docs/guide/data-types.html#complex-data-types-cdts-"&gt;Complex Data Types&lt;/a&gt; (CDT) operations were limited to the top level elements of the list or map in question. Let’s assume that the attribute map optionally contains awards won by the players.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Grants the 🦄 award once and only once



&lt;p&gt;In earlier versions of Aerospike, we would need to read the list value from the server into the application, add the awards map to the attribute map, then write the modified list back to the server. Leveraging the feature added in version 4.6, this can now be done atomically. I created a &lt;a href="https://www.aerospike.com/docs/guide/cdt-context.html"&gt;context&lt;/a&gt; that identifies the path to the attribute map, then applied a &lt;code&gt;map_put&lt;/code&gt; operation at that spot. The map policy &lt;code&gt;MAP_WRITE_FLAGS_CREATE_ONLY&lt;/code&gt; ensures this award is granted only once. The &lt;code&gt;MAP_WRITE_FLAGS_NO_FAIL&lt;/code&gt; policy makes the operation tolerant if the 🦄 award was already in place: the transaction continues to the next operation (if there is one) and the client doesn’t need to handle an exception.&lt;/p&gt;
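&lt;p&gt;The combined effect of &lt;code&gt;MAP_WRITE_FLAGS_CREATE_ONLY&lt;/code&gt; and &lt;code&gt;MAP_WRITE_FLAGS_NO_FAIL&lt;/code&gt; can be sketched in pure Python; the function is an illustrative stand-in, not the client API:&lt;/p&gt;

```python
def put_create_only_no_fail(target_map, key, value):
    """CREATE_ONLY: write only if the key is absent.
    NO_FAIL: if it already exists, do nothing instead of raising."""
    if key not in target_map:
        target_map[key] = value
        return True
    return False

attrs = {"dt": "2017-11-19 15:22:38", "ts": 1511104958197}
awards = attrs.setdefault("awards", {})            # the nested context path
first = put_create_only_no_fail(awards, "🦄", 1)   # award granted
second = put_create_only_no_fail(awards, "🦄", 1)  # already granted, no error
```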

&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Give the player with the top score a 🏆award





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ACE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;34500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;awards&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;🏆&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1979-04-01 09:46:28&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;291807988156&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CFO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;17400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;awards&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;🦄&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2017-11-19 15:22:38&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1511104958197&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2017-12-05 01:01:11&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1512435671573&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;EIR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;18400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-03-18 18:44:12&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1521398652483&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ETC&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-05-01 13:47:26&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1525182446891&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SOS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;24700&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-01-05 01:01:11&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1515114071923&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In the code section above, the context extends one level deeper, so that the 🏆 award is initialized if it doesn’t exist and then incremented. A context path must lead to an existing element, so incrementing without first initializing would risk a failure. Notice that the context path doesn’t have to identify elements by key or index; here the 🏆 is given to the element with the highest rank (-1).&lt;/p&gt;
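&lt;p&gt;A pure-Python sketch of this init-then-increment pattern at the highest rank; a plain dict and &lt;code&gt;max()&lt;/code&gt; stand in for the map bin and the rank &lt;code&gt;-1&lt;/code&gt; context:&lt;/p&gt;

```python
scores = {
    "ACE": [34500, {"dt": "1979-04-01 09:46:28"}],
    "SOS": [24700, {"dt": "2018-01-05 01:01:11"}],
}

def award_top_score(score_map, award):
    """Give the highest-ranked entry an award: create the counter
    if the path doesn't exist yet, then increment it."""
    top_entry = max(score_map.values(), key=lambda v: v[0])  # rank -1
    awards = top_entry[1].setdefault("awards", {})  # initialize missing path
    awards[award] = awards.get(award, 0) + 1        # then increment
    return awards[award]

award_top_score(scores, "🏆")
award_top_score(scores, "🏆")  # the top scorer now holds two trophies
```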


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Hand out another top score 🏆award, then display the top three scores





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;EIR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;18400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-03-18 18:44:12&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1521398652483&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SOS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;24700&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-01-05 01:01:11&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1515114071923&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ACE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;34500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;awards&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;🏆&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1979-04-01 09:46:28&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;291807988156&lt;/span&gt;&lt;span class="p"&gt;}]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/aerospike-examples/aerospike-modeling/blob/master/nested_cdts.py"&gt;This code&lt;/a&gt; lives in the &lt;a href="https://github.com/aerospike-examples/aerospike-modeling"&gt;aerospike-examples/aerospike-modeling&lt;/a&gt; repo on GitHub.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published on Medium (Aerospike Developer Blog), November 2 2019&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aerospike</category>
      <category>database</category>
      <category>data</category>
    </item>
  </channel>
</rss>
