<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ante Javor</title>
    <description>The latest articles on Forem by Ante Javor (@antejavor).</description>
    <link>https://forem.com/antejavor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F744109%2Fad6f894b-15f0-4611-b3f8-83dfbb0fd944.png</url>
      <title>Forem: Ante Javor</title>
      <link>https://forem.com/antejavor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/antejavor"/>
    <language>en</language>
    <item>
      <title>Benchgraph Backstory: The Untapped Potential</title>
      <dc:creator>Ante Javor</dc:creator>
      <pubDate>Thu, 27 Apr 2023 13:28:26 +0000</pubDate>
      <link>https://forem.com/memgraph/benchgraph-backstory-the-untapped-potential-2h1n</link>
      <guid>https://forem.com/memgraph/benchgraph-backstory-the-untapped-potential-2h1n</guid>
      <description>&lt;p&gt;A few months ago, we launched &lt;a href="https://memgraph.com/benchgraph"&gt;Benchgraph&lt;/a&gt;, the platform for comparing the performance of graph databases. We tweaked our &lt;a href="https://memgraph.com/blog/introduction-to-benchgraph-and-its-architecture"&gt;in-house benchmarking tool&lt;/a&gt; a bit and started benchmarking Neo4j since Memgraph is highly compatible with Neo4j. There were definitely things we did right and some others that we did wrong, but after a few months, we decided to update a few things based on community feedback making the overall benchmark much efficient and valuable to the community.&lt;/p&gt;

&lt;p&gt;Let’s take a look at the things that have been improved in the benchmark and what has changed. &lt;/p&gt;

&lt;h2&gt;
  
  
  New bigger datasets
&lt;/h2&gt;

&lt;p&gt;During a typical benchmark run, you can either run specific queries for a fixed period of time, or take a chunk of queries, execute them, and measure the elapsed time. Either way, you need a decent number of queries (the same query with different arguments), and for that, you need a bigger dataset. On top of that, testing performance on more extensively scaled datasets is necessary. To put the query requirements into perspective, for some of the tests the database executes up to 300k queries, depending on the query complexity. For detailed statistics, take a look at the &lt;a href="https://github.com/memgraph/benchgraph/tree/main/results"&gt;benchmark.json&lt;/a&gt; file that holds all the results from the benchmarks. There you can see how many queries were executed per particular test. &lt;/p&gt;
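&lt;p&gt;The two execution modes above can be sketched in a few lines of Python. This is a simplified illustration with hypothetical helper names, not Benchgraph's actual code; the &lt;code&gt;execute&lt;/code&gt; callback stands in for a real database driver call.&lt;/p&gt;

```python
import time

def run_fixed_count(execute, queries):
    # Count-based mode: execute a fixed chunk of queries, measure elapsed time.
    start = time.monotonic()
    for query, params in queries:
        execute(query, params)
    return time.monotonic() - start

def run_fixed_duration(execute, queries, seconds):
    # Time-based mode: keep executing queries until the time budget runs out.
    executed = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        query, params = queries[executed % len(queries)]
        execute(query, params)
        executed += 1
    return executed

# The "same query with different arguments" requirement in practice:
queries = [("MATCH (n:User {id: $id}) RETURN n;", {"id": i}) for i in range(1000)]
elapsed = run_fixed_count(lambda q, p: None, queries)  # no-op stand-in driver
```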

&lt;p&gt;To get back to the point of the bigger dataset, Pokec was an initial dataset that was used for running benchmarks. It is still there, but it is also a bit limited in size; the tests were executed on 100k nodes and 1.7M edges (medium size). So it is not overly large and has quite a simple structure.&lt;/p&gt;

&lt;p&gt;Because of the size, complexity, and feedback from the community, we decided to add a larger dataset. The next dataset should be large, more complex, and recognizable. The choice was easy: the industry-leading benchmark group, the Linked Data Benchmark Council (LDBC), which Memgraph is a part of, has open-sourced its benchmarking datasets. The exact dataset is the social network dataset, a synthetically &lt;a href="https://github.com/ldbc/ldbc_snb_datagen_spark"&gt;generated dataset&lt;/a&gt; representing a social network. It is used in the LDBC-audited SNB Interactive and SNB Business Intelligence benchmarks. Keep in mind that this is NOT an official implementation of an LDBC benchmark; the open-source dataset is used as a basis for our benchmarks, and it will also be used for our in-house testing process and for improving Memgraph.&lt;/p&gt;

&lt;p&gt;The good thing about the LDBC SNB dataset is that it comes in a specific set of predefined sizes. For example, the SF1 tier of the LDBC SNB dataset has approximately 3M nodes and 17M edges. SF3 is three times the size of SF1, with approximately 9M nodes and 52M edges. The scale sizes go up to SF1000, which is quite a big graph.&lt;br&gt;
The tests were run on SF1, which is fairly small compared to datasets in the wild, but they will be executed on bigger sizes (such as SF3 and SF10) soon.&lt;/p&gt;

&lt;p&gt;Both LDBC SNB Interactive and LDBC SNB Business Intelligence operate on the same social network schema, but the graph is generated a bit differently for each.&lt;/p&gt;

&lt;h2&gt;
  
  
  More complex queries
&lt;/h2&gt;

&lt;p&gt;The queries used for benchmarking on the Pokec dataset in the first iteration of Benchgraph were quite general. They did cover most of the basic operations, such as aggregations, writes, reads, updates, and graph analytical queries. But overall, the queries were not too complex, and they provided insight into how well Memgraph and Neo4j handle transactional workloads. Despite being generic, those queries were picked by us, so again, based on feedback from the community, more complex queries were added to Benchgraph. &lt;/p&gt;

&lt;p&gt;At first, the plan was to use only the LDBC dataset and write different queries for it, but LDBC has a set of well-designed queries that were specifically prepared to stress the database. Each query targets a special scenario, also called a “choke point.” To be clear, these queries do not involve deep graph traversals of around 100 hops, but they are definitely more complex than the ones written for the Pokec dataset. There are two sets of queries for the LDBC SNB: &lt;a href="https://github.com/ldbc/ldbc_snb_interactive_impls"&gt;interactive&lt;/a&gt; and &lt;a href="https://github.com/ldbc/ldbc_snb_bi"&gt;business intelligence&lt;/a&gt;. LDBC provides a reference Cypher implementation of both query sets for Neo4j. We took those queries, tweaked the data types, and made them work on Memgraph. Again, to be perfectly clear, this is NOT an official implementation of an LDBC benchmark; this goes for both the interactive and business intelligence queries. The queries were used as the basis for running the benchmark. &lt;/p&gt;

&lt;h2&gt;
  
  
  Update to the benchmark procedure
&lt;/h2&gt;

&lt;p&gt;The dataset and queries were not the only things that received an update. The process of running benchmarks has been updated as well, and the full technical details can be found in the &lt;a href="https://github.com/memgraph/memgraph/tree/master/tests/mgbench#fire-mgbench-benchmark-for-graph-databases"&gt;methodology&lt;/a&gt;. Apart from that, here is a short overview of what has changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tests on multiple concurrent clients
&lt;/h3&gt;

&lt;p&gt;Running a database in production does not mean you will execute queries in isolation over a single connection. A database is typically used by multiple users and applications, which means it handles multiple concurrent connections. Over these connections, clients query the database, and the database needs to respond to all the queries hitting it as fast as possible. Obviously, there will be noticeable differences when running this test on multiple worker threads. To simulate this scenario, the tests were executed with different numbers of worker threads; on Benchgraph, you will see this as the &lt;strong&gt;Number of workers&lt;/strong&gt; option.&lt;/p&gt;
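&lt;p&gt;As a rough sketch of what the multi-worker setup simulates (hypothetical names, not the actual Benchgraph client), each worker thread acts as a separate client executing its share of the workload:&lt;/p&gt;

```python
import threading
import time

def run_concurrent(execute, queries, num_workers):
    # Illustrative only: split the workload across worker threads that act
    # as concurrent clients, then derive overall throughput from wall time.
    chunks = [queries[i::num_workers] for i in range(num_workers)]
    threads = [
        threading.Thread(target=lambda c=chunk: [execute(q, p) for q, p in c])
        for chunk in chunks
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = max(time.monotonic() - start, 1e-9)  # guard against zero wall time
    return len(queries) / wall  # queries per second across all workers

queries = [("MATCH (n) RETURN count(n);", {})] * 1200
throughput = run_concurrent(lambda q, p: None, queries, num_workers=12)
```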

&lt;p&gt;As you can see, the test is currently executed on 12, 24, and 48 worker threads. In the last iteration, the tests were run by using only 12 threads. But this will change in the future, and the number of workers will probably increase even further to show different levels of performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  New vulcanic warm-up
&lt;/h3&gt;

&lt;p&gt;One of the things we noticed in the first run of Benchgraph is that Neo4j benefits a lot when identical queries are executed multiple times. We were not aware of the extent to which Neo4j could benefit from it. Although caching is an excellent database feature, it comes with a few downsides, but more on that later. There were previously &lt;code&gt;cold&lt;/code&gt; and &lt;code&gt;hot&lt;/code&gt; database conditions, as described in the &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/README.md#database-conditions"&gt;methodology&lt;/a&gt;, but now a third one, &lt;code&gt;vulcanic&lt;/code&gt;, is introduced.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;vulcanic&lt;/code&gt; database warm-up, the bulk of the workload queries is executed first as a warm-up, and after that, all the queries are run again. The second run is where the results are collected. This adds a bit of clarity on potential Neo4j performance. Keep in mind that the vulcanic warm-up tests database caching performance, so it is not applicable to highly volatile datasets that tend to change drastically on a daily basis.&lt;/p&gt;
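&lt;p&gt;A minimal sketch of the difference between the cold and vulcanic conditions (the function and names below are assumptions for illustration, not Benchgraph's implementation):&lt;/p&gt;

```python
import time

def run_condition(execute, queries, condition):
    # cold: measure the first pass on a freshly started database.
    # vulcanic: execute the entire workload once as a warm-up, discard the
    # results, then measure a second, identical pass.
    if condition == "vulcanic":
        for query, params in queries:
            execute(query, params)  # warm-up pass fills the database caches
    start = time.monotonic()
    for query, params in queries:
        execute(query, params)      # measured pass
    return time.monotonic() - start

calls = []
queries = [("MATCH (n:User {id: $id}) RETURN n;", {"id": i}) for i in range(100)]
run_condition(lambda q, p: calls.append(q), queries, "vulcanic")
# Under vulcanic, every query runs twice: once for warm-up, once measured.
```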

&lt;h3&gt;
  
  
  Hardware
&lt;/h3&gt;

&lt;p&gt;As in the previous iteration, the tests were executed on identical hardware for both Memgraph and Neo4j. Because our Intel HP server is a bit dusty and has a few years on its transistors, the tests were executed on a newer AMD platform. There is a plan to expand this further, potentially to AWS instances, since those are standardized and widely used these days to deploy most applications. &lt;/p&gt;

&lt;h3&gt;
  
  
  Running configuration changes
&lt;/h3&gt;

&lt;p&gt;There have been a few changes in the running setup. The first and obvious one is that the database versions were bumped up: the community editions of Neo4j 5.6 and Memgraph 2.7.&lt;/p&gt;

&lt;p&gt;Due to Neo4j's different architecture (on-disk, while Memgraph is in-memory), Neo4j needs a lot more time to warm up and records slower performance when only a small number of queries is executed. Hence, the duration of each test has been increased. Since the set of queries for a benchmark run is generated before the test, the number of predefined queries has been increased from 3 times to 10 times the estimated count. On top of that, the previously mentioned vulcanic run was introduced to show the performance of repeated execution. &lt;/p&gt;

&lt;p&gt;For the isolated workload, the flag that defines the duration of execution, &lt;code&gt;--single-threaded-runtime&lt;/code&gt;, was set to 30 seconds (previously, it was 10 seconds). The lower bound for query execution was raised from 30 to 300 queries. This means that even for the slowest queries, which take a few seconds to execute, at least 300 are executed per test. Also, both vendors now execute an identical number of queries in each test. &lt;/p&gt;
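&lt;p&gt;The sizing rule can be illustrated as follows, assuming a per-query runtime estimate is available first; the function name and exact formula are hypothetical, while the 30-second budget and the 300-query lower bound come from the text above:&lt;/p&gt;

```python
def queries_per_test(single_query_seconds,
                     single_threaded_runtime=30, lower_bound=300):
    # Execute enough queries to fill the single-threaded time budget,
    # but never fewer than the lower bound, so even the slowest queries
    # run at least 300 times per test.
    estimated = round(single_threaded_runtime / single_query_seconds)
    return max(estimated, lower_bound)

fast = queries_per_test(0.001)  # a 1 ms query fills the 30 s budget
slow = queries_per_test(2.0)    # a 2 s query hits the 300-query lower bound
```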

&lt;p&gt;For the mixed and realistic workloads, the number of executed queries has been bumped up from 100 to 500. Also, Memgraph got a label index in the Pokec dataset since those are not set automatically by Memgraph. &lt;/p&gt;

&lt;p&gt;The exact running configuration and the rest of the details can be found in the methodology mentioned earlier. &lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Let’s take a look at the Pokec dataset. For each of the results below, you can visit &lt;a href="https://memgraph.com/benchgraph/base?condition=cold&amp;amp;numberOfWorkers=12&amp;amp;datasetName=pokec&amp;amp;datasetSize=medium&amp;amp;platform=INTEL&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:0,%22queries%22:%5B0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38%5D%7D,%7B%22index%22:1,%22queries%22:%5B0,1,2,3%5D%7D,%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D,%7B%22index%22:3,%22queries%22:%5B0%5D%7D,%7B%22index%22:4,%22queries%22:%5B0%5D%7D%5D&amp;amp;tab=Detailed%20Results&amp;amp;anchor=expansion_1"&gt;here&lt;/a&gt;. Or if you select: INTEL, cold run, 12 workers, pokec, medium dataset, isolated workload, and go to detailed results for Q5, you will see this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VgSR8_3G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_cold_12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VgSR8_3G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_cold_12.png" alt="Benchgraph expansion 1 cold 12" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This expansion 1 query was executed 170996 times (&lt;a href="https://github.com/memgraph/benchgraph/blob/main/results/benchmarks.json"&gt;this is visible in the raw JSON results&lt;/a&gt;) on both vendors with identical arguments. During the test, three main metrics were measured: peak memory, throughput, and latency. In the previous iteration of the benchmark, the number of executed queries was decided dynamically based on the duration, which resulted in different query counts for Memgraph and Neo4j, but this is no longer the case. After 170996 queries, Memgraph proves to be 3.16 times faster and has a 5.42 times lower p99 latency. This is a simple single-hop query used quite often in graph analytical workloads.&lt;/p&gt;
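&lt;p&gt;For illustration, throughput and tail latency can be derived from per-query execution times like this (a sketch with made-up numbers and a nearest-rank percentile, not Benchgraph's exact method):&lt;/p&gt;

```python
def summarize(latencies_s):
    # Throughput and p99 latency from individual query execution times.
    # (Peak memory is sampled separately, from the database process.)
    ordered = sorted(latencies_s)
    throughput = len(ordered) / sum(ordered)   # queries per second
    p99 = ordered[(99 * len(ordered)) // 100]  # nearest-rank 99th percentile
    return throughput, p99

# Example: 99 fast queries and one slow outlier dominate the tail.
latencies = [0.001] * 99 + [0.010]
throughput, p99 = summarize(latencies)
```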

&lt;p&gt;Trying out these 170996 queries on &lt;a href="https://memgraph.com/benchgraph/base?condition=cold&amp;amp;numberOfWorkers=48&amp;amp;datasetName=pokec&amp;amp;datasetSize=medium&amp;amp;platform=INTEL&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:0,%22queries%22:%5B0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38%5D%7D,%7B%22index%22:1,%22queries%22:%5B0,1,2,3%5D%7D,%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D,%7B%22index%22:3,%22queries%22:%5B0%5D%7D,%7B%22index%22:4,%22queries%22:%5B0%5D%7D%5D&amp;amp;tab=Detailed%20Results&amp;amp;anchor=expansion_1"&gt;48 threads&lt;/a&gt;, the following results were measured:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5wZcPAMl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_cold_48.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5wZcPAMl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_cold_48.png" alt="Benchgraph expansion 1 cold 48" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Memgraph’s performance easily doubled with 48 threads, while Neo4j’s performance got worse. It looks like the Neo4j community edition has some multithreaded limits. P99 latency value has increased for both, which is a bit expected, but Memgraph is 8.73 times faster in this concurrent workload.&lt;/p&gt;

&lt;p&gt;Moving back now to the new &lt;code&gt;vulcanic&lt;/code&gt; run, and back to &lt;a href="https://memgraph.com/benchgraph/base?condition=vulcanic&amp;amp;numberOfWorkers=12&amp;amp;datasetName=pokec&amp;amp;datasetSize=medium&amp;amp;platform=INTEL&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:0,%22queries%22:%5B0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38%5D%7D,%7B%22index%22:1,%22queries%22:%5B0,1,2,3%5D%7D,%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D,%7B%22index%22:3,%22queries%22:%5B0%5D%7D,%7B%22index%22:4,%22queries%22:%5B0%5D%7D%5D&amp;amp;tab=Detailed%20Results&amp;amp;anchor=expansion_1"&gt;12 threads&lt;/a&gt;: the 170996 queries were executed as a warm-up, and after that, the same set of 170996 queries was executed again to get the measurement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zD0-1xWL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_vulcanic_12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zD0-1xWL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_vulcanic_12.png" alt="Benchgraph expansion 1 vulcanic 12" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Again, Neo4j improves with the stronger warm-up setting: its p99 latency dropped from 6.45 to 1.91. Still, Memgraph is the more stable one here.&lt;/p&gt;

&lt;p&gt;The image below is again a vulcanic run, but this time under &lt;a href="https://memgraph.com/benchgraph/base?condition=vulcanic&amp;amp;numberOfWorkers=48&amp;amp;datasetName=pokec&amp;amp;datasetSize=medium&amp;amp;platform=INTEL&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:0,%22queries%22:%5B0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38%5D%7D,%7B%22index%22:1,%22queries%22:%5B0,1,2,3%5D%7D,%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D,%7B%22index%22:3,%22queries%22:%5B0%5D%7D,%7B%22index%22:4,%22queries%22:%5B0%5D%7D%5D&amp;amp;tab=Detailed%20Results&amp;amp;anchor=expansion_1"&gt;48 threads&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sRTJBTja--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_vulcanic_48.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sRTJBTja--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_expansion_1_vulcanic_48.png" alt="Benchgraph expansion 1 vulcanic 48" width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even in the vulcanic run, where identical queries were executed twice, Memgraph proved to be 3 times faster than Neo4j on 48 concurrent threads. The takeaway: the Memgraph community edition handles concurrent workloads better than the Neo4j community edition. &lt;/p&gt;

&lt;p&gt;What about LDBC interactive queries? Things are a bit &lt;a href="https://memgraph.com/benchgraph/base?condition=cold&amp;amp;numberOfWorkers=48&amp;amp;datasetName=ldbc_interactive&amp;amp;datasetSize=sf1&amp;amp;platform=AMD&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:0,%22queries%22:%5B0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38%5D%7D,%7B%22index%22:1,%22queries%22:%5B0,1,2,3%5D%7D,%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D,%7B%22index%22:3,%22queries%22:%5B0%5D%7D,%7B%22index%22:4,%22queries%22:%5B0%5D%7D%5D&amp;amp;tab=Global%20Results"&gt;more complicated&lt;/a&gt; here: on the AMD platform, cold run, 48 workers, ldbc_interactive dataset, and sf1 dataset size, Memgraph is faster than Neo4j on all the queries except &lt;code&gt;interactive query 9&lt;/code&gt;. It looks like Memgraph has an issue with that query; take a look at the peak memory, which jumped to a hefty 16 GB. This will be investigated further. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IDO0JcRN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_interactive_cold_48.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IDO0JcRN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_interactive_cold_48.png" alt="Benchgraph interactive cold 48" width="800" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moving back to the &lt;a href="https://memgraph.com/benchgraph/base?condition=vulcanic&amp;amp;numberOfWorkers=48&amp;amp;datasetName=ldbc_interactive&amp;amp;datasetSize=sf1&amp;amp;platform=AMD&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:0,%22queries%22:%5B0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38%5D%7D,%7B%22index%22:1,%22queries%22:%5B0,1,2,3%5D%7D,%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D,%7B%22index%22:3,%22queries%22:%5B0%5D%7D,%7B%22index%22:4,%22queries%22:%5B0%5D%7D%5D&amp;amp;tab=Global%20Results"&gt;&lt;code&gt;Vulcanic&lt;/code&gt; warm-up&lt;/a&gt;, Neo4j performs much better here than on the cold run. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4_IS8yYs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_interactive_vulcanic_48.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4_IS8yYs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_interactive_vulcanic_48.png" alt="Benchgraph interactive vulcanic 48" width="799" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But looking at the p99 latency on the cold run also tells an interesting story: Memgraph has 5 queries where p99 latency is over a second, while Neo4j has 10 of those.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mIby0cPK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_interactive_latency.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mIby0cPK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_interactive_latency.png" alt="Benchgraph interactive latency" width="800" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is somewhat expected, as mentioned before, due to the different database architectures. The situation in the &lt;a href="https://memgraph.com/benchgraph/base?condition=cold&amp;amp;numberOfWorkers=48&amp;amp;datasetName=ldbc_bi&amp;amp;datasetSize=sf1&amp;amp;platform=AMD&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:0,%22queries%22:%5B0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38%5D%7D,%7B%22index%22:1,%22queries%22:%5B0,1,2,3%5D%7D,%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D,%7B%22index%22:3,%22queries%22:%5B0%5D%7D,%7B%22index%22:4,%22queries%22:%5B0%5D%7D%5D&amp;amp;tab=Global%20Results"&gt;BI run&lt;/a&gt; appears to be similar:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8jesLz7B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_bi_cold.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8jesLz7B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/Benchgraph_bi_cold.png" alt="Benchgraph bi cold" width="800" height="610"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Memgraph is a step ahead on the cold runs. On the vulcanic run, Neo4j improves quite a bit. It looks like Memgraph has a fundamental issue with handling BI_query_12; again, we’ll investigate it further. &lt;/p&gt;

&lt;p&gt;All of the above brings us to the potential issue of write performance: caches often get invalidated during database writes. And on that front, the situation is quite clear. Memgraph's in-memory writes are &lt;a href="https://memgraph.com/benchgraph/base?condition=cold&amp;amp;numberOfWorkers=48&amp;amp;datasetName=pokec&amp;amp;datasetSize=medium&amp;amp;workloadType=isolated&amp;amp;querySelection=%5B%7B%22index%22:2,%22queries%22:%5B0,1%5D%7D%5D&amp;amp;tab=Detailed%20Results&amp;amp;platform=AMD"&gt;faster&lt;/a&gt; than Neo4j's on-disk writes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BHKizbpJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/benchgraph_write.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BHKizbpJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-announcement/benchgraph_write.png" alt="benchgraph_write" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a lot more results on Benchgraph; change the configuration and take a look. If you want to check the full results yourself, go to the &lt;a href="https://github.com/memgraph/benchgraph/blob/main/results/benchmarks.json"&gt;raw JSON files with the results&lt;/a&gt; of this iteration. Benchgraph shows just a subset of the data, but more relevant data will be added, such as p95 latency, query counts, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  You can run benchmarks on your workload!
&lt;/h2&gt;

&lt;p&gt;The Benchgraph workloads and queries are here to give you a glimpse of the performance you can expect from each database. That being said, it does not mean you will see the same results on your workload. Running benchmarks on your specific dataset and your specific queries will show the performance that matters to you. Databases tend to evolve; something that was 10x slower can become faster in newer versions, so make sure to run your own tests. &lt;/p&gt;

&lt;p&gt;But developing and running benchmarks is hard and time-consuming. That is the reason why we made Benchgraph simple to run. You can do it yourself &lt;a href="https://memgraph.com/blog/benchmark-memgraph-or-neo4j-with-benchgraph"&gt;by following the tutorial&lt;/a&gt; on running your own benchmarks via Benchgraph. &lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;How did you like the updates? Anything in particular that caught your attention? &lt;/p&gt;

&lt;p&gt;Benchmarks can be configured in various ways, and tiny details can have an impact on the results, but either way, Memgraph shows considerably more speed potential than Neo4j.&lt;/p&gt;

&lt;p&gt;From now on, Benchgraph will be at your disposal to unlock the untapped potential of your database, get performance insights, and make more informed decisions moving ahead. Looks like it’s time to &lt;a href="https://www.youtube.com/watch?v=vzc1iVtHgeE"&gt;add a new database to Benchgraph&lt;/a&gt;! As always, don’t hesitate to give us a shout if we can be of further assistance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Under+the+Hood&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ggEl866K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/external/memgraph-read-more-gradient-1200.png" alt="Read more about Memgraph internals on memgraph.com" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>memgraph</category>
      <category>neo4j</category>
      <category>database</category>
      <category>benchmark</category>
    </item>
    <item>
      <title>How to Benchmark Memgraph [or Neo4j] with Benchgraph?</title>
      <dc:creator>Ante Javor</dc:creator>
      <pubDate>Thu, 27 Apr 2023 13:28:21 +0000</pubDate>
      <link>https://forem.com/memgraph/how-to-benchmark-memgraph-or-neo4j-with-benchgraph-6p8</link>
      <guid>https://forem.com/memgraph/how-to-benchmark-memgraph-or-neo4j-with-benchgraph-6p8</guid>
      <description>&lt;p&gt;Preparing, running, and evaluating benchmarks is a tedious and time-consuming process. The hardest step in the whole process of benchmarking is setting up a “production-like” scenario. There are a lot of databases in production environments, a bunch of them serve different purposes, run different workloads, and are operated in different ways. This means it is impossible to simulate your specific workload and how your database will operate, so doing your own benchmarking is an important factor. On top of that, every use-case has some different metric that is used to evaluate the performance of the system.&lt;/p&gt;

&lt;p&gt;Since we are running benchgraph in our CI/CD environment, we decided to make small tweaks to it to ease the process of running benchmarks on Memgraph (and Neo4j 🙂) on your hardware, on your workload, and under conditions that matter to you. At the moment, the focus is on executing pure Cypher queries. &lt;/p&gt;

&lt;p&gt;To be able to run these benchmarks you need to have at least Python 3.7 and Docker installed and running, and a very basic knowledge of Python.&lt;/p&gt;

&lt;p&gt;Before you start working on your workload, you need to &lt;a href="https://download.memgraph.com/asset/benchgraph/benchgraph.zip"&gt;download the benchgraph zip&lt;/a&gt; or pull the &lt;a href="https://github.com/memgraph"&gt;Memgraph GitHub repository&lt;/a&gt;. If you are using the Memgraph repository, position yourself in &lt;code&gt;/tests/mgbench&lt;/code&gt;; if you are using the zip, unzip it and open the folder in your favorite IDE or code editor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add your workload
&lt;/h2&gt;

&lt;p&gt;In the project structure, you will see various scripts; take a look at the &lt;a href="https://memgraph.com/blog/introduction-to-benchgraph-and-its-architecture"&gt;Benchgraph architecture&lt;/a&gt; for a more detailed overview of what each of them does. The important folder here is the workloads folder, where you will add your workload. You can start by creating an empty Python script and giving it a name; in our case, it is &lt;code&gt;demo.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To specify a workload that you can run on Memgraph and Neo4j, you need to implement a few details in your script. Here are the five steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inherit the workload class&lt;/li&gt;
&lt;li&gt;Define a workload name&lt;/li&gt;
&lt;li&gt;Implement the dataset generator method&lt;/li&gt;
&lt;li&gt;Implement the index generator method&lt;/li&gt;
&lt;li&gt;Define the queries you want to benchmark&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These five steps will result in something similar to this simplified version of &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/workloads/demo.py"&gt;demo.py&lt;/a&gt; example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;workloads.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Workload&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Workload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"demo"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dataset_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

        &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeA {id: $id});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
            &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeB {id: $id});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"MATCH(a:NodeA {id: $A_id}),(b:NodeB{id: $B_id}) CREATE (a)-[:EDGE]-&amp;gt;(b)"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"A_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"B_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;indexes_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX ON :NodeA(id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX ON :NodeB(id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benchmark__test__get_nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"MATCH (n) RETURN n;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benchmark__test__get_node_by_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"MATCH (n:NodeA{id: $id}) RETURN n;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s break the &lt;code&gt;demo.py&lt;/code&gt; script into smaller pieces and walk through each step, so it is easier to understand what is happening. &lt;/p&gt;

&lt;h3&gt;
  
  
  1. Inherit the &lt;code&gt;Workload&lt;/code&gt; class
&lt;/h3&gt;

&lt;p&gt;The Demo class has the &lt;strong&gt;Workload&lt;/strong&gt; parent class. Each custom workload should inherit from the base Workload class, so add the import statement at the top of your script and specify the inheritance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;workloads.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Workload&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Demo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Workload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Define the workload name
&lt;/h3&gt;

&lt;p&gt;The class should specify the &lt;code&gt;NAME&lt;/code&gt; property. It identifies the workload class you want to execute, and &lt;code&gt;benchmark.py&lt;/code&gt;, the script that runs the benchmark process, uses it to tell workloads apart.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"demo"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Implement the dataset generator method
&lt;/h3&gt;

&lt;p&gt;The class should implement the &lt;code&gt;dataset_generator()&lt;/code&gt; method. The method generates the dataset and returns a list of tuples, where each tuple contains a Cypher query string and a dictionary of optional arguments, so the structure is &lt;code&gt;[(str, dict), (str, dict)...]&lt;/code&gt;. Let's take a look at what an example list looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeA {id: 23});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeB {id: $id, foo: $property});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"foo"&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, you can pass just a Cypher query as a pure string, with no values in the dictionary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeA {id: 23});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or you can specify parameters inside the dictionary. Each variable prefixed with the &lt;code&gt;$&lt;/code&gt; sign in the query string is replaced by the value behind the matching key in the dictionary. In this case, &lt;code&gt;$id&lt;/code&gt; is replaced by &lt;code&gt;123&lt;/code&gt; and &lt;code&gt;$property&lt;/code&gt; is replaced by &lt;code&gt;foo&lt;/code&gt;. The dictionary key names and variable names need to match.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeB {id: $id, foo: $property});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"property"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"foo"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
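&lt;p&gt;Since the key names must match the placeholders, a small sanity check can catch mismatches before a benchmark run. This helper is purely illustrative and not part of Benchgraph:&lt;/p&gt;

```python
import re

def check_params(query, params):
    """Return the set of $placeholders in `query` that are missing from `params`."""
    placeholders = set(re.findall(r"\$(\w+)", query))
    return placeholders - set(params)

# A matching pair produces no missing keys; a missing key is reported.
check_params("CREATE (:NodeB {id: $id, foo: $property});", {"id": 123, "property": "foo"})  # -> set()
check_params("MATCH (n:NodeA {id: $id}) RETURN n;", {})  # -> {'id'}
```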



&lt;p&gt;Back to our &lt;code&gt;demo.py&lt;/code&gt; example: in the &lt;code&gt;dataset_generator()&lt;/code&gt; method, you specify the queries for generating the dataset, so the whole import is based on this list of queries. Keep in mind that you can't import dataset files (such as CSV, JSON, etc.) directly since the database is running in Docker; you need to convert dataset files to pure Cypher queries first, which should not be too hard to do in Python. &lt;/p&gt;
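&lt;p&gt;As a sketch of that conversion, a CSV file could be turned into query tuples like this. The &lt;code&gt;people.csv&lt;/code&gt; columns and the &lt;code&gt;Person&lt;/code&gt; label are hypothetical, just for illustration:&lt;/p&gt;

```python
import csv

def csv_to_queries(path):
    """Turn each CSV row into a (query, params) tuple usable in dataset_generator()."""
    queries = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            queries.append(
                ("CREATE (:Person {id: $id, name: $name});",
                 {"id": int(row["id"]), "name": row["name"]})
            )
    return queries
```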

&lt;p&gt;In &lt;code&gt;demo.py&lt;/code&gt;, the first for loop prepares the queries for creating 10000 nodes labeled NodeA and 10000 nodes labeled NodeB, each with an id between 0 and 9999. In the second for loop, the &lt;code&gt;random&lt;/code&gt; module is used to generate the queries that connect nodes randomly: a total of 50000 edges, each connecting a random NodeA to a random NodeB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dataset_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeA {id: $id});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"CREATE (:NodeB {id: $id});"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="s"&gt;"MATCH(a:NodeA {id: $A_id}),(b:NodeB{id: $B_id}) CREATE (a)-[:EDGE]-&amp;gt;(b)"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"A_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"B_id"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Implement the index generator method
&lt;/h3&gt;

&lt;p&gt;The class should also implement the &lt;code&gt;indexes_generator()&lt;/code&gt; method. It is implemented the same way as the &lt;code&gt;dataset_generator()&lt;/code&gt; method, except that instead of dataset queries, &lt;code&gt;indexes_generator()&lt;/code&gt; returns the list of index-creation queries. Of course, you can also include constraints and other setup queries your workload needs. The queries from &lt;code&gt;indexes_generator()&lt;/code&gt; are executed before the queries from the dataset generator. The return structure is again a list of tuples, each containing a query string and a dictionary of parameters. Here is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;indexes_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX ON :NodeA(id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX ON :NodeB(id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Define the queries you want to benchmark
&lt;/h3&gt;

&lt;p&gt;Now that your database has indexes and the dataset imported, you can specify which queries you wish to benchmark on the given dataset. Here are the two queries the &lt;code&gt;demo.py&lt;/code&gt; workload defines. They are written as Python methods that return a single tuple with a query and a dictionary, just like in the dataset generator method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benchmark__test__get_nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"MATCH (n) RETURN n;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benchmark__test__get_node_by_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"MATCH (n:NodeA{id: $id}) RETURN n;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One necessary detail here: each method you wish to use in the benchmark needs a name starting with &lt;code&gt;benchmark__&lt;/code&gt;, otherwise it will be ignored. The complete method name has the structure &lt;code&gt;benchmark__group__name&lt;/code&gt;. The group can be used to execute specific tests, but more on that later.&lt;/p&gt;
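&lt;p&gt;The naming convention can be pictured with a small reflection sketch. This is a simplified illustration of the idea, not Benchgraph's actual discovery code:&lt;/p&gt;

```python
class Demo:
    # Discovered: parsed as group "test", query name "get_nodes".
    def benchmark__test__get_nodes(self):
        return ("MATCH (n) RETURN n;", {})

    # Ignored: the name does not start with benchmark__.
    def helper_method(self):
        return None

def discover(workload):
    """Map (group, name) -> bound method for every benchmark__ method."""
    tests = {}
    for attr in dir(workload):
        if attr.startswith("benchmark__"):
            _, group, name = attr.split("__")
            tests[(group, name)] = getattr(workload, attr)
    return tests

print(discover(Demo()).keys())  # -> dict_keys([('test', 'get_nodes')])
```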

&lt;p&gt;From the workload setup, this is all you need to do. The next step is running your workload. &lt;/p&gt;

&lt;h2&gt;
  
  
  Run benchmarks on your workload
&lt;/h2&gt;

&lt;p&gt;Let's start with the most straightforward way to run the &lt;code&gt;demo.py&lt;/code&gt; workload from the example above. The main script that manages benchmark execution is &lt;code&gt;benchmark.py&lt;/code&gt;; it accepts a variety of arguments, but we will get to those. Open the terminal of your choice and position yourself in the downloaded benchgraph folder. &lt;/p&gt;

&lt;p&gt;To start the benchmark, you need to run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python3 benchmark.py vendor-docker --vendor-name ( memgraph-docker || neo4j-docker ) benchmarks "demo/*/*/*" --export-results results.json --no-authorization&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;For example, to run this on &lt;strong&gt;Memgraph&lt;/strong&gt;, the command looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python3 benchmark.py vendor-docker --vendor-name memgraph-docker benchmarks "demo/*/*/*" --export-results results.json --no-authorization&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;After a few seconds or minutes, depending on your workload, the benchmark should finish executing.&lt;br&gt;
In your terminal, you should see something similar to this: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c_MrysOP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/how-to-run-benchgraph-on-your-data/benchmark-terminal-output.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c_MrysOP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/how-to-run-benchgraph-on-your-data/benchmark-terminal-output.png" alt="benchgraph terminal output" width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the end of the output, you can see the summary of the results. Feel free to explore and write different queries for Memgraph to see what type of performance you can expect.&lt;/p&gt;

&lt;p&gt;To run the same workload on Neo4j, just change the &lt;code&gt;--vendor-name&lt;/code&gt; argument to &lt;code&gt;neo4j-docker&lt;/code&gt;. If you stumble upon issues with setting specific indexes or queries, take a look at how to run the same workload on different vendors.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to compare results
&lt;/h2&gt;

&lt;p&gt;Once the benchmark has been run on both vendors, or with different configurations, the results are saved in the file specified by the &lt;code&gt;--export-results&lt;/code&gt; argument. You can then compare one vendor's results file against another via the &lt;code&gt;compare_results.py&lt;/code&gt; script:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python3 compare_results.py --compare path_to/run_1.json path_to/run_2.json --output run_1_vs_run_2.html --different-vendors&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The output is an HTML file with a visual representation of the performance differences between the two compared vendors. The first summary JSON file passed is the reference point. Feel free to open the HTML file in any browser at hand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FPv_WwFL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/how-to-run-benchgraph-on-your-data/benchmark-output-html-file.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FPv_WwFL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/how-to-run-benchgraph-on-your-data/benchmark-output-html-file.png" alt="benchmark output html file" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to configure the benchmark run
&lt;/h2&gt;

&lt;p&gt;Configuring your benchmark run enables you to see how things change under different conditions. Some arguments used in the run above are self-explanatory; for a full list, take a look at the &lt;code&gt;benchmark.py&lt;/code&gt; script. For now, let's break down the most important ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;NAME/VARIANT/GROUP/QUERY&lt;/code&gt; - The argument &lt;code&gt;demo/*/*/*&lt;/code&gt; says to execute the workload named &lt;code&gt;demo&lt;/code&gt; and all of its variants, groups, and queries. This flag gives you direct control over which workload you wish to execute. The &lt;code&gt;NAME&lt;/code&gt; here is the name of the workload defined in the Workload class, &lt;code&gt;VARIANT&lt;/code&gt; is an additional workload configuration (explained a bit later), &lt;code&gt;GROUP&lt;/code&gt; is defined in the query method name, and &lt;code&gt;QUERY&lt;/code&gt; is the name of the query you wish to execute. To execute a specific query from &lt;code&gt;demo.py&lt;/code&gt;, the argument would look like this: &lt;code&gt;demo/*/test/get_nodes&lt;/code&gt;. This runs the &lt;code&gt;demo&lt;/code&gt; workload on all variants, with the &lt;code&gt;test&lt;/code&gt; query group and the &lt;code&gt;get_nodes&lt;/code&gt; query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;--single-threaded-runtime-sec&lt;/code&gt; - The question at hand is how many of each specific query you wish to execute as a sample for a database benchmark. Each query can take a different time to execute, so fixing a single number could yield some queries finishing in 1 second and others running for a minute. This flag instead defines the duration in seconds used to approximate how many queries to execute. The default value is 10 seconds, meaning &lt;code&gt;benchmark.py&lt;/code&gt; generates the number of queries that approximates a single-threaded runtime of 10 seconds; increasing the value yields a longer-running test. Each specific query therefore gets a different count, which can be inspected after the test. For example, for 10 seconds of single-threaded runtime, the demo workload query &lt;code&gt;get_node_by_id&lt;/code&gt; got 64230 generated queries, while &lt;code&gt;get_nodes&lt;/code&gt; got only 5061 because of the different time complexity of the queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;--num-workers-for-benchmark&lt;/code&gt; - The flag defines how many concurrent clients will open and query the database. With this flag, you can simulate different database users connecting to the database and executing queries. Each of the clients is independent and executes queries as fast as possible. They share a total pool of queries that were generated by the &lt;code&gt;--single-threaded-runtime-sec&lt;/code&gt;. This means the total number of queries that need to be executed is shared between a specified number of workers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;--warm-up&lt;/code&gt; - The warm-up flag can take three different arguments: &lt;code&gt;cold&lt;/code&gt;, &lt;code&gt;hot&lt;/code&gt;, and &lt;code&gt;vulcanic&lt;/code&gt;. &lt;code&gt;cold&lt;/code&gt; is the default and executes no warm-up, &lt;code&gt;hot&lt;/code&gt; executes some predefined queries before the benchmark, while &lt;code&gt;vulcanic&lt;/code&gt; runs the whole workload once before taking measurements. Here is the implementation of &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/benchmark.py#L186"&gt;warm-up&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
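&lt;p&gt;The idea behind &lt;code&gt;--single-threaded-runtime-sec&lt;/code&gt; can be sketched as follows: measure the average latency of a small sample, then scale the query count to fill the time budget. This is a simplification for illustration, not the actual &lt;code&gt;benchmark.py&lt;/code&gt; logic:&lt;/p&gt;

```python
import time

def approximate_query_count(run_query, budget_sec, sample_size=100):
    """Estimate how many executions of run_query fit into budget_sec on one thread."""
    start = time.perf_counter()
    for _ in range(sample_size):
        run_query()
    avg_latency = (time.perf_counter() - start) / sample_size
    return int(budget_sec / avg_latency)
```

&lt;p&gt;A cheap query gets a large count for the same budget (like the 64230 &lt;code&gt;get_node_by_id&lt;/code&gt; queries above), while an expensive one gets a small count.&lt;/p&gt;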

&lt;h2&gt;
  
  
  How to run the same workload on the different vendors
&lt;/h2&gt;

&lt;p&gt;The base &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/workloads/base.py"&gt;workload class&lt;/a&gt; holds the &lt;code&gt;benchmarking context&lt;/code&gt; information, which contains all the benchmark arguments used in the run, some of which are mentioned above. The key argument here is &lt;code&gt;--vendor-name&lt;/code&gt;, which defines what database is being used in the benchmark.&lt;/p&gt;

&lt;p&gt;During the creation of your workload, you can access the parent class property by using &lt;code&gt;self.benchmark_context.vendor_name&lt;/code&gt;. For example, if you want to specify special index creation for each vendor, the &lt;code&gt;indexes_generator()&lt;/code&gt; could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;indexes_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="s"&gt;"neo4j"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;benchmark_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vendor_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX FOR (n:NodeA) ON (n.id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX FOR (n:NodeB) ON (n.id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX ON :NodeA(id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX ON :NodeB(id);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same applies to the &lt;code&gt;dataset_generator()&lt;/code&gt;: during dataset generation, you can use vendor-specific queries to simulate identical scenarios on each vendor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Happy benchmarking
&lt;/h2&gt;

&lt;p&gt;Have fun benchmarking Memgraph and Neo4j! We would love to hear more about your results on our &lt;a href="https://discord.com/invite/memgraph"&gt;Discord server&lt;/a&gt;. If you have issues understanding what is happening, take a look at Benchgraph architecture or just reach out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Under+the+Hood&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ggEl866K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/external/memgraph-read-more-gradient-1200.png" alt="Read more about Memgraph internals on memgraph.com" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>memgraph</category>
      <category>neo4j</category>
      <category>benchmark</category>
      <category>database</category>
    </item>
    <item>
      <title>Introduction to Benchgraph and its Architecture</title>
      <dc:creator>Ante Javor</dc:creator>
      <pubDate>Thu, 27 Apr 2023 13:28:07 +0000</pubDate>
      <link>https://forem.com/memgraph/introduction-to-benchgraph-and-its-architecture-2ede</link>
      <guid>https://forem.com/memgraph/introduction-to-benchgraph-and-its-architecture-2ede</guid>
      <description>&lt;p&gt;Building a performant graph database comes with many different challenges, from architecture design, to implementation details, technologies involved, product maintenance — the list goes on and on. And the decisions made at all of these crossroads influence the database performance.&lt;/p&gt;

&lt;p&gt;Developing and maintaining a product is a never-ending phase. And proper performance testing is necessary to maintain database performance characteristics during its whole lifecycle.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;benchgraph&lt;/strong&gt; comes into play. To ensure the consistency of Memgraph’s performance, tests are run via benchgraph, Memgraph's in-house benchmarking tool. On each commit, the CI/CD infrastructure runs performance tests to check how each code change influenced performance. Let’s take a look at the benchgraph architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchgraph architecture
&lt;/h2&gt;

&lt;p&gt;At the moment, benchgraph is a project &lt;a href="https://github.com/memgraph/memgraph/tree/master/tests/mgbench"&gt;under the Memgraph repository (previously Mgbench)&lt;/a&gt;. It consists of Python scripts and a C++ client. The Python scripts manage the benchmark execution by preparing the workload, configurations, and so on, while the C++ client actually executes the benchmark. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VkciFC75--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-architecture/memgraph-benchgraph-architecture-diagram.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VkciFC75--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-architecture/memgraph-benchgraph-architecture-diagram.png" alt="benchgraph architecture diagram" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some of the more important Python scripts are: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/benchmark.py"&gt;benchmark.py&lt;/a&gt; - The main entry point used for starting and managing the execution of the benchmark. This script initializes all the necessary files, classes, and objects. It starts the database and the benchmark and gathers the results. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/runners.py"&gt;runners.py&lt;/a&gt; - The script that configures, starts, and stops the database. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/benchmark_context.py"&gt;benchmark_context.py&lt;/a&gt; - It gathers all the data that can be configured during the benchmark execution. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/workloads/base.py"&gt;base.py&lt;/a&gt; - This is the base workload class. All other workloads are subclasses located in the workloads directory. For example, &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/workloads/ldbc_interactive.py"&gt;ldbc_interactive.py&lt;/a&gt; defines ldbc interactive dataset and queries (but this is NOT an official LDBC interactive workload). Each workload class can generate the dataset, use custom import ofthe dataset or provide a CYPHERL file for the import process. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/compare_results.py"&gt;compare_results.py&lt;/a&gt; - Used for comparing results from different benchmark runs. &lt;/li&gt;
&lt;/ul&gt;
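&lt;p&gt;To make the workload structure concrete, here is a minimal sketch of what a workload subclass could look like. The class and method names below are illustrative assumptions for demonstration, not the actual base.py API.&lt;/p&gt;

```python
import random

# Hypothetical minimal workload; the real base class in base.py defines the
# actual contract, so treat the method names here as assumptions.
class MinimalWorkload:
    NAME = "minimal"

    def dataset_generator(self, node_count):
        # Generate the dataset as a list of Cypher statements, one per line,
        # mimicking a CYPHERL-style import file.
        return ["CREATE (:Node {id: %d});" % i for i in range(node_count)]

    def read_single_node(self):
        # Query generators return a (query, parameters) pair so the same
        # template can be reused with randomly generated arguments.
        return ("MATCH (n:Node {id: $id}) RETURN n;", {"id": random.randint(0, 99)})
```

&lt;p&gt;A real workload would define many such query generators, one per benchmarked query.&lt;/p&gt;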

&lt;p&gt;The &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/client.cpp"&gt;C++ bolt benchmark client&lt;/a&gt; has its own set of features. It runs over the Bolt protocol and executes Cypher queries on the targeted database. It supports validation, time-dependent execution, and running queries on multiple threads.&lt;/p&gt;

&lt;p&gt;Let’s dive into &lt;span&gt;benchmark&lt;/span&gt;.py, &lt;span&gt;runners&lt;/span&gt;.py, and the C++ client a bit further.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;span&gt;benchmark&lt;/span&gt;.py
&lt;/h2&gt;

&lt;p&gt;The &lt;span&gt;benchmark&lt;/span&gt;.py script is filled with details crucial for running the benchmark, but here is a peek at just a few of them. &lt;/p&gt;

&lt;p&gt;All arguments are passed and interpreted in the &lt;span&gt;benchmark&lt;/span&gt;.py script. For example, the following snippet is used to set the number of workers that will import the data and execute the benchmark:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="n"&gt;benchmark_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"--num-workers-for-import"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;multiprocessing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"number of workers used to import the dataset"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;benchmark_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"--num-workers-for-benchmark"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"number of workers used to execute the benchmark"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the procedure that generates the queries for the workload, you can set up the seed for each specific query. As some arguments in the queries are generated randomly, the seed ensures that the identical sequence of randomly generated queries is executed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Make the generator deterministic.
&lt;/span&gt;    &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Generate queries.
&lt;/span&gt;    &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following methods define how the warm-up, mixed, and realistic workloads are executed and how the query count is approximated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;warmup&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mixed_workload&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
          &lt;span class="err"&gt;…&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_query_cache_count&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;span&gt;runners&lt;/span&gt;.py
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/runners.py"&gt;runners.py&lt;/a&gt; script manages the database, while the C++ client executes the benchmark. The runners script can manage both native and Docker versions of Memgraph and Neo4j. &lt;/p&gt;

&lt;p&gt;Vendors are handled by the following classes (implementation details omitted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Memgraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseRunner&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Neo4j&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseRunner&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemgraphDocker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseRunner&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;
 &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Neo4jDocker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseRunner&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep in mind that the Docker versions have a noticeable performance overhead compared to the native versions. &lt;/p&gt;

&lt;p&gt;The C++ Bolt client that executes the benchmarks can also be managed in native and Docker form.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BoltClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseClient&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BoltClientDocker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseClient&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  C++ Bolt client
&lt;/h2&gt;

&lt;p&gt;The other important piece of code is the C++ bolt client. The client is used for the execution of the workload. The workload is specified as a list of Cypher queries. The client can simulate multiple concurrent connections to the database. The following code snippet initializes a specific number of workers and connects them to the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;FLAGS_num_workers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kr"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;]()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;network&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FLAGS_address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLAGS_port&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;communication&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ClientContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FLAGS_use_ssl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;communication&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;bolt&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLAGS_username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLAGS_password&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By using multiple concurrent clients, the benchmark simulates different users connecting to the database and executing queries. This is important because it shows how the database handles heavier concurrent loads. &lt;/p&gt;

&lt;p&gt;Each worker's latency values are collected during execution, and after that, some basic tail-latency values are calculated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="err"&gt;…&lt;/span&gt;
 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;FLAGS_num_workers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;worker_query_latency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;lower_bound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;lower_bound&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"iterations"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"min"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;front&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;back&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"mean"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;accumulate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"p99"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"p95"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"p90"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.90&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"p75"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"p50"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query_latency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;spdlog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"To few iterations to calculate latency values!"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"iterations"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the rest of the details about all the components mentioned above and used in Memgraph’s CI/CD, feel free to refer to &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/client.cpp"&gt;the code base&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchgraph benchmark process
&lt;/h2&gt;

&lt;p&gt;Each benchmark has a set of rules or steps it follows to get results, such as the duration of the benchmark, query variety, database restarts, and similar. Benchgraph has its own set of running specifics, explained in detail in the benchgraph &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/README.md"&gt;methodology&lt;/a&gt; used for running &lt;a href="https://memgraph.com/benchgraph/base"&gt;benchgraph&lt;/a&gt;. It’s important to understand each step well because every one of them influences the benchmark results. The image below shows the key steps in running a benchmark with Benchgraph:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--k3G8ZUJv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-architecture/memgraph-benchgraph-process.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k3G8ZUJv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/benchgraph-architecture/memgraph-benchgraph-process.png" alt="benchgraph process" width="800" height="1043"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first step is to start the database with predefined configuration options. Running Memgraph with different configuration options allows us to see respective performance implications. &lt;/p&gt;

&lt;p&gt;The next step is the import procedure. Depending on the configuration of the workload classes mentioned above, the import may be custom, or the data can be imported by executing Cypher queries via the C++ Bolt client. Being able to execute a list of Cypher queries makes it quick to specify different types of custom workloads. After the import, the newly imported data is exported as a snapshot and reused for all the workloads that follow. This is necessary because if the first workload executes write queries, the dataset changes and would influence the results of subsequent benchmarks. &lt;/p&gt;

&lt;p&gt;Once the import is finished, the database is stopped. Importing a dataset can stress the database and influence measurements, so the database must be restarted. After the restart, the database restores the dataset from the snapshot, and the performance test can begin. &lt;/p&gt;

&lt;p&gt;At the beginning of the test, queries are generated based on the benchmark configuration and the type of workload. &lt;/p&gt;

&lt;p&gt;Benchgraph supports three types of workloads: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated&lt;/strong&gt; - Concurrent execution of a single type of query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed&lt;/strong&gt; - Concurrent execution of a single type of query mixed with a certain percentage of queries from a designated query group. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Realistic&lt;/strong&gt; - Concurrent execution of queries from write, read, update, and analyze groups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of the workloads can be used to simulate a different production scenario. &lt;/p&gt;
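&lt;p&gt;The run cycle described above can be sketched end to end. The vendor object and its methods below are assumptions made for illustration, not the real runners.py interface.&lt;/p&gt;

```python
# Illustrative sketch of the benchmark run cycle; FakeVendor is a stand-in
# for demonstration, not the real runners.py API.
class FakeVendor:
    """Minimal stand-in that records lifecycle calls in order."""
    def __init__(self):
        self.calls = []
        self._data = None

    def start(self, snapshot=None):
        self.calls.append("start")
        if snapshot is not None:
            self._data = list(snapshot)  # restore the frozen dataset

    def import_dataset(self):
        self.calls.append("import")
        self._data = ["node-%d" % i for i in range(3)]

    def export_snapshot(self):
        return list(self._data)

    def execute(self, workload):
        self.calls.append("execute:" + workload)
        return {"queries": len(self._data)}

    def stop(self):
        self.calls.append("stop")


def run_benchmark(vendor, workloads):
    # 1. Start the database and run the import procedure.
    vendor.start()
    vendor.import_dataset()
    # 2. Freeze the imported data so every workload starts from the same state.
    snapshot = vendor.export_snapshot()
    # 3. Restart: import stress must not influence the measurements.
    vendor.stop()
    results = {}
    for workload in workloads:
        vendor.start(snapshot=snapshot)  # fresh dataset, cold caches
        results[workload] = vendor.execute(workload)
        vendor.stop()
    return results
```

&lt;p&gt;The key design point is that every workload starts from the same restored snapshot rather than from whatever state the previous workload left behind.&lt;/p&gt;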

&lt;p&gt;Once the queries are executed, metrics such as basic latency measurements, queries per second, and peak RAM usage during the execution are collected and reported daily via our CI/CD infrastructure to Grafana. &lt;/p&gt;

&lt;p&gt;In the end, the database is stopped, and depending on the workload, it is started again with a fresh dataset from the snapshot, and the execution of the next query or workload begins again.&lt;/p&gt;

&lt;p&gt;Just a side note on restarting a database during the execution of a benchmark: In an average use case, databases run for prolonged periods of time. They are rarely restarted, except in edge cases such as upgrades or outages. So why restart databases during benchmarks? Executing test after test on a non-restarted database can lead to tests being influenced by previously run tests. For example, if you want to measure the performance of queries X and Y, they should be run under the same conditions, which means a database with a fresh dataset and without any caches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark configuration options
&lt;/h2&gt;

&lt;p&gt;Here are just &lt;a href="https://github.com/memgraph/memgraph/blob/master/tests/mgbench/benchmark.py"&gt;some of the flags&lt;/a&gt; that can be used to configure benchgraph during the performance runs. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;--num-workers-for-benchmark&lt;/code&gt; - This flag defines the number of workers that will be used to query the database. Each worker is a new thread that connects to Memgraph and executes queries. All threads share the same pool of queries, but each query is executed just once. &lt;/p&gt;
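&lt;p&gt;A shared pool where each query is executed exactly once can be modeled with a thread-safe queue. This is an illustrative Python sketch of the behavior; the real client implements it in C++.&lt;/p&gt;

```python
import queue
import threading

# Illustrative model of the shared query pool: every worker pulls from the
# same queue, so each query is executed by exactly one worker.
def run_pool(queries, num_workers):
    pool = queue.Queue()
    for q in queries:
        pool.put(q)
    executed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # get_nowait() hands each query to exactly one worker.
                q = pool.get_nowait()
            except queue.Empty:
                return
            with lock:
                executed.append(q)  # stand-in for sending the query over Bolt

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed
```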

&lt;p&gt;&lt;code&gt;--single-threaded-runtime-sec&lt;/code&gt; - This flag defines the size of the query pool that will be executed in the benchmark. The question at hand is how many of each specific query you wish to execute as a sample for a database benchmark. Each query can take a different time to execute, so fixing the number at, say, 100 queries could finish in 1 second for one query type, while 100 queries of a different type could run for an hour. To avoid this issue, the flag defines the duration of a single-threaded runtime in seconds, which is used to approximate the number of queries to execute. The queries are pre-generated based on the required time range for a single-threaded runtime. &lt;/p&gt;
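&lt;p&gt;The approximation can be sketched as follows. This is an assumed simplification of the real logic in benchmark.py: time a small single-threaded sample, then size the pool so it would take roughly the requested runtime.&lt;/p&gt;

```python
# Assumed simplification of the pool-sizing logic: measure a small sample on
# one thread, then approximate how many queries fit in the requested runtime.
def approximate_query_count(measure_latency, runtime_sec, sample_size=10):
    sample = [measure_latency() for _ in range(sample_size)]
    avg_latency = sum(sample) / len(sample)
    return round(runtime_sec / avg_latency)

# A query averaging 2 ms for a 10-second single-threaded runtime:
pool_size = approximate_query_count(lambda: 0.002, 10)  # 5000
```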

&lt;p&gt;&lt;code&gt;--warm-up&lt;/code&gt; -  The warm-up flag can take three different arguments: &lt;code&gt;cold&lt;/code&gt;, &lt;code&gt;hot&lt;/code&gt;, and &lt;code&gt;vulcanic&lt;/code&gt;. &lt;code&gt;cold&lt;/code&gt; is the default, with no warm-up executed; &lt;code&gt;hot&lt;/code&gt; executes some predefined queries before the benchmark, while &lt;code&gt;vulcanic&lt;/code&gt; runs the whole workload first before taking measurements. Databases in production environments usually run pre-warmed and pre-cached since they run for extended periods of time. But this is not always the case; warming up the database takes time, and caching can be ruined by the volatility of the data inside the database. Cold performance is the worst-case scenario for every database and should be considered.&lt;/p&gt;
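&lt;p&gt;The three warm-up strategies can be sketched as a simple dispatch. The function body and the predefined queries are illustrative assumptions, not the actual implementation.&lt;/p&gt;

```python
# Sketch of the three warm-up strategies; names mirror the flag values but the
# function body is an assumption, not the real benchmark.py implementation.
class CountingClient:
    """Stand-in client that only counts executed queries."""
    def __init__(self):
        self.executed = 0

    def execute(self, query):
        self.executed += 1


def warmup(strategy, client, workload_queries):
    if strategy == "cold":
        return 0  # default: measure from a completely cold start
    if strategy == "hot":
        # A couple of generic queries to touch the data before measuring.
        predefined = ["MATCH (n) RETURN count(n);", "MATCH (n) RETURN n LIMIT 1;"]
        for q in predefined:
            client.execute(q)
        return len(predefined)
    if strategy == "vulcanic":
        for q in workload_queries:  # run the whole workload once first
            client.execute(q)
        return len(workload_queries)
    raise ValueError("unknown warm-up strategy: " + strategy)
```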

&lt;p&gt;&lt;code&gt;--workload-realistic&lt;/code&gt;, &lt;code&gt;--workload-mixed&lt;/code&gt; - These flags are used to specify the workload type. By default, Benchgraph runs on an isolated workload.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--time-depended-execution&lt;/code&gt; - A flag defining, in seconds, how long the queries will be executed. The same pool of queries is re-run over and over until the time-out. This is useful for testing the caching capabilities of the database.&lt;/p&gt;
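&lt;p&gt;Time-dependent execution boils down to cycling over the same query pool until a deadline. A minimal sketch, assuming a caller-supplied execute function (the real client implements this in C++):&lt;/p&gt;

```python
import itertools
import time

# Sketch of time-dependent execution: cycle over the same query pool until the
# time budget runs out. Repeatedly running identical queries exercises the
# database's caching.
def time_dependent_execution(queries, duration_sec, execute):
    deadline = time.monotonic() + duration_sec
    executed = 0
    for query in itertools.cycle(queries):
        if time.monotonic() >= deadline:
            break
        execute(query)
        executed += 1
    return executed
```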

&lt;h2&gt;
  
  
  Why does Memgraph build an in-house benchmark tool?
&lt;/h2&gt;

&lt;p&gt;Building benchmark infrastructure is time-consuming, and there are a few options for load-testing tools on the market. On top of that, building benchmarking infrastructure is error-prone; memories of unconfigured indexes and running benchmarks on debug versions of Memgraph are fun 😂&lt;/p&gt;

&lt;p&gt;But due to the need for a custom benchmark process, configuration options, and a specific protocol, Memgraph is working on its in-house benchmarking tool, Benchgraph. This is not universal advice: we would not advise embarking on this journey if some of the tools out there can satisfy your benchmarking needs. But in the long run, an internal benchmarking tool provides a lot of flexibility, while an external tool is a great addition for validating and supporting all the performance tests being executed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Under+the+Hood&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ggEl866K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/external/memgraph-read-more-gradient-1200.png" alt="Read more about Memgraph internals on memgraph.com" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>memgraph</category>
      <category>neo4j</category>
      <category>database</category>
      <category>benchmark</category>
    </item>
    <item>
      <title>How Much Money Will You Spend on Hosting a Database</title>
      <dc:creator>Ante Javor</dc:creator>
      <pubDate>Tue, 10 Jan 2023 13:45:58 +0000</pubDate>
      <link>https://forem.com/memgraph/how-much-money-will-you-spend-on-hosting-a-database-2i7l</link>
      <guid>https://forem.com/memgraph/how-much-money-will-you-spend-on-hosting-a-database-2i7l</guid>
      <description>&lt;p&gt;Each modern application needs an appropriate database that has basic datastore capabilities or strong analytical capabilities. There is a variety of options to choose from, and choosing a database can be a complex adventure. Just start with the proprietary or open-source choice, and things get very messy. But, after screening several database options and their features, you will probably narrow it down to a few possible candidates for your databases. No matter what databases are on your shortlist, open-source or proprietary, there are always some costs attached to running that database system. At some point in decision-making, the cost will be one of the most important factors when deciding between your options. The issue is that approximating the true future cost of owning a database can be hard. &lt;/p&gt;

&lt;p&gt;The cost can come from various sources, such as licensing, training, included features, support, hosting of the database, etc. Each of these costs is important and should be considered independently because it can differ from vendor to vendor. But it’s the accumulation of those costs that forms the total cost of ownership. One cost is often overlooked - hosting. And it can differ tremendously from database to database. &lt;/p&gt;

&lt;p&gt;The cost of hosting a database is always present and, if you are renting full VMs and the data volume is not growing rapidly, fairly constant. Since hosting costs are usually paid in shorter time frames, for example on a monthly basis, they can add up to noticeable sums over time. Understanding the cost of ownership, and hosting costs in particular, can give you a full view of the database cost. &lt;/p&gt;

&lt;p&gt;If you have highly interconnected and recursive data and you want to do analytics on it, you will have to find an appropriate graph database. In this blog post, we will compare the hosting costs of running Memgraph and Neo4j graph databases. &lt;/p&gt;

&lt;p&gt;Memgraph and Neo4j are compatible databases. Both use the Bolt protocol and Cypher for queries, which means you can easily switch your infrastructure between them. Both vendors have a free community and a paid enterprise edition of the database. &lt;a href="https://memgraph.com/blog/get-a-feature-rich-open-source-community-edition-graph-database-ready-for-production?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;There are some differences between editions&lt;/a&gt;; for example, the Memgraph community edition has replication for high availability, while the Neo4j community edition does not. To understand the differences in hosting costs, let’s see how the community editions compare head-to-head.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approximate the cost of hosting a database
&lt;/h2&gt;

&lt;p&gt;When engineers start to think about the cost of ownership of database systems and compare vendors, they usually focus on server resource usage. Engineers like to keep track of each CPU cycle, each megabyte spent here and there, and overall database performance. Hence, the important metrics are &lt;strong&gt;CPU time, RAM, network&lt;/strong&gt; and &lt;strong&gt;disk storage&lt;/strong&gt;. Network and disk storage can also contribute to cost, but in this blog post we will be talking about small to medium datasets, in which case these two parameters play a minor role.&lt;/p&gt;

&lt;p&gt;CPU time is the cost of running computations on the database. Since you will probably rent a full cloud virtual machine just for your database, CPU time will not matter too much. But overall database performance will be important since &lt;a href="https://memgraph.com/blog/how-to-choose-a-graph-database-for-your-real-time-application?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;performance can limit your use case&lt;/a&gt;. To combat database inefficiency and higher load, you will have to increase CPU count or CPU power. Performance of each database was already discussed in the &lt;a href="https://memgraph.com/blog/memgraph-vs-neo4j-performance-benchmark-comparison?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;Memgraph vs. Neo4j: A Performance Comparison.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RAM usage is what can considerably shape the overall price of an instance. In general, you want to use as little CPU and RAM as possible, since high usage of either can generate a lot of costs. Let’s assume that you have an unspecified-sized dataset at the moment, and you need to host Memgraph and Neo4j on the most popular cloud VM vendor, AWS.&lt;/p&gt;

&lt;p&gt;Amazon AWS EC2 T3 instances offer a balance of computing, memory, and network resources for a broad spectrum of general-purpose workloads, including large-scale microservices, small and medium databases, and business-critical applications. There are also other types of instances available on AWS that serve different purposes, such as CPU-optimized, memory-optimized, and storage-optimized, and they may fit your use case even better, but T3 instances have been chosen for demonstration purposes since they are quite general. &lt;/p&gt;

&lt;p&gt;Let's assume you predict running your database for three years on this infrastructure, after which you will probably need to scale up. This would be the on-demand cost of renting the instances for three full years, at the time of writing: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v_OuAsQ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memgraph-database-cost-aproximation.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v_OuAsQ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memgraph-database-cost-aproximation.png" alt="image alt" width="880" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep in mind that AWS offers reservations, which can halve the stated prices, but this is just an approximation of the costs. When choosing between Memgraph and Neo4j, their memory usage is what will define how large the instance needs to be and how much money you will need to spend in the end. &lt;/p&gt;
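&lt;p&gt;The arithmetic behind such an approximation is simple: hourly rate times hours in three years. A minimal sketch (the hourly rates below are illustrative placeholders, not live AWS prices; check the EC2 pricing page for current numbers):&lt;/p&gt;

```python
HOURS_PER_YEAR = 24 * 365

# Illustrative on-demand hourly rates in USD (placeholders, not live AWS prices).
hourly_rates = {
    "t3.micro": 0.0114,
    "t3.small": 0.0228,
    "t3.large": 0.0912,
}

def three_year_cost(instance, rates=hourly_rates):
    """On-demand cost of running one instance non-stop for three years."""
    return rates[instance] * HOURS_PER_YEAR * 3

print(f"t3.large for 3 years: ${three_year_cost('t3.large'):,.0f}")
```

Reserved or spot pricing would scale these figures down, but the shape of the comparison between instance sizes stays the same.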

&lt;h2&gt;
  
  
  Differences between Memgraph and Neo4j memory architecture
&lt;/h2&gt;

&lt;p&gt;Before going into differences in memory usage, it is important to understand how both systems operate. Since Neo4j and Memgraph are architecturally different systems, they use memory quite differently. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memgraph&lt;/strong&gt; is a native C++ database that stores all data in RAM. No data is stored on disk, and all queries are executed on the data in RAM. Memgraph supports persistency, which means that if the server loses power, the data present in Memgraph at the moment before the crash will not be lost. Persistency is achieved with periodic snapshots saved as a backup on disk. By design, Memgraph is better equipped to execute real-time computations that need to finish in the shortest possible timeframe. But for this very reason, Memgraph is limited by the available RAM of your machine. If your dataset doesn’t fit in RAM, you can’t use Memgraph. That’s why it’s important to &lt;a href="https://memgraph.com/docs/memgraph/under-the-hood/storage?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;calculate the approximate size of your dataset&lt;/a&gt; beforehand and find out &lt;a href="https://memgraph.com/docs/memgraph/reference-guide/memory-control?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;how to control memory&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Neo4j&lt;/strong&gt; story is a bit more complex since it’s JVM-based. The JVM performs all RAM allocation, and JVM overhead comes with Neo4j as standard. Since Neo4j runs inside the JVM, understanding its complete memory usage is not as straightforward as with Memgraph. On top of that, Neo4j stores all the data on disk but also loads data into RAM as a cache. This means queries can use data from both RAM and disk. Querying the data in RAM leads to faster performance, while querying the data on disk leads to performance degradation. &lt;/p&gt;

&lt;p&gt;The amount of memory available for caching is defined by the &lt;a href="https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_server.memory.pagecache.size"&gt;page cache&lt;/a&gt; property in the Neo4j configuration. It gives you the ability to define how much graph data can be stored in RAM. Neo4j is limited by total disk storage rather than by total RAM, but without enough RAM the drop in Neo4j’s performance would be noticeable. &lt;/p&gt;

&lt;p&gt;To improve performance, Neo4j will try to load as much data as possible into the RAM cache, performing the so-called warm-up process. The &lt;a href="https://neo4j.com/docs/operations-manual/current/performance/memory-configuration/"&gt;documentation recommends&lt;/a&gt; giving 90% of the instance’s available memory to the page cache. That is an optimization on the Neo4j side that helps improve the overall performance of Neo4j. &lt;/p&gt;
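&lt;p&gt;In practice, this tuning is a couple of lines in &lt;code&gt;neo4j.conf&lt;/code&gt;. A sketch with illustrative sizes (the keys follow the Neo4j 5 configuration reference linked above; adjust the values to your instance):&lt;/p&gt;

```ini
# Illustrative neo4j.conf memory settings (values are placeholders)
server.memory.heap.initial_size=1g
server.memory.heap.max_size=1g
server.memory.pagecache.size=4g
```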

&lt;p&gt;Even though there are many differences between on-disk and in-memory storage, the choice primarily depends on your use case and your requirements. &lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark shines light on hosting costs
&lt;/h2&gt;

&lt;p&gt;Now that you know that memory usage is an important factor in hosting costs and how both Memgraph and Neo4j work with memory, how do you actually decide which database to choose? The best option is the one that uses less memory, thus enabling you to use a smaller instance. To obtain memory measurements, we executed workloads on the databases using the Slovenian social network Pokec, available in three different sizes: small, medium, and large:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://s3.eu-west-1.amazonaws.com/deps.memgraph.io/dataset/pokec/benchmark/pokec_small_import.cypher"&gt;small&lt;/a&gt; - 10,000 vertices, 121,716 edges &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://s3.eu-west-1.amazonaws.com/deps.memgraph.io/dataset/pokec/benchmark/pokec_medium_import.cypher"&gt;medium&lt;/a&gt; - 100,000 vertices, 1,768,515 edges &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://s3.eu-west-1.amazonaws.com/deps.memgraph.io/dataset/pokec/benchmark/pokec_large.setup.cypher.gz"&gt;large&lt;/a&gt; - 1,632,803 vertices, 30,622,564 edges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the sizes of datasets in production environments vary from a few thousand to trillions of nodes and edges, these datasets are all on the smaller side and are used for demonstration purposes. In the Memgraph storage engine, the small dataset takes approximately 40 MB of RAM, the medium one 400 MB, and the large dataset approximately 4 GB of RAM. &lt;/p&gt;

&lt;p&gt;Memory is tracked during the import of the dataset from the CYPHERL file and during the execution of two queries. While the workloads are executing on a Linux machine, a script samples the total Memgraph and Neo4j RSS usage every 50 milliseconds.&lt;/p&gt;
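&lt;p&gt;The sampling side of this needs nothing more than the kernel’s per-process accounting. A minimal, Linux-only sketch of the idea (reading &lt;code&gt;VmRSS&lt;/code&gt; from procfs; the actual measurement script is more involved):&lt;/p&gt;

```python
import time

def rss_kib(pid):
    """Return the resident set size of a process in KiB, read from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # the kernel reports the value in kB
    return 0

def sample_rss(pid, interval_s=0.05, samples=5):
    """Sample RSS every `interval_s` seconds (50 ms by default)."""
    readings = []
    for _ in range(samples):
        readings.append(rss_kib(pid))
        time.sleep(interval_s)
    return readings
```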

&lt;p&gt;&lt;strong&gt;Keep in mind that the workloads were run with the out-of-the-box community version of Neo4j and Memgraph, databases weren’t additionally configured.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The two executed queries are a K-hop query and an aggregation query. The expansion, or K-hop, queries are among the most interesting queries in any graph analytical workload. Expansion queries start from a target node and return all the connected nodes that are a defined number of hops away. It is an analytical query that is fairly cheap to execute and used a lot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;s:&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;$id&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;--&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n:&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;n.id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The aggregation query is very memory-intensive. It needs to match all the nodes in the database and get the minimum, maximum, and average values of a certain node property.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n.age&lt;/span&gt;&lt;span class="ss"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n.age&lt;/span&gt;&lt;span class="ss"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;avg&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n.age&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So how did Memgraph and Neo4j do on performing these tasks? If you are presuming that, because Memgraph is an in-memory graph database, it must use more RAM than Neo4j, an on-disk graph database, you will be surprised. &lt;/p&gt;

&lt;p&gt;The graph below shows memory usage during the execution of an expansion 1 query on the small dataset. The identical query was run 10,000 times by 12 concurrent clients.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ln7-A2vQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ln7-A2vQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-1.png" alt="memory-graph-usage" width="880" height="545"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see from the line chart, Memgraph executed these 10,000 queries much faster and with less memory. Since Memgraph executed this workload in just a fraction of the time compared to Neo4j, the line chart is not the best way to visualize the data. &lt;/p&gt;

&lt;p&gt;It is interesting to see how Neo4j’s memory usage rises and falls due to the JVM and workload execution, because the JVM tries to allocate an optimal amount of memory during the run. It is important to note that Neo4j is not using all of that memory to execute the workload; the JVM is over-allocating memory for future use, which is usual behavior for JVM-based applications. Because of this, the actual memory usage is harder to interpret, and Neo4j doesn’t make the process any easier. &lt;/p&gt;

&lt;p&gt;The graph below shows memory usage during the execution of 10,000 aggregation queries on the small dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5SzgCwN8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5SzgCwN8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-2.png" alt="memory-graph-usage" width="880" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice how Memgraph’s memory usage stays fairly constant at 300 MB during the entire workload execution, while Neo4j again uses varying amounts of memory during the run. Once more, the JVM allocates extra memory while executing the queries, and in the end it settles at around 5.3 GB.&lt;/p&gt;

&lt;p&gt;Neo4j’s memory usage can be decreased by limiting the RAM given to the JVM, with some extra RAM for other components like transactions, query caching, etc. But as it turns out, the cost of this modification is performance and stability degradation. During testing, we experimented a bit with Neo4j memory configurations and limited Neo4j’s memory usage a bit more aggressively, which resulted in several crashes. The ability to fine-tune a database is great for getting software into perfect condition, but it can increase engineering costs, since you need to invest engineering time to properly understand the system and fine-tune the configuration. Memgraph is fully self-manageable, which is shown by its constant memory usage. &lt;/p&gt;

&lt;p&gt;Overall, Neo4j consumes more memory than Memgraph while executing workloads on the small dataset, which is expected for a JVM-based system, but this can change with the scale of the dataset. To understand the worst-case scenario of operating the database on AWS instances, &lt;strong&gt;peak memory usage&lt;/strong&gt; should be considered. Peak memory usage shows the amount of RAM the instance needs in the worst-case scenario. &lt;/p&gt;

&lt;p&gt;Here are peak memory usages for the small dataset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A5RkP1F6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A5RkP1F6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-3.png" alt="memory-graph-usage-small-dataset" width="880" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Memgraph’s peak memory usage is 235 MB, reached during data import, while Neo4j’s is 5,345 MB, reached during aggregation queries. &lt;/p&gt;

&lt;p&gt;In this particular case, Memgraph could potentially fit on a t3.nano instance, but that would be a very tight squeeze, and the instance could be used only for the database; no other software could run on it. This means Memgraph fits on a t3.micro, which costs $299 for a three-year rent. Neo4j, on the other hand, would fit on a t3.large, which costs $2,369 for a three-year rent. That is a difference of $2,070 over three years. &lt;/p&gt;
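&lt;p&gt;The instance-sizing logic above can be sketched as picking the smallest instance whose RAM covers peak usage plus some headroom. In this sketch, the RAM sizes are the standard t3 specs; the three-year costs for t3.micro, t3.small, t3.large, and t3.2xlarge are the figures used in this post, while the remaining costs and the 20% headroom are illustrative assumptions:&lt;/p&gt;

```python
# (name, RAM in MB, illustrative 3-year on-demand cost in USD)
T3_INSTANCES = [
    ("t3.micro",   1024,   299),
    ("t3.small",   2048,   599),
    ("t3.medium",  4096,  1199),
    ("t3.large",   8192,  2369),
    ("t3.xlarge", 16384,  4793),
    ("t3.2xlarge", 32768, 9586),
]

def smallest_fitting_instance(peak_mb, headroom=1.2):
    """Pick the cheapest t3 instance whose RAM covers peak usage plus headroom."""
    needed = peak_mb * headroom
    for name, ram_mb, cost in T3_INSTANCES:
        if ram_mb >= needed:
            return name, cost
    raise ValueError("peak memory exceeds the largest t3 instance")

name, cost = smallest_fitting_instance(235)  # Memgraph's peak on the small dataset
```

Running it with the peaks measured above reproduces the pairings in this post: 235 MB lands on a t3.micro, while 5,345 MB lands on a t3.large.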

&lt;p&gt;Let’s see what happens on a medium-sized dataset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---PwbjKOq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---PwbjKOq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-4.png" alt="memory-graph-usage-medium-dataset" width="880" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this case, Memgraph's peak memory usage is 1.2 GB (during data import), while Neo4j's is 6.3 GB (during aggregation queries). Relative to these peak values, Memgraph could fit on a t3.small, which costs $599 for a three-year rent. Neo4j, on the other hand, could fit on a t3.large, which costs $2,369 for a three-year rent. That is a difference of $1,770 over three years. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--f_x9w3aR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--f_x9w3aR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-much-will-you-spend-hosting-a-database/memory-graph-usage-5.png" alt="memory-graph-usage-large-dataset" width="880" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the large dataset, Memgraph's peak memory usage is 6.07 GB, during data import, while Neo4j's is 17.9 GB, also during data import. The import process consists of many serial transactions, which is memory-wise stressful for both databases. In this particular case, Memgraph fits on a t3.large, which costs $2,369 for a three-year rent. Neo4j, on the other hand, could fit on a t3.2xlarge, which costs $9,586 for a three-year rent. That is a difference of $7,217 over three years. &lt;/p&gt;

&lt;p&gt;As you can see, in some situations Memgraph can be much more memory-efficient than Neo4j. By correlating peak memory usage to AWS instance pricing, the savings with Memgraph can be substantial. This depends on a few factors, such as dataset size, RAM restrictions, performance goals, instance types, etc. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Database hosting is a notable part of the total cost of ownership, and it should be taken into consideration before deciding which database to choose. As you can see, Memgraph is very stable regarding memory usage, which could lead to potentially significant cost savings. On top of that, Memgraph is also faster, which can lead to a more responsive system overall. &lt;/p&gt;

&lt;p&gt;We would love to see what type of application you want to develop with Memgraph. Take Memgraph for a &lt;a href="https://memgraph.com/download?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;test drive&lt;/a&gt;, check out our &lt;a href="https://memgraph.com/docs?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;documentation&lt;/a&gt; to understand the nuts and bolts of how it works, or read more &lt;a href="https://memgraph.com/blog?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;blogs&lt;/a&gt; to see a plethora of use cases using Memgraph. If you have any questions for our engineers or want to interact with other Memgraph users, join the Memgraph Community on &lt;a href="https://discord.com/invite/memgraph"&gt;Discord&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/memgraph-for-neo4j-developers/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NWKqzkwo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/external/memgraph-read-more-gradient-1200.png" alt="Read more about Neo4j and Memgraph on memgraph.com" width="880" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>memgraph</category>
      <category>opensource</category>
      <category>aws</category>
    </item>
    <item>
      <title>Get a Feature-Rich Open-Source Community Edition Graph Database Ready for Production</title>
      <dc:creator>Ante Javor</dc:creator>
      <pubDate>Mon, 09 Jan 2023 10:45:02 +0000</pubDate>
      <link>https://forem.com/memgraph/get-a-feature-rich-open-source-community-edition-graph-database-ready-for-production-5fej</link>
      <guid>https://forem.com/memgraph/get-a-feature-rich-open-source-community-edition-graph-database-ready-for-production-5fej</guid>
      <description>&lt;p&gt;In the last couple of years, the world has embraced the open-source community, bringing many community-driven projects to success. These days if you need a software solution for some business challenge, you can probably find the solution in a wide range of successful open-source projects. Since graph database space is fairly young and most projects started in the last few years, most graph databases are developed under open-source licenses from the get-go. This is great for the whole evolving ecosystem of graph databases, and it gives users free access to the most cutting-edge graph database technology. &lt;/p&gt;

&lt;p&gt;Developing a scalable and robust graph database takes immense amounts of effort and capital. This is why most database companies require a lot of capital upfront. Giving your product to the community for free doesn’t change the fact that you need capital to maintain, scale, and improve the database. Most vendors therefore differentiate between a free community version and an enterprise version that comes with a bill. It varies from vendor to vendor, but community and enterprise versions are usually identical products based on the same codebase. The difference is that the enterprise editions include more features, especially regarding security, and they allow for a bigger scale of data. &lt;/p&gt;

&lt;p&gt;So to decide which version of a graph database you need, you have to decide which features are important for your use case and whether they come with a cost. In this blog post, the focus will be on Memgraph and Neo4j as potential vendors for your graph solution. Memgraph and Neo4j are compatible databases. They both use the Bolt protocol and Cypher for queries, which means you can easily switch your infrastructure between them. Both vendors have community and enterprise editions of the database. Let’s see how they compare head to head. &lt;/p&gt;

&lt;h2&gt;
  
  
  Production-ready database
&lt;/h2&gt;

&lt;p&gt;For a database to be called production-ready, it needs to have several core features. The first and most important is &lt;a href="https://en.wikipedia.org/wiki/ACID" rel="noopener noreferrer"&gt;ACID&lt;/a&gt; transactions, where ACID stands for atomicity, consistency, isolation, and durability. ACID transactions ensure that your database executes queries properly. This means that your graph data won’t be compromised by incomplete transactions and left in a corrupted state. Without getting into extreme engineering detail about each ACID property, ACID compliance means your database meets the basic engineering prerequisites to be called a database. &lt;/p&gt;

&lt;p&gt;The other basic feature or prerequisite is &lt;strong&gt;persistence&lt;/strong&gt;. It ensures that your database is saving data and its current state to permanent memory storage, which means that losing power on your database server won’t lose any data. These two properties are not the only important ones, but each production-ready database needs to have them. &lt;/p&gt;

&lt;p&gt;Obviously, both Memgraph and Neo4j support ACID transactions. Neo4j, being an on-disk database, is persistent by design. Memgraph is an in-memory graph database, so it takes periodic snapshots that are stored in permanent memory, which enables persistence. &lt;/p&gt;

&lt;p&gt;After the database is capable of doing ACID transactions and being persistent, other features are highly dependent on the specific use case that you need, and those features will define if you need an enterprise or community version of the database for production. For a production environment, you should probably deploy an enterprise version of the database, right? Well, not really. It depends on what database features are included in the community versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memgraph and Neo4j community editions head to head
&lt;/h2&gt;

&lt;p&gt;Both Memgraph and Neo4j are open-source vendors that provide a lot of value to graph communities. Building great products and making them available to the community comes at the cost of operating expenses. That is why both vendors provide enterprise editions of their graph databases. Enterprise editions usually contain most of the features needed for more scalable and demanding use cases. Both Memgraph and Neo4j come with their own set of enterprise features, which mainly focus on security, such as LDAP integration, activity auditing, role-based authorization, etc.&lt;/p&gt;

&lt;p&gt;Obviously, using the community version of a database is cheaper. This can be crucial for small companies and startups that need a performant graph database but don’t have the budget for an enterprise edition. Below are several features that differentiate the Memgraph and Neo4j community editions. Some of them might be very important for your use case and some less so, but they are all fairly universal to any graph solution.&lt;/p&gt;

&lt;h3&gt;
  
  
  High availability
&lt;/h3&gt;

&lt;p&gt;Any software running in production these days needs to meet agreed-upon availability targets, such as 99.99% uptime per year. These constraints and values can differ from project to project, but deploying any database in production without supported features for &lt;strong&gt;high availability&lt;/strong&gt; is not recommended for critical infrastructure projects.&lt;/p&gt;

&lt;p&gt;There are several different ways to achieve high availability, one of them being &lt;strong&gt;replication&lt;/strong&gt;. Implementations of replication can be done in several different ways, but concepts and benefits are similar. Some of the benefits that come with replication are the following: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;high availability&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;server load balancing,&lt;/li&gt;
&lt;li&gt;data reliability,&lt;/li&gt;
&lt;li&gt;disaster recovery,&lt;/li&gt;
&lt;li&gt;lower query latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Database replication is the process in which the main database instance sends changes or updates to the database replicas. In usual configurations, each database instance is located on a different server. If, for any reason, the main database instance fails, another secondary backup server with a replica of the database will take care of the upcoming requests. There can be multiple replicas of the main instance, depending on the implementation and redundancy requirements. This also means that if multiple servers have a hardware failure, you will not lose your data. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g2sx468cht75ipmd7ge.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g2sx468cht75ipmd7ge.png" alt="get-a-feature-rich-open-source-community-edition-graph-database-ready-for-production" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Replication can be a bit of a complex topic that is out of the scope of this blog post. But the important thing is that running replication and syncing data between different servers leads to many benefits, one of them being high availability, that will ensure that your services are always up and running. This is especially important for critical infrastructure applications. &lt;/p&gt;

&lt;p&gt;This brings the discussion back to the Memgraph and Neo4j community editions. The Neo4j community edition does not support database replication. This feature, and other features involving multiple database instances, are part of the clustering features in the enterprise edition of Neo4j. Since data in the Neo4j community edition is stored on a single instance, it is vulnerable to any type of server failure. &lt;a href="https://memgraph.com/blog/implementing-data-replication" rel="noopener noreferrer"&gt;Memgraph does support replication&lt;/a&gt;, and in the community edition you can run replication on a cluster of Memgraph instances. &lt;/p&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;One of the most important aspects of running any database system is &lt;strong&gt;performance&lt;/strong&gt;. It affects every graph project and application built on top of the database. You want your graph database to be as performant as possible so you can build fast and reliable applications on top of it. Graph databases thrive on complex analytical workloads, so if you are building real-time streaming applications, performance is a must from the get-go. &lt;/p&gt;

&lt;p&gt;The Memgraph community edition is quite performant, as &lt;a href="https://memgraph.com/blog/memgraph-vs-neo4j-performance-benchmark-comparison" rel="noopener noreferrer"&gt;it can run up to 120 times faster than the Neo4j community edition&lt;/a&gt;. The key here is that Memgraph’s performance doesn’t differ between community and enterprise editions. If you run a small business and want a performant graph database out of the box, Memgraph’s community edition offers it for free. &lt;/p&gt;

&lt;p&gt;On the other hand, the situation with the Neo4j community edition is a bit more complex. As mentioned on their official website, the enterprise edition benefits from a faster Cypher runtime, which should run queries 50% to 100% faster than the community edition. In other words, the faster query engine is simply not available in the community edition of the database. &lt;/p&gt;

&lt;p&gt;Even though it’s an on-disk graph database, Neo4j caches a lot of graph data in RAM to improve performance and avoid costly disk access. To benefit from that performance, you need to pre-warm the database cache by executing a variety of queries and giving it time. Without this pre-warm procedure, Neo4j will perform quite poorly. &lt;/p&gt;
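&lt;p&gt;A common community recipe for warming the cache manually is a query that touches every node and relationship once so they get read from disk into RAM. The exact Cypher below is a widely shared pattern, not an official Neo4j API, so treat it as an assumption and check the documentation for your version:&lt;/p&gt;

```python
# Widely shared warm-up pattern: touch every node and relationship once.
# Treat the Cypher as a community recipe, not an official API.

WARMUP_QUERY = (
    "MATCH (n) "
    "OPTIONAL MATCH (n)-[r]-() "
    "RETURN count(n) + count(r)"
)

def warm_up(session):
    """Run the warm-up query once; `session` is e.g. a Bolt driver session."""
    return session.run(WARMUP_QUERY)
```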

&lt;p&gt;Once Neo4j is warmed up, it should not be shut down or restarted, since that empties the cache. There is an automatic warm-up feature and automatic re-heating after a restart, but both are limited to the enterprise edition. In the community edition, you will need to warm the cache manually to get the performance boost. &lt;/p&gt;

&lt;p&gt;Also, some Neo4j community versions are limited to 4 CPU cores, but it is unclear from the documentation which components are affected by this limitation. All in all, the Neo4j community edition restricts performance, while Memgraph offers full performance in both editions of the database. &lt;/p&gt;

&lt;h3&gt;
  
  
  Data science on graphs
&lt;/h3&gt;

&lt;p&gt;Since community editions are oriented towards small and medium-sized companies with humble R&amp;amp;D budgets, it’s useful to know that both Memgraph and Neo4j ship a library of graph algorithms: Neo4j’s Graph Data Science (GDS) library and Memgraph’s Advanced Graph Extensions (MAGE). &lt;/p&gt;

&lt;p&gt;Both libraries are open-source and contain various graph algorithms, such as PageRank and community detection. These out-of-the-box implementations save users the time of developing common algorithms themselves, which are usually hard to understand and write correctly.&lt;/p&gt;

&lt;p&gt;But there are some differences in which algorithms are supported and how they are implemented. In the Neo4j community edition of GDS, algorithms execute on at most 4 CPU cores; the enterprise edition lifts that limit. On top of that, Neo4j GDS with an enterprise license benefits from optimized graph implementations that are not present in the community edition. With Memgraph MAGE, there are no CPU restrictions, and all algorithms run in their optimized, native form. &lt;/p&gt;
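&lt;p&gt;To make the “out-of-the-box algorithms” point concrete, here is a minimal power-iteration PageRank on a toy directed graph, the kind of algorithm both libraries ship so you don’t have to write it yourself. This is a generic textbook sketch, not GDS’s or MAGE’s implementation:&lt;/p&gt;

```python
# Minimal power-iteration PageRank. `adj` maps every node to the list
# of nodes it links to (every node must appear as a key).

def pagerank(adj, damping=0.85, iters=50):
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:          # distribute rank along out-edges
                    new[w] += share
            else:                       # dangling node: spread evenly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# A symmetric 3-node cycle: every node ends up with rank 1/3.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```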

&lt;h3&gt;
  
  
  Property constraints
&lt;/h3&gt;

&lt;p&gt;One of the features you will probably need when creating nodes and relationships is &lt;strong&gt;property constraints&lt;/strong&gt;. This feature ensures the presence of a property: a node representing a person without name or last name properties doesn’t make much sense. In the community edition of Neo4j, you cannot create a property existence constraint on a node or relationship; it’s an enterprise feature. Memgraph’s community edition has no such Cypher restrictions. &lt;/p&gt;
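&lt;p&gt;For illustration, a tiny helper that builds the Cypher for such a constraint. The syntax shown follows Memgraph’s &lt;code&gt;CREATE CONSTRAINT ... ASSERT EXISTS&lt;/code&gt; form; verify it against your database version’s documentation before relying on it verbatim:&lt;/p&gt;

```python
# Build the Cypher statement for a property existence constraint.
# Syntax follows Memgraph's documented form; check your version's docs.

def existence_constraint(label, prop):
    return f"CREATE CONSTRAINT ON (n:{label}) ASSERT EXISTS (n.{prop});"

# Ensure every Person node has a name property:
query = existence_constraint("Person", "name")
```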

&lt;p&gt;Memgraph and Neo4j both offer cool and useful features in their community versions, but not all features are present in both, so check them against your project’s own requirements. The features mentioned above are just some of the general ones that could be universally important to any graph project. &lt;/p&gt;

&lt;h2&gt;
  
  
  Cost of ownership
&lt;/h2&gt;

&lt;p&gt;Both databases are open-source community editions available to the public for free, so there is no direct cost to using them. What you do need to pay for is hosting the databases in production. If you host a database on public VMs, the hosting cost will correlate with CPU time, memory, network, and storage usage. &lt;/p&gt;

&lt;p&gt;The bar chart below shows &lt;a href="https://memgraph.github.io/benchgraph/base?condition=cold&amp;amp;datasetSize=small&amp;amp;workloadType=realistic&amp;amp;querySelection=%5B%5D&amp;amp;tab=Global%20Results" rel="noopener noreferrer"&gt;Memgraph and Neo4j memory usage&lt;/a&gt; during the execution of 4 different types of workloads: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figrz1ngihfy9uxq62omu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figrz1ngihfy9uxq62omu.png" alt="get-a-feature-rich-open-source-community-edition-graph-database-ready-for-production" width="800" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even though Memgraph is a high-performance in-memory database, in some scenarios Neo4j will actually consume more memory. In these four mixed workloads, Neo4j is paying the price for being based on the JVM, which can carry a lot of memory overhead. As seen in the chart, Neo4j uses up to 2.2 GB of memory, while Memgraph uses around 400 MB for the identical task. Neo4j allocates quite large amounts of memory and uses it only partially for caching, which means buying larger, more expensive cloud virtual machines and paying more for hosting.&lt;/p&gt;

&lt;p&gt;Memory usage costs compound with replication. The downside of replication is the cost of hosting multiple replica instances: each replica costs as much to host as the main instance, since they all require the same resources. As you can see, in some situations a single instance of Neo4j can cost more to host than Memgraph, therefore running multiple instances in replication will lead to much higher costs overall. This, of course, depends on several factors: how big the dataset is, how restrictive you are about RAM usage, which AWS instance is being used, etc.&lt;/p&gt;
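&lt;p&gt;A back-of-the-envelope version of that cost math, using the chart’s peak figures (2.2 GB for Neo4j vs. roughly 0.4 GB for Memgraph) scaled by the number of replicated instances. The price per GB-month is a made-up placeholder, not a real cloud quote:&lt;/p&gt;

```python
# Rough hosting-cost estimate: every replica needs the same resources
# as the main instance, so cost scales linearly with instance count.
# price_per_gb_month is a placeholder, not a real cloud price.

def monthly_memory_cost(peak_gb, instances, price_per_gb_month=5.0):
    return peak_gb * instances * price_per_gb_month

# Main instance plus two replicas for each database:
neo4j_cost = monthly_memory_cost(2.2, instances=3)     # 2.2 GB peak
memgraph_cost = monthly_memory_cost(0.4, instances=3)  # ~0.4 GB peak
```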

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Having multiple open-source graph databases to choose from is great for developers, and the choice largely comes down to supported features. As you can see, the Memgraph community edition has &lt;a href="https://memgraph.com/blog/how-to-choose-a-graph-database-for-your-real-time-application?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;stellar performance&lt;/a&gt;, supports replication to achieve high availability, and allows property constraints. Neo4j, on the other hand, is more restrictive with its community edition and less performant, but it is the company that paved the way for graph databases, and that rich legacy can sometimes be a double-edged sword.&lt;/p&gt;

&lt;p&gt;When the time comes to make a choice, keep in mind that available features are not the only consideration. You should also weigh the cost of ownership, supported visualization tools, quality of documentation, customer support, supported languages, easy-to-use APIs, etc. Memgraph was started because we couldn’t find a database that addressed all of these concerns satisfactorily, from performance onward, which is why we pay extra attention to them. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/memgraph-for-neo4j-developers/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more about Neo4j and Memgraph on memgraph.com" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gratitude</category>
      <category>discuss</category>
    </item>
    <item>
      <title>How to Choose a Graph Database for Your Real-Time Application</title>
      <dc:creator>Ante Javor</dc:creator>
      <pubDate>Tue, 20 Dec 2022 14:47:20 +0000</pubDate>
      <link>https://forem.com/memgraph/how-to-choose-a-graph-database-for-your-real-time-application-374</link>
      <guid>https://forem.com/memgraph/how-to-choose-a-graph-database-for-your-real-time-application-374</guid>
&lt;description&gt;&amp;lt;p&amp;gt;Graph databases are a powerful tool to analyze high volumes of highly connected data. Alongside batch processing, graph databases can execute real-time analytics on streaming data by using graph algorithms and queries. But what does real-time even mean, and how does it fit into the context of graph databases? &amp;lt;/p&amp;gt;

&lt;p&gt;Real-time software differs widely from use case to use case, from Formula 1 implementations with extremely high data velocity to simple food-delivery alert systems. In general, real-time software is built with time requirements in mind and should be as responsive as possible, with the exact time requirement defined by the use case. In F1, data changes in milliseconds, which means tens of milliseconds are considered real-time, while in food delivery, events happen in seconds, which means seconds are considered real-time. The difference between milliseconds and seconds may not seem like much, but from an engineering perspective it is huge: lowering time requirements below a second requires a lot of engineering effort. In general, a use case is considered to require real-time analytics if changes need to be observed frequently, in most cases at intervals under 1 second. &lt;/p&gt;

&lt;p&gt;Another important aspect of real-time software is the value of information, which is at its highest the moment a certain event occurs and a change happens. An F1 team needs to notice an opportunity or issue as soon as possible, and in the food delivery case, you can plan your evening around the estimated delivery time based on delivery updates. If users face latency, the information loses value and the real-time software becomes useless. Delivering important information within the right time limits, i.e. in real time, at a huge scale of data is a challenging engineering effort. &lt;/p&gt;

&lt;p&gt;Graph analytics that yields data-driven actions and recommendations can be part of a real-time system and provide users with relevant information. Since a bunch of different components will be built on top of the database, its latency impact on the whole chain should be as small as possible. &lt;/p&gt;

&lt;p&gt;There are several graph databases to choose from, and this blog post will consider Memgraph and Neo4j as possible vendors for a real-time solution. These graph databases were chosen since they both have great interoperability. First, let’s see what metrics define what graph database could be a good fit for real-time use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-time solutions require low latency and high throughput
&lt;/h2&gt;

&lt;p&gt;Real-time software infrastructure needs to serve a bunch of analytical dashboards and front-facing user applications under defined time restrictions. This means the application infrastructure must have the &lt;strong&gt;lowest possible latency&lt;/strong&gt;, because unwanted latencies creep in as data volume, the number of clients, or the workload increases, so the infrastructure should be optimized for the worst-case scenario from the start of project development. The latency value represents the ability of the software infrastructure to respond to a certain type of event or request within the defined time limit or lower. &lt;/p&gt;

&lt;p&gt;Latency can be introduced in many different components of infrastructure. The first and most important latency measurement is &lt;strong&gt;end-to-end latency&lt;/strong&gt;. It represents the amount of time it takes for an event or request to be sent from the user and for a response to be received by the user. End-to-end latency is important because the time frame in which the user sends a request and receives an answer defines the value of a real-time product.&lt;/p&gt;

&lt;p&gt;In real-time applications, end-to-end latency has an available &lt;strong&gt;time budget&lt;/strong&gt; of less than a second, and it must be allocated very delicately. This means all components in the software infrastructure participating in the user request must take less than a second. This can include frontend latency, network latency, application latency, database latency etc., and it’s different with each use case. Each mentioned component can be broken down into several smaller subcomponents, which show the exact latency of each subcomponent. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YoWaQ7P8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/end-to-end-latency.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YoWaQ7P8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/end-to-end-latency.png" alt="end-to-end-latency" width="880" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the components crucial for real-time analytical applications is &lt;strong&gt;database query latency&lt;/strong&gt;, and querying large volumes of connected data can present technical challenges. &lt;/p&gt;

&lt;p&gt;Let’s consider this example. A real-time application has an available time budget of 500 milliseconds. The total database query latency can consume just a fraction of it, let’s assume 10%, or 50 ms. That’s the worst-case upper limit, and it would be perfect if the database could consume even less. If queries took just 10 ms, the rest of the end-to-end latency budget could be used by other parts of the system or remain unallocated, resulting in more responsive infrastructure. Breaking the upper limit would mean the service is not functioning properly.&lt;/p&gt;
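&lt;p&gt;The budget arithmetic from the example above is simple enough to write down directly: a 500 ms end-to-end budget, of which the database may consume at most 10%:&lt;/p&gt;

```python
# Latency budget from the example: 500 ms end-to-end, database gets 10%.

END_TO_END_BUDGET_MS = 500
DB_SHARE = 0.10

db_budget_ms = END_TO_END_BUDGET_MS * DB_SHARE  # 50 ms upper limit

def within_budget(query_latency_ms):
    """True if a query fits inside the database's slice of the budget."""
    return query_latency_ms <= db_budget_ms

# A 10 ms query leaves 40 ms of database headroom for tougher queries
# or for other parts of the system:
headroom_ms = db_budget_ms - 10
```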

&lt;p&gt;Even though the query latency should be low and under budget, it is not the only metric that indicates database performance. Measuring query latency on several queries without a heavy load on the database can give good latency results but poor scalability. This only shows the database is fast when serving one or just a few users simultaneously. So how do we measure scalability?&lt;/p&gt;

&lt;p&gt;The measurement that represents scalability is &lt;strong&gt;throughput&lt;/strong&gt;. Throughput defines how many requests the database can serve per unit of time, usually represented as queries per second (QPS). Simply speaking, throughput represents how many user requests the database can handle per unit of time; if the database has more throughput, it can serve more users. Lower latency usually correlates with higher throughput, though this can vary a lot between database implementations, a topic for another post. That’s one of the reasons it is important to measure throughput with concurrent clients: real-time applications need to serve many users while preserving both latency and throughput. &lt;/p&gt;
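&lt;p&gt;A minimal sketch of what measuring concurrent throughput looks like: fire queries from several clients at once and divide the total query count by the elapsed time. Here &lt;code&gt;run_query&lt;/code&gt; is a stand-in for a real database call, not any vendor’s client API:&lt;/p&gt;

```python
# Measure throughput (QPS) with concurrent clients.
# run_query is a placeholder for a real database call.
import time
from concurrent.futures import ThreadPoolExecutor

def run_query():
    time.sleep(0.001)  # pretend the query takes ~1 ms

def measure_qps(num_clients=12, queries_per_client=50):
    total = num_clients * queries_per_client
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        futures = [pool.submit(run_query) for _ in range(total)]
        for f in futures:
            f.result()  # wait for every query to finish
    elapsed = time.perf_counter() - start
    return total / elapsed
```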

&lt;h2&gt;
  
  
  Who is better for real-time software - Memgraph or Neo4j?
&lt;/h2&gt;

&lt;p&gt;Both Memgraph and Neo4j are graph database vendors that you can use for graph analytics, but the question is - what vendor is a better fit for your real-time use case? &lt;/p&gt;

&lt;p&gt;Memgraph and Neo4j are quite similar from an interoperability perspective - both databases use Bolt protocol and Cypher query language. But from the implementation perspective, Memgraph and Neo4j are architecturally completely different systems. What distinguishes them the most is that Neo4j is based on JVM, while Memgraph is written in native C++. To understand the performance limits of each vendor, we have been benchmarking &lt;strong&gt;Memgraph 2.4&lt;/strong&gt; and &lt;strong&gt;Neo4j 5.1 community editions&lt;/strong&gt;. For a full report on all differences and how we executed this test, you can take a look at &lt;a href="https://memgraph.com/blog/memgraph-vs-neo4j-performance-benchmark-comparison?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;the performance comparison&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;As mentioned, the most important metrics for real-time software are &lt;strong&gt;latency&lt;/strong&gt; and &lt;strong&gt;throughput&lt;/strong&gt;. Note that latency is measured single-threaded, while throughput is measured with 12 concurrent clients querying the database. The current benchmark configuration consists of 23 queries of different types: read, write, update, aggregate and analytical. &lt;/p&gt;

&lt;h2&gt;
  
  
  Latency
&lt;/h2&gt;

&lt;p&gt;Let’s take a look at latency results in milliseconds for each of the queries: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--U3NKRq6a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/latency-table.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U3NKRq6a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/latency-table.png" alt="memgraph-vs-neo4j-latency-table" width="880" height="1072"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The last column shows how much faster Memgraph is at each specific query, and looking at the whole table, you can see that Memgraph is faster across the board. In absolute terms, most queries executed on Memgraph took from 1.09 to 11.32 milliseconds, with a few exceptions. The outliers are queries Q9 to Q12, complex queries touching most of the dataset’s nodes and edges. On Neo4j, queries executed in 13 to 135 milliseconds, excluding the mentioned outliers. &lt;/p&gt;

&lt;p&gt;Absolute query latency on both vendors seems pretty low, but if the total end-to-end budget is under 500 milliseconds and the database latency budget is 50 ms, which is a fair amount of time for real-time software, then any query that runs longer is over budget and not viable for this real-time use case; an 84-millisecond query, for example, is out. As a side note, Google search returns results in approximately 400-700 milliseconds, and you want your service to be about as responsive as Google search. &lt;/p&gt;

&lt;p&gt;One of the most interesting query types in any graph analytical workload is expansion, or K-hop, queries. An expansion query starts from a target node and returns all nodes that are a defined number of hops away. Expansion queries are data-intensive and pose a challenge to databases. Probably the most used expansion query is the single-hop one: an analytical query that is fairly cheap to execute and used a lot. This is query Q5 in the table: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--o2VNPdC---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/memgraph-vs-neo4j-latency.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--o2VNPdC---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/memgraph-vs-neo4j-latency.png" alt="memgraph-vs-neo4j-latency" width="880" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This specific query takes Memgraph 1.09 milliseconds to execute, while it takes Neo4j 27.96 milliseconds. With a database latency budget of 50 milliseconds, Memgraph consumes 2.18% of the budget while Neo4j consumes 55.9%.&lt;/p&gt;

&lt;p&gt;Now let’s take a look at Q7, an expansion-2 query. It takes Memgraph 6.10 milliseconds and Neo4j 88 milliseconds to execute, so Memgraph uses 12.2% of the available time budget, while Neo4j uses 176%. Memgraph easily stays within the 50 ms budget. This is just an approximation of a possible budget; of course, the budget depends on the real-time use case. But regardless of your budget, Memgraph will leave more headroom for future latency improvements. &lt;/p&gt;
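&lt;p&gt;Conceptually, what an expansion query such as Q5 or Q7 computes can be sketched as a plain breadth-first search over an adjacency list: collect every node reachable within k hops of a start node. This illustrates the computation only, not how either vendor implements it:&lt;/p&gt;

```python
# K-hop expansion as breadth-first search: all nodes within k hops
# of a start node (excluding the start node itself).
from collections import deque

def k_hop(adj, start, k):
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # don't expand past k hops
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                result.add(nxt)
                frontier.append((nxt, depth + 1))
    return result

# Toy chain a -> b -> c -> d:
graph = {"a": ["b"], "b": ["c"], "c": ["d"]}
```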

&lt;h2&gt;
  
  
  Throughput
&lt;/h2&gt;

&lt;p&gt;Obviously, completing a request under budget is the best-case scenario, but this is just part of the picture. The database should be able to serve multiple concurrent users at the same time: if thousands of users use the application simultaneously, the database should perform similarly to the single-user case. This is where throughput comes into play. By measuring concurrent throughput, we can estimate how many queries of a particular type the database can serve in a one-second time frame. Here is the throughput for 4 different workloads, each consisting of multiple queries of the read, write, update, and analytical types. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--f1Wdrvla--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/memgraph-vs-neo4j-throughput.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--f1Wdrvla--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/memgraph-vs-neo4j-throughput.png" alt="memgraph-vs-neo4j-throughput" width="880" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On workload W1, which is made of 20% analytical, 40% read, 10% update, and 30% write queries, Memgraph can handle 3,889 queries per second, while Neo4j can handle 37. Handling a higher number of clients creating requests in the same time frame shows how capable the database is of handling bigger bursts of traffic: latency is not degraded by more concurrent clients, and the results are in line with the latency values. As you can see, Memgraph has much higher throughput, meaning it is more scalable and can handle the large volumes of data a real-time graph database needs, much faster. &lt;/p&gt;
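&lt;p&gt;A small sketch of how a mixed workload like W1 can be composed: draw query types at random according to the stated percentages to build a schedule. The query-type names here are labels only, not the benchmark’s actual Cypher queries:&lt;/p&gt;

```python
# Compose the W1 mix (20% analytical, 40% read, 10% update, 30% write)
# as a randomized query-type schedule.
import random

W1_MIX = {"analytical": 0.20, "read": 0.40, "update": 0.10, "write": 0.30}

def build_workload(total_queries, mix=W1_MIX, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    kinds = list(mix)
    weights = [mix[k] for k in kinds]
    return [rng.choices(kinds, weights)[0] for _ in range(total_queries)]

schedule = build_workload(1000)
```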

&lt;h2&gt;
  
  
  Benefits and cost implications
&lt;/h2&gt;

&lt;p&gt;Running all this software doesn’t cost much: since both tested databases are open-source community editions available to the public for free, there is no direct cost to using them. Of course, each vendor has its own enterprise edition that you can pay for, but that is out of the scope of this blog post. &lt;/p&gt;

&lt;p&gt;What you do need to pay for is hosting these databases in production, and that can lead to hosting expenses. The hosting cost will correlate with CPU time, memory and storage usage. Take a look at the chart below to see memory usage for the mentioned workloads: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NOJjGPvX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/memgraph-vs-neo4j-memory-usage.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NOJjGPvX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/how-to-choose-a-graph-database-for-your-real-time-application/memgraph-vs-neo4j-memory-usage.png" alt="memgraph-vs-neo4j-memory-usage.png" width="880" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even though Memgraph is a high-performance in-memory database, in some scenarios Neo4j will actually consume more memory. In these four mixed workloads, Neo4j is paying the price for being based on the JVM, which can carry a lot of memory overhead. As seen in the chart, Neo4j uses up to 2.2 GB of memory, while Memgraph uses around 400 MB for the identical task. Neo4j allocates quite large amounts of memory and uses it only partially for caching, which means buying larger, more expensive cloud virtual machines and paying more for hosting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Low latency and high throughput are the key metrics when deciding whether a graph database can deliver real-time graph analytics. As the &lt;a href="https://memgraph.com/benchgraph/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;benchmarking&lt;/a&gt; results show, Memgraph is quite a capable graph database, while Neo4j is also capable, just a few gears slower and a bit memory-hungry in some scenarios. We would love to see what real-time application you want to develop with Memgraph. Take Memgraph for a test drive, check out our documentation to understand the nuts and bolts of how it works, or read more blogs to see a plethora of use cases using Memgraph. If you have any questions for our engineers or want to interact with other Memgraph users, join the Memgraph Community on &lt;a href="https://discord.com/invite/memgraph"&gt;Discord&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/memgraph-for-neo4j-developers/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NWKqzkwo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/external/memgraph-read-more-gradient-1200.png" alt="Read more about Neo4j and Memgraph on memgraph.com" width="880" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>memgraph</category>
      <category>neo4j</category>
      <category>graphdatabase</category>
    </item>
    <item>
      <title>Memgraph vs. Neo4j: A Performance Comparison</title>
      <dc:creator>Ante Javor</dc:creator>
      <pubDate>Thu, 01 Dec 2022 13:24:13 +0000</pubDate>
      <link>https://forem.com/memgraph/memgraph-vs-neo4j-a-performance-comparison-2kkn</link>
      <guid>https://forem.com/memgraph/memgraph-vs-neo4j-a-performance-comparison-2kkn</guid>
&lt;description&gt;&amp;lt;p&amp;gt;Over the past few weeks, we have been &amp;lt;a href="https://memgraph.com/benchgraph/" rel="noopener noreferrer"&amp;gt;benchmarking&amp;lt;/a&amp;gt; &amp;lt;strong&amp;gt;Memgraph 2.4&amp;lt;/strong&amp;gt; and &amp;lt;strong&amp;gt;Neo4j 5.1 community editions&amp;lt;/strong&amp;gt;: graph databases with similar capabilities but architecturally completely different systems. What distinguishes them most is that Neo4j is based on the JVM, while Memgraph is written in native C++. To understand both databases' performance limits, we executed a wide variety of concurrent Cypher queries in different combinations. &amp;lt;/p&amp;gt;

&lt;p&gt;If you prefer a TL;DR, here it is: Memgraph is approximately &lt;strong&gt;120 times faster&lt;/strong&gt; than Neo4j, all while consuming one quarter of the memory and providing snapshot isolation instead of Neo4j’s default of read-committed.&lt;/p&gt;

&lt;p&gt;Benchmarking is &lt;a href="https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf" rel="noopener noreferrer"&gt;subtle&lt;/a&gt; and must be performed carefully. So, if you are interested in the nuts and bolts of it, see the full &lt;a href="https://memgraph.com/benchgraph/" rel="noopener noreferrer"&gt;benchmark results&lt;/a&gt; or dive straight into the &lt;a href="https://github.com/memgraph/memgraph/tree/master/tests/mgbench#fire-mgbench-benchmark-for-graph-databases" rel="noopener noreferrer"&gt;methodology&lt;/a&gt; or read on for the summary of results between Memgraph and Neo4j. &lt;/p&gt;
&lt;h2&gt;
  
  
  Benchmark Overview
&lt;/h2&gt;

&lt;p&gt;In its early stages, mgBench was developed to test and maintain Memgraph’s performance. Whenever Memgraph’s code changes, a performance test runs on our CI/CD infrastructure to ensure there is no degradation. Since we already had a full testing infrastructure in place, we thought it would be interesting to run performance tests on various graph databases, and due to its compatibility, Neo4j was first on the integration list. &lt;/p&gt;

&lt;p&gt;Benchmarks are hard to create and often biased, so it’s good practice to understand the benchmark’s main goals and technical details before diving deeper into the results. The primary goal of the benchmark is to measure the performance of any database “out of the box”, that is, without fine-tuning the database. Configuring databases can introduce bias, and we wanted to do a fair comparison. &lt;/p&gt;

&lt;p&gt;To run benchmarks, tested systems needed to support the Bolt protocol and the Cypher query language. So, both databases were queried over the Bolt protocol using a low-overhead C++ client, stabilizing and minimizing the measurement overhead. The shared Cypher query language enables executing the same queries on both databases, but as mgBench evolves and more vendors are added to the benchmark, this requirement will be modified. &lt;/p&gt;

&lt;p&gt;Performance and memory tests can be run by executing three different workloads: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated&lt;/strong&gt; - Concurrent execution of a single type of query. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed&lt;/strong&gt; - Concurrent execution of a single type of query mixed with a certain percentage of queries from a designated query group. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Realistic&lt;/strong&gt; - Concurrent execution of queries from write, read, update and analyze groups. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our test, all benchmark workloads were executed on a mid-range HP DL360 G6 server, with a 2 x Intel Xeon X5650 6C12T @ 2.67GHz and 144 GB of 1333 MHz DDR3 memory, running Debian 4.19.&lt;/p&gt;

&lt;p&gt;Each workload was executed with 12 concurrent clients querying the database. All results were acquired in both cold and hot runs. In a hot run, the database was pre-queried before executing any benchmark queries and taking measurements. &lt;/p&gt;

&lt;p&gt;To get the full scope of the benchmark’s technical details, take a look at our &lt;a href="https://github.com/memgraph/memgraph/tree/master/tests/mgbench#fire-mgbench-benchmark-for-graph-databases" rel="noopener noreferrer"&gt;methodology&lt;/a&gt;. It contains details such as queries performed, the benchmark’s limitations and plans for the future. You can also find reproduction steps to validate the results independently.&lt;/p&gt;
&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;The whole benchmark executes 23 representative workloads, each consisting of a write, read, update, aggregate or analytical query. Write, read, update, and aggregate queries are fairly simple and deal with node and edge properties, while analytical queries are more complex. &lt;/p&gt;

&lt;p&gt;The key values mgBench measures are &lt;strong&gt;latency&lt;/strong&gt;, &lt;strong&gt;throughput&lt;/strong&gt; and &lt;strong&gt;memory&lt;/strong&gt;. Each of these measurements is vital when deciding what database to use for a certain use case. The results shown below have been acquired by executing queries on a small dataset on a cold run.&lt;/p&gt;
&lt;h2&gt;
  
  
  Latency
&lt;/h2&gt;

&lt;p&gt;Latency is easy to measure and is therefore included in almost every benchmark. It’s a base performance metric representing how long a query takes to execute, in milliseconds. Query complexity is an important factor in the absolute value of query latency: complex analytical queries take more time and thus have higher latency, while more straightforward queries execute faster and have lower latency.&lt;/p&gt;

&lt;p&gt;The latency of the same queries can vary due to query caching. It is expected that running the same query for the first time will result in a higher latency than on the second run. Because of the variable latency, it is necessary to execute a query several times to approximate its latency better. We pay particular attention to the 99th percentile, representing the latency measurement that 99% of all measurements are faster than. This might sound as though we are focusing on outliers, but in practice &lt;a href="https://research.google/pubs/pub40801/" rel="noopener noreferrer"&gt;it is vital to understand the “tail latency” for any system that you will build reliable systems on top of&lt;/a&gt;.&lt;/p&gt;
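&lt;p&gt;The 99th percentile described above can be computed with a simple nearest-rank method (one of several common percentile definitions; mgBench may use a different one):&lt;/p&gt;

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value that pct% of samples
    are at or below."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# With 100 latency samples of 1..100 ms, the 99th percentile is 99 ms:
# 99% of all measurements were at or below it.
p99 = percentile(range(1, 101), 99)
```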

&lt;p&gt;One of the most interesting query types in any graph analytical workload is the expansion, or K-hop, query. Expansion queries start from a target node and return all the connected nodes that are a defined number of hops away. Depending on the dataset, expanding several hops away from the target will probably cause the query to touch most of the nodes in the dataset. Expansion queries are data intensive and pose a challenge to databases. In mgBench, there are expansion queries with 1 to 4 hops. Probably the most used expansion query is the single-hop one. It is an analytical query that is fairly cheap to execute and widely used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;s:&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;$id&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;--&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n:&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;n.id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
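&lt;p&gt;The deeper expansions (2 to 4 hops) can be expressed with Cypher's standard variable-length relationship pattern. The exact query text mgBench uses may differ; this is an illustrative generator for the family of K-hop queries:&lt;/p&gt;

```python
def expansion_query(hops):
    """Illustrative Cypher text for a k-hop expansion from a target
    User node. The variable-length pattern [*1..k] is standard Cypher,
    but mgBench's actual query text may differ."""
    if hops == 1:
        return "MATCH (s:User {id: $id})-->(n:User) RETURN n.id"
    return (f"MATCH (s:User {{id: $id}})-[*1..{hops}]->(n:User) "
            "RETURN DISTINCT n.id")
```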



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FLatency%2520%252899th%2520percentile%2C%2520small%2520dataset%2C%2520cold%2520run%2529.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FLatency%2520%252899th%2520percentile%2C%2520small%2520dataset%2C%2520cold%2520run%2529.png" alt="memgraph-vs-neo4j-a-performance-comparison-latency-99th-percentile"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the bar chart above shows, it takes Memgraph 1.09 milliseconds to execute the Expansion 1 query, while it takes Neo4j 27.96 milliseconds. On this particular query, &lt;strong&gt;Memgraph is 25 times faster&lt;/strong&gt;. But this is just one sample on a single query. To get a more complete picture of latency performance, here is the latency measured across all 23 queries. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FLatency_summary.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FLatency_summary.png" alt="memgraph-vs-neo4j-a-performance-comparison-latency-summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, Memgraph latency is multiple times lower compared to Neo4j on each of the 23 queries. A full &lt;a href="https://github.com/memgraph/memgraph/tree/master/tests/mgbench#query-list" rel="noopener noreferrer"&gt;query list&lt;/a&gt; is available in the methodology section.  In Memgraph, latency ranges from 1.07 milliseconds to one second on a Q11, Expansion 4 query, the most challenging one in mgBench. Neo4j ranges from 13.73 milliseconds to 3.1 seconds. Lower query latency across the board brings a great head start, but it is not the complete picture of the performance difference between databases. &lt;/p&gt;

&lt;p&gt;Latency is just an end result that can depend on various things, such as query complexity, database workload, query caching, the dataset, etc. To understand a database’s performance limits, it needs to be tested under concurrent load, since that imitates a production environment better. &lt;/p&gt;

&lt;h2&gt;
  
  
  Throughput
&lt;/h2&gt;

&lt;p&gt;Throughput represents how many queries you can execute per fixed time frame, expressed in queries per second (QPS). To measure throughput on an isolated workload, each of the 23 queries is executed concurrently, in isolation from the other query types, a fixed number of times, and the count is divided by the total duration of execution. The number of queries executed depends on query latency: if latency is low, more queries are executed; if latency is high, fewer are. In this case, the number of executed queries approximates 10 seconds’ worth of single-threaded workload. &lt;/p&gt;
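&lt;p&gt;The two calculations above can be sketched directly. The exact sizing rule mgBench applies may differ; this is the approximation the text describes:&lt;/p&gt;

```python
def throughput_qps(num_queries, duration_s):
    """Throughput in queries per second."""
    return num_queries / duration_s

def queries_for_budget(single_thread_latency_s, budget_s=10.0):
    """Approximate query count equivalent to budget_s seconds of
    single-threaded work: faster queries get a larger count."""
    return max(1, round(budget_s / single_thread_latency_s))
```

&lt;p&gt;For example, a query with 1 ms single-threaded latency would be sized at roughly 10,000 executions for a 10-second budget.&lt;/p&gt;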

&lt;p&gt;More throughput means the database can handle more workload. Let's look at the results of the Expansion 1 query mentioned earlier: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FThroughput_expansion.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FThroughput_expansion.png" alt="memgraph-vs-neo4j-a-performance-comparison-throughput-expansion"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Memgraph has a concurrent throughput of 32,028 queries per second on Expansion 1, while Neo4j can handle 280 queries per second, which means &lt;strong&gt;Memgraph is 114 times faster than Neo4j&lt;/strong&gt; on this query. But again, this is throughput for just one query. Let's look at all 23 queries in an isolated workload: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FThroughput_summary.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FThroughput_summary.png" alt="memgraph-vs-neo4j-a-performance-comparison-throughput-summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see on the bar chart above, Memgraph has significantly higher throughput on each of the 23 queries. Since the difference in throughput is huge for some queries, it is best to focus on the absolute numbers for each query. The worst-case scenario for Memgraph and the best-case scenario for Neo4j is Q4, where &lt;strong&gt;Memgraph is “just” 5.1 times faster than Neo4j&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Database latency and throughput can be significantly influenced by result caching. Executing identical queries several times in a row can yield superficially good latency and throughput. In a production environment, the database constantly executes both write and read queries, and writing into a database can trigger invalidation of the cache. Writing can thus become a bottleneck and deteriorate latency and throughput. &lt;/p&gt;
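&lt;p&gt;A toy model makes the caching effect concrete. This is not how either database actually implements its cache; it only illustrates why repeated identical reads look artificially fast and why write traffic deteriorates read performance:&lt;/p&gt;

```python
class CachingStore:
    """Toy query-result cache: repeated identical reads hit the cache,
    and any write invalidates it (a deliberately crude policy)."""

    def __init__(self):
        self.data = {}
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key], True   # cache hit: artificially fast
        value = self.data.get(key)         # cache miss: full execution
        self.cache[key] = value
        return value, False

    def write(self, key, value):
        self.data[key] = value
        self.cache.clear()                 # writes invalidate cached results
```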

&lt;p&gt;Measuring throughput while writing into the database will yield a more accurate result that reflects realistic caching behavior while serving &lt;strong&gt;mixed workloads&lt;/strong&gt;. To simulate that scenario, a mixed workload executes a fixed number of queries that read, update, aggregate or analyze the data concurrently with a certain percentage of write queries.&lt;/p&gt;

&lt;p&gt;The Expansion 1 query executed before is now concurrently executed mixed with 30% of write queries:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FMixed_expansion.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FMixed_expansion.png" alt="memgraph-vs-neo4j-a-performance-comparison-mixed-expansion"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, Memgraph has maintained its performance in a mixed workload and has a &lt;strong&gt;132 times higher throughput&lt;/strong&gt; executing a mixed workload than Neo4j. Here are the full throughput results for the mixed workload: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FMixed_summary.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FMixed_summary.png" alt="memgraph-vs-neo4j-a-performance-comparison-mixed-summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mixed workload didn’t impact Memgraph's overall performance; it still performs very well on each of the 21 queries. The two write queries used for creating the mixed workload are not tested on their own, since that would not be a mixed workload. &lt;/p&gt;

&lt;p&gt;Both isolated and mixed workloads have given a glimpse of the possible production performance of the database, but each of the previous measurements has its limitations. The &lt;strong&gt;realistic workload&lt;/strong&gt; is the most challenging test of database performance. Most databases in production execute a wide range of queries that differ in complexity, order, and type. The realistic workload consists of writing, reading, updating, and doing analytical work concurrently on the database. The workload is defined by the percentage of each operation. Each workload is generated deterministically for each vendor and is identical on each run. &lt;br&gt;
Currently, there are 4 realistic workload distributions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;W1 - 20% Analytical, 40% Read, 10% Update and 30% Write&lt;/li&gt;
&lt;li&gt;W2 - 70 % Read and 30 % Write&lt;/li&gt;
&lt;li&gt;W3 - 50 % Read and 50 % Write&lt;/li&gt;
&lt;li&gt;W4 - 30 % Read and 70 % Write&lt;/li&gt;
&lt;/ul&gt;
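&lt;p&gt;The four distributions above can be captured as data and expanded into a deterministic schedule. The grouped ordering here is an assumption; a real driver would interleave the groups, but the point is that the schedule is identical on every run:&lt;/p&gt;

```python
# The four realistic workload distributions, percentages per query group.
WORKLOADS = {
    "W1": {"analytical": 20, "read": 40, "update": 10, "write": 30},
    "W2": {"read": 70, "write": 30},
    "W3": {"read": 50, "write": 50},
    "W4": {"read": 30, "write": 70},
}

def schedule(name, n=100):
    """Expand a distribution into a deterministic list of n query
    groups; re-running produces the exact same schedule."""
    out = []
    for group, pct in WORKLOADS[name].items():
        out += [group] * (n * pct // 100)
    return out
```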

&lt;p&gt;Take a look at the throughput results for each workload: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FRealistic.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FRealistic.png" alt="memgraph-vs-neo4j-a-performance-comparison-realistic"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Memgraph performs well on the realistic workloads, with 102 times higher throughput on W1, 122 times higher on W2, 125 times higher on W3, and 121 times higher on W4. Overall, Memgraph is approximately &lt;strong&gt;120 times faster&lt;/strong&gt; than Neo4j. &lt;/p&gt;

&lt;h2&gt;
  
  
  Memory usage
&lt;/h2&gt;

&lt;p&gt;The performance of the database is one of many factors that define which database is appropriate for which use case. However, &lt;strong&gt;memory usage&lt;/strong&gt; is a factor that can really make a difference in choosing a database, as it significantly impacts the cost-to-performance ratio. Memory usage in mgBench is calculated as &lt;strong&gt;peak RES&lt;/strong&gt; (resident size) memory for each query or workload execution. The result includes starting the database, executing the query or workload, and stopping the database. The peak RES is read for the database process as VmHWM (peak resident set size) before the process is stopped. Peak memory usage defines the worst-case scenario for a given query or workload, while the RAM footprint is lower on average. &lt;br&gt;
Since Memgraph is an in-memory graph database, some memory cost must be attached to that performance. Well, this is where things get interesting. Take a look at the memory usage of the realistic workloads: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FMemory%2520usage.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fmemgraph-vs-neo4j-a-performance-comparison%2FMemory%2520usage.png" alt="memgraph-vs-neo4j-a-performance-comparison-memory-usage"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, &lt;strong&gt;Memgraph uses approximately a quarter of Neo4j’s memory&lt;/strong&gt;. Since Neo4j is JVM-based, it pays the price of the JVM’s overhead. &lt;/p&gt;
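&lt;p&gt;On Linux, the peak RES described above is exposed as the VmHWM field of the process status file, and extracting it is a one-liner-style parse. The sample status text below is a made-up excerpt for illustration:&lt;/p&gt;

```python
def peak_rss_kib(status_text):
    """Extract VmHWM (peak resident set size, in kB) from the text of
    Linux's /proc/[pid]/status for the database process."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1])
    return None

# Abbreviated, made-up /proc status excerpt for illustration:
SAMPLE = "Name:\tmemgraph\nVmHWM:\t 1536000 kB\nVmRSS:\t 1200000 kB\n"
```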

&lt;h2&gt;
  
  
  Consistency
&lt;/h2&gt;

&lt;p&gt;While it is beyond the scope of this blog post to cover &lt;a href="https://github.com/ept/hermitage" rel="noopener noreferrer"&gt;the implications of database isolation levels&lt;/a&gt; and how weaker isolation levels prevent broad classes of applications from being correctly built on top of them at all, the high-level picture is that Memgraph supports snapshot isolation out of the box, while Neo4j provides a much weaker read-committed isolation level by default. Neo4j does support optional manual locking, but this does not protect all queries by default. In general, it is not practical for most engineers to reason about the subtleties of transaction concurrency control by requiring them to implement their own database locking protocol on top of the application that they are responsible for building and operating.&lt;/p&gt;

&lt;p&gt;In practice, this means that a far broader class of applications may be correctly implemented on top of Memgraph out of the box, without requiring engineers to understand highly subtle concurrency control semantics. The modern database ecosystem includes many examples of systems that perform extremely well without giving up the correctness guarantees of a stronger isolation level, and Memgraph demonstrates that it is simply not necessary to give up the benefits of snapshot isolation while achieving great performance for the workloads we specialize in. In the future, we intend to push our correctness guarantees even further. &lt;/p&gt;
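&lt;p&gt;The observable difference between the two isolation levels can be shown with a toy multi-version store. This is not how Memgraph or Neo4j implement concurrency control; it only demonstrates what a reader sees under each level:&lt;/p&gt;

```python
class VersionedStore:
    """Toy multi-version store contrasting read committed with
    snapshot isolation."""

    def __init__(self):
        self.version = 0
        self.history = {0: {}}

    def commit(self, updates):
        new = dict(self.history[self.version])
        new.update(updates)
        self.version += 1
        self.history[self.version] = new

    def read_committed(self, key):
        # Read committed: always sees the latest committed value, so
        # two reads inside one transaction can disagree.
        return self.history[self.version].get(key)

    def snapshot(self):
        # Snapshot isolation: the transaction reads from one frozen
        # version, so repeated reads are stable.
        v = self.version
        return lambda key: self.history[v].get(key)
```

&lt;p&gt;A transaction holding a snapshot keeps seeing the value it started with even after a concurrent commit, while a read-committed reader observes the change mid-transaction.&lt;/p&gt;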

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As the benchmark shows, Memgraph performs &lt;strong&gt;orders of magnitude faster than Neo4j in concurrent workloads&lt;/strong&gt;, which can be crucial for running any real-time analytics that relies on getting the correct information exactly when needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/benchgraph/" rel="noopener noreferrer"&gt;The benchmark can be found here&lt;/a&gt;, and all the code used for running these benchmarks is publicly available, so if you want to reproduce and validate the results yourself, you can do so by following the instructions from the &lt;a href="https://github.com/memgraph/memgraph/tree/master/tests/mgbench#reproducibility-and-validation" rel="noopener noreferrer"&gt;methodology&lt;/a&gt;. Otherwise, we would love to see what you can do with Memgraph. Take it for a &lt;a href="https://memgraph.com/download" rel="noopener noreferrer"&gt;test drive&lt;/a&gt;, check out our &lt;a href="https://memgraph.com/docs" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; to understand the nuts and bolts of how it works, or read more &lt;a href="https://memgraph.com/blog?topics=Use+Cases" rel="noopener noreferrer"&gt;blogs&lt;/a&gt; to see a plethora of use cases using Memgraph. If you have any questions for our engineers or want to interact with other Memgraph users, join the &lt;a href="https://discord.gg/memgraph" rel="noopener noreferrer"&gt;Memgraph Community&lt;/a&gt; on Discord!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/memgraph-for-neo4j-developers/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more about Neo4j and Memgraph on memgraph.com"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>benchmark</category>
      <category>memgraph</category>
      <category>neo4j</category>
    </item>
  </channel>
</rss>
