Forem: Alejandro Duarte

The "bus factor" risk in MongoDB, MariaDB, Redis, MySQL, PostgreSQL, and SQLite

Alejandro Duarte — Tue, 03 Mar 2026 15:18:25 +0000

Ever wonder what would happen to an open source database project if its main developers "got hit by a bus" or, less dramatically, left the project completely? That's what the "bus factor" (also called "truck factor") measures: how many people would have to disappear before no one left knows how to fix or update specific parts of the code.

The bus factor ranking

I’ve been messing around with a tool called the Bus Factor Explorer by JetBrains to explore the associated risk in some of the most popular open source databases. I looked at six of the big ones to see where they stand. Here’s the current baseline (March 2026) according to this tool:

Database	Bus Factor (higher is better)
MongoDB	7
MariaDB	5
Redis	5
MySQL	2
PostgreSQL	2
SQLite	2

For example, for MongoDB to "hit a wall", 7 devs would have to leave the project. For MySQL, PostgreSQL, and SQLite, if 2 devs leave, the project is at risk of stalling. Of course, this is just one factor; there are other factors that influence the continuation of open source projects, but this is still an interesting insight you can take into consideration when choosing a database.

The tool has a categorization according to the project's bus factor (a baseline that can be adjusted in the tool if you want):

✅ OK: 5 or more (MongoDB, MariaDB, Redis)
⚠️ Low: 2 to 4 (MySQL, PostgreSQL, SQLite)
🔴 Dangerous: 0 or 1 (N/A)
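
To make the metric more concrete, here is a minimal sketch of how a bus factor could be computed and categorized. This is a simplified greedy heuristic and not necessarily what the Bus Factor Explorer does internally: the ownership data, the 50% threshold, and the function names are all hypothetical. Only the category boundaries come from the baseline listed above.

```python
def bus_factor(ownership: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Count how many top contributors must leave before more than
    `threshold` of the files have nobody left who knows them."""
    remaining = {path: set(devs) for path, devs in ownership.items()}
    total = len(remaining)
    removed = 0
    while total:
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned / total > threshold:
            break
        # Count how many files each remaining contributor knows
        counts: dict[str, int] = {}
        for devs in remaining.values():
            for dev in devs:
                counts[dev] = counts.get(dev, 0) + 1
        if not counts:
            break
        # Remove the contributor who knows the most files (ties: alphabetical)
        top = max(sorted(counts), key=counts.get)
        for devs in remaining.values():
            devs.discard(top)
        removed += 1
    return removed


def categorize(factor: int) -> str:
    """Default baseline: OK (5 or more), Low (2 to 4), Dangerous (0 or 1)."""
    if factor >= 5:
        return "OK"
    if factor >= 2:
        return "Low"
    return "Dangerous"


# Made-up project: file -> contributors who know it
ownership = {
    "src/parser.c": {"ana", "bo"},
    "src/planner.c": {"ana"},
    "src/storage.c": {"ana", "cy"},
    "docs/intro.md": {"bo"},
}
factor = bus_factor(ownership)
print(factor, categorize(factor))
```

In this toy project, losing "ana" and "bo" leaves three of the four files without anyone who knows them, so the sketch reports a bus factor of 2 (Low).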

Simulating top contributors leaving the projects

I also used the tool to see what happens to the root directories (the functional parts of the code) if you start unchecking the top contributors to each project. Specifically, I wanted to see the proportion of directories that would be "lost" if the top one or two contributors left the project. I ignored the individual files at the root level. The results look like this:

Database	Directories (total)	Directories lost (1 dev gone)	Directories lost (2 devs gone)
Redis	6	0 (0.0%)	0 (0.0%)
MariaDB	30	2 (6.7%)	5 (16.7%)
MongoDB	28	1 (3.6%)	7 (25.0%)
PostgreSQL	6	0 (0.0%)	5 (83.3%)
MySQL	30	1 (3.3%)	30 (100.0%)
SQLite	11	5 (45.5%)	11 (100.0%)

Here, a lower number is better (fewer directories impacted). So, Redis is pretty strong in this simulation as no directories would take a hit. On the other end, we have MySQL and SQLite with 100% of their directories potentially left unmaintained if the top two developers leave. PostgreSQL would lose 83% of its directories in a similar situation. These were the big surprises to me, although all this is aligned with the fact that they have a low bus factor (high risk) of 2.
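
The simulation can be pictured as removing the most active contributors and counting the directories with nobody left. The following is a minimal sketch under that assumption, using made-up contribution counts; the `knowledge` structure and the function name are hypothetical, and the real tool derives knowledge from the Git history in its own way.

```python
def directories_lost(knowledge: dict[str, dict[str, int]], gone: int) -> list[str]:
    """Return the directories left without any contributor after the
    `gone` most active contributors leave the project."""
    # Rank contributors by their total contributions across all directories
    totals: dict[str, int] = {}
    for devs in knowledge.values():
        for dev, commits in devs.items():
            totals[dev] = totals.get(dev, 0) + commits
    ranked = sorted(totals, key=lambda dev: (-totals[dev], dev))
    departed = set(ranked[:gone])
    # A directory is lost when every contributor who knows it has departed
    return sorted(d for d, devs in knowledge.items() if devs and set(devs) <= departed)


# Made-up data: directory -> {contributor: number of commits}
knowledge = {
    "src": {"ana": 120, "bo": 40},
    "test": {"ana": 30},
    "doc": {"bo": 25, "cy": 5},
    "tool": {"ana": 10, "bo": 5},
}
for gone in (1, 2):
    lost = directories_lost(knowledge, gone)
    print(f"{gone} dev(s) gone: {len(lost)} of {len(knowledge)} directories lost {lost}")
```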

MariaDB did pretty well, especially when compared to MySQL, which supports what I have been trying to say about how MariaDB is much more than a fork of MySQL in my articles and talks (see this, and this, if you are curious).

Other important factors when evaluating open source projects

You should not rely merely on "bus factor" risk assessments. They are most useful in mission-critical situations or when the projects you are comparing are tied on the other metrics you use to evaluate them. Here are some other aspects to look at:

Corporate Backing: Does the project have large companies behind it? Even if a lead developer leaves, the company is likely to hire someone else to fill the gap, ensuring continuity.
Community Activity: Look at the number of active issues, pull requests, and discussions. A buzzing community can often sustain a project better than a few silent experts.
Documentation Quality: Comprehensive documentation ensures that knowledge isn't locked in someone's head, making it easier for new contributors to onboard.
Licensing: Ensure the license fits your use case, as this impacts the project's long-term viability and adoption. For example, a GPL-licensed project doesn't allow anybody to distribute a closed-source application based on it, hence the project is future-proof as far as licensing goes.

Investigate your projects and their dependencies

I investigated the bus factor of other open source projects that I'm a big fan of, like MyBatis, Vaadin, and others, but I'll let you make your own findings with the tool. Let me know if you find something interesting!

Appendix: Raw Impact Data

This appendix lists every directory and file at the root level that drops to a Bus Factor of 0 in the simulations.

1. MongoDB (Baseline: 7 - OK)

With 1 dev gone: .bazelrc.local.example, .gitattributes, .prettierignore, MODULE.bazel, distsrc, etc
With 2 devs gone: All of the above, plus bazel, buildscripts, jstests

2. MariaDB (Baseline: 5 - OK)

With 1 dev gone: VERSION, config.h.cmake, CMakeLists.txt, libmysqld, debian
With 2 devs gone: VERSION, config.h.cmake, CMakeLists.txt, libmysqld, debian

3. Redis (Baseline: 5 - OK)

With 1 dev gone: None
With 2 devs gone: None

4. MySQL (Baseline: 2 - Low)

With 1 dev gone: mysql/mysql-server (root meta), iwyu_mappings.imp, config.h.cmake, CMakeLists.txt, libmysql, mysys, client, components, strings, mysql-test
With 2 devs gone: Virtually all project directories (36 total), including storage, sql, router, extra, plugin, utilities, etc.

5. PostgreSQL (Baseline: 2 - Low)

With 1 dev gone: COPYRIGHT, meson_options.txt
With 2 devs gone: Virtually all functional directories (23 total), including src, contrib, doc, config, interfaces, etc.

6. SQLite (Baseline: 2 - Low)

With 1 dev gone: VERSION, manifest.tags, make.bat, manifest.uuid, configure, README.md, mptest, main.mk, Makefile.msc, doc, manifest, tool, src, ext, test
With 2 devs gone: Virtually all project boxes (28 total), including root-level scripts and all core subdirectories.

MariaDB doesn't depend on MySQL

Alejandro Duarte — Wed, 21 Jan 2026 18:02:51 +0000

When MariaDB was first announced in 2009 by Michael "Monty" Widenius, it was positioned as a "fork of MySQL". I think that was a Bad Idea™. Okay, maybe it wasn't a bad idea as such. After all, MariaDB indeed is a fork of MySQL. But what is a fork in the software sense, and how is this reflected in MariaDB? A fork is a software project that takes the source code of another project and continues development independently from the original. Forks often start by maintaining compatibility with their parent project, but they can evolve to become detached with their own features, architecture, bug tracker, mailing list, development philosophy, and community. This is the case of MariaDB, with the addition that it continues to be highly compatible with old MySQL versions and with its current ecosystem at large.

Before we dig into it, let me clarify that I like MySQL. It was the very first database that I installed during my university time, and I have used it in hobby as well as production projects for a long time. So, why did I affirm that positioning MariaDB as a fork of MySQL was a bad idea? In short, because MariaDB doesn't depend on MySQL. The idea of defining MariaDB merely as a fork of MySQL leads to misconceptions around its future. Take as an example this old comment on Hacker News which refers to the phrase "RIP Open Source MySQL":

"Forgive my ignorance, but doesn't this harm MySQL forks as well? Since the test cases are unavailable from now on, say for example they wanted to reimplement a certain feature, isn't it much harder for them to validate that their implementation works correctly?"

I sympathize with the author of this comment. We were unintentionally misled by the "fork of MySQL" slogan. I encounter this kind of lack of clarity more often than I would like. But the reality is that the development of MariaDB has been independent for many years already. MariaDB developers don't wait for MySQL to implement features, test cases, fix bugs, or innovate. They write their own tests, create their own features, and solve problems in their own way. When Oracle changes something in MySQL or restricts access to a component, that has no meaningful impact on MariaDB's roadmap because the projects have diverged so significantly that they're essentially different database systems that happen to share some common ancestry, be highly compatible (you can use MySQL connectors and tools with MariaDB), and are named after Monty's children.

So, how come projects like Ubuntu “depend” on upstream projects (e.g. Debian) and others like MariaDB don't? In his paper Why Open Source Software / Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers!, David A. Wheeler (Director of Open Source Supply Chain Security at the Linux Foundation), identifies four potential outcomes for software fork attempts:

The death of the fork: The most common outcome, since keeping a software project alive requires considerable effort.
A re-merging of the fork with the original: Both software projects rejoin each other.
The death of the original: Users and developers move to the new younger project.
Successful branching: Both find success with different developers and end users.

For years, the MySQL-MariaDB situation was clearly a successful branching where both projects found new homes. One in Oracle, the other in the new MariaDB Foundation / MariaDB plc duo. Contrary to what many would have thought, Oracle invested in MySQL and continued its development in the open despite having its own closed-source relational database. For a period of time, MariaDB kept merging MySQL code commit by commit. However, this changed in 2014 when Oracle stopped publishing MySQL's source code on Launchpad. Even though MariaDB still merges changes from InnoDB, this marked a clear point of divergence in codebases.

Recent (and not so recent) findings and events show that Oracle has slowed down at least on the innovation front and at worst on the maintenance side. In his article Stop using MySQL in 2026, it is not true open source, Otto Kekäläinen (former Software Development Manager at AWS), shows that "the number of git commits on github.com/mysql/mysql-server has been significantly declining in 2025." He also highlights the steep decrease in MySQL's popularity according to DB-Engines, as well as the reported "degraded performance with newer MySQL versions." Are we witnessing a "death of the original" here? I don't know.

In light of all this, many developers are starting to evaluate migration strategies to other relational databases with MariaDB and TiDB being two of the most attractive options. According to Otto Kekäläinen, "TiDB only really shines with larger distributed setups, so for the vast majority of regular small and mid-scale applications currently using MySQL, the most practical solution is probably to just switch to MariaDB." How about the elephant in the room, you might ask? PostgreSQL is a database with tons of forks and third-party extensions that you can download which makes it popular not only due to its features but the sheer number of companies marketing their PostgreSQL flavor online. For applications currently using MySQL, migrating to PostgreSQL requires a lot of work including SQL code and connector migrations. Two tasks that can be close to zero-effort with MariaDB. Check for example this crazy live broadcast where Cantamen (Germany's leading car-sharing service provider) migrates from MySQL to MariaDB with the help of Monty himself.

Let’s get back to my highly opinionated introductory statement... MariaDB is a—now we have learned—detached fork of MySQL, and, to be fair, it has also been positioned as a "MySQL replacement" which is something very accurate to state. I'm glad to see the "replacement" slogan more and more often as opposed to the "fork" one. I personally suggested to Kaj Arnö (Executive Chairman at the MariaDB Foundation) going with something even stronger like "MariaDB fixes MySQL". That's a bit too strong, perhaps. I'm glad they softened it to "MariaDB is the Future of MySQL".

Keyword vs. semantic search with AI

Alejandro Duarte — Wed, 29 Oct 2025 15:22:31 +0000

When building search for an application, you typically face two broad approaches:

Traditional keyword-based search — match words exactly or with simple variants.
Semantic (or vector) search — match meaning or context using AI embeddings.

There's also a hybrid approach, but I will leave that for a future article. Instead, in this post I’ll walk you through how the two broad approaches work in Python using MariaDB and an AI embedding model, highlight where they differ, and show code that you can adapt.

The key components

For this example, I used MariaDB Cloud to spin up a free serverless database. Within seconds I had a free instance ready. I grabbed the host/user/password details, connected with VS Code, created a database called demo, created a products table and loaded ~500 rows of product names via LOAD DATA LOCAL INFILE. This is an extremely small dataset, but it's enough for learning and experimentation.

Then I built a small Python + FastAPI app. First I implemented a simple keyword search (by product name) endpoint using a full-text index, then I implemented semantic (vector) search using AI-generated vector embeddings + MariaDB’s vector support. You can see the whole process in this video.

Keyword-based search: simple and familiar

For keyword search I used a full-text index on the name column of the products table. With this index in place, I could search by product name using this SQL query:

SELECT name
FROM products
ORDER BY MATCH(name) AGAINST(?) DESC
LIMIT 10;

I exposed this functionality using a FastAPI endpoint as follows:

@app.get("/products/text-search")
def text_search(query: str):
    cursor = connection.cursor()
    # Sort by full-text relevance score, most relevant rows first
    cursor.execute(
        "SELECT name FROM products ORDER BY MATCH(name) AGAINST(?) DESC LIMIT 10;", (query,)
    )
    return [name for (name,) in cursor]

Pros:

Runs fast.
Works well when users type exact or close terms.
Uses built-in SQL features (no external AI model needed).

Cons:

Misses synonyms, context or related meaning.
Doesn’t understand intent (if a user types “running shoes”, a strict keyword search may miss “jogging trainers” or “sneakers”).
Quality depends heavily on the wording.

In my demo, the endpoint returned several products that were not relevant to “running shoes”.
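
This limitation can be reproduced with a toy scorer. The sketch below is not how MariaDB's full-text index ranks rows; it's a hypothetical word-overlap function with made-up product names, meant only to show why literal matching misses related wording.

```python
def keyword_score(query: str, name: str) -> int:
    """Number of query words that appear literally in the product name."""
    query_words = set(query.lower().split())
    name_words = set(name.lower().split())
    return len(query_words & name_words)


# Made-up product names
products = [
    "Trail running shoes",
    "Jogging trainers",
    "Leather dress shoes",
    "Running belt",
]

# Highest literal overlap first
ranked = sorted(
    products,
    key=lambda name: keyword_score("running shoes", name),
    reverse=True,
)
for name in ranked:
    print(keyword_score("running shoes", name), name)
```

"Jogging trainers" ends up last with a score of zero even though it is arguably what the user wants, while "Leather dress shoes" and "Running belt" rank above it because they share a single literal word with the query.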

Semantic (vector) search: matching meaning

To go beyond keywords I implemented a second endpoint:

Use an AI embedding model (Google Generative AI via LangChain) to convert each product name into a high-dimensional vector.
Store those vectors in MariaDB with the vector integration for LangChain.
At query time, embed the user’s search phrase into a vector (using exactly the same AI embedding model as in the first step), then perform a similarity search with the highly performant HNSW algorithm in MariaDB (e.g., top 10 nearest vectors) and return the corresponding products.

Here’s how I implemented the ingestion endpoint:

@app.post("/products/ingest")
def ingest_products():
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM products;")
    vector_store.add_texts([name for (name,) in cursor])
    return "Products ingested successfully"

And this is the semantic search endpoint:

@app.get("/products/semantic-search")
def search_products(query: str):
    results = vector_store.similarity_search(query, k=10)
    return [doc.page_content for doc in results]

The LangChain integration for MariaDB makes the whole process extremely easy. The integration creates two tables:

langchain_collection: Each row represents a related set of vector embeddings. I have only one in this demo which corresponds to the product names.
langchain_embedding: The vector embeddings. Each vector belongs to a collection (many-to-one to langchain_collection).

When I ran the semantic search endpoint with the same query “running shoes”, the results felt much more relevant: they included products that didn’t match “running” or “shoes” literally but were semantically close.
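
The idea behind the similarity search can be illustrated with a brute-force version in plain Python. This is only a sketch: the three-dimensional vectors are invented by hand (real embeddings come from the model and have hundreds or thousands of dimensions), and it compares the query against every vector instead of using an HNSW index as MariaDB does.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; the dimensions loosely mean (footwear, running, formal)
embeddings = {
    "Jogging trainers": [0.9, 0.8, 0.0],
    "Leather dress shoes": [0.9, 0.0, 0.9],
    "Running belt": [0.1, 0.9, 0.0],
}


def nearest(query_vector: list[float], k: int = 2) -> list[str]:
    """Exhaustive nearest-neighbor search: score every stored vector."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vector, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


query = [0.8, 0.9, 0.0]  # pretend this is the embedding of "running shoes"
print(nearest(query))
```

With these made-up vectors, "Jogging trainers" comes out on top even though it shares no word with the query, which is the behavior I observed in the demo.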

Keyword vs. semantic — when to use which

Here’s a quick comparison:

Approach	Pros	Cons
Keyword search	Quick to set up, uses SQL directly	Limited to literal term matching, less clever
Semantic search	Matches meaning and context, more flexible	Requires embedding model + vector support

Pick keyword search when:

Your search domain is small and predictable or, obviously, you need exact keyword match.
Users know exactly what they’re looking for (specific codes, exact names).
You want minimal dependencies and complexity.

Pick semantic search when:

You need to handle synonyms, similar concepts, user intent.
The dataset or domain has natural language variation.
You’re willing to integrate an embedding model and manage vector storage/indexing. MariaDB helps with this.

In many real-world apps you’ll use a hybrid: start with keyword search, and for higher-value queries or when exact match fails, fall back to semantic search. Or even mix the two via hybrid search. MariaDB helps with this too.
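
A minimal sketch of the fallback strategy could look like this. The function and parameter names are hypothetical, and the two stub search functions merely stand in for the keyword and semantic endpoints shown earlier.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


def search_with_fallback(
    query: str,
    keyword_search: SearchFn,
    semantic_search: SearchFn,
    min_results: int = 1,
) -> list[str]:
    """Try keyword search first; fall back to semantic search when it
    returns fewer than `min_results` matches."""
    results = keyword_search(query)
    if len(results) >= min_results:
        return results
    return semantic_search(query)


# Stubs standing in for the real database-backed searches
catalog = ["Jogging trainers", "Running belt"]


def fake_keyword(query: str) -> list[str]:
    return [name for name in catalog if query.lower() in name.lower()]


def fake_semantic(query: str) -> list[str]:
    return ["Jogging trainers"]


print(search_with_fallback("belt", fake_keyword, fake_semantic))
print(search_with_fallback("running shoes", fake_keyword, fake_semantic))
```

The appeal of this design is that the embedding model is called only when the cheaper keyword search comes back empty.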

How simple the integration can be

In my demo I triggered vector ingestion via a POST endpoint (/products/ingest). That reads all product names, computes embeddings, and writes them to MariaDB. One line of code (via LangChain + MariaDB integration) handled the insertion of ~500 rows of vectors.

Once vectors are stored, adding a semantic search endpoint was just a few lines of code. The MariaDB vector support hid most of the complexity.

The source code

You can find the code on GitHub. I have one simplistic easy-to-follow program in the webinar-main.py and a more elaborate one with good practices in backend.py. Feel free to clone the repository, modify it, experiment with your own datasets, and let us know if there's anything you'd like to see in the LangChain integration for MariaDB.

What can go wrong when using database transactions?

Alejandro Duarte — Mon, 06 Oct 2025 21:00:00 +0000

Note: This is an excerpt from an unedited version of my book MariaDB for Developers.

To this point, we have understood the concept of atomicity—either all operations succeed or none do. What can go wrong? It seems like we are covered. And we are. Until we introduce concurrency in our system. MariaDB is one of the most highly performant database systems and tries to parallelize processing to increase throughput. Parallelizing means that MariaDB can execute transactions from different sessions at the same time by interleaving operations from different transactions instead of waiting for one to finish before starting the next. Each transaction has its own sequence of operations, but MariaDB executes them in overlapping order. Figure 8-2 shows two transactions (A and B) and multiple database operations interleaved through time.

Figure 8-2: Interleaving database operations for parallelism.

This interleaving allows MariaDB to use CPU and I/O resources more efficiently than without parallelism. This however opens the door to subtle problems when the parallel transactions read and write overlapping data. Let's study some of these problems known as concurrency phenomena.

Dirty reads

Friday afternoon and we've got a winner! Our to-do application—which by chapter 6 became more of a project management tool than a to-do app—is so central to the business, that prizes are given to users who excelled at reporting bugs or helping its development. Our to-do application allows the HR team to grant prizes to users and this use case involves reducing the quantity of the awarded prize in the prizes table.

Janet and Moe, both from HR, are using our to-do app at the same time. Janet is about to grant today's prize (named "Bagelers" in our database), while Moe is viewing a dashboard that shows an overview of the prize inventory. Janet selects the winner and the prize, and clicks on "Grant prize". Our to-do app starts a new transaction that decreases the quantity for Bagelers from 8 to 7. At that moment, Moe refreshes the dashboard and sees that there are 7 Bagelers. However, the system crashes and since the transaction was never committed, the new quantity is not written to disk. Janet gets an error, but Moe doesn't. To him, there are 7 Bagelers. He is seeing incorrect data. This is called a dirty read. Figure 8-3 shows an example of the sequence of operations that leads to a dirty read at time t3.

Figure 8-3: Example of the dirty read phenomenon.
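
The scenario can be replayed with a toy in-memory model. This is only a sketch to show the effect, not how MariaDB implements transactions; the class and its method names are hypothetical.

```python
class ToyDatabase:
    """Keeps committed data apart from the pending writes of open transactions."""

    def __init__(self, data: dict[str, int]):
        self.committed = dict(data)
        self.pending: dict[str, dict[str, int]] = {}  # transaction -> writes

    def write(self, tx: str, key: str, value: int) -> None:
        self.pending.setdefault(tx, {})[key] = value

    def read_uncommitted(self, key: str) -> int:
        # Sees writes from any open transaction: dirty reads are possible
        for writes in self.pending.values():
            if key in writes:
                return writes[key]
        return self.committed[key]

    def read_committed(self, key: str) -> int:
        # Only sees data that has been committed
        return self.committed[key]

    def rollback(self, tx: str) -> None:
        self.pending.pop(tx, None)


db = ToyDatabase({"Bagelers": 8})
db.write("janet", "Bagelers", 7)         # Janet's transaction, not committed yet
dirty = db.read_uncommitted("Bagelers")  # Moe's dashboard sees 7 (dirty read)
clean = db.read_committed("Bagelers")    # a committed-only read still sees 8
db.rollback("janet")                     # the crash: Janet's change is discarded
after = db.read_committed("Bagelers")    # the real quantity is still 8
print(dirty, clean, after)
```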

Non-repeatable reads

A similar situation can occur when a transaction reads a value twice, but that value is modified between the reads by another transaction. In this case, the second read would obtain a different value. This phenomenon is called a non-repeatable read and can lead to incorrect results if the values are used for other calculations in the same transaction. Figure 8-4 shows a non-repeatable read at time t5.

Figure 8-4: Example of the non-repeatable read phenomenon.
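
As a rough illustration, the following toy sketch replays a non-repeatable read with a plain dictionary playing the role of the committed data. The helper names are hypothetical and nothing here reflects MariaDB internals.

```python
committed = {"Bagelers": 8}


def read(key: str) -> int:
    """Read the latest committed value."""
    return committed[key]


def commit_write(key: str, value: int) -> None:
    """Another transaction changes the value and commits."""
    committed[key] = value


# Transaction A reads, transaction B commits in between, A reads again
first_read = read("Bagelers")
commit_write("Bagelers", 7)
second_read = read("Bagelers")

non_repeatable = first_read != second_read
print(first_read, second_read, non_repeatable)
```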

Phantom reads

If the write operation in the previous example inserts rows instead of modifying a value, we get what's called a phantom read. Figure 8-5 shows a phantom read at time t5.

Figure 8-5: Example of the phantom read phenomenon.
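
A phantom read can be sketched the same way, this time with a query over a set of rows. The list below stands in for the prizes table; the "Mugs" row and the helper name are made up for illustration.

```python
prizes = [{"name": "Bagelers", "quantity": 8}]


def count_available() -> int:
    """Range query: count the prizes that still have stock."""
    return sum(1 for row in prizes if row["quantity"] > 0)


# Transaction A runs the same query twice; B inserts a row in between
first_count = count_available()
prizes.append({"name": "Mugs", "quantity": 3})  # B inserts and commits
second_count = count_available()

phantom_rows = second_count - first_count
print(first_count, second_count, phantom_rows)
```

No existing row changed, yet the second query returns a row that was not there before: the phantom.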

Disaster Recovery and AI Vectors in MariaDB Kubernetes Operator 25.08.0

Alejandro Duarte — Thu, 07 Aug 2025 15:48:43 +0000

The latest MariaDB Kubernetes Operator release, version 25.08.0, is now available. This ships with enhancements, especially in how you can approach disaster recovery in MariaDB clusters, and how the Operator is adapting to the requirements of modern data-centric applications.

Disaster Recovery with Physical Backups

One of the main features in 25.08.0 is the introduction of PhysicalBackup Custom Resources. For some time, logical backups have been the only supported method, but as databases grow, so do the challenges of restoring them quickly. Physical backups offer a more efficient path, especially for large datasets. They work at the physical directory level rather than through execution of SQL statements.

This capability has been implemented in two ways:

mariadb-backup Integration: MariaDB's native backup tool, mariadb-backup, can be used directly through the Operator. You can define PhysicalBackup CRs to schedule backups, manage retention, apply compression, and even specify S3-compatible storage. The restoration process is straightforward: simply reference the PhysicalBackup in the bootstrapFrom field of a new MariaDB resource, and the Operator handles the rest, preparing and restoring the backup files. A great feature for reducing RTO (Recovery Time Objective).
Kubernetes-native VolumeSnapshots: Alternatively, if your Kubernetes environment is set up with CSI drivers that support VolumeSnapshots, physical backups can now be created directly at the storage level. This method creates snapshots of MariaDB data volumes, offering another robust way to capture a consistent point-in-time copy of your database. Restoring from a VolumeSnapshot is equally simple and allows for quick provisioning of new clusters from these storage-level backups.

These new physical backup options provide greater flexibility and significantly faster recovery times, which are absolutely critical in production environments. The aim is to provide more tools to build a resilient and robust data infrastructure.

MariaDB 11.8 and the VECTOR Data Type

The Operator is also keeping pace with the latest advancements in MariaDB itself. The MariaDB Kubernetes Operator 25.08.0 now supports MariaDB 11.8 Community Server by default.

One of the most important features in MariaDB 11.8 is the VECTOR data type. This is a notable development for anyone working with AI applications (watch this webinar to get up to speed). High-dimensional vectors are fundamental in areas like similarity search for RAG (Retrieval Augmented Generation) applications.

With the VECTOR data type, these vectors can be stored and manipulated directly within a MariaDB database without having to introduce an often complex, dedicated vector database—MariaDB is also a vector database! If you're using frameworks like LangChain or Spring AI, the new MariaDB integrations allow using MariaDB as a vector store.

Deployments with the New Helm Chart

The mariadb-cluster Helm chart has been introduced to facilitate database instance deployments. This new chart streamlines the provisioning of a MariaDB cluster along with its associated Custom Resources managed by the Operator. Instead of manually configuring relationships between different CRs, the Helm chart takes care of it, allowing management of an entire MariaDB deployment as a single Helm release.

Enterprise Support

For mission-critical, production environments, the MariaDB Enterprise Kubernetes Operator provides enterprise-grade support, including Red Hat OpenShift certification, and utilizes a secure Red Hat UBI base image. The operator offers robust security features like a customizable certificate lifecycle and advanced private key algorithms, while managing both MariaDB Enterprise Server and MariaDB MaxScale.

What Else is New?

Beyond these major highlights, this release includes important replication improvements, pushing closer to general availability for this feature. You'll also find various bug fixes and enhancements. A new calendar-based versioning scheme has also been adopted for clarity and predictability.

The team is committed to continuously improving the MariaDB Operator based on user needs. Your feedback is invaluable! So if you have any questions or encounter any issues, please don't hesitate to open an issue on GitHub or join us on the MariaDB Community Slack.

As always, the detailed changelog and upgrade guide can be found on the GitHub release page.

Can You Run a MariaDB Cluster on a $150 Kubernetes Lab? I Gave It a Shot

Alejandro Duarte — Fri, 23 May 2025 11:29:40 +0000

If you're like me, learning how to run databases inside Kubernetes sounds better when it's hands-on, physical, and brutally honest. So instead of spinning up cloud VMs or using Kind or minikube on a laptop, I went small and real: four Orange Pi 3 LTS boards (a Raspberry Pi alternative), each with just 2GB RAM.

My goal? Get MariaDB — and eventually Galera replication — running on Kubernetes using the official MariaDB Kubernetes Operator.

TL;DR: If you came here for the code, you can find Ansible playbooks on this GitHub repository, along with instructions on how to use them. For production environments, see this manifest.

Disclaimer: This isn’t a tutorial on building an Orange Pi cluster, or even setting up K3s. It’s a record of what I tried, what worked, what broke, and what I learned when deploying MariaDB on Kubernetes.

This article ignores best practices and security in favor of simplicity and brevity of code. The setup presented here helps you to get started with the MariaDB Kubernetes Operator so you can continue your exploration with the links provided at the end of the article.

Info: The MariaDB Kubernetes Operator has been in development since 2022 and is steadily growing in popularity. It’s also Red Hat OpenShift Certified and available as part of MariaDB Enterprise. Galera is a synchronous multi-primary cluster solution that enables high availability and data consistency across MariaDB nodes.

Stripping K3s Down to the Essentials

First of all, I installed K3s (a certified Kubernetes distribution built for IoT and edge computing) on the control node as follows (ssh into the control node):

curl -sfL https://get.k3s.io | \
INSTALL_K3S_EXEC="--disable traefik \
                  --disable servicelb \
                  --disable cloud-controller \
                  --disable network-policy" \
sh -s - server --cluster-init

These flags strip out components I didn't need:

traefik: No need for HTTP ingress.
servicelb: I relied on NodePorts instead.
cloud-controller: Irrelevant on bare-metal.
network-policy: Avoided for simplicity and memory savings.

On worker nodes, I installed K3s and joined the cluster with the usual command (replace <control-node-ip> with the actual IP of the control node):

curl -sfL https://get.k3s.io | \
K3S_URL=https://<control-node-ip>:6443 \
K3S_TOKEN=<token> sh -

To be able to manage the cluster from my laptop (macOS), I did this:

scp orangepi@<control-node-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/config

sed -i -e 's/127.0.0.1/<control-node-ip>/g' ~/.kube/config

Windows users can do the same using WinSCP or WSL + scp. And don’t forget to replace <control-node-ip> with the actual IP again.

Installing the MariaDB Operator

Here’s how I installed the MariaDB Kubernetes operator via Helm (ssh into the control node):

helm repo add mariadb-operator https://helm.mariadb.com/mariadb-operator

helm install mariadb-operator-crds mariadb-operator/mariadb-operator-crds

helm install mariadb-operator mariadb-operator/mariadb-operator

It deployed cleanly with no extra config, and the ARM64 support worked out of the box. Once installed, the operator started watching for MariaDB resources.

The MariaDB Secret

I tried to configure the MariaDB root password in the same manifest file (for demo purposes), but it failed, especially with Galera. I guess the MariaDB servers are initialized before the secret, which makes the startup process fail. So, I just followed the documentation (as one should always do!) and created the secret via command line:

kubectl create secret generic mariadb-root-password --from-literal=password=demo123

I also got the opportunity to speak with Martin Montes (Sr. Software Engineer at MariaDB plc and main developer of the MariaDB Kubernetes Operator). He shared this with me:

“If the rootPasswordSecretKeyRef field is not set, a random one is provisioned by the operator. Then, the init jobs are triggered with that secret, which ties the database's initial state to that random secret. To start over with an explicit secret, you can delete the MariaDB resource, delete the PVCs (which contain the internal state), and create a manifest that contains both the MariaDB and the Secret. It should work.”

You can find some examples of predictable password handling here.

Minimal MariaDB Instance: The Tuning Game

My first deployment failed immediately: OOMKilled. The MariaDB Kubernetes Operator is made for real production environments, and it works out of the box on clusters with enough compute capacity.

However, in my case, with only 2GB per node, memory tuning was unavoidable. Fortunately, one of the strengths of the MariaDB Kubernetes Operator is its flexible configuration. So, I limited memory usage, dropped buffer pool size, reduced connection limits, and tweaked probe configs to prevent premature restarts.

Here’s the config that ran reliably:

# MariaDB instance
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-demo
spec:
  rootPasswordSecretKeyRef:       # Reference to a secret containing root password for security
    name: mariadb-root-password
    key: password
  storage:
    size: 100Mi                   # Small storage size to conserve resources on limited-capacity SD cards
    storageClassName: local-path  # Local storage class for simplicity and performance
  resources:
    requests:
      memory: 512Mi               # Minimum memory allocation - suitable for IoT/edge devices like Raspberry Pi, Orange Pi, and others
    limits:
      memory: 512Mi               # Hard limit prevents MariaDB from consuming too much memory on constrained devices
  myCnf: |
    [mariadb]
    # Listen on all interfaces to allow external connections
    bind-address=0.0.0.0

    # Disable binary logging to reduce disk I/O and storage requirements
    skip-log-bin

    # Set to ~70% of available RAM to balance performance and memory usage
    innodb_buffer_pool_size=358M

    # Limit connections to avoid memory exhaustion on constrained hardware
    max_connections=20
  startupProbe:
    failureThreshold: 40          # 40 * 15s = 10 minutes grace
    periodSeconds: 15             # check every 15 seconds
    timeoutSeconds: 10            # each check can take up to 10s
  livenessProbe:
    failureThreshold: 10          # 10 * 60s = 10 minutes of failing allowed
    periodSeconds: 60             # check every 60 seconds
    timeoutSeconds: 10            # each check can take 10s
  readinessProbe:
    failureThreshold: 10          # 10 * 30s = 5 minutes tolerance
    periodSeconds: 30             # check every 30 seconds
    timeoutSeconds: 5             # fast readiness check
---
# NodePort service
apiVersion: v1
kind: Service
metadata:
  name: mariadb-demo-external
spec:
  type: NodePort                    # Makes the database accessible from outside the cluster
  selector:
    app.kubernetes.io/name: mariadb # Targets the MariaDB pods created by operator
  ports:
    - protocol: TCP
      port: 3306                    # Standard MariaDB port
      targetPort: 3306              # Port inside the container
      nodePort: 30001               # External access port on all nodes (limited to 30000-32767 range)

The operator generated the underlying StatefulSet and other resources automatically. I checked logs and resources — it created valid objects, respected the custom config, and successfully managed lifecycle events. That level of automation saved time and reduced YAML noise.

Info: Set the innodb_buffer_pool_size variable to around 70% of the total memory.
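
To sanity-check the 70% rule of thumb against other memory limits, here's a minimal Python sketch. The function name and the round-down behavior are assumptions for illustration; the operator does not compute this value for you.

```python
def buffer_pool_mib(memory_limit_mib: int, fraction: float = 0.7) -> int:
    """Return a candidate innodb_buffer_pool_size in MiB, rounded down."""
    if memory_limit_mib <= 0:
        raise ValueError("memory limit must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(memory_limit_mib * fraction)


# The two container memory limits used in this article: 512Mi and 1Gi.
for limit in (512, 1024):
    print(f"limit={limit}Mi -> innodb_buffer_pool_size={buffer_pool_mib(limit)}M")
```

For the 512Mi limit this gives the 358M used above. For a 1Gi limit it gives 716M, so a rounder value slightly below that (such as 700M) leaves a bit more headroom.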

Warning: Normally, it is recommended not to set CPU limits. Setting them can slow down the whole initialization process and the database itself (through CPU throttling). The trade-off of not setting limits is that MariaDB might steal CPU cycles from other workloads running on the same Node.

Galera Cluster: A Bit of Patience Required

Deploying a 3-node MariaDB Galera cluster wasn’t that difficult after the experience gained from the single-instance deployment — it only required additional configuration and minimal adjustments. The process takes some time to complete, though. So be patient if you are trying this on small SBCs with limited resources like the Orange Pi or Raspberry Pi.

SST (State Snapshot Transfer) processes are a bit resource-heavy, and early on, the startup probe would trigger restarts before nodes could sync on these small SBCs already running Kubernetes. I increased probe thresholds and stopped trying to watch the rollout step-by-step, instead letting the cluster come up at its own pace.

And it just works! By the way, this step-by-step rollout is designed to avoid downtime: rolling the replicas one at a time, waiting for each of them to sync, proceeding with the primary, and switching over to an up-to-date replica. Also, for this setup, I increased the memory a bit to let Galera do its thing.
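
When tuning probes, it helps to see how much time each one actually grants. As a rough rule, a probe tolerates about failureThreshold times periodSeconds of failures before Kubernetes acts. Here's a small Python sketch of that arithmetic using the thresholds from this article; the helper name is hypothetical, and the estimate ignores timeoutSeconds and any initial delay.

```python
def probe_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate time a probe may keep failing before Kubernetes reacts."""
    return failure_threshold * period_seconds


# (failureThreshold, periodSeconds) pairs taken from the manifests in this article.
probes = {
    "startupProbe": (40, 15),
    "livenessProbe": (10, 60),
    "readinessProbe": (10, 30),
}

for name, (threshold, period) in probes.items():
    budget = probe_budget_seconds(threshold, period)
    print(f"{name}: {budget}s (~{budget / 60:.0f} min)")
```

If a node needs longer than the startup budget to finish an SST on your hardware, raising failureThreshold is usually the gentlest knob to turn.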

Here’s the deployment manifest file that worked smoothly:

# 3-node multi-master MariaDB cluster
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3                   # Minimum number for a fault-tolerant Galera cluster (balanced for resource constraints)
  replicasAllowEvenNumber: true # Allows cluster to continue if a node fails, even with even number of nodes
  rootPasswordSecretKeyRef:
    name: mariadb-root-password # References the password secret created with kubectl
    key: password
    generate: false             # Use existing secret instead of generating one
  storage:
    size: 100Mi                 # Small storage size to accommodate limited SD card capacity on Raspberry Pi, Orange Pi, and others
    storageClassName: local-path
  resources:
    requests:
      memory: 1Gi               # Higher than single instance to accommodate Galera overhead
    limits:
      memory: 1Gi               # Strict limit prevents OOM issues on resource-constrained nodes
  galera:
    enabled: true               # Activates multi-master synchronous replication
    sst: mariabackup            # State transfer method that's more efficient for limited bandwidth connections
    primary:
      podIndex: 0               # First pod bootstraps the cluster
    providerOptions:
      gcache.size: '64M'        # Reduced write-set cache for memory-constrained environment
      gcache.page_size: '64M'   # Matching page size improves memory efficiency
  myCnf: |
    [mariadb]
    # Listen on all interfaces for cluster communication
    bind-address=0.0.0.0

    # Required for Galera replication to work correctly
    binlog_format=ROW

    # ~70% of available memory for database caching
    innodb_buffer_pool_size=700M

    # Severely limited to prevent memory exhaustion across replicas
    max_connections=12
  affinity:
    antiAffinityEnabled: true   # Ensures pods run on different nodes for true high availability
  startupProbe:
    failureThreshold: 40        # 40 * 15s = 10 minutes grace
    periodSeconds: 15           # check every 15 seconds
    timeoutSeconds: 10          # each check can take up to 10s
  livenessProbe:
    failureThreshold: 10        # 10 * 60s = 10 minutes of failing allowed
    periodSeconds: 60           # check every 60 seconds
    timeoutSeconds: 10          # each check can take 10s
  readinessProbe:
    failureThreshold: 10        # 10 * 30s = 5 minutes tolerance
    periodSeconds: 30           # check every 30 seconds
    timeoutSeconds: 5           # fast readiness check
---
# External access service
apiVersion: v1
kind: Service
metadata:
  name: mariadb-galera-external
spec:
  type: NodePort                    # Makes the database accessible from outside the cluster
  selector:
    app.kubernetes.io/name: mariadb # Targets all MariaDB pods for load balancing
  ports:
    - protocol: TCP
      port: 3306                    # Standard MariaDB port
      targetPort: 3306              # Port inside the container
      nodePort: 30001               # External access port on all cluster nodes (using any node IP)

After tuning the values, all three pods reached Running. I confirmed replication was active, and each pod landed on a different node — kubectl get pods -o wide confirmed even distribution.

Info: To ensure that every MariaDB pod gets scheduled on a different Node, set spec.affinity.antiAffinityEnabled to true.

Did Replication Work?

Here’s the basic test I used to check if replication worked:

kubectl exec -it mariadb-galera-0 -- mariadb -uroot -pdemo123 -e "
CREATE DATABASE test;
CREATE TABLE test.t (id INT PRIMARY KEY AUTO_INCREMENT, msg TEXT);
INSERT INTO test.t(msg) VALUES ('It works!');"


kubectl exec -it mariadb-galera-1 -- mariadb -uroot -pdemo123 -e "SELECT * FROM test.t;"
kubectl exec -it mariadb-galera-2 -- mariadb -uroot -pdemo123 -e "SELECT * FROM test.t;"

The inserted row appeared on all three nodes. I didn’t measure write latency or SST transfer duration—this wasn’t a performance test. For me, it was just enough to confirm functional replication and declare success.

Since I exposed the service using a simple NodePort, I was also able to connect to the MariaDB cluster using the following:

mariadb -h <master-ip> --port 30001 -u root -pdemo123

I skipped Ingress entirely to keep memory usage and YAML code minimal.

What I Learned

The MariaDB Operator handled resource creation pretty well — PVCs, StatefulSets, Secrets, and lifecycle probes were all applied correctly with no manual intervention.
Galera on SBCs is actually possible. SST needs patience, and tuning memory limits is critical, but it works!
Out-of-the-box kube probes often don’t work on slow hardware. Startup times will trip checks unless you adjust thresholds.
Node scheduling worked out fine on its own. K3s distributed the pods evenly.
Failures teach more than success. Early OOM errors helped me understand the behavior of stateful apps in Kubernetes much more than a smooth rollout would’ve.

Final Thoughts

This wasn’t about benchmarks, and it wasn’t for production. For production environments, see this manifest. This article was about shrinking a MariaDB Kubernetes deployment to get it working on a constrained environment. It was also about getting started with the MariaDB Kubernetes Operator and learning what it does for you.

The operator simplified a lot of what would otherwise be painful on K8s: it created stable StatefulSets, managed volumes and config, and coordinated cluster state without needing glue scripts or sidecars. Still, it required experimentation on this resource-limited cluster. Probes need care. And obviously, you won’t get resilience or high throughput from an SBC cluster like this, especially if you have a curious dog or cat around your cluster! But this is a worthwhile test for learning and experimentation. Also, if you don’t want to fiddle with SBCs, try Kind or minikube.

By the way, the MariaDB Kubernetes Operator can do much more for you. Check this repository to see a list of the possibilities. Here are just a few worth exploring next:

Multiple HA modes: Galera Cluster or MariaDB Replication.
Advanced HA with MaxScale: a sophisticated database proxy, router, and load balancer for MariaDB.
Flexible storage configuration. Volume expansion.
Take, restore and schedule backups.
Cluster-aware rolling update: roll out replica Pods one by one, wait for each of them to become ready, and then proceed with the primary Pod, using ReplicasFirstPrimaryLast.
Issue, configure and rotate TLS certificates and CAs.
Orchestrate and schedule SQL scripts.
Prometheus metrics via mysqld-exporter and maxscale-exporter.

Vector Storage, Indexing, and Search With MariaDB

Alejandro Duarte — Tue, 28 Jan 2025 13:56:36 +0000

When you develop generative AI applications, you typically introduce three additional components to your infrastructure: an embedder, an LLM, and a vector database.

However, if you are using MariaDB, you don't need to introduce an additional database along with its own SQL dialect — or even worse — its own proprietary API. Since MariaDB version 11.7 (and MariaDB Enterprise Server 11.4) you can simply store your embeddings (or vectors) in any column of any table—no need to make your applications database polyglots.

"After announcing the preview of vector search in MariaDB Server, the vector search capability has now been added to the MariaDB Community Server 11.7 release," writes Ralf Gebhardt, Product Manager for MariaDB Server at MariaDB. This includes a new datatype (VECTOR), vector index, and a set of functions for vector manipulation.

Why Are Vectors Needed in Generative AI Applications?

Vectors are needed in generative AI applications because they embed complex meanings in a compact fixed-length array of numbers (a vector). This becomes clearer in the context of retrieval-augmented generation (or RAG). This technique allows you to fetch relevant data from your sources (APIs, files, databases) to enhance an AI model input with the fetched, often private-to-the-business, data.

Since your data sources can be vast, you need a way to find the relevant pieces, given that current AI models have a finite context window, so you cannot simply add all of your data to a prompt. By creating chunks of data and running these chunks through a special AI model called an embedder, you can generate vectors and use proximity search techniques to find relevant information to be appended to a prompt.
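
To make the chunking step concrete, here's a minimal Python sketch that splits text into overlapping word-based chunks. Word-based splitting, the function name, and the default sizes are assumptions for illustration; real applications often chunk by sentences, tokens, or document structure instead.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words, sharing `overlap` words between neighbors."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", size=3, overlap=1))
```

Each chunk would then be sent to the embedder, and the resulting vector stored next to the chunk text.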

For example, take the following input from a user in a recommendation chatbot:

I need a good case for my iPhone 15 pro.

Since your AI model was not trained with the exact data containing the product information in your online store, you need to retrieve the most relevant products and their information before sending the prompt to the model.

For this, you send the original input from the user to an embedder and get a vector that you can later use to get the closest, say, 10 products to the user input. Once you get this information (and we'll see how to do this with MariaDB later), you can send the enhanced prompt to your AI model:

I need a good case for my iPhone 15 pro. Which of the following products better suit my needs?

1. ProShield Ultra Case for iPhone 15 Pro - $29.99: A slim, shock-absorbing case with raised edges for screen protection and a sleek matte finish.

2. EcoGuard Bio-Friendly Case for iPhone 15 Pro - $24.99: Made from 100% recycled materials, offering moderate drop protection with an eco-conscious design.

3. ArmorFlex Max Case for iPhone 15 Pro - $39.99: Heavy-duty protection with military-grade durability, including a built-in kickstand for hands-free use.

4. CrystalClear Slim Case for iPhone 15 Pro - $19.99: Ultra-thin and transparent, showcasing the phone's design while providing basic scratch protection.

5. LeatherTouch Luxe Case for iPhone 15 Pro - $49.99: Premium genuine leather construction with a soft-touch feel and an integrated cardholder for convenience.

This results in AI predictions that use your own data.
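
Assembling the enhanced prompt is plain string work. The following Python sketch shows one way to do it, assuming the retrieved products arrive as (name, price, description) tuples; the function name and the tuple layout are hypothetical.

```python
def build_prompt(user_input: str, products: list[tuple[str, str, str]]) -> str:
    """Append a numbered list of retrieved products to the user's original input."""
    lines = [f"{user_input} Which of the following products better suit my needs?", ""]
    for number, (name, price, description) in enumerate(products, start=1):
        lines.append(f"{number}. {name} - {price}: {description}")
    return "\n".join(lines)


retrieved = [
    ("ProShield Ultra Case for iPhone 15 Pro", "$29.99", "A slim, shock-absorbing case."),
    ("CrystalClear Slim Case for iPhone 15 Pro", "$19.99", "Ultra-thin and transparent."),
]
print(build_prompt("I need a good case for my iPhone 15 pro.", retrieved))
```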

Creating Tables for Vector Storage

To store vectors in MariaDB, use the new VECTOR data type. For example:

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    description TEXT,
    embedding VECTOR(2048)
);

In this example, the embedding column can hold a vector of 2048 dimensions. You have to match the number of dimensions that your embedder generates.

Creating Vector Indexes

For read performance, it's important to add an index to your vector column. This speeds up similarity searches. You can define the index at table creation time as follows:

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    description TEXT,
    embedding VECTOR(2048) NOT NULL,
    VECTOR INDEX (embedding)
);

For greater control, you can specify the distance function that the database server will use to build the index, as well as the M value of the Hierarchical Navigable Small Worlds (HNSW) algorithm used by MariaDB. For example:

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    description TEXT,
    embedding VECTOR(2048) NOT NULL,
    VECTOR INDEX (embedding) M=8 DISTANCE=cosine
);

Check the documentation for more information on these configurations.

Inserting Vectors

When you pass data (text, image, audio) through an embedder, you get a vector. Typically, this is a series of numbers in an array in JSON format. To insert this vector in a MariaDB table, you can use the VEC_FromText function. For example:

INSERT INTO products (name, embedding)
VALUES
  ("Alarm clock", VEC_FromText("[0.001, 0, ...]")),
  ("Cow figure", VEC_FromText("[1.0, 0.05, ...]")),
  ("Bicycle", VEC_FromText("[0.2, 0.156, ...]"));

Remember that the inserted vectors must have the correct number of dimensions as defined in the CREATE TABLE statement.
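
On the application side, you need to turn the embedder's output into the text form that VEC_FromText accepts. Here's a minimal Python sketch that validates the dimension count and serializes the values as a JSON array; the function name is hypothetical, and how you bind the resulting string to your SQL statement depends on your database driver.

```python
import json
import math


def to_vector_text(values, dimensions: int) -> str:
    """Serialize an embedding as JSON array text after checking its size and values."""
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return json.dumps([float(v) for v in values])


print(to_vector_text([0.001, 0, 1.5], dimensions=3))
```

Pass the resulting string as a bound parameter rather than concatenating it into the SQL text.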

Similarity Search (Comparing Vectors)

In RAG applications, you send the user input to an embedder to get a vector. You can then query the records in your database that are closer to that vector. Closer vectors represent data that are semantically similar. At the time of writing this, MariaDB has two distance functions that you can use for similarity or proximity search:

VEC_DISTANCE_EUCLIDEAN: calculates the straight-line distance between two vectors. It is best suited for vectors derived from raw, unnormalized data or scenarios where spatial separation directly correlates with similarity, such as comparing positional or numeric features. However, it is less effective for high-dimensional or normalized embeddings since it is sensitive to differences in vector magnitude.
VEC_DISTANCE_COSINE: measures the angular difference between vectors. Good for comparing normalized embeddings, especially in semantic applications like text or document retrieval. It excels at capturing similarity in meaning or context.
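
To see the difference between the two measures, here's a small Python sketch that computes both by hand. It assumes cosine distance is defined as 1 minus cosine similarity; the function names are made up for this example and are not MariaDB APIs.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between the vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


short = [1.0, 0.0]
long_ = [2.0, 0.0]  # same direction as `short`, twice the magnitude
print(euclidean_distance(short, long_))  # 1.0
print(cosine_distance(short, long_))     # 0.0
```

The two vectors point in the same direction, so the cosine distance is zero even though the Euclidean distance is not. That is the magnitude sensitivity mentioned above.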

Keep in mind that similarity search using the previous functions is only approximate and highly depends on the quality of the calculated vectors and, hence, on the quality of the embedder used.

The following example finds the top 10 most similar products to a given vector ($user_input_vector should be replaced with the actual vector returned by the embedder over the user input):

SELECT id, name, description
FROM products
ORDER BY VEC_DISTANCE_COSINE(
  VEC_FromText($user_input_vector),
  embedding
)
LIMIT 10;

The VEC_DISTANCE_COSINE and VEC_DISTANCE_EUCLIDEAN functions take two vectors. In the previous example, one of the vectors is the vector calculated over the user input, and the other is the corresponding vector for each record in the products table.
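
Conceptually, the ORDER BY ... LIMIT query ranks every row by its distance to the input vector and keeps the closest ones. Here's a brute-force Python sketch of that idea over an in-memory list. The names and data layout are assumptions for illustration, and this is an exact scan; it does not model how a vector index searches.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=10):
    """Return the k (name, embedding) rows closest to `query`, nearest first."""
    return heapq.nsmallest(k, rows, key=lambda row: cosine_distance(query, row[1]))


rows = [
    ("Alarm clock", [1.0, 0.0]),
    ("Cow figure", [0.0, 1.0]),
    ("Bicycle", [1.0, 1.0]),
]
for name, _ in top_k([1.0, 0.1], rows, k=2):
    print(name)
```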

A Practical Example

I have prepared a practical example using Java and no AI frameworks so you truly understand the process of creating generative AI applications leveraging MariaDB's vector search capabilities. You can find the code on GitHub.

Supercharge Your App: MariaDB In-Memory Tables as a Cache

Alejandro Duarte — Mon, 19 Aug 2024 10:49:02 +0000

Redis is mainly used as an application cache or a quick-response database. But wait, you can always create a cache in a relational database as follows:

CREATE TABLE cache(
    `key` VARCHAR(64) PRIMARY KEY,
    value VARCHAR(255) NOT NULL,
    last_updated TIMESTAMP
        DEFAULT CURRENT_TIMESTAMP
        ON UPDATE CURRENT_TIMESTAMP
);

Moreover, with MariaDB, you can pick one from the many available storage engines. For example, if you want to store the previous cache table in memory, simply use the MEMORY storage engine:

CREATE TABLE cache(
    ...
) ENGINE=MEMORY;

When you configure the cache table to use the MEMORY storage engine, its data will reside entirely in RAM. This is interestingly similar to how Redis operates, keeping data in memory for low-latency access. This looks great and it definitely has its benefits. However, there are a few nuances that are worth exploring.

Example Use Case

Let’s say you have a web application that needs to track session IDs. Using a MariaDB MEMORY table sounds like a good idea here—there’s potential for reducing the load on your primary databases and improving response times for your users. Here's how you could implement such a cache using MariaDB’s MEMORY storage engine:

CREATE OR REPLACE TABLE users_cache (
    user_name VARCHAR(50) NOT NULL PRIMARY KEY REFERENCES users(id),
    session_id VARCHAR(50),
    last_updated DATETIME NOT NULL
        DEFAULT CURRENT_TIMESTAMP
        ON UPDATE CURRENT_TIMESTAMP
) ENGINE=MEMORY;

Insertion into this table would typically occur whenever a user logs in. But generalizing a bit more, insertion in a cache could happen every time you read data stored on disk-based tables. We’ll use the latter approach and assume that storing session IDs permanently is a business requirement just so that we can make more experiments. In any case, you can set up a background job to refresh this cache periodically or invalidate it when the underlying data changes.
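
The read-then-populate pattern described here is often called cache-aside. The following Python sketch shows the idea with a plain dictionary standing in for the cache table and a callback standing in for the read from disk-based tables. The class and method names are hypothetical and not part of any MariaDB API.

```python
import time


class TtlCache:
    """Cache-aside store: entries are loaded on a miss and expire after a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, last_updated)

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss or a stale entry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._ttl:
            return entry[0]
        value = load(key)  # stand-in for a query against the disk-based table
        self._entries[key] = (value, now)
        return value

    def purge_stale(self):
        """Remove expired entries, as a periodic background job would; return the count."""
        now = self._clock()
        stale = [k for k, (_, ts) in self._entries.items() if now - ts > self._ttl]
        for k in stale:
            del self._entries[k]
        return len(stale)


cache = TtlCache(ttl_seconds=7200)
print(cache.get("alice", lambda user: f"session-for-{user}"))
```

In the MariaDB version, get maps to a SELECT on the MEMORY table followed by an insert on a miss, and purge_stale maps to a periodic DELETE of old rows.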

Cache Invalidation and Management

There are several ways to handle cache invalidation with MariaDB. For example, you can set a limited row-based lifespan (through a column for expiration time) and use event schedulers to clear or update cached data at fixed intervals. Here’s an example:

CREATE OR REPLACE EVENT ev_remove_stale_user_cache_entries
    ON SCHEDULE EVERY 20 MINUTE DO
        DELETE FROM users_cache
        WHERE NOW() > last_updated + INTERVAL 2 HOUR;

For testing, you can use shorter intervals, for example EVERY 1 SECOND and INTERVAL 20 SECOND. Also, remember to enable MariaDB’s event scheduler by setting the event_scheduler configuration property or, for testing, by running:

SET GLOBAL event_scheduler = ON;

If you want to try this out, you can find a complete example on GitHub. Or watch the short video demoing the example in action.

Pros and cons

Although using the MEMORY storage engine can speed up data retrieval times, as always, this depends on the exact use case—you should test this configuration with your applications before making decisions. In particular, it’s important to be aware that MEMORY performs table-wide locks. This means that it might not be well-suited when you need to update the cache more frequently than you read it. Or in other words, using the MEMORY storage engine is a good option for data that needs to be accessed frequently and updated infrequently.

A key advantage of using the MEMORY engine is when your app needs to mix cache data with data in tables of a relational database, for example, during a single HTTP request. Imagine an app that processes user information updates. Each update might involve writing to a cache and simultaneously updating a relational record. This would require two different accesses to two different databases. With MariaDB, you can handle this in a single database using SQL. This eliminates the overhead and complexity of managing separate data stores and coordinating between them. Here’s a simplified example of how such an operation could look:

SET @data = "other data";
UPDATE users SET data = @data WHERE id = 123;
REPLACE INTO users_cache (user_id, data) VALUES (123, @data);

In this example, the user's data columns in both the users and the users_cache tables are updated in a single call to the database. Keep in mind that the MEMORY storage engine is not transactional, which is less important when you compare it with a cache that lives in a completely different database technology than your operational database anyway.

An additional and obvious but important advantage of using the MEMORY engine is that you can remove persistence-polyglot logic from your app. If your team is already familiar with SQL, MariaDB provides a seamless experience without the need to learn new syntax or juggle another technology stack.

How Does It Stack Up Against Redis?

While Redis is a powerful tool for handling simple data structures like strings, hashes, lists, sets, and sorted sets directly in-memory, MariaDB's MEMORY engine handles complex queries more naturally because it supports the full power of SQL and relational database systems. This means you can perform joins and subqueries, and combine MEMORY tables with transactional tables in the same statements (although MEMORY tables themselves are not transactional), which is not as straightforward in Redis.

Now, there’s also the question of scalability. Especially horizontal scalability. This involves adding more nodes to a system to distribute load and increase capacity without interrupting service. Both Redis and MariaDB offer robust solutions, but their mechanisms are different.

Redis achieves horizontal scalability primarily through sharding, where data is partitioned across multiple Redis instances. This can be configured manually or managed via Redis Cluster, which handles sharding and provides high availability through failover and replication processes. Redis Cluster supports up to 1000 nodes, which allows it to scale massively. This model is particularly effective for applications requiring ultra-fast operations and high throughput on simple data structures.
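
To give a feel for how keys get partitioned, here's a Python sketch of the key-to-slot mapping described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384 hash slots. This simplified version is an illustration only. It ignores hash tags (the {...} syntax), and the function name is made up.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: str) -> int:
    """Map a key to a hash slot using CRC16/XMODEM, ignoring hash tags."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS


for key in ("foo", "user:1", "session:42"):
    print(key, "->", key_slot(key))
```

Each node in the cluster owns a subset of the slots, so the slot number determines which node stores the key.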

MariaDB offers a somewhat similar approach to scalability. And you probably guessed it—this is achieved through a storage engine. The Spider engine partitions table data across multiple MariaDB nodes, treating them as one logical entity. This enables querying and updating data across various physical servers seamlessly as if they were on a single local server. The Spider engine supports SQL and transactional data operations so you can run complex queries when you need. It’s useful for large database environments where data distribution is essential for performance and management to meet the demands of large-scale applications.

Durability and Persistence

One difference to consider is that the MEMORY storage engine in MariaDB does not offer data persistence after server restarts. Data stored in MEMORY tables is volatile; it's cleared when the database restarts, much like Redis in its default configuration. If persistence is crucial, you might consider using MariaDB's Aria or InnoDB engines for caching. In fact, InnoDB has excellent performance, thanks to its buffer pool, which keeps frequently accessed data in memory and reduces the load on primary nodes.

Final Thoughts

Switching from Redis to MariaDB for caching might not suit every project, but it's a viable option for those looking to streamline their technology stack or leverage their existing SQL expertise. It provides an easy way to implement caching solutions with tools you already know and reduces the overhead of managing additional systems. Plus, for those looking for a middle ground, MariaDB can also serve as a complementary caching layer alongside Redis, taking advantage of both systems' strengths. Moreover, you can leverage Redis as a cache for MariaDB MaxScale.

Try MariaDB and set up your own in-memory cache with the MEMORY storage engine. Experience how it fits with your existing SQL knowledge. I have created a simple plain-text file with detailed instructions and code that you can run to see a cache in action using only SQL! So all you need is to connect to your MariaDB server (spin one up quickly using Docker if you don’t have one running already) and run the commands in any SQL client compatible with MariaDB (most of them are). Here you can see the demo in action:

If you have questions or want to share your experience, don’t hesitate to join the MariaDB Community Slack and let us know!

Packages for Stored Routines in MariaDB 11.4

Alejandro Duarte — Fri, 28 Jun 2024 17:37:14 +0000

MariaDB 11.4 introduced many advanced features. One that grabbed my attention is the general support of packages for stored routines. Although this was previously available by activating the Oracle compatibility mode, now the feature is available generally out-of-the-box. This will help you to significantly enhance the organization of database development within a MariaDB environment. Packages provide a modular approach to managing database logic. This addition aligns MariaDB more closely with other advanced database systems that have long utilized packages, such as Oracle, and sets it apart in that regard from other open-source relational databases that don't support packages.

Packages in MariaDB allow you to group related stored procedures, functions, variables, and other elements together into a single unit. This structure provides several benefits, including improved code organization, enhanced reusability, and simplified maintenance. Prior to this update, each stored procedure and function in MariaDB existed independently, which could lead to a cluttered schema and more complicated management of complex business logic when implemented in the database. Packages address this issue by providing a way to logically group related routines.

Again, the primary advantage of using packages is the encapsulation of related routines. For example, in an e-commerce application, operations related to order processing can be grouped into a single package called OrderProcessing. This package might include procedures like PlaceOrder, CancelOrder, and UpdateOrderStatus, as well as functions such as GetOrderDetails. This logical grouping makes the database schema more organized and the codebase easier to navigate and hence to maintain.

Creating packages

Creating and using packages in MariaDB 11.4 and later is straightforward. Simply use the CREATE PACKAGE statement to define a package and the CREATE PACKAGE BODY statement to implement the package’s routines. Here is a simplified example without the actual business logic implementation:

DELIMITER $$

CREATE OR REPLACE PACKAGE OrderProcessing
  PROCEDURE PlaceOrder(customer_id INT, product_id INT, quantity INT);
  PROCEDURE CancelOrder(order_id INT);
  FUNCTION GetOrderDetails(order_id INT) RETURNS JSON;
END;

CREATE OR REPLACE PACKAGE BODY OrderProcessing
  PROCEDURE PlaceOrder(customer_id INT, product_id INT, quantity INT)
  BEGIN
    -- Implementation code here
  END;

  PROCEDURE CancelOrder(order_id INT)
  BEGIN
    -- Implementation code here
  END;

  FUNCTION GetOrderDetails(order_id INT) RETURNS JSON
  BEGIN
    -- Implementation code here
  END;
END;

$$
DELIMITER ;

In this example, the OrderProcessing package is defined with two procedures and one function. The package body provides the implementation for these routines, encapsulating the logic related to order processing within a single package.

Calling packaged stored routines

To call a procedure or function that was defined in a package, you use the "dot notation". Here's an example of how to call the CancelOrder procedure in the OrderProcessing package:

CALL OrderProcessing.CancelOrder(7);

The same applies to functions in packages:

SELECT OrderProcessing.GetOrderDetails(7);

You can also define package-level variables and constants, which are accessible to all routines within the package. This way you can share common data without relying on global variables or passing around parameters between routines. By centrally managing shared data within the package, you reduce code duplication and minimize the risk of errors. The following is an example (don't take the variable names too seriously in this snippet of code):

DELIMITER $$

CREATE OR REPLACE PACKAGE OrderProcessing
  -- procedure list (see previous example)
END;

CREATE OR REPLACE PACKAGE BODY OrderProcessing
  -- variable declarations
  DECLARE some_count INT DEFAULT 1;
  DECLARE some_total INT DEFAULT 0;

  -- procedure definitions (see previous example)
END;

$$
DELIMITER ;

Other benefits of packages

The introduction of packages in MariaDB 11.4 also brings the possibility of better version control and modularization of code. You can now manage stored routines more effectively, making it easier to track changes and updates. This modularization is particularly beneficial in large projects, where multiple developers might be working on different parts of the database logic simultaneously. By isolating different functionalities into packages, conflicts and overlaps can be minimized, leading to a smoother development process.

Moreover, packages support forward declarations, which means that the routines can be declared before their actual implementation. This feature allows for a more flexible and structured approach to coding, where developers can outline the package interface first and then fill in the details. This separation of interface and implementation can lead to cleaner, more understandable code, facilitating collaboration and reducing the learning curve for new developers joining a project.

For developers accustomed to working with Oracle databases, the inclusion of packages in MariaDB 11.4 will feel familiar and welcome. It bridges a functional gap between MariaDB and Oracle, making it easier to transition between these platforms.

Packages vs multiple schemas

It's important to note the distinction between packages and merely using multiple database schemas. While multiple schemas can help segregate different parts of a database, they do not offer the same level of organization and encapsulation as packages. Schemas are useful for dividing distinct areas of data and logic, but they do not inherently group related procedures and functions together in a way that enhances modularity and maintainability. Packages, on the other hand, allow for a more granular and cohesive approach, grouping related logic together within the same schema. This not only simplifies the management of routines but also improves the clarity and maintainability of the code.

Try it out

As always, MariaDB continues to evolve. Try it out today by downloading the latest version of MariaDB, or if you have Docker running:

docker run --name mariadb-11.4 -e MARIADB_ROOT_PASSWORD=my-secret-pw -e MARIADB_DATABASE=mydb -e MARIADB_USER=myuser -e MARIADB_PASSWORD=mypassword -d mariadb:11.4

docker exec -it mariadb-11.4 mariadb -u myuser -p"mypassword" mydb

Using Temporary Tables in MariaDB

Alejandro Duarte — Thu, 02 May 2024 21:13:18 +0000

Let's explore how temporary tables work in MariaDB. First, we have to connect to the server. For example (use your own connection details):

mariadb -h 127.0.0.1 -u root -p"RootPassword!" --database demo

Now, just to point something out, let's create a standard (permanent) table. Here's how:

CREATE TABLE t (
    c INT
);

This table, t, will persist in the database even after we exit the client:

exit

When we reconnect and check the existing tables using SHOW TABLES;, the table t will still be listed:

mariadb -h 127.0.0.1 -u root -p"RootPassword!" --database demo

SHOW TABLES;

+----------------+
| Tables_in_demo |
+----------------+
| t              |
+----------------+

All this is pretty obvious, but now, let's recreate this table and try something different:

CREATE OR REPLACE TEMPORARY TABLE t (
    c INT
);

Notice the TEMPORARY keyword. After creating this table, if we run SHOW TABLES;, it appears in the list. We can insert data into it, query it, join it with other tables. It behaves like a normal table during the current session. However, if we exit the client, then reconnect, and perform a SHOW TABLES; again, the temporary table t will not be listed. A temporary table only exists for the duration of the session in which it was created and other sessions won't be able to see it.
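If you want to see the session scoping without a MariaDB server at hand, here is a minimal sketch in Python. It assumes SQLite (through the standard sqlite3 module) as a stand-in for MariaDB, and treats each connection as one client session. The file name demo.db and the session variables are made up for illustration, and details such as SHOW TABLES don't exist in SQLite, so take it as an approximation of the behavior described above:

```python
import os
import sqlite3
import tempfile

# Assumption: SQLite stands in for MariaDB; each connection plays the role
# of one client session against the same database file.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

session_a = sqlite3.connect(db_path)
session_b = sqlite3.connect(db_path)

# Session A creates a temporary table and writes to it.
session_a.execute("CREATE TEMPORARY TABLE t (c INT)")
session_a.execute("INSERT INTO t VALUES (42)")
session_a.commit()

rows_seen_by_a = session_a.execute("SELECT c FROM t").fetchall()

# Session B cannot see the temporary table at all.
try:
    session_b.execute("SELECT c FROM t")
    visible_to_b = True
except sqlite3.OperationalError:
    visible_to_b = False

# Closing session A discards the table; a new session starts without it.
session_a.close()
session_c = sqlite3.connect(db_path)
try:
    session_c.execute("SELECT c FROM t")
    survives_reconnect = True
except sqlite3.OperationalError:
    survives_reconnect = False

session_b.close()
session_c.close()

print(rows_seen_by_a, visible_to_b, survives_reconnect)
# [(42,)] False False
```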

Use Case for Temporary Tables

Temporary tables are quite useful for transient data operations. For instance, consider a table called products in our database:

CREATE TABLE products (
  id INT NOT NULL AUTO_INCREMENT,
  code VARCHAR(100) NOT NULL,
  name VARCHAR(250) NOT NULL,
  description TEXT DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY code (code)
);

We can create a temporary table that mimics the structure of products:

CREATE TEMPORARY TABLE t LIKE products;

We can confirm this by running:

DESCRIBE t;

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| code        | varchar(100) | NO   | UNI | NULL    |                |
| name        | varchar(250) | NO   |     | NULL    |                |
| description | text         | YES  |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

Initially, t will be empty. However, suppose we want to transfer some data from products to t. Let’s assume we only want to include products that contain the number 0 in their code:

INSERT INTO t
SELECT * FROM products
WHERE code LIKE '%0%';

After running this command, if we query the temporary table t:

SELECT * FROM t;

+----+--------+------------------+---------------------------------------------------+
| id | code   | name             | description                                       |
+----+--------+------------------+---------------------------------------------------+
|  1 | BG2024 | BugBlaster       | Eradicates software bugs with a single scan.      |
|  3 | FW001  | FireWhale        | An oversized, comprehensive firewall solution.    |
|  4 | CLD404 | CloudNine Finder | Find your way back from cloud outages and errors. |
+----+--------+------------------+---------------------------------------------------+

We see the filtered data.

Conclusion

Temporary tables offer a powerful way to handle data for temporary processing without affecting the persistent data store. They are particularly useful in scenarios where data needs to be manipulated or transformed temporarily. You can use permanent tables for this kind of data manipulation, but temporary tables are the better fit when you need automatic cleanup, reduced risk of naming conflicts, isolation and security, and resource management for query performance.

Why do We Need Databases and SQL?

Alejandro Duarte — Thu, 07 Mar 2024 16:34:01 +0000

SQL has a long and proven history. It has survived the fuss around NoSQL. And even if not perfect, it has proven to be the best available language for data. This is no surprise! The story began in the 1960s with the development of databases, an era marked by the introduction of Integrated Data Store (IDS) at General Electric. However, it was Edgar Codd's relational model, published in 1970, that revolutionized data handling. His model, which turned data into a series of tables (or more strictly, relations), has influenced database systems ever since. The 1970s also saw the birth of SQL (Structured Query Language), which became the standard language for interacting with relational databases, including MariaDB and others.

The utility of relational database systems

So, why do we need all this database stuff? Let's imagine you're building an app, maybe a simple to-do list to keep track of your daily tasks. Initially, you might think, "Why not just save each task directly to a file?" After all, your programming language has constructs and libraries to save and read data from disk. Also, implementing this seems straightforward: create a task, write it to a file; delete a task, remove it from the file. These are good points. However, as your app gains traction, users start to pile up, and suddenly you have thousands of users trying to add, delete, and modify tasks simultaneously. At this point, the simple file-based approach becomes fragile. Imagine one user is updating a task at the exact moment another tries to delete it. Or maybe two users are editing the same task at the same time. With plain files, you're likely to end up with corrupted or lost data because there's no inherent mechanism to handle such conflicts.

Databases handle these situations gracefully through the ACID properties. Essentially, a set of principles ensures that even if your app crashes midway through an update, the data remains consistent and no half-completed tasks are left hanging. Back to the to-do app example, imagine trying to move your task "Buy groceries" from pending to completed which requires also changing the last_updated property, but your app crashes right in the middle. With a relational database, it's all or nothing—either the task is marked complete and the last_updated property reflects the new time value, or it's like you never tried to update it in the first place, avoiding those incorrect half-states.
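To make the all-or-nothing idea concrete, here is a minimal sketch in Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for a relational database such as MariaDB. The tasks table, its columns, and the complete_task function are hypothetical names I made up for the to-do example, and the "crash" is simulated with an exception:

```python
import sqlite3

# Assumption: SQLite stands in for MariaDB; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, status TEXT, last_updated TEXT)"
)
conn.execute("INSERT INTO tasks VALUES (1, 'Buy groceries', 'pending', '2024-03-01')")
conn.commit()


def complete_task(connection, task_id, crash_midway):
    """Mark a task completed and refresh last_updated as one unit of work."""
    try:
        with connection:  # commits on success, rolls back on any exception
            connection.execute(
                "UPDATE tasks SET status = 'completed' WHERE id = ?", (task_id,)
            )
            if crash_midway:
                raise RuntimeError("simulated crash between the two updates")
            connection.execute(
                "UPDATE tasks SET last_updated = '2024-03-07' WHERE id = ?", (task_id,)
            )
    except RuntimeError:
        pass  # by now the transaction has been rolled back


complete_task(conn, 1, crash_midway=True)
after_crash = conn.execute("SELECT status, last_updated FROM tasks WHERE id = 1").fetchone()

complete_task(conn, 1, crash_midway=False)
after_success = conn.execute("SELECT status, last_updated FROM tasks WHERE id = 1").fetchone()

print(after_crash)    # ('pending', '2024-03-01')
print(after_success)  # ('completed', '2024-03-07')
```

After the simulated crash, the row looks as if the update was never attempted. There is no state in which the task is completed but last_updated still holds the old value.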

Now, let's consider data relationships. In your app, tasks might belong to different categories or users. In a file system, maintaining these relationships is cumbersome. You might end up with a separate file for each category or user, but then how do you quickly find all tasks across categories or ensure two users don't end up with the same task ID? Databases have the ability to manage complex relationships, making it easy to query all tasks for a specific user or category, or even more complex queries like "show me the number of completed tasks for user U grouped by category C during the last month."
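As a rough illustration of that kind of query, here is a small Python sketch. Again, it assumes SQLite as a stand-in for MariaDB, and the users, categories, and tasks tables (and the sample rows) are hypothetical. I left out the "during the last month" filter to keep it short:

```python
import sqlite3

# Assumption: SQLite stands in for MariaDB; the schema and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    status TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id),
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO categories VALUES (1, 'Home'), (2, 'Work');
INSERT INTO tasks (title, status, user_id, category_id) VALUES
    ('Buy groceries', 'completed', 1, 1),
    ('Clean kitchen', 'completed', 1, 1),
    ('Send report', 'completed', 1, 2),
    ('Book flights', 'pending', 1, 2),
    ('Fix bike', 'completed', 2, 1);
""")

# Number of completed tasks for one user, grouped by category.
completed_per_category = conn.execute("""
    SELECT categories.name, COUNT(*) AS completed
    FROM tasks
    JOIN categories ON tasks.category_id = categories.id
    WHERE tasks.user_id = ? AND tasks.status = 'completed'
    GROUP BY categories.name
    ORDER BY categories.name
""", (1,)).fetchall()

print(completed_per_category)  # [('Home', 2), ('Work', 1)]
```

Doing the same with one file per user or per category would mean writing the join, the filter, and the grouping by hand.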

Security is another biggie. In a file system, if someone gains access to your files, they have your data. Databases offer robust security features, like access controls and encryption, safeguarding your data from unauthorized eyes.

And then there's the issue of growth. Your simple to-do app might evolve into a complex enterprise project management tool over time. With a file system, every change can feel like renovating a building with people still inside. Databases are built to be flexible and scalable, meaning they're designed to grow with your needs, whether you're adding new features or handling more users.

In the end, choosing a database over a simple file system is about preparing for success while standing on solid ground. It's about ensuring that as your app grows, your data remains secure, consistent, and manageable, and your users happy. After all, no one likes losing their to-do list to a random crash or waiting forever for their tasks to load because the system is bogged down handling conflicts and searches!

A bit of history

It was Edgar Codd who proposed the Relational Model for databases and, since he was a mathematician, formalized the concepts, creating what is called Relational Algebra and Relational Calculus. All this was theoretical until IBM and others started to implement the concepts in academic and research projects. They also wanted to come up with a standard language for querying data in relational databases. One early attempt was QUEL (Querying Using the English Language), created at the University of California, Berkeley. At IBM, researchers wanted to come up with their own language and started a project, which I perceive more as a game between colleagues, called SQUARE (Specifying Queries As Relational Expressions). This led to a query language that had a scientific-like notation with subscripts and superscripts, which was hard to type on computer keyboards. To solve this, they redefined the language to use only standard characters and, in an ingenious and probably friendly mockery, called it SEQUEL. This name, however, was a trademark in the UK, which prevented them from using it. They removed the vowels in SEQUEL and boom! SQL was born. SQL became an ANSI standard in 1986 and an ISO standard in 1987.

As a curious historical remark, although its inventors had to rename SEQUEL to SQL, they continued to call it "sequel". Even today many software developers and IT professionals continue to pronounce it "sequel". The name Structured Query Language (SQL) would appear later.

The utility of SQL

SQL is a declarative language, meaning that you specify what you want to get and not how to get it. The database is in charge of doing whatever needs to be done to get the data requested. SQL isolates database complexity. A database is a complex piece of software with tons of algorithms implemented in it. These algorithms deal with different ways to get data stored on disk or in memory. Different algorithms are more efficient in different circumstances, depending on the query and the dataset.

For example, in MariaDB, a component called the query optimizer is in charge of deciding which algorithms to use given a SQL query and statistics gathered on the actual data. The query optimizer analyzes the SQL query, the data structures, the database schema, and the statistical distribution of the data. It then decides whether to use an index, which join algorithm is best, and how to sequence the operations. This process involves a remarkable amount of complexity and mathematical precision, all of which the database handles for you. As a developer, you only need to worry about constructing the query to get the data you need, and you can let the database figure out whether or not to use an index (with some datasets, not using an index could be faster), which data structures to rely on (B-trees, hash tables), and even whether to add the data to an in-memory cache, among many other things.
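You can watch an optimizer change its mind without touching the query. The sketch below is in Python and assumes SQLite as a stand-in (MariaDB has its own EXPLAIN statement with a different output format). The tasks table, the index name idx_tasks_user, and the plan_for helper are hypothetical. The exact plan text varies between SQLite versions, so I don't show it here:

```python
import sqlite3

# Assumption: SQLite stands in for MariaDB; table, index, and helper names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO tasks (user_id, title) VALUES (?, ?)",
    [(i % 100, f"Task {i}") for i in range(10_000)],
)

query = "SELECT title FROM tasks WHERE user_id = 7"


def plan_for(sql):
    """Return the optimizer's plan as one string (the last column holds the description)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


rows_before = conn.execute(query).fetchall()
plan_before = plan_for(query)

# Adding an index changes how the data is fetched, not what is asked for.
conn.execute("CREATE INDEX idx_tasks_user ON tasks (user_id)")

rows_after = conn.execute(query).fetchall()
plan_after = plan_for(query)

print(plan_before)  # a full scan of the table
print(plan_after)   # a search that mentions idx_tasks_user
print(sorted(rows_before) == sorted(rows_after))  # True
```

The SQL text is identical in both runs. Only the plan chosen by the database differs, which is the point of a declarative language.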

SQL also allows you to handle writes, that is, creating and updating data. It also allows you to define the schema of the database, or in short and over-simplifying, the tables and their column structure. In fact, there's much more that SQL allows you to do, and its functionality can be divided into four categories:

Data definition language (DDL): Creating and manipulating the schema.
Data manipulation language (DML): Inserting, updating, and deleting data from the database.
Data query language (DQL): Retrieving data from the database.
Data control language (DCL): Dealing with rights and permissions over the database and its objects.
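To tie each category to statements you have probably typed before, here is a tiny Python sketch that classifies a statement by its first keyword. The mapping is deliberately incomplete, and the sql_category function, the todo.tasks table, and the app_user account are hypothetical names used only for illustration:

```python
# Leading keywords mapped to the four categories described above.
# The keyword list is deliberately incomplete; it only covers common statements.
CATEGORY_BY_KEYWORD = {
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL",
    "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML",
    "SELECT": "DQL",
    "GRANT": "DCL", "REVOKE": "DCL",
}


def sql_category(statement):
    """Classify a SQL statement by its first keyword; returns None if unknown."""
    words = statement.strip().split(None, 1)
    if not words:
        return None
    return CATEGORY_BY_KEYWORD.get(words[0].upper())


examples = [
    "CREATE TABLE tasks (id INT PRIMARY KEY, title VARCHAR(100))",
    "INSERT INTO tasks (id, title) VALUES (1, 'Buy groceries')",
    "SELECT title FROM tasks WHERE id = 1",
    "GRANT SELECT ON todo.tasks TO 'app_user'@'localhost'",
]

for statement in examples:
    print(sql_category(statement), "-", statement)
```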

In my more than 15 years of experience in the industry, I have rarely seen the previous categories used in a work environment, with the exception of DDL to refer to activities related to handling database schema updates. These categories are useful mostly in academic circles or in teams implementing relational database management software. However, it's good to know that these terms exist and are used by others as it helps in discussions around database technology. With this in mind, let me briefly touch on one such discussion.

Some would say that developers have to deal only with DML and DQL while DDL and DCL are a concern of DBAs. In practice, this division is not so easy to make. Developers need to understand how database objects (like tables and columns) are created and how access to these objects is managed. However, it is true that developers spend most of their time writing SQL statements to modify and query data. You'll see that this book focuses on DML and DQL while explaining other categories as they are needed. On the other hand, DBAs are experts on everything database, from infrastructure and general database management to SQL query optimization and migration. A DBA is always a valuable brain to have on your team.

Conclusion

So in conclusion, databases solve real problems that application developers face, thanks to their ability to ensure data integrity through ACID properties, manage complex relationships, and provide robust security features. I only scratched the surface here, but this should be enough to give novice IT practitioners a quick refresher on the importance of relational databases and SQL.

Fast Analytics with MariaDB ColumnStore

Alejandro Duarte — Fri, 19 Jan 2024 13:33:36 +0000

Slow query times in large datasets are a common headache in database management. MariaDB ColumnStore offers a neat way out of this. It's a columnar storage engine that significantly speeds up data analytics. Typically, you can improve query performance in relational databases by adding appropriate indexes. However, maintaining indexes is hard, especially with ad-hoc queries where you don't really know where indexes are going to be needed. ColumnStore eases this pain. It's as if you had an index on each column, but without the hassle of creating and updating them. The price to pay? Well, inserts are not as fast as with InnoDB, so this is not the best option for operational/transactional databases but rather for analytical ones. Bulk inserts are very fast though.
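To get an intuition for why a columnar layout helps analytics, here is a toy Python sketch. It is only a conceptual model and not how ColumnStore is implemented. The sample appointments and the variable names are made up:

```python
from collections import Counter

# Row-oriented layout: each appointment is stored as one complete record.
rows = [
    {"id": 1, "name": "Patient_0", "status": "Completed", "doctor_id": 1},
    {"id": 2, "name": "Patient_1", "status": "Canceled", "doctor_id": 2},
    {"id": 3, "name": "Patient_2", "status": "Completed", "doctor_id": 1},
    {"id": 4, "name": "Patient_3", "status": "Scheduled", "doctor_id": 3},
]

# Column-oriented layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: the aggregation has to walk every full record.
counts_from_rows = Counter(row["status"] for row in rows)

# Column layout: the aggregation touches the 'status' column only.
counts_from_columns = Counter(columns["status"])

print(counts_from_columns)
print(counts_from_rows == counts_from_columns)  # True
```

An analytical query that aggregates over one or two columns reads far less data in the second layout. Inserting a single appointment, on the other hand, means appending to every column list, which hints at why single-row inserts are slower.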

There's plenty of online documentation about ColumnStore, so I won't go through all the details on how it works or how to deploy it in production. Instead, in this article, I'll show you how to try MariaDB ColumnStore on your computer using Docker.

Prerequisites

You'll need:

The mariadb command line tool
Docker

Setting up MariaDB ColumnStore

Run a container with MariaDB + ColumnStore:

docker run -d -p 3307:3306 -e PM1=mcs1 --hostname=mcs1 --name mcs1 mariadb/columnstore

This command runs a new Docker container using the official ColumnStore image, with several specified options:

docker run: Starts a new Docker container.
-d: Runs the container in detached mode (in the background).
-p 3307:3306: Maps port 3307 on the host (your computer) to port 3306 inside the container. This makes the database accessible on port 3307 on the host machine.
-e PM1=mcs1: Sets the PM1 environment variable, which specifies the primary database node (mcs1).
--hostname=mcs1: Sets the hostname of the container to mcs1.
--name mcs1: Names the container mcs1.
mariadb/columnstore: Specifies the Docker image to use, in this case, an image for MariaDB with the ColumnStore storage engine.

Provision ColumnStore:

docker exec -it mcs1 provision mcs1

The command docker exec is used to interact with a running Docker container. This is what each option does:

docker exec: Executes a command in a running container.
-it: This option ensures the command is run in interactive mode with a terminal.
mcs1 (first occurrence): This is the name of the Docker container in which the command is to be executed.
provision mcs1: This is the specific command being executed inside the container. provision is a script included in the Docker image that initializes and configures the MariaDB ColumnStore environment within the container. The argument mcs1 is passed to the provision command to specify the host for the MariaDB server within the Docker container.

Connect to the MariaDB server using the default credentials defined in the MariaDB ColumnStore Docker image:

mariadb -h 127.0.0.1 -P 3307 -u admin -p'C0lumnStore!'

Check that ColumnStore is available as a storage engine by running the following SQL statement:

SHOW ENGINES;

Setting up a demo database

Create the operations database and its InnoDB tables:

CREATE DATABASE operations;

CREATE TABLE operations.doctors(
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL CHECK(TRIM(name) != '')
) ENGINE=InnoDB;

CREATE TABLE operations.appointments(
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL CHECK(TRIM(name) != ''),
    phone_number VARCHAR(15) NOT NULL CHECK(phone_number RLIKE '[0-9]+'),
    email VARCHAR(254) NOT NULL CHECK(TRIM(email) != ''),
    time DATETIME NOT NULL,
    reason ENUM('Consultation', 'Follow-up', 'Preventive', 'Chronic') NOT NULL,
    status ENUM ('Scheduled', 'Canceled', 'Completed', 'No Show'),
    doctor_id BIGINT UNSIGNED NOT NULL,
    CONSTRAINT fk_appointments_doctors FOREIGN KEY (doctor_id) REFERENCES operations.doctors(id)
) ENGINE=InnoDB;

Create the analytics database and its ColumnStore table:

CREATE DATABASE analytics;

CREATE TABLE analytics.appointments(
    id BIGINT UNSIGNED NOT NULL,
    name VARCHAR(200) NOT NULL,
    phone_number VARCHAR(15) NOT NULL,
    email VARCHAR(254) NOT NULL,
    time DATETIME NOT NULL,
    reason VARCHAR(15) NOT NULL,
    status VARCHAR(10) NOT NULL,
    doctor_id BIGINT UNSIGNED NOT NULL
) ENGINE=ColumnStore;

You can use the same database (or schema, they are synonyms in MariaDB) for both the InnoDB and ColumnStore tables if you prefer. Use a different name for the ColumnStore table if you opt for this alternative.

Inserting demo data

Insert a few doctors:

INSERT INTO operations.doctors(name)
VALUES ('Maria'), ('John'), ('Jane');

Create a new file with the name test_data_insert.py with the following content:

import random
import os
import subprocess
from datetime import datetime, timedelta

# Function to generate a random date within a given range
def random_date(start, end):
    return start + timedelta(days=random.randint(0, int((end - start).days)))

# Function to execute a given SQL command using MariaDB
def execute_sql(sql):
    # Write the SQL command to a temporary file
    with open("temp.sql", "w") as file:
        file.write(sql)
    # Execute the SQL command using the MariaDB client
    subprocess.run(["mariadb", "-h", "127.0.0.1", "-P", "3307", "-u", "admin", "-pC0lumnStore!", "-e", "source temp.sql"])
    # Remove the temporary file
    os.remove("temp.sql")

print("Generating and inserting data...")

# Total number of rows to be inserted
total_rows = 4000000
# Number of rows to insert in each batch
batch_size = 10000

# Possible values for the 'reason' column and their associated weights for random selection
reasons = ["Consultation", "Follow-up", "Preventive", "Chronic"]
reason_weights = [0.5, 0.15, 0.25, 0.1]

# Possible values for the 'status' column and their associated weights for random selection
statuses = ["Scheduled", "Canceled", "Completed", "No Show"]
status_weights = [0.1, 0.15, 0.7, 0.05]

# Possible values for the 'doctor_id' column and their associated weights for random selection
doctors = [1, 2, 3]
doctors_weights = [0.4, 0.35, 0.25]

# List of patient names
names = [f"Patient_{i}" for i in range(total_rows)]

# Insert data in batches
for batch_start in range(0, total_rows, batch_size):
    batch_values = []

    # Generate data for each row in the batch
    for i in range(batch_start, min(batch_start + batch_size, total_rows)):
        name = names[i]
        phone_number = f"{random.randint(100, 999)}-{random.randint(100, 999)}-{random.randint(1000, 9999)}"
        email = f"patient_{i}@example.com"
        time = random_date(datetime(2023, 1, 1), datetime(2024, 1, 1)).strftime("%Y-%m-%d %H:%M:%S")
        reason = random.choices(reasons, reason_weights)[0]
        status = random.choices(statuses, status_weights)[0]
        doctor_id = random.choices(doctors, doctors_weights)[0]

        # Append the generated row to the batch
        batch_values.append(f"('{name}', '{phone_number}', '{email}', '{time}', '{reason}', '{status}', {doctor_id})")

    # SQL command to insert the batch of data into the 'appointments' table
    sql = "USE operations;\nINSERT INTO appointments (name, phone_number, email, time, reason, status, doctor_id) VALUES " + ", ".join(batch_values) + ";"
    # Execute the SQL command
    execute_sql(sql)
    # Print progress
    print(f"Inserted up to row {min(batch_start + batch_size, total_rows)}")

print("Data insertion complete.")

Insert 4 million appointments by running the Python script:

python3 test_data_insert.py

Populate the ColumnStore table by connecting to the database and running:

INSERT INTO analytics.appointments (
    id,
    name,
    phone_number,
    email,
    time,
    reason,
    status,
    doctor_id
)
SELECT
    appointments.id,
    appointments.name,
    appointments.phone_number,
    appointments.email,
    appointments.time,
    appointments.reason,
    appointments.status,
    appointments.doctor_id
FROM operations.appointments;

Run cross-engine SQL queries

MariaDB ColumnStore is designed to run in a cluster of multiple servers. That is where you see massive performance gains in analytical queries. However, we can also see this in action with the single-node setup of this article.

Run the following query and pay attention to the time it needs to complete (make sure it queries the operations database):

SELECT doctors.name, status, COUNT(*) AS count
FROM operations.appointments -- use the InnoDB table
JOIN operations.doctors ON doctor_id = doctors.id
WHERE status IN (
    'Scheduled',
    'Canceled',
    'Completed',
    'No Show'
)
GROUP BY doctors.name, status
ORDER BY doctors.name, status;

On my machine, it took around 3 seconds.

Now modify the query to use the ColumnStore table instead (in the analytics database):

SELECT doctors.name, status, COUNT(*) AS count
FROM analytics.appointments -- use the ColumnStore table
JOIN operations.doctors ON doctor_id = doctors.id
WHERE status IN (
    'Scheduled',
    'Canceled',
    'Completed',
    'No Show'
)
GROUP BY doctors.name, status
ORDER BY doctors.name, status;

It takes less than a second. Of course, you can speed up the first query by adding an index in this simplistic example, but imagine the situation in which you have hundreds of tables—it will become harder and harder to manage indexes. ColumnStore removes this complexity.
