<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alejandro Duarte</title>
    <description>The latest articles on Forem by Alejandro Duarte (@alejandro_du).</description>
    <link>https://forem.com/alejandro_du</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F473376%2Fdc3a2dfd-0d2e-42b5-997f-ccf973d49199.jpg</url>
      <title>Forem: Alejandro Duarte</title>
      <link>https://forem.com/alejandro_du</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alejandro_du"/>
    <language>en</language>
    <item>
      <title>The "bus factor" risk in MongoDB, MariaDB, Redis, MySQL, PostgreSQL, and SQLite</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Tue, 03 Mar 2026 15:18:25 +0000</pubDate>
      <link>https://forem.com/alejandro_du/the-bus-factor-risk-in-mongodb-mariadb-redis-mysql-postgresql-and-sqlite-24dh</link>
      <guid>https://forem.com/alejandro_du/the-bus-factor-risk-in-mongodb-mariadb-redis-mysql-postgresql-and-sqlite-24dh</guid>
      <description>&lt;p&gt;Ever wonder what would happen to an open source database project in case its main developers "get hit by a bus"? Or less dramatically, if they leave the project completely. That's what the "bus factor" (also called "truck factor") measures: how many people would have to disappear before no one left knows how to fix or update specific parts of the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bus factor ranking
&lt;/h2&gt;

&lt;p&gt;I’ve been messing around with a tool called the &lt;a href="https://github.com/JetBrains-Research/bus-factor-explorer" rel="noopener noreferrer"&gt;Bus Factor Explorer&lt;/a&gt; by JetBrains to explore the associated risk in some of the most popular open source databases. I looked at six of the big ones to see where they stand. Here’s the current baseline (March 2026) according to this tool:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;th&gt;Bus Factor (higher is better)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MongoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MariaDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Redis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MySQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQLite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, for &lt;strong&gt;MongoDB&lt;/strong&gt; to "hit a wall", 7 devs would have to leave the project. For &lt;strong&gt;MySQL&lt;/strong&gt;, &lt;strong&gt;PostgreSQL&lt;/strong&gt;, and &lt;strong&gt;SQLite&lt;/strong&gt;, if 2 devs leave, the project is at risk of stalling. Of course, this is just one factor; there are other factors that influence the continuation of open source projects, but this is still an interesting insight you can take into consideration when choosing a database.&lt;/p&gt;

&lt;p&gt;The tool has a categorization according to the project's bus factor (a baseline that can be adjusted in the tool if you want):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;OK&lt;/strong&gt;: 5 or more (MongoDB, MariaDB, Redis)&lt;/li&gt;
&lt;li&gt;⚠️ &lt;strong&gt;Low&lt;/strong&gt;: 2 to 4 (MySQL, PostgreSQL, SQLite)&lt;/li&gt;
&lt;li&gt;🔴 &lt;strong&gt;Dangerous&lt;/strong&gt;: 0 or 1 (N/A)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Simulating top contributors leaving the projects
&lt;/h2&gt;

&lt;p&gt;I also used the tool to see what happens to the root directories (the functional parts of the code) if you start unchecking the top contributors to each project. Specifically, I wanted to see the proportion of directories that would be "lost" if the top one or two contributors left the project. I ignored the individual files at the root level. The results look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;th&gt;Directories (total)&lt;/th&gt;
&lt;th&gt;Directories lost (1 dev gone)&lt;/th&gt;
&lt;th&gt;Directories lost (2 devs gone)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Redis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0 (0.0%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0 (0.0%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MariaDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2 (6.7%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5 (16.7%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MongoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;28&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1 (3.6%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7 (25.0%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0 (0.0%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5 (83.3%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MySQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1 (3.3%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30 (100.0%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQLite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5 (45.5%)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11 (100.0%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here, a lower number is better (fewer directories impacted). So, &lt;strong&gt;Redis&lt;/strong&gt; is pretty strong in this simulation as no directories would take a hit. On the other end, we have &lt;strong&gt;MySQL&lt;/strong&gt; and &lt;strong&gt;SQLite&lt;/strong&gt; with 100% of their directories potentially left unmaintained if the top two developers leave. &lt;strong&gt;PostgreSQL&lt;/strong&gt; would lose 83% of its directories in a similar situation. These were the big surprises to me, although all this is aligned with the fact that they have a low bus factor (high risk) of 2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MariaDB&lt;/strong&gt; did pretty well, especially when compared to &lt;strong&gt;MySQL&lt;/strong&gt;, which supports what I have been trying to say about how &lt;em&gt;MariaDB is much more than a fork of MySQL&lt;/em&gt; in my articles and talks (see &lt;a href="https://programmingbrain.com/2025/01/mariadb-does-not-depend-on-mysql" rel="noopener noreferrer"&gt;this&lt;/a&gt;, and &lt;a href="https://www.youtube.com/watch?v=zj02QzbbN8o" rel="noopener noreferrer"&gt;this&lt;/a&gt;, if you are curious).&lt;/p&gt;

&lt;h2&gt;
  
  
  Other important factors when evaluating open source projects
&lt;/h2&gt;

&lt;p&gt;You should not rely merely on "bus factor" risk assessments. This is useful in mission-critical situations or when comparing projects that are even in other metrics that you might be using to compere and evaluate them. Here are some other aspects to look at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Corporate Backing:&lt;/strong&gt; Does the project have large companies behind it? Even if a lead developer leaves, the company is likely to hire someone else to fill the gap, ensuring continuity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Community Activity:&lt;/strong&gt; Look at the number of active issues, pull requests, and discussions. A buzzing community can often sustain a project better than a few silent experts.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Documentation Quality:&lt;/strong&gt; Comprehensive documentation ensures that knowledge isn't locked in someone's head, making it easier for new contributors to onboard.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Licensing:&lt;/strong&gt; Ensure the license fits your use case, as this impacts the project's long-term viability and adoption. For example, a GPL-licensed project doesn't allow anybody to distribute a closed-source application based on it, hence the project is future-proof as far as licensing goes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Investigate your projects and its dependencies
&lt;/h2&gt;

&lt;p&gt;I investigated the bus factor of other open source projects that I'm a big fan of, like MyBatis, Vaadin, and others, but I'll let you do your own findings with the &lt;a href="https://github.com/JetBrains-Research/bus-factor-explorer" rel="noopener noreferrer"&gt;tool&lt;/a&gt;. Let me know if you find something interesting!&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: Raw Impact Data
&lt;/h2&gt;

&lt;p&gt;This appendix lists every directory &lt;strong&gt;and file&lt;/strong&gt; at the root level that drops to a &lt;strong&gt;Bus Factor of 0&lt;/strong&gt; in the simulations.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. MongoDB (Baseline: 7 - OK)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;With 1 dev gone:&lt;/strong&gt; &lt;code&gt;.bazelrc.local.example&lt;/code&gt;, &lt;code&gt;.gitattributes&lt;/code&gt;, &lt;code&gt;.prettierignore&lt;/code&gt;, &lt;code&gt;MODULE.bazel&lt;/code&gt;, &lt;code&gt;distsrc&lt;/code&gt;, &lt;code&gt;etc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;With 2 devs gone:&lt;/strong&gt; All of the above, plus &lt;code&gt;bazel&lt;/code&gt;, &lt;code&gt;buildscripts&lt;/code&gt;, &lt;code&gt;jstests&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. MariaDB (Baseline: 5 - OK)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;With 1 dev gone:&lt;/strong&gt; &lt;code&gt;VERSION&lt;/code&gt;, &lt;code&gt;config.h.cmake&lt;/code&gt;, &lt;code&gt;CMakeLists.txt&lt;/code&gt;, &lt;code&gt;libmysqld&lt;/code&gt;, &lt;code&gt;debian&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;With 2 devs gone:&lt;/strong&gt; &lt;code&gt;VERSION&lt;/code&gt;, &lt;code&gt;config.h.cmake&lt;/code&gt;, &lt;code&gt;CMakeLists.txt&lt;/code&gt;, &lt;code&gt;libmysqld&lt;/code&gt;, &lt;code&gt;debian&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Redis (Baseline: 5 - OK)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;With 1 dev gone:&lt;/strong&gt; &lt;em&gt;None&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;With 2 devs gone:&lt;/strong&gt; &lt;em&gt;None&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. MySQL (Baseline: 2 - Low)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;With 1 dev gone:&lt;/strong&gt; &lt;code&gt;mysql/mysql-server&lt;/code&gt; (root meta), &lt;code&gt;iwyu_mappings.imp&lt;/code&gt;, &lt;code&gt;config.h.cmake&lt;/code&gt;, &lt;code&gt;CMakeLists.txt&lt;/code&gt;, &lt;code&gt;libmysql&lt;/code&gt;, &lt;code&gt;mysys&lt;/code&gt;, &lt;code&gt;client&lt;/code&gt;, &lt;code&gt;components&lt;/code&gt;, &lt;code&gt;strings&lt;/code&gt;, &lt;code&gt;mysql-test&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;With 2 devs gone:&lt;/strong&gt; Virtually all project directories (36 total), including &lt;code&gt;storage&lt;/code&gt;, &lt;code&gt;sql&lt;/code&gt;, &lt;code&gt;router&lt;/code&gt;, &lt;code&gt;extra&lt;/code&gt;, &lt;code&gt;plugin&lt;/code&gt;, &lt;code&gt;utilities&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. PostgreSQL (Baseline: 2 - Low)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;With 1 dev gone:&lt;/strong&gt; &lt;code&gt;COPYRIGHT&lt;/code&gt;, &lt;code&gt;meson_options.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;With 2 devs gone:&lt;/strong&gt; Virtually all functional directories (23 total), including &lt;code&gt;src&lt;/code&gt;, &lt;code&gt;contrib&lt;/code&gt;, &lt;code&gt;doc&lt;/code&gt;, &lt;code&gt;config&lt;/code&gt;, &lt;code&gt;interfaces&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. SQLite (Baseline: 2 - Low)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;With 1 dev gone:&lt;/strong&gt; &lt;code&gt;VERSION&lt;/code&gt;, &lt;code&gt;manifest.tags&lt;/code&gt;, &lt;code&gt;make.bat&lt;/code&gt;, &lt;code&gt;manifest.uuid&lt;/code&gt;, &lt;code&gt;configure&lt;/code&gt;, &lt;code&gt;README.md&lt;/code&gt;, &lt;code&gt;mptest&lt;/code&gt;, &lt;code&gt;main.mk&lt;/code&gt;, &lt;code&gt;Makefile.msc&lt;/code&gt;, &lt;code&gt;doc&lt;/code&gt;, &lt;code&gt;manifest&lt;/code&gt;, &lt;code&gt;tool&lt;/code&gt;, &lt;code&gt;src&lt;/code&gt;, &lt;code&gt;ext&lt;/code&gt;, &lt;code&gt;test&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;With 2 devs gone:&lt;/strong&gt; Virtually all project boxes (28 total), including root-level scripts and all core subdirectories.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mysql</category>
      <category>postgres</category>
      <category>mongodb</category>
      <category>mariadb</category>
    </item>
    <item>
      <title>MariaDB doesn't depend on MySQL</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Wed, 21 Jan 2026 18:02:51 +0000</pubDate>
      <link>https://forem.com/alejandro_du/mariadb-doesnt-depend-on-mysql-4ok3</link>
      <guid>https://forem.com/alejandro_du/mariadb-doesnt-depend-on-mysql-4ok3</guid>
      <description>&lt;p&gt;When MariaDB was first announced in 2009 by &lt;a href="https://en.wikipedia.org/wiki/Michael_Widenius" rel="noopener noreferrer"&gt;Michael "Monty" Widenius&lt;/a&gt;, it was positioned as a "fork of MySQL". I think that was a Bad Idea™. Okay, maybe it wasn't a bad idea as such. After all, MariaDB indeed is a fork of MySQL. But what is a &lt;em&gt;fork&lt;/em&gt; in the software sense, and how is this reflected in MariaDB? A fork is a software project that takes the source code of another project and continues development independently from the original. Forks often start by maintaining compatibility with their parent project, but they can evolve to become detached with their own features, architecture, bug tracker, mailing list, development philosophy, and community. This is the case of MariaDB, with the addition that it continues to be highly compatible with old MySQL versions and with its current ecosystem at large.&lt;/p&gt;

&lt;p&gt;Before we dig into it, let me clarify that I like MySQL. It was the very first database that I installed during my university time, and I have used it in hobby as well as production projects for a long time. So, why did I affirm that positioning MariaDB as a fork of MySQL was a bad idea? In short, because MariaDB doesn't depend on MySQL. The idea of defining MariaDB merely as a fork of MySQL leads to misconceptions around its future. Take as an example this old comment on &lt;a href="https://news.ycombinator.com/item?id=4401796" rel="noopener noreferrer"&gt;Hacker News&lt;/a&gt; which refers to the phrase "RIP Open Source MySQL":&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Forgive my ignorance, but doesn't this harm MySQL forks as well? Since the test cases are unavailable from now on, say for example they wanted to reimplement a certain feature, isn't it much harder for them to validate that their implementation works correctly?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I sympathize with the author of this comment. We were unintentionally misled by the "fork of MySQL" slogan. I encounter this kind of lack of clarity more often than I would like. But the reality is that the development of MariaDB has been independent for many years already. MariaDB developers don't wait for MySQL to implement features, test cases, fix bugs, or &lt;a href="https://mariadb.com/resources/blog/15-reasons-why-developers-and-dbas-love-mariadb-server/" rel="noopener noreferrer"&gt;innovate&lt;/a&gt;. They write their own tests, create their own features, and solve problems in their own way. When Oracle changes something in MySQL or restricts access to a component, that has no meaningful impact on MariaDB's roadmap because the projects have diverged so significantly that they're essentially different database systems that happen to share some common ancestry, be highly compatible (you can use MySQL connectors and tools with MariaDB), and are &lt;a href="https://www.youtube.com/watch?v=zj02QzbbN8o&amp;amp;t=725s" rel="noopener noreferrer"&gt;named after Monty's children&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So, how come projects like Ubuntu “depend” on upstream projects (e.g. Debian) and others like MariaDB don't? In his paper &lt;a href="https://dwheeler.com/oss_fs_why.html#forking" rel="noopener noreferrer"&gt;Why Open Source Software / Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers!&lt;/a&gt;, David A. Wheeler (Director of Open Source Supply Chain Security at the Linux Foundation), identifies four potential outcomes for software fork attempts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The death of the fork:&lt;/strong&gt; The most common outcome, since keeping a software project alive requires considerable effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A re-merging of the fork with the original:&lt;/strong&gt; Both software projects rejoin each other.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The death of the original:&lt;/strong&gt; Users and developers move to the new younger project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Successful branching:&lt;/strong&gt; Both find success with different developers and end users.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For years, the MySQL-MariaDB situation was clearly a &lt;em&gt;successful branching&lt;/em&gt; where both projects found new homes. One in Oracle, the other in the new &lt;a href="https://mariadb.org" rel="noopener noreferrer"&gt;MariaDB Foundation&lt;/a&gt; / &lt;a href="https://mariadb.com" rel="noopener noreferrer"&gt;MariaDB plc&lt;/a&gt; duo. Contrary to what many would have thought, Oracle invested in MySQL and continued its development in the open despite having its own close-source relational database. For a period of time, MariaDB kept merging MySQL code commit by commit. However, this changed in 2014 when &lt;a href="https://launchpad.net/mysql-server" rel="noopener noreferrer"&gt;Oracle stopped publishing MySQL's source code on Launchpad&lt;/a&gt;. Even though MariaDB still merges changes from InnoDB, this marked a clear point of divergence in codebases.&lt;/p&gt;

&lt;p&gt;Recent (and not so recent) findings and events show that Oracle has slowed down at least on the innovation front and at worst on the maintenance side. In his article &lt;a href="https://optimizedbyotto.com/post/reasons-to-stop-using-mysql/" rel="noopener noreferrer"&gt;Stop using MySQL in 2026, it is not true open source&lt;/a&gt;, Otto Kekäläinen (former Software Development Manager at AWS), shows that "the number of git commits on github.com/mysql/mysql-server has been significantly declining in 2025." He also highlights the steep decrease in MySQL's popularity according to &lt;a href="https://db-engines.com/en/ranking_trend" rel="noopener noreferrer"&gt;DB-Engines&lt;/a&gt;, as well as the reported "degraded performance with newer MySQL versions." Are we witnessing a "death of the original" here? I don't know.&lt;/p&gt;

&lt;p&gt;In light of all this, many developers are starting to evaluate migration strategies to other relational databases with MariaDB and TiDB being two of the most attractive options. According to Otto Kekäläinen, "TiDB only really shines with larger distributed setups, so for the vast majority of regular small and mid-scale applications currently using MySQL, the most practical solution is probably to just switch to MariaDB." How about the elephant in the room, you might ask? PostgreSQL is a database with tons of forks and third-party extensions that you can download which makes it popular not only due to its features but the sheer number of companies marketing their PostgreSQL flavor online. For applications currently using MySQL, migrating to PostgreSQL requires a lot of work including SQL code and connector migrations. Two tasks that can be close to zero-effort with MariaDB. Check for example &lt;a href="https://www.youtube.com/watch?v=ZvrP_X9x4eE" rel="noopener noreferrer"&gt;this crazy live broadcast&lt;/a&gt; where Cantamen (Germany's leading car-sharing service provider) migrates from MySQL to MariaDB with the help of Monty himself.&lt;/p&gt;

&lt;p&gt;Let’s get back to my highly opinionated introductory statement... MariaDB is a—now we have learned—detached fork of MySQL, and, to be fair, it has also been positioned as a "MySQL replacement" which is something very accurate to state. I'm glad to see the "replacement" slogan more and more often as opposed to the "fork" one. I personally suggested to &lt;a href="https://en.wikipedia.org/wiki/Kaj_Arn%C3%B6" rel="noopener noreferrer"&gt;Kaj Arnö&lt;/a&gt; (Executive Chairman at the MariaDB Foundation) going with something even stronger like "MariaDB fixes MySQL". That's a bit too strong, perhaps. I'm glad they softened it to &lt;a href="https://mariadb.org/mariadb-is-the-future-of-mysql/" rel="noopener noreferrer"&gt;"MariaDB is the Future of MySQL"&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>database</category>
      <category>mysql</category>
      <category>mariadb</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Keyword vs. semantic search with AI</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Wed, 29 Oct 2025 15:22:31 +0000</pubDate>
      <link>https://forem.com/alejandro_du/keyword-vs-semantic-search-with-ai-c03</link>
      <guid>https://forem.com/alejandro_du/keyword-vs-semantic-search-with-ai-c03</guid>
      <description>&lt;p&gt;When building search for an application, you typically face two &lt;strong&gt;broad&lt;/strong&gt; approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traditional keyword-based search — match words exactly or with simple variants.&lt;/li&gt;
&lt;li&gt;Semantic (or vector) search — match meaning or context using AI embeddings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's also a hybrid approach, but I will let that for a future article. Instead, in this post I’ll walk you through how the two brad approaches work in Python using &lt;a href="https://mariadb.com/database-topics/ai/" rel="noopener noreferrer"&gt;MariaDB&lt;/a&gt; and an AI embedding model, highlight where they differ, and show code that you can adapt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The key components
&lt;/h2&gt;

&lt;p&gt;For this example, I used &lt;a href="https://mariadb.com/products/cloud/" rel="noopener noreferrer"&gt;MariaDB Cloud&lt;/a&gt; to spin up a free serverless database. Within seconds I had a free instance ready. I grabbed the host/user/password details, connected with VS Code, created a database called &lt;code&gt;demo&lt;/code&gt;, created a &lt;code&gt;products&lt;/code&gt; table and loaded ~500 rows of product names via &lt;code&gt;LOAD DATA LOCAL INFILE&lt;/code&gt;. This is an extremely small dataset, but it's enough for learning and experimentation.&lt;/p&gt;

&lt;p&gt;Then I built a small Python + FastAPI app. First I implemented a simple keyword search (by product name) endpoint using full-text index, then I implemented semantic (vector) search using AI-generated vector embeddings + MariaDB’s vector support. You can see the whole process in &lt;a href="https://www.youtube.com/watch?v=B8XGe4KIv8o" rel="noopener noreferrer"&gt;this video&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keyword-based search: simple and familiar
&lt;/h2&gt;

&lt;p&gt;For keyword search I used a &lt;a href="https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/optimization-and-indexes/full-text-indexes/full-text-index-overview" rel="noopener noreferrer"&gt;full-text index&lt;/a&gt; on the &lt;code&gt;name&lt;/code&gt; column of of the &lt;code&gt;products&lt;/code&gt; table. With this index in place, I could search by product name using this SQL query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;AGAINST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I exposed this functionality using a FastAPI endpoint as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/products/text-search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT name FROM products ORDER BY MATCH(name) AGAINST(?) LIMIT 10;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nf"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs fast.&lt;/li&gt;
&lt;li&gt;Works well when users type exact or close terms.&lt;/li&gt;
&lt;li&gt;Uses built-in SQL features (no external AI model needed).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Misses synonyms, context or related meaning.&lt;/li&gt;
&lt;li&gt;Doesn’t understand intent (if user types “running shoes”, a strict keyword search may miss “jogging trainers” or “sneakers”).&lt;/li&gt;
&lt;li&gt;Quality depends heavily on the wording.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my demo, the endpoint returned several products that were not relevant to “running shoes”.&lt;/p&gt;

&lt;h2&gt;
  
  
  Semantic (vector) search: matching meaning
&lt;/h2&gt;

&lt;p&gt;To &lt;a href="https://go.mariadb.com/25Q3-WBN-GLBL-OSSG-2025-09-24-AIVectorsearch_Registration-LP.html" rel="noopener noreferrer"&gt;go beyond keywords&lt;/a&gt; I implemented a second endpoint:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I use an AI embedding model (Google Generative AI via LangChain) to convert each product name into a high-dimensional vector.&lt;/li&gt;
&lt;li&gt;Store those vectors in MariaDB with the vector &lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/mariadb" rel="noopener noreferrer"&gt;integration for LangChain&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;At query time, embed the user’s search phrase into a vector (using exactly the same AI embedding model of the previous step), then perform a similarity search with the highly performant HNSW algorithm in MariaDB (e.g., top 10 nearest vectors) and return the corresponding products.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s how I implemented the ingestion endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/products/ingest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ingest_products&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT name FROM products;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nf"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Products ingested successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this is the semantic search endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/products/semantic-search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LangChain integration for MariaDB makes the whole process extremely easy. The integration creates two tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;langchain_collection&lt;/code&gt;: Each row represents a related set of vector embeddings. I have only one in this demo which corresponds to the product names.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;langchain_embedding&lt;/code&gt;: The vector embeddings. Each vector belongs to a collection (many-to-one to &lt;code&gt;langchain_collection&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I ran the semantic search endpoint with the same query “running shoes”, the results felt much more relevant: they included products that didn’t match “running” or “shoes” literally but were semantically close.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keyword vs. semantic — when to use which
&lt;/h2&gt;

&lt;p&gt;Here’s a quick comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Keyword search&lt;/td&gt;
&lt;td&gt;Quick to set up, uses SQL directly&lt;/td&gt;
&lt;td&gt;Limited to literal term matching, less clever&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic search&lt;/td&gt;
&lt;td&gt;Matches meaning and context, more flexible&lt;/td&gt;
&lt;td&gt;Requires embedding model + vector support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pick keyword search&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your search domain is small and predictable or, obviously, you need exact keyword match.&lt;/li&gt;
&lt;li&gt;Users know exactly what they’re looking for (specific codes, exact names).&lt;/li&gt;
&lt;li&gt;You want minimal dependencies and complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick semantic search&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to handle synonyms, similar concepts, user intent.&lt;/li&gt;
&lt;li&gt;The dataset or domain has natural language variation.&lt;/li&gt;
&lt;li&gt;You’re willing to integrate an embedding model and manage vector storage/indexing. &lt;a href="https://mariadb.com/docs/server/reference/sql-structure/vectors/vector-overview" rel="noopener noreferrer"&gt;MariaDB helps with this&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many real-world apps you’ll use a hybrid: start with keyword search, and for higher-value queries or when exact match fails, fall back to semantic search. Or even mix the two via hybrid search. &lt;a href="https://mariadb.com/docs/server/reference/sql-structure/vectors/optimizing-hybrid-search-query-with-reciprocal-rank-fusion-rrf" rel="noopener noreferrer"&gt;MariaDB helps with this too&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How simple the integration can be
&lt;/h2&gt;

&lt;p&gt;In my demo I triggered vector ingestion via a POST endpoint (&lt;code&gt;/ingest&lt;/code&gt;). That reads all product names, computes embeddings, and writes them to MariaDB. One line of code (via LangChain + MariaDB integration) handled the insertion of ~500 rows of vectors.&lt;/p&gt;

&lt;p&gt;Once vectors are stored, adding a semantic search endpoint was just a few lines of code. The MariaDB vector support hidden most of the complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The source code
&lt;/h2&gt;

&lt;p&gt;You can find the code on &lt;a href="https://github.com/mariadb-developers/langchain-fastapi-mariadb-webinar-demo" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. I have one simplistic easy-to-follow program in the &lt;strong&gt;webinar-main.py&lt;/strong&gt; and a more elaborate one with good practices in &lt;strong&gt;backend.py&lt;/strong&gt;. Feel free to clone the repository, modify it, experiment with your own datasets, and let us know if there's anything you'd like to see in the LangChain integration for MariaDB.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/B8XGe4KIv8o"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>database</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>What can go wrong when using database transactions?</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Mon, 06 Oct 2025 21:00:00 +0000</pubDate>
      <link>https://forem.com/alejandro_du/what-can-go-wrong-when-using-database-transactions-40p6</link>
      <guid>https://forem.com/alejandro_du/what-can-go-wrong-when-using-database-transactions-40p6</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is an excerpt from an unedited version of my book &lt;a href="https://mariadbfordevelopers.com/" rel="noopener noreferrer"&gt;MariaDB for Developers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To this point, we have understood the concept of atomicity—either all operations succeed or none do. What can go wrong? It seems like we are covered. And we are. Until we introduce concurrency in our system. MariaDB is one of the most highly performant database systems and tries to parallelize processing to increase throughput. Parallelizing means that MariaDB can execute transactions from different sessions at the same time by interleaving operations from different transactions instead of waiting for one to finish before starting the next. Each transaction has its own sequence of operations, but MariaDB executes them in overlapping order. Figure 8-2 shows two transactions (A and B) and multiple database operations interleaved through time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr46egxke1st5kbyfaimy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr46egxke1st5kbyfaimy.png" alt="Figure 8-2" width="800" height="264"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 8-2: Interleaving database operations for parallelism.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This interleaving allows MariaDB to use CPU and I/O resources more efficiently than without parallelism. This however opens the door to subtle problems when the parallel transactions read and write overlapping data. Let's study some of these problems known as &lt;em&gt;concurrency phenomena&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dirty reads
&lt;/h2&gt;

&lt;p&gt;Friday afternoon and we've got a winner! Our to-do application—which by chapter 6 became more of a project management tool than a to-do app—is so central to the business, that prizes are given to users who excelled at reporting bugs or helping its development. Our to-do application allows the HR team to grant prizes to users and this use case involves reducing the quantity of the awarded prize in the prizes table.&lt;/p&gt;

&lt;p&gt;Janet and Moe, both from HR, are using our to-do app at the same time. Janet is about to grant today's prize (named "Bagelers" in our database), while Moe is viewing a dashboard that shows an overview of the prize inventory. Jane selects the winner and the prize, and clicks on "Grant prize". Our to-do app starts a new transaction that decreases the quantity for Bagelers from 8 to 7. At that moment, Moe's refreshes the dashboard and sees that there are 7 Bagelers. However, the system crashes and since the transaction was never committed, the new quantity is not written to disk. Jane gets an error, but Moe doesn't. To him, there are 7 Bagelers. He is seeing incorrect data. This is called a &lt;em&gt;dirty read&lt;/em&gt;. Figure 8-3 shows an example of the sequence of operations that lead to a dirty read at time t3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t0pz3nv9k1ov958pnqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t0pz3nv9k1ov958pnqr.png" alt="Figure 8-3" width="800" height="264"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 8-3: Example of dirty read phenomena.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Non-repeatable reads
&lt;/h2&gt;

&lt;p&gt;A similar situation can occur when a transaction reads a value twice, but such value is modified between the reads by another transaction. In this case, the second read would obtain a different value. This phenomena is called a &lt;em&gt;non-repeatable read&lt;/em&gt; and can lead to incorrect results if the values are used for other calculations in the same transaction. Figure 8-4 shows a non-repeatable read at time t5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbq26ag4brl7va0s8ni6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbq26ag4brl7va0s8ni6.png" alt="Figure 8-4" width="800" height="264"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 8-4: Example of non-repeatable read phenomena.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Phantom reads
&lt;/h2&gt;

&lt;p&gt;If the write operation in the previous example implies inserting rows, we get what's called a &lt;em&gt;phantom read&lt;/em&gt; phenomena. Figure 8-5 shows a phantom read at time t5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faelt1ntfoy3vj7umg6cl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faelt1ntfoy3vj7umg6cl.png" alt="Figure 8-5" width="800" height="264"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 8-5: Example of phantom read phenomena.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>mysql</category>
      <category>performance</category>
    </item>
    <item>
      <title>Disaster Recovery and AI Vectors in MariaDB Kubernetes Operator 25.08.0</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Thu, 07 Aug 2025 15:48:43 +0000</pubDate>
      <link>https://forem.com/alejandro_du/disaster-recovery-and-ai-vectors-in-mariadb-kubernetes-operator-25080-3k03</link>
      <guid>https://forem.com/alejandro_du/disaster-recovery-and-ai-vectors-in-mariadb-kubernetes-operator-25080-3k03</guid>
      <description>&lt;p&gt;The latest &lt;a href="https://github.com/mariadb-operator/mariadb-operator" rel="noopener noreferrer"&gt;MariaDB Kubernetes Operator&lt;/a&gt; release, version 25.08.0, is &lt;a href="https://github.com/mariadb-operator/mariadb-operator/releases/tag/25.08.0" rel="noopener noreferrer"&gt;now available&lt;/a&gt;. This ships with enhancements, especially in how you can approach disaster recovery in &lt;a href="https://mariadb.com/products/enterprise/galera-cluster/" rel="noopener noreferrer"&gt;MariaDB clusters&lt;/a&gt;, and how the Operator is adapting to the requirements of modern data-centric applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disaster Recovery with Physical Backups
&lt;/h2&gt;

&lt;p&gt;One of the main features in 25.08.0 is the introduction of &lt;strong&gt;&lt;code&gt;PhysicalBackup&lt;/code&gt; Custom Resources&lt;/strong&gt;. For some time, &lt;a href="https://mariadb.com/docs/server/server-usage/backup-and-restore/backup-and-restore-overview" rel="noopener noreferrer"&gt;logical backups&lt;/a&gt; have been the only supported method, but as databases grow, so do the challenges of restoring them quickly. &lt;a href="https://mariadb.com/docs/server/server-usage/backup-and-restore/mariadb-backup/mariadb-backup-overview" rel="noopener noreferrer"&gt;Physical backups&lt;/a&gt; offer a more efficient path, especially for large datasets. They work at the physical directory level rather than through execution of SQL statements.&lt;/p&gt;

&lt;p&gt;This capability has been implemented in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mariadb-backup&lt;/code&gt; Integration:&lt;/strong&gt; MariaDB's native backup tool, &lt;code&gt;mariadb-backup&lt;/code&gt;, can be used directly through the Operator. You can define &lt;code&gt;PhysicalBackup&lt;/code&gt; CRs to schedule backups, manage retention, apply compression, and even specify S3-compatible storage. The restoration process is straightforward: simply reference the &lt;code&gt;PhysicalBackup&lt;/code&gt; in a new &lt;code&gt;MariaDB&lt;/code&gt; resource &lt;code&gt;bootstrapFrom&lt;/code&gt; field, and the Operator handles the rest, preparing and restoring the backup files. A great feature for reducing RTO (&lt;a href="https://www.youtube.com/watch?v=hepcdtUPG-4" rel="noopener noreferrer"&gt;Recovery Time Objective&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes-native &lt;code&gt;VolumeSnapshots&lt;/code&gt;:&lt;/strong&gt; Alternatively, if your Kubernetes environment is set up with CSI drivers that support &lt;code&gt;VolumeSnapshots&lt;/code&gt;, physical backups can now be created directly at the storage level. This method creates snapshots of MariaDB data volumes, offering another robust way to capture a consistent point-in-time copy of your database. Restoring from a &lt;code&gt;VolumeSnapshot&lt;/code&gt; is equally simple and allows for quick provisioning of new clusters from these storage-level backups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These new physical backup options provide greater flexibility and significantly faster recovery times, which are absolutely critical in &lt;strong&gt;production environments&lt;/strong&gt;. The aim is to provide more tools to build a resilient and robust data infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  MariaDB 11.8 and the &lt;code&gt;VECTOR&lt;/code&gt; Data Type
&lt;/h2&gt;

&lt;p&gt;The Operator is also keeping pace with the latest advancements in MariaDB itself. The MariaDB Kubernetes Operator 25.08.0 now supports &lt;a href="https://mariadb.com/products/community-server/" rel="noopener noreferrer"&gt;MariaDB 11.8 Community Server&lt;/a&gt; by default.&lt;/p&gt;

&lt;p&gt;One of the most important features in MariaDB 11.8 is the &lt;a href="https://mariadb.com/docs/server/reference/sql-structure/vectors/vector-overview" rel="noopener noreferrer"&gt;&lt;code&gt;VECTOR&lt;/code&gt; data type&lt;/a&gt;. This is a notable development for anyone working with AI applications (watch &lt;a href="https://go.mariadb.com/25Q1-GLBL-WBN-VectorGenAI-2025-03-26_Registration-LP.html" rel="noopener noreferrer"&gt;this webinar&lt;/a&gt; to get up to speed). High-dimensional vectors are fundamental in areas like &lt;a href="https://www.youtube.com/watch?v=XkB2DLK60JU" rel="noopener noreferrer"&gt;similarity search for RAG&lt;/a&gt; (Retrieval Augmented Generation) applications.&lt;/p&gt;

&lt;p&gt;With the &lt;code&gt;VECTOR&lt;/code&gt; data type, these vectors can be stored and manipulated directly within a MariaDB database without having to introduce an often complex, dedicated vector database—MariaDB is also a vector database! If you're using frameworks like LangChain or Spring AI, the &lt;a href="https://mariadb.com/resources/blog/building-ai-applications-using-frameworks-with-mariadb-vector-store/" rel="noopener noreferrer"&gt;new MariaDB integrations&lt;/a&gt; allow using MariaDB as a vector store.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployments with the New Helm Chart
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;mariadb-cluster&lt;/code&gt; &lt;a href="https://mariadb.com/docs/tools/mariadb-enterprise-operator/installation/helm" rel="noopener noreferrer"&gt;Helm chart&lt;/a&gt; has been introduced to facilitate database instance deployments. This new chart streamlines the provisioning of a MariaDB cluster along with its associated Custom Resources managed by the Operator. Instead of manually configuring relationships between different CRs, the Helm chart takes care of it, allowing &lt;a href="https://www.youtube.com/watch?v=YfDa4qj9s7g" rel="noopener noreferrer"&gt;management of an entire MariaDB deployment&lt;/a&gt; as a single Helm release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enterprise Support
&lt;/h2&gt;

&lt;p&gt;For mission-critical, production environments, the &lt;a href="https://www.google.com/url?q=https://mariadb.com/wp-content/uploads/2025/07/mariadb-enterprise-kubernetes-operator_datasheet_1196.pdf&amp;amp;sa=D&amp;amp;source=docs&amp;amp;ust=1754383972370349&amp;amp;usg=AOvVaw3LW0hKb5D7gYoJxXi1JofJ" rel="noopener noreferrer"&gt;MariaDB Enterprise Kubernetes Operator&lt;/a&gt; provides enterprise-grade support, including Red Hat OpenShift certification, and utilizes a secure Red Hat UBI base image. The operator offers robust security features like a customizable certificate lifecycle and advanced private key algorithms, while managing both &lt;a href="https://mariadb.com/products/enterprise/enterprise-server/" rel="noopener noreferrer"&gt;MariaDB Enterprise&lt;/a&gt; Server and &lt;a href="https://mariadb.com/products/maxscale/" rel="noopener noreferrer"&gt;MariaDB MaxScale&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Else is New?
&lt;/h2&gt;

&lt;p&gt;Beyond these major highlights, this release includes important replication improvements, pushing closer to general availability for this feature. You'll also find various bug fixes and enhancements. A new calendar-based versioning scheme has also been adopted for clarity and predictability.&lt;/p&gt;

&lt;p&gt;The team is committed to continuously improving the MariaDB Operator based on user needs. Your feedback is invaluable! So if you have any questions or encounter any issues, please don't hesitate to open an issue on &lt;a href="https://github.com/mariadb-operator/mariadb-operator" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or join us on the &lt;a href="https://r.mariadb.com/join-community-slack" rel="noopener noreferrer"&gt;MariaDB Community Slack&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As always, the detailed changelog and upgrade guide can be found on the &lt;a href="https://github.com/mariadb-operator/mariadb-operator/releases/" rel="noopener noreferrer"&gt;GitHub release page&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Can You Run a MariaDB Cluster on a $150 Kubernetes Lab? I Gave It a Shot</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Fri, 23 May 2025 11:29:40 +0000</pubDate>
      <link>https://forem.com/alejandro_du/can-you-run-a-mariadb-cluster-on-a-150-kubernetes-lab-i-gave-it-a-shot-1bbp</link>
      <guid>https://forem.com/alejandro_du/can-you-run-a-mariadb-cluster-on-a-150-kubernetes-lab-i-gave-it-a-shot-1bbp</guid>
      <description>&lt;p&gt;If you're like me, learning how to run databases inside Kubernetes sounds better when it's hands-on, physical, and brutally honest. So instead of spinning up cloud VMs or using Kind or minikube on a laptop, I went small and real: four Orange Pi 3 LTS boards (a Raspberry Pi alternative), each with just 2GB RAM. &lt;/p&gt;

&lt;p&gt;My goal? Get MariaDB — and eventually &lt;a href="https://mariadb.com/kb/en/galera-cluster/" rel="noopener noreferrer"&gt;Galera&lt;/a&gt; replication — running on Kubernetes using the official &lt;a href="https://mariadb.com/kb/en/kubernetes-operators-for-mariadb/" rel="noopener noreferrer"&gt;MariaDB Kubernetes Operator&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXftLTia4BGN3dGChw6Mi7IXq7jSNXPtfSTFDHllR5-v3sPm9OTpiJ54DwxwgMnX6TadaZwpeR9uTAbbMK4CHXaFhCuIolfXgqDWhRJX1etVcWhuxdaJxcOL_ylw3oIlnQ3EOxYjXQ%3Fkey%3D4ccG6zCqxsi6LQJ3nEoVOA" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXftLTia4BGN3dGChw6Mi7IXq7jSNXPtfSTFDHllR5-v3sPm9OTpiJ54DwxwgMnX6TadaZwpeR9uTAbbMK4CHXaFhCuIolfXgqDWhRJX1etVcWhuxdaJxcOL_ylw3oIlnQ3EOxYjXQ%3Fkey%3D4ccG6zCqxsi6LQJ3nEoVOA" alt="Four 2GB Orange Pi 3 LTS boards" width="1600" height="1337"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: If you came here for the code, you can find &lt;a href="https://www.redhat.com/en/ansible-collaborative?intcmp=7015Y000003t7aWQAQ" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt; playbooks on &lt;a href="https://github.com/alejandro-du/orange-pi-zero-cluster-ansible-playbooks" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt;, along with instructions on how to use them. For production environments, see &lt;a href="https://github.com/mariadb-operator/mariadb-operator/blob/8671ab54e953e2611e377c0dab42ab5e9d9741b6/examples/manifests/mariadb_galera_production.yaml" rel="noopener noreferrer"&gt;this manifest&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: This isn’t a tutorial on building an &lt;a href="https://dzone.com/articles/orange-pi-cluster-with-docker-swarm-and-mariadb" rel="noopener noreferrer"&gt;Orange Pi cluster&lt;/a&gt;, or even setting up K3s. It’s a record of what I tried, what worked, what broke, and what I learned when deploying MariaDB on Kubernetes.&lt;/p&gt;

&lt;p&gt;This article ignores best practices and security in favor of simplicity and brevity of code. The setup presented here helps you to get started with the MariaDB Kubernetes Operator so you can continue your exploration with the links provided at the end of the article.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Info&lt;/strong&gt;: The MariaDB Kubernetes Operator has been in development since 2022 and is steadily growing in &lt;a href="https://www.star-history.com/#mariadb-operator/mariadb-operator&amp;amp;Date" rel="noopener noreferrer"&gt;popularity&lt;/a&gt;. It’s also &lt;a href="https://www.bigdatawire.com/this-just-in/new-mariadb-kubernetes-operator-achieves-red-hat-openshift-operator-certification/" rel="noopener noreferrer"&gt;Red Hat OpenShift Certified&lt;/a&gt; and available as part of &lt;a href="https://mariadb.com/products/enterprise/" rel="noopener noreferrer"&gt;MariaDB Enterprise&lt;/a&gt;. Galera is a synchronous multi-primary cluster solution that enables high availability and data consistency across MariaDB nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stripping K3s Down to the Essentials
&lt;/h2&gt;

&lt;p&gt;First of all, I installed K3s (a certified Kubernetes distribution built for IoT and edge computing) on the control node as follows (ssh into the control node):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -sfL https://get.k3s.io | &lt;span class="se"&gt;\\&lt;/span&gt;
&lt;span class="nv"&gt;INSTALL_K3S_EXEC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--disable traefik &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
                  --disable servicelb &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
                  --disable cloud-controller &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
                  --disable network-policy"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
sh &lt;span class="nt"&gt;-s&lt;/span&gt; - server &lt;span class="nt"&gt;--cluster-init&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These flags strip out components I didn't need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;traefik&lt;/strong&gt;: No need for HTTP ingress.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;servicelb&lt;/strong&gt;: I relied on NodePorts instead.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;cloud-controller&lt;/strong&gt;: Irrelevant on bare-metal.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;network-policy&lt;/strong&gt;: Avoided for simplicity and memory savings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On worker nodes, I installed K3s and joined the cluster with the usual command (replace &lt;code&gt;&amp;lt;control-node-ip&amp;gt;&lt;/code&gt; with the actual IP of the control node):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;K3S_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://&amp;lt;control-node-ip&amp;gt;:6443 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;token&amp;gt; sh -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To be able to manage the cluster from my laptop (MacOS), I did this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scp orangepi@&amp;lt;master-ip&amp;gt;:/etc/rancher/k3s/k3s.yaml ~/.kube/config

&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s1"&gt;'s/127.0.0.1/&amp;lt;control-node-ip&amp;gt;/g'&lt;/span&gt; ~/.kube/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Windows users can do the same using &lt;a href="https://winscp.net/eng/index.php" rel="noopener noreferrer"&gt;WinSCP&lt;/a&gt; or &lt;a href="https://learn.microsoft.com/en-us/windows/wsl/install" rel="noopener noreferrer"&gt;WSL&lt;/a&gt; + scp. And don’t forget to replace &lt;code&gt;&amp;lt;control-node-ip&amp;gt;&lt;/code&gt; with the actual IP again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the MariaDB Operator
&lt;/h2&gt;

&lt;p&gt;Here’s how I installed the MariaDB Kubernetes operator via Helm (ssh into the control node):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add mariadb-operator https://helm.mariadb.com/mariadb-operator

helm &lt;span class="nb"&gt;install &lt;/span&gt;mariadb-operator-crds mariadb-operator/mariadb-operator-crds

helm &lt;span class="nb"&gt;install &lt;/span&gt;mariadb-operator mariadb-operator/mariadb-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It deployed cleanly with no extra config, and the ARM64 support worked out of the box. Once installed, the operator started watching for MariaDB resources. &lt;/p&gt;

&lt;h2&gt;
  
  
  The MariaDB Secret
&lt;/h2&gt;

&lt;p&gt;I tried to configure the MariaDB &lt;a href="https://mariadb.com/kb/en/mariadb-secure-installation/" rel="noopener noreferrer"&gt;root password&lt;/a&gt; in the same manifest file (for demo purposes), but it failed, especially with Galera. I guess the MariaDB servers are initialized before the secret, which makes the startup process fail. So, I just followed the &lt;a href="https://github.com/mariadb-operator/mariadb-operator/tree/main/docs" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; (as one should always do!) and created the secret via command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create secret generic mariadb-root-password &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;demo123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also got the opportunity to speak with &lt;a href="https://x.com/m_montes11" rel="noopener noreferrer"&gt;Martin Montes&lt;/a&gt; (Sr. Software Engineer at &lt;a href="https://mariadb.com/about-us/" rel="noopener noreferrer"&gt;MariaDB plc&lt;/a&gt; and main developer of the MariaDB Kubernetes Operator). He shared this with me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If the rootPasswordSecretKeyRef field is not set, a random one is provisioned by the operator. Then, the init jobs are triggered with that secret, which ties the database's initial state to that random secret. To start over with an explicit secret, you can delete the MariaDB resource, delete the PVCs (which contain the internal state), and create a manifest that contains both the MariaDB and the Secret. It should work.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can find some examples of predictable password handling &lt;a href="https://github.com/mmontes11/k8s-media/tree/main/infrastructure" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimal MariaDB Instance: The Tuning Game
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://github.com/mariadb-operator/mariadb-operator/blob/main/examples/manifests/mariadb_minimal.yaml" rel="noopener noreferrer"&gt;first deployment&lt;/a&gt; failed immediately: &lt;code&gt;OOMKilled&lt;/code&gt;. The MariaDB Kubernetes Operator is made for real production environments, and it works out of the box on clusters with enough compute capacity. &lt;/p&gt;

&lt;p&gt;However, in my case, with only 2GB per node, memory tuning was unavoidable. Fortunately, one of the strengths of the MariaDB Kubernetes Operator is its flexible configuration. So, I limited memory usage, dropped buffer pool size, reduced connection limits, and tweaked probe configs to prevent premature restarts.&lt;/p&gt;

&lt;p&gt;Here’s the config that ran reliably:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MariaDB instance&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s.mariadb.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MariaDB&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb-demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rootPasswordSecretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;       &lt;span class="c1"&gt;# Reference to a secret containing root password for security&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb-root-password&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;
  &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100Mi&lt;/span&gt;                   &lt;span class="c1"&gt;# Small storage size to conserve resources on limited-capacity SD cards&lt;/span&gt;
    &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-path&lt;/span&gt;  &lt;span class="c1"&gt;# Local storage class for simplicity and performance&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512Mi&lt;/span&gt;               &lt;span class="c1"&gt;# Minimum memory allocation - suitable for IoT/edge devices like Raspberry Pi, Orange Pi, and others&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512Mi&lt;/span&gt;               &lt;span class="c1"&gt;# Hard limit prevents MariaDB from consuming too much memory on constrained devices&lt;/span&gt;
  &lt;span class="na"&gt;myCnf&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;[mariadb]&lt;/span&gt;
    &lt;span class="s"&gt;# Listen on all interfaces to allow external connections&lt;/span&gt;
    &lt;span class="s"&gt;bind-address=0.0.0.0&lt;/span&gt;

    &lt;span class="s"&gt;# Disable binary logging to reduce disk I/O and storage requirements&lt;/span&gt;
    &lt;span class="s"&gt;skip-log-bin&lt;/span&gt;

    &lt;span class="s"&gt;# Set to ~70% of available RAM to balance performance and memory usage&lt;/span&gt;
    &lt;span class="s"&gt;innodb_buffer_pool_size=358M&lt;/span&gt;

    &lt;span class="s"&gt;# Limit connections to avoid memory exhaustion on constrained hardware&lt;/span&gt;
    &lt;span class="s"&gt;max_connections=20&lt;/span&gt;
  &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;40&lt;/span&gt;          &lt;span class="c1"&gt;# 40 * 15s = 10 minutes grace&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;             &lt;span class="c1"&gt;# check every 15 seconds&lt;/span&gt;
    &lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;            &lt;span class="c1"&gt;# each check can take up to 10s&lt;/span&gt;
  &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;          &lt;span class="c1"&gt;# 10 * 60s = 10 minutes of failing allowed&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;             &lt;span class="c1"&gt;# check every 60 seconds&lt;/span&gt;
    &lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;            &lt;span class="c1"&gt;# each check can take 10s&lt;/span&gt;
  &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;          &lt;span class="c1"&gt;# 10 * 30s = 5 minutes tolerance&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;             &lt;span class="c1"&gt;# check every 30 seconds&lt;/span&gt;
    &lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;             &lt;span class="c1"&gt;# fast readiness check&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# NodePort service&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb-demo-external&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NodePort&lt;/span&gt;                    &lt;span class="c1"&gt;# Makes the database accessible from outside the cluster&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb&lt;/span&gt; &lt;span class="c1"&gt;# Targets the MariaDB pods created by operator&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;                    &lt;span class="c1"&gt;# Standard MariaDB port&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;              &lt;span class="c1"&gt;# Port inside the container&lt;/span&gt;
&lt;span class="na"&gt;      nodePort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30001              &lt;/span&gt; &lt;span class="c1"&gt;# External access port on all nodes (limited to 30000-32767 range)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The operator generated the underlying StatefulSet and other resources automatically. I checked logs and resources — it created valid objects, respected the custom config, and successfully managed lifecycle events. That level of automation saved time and reduced YAML noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Info&lt;/strong&gt;: Set the &lt;code&gt;innodb_buffer_pool_size&lt;/code&gt; variable to around 70% of the total memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: Normally, it is recommended to not set CPU limits. This can make the whole initialization process and the database itself slow (and cause CPU throttling). The trade-off of not setting limits is that it might steal CPU cycles from other workloads running on the same Node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Galera Cluster: A Bit of Patience Required
&lt;/h2&gt;

&lt;p&gt;Deploying a 3-node MariaDB Galera cluster wasn’t that difficult after the experience gained from the single-instance deployment — it only required additional configuration and minimal adjustments. The process takes some time to complete, though. So be patient if you are trying this on small SBCs with limited resources like the Orange Pi or Raspberry Pi.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mariadb.com/kb/en/mariabackup-sst-method/" rel="noopener noreferrer"&gt;SST (State Snapshot Transfer)&lt;/a&gt; processes are a bit resource-heavy, and early on, the startup probe would trigger restarts before nodes could sync on these small SBCs already running Kubernetes. I increased probe thresholds and stopped trying to watch the rollout step-by-step, instead letting the cluster come up at its own pace. &lt;/p&gt;

&lt;p&gt;And it just works! By the way, this step-by-step rollout is designed to avoid downtime: rolling the replicas one at a time, waiting for each of them to sync, proceeding with the primary, and switching over to an up-to-date replica. Also, for this setup, I increased the memory a bit to let Galera do its thing.&lt;/p&gt;

&lt;p&gt;Here’s the deployment manifest file that worked smoothly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 3-node multi-master MariaDB cluster&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s.mariadb.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MariaDB&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb-galera&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;                   &lt;span class="c1"&gt;# Minimum number for a fault-tolerant Galera cluster (balanced for resource constraints)&lt;/span&gt;
  &lt;span class="na"&gt;replicasAllowEvenNumber&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="c1"&gt;# Allows cluster to continue if a node fails, even with even number of nodes&lt;/span&gt;
  &lt;span class="na"&gt;rootPasswordSecretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb-root-password&lt;/span&gt; &lt;span class="c1"&gt;# References the password secret created with kubectl&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;
    &lt;span class="na"&gt;generate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;             &lt;span class="c1"&gt;# Use existing secret instead of generating one&lt;/span&gt;
  &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100Mi&lt;/span&gt;                 &lt;span class="c1"&gt;# Small storage size to accommodate limited SD card capacity on Raspberry Pi, Orange Pi, and others&lt;/span&gt;
    &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-path&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;               &lt;span class="c1"&gt;# Higher than single instance to accommodate Galera overhead&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;               &lt;span class="c1"&gt;# Strict limit prevents OOM issues on resource-constrained nodes&lt;/span&gt;
  &lt;span class="na"&gt;galera&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;               &lt;span class="c1"&gt;# Activates multi-master synchronous replication&lt;/span&gt;
    &lt;span class="na"&gt;sst&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariabackup&lt;/span&gt;            &lt;span class="c1"&gt;# State transfer method that's more efficient for limited bandwidth connections&lt;/span&gt;
    &lt;span class="na"&gt;primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;podIndex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;               &lt;span class="c1"&gt;# First pod bootstraps the cluster&lt;/span&gt;
    &lt;span class="na"&gt;providerOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;gcache.size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;64M'&lt;/span&gt;      &lt;span class="c1"&gt;# Reduced write-set cache for memory-constrained environment&lt;/span&gt;
        &lt;span class="na"&gt;gcache.page_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;64M'&lt;/span&gt; &lt;span class="c1"&gt;# Matching page size improves memory efficiency&lt;/span&gt;
  &lt;span class="na"&gt;myCnf&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;[mariadb]&lt;/span&gt;
    &lt;span class="s"&gt;# Listen on all interfaces for cluster communication&lt;/span&gt;
    &lt;span class="s"&gt;bind-address=0.0.0.0&lt;/span&gt;

    &lt;span class="s"&gt;# Required for Galera replication to work correctly&lt;/span&gt;
    &lt;span class="s"&gt;binlog_format=ROW&lt;/span&gt;

    &lt;span class="s"&gt;# ~70% of available memory for database caching&lt;/span&gt;
    &lt;span class="s"&gt;innodb_buffer_pool_size=700M&lt;/span&gt;

    &lt;span class="s"&gt;# Severely limited to prevent memory exhaustion across replicas&lt;/span&gt;
    &lt;span class="s"&gt;max_connections=12&lt;/span&gt;
  &lt;span class="na"&gt;affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;antiAffinityEnabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# Ensures pods run on different nodes for true high availability&lt;/span&gt;
  &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;40# 40 * 15s = 10 minutes grace&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;           &lt;span class="c1"&gt;# check every 15 seconds&lt;/span&gt;
    &lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;          &lt;span class="c1"&gt;# each check can take up to 10s&lt;/span&gt;
  &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;        &lt;span class="c1"&gt;# 10 * 60s = 10 minutes of failing allowed&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;           &lt;span class="c1"&gt;# check every 60 seconds&lt;/span&gt;
    &lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;          &lt;span class="c1"&gt;# each check can take 10s&lt;/span&gt;
  &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;        &lt;span class="c1"&gt;# 10 * 30s = 5 minutes tolerance&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;           &lt;span class="c1"&gt;# check every 30 seconds&lt;/span&gt;
    &lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;           &lt;span class="c1"&gt;# fast readiness check&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# External access service&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb-galera-external&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NodePort&lt;/span&gt;                    &lt;span class="c1"&gt;# Makes the database accessible from outside the cluster&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mariadb&lt;/span&gt; &lt;span class="c1"&gt;# Targets all MariaDB pods for load balancing&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;                    &lt;span class="c1"&gt;# Standard MariaDB port&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;              &lt;span class="c1"&gt;# Port inside the container&lt;/span&gt;
&lt;span class="na"&gt;      nodePort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30001              &lt;/span&gt; &lt;span class="c1"&gt;# External access port on all cluster nodes (using any node IP)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After tuning the values, all three pods reached &lt;code&gt;Running&lt;/code&gt;. I confirmed replication was active, and each pod landed on a different node — &lt;code&gt;kubectl get pods -o wide&lt;/code&gt; confirmed even distribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Info&lt;/strong&gt;: To ensure that every MariaDB pod gets scheduled on a different Node, set &lt;code&gt;spec.gallera.affinity.antiAffinityEnabled&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Did Replication Work?
&lt;/h2&gt;

&lt;p&gt;Here’s the basic test I used to check if replication worked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; mariadb-galera-0 &lt;span class="nt"&gt;--&lt;/span&gt; mariadb &lt;span class="nt"&gt;-uroot&lt;/span&gt; &lt;span class="nt"&gt;-pdemo123&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"
CREATE DATABASE test;
CREATE TABLE test.t (id INT PRIMARY KEY AUTO_INCREMENT, msg TEXT);
INSERT INTO test.t(msg) VALUES ('It works!');"&lt;/span&gt;


kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; mariadb-galera-1 &lt;span class="nt"&gt;--&lt;/span&gt; mariadb &lt;span class="nt"&gt;-uroot&lt;/span&gt; &lt;span class="nt"&gt;-pdemo123&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM test.t;"&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; mariadb-galera-2 &lt;span class="nt"&gt;--&lt;/span&gt; mariadb &lt;span class="nt"&gt;-uroot&lt;/span&gt; &lt;span class="nt"&gt;-pdemo123&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM test.t;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inserted row appeared on all three nodes. I didn’t measure write latency or SST transfer duration—this wasn’t a performance test. For me, it was just enough to confirm functional replication and declare success.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXcVs_MvshqI50_E0_2_KWDBLg_Zemmv80U4muxcGdNLLEGgLucnWZyZSyZ4eEh648sZowoaYzBMOIkxNKZc-jAkhtI15yp5xj90pbO_mSX1Chv5GvvvhgWOIBACq8Den816pQHt%3Fkey%3D4ccG6zCqxsi6LQJ3nEoVOA" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXcVs_MvshqI50_E0_2_KWDBLg_Zemmv80U4muxcGdNLLEGgLucnWZyZSyZ4eEh648sZowoaYzBMOIkxNKZc-jAkhtI15yp5xj90pbO_mSX1Chv5GvvvhgWOIBACq8Den816pQHt%3Fkey%3D4ccG6zCqxsi6LQJ3nEoVOA" alt="MariaDB Galera Replication working" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since I exposed the service using a simple NodePort, I was also able to connect to the MariaDB cluster using the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mariadb &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;master-ip&amp;gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 30001 &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-pdemo123&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I skipped Ingress entirely to keep memory usage and YAML code minimal.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The MariaDB Operator handled resource creation pretty well&lt;/strong&gt; — PVCs, StatefulSets, Secrets, and lifecycle probes were all applied correctly with no manual intervention.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Galera on SBCs is actually possible&lt;/strong&gt;. SST needs patience, and tuning memory limits is critical, but it works!&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Out-of-the-box kube probes often don’t work on slow hardware&lt;/strong&gt;. Startup times will trip checks unless you adjust thresholds.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Node scheduling worked out fine on its own&lt;/strong&gt;. K3s distributed the pods evenly.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Failures teach more than success&lt;/strong&gt;. Early OOM errors helped me understand the behavior of stateful apps in Kubernetes much more than a smooth rollout would’ve.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This wasn’t about benchmarks, and it wasn’t for production. For production environments, see &lt;a href="https://github.com/mariadb-operator/mariadb-operator/blob/8671ab54e953e2611e377c0dab42ab5e9d9741b6/examples/manifests/mariadb_galera_production.yaml" rel="noopener noreferrer"&gt;this manifest&lt;/a&gt;. This article was about shrinking a MariaDB Kubernetes deployment to get it working on a constrained environment. It was also about getting started with the MariaDB Kubernetes Operator and learning what it does for you.&lt;/p&gt;

&lt;p&gt;The operator simplified a lot of what would otherwise be painful on K8s: it created stable StatefulSets, managed volumes and config, and coordinated cluster state without needing glue scripts or sidecars. Still, it required experimentation on this resource-limited cluster. Probes need care. And obviously, you won’t get resilience or high throughput from an SBC cluster like this, especially if you have a curious dog or cat around your cluster! But this is a worthwhile test for learning and experimentation. Also, if you don’t want to fiddle with SBCs, try Kind or minikube.&lt;/p&gt;

&lt;p&gt;By the way, the MariaDB Kubernetes Operator can do much more for you. Check &lt;a href="https://github.com/mariadb-operator/mariadb-operator" rel="noopener noreferrer"&gt;this repository&lt;/a&gt; to see a list of the possibilities. Here are just a few worth exploring next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Multiple HA modes: Galera Cluster or MariaDB Replication.&lt;/li&gt;
&lt;li&gt;  Advanced HA with MaxScale: a sophisticated database proxy, router, and load balancer for MariaDB.&lt;/li&gt;
&lt;li&gt;  Flexible storage configuration. Volume expansion.&lt;/li&gt;
&lt;li&gt;  Take, restore and schedule backups.&lt;/li&gt;
&lt;li&gt;  Cluster-aware rolling update: roll out replica Pods one by one, wait for each of them to become ready, and then proceed with the primary Pod, using ReplicasFirstPrimaryLast.&lt;/li&gt;
&lt;li&gt;  Issue, configure and rotate TLS certificates and CAs.&lt;/li&gt;
&lt;li&gt;  Orchestrate and schedule sql scripts.&lt;/li&gt;
&lt;li&gt;  Prometheus metrics via mysqld-exporter and maxscale-exporter.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>database</category>
      <category>mariadb</category>
      <category>devops</category>
    </item>
    <item>
      <title>Vector Storage, Indexing, and Search With MariaDB</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Tue, 28 Jan 2025 13:56:36 +0000</pubDate>
      <link>https://forem.com/alejandro_du/vector-storage-indexing-and-search-with-mariadb-19bl</link>
      <guid>https://forem.com/alejandro_du/vector-storage-indexing-and-search-with-mariadb-19bl</guid>
      <description>&lt;p&gt;When you develop generative AI applications, you typically introduce three additional components to your infrastructure: an embedder, an LLM, and a vector database. &lt;/p&gt;

&lt;p&gt;However, if you are using MariaDB, you don't need to introduce an additional database along with its own SQL dialect — or even worse — its own proprietary API. Since MariaDB version 11.7 (and &lt;a href="https://mariadb.com/resources/blog/introducing-vector-search-with-the-latest-version-of-mariadb-enterprise-platform/" rel="noopener noreferrer"&gt;MariaDB Enterprise Server 11.4&lt;/a&gt;) you can simply store your embeddings (or vectors) in any column of any table—no need to make your applications database polyglots.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"After announcing the preview of vector search in MariaDB Server, the vector search capability has now been added to the MariaDB Community Server 11.7 release,"&lt;/em&gt; &lt;a href="https://mariadb.com/resources/blog/announcing-mariadb-community-server-11-7-rc-with-vector-search-and-11-6-ga/" rel="noopener noreferrer"&gt;writes Ralf Gebhardt&lt;/a&gt;, Product Manager for MariaDB Server at MariaDB. This includes a new datatype (&lt;code&gt;VECTOR&lt;/code&gt;), vector index, and a set of functions for vector manipulation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Are Vectors Needed in Generative AI Applications?
&lt;/h2&gt;

&lt;p&gt;Vectors are needed in generative AI applications because they embed complex meanings in a compact fixed-length array of numbers (a vector). This is more clear in the context of &lt;a href="https://dzone.com/articles/introduction-to-retrieval-augmented-generation-rag" rel="noopener noreferrer"&gt;retrieval-augmented generation&lt;/a&gt; (or RAG). This technique allows you to fetch relevant data from your sources (APIs, files, databases) to enhance an AI model input with the fetched, often private-to-the-business, data. &lt;/p&gt;

&lt;p&gt;Since your data sources can be vast, you need a way to find the relevant pieces, given that current AI models have a finite context window — you cannot simply add all of your data to a prompt. By creating chunks of data and running these chunks of data through a special AI model called &lt;em&gt;&lt;strong&gt;embedder&lt;/strong&gt;&lt;/em&gt;, you can generate vectors and use proximity search techniques to find relevant information to be appended to a prompt.&lt;/p&gt;

&lt;p&gt;For example, take the following input from a user in a recommendation chatbot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need a good case for my iPhone 15 pro.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since your AI model was not trained with the exact data containing the product information in your online store, you need to retrieve the most relevant products and their information before sending the prompt to the model. &lt;/p&gt;

&lt;p&gt;For this, you send the original input from the user to an embedder and get a vector that you can later use to get the closest, say, 10 products to the user input. Once you get this information (and we'll see how to do this with MariaDB later), you can send the enhanced prompt to your AI model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need a good case for my iPhone 15 pro. Which of the following products better suit my needs?

1. ProShield Ultra Case for iPhone 15 Pro - $29.99: A slim, shock-absorbing case with raised edges for screen protection and a sleek matte finish.

2. EcoGuard Bio-Friendly Case for iPhone 15 Pro - $24.99: Made from 100% recycled materials, offering moderate drop protection with an eco-conscious design.

3. ArmorFlex Max Case for iPhone 15 Pro - $39.99: Heavy-duty protection with military-grade durability, including a built-in kickstand for hands-free use.

4. CrystalClear Slim Case for iPhone 15 Pro - $19.99: Ultra-thin and transparent, showcasing the phone's design while providing basic scratch protection.

5. LeatherTouch Luxe Case for iPhone 15 Pro - $49.99: Premium genuine leather construction with a soft-touch feel and an integrated cardholder for convenience.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in AI predictions that use your own data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Tables for Vector Storage
&lt;/h2&gt;

&lt;p&gt;To store vectors in MariaDB, use the new &lt;code&gt;VECTOR&lt;/code&gt; data type. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the &lt;code&gt;embedding&lt;/code&gt; column can hold a vector of 2048 dimensions. You have to match the number of dimensions that your embedder generates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Vector Indexes
&lt;/h2&gt;

&lt;p&gt;For read performance, it's important to add an index to your vector column. This speeds up similarity searches. You can define the index at table creation time as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For greater control, you can specify the distance function that the database server will use to build the index, as well as the &lt;strong&gt;&lt;em&gt;M value&lt;/em&gt; of the &lt;em&gt;Hierarchical Navigable Small Worlds&lt;/em&gt;&lt;/strong&gt; (&lt;a href="https://dzone.com/articles/what-is-vector-search" rel="noopener noreferrer"&gt;HNSW&lt;/a&gt;) algorithm used by MariaDB. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="n"&gt;DISTANCE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cosine&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the &lt;a href="https://mariadb.com/kb/en/create-table-with-vectors/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for more information on these configurations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inserting Vectors
&lt;/h2&gt;

&lt;p&gt;When you pass data (text, image, audio) through an embedder, you get a vector. Typically, this is a series of numbers in an array in JSON format. To insert this vector in a MariaDB table, you can use the &lt;code&gt;VEC_FromText&lt;/code&gt; function. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"Alarm clock"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VEC_FromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"[0.001, 0, ...]"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"Cow figure"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VEC_FromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"[1.0, 0.05, ...]"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"Bicycle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VEC_FromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"[0.2, 0.156, ...]"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remember that the inserted vectors must have the correct number of dimensions as defined in the &lt;code&gt;CREATE TABLE&lt;/code&gt; statement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Similarity Search (Comparing Vectors)
&lt;/h2&gt;

&lt;p&gt;In RAG applications, you send the user input to an embedder to get a vector. You can then query the records in your database that are closer to that vector. Closer vectors represent data that are semantically similar. At the time of writing this, MariaDB has two distance functions that you can use for similarity or proximity search:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;VEC_DISTANCE_EUCLIDEAN&lt;/code&gt;: calculates the straight-line distance between two vectors. It is best suited for vectors derived from raw, unnormalized data or scenarios where spatial separation directly correlates with similarity, such as comparing positional or numeric features. However, it is less effective for high-dimensional or normalized embeddings since it is sensitive to differences in vector magnitude.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;VEC_DISTANCE_COSINE&lt;/code&gt;: measures the angular difference between vectors. Good for comparing normalized embeddings, especially in semantic applications like text or document retrieval. It excels at capturing similarity in meaning or context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep in mind that similarity search using the previous functions is only approximate and highly depends on the quality of the calculated vectors and, hence, on the quality of the embedder used.&lt;/p&gt;

&lt;p&gt;The following example, finds the top 10 most similar products to a given vector (&lt;code&gt;$user_input_vector&lt;/code&gt; should be replaced with the actual vector returned by the embedder over the user input):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VEC_DISTANCE_COSINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;VEC_FromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;user_input_vector&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;VEC_DISTANCE_COSINE&lt;/code&gt; and &lt;code&gt;VEC_DISTANCE_EUCLIDEAN&lt;/code&gt; functions take two vectors. In the previous example, one of the vectors is the vector calculated over the user input, and the other is the corresponding vector for each record in the &lt;code&gt;products&lt;/code&gt; table.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Example
&lt;/h2&gt;

&lt;p&gt;I have prepared a practical example using Java and no AI frameworks so you truly understand the process of creating generative AI applications leveraging MariaDB's vector search capabilities. You can find the code on &lt;a href="https://github.com/alejandro-du/mariadb-rag-demo-java" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Supercharge Your App: MariaDB In-Memory Tables as a Cache</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Mon, 19 Aug 2024 10:49:02 +0000</pubDate>
      <link>https://forem.com/alejandro_du/supercharge-your-app-mariadb-in-memory-tables-as-a-cache-2lai</link>
      <guid>https://forem.com/alejandro_du/supercharge-your-app-mariadb-in-memory-tables-as-a-cache-2lai</guid>
      <description>&lt;p&gt;Redis is mainly used as an application cache or a quick-response database. But wait, you can always create a cache in a relational database as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="err"&gt;´&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="err"&gt;´&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_updated&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;
        &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
        &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Moreover, with &lt;a href="https://mariadb.com/kb/en/beginner-mariadb-articles/" rel="noopener noreferrer"&gt;MariaDB&lt;/a&gt;, you can pick one from the many available &lt;a href="https://mariadb.com/kb/en/choosing-the-right-storage-engine/" rel="noopener noreferrer"&gt;storage engines&lt;/a&gt;. For example, if you want to store the previous &lt;code&gt;cache&lt;/code&gt; table in memory, simply use the &lt;strong&gt;MEMORY storage engine&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MEMORY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you configure the &lt;code&gt;cache&lt;/code&gt; table to use the &lt;a href="https://mariadb.com/kb/en/memory-storage-engine/" rel="noopener noreferrer"&gt;MEMORY storage engine&lt;/a&gt;, its data will reside entirely in RAM. This is interestingly similar to how Redis operates, keeping data in memory for low-latency access. This looks great and it definitely has its benefits. However, there are a few nuances that are worth exploring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Use Case
&lt;/h2&gt;

&lt;p&gt;Let’s say you have a web application that needs to track session IDs. Using a MariaDB MEMORY table sounds like a good idea here—there’s potential for reducing the load on your &lt;a href="https://www.youtube.com/watch?v=QKUtvt3_y6Y" rel="noopener noreferrer"&gt;primary databases&lt;/a&gt; and improving response times for your users. Here's how you could implement such a cache using MariaDB’s MEMORY storage engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users_cache&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;last_updated&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
        &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
        &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MEMORY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insertion into this table would typically occur whenever a user logs in. But generalizing a bit more, insertion in a cache could happen every time you read data stored on disk-based tables. We’ll use the later approach and assume that storing session IDs permanently is a business requirement just so that we can make more experiments. In any case, you can set up a background job to refresh this cache periodically or invalidate it when the underlying data changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cache Invalidation and Management
&lt;/h2&gt;

&lt;p&gt;There are several ways to handle cache invalidation with MariaDB. For example, you can set a limited row-based lifespan (through a column for expiration time) and use &lt;a href="https://mariadb.com/kb/en/create-event/" rel="noopener noreferrer"&gt;event schedulers&lt;/a&gt; to clear or update cached data at fixed intervals. Here’s an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;EVENT&lt;/span&gt; &lt;span class="n"&gt;ev_remove_stale_user_cache_entries&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;SCHEDULE&lt;/span&gt; &lt;span class="k"&gt;EVERY&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt;
        &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users_cache&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;last_updated&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For testing you can use different intervals. For example &lt;code&gt;EVERY 1 SECOND&lt;/code&gt; and &lt;code&gt;INTERVAL 20 SECOND&lt;/code&gt;. Also, remember to enable MariaDB’s &lt;a href="https://mariadb.com/kb/en/event-scheduler/" rel="noopener noreferrer"&gt;event scheduler&lt;/a&gt; by setting the &lt;code&gt;event_scheduler&lt;/code&gt; configuration property or, for testing, by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET GLOBAL event_scheduler = ON;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to try this out, you can find a &lt;a href="https://gist.github.com/alejandro-du/50d1e2425269be582304de13d606a961" rel="noopener noreferrer"&gt;complete example on GitHub&lt;/a&gt;. Or &lt;a href="https://www.youtube.com/watch?v=7wrVAJZgxRM" rel="noopener noreferrer"&gt;watch the short video demoing the example in action&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros and cons
&lt;/h2&gt;

&lt;p&gt;Although using the MEMORY storage engine can speed up data retrieval times, as always, this depends on the exact use case—you should test this configuration with your applications before making decisions. In particular, it’s important to be aware that &lt;strong&gt;MEMORY performs table-wide locks&lt;/strong&gt;. This means that it might not be well-suited when you need to update the cache more frequently than you read it. Or in other words, using &lt;strong&gt;the MEMORY storage engine is a good option for data that needs to be accessed frequently and updated infrequently&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A key advantage of using the MEMORY engine is when your app needs to &lt;strong&gt;mix cache data with data in tables of a relational database&lt;/strong&gt;, for example, during a single HTTP request. Imagine an app that processes user information updates. Each update might involve writing to a cache and simultaneously updating a relational record. This would require two different accesses to two different databases. With MariaDB, you can handle this in a single database using SQL. This eliminates the overhead and complexity of managing separate data stores and coordinating between them. Here’s a simplified example of how such an operation could look:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"other data"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users_cache&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the user's &lt;code&gt;data&lt;/code&gt; columns in both the &lt;code&gt;users&lt;/code&gt; and the &lt;code&gt;user_cache&lt;/code&gt; tables are updated in a single call to the database. Keep in mind that &lt;strong&gt;the MEMORY storage engine is not &lt;a href="https://mariadb.com/kb/en/transactions/" rel="noopener noreferrer"&gt;transactional&lt;/a&gt;&lt;/strong&gt;, which is less important when you compare it with a cache that lives in a completely different database technology than your operational database anyway.&lt;/p&gt;

&lt;p&gt;An additional and obvious but important advantage of using the MEMORY engine is that you can &lt;strong&gt;remove persistence-polyglot logic from your app&lt;/strong&gt;. If your team is already familiar with &lt;a href="https://mariadb.com/kb/en/sql-statements/" rel="noopener noreferrer"&gt;SQL&lt;/a&gt;, MariaDB provides a seamless experience without the need to learn new syntax or juggle another technology stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does It Stack Up Against Redis?
&lt;/h2&gt;

&lt;p&gt;While &lt;strong&gt;Redis is a powerful tool for handling simple data structures&lt;/strong&gt; like strings, hashes, lists, sets, and sorted sets directly in-memory, &lt;strong&gt;MariaDB's MEMORY engine handles complex queries more naturally&lt;/strong&gt; because it supports the full power of SQL and relational database systems. This means you can perform joins, subqueries, and even complex transactions, which are not as straightforward in Redis.&lt;/p&gt;

&lt;p&gt;Now, there’s also the question of &lt;a href="https://mariadb.com/database-topics/scalability/" rel="noopener noreferrer"&gt;scalability&lt;/a&gt;. Especially horizontal scalability. This involves adding more nodes to a system to distribute load and increase capacity without interrupting service. Both Redis and MariaDB offer robust solutions, but their mechanisms are different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis achieves horizontal scalability primarily through sharding&lt;/strong&gt;, where data is partitioned across multiple Redis instances. This can be configured manually or managed via Redis Cluster, which handles sharding and provides high availability through failover and replication processes. Redis Cluster supports up to 1000 nodes, which allows it to scale massively. This model is particularly effective for applications requiring ultra-fast operations and high throughput on simple data structures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MariaDB offers a somewhat similar approach to scalability&lt;/strong&gt;. And you probably guessed it—this is achieved through a storage engine. The &lt;a href="https://mariadb.com/kb/en/spider-storage-engine-overview/" rel="noopener noreferrer"&gt;Spider engine&lt;/a&gt; partitions table data across multiple MariaDB nodes, treating them as one logical entity. This enables querying and updating data across various physical servers seamlessly as if they were on a single local server. The Spider engine supports SQL and transactional data operations so you can run complex queries when you need. It’s useful for large database environments where data distribution is essential for performance and management to meet the demands of large-scale applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Durability and Persistence
&lt;/h2&gt;

&lt;p&gt;One difference to consider is that the MEMORY storage engine in MariaDB does not offer data persistence after server restarts. &lt;strong&gt;Data stored in MEMORY tables is volatile&lt;/strong&gt;; it's cleared when the database restarts, much like Redis in its default configuration. If persistence is crucial, you might consider using MariaDB's &lt;a href="https://mariadb.com/kb/en/aria-storage-engine/" rel="noopener noreferrer"&gt;Aria&lt;/a&gt; or &lt;a href="https://mariadb.com/kb/en/innodb/" rel="noopener noreferrer"&gt;InnoDB&lt;/a&gt; engines for caching. In fact, InnoDB has excellent performance, thanks to its &lt;a href="https://mariadb.com/kb/en/innodb-buffer-pool/" rel="noopener noreferrer"&gt;cache&lt;/a&gt; mechanism, which reduces the load on primary nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Switching from Redis to MariaDB for caching might not suit every project, but it's a viable option for those looking to streamline their technology stack or leverage their existing SQL expertise. It provides an easy way to implement caching solutions with tools you already know and reduces the overhead of managing additional systems. Plus, for those looking for a middle ground, MariaDB can also serve as a complementary caching layer alongside Redis, taking advantage of both systems' strengths. Moreover, you can &lt;a href="https://mariadb.com/kb/en/mariadb-maxscale-2402-maxscale-2402-cache/#storage_redis" rel="noopener noreferrer"&gt;leverage Redis as a cache&lt;/a&gt; for &lt;a href="https://www.youtube.com/watch?v=mPA5EU5EjeU" rel="noopener noreferrer"&gt;MariaDB MaxScale&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Try MariaDB and set up your own in-memory cache with the MEMORY storage engine. Experience how it fits with your existing SQL knowledge. I have created a simple &lt;a href="https://gist.github.com/alejandro-du/50d1e2425269be582304de13d606a961" rel="noopener noreferrer"&gt;plain-text file with detailed instructions and code&lt;/a&gt; that you can run to see a cache in action using only SQL! So all you need is to connect to your MariaDB server (spin one up quickly using Docker if you don’t have one running already) and run the commands in any SQL client compatible with MariaDB (most of them are). Here you can see the demo in action:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=7wrVAJZgxRM" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.youtube.com%2Fvi%2F7wrVAJZgxRM%2F0.jpg" alt="Supercharge Your App: MariaDB In-Memory Tables for Caching" width="480" height="360"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have questions or want to share your experience, don’t hesitate to join the &lt;a href="https://r.mariadb.com/join-community-slack" rel="noopener noreferrer"&gt;MariaDB Community Slack&lt;/a&gt; and let us know!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Packages for Store Routines in MariaDB 11.4</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Fri, 28 Jun 2024 17:37:14 +0000</pubDate>
      <link>https://forem.com/alejandro_du/packages-for-store-routines-in-mariadb-114-2h5o</link>
      <guid>https://forem.com/alejandro_du/packages-for-store-routines-in-mariadb-114-2h5o</guid>
      <description>&lt;p&gt;&lt;a href="https://mariadb.com/downloads/" rel="noopener noreferrer"&gt;MariaDB 11.4&lt;/a&gt; introduced many &lt;a href="https://go.mariadb.com/24Q3-GLBL-WBN-Enterpriseserver11.4_Registration-LP.html" rel="noopener noreferrer"&gt;advanced features&lt;/a&gt;. One that grabbed my attention is the general support of packages for stored routines. Although this was previously available by activating the &lt;a href="https://mariadb.com/kb/en/sql_modeoracle/" rel="noopener noreferrer"&gt;Oracle compatibility mode&lt;/a&gt;, now the feature is available generally out-of-the-box. This will help you to significantly enhance the organization of database development within a MariaDB environment. Packages provide a modular approach to managing database logic. This addition aligns MariaDB more closely with other advanced database systems that have long utilized packages, such as Oracle, and sets it apart in that regard from other open-source relational databases that don't support packages.&lt;/p&gt;

&lt;p&gt;Packages in MariaDB allow you to group related stored procedures, functions, variables, and other elements together into a single unit. This structure provides several benefits, including improved code organization, enhanced reusability, and simplified maintenance. Prior to this update, each stored procedure and function in MariaDB existed independently, which could lead to a cluttered schema and more complicated management of complex business logic when &lt;a href="https://www.red-gate.com/simple-talk/blogs/locating-business-logic-in-applications/" rel="noopener noreferrer"&gt;implemented in the database&lt;/a&gt;. Packages address this issue by providing a way to logically group related routines.&lt;/p&gt;

&lt;p&gt;Again, the primary advantage of using packages is the encapsulation of related routines. For example, in an e-commerce application, operations related to order processing can be grouped into a single package called &lt;code&gt;OrderProcessing&lt;/code&gt;. This package might include procedures like &lt;code&gt;PlaceOrder&lt;/code&gt;, &lt;code&gt;CancelOrder&lt;/code&gt;, and &lt;code&gt;UpdateOrderStatus&lt;/code&gt;, as well as functions such as &lt;code&gt;GetOrderDetails&lt;/code&gt;. This logical grouping makes the database schema more organized and the codebase easier to navigate and hence to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating packages
&lt;/h2&gt;

&lt;p&gt;Creating and using packages in MariaDB 11.4 and later is straightforward. Simply use the &lt;code&gt;CREATE PACKAGE&lt;/code&gt; statement to define a package and the &lt;code&gt;CREATE PACKAGE BODY&lt;/code&gt; statement to implement the package’s routines. Here is a simplified example without the actual business logic implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELIMITER&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;PACKAGE&lt;/span&gt; &lt;span class="n"&gt;OrderProcessing&lt;/span&gt;
  &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;PlaceOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;CancelOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;GetOrderDetails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;PACKAGE&lt;/span&gt; &lt;span class="n"&gt;BODY&lt;/span&gt; &lt;span class="n"&gt;OrderProcessing&lt;/span&gt;
  &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;PlaceOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="c1"&gt;-- Implementation code here&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;CancelOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="c1"&gt;-- Implementation code here&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;GetOrderDetails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;
  &lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="c1"&gt;-- Implementation code here&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;DELIMITER&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the &lt;code&gt;OrderProcessing&lt;/code&gt; package is defined with two procedures and one function. The package body provides the implementation for these routines, encapsulating the logic related to order processing within a single package.&lt;/p&gt;

&lt;h2&gt;
  
  
  Calling packaged stored routines
&lt;/h2&gt;

&lt;p&gt;To call a procedure or function that was defined in a package, you use the "dot notation". Here's an example of how to call the &lt;code&gt;CancelOrder&lt;/code&gt; procedure in the &lt;code&gt;OrderProcessing&lt;/code&gt; package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;OrderProcessing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CancelOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same applies to functions in packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;OrderProcessing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetOrderDetails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also define package-level variables and constants, which are accessible to all routines within the package. This way you can share common data without relying on global variables or passing around parameters between routines. By centrally managing shared data within the package, you reduce code duplication and minimize the risk of errors. The following is an example (don't take the variable names too seriously in this snippet of code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELIMITER&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;PACKAGE&lt;/span&gt; &lt;span class="n"&gt;OrderProcessing&lt;/span&gt;
  &lt;span class="c1"&gt;-- procedure list (see previous example)&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;PACKAGE&lt;/span&gt; &lt;span class="n"&gt;BODY&lt;/span&gt; &lt;span class="n"&gt;OrderProcessing&lt;/span&gt;
  &lt;span class="c1"&gt;-- variable declarations&lt;/span&gt;
  &lt;span class="k"&gt;DECLARE&lt;/span&gt; &lt;span class="n"&gt;some_count&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;DECLARE&lt;/span&gt; &lt;span class="n"&gt;some_total&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;-- procedure definitions (see previous example)&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;DELIMITER&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Other benefits of packages
&lt;/h2&gt;

&lt;p&gt;The introduction of packages in MariaDB 11.4 also brings the possibility of better version control and modularization of code. You can now manage stored routines more effectively, making it easier to track changes and updates. This modularization is particularly beneficial in large projects, where multiple developers might be working on different parts of the database logic simultaneously. By isolating different functionalities into packages, conflicts and overlaps can be minimized, leading to a smoother development process.&lt;/p&gt;

&lt;p&gt;Moreover, packages support forward declarations, which means that the routines can be defined before their actual implementation. This feature allows for a more flexible and structured approach to coding, where developers can outline the package interface first and then fill in the details. This separation of interface and implementation can lead to cleaner, more understandable code, facilitating collaboration and reducing the learning curve for new developers joining a project.&lt;/p&gt;

&lt;p&gt;For developers accustomed to working with Oracle databases, the inclusion of packages in MariaDB 11.4 will feel familiar and welcome. It bridges a functional gap between MariaDB and Oracle, making it easier to transition between these platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Packages vs multiple schemas
&lt;/h2&gt;

&lt;p&gt;It's important to note the distinction between packages and merely using multiple database schemas. While multiple schemas can help segregate different parts of a database, they do not offer the same level of organization and encapsulation as packages. Schemas are useful for dividing distinct areas of data and logic, but they do not inherently group related procedures and functions together in a way that enhances modularity and maintainability. Packages, on the other hand, allow for a more granular and cohesive approach, grouping related logic together within the same schema. This not only simplifies the management of routines but also improves the clarity and maintainability of the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out
&lt;/h2&gt;

&lt;p&gt;As always, &lt;a href="https://go.mariadb.com/24Q1-WBN-GLBL-OSSG-MariaDB-Evolution-2023-10-31_Registration-LP.html" rel="noopener noreferrer"&gt;MariaDB continues to evolve&lt;/a&gt;. Try it out today by &lt;a href="https://mariadb.com/downloads" rel="noopener noreferrer"&gt;downloading&lt;/a&gt; the latest version of MariaDB, or if you have Docker running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--name&lt;/span&gt; mariadb-11.4 &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MARIADB_ROOT_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-secret-pw &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MARIADB_DATABASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MARIADB_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myuser &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MARIADB_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypassword &lt;span class="nt"&gt;-d&lt;/span&gt; mariadb:11.4

docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; mariadb-11.4 mariadb &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s2"&gt;"mypassword"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Using Temporary Tables in MariaDB</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Thu, 02 May 2024 21:13:18 +0000</pubDate>
      <link>https://forem.com/alejandro_du/using-temporary-tables-in-mariadb-1nb</link>
      <guid>https://forem.com/alejandro_du/using-temporary-tables-in-mariadb-1nb</guid>
      <description>&lt;p&gt;Let's explore how &lt;a href="https://mariadb.com/kb/en/create-table/#create-temporary-table" rel="noopener noreferrer"&gt;temporary tables&lt;/a&gt; work in MariaDB. First, we have to connect to the server. For example (use your own connection details):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mariadb &lt;span class="nt"&gt;-h&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s2"&gt;"RootPassword!"&lt;/span&gt; &lt;span class="nt"&gt;--database&lt;/span&gt; demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, just to point something out, let's create a standard (permanent) table. Here's how:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This table, &lt;code&gt;t&lt;/code&gt;, will persist in the database even after we exit the client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we reconnect and check the existing tables using &lt;code&gt;SHOW TABLES;&lt;/code&gt;, the table &lt;code&gt;t&lt;/code&gt; will still be listed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mariadb &lt;span class="nt"&gt;-h&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s2"&gt;"RootPassword!"&lt;/span&gt; &lt;span class="nt"&gt;--database&lt;/span&gt; demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------------+
| Tables_in_demo |
+----------------+
| t              |
+----------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All this is pretty obvious, but now, let's recreate this table and try something different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;TEMPORARY&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;TEMPORARY&lt;/code&gt; keyword. After creating this table, if we run &lt;code&gt;SHOW TABLES;&lt;/code&gt;, it appears in the list. We can insert data into it, query it, join it with other tables. It behaves like a normal table during the current session. However, if we exit the client, then reconnect, and perform a &lt;code&gt;SHOW TABLES;&lt;/code&gt; again, the temporary table &lt;code&gt;t&lt;/code&gt; will not be listed. A temporary table only exists for the duration of the session in which it was created and other sessions won't be able to see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case for Temporary Tables
&lt;/h2&gt;

&lt;p&gt;Temporary tables are quite useful for transient data operations. For instance, consider a table called &lt;code&gt;products&lt;/code&gt; in our database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="n"&gt;AUTO_INCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can create a temporary table that mimics the structure of &lt;code&gt;products&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TEMPORARY&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can confirm this by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DESCRIBE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| code        | varchar(100) | NO   | UNI | NULL    |                |
| name        | varchar(250) | NO   |     | NULL    |                |
| description | text         | YES  |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initially, &lt;code&gt;t&lt;/code&gt; will be empty. However, suppose we want to transfer some data from &lt;code&gt;products&lt;/code&gt; to &lt;code&gt;t&lt;/code&gt;. Let’s assume we only want to include products that contain the number 0 in their code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%0%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running this command, if we query the temporary table &lt;code&gt;t&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----+--------+------------------+---------------------------------------------------+
| id | code   | name             | description                                       |
+----+--------+------------------+---------------------------------------------------+
|  1 | BG2024 | BugBlaster       | Eradicates software bugs with a single scan.      |
|  3 | FW001  | FireWhale        | An oversized, comprehensive firewall solution.    |
|  4 | CLD404 | CloudNine Finder | Find your way back from cloud outages and errors. |
+----+--------+------------------+---------------------------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We see the filtered data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Temporary tables offer a powerful way to handle data for temporary processing without affecting the persistent data store. They are particularly useful in scenarios where data needs to be manipulated or transformed temporarily. You can use permanent tables for this kind of data manipulation but temporary tables are useful when you need automatic cleanup, reduced risk of naming conflicts, isolation and security, and resource management for query performance.&lt;/p&gt;

</description>
      <category>mariadb</category>
      <category>sql</category>
      <category>database</category>
      <category>tips</category>
    </item>
    <item>
      <title>Why do We Need Databases and SQL?</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Thu, 07 Mar 2024 16:34:01 +0000</pubDate>
      <link>https://forem.com/alejandro_du/why-do-we-need-databases-and-sql-1loa</link>
      <guid>https://forem.com/alejandro_du/why-do-we-need-databases-and-sql-1loa</guid>
      <description>&lt;p&gt;SQL has a long and proven history. It has survived the fuss around NoSQL. And even if not perfect, it has demonstrated to be the best available language for data. This is no surprise! The story began in the 1960s with the development of databases—an era marked by the introduction of Integrated Data Store (IDS) at General Electric. However, it was Edgar Codd's relational model that revolutionized data handling. His model, which turned data into a series of tables (or more strictly, relations), has influenced database systems ever since. This era also saw the birth of SQL (Structured Query Language), which became the standard language for interacting with relational databases, including MariaDB and other.&lt;/p&gt;

&lt;h2&gt;
  
  
  The utility of relational database systems
&lt;/h2&gt;

&lt;p&gt;So, why do we need all this database stuff? Let's imagine you're building an app, maybe a simple to-do list to keep track of your daily tasks. Initially, you might think, "Why not just save each task directly to a file?" After all, my programming language has constructs and libraries to save and read data from disk. Also, implementing this seems straightforward: create a task, write it to a file; delete a task, remove it from the file. These are good points, however as your app gains traction, users start to aggregate, and suddenly, you have thousands of users trying to add, delete, and modify tasks simultaneously. At this point, the simplicity of files becomes fragile. Imagine one user is updating a task at the exact moment another tries to delete it. Or maybe two users are editing the same task at the same time. With a simple file system, you're likely to end up with corrupted or lost data because there's no inherent mechanism to handle such conflicts.&lt;/p&gt;

&lt;p&gt;Databases handle these situations gracefully through the &lt;em&gt;ACID properties&lt;/em&gt;. Essentially, a set of principles ensures that even if your app crashes midway through an update, the data remains consistent and no half-completed tasks are left hanging. Back to the to-do app example, imagine trying to move your task "Buy groceries" from &lt;code&gt;pending&lt;/code&gt; to &lt;code&gt;completed&lt;/code&gt; which requires also changing the &lt;code&gt;last_updated&lt;/code&gt; property, but your app crashes right in the middle. With a relational database, it's all or nothing—either the task is marked complete and the &lt;code&gt;last_updated&lt;/code&gt; property reflects the new time value, or it's like you never tried to update it in the first place, avoiding those incorrect half-states.&lt;/p&gt;

&lt;p&gt;Now, let's consider data relationships. In your app, tasks might belong to different categories or users. In a file system, maintaining these relationships is cumbersome. You might end up with a separate file for each category or user, but then how do you quickly find all tasks across categories or ensure two users don't end up with the same task ID? Databases have the ability to manage complex relationships, making it easy to query all tasks for a specific user or category, or even more complex queries like "show me the number of completed tasks for user U grouped by category C during the last month."&lt;/p&gt;

&lt;p&gt;Security is another biggie. In a file system, if someone gains access to your files, they have your data. Databases offer robust security features, like access controls and encryption, safeguarding your data from unauthorized eyes.&lt;/p&gt;

&lt;p&gt;And then there's the issue of growth. Your simple to-do app might evolve into a complex enterprise project management tool over time. With a file system, every change can feel like renovating a building with people still inside. Databases are built to be flexible and scalable, meaning they're designed to grow with your needs, whether you're adding new features or handling more users.&lt;/p&gt;

&lt;p&gt;In the end, choosing a database over a simple file system is about preparing for success while standing on solid ground. It's about ensuring that as your app grows, your data remains secure, consistent, and manageable, and your users happy. After all, no one likes losing their to-do list to a random crash or waiting forever for their tasks to load because the system is bogged down handling conflicts and searches!&lt;/p&gt;

&lt;h2&gt;
  
  
  A bit of history
&lt;/h2&gt;

&lt;p&gt;It was Edgar Codd who proposed the Relational Model for databases and, since he was a mathematician, formalized the concepts creating what is called Relational Algebra and Relational Calculus. All this was theoretical, until IBM and others started to implement the concepts in academic and research projects. They also wanted to come up with a standard language for querying data in relational databases. At first they invented QUEL (Querying Using the English Language) at the University of California, Berkeley. At IBM, researchers wanted to come up with their own language and started a project which I perceive more as a game between colleagues called SQUARE (Specifying Queries Using a Relational Environment). This led to a query language that had a scientific-like notation with subindexes and super-indexes which was hard to type on computer keyboards. To solve this, they redefined the language to only use standard characters and in an ingenious and probably friendly mockery way called it SEQUEL. This name however, was a trademark in the UK which prevented them from using it. They removed the vowels in SEQUEL and boom! SQL was born. By 1986, SQL would become an ISO and ANSI standard.&lt;/p&gt;

&lt;p&gt;As a curious historical remark, although their inventors had to rename SEQUEL to SQL, they continued to call it "sequel". Even today many software developers and IT professionals continue to pronounce it "sequel". The name Structured Query Language (SQL) would appear later.&lt;/p&gt;

&lt;h2&gt;
  
  
  The utility of SQL
&lt;/h2&gt;

&lt;p&gt;SQL is a declarative language, meaning that you specify what you want to get and not how to get it. The database is in charge of doing whatever needs to be done to get the data requested. SQL isolates database complexity. A database is a complex piece of software with tons of algorithms implemented in it. This algorithms deal with different ways to get data stored in disk or memory. Different algorithms are more efficient in different circumstances which includes different queries and different datasets.&lt;/p&gt;

&lt;p&gt;For example, in MariaDB, a component called the &lt;a href="https://mariadb.com/kb/en/query-optimizations/" rel="noopener noreferrer"&gt;query optimizer&lt;/a&gt; is in charge of deciding what algorithms to use given a SQL query and stats gathered on the actual data. The query optimizer analyzes the SQL query, the data structures, the database schema, and the statistical distribution of the data. It then decides whether to use an index, which joining algorithm is the best, and how to sequence the operations. This process involves a remarkable amount of complexity and mathematical precision, all of which the database abstractly manages for you. As a developer you only need to worry about constructing the query to get the data you need and let the database figure out whether to use or not an index (with some datasets, not using an index could be faster), B-trees, hash tables, and even whether to add the data to an in-memory cache, as well as many other things.&lt;/p&gt;

&lt;p&gt;SQL also allows you to handle writes, that is, creating and updating data. It also allows you to define the schema of the database, or in short and over-simplifying, the tables and their column structure. In fact there's much more that SQL allows you to do and its functionality can be divided in four categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data definition language (DDL):&lt;/strong&gt; Creating and manipulating the schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data manipulation language (DML):&lt;/strong&gt; Inserting, updating, and deleting data from the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data query language (DQL):&lt;/strong&gt; Retrieving data from the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data control language (DCL):&lt;/strong&gt; Dealing with rights and permissions over the database and its objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my more than 15 years of experience in the industry, I have rarely seen the previous categories used in a work environment, with the exception of DDL to refer to activities related to handling database schema updates. These categories are useful mostly in academic circles or in teams implementing relational database management software. However, it's good to know that these terms exist and are used by others as it helps in discussions around database technology. With this in mind, let me briefly touch on one of such discussions.&lt;/p&gt;

&lt;p&gt;Some would say that developers have to deal only with DML and DQL while DDL and DCL are a concern of DBAs. In practice, this division is not so easy to make. Developers need to understand how database objects (like tables and columns) are created and how access to this objects is managed. However, it is true that developers spend most of their time writing SQL statements to modify and query data. You'll see that this book focuses on DML and DQL while explaining other categories as they are needed. On the other hand, DBA's are experts on everything database—from infrastructure and general database management to SQL query optimization and migration, a DBA is always a valuable brain to have in your team.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So in conclusion, databases solve real problems that application developers face, thanks to their ability to ensure data integrity through ACID properties, manage complex relationships, and provide robust security features. I only scratched the surface here, but this should be enough to give the novice IT practitioners a quick refresh on the importance of relational databases and SQL.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Fast Analytics with MariaDB ColumnStore</title>
      <dc:creator>Alejandro Duarte</dc:creator>
      <pubDate>Fri, 19 Jan 2024 13:33:36 +0000</pubDate>
      <link>https://forem.com/alejandro_du/fast-analytics-with-mariadb-columnstore-1jfp</link>
      <guid>https://forem.com/alejandro_du/fast-analytics-with-mariadb-columnstore-1jfp</guid>
      <description>&lt;p&gt;Slow query times in large datasets are a common headache in database management. &lt;a href="https://mariadb.com/products/columnstore/" rel="noopener noreferrer"&gt;MariaDB ColumnStore&lt;/a&gt; offers a neat way out of this. It's a columnar storage engine that significantly speeds up data analytics. Typically, you can improve query performance in relational databases by adding appropriate indexes. However, maintaining indexes is hard, especially with ad-hoc queries where you don't really know where indexes are going to be needed. ColumnStore eases this pain. It's as if you had an index on each column, but without the hassle of creating and updating them. The price to pay? Well, inserts are not as fast as with &lt;a href="https://mariadb.com/kb/en/innodb/" rel="noopener noreferrer"&gt;InnoDB&lt;/a&gt;, so this is not the best option for operational/transactional databases but rather for analytical ones. &lt;a href="https://mariadb.com/docs/server/ref/col/cli/cpimport/" rel="noopener noreferrer"&gt;Bulk inserts&lt;/a&gt; are very fast though.&lt;/p&gt;

&lt;p&gt;There's plenty of &lt;a href="https://mariadb.com/docs/columnstore/" rel="noopener noreferrer"&gt;online documentation&lt;/a&gt; about ColumnStore, so I won't go through all the details on how it works or how to deploy it on production. Instead, in this article, I'll show you how to try MariaDB ColumnStore on your computer using &lt;a href="https://hub.docker.com/r/mariadb/columnstore" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://mariadb.com/docs/server/connect/clients/mariadb-client/" rel="noopener noreferrer"&gt;&lt;code&gt;mariadb&lt;/code&gt; command line tool&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.docker.com/products/docker-desktop/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting up MariaDB ColumnStore
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Run a container with MariaDB + ColumnStore:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3307:3306 &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;PM1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mcs1 &lt;span class="nt"&gt;--hostname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mcs1 &lt;span class="nt"&gt;--name&lt;/span&gt; mcs1 mariadb/columnstore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command runs a new Docker container using the official ColumnStore image, with several specified options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker run&lt;/code&gt;: Starts a new Docker container.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-d&lt;/code&gt;: Runs the container in detached mode (in the background).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-p 3307:3306&lt;/code&gt;: Maps port 3307 on the host (your computer) to port 3306 inside the container. This makes the database accessible on port &lt;code&gt;3307&lt;/code&gt; on the host machine.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-e PM1=mcs1&lt;/code&gt;: The &lt;code&gt;PM1&lt;/code&gt; environment variable &lt;code&gt;PM1&lt;/code&gt; specifies the primary database node (&lt;code&gt;mcs1&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--hostname=mcs1&lt;/code&gt;: Sets the hostname of the container to &lt;code&gt;mcs1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--name mcs1&lt;/code&gt;: Names the container &lt;code&gt;mcs1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mariadb/columnstore&lt;/code&gt;: Specifies the Docker image to use, in this case, an image for MariaDB with the ColumnStore storage engine.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Provision ColumnStore:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; mcs1 provision mcs1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command &lt;code&gt;docker exec&lt;/code&gt; is used to interact with a running Docker container. This is what each option does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker exec&lt;/code&gt;: Executes a command in a running container.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-it&lt;/code&gt;: This option ensures the command is run in interactive mode with a terminal.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcs1&lt;/code&gt; (first occurrence): This is the name of the Docker container in which the command is to be executed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;provision mcs1&lt;/code&gt; This is the specific command being executed inside the container. &lt;code&gt;provision&lt;/code&gt; is a script included in the Docker image that initialize and configure the MariaDB ColumnStore environment within the container. The argument &lt;code&gt;mcs1&lt;/code&gt; is passed to the &lt;code&gt;provision&lt;/code&gt; command to specify the host for the MariaDB server within the Docker container.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Connect to the MariaDB server using the default credentials defined in the MariaDB ColumnStore Docker image:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mariadb &lt;span class="nt"&gt;-h&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;-P&lt;/span&gt; 3307 &lt;span class="nt"&gt;-u&lt;/span&gt; admin &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s1"&gt;'C0lumnStore!'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Check that ColumnStore is available as a storage engine by running the following SQL sentence:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SHOW ENGINES&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting up a demo database
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create the &lt;code&gt;operations&lt;/code&gt; database and its InnoDB tables:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;InnoDB&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="n"&gt;RLIKE&lt;/span&gt; &lt;span class="s1"&gt;'[0-9]+'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;254&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="nb"&gt;ENUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Consultation'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Follow-up'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Preventive'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Chronic'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;ENUM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Scheduled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Canceled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Completed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'No Show'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;doctor_id&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="nb"&gt;UNSIGNED&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;fk_appointments_doctors&lt;/span&gt; &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doctor_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;InnoDB&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create the &lt;code&gt;analytics&lt;/code&gt; database and its ColumnStore table:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="nb"&gt;UNSIGNED&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;254&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;doctor_id&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="nb"&gt;UNSIGNED&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ColumnStore&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use the same database (or schema, they are &lt;a href="https://mariadb.com/kb/en/create-database/" rel="noopener noreferrer"&gt;synonyms&lt;/a&gt; in MariaDB) for both the InnoDB and ColumnStore tables if you prefer. Use a different name for the ColumnStore table if you opt for this alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inserting demo data
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Insert a few doctors:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"Maria"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"Jane"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create a new file with the name &lt;code&gt;test_data_insert.py&lt;/code&gt; with the following content:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="c1"&gt;# Function to generate a random date within a given range
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;random_date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="c1"&gt;# Function to execute a given SQL command using MariaDB
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Write the SQL command to a temporary file
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp.sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Execute the SQL command using the MariaDB client
&lt;/span&gt;    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mariadb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-h&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-P&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3307&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-u&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-pC0lumnStore!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-e&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source temp.sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="c1"&gt;# Remove the temporary file
&lt;/span&gt;    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp.sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generating and inserting data...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Total number of rows to be inserted
&lt;/span&gt;&lt;span class="n"&gt;total_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4000000&lt;/span&gt;
&lt;span class="c1"&gt;# Number of rows to insert in each batch
&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;

&lt;span class="c1"&gt;# Possible values for the 'reason' column and their associated weights for random selection
&lt;/span&gt;&lt;span class="n"&gt;reasons&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Consultation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Follow-up&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Preventive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chronic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;reason_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Possible values for the 'status' column and their associated weights for random selection
&lt;/span&gt;&lt;span class="n"&gt;statuses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scheduled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Canceled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No Show&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;status_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Possible values for the 'doctor_id' column and their associated weights for random selection
&lt;/span&gt;&lt;span class="n"&gt;doctors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;doctors_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# List of patient names
&lt;/span&gt;&lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Patient_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="c1"&gt;# Insert data in batches
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch_start&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;batch_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate data for each row in the batch
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patient_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;random_date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reasons&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason_weights&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statuses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status_weights&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;doctor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doctors_weights&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Append the generated row to the batch
&lt;/span&gt;        &lt;span class="n"&gt;batch_values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doctor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# SQL command to insert the batch of data into the 'appointments' table
&lt;/span&gt;    &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE operations;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO appointments (name, phone_number, email, time, reason, status, doctor_id) VALUES &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Execute the SQL command
&lt;/span&gt;    &lt;span class="nf"&gt;execute_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Print progress
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Inserted up to row &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data insertion complete.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Insert 4 million appointments by running the Python script:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 test_data_insert.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Populate the ColumnStore table by connecting to the database and running:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;appointments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;doctor_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;doctor_id&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;appointments&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Run cross-engine SQL queries
&lt;/h2&gt;

&lt;p&gt;MariaDB ColumnStore is designed to run in a cluster of multiple servers. It is there where you see massive performance gains in analytical queries. However, we can also see this in action with the single-node setup of this article.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the following query and pay attention to the time it needs to complete (make sure it queries the &lt;code&gt;operations&lt;/code&gt; database):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;appointments&lt;/span&gt; &lt;span class="c1"&gt;-- use the InnoDB table&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;doctor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'Scheduled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Canceled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Completed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'No Show'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On my machine, it took around 3 seconds.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Now modify the query to use the ColumnStore table instead (in the &lt;code&gt;analytics&lt;/code&gt; database):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;appointments&lt;/span&gt; &lt;span class="c1"&gt;-- use the ColumnStore table&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;doctor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'Scheduled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Canceled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Completed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'No Show'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;doctors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It takes less than a second. Of course, you can speed up the first query by adding an index in this simplistic example, but imagine the situation in which you have hundreds of tables—it will become harder and harder to manage indexes. ColumnStore removes this complexity.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
