<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Hamed Karbasi</title>
    <description>The latest articles on Forem by Hamed Karbasi (@hoptical).</description>
    <link>https://forem.com/hoptical</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1050129%2Fecf5ea92-076e-46bd-a7ba-b42ba181c3c8.jpg</url>
      <title>Forem: Hamed Karbasi</title>
      <link>https://forem.com/hoptical</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hoptical"/>
    <language>en</language>
    <item>
      <title>ClickHouse Advanced Tutorial: Apply CDC from MySQL to ClickHouse</title>
      <dc:creator>Hamed Karbasi</dc:creator>
      <pubDate>Thu, 15 Jun 2023 20:48:30 +0000</pubDate>
      <link>https://forem.com/hoptical/clickhouse-advanced-tutorial-apply-cdc-from-mysql-to-clickhouse-44na</link>
      <guid>https://forem.com/hoptical/clickhouse-advanced-tutorial-apply-cdc-from-mysql-to-clickhouse-44na</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Suppose that you have a database handling &lt;a href="https://en.wikipedia.org/wiki/Online_transaction_processing" rel="noopener noreferrer"&gt;OLTP&lt;/a&gt; queries. To tackle intensive analytical BI reports, you set up an &lt;a href="https://en.wikipedia.org/wiki/Online_analytical_processing" rel="noopener noreferrer"&gt;OLAP&lt;/a&gt;-friendly database such as ClickHouse. How do you synchronize your follower database (which is ClickHouse here)? What challenges should you be prepared for?&lt;/p&gt;

&lt;p&gt;Synchronizing two or more databases in a data-intensive application is one of the usual routines you may have encountered before or are dealing with now. Thanks to Change Data Capture (CDC) and technologies such as Kafka, this process is not sophisticated anymore. However, depending on the databases you’re utilizing, it could be challenging if the source database works in the OLTP paradigm and the target in the OLAP. In this article, I will walk through this process from MySQL as the source to ClickHouse as the destination. Although I’ve limited this article to those technologies, it’s pretty generalizable to similar cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Design Overview
&lt;/h2&gt;

&lt;p&gt;Contrary to what it sounds like, it’s quite straightforward. The database changes are captured via &lt;a href="https://debezium.io/" rel="noopener noreferrer"&gt;Debezium&lt;/a&gt; and published as events on Apache Kafka. ClickHouse consumes those changes, in per-partition order, via the &lt;a href="https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka/" rel="noopener noreferrer"&gt;Kafka Engine&lt;/a&gt;. Real-time and eventually consistent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2950%2F1%2AUBnMzAphYgnyM79_gcZv7g.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2950%2F1%2AUBnMzAphYgnyM79_gcZv7g.jpeg"&gt;&lt;/a&gt;&lt;br&gt;CDC Architecture
  &lt;/p&gt;

&lt;h2&gt;
  
  
  Case Study
&lt;/h2&gt;

&lt;p&gt;Imagine that we have an &lt;em&gt;orders&lt;/em&gt; table in MySQL with the following DDL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`orders`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`id`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;InnoDB&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;CHARSET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;latin1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Users may create records, update any column, or delete whole records. We want to capture those changes and sink them into ClickHouse to keep it synchronized.&lt;/p&gt;

&lt;p&gt;We’re going to use Debezium &lt;em&gt;v2.1&lt;/em&gt; and the &lt;em&gt;ReplacingMergeTree&lt;/em&gt; engine in ClickHouse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: CDC with Debezium
&lt;/h3&gt;

&lt;p&gt;Most databases keep a log to which every operation is written before it is applied to the data (the Write-Ahead Log, or WAL). In MySQL, this file is called the Binlog. If you read that file, then parse and apply it to your target database, you’re following the Change Data Capture (CDC) approach.&lt;/p&gt;

&lt;p&gt;CDC is one of the best ways to synchronize two or more heterogeneous databases. It’s real-time, eventually consistent, and spares you costlier alternatives such as batch backfills with Airflow. No matter what happens on the source, you can capture it in order and stay consistent with the original (eventually, of course!)&lt;/p&gt;

&lt;p&gt;Debezium is a well-known tool for reading and parsing the Binlog. It simply integrates with Kafka Connect as a connector and produces every change on a Kafka topic.&lt;/p&gt;

&lt;p&gt;To do so, you have to enable log-bin on the MySQL database and set up Kafka Connect, Kafka, and Debezium accordingly. Since this is well explained in other articles like &lt;a href="https://rmoff.net/2018/03/24/streaming-data-from-mysql-into-kafka-with-kafka-connect-and-debezium/" rel="noopener noreferrer"&gt;this&lt;/a&gt; or &lt;a href="https://medium.com/nagoya-foundation/simple-cdc-with-debezium-kafka-a27b28d8c3b8" rel="noopener noreferrer"&gt;this&lt;/a&gt;, I’ll only focus on the Debezium configuration customized for our purpose: capturing the changes in a form that is functional and parsable by ClickHouse.&lt;/p&gt;
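&lt;p&gt;As a quick check before registering the connector (a sketch; exact settings depend on your MySQL version and deployment), you can verify that the Binlog is enabled and row-based:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Should return ON; otherwise start mysqld with log-bin set in my.cnf
SHOW VARIABLES LIKE 'log_bin';

-- Debezium needs row-based logging to capture full row images
SHOW VARIABLES LIKE 'binlog_format';  -- expected: ROW
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;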

&lt;p&gt;Before showing the whole configuration, we should discuss three necessary configs:&lt;/p&gt;

&lt;h3&gt;
  
  
  Extracting New Record State
&lt;/h3&gt;

&lt;p&gt;By default, Debezium emits, for every operation, a record containing both the &lt;em&gt;before&lt;/em&gt; and &lt;em&gt;after&lt;/em&gt; states, which is hard to parse with the ClickHouse Kafka table. Additionally, it creates tombstone records (i.e., records with a Null value) for delete operations (again, unparsable by ClickHouse). The entire behavior is demonstrated in the table below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A1lEoL6n7uxLRy-lrMre1hQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A1lEoL6n7uxLRy-lrMre1hQ.png"&gt;&lt;/a&gt;&lt;br&gt;Records state for different operations in the default configuration.
  &lt;/p&gt;
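&lt;p&gt;To make this concrete, a simplified sketch of the default Debezium value for an update on our &lt;em&gt;orders&lt;/em&gt; table looks roughly like this (envelope metadata fields such as &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;ts_ms&lt;/code&gt; omitted; the values are made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "before": { "id": 1, "status": "pending", "price": "100" },
  "after":  { "id": 1, "status": "shipped", "price": "100" },
  "op": "u"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;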

&lt;p&gt;We use the ExtractNewRecordState transformer in the Debezium configuration to handle this problem. With this option, Debezium only keeps the &lt;em&gt;after&lt;/em&gt; state for &lt;strong&gt;create/update&lt;/strong&gt; operations and disregards the before state. As a drawback, however, it drops the &lt;strong&gt;delete&lt;/strong&gt; record containing the previous state, as well as the tombstone record mentioned earlier. In other words, you won’t capture the delete operation anymore. Don’t worry! We’ll tackle it in the next section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"transforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unwrap"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"transforms.unwrap.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.debezium.transforms.ExtractNewRecordState"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The picture below shows how the &lt;em&gt;before&lt;/em&gt; state is dropped and the &lt;em&gt;after&lt;/em&gt; state is flattened when using the &lt;em&gt;ExtractNewRecordState&lt;/em&gt; configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A2Rtrofg166dxl1xOdqwkzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A2Rtrofg166dxl1xOdqwkzw.png"&gt;&lt;/a&gt;&lt;br&gt;Left: Record without ExtractNewRecord config; Right: Record with ExtractNewRecord config
  &lt;/p&gt;

&lt;h3&gt;
  
  
  Rewriting Delete Events
&lt;/h3&gt;

&lt;p&gt;To capture delete operations, we must add the &lt;strong&gt;rewrite&lt;/strong&gt; config as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"transforms.unwrap.delete.handling.mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"rewrite"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this config, Debezium adds a &lt;code&gt;__deleted&lt;/code&gt; field, which is true for delete operations and false for the others. Hence, a deletion event contains the previous state along with a &lt;code&gt;__deleted: true&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2ABU7--CdPTPySFR2DzjF72A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2ABU7--CdPTPySFR2DzjF72A.png"&gt;&lt;/a&gt;&lt;br&gt;The field __deleted is added after using the rewrite configuration
  &lt;/p&gt;
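&lt;p&gt;For illustration, a delete event after the unwrap and rewrite transforms would look roughly like this flattened record (a sketch; the values are made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "id": 1,
  "status": "shipped",
  "price": "100",
  "__deleted": "true"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;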

&lt;h3&gt;
  
  
  Handling Non-Primary Keys Update
&lt;/h3&gt;

&lt;p&gt;With the configurations above, updating a record (any column except the primary key) emits a simple record with the new state. Having another relational database with the same DDL as the destination is fine, since the updated record replaces the previous one. But in the case of ClickHouse, the story goes wrong!&lt;/p&gt;

&lt;p&gt;In our example, the source uses &lt;em&gt;id&lt;/em&gt; as the primary key, while ClickHouse uses &lt;em&gt;id&lt;/em&gt; and &lt;em&gt;status&lt;/em&gt; as its order key. Replacement and uniqueness are only guaranteed for records with the same &lt;em&gt;id&lt;/em&gt; and &lt;em&gt;status&lt;/em&gt;! So what happens if the source updates the &lt;em&gt;status&lt;/em&gt; column? We end up with duplicate records in ClickHouse: equal &lt;em&gt;ids&lt;/em&gt; but different &lt;em&gt;statuses&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;Fortunately, there is a way. By default, Debezium emits a delete record and a create record for updates on primary keys. So if the source updates the &lt;em&gt;id&lt;/em&gt;, it emits a &lt;strong&gt;delete&lt;/strong&gt; record with the previous &lt;em&gt;id&lt;/em&gt; and a &lt;strong&gt;create&lt;/strong&gt; record with the new &lt;em&gt;id&lt;/em&gt;. The previous record, carrying &lt;code&gt;__deleted=true&lt;/code&gt;, replaces our stale record in ClickHouse, and the records implying deletion can then be filtered out in the view. We can extend this behavior to other columns with the option below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"message.key.columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inventory.orders:id;inventory.orders:status"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, by putting together all the options above along with the usual ones, we’ll have a fully functional Debezium configuration that captures changes in a form ClickHouse can handle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mysql-connector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"connector.class"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.debezium.connector.mysql.MySqlConnector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"database.hostname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mysql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"database.include.list"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inventory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"database.password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mypassword"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"database.port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3306"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"database.server.id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"database.server.name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dbz.inventory.v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"database.user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"root"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message.key.columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inventory.orders:id;inventory.orders:status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mysql-connector-v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"schema.history.internal.kafka.bootstrap.servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"broker:9092"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"schema.history.internal.kafka.topic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dbz.inventory.history.v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"snapshot.mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schema_only"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"table.include.list"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inventory.orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"topic.prefix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dbz.inventory.v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"transforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unwrap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"transforms.unwrap.delete.handling.mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rewrite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"transforms.unwrap.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.debezium.transforms.ExtractNewRecordState"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Important: How to choose the Debezium key columns?
&lt;/h3&gt;

&lt;p&gt;By changing the key columns of the connector, Debezium uses those columns as the topic keys instead of the default primary key of the source table. Consequently, different operations related to a single database record may end up in different partitions in Kafka. Since records lose their relative order across partitions, this can lead to inconsistency in ClickHouse unless you ensure that the ClickHouse order keys and the Debezium message keys are the same.&lt;/p&gt;

&lt;p&gt;The rule of thumb is as below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Design the partition key and order key based on your desired table design.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Trace the partition and sort keys back to their source columns, in case they are computed during materialization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Take the union of all those columns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define the result of step 3 as the &lt;em&gt;message.key.columns&lt;/em&gt; in the Debezium connector configuration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check whether the ClickHouse sort key has all those columns. If not, add them.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
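&lt;p&gt;Applied to our &lt;em&gt;orders&lt;/em&gt; example: the desired ClickHouse sort key is (&lt;em&gt;id&lt;/em&gt;, &lt;em&gt;status&lt;/em&gt;), both columns come directly from the source table, so their union becomes the message key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"message.key.columns": "inventory.orders:id;inventory.orders:status"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;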

&lt;h2&gt;
  
  
  Step 2: ClickHouse Tables
&lt;/h2&gt;

&lt;p&gt;ClickHouse can sink Kafka records into a table by utilizing the &lt;a href="https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka/" rel="noopener noreferrer"&gt;Kafka Engine&lt;/a&gt;. We need to define three tables: the Kafka table, the consumer materialized view, and the main table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kafka Table
&lt;/h3&gt;

&lt;p&gt;The Kafka table defines the record structure and the Kafka topic to read from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kafka_orders&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`__deleted`&lt;/span&gt; &lt;span class="k"&gt;Nullable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Kafka&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'broker:9092'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'inventory.orders'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'clickhouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'AvroConfluent'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;format_avro_schema_registry_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'http://schema-registry:8081'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Consumer Materializer
&lt;/h3&gt;

&lt;p&gt;Every record in the Kafka table is read only once: its consumer group bumps the offset, so we can’t read it twice. Therefore, we define a main table and materialize every Kafka table record into it via a materialized view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consumer__orders&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream_orders&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`__deleted`&lt;/span&gt; &lt;span class="k"&gt;Nullable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;__deleted&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;__deleted&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kafka_orders&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Main Table
&lt;/h3&gt;

&lt;p&gt;The main table has the source structure plus the &lt;code&gt;__deleted&lt;/code&gt; field. I’m using a ReplacingMergeTree, since we need to replace stale records with their deleted or updated versions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream_orders&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`__deleted`&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ReplacingMergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  View Table
&lt;/h3&gt;

&lt;p&gt;Finally, we need to filter out every deleted record (since we don’t want to see them) and keep only the most recent record when several share the same sort key. This can be tackled with the &lt;em&gt;FINAL&lt;/em&gt; modifier. But to avoid writing the filter and FINAL in every query, we can define a simple view to do the job implicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`__deleted`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream_orders&lt;/span&gt;
&lt;span class="k"&gt;FINAL&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;__deleted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'false'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: It’s inefficient to use FINAL for every query, especially in production. You can use aggregations to retrieve the latest records, or wait for ClickHouse to merge records in the background.&lt;/em&gt;&lt;/p&gt;
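&lt;p&gt;As a sketch of the aggregation alternative, assuming you extend the main table with a version column (e.g., a hypothetical &lt;code&gt;ts_ms&lt;/code&gt; taken from the Debezium source metadata), you could read the latest visible state without FINAL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Latest state per id; ts_ms is a hypothetical version column
SELECT
    id,
    argMax(status, ts_ms) AS status,
    argMax(price, ts_ms) AS price
FROM default.stream_orders
GROUP BY id
HAVING argMax(__deleted, ts_ms) = 'false'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;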

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we saw how to synchronize ClickHouse with MySQL via CDC, handling deletes with a soft-delete approach and preventing duplicates by aligning the Debezium message keys with the ClickHouse sort key.&lt;/p&gt;

</description>
      <category>cdc</category>
      <category>clickhouse</category>
      <category>mysql</category>
      <category>debezium</category>
    </item>
    <item>
      <title>ClickHouse Advanced Tutorial: Performance Comparison with MySQL</title>
      <dc:creator>Hamed Karbasi</dc:creator>
      <pubDate>Thu, 08 Jun 2023 10:14:24 +0000</pubDate>
      <link>https://forem.com/hoptical/clickhouse-advanced-tutorial-performance-comparison-with-mysql-2cj2</link>
      <guid>https://forem.com/hoptical/clickhouse-advanced-tutorial-performance-comparison-with-mysql-2cj2</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;p&gt; 1. Introduction&lt;/p&gt;

&lt;p&gt;       1.1. OLTP&lt;/p&gt;

&lt;p&gt;       1.2. OLAP&lt;/p&gt;

&lt;p&gt;       1.3. MySQL&lt;/p&gt;

&lt;p&gt;       1.4. ClickHouse&lt;br&gt;
 2. Comparison Case Study&lt;/p&gt;

&lt;p&gt;       2.1. System Specification&lt;/p&gt;

&lt;p&gt;       2.2. Benchmark Flow&lt;/p&gt;

&lt;p&gt;       2.3. Queries&lt;br&gt;
 3. Results&lt;/p&gt;

&lt;p&gt;       3.1. Dataset Load&lt;/p&gt;

&lt;p&gt;       3.2. Table Size&lt;/p&gt;

&lt;p&gt;       3.3. Read Queries Execution&lt;/p&gt;

&lt;p&gt;       3.4. Update Query Execution&lt;br&gt;
 4. Conclusion&lt;/p&gt;


  &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.azquotes.com%2Fpicture-quotes%2Fquote-it-s-more-important-to-know-your-weaknesses-than-your-strengths-ray-l-hunt-102-72-52.jpg"&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Nothing is perfect. In terms of databases, you can't expect the best performance for every task and query from your deployed database. The vital step, as a software developer, is to know its strengths and weaknesses and how to deal with them.&lt;/p&gt;

&lt;p&gt;In this post, I will compare ClickHouse, as a representative of OLAP databases, with MySQL, representing OLTP. This will help us choose better solutions for our challenges according to our conditions and needs. Before jumping into the main context, let's discuss OLTP, OLAP, MySQL, and ClickHouse.&lt;/p&gt;
&lt;h3&gt;
  
  
  OLTP
&lt;/h3&gt;

&lt;p&gt;OLTP stands for Online Transaction Processing and is used for day-to-day operations, such as processing orders and updating customer information. OLTP is best for short, fast transactions and is optimized for quick response times. It is essential to ensure data accuracy and consistency and provide an efficient way to access data.&lt;/p&gt;
&lt;h3&gt;
  
  
  OLAP
&lt;/h3&gt;

&lt;p&gt;OLAP stands for Online Analytical Processing and is used for data mining and analysis. It enables organizations to analyze large amounts of data from multiple perspectives and identify trends and patterns. OLAP is best for complex queries and data mining, and can provide insights that are impossible with traditional reporting tools.&lt;/p&gt;
&lt;h3&gt;
  
  
  MySQL
&lt;/h3&gt;

&lt;p&gt;MySQL is a popular open-source relational database management system, used by websites and applications to store and manage their data. It holds data in tables, lets users query that data, and provides features such as triggers, stored procedures, and views. MySQL is easy to use and offers a wide range of features for building powerful, efficient applications.&lt;/p&gt;
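&lt;p&gt;As a small sketch of one of those features (the table names are hypothetical), a view can hide a join behind a simple name:&lt;/p&gt;

```sql
-- Hypothetical view wrapping a join.
CREATE VIEW customer_orders AS
SELECT c.name, o.order_id, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;

-- Queried like an ordinary table.
SELECT * FROM customer_orders WHERE name = 'Alice';
```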
&lt;h3&gt;
  
  
  ClickHouse
&lt;/h3&gt;

&lt;p&gt;ClickHouse is an open-source column-oriented database management system developed by Yandex. It is designed to provide high performance for analytical queries.&lt;br&gt;
ClickHouse uses a SQL-like query language and supports various data types, including integers, strings, dates, and floats. It offers features such as clustering, distributed query processing, and fault tolerance, and also supports replication and data sharding. You can learn more about this database in the first part of this series:&lt;/p&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="/hoptical" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1050129%2Fecf5ea92-076e-46bd-a7ba-b42ba181c3c8.jpg" alt="hoptical"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/hoptical/clickhouse-basic-tutorial-an-introduction-52il" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;ClickHouse Basic Tutorial: An Introduction&lt;/h2&gt;
      &lt;h3&gt;Hamed Karbasi ・ Apr 13 '23&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#clickhouse&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#database&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#tutorial&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


&lt;p&gt;Now we can talk about the performance comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison Case Study
&lt;/h2&gt;

&lt;p&gt;I've followed the &lt;a href="https://github.com/ClickHouse/ClickBench" rel="noopener noreferrer"&gt;ClickBench&lt;/a&gt; repository methodology for the case study. It uses the &lt;em&gt;hits&lt;/em&gt; dataset, obtained from the actual traffic recording of one of the world's largest web analytics platforms. &lt;code&gt;hits&lt;/code&gt; contains about 100M rows in a single flat table. The repository studies more than 20 databases regarding dataset load time, elapsed time for 43 OLAP queries, and occupied storage. You can access their visualized results &lt;a href="https://benchmark.clickhouse.com/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ClickHouse" rel="noopener noreferrer"&gt;
        ClickHouse
      &lt;/a&gt; / &lt;a href="https://github.com/ClickHouse/ClickBench" rel="noopener noreferrer"&gt;
        ClickBench
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      ClickBench: a Benchmark For Analytical Databases
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;ClickBench: a Benchmark For Analytical Databases&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href="https://benchmark.clickhouse.com/" rel="nofollow noopener noreferrer"&gt;https://benchmark.clickhouse.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Discussion: &lt;a href="https://news.ycombinator.com/item?id=32084571" rel="nofollow noopener noreferrer"&gt;https://news.ycombinator.com/item?id=32084571&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Overview&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;This benchmark represents typical workload in the following areas: clickstream and traffic analysis, web analytics, machine-generated data, structured logs, and events data. It covers the typical queries in ad-hoc analytics and real-time dashboards.&lt;/p&gt;
&lt;p&gt;The dataset from this benchmark was obtained from the actual traffic recording of one of the world's largest web analytics platforms. It is anonymized while keeping all the essential distributions of the data. The set of queries was improvised to reflect the realistic workloads, while the queries are not directly from production.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Goals&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;The main goals of this benchmark are:&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Reproducibility&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;You can quickly reproduce every test in as little as 20 minutes (although some systems may take several hours) in a semi-automated way. The test setup is documented and uses inexpensive cloud VMs. The test process is documented in the form of a shell script, covering…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ClickHouse/ClickBench" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;To investigate ClickHouse and MySQL performance specifically, I separated 10M rows of the table and chose some of the predefined &lt;a href="https://github.com/ClickHouse/ClickBench/blob/main/mysql/queries.sql" rel="noopener noreferrer"&gt;queries&lt;/a&gt; that make our point clearer. Those queries are mainly OLAP-style, so they only show ClickHouse's strengths compared to MySQL (i.e., MySQL loses on all of them). Hence, I added other, OLTP-style queries showing the opposite. Although I've limited the benchmark to these two databases, you can generalize the concept to other row-oriented and column-oriented DBMSs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; This benchmark only clarifies the main differences between column-oriented and row-oriented databases regarding their performance and use cases. It should not be considered a reference for your use cases; you should run your own benchmarks with your own queries to make the best decision.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  System Specification
&lt;/h3&gt;

&lt;p&gt;Databases are installed on Ubuntu 22.04 LTS on a system with the below specifications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU: Intel® Core™ i7-10510U CPU @ 1.80GHz × 8&lt;/li&gt;
&lt;li&gt;RAM: 16 GiB&lt;/li&gt;
&lt;li&gt;Storage: 256 GiB SSD&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benchmark Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The database is created.&lt;/li&gt;
&lt;li&gt;The table is created with the &lt;a href="https://github.com/ClickHouse/ClickBench/blob/main/mysql/create.sql" rel="noopener noreferrer"&gt;defined DDL&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Data (&lt;code&gt;hits.tsv&lt;/code&gt;) is loaded into the table, and its time is measured.&lt;/li&gt;
&lt;li&gt;Queries are run, and each query's elapsed time is measured.&lt;/li&gt;
&lt;/ol&gt;
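&lt;p&gt;Assuming &lt;code&gt;hits.tsv&lt;/code&gt; sits in the working directory, step 3 can be sketched as follows (the ClickHouse statement is run in &lt;code&gt;clickhouse-client&lt;/code&gt;; MySQL needs &lt;code&gt;local_infile&lt;/code&gt; enabled):&lt;/p&gt;

```sql
-- ClickHouse: the client reads the local file and streams it to the server.
INSERT INTO hits FROM INFILE 'hits.tsv' FORMAT TSV;

-- MySQL equivalent.
LOAD DATA LOCAL INFILE 'hits.tsv' INTO TABLE hits;
```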

&lt;h3&gt;
  
  
  Queries
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Number&lt;/th&gt;
&lt;th&gt;Statement&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT COUNT(*) FROM hits;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OLAP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OLAP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate &amp;gt;= '2013-07-01' AND EventDate &amp;lt;= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL &amp;lt;&amp;gt; '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OLAP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OLAP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT EventTime, WatchID  FROM hits WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OLTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT Title, URL, Referer  FROM hits WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OLTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;code&gt;UPDATE hits SET Title='my title', URL='my url', Referer='my referer' WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OLTP&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;I'll study the results under four categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dataset Load&lt;/li&gt;
&lt;li&gt;Table Size&lt;/li&gt;
&lt;li&gt;Read Queries Execution&lt;/li&gt;
&lt;li&gt;Update Query Execution: I've discussed the update query (query number 7) separately since it needs more discussion and attention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dataset Load
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;th&gt;MySQL&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;65s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;11m35s&lt;/td&gt;
&lt;td&gt;x10.7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Thanks to its LSM-tree-based storage and sparse indexes, ClickHouse's load time is much faster than MySQL's, which relies on a B-tree. However, ClickHouse's insert efficiency shows up in bulk inserts rather than in many individual inserts: each insert creates an immutable part, and ClickHouse is reluctant to create, change, or remove parts for just a few rows.&lt;/p&gt;
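&lt;p&gt;In practice, this means preferring one large insert over many small ones, since every &lt;code&gt;INSERT&lt;/code&gt; statement produces a new immutable part (values elided for brevity):&lt;/p&gt;

```sql
-- Preferred: one statement, thousands of rows, one new part.
INSERT INTO hits VALUES (...), (...), (...);

-- Anti-pattern: N statements create N tiny parts that must be merged later.
INSERT INTO hits VALUES (...);
INSERT INTO hits VALUES (...);
```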

&lt;h3&gt;
  
  
  Table Size
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ClickHouse (GiB)&lt;/th&gt;
&lt;th&gt;MySQL (GiB)&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6.32&lt;/td&gt;
&lt;td&gt;x4.86&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The column-oriented structure enables &lt;a href="https://clickhouse.com/docs/en/about-us/distinctive-features#data-compression" rel="noopener noreferrer"&gt;&lt;em&gt;Data Compression&lt;/em&gt;&lt;/a&gt;, which is far less effective in row-oriented databases. That is why ClickHouse can do teams storing large amounts of data a practical favor by reducing their storage costs.&lt;/p&gt;
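&lt;p&gt;You can check the compression ratio of a ClickHouse table yourself through the &lt;code&gt;system.parts&lt;/code&gt; table; a sketch for the &lt;code&gt;hits&lt;/code&gt; table:&lt;/p&gt;

```sql
-- Compressed vs. uncompressed bytes across the table's active parts.
SELECT
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND table = 'hits';
```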

&lt;h3&gt;
  
  
  Read Queries Execution
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Number&lt;/th&gt;
&lt;th&gt;ClickHouse (s)&lt;/th&gt;
&lt;th&gt;MySQL (s)&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.005&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7.79&lt;/td&gt;
&lt;td&gt;x1558&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.030&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16.0&lt;/td&gt;
&lt;td&gt;x533.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.193&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4.35&lt;/td&gt;
&lt;td&gt;x22.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.600&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;180.93&lt;/td&gt;
&lt;td&gt;x69.58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0.01&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;x0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;0.011&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;x0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ClickHouse's &lt;a href="https://clickhouse.com/docs/en/optimize/sparse-primary-indexes#:~:text=At%20the%20very%20large%20scale,technique%20is%20called%20sparse%20index." rel="noopener noreferrer"&gt;sparse index&lt;/a&gt; and column-oriented structure have outperformed MySQL in all OLAP queries (numbers 1 to 4). That's why BI and Data Analysts would be more than happy with ClickHouse for their daily reports.&lt;/p&gt;

&lt;p&gt;However, MySQL wins the battle when it comes to OLTP queries (numbers 5 and 6). The B-tree index used by MySQL indeed performs better for point queries: short transactions that require only a few rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Update Query Execution
&lt;/h3&gt;

&lt;p&gt;For the update query (number 7), we have to execute a different statement in ClickHouse, as it doesn't support a plain &lt;code&gt;UPDATE&lt;/code&gt;; the &lt;code&gt;ALTER TABLE&lt;/code&gt; command has to be used:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;JavaEnable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;CounterID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;38&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;EventDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2013-07-15'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;UserID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'1387668437822950552'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;WatchID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'8899477221003616239'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Additionally, ClickHouse applies the update asynchronously. To see the result immediately, you have to run an &lt;code&gt;OPTIMIZE&lt;/code&gt; command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running the query number 7 statement from the queries table on MySQL, and the two statements above on ClickHouse, we get the results below:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Number&lt;/th&gt;
&lt;th&gt;ClickHouse (s)&lt;/th&gt;
&lt;th&gt;MySQL (s)&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Again, ClickHouse's aversion to mutations makes it the loser for real-time updates (and, similarly, deletes) compared to MySQL. Consequently, other methods, such as deduplication using ReplacingMergeTree, can be utilized to handle updates. You can find valuable resources in the links below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://clickhouse.com/docs/en/guides/developer/deduplication" rel="noopener noreferrer"&gt;Row-level Deduplication Strategies for Upserts and Frequent Updates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse" rel="noopener noreferrer"&gt;Handling Updates and Deletes in ClickHouse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://altinity.com/blog/2020/4/14/handling-real-time-updates-in-clickhouse" rel="noopener noreferrer"&gt;Handling Real-Time Updates in ClickHouse&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
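&lt;p&gt;As a minimal sketch of the deduplication approach (the table here is hypothetical), a &lt;code&gt;ReplacingMergeTree&lt;/code&gt; table keeps only the latest version of each key once parts are merged, and &lt;code&gt;FINAL&lt;/code&gt; forces deduplication at query time:&lt;/p&gt;

```sql
CREATE TABLE hits_latest
(
    WatchID UInt64,
    Title   String,
    Version UInt32
)
ENGINE = ReplacingMergeTree(Version)
ORDER BY WatchID;

-- An "update" is just inserting a newer version of the row...
INSERT INTO hits_latest VALUES (1, 'old title', 1);
INSERT INTO hits_latest VALUES (1, 'my title', 2);

-- ...and reading with FINAL returns only the latest version per key.
SELECT Title FROM hits_latest FINAL WHERE WatchID = 1;
```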

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, I benchmarked MySQL and ClickHouse databases to study some of their strengths and weaknesses that may help us choose a suitable solution. To summarize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MySQL performs better on point and OLTP queries.&lt;/li&gt;
&lt;li&gt;ClickHouse performs better on OLAP queries.&lt;/li&gt;
&lt;li&gt;ClickHouse is not designed for frequent updates and deletes; you have to handle them with deduplication methods.&lt;/li&gt;
&lt;li&gt;ClickHouse reduces storage costs thanks to its column-oriented structure.&lt;/li&gt;
&lt;li&gt;ClickHouse loads bulk inserts far faster than MySQL.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>clickhouse</category>
      <category>mysql</category>
      <category>database</category>
      <category>performance</category>
    </item>
    <item>
      <title>ClickHouse Basic Tutorial: Keys &amp; Indexes</title>
      <dc:creator>Hamed Karbasi</dc:creator>
      <pubDate>Fri, 02 Jun 2023 16:04:08 +0000</pubDate>
      <link>https://forem.com/hoptical/clickhouse-basic-tutorial-keys-indexes-5d7a</link>
      <guid>https://forem.com/hoptical/clickhouse-basic-tutorial-keys-indexes-5d7a</guid>
      <description>&lt;p&gt;In the previous parts, we saw an introduction to ClickHouse and its features. Furthermore, we learned about its different table engine families and their most usable members. In this part, I will walk through the special keys and indexes in ClickHouse, which can help reduce query latency and database load significantly.&lt;/p&gt;

&lt;p&gt;It should be said that these concepts only apply to the default table engine family: MergeTree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Primary Key
&lt;/h2&gt;

&lt;p&gt;ClickHouse indexes are based on &lt;em&gt;Sparse Indexing&lt;/em&gt;, an alternative to the &lt;a href="https://en.wikipedia.org/wiki/B-tree" rel="noopener noreferrer"&gt;B-Tree&lt;/a&gt; index utilized by traditional DBMSs. In a B-tree, every row is indexed, which is suitable for locating and updating a single row: the point queries common in OLTP tasks. This comes at the cost of slow high-volume inserts and high memory and storage consumption. On the contrary, the sparse index splits data into multiple &lt;em&gt;parts&lt;/em&gt;, each of which is divided into fixed-size blocks of rows called &lt;em&gt;granules&lt;/em&gt;. ClickHouse keeps one index entry per granule (a group of rows) instead of per row, and that's where the term &lt;em&gt;sparse index&lt;/em&gt; comes from. Given a query filtered on the primary key, ClickHouse looks for the matching granules and loads them into memory in parallel. That brings notable performance on the range queries common in OLAP tasks. Additionally, as data is stored column by column in multiple files, it can be compressed, resulting in much lower storage consumption.&lt;/p&gt;

&lt;p&gt;The sparse index is based on &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" rel="noopener noreferrer"&gt;LSM trees&lt;/a&gt;, allowing you to insert high volumes of data per second. All this comes at the cost of being unsuitable for point queries, which is not what ClickHouse is designed for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structure
&lt;/h3&gt;

&lt;p&gt;In the below figure, we can see how ClickHouse stores data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgwpv7hn78pjm0zmiv9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgwpv7hn78pjm0zmiv9y.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;ClickHouse Data Store Structure&lt;/cite&gt;
  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is split into multiple parts (by the ClickHouse default or a user-defined partition key).&lt;/li&gt;
&lt;li&gt;Parts are divided into granules, which is a logical concept: ClickHouse doesn't physically split data into granules; instead, it locates them via marks. Granules' locations (start and end) are recorded in mark files with the &lt;code&gt;mrk2&lt;/code&gt; extension.&lt;/li&gt;
&lt;li&gt;Index values are stored in the &lt;code&gt;primary.idx&lt;/code&gt; file, which contains one row per granule.&lt;/li&gt;
&lt;li&gt;Columns are stored as compressed blocks in &lt;code&gt;.bin&lt;/code&gt; files: one file per column in the &lt;code&gt;Wide&lt;/code&gt; format and a single file for all columns in the &lt;code&gt;Compact&lt;/code&gt; format. Whether a part is Wide or Compact is determined by ClickHouse based on its size.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let's see how ClickHouse finds the matching rows using primary keys:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ClickHouse finds the matching granule marks utilizing the &lt;code&gt;primary.idx&lt;/code&gt; file via the binary search.&lt;/li&gt;
&lt;li&gt;Looks into the mark files to find the granules' location in the &lt;code&gt;bin&lt;/code&gt; files.&lt;/li&gt;
&lt;li&gt;Loads the matching granules from the &lt;code&gt;bin&lt;/code&gt; files into the memory in parallel and looks for the matching rows in those granules using binary search.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Case Study
&lt;/h3&gt;

&lt;p&gt;To clarify the flow mentioned above, let's create a table and insert data into it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;projects&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;

    &lt;span class="nv"&gt;`project_id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`created_date`&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generateRandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'project_id Int32, name String, created_date Date'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10000000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;First, if you don't specify a primary key separately, ClickHouse considers the sort keys (in &lt;code&gt;ORDER BY&lt;/code&gt;) as the primary key. Hence, in this table, &lt;code&gt;project_id&lt;/code&gt; and &lt;code&gt;created_date&lt;/code&gt; form the primary key. Every time you insert data into this table, it will be sorted first by &lt;code&gt;project_id&lt;/code&gt; and then by &lt;code&gt;created_date&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If we look into the data structure stored on the hard drive, we face this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8u1cfv0vor7vj98dzrcj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8u1cfv0vor7vj98dzrcj.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Physical files stored in a part&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;We have five parts, one of which is &lt;code&gt;all_1_1_0&lt;/code&gt;. You can visit &lt;a href="https://kb.altinity.com/engines/mergetree-table-engine-family/part-naming-and-mvcc/#part-names--multiversion-concurrency-control" rel="noopener noreferrer"&gt;this link&lt;/a&gt; if you're curious about the naming convention. As you can see, columns are stored in &lt;code&gt;bin&lt;/code&gt; files, and mark files named after the primary-key columns sit alongside the &lt;code&gt;primary.idx&lt;/code&gt; file.&lt;/p&gt;
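&lt;p&gt;Instead of browsing the filesystem, you can also list these parts from SQL; a sketch:&lt;/p&gt;

```sql
-- Active parts of the projects table and whether each is Wide or Compact.
SELECT name, part_type, rows
FROM system.parts
WHERE table = 'projects' AND active;
```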

&lt;h4&gt;
  
  
  Filter on the first primary-key
&lt;/h4&gt;

&lt;p&gt;Now let's filter on &lt;code&gt;project_id&lt;/code&gt;, which is the first primary key, and explain its indexes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt3u2ool0pla83zjokbq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt3u2ool0pla83zjokbq.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Index analysis of a query on first primary key&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;As you can see, the system has detected &lt;code&gt;project_id&lt;/code&gt; as a primary key and ruled out 1224 granules out of 1225 using it!&lt;/p&gt;

&lt;h4&gt;
  
  
  Filter on second primary-key
&lt;/h4&gt;

&lt;p&gt;What if we filter on &lt;code&gt;created_date&lt;/code&gt;: the second PK:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeauo6583t2d9vclbb1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeauo6583t2d9vclbb1u.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Index analysis of a query on second primary key&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;The database has detected &lt;code&gt;created_date&lt;/code&gt; as a primary key, but it hasn't been able to rule out any granules. Why?&lt;br&gt;
Because ClickHouse uses binary search only for the first key and a &lt;a href="https://github.com/ClickHouse/ClickHouse/blob/22.3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L1444" rel="noopener noreferrer"&gt;generic exclusive search&lt;/a&gt; for the other keys, which is much less efficient. So how can we make it more efficient?&lt;/p&gt;

&lt;p&gt;If we swap &lt;code&gt;project_id&lt;/code&gt; and &lt;code&gt;created_date&lt;/code&gt; in the sort keys when defining the table, we achieve better results when filtering on the non-first keys, since &lt;code&gt;created_date&lt;/code&gt; has lower cardinality (fewer unique values) than &lt;code&gt;project_id&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;projects&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;

    &lt;span class="nv"&gt;`project_id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`created_date`&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;700&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz9fgeo23rncr5cx3oah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz9fgeo23rncr5cx3oah.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Index analysis of a query on second primary key on an improved sort keys table&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;If we filter on &lt;code&gt;project_id&lt;/code&gt;, now the second key, ClickHouse reads only 909 granules instead of the whole table.&lt;/p&gt;

&lt;p&gt;So to summarize, always try to order the primary keys from &lt;strong&gt;low&lt;/strong&gt; to &lt;strong&gt;high&lt;/strong&gt; cardinality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Order Key
&lt;/h3&gt;

&lt;p&gt;I mentioned earlier that if you don't specify the &lt;code&gt;PRIMARY KEY&lt;/code&gt; option, ClickHouse uses the sort keys as the primary keys. If you want to set the primary keys separately, however, they must be a prefix of the sort keys. As a result, the additional columns in the sort keys are used only for sorting and play no role in indexing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;projects&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;

    &lt;span class="nv"&gt;`project_id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`created_date`&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this example, the &lt;code&gt;created_date&lt;/code&gt; and &lt;code&gt;project_id&lt;/code&gt; columns are used for both the sparse index and sorting, while the &lt;code&gt;name&lt;/code&gt; column serves only as the last sorting criterion.&lt;/p&gt;

&lt;p&gt;Use this option when your queries frequently sort by a column in their &lt;code&gt;ORDER BY&lt;/code&gt; clause: since the data is already stored in that order, the database is spared the sorting effort at query time.&lt;/p&gt;
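
&lt;p&gt;For example, with the table above, a query that orders by the sort keys lets ClickHouse read rows in stored order (the &lt;code&gt;optimize_read_in_order&lt;/code&gt; optimization) instead of re-sorting them. This is a hypothetical query for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
SELECT *
FROM projects
WHERE created_date = '2020-02-01'
ORDER BY created_date, project_id, name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;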

&lt;h3&gt;
  
  
  Partition Key
&lt;/h3&gt;

&lt;p&gt;A partition is a logical grouping of parts in ClickHouse. By default, all parts belong to a single, unnamed partition. To see this, look into the &lt;code&gt;system.parts&lt;/code&gt; table for the &lt;code&gt;projects&lt;/code&gt; table defined in the previous section:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'projects'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepygdscxk5pahpfzgjnu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepygdscxk5pahpfzgjnu.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Parts structure in an unpartitioned table&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;You can see that the &lt;code&gt;projects&lt;/code&gt; table has no particular partition. However, you can customize it using the &lt;code&gt;PARTITION BY&lt;/code&gt; option:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;projects_partitioned&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;

    &lt;span class="nv"&gt;`project_id`&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`name`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nv"&gt;`created_date`&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toYYYYMM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the above table, ClickHouse partitions data based on the month of the &lt;code&gt;created_date&lt;/code&gt; column:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqthhonsue7p6qj26mgxj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqthhonsue7p6qj26mgxj.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Parts structure in a partitioned table&lt;/cite&gt;
  &lt;/p&gt;
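
&lt;p&gt;Querying &lt;code&gt;system.parts&lt;/code&gt; again shows how the parts are now grouped into monthly partitions (a sketch; the exact values depend on your data):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
SELECT
    partition,
    count() AS parts_count
FROM system.parts
WHERE table = 'projects_partitioned' AND active
GROUP BY partition
ORDER BY partition
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;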

&lt;h3&gt;
  
  
  Index
&lt;/h3&gt;

&lt;p&gt;ClickHouse creates a &lt;em&gt;min-max&lt;/em&gt; index for the partition key and uses it as the first filter layer when running a query. Let's see what happens when we filter data by a column present in the partition key:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;projects_partitioned&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'2020-02-01'&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z17e0jmy8v8nvbp04ow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z17e0jmy8v8nvbp04ow.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Index analysis on a partitioned table&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;You can see that the database has chosen one part out of 16 using the min-max index of the partition key.&lt;/p&gt;

&lt;h3&gt;
  
  
  Usage
&lt;/h3&gt;

&lt;p&gt;Partitioning in ClickHouse exists to bring data manipulation capabilities to the table. For instance, you can delete or move all parts belonging to partitions older than a year. Because ClickHouse physically splits the data on storage by month, such operations are far more efficient than on an unpartitioned table.&lt;/p&gt;
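
&lt;p&gt;For instance, with the monthly &lt;code&gt;toYYYYMM&lt;/code&gt; partitioning above, removing a whole month becomes a cheap operation on entire parts (a hypothetical example; the partition ID is the year and month):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
-- remove all rows belonging to January 2020 at once
ALTER TABLE projects_partitioned DROP PARTITION 202001;

-- or detach the partition, keeping its data on disk for a later re-attach
ALTER TABLE projects_partitioned DETACH PARTITION 202001;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;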

&lt;p&gt;Although ClickHouse creates an additional index for the partition key, never treat partitioning as a query-performance technique: it always loses that battle to placing the column in the sort keys. So if you wish to enhance query performance, put those columns in the sort keys, and choose a partition key only when you have concrete plans for data manipulation based on that column.&lt;/p&gt;

&lt;p&gt;Finally, don't confuse partitions in ClickHouse with the same term in distributed systems, where data is split across different nodes. If that is your goal, use &lt;a href="https://clickhouse.com/docs/en/engines/table-engines/special/distributed" rel="noopener noreferrer"&gt;shards and distributed tables&lt;/a&gt; instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skip Index
&lt;/h2&gt;

&lt;p&gt;You may have noticed that placing a column among the last items of the sort key is of little help, especially if you filter only on that column without the leading sort keys. What should you do in those cases?&lt;/p&gt;

&lt;p&gt;Consider a dictionary you want to read. You can find words using its table of contents, which is sorted alphabetically; that table of contents plays the role of the sort keys. You can easily find a word starting with &lt;em&gt;W&lt;/em&gt;, but how can you find the pages containing words related to wars?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3mrrbbhb98bgm75kjcot.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3mrrbbhb98bgm75kjcot.jpeg" alt="A book with sticky notes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can put marks or sticky notes on those pages so that finding them takes less effort next time. That's how a &lt;a href="https://clickhouse.com/docs/en/optimize/skipping-indexes" rel="noopener noreferrer"&gt;Skip Index&lt;/a&gt; works: it creates additional indexes that help the database skip granules that cannot contain the desired values of certain columns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Case Study
&lt;/h3&gt;

&lt;p&gt;Consider the &lt;code&gt;projects&lt;/code&gt; table defined in the &lt;em&gt;Order By&lt;/em&gt; section. &lt;code&gt;created_date&lt;/code&gt; and &lt;code&gt;project_id&lt;/code&gt; were defined as primary keys. Now if we filter on the &lt;code&gt;name&lt;/code&gt; column, we'll encounter this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'hamed'&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feiofrdndocznh2d242u5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feiofrdndocznh2d242u5.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Index analysis on a query on non-indexed column&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;The result was expected. Now what if we define a skip index on it?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;name_index&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;bloom_filter&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The above command creates a skip index on the &lt;code&gt;name&lt;/code&gt; column. I've used the bloom filter type because the column was a string. You can find more about the other kinds &lt;a href="https://clickhouse.com/docs/en/optimize/skipping-indexes#skip-index-types" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This command builds the index only for newly inserted data. To build it for already-inserted data as well, materialize the index:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;name_index&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's see the query analysis this time:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimwu2pw2hbl2zeh7cb9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimwu2pw2hbl2zeh7cb9x.png"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Index analysis on a query on skip-indexed column&lt;/cite&gt;
  &lt;/p&gt;

&lt;p&gt;As you can see, the skip index dramatically reduced the number of granules read, greatly improving performance.&lt;/p&gt;

&lt;p&gt;While the skip index performed efficiently in this example, it can perform poorly in other cases. Its effectiveness depends on how well the indexed column correlates with the sort keys, as well as on settings such as the index type and granularity.&lt;/p&gt;
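
&lt;p&gt;You can list the skip indexes defined on a table through the &lt;code&gt;system.data_skipping_indices&lt;/code&gt; table, and drop one that doesn't pay off (a sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
SELECT table, name, type, granularity
FROM system.data_skipping_indices
WHERE table = 'projects';

-- remove the index if it doesn't help your queries
ALTER TABLE projects DROP INDEX name_index;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;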

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, understanding and utilizing ClickHouse's primary keys, order keys, partition keys, and skip index is crucial for optimizing query performance and scalability. Choosing appropriate primary keys, order keys, and partitioning strategies can enhance data distribution, improve query execution speed, and prevent overloading. Additionally, leveraging the skip index feature intelligently helps minimize disk I/O and reduce query execution time. By considering these factors in your ClickHouse schema design, you can unlock the full potential of ClickHouse for efficient and performant data solutions.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Schedule Database Backups with Cronjob and Upload to AWS S3</title>
      <dc:creator>Hamed Karbasi</dc:creator>
      <pubDate>Wed, 24 May 2023 15:53:02 +0000</pubDate>
      <link>https://forem.com/hoptical/how-to-schedule-database-backups-with-cronjob-and-upload-to-aws-s3-4ikh</link>
      <guid>https://forem.com/hoptical/how-to-schedule-database-backups-with-cronjob-and-upload-to-aws-s3-4ikh</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The backup procedure is a vital operation every team should consider. However, doing it manually is exhausting toil. You can automate it simply by creating a cronjob that takes the backup and uploads it to your desired object storage.&lt;/p&gt;

&lt;p&gt;This article explains how to automate your database backup with a Cronjob and upload it to cloud S3 storage. Postgres is used as the database, but you can generalize the approach to any other database or data type you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create configmap
&lt;/h2&gt;

&lt;p&gt;To perform the backup and upload, we need a bash script. It first logs into the cluster via the &lt;code&gt;oc login&lt;/code&gt; command, then gets your desired database pod's name, executes the dump command, zips the result, and downloads it via the &lt;code&gt;oc rsync&lt;/code&gt; command. Finally, it uploads the file to AWS S3 object storage.&lt;/p&gt;

&lt;p&gt;Before dumping, the script prompts the user with an &lt;em&gt;Are you sure?&lt;/em&gt; confirmation, which you can bypass with the &lt;code&gt;-y&lt;/code&gt; option.&lt;/p&gt;

&lt;p&gt;All credentials like &lt;em&gt;OKD Token&lt;/em&gt; or &lt;em&gt;Postgres Password&lt;/em&gt; are passed to the application as environment variables.&lt;/p&gt;

&lt;p&gt;By putting this bash script in a config map, it can be mounted as a volume in the cronjob. Remember to replace &lt;code&gt;PROJECT_HERE&lt;/code&gt; with your project name and customize the variables in the bash script according to your project specifications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash-script&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PROJECT_HERE&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backup.sh&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;#!/bin/bash&lt;/span&gt;

    &lt;span class="s"&gt;# This file provides a backup script for postgres&lt;/span&gt;

    &lt;span class="s"&gt;# Variables: Modify your variables according to the okd projects and database secrets&lt;/span&gt;

    &lt;span class="s"&gt;NAMESPACE=PROJECT_HERE&lt;/span&gt;

    &lt;span class="s"&gt;S3_URL=https://s3.mycompany.com&lt;/span&gt;

    &lt;span class="s"&gt;STATEFULSET_NAME=postgres&lt;/span&gt;

    &lt;span class="s"&gt;BACKUP_NAME=backup-$(date "+%F")&lt;/span&gt;

    &lt;span class="s"&gt;S3_BUCKET=databases-backup&lt;/span&gt;

    &lt;span class="s"&gt;# Exit the script anywhere faced the error &lt;/span&gt;

    &lt;span class="s"&gt;set -e&lt;/span&gt;

    &lt;span class="s"&gt;# Define the confirm option about user prompt (yes or no)&lt;/span&gt;

    &lt;span class="s"&gt;confirm=""&lt;/span&gt;

    &lt;span class="s"&gt;# Parse command-line options&lt;/span&gt;

    &lt;span class="s"&gt;while getopts "y" opt; do&lt;/span&gt;
        &lt;span class="s"&gt;case $opt in&lt;/span&gt;
        &lt;span class="s"&gt;y)&lt;/span&gt;
            &lt;span class="s"&gt;confirm="y"&lt;/span&gt;
            &lt;span class="s"&gt;;;&lt;/span&gt;
        &lt;span class="s"&gt;\?)&lt;/span&gt;
            &lt;span class="s"&gt;echo "Invalid option: -$OPTARG" &amp;gt;&amp;amp;2&lt;/span&gt;
            &lt;span class="s"&gt;exit 1&lt;/span&gt;
            &lt;span class="s"&gt;;;&lt;/span&gt;
        &lt;span class="s"&gt;esac&lt;/span&gt;
    &lt;span class="s"&gt;done&lt;/span&gt;

    &lt;span class="s"&gt;# Login to OKD&lt;/span&gt;

    &lt;span class="s"&gt;oc login ${S3_URL} --token=${OKD_TOKEN}&lt;/span&gt;

    &lt;span class="s"&gt;POD_NAME=$(oc get pods -n ${NAMESPACE} | grep ${STATEFULSET_NAME} | cut -d' '&lt;/span&gt;
    &lt;span class="s"&gt;-f1)&lt;/span&gt;

    &lt;span class="s"&gt;echo The backup of database in pod ${POD_NAME} will be dumped in ${BACKUP_NAME}&lt;/span&gt;
    &lt;span class="s"&gt;file.&lt;/span&gt;

    &lt;span class="s"&gt;DUMP_COMMAND='PGPASSWORD="'${POSTGRES_USER_PASSWORD}'" pg_dump -U&lt;/span&gt;
    &lt;span class="s"&gt;'${POSTGRES_USER}' '${POSTGRES_DB}' &amp;gt; /bitnami/postgresql/backup/'${BACKUP_NAME}&lt;/span&gt;

    &lt;span class="s"&gt;GZIP_COMMAND='gzip /bitnami/postgresql/backup/'${BACKUP_NAME}&lt;/span&gt;

    &lt;span class="s"&gt;REMOVE_COMMAND='rm /bitnami/postgresql/backup/'${BACKUP_NAME}.gz&lt;/span&gt;

    &lt;span class="s"&gt;# Prompt the user for confirmation if the -y option was not provided&lt;/span&gt;

    &lt;span class="s"&gt;if [[ $confirm != "y" ]]; then&lt;/span&gt;
        &lt;span class="s"&gt;read -r -p "Are you sure you want to proceed? [y/N] " response&lt;/span&gt;
        &lt;span class="s"&gt;case "$response" in&lt;/span&gt;
        &lt;span class="s"&gt;[yY][eE][sS] | [yY])&lt;/span&gt;
            &lt;span class="s"&gt;confirm="y"&lt;/span&gt;
            &lt;span class="s"&gt;;;&lt;/span&gt;
        &lt;span class="s"&gt;*)&lt;/span&gt;
            &lt;span class="s"&gt;echo "Aborted"&lt;/span&gt;
            &lt;span class="s"&gt;exit 0&lt;/span&gt;
            &lt;span class="s"&gt;;;&lt;/span&gt;
        &lt;span class="s"&gt;esac&lt;/span&gt;
    &lt;span class="s"&gt;fi&lt;/span&gt;

    &lt;span class="s"&gt;# Dump the backup and zip it&lt;/span&gt;

    &lt;span class="s"&gt;oc exec -n ${NAMESPACE} "${POD_NAME}" -- sh -c "${DUMP_COMMAND} &amp;amp;&amp;amp; ${GZIP_COMMAND}"&lt;/span&gt;

    &lt;span class="s"&gt;echo Transfer it to current local folder&lt;/span&gt;

    &lt;span class="s"&gt;oc rsync -n ${NAMESPACE} ${POD_NAME}:/bitnami/postgresql/backup/ /backup-files&lt;/span&gt;
    &lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
        &lt;span class="s"&gt;oc exec -n ${NAMESPACE} "${POD_NAME}" -- sh -c "${REMOVE_COMMAND}"&lt;/span&gt;

    &lt;span class="s"&gt;# Send backup files to AWS S3&lt;/span&gt;

    &lt;span class="s"&gt;aws --endpoint-url "${S3_URL}" s3 sync /backup-files&lt;/span&gt;
    &lt;span class="s"&gt;s3://${S3_BUCKET}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Create secrets
&lt;/h2&gt;

&lt;p&gt;Database, AWS, and OC credentials should be kept as secrets. First, we’ll create a secret containing the AWS CA Bundles. After downloading the bundle, you can make a secret file from it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;oc create secret &lt;span class="nt"&gt;-n&lt;/span&gt; PROJECT_HERE generic certs &lt;span class="nt"&gt;--from-file&lt;/span&gt; ca-bundle.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should replace &lt;code&gt;PROJECT_HERE&lt;/code&gt; with your project name.&lt;/p&gt;

&lt;p&gt;Now let’s create another secret for the other credentials. Note that &lt;code&gt;AWS_CA_BUNDLE&lt;/code&gt; should be set to &lt;code&gt;/certs/ca-bundle.crt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysecret&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PROJECT_HERE&lt;/span&gt;

&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;AWS_CA_BUNDLE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;OKD_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;POSTGRES_USER_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 

&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
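
&lt;p&gt;Since the secret type is &lt;code&gt;Opaque&lt;/code&gt;, every value under &lt;code&gt;data&lt;/code&gt; must be base64-encoded. You can encode each credential like this (shown with a placeholder value):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ echo -n 'supersecret' | base64
c3VwZXJzZWNyZXQ=
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;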



&lt;h2&gt;
  
  
  Step 3: Create cronjob
&lt;/h2&gt;

&lt;p&gt;To create the cronjob, we need a Docker image capable of running &lt;code&gt;oc&lt;/code&gt; and &lt;code&gt;aws&lt;/code&gt; commands. You can find this image and its Dockerfile &lt;a href="https://hub.docker.com/repository/docker/hamedkarbasi/aws-cli-oc/"&gt;here&lt;/a&gt; if you'd like to customize it.&lt;/p&gt;

&lt;p&gt;Now let’s create the cronjob:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database-backup&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PROJECT_HERE&lt;/span&gt; 
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0 3 * * *&lt;/span&gt;
  &lt;span class="na"&gt;jobTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup&lt;/span&gt;
            &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hamedkarbasi/aws-cli-oc:1.0.0&lt;/span&gt;
            &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/bash"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/backup-script/backup.sh&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-y"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
            &lt;span class="na"&gt;envFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysecret&lt;/span&gt;
            &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;script&lt;/span&gt;
                &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/backup-script/backup.sh&lt;/span&gt;
                &lt;span class="na"&gt;subPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup.sh&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;certs&lt;/span&gt;
                &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/certs/ca-bundle.crt&lt;/span&gt;
                &lt;span class="na"&gt;subPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ca-bundle.crt&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-dir&lt;/span&gt;
                &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/.kube&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup-files&lt;/span&gt;
                &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/backup-files&lt;/span&gt;
          &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;script&lt;/span&gt;
              &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup-script&lt;/span&gt;
                &lt;span class="na"&gt;defaultMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0777&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;certs&lt;/span&gt;
              &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
                &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;certs&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-dir&lt;/span&gt;
              &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backup-files&lt;/span&gt;
              &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
          &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, you should replace &lt;code&gt;PROJECT_HERE&lt;/code&gt; with your project name and set the &lt;code&gt;schedule&lt;/code&gt; parameter to your desired job frequency. By putting all the manifests in a folder named &lt;code&gt;backup&lt;/code&gt;, we can apply them to Kubernetes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;oc apply &lt;span class="nt"&gt;-f&lt;/span&gt; backup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This cronjob will run at 3:00 AM every night, dumping the database and uploading the dump to AWS S3.&lt;/p&gt;
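
&lt;p&gt;The nightly 3:00 AM run corresponds to the cron expression &lt;code&gt;0 3 * * *&lt;/code&gt; (minute, hour, day of month, month, day of week). As a minimal sketch of the relevant CronJob fields, note that the &lt;code&gt;concurrencyPolicy&lt;/code&gt; line is an optional addition, not part of the original manifest:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spec:
  # minute=0, hour=3: run at 03:00 every day
  schedule: "0 3 * * *"
  # optional: skip a run if the previous one is still in progress
  concurrencyPolicy: Forbid
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;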

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, automating database backups to AWS S3 with a Kubernetes cronjob saves time and effort while keeping your valuable data stored securely in the cloud. Following the steps outlined in this guide, you can set up a backup schedule that meets your needs and upload the backups to AWS S3 for safekeeping. Remember to test your backups regularly to ensure they can be restored when needed, and keep your AWS credentials and permissions secure to prevent unauthorized access. With these best practices in mind, you can rest assured that your database backups are automated and safely stored.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>aws</category>
      <category>automation</category>
      <category>database</category>
    </item>
    <item>
      <title>ClickHouse Basic Tutorial: Table Engines</title>
      <dc:creator>Hamed Karbasi</dc:creator>
      <pubDate>Sat, 29 Apr 2023 08:38:56 +0000</pubDate>
      <link>https://forem.com/hoptical/clickhouse-basic-tutorial-table-engines-30i1</link>
      <guid>https://forem.com/hoptical/clickhouse-basic-tutorial-table-engines-30i1</guid>
      <description>&lt;p&gt;In this part, I will cover ClickHouse table engines. Like any other database, ClickHouse uses engines to determine a table's storage, replication, and concurrency methodologies. Every engine has pros and cons, and you should choose them by your need. Moreover, engines are categorized into families sharing the main features. As a practical article, I will deep dive into the most usable ones in every family and leave the others to your interest. &lt;/p&gt;

&lt;p&gt;Now, let's start with the first and most widely used family:&lt;/p&gt;

&lt;h2&gt;
  
  
  Merge-Tree Family
&lt;/h2&gt;

&lt;p&gt;As mentioned, this is the most likely choice when you want to create a table in ClickHouse. It's based on the &lt;a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree" rel="noopener noreferrer"&gt;Log-Structured Merge-Tree&lt;/a&gt; data structure. LSM trees are optimized for write-intensive workloads: they handle a large volume of writes by buffering them in memory and then periodically flushing them to disk in sorted order. This allows for faster writes of massive data volumes and reduces the likelihood of disk fragmentation. They are considered an alternative to the &lt;a href="https://en.wikipedia.org/wiki/B-tree" rel="noopener noreferrer"&gt;B-Tree&lt;/a&gt; data structure, which is common in traditional relational databases like MySQL.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note: For all engines of this family, you can use &lt;code&gt;Replicated&lt;/code&gt; as a prefix to the engine name to create a replication of the table on every ClickHouse node.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now let's investigate common engines in this family.&lt;/p&gt;

&lt;h3&gt;
  
  
  MergeTree
&lt;/h3&gt;

&lt;p&gt;Here is an example of a merge-tree DDL:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`comment`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Merge-tree tables use &lt;a href="https://clickhouse.com/docs/en/optimize/sparse-primary-indexes" rel="noopener noreferrer"&gt;sparse indexing&lt;/a&gt; to optimize queries.&lt;br&gt;
Briefly, in sparse indexing, data is split into multiple parts. Every part is sorted by the &lt;code&gt;order by&lt;/code&gt; keys (referred to as &lt;em&gt;sort keys&lt;/em&gt;), where the first key has the highest priority in sorting. Then every part is broken down into groups called &lt;em&gt;granules&lt;/em&gt; whose first and last items for primary keys are considered as marks. Since these marks are extracted from the sorted data, primary keys should be a subset of sort keys. Then for every query containing a filter on primary keys, ClickHouse performs a binary search on those marks to find the target granules as fast as possible. Finally, ClickHouse loads target granules in memory and searches for the matching rows.&lt;/p&gt;
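
&lt;p&gt;The granule size is configurable per table. As an illustrative sketch (the table name here is hypothetical), the &lt;code&gt;index_granularity&lt;/code&gt; setting, which defaults to 8192 rows per granule, can be set explicitly in the DDL:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE inventory_tuned
(
    `id` Int32,
    `status` String,
    `price` String
)
ENGINE = MergeTree
ORDER BY (id, price)
SETTINGS index_granularity = 8192;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;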

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note: You can omit the &lt;code&gt;PRIMARY KEY&lt;/code&gt; in DDL, and ClickHouse will consider sort keys as primary keys.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  ReplacingMergeTree
&lt;/h3&gt;

&lt;h4&gt;
  
  
  DDL
&lt;/h4&gt;

&lt;p&gt;In this engine, rows with equal sort keys are replaced by the most recently inserted row. Consider the table below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`comment`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ReplacingMergeTree&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Suppose that you insert a row in this table:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1000'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Confirmed'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now let's insert another row with the same sort keys:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2000'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Cancelled'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now the latter row will eventually replace the previous one. Note that if you select rows right away, you may still see both of them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttb0dodjn9lsa72k7hpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttb0dodjn9lsa72k7hpe.png" alt="Result of the replacing merge tree without Final modifier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's because ClickHouse performs the replacement while merging the parts, which happens asynchronously in the background, not immediately. To see the final result immediately, you can use the &lt;code&gt;FINAL&lt;/code&gt; modifier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmz5kceh1wut9hdsze7bj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmz5kceh1wut9hdsze7bj.png" alt="Result of the replacing merge tree with Final modifier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note: You can specify a &lt;a href="https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree#ver" rel="noopener noreferrer"&gt;column as version&lt;/a&gt; while defining the table to replace rows accordingly.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
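
&lt;p&gt;As a hedged sketch, a versioned variant of the table above could look like the following (the table name and &lt;code&gt;version&lt;/code&gt; column are hypothetical). With a version column, ClickHouse keeps the row with the maximum version rather than the last inserted one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE inventory_versioned
(
    `id` Int32,
    `status` String,
    `price` String,
    `version` UInt32
)
ENGINE = ReplacingMergeTree(version)
PRIMARY KEY (id)
ORDER BY (id, status);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;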

&lt;h4&gt;
  
  
  Usage
&lt;/h4&gt;

&lt;p&gt;ReplacingMergeTree is widely used for deduplication. Since ClickHouse performs poorly under frequent updates, you can instead update a row by inserting a new one with the same sort keys, and ClickHouse will remove the stale rows in the background. However, updating the sort keys themselves is challenging because the old rows won't be deleted in that situation. In that case, you can use CollapsingMergeTree, explained in the next part.&lt;/p&gt;
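
&lt;p&gt;If you can't wait for a background merge, you can also trigger the deduplication manually. Note that &lt;code&gt;OPTIMIZE&lt;/code&gt; forces an unscheduled merge and can be heavy on large tables, so use it sparingly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;OPTIMIZE TABLE inventory FINAL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;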

&lt;h3&gt;
  
  
  CollapsingMergeTree
&lt;/h3&gt;

&lt;p&gt;In this engine, you define a sign column and ask the database to delete stale rows marked with &lt;code&gt;sign=-1&lt;/code&gt; and keep the new rows marked with &lt;code&gt;sign=1&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  DDL
&lt;/h4&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`comment`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`sign`&lt;/span&gt; &lt;span class="n"&gt;Int8&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CollapsingMergeTree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sign&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's insert a row in this table:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1000'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Confirmed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzxhg4v0p9mxb5w7th6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzxhg4v0p9mxb5w7th6r.png" alt="Data in Collapsing Merge Tree before update"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now to update the row:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1000'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Confirmed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2000'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Cancelled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To see the results:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd95a2qzea1rabxbifc6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd95a2qzea1rabxbifc6a.png" alt="Data in Collapsing Merge Tree after update without final modifier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To see the final results immediately:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8bg0xpqhqh1ph0zg223.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8bg0xpqhqh1ph0zg223.png" alt="Data in Collapsing Merge Tree after update with final modifier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Usage
&lt;/h4&gt;

&lt;p&gt;CollapsingMergeTree can handle updates and deletes in a more controlled manner. For example, you can update sort keys by inserting the same row with &lt;code&gt;sign=-1&lt;/code&gt; together with a row carrying the new sort keys and &lt;code&gt;sign=1&lt;/code&gt;. There are two challenges with this engine:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Since you need to insert a copy of the old row with &lt;code&gt;sign=-1&lt;/code&gt;, you need to know its current values, fetching them from the database or another data store.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In case of inserting multiple rows accidentally or deliberately, with the &lt;code&gt;sign&lt;/code&gt; equal to 1 or -1, you may face unwanted results. That's why you should consider all situations explained &lt;a href="https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/collapsingmergetree#table_engine-collapsingmergetree-collapsing-algorithm" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
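
&lt;p&gt;Because collapsing happens asynchronously, a common alternative to &lt;code&gt;FINAL&lt;/code&gt; is to write aggregations in which cancelled pairs sum to zero. A sketch for the table above; the cast is needed only because &lt;code&gt;price&lt;/code&gt; was declared as a String:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT
    id,
    -- rows inserted with sign=1 and cancelled with sign=-1 sum to zero
    sum(toInt32(price) * sign) AS price,
    sum(sign) AS net_sign
FROM inventory
GROUP BY id
HAVING sum(sign) = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;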

&lt;h3&gt;
  
  
  AggregatingMergeTree
&lt;/h3&gt;

&lt;p&gt;Using this engine, you can materialize the aggregation of a table into another one.&lt;/p&gt;

&lt;h4&gt;
  
  
  DDL
&lt;/h4&gt;

&lt;p&gt;Consider the following &lt;em&gt;inventory2&lt;/em&gt; table. We want another table holding, for every item id, the maximum price and the total number of items. &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
 &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`num_items`&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now let's materialize its results into another table via AggregatingMergeTree:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;agg_inventory&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`max_price`&lt;/span&gt; &lt;span class="n"&gt;AggregateFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nv"&gt;`sum_items`&lt;/span&gt; &lt;span class="n"&gt;AggregateFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AggregatingMergeTree&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;maxState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;max_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sumState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sum_items&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inventory2&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now let's insert rows into it and see the results:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;inventory2&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxMerge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sumMerge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sum_items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sum_items&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;agg_inventory&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19fapbotd8grxn916e8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19fapbotd8grxn916e8x.png" alt="Output of aggregating merge tree"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Usage
&lt;/h4&gt;

&lt;p&gt;This engine helps you reduce the response time of heavy, fixed analytics queries by calculating them at write time. This also reduces the database load at query time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Log Family
&lt;/h2&gt;

&lt;p&gt;This family consists of lightweight engines with minimal functionality. They're most effective when you need to quickly write many small tables (up to approximately 1 million rows) and read them later. Additionally, there are no indexes in this family. However, the &lt;code&gt;Log&lt;/code&gt; and &lt;code&gt;StripeLog&lt;/code&gt; engines can break data down into multiple blocks to support multi-threaded reads.&lt;/p&gt;

&lt;p&gt;I will only look into the &lt;em&gt;TinyLog&lt;/em&gt; engine. To check the others, you can visit &lt;a href="https://clickhouse.com/docs/en/engines/table-engines#log" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  TinyLog
&lt;/h3&gt;

&lt;p&gt;This engine is mainly used in a write-once fashion; i.e., you write data once and then read it as often as you want. As ClickHouse reads the data in a single stream, it's better to keep the table size up to 1M rows.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;log_location&lt;/span&gt;
 &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`long`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`lat`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TinyLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Usage
&lt;/h4&gt;

&lt;p&gt;You can use this engine as intermediate storage for batch operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration Family
&lt;/h2&gt;

&lt;p&gt;The engines in this family are widely used to connect with other databases and brokers with the ability to fetch or insert data.&lt;/p&gt;

&lt;p&gt;I'll cover MySQL and Kafka Engines, but you can study the others &lt;a href="https://clickhouse.com/docs/en/engines/table-engines#integration-engines" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  MySQL Engine
&lt;/h3&gt;

&lt;p&gt;With this engine, you can connect with a MySQL database through ClickHouse and read its data or insert rows.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;mysql_inventory&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;`price`&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MySQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'host:port'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'database'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'table'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'password'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
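
&lt;p&gt;Once created, the table can be read from and written to like any local table, with queries forwarded to the remote MySQL server. A minimal sketch, with illustrative values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;INSERT INTO mysql_inventory VALUES (1, 100);

SELECT * FROM mysql_inventory;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
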
&lt;h3&gt;
  
  
  Kafka Engine
&lt;/h3&gt;

&lt;p&gt;Using this engine, you can connect to a Kafka cluster and read its data with a defined consumer group. This engine is broadly used for CDC purposes.&lt;/p&gt;
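
&lt;p&gt;A minimal sketch of a Kafka-engine table; the broker address, topic, consumer-group name, and message format below are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE kafka_inventory
(
    `id` Int32,
    `price` Int32
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'host:9092',
         kafka_topic_list = 'inventory-topic',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;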

&lt;p&gt;To learn more about this feature, read &lt;a href="https://medium.com/@hoptical/apply-cdc-from-mysql-to-clickhouse-d660873311c7" rel="noopener noreferrer"&gt;this&lt;/a&gt; article specifically on this topic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we reviewed some of the most important engines of the ClickHouse database. ClickHouse clearly provides a wide range of engine options to suit various use cases. The MergeTree engine is the default and suits most scenarios, but it can be swapped for others like AggregatingMergeTree, TinyLog, etc.&lt;/p&gt;

&lt;p&gt;It's important to note that choosing the right engine for your use-case can significantly improve performance and efficiency. Therefore, it's worth taking the time to understand the strengths and limitations of each engine and select the one that best meets your needs. &lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Step-by-Step Guide: Deploying Kafka Connect via Strimzi Operator on Kubernetes</title>
      <dc:creator>Hamed Karbasi</dc:creator>
      <pubDate>Tue, 25 Apr 2023 12:19:43 +0000</pubDate>
      <link>https://forem.com/hoptical/step-by-step-guide-deploying-kafka-connect-via-strimzi-operator-on-kubernetes-3bjb</link>
      <guid>https://forem.com/hoptical/step-by-step-guide-deploying-kafka-connect-via-strimzi-operator-on-kubernetes-3bjb</guid>
      <description>&lt;p&gt;&lt;a href="https://strimzi.io/"&gt;Strimzi&lt;/a&gt; is almost the richest Kubernetes Kafka operator, which you can utilize to deploy Apache Kafka or its other components like Kafka Connect, Kafka Mirror, etc. This article will provide a step-by-step tutorial about deploying Kafka Connect on Kubernetes. I brought all issues I encountered during the deployment procedure and their best mitigation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Consider that this operator is based on &lt;a href="https://kafka.apache.org/"&gt;Apache Kafka&lt;/a&gt;, not the &lt;a href="https://docs.confluent.io/platform/current/platform.html"&gt;Confluent Platform&lt;/a&gt;. That's why you may need to add some confluent artifacts like &lt;a href="https://www.confluent.io/hub/confluentinc/kafka-connect-avro-converter"&gt;Confluent Avro Converter&lt;/a&gt; to get the most out of it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article is based on &lt;code&gt;Strimzi v0.29.0&lt;/code&gt;. Thus, you can install the following versions of Kafka Connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Strimzi: 0.29.0&lt;/li&gt;
&lt;li&gt;  Apache Kafka &amp;amp; Kafka Connect: Up to 3.2&lt;/li&gt;
&lt;li&gt;  Equivalent Confluent Platform: 7.2.4&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can convert Confluent Platform version to Apache Kafka version and vice versa with the provided table &lt;a href="https://docs.confluent.io/platform/current/installation/versions-interoperability.html#supported-versions-and-interoperability-for-cp"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Openshift GUI and Kubernetes CLI
&lt;/h3&gt;

&lt;p&gt;If you're using Openshift, navigate to Operators &amp;gt; Installed Operators &amp;gt; Strimzi &amp;gt; Kafka Connect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkdc82y0t6yf0zjc7y6l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkdc82y0t6yf0zjc7y6l.png" alt="Openshift Strimzi Operator Page" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you will see a form containing the Kafka Connect configuration. You can get the equivalent YAML of the form by clicking on the YAML view; any update in the form view is applied to the YAML view on the fly. Although the form view is quite straightforward, it's strongly recommended not to use it to create the instance directly. Use it only to convert your desired configuration to a YAML file, and then deploy the resource with the &lt;code&gt;kubectl apply&lt;/code&gt; command. To summarize:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Enter the configuration in the form view&lt;/li&gt;
&lt;li&gt; Click on Yaml view&lt;/li&gt;
&lt;li&gt; Copy its contents to a Yaml file on your local (e.g. &lt;code&gt;kafka-connect.yaml&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt; Run: &lt;code&gt;kubectl apply -f kafka-connect.yaml&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now the KafkaConnect resource should be deployed or updated. The deployed resources consist of a Deployment (and its pods), a Service, ConfigMaps, and Secrets.&lt;/p&gt;

&lt;p&gt;Let's get through the minimum configuration and make it more advanced, step by step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimum Configuration
&lt;/h2&gt;

&lt;p&gt;To deploy a minimal Kafka Connect configuration, you can use the YAML below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka.strimzi.io/v1beta2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaConnect&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-connect-cluster&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;YOUR_PROJECT_NAME&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config.storage.replication.factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
    &lt;span class="na"&gt;config.storage.topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster-configs&lt;/span&gt;
    &lt;span class="na"&gt;group.id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster&lt;/span&gt;
    &lt;span class="na"&gt;offset.storage.replication.factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
    &lt;span class="na"&gt;offset.storage.topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster-offsets&lt;/span&gt;
    &lt;span class="na"&gt;status.storage.replication.factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
    &lt;span class="na"&gt;status.storage.topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster-status&lt;/span&gt;
  &lt;span class="na"&gt;bootstrapServers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka1, kafka2&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3.2.0&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Kafka Connect REST API is exposed on port 8083 of the pod. You can expose it on a private or internal network by defining a route on OKD.&lt;/p&gt;
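&lt;p&gt;As an illustrative sketch (the route name and namespace are assumptions, and the service name assumes a KafkaConnect resource named &lt;code&gt;my-connect-cluster&lt;/code&gt;, for which Strimzi exposes the REST API through a &lt;code&gt;&amp;lt;cluster-name&amp;gt;-connect-api&lt;/code&gt; service), such a route could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: kafka-connect-api
  namespace: kafka-connect
spec:
  to:
    kind: Service
    # Service created by Strimzi for the REST API
    name: my-connect-cluster-connect-api
  port:
    targetPort: rest-api
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;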

&lt;h2&gt;
  
  
  REST API Authentication
&lt;/h2&gt;

&lt;p&gt;With the configuration explained &lt;a href="https://docs.confluent.io/platform/current/security/basic-auth.html#kconnect-rest-api"&gt;here&lt;/a&gt;, you can add authentication to the Kafka Connect REST proxy. Unfortunately, that doesn't work with the Strimzi operator, as discussed &lt;a href="https://github.com/strimzi/strimzi-kafka-operator/issues/3229"&gt;here&lt;/a&gt;. So to secure Kafka Connect, you have two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Use the &lt;em&gt;KafkaConnector&lt;/em&gt; resource. The Strimzi operator lets you define connectors in YAML files instead of calling the REST API. However, this may not be practical for some use cases, since updating, pausing, and stopping connectors via the REST API may still be necessary.&lt;/li&gt;
&lt;li&gt; Put the insecure REST API behind an authenticating API gateway such as Apache APISIX, or any other tool or self-developed application.&lt;/li&gt;
&lt;/ol&gt;
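&lt;p&gt;To sketch the first option, a hypothetical Debezium MySQL connector defined as a KafkaConnector resource could look like the following (the connector name, labels, and config values are illustrative placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: inventory-connector
  namespace: kafka-connect
  labels:
    # Must match the name of your KafkaConnect resource
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.mysql.MySqlConnector
  tasksMax: 1
  config:
    database.hostname: mysql
    database.port: 3306
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;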

&lt;h2&gt;
  
  
  JMX Prometheus Metrics
&lt;/h2&gt;

&lt;p&gt;To expose JMX Prometheus metrics, which are useful for observing connector statuses in Grafana, add the configuration below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;metricsConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jmxPrometheusExporter&lt;/span&gt;
    &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;configMapKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jmx-prometheus&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;configs&lt;/span&gt;
  &lt;span class="na"&gt;jmxOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It uses a pre-defined config for Prometheus export. You can use this config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;startDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="na"&gt;ssl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="na"&gt;lowercaseOutputName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="na"&gt;lowercaseOutputLabelNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern &lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kafka.connect&amp;lt;type=connect-worker-metrics&amp;gt;([^:]+):"&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kafka_connect_connect_worker_metrics_$1"&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern &lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kafka.connect&amp;lt;type=connect-metrics,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;client-id=([^:]+)&amp;gt;&amp;lt;&amp;gt;([^:]+)"&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kafka_connect_connect_metrics_$2"&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$1"&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debezium.([^:]+)&amp;lt;type=connector-metrics,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;context=([^,]+),&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;server=([^,]+),&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;key=([^&amp;gt;]+)&amp;gt;&amp;lt;&amp;gt;RowsScanned"&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debezium_metrics_RowsScanned"&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;plugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$1"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$3"&lt;/span&gt;
    &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$2"&lt;/span&gt;
    &lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$4"&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debezium.([^:]+)&amp;lt;type=connector-metrics,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;context=([^,]+),&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;server=([^&amp;gt;]+)&amp;gt;([^:]+)"&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debezium_metrics_$4"&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;plugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$1"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$3"&lt;/span&gt;
    &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service for External Prometheus
&lt;/h3&gt;

&lt;p&gt;If you intend to deploy Prometheus alongside Strimzi to collect the metrics, follow the instructions &lt;a href="https://strimzi.io/docs/operators/latest/deploying.html#assembly-metrics-setup-str"&gt;here&lt;/a&gt;. However, if you're using an external Prometheus, the story goes another way:&lt;/p&gt;

&lt;p&gt;The Strimzi operator only creates Service port mappings for these ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  8083: Kafka Connect REST API&lt;/li&gt;
&lt;li&gt;  9999: JMX port&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately, it doesn't create a mapping for port 9404, the Prometheus exporter HTTP port. &lt;a href="https://github.com/strimzi/strimzi-kafka-operator/issues/8403"&gt;So we have to create a Service on our own&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect-jmx-prometheus&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/instance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/managed-by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;strimzi-cluster-operator&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
    &lt;span class="na"&gt;app.kubernetes.io/part-of&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;strimzi-kafka-connect&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaConnect&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp-prometheus&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9404&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9404&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaConnect&lt;/span&gt;
    &lt;span class="na"&gt;strimzi.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect-connect&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;loadBalancer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: This method only works for single-pod deployments, since you have to define a route for the service, and even with a headless service, the route returns the IP of one pod at a time. Hence, Prometheus can't scrape all pods' metrics. That's why it is recommended to use a PodMonitor with a Prometheus deployed in the cluster. This issue is discussed here.&lt;/p&gt;
&lt;/blockquote&gt;
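&lt;p&gt;For reference, a PodMonitor for a multi-pod deployment could be sketched as follows (this assumes the Prometheus Operator is installed; the port name &lt;code&gt;tcp-prometheus&lt;/code&gt; matches the metrics port Strimzi declares on the pods):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-connect-metrics
  namespace: kafka-connect
spec:
  selector:
    matchLabels:
      strimzi.io/kind: KafkaConnect
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;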

&lt;h2&gt;
  
  
  Plugins and Artifacts
&lt;/h2&gt;

&lt;p&gt;To add plugins and artifacts, there are two ways:&lt;/p&gt;

&lt;h3&gt;
  
  
  Operator Build Section
&lt;/h3&gt;

&lt;p&gt;To add plugins, you can use the operator's build section. It takes the plugin or artifact addresses, downloads them in the build stage (the operator creates the BuildConfig automatically), and adds them to the plugin directory of the image.&lt;/p&gt;

&lt;p&gt;It supports the &lt;code&gt;jar&lt;/code&gt;, &lt;code&gt;tgz&lt;/code&gt;, &lt;code&gt;zip&lt;/code&gt;, and &lt;code&gt;maven&lt;/code&gt; artifact types. However, in the case of Maven, a multi-stage Dockerfile is created, which is &lt;a href="https://bugzilla.redhat.com/show_bug.cgi?id=1937243"&gt;problematic on Openshift&lt;/a&gt; and fails in the build stage. Hence, you should only use the types that don't need a compile stage (i.e., jar, zip, tgz) and end up with a single-stage Dockerfile.&lt;/p&gt;

&lt;p&gt;For example, to add the Debezium MySQL plugin, you can use the below configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;kafkaconnect:1.0'&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;imagestream&lt;/span&gt;
    &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tgz&lt;/span&gt;
            &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
              &lt;span class="s"&gt;https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/2.1.4.Final/debezium-connector-mysql-2.1.4.Final-plugin.tar.gz&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;debezium-connector-mysql&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: The Strimzi operator can only download public artifacts. So if you need a private, secured artifact that is not accessible from Kubernetes, you have to give up this method and follow the next one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Changing Image
&lt;/h3&gt;

&lt;p&gt;The operator can use your desired image instead of its default one, so you can add your artifacts and plugins by building an image manually or via CI/CD. Another reason to use this method is that Strimzi uses the Apache Kafka image, not the Confluent Platform one, so the deployments lack useful Confluent packages like the Confluent Avro Converter. You need to add them to your image and configure the operator to use your Docker image.&lt;/p&gt;

&lt;p&gt;For example, If you want to add your customized Debezium MySQL Connector plugin from Gitlab Generic Packages and Confluent Avro Converter to the base image, first use this Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;ARG CONFLUENT_VERSION=7.2.4&lt;/span&gt;

&lt;span class="c1"&gt;# Install confluent avro converter&lt;/span&gt;
&lt;span class="s"&gt;FROM confluentinc/cp-kafka-connect:${CONFLUENT_VERSION} as cp&lt;/span&gt;
&lt;span class="c1"&gt;# Reassign version&lt;/span&gt;
&lt;span class="s"&gt;ARG CONFLUENT_VERSION&lt;/span&gt;
&lt;span class="s"&gt;RUN confluent-hub install --no-prompt confluentinc/kafka-connect-avro-converter:${CONFLUENT_VERSION}&lt;/span&gt;

&lt;span class="c1"&gt;# Copy privious artifacts to the main strimzi kafka image&lt;/span&gt;
&lt;span class="s"&gt;FROM quay.io/strimzi/kafka:0.29.0-kafka-3.2.0&lt;/span&gt;
&lt;span class="s"&gt;ARG GITLAB_TOKEN&lt;/span&gt;
&lt;span class="s"&gt;ARG CI_API_V4_URL=https://gitlab.snapp.ir/api/v4&lt;/span&gt;
&lt;span class="s"&gt;ARG CI_PROJECT_ID=3873&lt;/span&gt;
&lt;span class="s"&gt;ARG DEBEZIUM_CONNECTOR_MYSQL_CUSTOMIZED_VERSION=1.0&lt;/span&gt;
&lt;span class="s"&gt;USER root:root&lt;/span&gt;

&lt;span class="c1"&gt;# Copy Confluent packages from previous stage&lt;/span&gt;
&lt;span class="s"&gt;RUN mkdir -p /opt/kafka/plugins/avro/&lt;/span&gt;
&lt;span class="s"&gt;COPY --from=cp /usr/share/confluent-hub-components/confluentinc-kafka-connect-avro-converter/lib /opt/kafka/plugins/avro/&lt;/span&gt;

&lt;span class="c1"&gt;# Connector plugin debezium-connector-mysql&lt;/span&gt;
&lt;span class="s"&gt;RUN 'mkdir' '-p' '/opt/kafka/plugins/debezium-connector-mysql' \&lt;/span&gt;
    &lt;span class="s"&gt;&amp;amp;&amp;amp; curl --header "${GITLAB_TOKEN}" -f -L \&lt;/span&gt;
    &lt;span class="s"&gt;--output /opt/kafka/plugins/debezium-connector-mysql.tgz \&lt;/span&gt;
    &lt;span class="s"&gt;${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/debezium-customized/${DEBEZIUM_CONNECTOR_MYSQL_CUSTOMIZED_VERSION}/debezium-connector-mysql-customized.tar.gz \&lt;/span&gt;
    &lt;span class="s"&gt;&amp;amp;&amp;amp; 'tar' 'xvfz' '/opt/kafka/plugins/debezium-connector-mysql.tgz' '-C' '/opt/kafka/plugins/debezium-connector-mysql' \&lt;/span&gt;
    &lt;span class="s"&gt;&amp;amp;&amp;amp; 'rm' '-vf' '/opt/kafka/plugins/debezium-connector-mysql.tgz'&lt;/span&gt;

&lt;span class="s"&gt;USER &lt;/span&gt;&lt;span class="m"&gt;1001&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build the image, push it to the ImageStream or any other Docker registry, and configure the operator to use it by adding the line below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Kafka Authentication
&lt;/h2&gt;

&lt;p&gt;Depending on the authentication type, you need different configurations. As an example, here is the configuration for Kafka with the SASL/PLAINTEXT mechanism and SCRAM-SHA-512:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;authentication&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;passwordSecret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-password&lt;/span&gt;
      &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysecrets&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scram-sha-512&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myuser&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Needless to say, you must provide the password in a Secret named &lt;em&gt;mysecrets&lt;/em&gt;.&lt;/p&gt;
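&lt;p&gt;For example, such a Secret could be created as follows (the password value is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kind: Secret
apiVersion: v1
metadata:
  name: mysecrets
  namespace: kafka-connect
stringData:
  # stringData avoids manual base64 encoding
  kafka-password: &amp;lt;YOUR_KAFKA_PASSWORD&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;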

&lt;h2&gt;
  
  
  Handling File Credentials
&lt;/h2&gt;

&lt;p&gt;Since connectors need credentials to access databases, you have to define them as Secrets and access them via environment variables. However, if there are too many of them, you can put all credentials in a file and reference them in the connector with the &lt;code&gt;file&lt;/code&gt; config provider:&lt;/p&gt;

&lt;p&gt;1- Put all credentials as the value of a key named &lt;em&gt;credentials&lt;/em&gt; in a secret file.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Credentials file:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USERNAME_DB_1=user1
PASSWORD_DB_1=pass1

USERNAME_DB_2=user2
PASSWORD_DB_2=pass2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Secret file:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysecrets&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;BASE64 YOUR DATA&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2- Configure the operator with the secret mounted as a volume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config.providers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file&lt;/span&gt;
    &lt;span class="na"&gt;config.providers.file.class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org.apache.kafka.common.config.provider.FileConfigProvider&lt;/span&gt;  
  &lt;span class="na"&gt;externalConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database_credentials&lt;/span&gt;
        &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;credentials&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;credentials&lt;/span&gt;
          &lt;span class="na"&gt;optional&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysecrets&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3- Now, in the connector, you can access PASSWORD_DB_1 with the expression below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="s"&gt;"${file:/opt/kafka/external-configuration/database_credentials/credentials:PASSWORD_DB_1}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Put it all together
&lt;/h2&gt;

&lt;p&gt;If we put all configurations together, we'll have the below configuration for Kafka Connect:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Service, Route, and build configurations are omitted since they were discussed earlier in the article.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka.strimzi.io/v1beta2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KafkaConnect&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;authentication&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;passwordSecret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-password&lt;/span&gt;
      &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysecrets&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scram-sha-512&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myuser&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config.providers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;file&lt;/span&gt;
    &lt;span class="na"&gt;config.providers.file.class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org.apache.kafka.common.config.provider.FileConfigProvider&lt;/span&gt;
    &lt;span class="na"&gt;config.storage.replication.factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
    &lt;span class="na"&gt;config.storage.topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster-configs&lt;/span&gt;
    &lt;span class="na"&gt;group.id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster&lt;/span&gt;
    &lt;span class="na"&gt;offset.storage.replication.factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
    &lt;span class="na"&gt;offset.storage.topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster-offsets&lt;/span&gt;
    &lt;span class="na"&gt;status.storage.replication.factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
    &lt;span class="na"&gt;status.storage.topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;okd4-connect-cluster-status&lt;/span&gt;
  &lt;span class="na"&gt;bootstrapServers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;kafka1:9092,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;kafka2:9092'&lt;/span&gt;
  &lt;span class="na"&gt;metricsConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jmxPrometheusExporter&lt;/span&gt;
    &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;configMapKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jmx-prometheus&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;configs&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
  &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
  &lt;span class="na"&gt;jmxOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
    &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3.2.0&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;externalConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;database_credentials&lt;/span&gt;
        &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;credentials&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;credentials&lt;/span&gt;
          &lt;span class="na"&gt;optional&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysecrets&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, deploying Kafka Connect with the Strimzi Operator can be a powerful and efficient way to manage data integration in your organization. By combining the flexibility and scalability of Kafka with the automation the Strimzi Operator provides, you can streamline your data pipelines and improve your data-driven decision-making. In this article, I've covered the key steps involved in deploying Kafka Connect via the Strimzi Operator: creating a minimal custom resource, working around the REST API Basic authentication issue, configuring Kafka authentication, exposing JMX Prometheus metrics, adding plugins and artifacts, and handling file-based credentials. By following these steps, you can customize your Kafka Connect deployment to meet your specific needs.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>strimzi</category>
      <category>tutorial</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>ClickHouse Basic Tutorial: An Introduction</title>
      <dc:creator>Hamed Karbasi</dc:creator>
      <pubDate>Thu, 13 Apr 2023 23:01:20 +0000</pubDate>
      <link>https://forem.com/hoptical/clickhouse-basic-tutorial-an-introduction-52il</link>
      <guid>https://forem.com/hoptical/clickhouse-basic-tutorial-an-introduction-52il</guid>
      <description>&lt;p&gt;This is the first part of the &lt;strong&gt;ClickHouse Tutorial Series&lt;/strong&gt;. In this series, I cover some practical and vital aspects of the ClickHouse database, a robust OLAP technology many enterprise companies utilize.&lt;/p&gt;

&lt;p&gt;In this part, I'll talk about the main features, weaknesses, installation, and usage of ClickHouse. I'll also refer to some helpful links for those who want to dive into broader details.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ClickHouse
&lt;/h2&gt;

&lt;p&gt;ClickHouse is an open-source column-oriented database developed by Yandex. It is designed to provide high performance for analytical queries. ClickHouse uses a SQL-like query language for querying data and supports different data types, including integers, strings, dates, and floats. It offers various features such as clustering, distributed query processing, and fault tolerance. It also supports replication and data sharding. ClickHouse is used by companies such as Yandex, Facebook, and Uber for data analysis, machine learning, and more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Features
&lt;/h3&gt;

&lt;p&gt;The main features of Clickhouse Database are:&lt;/p&gt;

&lt;h4&gt;
  
  
  Column-Oriented
&lt;/h4&gt;

&lt;p&gt;Data in ClickHouse is stored in &lt;a href="https://clickhouse.com/docs/en/about-us/distinctive-features#true-column-oriented-database-management-system" rel="noopener noreferrer"&gt;columns instead of rows&lt;/a&gt;, bringing at least two benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every column is stored in a separate file, so values of the same type sit next to each other; hence, each column, and consequently the whole table, compresses much more strongly.&lt;/li&gt;
&lt;li&gt;For range queries, which are common in analytical processing, the system can access and process data more efficiently because data is sorted by certain columns (i.e., the columns defined as sort keys). Additionally, it can parallelize processing across multiple cores while loading large columns.&lt;/li&gt;
&lt;/ol&gt;
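To make the second benefit concrete, here is a toy Python comparison of the two layouts: in the columnar layout, an aggregation over one column only touches that column's array, while the row layout must visit every full row. The table and column names are invented for illustration; none of this is ClickHouse code.

```python
# Toy illustration of row- vs column-oriented layouts (not ClickHouse itself).
rows = [
    {"OrderID": i, "CustomerID": i % 100, "Amount": float(i % 7)}
    for i in range(10_000)
]

# Column-oriented layout: one contiguous array per column.
columns = {
    "OrderID": [r["OrderID"] for r in rows],
    "CustomerID": [r["CustomerID"] for r in rows],
    "Amount": [r["Amount"] for r in rows],
}

# Row layout: every row (with all of its columns) is visited to sum one field.
total_row = sum(r["Amount"] for r in rows)

# Columnar layout: only the "Amount" array is scanned; the other columns
# are never touched, which is why analytical scans read far less data.
total_col = sum(columns["Amount"])

assert total_row == total_col
```

In a real column store the untouched columns never even leave disk, which is the main win for wide tables queried on a few columns.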

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fclickhouse.com%2Fdocs%2Fassets%2Fimages%2Frow-oriented-3e6fd5aa48e3075202d242b4799da8fa.gif%23" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fclickhouse.com%2Fdocs%2Fassets%2Fimages%2Frow-oriented-3e6fd5aa48e3075202d242b4799da8fa.gif%23"&gt;&lt;/a&gt;&lt;br&gt;Row-Oriented Database (Gif by &lt;a href="https://clickhouse.com/docs/en/faq/general/columnar-database" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt;)
  &lt;/p&gt;




&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcb05sjd69q2feq3w6gfn.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcb05sjd69q2feq3w6gfn.gif"&gt;&lt;/a&gt;&lt;br&gt;&lt;cite&gt;Columnar Database (Gif by &lt;a href="https://clickhouse.com/docs/en/faq/general/columnar-database" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt;)&lt;/cite&gt;
  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Column-oriented databases should not be confused with wide-column databases like Cassandra, which still store data in rows but let you denormalize intensive data into tables with many columns, resulting in a NoSQL structure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Data Compression
&lt;/h4&gt;

&lt;p&gt;Thanks to compression algorithms (&lt;a href="https://github.com/facebook/zstd" rel="noopener noreferrer"&gt;zstd&lt;/a&gt; and &lt;a href="https://github.com/lz4/lz4" rel="noopener noreferrer"&gt;LZ4&lt;/a&gt;), data occupies much less storage, even more than 20x smaller! You can study some of the benchmarks on ClickHouse and other databases storage &lt;a href="https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQXRoZW5hIChwYXJ0aXRpb25lZCkiOnRydWUsIkF0aGVuYSAoc2luZ2xlKSI6dHJ1ZSwiQXVyb3JhIGZvciBNeVNRTCI6dHJ1ZSwiQXVyb3JhIGZvciBQb3N0Z3JlU1FMIjp0cnVlLCJCeXRlSG91c2UiOnRydWUsIkNpdHVzIjp0cnVlLCJjbGlja2hvdXNlLWxvY2FsIChwYXJ0aXRpb25lZCkiOnRydWUsImNsaWNraG91c2UtbG9jYWwgKHNpbmdsZSkiOnRydWUsIkNsaWNrSG91c2UgKHdlYikiOnRydWUsIkNsaWNrSG91c2UiOnRydWUsIkNsaWNrSG91c2UgKHR1bmVkKSI6dHJ1ZSwiQ2xpY2tIb3VzZSAoenN0ZCkiOnRydWUsIkNsaWNrSG91c2UgQ2xvdWQiOnRydWUsIkNyYXRlREIiOnRydWUsIkRhdGFiZW5kIjp0cnVlLCJEYXRhRnVzaW9uIChzaW5nbGUpIjp0cnVlLCJBcGFjaGUgRG9yaXMiOnRydWUsIkRydWlkIjp0cnVlLCJEdWNrREIgKFBhcnF1ZXQpIjp0cnVlLCJEdWNrREIiOnRydWUsIkVsYXN0aWNzZWFyY2giOnRydWUsIkVsYXN0aWNzZWFyY2ggKHR1bmVkKSI6ZmFsc2UsIkdyZWVucGx1bSI6dHJ1ZSwiSGVhdnlBSSI6dHJ1ZSwiSHlkcmEiOnRydWUsIkluZm9icmlnaHQiOnRydWUsIktpbmV0aWNhIjp0cnVlLCJNYXJpYURCIENvbHVtblN0b3JlIjp0cnVlLCJNYXJpYURCIjpmYWxzZSwiTW9uZXREQiI6dHJ1ZSwiTW9uZ29EQiI6dHJ1ZSwiTXlTUUwgKE15SVNBTSkiOnRydWUsIk15U1FMIjp0cnVlLCJQaW5vdCI6dHJ1ZSwiUG9zdGdyZVNRTCAodHVuZWQpIjpmYWxzZSwiUG9zdGdyZVNRTCI6dHJ1ZSwiUXVlc3REQiAocGFydGl0aW9uZWQpIjp0cnVlLCJRdWVzdERCIjp0cnVlLCJSZWRzaGlmdCI6dHJ1ZSwiU2VsZWN0REIiOnRydWUsIlNpbmdsZVN0b3JlIjp0cnVlLCJTbm93Zmxha2UiOnRydWUsIlNRTGl0ZSI6dHJ1ZSwiU3RhclJvY2tzIjp0cnVlLCJUaW1lc2NhbGVEQiAoY29tcHJlc3Npb24pIjp0cnVlLCJUaW1lc2NhbGVEQiI6dHJ1ZX0sInR5cGUiOnsic3RhdGVsZXNzIjp0cnVlLCJtYW5hZ2VkIjp0cnVlLCJKYXZhIjp0cnVlLCJjb2x1bW4tb3JpZW50ZWQiOnRydWUsIkMrKyI6dHJ1ZSwiTXlTUUwgY29tcGF0aWJsZSI6dHJ1ZSwicm93LW9yaWVudGVkIjp0cnVlLCJDIjp0cnVlLCJQb3N0Z3JlU1FMIGNvbXBhdGlibGUiOnRydWUsIkNsaWNrSG91c2UgZGVyaXZhdGl2ZSI6dHJ1ZSwiZW1iZWRkZWQiOnRydWUsInNlcnZlcmxlc3MiOnRydWUsIlJ1c3QiOnRy
dWUsInNlYXJjaCI6dHJ1ZSwiZG9jdW1lbnQiOnRydWUsInRpbWUtc2VyaWVzIjp0cnVlfSwibWFjaGluZSI6eyJzZXJ2ZXJsZXNzIjp0cnVlLCIxNmFjdSI6dHJ1ZSwiTCI6dHJ1ZSwiTSI6dHJ1ZSwiUyI6dHJ1ZSwiWFMiOnRydWUsImM2YS40eGxhcmdlLCA1MDBnYiBncDIiOnRydWUsImM1bi40eGxhcmdlLCAyMDBnYiBncDIiOnRydWUsImM1LjR4bGFyZ2UsIDUwMGdiIGdwMiI6dHJ1ZSwiYzZhLm1ldGFsLCA1MDBnYiBncDIiOnRydWUsIjE2IHRocmVhZHMiOnRydWUsIjIwIHRocmVhZHMiOnRydWUsIjI0IHRocmVhZHMiOnRydWUsIjI4IHRocmVhZHMiOnRydWUsIjMwIHRocmVhZHMiOnRydWUsIjQ4IHRocmVhZHMiOnRydWUsIjYwIHRocmVhZHMiOnRydWUsIm01ZC4yNHhsYXJnZSI6dHJ1ZSwiYzZhLjR4bGFyZ2UsIDE1MDBnYiBncDIiOnRydWUsInJhMy4xNnhsYXJnZSI6dHJ1ZSwicmEzLjR4bGFyZ2UiOnRydWUsInJhMy54bHBsdXMiOnRydWUsImRjMi44eGxhcmdlIjp0cnVlLCJTMiI6dHJ1ZSwiUzI0Ijp0cnVlLCIyWEwiOnRydWUsIjNYTCI6dHJ1ZSwiNFhMIjp0cnVlLCJYTCI6dHJ1ZX0sImNsdXN0ZXJfc2l6ZSI6eyIxIjp0cnVlLCIyIjp0cnVlLCI0Ijp0cnVlLCI4Ijp0cnVlLCIxNiI6dHJ1ZSwiMzIiOnRydWUsIjY0Ijp0cnVlLCIxMjgiOnRydWUsInNlcnZlcmxlc3MiOnRydWUsInVuZGVmaW5lZCI6dHJ1ZX0sIm1ldHJpYyI6InNpemUiLCJxdWVyaWVzIjpbdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZV19" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
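To get a feel for why homogeneous, sorted columns compress so well, the following Python sketch compresses the same toy data set row-wise and column-wise. It uses zlib purely as a stand-in for zstd/LZ4 (simply because it ships with Python), and the columns are invented for illustration.

```python
import zlib

n = 20_000

# Three toy columns: a high-cardinality id, a sorted low-cardinality
# country code, and a cyclic price (all names invented for illustration).
order_ids = [str(i) for i in range(n)]
countries = sorted(["DE", "FR", "IR", "JP", "US"][i % 5] for i in range(n))
prices = [str((i % 50) * 10) for i in range(n)]

# Row-oriented serialization: the columns interleave on every line.
row_blob = "\n".join(
    ",".join(vals) for vals in zip(order_ids, countries, prices)
).encode()
row_size = len(zlib.compress(row_blob))

# Column-oriented serialization: each column is a homogeneous block
# compressed on its own, as with one file per column on disk.
col_size = sum(
    len(zlib.compress("\n".join(col).encode()))
    for col in (order_ids, countries, prices)
)

print(f"row-wise: {row_size} bytes, column-wise: {col_size} bytes")
```

The sorted country column collapses into long runs and the cyclic price column into repeated blocks, so the column-wise total comes out smaller than the interleaved row serialization; real codecs like zstd exploit the same structure far more aggressively.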

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxauowhmmaq89hmgoxvht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxauowhmmaq89hmgoxvht.png"&gt;&lt;/a&gt;&lt;br&gt;ClickHouse columnar structure leads to storing and reading columns more efficiently (Graph by &lt;a href="https://altinity.com/blog/using-clickhouse-as-an-analytic-extension-for-mysql" rel="noopener noreferrer"&gt;Altinity&lt;/a&gt;)
  &lt;/p&gt;

&lt;h4&gt;
  
  
  Scalability
&lt;/h4&gt;

&lt;p&gt;ClickHouse scales well both vertically and horizontally. It can be scaled out by adding &lt;a href="https://altinitydb.medium.com/clickhouse-for-time-series-scalability-benchmarks-e181132a895b" rel="noopener noreferrer"&gt;extra replicas&lt;/a&gt; and extra shards to process queries in a distributed way. ClickHouse supports multi-master asynchronous replication and can be deployed across multiple data centers. All nodes are equal, which avoids single points of failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Weaknesses
&lt;/h3&gt;

&lt;p&gt;To mention a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lack of a full-fledged UPDATE/DELETE implementation: ClickHouse is not built for frequent modifications and mutations, so you'll see poor performance on those kinds of queries.&lt;/li&gt;
&lt;li&gt;OLTP workloads, such as point queries, won't make you happy: traditional RDBMSs like MySQL easily outperform ClickHouse on them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Rivals and Alternatives
&lt;/h3&gt;

&lt;p&gt;To name a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Druid&lt;/li&gt;
&lt;li&gt;ElasticSearch&lt;/li&gt;
&lt;li&gt;SingleStore&lt;/li&gt;
&lt;li&gt;Snowflake&lt;/li&gt;
&lt;li&gt;TimescaleDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one suits different use cases and has its own pros and cons, but I won't compare them here. However, you can study some valuable benchmarks &lt;a href="https://benchmark.clickhouse.com/" rel="noopener noreferrer"&gt;here&lt;/a&gt; and &lt;a href="https://www.timescale.com/blog/what-is-clickhouse-how-does-it-compare-to-postgresql-and-timescaledb-and-how-does-it-perform-for-time-series-data/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;I only cover the Docker approach here. For other methods on different distros, please follow &lt;a href="https://clickhouse.com/docs/en/install" rel="noopener noreferrer"&gt;ClickHouse's official installation guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The docker-compose file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2'&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myclickhouse&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse/clickhouse-server:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8123:8123"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9000:9000"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./clickhouse-data:/var/lib/clickhouse/&lt;/span&gt;  
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;And then run it by:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;As you can see, two ports have been exposed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;8123&lt;/strong&gt;: the HTTP interface port, used by JDBC and ODBC drivers and web interfaces.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9000&lt;/strong&gt;: the native protocol (TCP) port, used by ClickHouse tools such as &lt;em&gt;clickhouse-server&lt;/em&gt; and &lt;em&gt;clickhouse-client&lt;/em&gt;, and for inter-server communication in distributed queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's up to your client driver which one to use. For example, &lt;a href="https://dbeaver.io/" rel="noopener noreferrer"&gt;DBeaver&lt;/a&gt; uses 8123, and the &lt;a href="https://clickhouse-driver.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;Python clickhouse-driver&lt;/a&gt; uses 9000.&lt;/p&gt;

&lt;p&gt;To continue the tutorial, we use &lt;code&gt;clickhouse-client&lt;/code&gt;, which ships with the server container:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; myclickhouse clickhouse-client


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Database and Table Creation
&lt;/h3&gt;

&lt;p&gt;Create the &lt;code&gt;test&lt;/code&gt; database:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Create the &lt;code&gt;orders&lt;/code&gt; table:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`OrderID`&lt;/span&gt; &lt;span class="n"&gt;Int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="nv"&gt;`CustomerID`&lt;/span&gt; &lt;span class="n"&gt;Int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="nv"&gt;`OrderDate`&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="nv"&gt;`Comments`&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="nv"&gt;`Cancelled`&lt;/span&gt; &lt;span class="nb"&gt;Bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OrderID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OrderDate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OrderID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OrderDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CustomerID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;In the next parts, we'll talk about other configurations like &lt;code&gt;Engine&lt;/code&gt;, &lt;code&gt;PRIMARY KEY&lt;/code&gt;, &lt;code&gt;ORDER BY&lt;/code&gt;, etc.&lt;/p&gt;
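As a small preview of how PRIMARY KEY and index_granularity interact: MergeTree keeps rows sorted by the ORDER BY key and stores a sparse primary index with one mark per granule of index_granularity (here 8192) rows, so a key-range query only scans the granules that may contain matching keys. The following Python sketch is a conceptual toy model of that idea, not ClickHouse internals.

```python
import bisect

GRANULARITY = 8192  # rows per granule, as in index_granularity = 8192

# Sorted primary-key column (imagine OrderID values of one sorted table part).
keys = list(range(0, 1_000_000, 2))  # 500,000 rows, sorted

# Sparse index: one mark (the first key) per granule instead of one per row.
marks = keys[::GRANULARITY]

def granules_for_range(lo, hi):
    """Return the indexes of granules whose key range may overlap [lo, hi]."""
    first = max(bisect.bisect_right(marks, lo) - 1, 0)
    last = bisect.bisect_right(marks, hi) - 1
    return list(range(first, last + 1))

hit = granules_for_range(100_000, 101_000)
print(f"{len(marks)} marks for {len(keys)} rows; granules to scan: {hit}")
```

With only 62 marks covering 500,000 rows, the index stays tiny and fits in memory, while a narrow range query touches a single granule instead of the whole column.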
&lt;h3&gt;
  
  
  Insert Data
&lt;/h3&gt;

&lt;p&gt;To insert sample data:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;334&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2021-09-15 14:30:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'some comment'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Read Data
&lt;/h3&gt;

&lt;p&gt;Just like any other SQL query:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;OrderID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OrderDate&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the first part of the &lt;em&gt;ClickHouse Tutorial Series&lt;/em&gt;, we discussed the traits, features, and weaknesses of ClickHouse. Then we saw how to set up an instance with minimum configuration, create a database and table, insert data into it, and read from it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Useful Links
&lt;/h2&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="/taw" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F673595%2Fa3ca7eb8-f1aa-48f9-b544-1b90b6a6f948.png" alt="taw"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/taw/getting-started-with-clickhouse-3nf9" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Getting Started with ClickHouse&lt;/h2&gt;
      &lt;h3&gt;Tomasz Wegrzanowski ・ Dec 24 '22&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#clickhouse&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#nosql&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#database&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;



&lt;div class="ltag__link"&gt;
  &lt;a href="/olena_kutsenko" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F601720%2F9dfb66ac-aa96-4905-aa6d-24272c4ae85b.png" alt="olena_kutsenko"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/olena_kutsenko/introduction-to-clickhouse-8em" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Introduction to ClickHouse&lt;/h2&gt;
      &lt;h3&gt;Olena Kutsenko ・ Nov 8 '22&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#clickhouse&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#datawarehouse&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#columnar&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#database&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;



&lt;div class="ltag__link"&gt;
  &lt;a href="https://george3d6.medium.com/clickhouse-an-analytics-database-for-the-21st-century-82d3828f79cc" class="ltag__link__link" rel="noopener noreferrer"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afill%3A88%3A88%2F1%2ARPJSpIh08hBKVoCgDth-4Q.jpeg" alt="George Hosu"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://george3d6.medium.com/clickhouse-an-analytics-database-for-the-21st-century-82d3828f79cc" class="ltag__link__link" rel="noopener noreferrer"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Clickhouse, an analytics database for the 21st century | by George Hosu | Medium&lt;/h2&gt;
      &lt;h3&gt;George Hosu ・ &lt;time&gt;May 4, 2019&lt;/time&gt; ・ 
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fmedium-f709f79cf29704f9f4c2a83f950b2964e95007a3e311b77f686915c71574fef2.svg" alt="Medium Logo"&gt;
        george3d6.Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;



</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
