<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Francesco Tisiot</title>
    <description>The latest articles on Forem by Francesco Tisiot (@ftisiot).</description>
    <link>https://forem.com/ftisiot</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F571977%2F529fa0cd-c499-4dc7-b17f-461b46c68313.jpg</url>
      <title>Forem: Francesco Tisiot</title>
      <link>https://forem.com/ftisiot</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ftisiot"/>
    <language>en</language>
    <item>
      <title>Introducing Developer Tier for Aiven for PostgreSQL® services</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Wed, 12 Nov 2025 14:10:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/introducing-developer-tier-for-aiven-for-postgresqlr-services-1ik7</link>
      <guid>https://forem.com/ftisiot/introducing-developer-tier-for-aiven-for-postgresqlr-services-1ik7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Starting at $8 USD, the new Developer tier includes everything from the Free tier, with extra disk space, preserved uptime for idle services, and Basic support to keep you building without interruption.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Aiven is introducing a new pricing plan for Aiven for PostgreSQL® services. Starting at $8 USD per month, the Developer tier offers more storage, so you can scale up your free PostgreSQL service in a cost-effective way. Unlike the Free tier, services on the Developer tier are not automatically powered off if inactive. The Developer tier also automatically includes Basic support services.&lt;/p&gt;

&lt;p&gt;More information on the Developer tier is available in the &lt;a href="https://aiven.io/docs/platform/concepts/service-pricing#developer-plan" rel="noopener noreferrer"&gt;Aiven docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjc1273iwjv68gwxmbq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjc1273iwjv68gwxmbq3.png" alt="The Aiven Console showing Free, Developer and Professional Tier choices" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What do I get with the Developer tier?
&lt;/h2&gt;

&lt;p&gt;Compared to the Free tier, you still get a single node, with 1 CPU per virtual machine, monitoring and backups. But beyond that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased storage from 1GB to a maximum of 8GB.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aiven.io/docs/platform/howto/support" rel="noopener noreferrer"&gt;Basic tier support&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Inactive services don’t automatically power off, meaning that your database will always be up and running, even if there’s no traffic to it.
See &lt;a href="https://aiven.io/docs/platform/concepts/service-pricing#developer-plan" rel="noopener noreferrer"&gt;the documentation&lt;/a&gt; for full technical details.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your database needs are more specific, you can easily switch and scale to the Professional plans, where you get additional benefits including the ability to choose a specific cloud provider and region for your service, the option for more nodes, even more disk space, longer backup, integrations with other services, forking and much more. Find out more in the &lt;a href="https://aiven.io/docs/products/postgresql" rel="noopener noreferrer"&gt;Aiven for PostgreSQL documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why three tiers?
&lt;/h2&gt;

&lt;p&gt;We originally started out with our selection of paid service plans, which you can now find neatly grouped under the Professional tier.&lt;/p&gt;

&lt;p&gt;We introduced the Free tier as a way of paying back to our community, providing a way to start and host your non critical PostgreSQL infrastructure on Aiven, and creating a great way to learn about using a database in the cloud.&lt;/p&gt;

&lt;p&gt;Now we’ve introduced the Developer tier as a cost-effective choice between those two. The Developer tier inherits some traits from the free tier and enhances it with some features you might find useful for your more advanced needs at a lower price point compared to our Professional plans.&lt;/p&gt;

&lt;p&gt;You can choose the balance that suits your project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free&lt;/strong&gt; to explore and learn the platform and PostgreSQL in the cloud at no cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer&lt;/strong&gt; as a cost-effective option for test and personal projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professional&lt;/strong&gt; for highly available business-critical workloads.
And as your project grows, there’s always an upgrade path available.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Questions you might ask
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Do I get to choose where the service runs?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As with the Free tier, you can choose the geographical area (Asia Pacific, Australia, Europe or North America), but not the cloud provider or region. If that’s a requirement, the selection is available in the Professional level plans.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Can I go (back) to the Free tier?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your service data will fit in the smaller instance size (1GB disk instead of 8GB), then it’s always possible to downgrade a service to the Free tier. See the documentation for how to &lt;a href="https://aiven.io/docs/platform/howto/scale-services" rel="noopener noreferrer"&gt;Change a service plan&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Can I upgrade to the Professional tier?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yes, it’s always possible to upgrade to a higher tier. See the documentation for how to &lt;a href="https://aiven.io/docs/platform/howto/scale-services" rel="noopener noreferrer"&gt;Change a service plan&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Should I use a Developer tier service in production?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As it says in the description in the Aiven Console, the Developer tier is intended to be “A cost-effective option for test and personal projects”.&lt;/p&gt;

&lt;p&gt;The Developer tier provides a working PostgreSQL environment with basic daily backup and Basic level support, but without advanced networking, point in time recovery, forking, integrations, or cloud provider and region selection. If any of the above features are interesting for you, then please look in the Professional plans.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why is this just for PostgreSQL?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We started with adding this new tier for Aiven for PostgreSQL because we’re confident that our users will find it useful. We do have ideas for other services - keep tuned to see what we announce in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out!
&lt;/h2&gt;

&lt;p&gt;If you’re creating a new Aiven for PostgreSQL service and know that a Free service isn’t enough, but you don’t need all the resources of the Professional “Hobbyist” plan, then start a Developer service instead.&lt;/p&gt;

&lt;p&gt;If you’ve got an existing Free tier Aiven for PostgreSQL service that could use just that bit more disk space, or that you only use intermittently, or if you want some support, then consider upgrading to the Developer tier.&lt;/p&gt;

&lt;p&gt;And of course, if you’ve got an existing Hobbyist service that feels over-provisioned, you’ve now got the option of downgrading to Developer (although please do read the documentation to check that everything you need is still provided).&lt;/p&gt;

&lt;p&gt;Find out more in the &lt;a href="https://console.aiven.io/" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt;, or check out &lt;a href="https://aiven.io/docs/platform/concepts/service-pricing#developer-plan" rel="noopener noreferrer"&gt;the documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>announcements</category>
      <category>productupdates</category>
      <category>platform</category>
    </item>
    <item>
      <title>Free PostgreSQL® in the Cloud - Aug 25</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Thu, 14 Aug 2025 08:20:11 +0000</pubDate>
      <link>https://forem.com/ftisiot/free-postgresqlr-in-the-cloud-58bm</link>
      <guid>https://forem.com/ftisiot/free-postgresqlr-in-the-cloud-58bm</guid>
      <description>&lt;p&gt;A short summary of Free PostgreSQL® services in the cloud as of August 14th 2025&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free Tier Highlights&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://aiven.io/free-postgresql-database" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1 CPU, 1 GB RAM, 1GB Storage&lt;/td&gt;
&lt;td&gt;Production-like prototyping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://neon.com/" rel="noopener noreferrer"&gt;Neon&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;10 projects, 0.5 GB storage, serverless, branching&lt;/td&gt;
&lt;td&gt;Modern development workflows, serverless apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="http://fly.io/" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;3 GB storage, 160 GB outbound&lt;/td&gt;
&lt;td&gt;Lightweight global apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://render.com/pricing#postgresql" rel="noopener noreferrer"&gt;Render.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;0.1 CPU, 256 RAM, 100 connections, daily backups, 30 day limit&lt;/td&gt;
&lt;td&gt;Small apps needing more capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://aws.amazon.com/free/database/" rel="noopener noreferrer"&gt;Amazon RDS&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;20 GB + backups, 750 hrs/month&lt;/td&gt;
&lt;td&gt;Temporary usage via AWS Free Tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://supabase.com/" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;500 MB storage&lt;/td&gt;
&lt;td&gt;Small prototypes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>postgres</category>
      <category>free</category>
    </item>
    <item>
      <title>Helping PostgreSQL® professionals with AI-assisted performance recommendations</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Tue, 28 May 2024 15:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/helping-postgresqlr-professionals-with-ai-assisted-performance-recommendations-27k2</link>
      <guid>https://forem.com/ftisiot/helping-postgresqlr-professionals-with-ai-assisted-performance-recommendations-27k2</guid>
      <description>&lt;p&gt;Since the beginning of my journey into the data world I've been keen on making professionals better at their data job. In the previous years that took the shape of creating materials in several different forms that could help people understand, use, and avoid mistakes on their data tool of choice. But now there's much more into it: a trusted AI solution to help data professional in their day to day optimization job.&lt;/p&gt;

&lt;h2&gt;
  
  
  From content to tooling
&lt;/h2&gt;

&lt;p&gt;The big advantage of the content creation approach is the 1-N effect: the material, once created and updated, can serve a multitude of people interested in the same technology or facing the same problem. You write an article once, and it gets found and adopted by a vast amount of professionals.&lt;/p&gt;

&lt;p&gt;The limit of content tho, it's that it is an extra resource, that people need to find and read elsewhere. While this is useful, it forces a context switch, moving people away from the problem they are facing. Here is where tooling helps, providing assistance in the same IDE that professionals are using for their day to day work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tooling for database professionals
&lt;/h2&gt;

&lt;p&gt;I have the luxury of working for &lt;a href="https://aiven.io/" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt; which provides professionals an integrated platform for all their data needs. In the last three years I witnessed the growth of the platform and its evolution with the clear objective to make it better usable at scale. Tooling like &lt;a href="https://aiven.io/integrations-and-connectors" rel="noopener noreferrer"&gt;integrations&lt;/a&gt;, &lt;a href="https://aiven.io/docs/tools/terraform" rel="noopener noreferrer"&gt;Terraform providers&lt;/a&gt; and the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Console&lt;/a&gt; facilitate the work that platform administrators have to perform on daily basis.&lt;/p&gt;

&lt;p&gt;But what about the day to day work of developers and CloudOps teams? This was facilitated when dealing with administrative tasks like backups, creation of read only replicas or upgrades, but the day to day work of optimizing the workloads was still completely on their hands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using trusted AI to optimize database workloads
&lt;/h2&gt;

&lt;p&gt;This, however, is now changing. With the recent launch of &lt;a href="https://aiven.io/blog/aiven-ai-dboptimizer-launch" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer&lt;/a&gt; we are able to help both developer and CloudOps in their day to day optimization work!&lt;/p&gt;

&lt;p&gt;Aiven AI Database Optimizer provides, directly in the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Console&lt;/a&gt;, insights on database workloads alongside a one-click Optimize button that suggests index and SQL rewrites. It's a non intrusive solution who gathers informations from the slow query log and database metadata, and leverages an in-built AI engine, to provide accurate suggestions to improve performance.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/fOavII9QAmg"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The solution, based on the &lt;a href="https://www.eversql.com/" rel="noopener noreferrer"&gt;EverSQL by Aiven&lt;/a&gt; technology has been already adopted by 120.000 professionals optimizing over 2 million queries. It's not a wrapper around a public Generative AI provider, is a dedicated solution that keeps data privacy and security as a priority.&lt;/p&gt;

&lt;p&gt;You can experience it for Free in the Early Availability phase, just navigate to the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; and create an Aiven for PostgreSQL® service! Once you have some workload on the service, the &lt;strong&gt;AI Insights&lt;/strong&gt; page will show you the queries and provide index and SQL rewrite suggestion to take your database performance to the next level!&lt;/p&gt;

&lt;p&gt;More information on the dedicated &lt;a href="https://aiven.io/solutions/aiven-ai-database-optimizer" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer page&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>sql</category>
      <category>performance</category>
      <category>ai</category>
    </item>
    <item>
      <title>Introducing Aiven's AI Database Optimizer: The First Built-In SQL Optimizer for Enhanced Performance</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Tue, 28 May 2024 12:00:00 +0000</pubDate>
      <link>https://forem.com/aiven_io/introducing-aivens-ai-database-optimizer-the-first-built-in-sql-optimizer-for-enhanced-performance-402b</link>
      <guid>https://forem.com/aiven_io/introducing-aivens-ai-database-optimizer-the-first-built-in-sql-optimizer-for-enhanced-performance-402b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgchfpvltsf4zlcic373r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgchfpvltsf4zlcic373r.png" alt="Hero Image" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An efficient data infrastructure is a vital component in building &amp;amp; operating scalable and performant applications that are widely adopted, satisfy customers, and ultimately, drive business growth. Unfortunately, the speed of new feature delivery coupled with a lack of database optimization knowledge is exposing organizations to high risk performance issues. The new &lt;a href="https://aiven.io/solutions/aiven-ai-database-optimizer?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer&lt;/a&gt; helps organizations address performance both in the development and production phase, making it simple to quickly deploy, fully optimized, scalable, and cost efficient applications.&lt;/p&gt;

&lt;p&gt;Fully integrated with &lt;a href="https://aiven.io/postgresql?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Aiven for PostgreSQL&lt;/a&gt;®, Aiven AI Database Optimizer offers AI-driven performance insights, index, and SQL rewrite suggestions to maximize database performance, minimize costs, and make the best out of your cloud investment. &lt;/p&gt;

&lt;h2&gt;
  
  
  How does AI Database Optimizer work?
&lt;/h2&gt;

&lt;p&gt;Aiven AI Database Optimizer is a non-intrusive solution powered by &lt;a href="https://www.eversql.com/?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;EverSQL by Aiven&lt;/a&gt; that gathers information about database workloads, metadata and supporting data structures, such as indexes. Information about the number of query executions and average query times are continually processed by a mix of heuristic and AI models to determine which SQL statements can be further optimized. AI Database Optimizer then delivers accurate, secure optimization suggestions that you can trust, and that can be adopted to speed up query performance.&lt;/p&gt;

&lt;p&gt;Recommendations from Aiven’s AI Database Optimizer are already trusted by over 120,000 users in organizations ranging from startups to the largest global enterprises, who have optimized more than 2 million queries to date.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does AI Database Optimizer help organizations?
&lt;/h2&gt;

&lt;p&gt;During development, AI Database Optimizer enables early performance testing, allowing easier redesign or refactoring of queries before they impact production. This enables customers to foster a culture of considering performance from the get-go, ensuring it is a priority throughout development rather than an afterthought.&lt;/p&gt;

&lt;p&gt;AI Database Optimizer also helps businesses gain an optimal user experience: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With fast query response times that ensure a smooth and responsive user experience, especially in data-intensive applications.&lt;/li&gt;
&lt;li&gt;By identifying and fixing performance bottlenecks organizations can reduce costs, avoid outages and deliver continuous service availability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A fleet of Aiven for PostgreSQL® databases is powering &lt;a href="https://www.youtube.com/watch?v=mPWxizlA3so" rel="noopener noreferrer"&gt;La Redoute&lt;/a&gt;’s marketplace functionality, driving 30% of their business. &lt;br&gt;
Diogo Passadouro - OPS-DBA Team Lead stated &lt;em&gt;"Aiven AI Database Optimizer has revolutionized the way we analyze database performance, providing a simple, clear and highly effective approach and has proven instrumental in enhancing the performance of our databases."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwsxlkj5bc36zcc1blpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwsxlkj5bc36zcc1blpy.png" alt="Quote from Diogo Passadouro" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aiven.io/case-studies/conrad-electronic-expands-e-commerce-platform-with-aiven" rel="noopener noreferrer"&gt;Conrad&lt;/a&gt; is an advanced B2B sourcing platform selling 9 million products from 6,000 brands, powered by Aiven. Janek Wonner - Head of SRE &amp;amp; Cloud Technology stated &lt;em&gt;"Aiven for PostgreSQL is underpinning our fundamental company functionalities, we are looking forward to adopt Aiven AI database optimizer to empower our developers to create scalable code and empower our development teams with better performance insights and improvement suggestions to reduce the time to fix performance issues"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkc8t0qbybx2firird70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkc8t0qbybx2firird70.png" alt="Quote from Janek Wonner" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More information is available in the &lt;a href="https://aiven.io/solutions/aiven-ai-database-optimizer?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer page&lt;/a&gt;. You can experience it for yourself in any &lt;a href="https://aiven.io/postgresql" rel="noopener noreferrer"&gt;Aiven for PostgreSQL&lt;/a&gt; service for free during the Early Availability phase. Simply navigate to the “AI Insights” tab. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://console.aiven.io/signup?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Try it now&lt;/a&gt; or &lt;a href="https://aiven.io/book-demo?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Contact us&lt;/a&gt; today to check it out!&lt;/p&gt;

</description>
      <category>announcements</category>
      <category>ai</category>
      <category>postgres</category>
    </item>
    <item>
      <title>How to use pgbench to test PostgreSQL® performance</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Tue, 12 Mar 2024 15:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/how-to-use-pgbench-to-test-postgresqlr-performance-4m6h</link>
      <guid>https://forem.com/ftisiot/how-to-use-pgbench-to-test-postgresqlr-performance-4m6h</guid>
      <description>&lt;p&gt;Testing a database performance is a must in every company. Despite everyone's needs beings slightly different, a good starting point for PostgreSQL® database is using &lt;a href="https://www.postgresql.org/docs/current/pgbench.html" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt;: a tool shipped with the PostgreSQL installation that allows you to stress test a local or remote database. &lt;br&gt;
This blog post showcases how to install (on a Mac) and use pgbench to create load on a remote PostgreSQL database on &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? &lt;br&gt;
🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Install pgbench locally
&lt;/h2&gt;

&lt;p&gt;In a Mac, &lt;a href="https://www.postgresql.org/docs/current/pgbench.html" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt; comes with the default PostgreSQL installation via brew. Therefore to have pgbench all you need is to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a test PostgreSQL environment
&lt;/h2&gt;

&lt;p&gt;While you could create a test PostgreSQL environment locally (the &lt;a href="https://ftisiot.net/posts/1brows" rel="noopener noreferrer"&gt;1brc challenge blog post&lt;/a&gt; contains all the details), this time we'll create an Aiven for PostgreSQL service in minutes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create an account&lt;/li&gt;
&lt;li&gt;Create an Aiven for PostgreSQL service on your favorite cloud provider and region &lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;startup-4&lt;/code&gt; plan, it will be sufficient for the test.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use pgbench to load test the a PostgreSQL database
&lt;/h2&gt;

&lt;p&gt;The Aiven for PostgreSQL database comes with a &lt;code&gt;defaultdb&lt;/code&gt; database we can use for our testing. &lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Initialize the database
&lt;/h3&gt;

&lt;p&gt;All we need is to grab the connection details from the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; and we are ready to &lt;strong&gt;initialize&lt;/strong&gt; the database with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;       &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-i&lt;/span&gt;                  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-s&lt;/span&gt; &amp;lt;SCALEFACTOR&amp;gt;    &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;HOST&amp;gt;&lt;/code&gt; is the database hostname&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;PORT&amp;gt;&lt;/code&gt; is the database port&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;USER&amp;gt;&lt;/code&gt; is the database user&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;DBNAME&amp;gt;&lt;/code&gt; is the database name&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SCALEFACTOR&lt;/code&gt; is the test scale factor, &lt;code&gt;100&lt;/code&gt; could be a good place to start. The default is &lt;code&gt;1&lt;/code&gt; which will create a &lt;code&gt;16MB&lt;/code&gt; database, the &lt;code&gt;100&lt;/code&gt; scale will create a &lt;code&gt;1.6GB&lt;/code&gt; database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll be prompted to write the password, that we can find in the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can automate the flow without the need to include the password by storing it in the &lt;code&gt;PGPASSWORD&lt;/code&gt; environment variable with&lt;br&gt;
  &lt;code&gt;export PGPASSWORD=&amp;lt;PASSWORD&amp;gt;&lt;/code&gt; and then rerun the &lt;code&gt;pgbench&lt;/code&gt; command above.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After this pgbench will execute some work including the following tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;       Name       | nr_rows
------------------+---------
 pgbench_accounts | 10000000
 pgbench_branches |      100
 pgbench_history  |        0 
 pgbench_tellers  |     1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: execute a first test to set the baseline
&lt;/h3&gt;

&lt;p&gt;After creating the dataset, it's time to define the performance baseline (a.k.a how much does my database perform under current conditions). Let's create one with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;                  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;                       &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;                       &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; &amp;lt;NUMBER_OF_CLIENTS&amp;gt;          &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; &amp;lt;NUMBER_OF_PGBENCH_THREADS&amp;gt;  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; &amp;lt;NUMBER_OF_TRANSACTIONS&amp;gt;     &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;NUMBER_OF_CLIENTS&amp;gt;&lt;/code&gt; is the number of clients to connect to PostgreSQL with, a good starting point is &lt;code&gt;20&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;NUMBER_OF_PGBENCH_THREADS&amp;gt;&lt;/code&gt; is the number pf pgbench threads to run concurrently&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;NUMBER_OF_TRANSACTIONS&amp;gt;&lt;/code&gt; is the overall number of transactions to execute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following code will execute 1000 transactions using 5 parallel threads and 5 clients&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;      &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; 1000             &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pgbench (16.1, server 16.2)
starting vacuum...end.
transaction type: &amp;lt;builtin: TPC-B (sort of)&amp;gt;
scaling factor: 100
query mode: simple
number of clients: 5
number of threads: 5
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
number of failed transactions: 0 (0.000%)
latency average = 180.179 ms
initial connection time = 164.343 ms
tps = 27.750256 (without initial connection time)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Showcasing that we are generating almost 28 transactions per second (&lt;code&gt;tps&lt;/code&gt; in the above one) from my laptop sitting in Verona to a database sitting in Turin!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: change parameters and validate the new performance
&lt;/h3&gt;

&lt;p&gt;What could we change? We could increase the percentage of total RAM dedicated to shared buffers to 60% by going in the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; selecting the PostgreSQL service, selecting the &lt;strong&gt;Service settings&lt;/strong&gt; menu and clicking on the &lt;strong&gt;Configure&lt;/strong&gt; button next to the Advanced configuration section.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcdokhkyx711ebqs5qcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcdokhkyx711ebqs5qcn.png" alt="Shared buffers to 60%" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, if we try with the same pgbench command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;      &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; 1000             &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get updated results with &lt;code&gt;36&lt;/code&gt; tps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: define you custom workload in a SQL file
&lt;/h3&gt;

&lt;p&gt;You can customize the type of queries pgbench will execute based on your workloads. All you need to do is to define a &lt;code&gt;SQL&lt;/code&gt; file and call it with pgbench &lt;code&gt;-f&lt;/code&gt; flag. Let's create a file named &lt;code&gt;test.sql&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;r1&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pgbench_accounts&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;bid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above we are setting a &lt;code&gt;r1&lt;/code&gt; variable with a random number between &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;2000&lt;/code&gt; and then use it to query the &lt;code&gt;pgbench_accounts&lt;/code&gt; table for rows with &lt;code&gt;bid&lt;/code&gt; equal to &lt;code&gt;r1&lt;/code&gt;.&lt;br&gt;
We can now execute the custom sql file with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;      &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; 1000             &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql         &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our example the above produces the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scaling factor: 1
query mode: simple
number of clients: 5
number of threads: 5
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 500/500
number of failed transactions: 0 (0.000%)
latency average = 10514.270 ms
initial connection time = 245.617 ms
tps = 0.475544 (without initial connection time)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A miserable &lt;code&gt;0.47&lt;/code&gt; transactions per second. Can we do better?&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Optimize your database
&lt;/h3&gt;

&lt;p&gt;Once you took time to write a custom workload to test with pgbench and got a baseline, it's time to optimize your database. You could do it based on your knowledge and available data but, if you are using Aiven, you can receive automated insights about your PostgreSQL database workload and index and (in the future) SQL optimization suggestions. Check out the video below!&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/_hXGKhbgnos"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For more information, check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt; and &lt;a href="https://www.eversql.com/?utm_medium=organic&amp;amp;utm_source=ext_blog&amp;amp;utm_content=ftisiotpgbench" rel="noopener noreferrer"&gt;EverSQL&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our case, Aiven will suggest to add an index on the &lt;code&gt;bid&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvyomurrvs61z95y0nnh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvyomurrvs61z95y0nnh.png" alt="Suggestion to add an Index" width="800" height="497"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;pgbench_accounts_idx_bid&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"pgbench_accounts"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"bid"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After creating it, and retrying the pgbench command above, the custom workload speed raised from &lt;code&gt;0.45&lt;/code&gt; to &lt;code&gt;35&lt;/code&gt; transaction per second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;pgbench is a great tool to perform performance checks on your PostgreSQL database and the custom script option provides a way to customize it to match your workloads. Used with Aiven and its AI insights is a great way to test and optimize your database!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgbench</category>
      <category>sql</category>
      <category>performance</category>
    </item>
    <item>
      <title>From dbf to PostgreSQL with Python</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Mon, 04 Mar 2024 15:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/from-dbf-to-postgresql-with-python-4kfa</link>
      <guid>https://forem.com/ftisiot/from-dbf-to-postgresql-with-python-4kfa</guid>
      <description>&lt;p&gt;Some time ago I found an interesting database file suffix I never faced before: the &lt;code&gt;.dbf&lt;/code&gt; and saw aroung that it was first introduced in 1983 with dBASE II. This article showcases how we can automatically generate the PostgreSQL table and fill it with data using Python and &lt;a href="https://dbfread.readthedocs.io/en/latest/exporting_data.html#dataset-sql" rel="noopener noreferrer"&gt;dbfread&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? &lt;br&gt;
🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Download a sample dbf file
&lt;/h2&gt;

&lt;p&gt;We can get a sample &lt;code&gt;.dbf&lt;/code&gt; file from &lt;a href="https://github.com/infused/dbf/blob/master/spec/fixtures/cp1251.dbf" rel="noopener noreferrer"&gt;Infused&lt;/a&gt; with the following in our terminal&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://github.com/infused/dbf/raw/master/spec/fixtures/cp1251.dbf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will store a file named &lt;code&gt;cp1251.dbf&lt;/code&gt; in the current folder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use dbfread to move the data into PostgreSQL
&lt;/h2&gt;

&lt;p&gt;We need to install simpledf with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dbfread
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can write a Python script (named &lt;code&gt;convert_bdf_to_sql.py&lt;/code&gt;) that opens the &lt;code&gt;sample.dbf&lt;/code&gt; file and creates the PostgreSQL DDL and loads the data into a &lt;code&gt;CSV&lt;/code&gt; file we can use to populate the database&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dbfread&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DBF&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;

&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postgresql://[USER]:[PWD]@[HOST]:[PORT]/[DBNAME]?sslmode=require&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;people&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nc"&gt;DBF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cp1251.dbf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lowernames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above script we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;connect to a PostgreSQL instance using 

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[USER]&lt;/code&gt;: the username&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[PWD]&lt;/code&gt;: the password&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[HOST]&lt;/code&gt;: the hostname&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[PORT]&lt;/code&gt;: the port&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[DBNAME]&lt;/code&gt;: the database name&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;define a table named &lt;code&gt;people&lt;/code&gt; that will be created and populated&lt;/li&gt;

&lt;li&gt;insert into the &lt;code&gt;people&lt;/code&gt; table all the records in &lt;code&gt;cp1251.dbf&lt;/code&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;We can then execute it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python convert_bdf_to_sql.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we now connect to our PostgreSQL database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql postgres://[USER]:[PWD]@[HOST]:[PORT]/[DBNAME]?sslmode&lt;span class="o"&gt;=&lt;/span&gt;require
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we can check the &lt;code&gt;people&lt;/code&gt; table being populated with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;people&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see the data in the table!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | rn |                  name
----+----+----------------------------------------
  1 |  1 | амбулаторно-поликлиническое
  2 |  2 | больничное
  3 |  3 | НИИ
  4 |  4 | образовательное медицинское учреждение
(4 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>postgres</category>
      <category>dbf</category>
      <category>sql</category>
      <category>python</category>
    </item>
    <item>
      <title>List of PostgreSQL® AI Projects and Resources</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Thu, 29 Feb 2024 10:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/list-of-postgresqlr-ai-projects-and-resources-4142</link>
      <guid>https://forem.com/ftisiot/list-of-postgresqlr-ai-projects-and-resources-4142</guid>
      <description>&lt;p&gt;Everyone is now talking about AI, and modern databases like PostgreSQL® are increasingly being adopted in companies' AI journey as sources of data or key pieces of the AI infrastructure. Moreover there's a new set projects that are solving PostgreSQL problems with AI.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/ftisiot/postgresql-ai-projects" rel="noopener noreferrer"&gt;List of PostgreSQL® AI projects and resources&lt;/a&gt; is open-source project to collect PostgreSQL® extensions, applications and resources (video or blogs) talking about how our loved database can fit in the AI journey.&lt;/p&gt;

&lt;p&gt;Did you write an article, gave a talk, are you working on an extension regarding PostgreSQL and AI? &lt;strong&gt;PRs are welcome&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp4ic9m4h861impwu7o9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp4ic9m4h861impwu7o9.png" alt="Preview of List of PostgreSQL® AI Projects and Resources" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>ai</category>
      <category>projects</category>
      <category>resources</category>
    </item>
    <item>
      <title>11 Lessons to learn when using NULLs in PostgreSQL®</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Wed, 28 Feb 2024 10:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/11-lessons-to-learn-when-using-nulls-in-postgresqlr-2b7g</link>
      <guid>https://forem.com/ftisiot/11-lessons-to-learn-when-using-nulls-in-postgresqlr-2b7g</guid>
      <description>&lt;p&gt;A boolean value should only contain two values, &lt;code&gt;True&lt;/code&gt; or &lt;code&gt;False&lt;/code&gt;, but is it correct? Usually people assume so, but sometimes miss the fact that there could be the &lt;strong&gt;absence&lt;/strong&gt; of the value all-together. In databases this is absence is usually stored as &lt;code&gt;NULL&lt;/code&gt; and this blog showcases how to find them, use them properly and 11 lessons to learn to be a &lt;code&gt;NULL&lt;/code&gt; Pro! &lt;/p&gt;

&lt;p&gt;Keep in mind, it's not only booleans that can contain &lt;code&gt;NULL&lt;/code&gt; values, it's all the columns where you don't define a &lt;code&gt;NOT NULL&lt;/code&gt; constraint!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? &lt;br&gt;
🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  It all starts with some columns and rows
&lt;/h2&gt;

&lt;p&gt;Let's start from the basics: you have a PostgreSQL® database and a table, called &lt;code&gt;users&lt;/code&gt; like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Hey, do you want to test the above code but not having a PostgreSQL database handy? Past your code in &lt;a href="https://pgplayground.com/?utm_medium=organic&amp;amp;utm_source=ext_blog&amp;amp;utm_content=ftisiotNULL" rel="noopener noreferrer"&gt;PostgreSQL Playground&lt;/a&gt; and quickly check the results!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's insert some data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'jdoe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Jon'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'lspencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Liz'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Spencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'hlondon'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Hanna'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'London'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Querying the data showcases the table with all the columns filled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Insert NULLs
&lt;/h2&gt;

&lt;p&gt;Now, let's check if we can insert some &lt;code&gt;NULL&lt;/code&gt;s, let's try by inserting them in the &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;surname&lt;/code&gt; and &lt;code&gt;age&lt;/code&gt; columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'test'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works since we don't have any constraint&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
  4 | test     |       |         |    
(4 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can even simplify the insert with the same effect&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'test1'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
  4 | test     |       |         |    
  5 | test1    |       |         |    
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Inserting NULLs
&lt;/h2&gt;

&lt;p&gt;The first step to avoid &lt;code&gt;NULL&lt;/code&gt;s from appearing in a table is to forbid inserting them. The section below showcases a few situations on how to avoid inserting &lt;code&gt;NULL&lt;/code&gt;s in new and existing columns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inserting NULLs in the primary key
&lt;/h3&gt;

&lt;p&gt;Can I insert a &lt;code&gt;NULL&lt;/code&gt; in the table primary key? In theory this could be allowed since there's no explicit &lt;code&gt;NOT NULL&lt;/code&gt; constraint on the column. Let's try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this fails with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR:  null value in column "username" of relation "users" violates not-null constraint
DETAIL:  Failing row contains (6, null, null, null, null).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 1 💡&lt;/strong&gt;: Primary Keys are by default &lt;code&gt;NOT NULL&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a new column with NOT NULL constraint
&lt;/h3&gt;

&lt;p&gt;Let's add a new column to our table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will fail, since the previously inserted data don't have any associated value for &lt;code&gt;points&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR:  column "points" of relation "users" contains null values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's add a default value with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The table was altered, and now contains &lt;code&gt;0 points&lt;/code&gt; for all the existing rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age | points 
----+----------+-------+---------+-----+--------
  1 | jdoe     | Jon   | Doe     |  25 |      0
  2 | lspencer | Liz   | Spencer |  35 |      0
  3 | hlondon  | Hanna | London  |  45 |      0
  4 | test     |       |         |     |      0
  5 | test1    |       |         |     |      0
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we try to insert a new row without specifying the &lt;code&gt;points&lt;/code&gt; column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'usrwithnullpoints'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The row is inserted with the default &lt;code&gt;0 points&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |     username      | name  | surname | age | points 
----+-------------------+-------+---------+-----+--------
  1 | jdoe              | Jon   | Doe     |  25 |      0
  2 | lspencer          | Liz   | Spencer |  35 |      0
  3 | hlondon           | Hanna | London  |  45 |      0
  4 | test              |       |         |     |      0
  5 | test1             |       |         |     |      0
  7 | usrwithnullpoints |       |         |     |      0
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What if we try to push a &lt;code&gt;NULL&lt;/code&gt; into &lt;code&gt;points&lt;/code&gt;?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'forcenullpoints'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get the &lt;code&gt;ERROR:  null value in column "points" of relation "users" violates not-null constraint&lt;/code&gt; stating that our insert is violating the &lt;code&gt;NOT NULL&lt;/code&gt; constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 2 💡&lt;/strong&gt;: Setting a &lt;code&gt;NOT NULL&lt;/code&gt; constraint doesn't allow inserting &lt;code&gt;NULL&lt;/code&gt;s&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 3 💡&lt;/strong&gt;: If we want to add a &lt;code&gt;NOT NULL&lt;/code&gt; column to an existing table with data, we need to specify the &lt;strong&gt;default&lt;/strong&gt; value for existing rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add a NOT NULL constraint to an existing column
&lt;/h3&gt;

&lt;p&gt;If we try to change an existing column (e.g. &lt;code&gt;name&lt;/code&gt;) to add the &lt;code&gt;NOT NULL&lt;/code&gt; constraint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get the similar error &lt;code&gt;ERROR:  column "name" of relation "users" contains null values&lt;/code&gt;, therefore we need also need to amend the current &lt;code&gt;NULL&lt;/code&gt; values (and possibly set a default)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Hugo'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'Hugo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above we: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Altered the type, setting it to &lt;code&gt;TEXT&lt;/code&gt; again, to apply the function &lt;code&gt;COALESCE(name, 'Hugo')&lt;/code&gt; which is replacing &lt;code&gt;NULL&lt;/code&gt;s in the &lt;code&gt;name&lt;/code&gt; field with &lt;code&gt;Hugo&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set the default value for the &lt;code&gt;name&lt;/code&gt; column to &lt;code&gt;Hugo&lt;/code&gt; &amp;lt;- This is not strictly necessary&lt;/li&gt;
&lt;li&gt;Applied the &lt;code&gt;NOT NULL&lt;/code&gt; constraint since we don't have any &lt;code&gt;NULL&lt;/code&gt; values anymore&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: the above method locks the entire table until done. A faster approach could be to split the &lt;code&gt;COALESCE(name, 'Hugo')&lt;/code&gt; into a dedicated update and then only perform the &lt;code&gt;NOT NULL&lt;/code&gt; and &lt;code&gt;DEFAULT&lt;/code&gt; settings on the &lt;code&gt;ALTER TABLE&lt;/code&gt; statement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The results are in line with the expected, no &lt;code&gt;NULL&lt;/code&gt;s in the &lt;code&gt;name&lt;/code&gt; column, being replaced by &lt;code&gt;Hugo&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |     username      | name  | surname | age | points 
----+-------------------+-------+---------+-----+--------
  1 | jdoe              | Jon   | Doe     |  25 |      0
  2 | lspencer          | Liz   | Spencer |  35 |      0
  3 | hlondon           | Hanna | London  |  45 |      0
  4 | test              | Hugo  |         |     |      0
  5 | test1             | Hugo  |         |     |      0
  7 | usrwithnullpoints | Hugo  |         |     |      0
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 4 💡&lt;/strong&gt;: If we need to add a &lt;code&gt;NOT NULL&lt;/code&gt; column to an existing column in a table, we need to amend the existing &lt;code&gt;NULL&lt;/code&gt; values and optionally set the &lt;strong&gt;default&lt;/strong&gt; value for incoming rows.&lt;/p&gt;

&lt;p&gt;What if I want to &lt;strong&gt;remove&lt;/strong&gt; the &lt;code&gt;NOT NULL&lt;/code&gt; constraint? We can do it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Is DEFAULT enough?
&lt;/h3&gt;

&lt;p&gt;What if we leave the &lt;code&gt;NOT NULL&lt;/code&gt; constraint aside, and just set the &lt;code&gt;DEFAULT&lt;/code&gt;? Let's try with the &lt;code&gt;date_first_login&lt;/code&gt; new column&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;date_first_login&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checking what's in the table, we see all the &lt;code&gt;date_first_login&lt;/code&gt; filled&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |     username      | name  | surname | age | points | date_first_login 
----+-------------------+-------+---------+-----+--------+------------------
  1 | jdoe              | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer          | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon           | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test              | Hugo  |         |     |      0 | 2024-02-27
  5 | test1             | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints | Hugo  |         |     |      0 | 2024-02-27
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we try to insert a new row, without specifying the &lt;code&gt;date_first_login&lt;/code&gt;, it's correctly assigned to today&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_with_no_date_first_login'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above is correctly inserted with a not null &lt;code&gt;date_first_login&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |           username            | name  | surname | age | points | date_first_login 
----+-------------------------------+-------+---------+-----+--------+------------------
  1 | jdoe                          | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer                      | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon                       | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                          | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                         | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints             | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login | Hugo  |         |     |      0 | 2024-02-27
(7 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So... is the &lt;code&gt;NOT NULL&lt;/code&gt; constraint really needed? The reply is &lt;strong&gt;YES&lt;/strong&gt; since I could insert &lt;code&gt;NULLs&lt;/code&gt; by specifying it in the &lt;code&gt;INSERT&lt;/code&gt; statement&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_first_login&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_with_null_date_first_login'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will be stored correctly with a &lt;code&gt;NULL&lt;/code&gt; value for &lt;code&gt;date_first_login&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name  | surname | age | points | date_first_login 
----+---------------------------------+-------+---------+-----+--------+------------------
  1 | jdoe                            | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer                        | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon                         | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                            | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo  |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo  |         |     |      0 | 
(8 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 5 💡&lt;/strong&gt;: The &lt;code&gt;DEFAULT&lt;/code&gt; is going to prevent &lt;code&gt;NULL&lt;/code&gt;s from being inserted only if the &lt;code&gt;NULL&lt;/code&gt; is &lt;strong&gt;NOT&lt;/strong&gt; specified in the target column. To enforce the value we need an explicit &lt;code&gt;NOT NULL&lt;/code&gt; constraint&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying NULLs
&lt;/h2&gt;

&lt;p&gt;In the above section we tried to avoid storing &lt;code&gt;NULL&lt;/code&gt;s in our systems, what if we need to deal with existing &lt;code&gt;NULL&lt;/code&gt;s in our column? Here we can use a few SQL functions to detect them and change their value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detecting NULLs with the &lt;code&gt;IS NULL&lt;/code&gt; condition
&lt;/h3&gt;

&lt;p&gt;What if we want to detect the &lt;code&gt;NULL&lt;/code&gt; values? We can use the &lt;code&gt;IS NULL&lt;/code&gt; condition, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will retrieve the list of &lt;code&gt;NULL&lt;/code&gt;s surnames&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name | surname | age | points | date_first_login 
----+---------------------------------+------+---------+-----+--------+------------------
  4 | test                            | Hugo |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo |         |     |      0 | 
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The opposite list could be retrieved by applying the &lt;code&gt;IS NOT NULL&lt;/code&gt; condition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Resulting in&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age | points | date_first_login 
----+----------+-------+---------+-----+--------+------------------
  1 | jdoe     | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon  | Hanna | London  |  45 |      0 | 2024-02-27
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What if I try to use the equal comparison (&lt;code&gt;=&lt;/code&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above retrieves &lt;strong&gt;NO rows&lt;/strong&gt; since the &lt;code&gt;NULL&lt;/code&gt; value means there is &lt;strong&gt;NO value&lt;/strong&gt; and is &lt;strong&gt;NOT&lt;/strong&gt; equal to any other value (&lt;code&gt;NULL&lt;/code&gt;=&lt;code&gt;NULL&lt;/code&gt; is &lt;strong&gt;False&lt;/strong&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name | surname | age | points | date_first_login 
----+----------+------+---------+-----+--------+------------------
(0 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we apply the different operator (&lt;code&gt;&amp;lt;&amp;gt;&lt;/code&gt;) with &lt;code&gt;NULL&lt;/code&gt; we also get back &lt;strong&gt;NO rows&lt;/strong&gt;, since the &lt;code&gt;NULL&lt;/code&gt; value means there is &lt;strong&gt;NO value&lt;/strong&gt;, and therefore is not equal or different to other values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 6 💡&lt;/strong&gt;: To find &lt;code&gt;NULL&lt;/code&gt;s we should always use the &lt;code&gt;IS NULL&lt;/code&gt;; the &lt;code&gt;IS NOT NULL&lt;/code&gt; to fetch the not null rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Associating a default value to NULLs with COALESCE
&lt;/h3&gt;

&lt;p&gt;In the above we saw how to find &lt;code&gt;NULL&lt;/code&gt;s, what if we need to include them in a &lt;code&gt;SELECT&lt;/code&gt; statement with other values? We can either add an &lt;code&gt;OR&lt;/code&gt; condition like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'London'&lt;/span&gt; 
&lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or we can apply a transformation to change the &lt;code&gt;NULL&lt;/code&gt;s into a value we can compare&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Verona'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'London'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Verona'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both will have the same result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name  | surname | age | points | date_first_login 
----+---------------------------------+-------+---------+-----+--------+------------------
  3 | hlondon                         | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                            | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo  |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo  |         |     |      0 | 
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, if we apply the second option, we must be sure that &lt;code&gt;Verona&lt;/code&gt; (the label we associate to &lt;code&gt;NULL&lt;/code&gt; values) is a value not present in the column or a value we want to include in our selection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 7 💡&lt;/strong&gt;: When using &lt;code&gt;COALESCE&lt;/code&gt; check out that the value you want to use as substitute of &lt;code&gt;NULL&lt;/code&gt; is not already present in the table and outside of your business logic otherwise you might include in the filter more values than expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performing Math on NULL columns
&lt;/h3&gt;

&lt;p&gt;A similar scenario is due when we need to perform a calculation over a &lt;code&gt;NULL&lt;/code&gt;able column. Let's take the example of the &lt;code&gt;age&lt;/code&gt; column: what if we want to calculate the average age?&lt;/p&gt;

&lt;p&gt;Let's see the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name  | surname | age | points | date_first_login 
----+---------------------------------+-------+---------+-----+--------+------------------
  1 | jdoe                            | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer                        | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon                         | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                            | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo  |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo  |         |     |      0 | 
(8 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can calculate the average age with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Resulting in &lt;code&gt;35&lt;/code&gt; which is the average between the 3 &lt;strong&gt;NOT NULL&lt;/strong&gt; ages! Similarly if we count the &lt;code&gt;date_first_login&lt;/code&gt;s with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_first_login&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is &lt;code&gt;7&lt;/code&gt;, one row for every user having a not null &lt;code&gt;date_first_login&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 8 💡&lt;/strong&gt;: When performing Mathematical operation with nullable columns, only &lt;strong&gt;NOT NULL&lt;/strong&gt; values will be elaborated!&lt;/p&gt;

&lt;p&gt;What if we try to calculate the average &lt;code&gt;points&lt;/code&gt;? First let's update the table to give &lt;code&gt;jdoe&lt;/code&gt; 25 points&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'jdoe'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's calcuate the average points&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is &lt;code&gt;3.125&lt;/code&gt; which is the mathematical average between the &lt;code&gt;25&lt;/code&gt; points of &lt;code&gt;jdoe&lt;/code&gt; and the &lt;code&gt;0&lt;/code&gt; of the rest due to us previously setting &lt;code&gt;0&lt;/code&gt; as default value. Is this correct? It depends! If we wanted to calculate the average number of points of users actually playing some games, this could be a misleading calculation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 9 💡&lt;/strong&gt;: When performing Mathematical operation with nullable columns, the default value of a column will impact the results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Joining on NULL columns
&lt;/h3&gt;

&lt;p&gt;Building on what we explored before about the &lt;code&gt;NULL&lt;/code&gt; not being equal or different to any other value, we need to pay closer attention when we deal with nullable columns in join condition. &lt;/p&gt;

&lt;p&gt;Let's create another table called &lt;code&gt;sessions&lt;/code&gt; and insert some data. For a weird reason, the design of this table will force us to join with &lt;code&gt;users&lt;/code&gt; on the &lt;code&gt;surname&lt;/code&gt; which is a NULLABLE field.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Spencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Spencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we want to count users having a session on each day we could think of simply writing the above query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However the results shows only the 4 sessions below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; extract | count 
---------+-------
      25 |     2
      26 |     1
      27 |     1
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why? It's again because of &lt;code&gt;NULL&lt;/code&gt; values in the join condition. This problem of poor data quality on both ends of the join is stopping us from reporting any useful number. How could we fix it? Well, the fix should be done uphill by setting the &lt;code&gt;sessions&lt;/code&gt; table to use the &lt;code&gt;username&lt;/code&gt; instead of the &lt;code&gt;surname&lt;/code&gt; and defining a proper &lt;code&gt;FOREIGN KEY&lt;/code&gt; to the &lt;code&gt;users&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 10 💡&lt;/strong&gt;: Joining tables on nullable columns without caring about the NULL management will impact the results by reducing the number of rows included in the join.&lt;/p&gt;

&lt;p&gt;Still, if we want a simplistic solution, we could do the apply the &lt;code&gt;COALESCE&lt;/code&gt; to both sides of the join:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'NoSurname'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'NoSurname'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result shows an increased count of sessions&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; extract | count 
---------+-------
      25 |     7
      26 |     6
      27 |     6
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this is still not the correct answer, since we are seeing &lt;code&gt;19&lt;/code&gt; session being aggregated while only &lt;code&gt;9&lt;/code&gt; sessions were inserted in the original &lt;code&gt;sessions&lt;/code&gt; table. Why is that? It's because we now casted all the &lt;code&gt;NULL&lt;/code&gt;s as &lt;code&gt;NoSurname&lt;/code&gt; on both sides of the join. Since we had &lt;code&gt;5&lt;/code&gt; sessions without &lt;code&gt;surname&lt;/code&gt; and &lt;code&gt;5&lt;/code&gt; users without &lt;code&gt;surname&lt;/code&gt; we are creating a cartesian join of all the empty surnames with the empty sessions.&lt;/p&gt;

&lt;p&gt;We had the following users without &lt;code&gt;surname&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test
test1
usrwithnullpoints
user_with_no_date_first_login
user_with_null_date_first_login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the following sessions without &lt;code&gt;surname&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  6 | 2024-02-27 14:10:08.419021
  7 | 2024-02-25 14:10:08.419021
  8 | 2024-02-25 14:10:08.419021
  9 | 2024-02-26 14:10:08.419021
 10 | 2024-02-27 14:10:08.419021
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of the users was associated with each sessions, creating a overflow of sessions.&lt;/p&gt;

&lt;p&gt;A more correct solution, attribuiting all the &lt;code&gt;NULL&lt;/code&gt; session to a same phantom user &lt;code&gt;NoUser&lt;/code&gt; could be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'NoUser'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="k"&gt;left&lt;/span&gt; &lt;span class="k"&gt;outer&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this is an effort to fix data quality issues at query time, which is never a good idea.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 11 💡&lt;/strong&gt;: Fixing data quality issues related to nullable columns at query time is a risky approach. Solving them with proper table design and constraints allows you to keep quality data in the tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Dealing with &lt;code&gt;NULL&lt;/code&gt;s is something to care upfront, during table design, in order to avoid data quality issues downstream! &lt;br&gt;
This is it for the 11 lessons learnt using &lt;code&gt;NULL&lt;/code&gt;s, do you have others to share? Feel free to reach me on X/Twitter at &lt;a class="mentioned-user" href="https://dev.to/ftisiot"&gt;@ftisiot&lt;/a&gt; or on &lt;a href="https://www.linkedin.com/in/francescotisiot/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>sql</category>
      <category>null</category>
      <category>isnull</category>
    </item>
    <item>
      <title>How to CONCAT in MySQL</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Tue, 13 Feb 2024 13:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/how-to-concat-in-mysql-3jgf</link>
      <guid>https://forem.com/ftisiot/how-to-concat-in-mysql-3jgf</guid>
      <description>&lt;p&gt;One of the most common tasks with strings is concatenation! This blog post showcases several techniques to perform it with MySQL.&lt;/p&gt;

&lt;p&gt;If you need a FREE MySQL database? 🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
If you need to optimize your SQL query?  🐧 Check &lt;a href="https://www.eversql.com/?utm_medium=organic&amp;amp;utm_source=ext_blog&amp;amp;utm_content=ftisiotmysqlconcat" rel="noopener noreferrer"&gt;EverSQL&lt;/a&gt;! 🐧&lt;/p&gt;
&lt;h2&gt;
  
  
  String concatenation in MySQL by placing strings next to each other
&lt;/h2&gt;

&lt;p&gt;Let's take a basic example: a table containing &lt;code&gt;firstname&lt;/code&gt; and &lt;code&gt;surname&lt;/code&gt; fields we want to concatenate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
    &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;creation_date&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;preferred_item&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; 
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'John'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'EMEA'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'football'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Laura'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Smith'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'EMEA'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-02-01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tennis ball'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Eva'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Book'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'APAC'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-03'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Andy'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Novel'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'APAC'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-02-05'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'teddy bear'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Eva'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Leeds'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'APAC'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2024-02-01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'football'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can concatenate two quoted strings by placing one string next to the other like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'MyFirstname'&lt;/span&gt; &lt;span class="s1"&gt;'MySurname'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------------------+
| MyFirstname          |
+----------------------+
| MyFirstnameMySurname |
+----------------------+
1 row in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the result is the correct concatenation&lt;/li&gt;
&lt;li&gt;the column name is the first quoted string&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, if we apply the same principle to standard columns instead of quoted strings with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get a different result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---------+
| surname |
+---------+
| John    |
| Laura   |
| Eva     |
| Andy    |
| Eva     |
+---------+
5 rows in set (0.02 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result showcases only the &lt;code&gt;firstname&lt;/code&gt; in the results, with the &lt;code&gt;surname&lt;/code&gt; being included as the result column heading.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TLDR: concatenating strings in MySQL by placing strings next to each other is possible only if we are dealing with quoted strings.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Concatenating a non string
&lt;/h3&gt;

&lt;p&gt;Is it possible with the same method to concatenate a non string?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'MyFirstname'&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nope, we get &lt;code&gt;ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1' at line 1&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Concatenating a NULL
&lt;/h3&gt;

&lt;p&gt;What about concatenating a &lt;code&gt;NULL&lt;/code&gt; value?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'MyFirstname'&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same error &lt;code&gt;ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'NULL' at line 1&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  String concatenation in MySQL with the &lt;code&gt;||&lt;/code&gt; operator
&lt;/h2&gt;

&lt;p&gt;We can also concatenate &lt;code&gt;firstname&lt;/code&gt; and &lt;code&gt;surname&lt;/code&gt; using the &lt;code&gt;||&lt;/code&gt; operator with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result will be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------+
| firstname || surname  |
+-----------------------+
| JohnDoe               |
| LauraSmith            |
| EvaBook               |
| AndyNovel             |
| EvaLeeds              |
+-----------------------+
5 rows in set (0.02 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please check that there's no space between the &lt;code&gt;firstname&lt;/code&gt; and &lt;code&gt;surname&lt;/code&gt;, but we can add it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which shows the expected result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------------------+
| firstname || ' ' || surname  |
+------------------------------+
| John Doe                     |
| Laura Smith                  |
| Eva Book                     |
| Andy Novel                   |
| Eva Leeds                    |
+------------------------------+
5 rows in set (0.02 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Concatenating a non string
&lt;/h3&gt;

&lt;p&gt;What about concatenating strings with other fields that are not a string? If, for example, we try to concat the &lt;code&gt;firstname&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt; (an &lt;code&gt;integer&lt;/code&gt;) with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get the expected result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------+
| firstname || id  |
+------------------+
| John1            |
| Laura2           |
| Eva3             |
| Andy4            |
| Eva5             |
+------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same if we try to concatenate the &lt;code&gt;creation_date&lt;/code&gt; date field&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;creation_date&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------------+
| firstname || creation_date  |
+-----------------------------+
| John2024-01-01              |
| Laura2024-02-01             |
| Eva2024-01-03               |
| Andy2024-02-05              |
| Eva2024-02-01               |
+-----------------------------+
5 rows in set (0.02 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However we might want to add a function to the date to get alternative formats from the default, for example the &lt;code&gt;DATE_FORMAT&lt;/code&gt; function below converts the date in a weird format separating years, months and days with &lt;code&gt;..&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;DATE_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;creation_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'..%Y..%M..%D'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---------------------------------------------------------+
| firstname || DATE_FORMAT(creation_date, '..%Y..%M..%D') |
+---------------------------------------------------------+
| John..2024..January..1st                                |
| Laura..2024..February..1st                              |
| Eva..2024..January..3rd                                 |
| Andy..2024..February..5th                               |
| Eva..2024..February..1st                                |
+---------------------------------------------------------+
5 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Concatenating a NULL value
&lt;/h3&gt;

&lt;p&gt;What does it happen if we try to concatenate a &lt;code&gt;NULL&lt;/code&gt; value? If you check the dataset created, the row with &lt;code&gt;id=3&lt;/code&gt; for &lt;code&gt;Eva Book&lt;/code&gt; has a &lt;code&gt;preferred_item&lt;/code&gt; equal to &lt;code&gt;NULL&lt;/code&gt;, let's check the output of concatenating the &lt;code&gt;firstname&lt;/code&gt;, &lt;code&gt;surname&lt;/code&gt; and &lt;code&gt;preferred_item&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;preferred_item&lt;/span&gt;  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------------------------------+
| firstname || surname || preferred_item   |
+------------------------------------------+
| JohnDoefootball                          |
| LauraSmithtennis ball                    |
| NULL                                     |
| AndyNovelteddy bear                      |
| EvaLeedsfootball                         |
+------------------------------------------+
5 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result shows that the entire concatenation string is &lt;code&gt;NULL&lt;/code&gt;!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TLDR: a concatenation will return &lt;code&gt;NULL&lt;/code&gt; if any of the strings is &lt;code&gt;NULL&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  String concatenation in MySQL with the &lt;code&gt;CONCAT&lt;/code&gt; function
&lt;/h2&gt;

&lt;p&gt;Another option is to use MySQL &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_concat" rel="noopener noreferrer"&gt;Concat function&lt;/a&gt;. Let's try with the basic &lt;code&gt;firstname&lt;/code&gt; and &lt;code&gt;surname&lt;/code&gt; example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results are in line with the previous &lt;code&gt;||&lt;/code&gt; example, not having any additional separator added into the resulting string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------------------------+
| concat(firstname, surname) |
+----------------------------+
| JohnDoe                    |
| LauraSmith                 |
| EvaBook                    |
| AndyNovel                  |
| EvaLeeds                   |
+----------------------------+
5 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How can we add a space in between? We have two options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a space to the &lt;code&gt;CONCAT&lt;/code&gt; function
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Use the &lt;code&gt;CONCAT_WS&lt;/code&gt; function (concat with separator) passing the space as separator
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;concat_ws&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Concatenating a non string with the CONCAT function
&lt;/h3&gt;

&lt;p&gt;What happens if we include non strings in the &lt;code&gt;CONCAT&lt;/code&gt; function?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As before we get the expected result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------+
| concat(firstname, id) |
+-----------------------+
| John1                 |
| Laura2                |
| Eva3                  |
| Andy4                 |
| Eva5                  |
+-----------------------+
5 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Concatenating a NULL value with the CONCAT function
&lt;/h3&gt;

&lt;p&gt;What about &lt;code&gt;NULL&lt;/code&gt; values?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;preferred_item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;CLIENTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As before, if any of the source strings is &lt;code&gt;NULL&lt;/code&gt;, the result of the concatenation is &lt;code&gt;NULL&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------------------------------------------+
| concat(firstname, surname, preferred_item) |
+--------------------------------------------+
| JohnDoefootball                            |
| LauraSmithtennis ball                      |
| NULL                                       |
| AndyNovelteddy bear                        |
| EvaLeedsfootball                           |
+--------------------------------------------+
5 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  String aggregation via concatenation in MySQL with the &lt;code&gt;GROUP_CONCAT&lt;/code&gt; function
&lt;/h2&gt;

&lt;p&gt;If we need to concatenate all the strings belonging to the same group, we can use MySQL &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_group-concat" rel="noopener noreferrer"&gt;GROUP_CONCAT&lt;/a&gt; function.&lt;/p&gt;

&lt;p&gt;If, in our example, we want to concatenate the &lt;code&gt;firstname&lt;/code&gt; of all the people grouped by their &lt;code&gt;region&lt;/code&gt; we can write the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;group_concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;CLIENTS&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is the following, with &lt;code&gt;,&lt;/code&gt; as the default separator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------------------------+
| region | group_concat(firstname) |
+--------+-------------------------+
| APAC   | Eva,Andy,Eva            |
| EMEA   | John,Laura              |
+--------+-------------------------+
2 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can also order how the strings are ordered in the list with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;group_concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;CLIENTS&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+--------------------------------------------+
| region | group_concat(firstname ORDER BY firstname) |
+--------+--------------------------------------------+
| APAC   | Andy,Eva,Eva                               |
| EMEA   | John,Laura                                 |
+--------+--------------------------------------------+
2 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And change the separator with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;group_concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="n"&gt;SEPARATOR&lt;/span&gt; &lt;span class="s1"&gt;'$$'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;CLIENTS&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+----------------+
| region | res            |
+--------+----------------+
| APAC   | Andy$$Eva$$Eva |
| EMEA   | John$$Laura    |
+--------+----------------+
2 rows in set (0.02 sec)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can also only list the distinct values, removing the duplication of &lt;code&gt;Eva&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;group_concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;firstname&lt;/span&gt; &lt;span class="n"&gt;SEPARATOR&lt;/span&gt; &lt;span class="s1"&gt;'$$'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;CLIENTS&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------------+
| region | res         |
+--------+-------------+
| APAC   | Andy$$Eva   |
| EMEA   | John$$Laura |
+--------+-------------+
2 rows in set (0.03 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aggregating and concatenating a non string with the GROUP_CONCAT function
&lt;/h3&gt;

&lt;p&gt;Let's try with a non string, like the &lt;code&gt;id&lt;/code&gt; field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;group_concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;CLIENTS&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| region | res   |
+--------+-------+
| APAC   | 3,4,5 |
| EMEA   | 1,2   |
+--------+-------+
2 rows in set (0.02 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same if we try with &lt;code&gt;creation_date&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+----------------------------------+
| region | res                              |
+--------+----------------------------------+
| APAC   | 2024-01-03,2024-02-05,2024-02-01 |
| EMEA   | 2024-01-01,2024-02-01            |
+--------+----------------------------------+
2 rows in set (0.02 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aggregating and concatenating a NULL value with the GROUP_CONCAT function
&lt;/h3&gt;

&lt;p&gt;What about our &lt;code&gt;NULL&lt;/code&gt; value? Let's check it out!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;group_concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;preferred_item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;CLIENTS&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case the behaviour is different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+----------------------+
| region | res                  |
+--------+----------------------+
| APAC   | teddy bear,football  |
| EMEA   | football,tennis ball |
+--------+----------------------+
2 rows in set (0.02 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if the row with &lt;code&gt;id=3&lt;/code&gt; part of the &lt;code&gt;APAC&lt;/code&gt; group had &lt;code&gt;preferred_item&lt;/code&gt; equal to &lt;code&gt;NULL&lt;/code&gt;, the resulting aggregate is &lt;strong&gt;not NULL&lt;/strong&gt;! This is a common behaviour for aggregate functions to ignore the &lt;code&gt;NULL&lt;/code&gt; values.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Here's an handy overview of the concatenation possibilities in MySQL:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;code&lt;/th&gt;
&lt;th&gt;works with quoted strings&lt;/th&gt;
&lt;th&gt;works with standard columns&lt;/th&gt;
&lt;th&gt;works with non strings&lt;/th&gt;
&lt;th&gt;Works with &lt;code&gt;NULL&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;aggregates results&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Placing strings next to each other&lt;/td&gt;
&lt;td&gt;'a' 'b'&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;| operator&lt;/td&gt;
&lt;td&gt;'a' || 'b'&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CONCAT function&lt;/td&gt;
&lt;td&gt;concat('a','b')&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GROUP_CONCAT function&lt;/td&gt;
&lt;td&gt;group_concat('a','b')&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>mysql</category>
      <category>concatenation</category>
      <category>concat</category>
      <category>string</category>
    </item>
    <item>
      <title>How to load JSON data in PostgreSQL with the COPY command</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Fri, 02 Feb 2024 13:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/how-to-load-json-data-in-postgresql-with-the-the-copy-command-4gmh</link>
      <guid>https://forem.com/ftisiot/how-to-load-json-data-in-postgresql-with-the-the-copy-command-4gmh</guid>
      <description>&lt;p&gt;You have a JSON dataset that you want to upload to a PostgreSQL table containing properly formatted rows and columns... How do you do it? &lt;/p&gt;

&lt;p&gt;All the main sources like &lt;a href="https://ftisiot.net/postgresqljson/how-to-load-json-postgresql" rel="noopener noreferrer"&gt;my own blog&lt;/a&gt; and &lt;a href="https://konbert.com/blog/import-json-into-postgres-using-copy" rel="noopener noreferrer"&gt;others&lt;/a&gt; tell you to load the JSON in a dedicated temporary table containing a unique &lt;code&gt;JSON&lt;/code&gt; column, then parse it and load into the target table. However there could be another way, avoiding the temp table!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuazrsvjv9ckk3fgmlr14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuazrsvjv9ckk3fgmlr14.png" alt="Traditional and new JSON Load options" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this blog we'll see how to upload the JSON directly using PostgreSQL &lt;code&gt;COPY&lt;/code&gt; command and using an utility called &lt;a href="https://jqlang.github.io/jq/" rel="noopener noreferrer"&gt;jq&lt;/a&gt;!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? 🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  PostgreSQL COPY command
&lt;/h2&gt;

&lt;p&gt;First of all, let's check PostgreSQL &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;COPY command&lt;/a&gt;. It's a command that allows you to copy into a PostgreSQL table data from a file, there are two versions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;COPY&lt;/code&gt; if the file is sitting in the PG server already&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;\COPY&lt;/code&gt; if the file is sitting in the client machine connected to the server via &lt;code&gt;psql&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In both cases, the standard &lt;code&gt;COPY&lt;/code&gt; command has the following minimal set of parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OPTIONAL&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;COLUMNS&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;SOURCE&lt;/span&gt; &lt;span class="n"&gt;FILE&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FORMAT&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;TARGET TABLE&amp;gt;&lt;/code&gt; is the name of the target table&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;(&amp;lt;OPTIONAL LIST OF COLUMNS&amp;gt;)&lt;/code&gt; defines the columns in the table to load&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;SOURCE FILE&amp;gt;&lt;/code&gt; is pointing to the source file to load&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;FORMAT&amp;gt;&lt;/code&gt; defines the format of the data&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The full list of parameters is available in the &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;PostgreSQL COPY documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  PostgreSQL COPY command - the out of the box formats
&lt;/h3&gt;

&lt;p&gt;Let's focus on the formats available in the &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;PostgreSQL COPY documentation&lt;/a&gt; are listed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;TEXT&lt;/code&gt;: can be used to load a full text&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CSV&lt;/code&gt;: to load comma (or otherwise separated) values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BINARY&lt;/code&gt;: to load binary files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately there doesn't seem to be an out of the box way to load JSON files!&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;code&gt;PROGRAM&lt;/code&gt; option to the rescue!
&lt;/h2&gt;

&lt;p&gt;A, maybe not known, option of the &lt;code&gt;COPY&lt;/code&gt; command is to point to a program to execute instead of the file. This option can be called with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OPTIONAL&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;COLUMNS&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;PROGRAM&lt;/span&gt; &lt;span class="nv"&gt;"&amp;lt;SET OF INSTRUCTIONS&amp;gt;"&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compared to the previous &lt;code&gt;\copy&lt;/code&gt; call, this time we are adding the &lt;code&gt;PROGRAM&lt;/code&gt; with a set of instructions delimited by quotes or double quotes that will be executed on the client machine before loading the data.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are using the &lt;code&gt;COPY&lt;/code&gt; command on the server, you'll probably need a superuser. This is the error message shown in &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt;: &lt;code&gt;ERROR:  must be superuser or have privileges of the pg_read_server_files role to COPY from a file&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, what we can do, is to reshape the data before loading it.&lt;/p&gt;

&lt;h3&gt;
  
  
  jq - the indispensable JSON parsing tool
&lt;/h3&gt;

&lt;p&gt;I've been using jq quite a while in a lot of blog posts, it's a very handy tool to parse, reshape, select JSON documents. For the purpose of this blog, we'll use to reshape the JSON input into a CSV format, digestible from the PostgreSQL &lt;code&gt;COPY&lt;/code&gt; command. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You need to have &lt;a href="https://jqlang.github.io/jq/" rel="noopener noreferrer"&gt;jq&lt;/a&gt; installed on the workstation from where the &lt;code&gt;COPY&lt;/code&gt; command is executed!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's create a basic JSON file with named &lt;code&gt;test.json&lt;/code&gt; with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mystring"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ciao"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mystring"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"sole"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mystring"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"mare"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With jq we can read and reshape the above JSON to a CSV format with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;more test.json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;". | [.id, .mystring] | @csv"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;more test.json&lt;/code&gt; reads the file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;jq -r&lt;/code&gt; prints the raw output&lt;/li&gt;
&lt;li&gt;the first &lt;code&gt;.&lt;/code&gt; selects all the elements at the root level&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;| [.id, .mystring]&lt;/code&gt; retrieves the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;mystring&lt;/code&gt; keys from each element&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;|@csv&lt;/code&gt; sets the output format as CSV&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,"ciao"
2,"sole"
3,"mare"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;To check the complete set of options available with jq please view the &lt;a href="https://jqlang.github.io/jq/manual/" rel="noopener noreferrer"&gt;manual&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Stitching all together
&lt;/h2&gt;

&lt;p&gt;So, how can we load a target table with just 1 &lt;code&gt;COPY&lt;/code&gt; command? Let's first create the target table with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;MYTARGETTABLE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mystring&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now want to load the &lt;code&gt;myid&lt;/code&gt; and &lt;code&gt;mystring&lt;/code&gt; columns of the &lt;code&gt;MYTARGETTABLE&lt;/code&gt; table with the following &lt;code&gt;COPY&lt;/code&gt; command reading from the &lt;code&gt;test.json&lt;/code&gt; and applying the transformation with &lt;code&gt;jq&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;MYTARGETTABLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mystring&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;PROGRAM&lt;/span&gt; &lt;span class="s1"&gt;'more test.json | jq -r ". | [.id, .mystring] | @csv"'&lt;/span&gt;  
&lt;span class="n"&gt;CSV&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is the data properly loaded in the &lt;code&gt;MYTARGETTABLE&lt;/code&gt; table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | myid | mystring
----+------+----------
  1 |    1 | ciao
  2 |    2 | sole
  3 |    3 | mare
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Solving problems at the source sometimes is useful to avoid extra hops! Stitching together &lt;code&gt;COPY&lt;/code&gt; (with &lt;code&gt;PROGRAM&lt;/code&gt;) and &lt;code&gt;jq&lt;/code&gt; provides us the flexibility to load JSON files without intermediary tables.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>json</category>
      <category>copy</category>
      <category>dataload</category>
    </item>
    <item>
      <title>Load StackOverflow's StackExchange data in PostgreSQL®</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Wed, 31 Jan 2024 10:00:00 +0000</pubDate>
      <link>https://forem.com/ftisiot/load-stackoverflows-stackexchange-data-in-postgresqlr-40ip</link>
      <guid>https://forem.com/ftisiot/load-stackoverflows-stackexchange-data-in-postgresqlr-40ip</guid>
      <description>&lt;p&gt;I recently found about the StackOverflow dataset in &lt;a href="https://www.kaggle.com/datasets/stackoverflow/stackoverflow" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;, which points to the &lt;a href="https://archive.org/download/stackexchange" rel="noopener noreferrer"&gt;StackExchange&lt;/a&gt; link to download the entire data. In this blog I'll show a (maybe not 100% polished) way to upload the data to several tables in a PostgreSQL database!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" alt="Architecture of the data loading phase" width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The data is provided by the &lt;a href="https://archive.org/details/stackexchange" rel="noopener noreferrer"&gt;Stack Exchange Network&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you need a FREE PostgreSQL database? 🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
If you need to optimize your SQL query?  🐧 Check &lt;a href="https://www.eversql.com/?utm_medium=organic&amp;amp;utm_source=ext_blog&amp;amp;utm_content=ftisiotstackoverflow" rel="noopener noreferrer"&gt;EverSQL&lt;/a&gt;! 🐧&lt;/p&gt;
&lt;h2&gt;
  
  
  Create a PostgreSQL database
&lt;/h2&gt;

&lt;p&gt;You can create one with &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt; or just have one local, the below code works in both examples.&lt;/p&gt;
&lt;h2&gt;
  
  
  Create database tables
&lt;/h2&gt;

&lt;p&gt;Every downloaded file, referring to a specific site, contains the following files: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Posts&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;&lt;code&gt;Users&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Votes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Comments&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PostHistory&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PostLinks&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The detailed schema information is available in this &lt;a href="https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede/2678#2678" rel="noopener noreferrer"&gt;meta post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We'll load the data in a two step approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First load the XML files in a temporary table&lt;/li&gt;
&lt;li&gt;Then load the data from the temporary table into the proper tables with optimised column types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" alt="Architecture of the data loading phase" width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To load the data we need the following data structures: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a table &lt;code&gt;data_load&lt;/code&gt; containing a unique &lt;code&gt;data&lt;/code&gt; &lt;code&gt;TEXT&lt;/code&gt; column we'll use to load the XML data row by row&lt;/li&gt;
&lt;li&gt;a set of tables matching the file structures in &lt;code&gt;Posts&lt;/code&gt;, &lt;code&gt;Users&lt;/code&gt;, &lt;code&gt;Votes&lt;/code&gt;, &lt;code&gt;Comments&lt;/code&gt;, &lt;code&gt;PostHistory&lt;/code&gt;, &lt;code&gt;PostLinks&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can create the &lt;code&gt;data_load&lt;/code&gt; table with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;data_load&lt;/code&gt; table will be used to load the XML files on the database and then parse them accordingly.&lt;/p&gt;

&lt;p&gt;The following tables are needed to properly store the data in the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reputation&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CreationDate&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DisplayName&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LastAccessDate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;Location&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AboutMe&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;views&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;UpVotes&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DownVotes&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AccountId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PostTypeId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CreationDate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;viewcount&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;OwnerUserId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LastActivityDate&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Tags&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AnswerCount&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CommentCount&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ContentLicense&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;badges&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tagbased&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;creationdate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contentlicense&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posthistory&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PostHistoryTypeId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;RevisionGUID&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CreationDate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;userID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ContentLicense&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;postlinks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;creationdate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;relatedPostId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LinkTypeId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tagname&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ExcerptPostId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;WikiPostId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IsRequired&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IsModeratorOnly&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;votes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;votetypeid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;creationdate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Load the data
&lt;/h2&gt;

&lt;p&gt;Let's download the &lt;code&gt;ai.stackexchange.com&lt;/code&gt; the section from StackExchange dedicated to AI. Within the downloaded folder we can find 8 files (1-1 with our tables). Let's try to load the &lt;code&gt;Users.xml&lt;/code&gt; first.&lt;/p&gt;

&lt;p&gt;As mentioned before, we'll perform a two step loading approach: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load the XML into a table, with each XML row on a different database row&lt;/li&gt;
&lt;li&gt;Parse the XML to populate the proper tables and columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two step approach is needed since the original dataset threats all columns, including ids, as strings. We could either define all the Ids as strings or, do a bit more work to load the data into the proper column definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Load the User XML into the &lt;code&gt;data_load&lt;/code&gt; table
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l3d119rzv8wh2m0ahyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l3d119rzv8wh2m0ahyb.png" alt="Architecture of the data loading phase - step 1" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To load the entire XML of the &lt;code&gt;Users.xml&lt;/code&gt; file into a temporary &lt;code&gt;data_load&lt;/code&gt; table we can connect to the database using &lt;code&gt;psql&lt;/code&gt; from the same folder where the &lt;code&gt;Users.xml&lt;/code&gt; is located and execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Users.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;\copy&lt;/code&gt; command allows us to load the dataset into the &lt;code&gt;data_load&lt;/code&gt; table. However we need to perform a couple of tricks in order to load it properly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tr -d "\t" &amp;lt; Users.xml&lt;/code&gt; removes the tabs from the file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sed -e ''s/\\/\\\\/g''&lt;/code&gt; properly escapes the &lt;code&gt;\&lt;/code&gt; in the strings.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HEADER&lt;/code&gt; avoids to load the initial &lt;code&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;&lt;/code&gt; row, unnecessary for our parsing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Load the data into the &lt;code&gt;Users&lt;/code&gt; table
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbin7car721jyutt78z4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbin7car721jyutt78z4f.png" alt="Architecture of the data loading phase - step 2" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above command loads the data into the &lt;code&gt;data_load&lt;/code&gt; table with one row in the table for each row in the original file. This is not optimal but avoids having to deal with very large files included in just a single blob.&lt;/p&gt;

&lt;p&gt;We can make use the fact that each user is contained in a &lt;code&gt;&amp;lt;raw&amp;gt;&lt;/code&gt; tag to leverage PostgreSQL ability to parse XML fields with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DisplayName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LastAccessDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;AboutMe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Views&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UpVotes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DownVotes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;AccountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Reputation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@DisplayName'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;DisplayName&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@LastAccessDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;LastAccessDate&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Location'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;Location&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@AboutMe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;AboutMe&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Views'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Views&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UpVotes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;UpVotes&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@DownVotes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;DownVotes&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@AccountId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;AccountId&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Reputation'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Reputation&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;code&gt;xpath&lt;/code&gt; to extract the relevant fields from each user&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;::text::int&lt;/code&gt; to cast the extracted field to the proper column type&lt;/li&gt;
&lt;li&gt;the filter &lt;code&gt;regexp_like(data,'[ ]+\&amp;lt;row')&lt;/code&gt; to remove the other unnecessary rows, including for example, &lt;code&gt;&amp;lt;users&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;/users&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Load the other tables
&lt;/h3&gt;

&lt;p&gt;Similar to the example with &lt;code&gt;Users.xml&lt;/code&gt; above, we can load the other tables with the following steps (please not we are truncating the &lt;code&gt;data_load&lt;/code&gt; table before loading the next file):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Loading posts&lt;/span&gt;
&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Posts.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;POSTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PostTypeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;viewcount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OwnerUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LastActivityDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AnswerCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CommentCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ContentLicense&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Score'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ViewCount'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Body'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@OwnerUserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@LastActivityDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Title'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Tags'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@AnswerCount'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CommentCount'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ContentLicense'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading badges&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Badges.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;BADGES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;tagbased&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Name'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Date'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Class'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@TagBased'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading comments&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Comments.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;COMMENTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;creationdate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;userid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contentlicense&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Score'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ContentLicense'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading posthistory&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; PostHistory.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;posthistory&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PostHistoryTypeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RevisionGUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ContentLicense&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostHistoryTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@RevisionGUID'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ContentLicense'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading postlinks&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; PostLinks.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;postlinks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;creationdate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relatedPostId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LinkTypeId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@RelatedPostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@LinkTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;span class="c1"&gt;-- Loading tags&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Tags.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tagname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ExcerptPostId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WikiPostId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IsModeratorOnly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IsRequired&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@TagName'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Count'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ExcerptPostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@WikiPostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@IsModeratorOnly'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;boolean&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@IsRequired'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading votes&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Votes.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;votes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;votetypeid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;creationdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@VoteTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Query Time
&lt;/h2&gt;

&lt;p&gt;Once the data is loaded, we can start querying the dataset. For example, finding the top 2 post having a comment with highest score using the following SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;POSTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;POSTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;COMMENTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;COMMENTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SCORE&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
&lt;span class="n"&gt;POSTS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;COMMENTS&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;POSTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;COMMENTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;POSTID&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;SCORE&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enjoy the dataset and find your own questions and answers!&lt;/p&gt;

&lt;p&gt;The following links provide the entire set of &lt;a href="https://ftisiot.net/images/files/2024/stackoverflow/table_create.sql" rel="noopener noreferrer"&gt;DDLs&lt;/a&gt; and &lt;a href="https://ftisiot.net/images/files/2024/stackoverflow/data_load.sql" rel="noopener noreferrer"&gt;Loading&lt;/a&gt; SQL!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>stackoverflow</category>
      <category>stackexchange</category>
      <category>dataload</category>
    </item>
    <item>
      <title>Managing data drift with Apache Kafka® Connect and a schema registry</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Mon, 29 Jan 2024 14:00:00 +0000</pubDate>
      <link>https://forem.com/aiven_io/managing-data-drift-with-apache-kafkar-connect-and-a-schema-registry-kop</link>
      <guid>https://forem.com/aiven_io/managing-data-drift-with-apache-kafkar-connect-and-a-schema-registry-kop</guid>
      <description>&lt;p&gt;Data flows across many technologies, teams, and people in today's businesses. Businesses are always growing and changing, so the way we collect and share data changes all the time too. We need to know not only who owns certain data but also what to do if that data changes. This problem is often referred to as "data drift."&lt;/p&gt;

&lt;p&gt;Consider the scenario where a piece of data is modified at its source — what implications does this have for other systems reliant on it? How do we communicate necessary changes to stakeholders? Conversely, how do we prevent changes that could disrupt the system?&lt;/p&gt;

&lt;p&gt;Having a robust plan for managing data drift is imperative. Businesses require data systems that function seamlessly and remain consistent, even amidst changes at the data source. Additionally, mechanisms are needed to assess and decide on changes, ensuring smooth operations for everyone involved with the data.&lt;/p&gt;

&lt;p&gt;This tutorial will show you how tools like Apache Kafka®, Apache Kafka Connect, and the built in schema registry functionality provided by &lt;a href="https://www.karapace.io/" rel="noopener noreferrer"&gt;Karapace&lt;/a&gt;, can help businesses keep an eye on data drift. It will also explain how to either deny or allow changes based on what a business needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Apache Kafka and why a schema registry?
&lt;/h2&gt;

&lt;p&gt;Apache Kafka is widely adopted as a backend data hub, empowering companies to move their data supported by a reliable, fast and scalable technology. Apache Kafka provides the benefit of decoupling data producers and consumers, by allowing the producers to reliably send the data without having to worry about consumers being ready to read, or being fast enough to keep up with throughput. &lt;/p&gt;

&lt;p&gt;By default, Apache Kafka doesn't impose or verify the structure of data. Messages are pushed and retrieved in any format agreed upon by the producer and the consumer. However, in complex systems, where the same information needs to be reused across multiple consumers from different parts of the company, a simple external agreement is often insufficient. Apache Kafka must not only ensure that consumers can retrieve the data but also make sense of it, even if the structure of the messages changes slightly over time.&lt;/p&gt;

&lt;p&gt;This is where the schema registry functionality enabled by Karapace comes into play: a way to decouple the structure of the message from its content and a method to verify that updates in the data structure won't break downstream consumers of the information. With Karapace, we can define the structure of each topic, along with the compatibility level that determines which data structure changes are allowed or rejected.&lt;/p&gt;

&lt;p&gt;In the following sections, we will explore how the schema registry can be used in conjunction with Apache Kafka® Connect, both as a source and a sink, to check data structure changes and propagate them if they meet compatibility requirements.. &lt;/p&gt;

&lt;h2&gt;
  
  
  The overall architecture
&lt;/h2&gt;

&lt;p&gt;To simulate a typical company data flow, we will employ PostgreSQL® as our source, serving as our transactional database. Extracting data from it will involve using Apache Kafka, Apache Kafka Connect, and the Debezium source connector, enabling a real-time change data capture process. Once the data resides in Apache Kafka, we will leverage the integrated integration with Karapace to store the data schema and assess changes for compatibility. Finally, the results of our data changes will manifest in a MySQL database and an Amazon S3 bucket, mirroring two use cases: departmental analytics and long-term data storage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyv002q5v825tz0ivx1j8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyv002q5v825tz0ivx1j8.png" alt="Overall architecture including PostgreSQL as Source, CDC with Debezium, Apache Kafka and two sinks to S3 and MySQL" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'll use &lt;a href="https://aiven.io/kafka" rel="noopener noreferrer"&gt;Aiven for Apache Kafka®&lt;/a&gt;, &lt;a href="https://aiven.io/postgresql" rel="noopener noreferrer"&gt;Aiven for PostgreSQL®&lt;/a&gt;, &lt;a href="https://aiven.io/mysql" rel="noopener noreferrer"&gt;Aiven for MySQL&lt;/a&gt; and a Debezium Kafka Connector to demonstrate this. &lt;a href="https://console.aiven.io/signup" rel="noopener noreferrer"&gt;Sign up for an Aiven account&lt;/a&gt; to follow along. &lt;/p&gt;

&lt;p&gt;We can create the whole flow using Aiven's &lt;a href="https://docs.aiven.io/docs/tools/cli" rel="noopener noreferrer"&gt;command line interface&lt;/a&gt;. You'll also need to install &lt;code&gt;psql&lt;/code&gt;. Run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service create demo-drift-postgresql &lt;span class="nt"&gt;-t&lt;/span&gt; pg &lt;span class="nt"&gt;--cloud&lt;/span&gt; aws-eu-west-1 &lt;span class="nt"&gt;-p&lt;/span&gt; free-1-5gb
avn service create demo-drift-mysqldb &lt;span class="nt"&gt;-t&lt;/span&gt; mysql &lt;span class="nt"&gt;--cloud&lt;/span&gt; aws-eu-west-1 &lt;span class="nt"&gt;-p&lt;/span&gt; free-1-5gb
avn service create demo-drift-kafka         &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; kafka                                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--cloud&lt;/span&gt; aws-eu-west-1                   &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; business-4                           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; kafka.auto_create_topics_enable&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;kafka_connect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;                   &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;kafka_rest&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;                      &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;schema_registry&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above three commands will start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Aiven for PostgreSQL database named &lt;code&gt;demo-drift-postgresql&lt;/code&gt; in the &lt;code&gt;aws-eu-west-1&lt;/code&gt; cloud region using Aiven's free tier&lt;/li&gt;
&lt;li&gt;An Aiven for MySQL database named &lt;code&gt;demo-drift-mysql&lt;/code&gt; in the &lt;code&gt;aws-eu-west-1&lt;/code&gt; cloud region using Aiven's free tier&lt;/li&gt;
&lt;li&gt;An Aiven for Apache Kafka® service named &lt;code&gt;demo-drift-kafka&lt;/code&gt; in the &lt;code&gt;aws-eu-west-1&lt;/code&gt; cloud region using Aiven's &lt;code&gt;business-4&lt;/code&gt; plan and enabling:

&lt;ul&gt;
&lt;li&gt;The automatic creation of topics&lt;/li&gt;
&lt;li&gt;Apache Kafka Connect, running on the same nodes as Apache Kafka&lt;/li&gt;
&lt;li&gt;Kafka REST APIs&lt;/li&gt;
&lt;li&gt;Kafka Schema Registry functionality powered by Karapace &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;We can wait for the above services to be created with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service &lt;span class="nb"&gt;wait &lt;/span&gt;demo-drift-postgresql
avn service &lt;span class="nb"&gt;wait &lt;/span&gt;demo-drift-kafka
avn service &lt;span class="nb"&gt;wait &lt;/span&gt;demo-drift-mysqldb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create the source dataset in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;The first step of the data journey will be in the PostgreSQL database, acting as a company transactional backend. In this section we'll connect to the database and include some data. To connect, we can use the prebuilt Aiven CLI command (that requires &lt;code&gt;psql&lt;/code&gt; to be installed locally):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service cli demo-drift-postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After connecting, we can create a basic &lt;code&gt;USERS&lt;/code&gt; table and include some data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;USERNAME&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HERO&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;USERNAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HERO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Spiderman'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Flash'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Joker'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Batman'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Change Data Capture from PostgreSQL to Apache Kafka
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F643ky145a0la7kk60eir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F643ky145a0la7kk60eir.png" alt="Change Data Capture with PostgreSQL, Debezium Source Connector, Apache Kafka and Schema Registry" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After mimicking the OLTP (Online Transaction Processing) system, we can now create the change data capture pipeline allowing us to track the &lt;code&gt;USERS&lt;/code&gt; table in Apache Kafka. We'll set up the CDC flow using a &lt;a href="https://docs.aiven.io/docs/products/kafka/kafka-connect/howto/debezium-source-connector-pg" rel="noopener noreferrer"&gt;Debezium connector&lt;/a&gt; and the following configuration file, which we'll name &lt;code&gt;cdc-deb.json&lt;/code&gt;. Be sure to replace values like &lt;code&gt;&amp;lt;DATABASE_HOST&amp;gt;&lt;/code&gt; in the below example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;   
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pg-source-users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"connector.class"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.debezium.connector.postgresql.PostgresConnector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database.server.name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sourcepg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database.hostname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;DATABASE_HOST&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database.port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;DATABASE_PORT&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database.user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database.password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;POSTGRESQL_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database.dbname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"defaultdb"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plugin.name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pgoutput"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"slot.name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"myslot1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"publication.name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mypub1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"publication.autocreate.mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"filtered"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database.sslmode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"require"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"table.include.list"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public.users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.confluent.connect.avro.AvroConverter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.schema.registry.url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.basic.auth.credentials.source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.schema.registry.basic.auth.user.info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin:&amp;lt;SCHEMA_REGSITRY_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.confluent.connect.avro.AvroConverter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.schema.registry.url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.basic.auth.credentials.source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.schema.registry.basic.auth.user.info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above connector we are defining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Debezium PostgreSQL connector in the &lt;code&gt;connector.class&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;The PostgreSQL connection settings in the set of &lt;code&gt;database.*&lt;/code&gt; parameters, We can get the list of needed parameters with the following call:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  avn service get demo-drift-postgresql &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{service_uri_params}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The PostgreSQL replication plugin name, slot name, publication name and mode. We can either create the slot and publication in PostgreSQL beforehand or have the connector create them for us.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The list of tables to include in the replication (&lt;code&gt;public.users&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The usage of Avro and Apache Kafka schema registry functionality for both message keys and values. We can fetch the needed connection parameters (&lt;code&gt;&amp;lt;KAFKA_HOST&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;&lt;/code&gt;)&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  avn service get demo-drift-kafka &lt;span class="nt"&gt;--json&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.connection_info.schema_registry_uri'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above command will report the Kafka schema registry URI in the form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;  https://avnadmin:&lt;span class="nt"&gt;&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;&lt;/span&gt;@&lt;span class="nt"&gt;&amp;lt;KAFKA_HOST&amp;gt;&lt;/span&gt;:&lt;span class="nt"&gt;&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we've replaced the placeholder values in the file, we can create the connector with the following call where &lt;code&gt;cdc-deb.json&lt;/code&gt; is the file containing the connector settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service connector create demo-drift-kafka @cdc-deb.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Check the data in Kafka
&lt;/h2&gt;

&lt;p&gt;Once the connector is working, we can use &lt;a href="https://docs.aiven.io/docs/products/kafka/howto/kcat" rel="noopener noreferrer"&gt;kcat&lt;/a&gt; to check the data in Apache Kafka.&lt;/p&gt;

&lt;p&gt;To get the &lt;code&gt;kcat&lt;/code&gt; command for connecting to our Kafka service and also download the necessary SSL certificates, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service connection-info kcat demo-drift-kafka &lt;span class="nt"&gt;-W&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we can get the &lt;code&gt;avnadmin&lt;/code&gt; password with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service user-list &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{password}'&lt;/span&gt; &lt;span class="nt"&gt;--project&lt;/span&gt; devrel-francesco demo-drift-kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we can take that &lt;code&gt;kcat&lt;/code&gt; command and use it to check the data.&lt;br&gt;
We need to add some parameters to explain what we want to read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-C&lt;/code&gt; to tell it to act as a Consumer,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-t sourcepg.public.users&lt;/code&gt; to tell it which topic to read from,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-s avro&lt;/code&gt; to tell it to use Avro, and&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-r https://avnadmin:&amp;lt;SCHEMA_REGISTRY_PWD&amp;gt;@&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;&lt;/code&gt; to tell it where the schema registry is&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Putting all of that together, the command you run should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kcat &lt;span class="nt"&gt;-b&lt;/span&gt; &amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;KAFKA_PORT&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-X&lt;/span&gt; security.protocol&lt;span class="o"&gt;=&lt;/span&gt;SSL &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-X&lt;/span&gt; ssl.ca.location&lt;span class="o"&gt;=&lt;/span&gt;ca.pem &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-X&lt;/span&gt; ssl.key.location&lt;span class="o"&gt;=&lt;/span&gt;service.key &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-X&lt;/span&gt; ssl.certificate.location&lt;span class="o"&gt;=&lt;/span&gt;service.cert &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-s&lt;/span&gt; avro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-r&lt;/span&gt; https://avnadmin:&amp;lt;SCHEMA_REGISTRY_PWD&amp;gt;@&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; sourcepg.public.users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We should see the same four rows we inserted previously appearing in the standard Debezium format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="nl"&gt;"before"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"after"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spiderman"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"boolean"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}}},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.9.7.aiven"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"connector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgresql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sourcepg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"snapshot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"db"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"defaultdb"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sequence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[null,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;235090528&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"txId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1036&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lsn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;235090528&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"xmin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248487&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"transaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="nl"&gt;"before"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"after"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Flash"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"boolean"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}}},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.9.7.aiven"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"connector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgresql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sourcepg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"snapshot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"db"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"defaultdb"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sequence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[null,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;235090528&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"txId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1036&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lsn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;235090528&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"xmin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248493&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"transaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="nl"&gt;"before"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"after"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Joker"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"boolean"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;}}},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.9.7.aiven"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"connector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgresql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sourcepg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"snapshot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"db"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"defaultdb"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sequence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[null,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;235090528&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"txId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1036&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lsn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;235090528&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"xmin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248494&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"transaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="nl"&gt;"before"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"after"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Batman"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"boolean"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}}},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.9.7.aiven"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"connector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgresql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sourcepg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"snapshot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"db"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"defaultdb"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"sequence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[null,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;235090528&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"txId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1036&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lsn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;235090528&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"xmin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ts_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"long"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1692017248494&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"transaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Check the data definition in Karapace
&lt;/h2&gt;

&lt;p&gt;Having created the connector using Avro and the Karapace schema registry, we can examine the schema definition for the topic. By default, when utilizing Kafka Connect with a schema registry, two schemas are generated with names -value and -key to store the schema definition for the value and key, respectively.&lt;/p&gt;

&lt;p&gt;We can get the list of schemas defined in Karapace with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;@&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;KAFKA_PORT&amp;gt;/subjects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which returns output similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sourcepg.public.users-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"sourcepg.public.users-value"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above is the names of the two schemas for the Debezium topic. Each name is the concatenation of the &lt;code&gt;database.server.name&lt;/code&gt; parameter (&lt;code&gt;sourcepg&lt;/code&gt;), the schema and table name (&lt;code&gt;public.users&lt;/code&gt;) and either the &lt;code&gt;key&lt;/code&gt; or &lt;code&gt;value&lt;/code&gt; suffix.&lt;/p&gt;

&lt;p&gt;We can check which versions we have for the &lt;code&gt;sourcepg.public.users-key&lt;/code&gt; topic with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET https://avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;@&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;KAFKA_PORT&amp;gt;/subjects/sourcepg.public.users-key/versions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output should show version &lt;code&gt;1&lt;/code&gt; being available.&lt;/p&gt;

&lt;p&gt;To check the definition of the schema &lt;code&gt;sourcepg.public.users-key&lt;/code&gt; version &lt;code&gt;1&lt;/code&gt; we can use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET https://avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;@&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;KAFKA_PORT&amp;gt;/subjects/sourcepg.public.users-key/versions/1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows all the fields included in the key, including the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt; we defined in the original PostgreSQL table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "id": 1,
  "schema": "{\"connect.name\":\"sourcepg.public.users.Key\",\"fields\":[{\"default\":0,\"name\":\"id\",\"type\":{\"connect.default\":0,\"type\":\"int\"}}],\"name\":\"Key\",\"namespace\":\"sourcepg.public.users\",\"type\":\"record\"}",
  "subject": "sourcepg.public.users-key",
  "version": 1
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Sink the data to MySQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisc45o5fhsldb0w1ll0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisc45o5fhsldb0w1ll0s.png" alt="Sink data to MySQL with Kafka Connect JDBC sink" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have the data in Apache Kafka, let's set up a consumer for the data to demonstrate how the solution manages drift. The initial consumer will be a MySQL database. We can establish the flow using a dedicated JDBC sink connector and the following code stored in mysql_jdbc_sink.json.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cdc-sink-mysql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"connector.class"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.aiven.connect.jdbc.JdbcSinkConnector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sourcepg.public.users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"extract"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"connection.url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jdbc:mysql://&amp;lt;MYSQL_HOST&amp;gt;:&amp;lt;MYSQL_PORT&amp;gt;/&amp;lt;MYSQL_DB_NAME&amp;gt;?ssl-mode=REQUIRED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"connection.user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"connection.password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;MYSQL_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"table.name.format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"users_mysql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"insert.mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"upsert"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pk.mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"record_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pk.fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"auto.create"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"auto.evolve"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"extract"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transforms.extract.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.debezium.transforms.ExtractNewRecordState"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.confluent.connect.avro.AvroConverter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.schema.registry.url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.basic.auth.credentials.source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.schema.registry.basic.auth.user.info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.confluent.connect.avro.AvroConverter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.schema.registry.url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.basic.auth.credentials.source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.schema.registry.basic.auth.user.info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above connector we are defining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The JDBC sink connector in the &lt;code&gt;connector.class&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;The MySQL connection settings in the &lt;code&gt;connection.url&lt;/code&gt; parameter, We can get the parameters to compose the URL and the credentials with the following call
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  avn service get demo-drift-mysqldb &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{service_uri_params}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The target table name will be &lt;code&gt;users_mysql&lt;/code&gt; with upsert mode (see &lt;code&gt;insert.mode&lt;/code&gt;), inserting or updating existing rows based on the &lt;code&gt;id&lt;/code&gt; field (see &lt;code&gt;pk.mode&lt;/code&gt; and &lt;code&gt;pk.fields&lt;/code&gt; parameters)&lt;/li&gt;
&lt;li&gt;The table will be created automatically if it does not exist (&lt;code&gt;"auto.create": "true"&lt;/code&gt;) and evolve following the changes in the Apache Kafka topic (&lt;code&gt;"auto.evolve": "true"&lt;/code&gt;). This will be key to propagating the drift to downstream technologies (MySQL in this case).&lt;/li&gt;
&lt;li&gt;A transformation called &lt;code&gt;extract&lt;/code&gt; to retrieve and propagate the status of the row after the change from the Debezium format&lt;/li&gt;
&lt;li&gt;The usage of Avro and Apache Kafka schema registry functionality for both message keys and values. We can fetch the needed connection parameters (&lt;code&gt;&amp;lt;KAFKA_HOST&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;&lt;/code&gt;)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  avn service get demo-drift-kafka &lt;span class="nt"&gt;--json&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.connection_info.schema_registry_uri'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above command will provide the Kafka schema registry uri in the form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;  https://avnadmin:&lt;span class="nt"&gt;&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;&lt;/span&gt;@&lt;span class="nt"&gt;&amp;lt;KAFKA_HOST&amp;gt;&lt;/span&gt;:&lt;span class="nt"&gt;&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Having replaced the placeholder values, we can create the connector with the following call, where &lt;code&gt;cdc-deb.json&lt;/code&gt; is the file containing the connector settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service connector create demo-drift-kafka @mysql_jdbc_sink.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can verify the status of the connector with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service connector status demo-drift-kafka cdc-sink-mysql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above command should show the connector in &lt;code&gt;RUNNING&lt;/code&gt; state&lt;/p&gt;

&lt;h2&gt;
  
  
  Check the data in MySQL
&lt;/h2&gt;

&lt;p&gt;Once the above connector is running, we can head to MySQL to check the data. To get the connection parameters, we can retype the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service get demo-drift-mysqldb &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{service_uri_params}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then connect with the following command, replacing the placeholders. Note the absence of spaces between the &lt;code&gt;-p&lt;/code&gt; parameter and the password.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mysql &lt;span class="nt"&gt;-u&lt;/span&gt; avnadmin   &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-P&lt;/span&gt; &amp;lt;MYSQL_PORT&amp;gt;        &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;MYSQL_HOST&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-D&lt;/span&gt; defaultdb    &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt;&amp;lt;MYSQL_PASSWORD&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then check the data with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users_mysql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The table is &lt;code&gt;users_mysql&lt;/code&gt; following the &lt;code&gt;table.name.format&lt;/code&gt; in the connector. The data should be in line with what we have in PostgreSQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----+-----------+------+
| id | username  | hero |
+----+-----------+------+
|  1 | Spiderman |    1 |
|  2 | Flash     |    1 |
|  3 | Joker     |    0 |
|  4 | Batman    |    1 |
+----+-----------+------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we check the table structure with &lt;code&gt;describe users_mysql&lt;/code&gt;, we can see that the &lt;code&gt;hero&lt;/code&gt; column has been mapped to a &lt;code&gt;TINYINT&lt;/code&gt; in MySQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| id       | int          | NO   | PRI | 0       |       |
| username | varchar(256) | YES  |     | NULL    |       |
| hero     | tinyint      | YES  |     | NULL    |       |
| points   | int          | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Let's talk Drift
&lt;/h2&gt;

&lt;p&gt;So far we've built a fairly traditional data pipeline. Now, let's include some changes to the original data structure in PostgreSQL to mimic drift. &lt;/p&gt;

&lt;h3&gt;
  
  
  Adding a column
&lt;/h3&gt;

&lt;p&gt;In the terminal connected to the PostgreSQL database, execute the following command to add a &lt;code&gt;POINTS&lt;/code&gt; integer column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;POINTS&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing happens immediately in the target MySQL table after the DDL execution in PostgreSQL. The structure and the data of &lt;code&gt;USERS_MYSQL&lt;/code&gt; is still the same.&lt;/p&gt;

&lt;p&gt;Now change the data in PostgreSQL, using the following update statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;UPDATE USERS SET POINTS &lt;span class="o"&gt;=&lt;/span&gt; CASE WHEN USERNAME &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Batman'&lt;/span&gt; &lt;span class="k"&gt;then &lt;/span&gt;5 &lt;span class="k"&gt;else &lt;/span&gt;10 end&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MySQL, execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users_mysql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see the effect on the MySQL table &lt;code&gt;points&lt;/code&gt; in near real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----+-----------+------+--------+
| id | username  | hero | points |
+----+-----------+------+--------+
|  1 | Spiderman |    1 |     10 |
|  2 | Flash     |    1 |     10 |
|  3 | Joker     |    0 |     10 |
|  4 | Batman    |    1 |      5 |
+----+-----------+------+--------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As mentioned in the sink connector definition, &lt;code&gt;"auto.create": "true"&lt;/code&gt; allows the automatic creation of the table if it doesn't exist, and &lt;code&gt;"auto.evolve": "true"&lt;/code&gt; allows the evolution of the table in cases when &lt;strong&gt;new data columns are included&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Removing a column
&lt;/h3&gt;

&lt;p&gt;What about removing columns? Let's test it! Let's drop the same &lt;code&gt;points&lt;/code&gt; column we just added from the PostgreSQL terminal with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;POINTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we execute our previous query in MySQL again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users_mysql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We see that column is not dropped in MySQL, the structure of the &lt;code&gt;users_mysql&lt;/code&gt; is the same and the &lt;code&gt;points&lt;/code&gt; column is still filled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;----+-----------+------+--------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;hero&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;----+-----------+------+--------+&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Spiderman&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Flash&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Joker&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;  &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Batman&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;      &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="c1"&gt;----+-----------+------+--------+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes sense because downstream applications might be using the &lt;code&gt;points&lt;/code&gt; column. An unexpected and unhandled drop of a column could have disastrous effects on the downstream data pipelines. However the risk actually is dealing with updated information, as the &lt;code&gt;points&lt;/code&gt; column has been dropped from PostgreSQL and therefore cannot be updated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Changing the column type
&lt;/h3&gt;

&lt;p&gt;What about changing the column type? A change in the column type could be needed in cases, like this example, where we want to migrate from a &lt;code&gt;BOOLEAN&lt;/code&gt; to a &lt;code&gt;VARCHAR&lt;/code&gt; for the &lt;code&gt;HERO&lt;/code&gt; column. Let's execute the following in PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;HERO&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As before nothing happens on the DDL statement, but, when we try to add some data using the new &lt;code&gt;VARCHAR&lt;/code&gt; column type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;USERNAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HERO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Panda'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'middle'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The insert goes well PostgreSQL as expected, but the Debezium source connector crashes with the following error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR "Caused by: org.apache.kafka.common.config.ConfigException: Failed to access Avro data from topic sourcepg.public.users : Incompatible schema, compatibility_mode=BACKWARD reader union lacking writer type: RECORD; error code: 409"
Backwards compatibility, old schema type is boolean (with null), new schema type is string... incompatible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is because the schema is stored in Karapace with the &lt;code&gt;BACKWARDS&lt;/code&gt; compatibility setting. The &lt;code&gt;BACKWARDS&lt;/code&gt; compatibility ensures that consumers using an older schema definition are able to consume events produced with the current schema. The change, from &lt;code&gt;BOOLEAN&lt;/code&gt; to &lt;code&gt;VARCHAR&lt;/code&gt; could stop old consumers from being able to parse the information correctly, so it's not allowed and the connector fails.&lt;/p&gt;

&lt;h3&gt;
  
  
  Changing the compatibility level
&lt;/h3&gt;

&lt;p&gt;For the sake of this example, let's remove the &lt;code&gt;BACKWARDS&lt;/code&gt; compatibility setting and allowing all changes in the source system to propagate. We'll set compatibility to &lt;code&gt;NONE&lt;/code&gt; allowing all the changes to propagate to the Apache Kafka topic.&lt;/p&gt;

&lt;p&gt;First, we check the default compatibility level for the Apache Kafka service with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service schema configuration demo-drift-kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows &lt;code&gt;BACKWARD&lt;/code&gt; being the default. The same default setting is applied to the &lt;code&gt;sourcepg.public.users-value&lt;/code&gt; topic, that we can check with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service schema subject-configuration demo-drift-kafka &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--subject&lt;/span&gt; sourcepg.public.users-value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To change the compatibility level to &lt;code&gt;NONE&lt;/code&gt; for both key and value, run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service schema subject-configuration-update demo-drift-kafka &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--subject&lt;/span&gt; sourcepg.public.users-value                        &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--compatibility&lt;/span&gt; NONE
avn service schema subject-configuration-update demo-drift-kafka &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--subject&lt;/span&gt; sourcepg.public.users-key                          &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--compatibility&lt;/span&gt; NONE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we restart the Debezium Source connector task &lt;code&gt;0&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service connector restart-task demo-drift-kafka pg-source-users 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We see that the source connector restarts correctly. Using &lt;code&gt;avn service connector status demo-drift-kafka pg-source-users&lt;/code&gt; shows the connector in the &lt;code&gt;RUNNING&lt;/code&gt; state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RUNNING"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RUNNING"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"trace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The JDBC sink connector to MySQL fails. Running &lt;code&gt;avn service connector status demo-drift-kafka cdc-sink-mysql&lt;/code&gt; returns an error:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: java.sql.BatchUpdateException: Incorrect integer value: 'middle' for column 'hero' at row 1
java.sql.SQLException: Incorrect integer value: 'middle' for column 'hero' at row 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error indicates that the connector attempted to insert the new value (middle) into an integer column. This implies that the auto-evolution process did not alter the structure of the pre-existing column.&lt;/p&gt;

&lt;p&gt;To confirm this, we can execute describe users_mysql on the MySQL database and validate that the hero column remains a tinyint.&lt;/p&gt;

&lt;p&gt;In the JDBC sink connector documentation, the &lt;a href="https://github.com/aiven/jdbc-connector-for-apache-kafka/blob/master/docs/sink-connector.md#auto-evolution" rel="noopener noreferrer"&gt;&lt;code&gt;auto.evolution&lt;/code&gt; section&lt;/a&gt; says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The connector does not delete columns.&lt;/li&gt;
&lt;li&gt;The connector does not alter column types.&lt;/li&gt;
&lt;li&gt;The connector does not add primary keys constraints.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;We already talked about automatic column deletion being a dangerous action. The same is true for the automatic change of column types, since downstream applications could rely on functions that work specifically on particular column types. Therefore, modifying a column type should be handled as a breaking change, correctly making  the sink connector fail. &lt;/p&gt;

&lt;h2&gt;
  
  
  What about non relational targets? The AWS S3 example
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg05vzofqt7z0bgpc4bhm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg05vzofqt7z0bgpc4bhm.png" alt="Sink to AWS S3 with Kafka Connect and the s3 sink" width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The scenario described above is one of the more strict scenarios possible, in terms of data evolution. Both the source and the target are relational databases with strict column type definition. In this second example we'll sink the data to an S3 bucket where the data structure is not defined upfront.&lt;/p&gt;

&lt;p&gt;We can create a sink connector to S3 with the following JSON configuration file stored in a file named &lt;code&gt;s3_sink.json&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3sink"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"connector.class"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.aiven.kafka.connect.s3.AivenKafkaConnectS3SinkConnector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"aws.access.key.id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;AWS_SECRET_ID&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"aws.secret.access.key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;AWS_SECRET_ACCESS&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"aws.s3.bucket.name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;gt;AWS_BUCKET_NAME&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"aws.s3.region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;AWS_REGION&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sourcepg.public.users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"format.output.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.confluent.connect.avro.AvroConverter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.schema.registry.url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.basic.auth.credentials.source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key.converter.schema.registry.basic.auth.user.info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.confluent.connect.avro.AvroConverter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.schema.registry.url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;KAFKA_HOST&amp;gt;:&amp;lt;SCHEMA_REGISTRY_PORT&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.basic.auth.credentials.source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"value.converter.schema.registry.basic.auth.user.info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avnadmin:&amp;lt;SCHEMA_REGISTRY_PASSWORD&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"extract"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transforms.extract.type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"io.debezium.transforms.ExtractNewRecordState"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The set of &lt;code&gt;aws.*&lt;/code&gt; parameters refers to the S3-related secrets, as detailed in the &lt;a href="https://docs.aiven.io/docs/products/kafka/kafka-connect/howto/s3-sink-prereq.html" rel="noopener noreferrer"&gt;connector prerequisites documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;topics&lt;/code&gt; parameter defines the source of information (specifically, the &lt;code&gt;sourcepg.public.users&lt;/code&gt; topic).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;format.output.type&lt;/code&gt; parameter specifies how the data will be stored (in this case, as &lt;code&gt;json&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;key.converter&lt;/code&gt; and &lt;code&gt;value.converter&lt;/code&gt; parameters enable the connector to retrieve schema information from Karapace.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;transforms&lt;/code&gt; section allows the extraction of the value after the change from the Debezium format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can start the above connector with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service connector create demo-drift-kafka @s3_sink.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we check the data in S3, we should see a document in the bucket containing all the changes implemented in PostgreSQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Spiderman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Joker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Batman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Spiderman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"points"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"points"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Joker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"points"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Batman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"points"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Panda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hero"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"middle"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output from the CDC -&amp;gt; Kafka -&amp;gt; S3 flow encompasses all events. Due to the Debezium compatibility mode being set to NONE, every change successfully stored in Kafka. Moreover, as S3 does not enforce a specific structure on the data, all changes, whether they involve new or deleted columns, have been written to the target bucket in JSON format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terminate the services
&lt;/h2&gt;

&lt;p&gt;If you followed this tutorial and want to remove the services used for testing, you can run the commands below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;avn service terminate demo-drift-postgresql &lt;span class="nt"&gt;--force&lt;/span&gt;
avn service terminate demo-drift-kafka &lt;span class="nt"&gt;--force&lt;/span&gt;
avn service terminate demo-drift-mysqldb &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Data and Schema drift must be managed in scenarios where multiple consumers want to access the changes happening in a source system. Apache Kafka, and the Karapace schema registry, provide a method to propagate compatible changes and forbid breaking ones by stopping the pipeline. Pay special attention to column drops, since they are not propagated automatically to target systems (specifically if the target is another relational database) and could cause problems with updated data on the deleted columns.&lt;/p&gt;

&lt;p&gt;To summarize how changes are propagated: &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Add column&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Propagates downstream if &lt;code&gt;auto.evolve&lt;/code&gt; is set to &lt;code&gt;true&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove column&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;Does not propagate downstream in case of sink to relational database. Possible use of stale data for the dropped column.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change datatype&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;Depends on the change, compatibility settings and target technology. Not propagated in case of JDBC sink.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A summary of Schema registry compatibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BACKWARDS&lt;/code&gt; allows you to stop the pipeline before ingesting data in Kafka, since breaking changes will not be included in the topic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NONE&lt;/code&gt; allows you to continue ingesting, but might break downstream data pipelines if the downstream tech is relational or has precise column definition and evolution is not straightforward&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>apachekafka</category>
      <category>data</category>
    </item>
  </channel>
</rss>
