<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Derek Xiao</title>
    <description>The latest articles on Forem by Derek Xiao (@lofiderek).</description>
    <link>https://forem.com/lofiderek</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F524841%2F4db5083e-8cd2-4ff8-b2bf-5a9e2a906522.jpeg</url>
      <title>Forem: Derek Xiao</title>
      <link>https://forem.com/lofiderek</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lofiderek"/>
    <language>en</language>
    <item>
      <title>A Powerful Addition to Your Postgres Toolbelt: Table Inheritance</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Fri, 26 Feb 2021 18:41:50 +0000</pubDate>
      <link>https://forem.com/arctype/a-powerful-addition-to-your-postgres-toolbelt-table-inheritance-4f8o</link>
      <guid>https://forem.com/arctype/a-powerful-addition-to-your-postgres-toolbelt-table-inheritance-4f8o</guid>
      <description>&lt;p&gt;Table inheritance is a less commonly used Postgres feature, but it has the power to save time with both data retrieval and database management.&lt;/p&gt;

&lt;p&gt;In this article, I’ll cover how inheritance works in Postgres and provide some examples of when to use inheritance.&lt;/p&gt;

&lt;p&gt;To follow along with the examples in this article, &lt;a href="https://blog.arctype.com/p/7a5ac2e5-a4ce-4f1d-8839-cdf1e258e369/www.arctype.com"&gt;try Arctype's free SQL editor&lt;/a&gt; to quickly connect to a Postgres database:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--od4_R9rD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/visualization_demo-6.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--od4_R9rD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/visualization_demo-6.gif" alt="A Powerful Addition to Your Postgres Toolbelt: Table Inheritance"&gt;&lt;/a&gt;&lt;a href="https://blog.arctype.com/inheritance-in-postgres/www.arctype.com"&gt;Try Arctype today&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Table Inheritance in Postgres?
&lt;/h2&gt;

&lt;p&gt;Inheritance is one of the main principles of object-oriented programming. It is a process for deriving one object from another object so that they have shared properties.&lt;/p&gt;

&lt;p&gt;Inheritance in PostgreSQL allows you to create a child table based on another table, and the child table will include all of the columns in the parent table.&lt;/p&gt;

&lt;p&gt;Let's take a database that's used to store blueprints for different types of homes.&lt;/p&gt;

&lt;p&gt;There are some things that we know every home will have such as: bedrooms, bathrooms, and kitchens. We can create a parent table to store these shared attributes.&lt;/p&gt;

&lt;p&gt;Now let's say we wanted to add a blueprint for a home with a patio. This new blueprint is identical to our existing one, but with a new room. Instead of recreating the entire blueprint, we can create a new "child" table that &lt;em&gt;inherits&lt;/em&gt; the parent table.&lt;/p&gt;

&lt;p&gt;We now have a copy of the main parent blueprint with a new "patio" item, without creating a duplicated blueprint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why should I use inheritance?
&lt;/h2&gt;

&lt;p&gt;The two main benefits are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;More performant queries&lt;/li&gt;
&lt;li&gt;Easier database management&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;More performant queries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inheritance splits data into smaller tables that inherit some of the parent's fields. This in effect partitions the data, improving the speed to retrieve data.&lt;/p&gt;

&lt;p&gt;Imagine you are fetching data that is &lt;code&gt;BETWEEN&lt;/code&gt; two dates. There is a parent table called &lt;code&gt;year_sales&lt;/code&gt; and inherited tables with data for each month.&lt;/p&gt;

&lt;p&gt;A command to retrieve all sales between &lt;code&gt;2020-10-1&lt;/code&gt; and &lt;code&gt;2020-10-15&lt;/code&gt; would only scan the table for the month of October .&lt;/p&gt;

&lt;p&gt;Inherited tables also creates more manageable indexes. Each individual table contains less data, which speeds up search both with and without an index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Easier database management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Making future schema changes is easier because you only have to make one change to the parent table and then it's propagated to each inherited table. This saves time and lessens the chances of accidental divergences.&lt;/p&gt;

&lt;p&gt;Running maintenance commands such as a full vacuum or re-index of inherited tables will also happen without blocking other data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 1: Using inheritance to store table statistics by month
&lt;/h2&gt;

&lt;p&gt;One of the most popular use cases for table inheritance is storing information divided by months. This gives the benefit of partitioning your data for faster queries.&lt;/p&gt;

&lt;p&gt;I've used this solution to architect schemas for situations including:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Process execution audits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inherited tables can be used to track data that is continuously loaded/unloaded into the system, user requests and computational processes, and other important information for monitoring the system's health.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. User actions auditing for critical modules in the application&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can create an audit system to track who changed the data in the system and when did they do it. If the system has many users, then there is a lot of data. So, to speed up data access, it is more performant to use table inheritance.&lt;/p&gt;

&lt;p&gt;Let's dive into an example.&lt;/p&gt;

&lt;p&gt;First, create schema “example1”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE SCHEMA example1
    AUTHORIZATION postgres;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create a parent logging table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE example1.logging
(
    id integer NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    event_name character varying NOT NULL,
    start_time timestamp(6) without time zone NOT NULL,
    end_time timestamp(6) without time zone NOT NULL,
    CONSTRAINT february_log_pkey PRIMARY KEY (id, start_time, end_time)
)
TABLESPACE pg_default;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a child logging table for a specific month and year, which inherits fields from the parent table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE example1.january_log_2021
(
    CONSTRAINT start_time CHECK (start_time BETWEEN '2021-01-01' and '2021-01-31')
)
    INHERITS (example1.logging)
TABLESPACE pg_default;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code contains a check constraint on the “start_time” column with the &lt;code&gt;CHECK&lt;/code&gt; command. This keeps the date and time within January.&lt;/p&gt;

&lt;p&gt;Fill in the logging table for January 2021:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO example1.january_log_2021(id,
    event_name, start_time, end_time)
    VALUES 
           (1, 'Log in', '2021-01-11 03:26:11', '2021-01-11 03:26:13'),
           (2, 'Log out', '2021-01-03 12:11:17', '2021-01-03 12:11:18'),
           (3, 'Upload file xml', '2021-01-06 16:14:28', '2021-01-06 16:14:59'),
           (4, 'Delete data', '2021-01-05 23:01:55', '2021-01-05 23:01:58');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can check if the data was successfully inserted with a &lt;code&gt;SELECT&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM example1.january_log_2021;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0L_95XLf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/LlHVlFpQ-mTry2sSIUvHLqfyFNVYpqhm6xW4IepR13pM823zZlLZCLsb0g7qVORbaFyLHcop-0LnHb0k-xZOIyLnFEnYQ3yMmkfiz2hRvztsdJIHQj4PV2qOjuEBDDmr1MWEtYDC" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0L_95XLf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/LlHVlFpQ-mTry2sSIUvHLqfyFNVYpqhm6xW4IepR13pM823zZlLZCLsb0g7qVORbaFyLHcop-0LnHb0k-xZOIyLnFEnYQ3yMmkfiz2hRvztsdJIHQj4PV2qOjuEBDDmr1MWEtYDC" alt="A Powerful Addition to Your Postgres Toolbelt: Table Inheritance"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data in each child table will also automatically be added to the parent table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example #2: Using inheritance to track the movement of ships
&lt;/h2&gt;

&lt;p&gt;Another instance when I used inheritance was storing information about ships and their movements based on geolocation.&lt;/p&gt;

&lt;p&gt;Each ship had both common and unique values. Because of this quality, I decided design the schema using inheritance, and I created a separate table for each ship based on the parent table.&lt;/p&gt;

&lt;p&gt;Using an inheritance model gave us the following benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When the application built the map of the ship's movement, it only had to reference the individual ship's table, increasing the speed of report building and monitoring.&lt;/li&gt;
&lt;li&gt;Adding new specific fields for certain ships types did not require changing all tables.&lt;/li&gt;
&lt;li&gt;We could retrieve baseline data for all ships using a single request.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's how we did it.&lt;/p&gt;

&lt;p&gt;First create a new schema called “test”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE SCHEMA test
    AUTHORIZATION postgres;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a database with ships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE test.ship
(
    AIS_name text NOT NULL,
    type text NOT NULL,
    flag text NOT NULL,
    IMO character varying NOT NULL,
    MMSI character varying NOT NULL,
    callsign character varying NOT NULL,
    year_built character varying NOT NULL,
    length character varying NOT NULL,
    width character varying NOT NULL,
    draught character varying NOT NULL,
    speed character varying NOT NULL,
    AIS_class character varying NOT NULL,
    cargo character varying,
    CONSTRAINT ship_pkey PRIMARY KEY (MMSI)
)

TABLESPACE pg_default;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the table "sail_ship", which is a child to table “ship”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE test.sail_ship
(
    -- Inherited from table test.ship: AIS_name text NOT NULL,
    -- Inherited from table test.ship: type text NOT NULL,
    -- Inherited from table test.ship: flag text NOT NULL,
    -- Inherited from table test.ship: IMO character varying NOT NULL,
    -- Inherited from table test.ship: MMSI character varying NOT NULL,
    -- Inherited from table test.ship: callsign character varying NOT NULL,
    -- Inherited from table test.ship: year_built character varying NOT NULL,
    -- Inherited from table test.ship: length character varying NOT NULL,
    -- Inherited from table test.ship: width character varying NOT NULL,
    -- Inherited from table test.ship: draught character varying NOT NULL,
    -- Inherited from table test.ship: speed character varying NOT NULL,
    -- Inherited from table test.ship: AIS_class character varying NOT NULL,
    -- Inherited from table test.ship: cargo character varying,
    id_sail integer NOT NULL,
    course text NOT NULL,
    navigation_status text NOT NULL,
    CONSTRAINT sail_ship_pkey PRIMARY KEY (id_sail)
)
    INHERITS (test.ship)
TABLESPACE pg_default;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now fill in table "sail_ship":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO test.sail_ship(AIS_name, type, flag, IMO, MMSI, callsign, year_built, length, width, draught, speed, AIS_class, cargo, id_sail, course, navigation_status)
    VALUES ('A P T JAMES', 'Ferry', 'Trinidad and Tobado', '9877717', '362254000', '9YNM', '2020', '94 m', '26 m', '2.9 m/', '13.1 kn/20.2 kn', '-', '-', 1, '-', '-'),
           ('MOZART', 'Container ship', 'Liberia', '9337274', '636018378', 'A8MA9', '2007', '222 m', '30 m', '10.4 m /', '12.9 kn / 23.6 kn', '-', 'Containers', 2, '-', '-'),
           ('ALIANCA SKY', ' Bulk carrier', 'Liberia', '9128441', '636014513', 'A8UK6', '1997', '186 m', '30 m', '8.8 m /', '10.2 kn / 17.2 kn', '-', 'Agricultural Commodities', 3, '-', '-'),
           ('XXX7', 'Ship', 'China', '-', '412444890', 'BVMY5', '-', '-', '-', '-/', '60.5 kn / 66.1 kn', '-', '-', 4, '-', '-');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The table “sail_ship” inherits all the columns of its parent table, “ship".&lt;/p&gt;

&lt;p&gt;Let's also add some data to the parent table and see what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO test.ship(AIS_name, type, flag, IMO, MMSI, callsign, year_built, length, width, draught, speed, AIS_class, cargo)
    VALUES ('A P T JAMES', 'Ferry', 'Trinidad and Tobado', '9877717', '362254000', '9YNM', '2020', '94 m', '26 m', '2.9 m/', '13.1 kn/20.2 kn', '-', '-');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1XYHqN8f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/XAMKwhYR3gnuhbfu_iFdntQgnL_i0-ITwN_hlNkm-FaWI-coxE62BTnRjrcWImZIjGJf8cnp818kZF0Km04XmojAj3ObUmRWKzyorO1qyYZdlfvWhxLXqObPvggKQW_BeZIuwz7h" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1XYHqN8f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/XAMKwhYR3gnuhbfu_iFdntQgnL_i0-ITwN_hlNkm-FaWI-coxE62BTnRjrcWImZIjGJf8cnp818kZF0Km04XmojAj3ObUmRWKzyorO1qyYZdlfvWhxLXqObPvggKQW_BeZIuwz7h" alt="A Powerful Addition to Your Postgres Toolbelt: Table Inheritance"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see in the picture, the field has been added, but the primary key is repeated, and so it is no longer unique&lt;/p&gt;

&lt;p&gt;This is one of the caveats with inheritance in Postgres. No errors were found when adding to the child table, even though it violated the unique primary key.&lt;/p&gt;

&lt;p&gt;We can remove these duplicates from child tables, by retrieving data from the “ship” table using the &lt;code&gt;ONLY&lt;/code&gt; operator.&lt;/p&gt;

&lt;p&gt;Here, the &lt;code&gt;ONLY&lt;/code&gt; keyword indicates that the query should only be applied to the “ship” table and not to tables below ship in the inheritance hierarchy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT AIS_name, type, flag, year_built FROM ONLY test.ship;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jYs38dPZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/7RSuLMYNpSvyJ845zVwzRMXEe0RZJ8R5VmyYvsOVZ-LTskTsMymL9pGGbIRMv_zC8eyJJ1uF1W7w99T1AtMW73TzBLGa26OmjJzyDfAXuqo_ODHFJstJhb5ZWniMfEXMphCmiVac" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jYs38dPZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/7RSuLMYNpSvyJ845zVwzRMXEe0RZJ8R5VmyYvsOVZ-LTskTsMymL9pGGbIRMv_zC8eyJJ1uF1W7w99T1AtMW73TzBLGa26OmjJzyDfAXuqo_ODHFJstJhb5ZWniMfEXMphCmiVac" alt="A Powerful Addition to Your Postgres Toolbelt: Table Inheritance"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Caveats to be aware of with PostgreSQL inheritance
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;You can’t apply &lt;code&gt;RENAME&lt;/code&gt; with &lt;code&gt;ALTER TABLE&lt;/code&gt; commands to the child tables;&lt;/li&gt;
&lt;li&gt;The uniqueness of primary keys and foreign keys is not inherited. The inheritance mechanism is not capable of automatically distributing data from INSERT or COPY commands across tables in an inheritance hierarchy.  INSERT inserts only into the specified table and no other;&lt;/li&gt;
&lt;li&gt;The user must have access rights to both the parent table and the child table;&lt;/li&gt;
&lt;li&gt;Columns will have to be dropped manually. If using the &lt;code&gt;DROP COLUMN&lt;/code&gt; command to the parent table under the condition of cascading deletion, it cannot affect the child table.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How inheritance in Postgres works&lt;/li&gt;
&lt;li&gt;Why you should use inheritance in your applications&lt;/li&gt;
&lt;li&gt;Two examples for how inheritance is used in real applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are looking for a SQL editor that makes working with databases even easier, &lt;a href="https://blog.arctype.com/p/7a5ac2e5-a4ce-4f1d-8839-cdf1e258e369/www.arctype.com"&gt;try Arctype today for free&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3KrlmKtr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/visualization_demo-5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3KrlmKtr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/visualization_demo-5.gif" alt="A Powerful Addition to Your Postgres Toolbelt: Table Inheritance"&gt;&lt;/a&gt;&lt;a href="https://blog.arctype.com/inheritance-in-postgres/www.arctype.com"&gt;Try Arctype today&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
    </item>
    <item>
      <title>Add Your Database to Your Spring Cleaning List</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Tue, 16 Feb 2021 21:50:00 +0000</pubDate>
      <link>https://forem.com/arctype/add-your-database-to-your-spring-cleaning-list-1e9k</link>
      <guid>https://forem.com/arctype/add-your-database-to-your-spring-cleaning-list-1e9k</guid>
      <description>&lt;p&gt;Every time you delete or update a row in your database, the old records are secretly still hiding in the background and taking up space on your hard drive.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;VACUUM&lt;/code&gt; process is like emptying the recycling bin on your laptop. It clears up space, reduces indexing time, and keeps your database squeaky clean.&lt;/p&gt;

&lt;p&gt;But it's important to understand how &lt;code&gt;VACUUM&lt;/code&gt; works so you can avoid the equivalent of vacuuming your house in the middle of a dinner party.&lt;/p&gt;

&lt;p&gt;By understanding how and when Postgres and other databases clean themselves, you will be able to tune your database for low response times and your database server for the right amount of storage.&lt;/p&gt;

&lt;p&gt;In this post we will cover:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What a VACUUM process is and how it works&lt;/li&gt;
&lt;li&gt;Full vs Auto VACUUM&lt;/li&gt;
&lt;li&gt;How to modify and inspect this process&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How Postgres Executes SQL Statements - DEAD vs Removed Tuples
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LzrPJG8Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/updating_table.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LzrPJG8Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/updating_table.gif" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;Updating a Postgres Table. SQL Editor: &lt;a href="https://arctype.com/?utm_campaign=postgres-vacuum&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During the life of a database, we might make thousands of changes to a database like above. But deleting a record does not actually free up the disk space.&lt;/p&gt;

&lt;p&gt;Postgres uses multi-version concurrency control (also known as MVCC) to ensure concurrent data access and consistency.&lt;/p&gt;

&lt;p&gt;Whenever we execute a SQL statement, it uses a snapshot of data instead of every row. This prevents users from viewing inconsistent data generated by concurrent transactions. It also minimizes lock contentions for different sessions trying to read or write data.&lt;/p&gt;

&lt;p&gt;Each transaction gets a transaction ID – XID for a block of &lt;code&gt;BEGIN&lt;/code&gt; and &lt;code&gt;COMMIT&lt;/code&gt; statement. When a user inserts a row, but the transaction is not committed, other users cannot see the newly inserted row.&lt;/p&gt;

&lt;p&gt;For example, in the below image, User A inserts two records into an empty table. If User B were to run a &lt;code&gt;SELECT&lt;/code&gt; statement, they would get zero rows before the record  committed. Similarly, another user still gets table data for a delete statement if the delete transaction statement is not committed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NWSFaZtC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/WGONJfu2X--Zz8l-eCH25W6vZ3MuOHDTF0IwUUgCNttHXLNw6JCCus5qhV8S5EfzJja1E4fMQunzQ8C12a69hmungH_pOW0mcy9lACewO1RgCc2rPrR7gJAKrv68J6DL5bQD73U" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NWSFaZtC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/WGONJfu2X--Zz8l-eCH25W6vZ3MuOHDTF0IwUUgCNttHXLNw6JCCus5qhV8S5EfzJja1E4fMQunzQ8C12a69hmungH_pOW0mcy9lACewO1RgCc2rPrR7gJAKrv68J6DL5bQD73U" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;MVCC data access example&lt;/p&gt;

&lt;p&gt;When we execute a &lt;code&gt;DELETE&lt;/code&gt; or &lt;code&gt;UPDATE&lt;/code&gt; statement, Postgres does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Delete command&lt;/strong&gt; : Postgres does not remove the tuples physically; it marks the existing tuple as &lt;code&gt;DEAD&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update command:&lt;/strong&gt; The update statement is similar to a &lt;code&gt;DELETE&lt;/code&gt; plus an &lt;code&gt;INSERT&lt;/code&gt; statement.  Therefore, it marks the existing tuple as &lt;code&gt;DEAD&lt;/code&gt; and inserts a new tuple.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have frequent DML( INSERT, UPDATE, DELETE) statements, these &lt;code&gt;DEAD&lt;/code&gt; tuples can take unnecessary space. To free up space, we have to run the following maintenance processes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove the dead tuples&lt;/li&gt;
&lt;li&gt;Remove index tuples pointing to the dead tuples&lt;/li&gt;
&lt;li&gt;Update the statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this knowledge of how a SQL statement is executed and the maintenance requirements, we can discuss the &lt;code&gt;VACUUM&lt;/code&gt; process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cleaning up dead tuples with a VACUUM in Postgres
&lt;/h2&gt;

&lt;p&gt;Postgres uses the &lt;code&gt;VACUUM&lt;/code&gt; maintenance process for removing &lt;code&gt;DEAD&lt;/code&gt; tuples. It reclaims space occupied by dead tuples for reuse in the further data insertion.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;VACUUM&lt;/code&gt; process can run concurrently with other DML transactions because it does not put an exclusive lock on the table. It carries out following operations for removing dead tuples:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Postgres scans all pages of a target table and builds a list of dead tuples. It freezes the old tuples if required.&lt;/li&gt;
&lt;li&gt;It removes the index tuples pointing to the dead tuples by referencing to the dead tuple list.&lt;/li&gt;
&lt;li&gt;It updates the statistics as well as the system catalog for the target table after the &lt;code&gt;VACCUM&lt;/code&gt; processing. It also updates the FSM (Free Space Map) and VM(Visibility Map).&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Postgres VACUUM Example
&lt;/h2&gt;

&lt;p&gt;I'll demonstrate the impact of &lt;code&gt;VACUUM&lt;/code&gt; by creating an example table, deleting some values, and then running a &lt;code&gt;VACUUM&lt;/code&gt; command. In this example I also use the &lt;code&gt;pg_freespacecamp&lt;/code&gt; extension to monitor improvements in space utilization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a table with a auto-generated series of data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create table SampleTable(id1 int, id2 int);

insert into
    SampleTable
values (
    generate_series(1,100000),
    generate_series(1,100000)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Measure free space usage using pg_freespacemap
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create extension pg_freespacemap;

SELECT
  count(*) as npages,
  round(100 * avg(avail) / 8192, 2) as avg_fs_ratio
FROM
  pg_freespace(
    'SampleTable');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RliN-1gW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/02/Screen-Shot-2021-02-16-at-10.59.55-AM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RliN-1gW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/02/Screen-Shot-2021-02-16-at-10.59.55-AM.png" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;SQL Editor: &lt;a href="https://arctype.com/?utm_campaign=postgres-vacuum&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The table we created uses up 443 pages of space on the hard drive, and because all of the data was added sequentially, it has a perfect free space ratio of 0.&lt;/p&gt;

&lt;p&gt;Now I'll delete every record with a value greater that 100. But if we re-run the command above, the number of pages and free space ratio don't change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delete from SampleTable where id1 &amp;gt; 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To remove the unused pages, we have to run a VACUUM command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VACUUM sampletable;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qX_A2o-T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/02/Screen-Shot-2021-02-16-at-11.21.53-AM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qX_A2o-T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/02/Screen-Shot-2021-02-16-at-11.21.53-AM.png" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;SQL Editor: &lt;a href="https://arctype.com/?utm_campaign=postgres-vacuum&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now when we track the space usage we can see the number of used pages has gone down from 443 to 1! But the ratio of free to used space on this page has also increased from 0 to 49%. We can return this space to the operating system with a &lt;code&gt;VACUUM FULL&lt;/code&gt; .&lt;/p&gt;

&lt;h2&gt;
  
  
  What does VACUUM FULL do?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0eFlTX3l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/PPbfKyvHk_ypLYpeWNk9ijCgOAWcf6kyZho18vQGSCbCZeREkvt4XSQUegfNwXhrzaAgoOZfvHADjEgSuShGKnLTMNhDr8FlLe7IAwGvESB8r_HvQipNdU3uTqdYz9cqVFCHzvM" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0eFlTX3l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/PPbfKyvHk_ypLYpeWNk9ijCgOAWcf6kyZho18vQGSCbCZeREkvt4XSQUegfNwXhrzaAgoOZfvHADjEgSuShGKnLTMNhDr8FlLe7IAwGvESB8r_HvQipNdU3uTqdYz9cqVFCHzvM" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;Vacuum full diagram&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;VACUUM&lt;/code&gt; process removes DEAD tuples for future usage, but it does not return the space to the operating system.&lt;/p&gt;

&lt;p&gt;Therefore, if you perform a bulk data deletion or updates, you might be using too much storage due to space occupied by these DEAD tuples. The &lt;code&gt;VACUUM FULL&lt;/code&gt; process returns the space to the operating system, as seen in the picture below.  It does the following tasks.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;VACUUM FULL&lt;/code&gt; process obtains an exclusive lock on the table.&lt;/li&gt;
&lt;li&gt;It creates a new empty storage table file.&lt;/li&gt;
&lt;li&gt;Copy the live tuples to the new table storage.&lt;/li&gt;
&lt;li&gt;Removes the old table file and frees the storage.&lt;/li&gt;
&lt;li&gt;It rebuilds all associated table indexes, updates the system catalogs and statistics.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's see how running the &lt;code&gt;VACUUM FULL&lt;/code&gt; command impacts our previous sample table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vacuum full SampleTable;

SELECT
  count(*) as npages,
  round(100 * avg(avail) / 8192, 2) as avg_fs_ratio
FROM
  pg_freespace(
    'SampleTable');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---E3ibrm---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/3bkxpDbQnPfPaLHEmRg7Ha62261Kzn0ckVSoUns9HdFh053RE81AHJmjm_mDwScORy-XvUGuUjg0wuj-zW84UdwfzMipZ_XOyNlYa1atyaZcyOaQz4UBdsif9yIK1A" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---E3ibrm---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/3bkxpDbQnPfPaLHEmRg7Ha62261Kzn0ckVSoUns9HdFh053RE81AHJmjm_mDwScORy-XvUGuUjg0wuj-zW84UdwfzMipZ_XOyNlYa1atyaZcyOaQz4UBdsif9yIK1A" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;SQL Editor: &lt;a href="https://arctype.com/?utm_campaign=postgres-vacuum&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The free space ratio is now down from almost 50% to 0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Postgres VACUUM Performance
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;VACUUM&lt;/code&gt; cleaning is costly because it needs to scan all pages of a target table. If you have a large table with million rows, it can be harmful to your database resources. To preserve resources, Postgres uses the Visibility Map.  Each table in Postgres has a VM that determines whether the page in the table has dead tuples.  If the page does not have a dead tuple, the vacuum processing skips the page.&lt;/p&gt;

&lt;p&gt;For example, in the below image, we have a table with four pages. Two pages have of DEAD tuples.  The visibility map uses a bitmap that defines dead tuples on a specific page.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bit 0: No dead tuples on the page, therefore skip &lt;code&gt;VACCUM&lt;/code&gt; processing.&lt;/li&gt;
&lt;li&gt;Bit 1: Page consists of dead tuples; therefore, &lt;code&gt;VACCUM&lt;/code&gt; that specific page.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---2nZgpUd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/i-IYL_UOx4oPR_OJW_k1vcOq3R7CC08jHKtATLzYoJbJ8lBS9C7Be0K04oNAMrNzxaOBDDpKpXbcSYMeG9IXiXgg-3n41QawAISZmudvWxmmelidReucy9asQgcxtH9GhYsQmfw" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---2nZgpUd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/i-IYL_UOx4oPR_OJW_k1vcOq3R7CC08jHKtATLzYoJbJ8lBS9C7Be0K04oNAMrNzxaOBDDpKpXbcSYMeG9IXiXgg-3n41QawAISZmudvWxmmelidReucy9asQgcxtH9GhYsQmfw" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;Postgres visibility map to track dead tuples&lt;/p&gt;

&lt;h2&gt;
  
  
  Postgres Autovacuum Daemon
&lt;/h2&gt;

&lt;p&gt;Postgres automates the VACCUM processing using auto vacuum daemon. By default, it runs every 1 minute. When the VACCUM wakes up, it invokes three workers. These workers do the VACCUM processing on the target tables.&lt;/p&gt;

&lt;p&gt;You can query &lt;strong&gt;pg_settings&lt;/strong&gt; to check various configurations for the autovacuum process in Postgres:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select name,
    setting,
    category,
    short_desc
from pg_settings 
where name like '%autovacuum%'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vHbXzJN8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/wV9YBichpMe0o1VJ69Wmjpe8mlWmjhakBWAUQ9F27OhezPfl5LPHtYJT8Cwvfu-8NBHbLYTo1fotf9kmFrC83xexm8WN7Wtdu5NSdoqLjwd75Wwa5D1zou5AlalC2A" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vHbXzJN8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh3.googleusercontent.com/wV9YBichpMe0o1VJ69Wmjpe8mlWmjhakBWAUQ9F27OhezPfl5LPHtYJT8Cwvfu-8NBHbLYTo1fotf9kmFrC83xexm8WN7Wtdu5NSdoqLjwd75Wwa5D1zou5AlalC2A" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;Postgres autovacuum configuration. SQL Editor: &lt;a href="https://arctype.com/?utm_campaign=postgres-vacuum&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to modify autovacuum for a specific table in Postgres
&lt;/h3&gt;

&lt;p&gt;Automatically cleaning and reindexing the database every minute might not be optimal if you have millions of rows.  Therefore, we can configure table level. If you specify a table-level configuration, it bypasses the global setting.&lt;/p&gt;

&lt;p&gt;For example, in the below query, we set autovacuum for &lt;code&gt;SampleTable2&lt;/code&gt; if it has more than 100 DEAD tuples.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER TABLE SampleTable2 SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 100)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  VACUUM vs VACUUM FULL
&lt;/h2&gt;

&lt;p&gt;As we know, full vacuum process reclaims space to the operating system. However, FULL VACUUM requires an exclusive lock on the table for its processing, and it blocks all other transactions.  In the below table, we can summaries the difference between these processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jwUBV1EG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/fWbwQTKo1soIoUEv1dzHjDRNs_RnJsBojsGDtfNFRSCKY0D14ucubMWYOfwqIuqAySBxAWzcVhudXht89rJE0nP254sLv1EogCY4Rxe_I9lFGVlAA6SNL8cESmMH-nRrCqdhYcY" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jwUBV1EG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh6.googleusercontent.com/fWbwQTKo1soIoUEv1dzHjDRNs_RnJsBojsGDtfNFRSCKY0D14ucubMWYOfwqIuqAySBxAWzcVhudXht89rJE0nP254sLv1EogCY4Rxe_I9lFGVlAA6SNL8cESmMH-nRrCqdhYcY" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;Vacuum vs Vacuum Full&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How Postgres implements &lt;code&gt;delete&lt;/code&gt; and &lt;code&gt;update&lt;/code&gt; statements&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;VACUUM&lt;/code&gt; to remove DEAD tuples&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;VACUUM FULL&lt;/code&gt; to return space back to the OS&lt;/li&gt;
&lt;li&gt;Configuring auto vacuum for individual tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These concepts are useful for reducing the server costs of databases while maintaining high availability. With the queries shown in the above article you can inspect the internals of your database and its memory consumption.&lt;/p&gt;

&lt;p&gt;If you want a SQL editor with an intuitive interface and easy data visualizations, &lt;a href="https://arctype.com?utm_campaign=postgres-vacuum&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;try Arctype today&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sAQrjueK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/visualization_demo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sAQrjueK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://blog.arctype.com/content/images/2021/02/visualization_demo.gif" alt="Time to Add Databases to your Spring Cleaning - VACUUMs in Postgres"&gt;&lt;/a&gt;Data visualizations with Arctype&lt;/p&gt;

</description>
      <category>database</category>
      <category>programming</category>
      <category>sql</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Create a Web App and Deploy to the Cloud in 20 minutes with Python</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Wed, 10 Feb 2021 23:08:16 +0000</pubDate>
      <link>https://forem.com/arctype/create-a-web-app-and-deploy-to-the-cloud-in-20-minutes-with-python-2ilm</link>
      <guid>https://forem.com/arctype/create-a-web-app-and-deploy-to-the-cloud-in-20-minutes-with-python-2ilm</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fheroku-example.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fheroku-example.gif" alt="Create a Web App and Deploy it to the Cloud in 20 minutes with Python"&gt;&lt;/a&gt;SQL Editor: &lt;a href="https://arctype.com?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve been looking to deploy your first web app to the cloud, this is a great place to start!&lt;/p&gt;

&lt;p&gt;In this tutorial, I'm going to show how to make the web app and database shown in the gif above, and how to deploy it to Heroku so it can be used by anyone.&lt;/p&gt;

&lt;p&gt;The article is divided in 3 sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creating a Flask app (web application for submitting the form)&lt;/li&gt;
&lt;li&gt;Setting up a Postgres database with Python (store the data from the submitted forms)&lt;/li&gt;
&lt;li&gt;Deploying the application to Heroku (hosting the application in the cloud so anyone can use it)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I provide a brief overview of each technology at the beginning of each section, so do not be deterred if you aren't familiar with some of these. &lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Requirements
&lt;/h3&gt;

&lt;p&gt;This guide is targeted towards beginner to intermediate programmers with some familiarity with programming and using the command line.&lt;/p&gt;

&lt;p&gt;You will need the following to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt;: You need to &lt;a href="https://www.postgresql.org/download/" rel="noopener noreferrer"&gt;download and install Postgres&lt;/a&gt; on your local computer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.6 or newer&lt;/strong&gt;: Python installers for different versions and OS are available for download  &lt;a href="https://www.python.org/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heroku Account&lt;/strong&gt;: You need to &lt;a href="https://signup.heroku.com/" rel="noopener noreferrer"&gt;create a free Heroku account&lt;/a&gt; if you do not already have one. This is where we will deploy the flask app and connect it to a remote Postgres database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you have all the above installed, we can start by setting up our development environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a Flask App for a Registration Form
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fflask_demo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fflask_demo.gif" alt="Create a Web App and Deploy it to the Cloud in 20 minutes with Python"&gt;&lt;/a&gt;Flask app example&lt;/p&gt;

&lt;p&gt;In this section, we're going to create the Flask app shown above.&lt;/p&gt;

&lt;p&gt;I’ve already created an example Flask app that renders a simple registration form used to collect information from a user.&lt;/p&gt;

&lt;p&gt;Flask is one of the most popular web frameworks written in Python. The flask application I made first makes a request to the endpoints defined in the &lt;code&gt;app/routes.py&lt;/code&gt; file to retrieve the data that is displayed in the registration form.&lt;/p&gt;

&lt;p&gt;It then renders the HTML pages contained in the &lt;code&gt;Template&lt;/code&gt; folder using the &lt;a href="https://jinja.palletsprojects.com/en/2.11.x/templates/" rel="noopener noreferrer"&gt;Jinja Template library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Instead of starting from scratch, let’s make a copy of the Flask app I created by cloning the Github repo.&lt;/p&gt;

&lt;p&gt;Open a command line tool and run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/ToluClassics/Web_Registration_Form.git
cd web_registration_form
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you ever get lost, you can view the completed project here: &lt;a href="https://github.com/ToluClassics/flask_postgres" rel="noopener noreferrer"&gt;Flask-Postgres App&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next we’re going to create a virtual environment for this project and install the required dependencies. A virtual environment is an isolated environment for different Python projects. They are helpful to keep packages and dependencies separate between different projects.&lt;/p&gt;

&lt;p&gt;Depending on your computer’s operating system, run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Windows:
$ python3 -m venv env
$ source env/scripts/activate
(env) $ pip install -r requirements.txt


MacOS and UNIX:
$ python -m venv env
$ source env/bin/activate
(env) $ pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test if our environment is properly set up, let’s try launching the application by entering &lt;code&gt;flask run&lt;/code&gt; in the virtual environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) derekxiao@Dereks-MBP Web_Registration_Form % flask run
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can enter the URL above in any browser and view the web form!&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding input validations for the registration form
&lt;/h3&gt;

&lt;p&gt;To get more familiar with the code, let’s add input validations to the form class.&lt;/p&gt;

&lt;p&gt;We want our app to prevent users from filling out the form with the same username or email address multiple times.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;wtforms&lt;/code&gt;, you can create a custom validator that is automatically added to each field by creating a method with the name &lt;code&gt;validate_&amp;lt;field_name&amp;gt;&lt;/code&gt;. In the &lt;code&gt;app/form.py&lt;/code&gt; file, we will add two methods &lt;code&gt;validate_email&lt;/code&gt; and &lt;code&gt;validate_username&lt;/code&gt;. We will query the Registrations table (we’ll create this table later) username and email entered by the user. If it returns a value , we will raise a validation error.&lt;/p&gt;

&lt;p&gt;Open the &lt;code&gt;form.py&lt;/code&gt; file and first import Registrations from the &lt;code&gt;models.py&lt;/code&gt; file (we'll add this file later).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from app.models import Registration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in the &lt;code&gt;RegistrationForm&lt;/code&gt; class add the following validation methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def validate_email(self, email):
    user = Registration.query.filter_by(email = email.data).first()
    if user is not None:
        raise ValidationError('Please use a different email address')

def validate_username(self, username):
    user = Registration.query.filter_by(username = username.data).first()
    if user is not None:
        raise ValidationError('Please use a different email address')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's add in the &lt;code&gt;models.py&lt;/code&gt; file so we can test the input validation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a PostgreSQL Database
&lt;/h2&gt;

&lt;p&gt;First we need to create the database that &lt;code&gt;models.py&lt;/code&gt; will connect to.&lt;/p&gt;

&lt;p&gt;For this tutorial we'll be using a PostgreSQL database. PostgreSQL is an open source, &lt;a href="https://blog.arctype.com/postgres-ordbms-explainer?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;object-relational database&lt;/a&gt;. And the best part, it’s completely free.&lt;/p&gt;

&lt;p&gt;Let’s start by creating a new Postgres database to store the data from submitted registration forms. If you already have a Postgres database, feel free to jump to the next section to start connecting your Flask app to the database.&lt;/p&gt;

&lt;p&gt;If you haven’t yet, first &lt;a href="https://www.postgresql.org/download/" rel="noopener noreferrer"&gt;download and install Postgres&lt;/a&gt; on your computer.&lt;/p&gt;

&lt;p&gt;Now open a new command line window and type &lt;code&gt;psql&lt;/code&gt;. &lt;code&gt;psql&lt;/code&gt; is a terminal-based interface to manage your Postgres databases.&lt;/p&gt;

&lt;p&gt;Run the command &lt;code&gt;create database [database name];&lt;/code&gt; to create a new database.&lt;/p&gt;

&lt;p&gt;Below is a list of popular &lt;code&gt;psql&lt;/code&gt; commands to manage your database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;\l&lt;/code&gt; : list all available databases&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dt&lt;/code&gt; : list all tables&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;\d table_name&lt;/code&gt;: describe a table, showing its columns, column type e.t.c&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;\du&lt;/code&gt; : list all users and their roles.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;\?&lt;/code&gt;: get a list of all psql commands&lt;code&gt;\q&lt;/code&gt;: to quit psql terminal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alternatively, you can avoid memorizing the different psql commands and use Arctype's &lt;a href="https://arctype.com?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;&lt;strong&gt;free SQL editor&lt;/strong&gt;&lt;/a&gt; that provides a modern interface for managing both Postgres and MySQL databases:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Farctype_demo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Farctype_demo.gif" alt="Create a Web App and Deploy it to the Cloud in 20 minutes with Python"&gt;&lt;/a&gt;&lt;a href="https://arctype.com?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Try Arctype Today&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting a Flask app to a Local Postgres Database
&lt;/h2&gt;

&lt;p&gt;Next we'll create the database connection to our new Postgres database that will be used inside our &lt;code&gt;models.py&lt;/code&gt; file. To get started, there are 3 packages to install:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;flask-sqlalchemy&lt;/code&gt;: A python extension for managing databases&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flask-migrate&lt;/code&gt;: For making updates to an existing database&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Pyscopg2&lt;/code&gt;: A Postgresql database adapter for python&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the activated virtual environment terminal run &lt;code&gt;(env) $ pip install flask-sqlalchemy flask-migrate pyscopg2&lt;/code&gt; to install the packages listed above. If you hit an error installing &lt;code&gt;pyscopg2&lt;/code&gt;, you may need to install &lt;code&gt;psycopg2-binary&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;The following sections I'll show how to use each package to configure and integrate our Postgres database with our Flask application. The end result will be a Flask app that is able to add new entries to a Postgres table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manage Postgres databases from Python with SQLAlchemy
&lt;/h3&gt;

&lt;p&gt;SQLAlchemy is a library that provides a way to seamlessly integrate python programs and databases. We will use SQLAlchemy as an Object Relational Mapper to convert python classes to tables in our Postgres database.&lt;/p&gt;

&lt;p&gt;Flask-SQLAlchemy automatically translates the created models to the syntax of any database (Postgres in our case).&lt;/p&gt;

&lt;p&gt;To interact with our database from the application, We need to add the following configuration variables to the config file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SQLALCHEMY_DATABASE_URI&lt;/code&gt;: this is the connection string that tells our Flask app which database to connect to. The usual form of the postgres database connection string is &lt;code&gt;postgresql://localhost/[YOUR_DATABASE_NAME]&lt;/code&gt;. If your Postgres user has a password, you will need to use a different &lt;a href="https://www.postgresql.org/docs/9.3/libpq-connect.html#:~:text=The%20general%20form%20for%20a,param1%3Dvalue1%26...%5D&amp;amp;text=Percent%2Dencoding%20may%20be%20used,words%20listed%20in%20Section%2031.1." rel="noopener noreferrer"&gt;URI variant&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SQLALCHEMY_TRACK_MODIFICATIONS&lt;/code&gt;: We set this to &lt;code&gt;False&lt;/code&gt; so we do not get a warning from flask every time we make a change to the application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To make the above changes, open the &lt;code&gt;config.py&lt;/code&gt; file and add the following variables to the &lt;code&gt;Config&lt;/code&gt; class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SQLALCHEMY_DATABASE_URI = os.environ.get('SQLALCHEMY_DATABASE_URI') or \
    'postgresql://localhost/[YOUR_DATABASE_NAME]'

SQLALCHEMY_TRACK_MODIFICATIONS = False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the next session, we'll use the &lt;code&gt;flask-migrate&lt;/code&gt; package to seamlessly handle changes to our database structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up Flask-Migrate to Manage Database Structures
&lt;/h3&gt;

&lt;p&gt;As we scale our application, we may need to make structural changes to our tables without losing all the data already in the database. &lt;a href="https://flask-migrate.readthedocs.io/" rel="noopener noreferrer"&gt;Flask-Migrate&lt;/a&gt; is a python extension that handles SQLAlchemy database migrations automatically using Alembic. It also supports multiple database migrations and other functionalities.&lt;/p&gt;

&lt;p&gt;Now, we will add a database object that represents our database and also create an instance of the migration class to handle database migrations.&lt;/p&gt;

&lt;p&gt;Open the &lt;code&gt;app/ __init__.py&lt;/code&gt; and first import the flask-migrate and SQLAlchemy packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask_migrate import Migrate
from flask_sqlalchemy import SQLAlchemy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add the following variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db = SQLAlchemy(app) ## Create DB object
migrate = Migrate(app,db) ## Create migration object
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set up the PostgreSQL database model
&lt;/h3&gt;

&lt;p&gt;Now we're finally ready to create the &lt;code&gt;models.py&lt;/code&gt; file to set up our database model. Database models are used to represent tables and their structures in the database; they determine the schema of the table.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;models.py&lt;/code&gt; will define the structure of our tables and other necessary table information.  We need to create a model (Class) whose attributes are the same as the data fields we intend to store in the database tables.&lt;/p&gt;

&lt;p&gt;Inside the &lt;code&gt;app&lt;/code&gt; directory, create a new file called &lt;code&gt;models.py&lt;/code&gt; and first import the db instance that we created in the previous section: &lt;code&gt;from app import db&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then create a class with the attributes for each data field in the registration form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Registration(db.Model):
    id = db.Column(db.Integer, primary_key = True)
    first_name = db.Column(db.String(64))
    last_name = db.Column(db.String(64))
    username = db.Column(db.String(64))
    company = db.Column(db.String(64))
    contact_no = db.Column(db.String(64))
    email = db.Column(db.String(120), index=True, unique=True)

    def __repr__ (self):
        return '&amp;lt;User {}&amp;gt;'.format(self.username)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last step is importing the model in the &lt;code&gt;app\ __init__.py&lt;/code&gt; file: &lt;code&gt;from app import models&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you are working with your own database, we've created a free tool to &lt;a href="https://blog.arctype.com/erd-builder?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;design database schemas&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a new Registrations table in Postgres using Flask
&lt;/h3&gt;

&lt;p&gt;Next, we will instantiate the database from the command line by running &lt;code&gt;flask db init&lt;/code&gt; in the virtual environment.&lt;/p&gt;

&lt;p&gt;First ensure that your &lt;code&gt;FLASK_APP&lt;/code&gt; environment variable is set before you run this command. You can run the following commands to check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ python                   
Python 3.8.2 (default, Dec 21 2020, 15:06:04) 
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
&amp;gt;&amp;gt;&amp;gt; import os
&amp;gt;&amp;gt;&amp;gt; print(os.environ['FLASK_APP'])
app.py
&amp;gt;&amp;gt;&amp;gt; quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you run &lt;code&gt;flask db init&lt;/code&gt; you should see the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ flask db init
    Creating directory /Users/mac/Desktop/Web_Registration_Form/migrations ... done
  Creating directory /Users/mac/Desktop/Web_Registration_Form/migrations/versions ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/script.py.mako ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/env.py ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/README ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/alembic.ini ... done
  Please edit configuration/connection/logging settings in
  '/Users/mac/Desktop/Web_Registration_Form/migrations/alembic.ini' before proceeding.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the command logs, we can see that a migrations folder is created in our application. This folder contains files that are needed to migrate tables and update table schemas on our database.&lt;/p&gt;

&lt;p&gt;Now that we have a migrations repository, it’s time to migrate the Registration table to the database using the &lt;code&gt;flask-migrate&lt;/code&gt; extension.&lt;/p&gt;

&lt;p&gt;Ensure that your Postgres server is up, then run the &lt;code&gt;flask db migrate&lt;/code&gt; command to automatically migrate our SQLAlchemy database to the Postgres database that we assigned earlier with the URI in the config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ flask db migrate -m "registrations table"
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.autogenerate.compare] Detected added table 'registration'
INFO [alembic.autogenerate.compare] Detected added index 'ix_registration_email' on '['email']'
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/versions/eccfa5d8a3f6_registrations_table.py ... done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;flask db migrate&lt;/code&gt; does not make any changes to the database, it only generates the migration script. To make the changes, we use the &lt;code&gt;flask db upgrade&lt;/code&gt; command. You can also revert the changes by using &lt;code&gt;flask db downgrade&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ flask db upgrade
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -&amp;gt; eccfa5d8a3f6, registrations table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can open Arctype and check if the table was successfully migrated to the local Postgres database&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2FScreen-Shot-2021-02-10-at-4.50.06-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2FScreen-Shot-2021-02-10-at-4.50.06-PM.png" alt="Create a Web App and Deploy it to the Cloud in 20 minutes with Python"&gt;&lt;/a&gt;&lt;a href="https://arctype.com?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Get Started with Arctype for Free&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Inserting Data From a Flask App in Postgres
&lt;/h2&gt;

&lt;p&gt;In this section, we'll update our route function to add the data collected from the webpage to the website.&lt;/p&gt;

&lt;p&gt;First, we will update our imports by including our database instance and table model in the &lt;code&gt;app\routes.py&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from app.models import Registration
from app import db
from flask import redirect, url_for
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the route function, we will create an instance of the &lt;code&gt;Registration&lt;/code&gt; class and update it with information collected from the form. We then proceed to add and commit the instance to the database.&lt;/p&gt;

&lt;p&gt;Add the following if statement inside the &lt;code&gt;index()&lt;/code&gt; function in &lt;code&gt;routes.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if form.validate_on_submit(): 
        reg_form = Registration(username=form.username.data,email=form.email.data, first_name=form.first_name.data,last_name=form.last_name.data, company=form.company.data,contact_no=form.contact_no.data)
        db.session.add(reg_form)
        db.session.commit()
        return redirect(url_for('index'))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing if the Flask App is Inserting Data in Postgres
&lt;/h2&gt;

&lt;p&gt;Let's test our application locally one more time before deploying to Heroku:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ flask run
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://127.0.0.1:5000/" rel="noopener noreferrer"&gt;http://127.0.0.1:5000/&lt;/a&gt; and submit the form.&lt;/p&gt;

&lt;p&gt;After submitting the form, you can check that a new entry was added to your database by checking the Registrations table in Arctype:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fpostgres_update.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fpostgres_update.gif" alt="Create a Web App and Deploy it to the Cloud in 20 minutes with Python"&gt;&lt;/a&gt;&lt;a href="https://arctype.com?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Try Arctype Today&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying the Flask app and Postgres Database to Heroku
&lt;/h2&gt;

&lt;p&gt;Now that we have tested our application locally, it's time to deploy the application and our Postgres database to Heroku so we can access it from anywhere.&lt;/p&gt;

&lt;p&gt;The first step is &lt;a href="https://devcenter.heroku.com/articles/heroku-cli" rel="noopener noreferrer"&gt;downloading and installing the Heroku CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then we can log into Heroku from the command line with: &lt;code&gt;Heroku login&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Once we're logged in, we can &lt;a href="https://devcenter.heroku.com/articles/creating-apps" rel="noopener noreferrer"&gt;create a new app on Heroku&lt;/a&gt; with by running: &lt;code&gt;Heroku create [app-name]&lt;/code&gt;. Heroku apps are in a global namespace, therefore you need to choose a unique name for your application. If you do not specify a name for your application, Heroku will generate a random name for your application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ Heroku create arctype-registration-app
Creating ⬢ arctype-registration-app... done
https://arctype-registration-app.herokuapp.com/ | https://git.heroku.com/arctype-registration-app.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To proceed with development, we need to update our &lt;code&gt;requirements.txt&lt;/code&gt; file and create a &lt;code&gt;Procfile&lt;/code&gt; in our root directory. The &lt;code&gt;Procfile&lt;/code&gt; is used to declare the commands run by the app on Heroku.&lt;/p&gt;

&lt;p&gt;Next, we need to install &lt;code&gt;gunicorn&lt;/code&gt; which is a python server needed to run the application on Heroku. In the activated python environment, run &lt;code&gt;pip install gunicorn&lt;/code&gt; to install the &lt;code&gt;gunicorn&lt;/code&gt; package.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;pip freeze &amp;gt; requirements.txt&lt;/code&gt; to update the requirements file. The &lt;a href="https://devcenter.heroku.com/articles/procfile#procfile-naming-and-location" rel="noopener noreferrer"&gt;&lt;code&gt;procfile&lt;/code&gt;&lt;/a&gt; defines processes based using the following format &lt;code&gt;&amp;lt;process type&amp;gt;: &amp;lt;command&amp;gt;&lt;/code&gt;. Since our application contains a web server (gunicorn), the process type is web and the command will launch the gunicorn server. Therefore, we need to run the command below on the terminal to create the Procfile:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;echo web: gunicorn app:app &amp;gt; Procfile&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Heroku has different&lt;a href="https://devcenter.heroku.com/articles/heroku-postgres-plans#hobby-tier" rel="noopener noreferrer"&gt;service plans&lt;/a&gt; for its postgresql offering. Choosing a plan for your application depends on the characteristics of your application and its bandwidth for service downtimes. In this tutorial we will use the &lt;a href="https://devcenter.heroku.com/articles/heroku-postgres-plans#hobby-tier" rel="noopener noreferrer"&gt;hobby-dev plan&lt;/a&gt;, which is free and serves the purpose we need it for.&lt;/p&gt;

&lt;p&gt;Now, we will create a hobby-dev postgres database for our application using the &lt;code&gt;heroku addons:create heroku-postgresql:hobby-dev --app app-name&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ heroku addons:create heroku-postgresql:hobby-dev --app arctype-registration-app
Creating heroku-postgresql:hobby-dev on ⬢ arctype-registration-app... free
Database has been created and is available
 ! This database is empty. If upgrading, you can transfer
 ! data from another database with pg:copy
Created postgresql-slippery-96960 as DATABASE_URL
Use heroku addons:docs heroku-postgresql to view documentation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More information on working with postgres databases can be &lt;a href="https://devcenter.heroku.com/articles/heroku-postgresql#connecting-in-python" rel="noopener noreferrer"&gt;found here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now that we have created a remote postgres database for our application on Heroku, we need to update the &lt;code&gt;SQLALCHEMY_DATABASE_URI&lt;/code&gt; variable in the &lt;code&gt;config.py&lt;/code&gt; file with the new database URI. To retrieve the remote database URI, we use the &lt;code&gt;heroku config --app app-name&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Once the URI is updated, the final step is deploying our application to Heroku's servers.&lt;/p&gt;

&lt;p&gt;First, we will commit all our files and push to the heroku master branch. Run &lt;code&gt;git add .&lt;/code&gt; and &lt;code&gt;git commit -m “heroku commit”&lt;/code&gt;, then &lt;code&gt;git push heroku main&lt;/code&gt; to deploy to Heroku.&lt;/p&gt;

&lt;p&gt;Your app is now live at &lt;a href="https://app-name.herokuapp.com" rel="noopener noreferrer"&gt;&lt;code&gt;https://your-app-name.herokuapp.com&lt;/code&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FeCvMandZdVanuwd0xAYoAkNxVMHJ6NrgxpxULMXnSmy8km7llhZ7t5RzMgbMCyN2KcbuTm5x2w0dEa9Am94zYqFx8j_06lgCN315RBI6OklH8VP0yazgc72AsO6hoLF0z_CHeuqz" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FeCvMandZdVanuwd0xAYoAkNxVMHJ6NrgxpxULMXnSmy8km7llhZ7t5RzMgbMCyN2KcbuTm5x2w0dEa9Am94zYqFx8j_06lgCN315RBI6OklH8VP0yazgc72AsO6hoLF0z_CHeuqz" alt="Create a Web App and Deploy it to the Cloud in 20 minutes with Python"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrating Tables from a Local Postgres Database to Heroku
&lt;/h2&gt;

&lt;p&gt;The last step is migrating the Registrations table that we created in our local Postgres database to the new Heroku database instance. We can do this by running &lt;code&gt;db create_all()&lt;/code&gt; from the heroku python terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(env) $ heroku run python
Running python on ⬢ arctype-registration-app... up, run.9023 (Free)
Python 3.6.12 (default, Sep 29 2020, 17:50:28) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
&amp;gt;&amp;gt;&amp;gt; from app import db
&amp;gt;&amp;gt;&amp;gt; db.create_all()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To check if the Registrations table was successfully created, connect to the Heroku database from Arctype and you should see the Registrations table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Congratulations!
&lt;/h2&gt;

&lt;p&gt;You are now the proud owner of a web app that anyone with internet access can use, from anywhere in the world. Lets zoom out and review what we covered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How a Flask app works&lt;/li&gt;
&lt;li&gt;Creating a Postgres database&lt;/li&gt;
&lt;li&gt;Creating a database model in Python using SQLAlchemy&lt;/li&gt;
&lt;li&gt;Migrating a database in Python using flask-migrate&lt;/li&gt;
&lt;li&gt;Inserting data into a Postgres database from a Flask app&lt;/li&gt;
&lt;li&gt;Deploying a Flask app and Postgres database to Heroku&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Postgres is a powerful database that is used by large tech companies across the world. &lt;a href="https://arctype.com?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt; is a free SQL editor that makes working with databases easier. &lt;a href="https://arctype.com?utm_campaign=postgres-heroku&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Try Arctype for free today&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>database</category>
      <category>codenewbie</category>
    </item>
    <item>
      <title>Free Tool to Design Database Schemas in &lt;5 minutes</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Mon, 08 Feb 2021 20:42:42 +0000</pubDate>
      <link>https://forem.com/arctype/free-tool-to-design-database-schemas-in-5-minutes-1l1n</link>
      <guid>https://forem.com/arctype/free-tool-to-design-database-schemas-in-5-minutes-1l1n</guid>
      <description>&lt;p&gt;Designing a database is often one of the first steps when starting a new project.&lt;/p&gt;

&lt;p&gt;Spending time on the database schema can help ensure a scalable application down the line. &lt;/p&gt;

&lt;p&gt;After Arctype went remote, we created a collaborative ERD tool in Figma to quickly design database schemas as a team. &lt;/p&gt;

&lt;p&gt;We turned this tool into a public template so anyone can use it for their own projects. Check it out and let us know if there are any additional features you want to see! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.arctype.com/erd-builder/"&gt;Get access to the free ERD designer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/bND5cWmk_nk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>database</category>
      <category>programming</category>
      <category>tooling</category>
      <category>design</category>
    </item>
    <item>
      <title>Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Wed, 03 Feb 2021 20:42:00 +0000</pubDate>
      <link>https://forem.com/arctype/forget-sql-vs-nosql-get-the-best-of-both-worlds-with-json-in-postgresql-2oan</link>
      <guid>https://forem.com/arctype/forget-sql-vs-nosql-get-the-best-of-both-worlds-with-json-in-postgresql-2oan</guid>
      <description>&lt;p&gt;Have you ever started a project and asked - "should I use a SQL or NoSQL database?"&lt;/p&gt;

&lt;p&gt;It’s a big decision. There are multiple horror stories of developers &lt;a href="http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/" rel="noopener noreferrer"&gt;choosing a NoSQL database and later regretting it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But now you can get the best of both worlds with JSON in PostgreSQL.&lt;/p&gt;

&lt;p&gt;In this article I cover the benefits of using JSON, anti-patterns to avoid, and an example of how to use JSON in Postgres.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;When to use JSON in Postgres&lt;/li&gt;
&lt;li&gt;JSON Basics.&lt;/li&gt;
&lt;li&gt;Types of JSON in PostgreSQL.&lt;/li&gt;
&lt;li&gt;Creating a JSON table&lt;/li&gt;
&lt;li&gt;How to query JSON data&lt;/li&gt;
&lt;li&gt;JSONPath: advanced JSON syntax&lt;/li&gt;
&lt;li&gt;Arctype: An Easy Tool for Databases&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Use a SQL Database for Non-Relational Data? &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxy2q9dydnlvzan5my9ti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxy2q9dydnlvzan5my9ti.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
Example of normalized data in a school database



&lt;p&gt;First we have to briefly cover the advantages of using SQL vs NoSQL.&lt;/p&gt;

&lt;p&gt;The difference between SQL and NoSQL is the data model. SQL databases use a &lt;em&gt;relational data model&lt;/em&gt;, and NoSQL databases usually use a &lt;em&gt;document model&lt;/em&gt;. A key difference is how each data model handles &lt;strong&gt;data normalization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Data normalization is the process of splitting data into “normal forms” to reduce &lt;em&gt;data redundancy&lt;/em&gt;. &lt;a href="https://blog.arctype.com/sql-50-years/" rel="noopener noreferrer"&gt;The concept was first introduced in the 1970s as a way to reduce spending on expensive disk storage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the example above, we have a normalized entity relationship diagram for a school database. The &lt;code&gt;StudentClass&lt;/code&gt; table stores every class a student has taken. By normalizing the data, we only keep one row for each class in the &lt;code&gt;Class&lt;/code&gt; table, instead of duplicating class data for every student in the class.&lt;/p&gt;

&lt;p&gt;But what if we also wanted to track every lunch order (entree, sides, drink, snacks, etc) to send each student a summary at the end of every week?&lt;/p&gt;

&lt;p&gt;In this case it would make more sense to store the data in a single &lt;em&gt;document&lt;/em&gt; instead of normalizing it. Students will always be shown their entire lunch order, so we can avoid expensive joins by keeping the lunch order data together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "student_id": 100,
    "order_date": "2020-12-11",
    "order_details": {
        "cost": 5.87,
        "entree": ["pizza"],
        "sides": ["apple", "fries"],
        "snacks": ["chips"]
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
Example schema for lunch orders using JSON





&lt;p&gt;Instead of maintaining a separate NoSQL database, we now  store lunch orders as JSON objects inside an existing relational Postgres database.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is JSON? &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "student_id": 100,
    "order_date": "2020-12-11",
    "order_details": {
        "cost": 5.87,
        "entree": ["pizza"],
        "sides": ["apple", "fries"],
        "snacks": ["chips"]
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
JSON Example





&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Data type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;student_id&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;order_date&lt;/td&gt;
&lt;td&gt;Date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;order_details&lt;/td&gt;
&lt;td&gt;Object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sides&lt;/td&gt;
&lt;td&gt;Array&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;JSON, or Javascript Object Notation, is a flexible format to pass data between applications, similar to a csv file. However, instead of rows and columns, JSON objects are collections of key/value pairs.&lt;/p&gt;

&lt;p&gt;According to Stack Overflow, JSON is now the &lt;a href="https://insights.stackoverflow.com/trends?tags=json%2Cxml%2Cprotocol-buffers%2Cyaml%2Ccsv" rel="noopener noreferrer"&gt;most popular data interchange format&lt;/a&gt;, beating csv, yaml, and xml.&lt;/p&gt;

&lt;p&gt;The original creator of JSON, Douglas Crockford, attributes the &lt;a href="https://twobithistory.org/2017/09/21/the-rise-and-rise-of-json.html" rel="noopener noreferrer"&gt;success of JSON&lt;/a&gt; to its readability by both developers and machines, similar to &lt;a href="https://blog.arctype.com/sql-50-years/" rel="noopener noreferrer"&gt;why SQL has been dominant for almost 50 years&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The JSON format is easy to understand, but also flexible enough to handle both primitive and complex data types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evolution of JSON in PostgreSQL 
  
  Plain JSON type


&lt;/h2&gt;
&lt;p&gt;In 2012, PostgreSQL 9.2 introduced the first &lt;code&gt;JSON&lt;/code&gt; data type in Postgres. It had syntax validation but underneath it stored the incoming document directly as text with white spaces included. It wasn’t very useful for real world querying, index based searching and other functionalities you would normally do with a JSON document.&lt;/p&gt;
&lt;h3&gt;
  
  
  JSONB
&lt;/h3&gt;

&lt;p&gt;In late 2014, PostgreSQL 9.4 introduced the &lt;code&gt;JSONB&lt;/code&gt; data type and most importantly improved the querying efficiency by adding &lt;strong&gt;&lt;a href="https://blog.arctype.com/database-index/" rel="noopener noreferrer"&gt;indexing&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;JSONB&lt;/code&gt; data type stores JSON as a binary type. This introduced overhead in processing since there was a conversion involved but it offered the ability to index the data using GIN/Full text based indexing and included additional operators for easy querying.&lt;/p&gt;
&lt;h3&gt;
  
  
  JSONPath
&lt;/h3&gt;

&lt;p&gt;With JSON’s increasing popularity, the 2016 SQL Standard brought in a new   standard/path language for navigating JSON data. It’s a powerful way of     searching JSON data very similar to XPath for XML data. PostgreSQL 12 introduced support for the JSON Path standard.&lt;/p&gt;

&lt;p&gt;We will see examples of JSON, JSONB, and JSONPath in the sections below. An important thing to note is that all JSON functionality is natively present in the database. There is no need for a contrib module or an external package to be installed.&lt;/p&gt;
&lt;h2&gt;
  
  
  JSON Example in Postgres &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Lets create a Postgres table to store lunch orders with a JSON data type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create table LunchOrders(student_id int, order json);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can insert JSON formatted data into our table with an &lt;code&gt;INSERT&lt;/code&gt; statement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;insert into LunchOrders values(100, '{
    "order_date": "2020-12-11",
    "order_details": {
        "cost": 4.25,
        "entree": ["pizza"],
        "sides": ["apple", "fries"],
        "snacks": ["chips"]}
    }'      
);

insert into LunchOrders values(100, '{
    "order_date": "2020-12-12",
    "order_details": {
        "cost": 4.89,
        "entree": ["hamburger"],
        "sides": ["apple", "salad"],
        "snacks": ["cookie"]}
    }'      
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do a  &lt;code&gt;Select *&lt;/code&gt;  from the table, you would see something like below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-10.10.32-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-10.10.32-PM.png" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;Get JSON objects. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inserting data into a &lt;code&gt;JSONB&lt;/code&gt; column is exactly the same, except we change the data type to &lt;code&gt;jsonb&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create table LunchOrders(student_id int, orders jsonb);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to Query JSON Data in Postgres &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Querying data from JSON objects uses slightly different operators than the ones that we use for regular data types ( &lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;lt;&lt;/code&gt; , &lt;code&gt;&amp;gt;&lt;/code&gt;, etc).&lt;/p&gt;

&lt;p&gt;Here are some of the most common JSON operators:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operator&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;-&amp;gt;&lt;/td&gt;
&lt;td&gt;Select a key/value pair&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-&amp;gt;&amp;gt;&lt;/td&gt;
&lt;td&gt;Filter query results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;#&amp;gt;&lt;/td&gt;
&lt;td&gt;Selecting a nested object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;#&amp;gt;&amp;gt;&lt;/td&gt;
&lt;td&gt;Filter query results in a nested object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@&amp;gt;&lt;/td&gt;
&lt;td&gt;Check if an object contains a value&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/12/functions-json.html" rel="noopener noreferrer"&gt;Full list of JSON operators&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-&amp;gt;&lt;/code&gt; and &lt;code&gt;-&amp;gt;&amp;gt;&lt;/code&gt; operators work with both &lt;code&gt;JSON&lt;/code&gt; and &lt;code&gt;JSONB&lt;/code&gt; type data. The rest of the operators are &lt;em&gt;full text search&lt;/em&gt; operators and only work with the &lt;code&gt;JSONB&lt;/code&gt; data type.&lt;/p&gt;

&lt;p&gt;Let's see some examples of how to use each operator to query data in our &lt;code&gt;LunchOrders&lt;/code&gt; table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting values from a JSON object
&lt;/h3&gt;

&lt;p&gt;We can use the &lt;code&gt;-&amp;gt;&lt;/code&gt; operation to find every day that a specific student bought a school lunch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select orders -&amp;gt; 'order_date'
from lunchorders
where student_id = 100;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-10.13.44-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-10.13.44-PM.png" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;Select JSON data. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Filtering JSON data using a &lt;code&gt;WHERE&lt;/code&gt; clause
&lt;/h3&gt;

&lt;p&gt;We can use the &lt;code&gt;-&amp;gt;&amp;gt;&lt;/code&gt; operator to filter for only lunch orders on a specific date.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select orders
from lunchorders
where orders -&amp;gt;&amp;gt; 'order_date' = '2020-12-11';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-10.44.10-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-10.44.10-PM.png" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;Filter JSON by date. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This query is similar to the &lt;code&gt;=&lt;/code&gt; operator that we would normally use, except we have to first add a &lt;code&gt;-&amp;gt;&amp;gt;&lt;/code&gt; operator to tell Postgres that the &lt;code&gt;order_date&lt;/code&gt; field is in the &lt;code&gt;orders&lt;/code&gt; column.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting data from an array in a JSON object
&lt;/h3&gt;

&lt;p&gt;Let's say we wanted to find every side dish that a specific student has ordered.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;sides&lt;/code&gt; field is nested inside the &lt;code&gt;order_details&lt;/code&gt; object, but we can access it by chaining two &lt;code&gt;-&amp;gt;&lt;/code&gt; operators together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
  orders -&amp;gt; 'order_date',
  orders -&amp;gt; 'order_details' -&amp;gt; 'sides'
from
  lunchorders
where
  student_id = 100;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-11.05.10-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-11.05.10-PM.png" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;Getting nested values from a JSON object. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great now we have arrays of the sides that student 100 ordered each day! What if we only wanted the first side in the array? We can chain together a &lt;em&gt;third&lt;/em&gt; &lt;code&gt;-&amp;gt;&lt;/code&gt; operator and give it the &lt;strong&gt;array index&lt;/strong&gt; we're looking for.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-11.10.14-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-30-at-11.10.14-PM.png" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;Getting array values at a specific index. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieving nested values from a JSON object
&lt;/h3&gt;

&lt;p&gt;Instead of chaining together multiple &lt;code&gt;-&amp;gt;&lt;/code&gt; operators, we can also use the &lt;code&gt;#&amp;gt;&lt;/code&gt; operator to specify a path for retrieving a nested value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select orders #&amp;gt; '{order_details, sides}'
from lunchorders;

      ?column?      
--------------------
 ["apple", "fries"]
 ["apple", "salad"]
(2 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Checking if a JSON object contains a value
&lt;/h3&gt;

&lt;p&gt;Lets say we wanted to see every order a student made that had a side salad. We can't use the previous &lt;code&gt;-&amp;gt;&amp;gt;&lt;/code&gt; for filtering because &lt;code&gt;sides&lt;/code&gt; is an array of values.&lt;/p&gt;

&lt;p&gt;To check if an array or object contains a specific value, we can use the &lt;code&gt;@&amp;gt;&lt;/code&gt; operator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
  orders
from
    lunchorders
where
    orders -&amp;gt; 'order_details' -&amp;gt; 'sides' @&amp;gt; '["salad"]';

orders                                                                   
----------
 {"order_date": "2020-12-12", "order_details": {"cost": 4.89, "sides": ["apple", "salad"], "entree": ["hamburger"], "snacks": ["cookie"]}}
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  JSONPath: The Final Boss &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;JSON Path is a powerful tool for searching and manipulating a JSON object in SQL using &lt;em&gt;&lt;a href="https://www.postgresql.org/docs/12/datatype-json.html#DATATYPE-JSONPATH" rel="noopener noreferrer"&gt;JavaScript-like syntax&lt;/a&gt;&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dot (.) is used for member access.&lt;/li&gt;
&lt;li&gt;Square brackets ("[]") are used for array access.&lt;/li&gt;
&lt;li&gt;SQL/JSON arrays are 0-indexed, unlike regular SQL arrays that start from 1.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Built-in functions
&lt;/h3&gt;

&lt;p&gt;JSONPath also includes powerful built-in functions like &lt;code&gt;size()&lt;/code&gt; to find the length of arrays.&lt;/p&gt;

&lt;p&gt;Let's use the JSONPath &lt;code&gt;size()&lt;/code&gt; function to get every order that had &amp;gt;= 1 snack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select *
from lunchorders
where orders @@ '$.order_details.snacks.size() &amp;gt; 0';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-31-at-10.51.50-AM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F01%2FScreen-Shot-2021-01-31-at-10.51.50-AM.png" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;JSONPath Build-in functions example. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparison without type casting
&lt;/h3&gt;

&lt;p&gt;JSONPath also enables comparisons without explicit type casting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select *
from lunchorders
where orders @@ '$.order_details.cost &amp;gt; 4.50';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what the same query would look like with our regular JSON comparisons:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select *
from lunchorders
where (orders -&amp;gt; 'order_details' -&amp;gt;&amp;gt; 'cost')::numeric &amp;gt; 4.50;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  JSON Summary &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;In this article we've covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When to use SQL vs NoSQL&lt;/li&gt;
&lt;li&gt;A history of JSON in Postgres&lt;/li&gt;
&lt;li&gt;Examples of how to work with JSON data&lt;/li&gt;
&lt;li&gt;JSON query performance with indexing&lt;/li&gt;
&lt;li&gt;JSON anti-patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Working with JSON data can be complicated. &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype is a free, modern SQL editor&lt;/a&gt; that makes working with databases easier. Try it out today and leave a comment below with your thoughts!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fjson_update-2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2Fjson_update-2.gif" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;JSONPath Build-in functions example. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sneak peek at part 2 of the JSON in Postgres series:&lt;/p&gt;

&lt;p&gt;With our existing table, the database engine has to scan through the entire table to find a record. This is called a &lt;em&gt;sequential scan&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2FScreen-Shot-2021-02-03-at-9.51.26-AM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2021%2F02%2FScreen-Shot-2021-02-03-at-9.51.26-AM.png" alt="Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL"&gt;&lt;/a&gt;Analyzing JSON query performance. SQL Editor: &lt;a href="https://arctype.com?utm_campaign=json-postgres&amp;amp;utm_medium=blog&amp;amp;utm_source=devto" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sequential scans become degrade in performance as the dataset grows.&lt;/p&gt;

&lt;p&gt;Follow to get notified of Part 2 where I will show show how to index a dataset with 700k rows to improve query performance by 350X. &lt;/p&gt;

</description>
      <category>database</category>
      <category>sql</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Slow Queries? 10X Query Performance with a Database Index</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Tue, 26 Jan 2021 18:33:46 +0000</pubDate>
      <link>https://forem.com/arctype/slow-queries-10x-query-performance-with-a-database-index-1p90</link>
      <guid>https://forem.com/arctype/slow-queries-10x-query-performance-with-a-database-index-1p90</guid>
      <description>&lt;p&gt;A good database index can &lt;a href="https://blog.arctype.com/using-indexing-to-optimize-database-search-performance/"&gt;improve your SQL query speeds by 99% or more&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s take a table with 1 billion, 16 byte names and a disk with &lt;a href="https://courses.cs.washington.edu/courses/cse378/10sp/lectures/lec26.pdf"&gt;a 10ms seek time and a 10MB/s transfer rate&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If we wanted to find "John Smith" in this table, a regular search that has to check every single name in sequential order would take &lt;strong&gt;~2 hours&lt;/strong&gt; (.016ms transfer time * 500M rows on average, assume 0 seek time because sequential).&lt;/p&gt;

&lt;p&gt;This same search with a database index would only take &lt;strong&gt;~0.3 seconds&lt;/strong&gt; ((10ms seek time + .016ms transfer time) * log(1*10^10)). A &lt;strong&gt;99.99%&lt;/strong&gt; speed improvement.&lt;/p&gt;

&lt;p&gt;But database indexes also use increased overhead and &lt;a href="https://www.itprotoday.com/data-analytics-and-data-management/why-almost-everything-you-know-about-database-indexes-wrong"&gt;can degrade performance if not used correctly&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article will cover the main considerations for creating the right index for your database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index type&lt;/li&gt;
&lt;li&gt;Selecting the correct column&lt;/li&gt;
&lt;li&gt;Choosing how many indexes to create&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Download &lt;a href="https://arctype.com/?utm_campaign=index-explained&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype&lt;/a&gt; to follow along with the examples below and create an index on your own database.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is a Database Index?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cc7bucKc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/Database-Index--2-.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cc7bucKc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/Database-Index--2-.png" alt="Slow Queries? 10X Query Performance with a Database Index"&gt;&lt;/a&gt;Database Index Example&lt;/p&gt;

&lt;p&gt;A database index is a data structure used to organize data so that it is easier to search.&lt;/p&gt;

&lt;p&gt;Indexes consist of a set of tuples. The first tuple value is the search key, and the second contains a pointer(s) to a block on the hard drive where the entire row of data is stored.&lt;/p&gt;

&lt;p&gt;These tuples are then organized into different data structures (i.e. B-tree, Hash, etc) depending on the database index type.&lt;/p&gt;

&lt;p&gt;To understand how a tree data structure speeds up search performance, I recommend playing with some of the &lt;a href="https://visualgo.net/bn/bst"&gt;interactive visualizations&lt;/a&gt; online.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Postgres Index Type Should You Use?
&lt;/h2&gt;

&lt;p&gt;Postgres offers 6 different index types to solve for different use cases.&lt;/p&gt;

&lt;p&gt;Here's a breakdown of their advantages and disadvantages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;th&gt;When to use?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;B-Tree (Most common)&lt;/td&gt;
&lt;td&gt;O(log(n)) insertions and queries&lt;/td&gt;
&lt;td&gt;Can be used for both equality and range queries (i.e. &amp;lt;, =, &amp;gt;, BETWEEN, IN, etc)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hash&lt;/td&gt;
&lt;td&gt;O(1) (faster than B-tree)&lt;/td&gt;
&lt;td&gt;Only works for equality comparisons. Hash indexes are not recommended by Postgres beecause &lt;a href="https://www.postgresql.org/docs/9.1/indexes-types.html"&gt;they can product inaccurate results&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generalized Search Tree (GiST)&lt;/td&gt;
&lt;td&gt;O(log(n)) for insertion and queries&lt;/td&gt;
&lt;td&gt;Used for operations for beyond equality and range comparisons on geometric data types (i.e. &amp;lt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Space-partitioned GiST (SP-GiST)&lt;/td&gt;
&lt;td&gt;O(log(n)) for insertion and queries&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.sai.msu.su/~megera/wiki/spgist_dev"&gt;Non-balanced, disk-based data structures (i.e. quad-trees, k-d trees)&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generalized Inverted Indexes (GIN)&lt;/td&gt;
&lt;td&gt;O(log(n) for queries. Longer insertion time.&lt;/td&gt;
&lt;td&gt;Indexing data types that map multiple values to one row (i.e. arrays and full text search)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block Range Index (BRIN)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits/"&gt;20X faster than B-tree and a 99%+ space savings&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Table entries have to be ordered in the same format as the data on disk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Choosing the Right Database Index
&lt;/h2&gt;

&lt;p&gt;Creating an index does not guarantee better database performance. Every time you write to a table with an index, the database engine is updating both the table and any impacted indexes.&lt;/p&gt;

&lt;p&gt;This is how you can decide which table columns to use for an index:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose a column that is frequently queried but not frequently changed (add/delete)&lt;/li&gt;
&lt;li&gt;The column has a referential integrity constraint&lt;/li&gt;
&lt;li&gt;The column has a UNIQUE key integrity constraint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every modern database engine also has a &lt;em&gt;query planner&lt;/em&gt; that decides how each query will be run. In some scenarios it's possible that queries you would expect to use your index are actually doing sequential scans. &lt;a href="https://blog.arctype.com/using-indexing-to-optimize-database-search-performance/"&gt;To check if the query plan is using your index, you can run &lt;code&gt;EXPLAIN&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For instance, running &lt;code&gt;EXPLAIN&lt;/code&gt; on &lt;a href="https://www.postgresql.org/docs/9.5/using-explain.html"&gt;this example&lt;/a&gt; shows that it is using a sequential scan instead of an index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;tenk1&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;unique1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;7000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                         &lt;span class="n"&gt;QUERY&lt;/span&gt; &lt;span class="n"&gt;PLAN&lt;/span&gt;
&lt;span class="c1"&gt;-----------------------------------------------------------------&lt;/span&gt;
 &lt;span class="n"&gt;Seq&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;tenk1&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;483&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7001&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;244&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unique1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;7000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How many indexes should I use?
&lt;/h3&gt;

&lt;p&gt;It depends. If you're managing a table with frequent changes, then you probably want less overhead with indexes. But on the other hand if you're making mostly reads from the table, then adding additional indexes would probably speed up performance.&lt;/p&gt;

&lt;p&gt;Before adding a new index, &lt;a href="https://www.itprotoday.com/data-analytics-and-data-management/why-almost-everything-you-know-about-database-indexes-wrong"&gt;check if your current indexes are actually slowing down your CPU&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Create an Index in Postgres - Syntax
&lt;/h2&gt;

&lt;p&gt;Postgres index examples are provided in the following sections. In the scope of this article, the examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Create Index&lt;/strong&gt; : Defining a new index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop Index&lt;/strong&gt; : Removing an index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;List Index&lt;/strong&gt; : Listing all indexes.Unique Index: Defining unique indexes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unique Index:&lt;/strong&gt; Defining unique indexes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to create an index
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;method&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;column_name&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;ASC&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;NULLS&lt;/span&gt; &lt;span class="err"&gt;{&lt;/span&gt;&lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;LAST&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The optional &lt;code&gt;ASC/DESC&lt;/code&gt; and &lt;code&gt;NULLS LAST&lt;/code&gt; parameters are beneficial for data that you plan on retrieving in sorted order, and you want the null values to appear first or last in the last.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to create a partial index
&lt;/h3&gt;

&lt;p&gt;You can also &lt;a href="https://www.postgresql.org/docs/8.0/indexes-partial.html"&gt;create an index on only a subset of a table&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;employee_index&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;employee_id&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A partial index is beneficial in situations where there are large clusters of data with the same index value. Even if this data is indexed, the Postgres query planner will usually use a sequential search because the data has the same values. So a partial index can remove clusters of data with the same index value and save space.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to remove an index
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to find existing indices
&lt;/h3&gt;

&lt;p&gt;Postgres automatically creates a &lt;code&gt;pg_indexes&lt;/code&gt; table that you can query to find existing indexes in a database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;indexname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;indexdef&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;pg_indexes&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="n"&gt;tablename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'employees'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Postgres Reindex Explained
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Reindex&lt;/code&gt; drops an existing index in a table and rebuilds it using the current table values. The most common scenario for using reindex is when the data has changed significantly, and &lt;a href="https://www.postgresql.org/docs/10/routine-reindex.html"&gt;there are now existing pages that are inefficiently using space&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A routine reindexing of your database can reduce the index size and improve performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="n"&gt;Rebuild&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;specific&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt;
&lt;span class="k"&gt;REINDEX&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;my_index&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="n"&gt;Rebuild&lt;/span&gt; &lt;span class="k"&gt;every&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;
&lt;span class="k"&gt;REINDEX&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="o"&gt;##&lt;/span&gt; &lt;span class="n"&gt;Rebuild&lt;/span&gt; &lt;span class="k"&gt;every&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;database&lt;/span&gt; 
&lt;span class="k"&gt;REINDEX&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;my_database&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;A properly created database index can improve query performance by 99% or more.&lt;/p&gt;

&lt;p&gt;This article covered the main considerations for creating a database index that improves performance instead of slowing it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index type&lt;/li&gt;
&lt;li&gt;Selecting the correct column&lt;/li&gt;
&lt;li&gt;Choosing how many indexes to create&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you've optimized your query performance, it's time to &lt;a href="https://blog.arctype.com/five-arctype-tips/"&gt;speed up your SQL workflow&lt;/a&gt;. Arctype's &lt;a href="https://arctype.com?utm_campaign=index-explained&amp;amp;utm_medium=blog&amp;amp;utm_source=blog"&gt;collaborative SQL client&lt;/a&gt; allows you to easily share databases, queries, and visualizations with anyone.&lt;/p&gt;

</description>
      <category>database</category>
      <category>sql</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>How Do the Top 20 Words in Biden's Inauguration Speech Compare to Trump's?</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Thu, 21 Jan 2021 17:16:16 +0000</pubDate>
      <link>https://forem.com/arctype/how-do-the-top-20-words-in-biden-s-inauguration-speech-compare-to-trump-s-2lc8</link>
      <guid>https://forem.com/arctype/how-do-the-top-20-words-in-biden-s-inauguration-speech-compare-to-trump-s-2lc8</guid>
      <description>&lt;p&gt;Can Python unite the nation? &lt;/p&gt;

&lt;p&gt;Yesterday we saw our country's 45th successful transfer of the presidency.&lt;/p&gt;

&lt;p&gt;This marked the end of a highly contested election during which our nation at times felt more divided than ever.&lt;/p&gt;

&lt;p&gt;But as I sat in my living room today with my parents and watched Biden’s inaugural address, I felt hopeful. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Some days you need a hand. There are other days when we're called to lend a hand. That's how it has to be, that's what we do for one another. And if we are that way our country will be stronger, more prosperous, more ready for the future. And we can still disagree.” - Joe Biden, 2021 Inaugural Address&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a citizen, I was inspired by Joe's promise for a united nation.&lt;/p&gt;

&lt;p&gt;But as a developer I started to wonder- &lt;em&gt;could I quantify this hope?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The inaugural address is a president’s first speech to the nation. The speech is meticulously written by a team of writers to capture the mood of the nation and the most pressing issues that we face.&lt;/p&gt;

&lt;p&gt;Could the specific words used in this speech give us insight into the path ahead?&lt;/p&gt;

&lt;p&gt;I compared the top 20 most common words in Biden's speech with the top 20 words in Trump's 2017 inaugural address to see where our country is now compared to four years ago, and what to expect over the next four years.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python set-up&lt;/li&gt;
&lt;li&gt;Web scraping with Beautiful Soup&lt;/li&gt;
&lt;li&gt;NLP with nlkt&lt;/li&gt;
&lt;li&gt;Results: Top 20 words takeaways&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Python to Find the Top 20 Most Common Words &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;This next section is a tutorial for the Python analysis. If natural language processing doesn't get you excited, then you may want to jump to the end (but it's also only 20 lines of code so could be fun to learn!)&lt;/p&gt;

&lt;p&gt;The goal for this analysis is to take each inaugural address and find the most common words. The analysis consists of two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scraping the speech from the web using Beautiful Soup&lt;/li&gt;
&lt;li&gt;Processing the words using NLKT&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you want to run the code at home, this is what you'll need to do to get set up:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install python 3&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;requests&lt;/code&gt;, &lt;code&gt;BeautifulSoup&lt;/code&gt; and &lt;code&gt;nltk&lt;/code&gt; with &lt;code&gt;pip3 install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;brew install jupyter&lt;/code&gt; and then open a jupyter notebook by running &lt;code&gt;jupyter notebook&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now you can run all of the commands below in the jupyter notebook! &lt;/p&gt;

&lt;p&gt;If you want to skip the scraping and cleaning, you can &lt;a href="https://arctype.com?utm_campaign=speech-analysis&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;download Arctype&lt;/a&gt; and use the database credentials at the end to view the data.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Web Scraping with Beautiful Soup &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Web scraping is the process of collecting information from the web. In this scenario, we're going to be scraping transcripts of each president's inauguration speech.&lt;/p&gt;

&lt;p&gt;You can find each president's speech at these websites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.yahoo.com/now/full-transcript-joe-bidens-inauguration-175723360.html" rel="noopener noreferrer"&gt;Biden's 2021 inaugural address&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://globalnews.ca/news/3194820/donald-trump-inauguration-speech-and-transcript/" rel="noopener noreferrer"&gt;Trump's 2017 inaugural address&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We first use the &lt;code&gt;requests&lt;/code&gt; package to scrape the entire HTML code from each website.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.yahoo.com/now/full-transcript-joe-bidens-inauguration-175723360.html&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congrats you've built your first web scraper!&lt;/p&gt;

&lt;p&gt;This code is making a HTTP request to retrieve the HTML code from the server that the speech is stored at.&lt;/p&gt;

&lt;p&gt;Now we have to take this mess of HTML and find just the text from each president's speech. We can do this easily with Python's Beautiful Soup package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

&lt;span class="n"&gt;biden_speech&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code above we've converted the HTML from earlier into a beautiful soup object that is easily parseable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fk7comclvhjdvzaw079pj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fk7comclvhjdvzaw079pj.png" alt="Using Chrome DevTools to Find a HTML Tag"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we have to find the specific HTML block that contains the text we're looking for. We can do this using the browser's DevTools console.&lt;/p&gt;

&lt;p&gt;Open the speech in a new tab in your browser and press &lt;code&gt;cmd+option+I&lt;/code&gt; to open the DevTools console. Highlight the text you're looking for, and you'll be able to see the HTML tag that contains that text in the console on the right.&lt;/p&gt;

&lt;p&gt;For Biden's speech, we can see that it's contained in a &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; tag labelled with a &lt;code&gt;caas-body&lt;/code&gt; class name. Switching back to Python, we can find that tag using the &lt;code&gt;find_all&lt;/code&gt; method with our beautiful soup object from before.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;biden_speech_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;biden_speech&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;caas-body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we look at the &lt;code&gt;biden_speech_content&lt;/code&gt; object, we'll still find other html tags that aren't related to the speech such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"caas-readmore caas-readmore-collapse"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;button&lt;/span&gt; &lt;span class="na"&gt;aria-label=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"link rapid-noclick-resp caas-button
    collapse-button"&lt;/span&gt; &lt;span class="na"&gt;data-ylk=&lt;/span&gt;&lt;span class="s"&gt;"elm:readmore;slk:Story continues"&lt;/span&gt;
    &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        Story continues
    &lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In order to find just the text from Biden's speech, we can filter for the &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt; tags that aren't labeled with a class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;biden_speech_content_v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;biden_speech_content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have all the text, but the string &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt; is appended to the beginning of every sentence. We can remove these HTML tags with the Beautiful Soup &lt;code&gt;get_text&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;biden_speech_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;biden_speech_content_v2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;biden_speech_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;biden_speech_str&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we should be left with a clean speech that we can analyze with the &lt;code&gt;nlkt&lt;/code&gt; package.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhh4w61elm02xb8gkxnv9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhh4w61elm02xb8gkxnv9.png" alt="Biden's Inauguration Speech, Cleaned with Python Beautiful Soup&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Finding Word Frequency with NLKT&lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We're getting close to the end now! The final steps are doing some basic natural language processing (NLP) techniques using the Python NLP package, &lt;code&gt;NLKT&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We could do a frequency analysis of the speech now, but this would show words like "I", "We", and "The" as the most common words. In natural language processing these are called stop words.&lt;/p&gt;

&lt;p&gt;We can use NLKT's &lt;a href="https://gist.github.com/sebleier/554280" rel="noopener noreferrer"&gt;list of English stop words&lt;/a&gt; to find just the words that we're interested in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nltk.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nltk.corpus&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stopwords&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nltk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FreqDist&lt;/span&gt;

&lt;span class="n"&gt;biden_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;biden_speech_str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;filtered_biden_speech&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;biden_words&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stop_words&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isalpha&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break down what the code is doing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Using &lt;code&gt;.lower()&lt;/code&gt; to cast the entire speech to lower case so it can be compared to the stop words&lt;/li&gt;
&lt;li&gt;Separating the string into individual words with &lt;code&gt;word_tokenize&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Removing stop words: &lt;code&gt;if not w in stop_words&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Removing punctuation like periods and commas: &lt;code&gt;w.isalpha()&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now we have a list of words that we can count!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;freq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FreqDist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filtered_biden_speech&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;freq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But what you might find as you look through the list is that there are separate counts for similar words such as "country" and "countries". In order to count these as one word, we have to &lt;a href="https://www.geeksforgeeks.org/python-lemmatization-with-nltk/" rel="noopener noreferrer"&gt;lemmatize&lt;/a&gt; the list so that every word is converted to its base word.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nltk.stem&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WordNetLemmatizer&lt;/span&gt;

&lt;span class="n"&gt;lemmatized_biden&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;wordnet_lemmatizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lemmatize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filtered_biden_speech&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;freq_lemma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FreqDist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lemmatized_biden&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;freq_lemma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done! You've successfully scraped data from the web and analyzed it with NLP all while supporting democracy. Let's take a look at the results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Biden's vs. Trump's Inauguration Speeches: Most Frequent Words&lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;freq_lemma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orientation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;h&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_layout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yaxis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autorange&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reversed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fndw5wa45hcolmtjeeaot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fndw5wa45hcolmtjeeaot.png" alt="Top 10 Most Frequent Words from Biden's Inauguration Speech"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The top word was distorted by the lemmatizer, but the word was "us".&lt;/p&gt;

&lt;p&gt;These were the top 10 words from Trump's speech in 2017:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsm81repfn2bvmkturozm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsm81repfn2bvmkturozm.png" alt="Top 10 Most Frequent Words from Trump's Inauguration Speech"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What stood out to me is that 50% of the top 10 words in for both presidents were the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;America&lt;/li&gt;
&lt;li&gt;American&lt;/li&gt;
&lt;li&gt;Nation&lt;/li&gt;
&lt;li&gt;People&lt;/li&gt;
&lt;li&gt;One&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The optimistic side in me looks at this data and sees a nation that shares common values. We care about our country, and we care about each other.&lt;/p&gt;

&lt;p&gt;But at the same time, we are all facing our own unique issues. If we look at the next 10 most common words for each president's speech we begin to see some differences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Biden's Speech:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fm1wrzr0n9vw3mu5b833j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fm1wrzr0n9vw3mu5b833j.png" alt="Biden's Top 10-20 Most Common Words&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trump's Speech:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fod7ezxl01udiccldhyke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fod7ezxl01udiccldhyke.png" alt="Trump's Top 10-20 Most Common Words&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Biden's speech was undeniably a call to bring our nation &lt;code&gt;together&lt;/code&gt; in &lt;code&gt;unity&lt;/code&gt;. On the other side, we can see Trump appealing to Americans whose &lt;code&gt;job[s]&lt;/code&gt; are under threat and need to &lt;code&gt;protect&lt;/code&gt; their livelihood and families.&lt;/p&gt;

&lt;p&gt;The data shows two groups of people facing their own challenges, but I also see one nation with common values.&lt;/p&gt;

&lt;p&gt;We set off to see if we could quantify "hope". And I believe we found an answer.&lt;/p&gt;

&lt;p&gt;If two presidents with polar opposite political views can appeal to their supporters with 50% of the same vocabulary, then there is still hope to unite around our similarities.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What are the common objects we as Americans love, that define us as Americans? I think we know. Opportunity, security, liberty, dignity, respect, honor, and yes, the truth. - Joe Biden, 2021 Inaugural Address&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  A Full Speech Comparison with Arctype
&lt;/h2&gt;

&lt;p&gt;I shared the top 20 words, but there were more than 500 unique words in Biden's inauguration speech. If you want to see more analysis, we've uploaded all the speech data to Arctype so you can skip the scraping and cleaning.&lt;/p&gt;

&lt;p&gt;The dataset includes 2 tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frequencies table: full list of the word frequencies for both speeches&lt;/li&gt;
&lt;li&gt;Sentences tables: cleaned sentences for both speeches so you can do your own analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how to connect to the data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvubphfo3y800xhi69hek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvubphfo3y800xhi69hek.png" alt="Arctype SQL Client Database Connection"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download the free &lt;a href="https://arctype.com?utm_campaign=speech-analysis&amp;amp;utm_medium=blog&amp;amp;utm_source=dev.to"&gt;Arctype SQL Client&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Input the credentials below in Arctype to connect to the database&lt;/li&gt;
&lt;li&gt;Run a query!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Database credentials:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;host: &lt;code&gt;arctype-pg-demo.c4i5p0deezvq.us-west-2.rds.amazonaws.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;port: &lt;code&gt;5432&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;user: &lt;code&gt;root&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;password: &lt;code&gt;HC9x0OkI9vVO4wqprscg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;database: &lt;code&gt;inauguration_2021&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you enjoyed this post, sign up for the &lt;a href="https://blog.arctype.com" rel="noopener noreferrer"&gt;Arctype newsletter&lt;/a&gt; to receive more posts written by experienced developers to help and inspire other devs.&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>database</category>
      <category>programming</category>
    </item>
    <item>
      <title>SQL is 50 Years Old. What Can We Learn From Its Success? </title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Fri, 15 Jan 2021 18:00:00 +0000</pubDate>
      <link>https://forem.com/arctype/sql-is-50-years-old-what-can-we-learn-from-its-success-29</link>
      <guid>https://forem.com/arctype/sql-is-50-years-old-what-can-we-learn-from-its-success-29</guid>
      <description>&lt;p&gt;&lt;em&gt;Arctype writer: &lt;a href="https://firstretail.com"&gt;Felix Schildorfer&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In March 1971, Intel introduced the world's first general microprocessor, the &lt;a href="https://spectrum.ieee.org/tech-history/silicon-revolution/chip-hall-of-fame-intel-4004-microprocessor"&gt;Intel 4004&lt;/a&gt;. It had ~2,300 transistors and cost $60.&lt;/p&gt;

&lt;p&gt;Fast forward almost 50 years, and the newest iPhone has nearly &lt;strong&gt;12 billion&lt;/strong&gt; transistors (but unfortunately costs a little more than $60).&lt;/p&gt;

&lt;p&gt;Many of the programming languages we use today were not introduced until the 90s (Java was introduced in 1996). However, there is one programming language that is still as popular today as it was when it was introduced nearly 50 years ago: &lt;strong&gt;SQL.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article will discuss the events that led to the introduction of relational databases, why SQL grew in popularity, and what we can learn from its success.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;A History of Early Database Management - IDS and CODASYL&lt;/li&gt;
&lt;li&gt;The Network Data Model - Better Than Today's Relational Model?&lt;/li&gt;
&lt;li&gt;Arrival of The Relational Model&lt;/li&gt;
&lt;li&gt;Relational vs. Network Data Models&lt;/li&gt;
&lt;li&gt;The Rise and Reign of SQL&lt;/li&gt;
&lt;li&gt;The Secret to SQL’s 50 Year Reign - And What We Can Learn&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  History of Early Database Management - IDS and CODASYL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ClS39oE3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/image-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ClS39oE3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/image-3.png" alt="What Can We Learn From SQL’s 50 Year Reign? A Story of 2 Turing Awards"&gt;&lt;/a&gt;&lt;a href="https://www.slashdb.com/2015/10/09/flashback-friday-charles-bachman/"&gt;Image Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In 1962, Charles W. Bachman (no relation to Erlich Bachman) was working as part of a small team at General Electric. One year later, Bachman’s team introduced what would later be recognized as the first database management system - the Integrated Data Store (IDS).&lt;/p&gt;

&lt;p&gt;10 years later, Bachman would receive the coveted &lt;a href="https://amturing.acm.org/"&gt;Turing Award&lt;/a&gt;, often called the Nobel Prize of computer science, for his contributions to computing with IDS.&lt;/p&gt;

&lt;h3&gt;
  
  
  What was IDS?
&lt;/h3&gt;

&lt;p&gt;In the early 1960s, computer science was just beginning as an academic field. For context, ASCII was not introduced until 1963.&lt;/p&gt;

&lt;p&gt;To understand IDS we have to first understand the two main forces that led to its development:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The introduction of disk storage&lt;/li&gt;
&lt;li&gt;A migration to high-level programming languages&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Disk storage
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Izaki7Vh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/image-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Izaki7Vh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/image-1.png" alt="What Can We Learn From SQL’s 50 Year Reign? A Story of 2 Turing Awards"&gt;&lt;/a&gt;Above: &lt;a href="https://calisphere.org/item/ark:/13030/kt80003844/"&gt;Moving a RAMAC 305&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In 1956, IBM introduced the first commercial hard disk drive, the &lt;a href="https://www.computerhistory.org/storageengine/first-commercial-hard-disk-drive-shipped/#:~:text=Informed%20by%20Jacob%20Rabinow's%20ideas,of%20Accounting%20and%20Control"&gt;RAMAC 305&lt;/a&gt;%20system.).&lt;/p&gt;

&lt;p&gt;The introduction of disk drives allowed programmers to retrieve and update data by jumping directly to a location on the disk. This was a vast improvement from its predecessor, tape drives, which required moving sequentially through the tape to retrieve a specific piece of data.&lt;/p&gt;

&lt;p&gt;But developers now had to figure out where records were stored on the disk. Due to the limitations of &lt;a href="https://medium.com/@princeabhishek410/understanding-file-management-system-in-operating-system-4c7fbfc306f2"&gt;file management systems&lt;/a&gt; in early operating systems, this was an advanced task reserved only for experienced programmers.&lt;/p&gt;

&lt;p&gt;Developers needed a solution to simplify working with disk drives.&lt;/p&gt;

&lt;h4&gt;
  
  
  High-level programming
&lt;/h4&gt;

&lt;p&gt;At the same time, computer science was beginning to move from innovators to early adopters on the &lt;a href="https://en.wikipedia.org/wiki/Diffusion_of_innovations"&gt;adoption curve&lt;/a&gt;. Low level programming languages like Assembly were popular among the early academics, but regular programmers were switching to higher level programming languages like COBOL for their usability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cacm.acm.org/magazines/2016/7/204036-how-charles-bachman-invented-the-dbms-a-foundation-of-our-digital-world/fulltext?mobile=false#body-3"&gt;So what was IDS?&lt;/a&gt; IDS solved for disk storage and high-level programming. IDS allowed developers to use high-level programming languages, like COBOL, to build applications that input and retrieve data from disk storage. Because of this function, IDS has received the distinction as the first database management system.&lt;/p&gt;

&lt;h3&gt;
  
  
  CODASYL - A new standard for database management
&lt;/h3&gt;

&lt;p&gt;In 1969, the Committee of Data Systems Languages (CODASYL) released a report proposing a standard for database management. Bachman was part of the committee, and the report drew heavily from IDS.&lt;/p&gt;

&lt;p&gt;The CODASYL Data Model introduced many of the core features in database management systems that we use today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schemas&lt;/li&gt;
&lt;li&gt;Data definition language (DDL)&lt;/li&gt;
&lt;li&gt;Data manipulation language (DML)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, IDS and CODASYL introduced a new way for modeling data that influenced the eventual development of SQL - the &lt;strong&gt;network data model&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Network Data Model - Better Than Todays Relational Model?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--o7pbrtYS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/NxY7lsez_qxOIV5dLEXwJf-lbwhDo27cQ5JsEHUM0Tw-20M_nRfW_8rKPa571f9uwavlTFvRb0HoZGv13LFHEbRy7i_3atsXMzh1DlWUa0_1V6SZHdDeaN53nHMLhydHTc0yolyN" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--o7pbrtYS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/NxY7lsez_qxOIV5dLEXwJf-lbwhDo27cQ5JsEHUM0Tw-20M_nRfW_8rKPa571f9uwavlTFvRb0HoZGv13LFHEbRy7i_3atsXMzh1DlWUa0_1V6SZHdDeaN53nHMLhydHTc0yolyN" alt="What Can We Learn From SQL’s 50 Year Reign? A Story of 2 Turing Awards"&gt;&lt;/a&gt;Above: Network model example&lt;/p&gt;

&lt;p&gt;A data model is a standardized way to describe (model) the world (data).&lt;/p&gt;

&lt;p&gt;The previous hierarchical data model used tree structures to describe data, but these were limited to one-to-many relationships. The new network model allow records to have multiple parents, which created a &lt;strong&gt;graph structure&lt;/strong&gt;. By allowing multiple parents, network models were able to model many-to-one and many-to-many relationships.&lt;/p&gt;

&lt;p&gt;In the network model, relationships between tables were stored in &lt;em&gt;sets&lt;/em&gt;. Each set had an &lt;em&gt;owner&lt;/em&gt; (i.e. teachers) and one or more &lt;em&gt;members&lt;/em&gt; (i.e. classes and students).&lt;/p&gt;

&lt;p&gt;One of the key benefits of the network model is that related records in a set were connected directly by pointers. Sets were implemented using next, prior, and owner pointers, which allowed for easy traversal similar to a linked list.&lt;/p&gt;

&lt;p&gt;The low-level nature of network data models offered performance benefits, but they came at a cost. The network data model had increased storage costs because every record had to store extra pointers to its previous and parent record.&lt;/p&gt;

&lt;h2&gt;
  
  
  Arrival of The Relational Model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YaaGebIg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/image-4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YaaGebIg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/image-4.png" alt="What Can We Learn From SQL’s 50 Year Reign? A Story of 2 Turing Awards"&gt;&lt;/a&gt;Above: &lt;a href="http://www.myrgpvdbms.com/2018/05/define-terms-domain-tuple-attribute-and.html"&gt;Example of a relational model&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In 1970, 8 years afters IDS, Edgar F. Codd introduced the relational model in his seminal paper &lt;a href="https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf"&gt;“A Relational Model of Data for Large Shared Data Banks”&lt;/a&gt; (also earning him a spot with Bachman as a Turing Award recipient).&lt;/p&gt;

&lt;p&gt;Codd showed that all data in a database could be represented in terms of &lt;em&gt;tuples&lt;/em&gt; (&lt;em&gt;rows&lt;/em&gt; in SQL) grouped together into &lt;em&gt;relations&lt;/em&gt; (&lt;em&gt;tables&lt;/em&gt; in SQL). To describe the database queries, he invented a form of first-order predicate logic called &lt;a href="https://www.geeksforgeeks.org/tuple-relational-calculus-trc-in-dbms/"&gt;tuple relational calculus&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Tuple relational calculus introduced a &lt;strong&gt;declarative language&lt;/strong&gt; for querying data. Declarative programming languages allow programmers to say &lt;em&gt;what&lt;/em&gt; they want to do without describing &lt;em&gt;how&lt;/em&gt; to do it.&lt;/p&gt;

&lt;p&gt;This new declarative language was much easier to use for developers. The relational model laid out all the data in the open. Developers could retrieve all the data from a table, or read a single row, in one command. (thanks &lt;a href="https://en.wikipedia.org/wiki/Query_optimization"&gt;query optimizer&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Gone were the days of following a labyrinth of pointers to find your data.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Relational vs Network Data Models
&lt;/h2&gt;

&lt;p&gt;Relational databases decreased the high storage costs that network databases had by &lt;strong&gt;normalizing data&lt;/strong&gt;. Normalization was a process of decomposing tables to eliminate redundancy, and thereby decrease the footprint of data on disk.&lt;/p&gt;

&lt;p&gt;However, relational databases had an increased CPU cost. In order to process normalized data, relational databases had to load tables in memory and use compute power to “join” tables back together. Let’s walk through the process for finding all classes and students for a given teacher with a relational model.&lt;/p&gt;

&lt;p&gt;The database system would first perform an operation to retrieve all relevant classes. Then it would perform a second operation to retrieve student data. All of this data would be stored in memory, and it would run a third operation to merge the data before returning the result.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--paZuVsb5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/8BGzrtrrXJig9p_y8RvzWdS36JEeCde97G5kyEKvwj70aeTiQTu7HkhiYmXJAYpJNykrWNVkI3NpQVaxpTZ2aNFZZSQwv7Nj2-QukpxKMm1pr8EivCMs0bb1dRrke3EBYITqAYIN" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--paZuVsb5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh4.googleusercontent.com/8BGzrtrrXJig9p_y8RvzWdS36JEeCde97G5kyEKvwj70aeTiQTu7HkhiYmXJAYpJNykrWNVkI3NpQVaxpTZ2aNFZZSQwv7Nj2-QukpxKMm1pr8EivCMs0bb1dRrke3EBYITqAYIN" alt="What Can We Learn From SQL’s 50 Year Reign? A Story of 2 Turing Awards"&gt;&lt;/a&gt;Performance comparison between relational and network models&lt;/p&gt;

&lt;p&gt;In a performance case study using real data, &lt;a href="https://raima.com/network-model-vs-relational-model/"&gt;Raima&lt;/a&gt; found that network database models had 23x better insert performance and 123x faster query performance.&lt;/p&gt;

&lt;p&gt;So why are relational databases the leading database solution?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The relational model was more flexible to changes, and its declarative syntax simplified the job for programmers.&lt;/p&gt;

&lt;p&gt;And Moore’s law was working its magic in the background. The cost of computing continued to decrease, and eventually the increased computing cost of relational models was outweighed by the productivity gains.&lt;/p&gt;

&lt;p&gt;Fast forward 50 years, and now the most expensive resource in a data center is the CPU. &lt;a href="http://www.arctype.com/downloads"&gt;Sign up for Arctype&lt;/a&gt; and get notified of a future post where I discuss what databases might look like over the next 50 years.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rise and Reign of SQL
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VpzsEtZe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/arctype-brain-trust.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VpzsEtZe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/arctype-brain-trust.png" alt="What Can We Learn From SQL’s 50 Year Reign? A Story of 2 Turing Awards"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And finally, we've come to the arrival of the SQL that we all love.  &lt;/p&gt;

&lt;p&gt;4 years after the publication of Codd’s paper, Donald Chamberlin and Raymond Boyce published “&lt;a href="https://dl.acm.org/doi/10.1145/800296.811515"&gt;SEQUEL: A Structured English Query Language&lt;/a&gt;”.&lt;/p&gt;

&lt;p&gt;They described SEQUEL as “a set of simple operations on tabular structures, […] of equivalent power to the first order predicate calculus.” IBM saw the potential and moved quickly to develop the first version of SEQUEL as part of their System R project in the early 1970s.&lt;/p&gt;

&lt;p&gt;The name would later change to SQL due to &lt;a href="https://en.wikipedia.org/wiki/SQL"&gt;trademark issues&lt;/a&gt; with the UK-based Hawker Siddeley aircraft company.&lt;/p&gt;

&lt;p&gt;The next big step in SQL's adoption was almost a decade later. In 1986, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) published the first official SQL standard: &lt;strong&gt;SQL-86&lt;/strong&gt;. The standard broke down &lt;a href="https://www.db-book.com/"&gt;SQL in several parts&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Definition Language (DDL):&lt;/strong&gt; commands to define and modify schemas and relations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Manipulation Language (DML):&lt;/strong&gt; commands to query, insert and delete information from a database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transaction Control:&lt;/strong&gt; commands specifying the timing of transactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrity:&lt;/strong&gt; commands to set constraints in the information in a database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Views:&lt;/strong&gt; commands to define Views&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization:&lt;/strong&gt; commands to specify user access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedded SQL:&lt;/strong&gt; commands that specify how to embed SQL in other languages&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Competitors to SQL
&lt;/h3&gt;

&lt;p&gt;Between 1974 to today there have been numerous competitors that have tried stealing market share from SQL’s dominance as a query language. These new syntaxes often catered to a specific, new technology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lisp -&amp;gt; &lt;a href="https://www.quicklisp.org/beta/UNOFFICIAL/docs/clsql/doc/index.html"&gt;CLSQL&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;.NET -&amp;gt; &lt;a href="https://docs.microsoft.com/en-us/dotnet/csharp/linq/"&gt;LINQ&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ruby on Rails -&amp;gt; &lt;a href="https://guides.rubyonrails.org/active_record_querying.html"&gt;ActiveRecord&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fast forward 35 years and SQL is still ubiquitous with databases. How has SQL maintained its reign as a query language, and what can we learn from its story?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Secret to SQLs 50 Year Reign - And What We Can Learn
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EWeoN62J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/stack.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EWeoN62J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.arctype.com/content/images/2021/01/stack.png" alt="What Can We Learn From SQL’s 50 Year Reign? A Story of 2 Turing Awards"&gt;&lt;/a&gt;Above: &lt;a href="https://insights.stackoverflow.com/survey/2017#technology"&gt;Stack Overflow Developer Survey, 2017&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We started this story with Bachman’s introduction of the first database management system, IDS. We talked about how the shift to disk storage and high-level programming necessitated a new way of working with data. Then CODASYL came and standardized database management. IDS and CODASYL introduced the new network data model, and finally Codd dropped the relational model.&lt;/p&gt;

&lt;p&gt;This happened over &lt;em&gt;eight years&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;How did SQL manage to stick around for the next 50 years? I think these are the four main reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Built on first principles&lt;/li&gt;
&lt;li&gt;Bushnell’s law&lt;/li&gt;
&lt;li&gt;Listening and adapting&lt;/li&gt;
&lt;li&gt;Adoption of APIs&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Built on First Principles
&lt;/h3&gt;

&lt;p&gt;A first principle is a foundational proposition that can not be deduced from any other proposition or assumption. For instance, combining hydrocarbons with oxygen to create a chemical reaction. And this is still the principle that powers the internal combustion engines in every cars.&lt;/p&gt;

&lt;p&gt;In 1970, Codd created a new first principle for databases: tuple relational calculus. This new logic created led to the relational model, which then led to SQL. Tuple relational calculus is the chemical reaction, relational models are the internal combustion engines, and SQL is the car.&lt;/p&gt;

&lt;p&gt;In a later post, we’ll discuss new technologies that are trying to create the electric engine for databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bushnell’s Law
&lt;/h3&gt;

&lt;p&gt;Building on first principles alone can not guarantee success. Assembly is as close as programmers can get to typing 1s and 0s, but it was still replaced with COBOL (and later C).&lt;/p&gt;

&lt;p&gt;The missing ingredient was usability.&lt;/p&gt;

&lt;p&gt;We saw the same story play out with the switch from the network to relational model. The network model had faster performance, but every company today uses relational databases because of its simplicity (&lt;em&gt;to get started&lt;/em&gt;).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The best games are easy to learn but hard to master" - Nolan Bushnell, founder of Atari&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Nolan Bushnell knew the secret to getting people to use a new product. Assembly unfortunately was both difficult to learn and difficult to master.&lt;/p&gt;

&lt;p&gt;SQL found the perfect balance. With ~10 SQL commands, anyone can learn the 20% that will get you 80% of the way there.  But there is a long path of indexing, views, and optimization to becoming a master.&lt;/p&gt;

&lt;h3&gt;
  
  
  Listening and Adapting
&lt;/h3&gt;

&lt;p&gt;Query languages are not timeless monoliths but adaptive groups of standards that change over time. The SQL standard has continued to adapt over time and incorporate feedback from its users.&lt;/p&gt;

&lt;p&gt;Since the original conception we have seen 10 different SQL standards all with important updates. Here were 3 big ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQL:1999:&lt;/strong&gt; Added regular expression matching, recursive queries (e.g. transitive closure), triggers, support for procedural and control-of-flow statements, non-scalar types (arrays), and some object-oriented features (e.g. structured types). Support for embedding SQL in Java (SQL/OLB) and vice versa (SQL/JRT).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL:2003:&lt;/strong&gt; Introduced XML-related features (SQL/XML), window functions, standardized sequences, and columns with auto-generated values (including identity-columns).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL:2016:&lt;/strong&gt; Adds row pattern matching, polymorphic table functions, JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SQL also demonstrates the power of creating rails that other products can build on. Instead of enforcing a syntax, SQL provides a standard for each database to create their own implementation (T-SQL, MySQL, PSQL, etc).&lt;/p&gt;

&lt;h3&gt;
  
  
  Adoption of APIs
&lt;/h3&gt;

&lt;p&gt;The final secret behind SQL’s success has been the rise of application programming interfaces (API). APIs simplify programming by abstracting the underlying implementation and only exposing objects or actions the developer needs.&lt;/p&gt;

&lt;p&gt;APIs allow SQL to continue to adapt to new technologies with specialized syntax. In 2006 Hadoop introduced the distributed file system (HDFS), which was initially inaccessible to the SQL syntax. Then in 2013 Apache created Apache Impala, which allowed developers to use SQL to query HDFS databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fascinating Story of SQL
&lt;/h2&gt;

&lt;p&gt;SQL is one of the most ubiquitous programming languages today, but we often forget how long of a history it has had. Its journey started at the dawn of modern computing and was brought to life by 2 Turing award recipients.&lt;/p&gt;

&lt;p&gt;I've shared my thoughts on why SQL has been able to maintain its dominance: first principles, Bushnell’s law, adapting, and APIs. Leave a comment and let me know what you think were the major factors that contributed to SQL’s success.&lt;/p&gt;

&lt;p&gt;There is one more technology to talk about that hasn’t changed in 50 years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL editors&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Working with databases is becoming an increasingly collaborative process as more people learn SQL. Developers today may be seen working with individuals on a marketing team to analyze user data, or debugging queries with a data scientist.&lt;/p&gt;

&lt;p&gt;Arctype has built a collaborative SQL editor that allows you to easily share databases, queries, and dashboards with anyone. Join our growing community and &lt;a href="http://www.arctype.com/downloads"&gt;try out Arctype today&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>discuss</category>
      <category>programming</category>
    </item>
    <item>
      <title>Becoming a SQL Guru with Recursive CTEs</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Wed, 30 Dec 2020 20:08:22 +0000</pubDate>
      <link>https://forem.com/arctype/becoming-a-sql-guru-with-recursive-ctes-5f27</link>
      <guid>https://forem.com/arctype/becoming-a-sql-guru-with-recursive-ctes-5f27</guid>
      <description>&lt;p&gt;&lt;em&gt;Arctype writer: &lt;a href="http://dliff.com/" rel="noopener noreferrer"&gt;Daniel Lifflander&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Intro to CTEs&lt;/li&gt;
&lt;li&gt;Organizing Complex Queries&lt;/li&gt;
&lt;li&gt;Unlocking the Power of CTEs with Recursion&lt;/li&gt;
&lt;li&gt;Recursive CTEs in Action&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Intro to CTEs &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;It's easy to get lost in a maze of subqueries, derived tables, and temporary tables when working with complex SQL queries. Postgres' common table expressions, or CTEs, are a lesser known feature, but they can be a useful structuring tool to craft a SQL query in a more readable and friendly way.&lt;/p&gt;

&lt;p&gt;CTEs begin as a &lt;code&gt;with&lt;/code&gt; statement and allow you to execute a query whose results will be available later to subsequent queries. If you have a query that relies on the results of another query, a reference of sorts, the CTE will allow Postgres to materialize the results of the reference query, ensuring that the query is only run once and its results are readily available to other queries farther down the line. (Note that this can be a double-edged sword, as it prevents the Postgres query planner from running optimization outside of the reference query’s SQL.)&lt;/p&gt;

&lt;p&gt;The sequential list of "code blocks" created by CTEs adds readability to SQL scripts because every block is dependent on the ones above it. In contrast, it can be difficult to trace dependencies between derived tables and parse through nested subqueries (especially as their depth increases).&lt;/p&gt;

&lt;h2&gt;
  
  
  Organizing Complex Queries &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Derived Table Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-11.59.46-AM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-11.59.46-AM.png" alt="Derived Table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporary Table Example&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;This will require two separate queries.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.00.53-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.00.53-PM.png" alt="Temporary Table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CTE Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.01.23-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.01.23-PM.png" alt="CTE"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that the query above depends on a, which is listed first with the CTE, but as a derived table it is listed inside the query. CTEs allow you to logically order queries that are dependent on others. As queries get more complex, &lt;strong&gt;organization&lt;/strong&gt; becomes key to reducing errors. CTEs allow you to organize code into distinct blocks with a clear dependency hierarchy. &lt;/p&gt;

&lt;h2&gt;
  
  
  Unlocking the Power of CTEs with Recursion &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;You can also make CTEs more than syntactic sugar by adding the &lt;code&gt;recursive&lt;/code&gt; keyword! Once you add the recursive keyword, Postgres will now allow you to reference your CTE from within itself, allowing you to “generate” rows based on recursion.   &lt;/p&gt;

&lt;p&gt;Here’s an example using Postgres CTEs to implement the Fibonacci sequence:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.02.00-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.02.00-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running this, we’d get the first 10:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.02.28-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.02.28-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's probably not too often we find ourselves in a situation where we need Fibonacci numbers in SQL. Let’s continue on and take a look at a real problem I solved using this Postgres feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recursive CTEs in Action &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;In the manufacturing industry, a bill of materials (or BoM) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an &lt;a href="https://en.wikipedia.org/wiki/Product_(business)" rel="noopener noreferrer"&gt;end product&lt;/a&gt; … BOMs are of &lt;a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph" rel="noopener noreferrer"&gt;hierarchical nature&lt;/a&gt;, with the top level representing the finished product which may be a sub-assembly or a completed item. BOMs that describe the sub-assemblies are referred to as modular BOMs&lt;/p&gt;

&lt;p&gt;More simply, a BoM is a list of parts that a product is assembled from. Each part may consist of other parts, creating hierarchy. A car is a final product that is composed of many parts. A door is an example of a part of a car, its subparts being among a window, handle, various switches, and so forth. Recursive CTEs are great for hierarchical data! Let’s take a look at how we could model this hierarchy and use CTEs to generate a modular BoM sheet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;part table&lt;/strong&gt; stores our parts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.03.50-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.03.50-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;part_hierarchy&lt;/strong&gt; table stores a simple parent to child reference between parts, as well as a quantity if the part happens to need more than 1 of the child part for assembly. The check constraints prevent a part from referencing itself circularly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.04.24-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.04.24-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's insert some parts&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.04.46-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.04.46-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And some relations, approximately resembling the car example mentioned above&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.05.06-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.05.06-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our database we now have 5 parts and some relationships between them. If we wanted to see how many direct subassemblies a door has, we could query something like this, though it would only show us direct descendants:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.05.44-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.05.44-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, let's now construct a query to list &lt;em&gt;all&lt;/em&gt; of the parts and their subassemblies of our car.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.06.28-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.06.28-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The union all is the key part to this CTE. The first part of this UNION selects the base part, the second half selects any child parts by joining to the part_hierarchy table and calling itself (inducing recursion), then combines those results with the parent via the union. Without recursion, you'd have to add a join for every level of the hierarchy you needed to support.&lt;/p&gt;

&lt;p&gt;The level value is introduced here to track how deep the recursion has run. It is hard coded at 0 to represent the parent. As the query "unwraps" and it calls itself, this increases by 1 to indicate how many generations far from the parent part the child part is.   &lt;/p&gt;

&lt;p&gt;The very last part that selects from the CTE pads the name with spaces based on the level, which is a quick and dirty way to visualize the hierarchy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.06.52-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.arctype.com%2Fcontent%2Fimages%2F2020%2F12%2FScreen-Shot-2020-12-30-at-12.06.52-PM.png" alt="A journey down recursive CTEs."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From here, you could use aggregate functions and rollup to calculate the costs of specific assemblies.   &lt;/p&gt;

&lt;p&gt;This example is definitely simplified from what I ended up using to actually solve this problem. Supporting part versioning and branching and other business requirements quickly complicates what this solution would look like in the real world. Recursive CTEs in SQL are not the only way to store and query hierarchical data, but for certain data types, sizes, and speed requirements, they are a quick and convenient feature to have in your wheelhouse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Arctype
&lt;/h3&gt;

&lt;p&gt;Now you're one step closer to becoming a SQL guru, but don't let an outdated SQL editor hold you back. Check out &lt;a href="//www.arctype.com"&gt;Arctype&lt;/a&gt; today and experience the modern SQL editor. &lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Working with geospatial data in Postgres.</title>
      <dc:creator>Derek Xiao</dc:creator>
      <pubDate>Mon, 28 Dec 2020 19:17:12 +0000</pubDate>
      <link>https://forem.com/arctype/working-with-geospatial-data-in-postgres-1am5</link>
      <guid>https://forem.com/arctype/working-with-geospatial-data-in-postgres-1am5</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffzlksrmjiw8jre1adtct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffzlksrmjiw8jre1adtct.png" alt="Working with geospatial data in Postgres"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL has several extensions that allow spatial and geometry data to be treated as first-class objects within a PostgreSQL database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Working with geospatial data&lt;/li&gt;
&lt;li&gt;Introduction to PostGIS&lt;/li&gt;
&lt;li&gt;Related extensions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Working with geospatial data &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;There are a variety of scenarios in which you may want to work with geospatial data in Postgres for your application, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Working with census data&lt;/li&gt;
&lt;li&gt;Storing addresses&lt;/li&gt;
&lt;li&gt;Calculating the distance between two paths&lt;/li&gt;
&lt;li&gt;Storing PointCloud data of the physical world&lt;/li&gt;
&lt;li&gt;Tracking shipping data&lt;/li&gt;
&lt;li&gt;Tracking cars and delivery vehicles&lt;/li&gt;
&lt;li&gt;Visualization of raster data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PostgreSQL offers &lt;a href="https://www.postgresql.org/download/products/6-postgresql-extensions/" rel="noopener noreferrer"&gt;extensions&lt;/a&gt; for working with geospatial data that allow you to treat that data as first-class objects in your database. Treating data as objects allows developers to create more powerful applications that can be built on top of data about the objects and relationships between them in the physical world.&lt;/p&gt;

&lt;h2&gt;
  
  
  PostGIS &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;The primary spatial-data extension is PostGIS. &lt;a href="https://postgis.net/" rel="noopener noreferrer"&gt;PostGIS&lt;/a&gt; (Geographic Information Systems) is an open-source extension of the PostgreSQL database that lets you work with geographic objects that integrate directly with your database. With PostGIS, geographic and spatial data can be treated as first-class objects in your database.   &lt;/p&gt;

&lt;p&gt;By adding the PostGIS extension to your PostgreSQL database, you can work seamlessly with geospatial data without having to convert that data from the format that the rest of your application is working with to use with your database. You can also determine relationships between that spatial data with the extension, such as the distance between two objects in your database. You can also use PostGIS to render visualizations of this data.   &lt;/p&gt;

&lt;p&gt;Working with data such as cities and geometry data is as simple as something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;SELECT&lt;/strong&gt; superhero.name&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FROM&lt;/strong&gt; city, superhero&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHERE&lt;/strong&gt; ST_Contains(city.geom, superhero.geom)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AND&lt;/strong&gt; city.name = 'Gotham';&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;PostGIS includes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spatial Types&lt;/li&gt;
&lt;li&gt;Point&lt;/li&gt;
&lt;li&gt;Line&lt;/li&gt;
&lt;li&gt;Polygon&lt;/li&gt;
&lt;li&gt;Etc&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hierarchy of these spatial-focused type (from &lt;a href="https://postgis.net/workshops/postgis-intro/introduction.html" rel="noopener noreferrer"&gt;Introduction to PostGIS&lt;/a&gt;) is below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2FwufnED6YRf_o4EpPb0igYM6jOt4gltS65BzTOhCNU4Ndi1-O_hLy_4FrWFYt8BhXdchWdZUamKcdixycdwRtJ0f2tvEgFNlsNLaDtHRAudaQtpjRyqM1r4ZU0epX1n4JXU8BbiWx" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2FwufnED6YRf_o4EpPb0igYM6jOt4gltS65BzTOhCNU4Ndi1-O_hLy_4FrWFYt8BhXdchWdZUamKcdixycdwRtJ0f2tvEgFNlsNLaDtHRAudaQtpjRyqM1r4ZU0epX1n4JXU8BbiWx" alt="Working with geospatial data in Postgres."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spatial-Indexing&lt;/li&gt;
&lt;li&gt;Efficiently index spatial relationships&lt;/li&gt;
&lt;li&gt;Spatial-Functions&lt;/li&gt;
&lt;li&gt;For querying spatial properties, and the relationships between them&lt;/li&gt;
&lt;li&gt;Functions for analyzing geometric components, determining spatial relationships, and manipulating geometries
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In most databases, data is stored in rows and columns. With PostGIS, you can actually store data in a geometry column. This column stores data in a spatial coordinate system that’s defined by an SRID (&lt;a href="https://spatialreference.org/" rel="noopener noreferrer"&gt;Spatial Reference Identifier&lt;/a&gt;). This allows your database structure to reflect the spatial data that’s stored in the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Extensions &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;There are other PostgreSQL extensions related to PostGIS for working with spatial data, too:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="http://pgrouting.org/" rel="noopener noreferrer"&gt;pgRouting&lt;/a&gt; - an extension of PostGIS itself; pgRouting enables geospatial routing information such as:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shortest distance&lt;br&gt;&lt;br&gt;
 Driving distance&lt;br&gt;&lt;br&gt;
 Traveling salesman&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/pramsey/pgsql-ogr-fdw" rel="noopener noreferrer"&gt;ogrfdw&lt;/a&gt; - a data wrapper for reading other spatial and non-spatial datasources as tables in PostgreSQL&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pgpointcloud/pointcloud" rel="noopener noreferrer"&gt;pgpointcloud&lt;/a&gt; A PostgreSQL extension and loader for storing Point Cloud data in PostgreSQL.&lt;/li&gt;
&lt;li&gt;PointCloud data about the physical environment that is gathered using 3D cameras, and used in application areas such as AR, VR, and robotics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PostgreSQL’s extensions for working with geospatial data allow you to work with data as first-class objects in your database. Check out &lt;a href="https://arctype.com" rel="noopener noreferrer"&gt;Arctype&lt;/a&gt; to discover the modern SQL editor for working with databases. &lt;/p&gt;

</description>
      <category>sql</category>
      <category>postgres</category>
      <category>database</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
