Forem: Derek Xiao

A Powerful Addition to Your Postgres Toolbelt: Table Inheritance

Derek Xiao — Fri, 26 Feb 2021 18:41:50 +0000

Table inheritance is a less commonly used Postgres feature, but it has the power to save time with both data retrieval and database management.

In this article, I’ll cover how inheritance works in Postgres and provide some examples of when to use inheritance.

To follow along with the examples in this article, try Arctype's free SQL editor to quickly connect to a Postgres database:

Try Arctype today

What is Table Inheritance in Postgres?

Inheritance is one of the main principles of object-oriented programming. It is a process for deriving one object from another object so that they have shared properties.

Inheritance in PostgreSQL allows you to create a child table based on another table, and the child table will include all of the columns in the parent table.

Let's take a database that's used to store blueprints for different types of homes.

There are some things that we know every home will have such as: bedrooms, bathrooms, and kitchens. We can create a parent table to store these shared attributes.

Now let's say we wanted to add a blueprint for a home with a patio. This new blueprint is identical to our existing one, but with a new room. Instead of recreating the entire blueprint, we can create a new "child" table that inherits the parent table.

We now have a copy of the main parent blueprint with a new "patio" item, without creating a duplicated blueprint.

Why should I use inheritance?

The two main benefits are:

More performant queries
Easier database management

More performant queries

Inheritance lets you split data into smaller child tables that share the parent's columns. This in effect partitions the data, which can speed up retrieval.

Imagine you are fetching data that is BETWEEN two dates. There is a parent table called year_sales and inherited tables with data for each month.

A query for all sales between 2020-10-01 and 2020-10-15 would then only need to scan the October table, provided each child table has a CHECK constraint on its date range and constraint exclusion is enabled so the planner can skip the other months.

Inherited tables also create more manageable indexes. Each individual table contains less data, which speeds up searches both with and without an index.
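
To make the pruning idea concrete, here is a toy Python model (not Postgres code) of how a planner could use each child table's date range to decide which monthly tables to scan. The table names and the children_to_scan helper are hypothetical and exist only for this sketch.

```python
from datetime import date

# Hypothetical monthly child tables of year_sales. Each half-open
# [start, end) range mirrors a CHECK constraint on the child's date column.
CHILD_RANGES = {
    "sales_2020_09": (date(2020, 9, 1), date(2020, 10, 1)),
    "sales_2020_10": (date(2020, 10, 1), date(2020, 11, 1)),
    "sales_2020_11": (date(2020, 11, 1), date(2020, 12, 1)),
}


def children_to_scan(query_start, query_end):
    """Return the children whose range overlaps the inclusive query range."""
    return [
        name
        for name, (start, end) in CHILD_RANGES.items()
        if start <= query_end and query_start < end
    ]


print(children_to_scan(date(2020, 10, 1), date(2020, 10, 15)))
# ['sales_2020_10']
```

Only the October table overlaps the requested range, so the other months can be skipped entirely.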

Easier database management

Making future schema changes is easier because you only have to make one change to the parent table and then it's propagated to each inherited table. This saves time and lessens the chances of accidental divergences.

Maintenance commands such as VACUUM FULL or REINDEX can also be run on one inherited table at a time, so they lock only that table rather than all of the data.

Example 1: Using inheritance to store table statistics by month

One of the most popular use cases for table inheritance is storing information divided by months. This gives the benefit of partitioning your data for faster queries.

I've used this solution to architect schemas for situations including:

1. Process execution audits

Inherited tables can be used to track data that is continuously loaded/unloaded into the system, user requests and computational processes, and other important information for monitoring the system's health.

2. User actions auditing for critical modules in the application

You can create an audit system to track who changed the data in the system and when they did it. If the system has many users, it produces a lot of data, so table inheritance can help keep data access fast.

Let's dive into an example.

First, create schema “example1”:

CREATE SCHEMA example1
    AUTHORIZATION postgres;

Then create a parent logging table:

CREATE TABLE example1.logging
(
    id integer NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    event_name character varying NOT NULL,
    start_time timestamp(6) without time zone NOT NULL,
    end_time timestamp(6) without time zone NOT NULL,
    CONSTRAINT february_log_pkey PRIMARY KEY (id, start_time, end_time)
)
TABLESPACE pg_default;

Create a child logging table for a specific month and year, which inherits fields from the parent table:

CREATE TABLE example1.january_log_2021
(
    CONSTRAINT start_time CHECK (start_time >= '2021-01-01' AND start_time < '2021-02-01')
)
    INHERITS (example1.logging)
TABLESPACE pg_default;

The code adds a CHECK constraint on the “start_time” column, which keeps every row's start time within January.
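
If you create one child table per month, writing each statement by hand gets repetitive. Below is a minimal sketch of a helper that generates the DDL for any month. The monthly_child_ddl function and the unnamed CHECK constraint are assumptions for illustration; the sketch uses a half-open date range so that timestamps on the last day of the month are still accepted.

```python
from datetime import date

MONTH_NAMES = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)


def monthly_child_ddl(year, month, parent="example1.logging"):
    """Build CREATE TABLE ... INHERITS DDL for one month of log data."""
    start = date(year, month, 1)
    # First day of the following month; December rolls over to January.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    table = f"example1.{MONTH_NAMES[month - 1]}_log_{year}"
    return (
        f"CREATE TABLE {table}\n"
        f"(\n"
        f"    CHECK (start_time >= '{start}' AND start_time < '{end}')\n"
        f")\n"
        f"    INHERITS ({parent});"
    )


print(monthly_child_ddl(2021, 1))
```

The function only builds a string; you would still run the resulting statement yourself in your SQL editor.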

Fill in the logging table for January 2021:

INSERT INTO example1.january_log_2021(id,
    event_name, start_time, end_time)
    VALUES 
           (1, 'Log in', '2021-01-11 03:26:11', '2021-01-11 03:26:13'),
           (2, 'Log out', '2021-01-03 12:11:17', '2021-01-03 12:11:18'),
           (3, 'Upload file xml', '2021-01-06 16:14:28', '2021-01-06 16:14:59'),
           (4, 'Delete data', '2021-01-05 23:01:55', '2021-01-05 23:01:58');

We can check if the data was successfully inserted with a SELECT command:

SELECT * FROM example1.january_log_2021;

Rows stored in a child table are also returned when you query the parent table, even though they are not physically copied into it.

Example #2: Using inheritance to track the movement of ships

Another instance when I used inheritance was storing information about ships and their movements based on geolocation.

Each ship had both common and unique values. Because of this quality, I decided to design the schema using inheritance, and I created a separate table for each ship based on the parent table.

Using an inheritance model gave us the following benefits:

When the application built the map of the ship's movement, it only had to reference the individual ship's table, increasing the speed of report building and monitoring.
Adding new specific fields for certain ship types did not require changing all tables.
We could retrieve baseline data for all ships using a single request.

Here's how we did it.

First create a new schema called “test”:

CREATE SCHEMA test
    AUTHORIZATION postgres;

Create a table for ships:

CREATE TABLE test.ship
(
    AIS_name text NOT NULL,
    type text NOT NULL,
    flag text NOT NULL,
    IMO character varying NOT NULL,
    MMSI character varying NOT NULL,
    callsign character varying NOT NULL,
    year_built character varying NOT NULL,
    length character varying NOT NULL,
    width character varying NOT NULL,
    draught character varying NOT NULL,
    speed character varying NOT NULL,
    AIS_class character varying NOT NULL,
    cargo character varying,
    CONSTRAINT ship_pkey PRIMARY KEY (MMSI)
)

TABLESPACE pg_default;

Create the table "sail_ship", which is a child to table “ship”:

CREATE TABLE test.sail_ship
(
    -- Inherited from table test.ship: AIS_name text NOT NULL,
    -- Inherited from table test.ship: type text NOT NULL,
    -- Inherited from table test.ship: flag text NOT NULL,
    -- Inherited from table test.ship: IMO character varying NOT NULL,
    -- Inherited from table test.ship: MMSI character varying NOT NULL,
    -- Inherited from table test.ship: callsign character varying NOT NULL,
    -- Inherited from table test.ship: year_built character varying NOT NULL,
    -- Inherited from table test.ship: length character varying NOT NULL,
    -- Inherited from table test.ship: width character varying NOT NULL,
    -- Inherited from table test.ship: draught character varying NOT NULL,
    -- Inherited from table test.ship: speed character varying NOT NULL,
    -- Inherited from table test.ship: AIS_class character varying NOT NULL,
    -- Inherited from table test.ship: cargo character varying,
    id_sail integer NOT NULL,
    course text NOT NULL,
    navigation_status text NOT NULL,
    CONSTRAINT sail_ship_pkey PRIMARY KEY (id_sail)
)
    INHERITS (test.ship)
TABLESPACE pg_default;

Now fill in table "sail_ship":

INSERT INTO test.sail_ship(AIS_name, type, flag, IMO, MMSI, callsign, year_built, length, width, draught, speed, AIS_class, cargo, id_sail, course, navigation_status)
    VALUES ('A P T JAMES', 'Ferry', 'Trinidad and Tobago', '9877717', '362254000', '9YNM', '2020', '94 m', '26 m', '2.9 m/', '13.1 kn/20.2 kn', '-', '-', 1, '-', '-'),
           ('MOZART', 'Container ship', 'Liberia', '9337274', '636018378', 'A8MA9', '2007', '222 m', '30 m', '10.4 m /', '12.9 kn / 23.6 kn', '-', 'Containers', 2, '-', '-'),
           ('ALIANCA SKY', 'Bulk carrier', 'Liberia', '9128441', '636014513', 'A8UK6', '1997', '186 m', '30 m', '8.8 m /', '10.2 kn / 17.2 kn', '-', 'Agricultural Commodities', 3, '-', '-'),
           ('XXX7', 'Ship', 'China', '-', '412444890', 'BVMY5', '-', '-', '-', '-/', '60.5 kn / 66.1 kn', '-', '-', 4, '-', '-');

The table “sail_ship” inherits all the columns of its parent table, “ship".

Let's also add some data to the parent table and see what happens:

INSERT INTO test.ship(AIS_name, type, flag, IMO, MMSI, callsign, year_built, length, width, draught, speed, AIS_class, cargo)
    VALUES ('A P T JAMES', 'Ferry', 'Trinidad and Tobago', '9877717', '362254000', '9YNM', '2020', '94 m', '26 m', '2.9 m/', '13.1 kn/20.2 kn', '-', '-');

Querying the parent table now shows that the row has been added, but the primary key value is repeated across the hierarchy, so it is no longer unique.

This is one of the caveats with inheritance in Postgres. No error was raised when we inserted into the parent table, even though the same MMSI already exists in the child table, because the primary key is enforced on each table separately.

We can leave the child tables' rows out of the result by retrieving data from the “ship” table using the ONLY keyword.

Here, the ONLY keyword indicates that the query should only be applied to the “ship” table and not to tables below ship in the inheritance hierarchy.

SELECT AIS_name, type, flag, year_built FROM ONLY test.ship;
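
The behavior above can be pictured with a small toy model in Python. This is only an illustration of the semantics, not how Postgres is implemented; the Table class and its methods are hypothetical. Each table checks uniqueness against its own rows only, a query on the parent also returns child rows, and only=True plays the role of the ONLY keyword.

```python
class Table:
    """Toy stand-in for a table with its own primary-key index."""

    def __init__(self, name, parent=None):
        self.name = name
        self.rows = {}      # primary key -> row; unique per table only
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def insert(self, key, row):
        # Uniqueness is checked against this table alone, mirroring how a
        # primary key on the parent does not cover inherited tables.
        if key in self.rows:
            raise ValueError(f"duplicate key {key!r} in {self.name}")
        self.rows[key] = row

    def select(self, only=False):
        """Return (table_name, key) pairs; only=True mimics FROM ONLY."""
        result = [(self.name, key) for key in self.rows]
        if not only:
            for child in self.children:
                result.extend(child.select())
        return result


ship = Table("ship")
sail_ship = Table("sail_ship", parent=ship)
sail_ship.insert("362254000", "A P T JAMES")
ship.insert("362254000", "A P T JAMES")  # accepted: a different table

print(ship.select())           # rows from ship and from sail_ship
print(ship.select(only=True))  # rows stored in ship itself
```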

Caveats to be aware of with PostgreSQL inheritance

You can't rename an inherited column on a child table with ALTER TABLE ... RENAME; rename it on the parent table instead;
Primary key, unique, and foreign key constraints are not inherited, so they apply only to the table they are defined on;
The inheritance mechanism does not automatically distribute data from INSERT or COPY commands across tables in an inheritance hierarchy. INSERT inserts only into the specified table and no other;
Access rights are checked per table: a user who queries a child table directly needs privileges on that child table, not just on the parent;
Dropping columns needs care. ALTER TABLE ... DROP COLUMN on the parent removes the column from a child table only if the child inherited it and never defined it independently; with ONLY, the child table keeps the column.

Conclusion

In this article we covered:

How inheritance in Postgres works
Why you should use inheritance in your applications
Two examples for how inheritance is used in real applications

If you are looking for a SQL editor that makes working with databases even easier, try Arctype today for free:

Try Arctype today

Add Your Database to Your Spring Cleaning List

Derek Xiao — Tue, 16 Feb 2021 21:50:00 +0000

Every time you delete or update a row in your database, the old records are secretly still hiding in the background and taking up space on your hard drive.

A VACUUM process is like emptying the recycling bin on your laptop. It clears up space, reduces indexing time, and keeps your database squeaky clean.

But it's important to understand how VACUUM works so you can avoid the equivalent of vacuuming your house in the middle of a dinner party.

By understanding how and when Postgres and other databases clean themselves, you will be able to tune your database for low response times and your database server for the right amount of storage.

In this post we will cover:

What a VACUUM process is and how it works
Full vs Auto VACUUM
How to modify and inspect this process

How Postgres Executes SQL Statements - DEAD vs Removed Tuples

Updating a Postgres Table. SQL Editor: Arctype

During the life of a database, we might make thousands of changes to a database like above. But deleting a record does not actually free up the disk space.

Postgres uses multi-version concurrency control (also known as MVCC) to ensure concurrent data access and consistency.

Whenever we execute a SQL statement, it uses a snapshot of data instead of every row. This prevents users from viewing inconsistent data generated by concurrent transactions. It also minimizes lock contentions for different sessions trying to read or write data.

Each transaction, meaning a block of statements between BEGIN and COMMIT, gets a transaction ID (XID). When a user inserts a row but the transaction is not yet committed, other users cannot see the newly inserted row.

For example, in the image below, User A inserts two records into an empty table. If User B were to run a SELECT statement before that transaction committed, they would get zero rows. Similarly, other users still see a row that is being deleted until the deleting transaction is committed.

MVCC data access example

When we execute a DELETE or UPDATE statement, Postgres does the following:

DELETE command: Postgres does not remove the tuples physically; it marks the existing tuple as DEAD.
UPDATE command: The update statement is similar to a DELETE plus an INSERT statement. Therefore, it marks the existing tuple as DEAD and inserts a new tuple.

If you have frequent DML (INSERT, UPDATE, DELETE) statements, these DEAD tuples can take up unnecessary space. To free up space, we have to run the following maintenance processes:

Remove the dead tuples
Remove index tuples pointing to the dead tuples
Update the statistics

With this knowledge of how a SQL statement is executed and the maintenance requirements, we can discuss the VACUUM process.

Cleaning up dead tuples with a VACUUM in Postgres

Postgres uses the VACUUM maintenance process for removing DEAD tuples. It reclaims the space occupied by dead tuples so that future inserts can reuse it.

The VACUUM process can run concurrently with other DML transactions because it does not put an exclusive lock on the table. It carries out the following operations for removing dead tuples:

Postgres scans all pages of a target table and builds a list of dead tuples. It freezes the old tuples if required.
It removes the index tuples pointing to the dead tuples by referring to the dead tuple list.
It updates the statistics as well as the system catalog for the target table after the VACUUM processing. It also updates the FSM (Free Space Map) and VM (Visibility Map).
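
The life cycle of a tuple described so far can be sketched with a toy Python model. It is a simplification for intuition only and not how Postgres stores data; the ToyHeap class and its slot list are hypothetical. DELETE and UPDATE leave dead tuples behind, and vacuum frees their slots for reuse without shrinking the table.

```python
class ToyHeap:
    """Toy model of a table's tuple slots; not the real storage format."""

    def __init__(self):
        # Each slot is {"row": ..., "dead": bool}, or None once freed.
        self.slots = []

    def insert(self, row):
        # Reuse a slot freed by vacuum before growing the table.
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = {"row": row, "dead": False}
                return i
        self.slots.append({"row": row, "dead": False})
        return len(self.slots) - 1

    def delete(self, i):
        self.slots[i]["dead"] = True  # marked DEAD, still occupies space

    def update(self, i, row):
        self.delete(i)           # the old version becomes DEAD
        return self.insert(row)  # the new version is a fresh tuple

    def vacuum(self):
        # Free the space of dead tuples for reuse; the slot count stays.
        for i, slot in enumerate(self.slots):
            if slot is not None and slot["dead"]:
                self.slots[i] = None

    def dead_count(self):
        return sum(1 for s in self.slots if s is not None and s["dead"])


heap = ToyHeap()
first = heap.insert("a")
heap.insert("b")
heap.update(first, "a2")
print(heap.dead_count(), len(heap.slots))  # 1 3
heap.vacuum()
print(heap.dead_count(), len(heap.slots))  # 0 3
print(heap.insert("c"))                    # 0
```

After the vacuum, the table still has three slots, but the next insert lands in the freed slot instead of extending the table.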

Postgres VACUUM Example

I'll demonstrate the impact of VACUUM by creating an example table, deleting some values, and then running a VACUUM command. In this example I also use the pg_freespacemap extension to monitor improvements in space utilization.

Create a table with an auto-generated series of data

create table SampleTable(id1 int, id2 int);

insert into
    SampleTable
values (
    generate_series(1,100000),
    generate_series(1,100000)
);

Measure free space usage using pg_freespacemap

create extension pg_freespacemap;

SELECT
  count(*) as npages,
  round(100 * avg(avail) / 8192, 2) as avg_fs_ratio
FROM
  pg_freespace(
    'SampleTable');

SQL Editor: Arctype

The table we created uses up 443 pages of space on the hard drive, and because all of the data was added sequentially, it has a perfect free space ratio of 0.

Now I'll delete every record with a value greater than 100. But if we re-run the command above, the number of pages and free space ratio don't change.

delete from SampleTable where id1 > 100;

To remove the unused pages, we have to run a VACUUM command.

VACUUM sampletable;

SQL Editor: Arctype

Now when we track the space usage, we can see the number of used pages has gone down from 443 to 1! But the ratio of free to used space on this page has also increased from 0 to 49%. We can return this space to the operating system with a VACUUM FULL.
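
The numbers above can be roughly reproduced with some back-of-the-envelope arithmetic. The sketch below assumes that the delete keeps the first 100 rows, as described in the text, and it assumes typical on-disk sizes (24-byte page header, 4-byte line pointer, 24-byte tuple header, free space tracked in steps of 1/256 of a page). It also assumes that the page keeps all of its line pointers after the vacuum. These sizes are assumptions for illustration, and other Postgres versions may report different values.

```python
PAGE_SIZE = 8192             # default Postgres block size in bytes
PAGE_HEADER = 24             # assumed page header size
LINE_POINTER = 4             # assumed per-tuple line pointer
TUPLE_SIZE = 24 + 8          # assumed tuple header + two 4-byte integers
FSM_STEP = PAGE_SIZE // 256  # assumed free space map granularity

rows = 100_000
rows_per_page = (PAGE_SIZE - PAGE_HEADER) // (LINE_POINTER + TUPLE_SIZE)
pages = -(-rows // rows_per_page)  # ceiling division

# After the delete and VACUUM, 100 live tuples remain on the first page.
used = PAGE_HEADER + rows_per_page * LINE_POINTER + 100 * TUPLE_SIZE
free = (PAGE_SIZE - used) // FSM_STEP * FSM_STEP
free_ratio = round(100 * free / PAGE_SIZE, 2)

print(rows_per_page, pages, free_ratio)  # 226 443 49.61
```

Under these assumptions the estimate lands on the 443 pages and the roughly 49% free space reported above.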

What does VACUUM FULL do?

Vacuum full diagram

The VACUUM process frees the space held by DEAD tuples for future reuse, but it does not return the space to the operating system.

Therefore, if you perform bulk deletions or updates, you might be using too much storage due to space occupied by these DEAD tuples. The VACUUM FULL process returns the space to the operating system, as seen in the picture below. It performs the following tasks:

It obtains an exclusive lock on the table.
It creates a new, empty storage file for the table.
It copies the live tuples to the new storage file.
It removes the old table file and frees the storage.
It rebuilds all associated table indexes and updates the system catalogs and statistics.

Let's see how running the VACUUM FULL command impacts our previous sample table:

vacuum full SampleTable;

SELECT
  count(*) as npages,
  round(100 * avg(avail) / 8192, 2) as avg_fs_ratio
FROM
  pg_freespace(
    'SampleTable');

SQL Editor: Arctype

The free space ratio is now down from almost 50% to 0.

Postgres VACUUM Performance

VACUUM cleaning is costly because it needs to scan all pages of a target table. If you have a large table with millions of rows, it can strain your database resources. To preserve resources, Postgres uses the Visibility Map. Each table in Postgres has a VM that determines whether the page in the table has dead tuples. If the page does not have a dead tuple, the vacuum processing skips the page.

For example, in the image below, we have a table with four pages. Two pages have DEAD tuples. The visibility map uses a bitmap that defines dead tuples on a specific page.

Bit 0: No dead tuples on the page; therefore, skip VACUUM processing.
Bit 1: Page contains dead tuples; therefore, VACUUM that specific page.

Postgres visibility map to track dead tuples

Postgres Autovacuum Daemon

Postgres automates VACUUM processing with the autovacuum daemon. By default, it wakes up every minute (autovacuum_naptime) and can run up to three worker processes (autovacuum_max_workers). These workers do the VACUUM processing on the target tables.

You can query pg_settings to check various configurations for the autovacuum process in Postgres:

select name,
    setting,
    category,
    short_desc
from pg_settings 
where name like '%autovacuum%';

Postgres autovacuum configuration. SQL Editor: Arctype

How to modify autovacuum for a specific table in Postgres

Automatically vacuuming large tables on the default schedule might not be optimal if you have millions of rows. Therefore, we can configure autovacuum at the table level. A table-level configuration overrides the global setting for that table.

For example, in the query below, we tell autovacuum to process SampleTable2 once it has more than 100 DEAD tuples.

ALTER TABLE SampleTable2 SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 100);
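
To see why these two settings are changed together, it helps to write out the trigger condition. Autovacuum vacuums a table once its dead tuples exceed the threshold plus the scale factor times the table's row count. The helper functions below are hypothetical names for a quick calculation; the defaults of 50 and 0.2 are the standard Postgres defaults.

```python
def autovacuum_vacuum_limit(reltuples, threshold=50, scale_factor=0.2):
    """Dead-tuple count above which autovacuum vacuums a table."""
    return threshold + scale_factor * reltuples


def needs_vacuum(dead_tuples, reltuples, threshold=50, scale_factor=0.2):
    """True when the dead tuples exceed the autovacuum limit."""
    limit = autovacuum_vacuum_limit(reltuples, threshold, scale_factor)
    return dead_tuples > limit


# Default settings on a table with 100,000 rows: roughly 20,050 dead tuples.
print(autovacuum_vacuum_limit(100_000))

# Table-level settings from the ALTER TABLE statement: a fixed limit of 100.
print(autovacuum_vacuum_limit(100_000, threshold=100, scale_factor=0))
```

With the scale factor set to 0, the limit no longer grows with the table, which is why the statement above triggers at a fixed 100 dead tuples.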

VACUUM vs VACUUM FULL

As we know, the VACUUM FULL process returns space to the operating system. However, VACUUM FULL requires an exclusive lock on the table while it runs, and it blocks all other transactions on that table. The table below summarizes the differences between these processes.

Vacuum vs Vacuum Full

Conclusion

In this article, we covered:

How Postgres implements delete and update statements
Using VACUUM to remove DEAD tuples
Using VACUUM FULL to return space back to the OS
Configuring auto vacuum for individual tables

These concepts are useful for reducing the server costs of databases while maintaining high availability. With the queries shown in the above article you can inspect the internals of your database and its storage consumption.

If you want a SQL editor with an intuitive interface and easy data visualizations, try Arctype today.

Data visualizations with Arctype

Create a Web App and Deploy to the Cloud in 20 minutes with Python

Derek Xiao — Wed, 10 Feb 2021 23:08:16 +0000

SQL Editor: Arctype

If you’ve been looking to deploy your first web app to the cloud, this is a great place to start!

In this tutorial, I'm going to show how to make the web app and database shown in the gif above, and how to deploy it to Heroku so it can be used by anyone.

The article is divided into 3 sections:

Creating a Flask app (web application for submitting the form)
Setting up a Postgres database with Python (store the data from the submitted forms)
Deploying the application to Heroku (hosting the application in the cloud so anyone can use it)

I provide a brief overview of each technology at the beginning of each section, so do not be deterred if you aren't familiar with some of these.

Technical Requirements

This guide is targeted towards beginner to intermediate programmers with some familiarity with programming and using the command line.

You will need the following to get started:

PostgreSQL: You need to download and install Postgres on your local computer.
Python 3.6 or newer: Python installers for different versions and OS are available for download here.
Heroku Account: You need to create a free Heroku account if you do not already have one. This is where we will deploy the flask app and connect it to a remote Postgres database.

Once you have all the above installed, we can start by setting up our development environment.

Creating a Flask App for a Registration Form

Flask app example

In this section, we're going to create the Flask app shown above.

I’ve already created an example Flask app that renders a simple registration form used to collect information from a user.

Flask is one of the most popular web frameworks written in Python. The flask application I made first makes a request to the endpoints defined in the app/routes.py file to retrieve the data that is displayed in the registration form.

It then renders the HTML pages contained in the Template folder using the Jinja Template library.

Instead of starting from scratch, let’s make a copy of the Flask app I created by cloning the Github repo.

Open a command line tool and run the following commands:

git clone https://github.com/ToluClassics/Web_Registration_Form.git
cd Web_Registration_Form

If you ever get lost, you can view the completed project here: Flask-Postgres App

Next we’re going to create a virtual environment for this project and install the required dependencies. A virtual environment is an isolated environment for different Python projects. They are helpful to keep packages and dependencies separate between different projects.

Depending on your computer’s operating system, run the following commands:

Windows (in a bash shell such as Git Bash):
$ python -m venv env
$ source env/Scripts/activate
(env) $ pip install -r requirements.txt

MacOS and UNIX:
$ python3 -m venv env
$ source env/bin/activate
(env) $ pip install -r requirements.txt

To test if our environment is properly set up, let’s try launching the application by entering flask run in the virtual environment.

(env) derekxiao@Dereks-MBP Web_Registration_Form % flask run
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Now you can enter the URL above in any browser and view the web form!

Adding input validations for the registration form

To get more familiar with the code, let’s add input validations to the form class.

We want our app to prevent users from filling out the form with the same username or email address multiple times.

With wtforms, you can create a custom validator that is automatically applied to a field by creating a method with the name validate_<field_name>. In the app/form.py file, we will add two methods, validate_email and validate_username. Each one queries the Registration table (we’ll create this table later) for the username or email entered by the user. If the query returns a record, we raise a validation error.

Open the form.py file and first import Registration from the models.py file (we'll add this file later). The methods below also use ValidationError, so import it from wtforms.validators if form.py does not already do so.

from app.models import Registration

And in the RegistrationForm class add the following validation methods:

def validate_email(self, email):
    user = Registration.query.filter_by(email = email.data).first()
    if user is not None:
        raise ValidationError('Please use a different email address')

def validate_username(self, username):
    user = Registration.query.filter_by(username = username.data).first()
    if user is not None:
        raise ValidationError('Please use a different username')
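
If the validate_<field_name> convention feels like magic, the toy class below shows the general idea using only the standard library. It is a simplified illustration and not the wtforms implementation; ToyForm, its constructor arguments, and the local ValidationError class are hypothetical stand-ins.

```python
class ValidationError(Exception):
    """Local stand-in for the wtforms ValidationError."""


class ToyForm:
    """Toy form that calls validate_<field_name> for each field."""

    def __init__(self, data, taken_usernames):
        self.data = data
        self.taken_usernames = taken_usernames
        self.errors = {}

    def validate(self):
        for name, value in self.data.items():
            # Look up a method named after the field, if one exists.
            validator = getattr(self, f"validate_{name}", None)
            if validator is None:
                continue
            try:
                validator(value)
            except ValidationError as exc:
                self.errors[name] = str(exc)
        return not self.errors

    def validate_username(self, value):
        if value in self.taken_usernames:
            raise ValidationError("Please use a different username")


form = ToyForm({"username": "derek", "email": "derek@example.com"}, {"derek"})
print(form.validate())  # False
print(form.errors)      # {'username': 'Please use a different username'}
```

The email field has no matching method in this toy class, so it is skipped, while the username check runs automatically because of its name.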

Now let's add in the models.py file so we can test the input validation.

Creating a PostgreSQL Database

First we need to create the database that models.py will connect to.

For this tutorial we'll be using a PostgreSQL database. PostgreSQL is an open source, object-relational database. And the best part, it’s completely free.

Let’s start by creating a new Postgres database to store the data from submitted registration forms. If you already have a Postgres database, feel free to jump to the next section to start connecting your Flask app to the database.

If you haven’t yet, first download and install Postgres on your computer.

Now open a new command line window and type psql. psql is a terminal-based interface to manage your Postgres databases.

Run the command create database [database name]; to create a new database.

Below is a list of popular psql commands to manage your database:

\l : list all available databases
\dt : list all tables
\d table_name : describe a table, showing its columns, column types, etc.
\du : list all users and their roles
\? : get a list of all psql commands
\q : quit the psql terminal

Alternatively, you can avoid memorizing the different psql commands and use Arctype's free SQL editor that provides a modern interface for managing both Postgres and MySQL databases:

Try Arctype Today

Connecting a Flask app to a Local Postgres Database

Next we'll create the database connection to our new Postgres database that will be used inside our models.py file. To get started, there are 3 packages to install:

flask-sqlalchemy: A python extension for managing databases
flask-migrate: For making updates to an existing database
psycopg2: A PostgreSQL database adapter for Python

In the activated virtual environment terminal run (env) $ pip install flask-sqlalchemy flask-migrate psycopg2 to install the packages listed above. If you hit an error installing psycopg2, you may need to install psycopg2-binary instead.

In the following sections, I'll show how to use each package to configure and integrate our Postgres database with our Flask application. The end result will be a Flask app that is able to add new entries to a Postgres table.

Manage Postgres databases from Python with SQLAlchemy

SQLAlchemy is a library that provides a way to seamlessly integrate python programs and databases. We will use SQLAlchemy as an Object Relational Mapper to convert python classes to tables in our Postgres database.

Flask-SQLAlchemy automatically translates the created models to the syntax of any database (Postgres in our case).

To interact with our database from the application, we need to add the following configuration variables to the config file:

SQLALCHEMY_DATABASE_URI: this is the connection string that tells our Flask app which database to connect to. The usual form of the postgres database connection string is postgresql://localhost/[YOUR_DATABASE_NAME]. If your Postgres user has a password, you will need to use a different URI variant.
SQLALCHEMY_TRACK_MODIFICATIONS: We set this to False to turn off Flask-SQLAlchemy's modification tracking, which we don't need, and to avoid the warning that is shown when the setting is left unset.
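
For a Postgres user with a password, the URI usually takes the form postgresql://user:password@host:port/database. Below is a minimal sketch of a helper that assembles such a URI and escapes special characters in the credentials. The build_postgres_uri function, the registrations database name, and the app_user credentials are hypothetical examples.

```python
from urllib.parse import quote_plus


def build_postgres_uri(database, user=None, password=None,
                       host="localhost", port=5432):
    """Assemble a postgresql:// URI, escaping special characters."""
    if user is None:
        return f"postgresql://{host}/{database}"
    credentials = quote_plus(user)
    if password is not None:
        credentials += ":" + quote_plus(password)
    return f"postgresql://{credentials}@{host}:{port}/{database}"


print(build_postgres_uri("registrations"))
# postgresql://localhost/registrations
print(build_postgres_uri("registrations", user="app_user", password="p@ss/word"))
# postgresql://app_user:p%40ss%2Fword@localhost:5432/registrations
```

Escaping matters because characters such as @ and / in a password would otherwise be read as part of the URI structure.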

To make the above changes, open the config.py file and add the following variables to the Config class:

SQLALCHEMY_DATABASE_URI = os.environ.get('SQLALCHEMY_DATABASE_URI') or \
    'postgresql://localhost/[YOUR_DATABASE_NAME]'

SQLALCHEMY_TRACK_MODIFICATIONS = False

In the next section, we'll use the flask-migrate package to seamlessly handle changes to our database structure.

Setting up Flask-Migrate to Manage Database Structures

As we scale our application, we may need to make structural changes to our tables without losing all the data already in the database. Flask-Migrate is a python extension that handles SQLAlchemy database migrations automatically using Alembic. It also supports multiple database migrations and other functionalities.

Now, we will add a database object that represents our database and also create an instance of the migration class to handle database migrations.

Open the app/__init__.py file and first import the flask-migrate and SQLAlchemy packages:

from flask_migrate import Migrate
from flask_sqlalchemy import SQLAlchemy

Then add the following variables:

db = SQLAlchemy(app)  # Create DB object
migrate = Migrate(app, db)  # Create migration object

Set up the PostgreSQL database model

Now we're finally ready to create the models.py file to set up our database model. Database models are used to represent tables and their structures in the database; they determine the schema of the table.

models.py will define the structure of our tables and other necessary table information. We need to create a model (Class) whose attributes are the same as the data fields we intend to store in the database tables.

Inside the app directory, create a new file called models.py and first import the db instance that we created in the previous section: from app import db.

Then create a class with the attributes for each data field in the registration form:

class Registration(db.Model):
    id = db.Column(db.Integer, primary_key = True)
    first_name = db.Column(db.String(64))
    last_name = db.Column(db.String(64))
    username = db.Column(db.String(64))
    company = db.Column(db.String(64))
    contact_no = db.Column(db.String(64))
    email = db.Column(db.String(120), index=True, unique=True)

    def __repr__(self):
        return '<User {}>'.format(self.username)

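The last step is importing the model in the app/__init__.py file: from app import models. Place this import at the bottom of the file, after db is created, because models.py itself imports db from app.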

If you are working with your own database, we've created a free tool to design database schemas.

Creating a new Registrations table in Postgres using Flask

Next, we will initialize the migration repository from the command line by running flask db init in the virtual environment.

First ensure that your FLASK_APP environment variable is set before you run this command. You can run the following commands to check:

(env) $ python                   
Python 3.8.2 (default, Dec 21 2020, 15:06:04) 
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.environ['FLASK_APP'])
app.py
>>> quit()
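
Note that os.environ['FLASK_APP'] raises a KeyError when the variable is not set. A slightly more forgiving check, sketched below with a hypothetical flask_app_setting helper, uses .get so a missing variable produces a readable message instead.

```python
import os


def flask_app_setting(environ=os.environ):
    """Return the FLASK_APP value, or None when it is not set."""
    return environ.get("FLASK_APP")


value = flask_app_setting()
if value is None:
    print("FLASK_APP is not set; set it before running flask db init")
else:
    print(f"FLASK_APP={value}")
```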

After you run flask db init you should see the following output:

(env) $ flask db init
  Creating directory /Users/mac/Desktop/Web_Registration_Form/migrations ... done
  Creating directory /Users/mac/Desktop/Web_Registration_Form/migrations/versions ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/script.py.mako ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/env.py ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/README ... done
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/alembic.ini ... done
  Please edit configuration/connection/logging settings in
  '/Users/mac/Desktop/Web_Registration_Form/migrations/alembic.ini' before proceeding.

From the command logs, we can see that a migrations folder is created in our application. This folder contains files that are needed to migrate tables and update table schemas on our database.

Now that we have a migrations repository, it’s time to migrate the Registration table to the database using the flask-migrate extension.

Ensure that your Postgres server is up, then run the flask db migrate command. It compares our SQLAlchemy models with the Postgres database that we assigned earlier with the URI in the config file and generates a migration script for the differences.

(env) $ flask db migrate -m "registrations table"
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.autogenerate.compare] Detected added table 'registration'
INFO [alembic.autogenerate.compare] Detected added index 'ix_registration_email' on '['email']'
  Generating /Users/mac/Desktop/Web_Registration_Form/migrations/versions/eccfa5d8a3f6_registrations_table.py ... done

flask db migrate does not make any changes to the database; it only generates the migration script. To apply the changes, we use the flask db upgrade command. You can also revert them with flask db downgrade.

(env) $ flask db upgrade
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> eccfa5d8a3f6, registrations table

Now we can open Arctype and check that the table was successfully migrated to the local Postgres database.

Get Started with Arctype for Free

Inserting Data From a Flask App in Postgres

In this section, we'll update our route function to add the data collected from the webpage to the database.

First, we will update our imports by including our database instance and table model in the app/routes.py file:

from app.models import Registration
from app import db
from flask import redirect, url_for

In the route function, we will create an instance of the Registration class and update it with information collected from the form. We then proceed to add and commit the instance to the database.

Add the following if statement inside the index() function in routes.py:

if form.validate_on_submit():
    # Build a Registration row from the submitted form fields
    reg_form = Registration(
        username=form.username.data, email=form.email.data,
        first_name=form.first_name.data, last_name=form.last_name.data,
        company=form.company.data, contact_no=form.contact_no.data)
    db.session.add(reg_form)
    db.session.commit()
    return redirect(url_for('index'))

Testing if the Flask App is Inserting Data in Postgres

Let's test our application locally one more time before deploying to Heroku:

(env) $ flask run
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Navigate to http://127.0.0.1:5000/ and submit the form.

After submitting the form, you can check that a new entry was added to your database by checking the Registrations table in Arctype:

Try Arctype Today

Deploying the Flask app and Postgres Database to Heroku

Now that we have tested our application locally, it's time to deploy the application and our Postgres database to Heroku so we can access it from anywhere.

The first step is downloading and installing the Heroku CLI.

Then we can log into Heroku from the command line with: heroku login

Once we're logged in, we can create a new app on Heroku by running: heroku create [app-name]. Heroku apps share a global namespace, so you need to choose a unique name for your application. If you do not specify a name, Heroku will generate a random one.

(env) $ heroku create arctype-registration-app
Creating ⬢ arctype-registration-app... done
https://arctype-registration-app.herokuapp.com/ | https://git.heroku.com/arctype-registration-app.git

To proceed with development, we need to update our requirements.txt file and create a Procfile in our root directory. The Procfile is used to declare the commands run by the app on Heroku.

Next, we need to install gunicorn, a Python WSGI server that will run the application on Heroku. In the activated Python environment, run pip install gunicorn to install the package.

Run pip freeze > requirements.txt to update the requirements file. The Procfile declares process types using the format <process type>: <command>. Since our application is served by a web server (gunicorn), the process type is web and the command launches the gunicorn server. Run the command below in the terminal to create the Procfile:

echo web: gunicorn app:app > Procfile
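
To make the Procfile format concrete, here is a minimal Python sketch that splits each line into a process type and a command. The parse_procfile helper is hypothetical and only illustrates the format; it is not how Heroku itself reads the file.

```python
def parse_procfile(text):
    """Map each process type to its command ('<process type>: <command>' per line)."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so 'app:app' in the command survives
        process_type, _, command = line.partition(":")
        processes[process_type.strip()] = command.strip()
    return processes


procfile = parse_procfile("web: gunicorn app:app\n")
print(procfile)  # {'web': 'gunicorn app:app'}
```

Splitting on the first colon matters here because the gunicorn argument app:app contains a colon of its own.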

Heroku has different service plans for its PostgreSQL offering. Choosing a plan depends on the characteristics of your application and how much downtime it can tolerate. In this tutorial we will use the hobby-dev plan, which is free and serves the purpose we need it for.

Now, we will create a hobby-dev postgres database for our application using the heroku addons:create heroku-postgresql:hobby-dev --app app-name command:

(env) $ heroku addons:create heroku-postgresql:hobby-dev --app arctype-registration-app
Creating heroku-postgresql:hobby-dev on ⬢ arctype-registration-app... free
Database has been created and is available
 ! This database is empty. If upgrading, you can transfer
 ! data from another database with pg:copy
Created postgresql-slippery-96960 as DATABASE_URL
Use heroku addons:docs heroku-postgresql to view documentation

More information on working with postgres databases can be found here.

Now that we have created a remote postgres database for our application on Heroku, we need to update the SQLALCHEMY_DATABASE_URI variable in the config.py file with the new database URI. To retrieve the remote database URI, we use the heroku config --app app-name command.
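
As an alternative to pasting the URI into config.py by hand, the value can be read from the DATABASE_URL variable that Heroku set when the add-on was created. The sketch below is one hedged way to do that; the database_uri helper and the local fallback URI are hypothetical. It also rewrites the postgres:// scheme, because SQLAlchemy 1.4 and later only accept postgresql://.

```python
import os

# Hypothetical local fallback, used when DATABASE_URL is not set
LOCAL_URI = "postgresql://localhost/registration_dev"


def database_uri(env=os.environ):
    """Return a SQLAlchemy-compatible URI built from DATABASE_URL."""
    uri = env.get("DATABASE_URL", LOCAL_URI)
    # Rewrite the older 'postgres://' scheme, which SQLAlchemy 1.4+ rejects
    if uri.startswith("postgres://"):
        uri = uri.replace("postgres://", "postgresql://", 1)
    return uri


print(database_uri({"DATABASE_URL": "postgres://user:secret@host:5432/dbname"}))
```

In config.py this could be used as SQLALCHEMY_DATABASE_URI = database_uri(), assuming the helper lives in the same module.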

Once the URI is updated, the final step is deploying our application to Heroku's servers.

First, we will commit all our files and push them to the heroku remote. Run git add . and git commit -m "heroku commit", then git push heroku main to deploy to Heroku.

Your app is now live at https://your-app-name.herokuapp.com!

Migrating Tables from a Local Postgres Database to Heroku

The last step is creating the Registrations table that we built in our local Postgres database on the new Heroku database instance. We can do this by running db.create_all() from the Heroku Python terminal:

(env) $ heroku run python
Running python on ⬢ arctype-registration-app... up, run.9023 (Free)
Python 3.6.12 (default, Sep 29 2020, 17:50:28) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from app import db
>>> db.create_all()

To check if the Registrations table was successfully created, connect to the Heroku database from Arctype and you should see the Registrations table.

Congratulations!

You are now the proud owner of a web app that anyone with internet access can use, from anywhere in the world. Let's zoom out and review what we covered:

How a Flask app works
Creating a Postgres database
Creating a database model in Python using SQLAlchemy
Migrating a database in Python using flask-migrate
Inserting data into a Postgres database from a Flask app
Deploying a Flask app and Postgres database to Heroku

Postgres is a powerful database that is used by large tech companies across the world. Arctype is a free SQL editor that makes working with databases easier. Try Arctype for free today.

Free Tool to Design Database Schemas in <5 minutes

Derek Xiao — Mon, 08 Feb 2021 20:42:42 +0000

Designing a database is often one of the first steps when starting a new project.

Spending time on the database schema can help ensure a scalable application down the line.

After Arctype went remote, we created a collaborative ERD tool in Figma to quickly design database schemas as a team.

We turned this tool into a public template so anyone can use it for their own projects. Check it out and let us know if there are any additional features you want to see!

Get access to the free ERD designer

Forget SQL vs NoSQL - Get the Best of Both Worlds with JSON in PostgreSQL

Derek Xiao — Wed, 03 Feb 2021 20:42:00 +0000

Have you ever started a project and asked - "should I use a SQL or NoSQL database?"

It’s a big decision. There are multiple horror stories of developers choosing a NoSQL database and later regretting it.

But now you can get the best of both worlds with JSON in PostgreSQL.

In this article I cover the benefits of using JSON, anti-patterns to avoid, and an example of how to use JSON in Postgres.

When to use JSON in Postgres
JSON Basics
Types of JSON in PostgreSQL
Creating a JSON table
How to query JSON data
JSONPath: advanced JSON syntax
Arctype: An Easy Tool for Databases

Why Use a SQL Database for Non-Relational Data?

Example of normalized data in a school database

First we have to briefly cover the advantages of using SQL vs NoSQL.

The difference between SQL and NoSQL is the data model. SQL databases use a relational data model, and NoSQL databases usually use a document model. A key difference is how each data model handles data normalization.

Data normalization is the process of splitting data into “normal forms” to reduce data redundancy. The concept was first introduced in the 1970s as a way to reduce spending on expensive disk storage.

In the example above, we have a normalized entity relationship diagram for a school database. The StudentClass table stores every class a student has taken. By normalizing the data, we only keep one row for each class in the Class table, instead of duplicating class data for every student in the class.

But what if we also wanted to track every lunch order (entree, sides, drink, snacks, etc) to send each student a summary at the end of every week?

In this case it would make more sense to store the data in a single document instead of normalizing it. Students will always be shown their entire lunch order, so we can avoid expensive joins by keeping the lunch order data together.

{
    "student_id": 100,
    "order_date": "2020-12-11",
    "order_details": {
        "cost": 5.87,
        "entree": ["pizza"],
        "sides": ["apple", "fries"],
        "snacks": ["chips"]
    }
}

Example schema for lunch orders using JSON

Instead of maintaining a separate NoSQL database, we now store lunch orders as JSON objects inside an existing relational Postgres database.

What is JSON?

{
    "student_id": 100,
    "order_date": "2020-12-11",
    "order_details": {
        "cost": 5.87,
        "entree": ["pizza"],
        "sides": ["apple", "fries"],
        "snacks": ["chips"]
    }
}

JSON Example

Name	Data type
student_id	Integer
order_date	Date
order_details	Object
sides	Array

JSON, or JavaScript Object Notation, is a flexible format for passing data between applications, similar to a CSV file. However, instead of rows and columns, JSON objects are collections of key/value pairs.
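
To see the key/value structure from an application's point of view, here is a small sketch that parses the lunch order with Python's standard json module. Note that JSON has no native date type, so order_date arrives as a plain string that the application has to interpret.

```python
import json

raw = """
{
    "student_id": 100,
    "order_date": "2020-12-11",
    "order_details": {
        "cost": 5.87,
        "entree": ["pizza"],
        "sides": ["apple", "fries"],
        "snacks": ["chips"]
    }
}
"""

order = json.loads(raw)

print(type(order["student_id"]).__name__)     # int
print(type(order["order_date"]).__name__)     # str
print(type(order["order_details"]).__name__)  # dict
print(order["order_details"]["sides"])        # ['apple', 'fries']
```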

According to Stack Overflow, JSON is now the most popular data interchange format, beating CSV, YAML, and XML.

The original creator of JSON, Douglas Crockford, attributes the success of JSON to its readability by both developers and machines, similar to why SQL has been dominant for almost 50 years.

The JSON format is easy to understand, but also flexible enough to handle both primitive and complex data types.

Evolution of JSON in PostgreSQL

Plain JSON type

In 2012, PostgreSQL 9.2 introduced the first JSON data type in Postgres. It validated syntax, but underneath it stored the incoming document directly as text, whitespace included. It wasn’t very useful for real-world querying, index-based searching, and the other operations you would normally perform on a JSON document.

JSONB

In late 2014, PostgreSQL 9.4 introduced the JSONB data type and most importantly improved the querying efficiency by adding indexing.

The JSONB data type stores JSON as a binary type. This introduced overhead in processing since there was a conversion involved but it offered the ability to index the data using GIN/Full text based indexing and included additional operators for easy querying.
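
The difference between keeping the original text and keeping a parsed form can be imitated in plain Python. This is only an analogy, not Postgres's internal format: the json type keeps an exact copy of the input, while jsonb discards insignificant whitespace and keeps only the last value of a duplicated key, which is also what Python's parser does.

```python
import json

# Input with extra whitespace and a duplicated "cost" key
raw = '{"cost":   4.25,  "cost": 4.89, "sides": ["apple"]}'

as_text = raw                  # analogous to json: the exact input text is kept
as_parsed = json.loads(raw)    # analogous to jsonb: parsed once, last duplicate wins
normalized = json.dumps(as_parsed, sort_keys=True)

print(as_text)     # {"cost":   4.25,  "cost": 4.89, "sides": ["apple"]}
print(normalized)  # {"cost": 4.89, "sides": ["apple"]}
```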

JSONPath

With JSON’s increasing popularity, the 2016 SQL Standard brought in a new standard/path language for navigating JSON data. It’s a powerful way of searching JSON data very similar to XPath for XML data. PostgreSQL 12 introduced support for the JSON Path standard.

We will see examples of JSON, JSONB, and JSONPath in the sections below. An important thing to note is that all JSON functionality is natively present in the database. There is no need for a contrib module or an external package to be installed.

JSON Example in Postgres

Let's create a Postgres table to store lunch orders with a JSON data type.

create table LunchOrders(student_id int, orders json);

Now we can insert JSON formatted data into our table with an INSERT statement.

insert into LunchOrders values(100, '{
    "order_date": "2020-12-11",
    "order_details": {
        "cost": 4.25,
        "entree": ["pizza"],
        "sides": ["apple", "fries"],
        "snacks": ["chips"]}
    }'      
);

insert into LunchOrders values(100, '{
    "order_date": "2020-12-12",
    "order_details": {
        "cost": 4.89,
        "entree": ["hamburger"],
        "sides": ["apple", "salad"],
        "snacks": ["cookie"]}
    }'      
);

If you run select * on the table, you will see something like the result below.

Get JSON objects. SQL Editor: Arctype

Inserting data into a JSONB column is exactly the same, except we change the data type to jsonb.

create table LunchOrders(student_id int, orders jsonb);

How to Query JSON Data in Postgres

Querying data from JSON objects uses slightly different operators than the ones that we use for regular data types ( =, < , >, etc).

Here are some of the most common JSON operators:

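Operator	Description
->	Get a JSON object field or array element (returned as JSON)
->>	Get a JSON object field or array element as text (useful for filtering)
#>	Get a nested object at a specified path
#>>	Get a nested object at a specified path as text (useful for filtering)
@>	Check if a JSON value contains another value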

Full list of JSON operators.

The ->, ->>, #>, and #>> operators work with both JSON and JSONB type data. Containment operators such as @> only work with the JSONB data type.

Let's see some examples of how to use each operator to query data in our LunchOrders table.

Getting values from a JSON object

We can use the -> operator to find every day that a specific student bought a school lunch.

select orders -> 'order_date'
from lunchorders
where student_id = 100;

Select JSON data. SQL Editor: Arctype

Filtering JSON data using a `WHERE` clause

We can use the ->> operator to filter for only lunch orders on a specific date.

select orders
from lunchorders
where orders ->> 'order_date' = '2020-12-11';

Filter JSON by date. SQL Editor: Arctype

This query is similar to a regular = comparison, except we first apply the ->> operator to pull the order_date field out of the orders column as text.

Getting data from an array in a JSON object

Let's say we wanted to find every side dish that a specific student has ordered.

The sides field is nested inside the order_details object, but we can access it by chaining two -> operators together.

select
  orders -> 'order_date',
  orders -> 'order_details' -> 'sides'
from
  lunchorders
where
  student_id = 100;

Getting nested values from a JSON object. SQL Editor: Arctype

Great, now we have arrays of the sides that student 100 ordered each day! What if we only wanted the first side in the array? We can chain together a third -> operator and give it the array index we're looking for.

Getting array values at a specific index. SQL Editor: Arctype

Retrieving nested values from a JSON object

Instead of chaining together multiple -> operators, we can also use the #> operator to specify a path for retrieving a nested value.

select orders #> '{order_details, sides}'
from lunchorders;

      ?column?      
--------------------
 ["apple", "fries"]
 ["apple", "salad"]
(2 rows)
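
To make the path lookup concrete, here is a rough Python sketch that mirrors what chained -> operators or a #> path do, applied to an ordinary dictionary. The get_path helper is hypothetical and only imitates the behavior: each step is either an object key or a zero-based array index, and a missing step yields None, much like the SQL operators yield NULL.

```python
def get_path(doc, path):
    """Follow a list of keys and array indexes through nested JSON-like data."""
    current = doc
    for step in path:
        if isinstance(current, dict) and step in current:
            current = current[step]
        elif isinstance(current, list) and isinstance(step, int) and 0 <= step < len(current):
            current = current[step]
        else:
            return None  # the path does not exist
    return current


order = {
    "order_date": "2020-12-11",
    "order_details": {"cost": 4.25, "sides": ["apple", "fries"]},
}

print(get_path(order, ["order_details", "sides"]))     # ['apple', 'fries']
print(get_path(order, ["order_details", "sides", 0]))  # apple
print(get_path(order, ["order_details", "drinks"]))    # None
```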

Checking if a JSON object contains a value

Let's say we wanted to see every order a student made that had a side salad. We can't use the ->> operator for filtering here because sides is an array of values.

To check if an array or object contains a specific value, we can use the @> operator:

select
  orders
from
    lunchorders
where
    orders -> 'order_details' -> 'sides' @> '["salad"]';

orders                                                                   
----------
 {"order_date": "2020-12-12", "order_details": {"cost": 4.89, "sides": ["apple", "salad"], "entree": ["hamburger"], "snacks": ["cookie"]}}
(1 row)
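
The idea behind containment can be sketched in plain Python. The contains function below is a simplified, hypothetical imitation of @>: an object contains another if every key matches recursively, and an array contains another if each right-hand element is contained in some left-hand element. It leaves out some special cases that Postgres handles.

```python
def contains(left, right):
    """Simplified imitation of containment: does left contain right?"""
    if isinstance(right, dict):
        return isinstance(left, dict) and all(
            key in left and contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        return isinstance(left, list) and all(
            any(contains(item, wanted) for item in left)
            for wanted in right
        )
    return left == right  # scalars must be equal


print(contains(["apple", "salad"], ["salad"]))  # True
print(contains(["apple", "fries"], ["salad"]))  # False
print(contains({"cost": 4.89, "sides": ["apple", "salad"]}, {"sides": ["salad"]}))  # True
```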

JSONPath: The Final Boss

JSON Path is a powerful tool for searching and manipulating a JSON object in SQL using JavaScript-like syntax:

Dot (.) is used for member access.
Square brackets ("[]") are used for array access.
SQL/JSON arrays are 0-indexed, unlike regular SQL arrays that start from 1.

Built-in functions

JSONPath also includes powerful built-in functions like size() to find the length of arrays.

Let's use the JSONPath size() function to get every order that had >= 1 snack.

select *
from lunchorders
where orders @@ '$.order_details.snacks.size() > 0';

JSONPath Built-in functions example. SQL Editor: Arctype

Comparison without type casting

JSONPath also enables comparisons without explicit type casting:

select *
from lunchorders
where orders @@ '$.order_details.cost > 4.50';

This is what the same query would look like with our regular JSON comparisons:

select *
from lunchorders
where (orders -> 'order_details' ->> 'cost')::numeric > 4.50;

JSON Summary

In this article we've covered:

When to use SQL vs NoSQL
A history of JSON in Postgres
Examples of how to work with JSON data
JSON query performance with indexing
JSON anti-patterns

Working with JSON data can be complicated. Arctype is a free, modern SQL editor that makes working with databases easier. Try it out today and leave a comment below with your thoughts!

Sneak peek at part 2 of the JSON in Postgres series:

With our existing table, the database engine has to scan through the entire table to find a record. This is called a sequential scan.

Analyzing JSON query performance. SQL Editor: Arctype

Sequential scans degrade in performance as the dataset grows.

Follow to get notified of Part 2, where I will show how to index a dataset with 700k rows to improve query performance by 350X.

Slow Queries? 10X Query Performance with a Database Index

Derek Xiao — Tue, 26 Jan 2021 18:33:46 +0000

A good database index can improve your SQL query speeds by 99% or more.

Let’s take a table with 1 billion, 16 byte names and a disk with a 10ms seek time and a 10MB/s transfer rate.

If we wanted to find "John Smith" in this table, a regular search that has to check every single name in sequential order would take ~13 minutes (0.0016 ms transfer time per 16-byte name * 500M rows on average, assuming 0 seek time because the reads are sequential).

This same search with a database index would only take ~0.3 seconds ((10 ms seek time + 0.0016 ms transfer time) * log2(1*10^9), or roughly 10 ms * 30 lookups). That is a speed improvement of more than 99.9%.
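
The indexed estimate is easy to check with a few lines of Python. This is a back-of-the-envelope sketch that assumes each step through the index costs one disk seek and ignores the much smaller transfer time.

```python
import math

ROWS = 1_000_000_000  # 1 billion names, from the example above
SEEK_MS = 10.0        # disk seek time, from the example above

# Halving the search space at each step takes about log2(ROWS) steps
steps = math.ceil(math.log2(ROWS))
indexed_seconds = steps * SEEK_MS / 1000

print(steps, indexed_seconds)  # 30 0.3
```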

But database indexes also add overhead and can degrade performance if not used correctly.

This article will cover the main considerations for creating the right index for your database:

Index type
Selecting the correct column
Choosing how many indexes to create

Download Arctype to follow along with the examples below and create an index on your own database.

What is a Database Index?

Database Index Example

A database index is a data structure used to organize data so that it is easier to search.

Indexes consist of a set of tuples. The first tuple value is the search key, and the second contains a pointer(s) to a block on the hard drive where the entire row of data is stored.

These tuples are then organized into different data structures (i.e. B-tree, Hash, etc) depending on the database index type.
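
As a rough illustration of the idea, the sketch below builds a tiny "index" as a sorted list of (search key, row position) tuples and looks keys up with binary search. A real B-tree is a disk-based tree, not a Python list, and the names are hypothetical sample data, but the principle of searching sorted keys and then following a pointer to the row is the same.

```python
import bisect

# The "table": rows stored in arbitrary order (hypothetical sample data)
rows = ["Maria Garcia", "John Smith", "Wei Chen", "Aisha Khan"]

# The "index": (search key, pointer to the row) tuples, sorted by key
index = sorted((name, position) for position, name in enumerate(rows))
keys = [key for key, _ in index]


def lookup(name):
    """Return the row position for name using binary search, or None."""
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return index[i][1]
    return None


print(lookup("John Smith"))  # 1
print(lookup("Nobody"))      # None
```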

To understand how a tree data structure speeds up search performance, I recommend playing with some of the interactive visualizations online.

Which Postgres Index Type Should You Use?

Postgres offers 6 different index types to solve for different use cases.

Here's a breakdown of their advantages and disadvantages:

Type	Performance	When to use?
B-Tree (Most common)	O(log(n)) insertions and queries	Can be used for both equality and range queries (i.e. <, =, >, BETWEEN, IN, etc)
Hash	O(1) (faster than B-tree)	Only works for equality comparisons. Before Postgres 10, hash indexes were discouraged because they were not WAL-logged and therefore not crash-safe
Generalized Search Tree (GiST)	O(log(n)) for insertion and queries	Used for operations beyond equality and range comparisons, such as those on geometric data types (i.e. <)
Space-partitioned GiST (SP-GiST)	O(log(n)) for insertion and queries	Non-balanced, disk-based data structures (i.e. quad-trees, k-d trees)
Generalized Inverted Indexes (GIN)	O(log(n)) for queries. Longer insertion time.	Indexing data types that map multiple values to one row (i.e. arrays and full text search)
Block Range Index (BRIN)	20X faster than B-tree and a 99%+ space savings	Column values have to correlate with the physical order of the rows on disk

Choosing the Right Database Index

Creating an index does not guarantee better database performance. Every time you write to a table with an index, the database engine is updating both the table and any impacted indexes.

This is how you can decide which table columns to use for an index:

The column is frequently queried but not frequently changed (rows added/deleted)
The column has a referential integrity constraint
The column has a UNIQUE key integrity constraint

Every modern database engine also has a query planner that decides how each query will be run. In some scenarios it's possible that queries you would expect to use your index are actually doing sequential scans. To check if the query plan is using your index, you can run EXPLAIN.

For instance, running EXPLAIN on this example shows that it is using a sequential scan instead of an index.

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;

                         QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244)
   Filter: (unique1 < 7000)

How many indexes should I use?

It depends. If you're managing a table with frequent changes, then you probably want fewer indexes to limit the write overhead. On the other hand, if you're mostly reading from the table, then adding indexes would probably speed up performance.

Before adding a new index, check if your current indexes are actually slowing down your CPU.

How to Create an Index in Postgres - Syntax

Postgres index examples are provided in the following sections. In the scope of this article, the examples include:

Create Index : Defining a new index.
Drop Index : Removing an index.
List Index : Listing all indexes.
Unique Index: Defining unique indexes.

How to create an index

CREATE INDEX index_name ON table_name USING [method]
(
column_name [ASC | DESC] [NULLS {FIRST | LAST }],
...
);

The optional ASC/DESC and NULLS FIRST/LAST parameters are beneficial for data that you plan on retrieving in sorted order, when you want the null values to appear first or last in the list.

How to create a partial index

You can also create an index on only a subset of a table.

CREATE INDEX employee_index ON employees (employee_id)
    WHERE employee_id > 200;

A partial index is beneficial when a large share of the rows has the same index value. Even if these rows are indexed, the Postgres query planner will usually choose a sequential scan for them because so many rows match. A partial index can exclude those common values, which saves space and keeps the index small.

How to remove an index

DROP INDEX index_name;

How to find existing indices

Postgres provides a pg_indexes system view that you can query to find existing indexes in a database.

SELECT
    indexname,
    indexdef
FROM
    pg_indexes
WHERE
    tablename = 'employees';

Postgres Reindex Explained

Reindex drops an existing index in a table and rebuilds it using the current table values. The most common scenario for using reindex is when the data has changed significantly, and there are now existing pages that are inefficiently using space.

A routine reindexing of your database can reduce the index size and improve performance.

-- Rebuild a specific index
REINDEX INDEX my_index;

-- Rebuild every index in a table
REINDEX TABLE my_table;

-- Rebuild every index in a database
REINDEX DATABASE my_database;

Takeaways

A properly created database index can improve query performance by 99% or more.

This article covered the main considerations for creating a database index that improves performance instead of slowing it down:

Index type
Selecting the correct column
Choosing how many indexes to create

Now that you've optimized your query performance, it's time to speed up your SQL workflow. Arctype's collaborative SQL client allows you to easily share databases, queries, and visualizations with anyone.

How Do the Top 20 Words in Biden's Inauguration Speech Compare to Trump's?

Derek Xiao — Thu, 21 Jan 2021 17:16:16 +0000

Can Python unite the nation?

Yesterday we saw our country's 45th successful transfer of the presidency.

This marked the end of a highly contested election during which our nation at times felt more divided than ever.

But as I sat in my living room today with my parents and watched Biden’s inaugural address, I felt hopeful.

“Some days you need a hand. There are other days when we're called to lend a hand. That's how it has to be, that's what we do for one another. And if we are that way our country will be stronger, more prosperous, more ready for the future. And we can still disagree.” - Joe Biden, 2021 Inaugural Address

As a citizen, I was inspired by Joe's promise for a united nation.

But as a developer I started to wonder: could I quantify this hope?

The inaugural address is a president’s first speech to the nation. The speech is meticulously written by a team of writers to capture the mood of the nation and the most pressing issues that we face.

Could the specific words used in this speech give us insight into the path ahead?

I compared the top 20 most common words in Biden's speech with the top 20 words in Trump's 2017 inaugural address to see where our country is now compared to four years ago, and what to expect over the next four years.

Python set-up
Web scraping with Beautiful Soup
NLP with NLTK
Results: Top 20 words takeaways

Using Python to Find the Top 20 Most Common Words

This next section is a tutorial for the Python analysis. If natural language processing doesn't get you excited, then you may want to jump to the end (but it's also only 20 lines of code so could be fun to learn!)

The goal for this analysis is to take each inaugural address and find the most common words. The analysis consists of two parts:

Scraping the speech from the web using Beautiful Soup
Processing the words using NLTK

If you want to run the code at home, this is what you'll need to do to get set up:

Install python 3
Install requests, BeautifulSoup and nltk with pip3 install
brew install jupyter and then open a jupyter notebook by running jupyter notebook.

Now you can run all of the commands below in the jupyter notebook!

If you want to skip the scraping and cleaning, you can download Arctype and use the database credentials at the end to view the data.

1. Web Scraping with Beautiful Soup

Web scraping is the process of collecting information from the web. In this scenario, we're going to be scraping transcripts of each president's inauguration speech.

You can find each president's speech at these websites:

We first use the requests package to scrape the entire HTML code from each website.

import requests

URL = 'https://www.yahoo.com/now/full-transcript-joe-bidens-inauguration-175723360.html'
page = requests.get(URL)

Congrats, you've built your first web scraper!

This code is making an HTTP request to retrieve the HTML code from the server where the speech is stored.

Now we have to take this mess of HTML and find just the text from each president's speech. We can do this easily with Python's Beautiful Soup package.

from bs4 import BeautifulSoup

biden_speech = BeautifulSoup(page.content, 'html.parser')

In the code above we've converted the HTML from earlier into a beautiful soup object that is easily parseable.

Now we have to find the specific HTML block that contains the text we're looking for. We can do this using the browser's DevTools console.

Open the speech in a new tab in your browser and press cmd+option+I to open the DevTools console. Highlight the text you're looking for, and you'll be able to see the HTML tag that contains that text in the console on the right.

For Biden's speech, we can see that it's contained in a <div> tag labelled with a caas-body class name. Switching back to Python, we can find that tag using the find_all method with our beautiful soup object from before.

biden_speech_content = biden_speech.find_all('div', class_='caas-body')

When we look at the biden_speech_content object, we'll still find other html tags that aren't related to the speech such as:

<div class="caas-readmore caas-readmore-collapse">
    <button aria-label="" class="link rapid-noclick-resp caas-button
    collapse-button" data-ylk="elm:readmore;slk:Story continues"
    title="">
        Story continues
    </button>
</div>

In order to find just the text from Biden's speech, we can filter for the <p> tags that aren't labeled with a class:

biden_speech_content_v2 = biden_speech_content[0].find_all('p', attrs={'class': None})

Now we have all the text, but every sentence is still wrapped in <p> tags. We can strip these HTML tags with the Beautiful Soup get_text method:

biden_speech_str = ""

for sentence in biden_speech_content_v2:
    text = sentence.get_text()
    biden_speech_str = biden_speech_str + " " + text
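
If you want to try the same filtering idea without installing Beautiful Soup, here is a minimal sketch that uses the standard library's html.parser instead. This is a swapped-in technique, not the method used in this article, and the sample HTML is a hypothetical stand-in for the real page. It keeps only the text inside <p> tags that have no class attribute.

```python
from html.parser import HTMLParser


class PlainParagraphs(HTMLParser):
    """Collect the text of <p> tags that carry no class attribute."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_plain_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p" and "class" not in dict(attrs):
            self._in_plain_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_plain_p = False

    def handle_data(self, data):
        if self._in_plain_p:
            self.paragraphs[-1] += data


# Hypothetical sample markup, loosely shaped like the page described above
sample_html = (
    '<div class="caas-body">'
    "<p>First hypothetical sentence.</p>"
    '<div class="caas-readmore"><button>Story continues</button></div>'
    '<p class="caption">Photo caption</p>'
    "<p>Second hypothetical sentence.</p>"
    "</div>"
)

parser = PlainParagraphs()
parser.feed(sample_html)
speech_text = " ".join(parser.paragraphs)
print(speech_text)  # First hypothetical sentence. Second hypothetical sentence.
```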

Finally, we should be left with a clean speech that we can analyze with the nltk package.

2. Finding Word Frequency with NLTK

We're getting close to the end now! The final step is applying some basic natural language processing (NLP) techniques using the Python NLP package, NLTK.

We could do a frequency analysis of the speech now, but this would show words like "I", "We", and "The" as the most common words. In natural language processing these are called stop words.

We can use NLTK's list of English stop words to find just the words that we're interested in.

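from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk import FreqDist

# Requires NLTK's tokenizer and stop word data, e.g. nltk.download('punkt') and nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

biden_words = word_tokenize(biden_speech_str.lower())

# Keep alphabetic tokens that are not stop words
filtered_biden_speech = [w for w in biden_words if not w in stop_words and w.isalpha()]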

Let's break down what the code is doing:

Using .lower() to cast the entire speech to lower case so it can be compared to the stop words
Separating the string into individual words with word_tokenize
Removing stop words: if not w in stop_words
Removing punctuation like periods and commas: w.isalpha()

Now we have a list of words that we can count!

freq = FreqDist(filtered_biden_speech)
print (freq.most_common(20))
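
The counting step itself does not depend on NLTK. As a rough sketch of the same idea using only the standard library, the example below tokenizes with a regular expression and counts with collections.Counter. The stop word set is a tiny hypothetical stand-in for NLTK's list, and the sample sentence is made up for illustration rather than taken from either speech.

```python
import re
from collections import Counter

# Tiny hypothetical stop word list (NLTK's real list is much longer)
STOP_WORDS = {"we", "the", "and", "our", "is", "a", "of", "for"}


def top_words(text, n):
    """Return the n most common non-stop words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(n)


sample = "We the people love our nation, and the nation is strong for the people of the nation."
print(top_words(sample, 2))  # [('nation', 3), ('people', 2)]
```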

But what you might find as you look through the list is that there are separate counts for similar words such as "country" and "countries". In order to count these as one word, we have to lemmatize the list so that every word is converted to its base word.

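from nltk.stem import WordNetLemmatizer

# Requires NLTK's WordNet data, e.g. nltk.download('wordnet')
wordnet_lemmatizer = WordNetLemmatizer()
lemmatized_biden = [wordnet_lemmatizer.lemmatize(word) for word in filtered_biden_speech]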

freq_lemma = FreqDist(lemmatized_biden)
print (freq_lemma.most_common(20))

Done! You've successfully scraped data from the web and analyzed it with NLP all while supporting democracy. Let's take a look at the results.

Biden's vs. Trump's Inauguration Speeches: Most Frequent Words

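import plotly.express as px

# Split the (word, count) pairs into words (k) and counts (v) for plotting
k, v = zip(*freq_lemma.most_common(10))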

fig = px.bar(x=v,y=k, orientation='h')
fig.update_layout(yaxis=dict(autorange="reversed"))
fig.show()

The top word was distorted by the lemmatizer, but the word was "us".

These were the top 10 words from Trump's speech in 2017:

What stood out to me is that 50% of the top 10 words for both presidents were the same:

America
American
Nation
People
One

The optimistic side in me looks at this data and sees a nation that shares common values. We care about our country, and we care about each other.

But at the same time, we are all facing our own unique issues. If we look at the next 10 most common words for each president's speech we begin to see some differences.

Biden's Speech:

Trump's Speech:

Biden's speech was undeniably a call to bring our nation together in unity. On the other side, we can see Trump appealing to Americans whose jobs are under threat and who need to protect their livelihoods and families.

The data shows two groups of people facing their own challenges, but I also see one nation with common values.

We set off to see if we could quantify "hope". And I believe we found an answer.

If two presidents with polar opposite political views can appeal to their supporters with 50% of the same vocabulary, then there is still hope to unite around our similarities.

What are the common objects we as Americans love, that define us as Americans? I think we know. Opportunity, security, liberty, dignity, respect, honor, and yes, the truth. - Joe Biden, 2021 Inaugural Address

A Full Speech Comparison with Arctype

I shared the top 20 words, but there were more than 500 unique words in Biden's inauguration speech. If you want to see more analysis, we've uploaded all the speech data to Arctype so you can skip the scraping and cleaning.

The dataset includes two tables:

Frequencies table: full list of the word frequencies for both speeches
Sentences table: cleaned sentences for both speeches so you can do your own analysis

Here's how to connect to the data:

Download the free Arctype SQL Client
Input the credentials below in Arctype to connect to the database
Run a query!

Database credentials:

host: arctype-pg-demo.c4i5p0deezvq.us-west-2.rds.amazonaws.com
port: 5432
user: root
password: HC9x0OkI9vVO4wqprscg
database: inauguration_2021

If you enjoyed this post, sign up for the Arctype newsletter to receive more posts written by experienced developers to help and inspire other devs.

SQL is 50 Years Old. What Can We Learn From Its Success?

Derek Xiao — Fri, 15 Jan 2021 18:00:00 +0000

Arctype writer: Felix Schildorfer

In 1971, Intel introduced the world's first commercial general-purpose microprocessor, the Intel 4004. It had ~2,300 transistors and cost $60.

Fast forward almost 50 years, and the newest iPhone has nearly 12 billion transistors (but unfortunately costs a little more than $60).

Many of the programming languages we use today were not introduced until the 90s (Java was introduced in 1996). However, there is one programming language that is still as popular today as it was when it was introduced nearly 50 years ago: SQL.

This article will discuss the events that led to the introduction of relational databases, why SQL grew in popularity, and what we can learn from its success.

A History of Early Database Management - IDS and CODASYL
The Network Data Model - Better Than Today's Relational Model?
Arrival of The Relational Model
Relational vs. Network Data Models
The Rise and Reign of SQL
The Secret to SQL’s 50 Year Reign - And What We Can Learn

History of Early Database Management - IDS and CODASYL

In 1962, Charles W. Bachman (no relation to Erlich Bachman) was working as part of a small team at General Electric. One year later, Bachman’s team introduced what would later be recognized as the first database management system - the Integrated Data Store (IDS).

10 years later, Bachman would receive the coveted Turing Award, often called the Nobel Prize of computer science, for his contributions to computing with IDS.

What was IDS?

In the early 1960s, computer science was just beginning as an academic field. For context, ASCII was not introduced until 1963.

To understand IDS we have to first understand the two main forces that led to its development:

The introduction of disk storage
A migration to high-level programming languages

Disk storage

Above: Moving a RAMAC 305

In 1956, IBM introduced the first commercial hard disk drive, the RAMAC 305.

The introduction of disk drives allowed programmers to retrieve and update data by jumping directly to a location on the disk. This was a vast improvement from its predecessor, tape drives, which required moving sequentially through the tape to retrieve a specific piece of data.

But developers now had to figure out where records were stored on the disk. Due to the limitations of file management systems in early operating systems, this was an advanced task reserved only for experienced programmers.

Developers needed a solution to simplify working with disk drives.

High-level programming

At the same time, computer science was beginning to move from innovators to early adopters on the adoption curve. Low-level programming languages like Assembly were popular among the early academics, but regular programmers were switching to higher-level programming languages like COBOL for their usability.

So what was IDS? IDS addressed both disk storage and high-level programming. It allowed developers to use high-level programming languages, like COBOL, to build applications that stored data on disk and retrieved it again. Because of this, IDS is recognized as the first database management system.

CODASYL - A new standard for database management

In 1969, the Conference on Data Systems Languages (CODASYL) released a report proposing a standard for database management. Bachman was part of the committee, and the report drew heavily from IDS.

The CODASYL Data Model introduced many of the core features in database management systems that we use today:

Schemas
Data definition language (DDL)
Data manipulation language (DML)

Most importantly, IDS and CODASYL introduced a new way for modeling data that influenced the eventual development of SQL - the network data model.

The Network Data Model - Better Than Today's Relational Model?

Above: Network model example

A data model is a standardized way to describe (model) the world (data).

The previous hierarchical data model used tree structures to describe data, but these were limited to one-to-many relationships. The new network model allowed records to have multiple parents, which created a graph structure. By allowing multiple parents, network models were able to model many-to-one and many-to-many relationships.

In the network model, relationships between tables were stored in sets. Each set had an owner (e.g., teachers) and one or more members (e.g., classes and students).

One of the key benefits of the network model is that related records in a set were connected directly by pointers. Sets were implemented using next, prior, and owner pointers, which allowed for easy traversal similar to a linked list.
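To make the pointer structure concrete, here is a minimal Python sketch of a single set stored as a circular chain. The Record class and the make_set, insert_member, and members helpers are hypothetical names chosen for illustration; they are not taken from IDS or the CODASYL specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Record:
    """A record in a set; the owner and its members share one circular chain."""
    name: str
    next: Optional[Record] = field(default=None, repr=False)
    prior: Optional[Record] = field(default=None, repr=False)
    owner: Optional[Record] = field(default=None, repr=False)


def make_set(owner_name: str) -> Record:
    """Create an empty set: the owner points at itself in both directions."""
    owner = Record(owner_name)
    owner.next = owner.prior = owner.owner = owner
    return owner


def insert_member(owner: Record, name: str) -> Record:
    """Link a new member in just before the owner, i.e. at the end of the chain."""
    last = owner.prior
    member = Record(name, next=owner, prior=last, owner=owner)
    last.next = member
    owner.prior = member
    return member


def members(owner: Record) -> Iterator[Record]:
    """Follow the next pointers until the chain arrives back at the owner."""
    current = owner.next
    while current is not owner:
        yield current
        current = current.next


teacher = make_set("Ms. Rivera")
algebra = insert_member(teacher, "Algebra")
insert_member(teacher, "Geometry")
print([m.name for m in members(teacher)])  # ['Algebra', 'Geometry']
print(algebra.owner.name)                  # Ms. Rivera
```

Reaching a member's owner or neighbors takes one pointer hop with no lookup, which is where the speed comes from. The price is the three extra references carried by every record.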

The low-level nature of network data models offered performance benefits, but they came at a cost. The network data model had increased storage costs because every record had to store extra pointers to its neighboring and owner records.

Arrival of The Relational Model

Above: Example of a relational model

In 1970, seven years after IDS was introduced, Edgar F. Codd presented the relational model in his seminal paper “A Relational Model of Data for Large Shared Data Banks” (also earning him a spot with Bachman as a Turing Award recipient).

Codd showed that all data in a database could be represented in terms of tuples (rows in SQL) grouped together into relations (tables in SQL). To describe the database queries, he invented a form of first-order predicate logic called tuple relational calculus.

Tuple relational calculus introduced a declarative language for querying data. Declarative programming languages allow programmers to say what they want to do without describing how to do it.
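As a small illustration of the difference, the sketch below answers the same question twice: once declaratively and once with an explicit loop. SQL stands in for the relational calculus here, SQLite (through Python's sqlite3 module) stands in for a full database server, and the student table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, grade INTEGER);
INSERT INTO student VALUES ('Ada', 11), ('Grace', 12), ('Edgar', 12);
""")

# Declarative: state the condition and let the engine decide how to find the rows.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM student WHERE grade = 12 ORDER BY name")]

# Imperative: fetch every row and spell out the filtering and sorting ourselves.
imperative = []
for name, grade in conn.execute("SELECT name, grade FROM student"):
    if grade == 12:
        imperative.append(name)
imperative.sort()

print(declarative)  # ['Edgar', 'Grace']
print(imperative)   # ['Edgar', 'Grace']
```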

This new declarative language was much easier for developers to use. The relational model laid out all the data in the open. Developers could retrieve all the data from a table, or read a single row, in one command (thanks, query optimizer).

Gone were the days of following a labyrinth of pointers to find your data.

Relational vs Network Data Models

Relational databases decreased the high storage costs that network databases had by normalizing data. Normalization was a process of decomposing tables to eliminate redundancy, and thereby decrease the footprint of data on disk.

However, relational databases had an increased CPU cost. In order to process normalized data, relational databases had to load tables in memory and use compute power to “join” tables back together. Let’s walk through the process for finding all classes and students for a given teacher with a relational model.

The database system would first perform an operation to retrieve all relevant classes. Then it would perform a second operation to retrieve student data. All of this data would be stored in memory, and it would run a third operation to merge the data before returning the result.
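The three operations can be sketched in plain Python. This is only an illustration of the idea, not how any particular database engine implements joins; the classes and enrollments tables and the classes_and_students function are hypothetical.

```python
from collections import defaultdict

# Hypothetical normalized tables, held in memory as lists of tuples.
classes = [(1, "Algebra", "Rivera"), (2, "Geometry", "Rivera"), (3, "Physics", "Chen")]
enrollments = [(1, "Ada"), (1, "Grace"), (2, "Ada"), (3, "Edgar")]  # (class_id, student)


def classes_and_students(teacher):
    # Operation 1: retrieve the classes taught by this teacher.
    teacher_classes = {cid: title for cid, title, t in classes if t == teacher}

    # Operation 2: retrieve the student rows that belong to those classes.
    students_by_class = defaultdict(list)
    for cid, student in enrollments:
        if cid in teacher_classes:
            students_by_class[cid].append(student)

    # Operation 3: merge the two intermediate results before returning them.
    return [(title, student)
            for cid, title in teacher_classes.items()
            for student in students_by_class[cid]]


print(classes_and_students("Rivera"))
# [('Algebra', 'Ada'), ('Algebra', 'Grace'), ('Geometry', 'Ada')]
```

Both intermediate results sit in memory until the merge finishes, which is the extra compute and memory cost described above.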

Performance comparison between relational and network models

In a performance case study using real data, Raima found that network database models had 23x better insert performance and 123x faster query performance.

So why are relational databases the leading database solution?

Usability.

The relational model was more flexible to changes, and its declarative syntax simplified the job for programmers.

And Moore’s law was working its magic in the background. The cost of computing continued to decrease, and eventually the increased computing cost of relational models was outweighed by the productivity gains.

Fast forward 50 years, and now the most expensive resource in a data center is the CPU. Sign up for Arctype and get notified of a future post where I discuss what databases might look like over the next 50 years.

The Rise and Reign of SQL

And finally, we've come to the arrival of the SQL that we all love.

4 years after the publication of Codd’s paper, Donald Chamberlin and Raymond Boyce published “SEQUEL: A Structured English Query Language”.

They described SEQUEL as “a set of simple operations on tabular structures, […] of equivalent power to the first order predicate calculus.” IBM saw the potential and moved quickly to develop the first version of SEQUEL as part of their System R project in the early 1970s.

The name would later change to SQL due to trademark issues with the UK-based Hawker Siddeley aircraft company.

The next big step in SQL's adoption came more than a decade later. In 1986, the American National Standards Institute (ANSI) published the first official SQL standard, SQL-86, which the International Organization for Standardization (ISO) adopted the following year. The standard broke down SQL into several parts:

Data Definition Language (DDL): commands to define and modify schemas and relations
Data Manipulation Language (DML): commands to query, insert and delete information from a database
Transaction Control: commands specifying the timing of transactions
Integrity: commands to set constraints in the information in a database
Views: commands to define Views
Authorization: commands to specify user access
Embedded SQL: commands that specify how to embed SQL in other languages
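Several of these parts can be seen side by side in a short sketch. It uses SQLite through Python's sqlite3 module, which is an assumption made for convenience: SQLite has no authorization commands such as GRANT, and issuing SQL from a host program through an API is only loosely in the spirit of embedded SQL. The account table and rich_account view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL plus integrity: define a relation with a constraint on its data.
conn.execute("""
CREATE TABLE account (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
)""")

# Views: a named query over the base table.
conn.execute("CREATE VIEW rich_account AS SELECT id FROM account WHERE balance >= 100")

# DML: insert rows.
conn.execute("INSERT INTO account VALUES (1, 150), (2, 20)")
conn.commit()

# Transaction control: both updates succeed together or neither is kept.
try:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
        conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 2")  # violates CHECK
except sqlite3.IntegrityError:
    pass

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
rich = conn.execute("SELECT id FROM rich_account").fetchall()
print(balances)  # [(1, 150), (2, 20)]
print(rich)      # [(1,)]
```

The second update breaks the CHECK constraint, so the whole transaction is rolled back and the first account keeps its original balance.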

Competitors to SQL

From 1974 to today, numerous competitors have tried to take market share from SQL as the dominant query language. These new syntaxes often catered to a specific, new technology:

Lisp -> CLSQL
.NET -> LINQ
Ruby on Rails -> ActiveRecord

Fast forward 35 years and SQL is still ubiquitous in the database world. How has SQL maintained its reign as a query language, and what can we learn from its story?

The Secret to SQL’s 50 Year Reign - And What We Can Learn

Above: Stack Overflow Developer Survey, 2017

We started this story with Bachman’s introduction of the first database management system, IDS. We talked about how the shift to disk storage and high-level programming necessitated a new way of working with data. Then CODASYL came and standardized database management. IDS and CODASYL introduced the new network data model, and finally Codd dropped the relational model.

This happened over eight years.

How did SQL manage to stick around for the next 50 years? I think these are the four main reasons:

Built on first principles
Bushnell’s law
Listening and adapting
Adoption of APIs

Built on First Principles

A first principle is a foundational proposition that cannot be deduced from any other proposition or assumption. For instance, combining hydrocarbons with oxygen creates a chemical reaction, and this is still the principle that powers the internal combustion engine in every car.

In 1970, Codd created a new first principle for databases: tuple relational calculus. This new logic led to the relational model, which then led to SQL. Tuple relational calculus is the chemical reaction, relational models are the internal combustion engines, and SQL is the car.

In a later post, we’ll discuss new technologies that are trying to create the electric engine for databases.

Bushnell’s Law

Building on first principles alone cannot guarantee success. Assembly is as close as programmers can get to typing 1s and 0s, but it was still replaced with COBOL (and later C).

The missing ingredient was usability.

We saw the same story play out with the switch from the network model to the relational model. The network model had faster performance, but every company today uses relational databases because of their simplicity (to get started).

"The best games are easy to learn but hard to master" - Nolan Bushnell, founder of Atari

Nolan Bushnell knew the secret to getting people to use a new product. Assembly unfortunately was both difficult to learn and difficult to master.

SQL found the perfect balance. With ~10 SQL commands, anyone can learn the 20% that will get you 80% of the way there. But there is a long path of indexing, views, and optimization to becoming a master.

Listening and Adapting

Query languages are not timeless monoliths but adaptive groups of standards that change over time. The SQL standard has continued to adapt over time and incorporate feedback from its users.

Since the original conception we have seen 10 different SQL standards, all with important updates. Here are three big ones:

SQL:1999: Added regular expression matching, recursive queries (e.g. transitive closure), triggers, support for procedural and control-of-flow statements, non-scalar types (arrays), and some object-oriented features (e.g. structured types). Support for embedding SQL in Java (SQL/OLB) and vice versa (SQL/JRT).
SQL:2003: Introduced XML-related features (SQL/XML), window functions, standardized sequences, and columns with auto-generated values (including identity-columns).
SQL:2016: Adds row pattern matching, polymorphic table functions, JSON.

SQL also demonstrates the power of creating rails that other products can build on. Instead of enforcing a syntax, SQL provides a standard for each database to create its own implementation (T-SQL, MySQL, PL/pgSQL, etc.).

Adoption of APIs

The final secret behind SQL’s success has been the rise of application programming interfaces (APIs). APIs simplify programming by abstracting the underlying implementation and only exposing objects or actions the developer needs.

APIs allow SQL to continue to adapt to new technologies with specialized syntax. In 2006, Hadoop introduced the Hadoop Distributed File System (HDFS), which was initially inaccessible to SQL. Then in 2013 Cloudera released Impala (now Apache Impala), which allowed developers to use SQL to query data stored in HDFS.

The Fascinating Story of SQL

SQL is one of the most ubiquitous programming languages today, but we often forget how long a history it has had. Its journey started at the dawn of modern computing and was brought to life by two Turing Award recipients.

I've shared my thoughts on why SQL has been able to maintain its dominance: first principles, Bushnell’s law, adapting, and APIs. Leave a comment and let me know what you think were the major factors that contributed to SQL’s success.

There is one more technology to talk about that hasn’t changed in 50 years.

SQL editors.

Working with databases is becoming an increasingly collaborative process as more people learn SQL. Developers today may be seen working with individuals on a marketing team to analyze user data, or debugging queries with a data scientist.

Arctype has built a collaborative SQL editor that allows you to easily share databases, queries, and dashboards with anyone. Join our growing community and try out Arctype today.

Becoming a SQL Guru with Recursive CTEs

Derek Xiao — Wed, 30 Dec 2020 20:08:22 +0000

Arctype writer: Daniel Lifflander

Intro to CTEs
Organizing Complex Queries
Unlocking the Power of CTEs with Recursion
Recursive CTEs in Action

Intro to CTEs

It's easy to get lost in a maze of subqueries, derived tables, and temporary tables when working with complex SQL queries. Postgres' common table expressions, or CTEs, are a lesser-known feature, but they can be a useful structuring tool to craft a SQL query in a more readable and friendly way.

CTEs begin with a WITH clause and allow you to execute a query whose results will be available later to subsequent queries. If you have a query that relies on the results of another query, a reference of sorts, the CTE will allow Postgres to materialize the results of the reference query, ensuring that the query is only run once and its results are readily available to other queries farther down the line. (Note that this can be a double-edged sword, as materialization prevents the Postgres query planner from optimizing across the boundary of the reference query. Since PostgreSQL 12, the planner may instead inline a non-recursive, side-effect-free CTE that is referenced only once, and the MATERIALIZED and NOT MATERIALIZED keywords let you choose explicitly.)

The sequential list of "code blocks" created by CTEs adds readability to SQL scripts because every block is dependent on the ones above it. In contrast, it can be difficult to trace dependencies between derived tables and parse through nested subqueries (especially as their depth increases).

Organizing Complex Queries

Derived Table Example

Temporary Table Example

This will require two separate queries.

CTE Example

Note that the query above depends on a, which is listed first with the CTE, but as a derived table it is listed inside the query. CTEs allow you to logically order queries that are dependent on others. As queries get more complex, organization becomes key to reducing errors. CTEs allow you to organize code into distinct blocks with a clear dependency hierarchy.
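As a rough sketch of the contrast between a derived table and a CTE, the snippet below writes the same query both ways, naming the inner query a in each case. The orders table and its rows are hypothetical, and SQLite (through Python's sqlite3 module) stands in for Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5);
""")

# Derived table: the dependency "a" is buried inside the FROM clause.
derived_rows = conn.execute("""
SELECT customer, total
FROM (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
) AS a
WHERE total > 20
""").fetchall()

# CTE: the dependency "a" is declared first, then used by the main query.
cte_rows = conn.execute("""
WITH a AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM a
WHERE total > 20
""").fetchall()

print(derived_rows)  # [('ann', 40)]
print(cte_rows)      # [('ann', 40)]
```

Both forms return the same rows. The difference is only in where the reader meets the dependency.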

Unlocking the Power of CTEs with Recursion

You can also make CTEs more than syntactic sugar by adding the recursive keyword! Once you add the recursive keyword, Postgres will now allow you to reference your CTE from within itself, allowing you to “generate” rows based on recursion.

Here’s an example using Postgres CTEs to implement the Fibonacci sequence:

Running this, we’d get the first 10:
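Here is a minimal sketch of a recursive CTE that generates the first 10 Fibonacci numbers, assuming the sequence starts at 0. SQLite (through Python's sqlite3 module) stands in for Postgres, and the CTE name fib and the column names n, a, and b are illustrative.

```python
import sqlite3

FIB_QUERY = """
WITH RECURSIVE fib(n, a, b) AS (
    SELECT 1, 0, 1                -- base row: position 1, current value 0, next value 1
    UNION ALL
    SELECT n + 1, b, a + b        -- recursive step: shift the pair forward by one
    FROM fib
    WHERE n < 10                  -- stop after 10 rows
)
SELECT a FROM fib ORDER BY n
"""

conn = sqlite3.connect(":memory:")
fib_numbers = [row[0] for row in conn.execute(FIB_QUERY)]
print(fib_numbers)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The WHERE clause in the recursive half is what ends the recursion. Without it the query would keep generating rows indefinitely.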

It's probably not too often we find ourselves in a situation where we need Fibonacci numbers in SQL. Let’s continue on and take a look at a real problem I solved using this Postgres feature.

Recursive CTEs in Action

In the manufacturing industry, a bill of materials (or BoM) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an end product … BOMs are of hierarchical nature, with the top level representing the finished product which may be a sub-assembly or a completed item. BOMs that describe the sub-assemblies are referred to as modular BOMs

More simply, a BoM is a list of parts that a product is assembled from. Each part may consist of other parts, creating a hierarchy. A car is a final product that is composed of many parts. A door is an example of a part of a car, and its subparts include a window, a handle, various switches, and so forth. Recursive CTEs are great for hierarchical data! Let’s take a look at how we could model this hierarchy and use CTEs to generate a modular BoM sheet.

The part table stores our parts.

The part_hierarchy table stores a simple parent-to-child reference between parts, as well as a quantity if the part happens to need more than one of the child part for assembly. The check constraints prevent a part from referencing itself circularly.

Let's insert some parts.

And some relations, approximately resembling the car example mentioned above.

In our database we now have 5 parts and some relationships between them. If we wanted to see how many direct subassemblies a door has, we could query something like this, though it would only show us direct descendants:

Finally, let's now construct a query to list all of the parts and their subassemblies of our car.

The union all is the key part to this CTE. The first part of this UNION selects the base part, the second half selects any child parts by joining to the part_hierarchy table and calling itself (inducing recursion), then combines those results with the parent via the union. Without recursion, you'd have to add a join for every level of the hierarchy you needed to support.

The level value is introduced here to track how deep the recursion has run. It is hard-coded at 0 to represent the parent. As the query "unwraps" and calls itself, the level increases by 1 to indicate how many generations removed from the parent part the child part is.

The very last part that selects from the CTE pads the name with spaces based on the level, which is a quick and dirty way to visualize the hierarchy.
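Putting the pieces together, here is a hedged sketch of the whole walkthrough. The column names, the five sample parts, their quantities, and the CTE name bom are assumptions made for illustration; only the part and part_hierarchy table names come from the text above. SQLite (through Python's sqlite3 module) stands in for Postgres, and the padding is done in Python rather than in the final SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE part_hierarchy (
    parent_id INTEGER NOT NULL REFERENCES part(id),
    child_id  INTEGER NOT NULL REFERENCES part(id),
    quantity  INTEGER NOT NULL DEFAULT 1,
    CHECK (parent_id <> child_id)   -- a part cannot be its own child
);
INSERT INTO part (id, name) VALUES
    (1, 'car'), (2, 'door'), (3, 'window'), (4, 'handle'), (5, 'switch');
INSERT INTO part_hierarchy (parent_id, child_id, quantity) VALUES
    (1, 2, 4), (2, 3, 1), (2, 4, 1), (2, 5, 2);
""")

BOM_QUERY = """
WITH RECURSIVE bom(id, name, quantity, level) AS (
    -- Base part: level is hard-coded at 0.
    SELECT id, name, 1, 0 FROM part WHERE name = ?
    UNION ALL
    -- Child parts: join back to the CTE itself, one level deeper each time.
    SELECT c.id, c.name, h.quantity, bom.level + 1
    FROM bom
    JOIN part_hierarchy AS h ON h.parent_id = bom.id
    JOIN part AS c ON c.id = h.child_id
)
SELECT level, name, quantity FROM bom ORDER BY level, name
"""

bom_rows = conn.execute(BOM_QUERY, ("car",)).fetchall()
for level, name, quantity in bom_rows:
    # Indent by level as a quick way to visualize the hierarchy.
    print("  " * level + f"{name} x{quantity}")
```

Adding a sixth part beneath the window would need no change to the query, since the recursion follows the hierarchy to whatever depth the data has.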

From here, you could use aggregate functions and rollup to calculate the costs of specific assemblies.

This example is definitely simplified from what I ended up using to actually solve this problem. Supporting part versioning and branching and other business requirements quickly complicates what this solution would look like in the real world. Recursive CTEs in SQL are not the only way to store and query hierarchical data, but for certain data types, sizes, and speed requirements, they are a quick and convenient feature to have in your wheelhouse.

Arctype

Now you're one step closer to becoming a SQL guru, but don't let an outdated SQL editor hold you back. Check out Arctype today and experience the modern SQL editor.

Working with geospatial data in Postgres.

Derek Xiao — Mon, 28 Dec 2020 19:17:12 +0000

PostgreSQL has several extensions that allow spatial and geometry data to be treated as first-class objects within a PostgreSQL database.

Working with geospatial data
Introduction to PostGIS
Related extensions

Working with geospatial data

There are a variety of scenarios in which you may want to work with geospatial data in Postgres for your application, including:

Working with census data
Storing addresses
Calculating the distance between two paths
Storing PointCloud data of the physical world
Tracking shipping data
Tracking cars and delivery vehicles
Visualization of raster data

PostgreSQL offers extensions for working with geospatial data that allow you to treat that data as first-class objects in your database. Treating data as objects allows developers to create more powerful applications that can be built on top of data about the objects and relationships between them in the physical world.

PostGIS

The primary spatial-data extension is PostGIS. PostGIS (the GIS stands for Geographic Information Systems) is an open-source extension of the PostgreSQL database that lets you work with geographic objects that integrate directly with your database. With PostGIS, geographic and spatial data can be treated as first-class objects in your database.

By adding the PostGIS extension to your PostgreSQL database, you can work seamlessly with geospatial data without having to convert it from the format the rest of your application uses. The extension can also determine relationships between spatial objects, such as the distance between two objects in your database, and you can use PostGIS to render visualizations of this data.

Working with data such as cities and geometry data is as simple as something like:

SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
AND city.name = 'Gotham';
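To get a feel for the question ST_Contains answers in that query, here is a conceptual Python sketch of a point-in-polygon test using ray casting. This is not how PostGIS implements the function, it ignores coordinate systems, and it does not treat points exactly on the boundary specially. The contains function and the coordinates are hypothetical.

```python
def contains(polygon, point):
    """Ray-casting test: is the point inside the polygon (boundary cases not handled)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical geometries: a square city and two point locations.
gotham = [(0, 0), (10, 0), (10, 10), (0, 10)]
heroes = {"Batman": (3, 4), "Superman": (25, 4)}

in_gotham = [name for name, geom in heroes.items() if contains(gotham, geom)]
print(in_gotham)  # ['Batman']
```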

PostGIS includes:

Spatial types: Point, Line, Polygon, etc.

The hierarchy of these spatial types (from Introduction to PostGIS) is below:

Spatial indexing: efficiently index spatial relationships
Spatial functions: query spatial properties and the relationships between them, with functions for analyzing geometric components, determining spatial relationships, and manipulating geometries

In most databases, data is stored in rows and columns. With PostGIS, you can actually store data in a geometry column. This column stores data in a spatial coordinate system that’s defined by an SRID (Spatial Reference Identifier). This allows your database structure to reflect the spatial data that’s stored in the database.

Related Extensions

There are other PostgreSQL extensions related to PostGIS for working with spatial data, too:

pgRouting - an extension of PostGIS itself; pgRouting adds geospatial routing functionality such as shortest distance, driving distance, and traveling salesman
ogr_fdw - a foreign data wrapper for reading other spatial and non-spatial data sources as tables in PostgreSQL
pgpointcloud - a PostgreSQL extension and loader for storing point cloud data in PostgreSQL. Point cloud data describes the physical environment, is gathered using 3D cameras, and is used in application areas such as AR, VR, and robotics

PostgreSQL’s extensions for working with geospatial data allow you to work with data as first-class objects in your database. Check out Arctype to discover the modern SQL editor for working with databases.
