Overview
An almost incomprehensible amount of data is created daily, and figures grow ever-increasingly each year. For future usage, this data has to be stored electronically for analysis and research to produce actionable insights. In this article, I will explain one of the most widely used data management systems, SQL, focusing on Relational Databases.
What is SQL?
SQL stands for Structured Query Language, pronounced popularly as se-qu-el
. SQL is a standard language for managing, manipulating, and querying databases basically a language used to communicate with databases. It was developed at IBM in the early 1970s and has undergone tremendous development updates, adapting to the emerging technologies. There are different types of SQL Database Management Systems like PostgreSQL, MySQL, SQL Server, Oracle, SQLite, etc. SQL stores data in a tabular format with columns and rows. The type of data stored in these databases is known as structured data.
SQL can be further broken down into 4 sublanguages for tackling different jobs, namely:
-
Data Definition Language(DML): It is used to modify the structure of the database(used to create and modify tables, views etc). The commands used are
CREATE
,ALTER
,DROP
, andTRUNCATE
. ExamplesCREATE TABLE
CREATE TABLE customers (id INT,
name VARCHAR(255));
ALTER TABLE
ALTER TABLE customers ADD COLUMN email VARCHAR(255);
DROP TABLE
DROP TABLE customers;
-
Data Query Language(DQL): It is used for querying data. DQL command is mainly the
SELECT
statement.
SELECT *
FROM customers;
-
Data Manipulation Language(DML):
It is used to act on the data itself. Commands include
INSERT
,UPDATE
,DELETE
, andMERGE
. ExamplesINSERT
INSERT INTO customers (id, name, email)
(1, 'John Doe', 'john@example.com')
(2, 'Marya Akoth', 'marya.akoth@mail.com');
UPDATE
UPDATE customers
SET name = 'Jane Doe' WHERE id = 1;
DELETE
DELETE FROM customers WHERE id = 1;
-
Data Control Language(DCL): It is used to manage access to a database and its data. It is usually used by database administrators to manage access in an organisation. Commands include
GRANT
andREVOKE
.
Why is SQL the foundation of Data Analytics?
- In the data universe, data engineers and database administrators will use SQL to ensure that everybody in their organisation can access the data they need, sometimes depending on the permissions granted to the users.
- Data Scientists will use SQL to query data stored in a database to train and load data into their Machine Learning Models.
- Data analysts will use data to query tables of data and derive insights.
What are the benefits of SQL?
- Standard Database Management System: SQL is the standard language for retrieving data and interacting with databases. This is because most software applications and scripting languages, such as Python and R, are compatible with SQL.
- Scalability: SQL is a perfect tool for processing large datasets compared to tools like spreadsheets, where handling many rows might not be effective. This gives SQL an upper hand since it can easily handle millions of entries. Nevertheless, its scalability serves as a key benefit to meet the needs of modern data analysis.
- Easy to learn and effective: When compared to other programming languages, SQL is simple to learn and use, enabling data analysts to achieve immediate and effective results. Its clear syntax and straightforward commands empower analysts to execute complex computations efficiently.
Handling Structured data: Someone who works as an analyst should be able to comprehend SQL to perform different tasks on structured data. This includes generating and managing data sets kept in structured databases like Oracle, Microsoft SQL Server, and MySQL.
Data Preparation and Wrangling: Data cleaning and preprocessing are among the initial steps involved in data analysis. Unexpectedly, SQL is essential for performing these functions, especially when utilizing big data tools. This facilitates data manipulation in a manner that simplifies analysis while also helping the analyst avoid numerous errors.
A practical, slightly deeper understanding of common terms and syntax used in SQL.
Now that we have defined SQL, its sublanguages, and its benefits, let's dive deeper and deepen our understanding of the basic terms and syntax associated with SQL.
Let's dive into it.
Step 1: Create the Schema
CREATE SCHEMA ecommerce_ke;
USE ecommerce_ke;
Clause Tip: CREATE SCHEMA
defines a namespace for organizing your tables and other database objects.
Step 2: Create the customers
Table
CREATE TABLE customers (
customer_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
city VARCHAR(50),
registration_date DATE
);
Step 3: Create the orders
Table
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT,
order_date DATE,
order_amount DECIMAL(10, 2),
status VARCHAR(20),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Step 4: Insert Sample Data (with explicit customer_id
)
INSERT INTO customers (customer_id, first_name, last_name, email, city, registration_date) VALUES
(1, 'Akinyi', 'Otieno', 'akinyi.otieno@example.com', 'Nairobi', '2023-01-15'),
(2, 'Kamau', 'Mwangi', 'kamau.mwangi@example.com', 'Mombasa', '2023-02-10'),
(3, 'Wanjiru', 'Kariuki', 'wanjiru.kariuki@example.com', 'Kisumu', '2023-03-05'),
(4, 'Mutiso', 'Nzomo', 'mutiso.nzomo@example.com', 'Nakuru', '2023-03-18'),
(5, 'Naliaka', 'Wekesa', 'naliaka.wekesa@example.com', 'Eldoret', '2023-04-01');
INSERT INTO orders (order_id, customer_id, order_date, order_amount, status) VALUES
(1, 1, '2023-03-01', 5000.00, 'Completed'),
(2, 2, '2023-03-02', 7500.00, 'Completed'),
(3, 3, '2023-03-04', 3000.00, 'Pending'),
(4, 1, '2023-03-10', 2000.00, 'Completed'),
(5, 4, '2023-04-12', 10000.00, 'Cancelled'),
(6, 5, '2023-04-14', 4500.00, 'Completed');
Step 5: Query Examples
1. SELECT
and FROM
SELECT first_name, last_name, city FROM customers;
Clause Tip: SELECT
retrieves specific columns, and FROM
specifies the table source.
Output:
first_name | last_name | city
-----------|-----------|---------
Akinyi | Otieno | Nairobi
Kamau | Mwangi | Mombasa
Wanjiru | Kariuki | Kisumu
Mutiso | Nzomo | Nakuru
Naliaka | Wekesa | Eldoret
2. WHERE
SELECT * FROM orders WHERE status = 'Completed';
Clause Tip: WHERE
filters records based on specified conditions.
Output:
order_id | customer_id | order_date | order_amount | status
---------|-------------|------------|--------------|----------
1 | 1 | 2023-03-01 | 5000.00 | Completed
2 | 2 | 2023-03-02 | 7500.00 | Completed
4 | 1 | 2023-03-10 | 2000.00 | Completed
6 | 5 | 2023-04-14 | 4500.00 | Completed
3. GROUP BY
SELECT city, COUNT(*) AS total_customers FROM customers GROUP BY city;
Clause Tip: GROUP BY
aggregates data by one or more columns.
Output:
city | total_customers
---------|------------------
Nairobi | 1
Mombasa | 1
Kisumu | 1
Nakuru | 1
Eldoret | 1
4. HAVING
SELECT customer_id, COUNT(*) AS total_orders FROM orders GROUP BY customer_id HAVING COUNT(*) > 1;
Clause Tip: HAVING
filters groups created by GROUP BY
, unlike WHERE
which filters rows.
Output:
customer_id | total_orders
-------------|--------------
1 | 2
5. ORDER BY
SELECT * FROM orders ORDER BY order_amount DESC;
Clause Tip: ORDER BY
sorts results by one or more columns.
Output:
order_id | customer_id | order_amount | status
---------|-------------|--------------|---------
5 | 4 | 10000.00 | Cancelled
2 | 2 | 7500.00 | Completed
1 | 1 | 5000.00 | Completed
6 | 5 | 4500.00 | Completed
3 | 3 | 3000.00 | Pending
4 | 1 | 2000.00 | Completed
6. LIMIT
SELECT * FROM customers ORDER BY registration_date DESC LIMIT 3;
Clause Tip: LIMIT
restricts the number of rows returned.
Output:
first_name | registration_date
-----------|-------------------
Naliaka | 2023-04-01
Mutiso | 2023-03-18
Wanjiru | 2023-03-05
Conclusion
SQL is a tool that spans various industries and is also the backbone of the data industry. Therefore, for someone looking to delve into data will have to learn SQL.
Top comments (0)