DEV Community

Cover image for SQL For Data Analytics in a Nutshell
Emmanuel Kiriinya
Emmanuel Kiriinya

Posted on

4

SQL For Data Analytics in a Nutshell

Overview

An almost incomprehensible amount of data is created daily, and figures grow ever-increasingly each year. For future usage, this data has to be stored electronically for analysis and research to produce actionable insights. In this article, I will explain one of the most widely used data management systems, SQL, focusing on Relational Databases.

What is SQL?

SQL stands for Structured Query Language, pronounced popularly as se-qu-el. SQL is a standard language for managing, manipulating, and querying databases basically a language used to communicate with databases. It was developed at IBM in the early 1970s and has undergone tremendous development updates, adapting to the emerging technologies. There are different types of SQL Database Management Systems like PostgreSQL, MySQL, SQL Server, Oracle, SQLite, etc. SQL stores data in a tabular format with columns and rows. The type of data stored in these databases is known as structured data.
SQL can be further broken down into 4 sublanguages for tackling different jobs, namely:

  • Data Definition Language(DML): It is used to modify the structure of the database(used to create and modify tables, views etc). The commands used are CREATE, ALTER, DROP, and TRUNCATE. Examples CREATE TABLE
CREATE TABLE customers (id INT,
 name VARCHAR(255));

Enter fullscreen mode Exit fullscreen mode

ALTER TABLE

ALTER TABLE customers ADD COLUMN email VARCHAR(255);
Enter fullscreen mode Exit fullscreen mode

DROP TABLE

DROP TABLE customers;
Enter fullscreen mode Exit fullscreen mode
  • Data Query Language(DQL): It is used for querying data. DQL command is mainly the SELECT statement.
SELECT *
FROM customers;
Enter fullscreen mode Exit fullscreen mode
  • Data Manipulation Language(DML): It is used to act on the data itself. Commands include INSERT, UPDATE, DELETE, and MERGE. Examples INSERT
INSERT INTO customers (id, name, email) 
(1, 'John Doe', 'john@example.com')
(2, 'Marya Akoth', 'marya.akoth@mail.com');
Enter fullscreen mode Exit fullscreen mode

UPDATE

UPDATE customers 
SET name = 'Jane Doe' WHERE id = 1;
Enter fullscreen mode Exit fullscreen mode

DELETE

DELETE FROM customers WHERE id = 1;
Enter fullscreen mode Exit fullscreen mode
  • Data Control Language(DCL): It is used to manage access to a database and its data. It is usually used by database administrators to manage access in an organisation. Commands include GRANT and REVOKE.

Why is SQL the foundation of Data Analytics?

  1. In the data universe, data engineers and database administrators will use SQL to ensure that everybody in their organisation can access the data they need, sometimes depending on the permissions granted to the users.
  2. Data Scientists will use SQL to query data stored in a database to train and load data into their Machine Learning Models.
  3. Data analysts will use data to query tables of data and derive insights.

What are the benefits of SQL?

  1. Standard Database Management System: SQL is the standard language for retrieving data and interacting with databases. This is because most software applications and scripting languages, such as Python and R, are compatible with SQL.
  2. Scalability: SQL is a perfect tool for processing large datasets compared to tools like spreadsheets, where handling many rows might not be effective. This gives SQL an upper hand since it can easily handle millions of entries. Nevertheless, its scalability serves as a key benefit to meet the needs of modern data analysis.
  3. Easy to learn and effective: When compared to other programming languages, SQL is simple to learn and use, enabling data analysts to achieve immediate and effective results. Its clear syntax and straightforward commands empower analysts to execute complex computations efficiently.
  4. Handling Structured data: Someone who works as an analyst should be able to comprehend SQL to perform different tasks on structured data. This includes generating and managing data sets kept in structured databases like Oracle, Microsoft SQL Server, and MySQL.

  5. Data Preparation and Wrangling: Data cleaning and preprocessing are among the initial steps involved in data analysis. Unexpectedly, SQL is essential for performing these functions, especially when utilizing big data tools. This facilitates data manipulation in a manner that simplifies analysis while also helping the analyst avoid numerous errors.

A practical, slightly deeper understanding of common terms and syntax used in SQL.

Now that we have defined SQL, its sublanguages, and its benefits, let's dive deeper and deepen our understanding of the basic terms and syntax associated with SQL.
Let's dive into it.

Step 1: Create the Schema

CREATE SCHEMA ecommerce_ke;
USE ecommerce_ke;
Enter fullscreen mode Exit fullscreen mode

Clause Tip: CREATE SCHEMA defines a namespace for organizing your tables and other database objects.


Step 2: Create the customers Table

CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    city VARCHAR(50),
    registration_date DATE
);
Enter fullscreen mode Exit fullscreen mode

Step 3: Create the orders Table

CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    order_date DATE,
    order_amount DECIMAL(10, 2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Enter fullscreen mode Exit fullscreen mode

Step 4: Insert Sample Data (with explicit customer_id)

INSERT INTO customers (customer_id, first_name, last_name, email, city, registration_date) VALUES
(1, 'Akinyi', 'Otieno', 'akinyi.otieno@example.com', 'Nairobi', '2023-01-15'),
(2, 'Kamau', 'Mwangi', 'kamau.mwangi@example.com', 'Mombasa', '2023-02-10'),
(3, 'Wanjiru', 'Kariuki', 'wanjiru.kariuki@example.com', 'Kisumu', '2023-03-05'),
(4, 'Mutiso', 'Nzomo', 'mutiso.nzomo@example.com', 'Nakuru', '2023-03-18'),
(5, 'Naliaka', 'Wekesa', 'naliaka.wekesa@example.com', 'Eldoret', '2023-04-01');

INSERT INTO orders (order_id, customer_id, order_date, order_amount, status) VALUES
(1, 1, '2023-03-01', 5000.00, 'Completed'),
(2, 2, '2023-03-02', 7500.00, 'Completed'),
(3, 3, '2023-03-04', 3000.00, 'Pending'),
(4, 1, '2023-03-10', 2000.00, 'Completed'),
(5, 4, '2023-04-12', 10000.00, 'Cancelled'),
(6, 5, '2023-04-14', 4500.00, 'Completed');
Enter fullscreen mode Exit fullscreen mode

Step 5: Query Examples

1. SELECT and FROM

SELECT first_name, last_name, city FROM customers;
Enter fullscreen mode Exit fullscreen mode

Clause Tip: SELECT retrieves specific columns, and FROM specifies the table source.

Output:

first_name | last_name | city
-----------|-----------|---------
Akinyi     | Otieno    | Nairobi
Kamau      | Mwangi    | Mombasa
Wanjiru    | Kariuki   | Kisumu
Mutiso     | Nzomo     | Nakuru
Naliaka    | Wekesa    | Eldoret
Enter fullscreen mode Exit fullscreen mode

2. WHERE

SELECT * FROM orders WHERE status = 'Completed';
Enter fullscreen mode Exit fullscreen mode

Clause Tip: WHERE filters records based on specified conditions.

Output:

order_id | customer_id | order_date | order_amount | status
---------|-------------|------------|--------------|----------
1        | 1           | 2023-03-01 | 5000.00      | Completed
2        | 2           | 2023-03-02 | 7500.00      | Completed
4        | 1           | 2023-03-10 | 2000.00      | Completed
6        | 5           | 2023-04-14 | 4500.00      | Completed
Enter fullscreen mode Exit fullscreen mode

3. GROUP BY

SELECT city, COUNT(*) AS total_customers FROM customers GROUP BY city;
Enter fullscreen mode Exit fullscreen mode

Clause Tip: GROUP BY aggregates data by one or more columns.

Output:

city     | total_customers
---------|------------------
Nairobi  | 1
Mombasa  | 1
Kisumu   | 1
Nakuru   | 1
Eldoret  | 1
Enter fullscreen mode Exit fullscreen mode

4. HAVING

SELECT customer_id, COUNT(*) AS total_orders FROM orders GROUP BY customer_id HAVING COUNT(*) > 1;
Enter fullscreen mode Exit fullscreen mode

Clause Tip: HAVING filters groups created by GROUP BY, unlike WHERE which filters rows.

Output:

customer_id | total_orders
-------------|--------------
1            | 2
Enter fullscreen mode Exit fullscreen mode

5. ORDER BY

SELECT * FROM orders ORDER BY order_amount DESC;
Enter fullscreen mode Exit fullscreen mode

Clause Tip: ORDER BY sorts results by one or more columns.

Output:

order_id | customer_id | order_amount | status
---------|-------------|--------------|---------
5        | 4           | 10000.00     | Cancelled
2        | 2           | 7500.00      | Completed
1        | 1           | 5000.00      | Completed
6        | 5           | 4500.00      | Completed
3        | 3           | 3000.00      | Pending
4        | 1           | 2000.00      | Completed
Enter fullscreen mode Exit fullscreen mode

6. LIMIT

SELECT * FROM customers ORDER BY registration_date DESC LIMIT 3;
Enter fullscreen mode Exit fullscreen mode

Clause Tip: LIMIT restricts the number of rows returned.

Output:

first_name | registration_date
-----------|-------------------
Naliaka    | 2023-04-01
Mutiso     | 2023-03-18
Wanjiru    | 2023-03-05
Enter fullscreen mode Exit fullscreen mode

Conclusion

SQL is a tool that spans various industries and is also the backbone of the data industry. Therefore, for someone looking to delve into data will have to learn SQL.

Heroku

Build AI apps faster with Heroku.

Heroku makes it easy to build with AI, without the complexity of managing your own AI services. Access leading AI models and build faster with Managed Inference and Agents, and extend your AI with MCP.

Get Started

Top comments (0)

Heroku

Build AI apps faster with Heroku.

Heroku makes it easy to build with AI, without the complexity of managing your own AI services. Access leading AI models and build faster with Managed Inference and Agents, and extend your AI with MCP.

Get Started

👋 Kindness is contagious

Discover fresh viewpoints in this insightful post, supported by our vibrant DEV Community. Every developer’s experience matters—add your thoughts and help us grow together.

A simple “thank you” can uplift the author and spark new discussions—leave yours below!

On DEV, knowledge-sharing connects us and drives innovation. Found this useful? A quick note of appreciation makes a real impact.

Okay