DEV Community

Cover image for SQL NORMALIZATION
Paulet Wairagu
Paulet Wairagu

Posted on

SQL NORMALIZATION

Introduction:
When designing a relational database, it's important to focus on the concept of database normalization. This process is vital for efficient data organization and maintaining accuracy. In this article, we will explore the initial three normal forms, including 1NF, 2NF, and 3NF. We'll also provide examples written in SQL to help clarify each level of normalization.

1. First Normal Form (1NF):
The first step in the normalization process is 1NF. It requires each column to have a unique name and to contain atomic values. Here's an example:

CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(50),
    email VARCHAR(100),
    phone_numbers VARCHAR(255)
);
Enter fullscreen mode Exit fullscreen mode

In the above table, the "phone_numbers" column violates 1NF because it can contain multiple phone numbers in a single cell. To achieve 1NF, you'd create a separate table for phone numbers and link it to the Customers table.

CREATE TABLE PhoneNumbers (
    phone_id INT PRIMARY KEY,
    customer_id INT,
    phone_number VARCHAR(15),
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
Enter fullscreen mode Exit fullscreen mode

In the "PhoneNumbers" table:

  • "phone_id" is a unique identifier for each phone number.
  • "customer_id" is a foreign key that references the "customer_id" in the "Customers" table, establishing a relationship between the two tables.
  • "phone_number" contains individual phone numbers, ensuring that each entry is atomic.

The "PhoneNumbers" table is linked to the "Customers" table through the "customer_id" foreign key, creating two separate tables and adhering to 1NF by separating multi-valued attributes into a related table.

2. Second Normal Form (2NF):
2NF ensures that each non-key column is fully functionally dependent on the primary key by splitting tables.

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

CREATE TABLE Products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(50)
);
Enter fullscreen mode Exit fullscreen mode

To achieve 2NF, you would split the Orders table into two: one for orders and another for order details. The "quantity" column would then depend on both "order_id" and "product_id."

1. Orders Table:

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    -- Other order-related columns
);
Enter fullscreen mode Exit fullscreen mode

In this table, we store general order information like the order date, customer_id, and other order-specific details. The "order_id" column serves as the primary key.

2. OrderDetails Table:

CREATE TABLE OrderDetails (
    order_detail_id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    -- Other order detail columns
);
Enter fullscreen mode Exit fullscreen mode

In the "OrderDetails" table:

  • "order_detail_id" is a unique identifier for each order detail.
  • "order_id" is a foreign key that references the "order_id" in the "Orders" table, establishing a relationship between the two tables.
  • "product_id" is a foreign key that references the "product_id" from the "Products" table (assuming you have a "Products" table to store product information).
  • "quantity" contains the quantity of a specific product in a particular order.

By splitting the original "Orders" table into these two tables, you ensure that each table contains information relevant to a specific entity (orders or order details). This separation adheres to 2NF because now each non-key column (like "quantity") is fully functionally dependent on the primary key of its respective table.

3. Third Normal Form (3NF):
3NF takes the concept further by ensuring that non-key columns are not transitively dependent on the primary key. It often involves creating additional tables. Consider this example:

CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(50),
    department_id INT,
    department_name VARCHAR(50)
);
Enter fullscreen mode Exit fullscreen mode

In this table, "department_name" depends on "department_id," which itself depends on the primary key. To achieve 3NF, you'd create a separate "Departments" table and link it to the "Employees" table.

1. Employees Table

CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(50),
    department_id INT,
    -- Other employee-related columns
);
Enter fullscreen mode Exit fullscreen mode

In this table, we store employee information, including their name and department_id, which represents the department to which each employee belongs. The "employee_id" column serves as the primary key.

2. Departments Table:

CREATE TABLE Departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50),
    -- Other department-related columns
);
Enter fullscreen mode Exit fullscreen mode

In the "Departments" table:

  • "department_id" is a unique identifier for each department.
  • "department_name" contains the name of each department.

Now, to establish a relationship between these tables:

ALTER TABLE Employees
ADD FOREIGN KEY (department_id) REFERENCES Departments(department_id);
Enter fullscreen mode Exit fullscreen mode

By creating the "Departments" table and linking it to the "Employees" table through the "department_id" foreign key, you adhere to 3NF. This separation ensures that non-key columns (such as "department_name") are not transitively dependent on the primary key of the "Employees" table, as they are now directly dependent on the "department_id" in the "Departments" table.

Conclusion:
Database normalization is a crucial aspect of database design, helping maintain data integrity and efficiency. By following the principles of 1NF, 2NF, and 3NF, you can design a robust database structure that minimizes redundancy and maximizes data reliability.

In practice, normalization often involves more complex scenarios, but understanding these fundamental concepts is essential for building efficient and maintainable databases.

Hackerrank Practice Questions on Normalization

  1. https://www.hackerrank.com/challenges/database-normalization-1-1nf/problem?h_r=internal-search

  2. https://www.hackerrank.com/challenges/database-normalization-3/problem?h_r=internal-search

Image of Datadog

Diagram Like A Pro

Bring your cloud architecture to life with expert tips from AWS and Datadog. In this ebook, AWS Solutions Architects Jason Mimick and James Wenzel reveal pro strategies for building clear, compelling diagrams that make an impact.

Get the Guide

Top comments (0)

ACI image

ACI.dev: Fully Open-source AI Agent Tool-Use Infra (Composio Alternative)

100% open-source tool-use platform (backend, dev portal, integration library, SDK/MCP) that connects your AI agents to 600+ tools with multi-tenant auth, granular permissions, and access through direct function calling or a unified MCP server.

Check out our GitHub!

πŸ‘‹ Kindness is contagious

Engage with a wealth of insights in this thoughtful article, valued within the supportive DEV Community. Coders of every background are welcome to join in and add to our collective wisdom.

A sincere "thank you" often brightens someone’s day. Share your gratitude in the comments below!

On DEV, the act of sharing knowledge eases our journey and fortifies our community ties. Found value in this? A quick thank you to the author can make a significant impact.

Okay