SQL Explained: Normal Forms

Author:Murphy  |  View: 27520  |  Time: 2025-03-22 21:32:36

Normalisation in a database system refers to the process of structuring your data in such a way as to reduce data redundancy and improve data integrity.

It involves breaking down a database into multiple tables and defining relationships between the tables based on a set of rules called normal forms.

The goal is to eliminate redundant data and ensure that data inter-dependencies make sense.

Although there are many different types of normal forms, for most database systems, applying the first three (1NF, 2NF, and 3NF) is sufficient to achieve a well-structured and optimised Database Design.

In this article, we'll start off with a very simple single table that is unnormalised, then apply each of the three normal forms to it and any subsequent tables that come out of the process and see what our final design looks like at the end.

Before that, here's a brief explanation of 1NF, 2NF and 3NF.

First Normal Form (1NF)

For a table to be in 1NF it must satisfy the following criteria.

  • Each row must be uniquely identifiable. This may be done by defining a primary key on one or more columns of the table.
  • Any repeating groups should be removed by separating them into their own separate tables.
  • Ensure that values in each column are atomic and contain a single value.

Second Normal Form (2NF)

For a table to be in 2NF it must satisfy the following criteria.

  • It must already be in 1NF.
  • Partial dependencies are removed by removing any non-key attributes that depend only on part of a composite primary key.
  • All non-key attributes must depend on the entire primary key, not just part of it.

Third Normal Form (3NF)

For a table to be in 3NF it must satisfy the following criteria.

  • It must already be in 2NF.
  • Transitive dependencies are eliminated by removing any non-key attributes that depend on other non-key attributes.
  • All non-key attributes must depend directly on the primary key, not on other non-key attributes.

Plain text descriptions of normalisation like those above can be tricky to get your head around, but as we go through some worked examples that show their application, you should get a clearer idea of the steps involved in normalisation. Not only that, but it should become clear what the advantages of doing normalisation are in the first place.

Let's go through some worked examples of applying what we've learned.

Setting up a test environment

You can use whatever database system you're comfortable with for testing e.g SQLite, Postgres, MySQL etc … but I'll be using Oracle's live SQL website to run my tests. I explained how to get access to it and use this service in a previous article I wrote on using Grouping Sets, Rollup and Cube in SQL. It's completely free to set up and use. The link to that article is below.

SQL Explained: Grouping Sets, Rollup, and Cube

Sample table creation & population

Initially, we only need one table for our examples and our data set is based on Stock price information. It contains a TransactionID, a StockTicker (identifier) of the stock, its price, how many shares were bought/sold (volume) and to which date the price and volume relate.

Note that the to_date(…) syntax shown below is specific to Oracle SQL. If you're not using Oracle check your SQL's documentation for the equivalent syntax.

DROP TABLE StockTransactions;
CREATE TABLE StockTransactions (
    TransactionID INT,
    StockTicker VARCHAR2(10),
    PriceDate DATE,
    Prices VARCHAR2(20),
    Volumes VARCHAR2(20),
    BrokerName VARCHAR2(50),
    BrokerContact VARCHAR2(15),
    BrokerAddress VARCHAR2(100)
);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Prices, Volumes, BrokerName, BrokerContact,BrokerAddress) VALUES
(1, 'AAPL', TO_DATE('2023-01-01', 'YYYY-MM-DD'), '150.00, 152.00', '1000, 1200', 'Broker 1', '123-456-7890','123 Wall St, NY');

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Prices, Volumes, BrokerName, BrokerContact,BrokerAddress) VALUES
(2, 'MSFT', TO_DATE('2023-01-01', 'YYYY-MM-DD'), '250.00, 255.00', '2000, 2100', 'Broker 2', '234-567-8901','456 Market St, NY');

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Prices, Volumes, BrokerName, BrokerContact,BrokerAddress) VALUES
(3, 'AAPL', TO_DATE('2023-01-02', 'YYYY-MM-DD'), '153.00, 154.00', '1100, 1300', 'Broker 1', '123-456-7890','123 Wall St, NY');

select * from stocktransactions;

+---------------+-------------+------------+-------------------+----------------+------------+---------------+------------------+
| TransactionID | StockTicker | PriceDate  | Prices            | Volumes        | BrokerName | BrokerContact |BrokerAddress     |
+---------------+-------------+------------+-------------------+----------------+------------+---------------+------------------+
| 1             | AAPL        | 2023-01-01 | 150.00, 152.00    | 1000, 1200     | Broker 1   | 123-456-7890  |123 Wall St, NY   |
| 2             | MSFT        | 2023-01-01 | 250.00, 255.00    | 2000, 2100     | Broker 2   | 234-567-8901  |456 Market St, NY |
| 3             | AAPL        | 2023-01-02 | 153.00, 154.00    | 1100, 1300     | Broker 1   | 123-456-7890  |123 Wall St, NY   |
+---------------+-------------+------------+-------------------+----------------+------------+---------------+------------------+

This is our starting point, a completely unnormalised table.

Applying 1NF

We can see right away that this table has a few "issues". Let's now apply 1NF to it by getting rid of repeating groups in the Prices and Volumescolumns by ensuring each column contains only a single value. We do this by restructuring the table so that each price and volume value appears in separate rows. Also, let's correct the data names and types of these columns and add our primary key.

DROP TABLE  StockTransactions;

CREATE TABLE StockTransactions (
    TransactionID INT,
    StockTicker VARCHAR2(10),
    PriceDate DATE,
    Price DECIMAL(10, 2),
    Volume INT,
    BrokerName VARCHAR2(50),
    BrokerContact VARCHAR2(15),
    BrokerAddress VARCHAR2(100),
    PRIMARY KEY (TransactionID, StockTicker, PriceDate, Price, Volume)
);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume, BrokerName, BrokerContact,BrokerAddress) VALUES
(1, 'AAPL', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 150.00, 1000, 'Broker 1', '123-456-7890','123 Wall St, NY');

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume, BrokerName, BrokerContact,BrokerAddress) VALUES
(1, 'AAPL', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 152.00, 1200, 'Broker 1', '123-456-7890','123 Wall St, NY');

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume, BrokerName, BrokerContact,BrokerAddress) VALUES
(2, 'MSFT', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 250.00, 2000, 'Broker 2', '234-567-8901','456 Market St, SF');

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume, BrokerName, BrokerContact,BrokerAddress) VALUES
(2, 'MSFT', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 255.00, 2100, 'Broker 2', '234-567-8901','456 Market St, SF');

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume, BrokerName, BrokerContact,BrokerAddress) VALUES
(3, 'AAPL', TO_DATE('2023-01-02', 'YYYY-MM-DD'), 153.00, 1100, 'Broker 1', '123-456-7890','123 Wall St, NY');

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume, BrokerName, BrokerContact,BrokerAddress) VALUES
(3, 'AAPL', TO_DATE('2023-01-02', 'YYYY-MM-DD'), 154.00, 1300, 'Broker 1', '123-456-7890','123 Wall St, NY');

select * from stocktransactions;

+---------------+-------------+------------+--------+--------+------------+---------------+---------------------+
| TransactionID | StockTicker | PriceDate  | Price  | Volume | BrokerName | BrokerContact | BrokerAddress       |
+---------------+-------------+------------+--------+--------+------------+---------------+---------------------+
| 1             | AAPL        | 2023-01-01 | 150.00 | 1000   | Broker 1   | 123-456-7890  | 123 Wall St, NY     |
| 1             | AAPL        | 2023-01-01 | 152.00 | 1200   | Broker 1   | 123-456-7890  | 123 Wall St, NY     |
| 2             | MSFT        | 2023-01-01 | 250.00 | 2000   | Broker 2   | 234-567-8901  | 456 Market St, SF   |
| 2             | MSFT        | 2023-01-01 | 255.00 | 2100   | Broker 2   | 234-567-8901  | 456 Market St, SF   |
| 3             | AAPL        | 2023-01-02 | 153.00 | 1100   | Broker 1   | 123-456-7890  | 123 Wall St, NY     |
| 3             | AAPL        | 2023-01-02 | 154.00 | 1300   | Broker 1   | 123-456-7890  | 123 Wall St, NY     |
+---------------+-------------+------------+--------+--------+------------+---------------+---------------------+

That looks better and the table is now in 1NF. Ok, moving on to 2NF.

Applying 2NF

Recall that for 2NF, all non-key attributes must be fully dependent on the primary key and we have,

  • Primary Key: (TransactionID, StockTicker, PriceDate, Price, Volume)
  • Non-Key Attributes: BrokerName, BrokerContact, BrokerAddress

The BrokerName, BrokerContact and BrokerAddress fields depend only on StockTicker, not on the entire primary key. This is a partial dependency and to fix it, we need to move BrokerName ,BrokerContact and the BrokerAddressfields to a separate table where StockTicker is the primary key. Once that's done, we can remove those three columns from our original StockTransactions table and set up a foreign key on it.

First, we create our new Brokers table and populate it.

drop table Brokers;
CREATE TABLE Brokers (
    StockTicker VARCHAR2(10) PRIMARY KEY,
    BrokerName VARCHAR2(50),
    BrokerContact VARCHAR2(15),
    BrokerAddress VARCHAR2(100)
);

INSERT INTO Brokers (StockTicker, BrokerName, BrokerContact, BrokerAddress) VALUES
('AAPL', 'Broker 1', '123-456-7890', '123 Wall St, NY');

INSERT INTO Brokers (StockTicker, BrokerName, BrokerContact, BrokerAddress) VALUES
('MSFT', 'Broker 2', '234-567-8901', '456 Market St, SF');

select * from brokers;

+-------------+------------+---------------+---------------------+
| StockTicker | BrokerName | BrokerContact | BrokerAddress       |
+-------------+------------+---------------+---------------------+
| AAPL        | Broker 1   | 123-456-7890  | 123 Wall St, NY     |
| MSFT        | Broker 2   | 234-567-8901  | 456 Market St, SF   |
+-------------+------------+---------------+---------------------+

Now for the StockTransactions table changes.

DROP TABLE StockTransactions;
CREATE TABLE StockTransactions (
    TransactionID INT,
    StockTicker VARCHAR2(10),
    PriceDate DATE,
    Price DECIMAL(10, 2),
    Volume INT,
    PRIMARY KEY (TransactionID, StockTicker, PriceDate, Price, Volume),
    FOREIGN KEY (StockTicker) REFERENCES Brokers(StockTicker)
);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(1, 'AAPL', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 150.00, 1000);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(1, 'AAPL', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 152.00, 1200);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(2, 'MSFT', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 250.00, 2000);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(2, 'MSFT', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 255.00, 2100);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(3, 'AAPL', TO_DATE('2023-01-02', 'YYYY-MM-DD'), 153.00, 1100);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(3, 'AAPL', TO_DATE('2023-01-02', 'YYYY-MM-DD'), 154.00, 1300);

select * from stocktransactions;

+---------------+-------------+------------+--------+--------+
| TransactionID | StockTicker | PriceDate  | Price  | Volume |
+---------------+-------------+------------+--------+--------+
| 1             | AAPL        | 2023-01-01 | 150.00 | 1000   |
| 1             | AAPL        | 2023-01-01 | 152.00 | 1200   |
| 2             | MSFT        | 2023-01-01 | 250.00 | 2000   |
| 2             | MSFT        | 2023-01-01 | 255.00 | 2100   |
| 3             | AAPL        | 2023-01-02 | 153.00 | 1100   |
| 3             | AAPL        | 2023-01-02 | 154.00 | 1300   |
+---------------+-------------+------------+--------+--------+

Great, our system is now in 2NF.

Applying 3NF

To achieve 3NF all non-key attributes must be directly dependent on the primary key and not on any other non-key attribute. This is sometimes referred to as Transitive Dependency.

Our StockTransactions table is fine in this respect, but we can see that in our new Brokers table, the BrokerAddress and BrokerContactfields only depend on the BrokerName column and not on the StockTicker, so we should move that broker data into its own table.


DROP TABLE BrokerDetails CASCADE CONSTRAINTS;

CREATE TABLE BrokerDetails (
    BrokerName VARCHAR2(50) PRIMARY KEY,
    BrokerContact VARCHAR2(15),
    BrokerAddress VARCHAR2(100)
);

INSERT INTO BrokerDetails (BrokerName, BrokerContact, BrokerAddress) VALUES
('Broker 1', '123-456-7890', '123 Wall St, NY');

INSERT INTO BrokerDetails (BrokerName, BrokerContact, BrokerAddress) VALUES
('Broker 2', '234-567-8901', '456 Market St, SF');

-- Brokers Table
DROP TABLE Brokers CASCADE CONSTRAINTS;

CREATE TABLE Brokers (
    StockTicker VARCHAR2(10) PRIMARY KEY,
    BrokerName VARCHAR2(50),
    FOREIGN KEY (BrokerName) REFERENCES BrokerDetails(BrokerName)
);

INSERT INTO Brokers (StockTicker, BrokerName) VALUES
('AAPL', 'Broker 1');

INSERT INTO Brokers (StockTicker, BrokerName) VALUES
('MSFT', 'Broker 2');

-- StockTransactions Table
DROP TABLE stocktransactions CASCADE CONSTRAINTS;

CREATE TABLE StockTransactions (
    TransactionID INT,
    StockTicker VARCHAR2(10),
    PriceDate DATE,
    Price DECIMAL(10, 2),
    Volume INT,
    PRIMARY KEY (TransactionID, StockTicker, PriceDate, Price, Volume),
    FOREIGN KEY (StockTicker) REFERENCES Brokers(StockTicker)
);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(1, 'AAPL', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 150.00, 1000);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(1, 'AAPL', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 152.00, 1200);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(2, 'MSFT', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 250.00, 2000);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(2, 'MSFT', TO_DATE('2023-01-01', 'YYYY-MM-DD'), 255.00, 2100);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(3, 'AAPL', TO_DATE('2023-01-02', 'YYYY-MM-DD'), 153.00, 1100);

INSERT INTO StockTransactions (TransactionID, StockTicker, PriceDate, Price, Volume) VALUES
(3, 'AAPL', TO_DATE('2023-01-02', 'YYYY-MM-DD'), 154.00, 1300);

And our final tables look like this,

SELECT * FROM BrokerDetails;

SELECT * FROM Brokers;

SELECT * FROM StockTransactions;

BrokerDetails
+------------+---------------+---------------------+
| BrokerName | BrokerContact | BrokerAddress       |
+------------+---------------+---------------------+
| Broker 1   | 123-456-7890  | 123 Wall St, NY     |
| Broker 2   | 234-567-8901  | 456 Market St, SF   |
+------------+---------------+---------------------+

Brokers
+-------------+------------+
| StockTicker | BrokerName |
+-------------+------------+
| AAPL        | Broker 1   |
| MSFT        | Broker 2   |
+-------------+------------+

StaockTransactions
+---------------+-------------+------------+--------+--------+
| TransactionID | StockTicker | PriceDate  | Price  | Volume |
+---------------+-------------+------------+--------+--------+
| 1             | AAPL        | 2023-01-01 | 150.00 | 1000   |
| 1             | AAPL        | 2023-01-01 | 152.00 | 1200   |
| 2             | MSFT        | 2023-01-01 | 250.00 | 2000   |
| 2             | MSFT        | 2023-01-01 | 255.00 | 2100   |
| 3             | AAPL        | 2023-01-02 | 153.00 | 1100   |
| 3             | AAPL        | 2023-01-02 | 154.00 | 1300   |
+---------------+-------------+------------+--------+--------+

Summary

In conclusion, I've highlighted the three most common normal forms used in modern database systems and shown examples of their use. You should strive to get your database system set up in such a way as to be as normalised as possible.

Although there are some downsides to this process, including more complexity and possibly some of your queries being less performant than they otherwise would be, the gains in normalising your data far outweigh the cons.

_OK, that's all for me just now. I hope you found this article useful. If you did, please check out my profile page at this link. From there, you can see my other published stories and subscribe to get notified when I post new content._

I know times are tough and wallets constrained, but if you got real value from this article, please consider buying me a wee dram.

If you liked this content, I think you'll find these articles interesting too.

SQL Explained: Common Table Expressions

Build a MVP version of Google Translate with Groq and Llama3

Tags: Data Engineering Data Science Database Design Normal Form Sql

Comment