Beginners Guide to Normalization

Akash Jain
4 min readJan 10, 2021

--

Introduction

In the Database, the table is related data stored at the same location. That is Data related to an employee will be stored at the employee table and data related to department stored at department table. This is a piece of related information. If we want to access which department a particular employee belongs to, we perform selection on the join of these two tables. That is the employee and the department table. That constitutes a new table including both table’s information. But why not do joining at the creation of tables itself and combine these two tables Into a universal table for employee and department. After all the selection of an employee may require data about there department frequently if not every time. But there is a catch. Let us see in the following section.

Disadvantages of Universal table

Redundancy

Assume there are around a million employees in your organization and about 100 thousand out of them belongs to the IT dept. Then for every row for those 100 thousand records, we have to duplicate the information i.e waste of space in the database. This term of duplication of data in DBMS is known as Redundancy.

Anomalies

Anomalies are kind of similar to disasters in the database. This might lead to some pretty big problems. Let us now see three types of Anomalies and what kind of damage they do to the database.

Insert Anomaly : Wrong information might get entered into a table because it is universal. Suppose for the two employees department-ID is the same but department-name entered is different. As you can see there is a problem, we are introducing human errors into a system.

Deletion Anomaly: If suppose we want to delete all the employees related to one department. Then the department itself will be deleted with all its information such as Dept. ID., Dept. Head etc.

Update Anomaly If suppose we want to change the department name from IT to CSIT, then for all rows it has to be changed which is again a lot of overhead and is very error-prone.

All of the above problems will arise if we tried to reduce the overhead of designing the database and combined all tables into one. Hence we split tables into as many tables as we possibly can. This splitting of a table into many tables is called normalization.

Normalization

We know that in DBMS smallest table is made up of two columns that are two attributes. If we have all tables reduced to two attributes, then searching some info into a table may require going through all the tables. Hence even though two attributes are the best architecture for the database. Some bigger tables might also work fine and lessen the workload of DBMS admin of joining this table to get information. Hence this splitting of the table into smaller tables in the most viable way is called normalization.

Normal Forms in Normalization

Normal forms are used to apply normalization into a database. In this blog, we’ll see 4 normal forms. i.e. 1NF, 2NF, 3NF, BCNF.

  1. 1NF: The first normal form states that values in a table should be atomic. ie. neither mult-ivalued nor composite attributes are allowed in the table. The multivalued attribute can be explained easily by the example of the mobile number attribute. There can be multiple mobile numbers for the same user that is multivalued. And The composite attribute can be the name of a person, which can be split into three different attributes, that is the first name, middle name, and last name of a person.
  2. 2NF: Second Normal Form states that Table should be in 1NF and any kind of partial dependencies is not allowed in a table. i.e. A table “T” is in 2NF if every non-prime attribute in T is not partially dependent on any key of T. That is non key attributes should not derive any other attribute by themselves.
  3. 3NF: The third normal form states that It should be in 2NF and any kind of transitive dependencies are not allowed in a table. That is if the primary key of a table is deriving some attribute, then that derived attribute which is a non key attribute should not derive some other attribute in the same table. In such case normalize the table.
  4. BCNF: BCNF states that It should be in 3NF and only the primary key can derive all other attributes. i.e. A Table “T” is in BCNF if whenever a nontrivial functional dependencies x->A holds in T, then X is always a superkey for table T. i.e. all FD should only be derived from the super key. Here FD is an attribute to attribute relation that is. If attribute ‘a’ can derive attribute ‘b’ and ‘c’ then a->bc is an FD.

Conclusion

Normalization is a very important topic in process of Designing large-scale and efficient databases. Normalization helps us to minimize the error in our design and build future proof database systems. This blog is only meant to introduce you to a topic. There is more to normalization than just theory. It has its own mathematics for deriving tables into normal form. Please check those topics if you are interested. Thank you for reading.

--

--