Normalization is a database schema design technique to reduce redundancy means having copies of the same data in the database. This concept was introduced by IBM researcher E.F. Codd in the 1970s. Let’s take an example of a college’s database where the database contains table ‘A’, having the following attributes, studentId, studentName, courseId, courseName, facultyId, facultySalary. There can be a case that one student gets rusticated from the college. And after his leaving his data will be deleted from table ‘A’ but suppose he was the only student who was enrolled in the Computer Network course, if we will delete his data then that course information will also be lost. This problem is also called Anomaly But what does it exactly mean. Let’s go deep into this concept.
Anomalies are the problems that occur due to redundancy in the database. These anomalies are of three types Insertion Anomalies, Deletion Anomalies, Update Anomalies.
Insertion Anomalies occur when insertion operation is performed where insertion is not the primary key. Let’s take the same example of the college’s database if we need to insert a new course in the table ‘A’, for this studentId must be inserted as studentId is the primary key. This would lead to a wrong entry in the database because there is no new admission. This is called an insertion anomaly.
Update Anomalies occur when the update operation is performed where the update is not the primary key, which means if we want to update one attribute which is not the primary key, an update operation will be performed as many times this attribute is existing in the table which should not be the case. Let’s take the same example, if we want to change the salary for one facultyId, update operation will be performed all over the table wherever that facultyId is existing, which should not be the case, update operation should be performed for once for updating all the values for that facultyId.
Deletion Anomalies occur when deletion operation is performed for some specific key but not the primary key. Due to this, some important information is also deleted. Let’s take the same example of a college database and suppose If one student gets rusticated from the college who was the only student of the Computer Networks course. if we will delete his data then the course information will also be lost. This is called deletion anomaly.
All these anomalies can be removed by dividing the tables into sub tables which are also called normalization of the database. But how we can normalize the database. There are several ways to normalize the database and the types of normalization are 1NF,2NF,3NF, BCNF, etc. These are called forms.
A. 1NF: For the First Form, table should not contain multi-valued attributes. Let’s take the same example of the college database if one student is enrolled for more than one course, there will be more than one value for the course attribute, and that is not a normalized form. For normalizing the database we can divide the table into two or more sub-tables having the primary key as a foreign key and the courseName attribute, and by doing this we can insert more than one value for one studentId.
B. 2NF: There are two conditions for the second form, table or relation must be in the first normal form(1NF) and should not contain any partial dependency means the non-prime attributes should be fully functional dependent on the candidate key. Let’s take the same example of the college database where one more attribute location is existing, but it can be similar for more than one student. For achieving 2NF there should be no partial dependency means there should be no nonprime attribute is dependent on the proper superset of the candidate key.
As shown below AB is the only candidate key and there is no partial dependency because there is no subset of AB determining any nonprime attribute.
Consider following functional dependencies in relation R (A, B, C, D )AB -> C [A and B together determine C]
BC -> D [ B and C determines D ]
3. 3NF: There are two conditions for the third form, the table or relation must be in 2NF and there should be no transitive dependency means no non-prime attribute is determined by another non-prime attribute in the table. If A->B then for no transitive dependency any one of these two relations must follow:
- A is a candidate key or Super key.
- B is a prime attribute.
Consider relation R(A, B, C, D)
AB -> CD,
D -> A,
As shown AB-> CD where AB is Candidate key and for D->A, D is not a super key but it is a prime attribute. So this will be considered in the 3NF form.
4. BCNF: 3NF is an adequate normal form for a relational database, still, this (3NF) normal form may not remove 100% redundancy. For solving this problem BCNF considers only the first condition of the 3NF form which means all the attributes can be derived from the primary key.
Consider relation R(A, B, C, D)A -> C,
A -> B,
B -> D,
B -> A
[A and B is the candidate key]
As shown A, B is the candidate key and all other attributes can be derived from A, B. Thus this relation is in BCNF form.