In database design, database normalization is the process of structuring tables in a relational database system. A well-normalized database can reduce data redundancy and eliminate data anomalies, though it may sacrifice some querying performance: it involves decomposing a larger table into smaller, more manageable tables, and then establishing relationships between them.

There is a series of structured guidelines, called normal forms, that help organize data to reduce redundancy, improve data integrity and prevent anomalies:

First Normal Form (1NF): Each column contains atomic values and prohibits arrays in a cell; a table is two-dimensional. Also, each row should be uniquely identified.
Second Normal Form (2NF): Extends 1NF by requiring that all non-prime columns (columns not part of the primary key) have full functional dependencies on the entire primary key columns, not just part of it.
Third Normal Form (3NF): Further builds upon 2NF by eliminating any transitive dependencies. This ensures that non-prime columns depend only on the primary key and not on other non-key columns.
Boyce-Codd Normal Form (BCNF): Similar to 3NF but stricter, aimed at eliminating structures that could introduce rare logical inconsistencies in the data. It ensures that every non-trivial functional dependency of columns is on a superkey.
Forth Normal Form (4NF): Enhances BCNF by addressing certain types of complex multi-valued dependencies that are not handled by 3NF, ensuring every multi-valued dependency is dependent on a superkey.
Fifth Normal Form (5NF): 4NF plus every join dependency for the table is a result of the candidate keys. It addresses where certain join dependencies exist that are not covered by other normal forms.

While achieving higher normal forms is desired depending on specific needs, in many practical cases, 3NF is typically considered as fully-normalized.

1NF: First Normal Form

The First Normal Form (1NF) is the most basic level of normalization. It ensures that a table's structure is simple and that all of its data are stored in a way that allows for efficient querying. The key requirement for a table to be in 1NF is that all its columns must contain atomic values and each row must be uniquely identified. Let's look at an example to illustrate the concept of 1NF.

Job History Table:

Employee ID	Start Date	End Date	Job ID	Department ID	Salary
101	21-SEP-07, 28-OCT-11	27-OCT-11, 15-MAR-15	AC_ACCOUNT, AC_MGR	110	17000, 17000
102	13-JAN-11	24-JUL-16	IT_PROG	60	17000
114	24-MAR-16	31-DEC-17	ST_CLERK	50	11000
122	01-JAN-17	31-DEC-17	ST_CLERK	50	7900
176	24-MAR-16, 01-JAN-17	31-DEC-16, 31-DEC-17	SA_REP, SA_MAN	80	8600, 8600
200	17-SEP-05, 01-JUL-12	17-JUN-11, 31-DEC-16	AD_ASST, AC_ACCOUNT	110	4400, 4400
201	17-FEB-14	19-DEC-17	MK_REP	20	13000

This table violates 1NF. Three columns - Start Date, End Date, and Job ID - contain non-atomic values that could be further separated by commas. As a result, the Employee ID is not a unique identifier for each row: there can be multiple Start Date, End Date, and Job ID values associated with an Employee ID value.

To achieve 1NF, you should restructure this table, ensuring each column contains indivisible values. Specifically, separate the inner arrays in the Start Date, End Date, and Job ID columns into individual rows.

Job History Table (1NF):

Employee ID	Start Date	End Date	Job ID	Department ID	Salary
101	21-SEP-07	27-OCT-11	AC_ACCOUNT	110	17000
101	28-OCT-11	15-MAR-15	AC_MGR	110	17000
102	13-JAN-11	24-JUL-16	IT_PROG	60	17000
114	24-MAR-16	31-DEC-17	ST_CLERK	50	11000
122	01-JAN-17	31-DEC-17	ST_CLERK	50	7900
176	24-MAR-16	31-DEC-16	SA_REP	80	8600
176	01-JAN-17	31-DEC-17	SA_MAN	80	8600
200	17-SEP-05	17-JUN-11	AD_ASST	110	4400
200	01-JUL-12	31-DEC-16	AC_ACCOUNT	110	4400
201	17-FEB-14	19-DEC-17	MK_REP	20	13000

This table now satisfies 1NF. Each cell in any of the columns contains indivisible atomic values. Also, the combination of the three columns (Employee ID, Start Date, End Date) can serve as a unique identifier for each row: there is no two rows where all three column values are equal. This simplifies queries as you don't need complex logic to handle multiple values in a cell.

2NF: Second Normal Form

In relational databases, functional dependency is the fundamental concept that describes the relationship between columns in a table. It is a constraint that dictates how the values in one column determine the values in another column. Put simply, a functional dependency states that the value of one set of columns uniquely determines the value of another column within the same table. For example, in the 1NF table we've seen earlier, one set of columns (Employee ID, Start Date, End Date) uniquely determines the value of Job ID: any one employee can hold one and only one job title during a given period.

The Second Normal Form (2NF) is built upon the foundation of 1NF. 2NF eliminates any partial dependencies in a table by ensuring that all non-prime columns are fully functionally dependent on the entire primary key. Here, partial dependency refers to a situation where a non-key column (column that is not part of the primary key) depends on only a part of the primary key, rather than on the entire primary key. For example, in the 1NF table mentioned earlier, the Salary column depends only on Employee ID, not on the entire primary key (which is the combination of Employee ID, Start Date, and End Date.) This is the scenario where we refer to as a partial dependency occurs.

To move this table into 2NF, we should decompose it into two separate tables as follows.

Job History Table (2NF):

Employee ID	Start Date	End Date	Job ID	Department ID
101	21-SEP-07	27-OCT-11	AC_ACCOUNT	110
101	28-OCT-11	15-MAR-15	AC_MGR	110
102	13-JAN-11	24-JUL-16	IT_PROG	60
114	24-MAR-16	31-DEC-17	ST_CLERK	50
122	01-JAN-17	31-DEC-17	ST_CLERK	50
176	24-MAR-16	31-DEC-16	SA_REP	80
176	01-JAN-17	31-DEC-17	SA_MAN	80
200	17-SEP-05	17-JUN-11	AD_ASST	90
200	01-JUL-12	31-DEC-16	AC_ACCOUNT	110
201	17-FEB-14	19-DEC-17	MK_REP	20

Salary Table (2NF):

Employee ID	Salary
101	17000
101	17000
102	17000
114	11000
122	7900
176	8600
176	8600
200	4400
200	4400
201	13000

Now, both tables are in 2NF. The 1NF and 2NF particularly useful for heavy read-only cases. For systems with millions of records, lots of complex querying and no need to add new data or update existing records, 1NF and 2NF offer faster record retrieval.

3NF: Third Normal Form

In the context of database normalization, a transitive dependency occurs when a non-prime column (a column that is not part of the primary key) depends on another non-prime column. To be more specific, consider three columns: A, B, and C. Given A acts as the primary key, A determines B and C. However, if the column B can also fully determine the column C independently, we say that there is a transitive dependency.

Third Normal Form (3NF) is built upon 2NF and eliminates any transitive dependencies in a 2NF table, ensuring that non-prime attributes are transitively dependent only on the primary key. For example, let's consider the 2NF job history table we made earlier.

Job History Table (2NF):

Employee ID	Start Date	End Date	Job ID	Department ID
101	21-SEP-07	27-OCT-11	AC_ACCOUNT	110
101	28-OCT-11	15-MAR-15	AC_MGR	110
102	13-JAN-11	24-JUL-16	IT_PROG	60
114	24-MAR-16	31-DEC-17	ST_CLERK	50
122	01-JAN-17	31-DEC-17	ST_CLERK	50
176	24-MAR-16	31-DEC-16	SA_REP	80
176	01-JAN-17	31-DEC-17	SA_MAN	80
200	17-SEP-05	17-JUN-11	AD_ASST	90
200	01-JUL-12	31-DEC-16	AC_ACCOUNT	110
201	17-FEB-14	19-DEC-17	MK_REP	20

In this table, notice that the Department ID is fully dependent on Job ID, whereas the composite primary key consists of Employee ID, Start Date, and End Date. That is, any Job ID value belongs to one and only one Department ID column value. Thus, knowing the job ID allows us to determine the corresponding department ID. This situation is referred to as a transitive dependency, and thereby the table violates 3NF.

To achieve 3NF, you should decompose the table as follows.

Job History Table (3NF):

Employee ID	Start Date	End Date	Job ID
101	21-SEP-07	27-OCT-11	AC_ACCOUNT
101	28-OCT-11	15-MAR-15	AC_MGR
102	13-JAN-11	24-JUL-16	IT_PROG
114	24-MAR-16	31-DEC-17	ST_CLERK
122	01-JAN-17	31-DEC-17	ST_CLERK
176	24-MAR-16	31-DEC-16	SA_REP
176	01-JAN-17	31-DEC-17	SA_MAN
200	17-SEP-05	17-JUN-11	AD_ASST
200	01-JUL-12	31-DEC-16	AC_ACCOUNT
201	17-FEB-14	19-DEC-17	MK_REP

Job ID Table (3NF):

Job ID	Department ID
AC_ACCOUNT	110
AC_MGR	110
AD_ASST	110
IT_PROG	60
SA_MAN	80
SA_REP	80
ST_CLERK	50
MK_REP	20

The 3NF tables are ideal when lots of data entry and updates are involved, as the 3NF design minimizes the possible duplication of data and the associated risk of data conflicts. However, if your goal is to build a read-heavy table, you may consider 1NF or 2NF, as 3NF requires table joins to retrieve the whole information.

Denormalization

Database normalization offers many benefits, but it also introduces some drawbacks. It minimizes data redundancy, improves data integrity, and enhances data security. However, normalized databases often necessitate many table joins for data retrieval, as information is distributed across multiple tables. This complexity may potentially impact query performance, particularly with more complex queries, due to the added overhead of joining tables.

Denormalization is the process of introducing intentional data redundancy by reversing the database normalization. This is usually employed to avoid expensive SQL joins between 3NF tables. The cost of denormalization is the advantages of normalization we have to give up. So, one should apply denormalization selectively as a last resort if the querying performance problems are experienced.

Rather than denormalizing the actual tables, one might consider introducing a materialized view. It is a database object that provides precomputed snapshot of the result set of a complex query. The database objects from which a materialized view retrieves data-such as tables, views, and other materialized views-are referred to as detailed tables^[1]^[2].

For data warehousing purposes, which involve maintaining a central repository of integrated data from multiple sources within an organization, commonly created materialized views include materialized aggregate views, single-table materialized aggregate views, and materialized join views. All these types of materialized views can leverage query rewrite, an optimization technique that transforms a user request written in terms of detailed tables into a semantically equivalent request that incorporates one or more materialized views.

^[1] Term used in the context of data warehousing purpose. Database objects where a materialized view query data from is called master tables when it is for replication purpose. ^↩

^[2] Materialized views can also serve as read-only copies of remote data on a local machine. For the purpose, primary key, rowid, object, and/or subqueries are commonly created as materialized views in replication environments. ^↩

^[3] And thereby it is distinguished from ^↩

Database Normalization

1NF: First Normal Form

Job History Table:

Job History Table (1NF):

2NF: Second Normal Form

Job History Table (2NF):

Salary Table (2NF):

3NF: Third Normal Form

Job History Table (2NF):

Job History Table (3NF):

Job ID Table (3NF):

Denormalization

Post a Comment

0 Comments

Most Popular

Introducing Oracle Database XE

Understanding Relational Data Model

Database Normalization

Tags

Wikipedia

Contact form

Database Normalization

1NF: First Normal Form

Job History Table:

Job History Table (1NF):

2NF: Second Normal Form

Job History Table (2NF):

Salary Table (2NF):

3NF: Third Normal Form

Job History Table (2NF):

Job History Table (3NF):

Job ID Table (3NF):

Denormalization

You may like these posts

Post a Comment

0 Comments

Social Plugin

Most Popular

Introducing Oracle Database XE

Understanding Relational Data Model

Database Normalization

Tags

Wikipedia

Contact form