Sunday, April 18, 2010

Normalization and De-Normalization

Database normalization is a data design and organization process applied to data structures based on rules that help building relational databases.  Normalization is done to achieve the following goals:
  • eliminating redundant data (for example, storing the same data in more than one table)
  • ensuring data dependencies make sense (only storing related data in a table).



Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via defined relationships.

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized and these are referred to as Normal Forms.

1NF: Eliminate Repeating Groups

1NF sets the very basic rules for an organized database:
  • Eliminate duplicate columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

2NF: Eliminate Redundant Data

2NF further addresses the concept of removing duplicate data:
  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.

3NF: Eliminate Columns Not Dependent On Key

If attributes do not contribute to a description of the key, remove them to a separate table. All attributes must be directly dependent on the primary key.

4NF: Isolate Independent Multiple Relationships

No table may contain two or more 1:n or n:m relationships that are not directly related.

BCNF: Boyce-Codd Normal Form

If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.

De-normalization

De-normalization is the process of attempting to optimize the performance of a database by adding redundant data. It is sometimes necessary because current DBMSs implement the relational model poorly. A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance. De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.

We will discuss Normalization Forms in more details in other post.

No comments: