A quick look at the history of data warehousing helps clarify the concept. Data warehousing began with Bill Inmon, who wrote a paper on data de-normalization (read more about normalization) in which he argued that in some situations the third normal form is not necessarily the best choice for building a database. His work was later taken up and developed by Dr. Ralph Kimball, who commercialised data warehousing. A data warehouse is often described as “big data” because it is not kept in third normal form: it is de-normalized, typically to second normal form, which introduces data redundancy and makes the warehouse much larger than a fully normalized database. The redundancy requires more storage, but it gives faster access to the data, and response time matters greatly for very large databases. Hence the purpose of a data warehouse is to provide comprehensive data in a suitable time and format for decision making [1].
A broader definition of data warehousing also covers business intelligence tools and the tools used to extract, transform and load data (ETL) into the data warehouse. After the ETL stage, the data has been cleaned, transformed and catalogued, and is made available to users.
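As a rough illustration of the ETL stage described above, the sketch below extracts rows from a hypothetical operational database, applies a simple cleaning step, and loads the result into a warehouse table. The file names, table names and column layout are assumptions made for this example, not part of the original text.

```python
import sqlite3

# Hypothetical ETL sketch: all file, table and column names are assumed for illustration.

def extract(source_path: str) -> list[tuple]:
    """Extract raw order rows from the operational (OLTP) source database."""
    with sqlite3.connect(source_path) as conn:
        return conn.execute(
            "SELECT order_id, customer_name, amount, order_date FROM orders"
        ).fetchall()

def transform(rows: list[tuple]) -> list[tuple]:
    """Clean and standardise the extracted rows before loading."""
    cleaned = []
    for order_id, customer_name, amount, order_date in rows:
        if amount is None or amount < 0:
            continue  # discard invalid records during cleaning
        cleaned.append((order_id, customer_name.strip().title(), round(amount, 2), order_date))
    return cleaned

def load(warehouse_path: str, rows: list[tuple]) -> None:
    """Load the transformed rows into a (de-normalized) warehouse table."""
    with sqlite3.connect(warehouse_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sales_history "
            "(order_id INTEGER, customer_name TEXT, amount REAL, order_date TEXT)"
        )
        conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?, ?)", rows)

if __name__ == "__main__":
    load("warehouse.db", transform(extract("source.db")))
```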

Figure 1 shows the structure of data warehousing and the ETL process in business intelligence. During the transformation phase, the extracted data may need to be normalized or de-normalized.
The data warehouse environment typically transforms the relational data model into new architectures (schemas). Several schema models have been designed for data warehousing, such as the snowflake schema and the fact constellation schema, but the most commonly used is the star schema.
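To make the star schema idea concrete, the sketch below defines one central fact table surrounded by dimension tables, using SQLite DDL issued from Python. The table and column names are illustrative assumptions, not taken from the original text.

```python
import sqlite3

# Minimal star schema sketch: one fact table referencing three dimension tables.
# All table and column names here are assumed for illustration only.
STAR_SCHEMA_DDL = """
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- The fact table sits at the centre of the 'star' and holds the numeric measures.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    amount       REAL
);
"""

with sqlite3.connect("warehouse.db") as conn:
    conn.executescript(STAR_SCHEMA_DDL)
```

Queries against such a schema join the fact table to one or more dimension tables, which is why the de-normalized, redundant layout trades storage for faster, simpler access.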
References:
- [1] Source: last accessed 19 Oct 2013, 6 PM.