Data Warehouse Concepts

 

What is a Data Warehouse?

A warehouse in general terms is a repository where we store our information. Coming to the data warehouse, it is a collection of data organized in a specific manner and categorized information.

A data warehouse stores historical data of an organization so that they can analyze their performance over the past years and plan for the future.

 

The popular definition of the data warehouse by WH Inmon:

A data warehouse is:

  • Subject oriented: Data in the warehouse is categorized in different subject areas. For example, consider a KFC store. It has many branches all over the world. If we have to analyze “sales” for India, this is termed as “subject”.

  • Integrated: A data warehouse has data coming from multiple sources which are integrated into the warehouse. For example, consider the same KFC store, stores in India may store date field as “dd/mm/yyyy” whereas the same data in another country will be stored as “MM/DD/YYYY”. The data warehouse will have only one format fixed to say “MM/DD/YYYY”.

  • Time-Variant: A data warehouse stores historical data with which we can identify patterns of sales over a time period of 3 months,6 months or 2 years of any organization which has a warehouse.

  • Non-Volatile: Data in the warehouse will not change once it is entered.


Another definition from Ralph Kimball is more precise:

A data warehouse is a copy of transaction data specifically structured for query and analysis.

 

What is a Data mart?

A data mart is a subset of a data warehouse. Suppose we have an organization established in many different locations and each location maintains a data warehouse which we call it as data mart because a warehouse will have all the data integrated and as far as data mart is considered it will be a part of the data warehouse.

 

What are uses of having a Data Warehouse?

A Data Warehouse, in general, is used to analyze trends over a period of time and enhances the decision making of an organization. Once the data is loaded into the warehouse will be creating an OLAP cube or directly use the data to analyze trends. Basing on which the top level management will approach their future business strategies.

 

Data warehouse Architecture

We will multiple sources there all the operational data will be stored. Some may store them in flat files and some in databases. We will have to read all the sources and perform ETL (Extract, Transform and Load operations) which will be used to integrate data from these sources and transform them into one unique structure and then load into our target data warehouse.

 

Data Warehouse Architecture

 

Data Warehouse Architecture

 

There are 2 approaches in designing your data warehouse. First one is “Top-Down approach” and second is “Bottom-up approach”.


Top – Down Approach: The above image shows the top-down approach, where we are reading data from multiple sources and transforming the data and loading into your warehouse, then on top of that we are creating our data marts.


Bottom-up Approach: The opposite of the above approach is this. Here we will be creating data marts first and then we will create our data ware house on top of all the data marts.