A warehouse in general terms is a repository where we store our information. Coming to the data warehouse, it is a collection of data organized in a specific manner and categorized information.
A data warehouse stores historical data of an organization so that they can analyze their performance over the past years and plan for the future.
A data warehouse is:
Another definition from Ralph Kimball is more precise:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
A data mart is a subset of a data warehouse. Suppose we have an organization established in many different locations and each location maintains a data warehouse which we call it as data mart because a warehouse will have all the data integrated and as far as data mart is considered it will be a part of the data warehouse.
A Data Warehouse, in general, is used to analyze trends over a period of time and enhances the decision making of an organization. Once the data is loaded into the warehouse will be creating an OLAP cube or directly use the data to analyze trends. Basing on which the top level management will approach their future business strategies.
We will multiple sources there all the operational data will be stored. Some may store them in flat files and some in databases. We will have to read all the sources and perform ETL (Extract, Transform and Load operations) which will be used to integrate data from these sources and transform them into one unique structure and then load into our target data warehouse.
There are 2 approaches in designing your data warehouse. First one is “Top-Down approach” and second is “Bottom-up approach”.
Top – Down Approach: The above image shows the top-down approach, where we are reading data from multiple sources and transforming the data and loading into your warehouse, then on top of that we are creating our data marts.
Bottom-up Approach: The opposite of the above approach is this. Here we will be creating data marts first and then we will create our data ware house on top of all the data marts.