Tableau Tutorial

Define data warehouse.

           A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.


Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.


Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.


Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, or 12 months ago, or even older, from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with that customer.


Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

A data warehouse is a copy of transaction data specifically structured for query and analysis.
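
The time-variant and non-volatile properties can be illustrated with the customer-address example above. The following is only a minimal sketch: the class names `AddressRecord` and `AddressHistory` and the sample addresses are hypothetical, and a real warehouse would keep this history in database tables rather than in memory.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored row cannot be modified (non-volatile)
class AddressRecord:
    customer_id: int
    address: str
    valid_from: date


class AddressHistory:
    """Append-only store: new addresses are added, old ones are never overwritten."""

    def __init__(self):
        self._rows = []

    def record(self, customer_id, address, valid_from):
        self._rows.append(AddressRecord(customer_id, address, valid_from))

    def all_addresses(self, customer_id):
        """Warehouse-style view: every address the customer has had, oldest first."""
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return [r.address for r in sorted(rows, key=lambda r: r.valid_from)]

    def current_address(self, customer_id):
        """Transaction-system-style view: only the most recent address."""
        addresses = self.all_addresses(customer_id)
        return addresses[-1] if addresses else None


history = AddressHistory()
history.record(1, "12 Oak St", date(2020, 1, 15))
history.record(1, "98 Elm Ave", date(2022, 6, 1))
```

Here `all_addresses(1)` keeps both addresses, while `current_address(1)` returns only the latest one, which is all a typical transaction system would hold.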


What does subject-oriented data warehouse signify?

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.


List any five characteristics of the data warehouse.

The five characteristics of a data warehouse:

  • Subject-oriented
  • Integrated
  • Time-variant
  • Non-volatile
  • A copy of transaction data specifically structured for query and analysis


What do OLAP and OLTP stand for?

           OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast query processing and on maintaining data integrity in multi-access environments; effectiveness is measured by the number of transactions per second. An OLTP database holds detailed, current data, and the schema used to store it is the entity model (usually 3NF).


           OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used with data mining techniques. An OLAP database holds aggregated, historical data stored in multi-dimensional schemas (usually a star schema).
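
The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is an illustration only: the `orders` table and its values are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many short transactions, each touching a single row.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("east", 100.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("west", 250.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("east", 50.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (120.0, 1))

# OLAP-style work: one read-only query that aggregates over many rows.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

Each `with conn:` block commits one small transaction, which is the pattern an OLTP system is tuned for. The final `GROUP BY` query is the kind of aggregation an OLAP system is tuned for.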


What is the very basic difference between data warehouse and operational databases?

           The fundamental difference between operational systems and data warehousing systems is that operational systems are designed to support transaction processing whereas data warehousing systems are designed to support online analytical processing (or OLAP, for short).


           Based on this fundamental difference, data usage patterns associated with operational systems are significantly different from usage patterns associated with data warehousing systems. As a result, data warehousing systems are designed and optimized using methodologies that differ drastically from those of operational systems.


What is Data Warehousing?

           Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.


List the functions of data warehouse tools and utilities.

The following are the functions of data warehouse tools and utilities:

Data Extraction - Involves gathering data from multiple heterogeneous sources.

Data Cleaning - Involves finding and correcting the errors in data.

Data Transformation - Involves converting the data from legacy format to warehouse format.

Data Loading - Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.

Refreshing - Involves updating from data sources to the warehouse.
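
The five functions above can be sketched as a small pipeline. This is a simplified illustration, not any particular tool's API: the function names, the `sources` dictionary, and the "warehouse format" (upper-case product names with per-product totals) are all assumptions made for the example.

```python
def extract(sources):
    """Data extraction: gather rows from multiple heterogeneous sources."""
    rows = []
    for source_name, source_rows in sources.items():
        for row in source_rows:
            rows.append({**row, "source": source_name})
    return rows


def clean(rows):
    """Data cleaning: drop rows with a missing amount and trim stray whitespace."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue
        cleaned.append({**row, "product": row["product"].strip()})
    return cleaned


def transform(rows):
    """Data transformation: convert source rows to a (hypothetical) warehouse format."""
    return [
        {"product": row["product"].upper(), "amount": float(row["amount"])}
        for row in rows
    ]


def load(warehouse, rows):
    """Data loading: sort and summarize rows into per-product totals."""
    for row in sorted(rows, key=lambda r: r["product"]):
        warehouse[row["product"]] = warehouse.get(row["product"], 0.0) + row["amount"]
    return warehouse


def refresh(warehouse, sources):
    """Refreshing: propagate new source data into the warehouse."""
    return load(warehouse, transform(clean(extract(sources))))


sources = {
    "crm": [{"product": " widget ", "amount": "10.5"}],
    "legacy": [{"product": "Widget", "amount": 4}, {"product": "gadget", "amount": None}],
}
warehouse = refresh({}, sources)
```

In this sketch `refresh` assumes that `sources` contains only rows not yet loaded. Running it twice on the same rows would count them twice, so a real refresh step needs some way to track what has already been loaded.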


Define load manager & the functions of a load manager

           A load manager performs the operations required for the extract and load process. The size and complexity of the load manager vary between specific solutions, from one data warehouse to another.


           The load manager extracts data from the source system, fast-loads the extracted data into a temporary data store, and performs simple transformations into a structure similar to the one in the data warehouse.
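
The three load manager steps can be sketched with Python's built-in sqlite3 module. The table names, the sample rows, and the date format below are hypothetical. Real load managers typically rely on vendor bulk-load utilities rather than a script like this.

```python
import sqlite3

# Hypothetical source-system rows: (customer name, signup date as DD/MM/YYYY).
source_rows = [("alice", "05/01/2023"), ("bob", "17/02/2023")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE staging_customers (name TEXT, signup TEXT)")

# Fast load: bulk-insert the extracted rows into the temporary store as-is.
with conn:
    conn.executemany("INSERT INTO staging_customers VALUES (?, ?)", source_rows)

# Simple transformation: reshape into a structure close to the warehouse's,
# here ISO dates (YYYY-MM-DD) and capitalized names.
conn.execute("CREATE TEMP TABLE staged_for_warehouse (name TEXT, signup_date TEXT)")
with conn:
    conn.execute(
        """
        INSERT INTO staged_for_warehouse
        SELECT upper(substr(name, 1, 1)) || substr(name, 2),
               substr(signup, 7, 4) || '-' || substr(signup, 4, 2) || '-' || substr(signup, 1, 2)
        FROM staging_customers
        """
    )

staged = conn.execute(
    "SELECT name, signup_date FROM staged_for_warehouse ORDER BY name"
).fetchall()
```

Loading the raw rows first and transforming them afterwards keeps the extract step short, so the source system is occupied for as little time as possible.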


Define a warehouse manager & the functions of a warehouse manager

           The warehouse manager is responsible for the warehouse management process. The warehouse manager consists of third-party system software, C programs, and shell scripts. The size and complexity of the warehouse manager vary between specific solutions.


           The warehouse manager performs consistency and referential integrity checks; creates indexes, business views, and partition views against the base data; transforms and merges the source data from the temporary store into the published data warehouse; backs up the data in the data warehouse; and archives the data that has reached the end of its captured life.
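
A few of these duties (a referential integrity check, index and view creation, and archiving) can be sketched with Python's built-in sqlite3 module. The schema, the sample rows, and the `RETENTION_CUTOFF_YEAR` value are hypothetical, and backup and partitioning are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                             sale_year INTEGER, amount REAL);
    CREATE TABLE fact_sales_archive (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                                     sale_year INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO fact_sales VALUES (10, 1, 2015, 5.0), (11, 2, 2023, 7.5), (12, 99, 2023, 1.0);
    """
)

# Referential integrity check: fact rows whose product is missing from the dimension.
orphans = conn.execute(
    """
    SELECT f.sale_id FROM fact_sales AS f
    LEFT JOIN dim_product AS p ON p.product_id = f.product_id
    WHERE p.product_id IS NULL
    """
).fetchall()

# Index and business view over the base data.
conn.execute("CREATE INDEX idx_sales_product ON fact_sales (product_id)")
conn.execute(
    """
    CREATE VIEW sales_by_product AS
    SELECT p.name, SUM(f.amount) AS total
    FROM fact_sales AS f JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    """
)

# Archiving: move rows older than a (hypothetical) retention cutoff out of the live table.
RETENTION_CUTOFF_YEAR = 2020
with conn:
    conn.execute(
        "INSERT INTO fact_sales_archive SELECT * FROM fact_sales WHERE sale_year < ?",
        (RETENTION_CUTOFF_YEAR,),
    )
    conn.execute("DELETE FROM fact_sales WHERE sale_year < ?", (RETENTION_CUTOFF_YEAR,))

archived = conn.execute("SELECT sale_id FROM fact_sales_archive").fetchall()
live_totals = conn.execute("SELECT name, total FROM sales_by_product ORDER BY name").fetchall()
```

The archive insert and the delete run inside one transaction, so a row is never lost or duplicated if the move fails partway through.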