Page 20 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
of data warehouses is that data is stored at its most elemental level for use in reporting and
information analysis.
Within this generic intent, there are two primary approaches to organising the data in a data
warehouse.
The first is the “dimensional” approach. In this style, information is stored as “facts”,
which are numeric or text data that capture specific details about a single transaction or event,
and “dimensions”, which contain reference information that allows each transaction or event
to be classified in various ways. For example, a sales transaction would be broken up into
facts such as the number of products ordered and the price paid, and dimensions such as date,
customer, product, geographical location and salesperson. The main advantage of the dimensional
approach is that the data warehouse is easy for business staff with limited information
technology experience to understand and use. Also, because the data is pre-processed into
dimensional form, the data warehouse tends to answer queries very quickly. The main disadvantage of
the dimensional approach is that the structure is quite difficult to extend or change later if the
company changes the way in which it does business.
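The split between facts and dimensions can be illustrated with a minimal Python sketch. The table contents, the column names and the helper `revenue_by` are hypothetical and chosen only for illustration; a real warehouse would hold these tables in a database rather than in dictionaries.

```python
from collections import defaultdict

# Dimension tables: reference information keyed by an id (all values are made up).
customers = {1: {"name": "Asha", "city": "Delhi"},
             2: {"name": "Ravi", "city": "Mumbai"}}
products = {10: {"name": "Pen", "category": "Stationery"},
            11: {"name": "Lamp", "category": "Home"}}

# Fact table: one row per sales transaction, holding measures plus dimension keys.
sales_facts = [
    {"customer_id": 1, "product_id": 10, "date": "2024-01-05", "quantity": 3, "price_paid": 30.0},
    {"customer_id": 2, "product_id": 10, "date": "2024-01-06", "quantity": 1, "price_paid": 10.0},
    {"customer_id": 1, "product_id": 11, "date": "2024-02-01", "quantity": 2, "price_paid": 80.0},
]

def revenue_by(dimension_table, key_column, attribute):
    """Sum price_paid over all facts, grouped by one attribute of a dimension."""
    totals = defaultdict(float)
    for fact in sales_facts:
        label = dimension_table[fact[key_column]][attribute]
        totals[label] += fact["price_paid"]
    return dict(totals)

# The same facts classified in two different ways.
revenue_by_city = revenue_by(customers, "customer_id", "city")
revenue_by_category = revenue_by(products, "product_id", "category")
print(revenue_by_city)
print(revenue_by_category)
```

Each fact row stays small because it stores only measures and keys, while the descriptive attributes live once in the dimension tables.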
The second approach uses database normalization. In this style, the data in the data warehouse is
stored in third normal form. The main advantage of this approach is that it is quite straightforward
to add new information to the database. The primary disadvantage is that it can be rather slow to
produce information and reports, because the data needed for a report is spread across many tables.
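The trade-off of the normalized approach can be sketched with Python's standard `sqlite3` module. The three tables and their contents are assumptions made for this example, not a schema taken from the text; the point is only that adding information touches one table, while a report has to join several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE city     (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                           city_id INTEGER REFERENCES city(city_id));
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(customer_id),
                           amount REAL);
    INSERT INTO city VALUES (1, 'Delhi'), (2, 'Mumbai');
    INSERT INTO customer VALUES (1, 'Asha', 1), (2, 'Ravi', 2);
    INSERT INTO orders VALUES (100, 1, 30.0), (101, 2, 10.0), (102, 1, 80.0);
""")

# Adding new information is a single insert into one small table.
conn.execute("INSERT INTO city VALUES (3, 'Pune')")

# A report must join its way back through the normalized tables.
report = conn.execute("""
    SELECT c.city_name, SUM(o.amount)
    FROM orders o
    JOIN customer cu ON cu.customer_id = o.customer_id
    JOIN city c ON c.city_id = cu.city_id
    GROUP BY c.city_name
    ORDER BY c.city_name
""").fetchall()
print(report)
conn.close()
```

With only three tables the joins are cheap; in a large warehouse the number of joins per report grows, which is where the slowness mentioned above tends to come from.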
Task: A database keeps records in a particular format. What about a data
warehouse?
1.8 Getting Multidimensional Data out of the Warehouse
A data warehouse is, generally speaking, an analytical tool used in business. Because the
amount of data in such a database is usually very large, and the relationships
among the data are often intertwined, multidimensional data, with its multidimensional
structure, is a popular way of organising such databases.
A key feature of multidimensional data is that it can be viewed from different aspects, giving
a broader picture of the data as a whole and a clear view of trends. Such data is therefore
often used as part of OLAP to serve users with fast and effective queries.
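As a minimal sketch of viewing the same data from different aspects, the Python below treats a small cube as a mapping from (quarter, city, product) to units sold. The dimension names, the cell values and the helper `view` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

DIMENSIONS = ("quarter", "city", "product")

# Cells of a tiny three-dimensional cube: (quarter, city, product) -> units sold.
cube = {
    ("Q1", "Delhi", "Pen"): 5,
    ("Q1", "Mumbai", "Pen"): 3,
    ("Q2", "Delhi", "Pen"): 4,
    ("Q2", "Delhi", "Lamp"): 2,
}

def view(cube, keep):
    """Sum the cube over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for cell, units in cube.items():
        totals[tuple(cell[p] for p in positions)] += units
    return dict(totals)

# Two different views of the same cells.
by_quarter = view(cube, ["quarter"])
by_city_product = view(cube, ["city", "product"])
print(by_quarter)
print(by_city_product)
```

The underlying cells never change; only the choice of dimensions to keep differs between the two views.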
The main considerations in getting multidimensional data out of the data warehouse are as follows:
1. Large data volumes, e.g., sales, telephone calls
(a) Giga-, Tera-, Peta-, Exa-byte
2. OLAP = On-Line Analytical Processing
(a) Interactive analysis
(b) Explorative discovery
(c) Fast response times required
3. OLAP operations
(a) Aggregation of data
(b) Standard aggregation operators, e.g., SUM
(c) Starting level, e.g., (Quarter, City)
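The aggregation items above can be sketched in Python as a roll-up that applies SUM while moving from the starting level (Quarter, City) to a coarser level. The sales figures, the quarter label format, the city-to-country mapping and the helper `roll_up` are all assumptions made for this illustration.

```python
from collections import defaultdict

# Sales at the starting level (Quarter, City); values are made up.
sales_quarter_city = {
    ("2024-Q1", "Delhi"): 100,
    ("2024-Q1", "Mumbai"): 50,
    ("2024-Q2", "Delhi"): 70,
    ("2025-Q1", "Delhi"): 20,
}

# Hypothetical hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Delhi": "India", "Mumbai": "India"}

def roll_up(cells, map_quarter=lambda q: q, map_city=lambda c: c):
    """Apply SUM while mapping each dimension value to a coarser level."""
    totals = defaultdict(int)
    for (quarter, city), amount in cells.items():
        totals[(map_quarter(quarter), map_city(city))] += amount
    return dict(totals)

# (Quarter, City) -> (Year, City): the year is the part before the hyphen.
by_year_city = roll_up(sales_quarter_city, map_quarter=lambda q: q.split("-")[0])
# (Quarter, City) -> (Quarter, Country)
by_quarter_country = roll_up(sales_quarter_city, map_city=CITY_TO_COUNTRY.get)
print(by_year_city)
print(by_quarter_country)
```

SUM works here because totals at a coarser level are simply the sums of the finer-level totals; an operator such as an average would need extra bookkeeping (for example, keeping counts alongside sums).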