Page 165 - DMGT505_MANAGEMENT_INFORMATION_SYSTEM
Management Information Systems
In other words, the data warehouse provides data that are already transformed and summarized,
which makes it an appropriate environment for more efficient DSS and EIS applications.
To understand the data warehouse, you should also be familiar with the concept of data mining.
Data mining is the process of extracting patterns from data. As more data are gathered, with the
amount of data doubling every three years, data mining is becoming an increasingly important
tool to transform these data into information. It is commonly used in a wide range of profiling
practices, such as marketing, surveillance, fraud detection and scientific discovery.
While data mining can be used to uncover patterns in data samples, it is important to be aware
that the use of non-representative samples of data may produce results that are not indicative of
the domain. Similarly, data mining will not find patterns that may be present in the domain, if
those patterns are not present in the sample being “mined”. There is a tendency for insufficiently
knowledgeable “consumers” of the results to attribute “magical abilities” to data mining, treating
the technique as a sort of all-seeing crystal ball. Like any other tool, it only functions in conjunction
with the appropriate raw material: in this case, indicative and representative data that the user
must first collect.
Caution The discovery of a particular pattern in a particular set of data does not necessarily
mean that pattern is representative of the whole population from which that data was
drawn.
Did u know? An important part of the data mining process is the verification and
validation of patterns on other samples of data.
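The caution above can be illustrated with a minimal Python sketch. It measures how often a pattern holds in the sample where it was found and then checks it against a second sample. The records, the field names (age, claimed), the pattern itself and the tolerance of 0.15 are all invented for illustration; they do not come from any real data set.

```python
def pattern_rate(records, condition, outcome):
    """Share of the records meeting `condition` that also show `outcome`."""
    matching = [r for r in records if condition(r)]
    if not matching:
        return None  # the pattern cannot be assessed on this sample
    return sum(1 for r in matching if outcome(r)) / len(matching)


def holds_on_validation(discovery, validation, condition, outcome, tolerance=0.15):
    """True if the rate seen in `discovery` is roughly repeated in `validation`."""
    d = pattern_rate(discovery, condition, outcome)
    v = pattern_rate(validation, condition, outcome)
    if d is None or v is None:
        return False
    return abs(d - v) <= tolerance  # tolerance is an arbitrary, assumed threshold


def is_young(record):
    return record["age"] < 30


def has_claimed(record):
    return record["claimed"]


# Hypothetical sample in which the pattern "young customers file claims" was found.
discovery = [
    {"age": 22, "claimed": True},
    {"age": 24, "claimed": True},
    {"age": 23, "claimed": True},
    {"age": 45, "claimed": False},
    {"age": 52, "claimed": False},
]

# Hypothetical second sample drawn from the same population.
validation = [
    {"age": 21, "claimed": False},
    {"age": 25, "claimed": True},
    {"age": 24, "claimed": False},
    {"age": 23, "claimed": False},
    {"age": 60, "claimed": False},
]

print(pattern_rate(discovery, is_young, has_claimed))
print(pattern_rate(validation, is_young, has_claimed))
print(holds_on_validation(discovery, validation, is_young, has_claimed))
```

In this made-up data the pattern holds for every young customer in the first sample but for only one in four in the second, so the check fails. A real validation would use far larger samples and a proper statistical test rather than a fixed tolerance.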
8.5.1 Characteristics of Data Warehouse
According to Bill Inmon, author of Building the Data Warehouse and widely
considered to be the originator of the data warehousing concept, there are generally four
characteristics that describe a data warehouse:
1. Subject Oriented: Data are organized according to subject instead of application, e.g., an
insurance company using a data warehouse would organize their data by customer,
premium, and claim, instead of by different products (auto, life, etc.). The data organized
by subject contain only the information necessary for decision support processing.
2. Integrated: When data reside in many separate applications in the operational environment,
the encoding of data is often inconsistent. For instance, in one application gender might be
coded as “m” and “f”, and in another as 0 and 1. When data are moved from the operational
environment into the data warehouse, they assume a consistent coding convention, e.g.,
all gender data are transformed to “m” and “f”.
3. Time-variant: The data warehouse contains a place for storing data that are five to ten years
old, or older, to be used for comparisons, trends, and forecasting. These data are not
updated.
4. Non-volatile: Data are not updated or changed in any way once they enter the data
warehouse, but are only loaded and accessed.
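The Integrated and Non-volatile characteristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the application names (policy_app, claims_app), the row layout and the choice that 0 means “m” and 1 means “f” are hypothetical, since the text does not say which number stands for which gender.

```python
# Hypothetical per-application code tables. The 0 -> "m", 1 -> "f" mapping is an
# assumption made for this sketch only.
GENDER_MAPS = {
    "policy_app": {"m": "m", "f": "f"},
    "claims_app": {0: "m", 1: "f"},
}


def to_warehouse_gender(source, value):
    """Translate a source-specific gender code to the warehouse convention."""
    try:
        return GENDER_MAPS[source][value]
    except KeyError:
        raise ValueError(f"unmapped gender code {value!r} from {source!r}") from None


def load(warehouse, source, rows):
    """Append integrated copies of `rows`; rows already loaded are never changed."""
    for row in rows:
        integrated = {**row, "gender": to_warehouse_gender(source, row["gender"])}
        warehouse.append(integrated)


warehouse = []
load(warehouse, "policy_app", [{"customer": "C1", "gender": "m"},
                               {"customer": "C2", "gender": "f"}])
load(warehouse, "claims_app", [{"customer": "C3", "gender": 0},
                               {"customer": "C4", "gender": 1}])

print([row["gender"] for row in warehouse])
```

After both loads, every row uses the same “m”/“f” convention regardless of which application it came from. The load function only appends, which mirrors the load-and-access behaviour described under Non-volatile.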
LOVELY PROFESSIONAL UNIVERSITY