Page 60 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
3. Scope for pilot: KL Hub (for actual data capture, UAT & roll-out), but the framework
must incorporate the APJCC perspective.
4. Define data definitions, DWH structure, data capture processes, business logic &
system rules, and applications & tools for the Datamart.
5. Design & implement.
3.8 Summary
• In this unit, you learnt about data mining techniques. Data mining is an analytic process
designed to explore data (usually large amounts of data, typically business or market
related) in search of consistent patterns and/or systematic relationships between variables,
and then to validate the findings by applying the detected patterns to new subsets of
data.
• The ultimate goal of data mining is prediction. Predictive data mining is the most
common type of data mining and the one with the most direct business applications.
• The process of data mining consists of three stages: (1) the initial exploration, (2) model
building or pattern identification with validation/verification, and (3) deployment (i.e., the
application of the model to new data in order to generate predictions).
• In this unit you also learnt about the statistical perspective on data mining, similarity
measures, decision trees and other techniques.
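The three stages named in the summary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records are invented, the function names (explore, fit_line, validate, deploy) are hypothetical, and a least-squares line stands in for whatever model a real project would build.

```python
from statistics import mean

# Hypothetical toy data: (advertising spend, sales) pairs, invented for illustration.
records = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
           (5.0, 10.1), (6.0, 12.2), (7.0, 13.8), (8.0, 16.1)]

def explore(data):
    """Stage 1: initial exploration, here just a few summary statistics."""
    xs, ys = zip(*data)
    return {"n": len(data), "mean_x": mean(xs), "mean_y": mean(ys)}

def fit_line(data):
    """Stage 2a: model building, here a least-squares line y = a + b*x."""
    xs, ys = zip(*data)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def validate(model, holdout):
    """Stage 2b: validation, mean absolute error on records not used for fitting."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in holdout)

def deploy(model, new_xs):
    """Stage 3: deployment, applying the model to new data to generate predictions."""
    a, b = model
    return [a + b * x for x in new_xs]

# Alternate records: even positions are used for fitting, odd positions for validation.
train, holdout = records[::2], records[1::2]
summary = explore(records)
model = fit_line(train)
error = validate(model, holdout)
predictions = deploy(model, [9.0, 10.0])
```

Holding back part of the data for validation mirrors the idea of applying detected patterns to new subsets of data before the model is trusted for prediction.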
3.9 Keywords
Decision Tree: A decision tree is a structure that can be used to divide up a large collection
of records into successively smaller sets of records by applying a sequence of simple decision
rules.
Dice: The Dice coefficient is a generalization of the harmonic mean of the precision and recall
measures.
Genetic Algorithms: Genetic algorithms are mathematical procedures utilizing the process of
genetic inheritance.
Similarity Measures: Similarity measures provide the framework on which many data mining
decisions are based.
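As a small illustration of the Dice keyword, the sketch below computes the Dice coefficient for two plain sets using the common set form 2|X ∩ Y| / (|X| + |Y|) and compares it with the harmonic mean of precision and recall. The function names and the example sets are hypothetical, and treating two empty sets as perfectly similar is a convention assumed here, not something stated in the unit.

```python
def dice(x, y):
    """Dice coefficient of two collections, treated as sets: 2|X & Y| / (|X| + |Y|)."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # assumed convention: two empty sets count as identical
    return 2 * len(x & y) / (len(x) + len(y))

def harmonic_mean_pr(retrieved, relevant):
    """Harmonic mean of precision and recall for a retrieved set vs. a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: items a retrieval system returned vs. items actually relevant.
retrieved = {"a", "b", "c", "d"}
relevant = {"c", "d", "e"}

dice_value = dice(retrieved, relevant)
hm_value = harmonic_mean_pr(retrieved, relevant)
```

For plain sets like these, the two values coincide (both equal 4/7 in this example), which is the link between the Dice coefficient and the precision and recall measures.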
3.10 Self Assessment
Fill in the blanks:
1. ....................... is the science of learning from data.
2. ....................... are known to be crude information and not knowledge by themselves.
3. ....................... provide the framework on which many data mining decisions are based.
4. The goal of ....................... systems is to meet user needs.
5. The ....................... is sometimes calculated using the max operator in place of the min.
6. ....................... are very sophisticated modeling techniques capable of modeling extremely
complex functions.
7. ....................... are mathematical procedures utilizing the process of genetic inheritance.
Lovely Professional University