Page 28 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
knowledge can be applied to decision-making, process control, information management, and
query processing. Therefore, data mining is considered one of the most important frontiers in
database and information systems and one of the most promising interdisciplinary developments
in information technology.
2.2 What is Data Mining?
In simple words, data mining refers to extracting or “mining” knowledge from large amounts of
data. Some other terms like knowledge mining from data, knowledge extraction, data/pattern
analysis, data archaeology, and data dredging are also used for data mining. Many people treat
data mining as a synonym for another popularly used term, Knowledge Discovery from Data,
or KDD.
Some people view data mining as simply an essential step in the process of knowledge discovery.
Knowledge discovery as a process consists of an iterative sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate
for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to extract
data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge
based on some interestingness measures)
7. Knowledge presentation (where visualisation and knowledge representation techniques
are used to present the mined knowledge to the user).
The first four steps are different forms of data preprocessing, which prepare the data
for mining. After this, the data mining step may interact with the user or a knowledge base.
The interesting patterns are presented to the user and may be stored as new knowledge in the
knowledge base.
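
The seven steps listed above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration only: the toy purchase records, the function names (clean, integrate, select, transform, mine, evaluate, present), the use of item-pair co-occurrence counting as the "intelligent method", and support as the interestingness measure. It is not a method prescribed by the text, only one simple way the steps might fit together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two sources of (customer, item) purchase records.
store_a = [("c1", "bread"), ("c1", "milk"), ("c2", "bread"),
           ("c2", "milk"), ("c3", "milk"), ("c3", None)]
store_b = [("c4", "Bread "), ("c4", "milk"), ("c4", "pen"),
           ("c5", "pen"), ("c5", "bag")]

def clean(records):
    """Step 1: drop records with missing values, normalise item names."""
    return [(c, i.strip().lower()) for c, i in records if c and i]

def integrate(*sources):
    """Step 2: combine several data sources into one list."""
    return [record for source in sources for record in source]

def select(records, relevant_items):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r[1] in relevant_items]

def transform(records):
    """Step 4: aggregate records into one basket (set of items) per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Step 5: count how often each pair of items occurs together."""
    counts = Counter()
    for items in baskets.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

def evaluate(counts, n_baskets, min_support):
    """Step 6: keep pairs whose support (share of baskets) meets the threshold."""
    return {pair: n / n_baskets for pair, n in counts.items()
            if n / n_baskets >= min_support}

def present(patterns):
    """Step 7: render the retained patterns as readable lines."""
    ranked = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in ranked]

records = integrate(clean(store_a), clean(store_b))
records = select(records, relevant_items={"bread", "milk", "pen"})
baskets = transform(records)
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
report = present(patterns)
print("\n".join(report))
```

In this sketch the first four functions correspond to preprocessing, and the dictionary returned by evaluate plays the role of the new knowledge that could be stored in a knowledge base. In practice the process is iterative: a threshold such as min_support would typically be adjusted and the later steps rerun.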
2.3 Definition of Data Mining
Today, in industry, in media, and in the database research milieu, the term data mining is becoming
more popular than the longer term knowledge discovery from data. Therefore, in a broader
view of data mining functionality, data mining can be defined as “the process of discovering
interesting knowledge from large amounts of data stored in databases, data warehouses, or other
information repositories.”