Page 150 - DCAP208_Management Support Systems
Unit 9: Data Mining
older classical statistical methods. While data mining is still in its infancy, it is becoming
popular and ubiquitous. Before data mining develops into a conventional, mature and trusted discipline,
many pending issues have to be addressed. Some of these issues are discussed below.
Did you know? These issues are not exclusive and are not ordered in any way.
Security and Social Issues
Security is an important issue with any data collection that is shared and/or is intended to be
used for strategic decision-making. In addition, when data is collected for customer profiling,
user behaviour understanding, correlating personal data with other information, etc., large
amounts of sensitive and private information about individuals or companies are gathered and
stored. This becomes controversial given the confidential nature of some of this data and the
potential illegal access to the information. Moreover, data mining could disclose new implicit
knowledge about individuals or groups that could be against privacy policies, especially if
there is potential dissemination of discovered information. Another issue that arises from this
concern is the appropriate use of data mining. Due to the value of data, databases of all sorts of
content are regularly sold, and because of the competitive advantage that can be attained from
implicit knowledge discovered, some important information could be withheld, while other
information could be widely distributed and used without control.
User Interface Issues
The knowledge discovered by data mining tools is useful as long as it is interesting, and above
all understandable by the user. Good data visualization eases the interpretation of data mining
results and helps users better understand their needs. Many exploratory data analysis
tasks are significantly facilitated by the ability to see data in an appropriate visual presentation.
There are many visualization ideas and proposals for the effective graphical presentation of data.
However, there is still much research to accomplish in order to obtain good visualization tools
for large datasets that could be used to display and manipulate mined knowledge. The major
issues related to user interfaces and visualization are “screen real-estate”, information rendering,
and interaction. Interactivity with the data and data mining results is crucial since it provides
means for the user to focus and refine the mining tasks, as well as to picture the discovered
knowledge from different angles and at different conceptual levels.
Mining Methodology Issues
These issues pertain to the data mining approaches applied and their limitations. Topics such as
versatility of the mining approaches, the diversity of data available, the dimensionality of the
domain, the broad analysis needs (when known), the assessment of the knowledge discovered,
the exploitation of background knowledge and metadata, the control and handling of noise in
data, etc. are all examples that can dictate mining methodology choices. For instance, it is often
desirable to have different data mining methods available since different approaches may perform
differently depending upon the data at hand. Moreover, different approaches may suit and meet
users' needs differently.
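
The point that different approaches may perform differently depending on the data can be illustrated with a small sketch. The two classifiers (nearest-centroid and one-nearest-neighbour), the function names and the toy datasets below are all assumptions chosen for illustration; they do not come from the text and are not a recommendation of either method.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_centroid(train, query):
    """Predict the label whose class mean is closest to the query point."""
    by_label = {}
    for point, label in train:
        by_label.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in by_label.items()
    }
    # sorted() makes tie-breaking deterministic: the lowest label wins.
    return min(sorted(centroids), key=lambda label: dist(centroids[label], query))


def nearest_neighbour(train, query):
    """Predict the label of the single closest training point (1-NN)."""
    _, label = min(train, key=lambda item: dist(item[0], query))
    return label


def accuracy(method, train, test):
    """Fraction of test points for which the method predicts the true label."""
    hits = sum(method(train, point) == label for point, label in test)
    return hits / len(test)


# Hypothetical dataset 1: two well-separated groups of points.
BLOBS_TRAIN = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0),
               ((5, 5), 1), ((6, 5), 1), ((5, 6), 1), ((6, 6), 1)]
BLOBS_TEST = [((0.2, 0.8), 0), ((5.8, 5.2), 1)]

# Hypothetical dataset 2: an XOR-like layout in which each class occupies
# two opposite corners, so both class means fall on the same central point.
XOR_TRAIN = [((0, 0), 0), ((1, 1), 0), ((10, 10), 0), ((9, 9), 0),
             ((0, 10), 1), ((1, 9), 1), ((10, 0), 1), ((9, 1), 1)]
XOR_TEST = [((0.5, 0.5), 0), ((9.5, 9.5), 0), ((0.5, 9.5), 1), ((9.5, 0.5), 1)]


if __name__ == "__main__":
    for name, train, test in [("blobs", BLOBS_TRAIN, BLOBS_TEST),
                              ("xor", XOR_TRAIN, XOR_TEST)]:
        for method in (nearest_centroid, nearest_neighbour):
            print(name, method.__name__, accuracy(method, train, test))
```

On the well-separated data both methods classify every test point correctly. On the XOR-like data the class means coincide, so the nearest-centroid method cannot tell the classes apart, while the nearest-neighbour method still can. Neither approach is better in general; which one suits depends on the data at hand.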
Most algorithms assume the data to be noise-free. This is of course a strong assumption. Most
datasets contain exceptions, invalid or incomplete information, etc., which may complicate, if
not obscure, the analysis process and in many cases compromise the accuracy of the results. As
a consequence, data preprocessing (data cleaning and transformation) becomes vital. It is often
seen as lost time, but data cleaning, as time-consuming and frustrating as it may be, is one of the
LOVELY PROFESSIONAL UNIVERSITY 143