Page 150 - DCAP208_Management Support Systems

Unit 9: Data Mining




          older classical statistical methods. While data mining is still in its infancy, it is becoming a trend
          and is increasingly ubiquitous. Before data mining develops into a conventional, mature and trusted
          discipline, many pending issues still have to be addressed. Some of these issues are discussed below.



             Did you know? These issues are not mutually exclusive and are not ordered in any way.

          Security and Social Issues


          Security is an important issue with any data collection that is shared and/or is intended to be
          used for strategic decision-making. In addition, when data is collected for customer profiling,
          user behaviour understanding, correlating personal data with other information, etc., large
          amounts of sensitive and private information about individuals or companies are gathered and
          stored. This becomes controversial given the confidential nature of some of this data and the
          potential for illegal access to the information. Moreover, data mining could disclose new implicit
          knowledge about individuals or groups that could violate privacy policies, especially if
          there is potential dissemination of discovered information. Another issue that arises from this
          concern is the appropriate use of data mining. Due to the value of data, databases of all sorts of
          content are regularly sold, and because of the competitive advantage that can be attained from
          implicit knowledge discovered, some important information could be withheld, while other
          information could be widely distributed and used without control.

          User Interface Issues

          The knowledge discovered by data mining tools is useful as long as it is interesting, and above
          all understandable by the user. Good data visualization eases the interpretation of data mining
          results and helps users better understand their needs. Many exploratory data analysis
          tasks are significantly facilitated by the ability to see data in an appropriate visual presentation.
          There are many visualization ideas and proposals for the effective graphical presentation of data.
          However, there is still much research to accomplish in order to obtain good visualization tools
          for large datasets that could be used to display and manipulate mined knowledge. The major
          issues related to user interfaces and visualization are “screen real-estate”, information rendering,
          and interaction. Interactivity with the data and data mining results is crucial since it provides
          a means for the user to focus and refine the mining tasks, as well as to picture the discovered
          knowledge from different angles and at different conceptual levels.

          Mining Methodology Issues

          These issues pertain to the data mining approaches applied and their limitations. Topics such as
          versatility of the mining approaches, the diversity of data available, the dimensionality of the
          domain, the broad analysis needs (when known), the assessment of the knowledge discovered,
          the exploitation of background knowledge and metadata, the control and handling of noise in
          data, etc. are all examples that can dictate mining methodology choices. For instance, it is often
          desirable to have different data mining methods available since different approaches may perform
          differently depending upon the data at hand. Moreover, different approaches may suit and solve
          users’ needs differently.
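
          The point that different approaches may perform differently depending upon the data can be
          illustrated with a small sketch. The two classifiers (nearest centroid and 1-nearest-neighbour),
          the two toy datasets ("blobs" and "xor") and all function names below are assumptions chosen
          purely for illustration; they are not methods prescribed by this unit. Each method is scored by
          leave-one-out accuracy, i.e. every point is classified using all the other points as training data.

```python
from math import dist

# Four small offsets place four points around each (hypothetical) cluster centre.
OFFSETS = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1)]


def make_clusters(centres_by_label):
    """Return a list of ((x, y), label) pairs, four points per cluster centre."""
    data = []
    for label, centres in centres_by_label.items():
        for cx, cy in centres:
            for dx, dy in OFFSETS:
                data.append(((cx + dx, cy + dy), label))
    return data


def nearest_centroid(train, point):
    """Predict the label whose class mean is closest to the point."""
    groups = {}
    for p, label in train:
        groups.setdefault(label, []).append(p)
    centroids = {
        label: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for label, pts in groups.items()
    }
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def nearest_neighbour(train, point):
    """Predict the label of the single closest training point (1-NN)."""
    _, label = min(train, key=lambda item: dist(point, item[0]))
    return label


def loo_accuracy(classify, data):
    """Leave-one-out accuracy: classify each point using all the others."""
    correct = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if classify(train, point) == label:
            correct += 1
    return correct / len(data)


DATASETS = {
    # One compact cluster per class: the classes are easy to separate.
    "blobs": make_clusters({0: [(0, 0)], 1: [(4, 4)]}),
    # XOR-like layout: each class has two clusters on opposite corners,
    # so both class means fall at roughly the same location.
    "xor": make_clusters({0: [(0, 0), (1, 1)], 1: [(0, 1), (1, 0)]}),
}
METHODS = {
    "nearest_centroid": nearest_centroid,
    "nearest_neighbour": nearest_neighbour,
}

results = {
    name: {method: loo_accuracy(classify, data) for method, classify in METHODS.items()}
    for name, data in DATASETS.items()
}

for name, scores in results.items():
    print(name, scores)
```

          On the "blobs" data both methods classify every point correctly, whereas on the "xor" data the
          nearest-centroid method fails because the two class means nearly coincide, while the
          nearest-neighbour method still succeeds. Under these assumptions, the sketch suggests why it is
          useful to have more than one mining method available.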
          Most algorithms assume the data to be noise-free. This is of course a strong assumption. Most
          datasets contain exceptions, invalid or incomplete information, etc., which may complicate, if
          not obscure, the analysis process and in many cases compromise the accuracy of the results. As
          a consequence, data preprocessing (data cleaning and transformation) becomes vital. It is often
          seen as lost time, but data cleaning, as time-consuming and frustrating as it may be, is one of the




                                           LOVELY PROFESSIONAL UNIVERSITY                                   143