Page 158 - DCAP606_BUSINESS_INTELLIGENCE

Unit 11: Data Mining




          2.   ......................... are any facts, numbers, or text that can be processed by a computer.
          3.   The patterns, associations, or relationships among all types of data can provide
               ...................................

          11.2 Data Mining Approaches


          Two widely used data mining methods for finding hidden patterns in data are clustering and
          classification analysis. Although classification and clustering are often mentioned in the same
          breath, they are different analytical approaches.
          Imagine a database of customer records, where each record represents a customer’s attributes.
          These can include identifiers such as name and address, demographic data such as gender
          and age, and financial attributes such as income and revenue spent.
          Clustering is an automated method for grouping related records together. Records are
          grouped on the basis of similar values for their attributes.
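          To make "similar values for attributes" concrete, the sketch below compares customer records
          using Euclidean distance after rescaling each attribute to the range 0 to 1. The attributes
          (age, income), the sample records and the min-max scaling step are assumptions for
          illustration only; the text does not prescribe a particular similarity measure.

```python
import math

# Hypothetical customer records: (age, annual income)
customers = {
    "A": (25, 30000),
    "B": (27, 32000),
    "C": (60, 95000),
}

def min_max_scale(records):
    """Rescale every attribute to [0, 1] so that large-valued attributes
    such as income do not dominate the distance calculation."""
    columns = list(zip(*records.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        key: tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(values, lows, highs)
        )
        for key, values in records.items()
    }

def distance(a, b):
    """Euclidean distance between two records: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

scaled = min_max_scale(customers)
print(round(distance(scaled["A"], scaled["B"]), 3))
print(round(distance(scaled["A"], scaled["C"]), 3))
```

          Records A and B come out far closer to each other than either is to C, so a clustering
          algorithm using this measure would tend to place A and B in the same group.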




             Notes  This approach of segmenting the database via clustering analysis is often used as an
            exploratory technique because the end-user/analyst does not need to specify
            ahead of time how records should be grouped together.

                                   Figure 11.2: Clustering Example

          Source: http://www.ibm.com/developerworks/data/library/techarticle/dm-0811wurst/outlier_by_clustering.jpg
          Records within a cluster are more similar to each other, and more different from records in
          other clusters. Depending on the specific implementation, different measures of similarity
          may be used, but the general aim is for the algorithm to converge on groups of related records.
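          The text leaves the algorithm open. As one common choice, the sketch below implements
          k-means: each record is assigned to its nearest cluster centre, each centre is then moved to
          the mean of its records, and the loop repeats until no record changes cluster, which is what
          "converging" means here. The pre-scaled two-attribute records and the choice of k=2 are
          hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means clustering; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen records
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no record moved, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member records.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical pre-scaled customer records: (age, income)
records = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.18),
           (0.90, 0.85), (0.85, 0.95), (0.88, 0.90)]
labels, centroids = kmeans(records, k=2)
print(labels)
```

          The first three records end up sharing one label and the last three the other; the numeric
          labels themselves are arbitrary, and no class definitions were supplied in advance.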
          Classification is a different method from clustering. Unlike clustering, classification analysis
          requires the end-user/analyst to know ahead of time how the classes are characterised.
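          By contrast, classification starts from classes the analyst has already defined. The sketch
          below uses a nearest-centroid classifier, chosen here only because it is short: it learns one
          average profile per predefined class from labelled records and assigns a new record to the
          class whose profile is closest. The class names ("low_value", "high_value") and the data are
          hypothetical.

```python
import math
from collections import defaultdict

def train_nearest_centroid(records, labels):
    """Compute the mean attribute vector of each predefined class."""
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(rec)
    return {
        lab: tuple(sum(dim) / len(members) for dim in zip(*members))
        for lab, members in groups.items()
    }

def classify(centroids, record):
    """Assign the record to the class whose centroid is nearest."""
    return min(centroids, key=lambda lab: math.dist(record, centroids[lab]))

# Hypothetical labelled training data: (age, income), both pre-scaled
train = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.85), (0.85, 0.95)]
classes = ["low_value", "low_value", "high_value", "high_value"]

model = train_nearest_centroid(train, classes)
print(classify(model, (0.20, 0.30)))
print(classify(model, (0.80, 0.90)))
```

          The key difference from clustering is that the list of class labels must be supplied by the
          analyst before the model can be trained.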




                                           LOVELY PROFESSIONAL UNIVERSITY                                   153