Page 58 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining




5. Engine management: Neural networks have been used to analyze the input of sensors from
   an engine. The neural network controls the various parameters within which the engine
   functions, in order to achieve a particular goal, such as minimizing fuel consumption.

3.6 Genetic Algorithms

Genetic algorithms are mathematical procedures modeled on the process of genetic inheritance. They
have been usefully applied to a wide variety of analytic problems. Data mining can combine
human understanding with automatic analysis of data to detect patterns or key relationships.
Given a large database defined over a number of variables, the goal is to efficiently find the most
interesting patterns in the database. Genetic algorithms have been applied to identify interesting
patterns in some applications. In data mining they are usually used to improve the performance
of other algorithms, such as decision tree algorithms and association rules.
Genetic algorithms require a certain data structure. They operate on a population with
characteristics expressed in categorical form. The analogy with genetics is that the population
(genes) consists of characteristics. One way to implement genetic algorithms is to apply operators
(reproduction, crossover, selection), with mutation added to enhance the generation of potentially
better combinations. The genetic algorithm process is thus:
1. Randomly select parents.
2. Reproduce through crossover. Reproduction is the choice of which individual entities will
   survive. In other words, some objective function or selection characteristic is needed to
   determine survival. Crossover relates to change in future generations of entities.
3. Select survivors for the next generation through a fitness function.
4. Mutate. Mutation is the operation by which randomly selected attributes of randomly selected
   entities in subsequent operations are changed.
5. Iterate until either a given fitness level is attained or the preset number of iterations is
   reached.

Genetic algorithm parameters include population size, crossover rate, and mutation rate.
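
The five steps above can be sketched in Python roughly as follows. This is a minimal illustration
under stated assumptions, not a definitive implementation: the toy fitness function (count the
attributes equal to 1), the binary categories, single-point crossover, truncation selection, and
all names and default values (run_ga, GENOME_LENGTH, and so on) are hypothetical choices made only
for the example.

```python
import random

GENOME_LENGTH = 20      # hypothetical number of attributes per entity
CATEGORIES = [0, 1]     # hypothetical categorical values an attribute can take


def fitness(individual):
    """Toy objective (an assumption): count the attributes equal to 1."""
    return sum(individual)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix of one parent joined to the suffix of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(individual, mutation_rate, rng):
    """Redraw each attribute at random with probability mutation_rate."""
    return [rng.choice(CATEGORIES) if rng.random() < mutation_rate else gene
            for gene in individual]


def run_ga(population_size=30, crossover_rate=0.8, mutation_rate=0.02,
           target_fitness=GENOME_LENGTH, max_generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(CATEGORIES) for _ in range(GENOME_LENGTH)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    generation = 0
    # Step 5: stop at the target fitness or at the preset number of iterations.
    while fitness(best) < target_fitness and generation < max_generations:
        offspring = []
        for _ in range(population_size):
            # Step 1: randomly select two parents.
            parent_a, parent_b = rng.sample(population, 2)
            # Step 2: reproduce, applying crossover at the crossover rate.
            if rng.random() < crossover_rate:
                offspring.append(crossover(parent_a, parent_b, rng))
            else:
                offspring.append(list(parent_a))
        # Step 3: the fitness function decides which entities survive.
        ranked = sorted(population + offspring, key=fitness, reverse=True)
        survivors = ranked[:population_size]
        if fitness(survivors[0]) > fitness(best):
            best = survivors[0]  # fittest survivor seen so far
        # Step 4: mutate randomly selected attributes of the survivors.
        population = [mutate(ind, mutation_rate, rng) for ind in survivors]
        generation += 1
    return best, fitness(best), generation


best, score, generations = run_ga()
```

The three parameters named in the text appear here as population_size, crossover_rate, and
mutation_rate. Fixing the seed makes a run repeatable, which helps when comparing parameter
settings.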

Advantages of Genetic Algorithms

Genetic algorithms are very easy to develop and to validate, which makes them highly attractive if
they apply. The algorithm is parallel, meaning that it can be applied to large populations
efficiently. The algorithm is also efficient in that if it begins with a poor original solution, it
can rapidly progress to good solutions. Use of mutation makes the method capable of identifying
global optima even in very nonlinear problem domains. The method does not require knowledge about
the distribution of the data.
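
The parallelism mentioned above follows from the fact that each entity's fitness can be computed
independently of the others. The sketch below illustrates that idea with a thread pool from the
Python standard library. The fitness function and the small population are hypothetical, and
threads are used only for simplicity: for CPU-bound fitness functions in CPython, a process pool
would generally be needed to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(individual):
    """Hypothetical objective: count the attributes equal to 1."""
    return sum(individual)


def evaluate_population(population, max_workers=4):
    """Score every entity independently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness, population))


population = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
scores = evaluate_population(population)
```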
Disadvantages of Genetic Algorithms


Genetic algorithms require mapping data sets to a form where attributes have discrete values for
the genetic algorithm to work with. This is usually possible, but a great deal of detailed
information can be lost when dealing with continuous variables. Coding the data into categorical
form can unintentionally lead to biases in the data.
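
A small sketch can make the loss of detail concrete. The bin edges, labels, and sample values below
are hypothetical and chosen only for illustration; the point is how a continuous variable behaves
once it is coded into categories.

```python
def to_category(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # values at or above the last edge fall in the final bin


EDGES = [30, 60]                    # hypothetical upper bounds of the first two bins
LABELS = ["low", "medium", "high"]  # hypothetical category names

ages = [29.9, 30.1, 59.0, 61.5]
coded = [to_category(a, EDGES, LABELS) for a in ages]
```

Here 29.9 and 30.1 differ by only 0.2 yet are coded into different categories, while 30.1 and 59.0
differ by almost 29 and share one. Moving the bin edges changes which records look alike, which is
one way the coding can introduce bias.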
There are also limits to the size of data set that can be analyzed with genetic algorithms. For very
large data sets, sampling will be necessary, which leads to different results across different runs
over the same data set.
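
The run-to-run variation caused by sampling can be illustrated without a genetic algorithm at all.
The sketch below draws several random samples from the same data and computes a simple statistic on
each. The data set, sample size, and function name are hypothetical stand-ins; any analysis run on
such samples inherits the same variability.

```python
import random
from statistics import mean


def sampled_mean(data, sample_size, seed):
    """Draw a random sample without replacement and return its mean."""
    rng = random.Random(seed)
    return mean(rng.sample(data, sample_size))


data = list(range(10_000))  # hypothetical stand-in for a very large data set
# Each seed plays the role of a separate run over the same data set.
estimates = [sampled_mean(data, 100, seed) for seed in range(5)]
```

Each entry in estimates comes from a different sample of the same data, so the values generally
differ from run to run. Fixing the seed makes an individual run repeatable.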







Lovely Professional University