Page 58 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
P. 58
Data Warehousing and Data Mining
notes 5. Engine management: Neural networks have been used to analyze the input of sensors from
an engine. The neural network controls the various parameters within which the engine
functions, in order to achieve a particular goal, such as minimizing fuel consumption.
3.6 genetic algorithms
Genetic algorithms are mathematical procedures utilizing the process of genetic inheritance. They
have been usefully applied to a wide variety of analytic problems. Data mining can combine
human understanding with automatic analysis of data to detect patterns or key relationships.
Given a large database defined over a number of variables, the goal is to efficiently find the most
interesting patterns in the database. Genetic algorithms have been applied to identify interesting
patterns in some applications. They usually are used in data mining to improve the performance
of other algorithms, one example being decision tree algorithms, another association rules.
Genetic algorithms require certain data structure. They operate on a population with characteristics
expressed in categorical form. The analogy with genetics is that the population (genes) consist
of characteristic. One way to implement genetic algorithms is to apply operators (reproduction,
crossover, selection) with the feature of mutation enhance generation of potentially better
combinations. The genetic algorithm process is thus:
1. Randomly select parents
2. Reproduce through crossover, Reproduction is the choosing which individual entities will
survive. In other words, some objective function or selection characteristic is needed to
determine survival. Crossover relates to change in future generations of entities.
3. Select survivors for the next generation through a fitness function.
4. Mutation is the operation by which randomly selected attributes of randomly selected
entities in subsequent operations are changed.
5. Iterate until either a given fitness level is attained, or the present number of iteration is
reached.
Genetic algorithm parameters include population size, crossover rate, and the mutation rate.
advantages of genetic algorithm
Genetic algorithms are very easy to develop and to validate which makes them highly attractive of
they apply. The algorithm is parallel, meaning that it can applied to large populations efficiently.
The algorithm is also efficient in that if it begins with a poor original solution, it can rapidly
progress to good solutions. Use of mutation makes the method capable of identifying global
optima even in very nonlinear problem domains. The method does not require knowledge about
the distribution of the data.
Disadvantages of genetic algorithms
Genetic algorithms require mapping data sets to a from where attributes have discrete values for
the genetic algorithm to work with. This is usually possible, but can be lose a great deal of detail
information when dealing with continuous variables. Coding the data into categorical from can
unintentionally lead to biases in the data.
There are also limits to the size of data set that can be analyzed with genetic algorithms. For very
large data sets, sampling will be necessary, which leads to different results across different runs
over the same data set.
52 LoveLy professionaL university