Page 165 - DCAP208_Management Support Systems
P. 165

Management Support Systems




                    Notes          degree they attained when predicting income. The better definition of “near” might in fact be
                                   other people that you graduated from college with rather than the people that you live next to.
                                   Nearest Neighbor techniques are among the easiest to use and understand because they work in
                                   a way similar to the way that people think - by detecting closely matching examples. They also
                                   perform quite well in terms of automation, as many of the algorithms are robust with respect to
                                   dirty data and missing data. Lastly they are particularly adept at performing complex ROI
                                   calculations because the predictions are made at a local level where business simulations could
                                   be performed in order to optimize ROI. As they enjoy similar levels of accuracy compared to
                                   other techniques the measures of accuracy such as lift are as good as from any other.

                                   How to Use Nearest Neighbor for Prediction

                                   One of the essential elements underlying the concept of clustering is that one particular object
                                   (whether they be cars, food or customers) can be closer to another object than can some third
                                   object. It is interesting that most people have an innate sense of ordering placed on a variety of
                                   different objects. Most people would agree that an apple is closer to an orange than it is to a
                                   tomato and that a Toyota Corolla is closer to a Honda Civic than to a Porsche. This sense of
                                   ordering on many different objects helps us place them in time and space and to make sense of
                                   the world. It is what allows us to build clusters – both in databases on computers as well as in our
                                   daily lives. This definition of nearness that seems to be ubiquitous also allows us to make
                                   predictions.

                                   The nearest neighbor prediction algorithm simply stated is:
                                   Objects that are “near” to each other will have similar prediction values as well. Thus if you
                                   know the prediction value of one of the objects you can predict it for it’s nearest neighbors.

                                   Where has the Nearest Neighbor Technique been Used in Business?

                                   One of the classical places that nearest neighbor has been used for prediction has been in text
                                   retrieval. The problem to be solved in text retrieval is one where the end user defines a document
                                   that is interesting to them and they solicit the system to “find more documents like this one”.
                                   Effectively defining a target of: “this is the interesting document” or “this is not interesting”.
                                   The prediction problem is that only a very few of the documents in the database actually have
                                   values for this prediction field (namely only the documents that the reader has had a chance to
                                   look at so far). The nearest neighbor technique is used to find other documents that share
                                   important characteristics with those documents that have been marked as interesting.

                                   Using Nearest Neighbor for Stock Market Data

                                   As with almost all prediction algorithms, nearest neighbor can be used in a variety of places.
                                   Its successful use is mostly dependent on the pre-formatting of the data so that nearness can be
                                   calculated and where individual records can be defined. In the text retrieval example this was
                                   not too difficult - the objects were documents. This is not always as easy as it is for text retrieval.
                                   Consider what it might be like in a time series problem - say for predicting the stock market.
                                   In this case the input data is just a long series of stock prices over time without any particular
                                   record that could be considered to be an object. The value to be predicted is just the next value of
                                   the stock price.

                                   The way that this problem is solved for both nearest neighbor techniques and for some other
                                   types of prediction algorithms is to create training records by taking, for instance, 10 consecutive
                                   stock prices and using the first 9 as predictor values and the 10th as the prediction value. Doing
                                   things this way, if you had 100 data points in your time series you could create 10 different
                                   training records.


          158                               LOVELY PROFESSIONAL UNIVERSITY
   160   161   162   163   164   165   166   167   168   169   170