degree they attained when predicting income. A better definition of "near" might in fact be the people you graduated from college with rather than the people who live next door to you.
Nearest Neighbor techniques are among the easiest to use and understand because they work in a way similar to the way that people think - by detecting closely matching examples. They also perform quite well in terms of automation, as many of the algorithms are robust with respect to dirty data and missing data. Lastly, they are particularly adept at supporting complex ROI calculations because the predictions are made at a local level, where business simulations can be performed in order to optimize ROI. Because they achieve levels of accuracy comparable to other techniques, measures of accuracy such as lift are as good as those from any other method.
How to Use Nearest Neighbor for Prediction
One of the essential elements underlying the concept of clustering is that one particular object (whether it be a car, a food item or a customer) can be closer to another object than some third object is. It is interesting that most people have an innate sense of ordering across a variety of
different objects. Most people would agree that an apple is closer to an orange than it is to a
tomato and that a Toyota Corolla is closer to a Honda Civic than to a Porsche. This sense of
ordering on many different objects helps us place them in time and space and to make sense of
the world. It is what allows us to build clusters – both in databases on computers as well as in our
daily lives. This definition of nearness that seems to be ubiquitous also allows us to make
predictions.
The nearest neighbor prediction algorithm, simply stated, is:
Objects that are "near" to each other will have similar prediction values as well. Thus, if you know the prediction value of one of the objects, you can predict it for its nearest neighbors.
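To make this concrete, the short Python sketch below predicts a new customer's income by copying the income of the single nearest historical customer. The customer records, the two predictor attributes and the Euclidean distance are illustrative assumptions, not a prescribed implementation.

# A minimal sketch of nearest neighbor prediction: the record whose
# attributes are closest to the new record "lends" it its known outcome.
# The data and the distance measure are made up for illustration.

import math

# Historical customers with a known income (the prediction value).
history = [
    {"age": 34, "years_of_college": 4, "income": 62_000},
    {"age": 51, "years_of_college": 2, "income": 48_000},
    {"age": 28, "years_of_college": 6, "income": 71_000},
]

def distance(a, b):
    """Euclidean distance over the predictor attributes (income excluded)."""
    return math.sqrt((a["age"] - b["age"]) ** 2 +
                     (a["years_of_college"] - b["years_of_college"]) ** 2)

def predict_income(new_customer):
    """Return the income of the single nearest historical customer."""
    nearest = min(history, key=lambda rec: distance(rec, new_customer))
    return nearest["income"]

print(predict_income({"age": 30, "years_of_college": 5}))  # -> 71000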
Where has the Nearest Neighbor Technique been Used in Business?
One of the classical places that nearest neighbor has been used for prediction has been in text
retrieval. The problem to be solved in text retrieval is one where the end user identifies a document that is interesting to them and asks the system to "find more documents like this one". In effect, this defines a prediction target of "this document is interesting" or "this document is not interesting".
The prediction problem is that only a very few of the documents in the database actually have
values for this prediction field (namely only the documents that the reader has had a chance to
look at so far). The nearest neighbor technique is used to find other documents that share
important characteristics with those documents that have been marked as interesting.
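As a rough illustration, the sketch below ranks the remaining documents by how closely they match one document the reader has marked as interesting. The toy three-document collection and the choice of cosine similarity over word counts are assumptions for the example, not the specific method used by any particular retrieval system.

# "Find more documents like this one" via nearest neighbor over
# word-count vectors with cosine similarity (illustrative only).

from collections import Counter
import math

documents = {
    "doc1": "stock prices rose as the market rallied",
    "doc2": "the market fell after weak earnings reports",
    "doc3": "new recipe for tomato and apple salad",
}

def vectorize(text):
    """Represent a document as a bag of word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def more_like_this(interesting_doc_id):
    """Rank the other documents by similarity to the marked document."""
    target = vectorize(documents[interesting_doc_id])
    others = [(doc_id, cosine_similarity(target, vectorize(text)))
              for doc_id, text in documents.items()
              if doc_id != interesting_doc_id]
    return sorted(others, key=lambda pair: pair[1], reverse=True)

print(more_like_this("doc1"))  # doc2 ranks above doc3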
Using Nearest Neighbor for Stock Market Data
As with almost all prediction algorithms, nearest neighbor can be used in a variety of places. Its successful use depends mostly on pre-formatting the data so that nearness can be calculated and individual records can be defined. In the text retrieval example this was not too difficult - the objects were documents. This is not always as easy as it is for text retrieval.
Consider what it might be like in a time series problem - say for predicting the stock market.
In this case the input data is just a long series of stock prices over time without any particular
record that could be considered to be an object. The value to be predicted is just the next value of
the stock price.
The way that this problem is solved for both nearest neighbor techniques and for some other
types of prediction algorithms is to create training records by taking, for instance, 10 consecutive
stock prices and using the first 9 as predictor values and the 10th as the prediction value. Doing
things this way, if you had 100 data points in your time series, you could create 10 such non-overlapping training records.
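A minimal sketch of this windowing step is shown below. The synthetic price series and the plain Euclidean distance between windows are illustrative choices; in practice the window length and distance measure are tuning decisions.

# Turn a raw price series into nearest neighbor training records:
# each record is 9 consecutive prices (predictors) plus the 10th
# price (the value to predict). The series itself is synthetic.

import math

prices = [10.0 + 0.1 * i + (0.5 if i % 7 == 0 else 0.0) for i in range(100)]

WINDOW = 10  # 9 predictor values + 1 prediction value

# Non-overlapping windows, as in the text: 100 points -> 10 records.
records = [(prices[i:i + WINDOW - 1], prices[i + WINDOW - 1])
           for i in range(0, len(prices) - WINDOW + 1, WINDOW)]

def distance(a, b):
    """Euclidean distance between two windows of prices."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_next(last_nine_prices):
    """Predict the next price from the record whose predictors are nearest."""
    predictors, target = min(records,
                             key=lambda rec: distance(rec[0], last_nine_prices))
    return target

print(predict_next(prices[-9:]))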