Page 88 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
We begin with what is perhaps the best-known data type in traditional data analysis, namely,
d-dimensional vectors x of measurements on N objects or individuals; that is, N objects for each
of which we have d measurements or attributes. Such data is often referred to as multivariate
data and can be thought of as an N x d data matrix. Classical problems in data analysis involving
multivariate data include classification (learning a functional mapping from a vector x to y, where
y is a categorical, or scalar, target variable of interest), regression (the same as classification,
except that y takes real values), clustering (learning a function that maps x into a set of
categories, where the categories are unknown a priori), and density estimation (estimating the
probability density function, or PDF, p(x) for x).
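As a minimal sketch of the N x d data matrix and of classification as a mapping from x to y, the example below stores each object as a row and fits a nearest-centroid classifier. The toy values, the labels "a" and "b", the helper names, and the choice of nearest-centroid as the classifier are all illustrative assumptions, not taken from the text.

```python
from collections import defaultdict

# Hypothetical N x d data matrix: N = 4 objects (rows), d = 2 measurements (columns).
X = [
    [1.0, 1.2],
    [0.8, 1.0],
    [4.0, 4.2],
    [4.2, 3.9],
]
# Hypothetical categorical target y, one label per row of X.
y = ["a", "a", "b", "b"]


def shape(matrix):
    """Return (N, d) for a rectangular list-of-lists matrix."""
    n = len(matrix)
    d = len(matrix[0]) if n else 0
    if any(len(row) != d for row in matrix):
        raise ValueError("every object must have exactly d measurements")
    return n, d


def fit_centroids(matrix, labels):
    """Learn one mean vector per class (a deliberately simple classifier)."""
    groups = defaultdict(list)
    for row, label in zip(matrix, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def classify(centroids, x):
    """Map a vector x to the label of the closest class centroid."""
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


centroids = fit_centroids(X, y)
print(shape(X))                         # (N, d)
print(classify(centroids, [0.9, 1.1]))  # label predicted for a new vector
```

Regression would follow the same pattern with a real-valued y, while clustering would have to discover the labels rather than receive them.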
The dimensionality d of the vectors x plays a significant role in multivariate modeling. In
problems like text classification and clustering of gene expression data, d can be as large as
10^3 or 10^4 dimensions. Density estimation theory shows that the amount of data needed to
reliably estimate a density function scales exponentially in d (the so-called “curse of
dimensionality”).
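The exponential scaling can be illustrated with a rough sketch that assumes a histogram density estimate with the same number of bins along every axis. The histogram estimator, the function names, and the `per_cell` target are assumptions made for illustration only; the text does not prescribe a particular estimator.

```python
def histogram_cells(bins_per_dim, d):
    """Number of cells in a d-dimensional histogram with equal binning per axis."""
    return bins_per_dim ** d


def samples_needed(bins_per_dim, d, per_cell=10):
    """Rough sample count needed to average `per_cell` points in every cell."""
    return per_cell * histogram_cells(bins_per_dim, d)


# The cell count, and with it the data requirement, grows exponentially in d.
for d in (1, 2, 3, 10):
    print(d, histogram_cells(10, d), samples_needed(10, d))
```

With 10 bins per axis, each additional dimension multiplies the number of cells by 10, so a density estimate at d = 10^3 is out of reach for any realistic sample size.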
Fortunately, many predictive problems, including classification and regression, do not need a full
d-dimensional estimate of the PDF p(x), relying instead on the simpler problem of determining
a conditional probability density function p(y|x), where y is the variable whose value the data
miner wants to predict.
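One way to see that p(y|x) can be estimated without modeling p(x) is a k-nearest-neighbor estimate, sketched below under stated assumptions. The toy data, the labels, the function name `knn_conditional`, and the choice of k-nearest neighbors are illustrative and not drawn from the text.

```python
from collections import Counter


def knn_conditional(X, y, x_new, k=3):
    """Estimate p(y | x_new) as label frequencies among the k nearest rows of X."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, x_new))

    # Indices of the (at most) k training vectors closest to x_new.
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i]))[:k]
    counts = Counter(y[i] for i in nearest)
    return {label: count / len(nearest) for label, count in counts.items()}


# Hypothetical toy data: 1-D measurements with a binary target.
X = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
y = ["low", "low", "low", "high", "high", "high"]

print(knn_conditional(X, y, [0.25], k=3))
print(knn_conditional(X, y, [0.6], k=3))
```

The estimate only looks at labels near the query point, so it never has to describe how x itself is distributed over the whole d-dimensional space.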
Recent research has shown that combining different models can be effective in reducing the
instability that results from predictions using a single model fit to a single set of data. A variety of
model-combining techniques (with exotic names like bagging, boosting, and stacking) combine
massive computational search methods with variance-reduction ideas from statistics; the result
is relatively powerful automated schemes for building multivariate predictive models. As the
data miner’s multivariate toolbox expands, a significant part of data mining is the practical
intuition for the tools themselves.
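Of the three techniques named above, bagging is the simplest to sketch: fit the same kind of model to many bootstrap resamples of the data and average the predictions. The example below is a minimal illustration, assuming a one-split regression stump as the base model and a hypothetical noisy step function as data; the function names, `n_models`, and `seed` are illustrative choices, not part of the text.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left) + sum(
            (y - right_mean) ** 2 for y in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the overall mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_predictor(xs, ys, n_models=25, seed=0):
    """Bagging: fit one stump per bootstrap resample, then average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Hypothetical toy data: a noisy step function.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.1, 0.9, 1.0, 1.2, 3.9, 4.1, 4.0, 3.8]

predict = bagged_predictor(xs, ys)
print(predict(0.0), predict(7.0))
```

Each individual stump depends heavily on which points its resample happened to contain; averaging over many resamples damps that dependence, which is the variance-reduction idea the paragraph refers to.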
Case Study: Hideaway Warehouse Management System (WMS)
The Company
Hideaway Beds – Wall Bed Company offers the latest designs of wall beds. Wall beds have
been around since 1918 in America and Europe. The company ships its products
to approximately 100 retailers in Australia and also takes online orders directly from
individual consumers.
Key Benefits
1. Order accuracy increased from 80% to 99.9%
2. Order picking times reduced by one third
Lovely Professional University