
Data Warehousing and Data Mining




                                   We begin with what is perhaps the best-known data type in traditional data analysis, namely,
                                   d-dimensional vectors x of measurements on N objects or individuals, that is, N objects for each
                                   of which we have d measurements or attributes. Such data is often referred to as multivariate
                                   data and can be thought of as an N x d data matrix. Classical problems in data analysis involving
                                   multivariate data include classification (learning a functional mapping from a vector x to y, where
                                   y is a categorical, or scalar, target variable of interest), regression (the same as classification,
                                   except that y takes real values), clustering (learning a function that maps x into a set of categories,
                                   where the categories are unknown a priori), and density estimation (estimating the probability
                                   density function, or PDF, for x, p(x)).
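
To make the N x d data matrix and the classification mapping from x to y concrete, here is a minimal sketch in plain Python. The toy measurements, the labels "a" and "b", and the choice of a 1-nearest-neighbour rule are all assumptions made for illustration; the text does not prescribe any particular classifier.

```python
# Hypothetical toy data: N = 4 objects (rows), d = 3 measurements each (columns).
X = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
]
# Categorical target y: one label per row of X (labels are made up).
y = ["a", "a", "b", "b"]

N, d = len(X), len(X[0])  # shape of the N x d data matrix


def squared_distance(u, v):
    """Squared Euclidean distance between two d-dimensional vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def classify_1nn(x_new, X, y):
    """Map a new vector x_new to the label of its closest training row."""
    nearest = min(range(len(X)), key=lambda i: squared_distance(x_new, X[i]))
    return y[nearest]
```

Calling `classify_1nn([6.0, 3.0, 5.5], X, y)` maps the new vector to the label of the closest row. Replacing the labels in `y` with real numbers and averaging over neighbours would turn the same idea into a regression sketch.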
                                   The dimensionality d of the vectors x plays a significant role in multivariate modeling. In
                                   problems like text classification and clustering of gene expression data, d can be as large as
                                   10^3 or 10^4 dimensions. Density estimation theory shows that the amount of data needed to
                                   reliably estimate a density function scales exponentially in d (the so-called “curse of
                                   dimensionality”). Fortunately, many predictive problems, including classification and regression,
                                   do not need a full d-dimensional estimate of the PDF p(x), relying instead on the simpler problem
                                   of determining a conditional probability density function p(y|x), where y is the variable whose
                                   value the data miner wants to predict.
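
The exponential scaling behind the curse of dimensionality can be illustrated with a simple count. The sketch below assumes a histogram-style density estimate with a fixed number of bins per axis and a fixed number of samples wanted per cell. Both the estimator and the function names are assumptions chosen for illustration, not the density estimation theory the text refers to.

```python
def histogram_cells(bins_per_axis, d):
    """Number of cells in a d-dimensional grid with bins_per_axis bins per axis."""
    return bins_per_axis ** d


def samples_needed(bins_per_axis, d, samples_per_cell):
    """Rough data requirement if every cell should hold samples_per_cell points."""
    return histogram_cells(bins_per_axis, d) * samples_per_cell


# Print how the cell count grows as the dimensionality d increases.
for dim in (1, 2, 3, 10):
    print(dim, histogram_cells(10, dim))
```

With 10 bins per axis the grid has 10 cells for d = 1 but 10^10 cells for d = 10, which is why a full estimate of p(x) becomes impractical long before d reaches the thousands.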

                                   Recent research has shown that combining different models can be effective in reducing the
                                   instability that results from predictions using a single model fit to a single set of data. A variety
                                   of model-combining techniques (with exotic names like bagging, boosting, and stacking) combine
                                   massive computational search methods with variance-reduction ideas from statistics; the result
                                   is relatively powerful automated schemes for building multivariate predictive models. As the
                                   data miner’s multivariate toolbox expands, a significant part of data mining becomes practical
                                   intuition about the tools themselves.
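
As a rough illustration of model combining, the sketch below implements bagging (bootstrap aggregating) in plain Python: each model is fit to a bootstrap resample of the data, and predictions are combined by majority vote. The one-dimensional threshold classifier used as the base model, the toy data, and all function names are assumptions made for this example; the text names the technique but does not give an algorithm.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold rule: predict `right` if x > t, else `left`."""
    best = None
    thresholds = sorted({x for x, _ in points})
    labels = sorted({label for _, label in points})
    for t in thresholds:
        for left in labels:
            for right in labels:
                errors = sum(
                    (right if x > t else left) != label for x, label in points
                )
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: right if x > t else left


def bagging(points, n_models=25, seed=0):
    """Fit n_models stumps on bootstrap resamples; combine by majority vote."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    models = []
    for _ in range(n_models):
        # Bootstrap resample: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy data: (measurement, label) pairs.
points = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
          (7.0, "high"), (8.0, "high"), (9.0, "high")]
predict = bagging(points)
```

Because each stump sees a slightly different resample, individual thresholds vary, and the vote averages that variation out. This is the variance-reduction idea mentioned above. Boosting and stacking combine models differently and are not shown here.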

                                       

                                     Case Study    Hideaway Warehouse Management System (WMS)

                                     The Company
                                     Hideaway Beds – Wall Bed Company offers the latest designs of wall beds. Wall beds have
                                     been around since 1918 in America and Europe. The company ships its products
                                     to approximately 100 retailers in Australia and also takes online orders directly from
                                     individual consumers.


















                                     Key Benefits
                                     1.   Order accuracy increased from 80% to 99.9%
                                     2.   Order picking times reduced by one third
                                                                                                         Contd...





                                           Lovely Professional University