Page 218 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
P. 218

Data Warehousing and Data Mining




                    notes          The performance of these query processing techniques are compared using a disk I/O. Finally,
                                   we propose a recommendation for Database Management Systems (DBMSs) to select the most
                                   cost-effective query processing techniques based on the cost model
                                   Before you can analyze and interpret customers buying behavior, we have to store all the historical
                                   information of customers. This voluminous amount of data is most suitably stored in a data
                                   warehouse. The current DBMSs mainly deal with daily operation of business such as Transaction
                                   Processing System (TPS) involving reading and writing data. Data warehouse handles the large
                                   read only information. The volume of data in data warehouse is increasing rapidly. If a query
                                   was to be made on this large volume of data, the response time will, as expected, not be good.
                                   The next question that comes to mind will be: How can we achieve a reasonable response time?
                                   Response time relies heavily on query processing technique and experiment on new techniques.
                                   A new reliable and faster query processing technique will greatly enhance the performance of
                                   data warehousing processes..

                                   11.4.1 relational Model


                                   Suppose we are working with a relation R that has m number of tuples {t , t , … t  }, and O
                                                                                                1
                                                                                                  2
                                                                                                       m
                                   number of attributes {a , a , …, a }, the size S of O number of attributes {s , s , …, s } and instance
                                                                                                   o
                                                     1
                                                       2
                                                            0
                                                                                            1
                                                                                              2
                                   of tuples {i  , i  , i  , …, i  m o  }. The term’s relation and table are used. Table 11.1 shows the
                                                    1 o
                                                1 2
                                            1 1
                                   relational model R.
                                                            table 11.1: the relational Model r
                                          a 1          a 2          . . . .       a 4        . . . .
                                          i 11         i 12         . . . .       i 14       . . . .
                                          :            :            :             :          :
                                          :            :            :             :          :
                                          i            i            :             i          :
                                          m1            m2                        m4
                                   Assumptions
                                   Let us assume that:
                                   The access time per tuple with 1000 bytes is 0.1 sec.
                                   There are 10,000 tuples in the R relation.
                                   The disk I/O fetches 100 bytes in 0.01 sec.
                                   In a data warehousing environment, the information sources are made up of combined data from
                                   different sources. These data are fed into a central repository from the dispersed client locations
                                   and form the R relation. In such a situation, usually the respective clients are only interested in
                                   their own data sets. Suppose, the particular client wants to display a  and a . Given this situation,
                                                                                              4
                                                                                        1
                                   how can we achieve a reasonable response time?
                                   11.4.2 cost Model

                                   Here, we introduce a cost model to measure the performance of query processing techniques.
                                   In general, there are three components to make up the response time of a query. Scanning the
                                   data in the table level only will exclude the access time of the index level such as in the case of
                                   the full scan technique. Scanning the data in the index level only will exclude the access time of
                                   the table level such as in the case of the inverted partitioned indexes. Table 11.2 shows the cost
                                   model for respective query processing techniques. To measure the response time of a query, we
                                   need to submit what type of query processing techniques as a parameter such as Full Scan, Index
                                   techniques or Inverted Partitioned Index. Thus, DBMSs will be able to pick up what components
                                   required.


          212                              LoveLy professionaL university
   213   214   215   216   217   218   219   220   221   222   223