Page 206 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE
P. 206

Unit 12: Query Processing and Optimization




          Query Optimisation: Amongst all equivalent plans choose the one with the lowest cost. Cost is  Notes
          estimated using statistical information from the database catalogue, for example, number  of
          tuples in each relation, size of tuples, etc.
          Thus, in query optimisation we find an evaluation plan with the lowest cost. The cost estimation
          is made on the basis of heuristic rules.




              Task       Criteria for a new query creation in DBMS.

          12.1.2 Measure of Query Cost

          Cost is generally measured as total elapsed time for answering  the query.  There are  many
          factors that  contribute  to  time cost.  These are  disk  accesses,  CPU  time,  or even  network
          communication.
          Typically disk access is the predominant cost as disk transfer is a very slow event and is also
          relatively easy to estimate. It is measured by taking into account the following activities:
             Number of seeks           ×      average-seek-cost
             Number of blocks read     ×      average-block-read-cost
             Number of blocks written  ×      average-block-written-cost.




             Notes       The cost for writing a block is higher than the cost for reading a block.

          This is due to the fact that the data is read back after being written to ensure that the write was
          successful. However, for the sake of simplicity we will just use number of block transfers from
          disk as the cost measure. We will also ignore the difference in cost between sequential and
          random I/O, CPU and communication costs. The I/O cost depends on the search criteria i.e.,
          point/range query on an ordering/other fields and the file structures: heap, sorted, hashed. It is
          also dependent on the use of indices such as primary, clustering, secondary, B+ tree, multilevel,
          etc. There are other cost factors also, these may include buffering, disk placement, materialisation,
          overflow/free space management etc.

          12.2 Selection Operation

          The selection operation can be performed in a number of ways. Let us discuss the algorithms and
          the related cost of performing selection operations.

          12.2.1 File Scan

          These are the search algorithms that locate and retrieve records that fulfil a selection condition
          in a file. The following are the two basic files scan algorithms for selection operation:

          1.   Linear search: This algorithm scans each file block and tests all records to see whether they
               satisfy the selection condition.
               The cost of this algorithm (in terms of block transfer) is defined as:
               Cost of searching records satisfying a condition = Number of blocks in a database = N .
                                                                                     b
               Cost for searching a key attribute value = Average number of block transfer for locating
               the value (on an average, half of the file needs to be traversed) so it is = N /2.
                                                                           b


                                           LOVELY PROFESSIONAL UNIVERSITY                                   199
   201   202   203   204   205   206   207   208   209   210   211