Page 206 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM

Page 206 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE

P. 206

Unit 12: Query Processing and Optimization

Query Optimisation: Amongst all equivalent plans choose the one with the lowest cost. Cost is Notes
estimated using statistical information from the database catalogue, for example, number of
tuples in each relation, size of tuples, etc.
Thus, in query optimisation we find an evaluation plan with the lowest cost. The cost estimation
is made on the basis of heuristic rules.

Task Criteria for a new query creation in DBMS.

12.1.2 Measure of Query Cost

Cost is generally measured as total elapsed time for answering the query. There are many
factors that contribute to time cost. These are disk accesses, CPU time, or even network
communication.
Typically disk access is the predominant cost as disk transfer is a very slow event and is also
relatively easy to estimate. It is measured by taking into account the following activities:
Number of seeks × average-seek-cost
Number of blocks read × average-block-read-cost
Number of blocks written × average-block-written-cost.

Notes The cost for writing a block is higher than the cost for reading a block.

This is due to the fact that the data is read back after being written to ensure that the write was
successful. However, for the sake of simplicity we will just use number of block transfers from
disk as the cost measure. We will also ignore the difference in cost between sequential and
random I/O, CPU and communication costs. The I/O cost depends on the search criteria i.e.,
point/range query on an ordering/other fields and the file structures: heap, sorted, hashed. It is
also dependent on the use of indices such as primary, clustering, secondary, B+ tree, multilevel,
etc. There are other cost factors also, these may include buffering, disk placement, materialisation,
overflow/free space management etc.

12.2 Selection Operation

The selection operation can be performed in a number of ways. Let us discuss the algorithms and
the related cost of performing selection operations.

12.2.1 File Scan

These are the search algorithms that locate and retrieve records that fulfil a selection condition
in a file. The following are the two basic files scan algorithms for selection operation:

1. Linear search: This algorithm scans each file block and tests all records to see whether they
satisfy the selection condition.
The cost of this algorithm (in terms of block transfer) is defined as:
Cost of searching records satisfying a condition = Number of blocks in a database = N .
b
Cost for searching a key attribute value = Average number of block transfer for locating
the value (on an average, half of the file needs to be traversed) so it is = N /2.
b

LOVELY PROFESSIONAL UNIVERSITY 199

201 202 203 204 205 206 207 208 209 210 211