Page 215 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE
P. 215
Database Management Systems/Managing Database
Notes memory buffers] *f (M) where f is a “fudge factor”, typically around 1.2. The probe relation
partitions ri need not fit in memory.
Average size of a partition si will be less than M blocks using the formula for n as above thereby
allowing room for the index. If the build relation s is very huge, then the value of n as given by
the above formula may be greater than M” 1 i.e., the number of buckets is > the number of buffer
pages. In such a case, the relation s can be recursively partitioned, instead of partitioning n ways,
use M – 1 partitions for s and further partition the M – 1 partitions using a different hash function.
You should use the same partitioning method on r. This method is rarely required, as recursive
partitioning is not needed for relations of 1GB or less for a memory size of 2MB, with block size
of 4KB.
Cost calculation for Simple Hash-join
1. Cost of partitioning r and s: all the blocks of r and s are read once and after partitioning
written back, so cost 1 = 2 (blocks of r + blocks of s).
2. Cost of performing the hash-join using build and probe will require at least one block
transfer for reading the partitions
Cost 2 = (blocks of r + blocks of s)
3. There are a few more blocks in the main memory that may be used for evaluation, they
may be read or written back. We ignore this cost as it will be too less in comparison to cost
1 and cost 2.
Thus, the total cost = cost 1 + cost 2
= 3 (blocks of r + blocks of s)
Cost of Hash-Join requiring Recursive Partitioning
The cost of partitioning in this case will increase to number of recursion required, it may be
calculated as:
Number of iterations required = ([log (blocks of s) ×1)
M-1
Thus, cost 1 will be modified as:
= 2 (blocks of r + blocks of s) × ([log (blocks of s)] ×1)
M-1
The cost for step (2) and (3) here will be the same as that given in steps (2) and (3) above.
Thus, total cost = 2(blocks of r + blocks of s) ( [log (blocks of s) – 1] ) + (blocks of r + blocks
M–1
of s).
Because s is in the inner term in this expression, it is advisable to choose the smaller relation as
the build relation. If the entire build input can be kept in the main memory, n can be set to 1 and
the algorithm need not partition the relations but may still build an in-memory index, in such
cases the cost estimate goes down to (Number of blocks r + Number of blocks of s).
Handling of Overflows
Even if s is recursively partitioned hash-table overflow can occur, i.e., some partition si may not
fit in the memory. This may happen if there are many tuples in s with the same value for join
attributes or a bad hash function.
208 LOVELY PROFESSIONAL UNIVERSITY