Page 215 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE
P. 215

Database Management Systems/Managing Database




                    Notes          memory  buffers] *f (M) where f is a “fudge factor”, typically around 1.2. The probe relation
                                   partitions ri need not fit in memory.
                                   Average size of a partition si will be less than M blocks using the formula for n as above thereby
                                   allowing room for the index. If the build relation s is very huge, then the value of n as given by
                                   the above formula may be greater than M” 1 i.e., the number of buckets is > the number of buffer
                                   pages. In such a case, the relation s can be recursively partitioned, instead of partitioning n ways,
                                   use M – 1 partitions for s and further partition the M – 1 partitions using a different hash function.
                                   You should use the same partitioning method on r. This method is rarely required, as recursive
                                   partitioning is not needed for relations of 1GB or less for a memory size of 2MB, with block size
                                   of 4KB.

                                   Cost calculation for Simple Hash-join

                                   1.  Cost of partitioning r and s: all the blocks of r and s are read once and after partitioning
                                       written back, so cost 1 = 2 (blocks of r + blocks of s).

                                   2.  Cost of performing the hash-join using build and probe will require at least one block
                                       transfer for reading the partitions
                                       Cost 2 = (blocks of r + blocks of s)

                                   3.  There are a few more blocks in the main memory that may be used for evaluation, they
                                       may be read or written back. We ignore this cost as it will be too less in comparison to cost
                                       1 and cost 2.
                                       Thus, the total cost = cost 1 + cost 2
                                       = 3 (blocks of r + blocks of s)

                                   Cost of Hash-Join requiring Recursive Partitioning

                                   The cost of partitioning in this case will increase to number of recursion required, it may be
                                   calculated as:
                                   Number of iterations required = ([log   (blocks of s) ×1)
                                                                  M-1
                                   Thus, cost 1 will be modified as:
                                          = 2 (blocks of r + blocks of s) × ([log   (blocks of s)] ×1)
                                                                       M-1
                                   The cost for step (2) and (3) here will be the same as that given in steps (2) and (3) above.
                                   Thus, total cost = 2(blocks of r + blocks of s) ( [log  (blocks of s) – 1] ) + (blocks of r + blocks
                                                                            M–1
                                   of s).
                                   Because s is in the inner term in this expression, it is advisable to choose the smaller relation as
                                   the build relation. If the entire build input can be kept in the main memory, n can be set to 1 and
                                   the algorithm need not partition the relations but may still build an in-memory index, in such
                                   cases the cost estimate goes down to (Number of blocks r + Number of blocks of s).

                                   Handling of Overflows

                                   Even if s is recursively partitioned hash-table overflow can occur, i.e., some partition si may not
                                   fit in the memory. This may happen if there are many tuples in s with the same value for join
                                   attributes or a bad hash function.







          208                               LOVELY PROFESSIONAL UNIVERSITY
   210   211   212   213   214   215   216   217   218   219   220