Page 83 - DCAP507_SYSTEM_SOFTWARE
P. 83
Unit 5: Table Processing: Searching
5.3 Hash Searching Notes
Hash searching is also termed as Random Entry Searching. Binary search algorithms, being fast,
can only function on tables that are ordered and packed, that is, tables that have contiguous
items ordered by keywords.
Did u know? These kind of search procedures may have to be utilized in combination with
a sort algorithm which orders the data and packs the data as well.
In fact, it is needless for the table to be ordered and packed to attain good speed in searching. As
for now, it is likely to perform considerably better with an unpacked, unordered table. The
number of storage spaces assigned to it goes beyond the number of items to be accumulated.
Putting elements in order slows down the procedure. A significant enhancement can be attained
by inserting elements in a random manner. The random entry-number K is developed from the
key. If the Kth position is void, then the new element is placed there; if not, then some other cell
must be found for the insertion.
The primary problem is the production of a random number from the key. Certainly we don’t
in fact need a random number in the sense that a specified keyword may give up one position
today and another tomorrow. What we need is a process that will produce pseudo-random,
reliable table positions for keywords.
One quite good outlook for four character EBCDIC keywords is to merely divide the keyword
by the table length N and use the remainder. This method functions well providing N and the
key size (32 bits in our case) have no common factors. For a specified group of M keywords, the
remainders should be fairly equally distributed over O”{N-l). Another method is to consider the
keyword as a binary fraction and multiply it by another binary fraction:
L 1,SYMBOL
M 0,RHO
The outcome is a 64-bit product in registers 0 and 1. If RHO is selected cautiously in the low order
31 bits will be equally distributed among 0 to 1, and a second .multiplication by N will produce
a number equally distributed over 0(N–l). This is called the power residue technique.
It has the benefit that the 31·bit first result can be used to generate another equally distributed
number (by multiplying again by RHO) in the event that the first probe of the table is ineffective.
The second problem is the process to be followed when the first tryout entry results in a filled
position. There are a number of techniques of resolving this problem. These are:
1. Random Entry with Replacement: A series of random numbers is generated from the
keyword (like by the power residue method). From each of these a number between 1 and
N is formed and the table is probed at that position. Probings are terminated when a void
space is found.
Note Observe that the random numbers generated are independent and it is perfectly
possible (but not likely) to probe the same position twice.
2. Random Entry without Replacement: This is the similar as above excluding that any
effort to probe the same position two times is avoided. This technique holds benefit over
the above only when probes are luxurious, e.g., for files on tape or drum.
LOVELY PROFESSIONAL UNIVERSITY 77