Page 198 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS
P. 198
Unit 9: Hashing
of the square are used to get the hash value. Since the middle bits of the square usually Notes
depend on all the characters in the identifier, it is expected that different identifi ers will
result in different values. The number of middle bits that we select depends on table size.
Therefore if r is the number of middle bits used to form hash value, then the table size will
be 2r, hence when you use mid square method the table size should be a power of 2.
9.3 Open Hashing
The simplest form of open hashing defines each slot in the hash table to be the head of a linked
list. All records that hash to a particular slot are placed on that slot’s linked list. The fi gure below
illustrates a hash table where each slot stores one record and a link pointer to the rest of the list.
Records within a slot’s list can be ordered in several ways: by insertion order, by key value order,
or by frequency-of-access order. Ordering the list by key value provides an advantage in the case
of an unsuccessful search, because I know to stop searching the list once you encounter a key
that is greater than the one being searched for. If records on the list are unordered or ordered by
frequency, then an unsuccessful search will need to visit every record on the list.
Given a table of size M storing N records, the hash function will (ideally) spread the records evenly
among the M positions in the table, yielding on average N/M records for each list. Assuming
that the table has more slots than there are records to be stored, you can hope that few slots will
contain more than one record. In the case where a list is empty or has only one record, a search
requires only one access to the list. Thus, the average cost for hashing should be Θ(1). However,
if clustering causes many records to hash to only a few of the slots, then the cost to access a record
will be much higher because many elements on the linked list must be searched.
Open hashing is most appropriate when the hash table is kept in main memory, with the lists
implemented by a standard in-memory linked list. Storing an open hash table on disk in an
efficient way is difficult, because members of a given linked list might be stored on different disk
blocks. This would result in multiple disk accesses when searching for a particular key value,
which defeats the purpose of using hashing.
Let:
1. U be the universe of keys:
(a) Integers
(b) Character strings
(c) Complex bit patterns
LOVELY PROFESSIONAL UNIVERSITY 193