Page 198 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS
P. 198

Unit 9: Hashing




               of the square are used to get the hash value. Since the middle bits of the square usually   Notes

               depend on all the characters in the identifier, it is expected that different identifi ers will
               result in different values. The number of middle bits that we select depends on table size.
               Therefore if r is the number of middle bits used to form hash value, then the table size will
               be 2r, hence when you use mid square method the table size should be a power of 2.

          9.3 Open Hashing


          The simplest form of open hashing defines each slot in the hash table to be the head of a linked
          list. All records that hash to a particular slot are placed on that slot’s linked list. The fi gure below
          illustrates a hash table where each slot stores one record and a link pointer to the rest of the list.





















          Records within a slot’s list can be ordered in several ways: by insertion order, by key value order,
          or by frequency-of-access order. Ordering the list by key value provides an advantage in the case
          of an unsuccessful search, because I know to stop searching the list once you encounter a key
          that is greater than the one being searched for. If records on the list are unordered or ordered by
          frequency, then an unsuccessful search will need to visit every record on the list.
          Given a table of size M storing N records, the hash function will (ideally) spread the records evenly
          among the M positions in the table, yielding on average N/M records for each list. Assuming
          that the table has more slots than there are records to be stored, you can hope that few slots will
          contain more than one record. In the case where a list is empty or has only one record, a search
          requires only one access to the list. Thus, the average cost for hashing should be Θ(1). However,
          if clustering causes many records to hash to only a few of the slots, then the cost to access a record
          will be much higher because many elements on the linked list must be searched.
          Open hashing is most appropriate when the hash table is kept in main memory, with the lists
          implemented by a standard in-memory linked list. Storing an open hash table on disk in an

          efficient way is difficult, because members of a given linked list might be stored on different disk

          blocks. This would result in multiple disk accesses when searching for a particular key value,
          which defeats the purpose of using hashing.
          Let:
          1.   U be the universe of keys:
               (a)  Integers
               (b)  Character strings
               (c)   Complex bit patterns








                                           LOVELY PROFESSIONAL UNIVERSITY                                   193
   193   194   195   196   197   198   199   200   201   202   203