Page 211 - DCAP310_INTRODUCTION_TO_ARTIFICIAL_INTELLIGENCE_AND_EXPERT_SYSTEMS
P. 211

Unit 11: Knowledge Organization and Management




          which is eight characters that we can even memorize. Now suppose you place those characters  Notes
          in your ‘decrypter’ and you get 1024 TB ‘decompressed’. Most plainly said, we have a simple
          equation where we have three variables (a, b and c). Within hash functions we only know the
          value of the output which would be (if c = (a + b)) the letter ‘c’. So we need 2*c attempts to find out
          the result which means brute-force.
                                 Figure 11.1: Working of Hash Function



















          A hash function should be referentially transparent (stable), i.e., if called twice on input that is
          “equal” (for example, strings that consist of the same sequence of characters), it should give the
          same result. There is a construct in many programming languages that allows the user to
          override equality and hash functions for an object: if two objects are equal, their hash codes must
          be the same. This is crucial to finding an element in a hash table quickly, because two of the same
          element would both hash to the same slot.

          All hash functions that map a larger set of data to a smaller set of data cause collisions. Such hash
          functions try to map the keys to the hash values as evenly as possible because collisions become
          more frequent as hash tables fill up. Thus, single-digit hash values are frequently restricted to
          80% of the size of the table. Depending on the algorithm used, other properties may be required
          as well, such as double hashing and linear probing. Although the idea was conceived in the
          1950s, the design of good hash functions is still a topic of active research.
          Hash functions are related to (and often confused with) checksums, check digits, fingerprints,
          randomization functions, error correcting codes, and cryptographic hash functions. Although
          these concepts overlap to some extent, each has its own uses and requirements and is designed
          and optimized differently.

          Hash Tables

          Hash functions are primarily used in hash tables, to quickly locate a data record (e.g., a dictionary
          definition) given its search key (the headword). Specifically, the hash function is used to map the
          search key to an index; the index gives the place in the hash table where the corresponding
          record should be stored. Hash tables, in turn, are used to implement associative arrays and
          dynamic sets.
          Typically, the domain of a hash function (the set of possible keys) is larger than its range (the
          number of different table indexes), and so it will map several different keys to the same index.
          Therefore, each slot of a hash table is associated with (implicitly or explicitly) a set of records,
          rather than a single record. For this reason, each slot of a hash table is often called a bucket, and
          hash values are also called bucket indices.
          Thus, the hash function only hints at the record’s location—it tells where one should start
          looking for it. Still, in a half-full table, a good hash function will typically narrow the search



                                           LOVELY PROFESSIONAL UNIVERSITY                                   205
   206   207   208   209   210   211   212   213   214   215   216