Page 194 - DCAP407_DATA_STRUCTURE

Unit 10:  Trees



               There are two data encoding schemes, and the Huffman encoding scheme falls under one of them.
               The two data encoding schemes are:
               1.   Fixed Length Encoding: In fixed length encoding, all symbols are encoded using the same number
                    of bits. An example of fixed length encoding is ASCII, which uses 7 bits to encode a total of
                    128 different symbols. The drawback of fixed length codes is that they ignore the probability of
                    occurrence of the symbols being encoded: a symbol that occurs 1000 times is encoded with the
                    same number of bits as a symbol that occurs only 10 times. This disadvantage makes fixed
                    length encoding inefficient for data compression.
               2.   Variable Length Encoding: Variable length encoding assigns fewer bits to symbols that occur
                    more often and more bits to symbols that occur less frequently. The Huffman encoding
                    scheme falls in the category of variable length encoding, i.e., the code for a symbol
                    depends on the frequency of occurrence of that symbol.
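The saving described above can be illustrated with a short sketch (the frequencies and code lengths here are hypothetical, chosen only for illustration): it totals the bits needed under a fixed-length code versus an example variable-length code.

```python
# Illustrative sketch (hypothetical frequencies, not from the text):
# total bits for a fixed-length code vs. an example variable-length code.
freqs = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}

# Fixed-length: 5 symbols need 3 bits each (2 bits cover only 4 symbols).
fixed_total = 3 * sum(freqs.values())

# Variable-length: a shorter code for the frequent symbol A, longer codes
# for the rarer symbols (these lengths form a valid prefix code).
var_lengths = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 3}
var_total = sum(freqs[s] * var_lengths[s] for s in freqs)

print(fixed_total, var_total)  # 186 138 -- fewer bits with variable-length
```

The frequent symbol A contributes 24 bits instead of 72, which is where the compression comes from.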
               Algorithm for Constructing a Huffman Tree
               The following sequence of steps constructs a Huffman tree:
               1.   Input all symbols along with their respective frequencies.
               2.   Create leaf nodes representing the scanned symbols.
               3.   Let S be a set containing all the nodes created in step 2.

               Repeat the following steps until only one node remains in S:
               1.   Sort the nodes (symbols) in S with respect to their frequencies.
               2.   Create a new node to combine the two nodes with the least frequencies.
               3.   The frequency of this new combined node is equal to the sum of the frequencies of the nodes
                    that were combined. The newly created combined node becomes the parent of the two nodes
                    that were combined.
               4.   Replace the two nodes that were combined with the new combined node in S.
               When only one node is left, it is the root of the Huffman tree, and its frequency equals the sum
               of the frequencies of all symbols. Thus, a tree is generated with leaf nodes containing the basic
               symbols whose codes are to be found.
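The steps above can be sketched in Python (a minimal sketch, not the book's code). A min-heap plays the role of the sorted set S, so the two least-frequent nodes are always combined first; a counter is used only as a tie-breaker so tuples compare cleanly.

```python
# Sketch of the Huffman construction steps above, using a min-heap as S.
import heapq
import itertools

def huffman_root(frequencies):
    """Build a Huffman tree; return (total_frequency, tree)."""
    counter = itertools.count()  # tie-breaker for equal frequencies
    # Steps 1-3: one leaf node per symbol, all placed in the set S.
    S = [(freq, next(counter), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(S)
    # Repeat until one node remains: combine the two least-frequent nodes.
    while len(S) > 1:
        f1, _, left = heapq.heappop(S)
        f2, _, right = heapq.heappop(S)
        # The combined node's frequency is the sum of its children's,
        # and it becomes the parent of the two combined nodes.
        heapq.heappush(S, (f1 + f2, next(counter), (left, right)))
    return S[0][0], S[0][2]

total, tree = huffman_root({"A": 24, "B": 12, "C": 10, "D": 8, "E": 8})
print(total)  # 62 -- the root's frequency is the sum of all frequencies
```

The returned `tree` is a nested pair structure whose leaves are the original symbols; walking it left/right (emitting 0/1) yields each symbol's code.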
               Let us learn how to construct a Huffman tree with the help of an example. Table 10.1 shows the
               frequency of occurrence of different symbols.


                                            Table 10.1: Symbol Frequency Table

                                          Symbol    Frequency of Occurrence
                                            A                 24
                                            B                 12
                                            C                 10
                                            D                  8
                                            E                  8


               Using the symbols and frequencies from Table 10.1, we can create the leaf nodes and then sort them.
               Symbols D and E have the least frequency, 8; these two nodes are combined to make a node DE having



                                        LOVELY PROFESSIONAL UNIVERSITY                          187