Page 254 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS
P. 254

Unit 12: Sorting





          When only one active run remains the algorithm finishes up as shown in lines 30 and 32 of   Notes

          TWOWAY-MERGE-it just copies all the remaining elements to the end of  file X. Figure 12.1
          visualizes multiway merging.
                         Figure 12.1: Main-memory Organization for Multiway Merging



            Input buffers, one for
            each unfinished run
                                               Select smallest
                                               unchosen                        Bf 0
                                               element for
                                               output

                                                                       P 0

                                   ………..







                                 Pointers to first
                                unchosen elements

          It is easy to see that phase 2 of the two-phase, multiway merge-sort algorithm performs only θ
          (n) I/O operations and this is also the running time of the whole algorithm. In spite of this, the
          algorithm has a limitation-it can not sort very large fi les.
          If phase 1 of the algorithm produces more than m – 1 runs (N/M > m – 1), all runs can not be
          merged in one go in phase 2, because each run requires a one-page input buffer in main-memory
          and one page of main-memory is reserved for the output buffer. How large should the file be for

          this to happen?

          Multiway Merge Sort of Very Large Files


          Sometimes there may be a need to sort extremely large files or there is only a small amount of
          available main memory. As described in the previous section, two-phase, multiway merge sort
          may not work in such situations.

          A natural way to extend the two-phase, multiway merge sort for files of any size is to do not
          one but many iterations in phase 2 of the algorithm. That is, we employ the external memory
          merge-sort algorithm from Section 3, but instead of using TWOWAY-MERGE, we use the
          multiway merging (as described in the previous section) to merge m – 1 runs from file Y into one

          run in file X. Then, in each iteration of the main loop of phase 2, we reduce the number of runs

          by a factor of m – 1.
          What is the running time of this algorithm, which we call simply multiway merge sort. Phase 1
          and each iteration of the main loop of phase 2 takes θ(n) I/O operations. After phase 1, we start
          up with [N/M] = [n/m] runs, each iteration of the main loop of phase 2 reduces the number of
          runs by a factor of m – 1, and we stop when we have just one run. Thus, there are log m–1 (n/m)
          iterations of the main loop of phase 2. Therefore, the total running time of the algorithm is
          θ(n log  (n/m)) = θ (n log  n – n log m) = θ (n log  n – n) = θ (n log  n).
                m–1            m       m          m             m
          Remember that the cost of the external-memory merge-sort algorithm from Section 3 is
          θ(n log (n/m)). Thus, multiway merge sort is faster by a factor of  (1 – 1 /log n)log m). Actually,
                2                                                      m    2



                                           LOVELY PROFESSIONAL UNIVERSITY                                   249
   249   250   251   252   253   254   255   256   257   258   259