Page 254 - DCAP605_ADVANCED_DATA_STRUCTURE_AND

Page 254 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS

P. 254

Unit 12: Sorting

When only one active run remains the algorithm ﬁnishes up as shown in lines 30 and 32 of Notes

TWOWAY-MERGE-it just copies all the remaining elements to the end of ﬁle X. Figure 12.1
visualizes multiway merging.
Figure 12.1: Main-memory Organization for Multiway Merging

Input buffers, one for
each unfinished run
Select smallest
unchosen Bf 0
element for
output

P 0

………..

Pointers to first
unchosen elements

It is easy to see that phase 2 of the two-phase, multiway merge-sort algorithm performs only θ
(n) I/O operations and this is also the running time of the whole algorithm. In spite of this, the
algorithm has a limitation-it can not sort very large ﬁ les.
If phase 1 of the algorithm produces more than m – 1 runs (N/M > m – 1), all runs can not be
merged in one go in phase 2, because each run requires a one-page input buffer in main-memory
and one page of main-memory is reserved for the output buffer. How large should the ﬁle be for

this to happen?

Multiway Merge Sort of Very Large Files

Sometimes there may be a need to sort extremely large ﬁles or there is only a small amount of
available main memory. As described in the previous section, two-phase, multiway merge sort
may not work in such situations.

A natural way to extend the two-phase, multiway merge sort for ﬁles of any size is to do not
one but many iterations in phase 2 of the algorithm. That is, we employ the external memory
merge-sort algorithm from Section 3, but instead of using TWOWAY-MERGE, we use the
multiway merging (as described in the previous section) to merge m – 1 runs from ﬁle Y into one

run in ﬁle X. Then, in each iteration of the main loop of phase 2, we reduce the number of runs

by a factor of m – 1.
What is the running time of this algorithm, which we call simply multiway merge sort. Phase 1
and each iteration of the main loop of phase 2 takes θ(n) I/O operations. After phase 1, we start
up with [N/M] = [n/m] runs, each iteration of the main loop of phase 2 reduces the number of
runs by a factor of m – 1, and we stop when we have just one run. Thus, there are log m–1 (n/m)
iterations of the main loop of phase 2. Therefore, the total running time of the algorithm is
θ(n log (n/m)) = θ (n log n – n log m) = θ (n log n – n) = θ (n log n).
m–1 m m m m
Remember that the cost of the external-memory merge-sort algorithm from Section 3 is
θ(n log (n/m)). Thus, multiway merge sort is faster by a factor of (1 – 1 /log n)log m). Actually,
2 m 2

LOVELY PROFESSIONAL UNIVERSITY 249

249 250 251 252 253 254 255 256 257 258 259