Page 265 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS
P. 265
Advanced Data Structure and Algorithms
Notes Therefore, the probability follows the binomial distribution, which has
Mean: E[n ] = np = 1
i
Variance: Var[n ] = np(1 – p) = 1 – 1/n
i
For any random variable, we have
2
E[ n ] = Var[n ] + E [n ]
2
i i i
= 1 – 1/n + 1 2
= 2 – 1n
= Θ(1)
Putting this value in equation A above, (do some tweaking) and we have a expected time for
INSERTION_SORT, O(n).
Now back to our original problem
In the above Bucket sort algorithm, we observe
T(n) = [Time to insert n elements in array A] + [Time to go through auxiliary array B[0 . . n-1] *
(Sort by INSERTION_SORT)
= O(n) + (n-1) (n)
= O(n)
Therefore, the entire Bucket sort algorithm runs in linear expected time.
Task Discuss bubble sort with suitable example.
12.9 External Sorting
External sorting refers to the sorting of a file that is on disk (or tape). Internal sorting refers to the
sorting of an array of data that is in RAM. The main concern with external sorting is to minimize
disk access since reading a disk block takes about a million times longer than accessing an item
in RAM (according to Shaffer – see the reference at the end of this document).
Perhaps the simplest form of external sorting is to use a fast internal sort with good locality of
reference (which means that it tends to reference nearby items, not widely scattered items) and
hope that your operating system’s virtual memory can handle it. (Quicksort is one sort algorithm
that is generally very fast and has good locality of reference.) If the file is too huge, however, even
virtual memory might be unable to fit it. Also, the performance may not be too great due to the
large amount of time it takes to access data on disk.
Methods
Most external sort routines are based on mergesort. They typically break a large data fi le into
a number of shorter, sorted “runs”. These can be produced by repeatedly reading a section of
the data file into RAM, sorting it with ordinary quicksort, and writing the sorted data to disk.
After the sorted runs have been generated, a merge algorithm is used to combine sorted fi les into
longer sorted files. The simplest scheme is to use a 2-way merge: merge 2 sorted files into one
sorted file, then merge 2 more, and so on until there is just one large sorted file. A better scheme
is a multiway merge algorithm: it might merge perhaps 128 shorter runs together.
260 LOVELY PROFESSIONAL UNIVERSITY