Page 265 - DCAP605_ADVANCED_DATA_STRUCTURE_AND

Page 265 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS

P. 265

Advanced Data Structure and Algorithms

Notes Therefore, the probability follows the binomial distribution, which has

Mean: E[n ] = np = 1
i
Variance: Var[n ] = np(1 – p) = 1 – 1/n
i
For any random variable, we have
2
E[ n ] = Var[n ] + E [n ]
2
i i i
= 1 – 1/n + 1 2
= 2 – 1n
= Θ(1)
Putting this value in equation A above, (do some tweaking) and we have a expected time for
INSERTION_SORT, O(n).

Now back to our original problem
In the above Bucket sort algorithm, we observe
T(n) = [Time to insert n elements in array A] + [Time to go through auxiliary array B[0 . . n-1] *
(Sort by INSERTION_SORT)
= O(n) + (n-1) (n)
= O(n)

Therefore, the entire Bucket sort algorithm runs in linear expected time.

Task Discuss bubble sort with suitable example.

12.9 External Sorting

External sorting refers to the sorting of a ﬁle that is on disk (or tape). Internal sorting refers to the

sorting of an array of data that is in RAM. The main concern with external sorting is to minimize
disk access since reading a disk block takes about a million times longer than accessing an item
in RAM (according to Shaffer – see the reference at the end of this document).
Perhaps the simplest form of external sorting is to use a fast internal sort with good locality of
reference (which means that it tends to reference nearby items, not widely scattered items) and
hope that your operating system’s virtual memory can handle it. (Quicksort is one sort algorithm

that is generally very fast and has good locality of reference.) If the ﬁle is too huge, however, even
virtual memory might be unable to ﬁt it. Also, the performance may not be too great due to the

large amount of time it takes to access data on disk.
Methods

Most external sort routines are based on mergesort. They typically break a large data ﬁ le into
a number of shorter, sorted “runs”. These can be produced by repeatedly reading a section of

the data ﬁle into RAM, sorting it with ordinary quicksort, and writing the sorted data to disk.
After the sorted runs have been generated, a merge algorithm is used to combine sorted ﬁ les into

longer sorted ﬁles. The simplest scheme is to use a 2-way merge: merge 2 sorted ﬁles into one

sorted ﬁle, then merge 2 more, and so on until there is just one large sorted ﬁle. A better scheme

is a multiway merge algorithm: it might merge perhaps 128 shorter runs together.

260 LOVELY PROFESSIONAL UNIVERSITY

260 261 262 263 264 265 266 267 268 269 270