Since the hₖ form a geometric series, the sum of all hₖ with k = 1, ..., t is in O(hₜ) = O(√n). Thus O(n⋅√n) sorting steps are needed for this part, where k ≤ t.
Now let k > t. When the sequence is arranged as an array with hₖ columns, there are n/hₖ elements in each column. Thus, O((n/hₖ)²) sorting steps are needed to sort each column, since Insertion Sort has quadratic complexity. There are hₖ columns, therefore the number of sorting steps for hₖ-sorting the entire data sequence is in O((n/hₖ)²⋅hₖ) = O(n⋅n/hₖ). Again, the n/hₖ form a geometric series whose sum is in O(n/hₜ) = O(√n). Therefore, again O(n⋅√n) steps are needed for this part, where k > t.
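For readers who want the two geometric sums spelled out, here is a short sketch in LaTeX, assuming (as for the h-sequence above) that consecutive gaps roughly double, hₖ₊₁ ≈ 2hₖ, and that hₜ ∈ O(√n):

\[
\sum_{k=1}^{t} h_k \le h_t\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \le 2h_t \in O(\sqrt{n}),
\qquad
\sum_{k>t} \frac{n}{h_k} \le \frac{n}{h_{t+1}}\left(1 + \tfrac{1}{2} + \cdots\right) \le \frac{2n}{h_{t+1}} \in O\!\left(\frac{n}{h_t}\right) = O(\sqrt{n}).
\]

In both cases the total number of sorting steps is n times the corresponding sum, which yields the O(n⋅√n) bound stated above.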
It can be shown that for this h-sequence the upper bound is tight. But there is another h-sequence that leads to more efficient behavior of Shellsort.
Theorem: With the h-sequence 1, 2, 3, 4, 6, 8, 9, 12, 16, ..., 2^p⋅3^q, ..., Shellsort needs O(n⋅log(n)²) steps for sorting a sequence of length n (Pratt [Pra 79]).
Proof: If g = 2 and h = 3, then γ(g, h) = (g-1)⋅(h-1) – 1 = 1, i.e. in a 2,3-sorted sequence to the right
of each element only the next element can be smaller. Therefore, O(n) sorting steps suffice to sort
the sequence with Insertion Sort. Considering elements with odd and with even index separately,
it becomes clear that again O(n) sorting steps suffice to make a 4,6-sorted sequence 2-sorted.
Similarly, O(n) sorting steps suffice to make a 6,9-sorted sequence 3-sorted and so on.
The above h-sequence has the property that for each hₖ, both 2hₖ and 3hₖ also occur, so O(n) sorting steps suffice for each hₖ. Altogether there are log(n)² elements in the h-sequence; thus the complexity of Shellsort with this h-sequence is in O(n⋅log(n)²).
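To make the structure of this h-sequence concrete, the following small C program (an illustrative sketch only; the limit n = 100 and all names used here are assumptions, not part of the text's program) enumerates every gap of the form 2^p⋅3^q up to n and prints them in ascending order:

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    long n = 100;                              /* upper limit for the gaps; an example value */
    long gaps[64];                             /* 64 slots are ample for n = 100 */
    int count = 0, i;
    for (long p2 = 1; p2 <= n; p2 *= 2)        /* powers of 2 */
        for (long g = p2; g <= n; g *= 3)      /* multiplied by powers of 3 */
            gaps[count++] = g;
    qsort(gaps, count, sizeof gaps[0], cmp);   /* print in ascending order */
    for (i = 0; i < count; i++)
        printf("%ld ", gaps[i]);               /* 1 2 3 4 6 8 9 12 16 ... */
    printf("\n");
    return 0;
}

For n = 100 it prints 1 2 3 4 6 8 9 12 16 18 24 27 32 36 48 54 64 72 81 96; the number of such gaps grows proportionally to log(n)², which is exactly the size of the h-sequence used in the bound above.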
The h-sequence of Pratt performs best asymptotically, but it consists of log(n)² elements. In particular, if the data sequence is presorted, an h-sequence with fewer elements is better, since the data sequence has to be scanned (by the for-i loop in the program) for each hₖ, even if only a few sorting steps are performed.
By combining the arguments of these two theorems, h-sequences with O(log(n)) elements can be derived that lead to very good performance in practice, as for instance the h-sequence of the program. But unfortunately, there seems to be no h-sequence that gives Shellsort a worst-case performance of O(n⋅log(n)). It is an open question whether the average complexity is in O(n⋅log(n)).
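As one illustration of such a short h-sequence, here is a hedged C sketch of Shellsort using the well-known gap sequence 1, 4, 13, 40, ... (hₖ₊₁ = 3hₖ + 1), which has O(log(n)) elements; this is only an example and not necessarily the h-sequence of the program referred to above:

#include <stddef.h>

/* A minimal Shellsort sketch with the gap sequence 1, 4, 13, 40, ...;
   an illustrative choice, not the text's program. */
void shellsort(int a[], size_t n)
{
    size_t h = 1;
    while (h < n / 3)                       /* largest gap used: still below n/3 */
        h = 3 * h + 1;
    for (; h >= 1; h = (h - 1) / 3) {       /* descend through the gaps down to 1 */
        for (size_t i = h; i < n; i++) {    /* h-sort by straight insertion */
            int v = a[i];
            size_t j = i;
            while (j >= h && a[j - h] > v) {
                a[j] = a[j - h];            /* shift larger elements h positions right */
                j -= h;
            }
            a[j] = v;
        }
    }
}

The final pass with h = 1 is ordinary Insertion Sort, but by then the sequence is already h-sorted for all larger gaps, so only few sorting steps remain.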
12.4 Heap Sort
Heapsort is a sorting technique which sorts a contiguous list of length n with O(n⋅log₂(n)) comparisons and movements of entries, even in the worst case. Hence it achieves worst-case bounds better than those of quicksort, and for contiguous lists it is better than mergesort, since it needs only a small and constant amount of space apart from the list being sorted.
Heapsort proceeds in two phases. First, all the entries in the list are arranged to satisfy the heap property; then the top of the heap is repeatedly removed and another entry is promoted to take its place. Therefore we need a procedure which builds an initial heap, i.e. arranges all the entries in the list so that they satisfy the heap property. The procedure which builds the initial heap uses a procedure which adjusts the entry at position i in a list whose entries at positions 2i and 2i + 1 already satisfy the heap property, in such a manner that the entry at position i will also satisfy the heap property.
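As a rough overview of the two phases, here is a minimal, self-contained C sketch; the array is treated as 1-based so that the children of position i sit at 2i and 2i + 1 as described, and the adjust routine shown here is a simplified stand-in for the one the text develops below (which works on its own list type):

/* A minimal sketch, assuming a 1-based int array a[1..n]; a[0] is unused. */
static void adjust(int a[], int i, int n)   /* sift a[i] down within a[1..n] */
{
    int v = a[i];
    int j = 2 * i;                          /* left child of i */
    while (j <= n) {
        if (j < n && a[j] < a[j + 1])       /* take the larger of the two children */
            j++;
        if (v >= a[j])                      /* heap property restored */
            break;
        a[j / 2] = a[j];                    /* move the child up */
        j *= 2;
    }
    a[j / 2] = v;
}

void heapsort(int a[], int n)
{
    int i, tmp;
    for (i = n / 2; i >= 1; i--)            /* phase 1: build the initial heap */
        adjust(a, i, n);
    for (i = n; i >= 2; i--) {              /* phase 2: remove the top repeatedly */
        tmp = a[1]; a[1] = a[i]; a[i] = tmp;
        adjust(a, 1, i - 1);
    }
}

A caller would declare the array with one extra slot (e.g. int a[n + 1]) and store the n entries in a[1], ..., a[n]; after heapsort(a, n) they appear in ascending order.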
The following C code implements the algorithm.
/* adjust: sift the entry at position i downwards until the subtree rooted at i
   (whose subtrees at positions 2i and 2i + 1 already satisfy the heap property)
   also satisfies the heap property. */
void adjust(list x, int i, int n)
{
    int j, k;
    Boolean flag;