Page 262 - DCAP201_FUNDAMENTALS_OF_DATA_STRUCTURES
P. 262
Unit 14: Searching
The algorithm is easy to remember if you think about a child’s guessing game. Imagine a Notes
number between 1 and 1000 and guess the number. Each time you guess, you will get to know
“higher” or “lower.” Of course you would begin by guessing 500, then either 250 or 750 depending
on the response.
You would continue to refine your choice until you got the number correct.
14.2.1 Characteristics
Every iteration eliminates half of the remaining possibilities. This makes binary searches very
efficient – even for large collections. Our implementation relies on recursion, though it is
equally as common to see an iterative approach.
Binary search requires a sorted collection. This means the collection must either be sorted before
searching, or inserts/updates must be smart. Also, binary searching can only be applied to a
collection that allows random access (indexing).
In the real world, binary searching is frequently used thanks to its performance characteristics
over large collections. The only time binary searching doesn’t make sense is when the collection
is being frequently updated (relative to searches), since resorting will be required.
Hash tables can often provide better (though somewhat unreliable) performance. Typically, it’s
relatively clear when data belongs in a hash table (for frequent lookups) versus when a search is
needed.
14.2.2 Analysis
n
A binary search on an array is O(log ) because at each test you can “throw out” one half of the
2
search space. If we assume n, the number of items to search, is a power of two (i.e. n = 2x) then,
given that n is cut in half at each comparison, the most comparisons needed to find a single item
in n is x.
Notes It is noteworthy that for very small arrays a linear search can prove faster than a
binary search. However as the size of the array to be searched increases the binary search
is the clear victor in terms of number of comparisons and therefore overall speed.
Still, the binary search has some drawbacks. First of all, it requires that the data to be searched
be in sorted order. If there is even one element out of order in the data being searched it can
throw off the entire process.
!
Caution When presented with a set of unsorted data the efficient programmer must decide
whether to sort the data and apply a binary search or simply apply the less-efficient linear
search.
Even the best sorting algorithm is a complicated process. Is the cost of sorting the data is worth
the increase in search speed gained with the binary search? If you are searching only once, it is
probably better to do a linear search in most cases.
Once the data is sorted it can also prove very expensive to add or delete items. In a sorted array,
for instance, such operations require a ripple-shift of array elements to open or close a “hole” in
the array.
LOVELY PROFESSIONAL UNIVERSITY 255