[gclist] Name that hypothesis
Jerry Leichter
leichter@smarts.com
Tue, 10 Dec 1996 14:39:22 -0500
I'm really curious about your qsort results.
Most people identify "qsort" with "pick the first (or last) element as the
pivot" - an identification that becomes clear when they tell you that a
pre-sorted input is the worst case and produces O(n^2) behavior (i.e., every
partition pulls off just 1 element).
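(A minimal illustrative sketch - mine, not anyone's production code - makes the
pathology concrete: with a[lo] as the pivot, already-sorted input peels off one
element per partition pass, so there are n-1 passes and quadratic total work.
The `stats` counter exists only for the demonstration.)

```python
def quicksort_first(a, lo=0, hi=None, stats=None):
    """Naive quicksort using a[lo] as the pivot (Lomuto-style partition).
    stats[0] counts partition passes, to expose the O(n^2) pathology."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if stats is not None:
        stats[0] += 1                       # one more partition pass
    pivot = a[lo]
    i = lo + 1
    for j in range(lo + 1, hi + 1):         # move small elements left
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]       # pivot into its final slot
    quicksort_first(a, lo, i - 2, stats)
    quicksort_first(a, i, hi, stats)
```

On `list(range(100))` the counter comes out at 99 - one pass per element pulled
off - exactly the behavior described above.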
In fact, *even Tony Hoare's original paper on quicksort* recognized this
problem, and recommended *against* using the first or last element. Hoare
suggested taking the first, middle, and last element and using whichever was "in
the middle" of those three values as the pivot. Knuth later showed that doing
this by sorting those three elements and using the resulting center element as
the pivot was "safe", in that it didn't affect the statistical assumptions that
give you the "average O(n log n)" case. If you do it this way, the resulting
partitions are optimal for both the initially sorted and initially sorted
backwards cases. It also guarantees that neither the first nor the last element
will be swapped, which you can use to eliminate most of the index boundary
comparisons. For small keys, this can be half the cost of running the
algorithm!
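As a concrete sketch (my illustrative code, not Hoare's or Knuth's): sorting the
first, middle, and last elements in place leaves a[lo] <= pivot <= a[hi], so the
two endpoints act as sentinels and the inner scan loops need no index-bounds
checks - the saving mentioned above.

```python
def sort3(a, lo, mid, hi):
    """Order a[lo] <= a[mid] <= a[hi]; the median lands at mid."""
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
        if a[mid] < a[lo]:
            a[lo], a[mid] = a[mid], a[lo]

def quicksort_m3(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo < 3:                        # 3 or fewer elements: sort directly
        for i in range(lo + 1, hi + 1):
            j = i
            while j > lo and a[j] < a[j - 1]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
        return
    mid = (lo + hi) // 2
    sort3(a, lo, mid, hi)                  # a[lo] and a[hi] are now sentinels
    a[mid], a[hi - 1] = a[hi - 1], a[mid]  # stash the pivot next to the end
    pivot = a[hi - 1]
    i, j = lo, hi - 1
    while True:
        i += 1
        while a[i] < pivot:                # stops at a[hi-1] == pivot at worst
            i += 1
        j -= 1
        while a[j] > pivot:                # stops at a[lo] <= pivot at worst
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi - 1] = a[hi - 1], a[i]      # pivot into its final slot
    quicksort_m3(a, lo, i - 1)
    quicksort_m3(a, i + 1, hi)
```

Note that neither inner `while` tests an index against a bound - the sentinels
placed by `sort3` guarantee the scans stop in range.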
The worst case is still O(n^2), and your "merging decks" example produces this
behavior. (It's still "twice as good" as pure center-pivot - it pulls off the
two highest or two lowest elements each time.)
Using your idea of noticing when a partition is bad and then using, not the
center element, but the one 1/4 of the way in, combined with continuing to take
the "middle of first, nth-of-the-way in, and last", would work very well for
this case. Another alternative is to generalize the "middle of three". There's
an old variant, perhaps even due to Hoare, called SampleSort, which says that
you should sample k elements, sort them, and choose their middle element as the
pivot. Statistically, k=3 is already "good enough". "Deck shuffling" has a
bias that makes it fail badly when k is fixed at 3, and the 3 elements are
always the first, middle, and last. One could step up to k=5 when things go
bad, or stick to k=3 and use 3 *randomly chosen* elements. In either case, the
worst case remains O(n^2), but it's going to be quite difficult to construct an
example. (I will admit that hardly anyone bothers to do this. The "shuffling
decks" case is pretty unlikely, and when it does occur, the two "decks" are
almost certainly of different lengths. Smaller partitions in this case are of
length something like m+2, where m is the difference in the "deck" lengths, so
for any reasonable m, the performance isn't too bad.)
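Here's one way to sketch the "3 randomly chosen elements" variant (my own code,
with names of my invention; the three-way partition it uses is my stand-in for
the equal-elements handling discussed further down, not something prescribed
here):

```python
import random

def quicksort_rand3(a, lo=0, hi=None):
    """Quicksort whose pivot is the median of three randomly chosen
    elements, so no fixed input pattern can reliably force bad splits.
    The partition is three-way, so a block of equal keys is finished in
    one pass and excluded from both recursive calls."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # sample up to three distinct positions; the pivot is their median value
    k = min(3, hi - lo + 1)
    pivot = sorted(a[i] for i in random.sample(range(lo, hi + 1), k))[k // 2]
    # Dutch-national-flag pass:  < pivot | == pivot | > pivot
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort_rand3(a, lo, lt - 1)    # everything strictly below the pivot
    quicksort_rand3(a, gt + 1, hi)    # everything strictly above it
```

Because the pivot value is sampled from the partition itself, the equal block
is never empty, so every recursive call is strictly smaller.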
I've never seen qsort code that uses a pure center-pivot algorithm. Anyone who
propagates such code simply doesn't know what he's doing. (That's not to say
there isn't *tons* of such code out there.)
Quicksort is one of the most heavily studied algorithms around. Optimizations
for the many-equal-elements case are known. (Essentially, pull runs of equal
elements "off to the side" as you find them; leave just one "representative" to
participate in later sorting passes.) A less-well-known, but very important
optimization has to do with the end case. Most people just let the recursion -
whether written that way or with explicit stack manipulation - run down to some
trivial case, often a 1-element partition (or a 3-element partition if they are
using the "center of 3" algorithm). This is bad, since each partition pass
carries a large fixed cost. It's much better to sort "small" partitions - the
size of which depends on all sorts of details, but is typically in the 6-12
element range - by some other low-overhead method, often bubble sort. Ah,
but in fact one can do better! Simply ignore "small" partitions as they are
formed - detect them *before* recursion/stack push and do *nothing*. When
quicksort is done, do a single bubble-sort pass over the entire array. The cost
of bubble sort depends on the maximum distance an element must move, and no
element can possibly leave the last partition that contained it - a partition
that is certain to be small. So you get essentially the same effect as bubble
sorting the small partitions directly, but you pay the fixed costs of the bubble
sort only once, and you save the cost of keeping track of the small partitions.
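A sketch of that end-game trick (the cutoff of 8 is an arbitrary pick from the
6-12 range; I've used insertion sort for the final pass - like bubble sort, its
cost is bounded by how far elements must move, and it's the usual modern
substitute):

```python
CUTOFF = 8   # "small" partition threshold; anywhere in the 6-12 range is typical

def quicksort_coarse(a, lo=0, hi=None):
    """Quicksort that deliberately *ignores* partitions smaller than CUTOFF,
    leaving them unsorted for a single clean-up pass at the end."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo < CUTOFF:
        return                       # do nothing: the final pass will fix it
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                    # plain Hoare-style partition
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort_coarse(a, lo, j)
    quicksort_coarse(a, i, hi)

def hybrid_sort(a):
    quicksort_coarse(a)
    # One pass over the whole array: no element can have left the small
    # partition that last contained it, so the inner loop only makes
    # short, local moves.
    for k in range(1, len(a)):
        x, j = a[k], k - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
```

The single final pass replaces one low-overhead sort per small partition, which
is where the fixed-cost saving comes from.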
Shell sort isn't bad, but quicksort is almost always faster, and has rather nice
memory access patterns in a VM system (though the access patterns aren't LRU;
ideally, you'd want to be able to tell the pager that they are linear). Except
in artificial circumstances, I'm not sure the effort of writing two complete
sorts is worth it, since the "bad" cases for quicksort can so easily be made
very unlikely.
-- Jerry