[gclist] Name that hypothesis

Jerry Leichter leichter@smarts.com
Tue, 10 Dec 1996 14:39:22 -0500


I'm really curious about your qsort results.

Most people identify "qsort" with "pick the first (last) element as the pivot", 
an identification that becomes clear when they tell you that a pre-sorted input 
is the worst case, and produces O(n^2) behavior (i.e., every partition pulls off 
just 1 element).
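For concreteness, here's a minimal sketch (mine, not from any particular
library) of that naive scheme.  A Lomuto-style partition is used for
brevity; Hoare's original scans from both ends, but the degenerate behavior
on sorted input is the same:

    /* Naive quicksort: first element as pivot.  On already-sorted input
       no element is ever less than the pivot, so every call peels off
       just the pivot and recurses on n-1 elements: O(n^2) overall. */
    void naive_qsort(int a[], int lo, int hi)   /* sorts a[lo..hi] */
    {
        int pivot, i, j, t;

        if (lo >= hi)
            return;
        pivot = a[lo];                  /* first element as pivot */
        i = lo;
        for (j = lo + 1; j <= hi; j++)  /* Lomuto-style partition */
            if (a[j] < pivot) {
                i++;
                t = a[i]; a[i] = a[j]; a[j] = t;
            }
        t = a[lo]; a[lo] = a[i]; a[i] = t;
        naive_qsort(a, lo, i - 1);      /* sorted input: empty */
        naive_qsort(a, i + 1, hi);      /* sorted input: everything else */
    }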

In fact, *even Tony Hoare's original paper on quicksort* recognized this 
problem, and recommended *against* using the first or last element.  Hoare 
suggested taking the first, middle, and last element and using whichever was "in 
the middle" of those three values as the pivot.  Knuth later showed that doing 
this by sorting those three elements and using the resulting center element as 
the pivot was "safe", in that it didn't affect the statistical assumptions that 
give you the "average O(n log n)" case.  If you do it this way, the resulting 
partitions are optimal for both the initially-sorted and the reverse-sorted 
cases.  It also guarantees that neither the first nor the last element
will be swapped, which you can use to eliminate most of the index boundary 
comparisons.  For small keys, this can be half the cost of running the 
algorithm!
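
In code, the pivot step looks something like this (a sketch of mine, for an
int array - the names are not Hoare's or Knuth's):

    /* Median-of-three: sort a[lo], a[mid], a[hi] in place and pivot on
       a[mid].  As a side effect, a[lo] <= pivot <= a[hi], so a[lo] and
       a[hi] act as sentinels: the two inner scan loops of the partition
       can omit their index-bounds checks. */
    #define SWAP(x, y) do { int t = (x); (x) = (y); (y) = t; } while (0)

    int median_of_three(int a[], int lo, int hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (a[mid] < a[lo]) SWAP(a[mid], a[lo]);
        if (a[hi]  < a[lo]) SWAP(a[hi],  a[lo]);
        if (a[hi]  < a[mid]) SWAP(a[hi], a[mid]);
        return mid;                 /* now a[lo] <= a[mid] <= a[hi] */
    }

The left scan must stop at the pivot value by a[hi], and the right scan by
a[lo] - that is where the savings on boundary comparisons comes from.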

The worst case is still O(n^2), and your "merging decks" example produces this 
behavior.  (It's still "twice as good" as pure center-pivot - it pulls off the 
two highest or two lowest elements each time.)

Your idea of noticing when a partition turns out badly and then pivoting, not 
on the center element, but on the one 1/4 of the way in - while continuing to 
take the "middle of first, nth-of-the-way-in, and last" - would work very well 
for this case.  Another alternative is to generalize the "middle of three".  
There's an old variant, perhaps even due to Hoare, called SampleSort, which says that
you should sample k elements, sort them, and choose their middle element as the 
pivot.  Statistically, k=3 is already "good enough".  "Deck shuffling" has a 
bias that makes it fail badly when k is fixed at 3, and the 3 elements are 
always the first, middle, and last.  One could step up to k=5 when things go 
bad, or stick to k=3 and use 3 *randomly chosen* elements.  In either case, the 
worst case remains O(n^2), but it's going to be quite difficult to construct an 
example that triggers it.  (I will admit that hardly anyone bothers with any of 
this.  The "shuffling decks" case is pretty unlikely, and when it does occur, 
the two "decks" are almost certainly of different lengths.  Smaller partitions 
in this case have length something like m+2, where m is the difference in the 
"deck" lengths, so for any reasonable m the performance isn't too bad.)
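
A sketch of the randomized k=3 choice (rand() is used for brevity; its
modulo bias and weak state are assumptions a real implementation would want
to improve on):

    #include <stdlib.h>

    /* Pick three random positions in a[lo..hi] and return the index
       holding the median of the three sampled values.  A fixed input
       pattern can no longer force bad pivots deterministically. */
    int random_median_of_three(const int a[], int lo, int hi)
    {
        int n = hi - lo + 1;
        int i = lo + rand() % n;
        int j = lo + rand() % n;
        int k = lo + rand() % n;
        int x = a[i], y = a[j], z = a[k];

        if ((x <= y && y <= z) || (z <= y && y <= x)) return j;
        if ((y <= x && x <= z) || (z <= x && x <= y)) return i;
        return k;
    }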

I've never seen qsort code that uses a pure center-pivot algorithm.  Anyone who 
propagates such code simply doesn't know what he's doing.  (That's not to say 
there isn't *tons* of such code out there.)

Quicksort is one of the most heavily studied algorithms around.  Optimizations 
for the many-equal-elements case are known.  (Essentially, pull runs of equal 
elements "off to the side" as you find them; leave just one "representative" to 
participate in later sorting passes.)
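
The usual modern realization of this is a three-way partition, which groups
*all* elements equal to the pivot in the middle so they drop out of further
passes - not quite the "one representative" bookkeeping described above, but
the same effect.  A sketch, again with a naive pivot for brevity:

    /* Quicksort with a three-way (Dutch-national-flag) partition.  After
       the loop, a[lo..lt-1] < pivot, a[lt..gt] == pivot, and
       a[gt+1..hi] > pivot; the middle run never takes part in another
       pass. */
    void qsort3(int a[], int lo, int hi)
    {
        int lt, gt, i, pivot, t;

        if (lo >= hi)
            return;
        pivot = a[lo];                  /* naive pivot choice */
        lt = lo; gt = hi; i = lo + 1;
        while (i <= gt) {
            if (a[i] < pivot) {
                t = a[lt]; a[lt] = a[i]; a[i] = t;
                lt++; i++;
            } else if (a[i] > pivot) {
                t = a[gt]; a[gt] = a[i]; a[i] = t;
                gt--;                   /* a[i] is unexamined: don't advance */
            } else
                i++;                    /* equal to pivot: leave it */
        }
        qsort3(a, lo, lt - 1);          /* equals a[lt..gt] are finished */
        qsort3(a, gt + 1, hi);
    }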
A less-well-known, but very important optimization has to do with the end case.  
Most people just let the recursion - whether written that way or with explicit 
stack manipulation - run down to some trivial case, often a 1-element partition 
(or a 3-element partition if they are using the "middle of three" algorithm).  
This is bad, since there is a large fixed cost to each partitioning pass.  It's 
much better to sort "small" partitions - where the right size depends on all 
sorts of details, but is typically in the 6-12 element range - by some other, 
low-overhead method, often bubble sort.  Ah, but in fact one can do better!  
Simply ignore "small" partitions as they are formed - detect them *before* the 
recursion or stack push and do *nothing*.  When quicksort is done, make a 
single bubble-sort pass over the entire array.  Since the cost of bubble sort 
depends on the maximum distance an element must move, and no element can 
possibly leave the last partition that contained it - which is certain to be 
small - you get essentially the same effect as bubble sorting the small 
partitions directly, but you pay the fixed costs of the bubble sort only once, 
and you save the cost of keeping track of the small partitions.  (A sketch 
appears below.)
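
Here's that sketch, with a couple of assumptions of mine: CUTOFF and the
naive first-element partition are placeholders, and the final pass is
written as an insertion sort rather than a bubble sort - insertion sort has
the same bounded-distance property (after the loose pass, no element is more
than CUTOFF-1 slots from its final position) and is what most real
implementations use here:

    #define CUTOFF 10                       /* "small" threshold; tune it */

    static int partition(int a[], int lo, int hi)
    {
        int pivot = a[lo], i = lo, j, t;    /* naive pivot, for brevity */

        for (j = lo + 1; j <= hi; j++)
            if (a[j] < pivot) {
                i++;
                t = a[i]; a[i] = a[j]; a[j] = t;
            }
        t = a[lo]; a[lo] = a[i]; a[i] = t;
        return i;
    }

    /* Quicksort that simply abandons partitions of CUTOFF or fewer
       elements - no recursion, no stack push, no bookkeeping. */
    static void quicksort_loose(int a[], int lo, int hi)
    {
        while (hi - lo >= CUTOFF) {
            int p = partition(a, lo, hi);
            quicksort_loose(a, lo, p - 1);
            lo = p + 1;                     /* iterate on the right half */
        }
    }

    void sort(int a[], int n)
    {
        int i, j, v;

        quicksort_loose(a, 0, n - 1);
        for (i = 1; i < n; i++) {           /* one low-overhead final pass */
            v = a[i];
            for (j = i; j > 0 && a[j-1] > v; j--)
                a[j] = a[j-1];
            a[j] = v;
        }
    }

Since every element is already within CUTOFF-1 of home, the final pass is
O(n * CUTOFF) with a very small constant, and its fixed costs are paid
exactly once.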

Shell sort isn't bad, but quicksort is almost always faster, and has rather nice 
memory access patterns in a VM system (though the access patterns aren't a good 
match for LRU replacement; ideally, you'd want to be able to tell the pager that 
they are sequential).  Except in artificial circumstances, I'm not sure the 
effort of writing two complete sorts is worth it, since the "bad" cases for 
quicksort can so easily be made very unlikely.
							-- Jerry