8.10. RADIX SORTING 159
here as a generic term encompassing not only decimal digits, but also alphabetic
characters, or whatever is appropriate to the data one is sorting.
8.10.1 LSD-first radix sorting
The idea of the LSD-fir st algorithm is to first u se the least significant character to
order all records, then the second-least significant, and so f orth. At each stage, we
perform a stable sort, so that if the k most significant characters of two records
are identical, they will remain sorted by the remaining, least significant, characters.
Because characters have a limited range of values, it is easy to sort them in linear
time (using, for example, distributionSort2, or, if the records are kept in a linked
list, by keeping an array of list headers, one for each possible character value).
Figure 8.9 illustrates the process.
LSD-first radix sort is p recisely the algorithm used by card sorters. These ma-
chines had a series of bins and could be programmed (using plugboards) to drop
cards from a feeder into bins depen ding on what was punched in a particular col-
umn. By repeating the process for each column, one ended up with a sorted deck
of card s.
Each distribution of a record to a bin takes (abou t) constant time (assuming we
use pointers to avoid moving large amounts of data around). Thus, the total time
is proportional to the total amount of key data—which is th e total number of bytes
in all keys. In other words, radix sorting is O(B) where B is the total number of
bytes of key data. If keys are K bytes long, then B = NK, w here N is the number
of records. S ince merge sorting, heap sorting, etc., require O(N lg N ) comparisons,
each requiring in the worst case K time, we get a total time of O(NK lg N) =
O(B lg N) time for these sorts. Even if we assume constant comparison time, if
keys are no longer than they have to be (in order to provide N different keys we
must h ave K ≥ log
C
N, where C is the number of possible characters), then radix
sorting is also O(N lg N).
Thus, relaxing the constraint on what we can do to keys yields a fast sorting
procedur e, at least in pr inciple. As usual, the Devil is in the details. If the keys
are considerably longer than log
C
N, as they very often are, the passes made on the
last characters will typically be largely wasted. One possible improvement, which
Knuth credits to M. D. Maclaren, is to use LSD-fir st radix sort on the first two
characters, and then finish with an insertion sort (on the theory that thin gs will
almost be in order after the radix sort). We must fudge th e definition of “character”
for this purpose, allowing characters to grow slightly with N. For example, when
N = 100000, Maclaren’s optimal procedure is to sort on the first and second 10-bit
segments of the key (on an 8-bit machine, this is the first 2.25 characters). Of
course, this technique can, in principle, make no guarantees of O(B) performance.
8.10.2 MSD-first radix sorting
Performing r ad ix sort starting at the most significant digit probably seems more
natural to most of us. We sort the input by the first (most-significant) character
into C (or fewer) subsequences, one for each starting character (that is, the first