Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site rtp47.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!unc!mcnc!rti-sel!rtp47!throopw
From: throopw@rtp47.UUCP (Wayne Throop)
Newsgroups: net.math
Subject: Re: Sorting a sorted list by a different order of keys
Message-ID: <76@rtp47.UUCP>
Date: Mon, 24-Jun-85 12:58:55 EDT
Article-I.D.: rtp47.76
Posted: Mon Jun 24 12:58:55 1985
Date-Received: Wed, 26-Jun-85 04:42:23 EDT
References: <7321@watdaisy.UUCP>
Organization: Data General, RTP, NC
Lines: 91

I agree with most of what Gregory Rawlins said in the referenced
posting, but there are two points I'd like to clarify.

>>>>Actually, there are some sorting methods that are better than N log N;
>>>     Pardon me? Loose lips sink ships, please define your model.
>>As just an example -- not the best possible algorithm -- [...etc...]

>    I try not to clutter the net with things like this but i wish
>to point out that the time bound of any algorithm _depends on your model_.

Hear hear.

>That was the reason i asked for a definition of the model. I didn't
>want people to go away with the notion that nlgn is beatable in
>the general case where you know nothing about your data and you
>are counting the minimum number of comparisons necessary and
>sufficient to transform any permutation of n data items to some
>(fixed) permutation of itself.

Here is point one where I disagree and want to put in my two cents
worth.  Comparison sorting is not the "general case" of sorting.  The
general case of sorting is generating an "ordered" permutation of n data
items.  Comparison sorting "knows" something about the data items, in
that it knows there is some test on any two of them that will answer
the question "is x less-than y".  I claim that this is "knowing
something" about the data.

Thus, I think that the original poster was quite reasonable when he said
"there are some sorting methods that are better than NlogN".  He
*didn't* say "there are some comparison sorting methods better than
NlogN", which would indeed be false.

>    There are only three ways to "beat" nlgn. Either you know
>something about the distribution of the input prior to running
>you algorithm or you decided to count something other than the
>number of comparisons of two data elements, or you count something
>other than the worst case possible, in all three cases you
>have changed the model.

I agree 100%, keeping in mind that what this boils down to is "to beat
NlogN you have to use something other than a comparison sort".

>                        Radix sort is a simple example of the 
>first type of special case since it won't work unless you know
>that the input consists of integers in some prespecified range.

The bit about radix sorting is where I wish to pick nit number two.
Radix sorting will work in just about any case where comparison sorting
will work.  (In fact, I suspect that it will work in *any* case where
comparison sorting will work, but I can't prove it).

All you need is a transform on the key to map it onto a bit-string that
preserves the ordering relation of the key that a comparison sort would
use, and that makes the ordering relation a "lexical" compare of the
mapped bit-strings.

For (a trivial) example, say that what you've got are keys of 20 ascii
characters that you wish to compare lexically.  Then the cost of the
simple radix sort I'll describe is about 20N character-bucketing steps
(where N is the number of records), against roughly N*log2(N)
comparisons for a comparison sort.  (Note that this means that radix
sorting will outperform comparison sorting in this case when you have
more than 10**6 records or so, since log2(10**6) is about 20, and
ignoring constant factors.)
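
The crossover figure above can be checked with a little arithmetic.
What follows is only a rough sketch in Python: the two cost functions
are my own simplified models (one unit of work per comparison, one per
character examined), and they ignore the constant factors that would
matter on a real machine.

```python
import math

def comparison_cost(n):
    # Simplified model: a comparison sort does about n * log2(n) comparisons.
    return n * math.log2(n)

def radix_cost(n, key_len=20):
    # Simplified model: one pass over all n records for each key character.
    return key_len * n

# The two models meet where log2(n) == 20, that is n == 2**20 (about 10**6).
crossover = 2 ** 20
```

Below the crossover the comparison model is cheaper, above it the radix
model is cheaper, which is where the "10**6 records or so" comes from.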

The radix sort of these fixed-length keys is simple.  You radix sort on
the right-most character.  Then on the next-to-rightmost character.  And
so on.  Each stage of radix sorting is stable (records with equal
characters keep their relative order from the previous stage), so once
you have sorted on the leftmost byte, you have a complete sort.
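
The procedure just described can be sketched in a few lines of Python.
This is an illustration under my own assumptions, not a tuned
implementation: the keys are equal-length byte strings, the name
radix_sort_fixed is made up for the example, and each stage uses 256
buckets (one per byte value) filled in input order so that the stage is
stable.

```python
def radix_sort_fixed(keys, key_len):
    """Sort equal-length byte-string keys, least significant character first."""
    for pos in range(key_len - 1, -1, -1):    # rightmost character first
        buckets = [[] for _ in range(256)]    # one bucket per possible byte value
        for key in keys:
            buckets[key[pos]].append(key)     # appending in input order keeps the stage stable
        # Concatenate the buckets in byte-value order to get this stage's output.
        keys = [key for bucket in buckets for key in bucket]
    return keys
```

For instance, radix_sort_fixed([b"dog", b"cat", b"ant", b"cow"], 3)
makes three passes over the four keys and gives the same order a
lexical comparison sort would.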

As mentioned above, this technique is applicable given only that you can
map your keys into a lexically-comparable-bit-string form.  It turns out
that this can be done for many interesting cases, such as floating point
numbers, bcd (with leading sign) numbers, and so on (strings of ascii
characters are already so mapped).
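
As one illustration of such a mapping, here is a sketch for floating
point numbers.  It assumes the 64-bit IEEE 754 format (other floating
point formats would need their own mapping), it leaves NaNs out of
consideration, and the name float_key is my own.  The idea is to set the
sign bit of non-negative numbers so they sort above the negatives, and
to flip every bit of negative numbers so that larger magnitudes sort
lower.

```python
import struct

def float_key(x):
    """Map a float to 8 bytes whose lexical order matches numeric order (NaN excluded)."""
    # Reinterpret the IEEE 754 double as a big-endian unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF   # negative: flip every bit to reverse the order
    else:
        bits |= 1 << 63              # non-negative: set the sign bit to sort above negatives
    return struct.pack(">Q", bits)
```

One wrinkle of this particular mapping is that -0.0 gets a key just
below that of 0.0, although the two compare equal as numbers.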

The analogy of radix sorting and comparison sorting is strong.  The
general comparison sort consists of an algorithm, and a comparison
function that will order the keys.  The general radix sort consists of
an algorithm and a mapping function that maps the keys into a bit-string
as described above.
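
To make the analogy concrete, here is a sketch that puts the two side by
side.  All the names (comparison_sort, radix_sort, int32_key) are
invented for the example, and the mapping shown assumes keys that are
signed integers fitting in 32 bits.  The comparison sort is handed a
less-than test; the radix sort is handed a mapping to fixed-length byte
strings and never compares two keys with each other.

```python
from functools import cmp_to_key

def comparison_sort(items, less_than):
    """Algorithm plus a comparison function."""
    def cmp(a, b):
        return -1 if less_than(a, b) else (1 if less_than(b, a) else 0)
    return sorted(items, key=cmp_to_key(cmp))

def radix_sort(items, to_bytes, key_len):
    """Algorithm plus a mapping function onto key_len-byte strings."""
    for pos in range(key_len - 1, -1, -1):          # least significant byte first
        buckets = [[] for _ in range(256)]
        for item in items:
            buckets[to_bytes(item)[pos]].append(item)   # stable within each stage
        items = [item for bucket in buckets for item in bucket]
    return items

def int32_key(n):
    # Hypothetical mapping for signed 32-bit integers: adding 2**31 flips the
    # sign bit, so negatives sort below non-negatives as big-endian bytes.
    return (n + 2 ** 31).to_bytes(4, "big")
```

Calling radix_sort(nums, int32_key, 4) and
comparison_sort(nums, lambda a, b: a < b) on the same list of integers
should give the same result.  The sketch recomputes the mapping on every
pass for brevity; a real implementation would map each key once.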

>Sorting in parallel using n processors (taking constant time)
>is an example of the second. Hash sort is an example of the 
>third type since you are concerned with the average case.

Isn't the worst case of comparison sorting N-squared?  I thought that
the NlogN was only the "average" or "expected" case.

>Gregory J.E. Rawlins, Department of Computer Science, U. Waterloo
>{allegra|clyde|linus|inhp4|decvax}!watmath!watdaisy!gjerawlins
-- 
Wayne Throop at Data General, RTP, NC
!mcnc!rti-sel!rtp47!throopw