Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site rtp47.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!unc!mcnc!rti-sel!rtp47!throopw
From: throopw@rtp47.UUCP (Wayne Throop)
Newsgroups: net.math
Subject: Re: Sorting a sorted list by a different order of keys
Message-ID: <76@rtp47.UUCP>
Date: Mon, 24-Jun-85 12:58:55 EDT
Article-I.D.: rtp47.76
Posted: Mon Jun 24 12:58:55 1985
Date-Received: Wed, 26-Jun-85 04:42:23 EDT
References: <7321@watdaisy.UUCP>
Organization: Data General, RTP, NC
Lines: 91

I agree with most of what Gregory Rawlins said in the referenced
posting, but there are two points I'd like to clarify.

>>>>Actually, there are some sorting methods that are better than N log N;
>>> Pardon me? Loose lips sink ships, please define your model.
>>As just an example -- not the best possible algorithm -- [...etc...]
> I try not to clutter the net with things like this but i wish
>to point out that the time bound of any algorithm _depends on your model_.

Hear hear.

>That was the reason i asked for a definition of the model. I didn't
>want people to go away with the notion that nlgn is beatable in
>the general case where you know nothing about your data and you
>are counting the minimum number of comparisons necessary and
>sufficient to transform any permutation of n data items to some
>(fixed) permutation of itself.

Here is point one where I disagree and want to put in my two cents
worth. Comparison sorting is not the "general case" of sorting. The
general case of sorting is generating an "ordered" permutation of n
data items. Comparison sorting "knows" something about the data items,
in that it knows there is some test on any two of them that will answer
the question "is x less-than y". I claim that this is "knowing
something" about the data.
Thus, I think that the original poster was quite reasonable when he
said "there are some sorting methods that are better than NlogN". He
*didn't* say "there are some comparison sorting methods better than
NlogN", which is indeed false.

> There are only three ways to "beat" nlgn. Either you know
>something about the distribution of the input prior to running
>you algorithm or you decided to count something other than the
>number of comparisons of two data elements, or you count something
>other than the worst case possible, in all three cases you
>have changed the model.

I agree 100%, keeping in mind that what this boils down to is "to beat
NlogN you have to use something other than a comparison sort".

> Radix sort is a simple example of the
>first type of special case since it won't work unless you know
>that the input consists of integers in some prespecified range.

The bit about radix sorting is where I wish to pick nit number two.
Radix sorting will work in just about any case where comparison sorting
will work. (In fact, I suspect that it will work in *any* case where
comparison sorting will work, but I can't prove it.) All you need is a
transform on the key to map it onto a bit-string that preserves the
ordering relation of the key that a comparison sort would use, and that
makes the ordering relation a "lexical" compare of the mapped
bit-strings.

For (a trivial) example, say that what you've got are keys of 20 ascii
characters that you wish to compare lexically. The order of the simple
radix sort I'll describe is 20N (where N is the number of records).
(Note that this means that radix sorting will outperform comparison
sorting in this case when you have more than 10**6 records or so, since
that is where log2(N) passes 20 and NlogN overtakes 20N.)

The radix sort of these fixed-length keys is simple. You radix sort on
the right-most character. Then on the next-to-rightmost character. And
so on.
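
A minimal sketch of that right-to-left pass in Python, assuming the keys
are equal-length byte strings. The function name radix_sort_fixed and
the 256-bucket layout are illustrative assumptions, not anything taken
from the referenced posting. Each pass appends keys to their bucket in
arrival order, so ties keep the order produced by the earlier passes.

```python
def radix_sort_fixed(keys, width):
    """Return a sorted list of byte-string keys that are all `width` bytes long.

    Hypothetical helper for illustration: one bucket pass per byte
    position, starting at the rightmost byte.
    """
    items = list(keys)
    for key in items:
        if len(key) != width:
            raise ValueError("all keys must be exactly `width` bytes long")

    for pos in range(width - 1, -1, -1):      # rightmost byte first
        buckets = [[] for _ in range(256)]    # one bucket per byte value
        for key in items:
            # Indexing a bytes object yields an int in 0..255.
            # Appending keeps earlier-pass order within a bucket.
            buckets[key[pos]].append(key)
        items = [key for bucket in buckets for key in bucket]
    return items


records = [b"carol", b"alice", b"bobby"]
print(radix_sort_fixed(records, 5))   # [b'alice', b'bobby', b'carol']
```

With 20-byte keys this does 20 passes over N records, which is the 20N
figure quoted above (ignoring the constant cost of walking the 256
buckets on each pass).
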
Each stage of radix sorting is stable, so when you have gotten to the
point of sorting on the leftmost byte, you have a complete sort.

As mentioned above, this technique is applicable given only that you
can map your keys into a lexically-comparable-bit-string form. It turns
out that this can be done for many interesting cases, such as floating
point numbers, bcd (with leading sign) numbers, and so on (strings of
ascii characters are already so mapped).

The analogy between radix sorting and comparison sorting is strong. The
general comparison sort consists of an algorithm and a comparison
function that will order the keys. The general radix sort consists of
an algorithm and a mapping function that maps the keys into a
bit-string as described above.

>Sorting in parallel using n processors (taking constant time)
>is an example of the second. Hash sort is an example of the
>third type since you are concerned with the average case.

Isn't the worst case of comparison sorting N-squared? I thought that
the NlogN was only the "average" or "expected" case.

>Gregory J.E. Rawlins, Department of Computer Science, U. Waterloo
>{allegra|clyde|linus|inhp4|decvax}!watmath!watdaisy!gjerawlins
--
Wayne Throop at Data General, RTP, NC
!mcnc!rti-sel!rtp47!throopw