Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 exptools; site ho95e.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxj!mhuxt!houxm!ho95e!wcs
From: wcs@ho95e.UUCP (x0705)
Newsgroups: net.math
Subject: Re: YASMP (Yet Another Sorting Methods Posting)
Message-ID: <129@ho95e.UUCP>
Date: Fri, 5-Jul-85 19:46:59 EDT
Article-I.D.: ho95e.129
Posted: Fri Jul  5 19:46:59 1985
Date-Received: Sun, 7-Jul-85 04:58:13 EDT
References: <3070@cca.UUCP> <> <3130@cca.UUCP>
Organization: AT&T Bell Labs, Holmdel NJ
Lines: 109

From the ongoing discussions of RADIX sort:
> >> 	Finally, radix sorting is O(n log n), at best.
> 
> >Not quite.  Radix sorting is O(n * m), where n is the number of records,
> >and m is some multiplier derived from the length of the key.  ....

Let's compare apples with apples.  There are at least three important
measurements on input length: (For now assume we're sorting character strings)
	n - the number of records
	m - the "length" of a record, either average or max or whatever...
	N - the total amount of input data, in bytes or bits.
		N =~= n*m

A crudely written radix sort uses n * m steps, where m is the length of the
longest key.  This is how a mechanical card sorter works - sort the whole
batch on the mth column, then the m-1th, then... the 1st.  The total effort
is > N steps, possibly >>N if you have nasty data.
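
For concreteness, the card-sorter procedure might look roughly like this in
C.  It is only a sketch: it assumes every key is exactly m bytes long
(shorter keys padded out), and the name lsd_radix_sort and the scratch
array are my own choices for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sort n keys of exactly m bytes each, card-sorter style: one stable
 * distribution pass per column, last column first.  Returns 0 on
 * success, -1 if the scratch array cannot be allocated.
 * (Hypothetical helper; fixed-width keys are an assumption.) */
int lsd_radix_sort(const char **keys, size_t n, size_t m)
{
	const char **tmp;
	size_t count[256], pos, i, col;
	int b;

	assert(keys != NULL || n == 0);
	if (n == 0 || m == 0)
		return 0;
	tmp = malloc(n * sizeof *tmp);
	if (tmp == NULL)
		return -1;

	for (col = m; col-- > 0; ) {		/* mth column, then m-1th, ... */
		memset(count, 0, sizeof count);
		for (i = 0; i < n; i++)		/* count the cards for each bin */
			count[(unsigned char) keys[i][col]]++;
		pos = 0;
		for (b = 0; b < 256; b++) {	/* turn counts into start offsets */
			size_t c = count[b];
			count[b] = pos;
			pos += c;
		}
		for (i = 0; i < n; i++)		/* deal cards into bins, stably */
			tmp[count[(unsigned char) keys[i][col]]++] = keys[i];
		memcpy(keys, tmp, n * sizeof *tmp);
	}
	free(tmp);
	return 0;
}
```

Every pass touches all n keys and there are m passes, which is where the
n * m comes from, no matter how early the keys actually differ.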

An efficiently written radix sort uses n * m steps where m is less than the
average key length.  Total work <=N steps, and does especially well on the
kind of data that trashes the crude version.  More later...

Radix sorts can be done in fewer steps if you use a bigger radix, i.e. use
2 or 4 bytes of the data instead of 1 at a time.  This uses more memory.
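
The tradeoff is easy to put numbers on.  A small sketch, assuming 8-bit
bytes; the function names radix_passes and radix_bins are made up for
this illustration:

```c
#include <assert.h>

/* Number of distribution passes needed for keys of key_bytes bytes
 * when each pass consumes digit_bytes bytes at a time. */
unsigned long radix_passes(unsigned long key_bytes, unsigned long digit_bytes)
{
	assert(digit_bytes > 0);
	return (key_bytes + digit_bytes - 1) / digit_bytes;	/* round up */
}

/* Number of bins one pass needs: 256 for 1 byte, 65536 for 2 bytes, ... */
unsigned long long radix_bins(unsigned digit_bytes)
{
	assert(digit_bytes > 0 && digit_bytes < 8);
	return 1ULL << (8 * digit_bytes);
}
```

For 8-byte keys, going from 1 byte to 2 bytes per pass halves the passes
from 8 to 4, but each pass then needs 65536 bins instead of 256.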

A comparison sort needs at least n log n comparisons, where each comparison
takes m steps, for a total of <= N log n work.  m is less than the average
key length, and is the "average" number of identical bytes between the two
words being compared.  The basic comparison algorithm looks like this:

	compare(string1, string2)
	char string1[MAX], string2[MAX];
	{
		int i;
		for (i=0 ; i<MAX ; i++) {
			if (string1[i] < string2[i]) return (LESS);
			if (string1[i] > string2[i]) return (GREATER);
		}
		return (EQUAL);
	}
Of course you would write better C code than that, and you can compare 2 or
4 bytes at once with no space penalty.
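
To see the "m steps per comparison" directly, a comparison routine can
count the bytes it looks at.  This is a sketch using the standard qsort();
the counter byte_steps and the two function names are hypothetical, and
the strings are assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>

static unsigned long byte_steps;	/* byte positions examined so far */

/* qsort() comparison for an array of char *: compares byte by byte,
 * counting each position examined, and stops at the first difference. */
static int counting_compare(const void *a, const void *b)
{
	const char *s = *(const char * const *) a;
	const char *t = *(const char * const *) b;

	for (;; s++, t++) {
		unsigned char cs = (unsigned char) *s, ct = (unsigned char) *t;
		byte_steps++;
		if (cs != ct)
			return (cs < ct) ? -1 : 1;
		if (cs == '\0')
			return 0;	/* both strings ended together */
	}
}

/* Sort n strings with qsort() and return the number of byte steps used. */
unsigned long sort_and_count(const char **keys, size_t n)
{
	assert(keys != NULL || n == 0);
	byte_steps = 0;
	qsort(keys, n, sizeof *keys, counting_compare);
	return byte_steps;
}
```

Dividing the returned count by the number of comparisons gives the "m" for
your data: small when keys differ early, close to the key length when they
share long prefixes.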

So the radix sort takes m*n steps, while the comparison sort takes m*n*log n,
for slightly different m's.
The comparison sort requires about N + O(n)*POINTERSIZE space;
the radix sort can require   about N + O(m+n+BINSIZE)*POINTERSIZE space,
where BINSIZE is the number of bins for your radix (e.g. 256).

> >[Also, read Knuth's "Sorting and Searching"... 
Yes!

============
Here's a rough description of the efficient radix sort (written in
pseudo-C - please excuse any dangling pointers and handwaving )

struct LIST {	char *	string;
		struct LIST * next_guy;
		struct LIST * last_item_in_list;
	    };
SORT(input)	/* The main sort routine - calls the real sort1 routine */
	char **input;
{
	struct LIST * input_list, * output_list;
	make input into a LIST;
	output_list = sort1( input_list, 0);
	return ( make_output_list_into_a_char** );
}
struct LIST *
sort1( input, column )
	struct LIST * input;
	int column;
{
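	struct LIST * answer,
		    * bucket[256],	/* one for each possible byte value */
		    * item, * next;
	int i;

	for ( i=0; i<256; i++ )
		bucket[i] = NULL;	/* start with every bucket empty */

	for (item=input; item!=NULL; item = next)
	{
		next = item->next_guy;	/* append relinks item, so save this */
		append( bucket[(unsigned char) item->string[column]], item);
		/* put the item in the bucket for the columnth character */
	}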

	answer = NULL;
	for ( i=0; i<256; i++ )	/* for each bucket, do */
	{
		if (bucket[i]==NULL) continue;		/* empty bucket */
		if (bucket[i]->next_guy == NULL		/* one item	*/
		    || i == 0)		/* strings ended here: all identical */
			append( answer, bucket[i] );
		else					/* many		*/
		{
			bucket[i] = sort1( bucket[i], column+1 );
			append( answer, bucket[i] );
		}
	}
	return (answer);
}

void append( list, item )
	struct LIST * list;
	struct LIST * item;
{
	Append item to end of list by magic in O(1) time;
	/* really should be named (nconc ) or (tconc ) if you like LISP	*/
}
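
For anyone who wants to run it, here's one way the pseudo-C above might be
turned into compilable C.  It is my reading of the sketch, not a tuned
implementation.  The struct node / struct list types and the function
names are assumptions, as is the decision not to recurse on the bucket for
the terminating NUL (those strings are already identical).

```c
#include <assert.h>
#include <stddef.h>

struct node {			/* one string to be sorted */
	const char *string;
	struct node *next;
};

struct list {			/* head and tail, so appends are O(1) */
	struct node *head, *tail;
};

/* Append the whole chain `more` to `l` in O(1) time (LISP's nconc). */
static void append(struct list *l, const struct list *more)
{
	assert(l != NULL && more != NULL);
	if (more->head == NULL)
		return;
	if (l->head == NULL)
		l->head = more->head;
	else
		l->tail->next = more->head;
	l->tail = more->tail;
}

/* Sort `in` on the bytes from `column` onward; every string in `in`
 * is assumed to agree on its first `column` bytes. */
static struct list sort1(struct list in, size_t column)
{
	struct list bucket[256] = {{NULL, NULL}};	/* all buckets empty */
	struct list answer = {NULL, NULL};
	struct node *item, *next;
	int i;

	for (item = in.head; item != NULL; item = next) {
		struct list one;
		next = item->next;		/* save before relinking */
		item->next = NULL;
		one.head = one.tail = item;
		append(&bucket[(unsigned char) item->string[column]], &one);
	}
	for (i = 0; i < 256; i++) {
		if (bucket[i].head == NULL)
			continue;			/* empty bucket */
		if (i != 0 && bucket[i].head != bucket[i].tail)
			bucket[i] = sort1(bucket[i], column + 1);	/* many */
		append(&answer, &bucket[i]);
	}
	return answer;
}

/* Sort a chain of nodes linked through `next`; returns the new head. */
struct node *radix_sort(struct node *head)
{
	struct list in;
	struct node *p;

	in.head = in.tail = head;
	for (p = head; p != NULL; p = p->next)	/* find the tail once */
		in.tail = p;
	return sort1(in, 0).head;
}
```

Each level of recursion keeps its own 256 buckets on the stack and the
depth can reach the longest shared prefix, so a production version would
want to bound or flatten the recursion.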
-- 
Bill Stewart, AT&T Bell Labs, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs