Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 exptools; site ho95e.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxj!mhuxt!houxm!ho95e!wcs
From: wcs@ho95e.UUCP (x0705)
Newsgroups: net.math
Subject: Re: YASMP (Yet Another Sorting Methods Posting)
Message-ID: <129@ho95e.UUCP>
Date: Fri, 5-Jul-85 19:46:59 EDT
Article-I.D.: ho95e.129
Posted: Fri Jul 5 19:46:59 1985
Date-Received: Sun, 7-Jul-85 04:58:13 EDT
References: <3070@cca.UUCP> <> <3130@cca.UUCP>
Organization: AT&T Bell Labs, Holmdel NJ
Lines: 109

From the ongoing discussions of RADIX sort:

> >> Finally, radix sorting is O(n log n), at best.
>
> >Not quite.  Radix sorting is O(n * m), where n is the number of records,
> >and m is some multiplier derived from the length of the key.  ....

Let's compare apples with apples.  There are at least three important
measurements on input length.  (For now, assume we're sorting character
strings.)

	n - the number of records
	m - the "length" of a record, either average or max or whatever...
	N - the total amount of input data, in bytes or bits.  N =~= n*m

A crudely written radix sort uses n * m steps, where m is the length of
the longest key.  This is how a mechanical card sorter works - sort the
whole batch on the mth column, then the (m-1)th, then... the 1st.
The total effort is >= N steps, possibly >> N if you have nasty data
(e.g. one very long key among many short ones).

An efficiently written radix sort uses n * m steps, where m is at most
the average key length.  Total work is <= N steps, and it does especially
well on the kind of data that trashes the crude version.  More later...

Radix sorts can be done in fewer steps if you use a bigger radix,
i.e. use 2 or 4 bytes of the data at a time instead of 1.  This uses
more memory.

A comparison sort needs on the order of n log n comparisons, where each
comparison takes m steps, for a total of <= N log n work.  Here m is at
most the average key length; it is the "average" number of identical
leading bytes between the two words being compared.
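To make the card-sorter method described earlier concrete, here is a
minimal sketch in C.  It is an illustration under assumptions, not
anything from the original thread: the name card_sort and the constant
RADIX are made up here, every key is assumed to be exactly `width` bytes
long (like a fixed-width card field), and each column pass is done by
counting rather than with physical pockets.  It inspects one byte of
every key per pass, so n keys of width m cost n * m byte inspections.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RADIX 256   /* hypothetical name: one pocket per byte value */

/* Sort n keys, each assumed to be exactly `width` bytes long, the way a
   card sorter does: one stable pass per column, last column first.
   Returns 0 on success, -1 if scratch memory cannot be allocated. */
int card_sort(const char **keys, size_t n, size_t width)
{
    const char **scratch;
    size_t col, i;

    assert(keys != NULL || n == 0);
    if (n < 2 || width == 0)
        return 0;
    scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    for (col = width; col-- > 0; ) {
        size_t start[RADIX + 1] = {0};
        int b;

        /* count the cards destined for each pocket */
        for (i = 0; i < n; i++)
            start[(unsigned char)keys[i][col] + 1]++;
        /* prefix sums: start[b] becomes the first output slot of pocket b */
        for (b = 0; b < RADIX; b++)
            start[b + 1] += start[b];
        /* deal the cards out in input order, which keeps the pass stable */
        for (i = 0; i < n; i++)
            scratch[start[(unsigned char)keys[i][col]]++] = keys[i];
        memcpy(keys, scratch, n * sizeof *keys);
    }
    free(scratch);
    return 0;
}
```

Called as card_sort(keys, 5, 3) on the keys "dog", "cat", "ant", "cow",
"bee", it should leave the array in the order ant, bee, cat, cow, dog.
Each pass has to be stable, or the order established by the later
columns is lost when an earlier column is sorted.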
The basic comparison algorithm looks like this:

	compare(string1, string2)
	char string1[MAX], string2[MAX];
	{
		int i;
		for (i=0 ; i<MAX ; i++) {
			if (string1[i]<string2[i]) return (LESS);
			if (string1[i]>string2[i]) return (GREATER);
		}
		return (EQUAL);
	}

Of course you would write better C code than that, and you can compare
2 or 4 bytes at once with no space penalty.

So the radix sort takes m*n steps, while the comparison sort takes
m*n*log n, for slightly different m's.  The comparison sort requires
about N + O(n)*POINTERSIZE space; the radix sort can require about
N + O(m+n+BINSIZE)*POINTERSIZE space, where BINSIZE is the number of
bins for your radix (e.g. 256).

> >[Also, read Knuth's "Sorting and Searching"...
Yes!

============
Here's a rough description of the efficient radix sort
(written in pseudo-C - please excuse any dangling pointers and handwaving)

struct LIST {
	char * string;
	struct LIST * next_guy;
	struct LIST * last_item_in_list;
};

SORT(input)	/* The main sort routine - calls the real sort1 routine */
char **input;
{
	struct LIST * input_list, * output_list;
	make input into a LIST;
	output_list = sort1( input_list, 0 );
	return ( make_output_list_into_a_char** );
}

struct LIST *
sort1( input, column )
struct LIST * input;
int column;
{
	struct LIST * answer,
		* bucket[256],	/* one for each ASCII character */
		* item, * next;
	int i;

	for ( i=0; i<256; i++ )		/* start with empty buckets */
		bucket[i] = NULL;

	for (item=input; item!=NULL; item = next) {
		next = item->next_guy;	/* append() relinks item, so save this first */
		append( &bucket[item->string[column] & 0377], item);
			/* put the item in the bucket for the columnth character */
	}
	answer = NULL;
	for ( i=0; i<256; i++ )		/* for each bucket, do */
	{
		if (bucket[i]==NULL) continue;	/* empty bucket */
		if (i==0 || bucket[i]->next_guy == NULL)
			/* one item, or strings that all ended at this column */
			append( &answer, bucket[i] );
		else	/* many - sort them on the next column first */
			append( &answer, sort1( bucket[i], column+1 ) );
	}
	return (answer);
}

void append( list, item )
struct LIST ** list;
struct LIST * item;
{
	Append item to end of list by magic in O(1) time;
	/* really should be named (nconc ) or (tconc ) if you like LISP */
}

-- 
Bill Stewart, AT&T Bell Labs, Holmdel NJ
1-201-949-0705	ihnp4!ho95c!wcs
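For anyone who wants to try the efficient radix sort above, here is one
way the pseudo-C might be turned into C that compiles.  This is a sketch
under assumptions, not the original author's code: struct node,
struct list, append_node, append_list, radix_sort and NBINS are names
invented here; the last_item_in_list field is replaced by a separate
head/tail pair; and keys are assumed to be NUL-terminated strings
compared as unsigned bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBINS 256                 /* hypothetical name: one bucket per byte */

struct node {                     /* one record on a singly linked list */
    const char  *string;
    struct node *next;
};

struct list {                     /* head plus tail makes appending O(1) */
    struct node *head, *tail;
};

/* Append a single node to the end of a list. */
static void append_node(struct list *l, struct node *item)
{
    item->next = NULL;
    if (l->tail) l->tail->next = item; else l->head = item;
    l->tail = item;
}

/* Splice a whole list onto the end of another, also in O(1). */
static void append_list(struct list *dst, struct list src)
{
    if (src.head == NULL) return;
    if (dst->tail) dst->tail->next = src.head; else dst->head = src.head;
    dst->tail = src.tail;
}

/* Sort a list whose strings all agree on their first `column` bytes. */
static struct list sort1(struct node *input, size_t column)
{
    struct list bucket[NBINS] = {{NULL, NULL}};
    struct list answer = {NULL, NULL};
    struct node *item, *next;
    int i;

    for (item = input; item != NULL; item = next) {
        next = item->next;        /* save it: append_node overwrites next */
        append_node(&bucket[(unsigned char)item->string[column]], item);
    }
    for (i = 0; i < NBINS; i++) {
        if (bucket[i].head == NULL) continue;        /* empty bucket */
        /* Bucket 0 holds strings that ended at this column, so they are
           all equal; a one-item bucket is already sorted.  Only the
           remaining buckets need a pass on the next column. */
        if (i != 0 && bucket[i].head != bucket[i].tail)
            bucket[i] = sort1(bucket[i].head, column + 1);
        append_list(&answer, bucket[i]);
    }
    return answer;
}

/* Entry point: returns the head of the sorted list. */
struct node *radix_sort(struct node *input)
{
    return sort1(input, 0).head;
}
```

Note that a string is only examined up to the column where it lands in a
bucket by itself (or runs out), which is where the "m at most the average
key length" behaviour comes from.  The price in this version is a
256-entry bucket array on the stack for every level of recursion, so
stack use grows with the longest prefix shared by two or more keys.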