Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!killer!pollux!dalsqnt!rpp386!jfh
From: jfh@rpp386.UUCP (John F. Haugh II)
Newsgroups: comp.misc
Subject: Re: Anybody have a checksum algorithm that detects byte-swap?
Message-ID: <3341@rpp386.UUCP>
Date: 29 Jun 88 01:20:59 GMT
References: <735@vsi.UUCP>
Reply-To: jfh@rpp386.UUCP (The Beach Bum)
Distribution: comp
Organization: Big "D" Home for Wayward Hackers
Lines: 37

In article <735@vsi.UUCP> friedl@vsi.UUCP (Stephen J. Friedl) writes:
>     I am writing some sort programs on two different machines
>and really don't want to move megabyte files around to see if the
>output from identically-run programs is the same.

>     I have a naive algorithm of multiplying the byte just read
>with the byte number:
>
>		while (c = getchar(), c != EOF)
>			sum += (c * ++count);

naive, i'll say ;-)

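the product c * count alone won't overflow a 32bitter until count
passes about 16 million, but the running sum is another story: with
bytes of 0xff, an unsigned 32 bit sum wraps after about 5800 characters
and an unsigned 16 bit sum after only 23.
[ so you needn't go checking 24MB files to see it ;-) ]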
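
to make the width problem concrete, here is a minimal sketch of the
quoted position-weighted sum with two accumulator sizes.  the function
names and the use of <stdint.h> types are assumptions for illustration;
they are not from the original article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* position-weighted sum from the quoted article, 32 bit accumulator */
static uint32_t weighted_sum32(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += (uint32_t)buf[i] * (uint32_t)(i + 1);
	return sum;
}

/* same sum, but the accumulator wraps at 16 bits */
static uint16_t weighted_sum16(const unsigned char *buf, size_t len)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = (uint16_t)(sum + (uint32_t)buf[i] * (uint32_t)(i + 1));
	return sum;
}

/* 300 bytes of 0xff: the two widths give different answers */
static void width_demo(void)
{
	unsigned char buf[300];

	memset(buf, 0xff, sizeof buf);
	assert(weighted_sum32(buf, sizeof buf) !=
	       (uint32_t)weighted_sum16(buf, sizeof buf));
}
```

the point is only that two machines agree on this checksum if they agree
on the accumulator width; the wraparound itself is well defined for
unsigned types.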

i suggest trying something more random -

		unsigned long sum = 0;	/* start from a known value */
		int c;			/* int, so EOF is distinguishable */

		while (c = getchar (), c != EOF)
			sum = (((sum <<  1) & 0xfffffffe) |
			       ((sum >> 31) & 0x00000001)) ^ c;

(in other words, a rotate left one followed by xor-ing in the
character).  this should be as fast as or faster than yours (no
multiply), and it shouldn't ever overflow.  it's also not as complex
as a full blown CRC16.  i've used similar code for hashing functions
with nice results.
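
for anyone who wants to try it on a buffer rather than stdin, here is a
minimal sketch of the same rotate-and-xor idea.  the names rotxor_sum
and swap_detected are hypothetical helpers made up for this example, and
the fixed 32 bit width via uint32_t is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rotate the 32 bit sum left one bit, then xor in the next byte */
static uint32_t rotxor_sum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = ((sum << 1) | (sum >> 31)) ^ buf[i];
	return sum;
}

/* does swapping buf[i] and buf[j] change the checksum? */
static int swap_detected(unsigned char *buf, size_t len, size_t i, size_t j)
{
	uint32_t before, after;
	unsigned char t;

	assert(i < len && j < len);
	before = rotxor_sum(buf, len);

	t = buf[i];			/* swap the two bytes */
	buf[i] = buf[j];
	buf[j] = t;
	after = rotxor_sum(buf, len);

	buf[j] = buf[i];		/* put them back */
	buf[i] = t;
	return before != after;
}
```

swapping two different adjacent bytes always changes the result, since
each byte gets xor-ed in at a different rotation.  one caveat of the
sketch: the rotation repeats every 32 bytes, so two bytes exactly 32
positions apart can trade places without being noticed.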

- john.
-- 
John F. Haugh II                 +--------- Cute Chocolate Quote ---------
HASA, "S" Division               | "USENET should not be confused with
UUCP:   killer!rpp386!jfh        |  something that matters, like CHOCOLATE"
DOMAIN: jfh@rpp386.uucp          |             -- with my apologies