Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!linus!philabs!cmcl2!seismo!brl-tgr!tgr!Jacob_Palme_QZ%QZCOM.MAILNET@MIT-MULTICS.ARPA
From: Jacob_Palme_QZ%QZCOM.MAILNET@MIT-MULTICS.ARPA
Newsgroups: net.mail.headers
Subject: Re: Checksum as a replacement for missing Message-ID.
Message-ID: <9102@brl-tgr.ARPA>
Date: Sat, 9-Mar-85 15:23:06 EST
Article-I.D.: brl-tgr.9102
Posted: Sat Mar  9 15:23:06 1985
Date-Received: Tue, 12-Mar-85 21:37:27 EST
Sender: news@brl-tgr.ARPA
Lines: 61


     FROM: William "Chops" Westfield 

     Why is it necessary for two hosts trying to create a
     MESSAGE-ID to come up with the same result? I don't understand
     why anyone but the original host would try to create a message
     id...

It would be a better solution if the original host always created
the Message-ID. We will, however, have to accept the fact that some
hosts do not create globally unique Message-IDs. Unfortunately,
neither RFC822 nor X.400 makes globally unique Message-IDs
(IPMessageIDs in X.400 terminology) mandatory.

Suppose one and the same message gets forwarded, directly or
indirectly, to two different mailing lists, and that a certain
user is a member of both lists. If the intermediate hosts handling
the mailing lists each created a checksum by the same algorithm,
both copies would carry the same ID. The recipient user's host
could then use it to avoid displaying the same message twice, or
to tell him that he has received the same message twice.
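
A minimal sketch of what the recipient's host could do, assuming
every incoming copy already carries an ID (original or checksum).
The function name and the sample IDs are hypothetical and serve
only as illustration:

```python
def mark_duplicates(incoming):
    """Return (message_id, text, is_duplicate) for each message, in arrival order."""
    seen = set()
    marked = []
    for message_id, text in incoming:
        # A copy is a duplicate if its ID was seen on an earlier arrival.
        marked.append((message_id, text, message_id in seen))
        seen.add(message_id)
    return marked


# The same message arrives via two lists; both copies carry the same checksum ID.
arrivals = [
    ("cksum-1f3a", "Meeting moved to Tuesday."),
    ("cksum-77b0", "Minutes attached."),
    ("cksum-1f3a", "Meeting moved to Tuesday."),
]
marked = mark_duplicates(arrivals)
```

The host can then either drop the entries flagged as duplicates or
show them with a note that the message was already received.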

Why should the intermediate host add a Message-ID to a message
lacking such an ID? Because the ID is very useful for loop control.
Suppose two mailing lists are members of each other (which has
advantages, but cannot be done with present practices on Arpanet
because of the risk of loops). If the list-maintaining program kept
a list of the IDs of messages sent via the list (COM does this),
it could stop re-sending a message when it comes around the
second time.
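
The loop control described above can be sketched as follows. This
is an illustration under assumptions, not a description of how COM
is implemented; the class MailingList, its method send, and the
sample ID are all hypothetical:

```python
class MailingList:
    """Hypothetical list expander that remembers the IDs it has already sent."""

    def __init__(self, name):
        self.name = name
        self.members = []      # inboxes (plain Python lists) or other MailingList objects
        self.seen_ids = set()  # IDs of messages already sent via this list

    def send(self, message_id, text):
        # Second arrival of the same ID: stop here, which breaks the loop.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        for member in self.members:
            if isinstance(member, MailingList):
                member.send(message_id, text)
            else:
                member.append((message_id, text))
        return True


# Two lists that are members of each other, each with one ordinary user.
list_a, list_b = MailingList("a"), MailingList("b")
inbox_a, inbox_b = [], []
list_a.members = [list_b, inbox_a]
list_b.members = [list_a, inbox_b]

first = list_a.send("<1@example.org>", "hello")
second = list_a.send("<1@example.org>", "hello")
```

Without the seen_ids check, the two lists would forward the message
to each other indefinitely.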

     FROM: Craig.Everhart@CMU-CS-A.ARPA
     Why use the From: or Date: fields at all? The From: field is
     a popular candidate for editing by automatic agents; I'm not
     convinced that Mr. Palme's algorithm will remove all traces
     of that editing. The Date:-field algorithm was underspecified
     (year in century, as SMTP would have? What is the origin for
     months? Any use of time zone information?).

The goal, of course, is to find an algorithm with a very low
probability of giving the same ID to two different messages, but
also with a low probability of giving different IDs to the same
message because of some transformation of the message.

Using only the TEXT CONTENT is NOT acceptable. Suppose that, in a
voting application, two people wrote messages whose only content
was the word "Yes!". Using only the TEXT CONTENT would hide the
very important fact of who voted "Yes!". Leaving out the Date/time
is also not acceptable: if the same person voted "Yes!" on two
different issues, this fact would then be hidden.
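
To make the two requirements concrete, here is a hedged sketch of
a substitute ID computed from the sender, the date, and the text.
It is NOT the checksum algorithm proposed earlier in this
discussion: SHA-256 and the normalization rules (bare lowercased
address, date converted to UTC, whitespace collapsed) are
assumptions chosen for illustration, and substitute_id is a
hypothetical name:

```python
import hashlib
from datetime import timezone
from email.utils import parseaddr, parsedate_to_datetime


def substitute_id(from_field, date_field, text):
    """Derive an ID from sender, date, and text (illustrative normalization only)."""
    # Keep only the bare address, so an added display name does not change the ID.
    address = parseaddr(from_field)[1].lower()
    # Express the date in UTC, so a rewritten time zone does not change the ID.
    when = parsedate_to_datetime(date_field)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: treat zone-less dates as UTC
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    # Collapse whitespace, so re-wrapped lines do not change the ID.
    body = " ".join(text.split())
    digest = hashlib.sha256("\n".join([address, stamp, body]).encode("utf-8"))
    return digest.hexdigest()[:16]


vote_a = substitute_id("voter-a@example.org", "Sat, 9 Mar 1985 15:23:06 -0500", "Yes!")
vote_b = substitute_id("voter-b@example.org", "Sat, 9 Mar 1985 15:23:06 -0500", "Yes!")
vote_a_later = substitute_id("voter-a@example.org", "Sat, 9 Mar 1985 16:00:00 -0500", "Yes!")
vote_a_relayed = substitute_id('"Voter A" <Voter-A@example.org>',
                               "Sat, 9 Mar 1985 20:23:06 +0000", "Yes!\n")
```

Here vote_a and vote_b differ because the senders differ, vote_a
and vote_a_later differ because the times differ, and vote_a and
vote_a_relayed agree even though a relay added a display name and
rewrote the time zone.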

     FROM: Craig.Everhart@CMU-CS-A.ARPA
     It may be less expensive to avoid multiplication as a basis
     for the checksum on many small machines. Are there suitable
     algorithms based on bit rotations or shifts?

Most of the multiplications in my algorithm (all of those in
processing the body of the message) were by powers of 2 and
can thus be implemented by shifts.
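
The equivalence can be illustrated with a small sketch. The running
checksum below is a made-up example, not the algorithm discussed in
this thread; the function names, the multiplier 2**3, and the
16-bit width are assumptions:

```python
def checksum_multiply(data, k=3, bits=16):
    """Running checksum that multiplies by 2**k at each step."""
    mask = (1 << bits) - 1
    total = 0
    for byte in data:
        total = (total * (2 ** k) + byte) & mask
    return total


def checksum_shift(data, k=3, bits=16):
    """The same checksum, with the multiplication replaced by a left shift."""
    mask = (1 << bits) - 1
    total = 0
    for byte in data:
        total = ((total << k) + byte) & mask
    return total


sample = b"Yes!"
same = checksum_multiply(sample) == checksum_shift(sample)
```

Since multiplying an integer by 2**k and shifting it left by k bits
give the same result, the two functions agree on every input, and a
small machine only needs the shift.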