Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!amdahl!amdcad!rpw3
From: rpw3@amdcad.AMD.COM (Rob Warnock)
Newsgroups: comp.protocols.ibm
Subject: Re: (none)
Message-ID: <23005@amdcad.AMD.COM>
Date: 23 Sep 88 07:42:05 GMT
References: <8809221350.AA14186@jade.berkeley.edu>
Reply-To: rpw3@amdcad.UUCP (Rob Warnock)
Organization: [Consultant] San Mateo, CA
Lines: 59

In article <8809221350.AA14186@jade.berkeley.edu> "John A. Pershing Jr."
 writes:
+---------------
| No, you're not missing anything.  The reliability provided by the SNA DLC
| layer is carefully preserved by all higher layers, so that additional
| CRCs are probably redundant.  There is probably a tacit "assumption" that
| the various nodes are reliable -- that is, that they won't introduce any
| bit errors without detecting the fault (e.g., via a machine check).
+---------------

But actual disastrous experience in the ARPAnet (the infamous "black hole",
among others) showed the ARPAnauts that there *are* often nodes out there
that can introduce errors without raising a machine check. Even today, that's
still quite likely. Though the IBM PC family has parity-protected memory,
many of the communications or network boards that plug into it don't. The
same is sadly true for much more expensive environments. For performance
reasons, many embedded controller systems do not use parity on their data
memory. Many designers apparently feel that the parity problems were "solved"
when the DRAM alpha problem was diagnosed and fixed in the early 80's, or
they are using static RAMs which "aren't supposed to" have errors.

The TCP/IP/UDP checksum is very good at catching single-bit errors of the kind
that arise in packet switches, routers, and bridges without parity memory,
though not particularly good at communications-link long-burst errors. (That's
what the CRC-16 or CRC-32 is for!)
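
To make the strength and the weakness concrete, here is a minimal sketch of
the 16-bit one's-complement checksum in the style of RFC 1071. The helper
names (internet_checksum, flip_bit, swap_words) are made up for illustration
and are not taken from any real stack. It assumes big-endian 16-bit words and
zero-padding of odd-length data.

```python
def internet_checksum(data: bytes) -> int:
    """Complement of the 16-bit one's-complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 = MSB of byte 0)."""
    out = bytearray(data)
    out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)


def swap_words(data: bytes, i: int, j: int) -> bytes:
    """Return a copy of data with 16-bit words i and j exchanged."""
    out = bytearray(data)
    out[2 * i:2 * i + 2], out[2 * j:2 * j + 2] = data[2 * j:2 * j + 2], data[2 * i:2 * i + 2]
    return bytes(out)


packet = bytes.fromhex("0001f203f4f5f6f7")   # sample bytes used in RFC 1071
good = internet_checksum(packet)             # 0x220d

# Every possible single-bit flip changes the checksum.
all_flips_caught = all(
    internet_checksum(flip_bit(packet, b)) != good
    for b in range(len(packet) * 8)
)

# Two whole words trading places leave the sum, and so the checksum, unchanged.
swap_missed = internet_checksum(swap_words(packet, 0, 3)) == good
```

The sum is order-independent, which is why multi-word damage such as
reordered or offsetting words can slip through, while any lone flipped bit
always shifts the sum by a nonzero amount.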

+---------------
| As I remember (it's been a long time), TCP doesn't make many assumptions
| about the reliability of the lower layers; therefore, it needs some sort
| of checksum to provide reliable transport.  If, in fact, the lower layers
| *are* reliable then TCP probably doesn't need the checksum; however, a
| proper TCP implementation cannot make such an assumption.
+---------------

The one assumption TCP *does* make is that lower layers will not deliver
a corrupted datagram and claim it's correct. The TCP (and IP and UDP)
checksum(s) are a low-cost but extremely useful last-ditch protection
of this assumption, and are well worth it.

In fact, as 3rd-party vendors have entered the IBM SNA world, the assumption
in the IBM world that bit errors will cause either physical checksum errors
or CPU machine checks is probably becoming invalid, and SNA *should* have
an end-to-end checksum a la TCP. (Probably too late, though...)

By the way, "end-to-end" is the key. Having a network board compute your
IP checksum for you is no good if the data lies around in non-parity
memory *after* the checksum has been checked. (Though it's probably o.k.
if the checksum is computed *on the fly* as the data is being DMA'd to
host memory, but few [none?] of the so-called "smart" network boards do
it this way.)
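
A toy illustration of the point, under loud assumptions: seal() and verify()
are hypothetical helpers, the "segment" is just a payload with a 16-bit
checksum appended (not a real TCP or IP header layout), and a single XOR
stands in for a bit flipped in the board's non-parity buffer.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum over big-endian words, carries folded in."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def seal(payload: bytes) -> bytes:
    """Append a checksum to the payload (toy framing, padded to even length)."""
    if len(payload) % 2:
        payload += b"\x00"
    csum = ~ones_complement_sum(payload) & 0xFFFF
    return payload + csum.to_bytes(2, "big")


def verify(segment: bytes) -> bool:
    """A sealed segment is intact if everything sums to 0xFFFF."""
    return ones_complement_sum(segment) == 0xFFFF


# The segment arrives intact in the board's buffer, and the board checks it.
board_buffer = bytearray(seal(b"end-to-end check"))
board_says_ok = verify(bytes(board_buffer))

# While it waits in non-parity memory, one bit flips with no error raised.
board_buffer[3] ^= 0x10

# The host trusts the board's verdict unless it checks again at its own end.
host_copy = bytes(board_buffer)
host_says_ok = verify(host_copy)
```

Here board_says_ok is True and host_says_ok is False: only the check done at
the final destination of the data sees the damage.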


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403