Path: utzoo!attcan!uunet!husc6!think!ames!pasteur!ucbvax!NNSC.NSF.NET!craig
From: craig@NNSC.NSF.NET (Craig Partridge)
Newsgroups: comp.protocols.tcp-ip
Subject: re: Mail Delivery Problems
Message-ID: <8811281713.AA10600@ucbvax.Berkeley.EDU>
Date: 28 Nov 88 13:46:35 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 41


> It may be related to packet size. I say this because all the TCP connection
> handshake packets and the SMTP handshake are generally small, and the packets
> following the DATA command would be significantly larger (given a reasonably
> sized message). Also, I can FTP to any of the sites involved and can
> retrieve files without difficulty (anonymous ftp). However, attempts to send
> files, to the only known affected system that permits it, fail once the
> actual transfer of the file starts. Directory listings, cwd's, and ftp
> mode commands (i.e. the bin and ascii commands) all work.

> I am at a loss to explain this. I can't see why this would happen, given
> that a TCP connection is established successfully and some packets can get
> through. I don't think it is our systems here that are at fault as they are
> able to mail to many other Internet sites without problems. Further, we have
> a variety of systems here and all have the same problem with the affected
> hosts.

Bob:

    I've encountered this problem a number of times.

    The problem is probably IP fragmentation.  Before the DATA command you
are sending small datagrams; after it you send large ones that get
fragmented, and if some of the fragments never get through, the whole
datagram is lost (see the 'always fragments' case in Kent and Mogul's
SIGCOMM '87 paper "Fragmentation Considered Harmful").  Do the MTUs of
the various networks you traverse differ?

    Another possibility is that you are running over a slow link, where
the time to send a datagram grows sharply with its size.  If your TCP has
a poor RTT estimator, your connection can, in fact, fail on the first large
datagram (whose RTT is a factor of 200 larger than the RTT for small
datagrams).  Any slow links (< 9.6 Kbit/s) in your path?  Do you have the
Jacobson RTT code in your TCP?
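
    Here is a rough Python sketch of why the estimator matters, assuming
floating-point seconds rather than the scaled integer arithmetic real TCPs
use.  `Rfc793Rtt` follows the smoothed-RTT timer from RFC 793 (ALPHA = 0.9
and BETA = 2 picked from the ranges the RFC suggests, upper and lower
bounds omitted); `JacobsonRtt` tracks both the mean and the mean deviation
with gains 1/8 and 1/4 and sets RTO = SRTT + 4 * RTTVAR, in the form
specified in RFC 6298.  The 0.1 s and 2.0 s samples are hypothetical, not
measurements, and neither sketch models retransmission backoff or Karn's
algorithm.

```python
class Rfc793Rtt:
    """RFC 793-style timer: RTO is a fixed multiple of a smoothed mean RTT.

    ALPHA and BETA are assumed values from the RFC's suggested ranges;
    the LBOUND/UBOUND clamps are omitted.
    """
    ALPHA = 0.9
    BETA = 2.0

    def __init__(self, first_sample: float):
        self.srtt = first_sample

    def update(self, sample: float) -> None:
        self.srtt = self.ALPHA * self.srtt + (1 - self.ALPHA) * sample

    def rto(self) -> float:
        return self.BETA * self.srtt


class JacobsonRtt:
    """Mean/deviation estimator: gains 1/8 and 1/4, RTO = SRTT + 4 * RTTVAR."""

    def __init__(self, first_sample: float):
        self.srtt = first_sample
        self.rttvar = first_sample / 2

    def update(self, sample: float) -> None:
        err = sample - self.srtt          # error against the old mean
        self.srtt += err / 8
        self.rttvar += (abs(err) - self.rttvar) / 4

    def rto(self) -> float:
        return self.srtt + 4 * self.rttvar


def rto_after_jump(estimator_cls, small: float, large: float, n_small: int = 10) -> float:
    """Feed n_small small-datagram RTT samples, then one large one; return the RTO."""
    est = estimator_cls(small)
    for _ in range(n_small - 1):
        est.update(small)
    est.update(large)
    return est.rto()


if __name__ == "__main__":
    print(rto_after_jump(Rfc793Rtt, 0.1, 2.0))
    print(rto_after_jump(JacobsonRtt, 0.1, 2.0))
```

With ten 0.1 s samples followed by one 2.0 s sample, the RFC 793 timer
comes out around 0.58 s, still well under the new 2.0 s round trip, while
the Jacobson timer comes out around 2.25 s.  The deviation term lets the
Jacobson timer cover the larger RTT after a single sample; the RFC 793
timer would need several more 2.0 s samples before its RTO exceeded 2.0 s.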

    A final possibility I've seen is that various gateways, at various
times in their software life cycles, have had bugs that caused them to
spindle or discard datagrams over a certain size.  (Two typical problems:
refusing to fragment, and corrupting a data buffer chain.)  To my knowledge,
all such bugs are now gone, but if you can't find a fragmentation problem,
look for a busted gateway.
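
    If you suspect a gateway that discards datagrams above some size, one
way to locate the threshold is to bisect on datagram size.  The sketch
below is a hypothetical helper, not an existing tool: `delivered` stands
for whatever probe you have (for example, an ICMP echo of that size), and
the search assumes failures are deterministic and monotone in size.  Real
probes also lose packets for unrelated reasons, so in practice you would
retry each size a few times before believing a failure.

```python
from typing import Callable


def largest_delivered_size(delivered: Callable[[int], bool], lo: int, hi: int) -> int:
    """Binary-search the largest datagram size in [lo, hi] that gets through.

    `delivered` is a caller-supplied (hypothetical) probe.  Assumes that if a
    size fails, every larger size fails too, the "discards datagrams over a
    certain size" pattern.
    """
    if not delivered(lo):
        raise ValueError("even the smallest size fails; not a size-threshold problem")
    # Invariant: lo is known to get through; the answer lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if delivered(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    # Simulated path that silently drops anything over 1006 bytes.
    print(largest_delivered_size(lambda n: n <= 1006, 28, 1500))
```

Against that simulated path the search returns 1006; on a real network the
size it reports is where to start looking for the misbehaving gateway.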

Craig