Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ISI.EDU!braden
From: braden@ISI.EDU.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: 4.2/4.3 TCP and long RTTs
Message-ID: <8612081657.AA00512@braden.isi.edu>
Date: Mon, 8-Dec-86 11:57:21 EST
Article-I.D.: braden.8612081657.AA00512
Posted: Mon Dec 8 11:57:21 1986
Date-Received: Mon, 8-Dec-86 20:16:40 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 38
Approved: tcp-ip@sri-nic.arpa

Craig:

The amount of your hair pulling must be small compared to the time integral of hair pulled by our UCL friends over the years. Quite simply, their conclusion was that most TCP implementations have design problems that make them behave poorly over paths which have very long delay and moderate to high loss. SATNET sometimes (often?) exhibits that behaviour. Recently, the ARPANET+core_gateway system has also exhibited that behaviour, and many TCP's have not been up to it... lots of broken connections, etc.

I suggest that the cause of this situation is a performance/robustness tradeoff inherent to TCP implementations. Most of the currently-available TCPs have been implemented and tested in an LAN environment, to provide optimal performance in a low-delay, low-error situation.

On the other hand, when we wrote the original experimental implementations of TCP, we found the little beasties to be amazingly robust; they would tenaciously hold on for minutes (or hours!) retransmitting until a path came back, and would get the data through in spite of terrible bugs. But we were writing and testing them for equally-experimental gateway implementations and frequently testing to UCL, and did not demand high throughput or low delay.

It would certainly be interesting to understand exactly how these TCPs have failed.
I suspect it is a combination of a Zhang-catastrophe (RTT measurement diverging towards infinity due to high loss rate) with an implementation-imposed upper bound on retransmission time before the connection breaks. On the other hand, the answer may be that selective retransmission is really absolutely essential to deal with the long-delay, lossy situation.

I would like to get someone interested in running some experiments on this (maybe you just did??) Would it be possible for you to disable just the selective retransmission feature of RDP and try again?

Bob Braden
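
The failure mode suspected above (an RTT estimate that diverges under loss, combined with a fixed limit on how long a connection keeps retransmitting) can be sketched roughly as follows. This is only an illustration under assumptions that are not in the message: the estimator is the RFC 793 style smoothed RTT (SRTT = alpha*SRTT + (1-alpha)*sample, RTO = beta*SRTT), each RTT sample is timed from the first transmission of a segment so that time spent waiting on retransmissions is counted as round-trip time, there is no timer backoff, and the 60-second give-up limit and the function name `simulate` are hypothetical. It does not claim to reproduce what any particular 4.2/4.3 TCP actually does.

```python
def simulate(losses, true_rtt=1.0, alpha=0.875, beta=2.0, give_up_after=60.0):
    """Feed a naive smoothed-RTT estimator with samples inflated by loss.

    losses[k] is how many times segment k is lost before a copy gets through.
    Returns (history of SRTT values, whether the connection was abandoned).
    All parameter values are illustrative assumptions, not measured ones.
    """
    srtt = true_rtt                      # start with a correct estimate
    history = []
    for lost in losses:
        rto = beta * srtt                # retransmission timeout for this segment
        waited = lost * rto              # no backoff: each loss costs one full RTO
        if waited > give_up_after:       # assumed implementation-imposed bound
            return history, True         # connection breaks
        # Naive measurement: timed from the FIRST transmission, so the
        # retransmission wait is folded into the "round-trip time".
        sample = waited + true_rtt
        srtt = alpha * srtt + (1 - alpha) * sample
        history.append(srtt)
    return history, False


if __name__ == "__main__":
    for label, pattern in [("no loss", [0] * 40),
                           ("every segment lost once", [1] * 40)]:
        hist, broken = simulate(pattern)
        print(label, "final SRTT:", round(hist[-1], 2), "broken:", broken)
```

With these assumed constants, a segment that is lost once on every send multiplies the estimate by alpha + (1-alpha)*beta = 1.125 per sample, so SRTT and the RTO grow geometrically until the give-up limit trips, even though the true RTT never changes. With no loss the estimate stays at the true RTT. Real paths lose segments randomly rather than on a fixed pattern, so this shows the direction of the effect, not its size.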