Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!LOKI.BBN.COM!craig
From: craig@LOKI.BBN.COM.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: TCP RTT woes revisited
Message-ID: <8612142045.AA25629@ucbvax.Berkeley.EDU>
Date: Sun, 14-Dec-86 15:46:14 EST
Article-I.D.: ucbvax.8612142045.AA25629
Posted: Sun Dec 14 15:46:14 1986
Date-Received: Tue, 16-Dec-86 01:44:51 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 73
Approved: tcp-ip@sri-nic.arpa

This weekend I had time to start processing Van Jacobson's suggested
fixes/modifications. Things started working very well after the first
fix, which made TCP choose better fragment sizes and increased the time
to live for IP fragments. The subsequent testing also revealed some
interesting results. (These are preliminary and subject to reappraisal.)

(1) EACKs appear to make a huge difference in use of the network. After
seeing signs this was the case, I ran the simple test of pushing 50,000
data packets through a software loopback that dropped 4% of the packets.
With EACKs there were 1,930 retransmissions, of which 1 received packet
was a duplicate (note that some of the retransmissions were also
dropped). Without EACKs there were 12,462 retransmissions, of which
9,344 received packets were duplicates. 12,462 retransmissions is, of
course, bad news, and comes from the fact that this RDP sends up to
four packets in parallel. Typically the four get put into the send
queue in the same tick of the timer, so when the first gets
retransmitted, all four do. The moral seems to be: use EACKs even
though they aren't required for a conforming implementation.

(2) Lixia Zhang's suggestion that one use the RTT of the SYN to compute
the initial timeout estimate appears to work very well.

(3) EACKs may make it possible to all but stomp out RTT feedback (those
unfortunate cases where a dropped packet leads to an RTT of
((number of retries) * SRTT) + SRTT being used to compute a new SRTT).
I've been experimenting with discarding RTTs for out-of-order acks.
This is best explained by example. If packets 1, 2, 3, and 4 are sent,
and the first ack is an EACK for 3, the implementation uses the RTT for
3 to recompute the SRTT, but will discard the RTTs for 1 and 2 when
they are eventually acked (or EACKed). The argument in favor of this
scheme is that the acks for 1 and 2 probably represent either (a) RTTs
for packets that were dropped, and thus including them would lead to
feedback, or (b) RTTs that reflect an earlier (and slower) state of the
network (3 was sent after 1 and 2), and using them would make the SRTT
a poorer prediction of the RTT of the next packet. Note that (b) would
be more convincing if it weren't the case that 1, 2, 3, and 4 were
probably sent within a few milliseconds of each other.

Watching 5 trial runs of 100 64-byte data packets bounced off
Goonhilly, this algorithm kept the SRTT within the observed range of
real RTTs (as opposed to RTTs for packets that were dropped and had to
be retransmitted). Using EACKs but taking the RTT for every packet
(again doing 5 trial runs), several cases of RTT feedback were seen. In
one case the SRTT soared to ~35 seconds when a few packets were dropped
in a short period. Since the implementation uses Mills' suggested
changes, which make lowering the SRTT take longer than raising it, the
SRTT took some time to recover.
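To make the scheme concrete, here is a minimal sketch in C. The names,
structure, and gain constants are illustrative assumptions rather than
the actual RDP code; it shows the SYN-based initialization from (2),
the out-of-order discard rule from (3), and an asymmetric smoothing
step in the spirit of Mills' changes.

    /*
     * Sketch only -- names and gains are made up, not the RDP code.
     */
    #include <stdio.h>

    #define GAIN_UP   0.5   /* weight on old SRTT when raising it
                               (small, so the SRTT rises quickly)   */
    #define GAIN_DOWN 0.875 /* weight on old SRTT when lowering it
                               (large, so the SRTT falls slowly)    */

    struct conn {
        double srtt;            /* smoothed round-trip time, seconds */
        long   max_seq_sampled; /* highest seq whose RTT was used    */
    };

    /* Initialize the estimate from the measured RTT of the SYN. */
    void rtt_init(struct conn *c, double syn_rtt)
    {
        c->srtt = syn_rtt;
        c->max_seq_sampled = -1;
    }

    /*
     * Called once per ack or EACK: "seq" is the packet acknowledged
     * and "sample" its measured RTT.  A sample is used only if no
     * later packet has already been acked; otherwise it probably
     * reflects a dropped packet or a staler view of the network,
     * so it is discarded.
     */
    void rtt_sample(struct conn *c, long seq, double sample)
    {
        if (seq <= c->max_seq_sampled)
            return;             /* out-of-order ack: discard sample */
        c->max_seq_sampled = seq;

        if (sample > c->srtt)   /* raise quickly, lower slowly */
            c->srtt = GAIN_UP * c->srtt + (1.0 - GAIN_UP) * sample;
        else
            c->srtt = GAIN_DOWN * c->srtt + (1.0 - GAIN_DOWN) * sample;
    }

    int main(void)
    {
        struct conn c;

        rtt_init(&c, 1.2);      /* SYN exchange took 1.2 seconds    */
        rtt_sample(&c, 3, 1.0); /* EACK for packet 3 arrives first  */
        rtt_sample(&c, 1, 4.1); /* late acks for 1 and 2: discarded */
        rtt_sample(&c, 2, 3.8);
        printf("SRTT = %.3f\n", c.srtt);
        return 0;
    }

The single high-water mark (max_seq_sampled) is what enforces the
rule: once the EACK for 3 has been used, the late acks for 1 and 2
contribute nothing to the SRTT, so their inflated RTTs can't feed back
into the estimate.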
People may be wondering about observed throughput. How fast does RDP
run vis-a-vis TCP? That turns out to be very difficult to answer.
Identical tests run in parallel or one right after another give
throughput rates that vary by factors of 2 or more. As a result it is
difficult to get throughput numbers that demonstrably show differences
which reflect more than random variation. After running tests for 7
weekends (and millions of packets) I have some theories, but those keep
changing as different tests are run.

Craig

P.S. Those millions of packets are almost all over a software loopback.
The contribution to network congestion has been small.