Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site ucbvax.ARPA
Path: utzoo!linus!philabs!prls!amdimage!amdcad!decwrl!ucbvax!daemon
From: tcp-ip@ucbvax.ARPA
Newsgroups: fa.tcp-ip
Subject: Re: Interactive traffic punishment via MILNET?
Message-ID: <9794@ucbvax.ARPA>
Date: Fri, 9-Aug-85 12:50:14 EDT
Article-I.D.: ucbvax.9794
Posted: Fri Aug 9 12:50:14 1985
Date-Received: Mon, 12-Aug-85 21:09:58 EDT
Sender: daemon@ucbvax.ARPA
Organization: University of California at Berkeley
Lines: 108

From: mgardner@BBNCC5.ARPA

Lixia,

It is not easy to give a brief answer to your question of what exactly the problems with the mailbridges are, but I will do my best.

Gateways are inherently bottlenecks to traffic between two networks. For example, ARPANET and MILNET are reliable networks, but their traffic is funneled through gateways designed to drop data whenever pressed for space. Retransmission at the link level is fast, because the retransmission timer triggers a retransmission fairly quickly. The retransmission timers at the transport layer must be slower, and so retransmission by TCP will affect what the user sees. The interactive user is, of course, most likely to notice. Speeding up this timer, by the way, is not a good solution, since the effect is increased congestion and poorer service for everyone. (More dropped datagrams, more retransmitted datagrams.)

Another reason that the internet will never function as well as a subnet is that the gateways link heterogeneous systems. If one side is sending much faster than the other side is receiving, the gateways are designed to drop datagrams. This problem is exacerbated by the current lack of buffer space in the LSI/11s, by the lack of an effective means of slowing down a source, and by a rudimentary routing metric that does not allow routing to respond to bursts in traffic.

The mailbridges are a worse bottleneck than other gateways for several good reasons.
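The drop-when-pressed behavior described above can be pictured as a toy model. The buffer count and the datagram representation below are invented for illustration; they are not taken from the real LSI/11 gateways:

```python
from collections import deque

class Gateway:
    """Toy model of a gateway with a small, fixed buffer pool.

    BUFFER_SLOTS is a made-up number; the real gateways' buffer
    counts are not given in the post.
    """
    BUFFER_SLOTS = 8

    def __init__(self):
        self.queue = deque()
        self.dropped = 0

    def receive(self, datagram):
        # When pressed for space, the gateway simply drops the datagram;
        # recovery is left to TCP's (slower) retransmission timer at the
        # endpoints, which is what the interactive user notices.
        if len(self.queue) >= self.BUFFER_SLOTS:
            self.dropped += 1
            return False
        self.queue.append(datagram)
        return True

    def forward(self):
        return self.queue.popleft() if self.queue else None

gw = Gateway()
sent = [gw.receive(n) for n in range(12)]  # fast sender, slow receiver
print(sum(sent), gw.dropped)               # 8 accepted, 4 dropped
```

When the sender outruns the receiver, the queue fills and everything beyond the buffer pool is silently discarded, which is why adding buffer space to the LSI/11s helps but cannot eliminate the problem.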
First, they were placed with the idea that the traffic between them would be filtered for mail. We expected a reduction in traffic. On the contrary, since the physical split of ARPANET and MILNET, there has been a sharp rise in the amount of traffic between the two networks. The bridges are overloaded. In addition, there are a number of hosts which send almost all their traffic to the other net. These hosts may be on the wrong network.

A third problem for the mailbridges is load-sharing. It is important that the traffic between the two networks be spread among the different mailbridges. This is the function of the load-sharing tables. But this is static routing, based on expected traffic. Since the destination is not known, the routing most likely to provide good service is to home a host to its nearest mailbridge. However, when the host has a one- or two-hop path on one side of the mailbridge and a five- or six-hop path on the other side, the mailbridge will see speed-mismatch problems, similar to those associated with mismatched network speeds. The solution is not to ignore the load-sharing, since everyone sending to the same bridge will create even worse problems.

These are the problems we see in a perfect world where hardware and software problems have been banished. Unfortunately, we live in the real world. The software and hardware problems themselves can be in the hosts, the lines, or the network. They are usually hard to diagnose, since the symptom of the problem, for example congestion, may be physically remote from the source of the problem. It is often not even clear where in the chain the problem lies. For example, is congestion at an ISI IMP caused by the mailbridge, by ARPANET congestion around ISI, by back-up from a local net, by ARPANET congestion remote from ISI, by a host at another IMP, or by still another factor?

I look at mailbridge statistics every day. I see, almost daily, the effects of host problems.
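(As an aside on the load-sharing point above: the static tables can be pictured as a fixed mapping, chosen in advance from expected traffic. All names and assignments below are invented for illustration.)

```python
# Hypothetical static load-sharing table: each source is homed to one
# mailbridge in advance. Nothing here adapts to measured load or to
# bursts in traffic, which is the weakness described in the post.
LOAD_SHARING_TABLE = {
    "ISI-IMP": "MB-WEST",
    "MIT-IMP": "MB-EAST",
    "BBN-IMP": "MB-EAST",
}

def bridge_for(source):
    """Return the statically assigned mailbridge for a source.

    The default is arbitrary for the sketch; a real table would
    cover every source.
    """
    return LOAD_SHARING_TABLE.get(source, "MB-EAST")

print(bridge_for("ISI-IMP"))  # MB-WEST, whatever the current load
```

Because the assignment is fixed, a burst from sources homed to one bridge lands on that bridge no matter how idle the others happen to be.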
Although these problems are most often caught by the host administrators, and, if not, are tracked by our monitoring center, let me list a few of the problems that I followed personally. I have seen a run-away ethernet bring MILISI to its knees, a gateway with a routing plug cause congestion felt by a host on the other side of the network, and three cases of hosts flooding the network with faulty IP datagrams. The internet is pathetically vulnerable to congestion caused by a single host.

At BBN we have a number of tools to monitor the long-range performance of the internet. The gateways send messages, called traps, any time an event of interest occurs. We summarize these on a daily basis, and keep the detailed trap reports on hand for use when we see a problem. The gateways store throughput information, including how many datagrams were processed by each gateway, summarized for the gateway and separated by interface or neighbor. Throughput reports give us detailed information, such as how many datagrams are dropped (discarded) by the gateway, broken down by reason, and the number of datagrams sent back out the same interface they used on arrival. We can also collect statistics on the number of datagrams between each source and destination host. In addition, we can measure a wide range of parameters in ARPANET or MILNET, including detailed throughput statistics and statistics about the end-to-end traffic and the store-and-forward traffic.

But even with all these tools (and others) at our disposal, we are stopped at the host. There we find TCP/IP implementations written by many different people and containing subtle differences in interpretation that could lead to major problems.

Given this range of sources for the problems, what can we, at BBN, do to improve the situation? Keep in mind that we affect the mailbridges, the IMPs, and, since we monitor the lines, the line quality, but we can only open a discussion concerning host problems.
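A daily trap summary of the kind mentioned above might be tallied roughly along these lines. The record format and field names are invented for the sketch, since the post does not describe the real trap format:

```python
from collections import Counter

# Invented trap records; the real gateways' trap format is not
# described in the post, so these fields are illustrative only.
traps = [
    {"gateway": "mb1", "event": "drop", "reason": "no-buffer"},
    {"gateway": "mb1", "event": "drop", "reason": "no-buffer"},
    {"gateway": "mb2", "event": "drop", "reason": "dest-unreachable"},
    {"gateway": "mb1", "event": "restart", "reason": None},
]

def daily_summary(records):
    """Tally dropped datagrams per (gateway, reason), in the spirit of
    the 'broken down by reason' throughput reports."""
    return Counter(
        (r["gateway"], r["reason"])
        for r in records
        if r["event"] == "drop"
    )

summary = daily_summary(traps)
print(summary[("mb1", "no-buffer")])  # 2
```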
Analysis of the host-to-mailbridge traffic data has revealed that there are a number of hosts (including TACs) sending most of their traffic to the other net. Some of this traffic can be moved off the internet, reducing the load, by the addition of TACs and the rehoming of hosts. We are considering adding a mailbridge. Software to increase the number of buffers in the LSI/11 gateways has already been written. We are investigating ways to reduce the control traffic, which should also reduce the load on the mailbridges. We have increased our attention to host problems and are notifying the host administrator when we see problems.

We are also considering writing guidelines for optimizing communication with ARPANET/MILNET. These would include appropriate settings for retransmission timers and sending rates. They should also include guidelines for reasonable responses to source quenches, those largely ignored messages sent by a gateway to a host which is sending data too fast.

I hope this answers your question and will open up some interesting discussion on this mailing list.

Marianne
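One possible shape for a "reasonable response" to source quench, sketched with invented constants: halve the sending rate on each quench and creep back up between quenches. The halving factor and the recovery step are illustrative choices, not recommendations from the post.

```python
class QuenchAwareSender:
    """Toy sender that slows down when a gateway complains.

    Units and constants are invented: 'rate' is datagrams per tick;
    halving on quench and +1 on a quiet tick are arbitrary choices
    for the sketch.
    """

    def __init__(self, rate=16):
        self.rate = rate

    def on_source_quench(self):
        # The gateway says we are sending too fast: cut back sharply
        # instead of ignoring the message, as most hosts do today.
        self.rate = max(1, self.rate // 2)

    def on_quiet_tick(self):
        # No complaints lately: recover gradually rather than all at once.
        self.rate += 1

s = QuenchAwareSender()
s.on_source_quench()
s.on_source_quench()
print(s.rate)  # 16 -> 8 -> 4
```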