Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site ucbvax.ARPA
Path: utzoo!linus!philabs!prls!amdimage!amdcad!decwrl!ucbvax!daemon
From: tcp-ip@ucbvax.ARPA
Newsgroups: fa.tcp-ip
Subject: Re: Interactive traffic punishment via MILNET?
Message-ID: <9794@ucbvax.ARPA>
Date: Fri, 9-Aug-85 12:50:14 EDT
Article-I.D.: ucbvax.9794
Posted: Fri Aug  9 12:50:14 1985
Date-Received: Mon, 12-Aug-85 21:09:58 EDT
Sender: daemon@ucbvax.ARPA
Organization: University of California at Berkeley
Lines: 108

From: mgardner@BBNCC5.ARPA


Lixia,

It is not easy to give a brief answer to your question of exactly what the
problems with the mailbridges are, but I will do my best.

Gateways are inherently bottlenecks to traffic between two networks.  For
example, ARPANET and MILNET are reliable networks, but their traffic is
funneled through gateways designed to drop data whenever pressed for space.
Retransmission at the link level is fast, because its retransmission timer
fires quickly.  The retransmission timers at the transport layer must be
slower, and so retransmission by TCP will affect what the user sees.  The
interactive user is, of course, the most likely to notice.
Speeding up this timer, by the way, is not a good solution, since the effect is
increased congestion and poorer service for everyone. (More dropped datagrams,
more retransmitted datagrams.)
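
For those not familiar with how the transport timer is set, the sketch below
shows the smoothed round-trip time and retransmission timeout calculation
suggested in the TCP specification (RFC 793).  The constants are examples from
the spec's suggested ranges, not anything measured on the mailbridges; a host
that shrinks BETA retransmits sooner and simply pushes more duplicates into an
already loaded gateway.

    #include <stdio.h>

    /* Smoothed round-trip time and retransmission timeout, per the
     * formulas in RFC 793.  ALPHA and BETA are taken from the spec's
     * suggested ranges; the initial SRTT is an arbitrary guess. */
    #define ALPHA  0.9
    #define BETA   2.0
    #define LBOUND 1.0          /* seconds */
    #define UBOUND 60.0         /* seconds */

    static double srtt = 3.0;   /* arbitrary initial estimate, seconds */

    double update_rto(double measured_rtt)
    {
        double rto;

        srtt = ALPHA * srtt + (1.0 - ALPHA) * measured_rtt;
        rto = BETA * srtt;
        if (rto < LBOUND) rto = LBOUND;
        if (rto > UBOUND) rto = UBOUND;
        return rto;
    }

    int main(void)
    {
        /* Round-trip samples through an increasingly loaded gateway. */
        double samples[] = { 0.5, 2.0, 6.0, 6.0, 1.0 };
        int i;

        for (i = 0; i < 5; i++)
            printf("rtt = %.1fs  rto = %.1fs\n",
                   samples[i], update_rto(samples[i]));
        return 0;
    }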

Another reason that the internet will never function as well as a subnet is
that the gateways link heterogeneous systems.  If one side is sending much
faster than the other side is receiving, the gateways are designed to drop
datagrams.  This problem is exacerbated by the current lack of buffer space in
the LSI/11s, by the lack of an effective means of slowing down a source, and by
a rudimentary routing metric that does not allow routing to respond to bursts
in traffic.
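
To make the speed mismatch concrete, here is a toy illustration, not the
actual gateway code, of what a gateway with a fixed pool of buffers is forced
to do when the fast side offers datagrams faster than the slow side drains
them.  The pool size and the rates are invented.

    #include <stdio.h>

    #define NBUF 8              /* invented pool size, for illustration */

    static int  in_use = 0;
    static long forwarded = 0, dropped = 0;

    /* A datagram arrives from the fast side: queue it or discard it. */
    void arrival(void)
    {
        if (in_use < NBUF)
            in_use++;
        else
            dropped++;          /* no buffer free: the datagram is lost */
    }

    /* The slow side drains one buffered datagram. */
    void departure(void)
    {
        if (in_use > 0) {
            in_use--;
            forwarded++;
        }
    }

    int main(void)
    {
        int t;

        /* Fast side offers three datagrams per tick, slow side takes one. */
        for (t = 0; t < 100; t++) {
            arrival(); arrival(); arrival();
            departure();
        }
        printf("forwarded %ld, dropped %ld\n", forwarded, dropped);
        return 0;
    }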

The mailbridges are a worse bottleneck than other gateways for several good
reasons.  First, they were placed with the idea that the traffic between them
would be filtered for mail.  We expected a reduction in traffic.  On the
contrary, since the physical split of ARPANET and MILNET, there has been a
sharp rise in the amount of traffic between the two networks.  The bridges are
overloaded.  In addition, there are a number of hosts which send almost all
their traffic to the other net.  These hosts may be on the wrong network.  A
third problem for the mailbridges is load-sharing.  It is important that the
traffic between the two networks be spread among the different mailbridges.
This is the function of the load-sharing tables.  But this is static routing,
based on expected traffic.  Since the destination is not known in advance, the
assignment most likely to provide good service is to home each host to its
nearest mailbridge.  However, when a host has a one- or two-hop path on one
side of the mailbridge and a five- or six-hop path on the other side, the
mailbridge will see speed-mismatch problems similar to those associated with
mismatched network speeds.  The solution is not to ignore the load-sharing,
since everyone sending to the same bridge would create even worse problems.
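
For those who have not seen it, the load-sharing amounts to a static table
lookup.  The sketch below is only an illustration; the host addresses and the
mailbridge names are invented, and the real tables are maintained here at BBN.

    #include <stdio.h>
    #include <string.h>

    /* Static load-sharing: each host is homed in advance to one
     * mailbridge, based on expected traffic.  Every entry below is
     * invented for illustration. */
    struct homing {
        const char *host;        /* source host address (net 10 = ARPANET) */
        const char *mailbridge;  /* mailbridge it is homed to */
    };

    static struct homing table[] = {
        { "10.2.0.51", "mailbridge-1" },
        { "10.0.0.27", "mailbridge-2" },
        { "10.1.0.11", "mailbridge-3" },
    };

    const char *home_of(const char *host)
    {
        int i;

        for (i = 0; i < (int)(sizeof table / sizeof table[0]); i++)
            if (strcmp(table[i].host, host) == 0)
                return table[i].mailbridge;
        return "mailbridge-1";   /* arbitrary default for the sketch */
    }

    int main(void)
    {
        printf("10.0.0.27 is homed to %s\n", home_of("10.0.0.27"));
        return 0;
    }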

These are the problems we see in a perfect world where hardware and software
problems have been banished.  Unfortunately, we live in the real world.  The
software and hardware problems themselves can be in the hosts, the lines, or
the network.  They are usually hard to diagnose, since the symptom, for
example congestion, may be physically remote from the source of the problem.
It is often not even clear where in the chain the problem lies.
For example, is congestion at an ISI IMP caused by the mailbridge, by ARPANET
congestion around ISI, by back-up from a local net, by ARPANET congestion
remote from ISI, by a host at another IMP, or by still another factor?

I look at mailbridge statistics every day.  I see, almost daily, the effects of
host problems.  These problems are most often caught by the host
administrators and, if not, are tracked by our monitoring center, but let me
list a few that I followed personally.  I have seen a runaway ethernet bring
MILISI to its knees, a gateway with a routing plug cause
congestion felt by a host on the other side of the network, and three cases of
hosts flooding the network with faulty IP datagrams.  The internet is
pathetically vulnerable to congestion caused by a single host.

At BBN we have a number of tools to monitor the long-range performance of
the internet.  The gateways send messages, called traps, any time an event of
interest occurs.  We summarize these on a daily basis, and keep the detailed
trap reports on hand for use when we see a problem.  The gateways store
throughput information, including how many datagrams each gateway processed,
both summarized for the gateway as a whole and separated by interface or
neighbor.
Throughput reports give us detailed information, such as how many datagrams are
dropped (discarded) by the gateway, broken down by reason, and the number of
datagrams sent back out the same interface they arrived on.  We can also
collect statistics on the number of datagrams exchanged between each source
and destination host.  In addition, we can measure a wide range of parameters
in ARPANET or MILNET, including detailed throughput statistics and statistics
about both the end-to-end and the store-and-forward traffic.
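
To give a rough picture of what the throughput reports hold, the counters look
something like the structure below, kept per interface or neighbor.  The field
names, drop reasons, and numbers are invented for this sketch; they are not
the actual report format.

    #include <stdio.h>

    /* Per-interface throughput counters, roughly the kind of thing the
     * gateways keep.  Names and values are invented for illustration. */
    enum drop_reason { DROP_NO_BUFFER, DROP_NO_ROUTE, DROP_BAD_HEADER, NDROP };

    struct if_stats {
        const char *name;          /* interface or neighbor */
        long received;
        long forwarded;
        long sent_back_same_if;    /* datagrams returned out the arrival interface */
        long dropped[NDROP];       /* discards, broken down by reason */
    };

    void report(const struct if_stats *s)
    {
        printf("%s: recv %ld  fwd %ld  same-if %ld  "
               "drops: nobuf %ld  noroute %ld  badhdr %ld\n",
               s->name, s->received, s->forwarded, s->sent_back_same_if,
               s->dropped[DROP_NO_BUFFER], s->dropped[DROP_NO_ROUTE],
               s->dropped[DROP_BAD_HEADER]);
    }

    int main(void)
    {
        struct if_stats example =
            { "ARPANET side", 120345L, 118200L, 310L, { 1500L, 200L, 135L } };

        report(&example);
        return 0;
    }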

But even with all these tools (and others) at our disposal, we are stopped at
the host.  There we find TCP/IP implementations written by many different
people and containing subtle differences in interpretation that could lead to
major problems.

Given this range of sources for the problems, what can we, at BBN, do to
improve the situation?  Keep in mind that we can affect the mailbridges, the
IMPs, and, since we monitor the lines, the line quality, but we can only open
a discussion about host problems.

Analysis of the host-to-mailbridge traffic data has revealed that there are a
number of hosts (including TACs) sending most of their traffic to the other
net.  Some of this traffic can be moved off the internet, reducing the load,
by adding TACs and rehoming hosts.  We are considering adding a
mailbridge.  Software to increase the number of buffers in the LSI/11 gateways
has already been written.  We are investigating ways to reduce the control
traffic, which should also reduce the load on the mailbridges.  We have
increased our attention to host problems and are notifying the host
administrator when we see problems.  We are also considering writing guidelines
for optimizing communication with ARPANET/MILNET.  This would include
appropriate settings for retransmission timers and sending rates.  It should
also include guidelines for reasonable responses to source quenches, those
largely ignored messages sent by the gateway to a host which is sending data
too fast.
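
As an example of the kind of response we have in mind, a host might do
something as simple as the sketch below: halve its sending rate when a quench
arrives and creep back up as traffic is acknowledged.  The policy and the
constants are illustrative only, not a standard.

    #include <stdio.h>

    /* An illustrative response to ICMP source quench: halve the sending
     * rate on a quench, recover slowly on acknowledgments.  The policy
     * and the constants are examples only. */
    #define MIN_RATE 0.5             /* datagrams per second */
    #define MAX_RATE 8.0

    static double send_rate = MAX_RATE;

    void on_source_quench(void)
    {
        send_rate /= 2.0;
        if (send_rate < MIN_RATE)
            send_rate = MIN_RATE;
    }

    void on_ack(void)
    {
        send_rate += 0.1;            /* slow linear recovery */
        if (send_rate > MAX_RATE)
            send_rate = MAX_RATE;
    }

    int main(void)
    {
        int i;

        on_source_quench();          /* gateway says: slow down */
        for (i = 0; i < 5; i++)
            on_ack();                /* recover a little per acknowledgment */
        printf("current rate: %.1f datagrams/sec\n", send_rate);
        return 0;
    }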

I hope this answers your question and will open up some interesting discussion
on this mailing list.  

Marianne