Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!GLACIER.STANFORD.EDU!jbn
From: jbn@GLACIER.STANFORD.EDU.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Maintaining Statistics for TCP/IP Implementations
Message-ID: <8612201420.AA14560@ucbvax.Berkeley.EDU>
Date: Sat, 20-Dec-86 00:23:40 EST
Article-I.D.: ucbvax.8612201420.AA14560
Posted: Sat Dec 20 00:23:40 1986
Date-Received: Sat, 20-Dec-86 11:35:19 EST
References: <8612181148.AA02736@ucbvax.Berkeley.EDU>
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: glacier!jbn (John B. Nagle)
Organization: Stanford University, IC Laboratory
Lines: 66
Keywords: TCP statistics instrumentation logging
Approved: tcp-ip@sri-nic.arpa

Much of what I learned about congestion in the Internet I learned by instrumenting a TCP implementation. The information that you need is not necessarily the information that a typical implementation keeps. Yet as it turns out, collecting this information is quite inexpensive. Management of the exceptional cases is the crucial issue.

During the life of a TCP connection, it is useful to maintain some event counts, and at the conclusion of the connection, it is useful to generate a log entry of some form, at least for connections that meet some criteria.

When a packet is received, there are several possibilities as to its disposition. The most useful (not, unfortunately, always the most common) case is that it contains new and acceptable data, an ACK that acknowledges previously unacknowledged data, or a window update that advances the window. This case must of course be handled efficiently. Packets which change the state of the connection are also useful, but efficiency is less of an issue. But packets which do none of these things are redundant; they represent an error somewhere in the system. It is immensely useful to count the useful packets over the life of a connection.
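The packet classification described above can be sketched roughly as follows. This is only an illustration, not the instrumentation I actually ran: the `Segment` and `ConnStats` types and all of their field names are hypothetical, sequence-number wraparound is ignored, out-of-order data is not queued, and ACKs are not checked against what was actually sent.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical summary of one received TCP segment."""
    seq: int                     # sequence number of the first data byte
    length: int                  # number of data bytes carried
    ack: int                     # acknowledgment number
    window: int                  # advertised receive window
    changes_state: bool = False  # e.g. a SYN, FIN, or RST the state machine accepted


@dataclass
class ConnStats:
    """Per-connection event counts; sequence-number wraparound is ignored."""
    rcv_nxt: int = 0       # next sequence number expected from the peer
    snd_una: int = 0       # oldest sequence number we sent that is still unacknowledged
    snd_wnd_edge: int = 0  # right edge of the send window offered by the peer
    useful: int = 0
    state_changing: int = 0
    redundant: int = 0

    def receive(self, seg: Segment) -> str:
        """Classify one segment, update the counters, and return the category."""
        progressed = False
        # New and acceptable data: the segment covers the next expected byte.
        if seg.length > 0 and seg.seq <= self.rcv_nxt < seg.seq + seg.length:
            self.rcv_nxt = seg.seq + seg.length
            progressed = True
        # An ACK that acknowledges previously unacknowledged data.
        if seg.ack > self.snd_una:
            self.snd_una = seg.ack
            progressed = True
        # A window update that advances the right edge of the window.
        if seg.ack + seg.window > self.snd_wnd_edge:
            self.snd_wnd_edge = seg.ack + seg.window
            progressed = True

        if progressed:
            self.useful += 1
            return "useful"
        if seg.changes_state:
            self.state_changing += 1
            return "state_changing"
        self.redundant += 1
        return "redundant"
```

The three tests are a handful of integer comparisons per packet, which is consistent with the point that collecting this information is inexpensive. An exact duplicate of an already-received segment fails all three tests and is counted as redundant.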
My criterion was that if less than 95% of the packets received over the life of a connection were useful (allowing for at least 5 non-useful packets on short sessions to handle startup issues), then a log entry should be generated to indicate trouble. Reading such a log is an edifying experience. The most notable fact about such a log is that certain machines are represented all out of proportion to the amount of traffic they generate. One of course logs the identities of the hosts involved in the connections. A log entry here corresponds to "dropping a trouble ticket" in a telephone central office; it indicates something to be fixed. Enough said.

One also wants to keep a tally of retransmission attempts; again, if the number of retransmitted packets is large over the life of the connection, something is wrong and this should be noted. Of course, if a connection closes abnormally, one logs that fact for later analysis.

It is also useful to log rejected packets. Find all those places in your TCP where you decide to drop a packet because it is "bad", and make them calls to a routine that logs the packet with an error code. One turns up all sorts of dirty laundry that way.

The number of ICMP Source Quenches received is also quite useful; again, large values compared to the volume of data traffic are significant.

When I operated a VAX with such logging two years ago, there would be five or six connections logged as bad when the network was operating properly; there might be hundreds when something was wrong. That's how I managed to make a large network based on slow links work properly.

It is worth thinking about how one might report such data in a standard way to a network monitor node. Something that generated one datagram per "bad" TCP connection might be quite useful; some would of course get lost but serialization would allow the network monitor to detect this, and statistical techniques could be used to compensate for the lost data.
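As a rough sketch of two of the ideas above, here is the 95% logging criterion and a monitor that uses serial numbers to notice lost reports. Everything named here is hypothetical: `should_log`, `MonitorTally`, and the parameter names are illustrative, the reading of the startup allowance (a connection is never logged for 5 or fewer non-useful packets) is one interpretation, and the assumptions that each reporting host numbers its reports from 1 and that losses are random are mine.

```python
def should_log(total_packets, useful_packets,
               min_useful_percent=95, startup_allowance=5):
    """Return True if the connection deserves a 'trouble ticket' log entry."""
    non_useful = total_packets - useful_packets
    # Integer arithmetic avoids floating-point trouble at exactly 95%.
    below_threshold = useful_packets * 100 < min_useful_percent * total_packets
    return below_threshold and non_useful > startup_allowance


class MonitorTally:
    """Monitor-side bookkeeping for serialized 'bad connection' reports."""

    def __init__(self):
        self.seen = {}  # reporting host -> set of report serial numbers received

    def record(self, host, serial):
        # A set absorbs duplicated and reordered datagrams.
        self.seen.setdefault(host, set()).add(serial)

    def lost(self, host):
        """Reports known to be missing: gaps below the highest serial seen."""
        serials = self.seen.get(host, set())
        return max(serials) - len(serials) if serials else 0

    def scale_factor(self, host):
        """Crude correction factor for counts, assuming losses are random."""
        serials = self.seen.get(host, set())
        return max(serials) / len(serials) if serials else 1.0
```

For example, a 20-packet session with 4 non-useful packets is not logged even though it is only 80% useful, while a 1000-packet connection with 60 non-useful packets is. The monitor can only detect gaps below the highest serial number it has seen, so reports lost after the last one received from a host go unnoticed until a later report arrives.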
You do need to log a measure of the total data transmitted in each direction on the connection, and log entries should also contain cumulative information about the total amount of data and total number of connections so that statistical computations can be made.

One needs this information to manage a network. With it, one can manage one's network, and make it perform well. Without it, one can just grumble and make excuses.

					John Nagle