Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rochester!cornell!uw-beaver!mit-eddie!husc6!think!ames!ucbcad!ucbvax!TOPAZ.RUTGERS.EDU!hedrick
From: hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick)
Newsgroups: comp.protocols.tcp-ip
Subject: Ethernet meltdowns
Message-ID: <8707081440.AA00679@topaz.rutgers.edu>
Date: Wed, 8-Jul-87 10:40:32 EDT
Article-I.D.: topaz.8707081440.AA00679
Posted: Wed Jul 8 10:40:32 1987
Date-Received: Sat, 11-Jul-87 06:59:46 EDT
Sender: daemon@ucbvax.BERKELEY.EDU
Distribution: world
Organization: The ARPA Internet
Lines: 179

During the last week or so we have run into several oddities on our Ethernets that I thought might interest this group. Nothing that will surprise any veterans, but sometimes war stories are useful to people trying to figure out what is going on with their own net.

For several months, we have been having mysterious software problems on one Ethernet. This is our "miscellaneous" network: no diskless Suns, but several Unix timesharing systems, a few VMS machines, a DEC-20, and some Xerox Interlisp-D machines. The problems:

  - every week or so, all of our Bridge terminal servers crashed. When it happened, they all crashed at the same time.

  - fairly rarely, a Celerity Unix system would run out of mbufs.

  - a Kinetics Ethernet/Appletalk gateway running the kip code would hang or crash (not sure which) every few days.

We sent a dump of the Bridge crash to Bridge. Celerity wouldn't talk to us because we had made a few changes to the kernel. Kinetics swapped hardware for us, so we knew it wasn't hardware, but we still haven't figured out how to debug the problem. (The author of the software suspects the Ethernet device driver, but it's going to take us months to learn enough about the infamous Intel Ethernet chip to find a subtle device-level problem. Typical known problem: packet sizes that are a multiple of 18 bytes hang the hardware when the phase of the moon is wrong. How's a bunch of poor Unix hackers gonna debug a system where the critical chip has a 1/4-inch-thick bug list, which we don't have a copy of?)

Anyway, Bridge finally came back with a response that unfortunately I have only second-hand: "We got a very high rate of packets from two different Ethernet addresses, each claiming to be the same Internet address. This shouldn't cause us problems, but it does. We found the problem, and it will be fixed in the next release." They gave us the two Ethernet addresses and the Internet address. Two Celerities were claiming to be some other machine.

So we break out our trusty copy of etherfind. (This is a Sun utility that lets you look at packets. There's a fairly general way of specifying which ones you want to see, and it will decode the source, destination, and protocol types for IP. We've got lots of Ethernet debugging tools, but this is by far the most useful for this kind of problem.) It turns out that the Celerities have the infamous bug that causes them to get the addresses wrong in ICMP error messages.

Before proceeding with the war story, let me list the classic 4.2 bugs that lead to network problems:

1) Somebody sends to a broadcast address that you don't understand. There are 6 possible broadcast addresses. For a subnetted network 128.6.4, they are 255.255.255.255 and 128.6.4.255 (the correct ones by current standards), 128.6.255.255 (for machines that don't know about subnetting), and the corresponding ones for machines that use the old standards: 0.0.0.0, 128.6.4.0, and 128.6.0.0.
We have enough of a combination of software versions that there is no one broadcast address that all of our machines understand. So suppose somebody sends to 128.6.4.255. Our 4.2 machines, which expect 0.0.0.0 or 128.6.0.0, see this as an attempt to connect to host 255 on the local subnet. Since IP forwarding is on by default, they helpfully decide to forward it. Thus they issue ARP requests for the address 128.6.4.255. Presumably nobody responds. So the net effect is that each broadcast results in every 4.2 machine on the Ethernet issuing an ARP request, all at the same time. This causes massive collisions, and also every machine has to look at all those ARP requests and throw them away. This will tend to cause a momentary pause in normal processing.

2) Same scenario, but somebody has turned off ipforwarding on all the 4.2 machines. Alas, this simply causes all the 4.2 machines to issue ICMP unreachable messages back to the original sender. This still results in massive collisions, but at least this time only one machine (the one that sent the broadcast) has to process the fallout. That's if everything works. Unfortunately, some 4.2 versions have an error in setting up the headers for the error message. They forget to reverse the source and destination, as I recall.

3) Somebody sends a broadcast UDP packet, e.g. routed routing information. Hosts that are not running routed (or whatever) attempt to send back ICMP port unreachable. They are supposed to avoid doing this for broadcasts, but the test for broadcastedness in udp_usrreq doesn't agree with the one in ip_input, so for certain broadcast addresses, every machine on the network that isn't running the appropriate daemon will send back an ICMP error. Again, lots of collisions. If you have a few gateways running routed, but most hosts not running it, you'll have network interference every 30 seconds. Then again, there are those machines where the ICMP messages have the wrong source and destination addresses.

Now back to the war story. The case I actually saw with etherfind was caused by routed broadcasts. Our 2 Celerities would each respond with ICMP port unreachable. Unfortunately, they have the bug that causes the IP addresses in the ICMP error message to be wrong. I think they ended up sending packets with source address == the machine that had sent the routed broadcasts, and destination == the broadcast address. This would explain why our Bridge terminal servers were seeing packets from two different Ethernet addresses, both claiming to be a different machine.

We had certainly been seeing spotty network response, and as far as I can see, it went away when we fixed these problems. As far as we know, the Bridge terminal servers and Kinetics gateways have both stopped crashing, and the Celerities have stopped losing mbufs. What we suspect is that some obscure case came up that created a problem more serious than the one we saw with etherfind. Note that one of the failure modes is that certain broadcasts can lead to error messages sent to the broadcast address. We haven't analysed the code carefully enough to be sure exactly what conditions trigger it, but we suspect that the two machines may have gotten into an infinite loop of error messages. Since the messages would be broadcasts, everyone on the network would see them. This is generally called a "broadcast storm". The best guess is that both the Bridge and Kinetics crashes were caused by subtle bugs in their low-level code that fail under very heavy broadcast loads.
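To make the trigger concrete: the code that builds an ICMP error is supposed to send the error back to whoever sent the offending packet. The fragment below is a hedged sketch of my own, not the actual Celerity or Berkeley source; the field names follow the 4.2-style struct ip, but everything else is illustrative. The commented-out lines show what the broken versions effectively do, which is how an "error" about a broadcast ends up addressed to the broadcast address.

    /* Hedged sketch, illustration only -- not the real icmp_error(). */
    #include <netinet/in.h>
    #include <netinet/ip.h>

    void
    build_error_header(struct ip *oip, struct ip *nip, struct in_addr ouraddr)
    {
            /* Correct: reverse the addresses, so the error goes back
             * to the sender of the packet that provoked it. */
            nip->ip_dst = oip->ip_src;
            nip->ip_src = ouraddr;

            /* The buggy versions effectively leave the addresses alone:
             *
             *      nip->ip_dst = oip->ip_dst;    <- the broadcast address
             *      nip->ip_src = oip->ip_src;    <- the original sender
             *
             * so every host on the wire sees an "error" that appears to
             * come from the machine that sent the broadcast. */
    }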
The Celerity "mbuf leak" is probably something similar: a low-level failure under very heavy broadcast load. Unfortunately, without a record of the packets on the network at the exact time of failure, it is impossible to be sure what was going on. But Bridge's crash analysis seems to indicate a broadcast storm involving the Celerities.

The fix to this is to make sure every one of your 4.2 systems has been made safe in the following fashion (a rough sketch of the intended logic appears at the end of this message):

  - turn off ipforwarding, in ip_input.

  - in the routine ip_forward (in ip_input), very near the beginning of the routine, there is a test with lots of conditions that ends up throwing away the packet and exiting. Add "! ipforwarding || " to the beginning of that test.

  - in udp_usrreq, a few pages into the routine, in_pcblookup is called to see whether there is a port for the UDP packet. If not (it returns NULL), normally icmp_error is called to send port unreachable. However, there is a test to see whether the packet was sent as a broadcast. If so, it is simply discarded. That test must agree with the test for broadcastedness in ip_input. This seems to differ in various implementations, so I can't tell you the code to use. One common bug is to forget that ip_input recognizes 255.255.255.255 as a broadcast address. It normally does this in a completely different place than where it tests for other broadcast addresses. So you may be able to add something like "ui->ui_dst.s_addr == -1 || " to the test in udp_usrreq.

These apply to 4.2. 4.3 probably doesn't need them all, and may not need any of them.

Now for the second war story. Our computer center recently bought a few diskless Suns for staff use. Until then, all diskless Suns had been on separate Ethernets, isolated from our other Ethernets by carefully-designed IP gateways. However, the computer center figured that a small number of these things wasn't going to kill their network, so they connected them to their main Ethernet. On it is a VAXcluster (2 8650's), a few 780's, some terminal servers and other random stuff, and level 2 bridges (Applitek broadband Ethernet bridges and Ungermann-Bass remote bridges) to more or less everywhere else on campus.

Since they were still setting up the configuration, it isn't surprising that a diskless Sun 3/50 got turned on before its server was properly configured to respond. Nobody thought anything of this. We first discovered there were problems when we got a call from somebody in a building half a mile away saying that his VAX was suddenly not doing any useful work. Then we got a call from our branch in Newark saying the same thing about their VAXes. Then someone noticed that the cluster was suddenly very slow.

Well, it turns out that the Suns were sitting there sending out requests for their server to boot them. These were broadcast TFTP requests. Unfortunately, they used a new broadcast address, which the Wollongong VMS code doesn't understand. So VMS attempted to forward them. This means that it issued an ARP request for the broadcast address. There is some problem in the Wollongong TCP that we don't quite understand yet. It seems that whenever there are lots of requests to talk to a host that doesn't respond to ARPs, the whole CPU ends up being used up in issuing ARPs. For example, when something goes wrong with our IBM-compatible mainframe (which is used to handle most of the printer output for the cluster, using Unix lpd implementations on both systems), the VAX cluster becomes unusable. As far as we can tell, it is spending all of its time trying to ARP the mainframe.
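Here is a toy, user-space model of one way that can happen. It is a hedged guess, not Wollongong or 4.2 source, and the names are mine: if every packet queued for an unresolved address triggers a fresh broadcast ARP request, with no rate limit and no "give up" state, then a steady stream of traffic toward a dead (or nonexistent) host turns entirely into ARP traffic and the CPU time to generate it.

    /* arp_model.c - hedged toy model, illustration only. */
    #include <stdio.h>

    static unsigned long arps_sent = 0;

    /* Pretend resolver: this host never answers. */
    static int resolved(unsigned long dst) { (void)dst; return 0; }

    static void output_packet(unsigned long dst)
    {
        if (!resolved(dst)) {
            /* No rate limit and no negative cache, so every single
             * packet costs one broadcast ARP request on the wire. */
            arps_sent++;
            return;     /* packet held or dropped; either way no data moves */
        }
        /* ...would actually transmit here... */
    }

    int main(void)
    {
        unsigned long dst = 0x800604ffUL;  /* e.g. a broadcast address being "forwarded" */
        int i;

        for (i = 0; i < 10000; i++)        /* 10000 packets aimed at the dead host */
            output_packet(dst);

        printf("packets attempted: 10000, ARP requests broadcast: %lu\n", arps_sent);
        return 0;
    }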
In this case, the same phenomenon was triggered by the attempt to forward broadcast packets. Since our VMS systems mostly sit on networks that are connected by level 2 bridges instead of real IP gateways, broadcasts go throughout the whole campus, and essentially every VMS system is brought to its knees.

Unfortunately, there is no way we can fix this. The Sun broadcast is being issued by its boot ROM, which is the one piece of software we aren't equipped to change, and we don't have source to the Wollongong code. So the solution for the moment is to put the Suns on a subnet that is safely isolated behind an IP gateway. This fixes the problem, because IP gateways don't pass broadcasts, or they only pass very carefully selected ones.
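Finally, going back to the 4.2 fixes listed earlier: here is a rough, user-space model of the logic those three changes are trying to enforce. None of this is kernel source -- is_broadcast() and dispose() are names I made up -- but it shows the invariant: the forwarding path and the UDP port-unreachable path have to agree on what counts as a broadcast (including plain 255.255.255.255), and with ipforwarding off the right thing to do with anything that isn't ours is to drop it quietly, not to ARP for it or send ICMP errors.

    /* bcast_model.c - hedged illustration of the intended behavior,
     * using 128.6.4 as the local subnetted network.  Not kernel code. */
    #include <stdio.h>

    #define NET     0x80060000UL    /* 128.6.0.0 */
    #define SUBNET  0x80060400UL    /* 128.6.4.0 */

    static int ipforwarding = 0;    /* change 1: default it off */

    /* One broadcast test shared by both the forwarding path and the
     * UDP "port unreachable" path (change 3: the tests must agree). */
    static int is_broadcast(unsigned long dst)
    {
        return dst == 0xffffffffUL || dst == 0UL ||          /* all ones / zeros */
               dst == (SUBNET | 0xffUL) || dst == SUBNET ||  /* subnet forms */
               dst == (NET | 0xffffUL)  || dst == NET;       /* non-subnetted forms */
    }

    /* Called for a packet whose destination is not one of our own
     * addresses.  Change 2: if we aren't a gateway, or the destination
     * is any broadcast form, drop silently -- no ARP, no ICMP. */
    static const char *dispose(unsigned long dst)
    {
        if (is_broadcast(dst))
            return "drop silently (broadcast)";
        if (!ipforwarding)
            return "drop silently (we are not a gateway)";
        return "forward";
    }

    int main(void)
    {
        unsigned long tests[] = {
            0x800604ffUL,   /* 128.6.4.255     */
            0xffffffffUL,   /* 255.255.255.255 */
            0x80060464UL,   /* 128.6.4.100, some other real host */
        };
        int i;

        for (i = 0; i < 3; i++)
            printf("%08lx -> %s\n", tests[i], dispose(tests[i]));
        return 0;
    }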