Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!husc6!cmcl2!brl-adm!adm!cpw%sneezy@LANL.GOV
From: cpw%sneezy@LANL.GOV (C. Philip Wood)
Newsgroups: comp.unix.wizards
Subject: 4.3 BSD networking
Message-ID: <8479@brl-adm.ARPA>
Date: Sun, 26-Jul-87 10:39:02 EDT
Article-I.D.: brl-adm.8479
Posted: Sun Jul 26 10:39:02 1987
Date-Received: Sun, 26-Jul-87 21:04:58 EDT
Sender: news@brl-adm.ARPA
Lines: 119

INTRODUCTION

Our MILNET host (VAX BSD 4.3) can get pretty busy sometimes.  And, if you
throw in rude hosts with broken network implementations sending junk, the
result used to be:

	panic: out of mbufs: map full

I have made a number of changes (some posted to news) to the kernel to
allow us to weather this problem, with some good success.  However, since
then, I have had a few crashes which, I assume, resulted from traversing
untested code in the kernel.  I am hoping for some discussion, pointers to
other discussions, fixes, etc., on buffering, congestion control, garbage
collection, and recognition and control of rude hosts.

What follows is an attempt to summarize my experience modifying the
kernel.  Familiarity with the 'netinet/*.c', 'sys/uipc*.c', and 'h/*.h'
modules is assumed.

SUMMARY OF CHANGES

To begin with, I noticed there was provision for sleep and wakeup on the
'mfree' queue.  However, this code was never exercised, since the panic
occurred first.  I modified 'm_clalloc' to just return a failure, which
causes 'm_more' to sleep in the M_WAIT case and 'm_expand' to return
'mfree' on the off chance that some other process had released some
message buffers.

At first this did not work at all!  I found that the numerous calls to
MGET/m_get were not very careful about the wait option.  Consequently,
the kernel would attempt a sleep on an interrupt.  I found all these
babies and changed the canwait flag appropriately.

This revised system worked very well.  I could prove this by pounding the
system with thousands of packets which used to panic the unrevised
system.  The new version stayed up and I thought "Oh boy".  However, my
joy was short lived (6 days).

The first crash I experienced resulted from a bug in MCLGET, which
assumed that on entry the mbuf's m_len was not equal to CLBYTES.  So a
failure return from MCLALLOC would look like a success from MCLGET if (on
entry) m->m_len happened to equal CLBYTES.  The calling process would
then happily fill in the pseudo cluster with whatever, eventually leading
to some awful panic like a segmentation fault, depending on what that
cluster space was really being used for (like a socket structure or
someone's data space).  I fixed this one, and thought "Oh boy".

Well, another few days went by, we restarted the named daemon, and:

	panic: Segmentation fault

By this time I had accumulated a pretty neat set of adb scripts with
which to dump out numerous aspects of the message buffering scheme, and
found that:

1.  There were no free mbufs.  The kernel had run out of mbufs 2516 times
    and dropped 2462 M_DONTWAIT requests.  The difference, 54, would be
    the number of times processes had been put to sleep.  The 'm_want'
    flag was zero, so presumably there were no processes waiting for
    mbufs, or one was about to awake.

2.  There were 232 UDP domain nameserver packets waiting on receive
    queues on the 'udb' queue.

3.  The kernel was attempting to attach a new tcp socket with the
    sequence:

	ipintr -> tcp_input -> sonewconn

    when it encountered a failure from tcp_usrreq and attempted to
    dequeue the socket via 'soqremque'.  The socket had already been
    soqremque'd deep in the guts of a sequence something like:

	tcp_usrreq -> in_pcbdetach -> sofree

    Consequently, the code in soqremque attempted to use an mbuf based at
    address 0 and grabbed a weird address for a socket out of low core.

I am trying to figure out how to fix this last one.  One fix would be to
put a silent check for zero in soqremque and just return, maybe bump a
counter or print from where called?
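What I have in mind is roughly the following guard at the top of
'soqremque' in 'sys/uipc_socket2.c'.  This is only a sketch, typed in
from memory and not yet tried on a live system; the 'soqremque_miss'
counter and the printf are just illustrations of the "bump a counter or
print" idea:

	int	soqremque_miss;		/* sockets found already dequeued */

	soqremque(so, n)
		register struct socket *so;
		int n;
	{
		/*
		 * If an earlier sofree() (via in_pcbdetach) has already
		 * pulled this socket off its queue, so_head is zero.
		 * Bail out here instead of chasing a pointer into low
		 * core.
		 */
		if (so->so_head == 0) {
			soqremque_miss++;
			printf("soqremque: so %x already dequeued\n", so);
			return (0);
		}
		/* ... the rest of the routine is unchanged ... */
	}

This papers over the double dequeue rather than curing the race between
sonewconn and sofree, but at least the machine stays up and the counter
says how often it happens.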
Any suggestions would be appreciated.

COMMENTARY

In one sense I am fixing untested kernel code.  But if I step back just a
tad and take a look at what I'm doing, I see that I am attempting,
haphazardly, to resolve the problem of buffering and congestion control.
It turns out that in all the cases (see below) I have investigated, the
exhaustion of the page map happened after all mbufs had been put on one
queue or another.  That is to say, I can account for every mbuf in the
pool; none had been "leaked" or forgotten.

case 1.  The first case came to light when I discovered most of the mbufs
	 were linked on a tcp reassembly queue for some telnet connection
	 from a VMS system over MILNET.  Each mbuf had one character in
	 it.  With a receive window of 4K you can run out of mbufs pretty
	 easily.

case 2.  The second case resulted from sending lots of udp packets of
	 trash over an ethernet and swamping the udp queue.

case 3.  The last case I investigated resulted from many domain name udp
	 packets queueing up on the udp queue.  Similar to case 2, but in
	 this case the packets were 'legitimate'.

AS I SEE IT

The above points to two related items:

1.  The 4.3 BSD kernel must be made more robust, to avoid being corrupted
    by rude hosts.  Does anyone have ideas on how to identify resource
    hogs?  What to do when you find one?

2.  Once a misbehaving host has been identified, who is it we contact to
    get the problem fixed in a timely fashion?  Where is it written down
    who to contact when XYZ vendor's ABC-n/XXX, running the zzzOS
    operating system, is doing something wrong, and it is located 2527
    miles away in a vault operated by QRS, International?  Should this be
    part of the registration process for a particular domain?  Is it
    already?

Thanks for reading this far.

	Phil Wood, cpw@lanl.gov.