Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!gatech!ncar!ames!lll-tis!helios.ee.lbl.gov!nosc!ucsd!ucbvax!ANDREW.CMU.EDU!ddp+
From: ddp+@ANDREW.CMU.EDU (Drew Daniel Perkins)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: a proposed modification to ARP
Message-ID: 
Date: 11 Jul 88 22:54:45 GMT
References: <10435@ulysses.homer.nj.att.com>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 24

The CMU router has an implementation of ARP which deals with dead hosts and
stale ARP entries very effectively.  We use two timers and a counter instead of
the usual
implementation with one timer.  The first timer is like that in BSD UNIX.
Every time an ARP cache entry is referenced, the timer is reset to some "large"
value.  If the timer goes off (because no packets were SENT to the host), the
entry is removed.  I think Berkeley uses 5 minutes for this timeout and we use
20 minutes.  Because of our additional timer and counter, this timeout could
easily be raised to hours (or even days).
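
To make the first timer concrete, here is a minimal sketch in Python.  It is
not the CMU router code; the class name IdleCache, its methods, and the use of
caller-supplied timestamps in seconds are all assumptions made for
illustration.  Only the behaviour (reset on reference, remove on expiry) and
the 20 minute figure come from the description above.

```python
IDLE_TIMEOUT = 20 * 60  # seconds; the 20 minute figure quoted above


class IdleCache:
    """Hypothetical ARP cache that models only the first ("large") timer."""

    def __init__(self, idle_timeout=IDLE_TIMEOUT):
        self.idle_timeout = idle_timeout
        self.entries = {}  # ip -> [mac, idle_deadline]

    def insert(self, ip, mac, now):
        # A new entry starts with a full idle timeout.
        self.entries[ip] = [mac, now + self.idle_timeout]

    def lookup(self, ip, now):
        # Referencing an entry (i.e. sending to the host) resets its timer.
        entry = self.entries.get(ip)
        if entry is None:
            return None
        if entry[1] <= now:
            # The timer already went off; treat the entry as removed.
            del self.entries[ip]
            return None
        entry[1] = now + self.idle_timeout
        return entry[0]

    def expire(self, now):
        # Remove every entry whose idle timer has gone off.
        dead = [ip for ip, e in self.entries.items() if e[1] <= now]
        for ip in dead:
            del self.entries[ip]
```

A real implementation would drive expire() from the system clock; passing
"now" explicitly just keeps the sketch easy to follow and to test.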

Our other timer and counter work as follows.  When an ARP entry is refreshed
(created/updated), the second timer is reset to some "small" timeout, say two
minutes, and the counter is reset to zero.  If the timer goes off (because no
ARP requests or replies were RECEIVED from the host), the timer is reset and
the counter is incremented.  If the counter is less than a small constant such
as two, a point-to-point ARP request is sent directly to the host (i.e. it is
NOT broadcast).  If the host is still alive, it will answer, causing both the
timer and the counter to be reset.  If the host is dead, has changed its
Ethernet address, has moved, etc., it will NOT reply, and the timer goes off
again.  If the retransmission counter ever reaches two (for example), the
entry is removed.  In this way, ARP entries are continuously (but slowly)
tested for accuracy, and the router is very resilient to ARP or hardware
problems.  We rarely see a bad ARP entry persist for more than a few minutes.
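
The second timer and counter can be sketched the same way.  Again, this is
only an illustration in Python, not the router's actual code: ProbedEntry,
refresh(), on_probe_timer() and the send_unicast_arp callback are hypothetical
names, and the constants simply reuse the example values of two minutes and
two from the text.

```python
PROBE_INTERVAL = 2 * 60  # seconds; the "small" timeout, two minutes here
MAX_PROBES = 2           # the "small constant such as two"


class ProbedEntry:
    """Hypothetical ARP entry that models only the second timer and counter."""

    def __init__(self, mac, now):
        self.refresh(mac, now)

    def refresh(self, mac, now):
        # Called when the entry is created or updated, i.e. when an ARP
        # request or reply is RECEIVED from the host.
        self.mac = mac
        self.probe_deadline = now + PROBE_INTERVAL
        self.probes = 0

    def on_probe_timer(self, now, send_unicast_arp):
        # Called when the second timer goes off.  Returns False when the
        # caller should remove the entry.
        self.probes += 1
        if self.probes >= MAX_PROBES:
            return False
        self.probe_deadline = now + PROBE_INTERVAL
        # Ask the host directly (NOT by broadcast) whether it is still there.
        send_unicast_arp(self.mac)
        return True
```

With these values a silent host gets one directed request at the first expiry
and is dropped at the second, roughly four minutes after it was last heard
from.  Any reply goes through refresh() and starts the cycle over.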

Drew