Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!mcvax!enea!chalmers!cthct!tropp
From: tropp@cthct.UUCP (Ulf Tropp)
Newsgroups: comp.unix.questions,comp.unix.wizards
Subject: Re: Help on deciphering crash
Message-ID: <11@cthct.UUCP>
Date: Thu, 8-Jan-87 05:22:16 EST
Article-I.D.: cthct.11
Posted: Thu Jan 8 05:22:16 1987
Date-Received: Sun, 11-Jan-87 23:17:10 EST
References: <3645@sdcrdcf.UUCP> <4891@mimsy.UUCP> <1419@cit-vax.Caltech.Edu> <4914@mimsy.UUCP>
Reply-To: tropp@cthct.UUCP (Ulf Tropp)
Organization: Dept. of Comp. Tech., Chalmers, Gothenburg, Sweden
Lines: 61
Xref: mnetor comp.unix.questions:609 comp.unix.wizards:563

In article <4914@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>In article <3645@sdcrdcf.UUCP> davem@sdcrdcf.UUCP (David Melman) writes:
>>>Our Vax 750 running 4.2BSD has occassionally been crashing with:
>>>machine check 2: cp tbuf par fault
>>>    va 80039728 errpc 8000394e mdr a smr 8 rdtimo 0 tbgpar 0 cacherr 5
>>>    busserr 6 mcesr 9 pc 8000394e ps1 40c0008 mcsr 80016
>
>Anyway, you could try disabling the cache:
>
>    mtpr(CADR, 1);    /* CADR is register 0x25 */
>
>but that will probably slow the machine to a crawl.  Disabling
>and reenabling the cache might well flush it, though.  If
>
>    mtpr(CADR, 1);
>    mtpr(CADR, 0);
>
>does not clear the problem, perhaps reenabling it after a long
>delay will.

We had a lousy cache once that would cause a mchk approximately once
an hour. Since DEC couldn't supply a new board in a week, I had plenty
of time to test recovery code.
What I did was essentially:

    mtpr(CADR,1);
    if(mcf->mc5_cacherr&0xe){
        mtpr(CAER,0xf);
        /* fetch offending byte w/o cache */
        if(mcf->mc5_va&0x80000000)
            i = *((char *)mcf->mc5_va);
        else
            i = fubyte(mcf->mc5_va);
        if(mfpr(CAER)&0xe){
            return;    /* run without cache */
        }
        printf("Cache reenabled\n");
        mtpr(CADR,0);
    }
    return;

Probably not entirely correct, but it did seem to work: the system
would mostly return in an orderly way to the aborted instruction,
sometimes going directly into a new mchk a couple of times.

Anyway, does anybody know which instructions can be restarted?
Shouldn't any instruction that can generate a page fault be
restartable?

BTW, a comment in the 4.2 tbuf recovery code says "Should we use pc
or errpc.." (when looking at the instruction to return to). Clearly
it must be pc, since that is what we are returning to, so I changed
the 4.2 code.

In-Real-Life:   Ulf Tropp
                Systems Administrator
                Dept. of Computer Engineering
                Chalmers Univ. of Technology
                S-412 96 Gothenburg
                Sweden
UUCP:           ..mcvax!enea!chalmers!cthct!tropp
ARPA:           tropp%cthct.uucp@seismo.CSS.GOV (?)
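
For anyone who wants to follow the control flow of the recovery fragment without a 750 at hand, here is a rough user-space sketch of the same logic with the privileged registers replaced by a plain array. It is only a simulation under stated assumptions, not kernel code: struct mc750frame, fake_reg, fault_persists, refetch() and cache_recover() are names made up for the illustration, the register number used for CAER and the write-one-to-clear behaviour are assumed, and the return value is added only so the outcome can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simulated stand-ins only; nothing here touches real hardware. */
#define CADR 0x25   /* cache disable register (number given in the thread) */
#define CAER 0x27   /* cache error register (number ASSUMED for this sketch) */

struct mc750frame {              /* hypothetical subset of the mchk frame */
    unsigned long mc5_va;        /* virtual address of the failing reference */
    unsigned long mc5_cacherr;   /* cache error bits saved at the fault */
};

static unsigned long fake_reg[0x40]; /* simulated processor registers */
static int fault_persists;           /* test knob: does the refetch fail again? */

static void mtpr(int reg, unsigned long val)
{
    if (reg == CAER)
        fake_reg[reg] &= ~val;   /* ASSUMED write-one-to-clear behaviour */
    else
        fake_reg[reg] = val;
}

static unsigned long mfpr(int reg)
{
    return fake_reg[reg];
}

/*
 * Stand-in for refetching the offending byte.  The real code reads the
 * byte directly for system addresses (top bit of the 32-bit va set) and
 * through fubyte() for user addresses; here we only model the effect.
 */
static void refetch(unsigned long va)
{
    (void)va;
    if (fault_persists)
        fake_reg[CAER] |= 0x4;   /* pretend the error shows up again */
}

/* Returns 1 if the cache is enabled on exit, 0 if it is left disabled. */
static int cache_recover(struct mc750frame *mcf)
{
    assert(mcf != NULL);
    mtpr(CADR, 1);                    /* disable the cache */
    if (mcf->mc5_cacherr & 0xe) {
        mtpr(CAER, 0xf);              /* clear the old error bits */
        refetch(mcf->mc5_va);         /* fetch offending byte w/o cache */
        if (mfpr(CAER) & 0xe)
            return 0;                 /* run without cache */
        printf("Cache reenabled\n");
        mtpr(CADR, 0);                /* reenable the cache */
    }
    return mfpr(CADR) == 0;
}
```

Calling cache_recover() on a frame with cacherr 5 (the value in the crash report quoted above) takes the recovery branch, since 5 & 0xe is nonzero. With fault_persists clear the simulated cache comes back on; with it set the routine returns with the cache still disabled.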