Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!linus!philabs!cmcl2!seismo!brl-tgr!tgr!tp4!cbush@RAND-UNIX.ARPA
From: cbush@RAND-UNIX.ARPA
Newsgroups: net.unix-wizards
Subject: 4.2BSD panic trap 9 problem on VAX 11/785
Message-ID: <497@brl-tgr.ARPA>
Date: Wed, 7-Aug-85 14:10:51 EDT
Article-I.D.: brl-tgr.497
Posted: Wed Aug 7 14:10:51 1985
Date-Received: Sat, 10-Aug-85 23:48:37 EDT
Sender: news@brl-tgr.ARPA
Lines: 94

Need help with persistent system crashes, at least once a day, on our
new VAX 11/785 running 4.2BSD UNIX. We always get the same panic
messages:

    ...
    Aug 6 12:10
    trap type 9, code = 80001400, pc = 80001400
    panic: Protection fault
    syncing disks... 3 3 2 2 2 2 1 done
    ...

My reading of the above, and of the kernel stack frames, is that, in
summary, the system was attempting to execute the instruction at
location 80001400 while in user mode!! How can that be? I must be
missing something. The kernel stack says we're really in the kernel
routine "copyout", having gotten there from various places, not always
the same (so not a wild branch). In the last crash "gettimeofday" was
calling copyout. If the system screwed up and made an invalid context
switch to user mode, why crash at 80001400? Why not at the instruction
before, or at the first kernel instruction after the switch? Is the
address validation only done at page boundaries?

I don't know if it's hardware or software. We have seen these same
trap 9's at the same address on two of our four VAX 11/780's, but very
infrequently: less than once a month on heavily loaded systems. At this
point we're stymied. We are now looking at kernel code differences on
the various machines and changing the hardware configuration in an
attempt to narrow it down.

A PORTION OF THE KERNEL STACK, with my incomplete comments
(first VAX dump I've looked at)

---- adb command *(scb-4)$c gives ----
(the adb crash man pages seem screwed up?)

sbr 8002bc64 slr 4b68       --- system pte's look good.
p0br 808aae00 p0lr 74 p1br 800ab200 p1lr 1fffea
_boot() from 80021f3a
_boot(0,0) from 80021f3a
_panic(80043083) from 8000cf76
_trap() from 800224ec
_Xtransflt() from 80001035
_syscall() from 800227fc
_Xsyscall(7fffe6f0,0) from 80001054
?(7fffe734) from 214d
?(1,7fffebd8,7fffebe0) from 47c
?() from 37

---STACK FRAME created by call to trap
7ffffefc:  0          number of args (really 5)
7fffff00:  2fff0000   mask/psw
7fffff04:  7fffff8c   ap
7fffff08:  7fffff74   fp
7fffff0c:  80001035   pc
7fffff10:  8          r0
7fffff14:  7fffff6c   r1
7fffff18:  0          r2
7fffff1c:  7fffed8c   r3
7fffff20:  0          r4
7fffff24:  0          r5
7fffff28:  0          r6
7fffff2c:  0          r7
7fffff30:  8003f9cc   r8
7fffff34:  8          r9
7fffff38:  7fffe6e8   r10
7fffff3c:  7fffed84   r11
7fffff40:  0          ??
7fffff44:  7fffe6d0   trap arg0  sp, unused I think
7fffff48:  9          arg1  trap#
7fffff4c:  80001400   arg2  code
7fffff50:  80001400   arg3  pc
7fffff54:  c00000     arg4  psl  previous=USER???; current=kernel; is=0
7fffff58:  8000b3b3   pc in gettimeofday, inst after jbs _Copyout
7fffff5c:  7fffff6c
7fffff60:  7fffe6f0
7fffff64:  8
7fffff68:  0
7fffff6c:  1d566af3
7fffff70:  1fbd0

---STACK FRAME created by call to gettimeofday
7fffff74:  0
7fffff78:  28000000
7fffff7c:  7fffffe8
7fffff80:  7fffffa4
7fffff84:  800227fc
7fffff88:  80000000
7fffff8c:  0
7fffff90:  4
7fffff94:  0
7fffff98:  3
7fffff9c:  9c400
7fffffa0:  216e

Any hints would be greatly appreciated.
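
For anyone who wants to double-check the hand decoding in the dump above, here is a rough C sketch that pulls apart the saved psl (c00000) and the two mask/psw longwords (2fff0000 and 28000000). It assumes the usual VAX layouts as I understand them: in the PSL, IS is bit 26, current mode is bits 25:24, previous mode is bits 23:22 and IPL is bits 20:16; in a call frame's mask/psw longword, bit 29 marks CALLS and bits 27:16 are the register save mask. The struct and function names (psl_fields, decode_psl, frame_word, decode_frame_word, saved_reg_count, va_region, check_post_values) are made up for this illustration and are not from the 4.2BSD sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Access modes as encoded in the PSL mode fields. */
enum { MODE_KERNEL = 0, MODE_EXEC = 1, MODE_SUPER = 2, MODE_USER = 3 };

/* Hypothetical holder for the PSL fields commented on in the dump. */
struct psl_fields {
    unsigned is;        /* bit 26: running on the interrupt stack */
    unsigned cur_mode;  /* bits 25:24: current access mode */
    unsigned prv_mode;  /* bits 23:22: previous access mode */
    unsigned ipl;       /* bits 20:16: interrupt priority level */
};

static struct psl_fields decode_psl(uint32_t psl)
{
    struct psl_fields f;
    f.is       = (psl >> 26) & 0x1;
    f.cur_mode = (psl >> 24) & 0x3;
    f.prv_mode = (psl >> 22) & 0x3;
    f.ipl      = (psl >> 16) & 0x1f;
    return f;
}

/* Hypothetical holder for the second longword of a call frame. */
struct frame_word {
    unsigned calls;     /* bit 29: 1 = CALLS, 0 = CALLG */
    unsigned regmask;   /* bits 27:16: bit n set means Rn was saved */
};

static struct frame_word decode_frame_word(uint32_t w)
{
    struct frame_word f;
    f.calls   = (w >> 29) & 0x1;
    f.regmask = (w >> 16) & 0xfff;
    return f;
}

/* Number of registers a save mask says were pushed. */
static int saved_reg_count(unsigned regmask)
{
    int n = 0;
    while (regmask != 0) {
        n += (int)(regmask & 1u);
        regmask >>= 1;
    }
    return n;
}

/* Top two address bits: 0 = P0, 1 = P1, 2 = system space, 3 = reserved. */
static unsigned va_region(uint32_t va)
{
    return va >> 30;
}

/* Check the decoding against the values shown in the dump. */
void check_post_values(void)
{
    struct psl_fields p = decode_psl(0x00c00000u);
    struct frame_word t = decode_frame_word(0x2fff0000u); /* trap frame */
    struct frame_word g = decode_frame_word(0x28000000u); /* gettimeofday */

    assert(p.is == 0 && p.cur_mode == MODE_KERNEL);
    assert(p.prv_mode == MODE_USER && p.ipl == 0);
    assert(t.calls == 1 && saved_reg_count(t.regmask) == 12); /* r0..r11 */
    assert(g.calls == 1 && g.regmask == 0x800);               /* r11 only */
    assert(va_region(0x80001400u) == 2);  /* faulting pc: system space */
    assert(va_region(0x7fffff54u) == 1);  /* stack addresses: P1 space */

    printf("psl: is=%u cur=%u prv=%u ipl=%u\n",
           p.is, p.cur_mode, p.prv_mode, p.ipl);
    printf("trap frame mask=%03x, gettimeofday frame mask=%03x\n",
           t.regmask, g.regmask);
}
```

Calling check_post_values() from a small main() should pass every assertion if the layouts assumed above are right. That would agree with the annotations in the dump: current mode kernel, previous mode user, is=0, and twelve saved registers (r0 through r11) in the trap frame. It only confirms the bit-level reading; it says nothing about why the fault happens.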