Path: utzoo!utgpu!attcan!uunet!lll-winken!lll-tis!ames!amdahl!pacbell!belltec!jim
From: jim@belltec.UUCP (Mr. Jim's Own Logon)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Why am I getting this crazy parity interrupt???
Summary: I HATE PARITY ERRORS!!!
Message-ID: <254@belltec.UUCP>
Date: 10 Aug 88 14:34:12 GMT
References: <2968@dalcs.UUCP>
Distribution: na
Organization: Bell Technologies, Fremont, CA
Lines: 57

In article <2968@dalcs.UUCP>, lane@dalcs.UUCP (John Wright/Dr. Pat Lane) writes:
> Perhaps some tech expert out there can tell me what's wrong with my system,
.
.
> 
> I get an occasional but frequent memory parity interrupt at location F000:ADF1.
> It always occurs while doing a floppy disk access.  It's never caused a problem
> other than the interruption and I've never detected any corruption of the file
> being read or written. I have one of those little TSR parity interrupt handlers
> installed so it's just a persistent annoyance.  What really strikes me strange
> is that the address given (by DOS's parity interrupt handler) is in ROM which
> was, I thought, not parity checked?!?.
> 
> -- 
> John Wright      /////////////////     Phone:  902-424-3805  or  902-424-6527


   Expert?  Well, maybe. The address reported is in ROM because, without
special hardware support (which PCs don't have), the reported error
address is always that of the fetch immediately AFTER the offending read.
It is often nowhere near the address that actually failed. Only a memory
test (or a logic analyzer) will report the actual failing address.
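
   To see why F000:ADF1 lands in ROM, here is a minimal sketch of the
real-mode segment:offset arithmetic. The function names are made up for
illustration, and the ROM window (the top 64K, F0000 through FFFFF) is an
assumption about a typical PC-compatible memory map, not something read
from your machine.

```python
def physical_address(segment, offset):
    """8086-style real-mode address: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Assumed system ROM window on a typical PC compatible (hypothetical constants).
ROM_START, ROM_END = 0xF0000, 0xFFFFF


def in_system_rom(addr):
    """True if the physical address falls inside the assumed ROM window."""
    return ROM_START <= addr <= ROM_END


if __name__ == "__main__":
    addr = physical_address(0xF000, 0xADF1)
    print(hex(addr), in_system_rom(addr))
```

   So the reported address only tells you the BIOS was executing when the
interrupt was taken, which fits with a floppy access being in progress.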

   Why is it happening? First, understand that it is a hardware problem.
Software cannot cause a parity error once the memory has been initialized
(which happens during the initial BIOS memory count). If you have tried
to speed up your system by changing the crystal and maybe the RAMs, that
is the problem: there is a lot more to system timing than just the clock
speed and the RAM access time. You could try faster RAMs, but it may not
make any difference. It could be that the machine is just getting old;
parts age like everything else. As they age, their timing changes, and
if the original design was on the edge (as far too many of the clone
machines are), it breaks. Could be one weak RAM chip, could be noise
on the power supply, could be slightly conductive dust on the motherboard,
could be the star wars site next to your house.

   What can you do? Run an AT memory diagnostic for a day. If it fails at
the same address or the same bit every time, then maybe replacing a single
RAM chip will solve the problem. Run an extended floppy test and watch for
parity errors; this will indicate whether the problem is DMA related (DMA
cycles have different memory timing than CPU cycles). If it is DMA related,
Uh, well, uh, well, at least it is good to know that it is DMA related.
Clean everything in the system. It might make a difference, and it will
make your mother proud of you.
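
   The "same address or same bit every time" check can be sketched as a
pattern test that tallies its failures. This is only a simulation: FlakyRam
is a made-up stand-in with one stuck-at-zero bit, and a real diagnostic
would go at the hardware directly rather than through Python.

```python
from collections import Counter


class FlakyRam:
    """Hypothetical simulated RAM with a single stuck-at-zero bit."""

    def __init__(self, size, bad_addr, bad_bit):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_bit = bad_bit

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.bad_addr:
            # The weak cell always reads back as zero.
            value &= ~(1 << self.bad_bit) & 0xFF
        return value


def run_passes(ram, size, passes, patterns=(0x55, 0xAA, 0xFF, 0x00)):
    """Write each pattern, read it back, and tally failures by address and bit."""
    by_addr, by_bit = Counter(), Counter()
    for _ in range(passes):
        for pattern in patterns:
            for addr in range(size):
                ram.write(addr, pattern)
            for addr in range(size):
                diff = ram.read(addr) ^ pattern
                if diff:
                    by_addr[addr] += 1
                    for bit in range(8):
                        if diff & (1 << bit):
                            by_bit[bit] += 1
    return by_addr, by_bit


if __name__ == "__main__":
    ram = FlakyRam(size=1024, bad_addr=0x123, bad_bit=3)
    by_addr, by_bit = run_passes(ram, size=1024, passes=3)
    print("failing addresses:", dict(by_addr))
    print("failing bits:", dict(by_bit))
```

   If the tallies pile up on one address or one bit, a single chip is the
likely suspect. If they are scattered all over, look at timing, power, or
dirt instead.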

   One final word of advice: unless you have a weak cell in a RAM, parity
errors are rarely single events. If you get a parity error, the rest
of what you are doing is suspect. Some versions of DOS and UNIX do not
reset the parity logic after the first error is seen, so any subsequent
errors go unnoticed. If you are running something important and want
whatever results you get to be good, give up after the first parity error
and start again. A fair number of computers just shut down on a parity
error, giving you no chance to proceed.
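
   A toy model shows why an unreset latch hides later errors. ParityLatch
and count_interrupts are invented for illustration, and the model assumes
the interrupt fires only when the latch goes from clear to set. It is not a
description of any particular motherboard's circuitry.

```python
class ParityLatch:
    """Toy model of a parity error latch (an assumption, not real PC hardware)."""

    def __init__(self):
        self.latched = False
        self.interrupts = 0

    def report_error(self):
        # An interrupt is raised only on the clear-to-set transition.
        if not self.latched:
            self.latched = True
            self.interrupts += 1

    def reset(self):
        # What a well-behaved handler does before returning.
        self.latched = False


def count_interrupts(num_errors, handler_resets):
    """Return how many of num_errors parity errors are actually noticed."""
    latch = ParityLatch()
    for _ in range(num_errors):
        latch.report_error()
        if handler_resets:
            latch.reset()
    return latch.interrupts


if __name__ == "__main__":
    print(count_interrupts(5, handler_resets=True))
    print(count_interrupts(5, handler_resets=False))
```

   In this model, a handler that never resets the latch sees one error no
matter how many actually occur, which is why one reported error does not
mean only one happened.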


						-Jim Wall
						Bell Technologies Inc.