Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site bunker.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!ucbvax!decvax!ittvax!bunker!reno
From: reno@bunker.UUCP (Jim Reno)
Newsgroups: net.unix-wizards,net.micro.68k
Subject: Pre-fetch
Message-ID: <891@bunker.UUCP>
Date: Tue, 9-Jul-85 10:03:49 EDT
Article-I.D.: bunker.891
Posted: Tue Jul 9 10:03:49 1985
Date-Received: Thu, 11-Jul-85 07:46:39 EDT
Distribution: net
Organization: Bunker Ramo, Trumbull Ct
Lines: 37
Xref: watmath net.unix-wizards:13762 net.micro.68k:995

We ran into an interesting problem here that I haven't seen mentioned
anywhere previously. The symptoms were that under certain circumstances
the system would hang due to a memory fault while in the kernel.

The system we use is 68000 based with a 4-segment memory management
unit. While in supervisor state the MMU is essentially disabled and the
processor has access to all of physical memory. The kernel happens to
relocate some parts of itself during initialization. It turned out that
a routine for a driver was being relocated to the absolute end of
physical memory, such that the last two bytes of memory contained an
RTS (return from subroutine). When this instruction was executed the
fault occurred because the 68K prefetches 2 to 4 bytes ahead of where
it's executing. The prefetch was into nonexistent memory, hence the
external logic produced the fault. This is a classic problem with
pipelined systems.

The fix for the kernel was simple - just ensure a few bytes of unused
padding at the end of physical memory.

However, the problem exists for user-mode programs as well. Suppose you
have a shared text program where the code is exactly some multiple of
the basic block size used by the MMU (1k on our system). Further suppose
that your kernel allocates exactly that amount of memory. If the
processor prefetches, the MMU will fault (and the program dump core)
when the very last instruction is executed.
The MMU, of course, has no way of knowing that the processor would never
have actually used those bytes. Non-shared text programs don't have the
problem, since there is usually data and/or stack above the code.

There are a number of solutions. The loader could always pad shared text
images by a few bytes. Perhaps a better solution is to have the exec
code in the kernel check to see if the shared text segment is exactly a
multiple of the MMU block size, and allocate an extra block.
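
The two solutions above might be sketched roughly as follows. This is
only an illustration, not code from any real kernel or loader: the
function names, the 4-byte pad, and the treatment of an empty text
segment are all assumptions made for the example. Only the 1k block
size and the "2 to 4 bytes" prefetch figure come from the description
above.

```c
/* Hypothetical constants. The 1k block size is the one quoted for the
 * system described above; the 4-byte pad is an assumed worst case
 * taken from the "2 to 4 bytes" prefetch figure. */
#define MMU_BLOCK_SIZE   1024UL
#define TEXT_PAD_BYTES   4UL

/* Loader-side option (hypothetical helper): always pad the shared
 * text image by a few bytes, so the last instruction is never the
 * last thing in the image. */
unsigned long loader_padded_text_size(unsigned long text_bytes)
{
    return text_bytes + TEXT_PAD_BYTES;
}

/* Exec-side option (hypothetical helper): compute how many MMU blocks
 * to allocate for a shared text segment. The size is rounded up to
 * whole blocks as usual; if it is an exact multiple of the block
 * size, one extra block is added so a prefetch past the final
 * instruction lands in mapped memory. */
unsigned long exec_text_blocks(unsigned long text_bytes)
{
    unsigned long blocks;

    if (text_bytes == 0)
        return 0;   /* assumption: nothing to map, nothing to pad */

    /* Ordinary round-up division to whole blocks. */
    blocks = (text_bytes + MMU_BLOCK_SIZE - 1) / MMU_BLOCK_SIZE;

    /* Exact multiple: the last instruction ends flush against the
     * end of the allocation, so add a spare block. */
    if (text_bytes % MMU_BLOCK_SIZE == 0)
        blocks++;

    return blocks;
}
```

With a 1k block, exec_text_blocks(2048) gives three blocks rather than
two, while exec_text_blocks(2000) still gives two. Note that the sketch
follows the exact-multiple test literally; if the prefetch really does
reach 4 bytes, an image ending 2 bytes short of a block boundary might
also be affected, and this check would not catch that case.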