Path: utzoo!attcan!uunet!husc6!think!ames!pasteur!ucbvax!hplabs!nsc!curry
From: curry@nsc.nsc.com (Ray Curry)
Newsgroups: comp.sys.nsc.32k
Subject: Re: NSC532MB "THE '532 Project"
Keywords: questionaire, the Vision
Message-ID: <8109@nsc.nsc.com>
Date: 28 Nov 88 17:58:30 GMT
References: <445@sdrc.UUCP> <1061@raspail.UUCP>
Reply-To: curry@nsc.nsc.com.UUCP (Ray Curry)
Organization: National Semiconductor, Sunnyvale
Lines: 48

In article <1061@raspail.UUCP> bga@raspail.UUCP (Bruce Albrecht) writes:
>I'm not very hardware knowledgeable, so bear with me.  I also don't have any
>'532 datasheets, although my local sales office has ordered the 1988 databook
>for me.
>
>If we were to use the 25 Mhz '532 out of the 532DK (can we do this?  Can the
>'532 be removed from the 532DK PC board?), and 100 ns DRAM, are there any
>problems?  I realize that any memory request would need wait states.  Would
>we be able to use burst mode?  The '532 has onboard cache.  Wouldn't it have
>a large enough hit rate that the '532 would still have a very acceptable 2-3
>MIPS rating?  If we went this route, what sort of special hardware might we
>need?

Perhaps I can answer several questions at the same time.  The 532DK comes
totally unassembled, including a socket for the NS32532, so the CPU is
available.  Secondly, the '532 interfaces pretty well with 100 ns memory.
If you don't have an external cache that requires a tag table search, the
memory available time is two bclocks (80ns) less address valid (8ns) and
less data input setup (11ns).  This available time is distributed over
address and data buffers as well as the memory access time.  With each
wait state, this increases by 40 ns.  Depending upon the buffers used and
decode technique, you should be able to run 100 ns memory with 2 waits.
You can also consider running at 20 MHz and get by with 1 wait, or use
80 ns memory with 1 wait at 25 MHz.  To get the most performance either way,
you would want to use burst memory access which uses one clock for each
of the next 3 reads.  To do this with your 100 ns memory, you would have
to interleave memory banks so that while one bank is being read, the
address is setup to the second bank.  (Or consider using the nibble mode
RAM from INMOS).  
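
The arithmetic above can be sketched as follows.  This is only a rough
illustration: the function names are made up here, the 8 ns and 11 ns
figures are the 25 MHz numbers quoted above and are simply assumed to
hold at 20 MHz as well, and the burst count assumes each access costs
two bclocks plus waits, with one bclock for each of the 3 following
burst reads.

```python
# Rough timing budget, using the figures quoted in the post.
# Assumption: address valid (8 ns) and data setup (11 ns) are the same
# at 20 MHz as at 25 MHz; check the datasheet before relying on this.

def available_ns(bclock_ns, waits, addr_valid_ns=8, data_setup_ns=11):
    """Time left for buffers, decode and memory access in one read."""
    return (2 + waits) * bclock_ns - addr_valid_ns - data_setup_ns

def clocks_for_four_reads(waits, burst):
    """Bclocks to read 4 words, with or without burst mode."""
    first = 2 + waits
    return first + 3 if burst else 4 * first

for mhz, bclock in ((25, 40), (20, 50)):
    for waits in range(3):
        print(mhz, "MHz,", waits, "waits:",
              available_ns(bclock, waits), "ns available")

print("4 reads, 2 waits, no burst:", clocks_for_four_reads(2, False), "bclocks")
print("4 reads, 2 waits, burst:   ", clocks_for_four_reads(2, True), "bclocks")
```

Whatever the buffers and decode logic consume comes out of the
available figure, which is why 100 ns memory needs the second wait
at 25 MHz.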

As to performance under these circumstances, the main reason for the
external cache on the VME532 board is to allow easy (slow) memory access
to main memory.  Running out of main memory on the VME532 board uses 
something like 7 waits.  In running Dhrystones, I noticed about 10%
degradation for the first wait state and am told by our Israeli chip
experts that Dhrystones are a bad case, worse than average.

My advice for a good but not absolutely highest performance UNIX or
similar machine would be to do just that: simplify the memory
design, forget an external cache, and use burst mode.

Another posting asked whether the MMU took an extra 50 ns.  Accesses
with the MMU enabled do not take extra time as long as the translation table
entry is in the MMU's TLB.  Each time the CPU enters a new page that
does not have an entry in the TLB, the MMU will do a lookup in the
external tables.  The table entries stored in the MMU are the 64 most
recently used.  Since the pages are 4K, the effect of the table lookups
is minimal.
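
To put a rough number on that, here is a small sketch that counts TLB
misses for a sequential scan.  It assumes a strict least-recently-used
replacement of the 64 entries, which is my reading of "most recently
used" and not a statement of how the chip actually replaces entries;
the function name is made up for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096    # 4K pages, as stated in the post
TLB_ENTRIES = 64    # entries held in the MMU, as stated in the post

def tlb_misses(addresses, entries=TLB_ENTRIES, page_size=PAGE_SIZE):
    """Count table lookups for an address trace (assumed strict LRU)."""
    tlb = OrderedDict()
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                  # lookup in the external tables
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # drop the least recently used
    return misses

# Sequential 32-bit reads across 1 MB of memory.
trace = range(0, 1 << 20, 4)
print(tlb_misses(trace), "misses in", len(trace), "accesses")
print("TLB reach:", TLB_ENTRIES * PAGE_SIZE // 1024, "KB")
```

Under these assumptions a sequential scan misses once per page, or once
in every 1024 word accesses, and the 64 entries cover 256 KB at a time.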