Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site redwood.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!sdcsvax!sdcrdcf!hplabs!hpda!fortune!foros1!redwood!rpw3
From: rpw3@redwood.UUCP (Rob Warnock)
Newsgroups: net.micro,net.micro.68k
Subject: Re: UniPress System V for Lisa, reviews?
Message-ID: <51@redwood.UUCP>
Date: Thu, 27-Sep-84 22:40:45 EDT
Article-I.D.: redwood.51
Posted: Thu Sep 27 22:40:45 1984
Date-Received: Sun, 30-Sep-84 04:35:39 EDT
References: <1190@pucc-h> <231@pertec.UUCP> <815@dual.UUCP>
Organization: Rob Warnock, Redwood City, CA
Lines: 85
Xref: 7992 410

+---------------
| Slow disks on the Lisa? I don't know what YOU all expect, but any
| disk running through a parallel port is bound to be slow...
| 
|     Mats Wichmann
|     Dual Systems Corp.
|     ...{ucbvax,amd,ihnp4,cbosgd,decwrl,fortune}!dual!mats
+---------------

Not necessarily as bad as you make it sound. It is possible that the Lisa's
PIO was so hard to program that no possible improvement could be made in the
performance, but I doubt that. (The 32:16's early PIO [see below] was about
as plain as one could get, and it did o.k.)

It is more likely that it was a quick & dirty port of a "standard" kernel,
without a lot of time spent tuning. [Anyone who actually worked on the
port want to comment on that?]

As a partial counter-example, the first units of the Fortune Systems 32:16
(the ones that went to dealers for demos) had an off-the-shelf SASI-to-ST506
controller board (I forget whose), driven byte-at-a-time by a simple non-DMA
parallel port.  (The custom DMA-based controller wasn't ready yet.) While
it was certainly much slower than the final production hardware/software
combination, surprisingly little of that was due to the parallel interface.
The major improvements were in selecting the proper "software interlace", and
certain changes to the handling of the disk buffer cache. As I recall (and just
confirmed by a quick back-of-the-envelope calculation), the overhead of the
byte-at-a-time interface was about 10 milliseconds/block (1 block = 1024 bytes).

Now while 10 ms/blk of CPU time sounds horrendous, remember that this was
on early ST506-type disks with an average access of ~100 ms., rotational
latency of 16.66+ ms., 2-4 heads/cylinder (compared to 8 or more today),
and a non-tuned filesystem/buffer-cache of "several" milliseconds processing
per request. I doubt if the parallel port cost more than 10-15% in overall
performance.  Particularly in a single-user situation, I doubt the average
user would notice.
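
[A rough sketch of that back-of-the-envelope arithmetic, in Python. The
10 ms/block, 1024-byte block, ~100 ms access and 16.66 ms rotational
figures are the ones quoted above; the 5 ms used for filesystem and
buffer-cache processing is purely an assumption standing in for "several"
milliseconds, and the model assumes every request pays a full average
access with nothing overlapped.]

```python
# Back-of-the-envelope model of one disk block request on an early
# ST506-class disk, using the figures quoted in the text above.
BLOCK_BYTES = 1024    # block size, from the text
PIO_MS = 10.0         # byte-at-a-time overhead per block, from the text
ACCESS_MS = 100.0     # average access time, from the text
ROTATION_MS = 16.66   # rotational latency figure, from the text
FS_MS = 5.0           # ASSUMPTION: stand-in for "several" ms of fs/cache work


def per_byte_us(pio_ms=PIO_MS, block_bytes=BLOCK_BYTES):
    """CPU cost per byte moved through the parallel port, in microseconds."""
    return pio_ms * 1000.0 / block_bytes


def pio_share(pio_ms=PIO_MS, access_ms=ACCESS_MS,
              rotation_ms=ROTATION_MS, fs_ms=FS_MS):
    """Fraction of one request's elapsed time spent in the PIO loop,
    assuming each request pays a full average access and nothing overlaps."""
    total_ms = access_ms + rotation_ms + fs_ms + pio_ms
    return pio_ms / total_ms


if __name__ == "__main__":
    print("per byte:  %.1f us" % per_byte_us())
    print("PIO share: %.1f%%" % (100.0 * pio_share()))
```

[With those inputs the port costs a little under 10 microseconds per byte
and about 7.6% of each request. Requests that need no seek would push the
share higher, so the number is only as good as the assumed mix.]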

(Tuning the interlace got them at least a 50% improvement. Using faster
access-time disks was worth nearly the exact ratio of access times, in
multi-user scenarios.  8-head disks bought a few percent, particularly
in swapping and program loading, since there was more data under the
heads as a result of each seek.)

The Lisa, however (at least the one I saw at a show, sitting side-by-side
with a Radio Shack Model 16, BOTH running Xenix), was a dog.
Compile times for a short program were easily double those of the Mod 16
(and maybe worse), while the Mod 16 and the Fortune had similar times
(19-22 seconds, now down to about 9 sec. for the Fortune System XP with
30 ms disks).

This points up the usual error in performance analysis -- pointing the
finger at that feature which is the most visible and for which the pointer
has the greatest personal distaste. The one lesson I have learned and seen
others learn again and again in this area is: performance is not a field
where (untrained) intuition is very accurate. [I put the "(untrained)" in
there because I have met performance-analysis people who had laboriously
trained their intuition to give reasonable results. For myself, I know just
enough to know not to trust it.]

True story:
A long time ago, I used to help design/sell/support terminal multiplexer
systems, transferring character-at-a-time/interrupt-per-character (like a
DZ-11), that outperformed by a factor of 2 a competing front-end system
with co-processors, DMA, and fancy command queues -- simply because the
fancy system had too high an overhead in the setting-up and tearing-down
of requests.  Yet customers sometimes were convinced (by the other guy)
that DMA "had to be" faster. It was, for that tiny fraction of time
spent actually passing data across the interface. But the total system
overhead was less in the non-DMA system (INCLUDING the host CPU time),
and the overhead wasn't in the critical path of the inner loop of sending
characters, so the net performance of the non-DMA system was much better.
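
[A toy cost model of that trade-off, in Python. Every number in it is
hypothetical, picked only to show the shape of the argument; none of them
are measurements from either product. The function names are likewise
made up for this sketch.]

```python
import math


def interrupt_cost_us(n_chars, per_char_us=50.0):
    """Cost of sending n_chars with one interrupt per character.
    per_char_us is a HYPOTHETICAL figure."""
    return n_chars * per_char_us


def dma_cost_us(n_chars, setup_teardown_us=100.0, per_char_us=1.0):
    """Cost of sending n_chars as one DMA request: a fixed set-up and
    tear-down charge plus a small per-character cost.
    Both defaults are HYPOTHETICAL figures."""
    return setup_teardown_us + n_chars * per_char_us


def break_even_chars(int_per_char_us=50.0, setup_teardown_us=100.0,
                     dma_per_char_us=1.0):
    """Smallest request size (in characters) at which the DMA request is
    strictly cheaper, or None if it never is."""
    saving_per_char = int_per_char_us - dma_per_char_us
    if saving_per_char <= 0:
        return None
    return math.floor(setup_teardown_us / saving_per_char) + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 16):
        print(n, interrupt_cost_us(n), dma_cost_us(n))
    print("DMA wins from", break_even_chars(), "chars per request")
```

[With these made-up numbers a one-character request costs about twice as
much through the DMA path, and DMA only wins once a request carries three
or more characters. If most requests are shorter than the break-even
size, the fixed overhead dominates, which is the effect described above.]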

(Of course, knowing what we did about the critical paths, we could have
built a DMA-based system that would have been faster than BOTH the others...
but its cost/performance didn't look as good as the non-DMA version.)

So I remain curious about WHY the Lisa disks are so slow, and unconvinced
it was (entirely) the fault of the byte-parallel interface.

Rob Warnock

UUCP:	{ihnp4,ucbvax!amd}!fortune!redwood!rpw3
DDD:	(415)369-7437
Envoy:	rob.warnock/kingfisher
USPS:	Suite 203, 4012 Farm Hill Blvd, Redwood City, CA  94061