Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!godot!johnl
From: johnl@godot.UUCP
Newsgroups: net.arch
Subject: Re: Arbitrary byte alignment
Message-ID: <426@ima.UUCP>
Date: Mon, 8-Oct-84 23:35:20 EDT
Article-I.D.: ima.426
Posted: Mon Oct  8 23:35:20 1984
Date-Received: Wed, 10-Oct-84 06:08:33 EDT
Lines: 0
Nf-ID: #R:houxl:-47000:ima:4600002:000:1745
Nf-From: ima!johnl    Oct  8 16:53:00 1984

There seem to have been four stages of byte addressing philosophy.

1.  Prehistoric: Machines like the 1620 and Z80, which were addressed a
digit (or byte) at a time and built that way.  No alignment constraints,
since misalignment had no performance implications.

2.  Early, such as IBM System/360 and the PDP-11:  Byte addressed but word-
implemented.  Objects must be aligned on "natural" boundaries, i.e. multiples
of their own size, and you get a program fault if they're not.  Sometimes
software caught the faults and made it appear that arbitrary alignment was
possible, although very slowly.
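
As a rough illustration of stage 2, here is a minimal Python sketch of a
machine that faults on misaligned loads, plus a software handler that hides
the fault.  The names (AlignmentFault, load_aligned, load_any) and the
big-endian byte order are my assumptions for the example, not any real
system's interface.

```python
class AlignmentFault(Exception):
    """Stands in for the hardware's program fault on a misaligned access."""


def load_aligned(mem, addr, size):
    """Load a big-endian integer; fault unless addr is a multiple of size."""
    if addr % size != 0:
        raise AlignmentFault(f"address {addr:#x} is not a multiple of {size}")
    return int.from_bytes(mem[addr:addr + size], "big")


def load_any(mem, addr, size):
    """Try the fast path; on a fault, rebuild the value one byte at a time."""
    try:
        return load_aligned(mem, addr, size)
    except AlignmentFault:
        value = 0
        for i in range(size):
            # Byte loads are always aligned, so these can never fault.
            value = (value << 8) | load_aligned(mem, addr + i, 1)
        return value


mem = bytes(range(16))            # hypothetical 16-byte memory
print(hex(load_any(mem, 4, 4)))   # aligned: a single access
print(hex(load_any(mem, 5, 4)))   # misaligned: fault, then four byte loads
```

The slow path costs a fault plus one access per byte, which is roughly why
the emulated version was so slow.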

3.  Decadent, such as IBM System/370 and VAX:  Assembler programmers
complained about having to align stuff, so misalignment was handled in microcode.
There's still a penalty for misalignment, but it's not so bad.
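
One way to picture the remaining penalty is to count how many word-wide
memory cycles an access needs.  This is a sketch under the assumption of a
4-byte memory word; word_accesses is a made-up helper, and real microcode
has other overheads that it ignores.

```python
def word_accesses(addr, size, word=4):
    """Count the word-wide memory cycles needed to fetch size bytes at addr."""
    first = addr // word               # index of the first word touched
    last = (addr + size - 1) // word   # index of the last word touched
    return last - first + 1


for addr in (8, 9, 10, 11):
    print(addr, word_accesses(addr, 4))
```

An aligned 4-byte operand takes one cycle, while the same operand at any
other offset straddles two words and takes two.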

4.  Post-modern, such as Pyramid 90X, Berkeley RISC, and Stanford MIPS:
Hardware and software designers start to talk to each other, and find that
a) teaching compilers to deal with alignment isn't that hard, and b) if you
do so, you buy back a lot of performance.
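
To show why point a) isn't that hard, here is a sketch of the bookkeeping a
compiler might do when laying out a record.  It assumes every field's
alignment equals its size and that sizes are powers of two; the layout
function and the field names are hypothetical.

```python
def layout(fields):
    """Assign offsets to (name, size) fields, padding each to its own size.

    Assumes natural alignment equals the field size (a power of two).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        offset = (offset + size - 1) & ~(size - 1)  # round up to a multiple of size
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    # Pad the tail so that elements of an array of records stay aligned.
    total = (offset + max_align - 1) & ~(max_align - 1)
    return offsets, total


as_written = [("flag", 1), ("count", 4), ("tag", 1), ("total", 8)]
by_size = sorted(as_written, key=lambda f: -f[1])  # largest fields first
print(layout(as_written))
print(layout(by_size))
```

With these fields the order as written needs 24 bytes and the size-sorted
order needs 16, so the padding cost can often be reduced by reordering
where the language permits it.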

There have also been strange intermediate stages, such as at least one
post-modern machine that enforces alignment by ignoring the low-order bits
of the address.
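
A sketch of what ignoring the low-order bits amounts to, assuming
power-of-two operand sizes and big-endian byte order.  The function names
are mine, and this is not modeled on any particular machine.

```python
def effective_address(addr, size):
    """Drop the low-order address bits so the result is a multiple of size."""
    return addr & ~(size - 1)


def load_masked(mem, addr, size):
    """Load from the rounded-down address; a misaligned addr never faults."""
    base = effective_address(addr, size)
    return int.from_bytes(mem[base:base + size], "big")


mem = bytes(range(16))              # hypothetical 16-byte memory
print(hex(effective_address(0x1003, 4)))
print(hex(load_masked(mem, 6, 4)))  # quietly returns the word at address 4
```

Nothing traps here, so a program with a misaligned pointer simply gets the
wrong word.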

I suppose it would be possible to have fiendishly clever memory designs
where adjacent words were always in different memory banks so you could
cycle both at the same time.  Sounds pretty awful, though, since you have
to determine for each memory reference how many memories to cycle and how
to splice the parts together.  As far as I can tell, it's never been
seriously proposed for implementation, except perhaps incidentally in very
large cached architectures such as the IBM 308X.
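
For concreteness, here is a toy model of such a memory: two banks of
4-byte words with adjacent words in alternate banks.  The sizes, names, and
big-endian order are all assumptions for the sketch.  It shows the two
chores mentioned above, deciding which banks to cycle and splicing the
parts together.

```python
WORD = 4    # bytes per memory word (assumed)
NBANKS = 2  # adjacent words alternate between two banks (assumed)


def make_banks(data):
    """Deal consecutive words of data out to the banks in rotation."""
    words = [data[i:i + WORD] for i in range(0, len(data), WORD)]
    return [words[b::NBANKS] for b in range(NBANKS)]


def load_spliced(banks, addr, size):
    """Fetch size bytes at any addr; returns (value, list of banks cycled)."""
    first = addr // WORD
    last = (addr + size - 1) // WORD
    if last - first + 1 > NBANKS:
        raise ValueError("access spans more words than there are banks")
    fetched = b""
    cycled = []
    for w in range(first, last + 1):
        bank, row = w % NBANKS, w // NBANKS
        cycled.append(bank)          # each word touched lives in a different bank
        fetched += banks[bank][row]
    start = addr % WORD              # splice: keep only the bytes asked for
    return int.from_bytes(fetched[start:start + size], "big"), cycled


banks = make_banks(bytes(range(16)))
print(load_spliced(banks, 4, 4))     # aligned: one bank cycled
print(load_spliced(banks, 6, 4))     # straddles two words: both banks cycled
```

Even in this toy the address decode and the splice have to happen on every
reference, which is the part that sounds awful in hardware.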

John Levine, ima!johnl