Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rutgers!ucla-cs!zen!ucbvax!jade!saturn!ucscc.UCSC.EDU!haynes
From: haynes@ucscc.UCSC.EDU.ucsc.edu (99700000)
Newsgroups: comp.unix.wizards,comp.arch
Subject: Re: *Why* do modern machines mostly have 8-bit bytes?
Message-ID: <565@saturn.ucsc.edu>
Date: Wed, 22-Jul-87 19:41:37 EDT
Article-I.D.: saturn.565
Posted: Wed Jul 22 19:41:37 1987
Date-Received: Sat, 25-Jul-87 12:37:59 EDT
References: <142700010@tiger.UUCP> <2792@phri.UUCP> <8315@utzoo.UUCP> <2807@phri.UUCP>
Sender: usenet@saturn.ucsc.edu
Reply-To: haynes@ucscc.UCSC.EDU (Jim Haynes)
Organization: California State Home for the Weird
Lines: 75
Xref: mnetor comp.unix.wizards:3402 comp.arch:1678

Several old one-of-a-kind machines had 40-bit words because there was
no floating point in those days and the people using them thought
40 bits was about enough.

With the really ancient mercury delay line memory machines you could
have just about any word size you wanted, because the amount of hardware
was nearly the same regardless of word size.  It was just a
speed/precision tradeoff.  Since addresses were small (small memories) and
words were wide they often had multi-address architectures.

36 bits was popular because it has more divisors than any other number
of about that size (1,2,3,4,6,9,12,18), so you could conveniently pack
various sizes of operands into integral words.  AND in those days of
punched cards and upper-case-only, 6 bits was enough for an alphanumeric
character set.
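
The divisor argument is easy to check with a few lines of Python. This
is only an illustrative sketch: the function name and the choice of 30
through 42 as "about that size" are assumptions made here, not anything
from the machines of the period.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

# Count divisors for candidate word sizes. The range 30..42 is an
# arbitrary reading of "about that size".
counts = {n: len(divisors(n)) for n in range(30, 43)}
best = max(counts, key=counts.get)

print(divisors(36))        # [1, 2, 3, 4, 6, 9, 12, 18, 36]
print(best, counts[best])  # 36 9
```

Every proper divisor in that list is a field width that tiles a 36-bit
word with nothing left over, which is the packing convenience described
above.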

8 bits as a byte size came about for a number of rational and emotional
reasons.  A decimal digit takes four bits, so you can pack two of them
into an 8-bit byte.  6 bits wasn't enough for alphanumeric characters
with upper and lower case, and if we were going to go to all the trouble
of a new character set we probably should make it plenty big, hence
7 bits might not be enough.  The widest punched paper tape equipment
in production was 8 bits wide, and a lot of people thought 7-bit 
ASCII should be punched with a parity bit added.  7 is a prime number,
whereas 8 has lots of divisors (1,2,4,8) so aside from decimal digits
there were other kinds of things that might be packed into an 8-bit byte.
The IBM Project Stretch furnished a lot of ideas that were used in S/360,
as well as a lot that were not.  An 8-bit character set was designed
for Stretch, along with a 64-bit word and addressability down to the
bit.

If you're going to address down to the bit, or down to the byte, you
would like to have addresses with no unused bit combinations, for
maximum information density.  For instance, if you have 6 bytes
per word then the byte part of an address goes 000, 001, 010, 011,
100,101   and then the combinations 110 and 111 are not used.
Aside from the wasted information density this leads to complexity
in doing arithmetic on addresses - you'd have to do the byte part
modulo 6 and the rest of it in ordinary binary.
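
The address-arithmetic point can be sketched in Python. This is a
hypothetical model, not any real machine's addressing hardware: the
(word, byte) pair and both function names are invented here for
illustration.

```python
def next_byte_address(word, byte, bytes_per_word):
    """Advance a (word, byte) address by one byte.

    The byte field counts modulo bytes_per_word and carries into the
    word field, which is ordinary binary arithmetic.
    """
    byte += 1
    if byte == bytes_per_word:
        byte = 0
        word += 1
    return word, byte

def unused_codes(bytes_per_word):
    """Count bit patterns of the byte field that never occur."""
    field_bits = (bytes_per_word - 1).bit_length()
    return (1 << field_bits) - bytes_per_word

print(next_byte_address(4, 5, 6))  # (5, 0): carry out of the modulo-6 field
print(unused_codes(6))             # 2, the patterns 110 and 111
print(unused_codes(8))             # 0, every pattern is used

# With 8 bytes per word the pair collapses into one binary number,
# so stepping to the next byte is a plain add with no special carry.
flat = (4 << 3) | 7                # word 4, byte 7
flat += 1
print(flat >> 3, flat & 7)         # 5 0
```

With a power-of-two number of bytes per word the special-case carry
disappears, which is the simplification the paragraph above is after.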

So IBM decided a 32 bit or 64 bit word was a reasonable way to go,
and the rest of the world had to follow.

(Personally I'm partial to 48 bit floating point as in the Burroughs
machines, but...  The B6500 and later had to accommodate 4-bit packed
decimal and 8-bit bytes, and for reasons of compatibility they also
handle 6-bit characters and floating point numbers with the exponent
being a power of eight.  All these different data sizes must have
complicated the machine enormously.)

The PDP-10 scheme for arbitrary size bytes looks pretty good to me.
Of course there is some wasted space in the word if the byte size
is not an integral sub-multiple of 36.  The GE/Honeywell scheme
(36 bit words with 6 and 9 bit byte sizes) leads to a lot of
grubbiness.  Either way there are annoyances.  In a PDP-10 if you
want to write data from memory to tape you have to choose whether
to write integral 36-bit words, or whether the data are bytes and
should be written byte-by-byte to tape and the unused bits in the
word left out.  In the Honeywell machines they can represent ASCII
in 9-bit bytes.  So if you write to tape you similarly have
to decide whether to write all the bits in the word, or only the 8 bits
of each byte that contain an ASCII character, omitting the other
bits.  Whereas in our 8/16/32/64 bit machines you just write all
the bits to tape as they come regardless of what the bits mean.
You may have a problem with characters written backwards (the
big-endian versus little-endian problem) but at least you don't lose
any bits.  Not that tape has to be 8 bits wide either; but if
the dominant vendor makes 8 bit wide tapes any other vendor had better
be able to read and write them.
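
The wasted-space arithmetic for a 36-bit word can be sketched as
follows. This is a rough Python model, not the PDP-10 or Honeywell byte
instructions themselves: the function names are made up for the
example, and packing the bytes left-justified with the spare bits at
the low end is an assumption of the sketch.

```python
WORD_BITS = 36

def layout(byte_bits):
    """Return (bytes per word, unused bits) for a given byte size."""
    return WORD_BITS // byte_bits, WORD_BITS % byte_bits

def pack(values, byte_bits):
    """Pack small integers into one 36-bit word, left-justified."""
    per_word, unused = layout(byte_bits)
    assert len(values) <= per_word
    word = 0
    for v in values:
        assert 0 <= v < (1 << byte_bits)
        word = (word << byte_bits) | v
    # Pad any missing bytes, then shift past the unused low-order bits.
    word <<= byte_bits * (per_word - len(values))
    return word << unused

for size in (6, 7, 8, 9):
    print(size, layout(size))
# 6 (6, 0)
# 7 (5, 1)
# 8 (4, 4)
# 9 (4, 0)

# Five 7-bit characters fill 35 bits and leave one bit over.
word = pack([ord(c) for c in "HELLO"], 7)
print(format(word, "036b"))
```

Only the 6 and 9 bit sizes tile the word exactly.  With 7 or 8 bit
bytes the leftover bits are what force the tape-writing decision:
write whole words, padding included, or write only the bytes.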


haynes@ucscc.ucsc.edu
haynes@ucscc.bitnet
..ucbvax!ucscc!haynes