Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rutgers!ucla-cs!zen!ucbvax!jade!saturn!ucscc.UCSC.EDU!haynes
From: haynes@ucscc.UCSC.EDU.ucsc.edu (99700000)
Newsgroups: comp.unix.wizards,comp.arch
Subject: Re: *Why* do modern machines mostly have 8-bit bytes?
Message-ID: <565@saturn.ucsc.edu>
Date: Wed, 22-Jul-87 19:41:37 EDT
Article-I.D.: saturn.565
Posted: Wed Jul 22 19:41:37 1987
Date-Received: Sat, 25-Jul-87 12:37:59 EDT
References: <142700010@tiger.UUCP> <2792@phri.UUCP> <8315@utzoo.UUCP> <2807@phri.UUCP>
Sender: usenet@saturn.ucsc.edu
Reply-To: haynes@ucscc.UCSC.EDU (Jim Haynes)
Organization: California State Home for the Weird
Lines: 75
Xref: mnetor comp.unix.wizards:3402 comp.arch:1678

Several old one-of-a-kind machines had 40-bit words because there was no floating point in those days, and the people using them thought 40 bits was about enough precision. With the really ancient mercury delay line memory machines you could have just about any word size you wanted, because the amount of hardware was nearly the same regardless of word size; it was just a speed/precision tradeoff. Since addresses were small (small memories) and words were wide, they often had multi-address architectures.

36 bits was popular because 36 has more divisors than any other number of about that size (1, 2, 3, 4, 6, 9, 12, 18), so you could conveniently pack various sizes of operands into integral words. And in those days of punched cards and upper-case-only text, 6 bits was enough for an alphanumeric character set.

8 bits as a byte size came about for a number of rational and emotional reasons. A decimal digit takes four bits, so you can pack two of them into an 8-bit byte. 6 bits wasn't enough for alphanumeric characters with upper and lower case, and if we were going to go to all the trouble of a new character set, we probably should make it plenty big; hence 7 bits might not be enough.
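
To make the two packing arguments above concrete, here is a small Python sketch (Python 3.9+ assumed; the helper names `divisors`, `pack_bcd` and `unpack_bcd` are illustrative, not taken from any historical system). It counts the divisors of word sizes near 36 and packs two decimal digits into one 8-bit byte, four bits per digit.

```python
"""Sketch: why 36-bit words and 8-bit bytes pack data conveniently.

The helper names here are illustrative only.
"""


def divisors(n: int) -> list[int]:
    """Return every positive divisor of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def pack_bcd(tens: int, ones: int) -> int:
    """Pack two decimal digits into one 8-bit byte, 4 bits per digit."""
    if not (0 <= tens <= 9 and 0 <= ones <= 9):
        raise ValueError("each digit must be in 0..9")
    return (tens << 4) | ones  # high nibble = tens, low nibble = ones


def unpack_bcd(byte: int) -> tuple[int, int]:
    """Recover the two decimal digits from a packed-decimal byte."""
    return (byte >> 4) & 0xF, byte & 0xF


if __name__ == "__main__":
    # Word sizes "of about that size": count the divisors of 30..42.
    for size in range(30, 43):
        print(size, len(divisors(size)), divisors(size))
    packed = pack_bcd(4, 2)
    print(hex(packed), unpack_bcd(packed))  # 0x42 (4, 2)
```

In the range 30 to 42, 36 comes out with 9 divisors, against 8 for 30, 40 and 42 and only 6 for 32.
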
The widest punched paper tape equipment in production was 8 bits wide, and a lot of people thought 7-bit ASCII should be punched with a parity bit added. 7 is a prime number, whereas 8 has lots of divisors (1, 2, 4, 8), so aside from decimal digits there were other kinds of things that might be packed into an 8-bit byte.

The IBM Project Stretch furnished a lot of ideas that were used in S/360, as well as a lot that were not. An 8-bit character set was designed for Stretch, along with a 64-bit word and addressability down to the bit. If you're going to address down to the bit, or down to the byte, you would like to have addresses with no unused bit combinations, for maximum information density. For instance, if you have 6 bytes per word, then the byte part of an address goes 000, 001, 010, 011, 100, 101, and the combinations 110 and 111 are never used. Aside from the wasted information density, this complicates arithmetic on addresses: you'd have to do the byte part modulo 6 and the rest of it in binary. So IBM decided a 32-bit or 64-bit word was a reasonable way to go, and the rest of the world had to follow.

(Personally I'm partial to 48-bit floating point as in the Burroughs machines, but... The B6500 and later had to accommodate 4-bit packed decimal and 8-bit bytes, and for reasons of compatibility they also handle 6-bit characters and floating-point numbers with a base-8 exponent. All these different data sizes must have complicated the machine enormously.)

The PDP-10 scheme for arbitrary-size bytes looks pretty good to me. Of course there is some wasted space in the word if the byte size does not divide 36 evenly. The GE/Honeywell scheme (36-bit words with 6- and 9-bit byte sizes) leads to a lot of grubbiness. Either way there are annoyances.
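
A minimal Python sketch of the address-arithmetic point, assuming a hypothetical packed address whose low 3 bits select the byte within a word and whose upper bits select the word. With 8 bytes per word every 3-bit code is valid, so advancing by n bytes is ordinary binary addition; with 6 bytes per word the byte field has to be added modulo 6 and the carry moved into the word field by hand. The `fit` helper (also hypothetical) shows how many bytes of a given size fit in a 36-bit word and how many bits are left over, in the spirit of the PDP-10 arbitrary-byte-size scheme.

```python
"""Sketch: byte addressing with 8 vs. 6 bytes per word, and byte fit in
a 36-bit word. The address layout and helper names are hypothetical."""

BYTE_FIELD_BITS = 3  # low 3 bits of an address select the byte in a word
BYTE_FIELD_MASK = (1 << BYTE_FIELD_BITS) - 1


def advance_8_per_word(addr: int, n: int) -> int:
    """8 bytes per word: all eight 3-bit codes are used, so a carry out of
    the byte field flows into the word field by plain binary addition."""
    return addr + n


def advance_6_per_word(addr: int, n: int) -> int:
    """6 bytes per word: codes 110 and 111 are unused, so the byte field
    is added modulo 6 and the carry is applied to the word field."""
    word = addr >> BYTE_FIELD_BITS
    byte = addr & BYTE_FIELD_MASK  # assumed valid, i.e. 0..5
    total = byte + n
    word += total // 6
    byte = total % 6
    return (word << BYTE_FIELD_BITS) | byte


def fit(word_bits: int, byte_bits: int) -> tuple[int, int]:
    """Return (bytes per word, leftover bits) when bytes of byte_bits
    are packed into a word of word_bits and the remainder goes unused."""
    per_word = word_bits // byte_bits
    return per_word, word_bits - per_word * byte_bits


if __name__ == "__main__":
    last = 0b101  # word 0, byte 5: the last valid byte with 6 per word
    print(bin(last + 1))                      # 0b110, an unused code
    print(bin(advance_6_per_word(last, 1)))   # 0b1000: word 1, byte 0
    for size in (6, 7, 8, 9):
        print(size, fit(36, size))
```

With 36-bit words, 6- and 9-bit bytes fit exactly, 7-bit characters fit five to a word with one bit to spare, and 8-bit bytes leave four bits unused in every word.
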
In a PDP-10, if you want to write data from memory to tape, you have to choose whether to write integral 36-bit words, or whether the data are bytes that should be written byte-by-byte to tape with the unused bits in each word left out. The Honeywell machines represent ASCII in 9-bit bytes, so when you write to tape you similarly have to decide whether to write all the bits in the word, or only the 8 bits of each byte that hold an ASCII character and omit the rest. In our 8/16/32/64-bit machines, by contrast, you just write all the bits to tape as they come, regardless of what the bits mean. You may have a problem with characters written backwards (the big-endian versus little-endian problem), but at least you don't lose any bits. Not that tape has to be 8 bits wide either; but if the dominant vendor makes 8-bit-wide tapes, any other vendor had better be able to read and write them.

haynes@ucscc.ucsc.edu
haynes@ucscc.bitnet
..ucbvax!ucscc!haynes