Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!syntron!orcisi!urip
From: urip@orcisi.UUCP
Newsgroups: comp.arch,comp.sys.intel,comp.sys.m68k
Subject: byte order: be reasonable - do it my way...
Message-ID: <760@orcisi.UUCP>
Date: Thu, 8-Jan-87 10:27:37 EST
Article-I.D.: orcisi.760
Posted: Thu Jan  8 10:27:37 1987
Date-Received: Thu, 8-Jan-87 19:28:39 EST
Distribution: net
Organization: Optical Recording Corporation, Toronto, Ontario
Lines: 415
Xref: syntron comp.arch:168 comp.sys.intel:80 comp.sys.m68k:93

Although it comes a little late, and some readers may by now have forgotten
the original article (On Holy Wars and a Plea for Peace by Danny Cohen),
I still hope that my article will find an audience.

My point is that the Least-Significant-Byte-first camp (LSBians,
pronounced: elesbians) has a more correct way than the
Most-Significant-Byte-first camp (MSBians, pronounced: emesbians), and I am
going to try to convince the MSBians to go my way.

Before I start with the main issue, let me comment about the side issue.
As someone whose native language is Hebrew and who also knows some
Arabic from school, I would like to confirm almost everything that was said 
in the article and in the responses about the order of digits etc. including
the examples from the Bible and computer terminals in Arabic/Hebrew. 
There was a slight inaccuracy about the way numbers are read in Arabic:
Only the units and tens are read the LSBian way, and the rest of
the number is read the MSBian way. For example, the year 1984 is read: 
"one thousand nine hundreds four and eighty". Also, for those who don't know,
the digit characters in Arabic are different from the Latin forms, but
in Hebrew they are the same.

The article was written in 1980, and things have changed since then.
Six years is a lifetime in the world of computers, and sentences like:

"I failed to find a Little-Endians' system which is totally consistent"

cannot be left without an objection in 1986 (almost 1987).
The Intel 80*86 microprocessors are true, consistent LSBians. They do not
have combined shift operations (the article suggested these as a good criterion
to tell between LSBians and MSBians), but the multiply operation sure leaves
the most significant part of the result in the high register, and the floating
point format is totally consistent with the rest of the data types.
The same is true for the National Series 32000 and I believe that Zilog
is with the LSBians too.

So in the microprocessor area, it seems that the Motorola 68000
is the only (though major...) MSBian around. Now, is it really as clean
and pure MSBian as claimed in the article? Let me refresh your memory
with a quote from the article:

"Hence, the M68000 is a consistent Big-Endian, except for its bit
designation, which is used to camouflage its true identity.
Remember: the Big-Endians were the outlaws."

The author did not try to claim that the funny floating point format
of the VAX was to camouflage the VAX's true identity, so why should one believe
that the LSBian bit order of the M68000 is because "the Big-Endians were
the outlaws"? I suspect that the true reason behind the inconsistency
of the M68000 is that only with an LSBian bit order is the value of
bit number 'i' in a word always equal to

		b[i] * 2^i

(where b[i] is 0 or 1 according to bit number 'i', and 2^i is 2 to the power 
of i) 
and the designers of M68000 wanted to keep this important feature in spite of 
the overall MSBian architecture.
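
The point about bit numbering can be sketched in C. This is only an
illustration: the function names are made up for this example, and the
"MSBian" numbering (bit 0 is the most significant bit) is a hypothetical
convention used for contrast. With LSBian numbering the weight of bit 'i'
is 2^i whatever the word size; with MSBian numbering the weight of the same
bit number changes with the width of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Value contributed by bit number i under LSBian numbering
   (bit 0 is the least significant bit): b[i] * 2^i. */
uint32_t bit_value_lsbian(uint32_t word, unsigned i)
{
    assert(i < 32);
    uint32_t b = (word >> i) & 1u;      /* b[i], either 0 or 1 */
    return b << i;                      /* b[i] * 2^i */
}

/* Same question under a hypothetical MSBian numbering, where bit 0 is
   the most significant bit of a 'width'-bit word. The weight is then
   2^(width-1-i), so it depends on the word size. */
uint32_t bit_value_msbian(uint32_t word, unsigned i, unsigned width)
{
    assert(width >= 1 && width <= 32 && i < width);
    unsigned pos = width - 1u - i;      /* physical position of the bit */
    uint32_t b = (word >> pos) & 1u;
    return b << pos;                    /* b[i] * 2^(width-1-i) */
}

/* Adding up the LSBian bit values reconstructs the word. */
uint32_t sum_of_bit_values(uint32_t word)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < 32; i++)
        sum += bit_value_lsbian(word, i);
    return sum;
}
```

For instance, bit_value_msbian(0x80, 0, 8) is 128 while
bit_value_msbian(0x8000, 0, 16) is 32768: the same bit number, two
different weights.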


There is another difference between LSBian and MSBian memory order that
was not mentioned in the article. 
In the LSBian scheme, if a long-word in memory contains a small value,
then a word or a byte at the same memory address still holds the same
value (if the value is small enough to fit into these).
For example, assume we have the value 0x00000002 in a (32 bit) long-word
at memory address 100.

                LSB in lower address

address           103  102  101  100
                +----+----+----+----+
value           | 00 | 00 | 00 | 02 | 
                +----+----+----+----+

Note that a long-word, short word, byte and even nibble at address 100 all
contain the value 2.
On the other hand,

                MSB in lower address

address           100  101  102  103
                +----+----+----+----+
value           | 00 | 00 | 00 | 02 | 
                +----+----+----+----+

Note that only a long-word at address 100 contains 2. All the rest contain 0.
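
The two pictures above can be checked on whatever machine is at hand with a
few lines of C. This is a sketch only; the helper names are invented for
this example, and memcpy is used to look at the bytes of a 32-bit object
without relying on pointer casts. On an LSBian machine the byte and the
16-bit word at the address of the long-word hold the same small value; on
an MSBian machine they hold 0.

```c
#include <stdint.h>
#include <string.h>

/* Byte stored at the lowest address of a 32-bit object. */
uint8_t byte_at_same_address(uint32_t value)
{
    uint8_t b;
    memcpy(&b, &value, sizeof b);       /* copies the first byte in memory */
    return b;
}

/* 16-bit word stored at the lowest address of a 32-bit object. */
uint16_t word_at_same_address(uint32_t value)
{
    uint16_t w;
    memcpy(&w, &value, sizeof w);       /* copies the first two bytes */
    return w;
}

/* Nonzero when the low-order byte sits at the lowest address (LSBian). */
int host_is_lsbian(void)
{
    return byte_at_same_address(1u) == 1u;
}
```

Calling byte_at_same_address(2) gives 2 on an LSBian host and 0 on an
MSBian one, which is exactly the difference between the two diagrams.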

This may not seem to be a key issue, but it has some significance in type 
conversion as illustrated by the following C program segment:

/*=================================*/
int	i;
char	ch;

ch = i;
/*=================================*/

The 'int' (assume int is 32 bits) value has to be converted to 'char' or byte. 
In LSBian, this conversion is just a simple 'movb' (move byte) instruction 
from 'i' to 'ch':

		movb	i, ch

since both byte and long-word contain the same value. 

In MSBian it may involve an expensive bit field instruction (or worse, 
shifts and ands). Luckily for the M68000, it is byte addressable, so the 
compiler can do the trick and generate:

		movb	i+3, ch

So it is still a simple machine instruction, but it involves a small trick.
Not clean, but still consistent, as long as we stick to byte addressable
memory. 
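
The 'i+3' trick can be made visible from C as well. The following is a
sketch under the assumption of a 32-bit integer; the function names are
hypothetical. It finds the offset of the low-order byte inside a 32-bit
object in memory, which is the constant the compiler has to fold into the
address: 0 on an LSBian machine, 3 on an MSBian one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset, within a 32-bit object in memory, of the byte holding the
   low-order 8 bits: 0 on an LSBian host, 3 on an MSBian host. */
size_t low_byte_offset(void)
{
    const uint32_t probe = 0x04030201u; /* low-order byte is 0x01 */
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    for (size_t k = 0; k < sizeof probe; k++)
        if (bytes[k] == 0x01)
            return k;
    return 0;                           /* not reached for 8-bit bytes */
}

/* Emulates "movb i+offset, ch": fetch one byte of i from memory. */
unsigned char narrow_via_memory(uint32_t i)
{
    unsigned char bytes[sizeof i];
    memcpy(bytes, &i, sizeof i);
    return bytes[low_byte_offset()];
}
```

In real C code the assignment 'ch = i' needs none of this, of course; the
compiler knows the offset and hides it. The sketch only shows which byte
it has to pick.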

But what about registers? Registers are not byte addressable.
There is only one byte of a register that can be accessed by a 'movb' 
instruction. All the other 3 bytes can be accessed only through bit field
instructions (or worse, shifts and ands). 

Let's look at another program segment:

/*=================================*/
extern  int fgetc();
char	ch;

ch = fgetc(file);
/*=================================*/

The C library routine 'fgetc' returns an 'int' result, which has to be
converted to 'char'. Most implementations return function results in
register 0 (D0 on the M68000).

Assume that register D0 contains 'int' (32 bits) value 2,
and so does the long-word at address 100.

                MSB in lower address

address           100  101  102  103
                +----+----+----+----+
value           | 00 | 00 | 00 | 02 | 
                +----+----+----+----+
                +----+----+----+----+
register D0     | 00 | 00 | 00 | 02 | 
                +----+----+----+----+

The instructions

	movl	100,x
	movl	D0,x

both move a long-word containing value 0x00000002 to location 'x'.
So

	movb	100,x
	movb	D0,x

both should move a byte containing value 00 to location 'x'.
So the code generated for the above program segment on a truly consistent
MSBian machine would have to shift the low-order byte up into the byte
position that 'movb' reads:

	jbsr	fgetc
	movl	#24,d1
	lsll	d1,d0
	movb	d0,ch

But on the M68000 this is not the case. No shift operation is needed because
a 'movb' instruction with a register operand takes the byte that contains 2,
that is, the HIGH address byte, so the compiler can simply generate:

	jbsr	fgetc
	movb	d0,ch

In other words, we see that the byte/word/long-word overlap of registers 
in the M68000 is implemented according to the more efficient LSBian way!!
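
The two register conventions can be modelled in C to see where the extra
shift comes from. This is only a model, with hypothetical function names:
reg_byte_low stands for what a byte move from a register does on the
M68000 (and on the LSBian machines), while reg_byte_high stands for what it
would do on an imaginary, fully consistent MSBian machine. The shift is
written as a left shift by 24, since that is the direction that brings the
low-order byte into the position a high-byte move would read.

```c
#include <stdint.h>

/* Byte move from a register as the M68000 does it:
   the byte operand is bits 7..0 of the register. */
uint8_t reg_byte_low(uint32_t reg)
{
    return (uint8_t)(reg & 0xFFu);
}

/* Byte move on a hypothetical, consistently MSBian register file:
   the byte operand would be bits 31..24. */
uint8_t reg_byte_high(uint32_t reg)
{
    return (uint8_t)(reg >> 24);
}

/* What a compiler for that hypothetical machine would have to emit for
   "ch = fgetc(file)": shift left by 24, then do the byte move. */
uint8_t narrow_on_consistent_msbian(uint32_t reg)
{
    return reg_byte_high(reg << 24);
}
```

With the register holding 2, reg_byte_low gives 2 directly, reg_byte_high
gives 0, and only the shifted version recovers the 2.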


Conclusion:
==========

I have shown that there are two aspects in which the LSBian way is more
suitable and more efficient for binary computers. Even an MSBian machine
like the M68000 is LSBian in these aspects. This is in addition
to the argument of easier serial addition and multiplication that was mentioned
in the article (though the latter is balanced, to some extent, by serial
comparison and division).
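
As a reminder of why serial addition favours LSB-first order, here is a
minimal C sketch (the function name and the byte-array representation are
assumptions made for this example). The numbers are stored least
significant byte first, so the loop walks the bytes in increasing address
order and the carry always flows forward to the byte that is handled next.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two n-byte unsigned numbers stored least significant byte first.
   Writes the n-byte sum and returns the final carry (0 or 1). */
unsigned add_lsb_first(const uint8_t *a, const uint8_t *b,
                       uint8_t *sum, size_t n)
{
    unsigned carry = 0;
    for (size_t k = 0; k < n; k++) {
        unsigned t = (unsigned)a[k] + b[k] + carry;
        sum[k] = (uint8_t)(t & 0xFFu);  /* low 8 bits of the partial sum */
        carry = t >> 8;                 /* carry into the next byte */
    }
    return carry;
}
```

With MSB-first storage the same loop would have to start at the far end of
the number, or else wait for all the bytes to arrive before producing the
first byte of the result.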

The main argument left against the LSBians is the more readable MSBian
dump format. I think that in the modern days of optimizing compilers and 
symbolic debuggers, dumps are almost an extinct species, and please 
let them stay that way. 

I don't have any illusions. I don't expect Motorola to change their byte order
after reading my article. I don't even expect users to prefer LSBian 
machines just for the sake of beauty and consistency. 
But I do hope that some day the LSBian method will prevail (or, maybe,
someone will convince me of the superiority of the MSBian method...).


Uri Postavsky (utcs!syntron!orcisi!urip)

		(currently with O.R.C Toronto,
		  formerly with National Semiconductor Tel Aviv).

