Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!syntron!orcisi!urip
From: urip@orcisi.UUCP
Newsgroups: comp.arch,comp.sys.intel,comp.sys.m68k
Subject: byte order: be reasonable - do it my way...
Message-ID: <760@orcisi.UUCP>
Date: Thu, 8-Jan-87 10:27:37 EST
Article-I.D.: orcisi.760
Posted: Thu Jan 8 10:27:37 1987
Date-Received: Thu, 8-Jan-87 19:28:39 EST
Distribution: net
Organization: Optical Recording Corporation, Toronto, Ontario
Lines: 415
Xref: syntron comp.arch:168 comp.sys.intel:80 comp.sys.m68k:93

Although it's coming a little late, and some readers may have forgotten
the original article (On Holy Wars and a Plea for Peace by Danny Cohen)
by now, I still hope that my article will find an audience.

My point is that the Least-Significant-Byte-first camp (LSBians,
pronounced: elesbians) has a more correct way than the
Most-Significant-Byte-first camp (MSBians, pronounced: emesbians), and
I am going to try to convince the MSBians to go my way.

Before I start with the main issue, let me comment on the side issue.
As someone whose native language is Hebrew and who also knows some
Arabic from school, I would like to confirm almost everything that was
said in the article and in the responses about the order of digits
etc., including the examples from the Bible and computer terminals in
Arabic/Hebrew.

There was a slight inaccuracy about the way numbers are read in Arabic:
only the units and tens are read the LSBian way, and the rest of the
number is read the MSBian way. For example, the year 1984 is read:
"one thousand nine hundreds four and eighty".

Also, for those who don't know, the digit characters in Arabic are
different from the Latin forms, but in Hebrew they are the same.

The article was written in 1980, and things have changed since then.
Six years is a lifetime in the world of computers, and sentences like:

    "I failed to find a Little-Endians' system which is totally
    consistent"

cannot be left without an objection in 1986 (almost 1987).

The Intel 80*86 microprocessors are true, consistent LSBians. They do
not have combined shift operations (the article suggested these as a
good criterion to tell LSBians from MSBians), but the multiply
operation does leave the most significant part of the result in the
high register, and the floating point format is totally consistent
with the rest of the data types.

The same is true for the National Series 32000, and I believe that
Zilog is with the LSBians too.

So in the microprocessor area, it seems that the Motorola 68000 is the
only (though major...) MSBian around. Now, is it really as clean and
pure an MSBian as claimed in the article?

Let me refresh your memory with a quote from the article:

    "Hence, the M68000 is a consistent Big-Endian, except for its bit
    designation, which is used to camouflage its true identity.
    Remember: the Big-Endians were the outlaws."

The author did not try to claim that the funny floating point format
of the VAX was there to camouflage the VAX's true identity, so why
should one believe that the LSBian bit order of the M68000 is there
because "the Big-Endians were the outlaws"?

I suspect that the true reason behind the inconsistency of the M68000
is that only with an LSBian bit order is the value of bit number 'i'
in a word always equal to

    b[i] * 2^i

(where b[i] is 0 or 1 according to bit number 'i', and 2^i is 2 to the
power of i), and the designers of the M68000 wanted to keep this
important feature in spite of the overall MSBian architecture.

There is another difference between LSBian and MSBian memory order
that was not mentioned in the article. In the LSBian scheme, if a
long-word in memory contains a small value, then a word or a byte at
the same memory location still holds the same value (if the value is
small enough to fit into these).

For example, assume we have the value 0x00000002 in a (32 bit)
long-word at memory address 100.

                 LSB in lower address

    address  104  103  102  101  100
               +----+----+----+----+
    value      | 00 | 00 | 00 | 02 |
               +----+----+----+----+

Note that a long-word, short word, byte and even nibble at address 100
all contain the value 2.

On the other hand,

                 MSB in lower address

    address  100  101  102  103  104
               +----+----+----+----+
    value      | 00 | 00 | 00 | 02 |
               +----+----+----+----+

Note that only a long-word at address 100 contains 2. All the rest
contain 0.

This may not seem to be a key issue, but it has some significance in
type conversion, as illustrated by the following C program segment:

    /*=================================*/
    int i;
    char ch;

    ch = i;
    /*=================================*/

The 'int' value (assume int is 32 bits) has to be converted to 'char',
that is, to a byte. In LSBian, this conversion is just a simple 'movb'
(move byte) instruction from 'i' to 'ch':

    movb    i, ch

since both the byte and the long-word at that address contain the same
value. In MSBian it may involve an expensive bit field instruction (or
worse, shifts and ands). Luckily for the M68000, it is byte
addressable, so the compiler can do the trick and generate:

    movb    i+3, ch

So it is still a simple machine instruction, but it involves a small
trick. Not clean, but still consistent, as long as we stick to byte
addressable memory.

But what about registers? Registers are not byte addressable. There is
only one byte of a register that can be accessed by a 'movb'
instruction. The other 3 bytes can be accessed only through bit field
instructions (or worse, shifts and ands).

Let's look at another program segment:

    /*=================================*/
    extern int fgetc();
    char ch;

    ch = fgetc(file);
    /*=================================*/

The C library routine 'fgetc' returns an 'int' result, and it has to
be converted to 'char'. Most implementations return function results
in register 0.

Assume that register D0 contains the 'int' (32 bits) value 2, and so
does the long-word at address 100.

                 MSB in lower address

    address      100  101  102  103  104
                   +----+----+----+----+
    value          | 00 | 00 | 00 | 02 |
                   +----+----+----+----+

                   +----+----+----+----+
    register D0    | 00 | 00 | 00 | 02 |
                   +----+----+----+----+

The instructions

    movl    100,x
    movl    D0,x

both move a long-word containing the value 0x00000002 to location 'x',
so

    movb    100,x
    movb    D0,x

both should move a byte containing the value 00 to location 'x'.

So the code generated for the above program segment on a truly
consistent MSBian machine would be:

    jbsr    fgetc
    movl    #24,d1
    lsll    d1,d0
    movb    d0,ch

But on the M68000 this is not true. No shift operation is needed,
because a 'movb' instruction with a register operand takes the byte
that contains 2, that is, the HIGH address byte, so the compiler can
generate:

    jbsr    fgetc
    movb    d0,ch

In other words, we see that the byte/word/long-word overlap of
registers in the M68000 is implemented according to the more efficient
LSBian way!!

Conclusion:
===========

I have shown that there are two aspects in which the LSBian way is
more suitable and more efficient for binary computers: the bit
numbering, where bit 'i' is always worth 2^i, and the overlap of
bytes, words and long-words at the same address or in the same
register. Even an MSBian machine like the M68000 is LSBian in these
aspects.

This is in addition to the argument of easier serial addition and
multiplication that was mentioned in the article (though the latter is
balanced, to some extent, by serial comparison and division).

The main argument left against the LSBians is the more readable MSBian
dump format. I think that in the modern days of optimizing compilers
and symbolic debuggers, dumps are almost an extinct species, and
please let them stay that way.

I don't have any illusions. I don't expect Motorola to change their
byte order after reading my article. I don't even expect users to
prefer LSBian machines just for the sake of beauty and consistency.
But I do hope that some day the LSBian method will prevail (or, maybe,
someone will convince me of the superiority of the MSBian method...).

Uri Postavsky (utcs!syntron!orcisi!urip)
(currently with O.R.C Toronto,
formerly with National Semiconductor Tel Aviv).
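
For anyone who wants to check the memory-overlap argument above, here
is a minimal sketch in C. It models the two byte orders explicitly in a
4-byte buffer instead of relying on the host machine, so it should
behave the same on either kind of CPU. The function names
(store_lsb_first, load_msb_first, and so on) are made up for this
illustration and do not come from any library; the buffer start plays
the role of address 100 in the diagrams.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value with the least significant byte at buf[0]. */
void store_lsb_first(uint8_t buf[4], uint32_t v)
{
    for (int k = 0; k < 4; k++)
        buf[k] = (uint8_t)(v >> (8 * k));
}

/* Store a 32-bit value with the most significant byte at buf[0]. */
void store_msb_first(uint8_t buf[4], uint32_t v)
{
    for (int k = 0; k < 4; k++)
        buf[k] = (uint8_t)(v >> (8 * (3 - k)));
}

/* Read an n-byte unsigned value (n = 1, 2 or 4) starting at buf[0],
   treating buf[0] as the least significant byte. */
uint32_t load_lsb_first(const uint8_t *buf, int n)
{
    uint32_t v = 0;
    for (int k = 0; k < n; k++)
        v |= (uint32_t)buf[k] << (8 * k);
    return v;
}

/* Same, but treating buf[0] as the most significant byte. */
uint32_t load_msb_first(const uint8_t *buf, int n)
{
    uint32_t v = 0;
    for (int k = 0; k < n; k++)
        v = (v << 8) | buf[k];
    return v;
}

/* The example from the article: the long-word value 2 at "address 100". */
void check_overlap_example(void)
{
    uint8_t lsb[4], msb[4];

    store_lsb_first(lsb, 2);
    store_msb_first(msb, 2);

    /* LSB first: byte, word and long-word at the base address all hold 2. */
    assert(load_lsb_first(lsb, 1) == 2);
    assert(load_lsb_first(lsb, 2) == 2);
    assert(load_lsb_first(lsb, 4) == 2);

    /* MSB first: only the long-word holds 2; byte and word hold 0. */
    assert(load_msb_first(msb, 1) == 0);
    assert(load_msb_first(msb, 2) == 0);
    assert(load_msb_first(msb, 4) == 2);
}
```

Calling check_overlap_example() from a main() of your own should pass
without tripping any assertion. In the MSB-first layout the byte
holding 2 sits at offset 3, which is the 'i+3' trick the compiler has
to play for memory operands.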