Sources on optimizing code for x86 segmented architecture? [message #393757] |
Sat, 25 April 2020 10:58 |
Originally posted by: Johann 'Myrkraverk' Oskarsson
Dear alt.folklore.computers,
I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
> Writing efficient segmented code is very tricky and has been well
> documented elsewhere.
The book has no references on the subject, and there's nothing in the
bibliography either, and by now such documents, if they still exist,
may be hard to find. This is in a chapter on the x86, and the writing
is about the 16bit architecture of it, mostly.
I am interested in retro programming every now and then, but mostly do
my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
interested in resources on efficient segmented code, if any still exist.
Are there any such books, articles, or documentation still available
somewhere? A quick web search does not yield any promising results.
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
Re: Sources on optimizing code for x86 segmented architecture? [message #393759 is a reply to message #393757] |
Sat, 25 April 2020 13:18 |
John Levine
> I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
>
>> Writing efficient segmented code is very tricky and has been well
>> documented elsewhere.
>
> The book has no references on the subject, and there's nothing in the
> bibliography either, and by now such documents, if they still exist,
> may be hard to find. This is in a chapter on the x86, and the writing
> is about the 16bit architecture of it, mostly.
Now of course I can't remember what I was referring to when I wrote
that 20 years ago.
I am not aware of any compiler optimizations specifically for
segmented address code. On the 286 segment loads were very slow,
even if you were reloading the same segment number into the same
segment register. I think the compilers of the time could generate
code with a single segment load for stuff like this if p is a long
pointer:
p->a = p->b;
p[i] = p[j];
but I don't think they could do much more than that.
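As a rough illustration of why hoisting the segment load matters, here is a minimal C sketch that models ES as a plain variable with a load counter. Everything in it (`far_ptr`, `load_es`, `copy_naive`, `copy_hoisted`, the flat `memory` array) is a hypothetical model of real-mode addressing for this example, not the output or internals of any actual compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a far pointer is a 16-bit segment plus a 16-bit offset. */
typedef struct { uint16_t seg; uint16_t off; } far_ptr;

static uint8_t  memory[1u << 20];  /* 1 MiB real-mode address space */
static uint16_t es;                /* stand-in for the ES register */
static unsigned es_loads;          /* counts modeled segment loads */

static void load_es(uint16_t seg) { es = seg; es_loads++; }

/* Real-mode address: segment * 16 + offset, wrapped to 20 bits. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Read a little-endian 16-bit word at ES:off. */
static uint16_t read16(uint16_t off)
{
    uint8_t lo = memory[linear(es, off)];
    uint8_t hi = memory[linear(es, (uint16_t)(off + 1))];
    return (uint16_t)(lo | (hi << 8));
}

/* Write a little-endian 16-bit word at ES:off. */
static void write16(uint16_t off, uint16_t v)
{
    memory[linear(es, off)] = (uint8_t)(v & 0xFF);
    memory[linear(es, (uint16_t)(off + 1))] = (uint8_t)(v >> 8);
}

/* p->a = p->b with the segment reloaded before every access. */
static void copy_naive(far_ptr p, uint16_t a_off, uint16_t b_off)
{
    uint16_t v;
    load_es(p.seg);
    v = read16((uint16_t)(p.off + b_off));
    load_es(p.seg);                        /* redundant reload */
    write16((uint16_t)(p.off + a_off), v);
}

/* The same assignment with the segment load done once. */
static void copy_hoisted(far_ptr p, uint16_t a_off, uint16_t b_off)
{
    uint16_t v;
    load_es(p.seg);
    v = read16((uint16_t)(p.off + b_off));
    write16((uint16_t)(p.off + a_off), v);
}
```

Both functions leave memory in the same state; only `es_loads` differs (two loads versus one), which is the cost the 286 made expensive.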
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Re: Sources on optimizing code for x86 segmented architecture? [message #393760 is a reply to message #393757] |
Sat, 25 April 2020 13:35 |
Originally posted by: Johann 'Myrkraverk' Oskarsson
On 26/04/2020 1:20 am, Andreas Kohlbach wrote:
> On Sat, 25 Apr 2020 22:58:44 +0800, Johann 'Myrkraverk' Oskarsson wrote:
>>
>> Dear alt.folklore.computers,
>>
>> I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
>>
>>> Writing efficient segmented code is very tricky and has been well
>>> documented elsewhere.
>>
>> The book has no references on the subject, and there's nothing in the
>> bibliography either, and by now such documents, if they still exist,
>> may be hard to find. This is in a chapter on the x86, and the writing
>> is about the 16bit architecture of it, mostly.
>>
>> I am interested in retro programming every now and then, but mostly do
>> my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
>> interested in resources on efficient segmented code, if any still exist.
>
> Consider to use an emulator, may be for the IBM 5150. The MAME emulator
> covers this if you have a "BIOS Rom" (mail me, if you want to use MAME and
> don't have this Rom).
>
>> Are there any such books, articles, or documentation still available
>> somewhere? A quick web search does not yield any promising results.
>
> In my library I have an "iAPX_86_88_Users_Manual" in PDF format. Mail me,
> if you want this.
Mail sent.
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
Re: Sources on optimizing code for x86 segmented architecture? [message #393761 is a reply to message #393759] |
Sat, 25 April 2020 13:39 |
Peter Flass
John Levine <johnl@taugh.com> wrote:
>> I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
>>
>>> Writing efficient segmented code is very tricky and has been well
>>> documented elsewhere.
>>
>> The book has no references on the subject, and there's nothing in the
>> bibliography either, and by now such documents, if they still exist,
>> may be hard to find. This is in a chapter on the x86, and the writing
>> is about the 16bit architecture of it, mostly.
>
> Now of course I can't remember what I was referring to when I wrote
> that 20 years ago.
>
> I am not aware of any compiler optimizations specifically for
> segmented address code. On the 286 segment loads were very slow,
> even if you were reloading the same segment number into the same
> segment register. I think the compilers of the time could generate
> code with a single segment load for stuff like this if p is a long
> pointer:
> p->a = p->b;
>
> p[i] = p[j];
>
> but I don't think it could do much more than that.
>
Obviously organizing your code to avoid cross-segment references would
help.
--
Pete
Re: Sources on optimizing code for x86 segmented architecture? [message #393762 is a reply to message #393761] |
Sat, 25 April 2020 18:00 |
John Levine
In article <2131518304.609529099.964612.peter_flass-yahoo.com@news.eternal-september.org>,
Peter Flass <peter_flass@yahoo.com> wrote:
> John Levine <johnl@taugh.com> wrote:
>>> I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
>>>
>>>> Writing efficient segmented code is very tricky and has been well
>>>> documented elsewhere. ...
> Obviously organizing your code to avoid cross-segment references would
> help.
Oh, sure. Each source file would turn into a module within which all
of the routines could call each other with fast "near" calls, while
calls between modules used "far" calls. I think with some effort it was
possible to tell the linker to combine the code from several object
modules into single code and data segments.
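To put a number on what grouping routines buys you, here is a small C sketch that counts how many call sites have to be far calls for a given assignment of routines to code segments. The `call_edge` type and both function names are hypothetical, and the model assumes a toolchain that is able to emit a near call whenever caller and callee share a segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-graph edge: routine indices of caller and callee. */
typedef struct { int caller; int callee; } call_edge;

/* Bytes of return address pushed by the call: a near call pushes IP
   (2 bytes), a far call pushes CS and IP (4 bytes). */
static int return_address_bytes(int caller_seg, int callee_seg)
{
    return caller_seg == callee_seg ? 2 : 4;
}

/* Count the call sites that must be far calls, where segment_of[i]
   is the code segment that routine i has been placed in. */
static size_t count_far_calls(const call_edge *edges, size_t n,
                              const int *segment_of)
{
    size_t far_calls = 0;
    size_t i;
    assert(edges != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (segment_of[edges[i].caller] != segment_of[edges[i].callee])
            far_calls++;
    }
    return far_calls;
}
```

Trying a few candidate `segment_of` arrays against the same edge list shows directly which grouping leaves the fewest far calls on the hot paths.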
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Re: Sources on optimizing code for x86 segmented architecture? [message #393763 is a reply to message #393760] |
Sun, 26 April 2020 15:38 |
Originally posted by: Kerr-Mudd,John
On Sat, 25 Apr 2020 17:35:37 GMT, Johann 'Myrkraverk' Oskarsson
<johann@myrkraverk.invalid> wrote:
> On 26/04/2020 1:20 am, Andreas Kohlbach wrote:
>> On Sat, 25 Apr 2020 22:58:44 +0800, Johann 'Myrkraverk' Oskarsson
>> wrote:
>>>
>>> Dear alt.folklore.computers,
>>>
>>> I came across this sentence in Linkers & Loaders, by John Levine, p.
>>> 42.
>>>
>>>> Writing efficient segmented code is very tricky and has been well
>>>> documented elsewhere.
>>>
>>> The book has no references on the subject, and there's nothing in
>>> the bibliography either, and by now such documents, if they still
>>> exist, may be hard to find. This is in a chapter on the x86, and
>>> the writing is about the 16bit architecture of it, mostly.
>>>
>>> I am interested in retro programming every now and then, but mostly
>>> do my code for 32bit extended DOS to run in DOSBox. Yet, I find
>>> myself interested in resources on efficient segmented code, if any
>>> still exist.
>>
>> Consider to use an emulator, may be for the IBM 5150. The MAME
>> emulator covers this if you have a "BIOS Rom" (mail me, if you want
>> to use MAME and don't have this Rom).
>>
>>> Are there any such books, articles, or documentation still available
>>> somewhere? A quick web search does not yield any promising results.
>>
>> In my library I have an "iAPX_86_88_Users_Manual" in PDF format. Mail
>> me, if you want this.
>
> Mail sent.
>
Available on line from good ol' BitSavers!
https://archive.org/details/bitsavers_inteldataBrsManual_57011881
--
Bah, and indeed, Humbug.
Re: Sources on optimizing code for x86 segmented architecture? [message #393771 is a reply to message #393757] |
Mon, 27 April 2020 12:12 |
scott
Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> writes:
> Dear alt.folklore.computers,
>
> I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
>
>> Writing efficient segmented code is very tricky and has been well
>> documented elsewhere.
>
> The book has no references on the subject, and there's nothing in the
> bibliography either, and by now such documents, if they still exist,
> may be hard to find. This is in a chapter on the x86, and the writing
> is about the 16bit architecture of it, mostly.
>
> I am interested in retro programming every now and then, but mostly do
> my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
> interested in resources on efficient segmented code, if any still exist.
Historically, leaving aside the 8086, there was a wide variety of
segmented systems: the Burroughs machines (both large systems
and medium systems), the HP 3000, and others.
You might start by looking at the older architectures documented at bitsavers.org.
Re: Sources on optimizing code for x86 segmented architecture? [message #393951 is a reply to message #393757] |
Mon, 04 May 2020 13:41 |
Originally posted by: timcaffrey420
On Saturday, April 25, 2020 at 10:59:11 AM UTC-4, Johann 'Myrkraverk' Oskarsson wrote:
> Dear alt.folklore.computers,
>
> I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
>
>> Writing efficient segmented code is very tricky and has been well
>> documented elsewhere.
>
> The book has no references on the subject, and there's nothing in the
> bibliography either, and by now such documents, if they still exist,
> may be hard to find. This is in a chapter on the x86, and the writing
> is about the 16bit architecture of it, mostly.
>
> I am interested in retro programming every now and then, but mostly do
> my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
> interested in resources on efficient segmented code, if any still exist.
>
> Are there any such books, articles, or documentation still available
> somewhere? A quick web search does not yield any promising results.
>
> --
> Johann | email: invalid -> com | www.myrkraverk.com/blog/
> I'm not from the Internet, I just work there. | twitter: @myrkraverk
There are different approaches to optimizing code for segments, based on:
1) Memory model (tiny, small, medium, large, huge)
2) execution mode (real, protected).
So the tiny, small (and I think medium) memory models mostly don't count, because you are not really dealing with segments. The difference, IIRC, between large and huge is how you treat the stack segment: for large, the default segment and the stack are the same; for huge they are different. Most big programs were "large"; "huge" was fairly rare in practice.
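For reference, a short C sketch of the default pointer widths usually associated with each model in the Microsoft and Borland compilers of the time. The enum and function names are invented for this example, and "compact" (near code, far data) is included even though it is not in the list above, since it was the usual sixth model.

```c
#include <assert.h>

/* Hypothetical names for the conventional DOS memory models. */
typedef enum {
    MODEL_TINY, MODEL_SMALL, MODEL_MEDIUM,
    MODEL_COMPACT, MODEL_LARGE, MODEL_HUGE
} memory_model;

/* Default code pointer size in bytes: 2 = near (offset only),
   4 = far (segment and offset). */
static int code_pointer_bytes(memory_model m)
{
    return (m == MODEL_MEDIUM || m == MODEL_LARGE || m == MODEL_HUGE) ? 4 : 2;
}

/* Default data pointer size in bytes, same encoding as above. */
static int data_pointer_bytes(memory_model m)
{
    return (m == MODEL_COMPACT || m == MODEL_LARGE || m == MODEL_HUGE) ? 4 : 2;
}
```

Only the models that return 4 from one of these functions make the compiler deal with segment registers by default, which is why tiny and small are uninteresting here and medium only matters on the code side.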
The execution mode changed how expensive it was to do a segment load, so for instance doing:
LES BX,[BP+4]    ; BX <- word at [BP+4], ES <- word at [BP+6]
PUSH ES
PUSH BX
was fairly efficient in real mode but overly expensive in protected mode. If you were not going to use ES:BX to address something, it was MUCH more efficient to do:
PUSH [BP+6]      ; segment word first, matching PUSH ES
PUSH [BP+4]      ; then the offset word, matching PUSH BX
In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
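One way to read the ES-as-base trick, sketched in C: in real mode a segment value is just a paragraph (16-byte) number, so an object that starts on a 16-byte boundary can be addressed as ES:0 with ES doing all the work of a base register. The function names below are hypothetical. In protected mode segment registers hold selectors into descriptor tables rather than paragraph numbers, so this arithmetic no longer applies.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address for seg:off, wrapped to 20 bits. */
static uint32_t linear_address(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* If the object at seg:off starts on a 16-byte boundary, store the
   paragraph number that addresses it at offset 0 in *seg_out and
   return 1; otherwise return 0 and leave *seg_out untouched. */
static int base_as_segment(uint16_t seg, uint16_t off, uint16_t *seg_out)
{
    uint32_t addr = linear_address(seg, off);
    assert(seg_out != 0);
    if (addr & 0xFu)
        return 0;                 /* not paragraph aligned */
    *seg_out = (uint16_t)(addr >> 4);
    return 1;
}
```

With the returned value loaded into ES, a field at byte offset k of the object is simply ES:[k], which frees BX (or SI/DI) for other uses.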
I worked on a version of the MS Pascal compiler (outside of Microsoft) that we used for cross-compiling. It was created in the days of the 8086, so it had absolutely no optimizations related to protected mode. I added a peephole optimization step to remove unneeded (re)loads of the ES register, partly by tracking register usage and partly by using the method mentioned above (pushing the address words straight from memory). Suppose you had a bit of code like this (my Pascal is rusty, and this example is a bit contrived):
TYPE
  intptr = ^INTEGER;
  myrec = RECORD
    somestuff : intptr;
    morestuff : intptr;
  END;
PROCEDURE anotherProc(int1, int2 : intptr);
BEGIN
  int1^ := int1^ + int2^
END;
PROCEDURE myproc(VAR data_in : myrec);
BEGIN
  anotherProc(data_in.somestuff, data_in.morestuff)
END;
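For myproc, the compiler would originally generate something like:
LES BX,[BP+data_in]         ; ES:BX <- address of the record
LES BX,ES:[BX+somestuff]    ; ES:BX <- the somestuff pointer
PUSH ES
PUSH BX
LES BX,[BP+data_in]         ; ES:BX <- address of the record, again
LES BX,ES:[BX+morestuff]    ; ES:BX <- the morestuff pointer
PUSH ES
PUSH BX
CALL anotherProc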
The first pass would remove the unnecessary loads of ES:BX:
LES BX,[BP+data_in]
PUSH ES:[BX+somestuff+2]
PUSH ES:[BX+somestuff]
LES BX,[BP+data_in]
PUSH ES:[BX+morestuff+2]
PUSH ES:[BX+morestuff]
CALL anotherProc
The second pass removes the redundant load:
LES BX,[BP+data_in]
PUSH ES:[BX+somestuff+2]
PUSH ES:[BX+somestuff]
PUSH ES:[BX+morestuff+2]
PUSH ES:[BX+morestuff]
CALL anotherProc
The result was code that was about 5% smaller and about 15% faster.
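The second pass can be sketched in C roughly as follows. This is not the actual MS Pascal peephole code, only a minimal model under stated assumptions: instructions are reduced to a hypothetical `insn` record, a PUSH from memory is taken not to disturb ES or BX, and anything else (including a call) is conservatively treated as clobbering them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified instruction model. */
typedef enum { OP_LES_BX, OP_PUSH_MEM, OP_CALL, OP_OTHER } opcode;

typedef struct {
    opcode      op;
    const char *operand;   /* e.g. "[BP+data_in]" */
} insn;

/* True if the memory operand is addressed through ES or BX, in which
   case loading ES:BX from it changes what the operand refers to. */
static int uses_es_or_bx(const char *operand)
{
    return strstr(operand, "ES") != NULL || strstr(operand, "BX") != NULL;
}

/* Drop an LES BX,<src> when ES:BX is already known to hold <src>.
   Rewrites code[] in place and returns the new instruction count. */
static size_t drop_redundant_les(insn *code, size_t n)
{
    const char *known = NULL;  /* source currently held in ES:BX, if any */
    size_t out = 0;
    size_t i;
    assert(code != NULL || n == 0);
    for (i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_LES_BX:
            if (known != NULL && strcmp(known, code[i].operand) == 0)
                continue;      /* same far pointer already loaded: skip it */
            known = uses_es_or_bx(code[i].operand) ? NULL : code[i].operand;
            break;
        case OP_PUSH_MEM:
            break;             /* modeled as leaving ES and BX alone */
        default:
            known = NULL;      /* calls and unknown ops may clobber ES:BX */
            break;
        }
        code[out++] = code[i];
    }
    return out;
}
```

Fed the seven-instruction output of the first pass, this removes the second `LES BX,[BP+data_in]` and leaves the six-instruction sequence shown above.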
- Tim
Re: Sources on optimizing code for x86 segmented architecture? [message #393952 is a reply to message #393951] |
Mon, 04 May 2020 16:30 |
John Levine
In article <5bb7b4cb-616f-45f0-b3b1-d9ae0264078e@googlegroups.com>,
<timcaffrey420@gmail.com> wrote:
> So, tiny, small (and I think medium) memory models don't count (mostly) because you are not really dealing with segments. The difference, IIRC,
> between large & huge, is how you treat the stack segment. For large, the default segment & the stack are the same, for huge they are different.
> Most big programs were "large", "huge" was fairly rare in practice.
For medium model code, we did a fair amount of fiddling with our code
organization so routines that frequently called each other were in the
same segment and could use near calls and returns.
I agree that huge model was rare; the code was really slow, and on PCs
it wasn't common to have large flat data structures that needed it.
> In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing
> was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
Oh, gross. Clever, but gross.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
Re: Sources on optimizing code for x86 segmented architecture? [message #393953 is a reply to message #393952] |
Mon, 04 May 2020 17:22 |
scott
John Levine <johnl@taugh.com> writes:
> In article <5bb7b4cb-616f-45f0-b3b1-d9ae0264078e@googlegroups.com>,
> <timcaffrey420@gmail.com> wrote:
>> So, tiny, small (and I think medium) memory models don't count (mostly) because you are not really dealing with segments. The difference, IIRC,
>> between large & huge, is how you treat the stack segment. For large, the default segment & the stack are the same, for huge they are different.
>> Most big programs were "large", "huge" was fairly rare in practice.
>
> For medium model code, we did a fair amount of fiddling with our code
> organization so routines that frequently called each other were in the
> same segment and could use short calls and returns.
>
> I agree that huge model was rare, the code was really slow and on PCs
> it wasn't common to have large flat data structures that needed it.
>
>> In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing
>> was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
>
> Oh, gross. Clever, but gross.
In modern times (on x86-64 Linux, for example), %fs is used as a base register
for thread-local data, and %gs is used by the kernel for kernel 'thread' specific data.
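A tiny C11 illustration of the user-space half of that: a thread-local variable. The variable and function names are made up for the example. The claim that accesses to it are addressed relative to %fs is a platform assumption (typical of GCC and Clang on x86-64 Linux), not something the C standard promises.

```c
#include <assert.h>

/* C11 thread-local storage: every thread gets its own copy. On x86-64
   Linux this is typically reached via an %fs-relative address
   (platform assumption, see above). */
static _Thread_local unsigned calls_in_this_thread;

/* Increment and return the calling thread's private counter. */
static unsigned note_call(void)
{
    return ++calls_in_this_thread;
}
```

Compiling this with `-S` on such a system and looking for `%fs:` in the output is a quick way to see the segment register still earning its keep.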
Re: Sources on optimizing code for x86 segmented architecture? [message #393954 is a reply to message #393952] |
Mon, 04 May 2020 18:49 |
Originally posted by: timcaffrey420
On Monday, May 4, 2020 at 4:30:56 PM UTC-4, John Levine wrote:
> In article <5bb7b4cb-616f-45f0-b3b1-d9ae0264078e@googlegroups.com>,
> <timcaffrey420@gmail.com> wrote:
>> So, tiny, small (and I think medium) memory models don't count (mostly) because you are not really dealing with segments. The difference, IIRC,
>> between large & huge, is how you treat the stack segment. For large, the default segment & the stack are the same, for huge they are different.
>> Most big programs were "large", "huge" was fairly rare in practice.
>
> For medium model code, we did a fair amount of fiddling with our code
> organization so routines that frequently called each other were in the
> same segment and could use short calls and returns.
>
> I agree that huge model was rare, the code was really slow and on PCs
> it wasn't common to have large flat data structures that needed it.
>
>> In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing
>> was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
>
> Oh, gross. Clever, but gross.
>
It was a library to draw lines in graphics mode on printers, in color.
It could handle almost any printer (including daisy wheels), but it only
drew lines. It required 140K of data memory for the highest-resolution, widest-paper printers that users were likely to have (24-pin wide color printers).
It also optimized the printout to skip over as much whitespace as possible (something the Microsoft Windows dot-matrix drivers didn't bother with).
Since memory was at a premium, it needed to take up as little memory as possible (8K for the code). Sacrifices Had To Be Made! :)
- Tim
Re: Sources on optimizing code for x86 segmented architecture? [message #394131 is a reply to message #393951] |
Wed, 06 May 2020 09:51 |
Originally posted by: timcaffrey420
On Monday, May 4, 2020 at 1:42:00 PM UTC-4, timcaf...@gmail.com wrote:
> On Saturday, April 25, 2020 at 10:59:11 AM UTC-4, Johann 'Myrkraverk' Oskarsson wrote:
>> Dear alt.folklore.computers,
>>
>> I came across this sentence in Linkers & Loaders, by John Levine, p. 42.
>>
>>> Writing efficient segmented code is very tricky and has been well
>>> documented elsewhere.
>>
>> The book has no references on the subject, and there's nothing in the
>> bibliography either, and by now such documents, if they still exist,
>> may be hard to find. This is in a chapter on the x86, and the writing
>> is about the 16bit architecture of it, mostly.
>>
>> I am interested in retro programming every now and then, but mostly do
>> my code for 32bit extended DOS to run in DOSBox. Yet, I find myself
>> interested in resources on efficient segmented code, if any still exist.
>>
>> Are there any such books, articles, or documentation still available
>> somewhere? A quick web search does not yield any promising results.
>>
>> --
>> Johann | email: invalid -> com | www.myrkraverk.com/blog/
>> I'm not from the Internet, I just work there. | twitter: @myrkraverk
>
> There are different approaches to optimizing code for segments, based on:
> 1) Memory model (tiny, small, medium, large, huge)
> 2) execution mode (real, protected).
>
> So, tiny, small (and I think medium) memory models don't count (mostly) because you are not really dealing with segments. The difference, IIRC, between large & huge, is how you treat the stack segment. For large, the default segment & the stack are the same, for huge they are different. Most big programs were "large", "huge" was fairly rare in practice.
>
> The execution mode changed how expensive it was to do a segment load, so for instance doing:
> LES BX,[BP+4]
> PUSH ES
> PUSH BX
>
> was fairly efficient in real mode but overly expensive in protected mode. If you were not going to use ES:BX to address something, it was MUCH more efficient to do:
> PUSH [BP+6]      ; segment word first, matching PUSH ES
> PUSH [BP+4]      ; then the offset word, matching PUSH BX
>
> In real mode, one trick I used was to use the ES register as another base register, which worked great as long as the object I was addressing was aligned on a 16 byte boundary. For protected mode, this was a bad idea (oh well).
>
> I worked on a version of the MS Pascal compiler (outside of Microsoft) that we used for cross-compiling. It was created in the days of the 8086, so had absolutely no optimizations related to protected mode. I added peep-hole optimization step to basically remove unneeded (re)loads of the ES register, partially by tracking register usage, and also by using the method mentioned above (pushing addresses). If you had a bit of code that was (my Pascal is rusty, and this example is a bit contrived):
>
> RECORD myrec BEGIN
> somestuff : INTEGER^;
> morestuff : INTEGER^;
> END;
>
> PROCEDURE anotherProc(int1:INTEGER^, int2:INTEGER^)
> BEGIN
> int1^ = int1^ + int2^
> END
> PROCEDURE myproc(VAR data_in: myrec)
> BEGIN
> anotherProc(data_in^.somestuff, data_in^.morestuff)
> END;
>
> For myproc would originally generate something like:
>
> LES BX,[BP+data_in]
> LES BX,ES:[BX+somestuff]
> PUSH ES
> PUSH BX
> LES BX,[BP+data_in]
> LES BX,ES:[BX+morestuff]
> PUSH ES
> PUSH BX
> CALL anotherProc
>
> The first pass would remove the unnecessary loads of ES:BX:
>
> LES BX,[BP+data_in]
> PUSH ES:[BX+somestuff+2]
> PUSH ES:[BX+somestuff]
> LES BX,[BP+data_in]
> PUSH ES:[BX+morestuff+2]
> PUSH ES:[BX+morestuff]
> CALL anotherProc
>
> The second pass removes the redundant load:
> LES BX,[BP+data_in]
> PUSH ES:[BX+somestuff+2]
> PUSH ES:[BX+somestuff]
> PUSH ES:[BX+morestuff+2]
> PUSH ES:[BX+morestuff]
> CALL anotherProc
>
> The result was the code was about 5% smaller, and about 15% faster.
>
> - Tim
I should also mention that, since the stack segment and the default data segment are the same in the large memory model, if you need to point to two other segments at once you can use DS and ES, and if you need to address something in the global data segment you can just use an SS override.
- Tim