Apple II Benchmarking ? [message #262936] |
Wed, 30 July 2014 19:10 |
oz390gta
I have a number of Apple IIc, IIc+, and IIgs machines, which are either stock or fitted with Zip Chips, ZipGS, or TransWarp GS accelerators. I want to do some performance comparisons. Is there any benchmarking software for the Apple II? I have never seen any.
0z390gta
Re: Apple II Benchmarking ? [message #262937 is a reply to message #262936] |
Wed, 30 July 2014 20:01 |
mdj
On Thursday, 31 July 2014 09:10:32 UTC+10, oz390gta wrote:
> I have a number of Apple IIc, IIc+, and IIgs machines, which are either stock or fitted with Zip Chips, ZipGS, or TransWarp GS accelerators. I want to do some performance comparisons. Is there any benchmarking software for the Apple II? I have never seen any.
Dave Schmenk recently posted a Sieve of Eratosthenes implementation for his PLASMA virtual machine; it would be quite interesting to see results for it across the various Apple II acceleration architectures.
The 'Sieve' is one of many almost universal benchmarks. You'd have to do the timing yourself, though adding some time calculations to it wouldn't be hard, bearing in mind that a IIc or IIc+ is quite unlikely to have a real-time clock.
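For comparison on a modern machine, here is a minimal sketch of this kind of Sieve benchmark in Python. It is not the PLASMA implementation; the names `sieve` and `timed` are made up for illustration, and on an Apple II without a clock the timing would be done with a stopwatch instead.

```python
import time

def sieve(limit):
    """Return all primes below `limit` using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    flags = bytearray([1]) * limit      # flags[i] == 1 means "i may be prime"
    flags[0] = flags[1] = 0
    p = 2
    while p * p < limit:
        if flags[p]:
            # Cross off the multiples of p, starting at p*p.
            for multiple in range(p * p, limit, p):
                flags[multiple] = 0
        p += 1
    return [i for i, is_prime in enumerate(flags) if is_prime]

def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    primes, seconds = timed(sieve, 8192)
    print(f"{len(primes)} primes below 8192 in {seconds:.4f} s")
```

Repeating the run several times and averaging gives a steadier number, which matters more on a fast host than on a 1 MHz machine.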
Matt
Re: Apple II Benchmarking ? [message #263572 is a reply to message #262936] |
Wed, 06 August 2014 13:12 |
Egan Ford
On 7/30/14, 5:10 PM, oz390gta wrote:
> Is there any benchmarking software for the Apple II?
I wrote a Pi to 1000 digits benchmark in assembly I can send you. It
takes long enough that you can time with a wall clock (single pass).
IIe, ~114 seconds @ 1MHz
IIgs (6502 code), ~44 seconds @ ~2.6MHz
IIgs (65816 code), ~35 seconds @ ~2.6MHz
I would like to get a IIc+ run. I have two versions, 1000 and 10000
digits. The 1000 digit version needs ~4K (code+data). That should run
in the IIc+ accelerated 8K cache. The 10000 digit version needs just
under 20K, that should make the IIc+ run more interesting.
10000 digits takes 100x longer than 1000 digits (O(n^2)). There is no
need to run 10000 digit benchmarks on the IIe or IIgs since memory
access time is uniform.
Re: Apple II Benchmarking ? [message #263594 is a reply to message #263572] |
Wed, 06 August 2014 16:48 |
bpiltz
On Wednesday, August 6, 2014 10:12:54 AM UTC-7, Egan Ford wrote:
> On 7/30/14, 5:10 PM, oz390gta wrote:
>> Is there any benchmarking software for the Apple II?
>
> I wrote a Pi to 1000 digits benchmark in assembly I can send you. It
> takes long enough that you can time with a wall clock (single pass).
>
> IIe, ~114 seconds @ 1MHz
> IIgs (6502 code), ~44 seconds @ ~2.6MHz
> IIgs (65816 code), ~35 seconds @ ~2.6MHz
>
> I would like to get a IIc+ run. I have two versions, 1000 and 10000
> digits. The 1000 digit version needs ~4K (code+data). That should run
> in the IIc+ accelerated 8K cache. The 10000 digit version needs just
> under 20K, that should make the IIc+ run more interesting.
>
> 10000 digits takes 100x longer than 1000 digits (O(n^2)). There is no
> need to run 10000 digit benchmarks on the IIe or IIgs since memory
> access time is uniform.
I would love to test your pi code, both the 1000 and 10000 digit versions. What is the max # of digits of pi you can calculate in 48k? 128k? Perhaps as many as 25000 digits? If you could send me your program, I'd greatly appreciate it, or just post it here.
Re: Apple II Benchmarking ? [message #263603 is a reply to message #263594] |
Wed, 06 August 2014 19:13 |
Egan Ford
On 8/6/14, 2:48 PM, bpiltz@gmail.com wrote:
>What is the max # of digits you can calculate of pi in 48k? 128k?
>Perhaps as many as 25000 digits?
The number of bytes you need is ~ ceil(num_digits / log10(256)) per array,
and my code needs 4 arrays. The memory must be contiguous for each
array. For 25000 digits you'd need 4 x ~10382 contiguous bytes of
memory. I'll have to check a memory map, but if $800 - $BFFF is free,
then it should fit.
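The arithmetic above can be checked with a short Python sketch. The formula and the 4-array count come from the post, reading the logarithm as base 10; the function names are made up, the $0800-$BFFF region is the assumed free range mentioned above, and the room needed by the program code itself is ignored.

```python
import math

NUM_ARRAYS = 4                          # stated in the post
FREE_START, FREE_END = 0x0800, 0xBFFF   # assumed free region, inclusive

def bytes_per_array(num_digits):
    """Bytes needed per array: each byte holds log10(256) decimal digits."""
    return math.ceil(num_digits / math.log10(256))

def total_bytes(num_digits):
    """Bytes needed for all arrays together (code not included)."""
    return NUM_ARRAYS * bytes_per_array(num_digits)

def fits(num_digits):
    """True if the arrays alone fit in the assumed free region."""
    free = FREE_END - FREE_START + 1
    return total_bytes(num_digits) <= free

if __name__ == "__main__":
    for digits in (1000, 10000, 25000):
        print(digits, bytes_per_array(digits), total_bytes(digits), fits(digits))
```

For 25000 digits this gives 10382 bytes per array and 41528 bytes in total, against 47104 bytes in $0800-$BFFF.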
I just reran 1000 digits in 111 seconds (I guess my 114 seconds was an
old timing or a typo in my spreadsheet). 25000 digits will take
111*(25000/1000)^2 = 69375 seconds (~19.3 hr) @ 1 MHz.
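The same quadratic scaling can be sketched in Python for other digit counts. The 111-second baseline for 1000 digits at 1 MHz is the figure above; `estimate_seconds` is a made-up helper, and real runs may deviate from a pure n^2 model.

```python
def estimate_seconds(digits, base_digits=1000, base_seconds=111.0):
    """Scale a measured run time quadratically with the number of digits."""
    return base_seconds * (digits / base_digits) ** 2

if __name__ == "__main__":
    for digits in (1000, 10000, 25000, 50000):
        seconds = estimate_seconds(digits)
        print(f"{digits} digits: {seconds:.0f} s (~{seconds / 3600:.1f} hr)")
```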
128K will require swapping arrays in and out, but that is very doable and would
have minimal impact since moving arrays will be relatively infrequent.
~48K*2 would make more sense. 50000+ digits could fit, at a cost of about
19.3*4 hours, however.
> or just post it here.
http://asciiexpress.net/files/6502
I uploaded a variable version where you pick the range from 100-10000.
It could be modified to support more.
Source included.
Re: Apple II Benchmarking ? [message #263682 is a reply to message #263603] |
Fri, 08 August 2014 11:16 |
Egan Ford
On 8/6/14, 5:13 PM, Egan Ford wrote:
> I uploaded a variable version where you pick the range from 100-10000.
> It could be modified to support more.
I uploaded a new disk image (source included) with a max of 22950 digits
supported. I was going for 25000 but found a bug that prevents more
than 22950 digits: my div16 code does not support divisors > 32767, a
common error with div16 code when optimizing for speed. I'll
have to decide later if I really want to change it and what the impact
will be.
http://asciiexpress.net/files/6502
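As an illustration only, here is a Python model of one textbook way a speed-optimized 16-bit divide breaks on large divisors. This is an assumption, not the actual div16 routine: the model keeps the remainder in 16 bits and drops the bit shifted out of the top, which is harmless while the remainder stays below 32768 and wrong once larger divisors allow larger remainders. A signed comparison would be another possible cause, and the model does not explain the 22950-digit figure itself.

```python
def div_step_16bit(carry, byte, divisor):
    """Divide (carry * 256 + byte) by divisor, requiring carry < divisor.

    Hypothetical model of a fast shift-and-subtract loop: the remainder
    lives in a 16-bit register and the bit shifted out of it is lost.
    """
    rem, quo = carry, 0
    for i in range(8):
        bit = (byte >> (7 - i)) & 1
        rem = ((rem << 1) | bit) & 0xFFFF   # carry out of bit 15 is dropped
        quo <<= 1
        if rem >= divisor:
            rem -= divisor
            quo |= 1
    return quo, rem

def div_step_exact(carry, byte, divisor):
    """Reference result using Python's arbitrary-precision integers."""
    return divmod(carry * 256 + byte, divisor)

if __name__ == "__main__":
    for divisor in (32767, 40000):
        args = (divisor - 1, 0xFF, divisor)
        print(divisor, div_step_16bit(*args), div_step_exact(*args))
```

With divisor 32767 both functions agree; with divisor 40000 and a carry of 39999 the 16-bit model loses the top bit on the first shift and returns a wrong quotient.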
BTW, all my 8-bit (e.g. 6800, 6809, z80, etc...) versions have this bug,
but none of the 16 or 32 bit versions. I uploaded 10K (3400 seconds)
and 25K versions (simulation tested only, expect 21250 seconds) of the
IIgs 65816 native code as well to:
http://asciiexpress.net/files/65816
They can be easily changed to support ~46000 digits. Anything beyond
that will require a 48bit/24bit div24 routine.
Re: Apple II Benchmarking ? [message #263727 is a reply to message #263682] |
Fri, 08 August 2014 16:33 |
bpiltz
On Friday, August 8, 2014 8:16:29 AM UTC-7, Egan Ford wrote:
> On 8/6/14, 5:13 PM, Egan Ford wrote:
>> I uploaded a variable version where you pick the range from 100-10000.
>> It could be modified to support more.
>
> I uploaded a new disk image (source included) with a max of 22950 digits
> supported. I was going for 25000 but found a bug that prevents more
> than 22950 digits (My div16 code does not support divisors >
> 32767--common error with div16 code when optimizing for speed.). I'll
> have to decide later if I really want to change it and what the impact
> will be.
>
> http://asciiexpress.net/files/6502
>
> BTW, all my 8-bit (e.g. 6800, 6809, z80, etc...) versions have this bug,
> but none of the 16 or 32 bit versions. I uploaded 10K (3400 seconds)
> and 25K versions (simulation tested only, expect 21250 seconds) of the
> IIgs 65816 native code as well to:
>
> http://asciiexpress.net/files/65816
>
> They can be easily changed to support ~46000 digits. Anything beyond
> that will require a 48bit/24bit div24 routine.
In testing, I came upon this bug! I tried 23000 digits, no go, but 22900 worked without problems. There was still some available memory in the $Bxxx range, so I wondered why I couldn't get it to go up to 25000+. Now I know.
Incidentally, I modified the program to not print a space between every ten digits nor a crlf + 2 spaces at the end of 7 blocks of ten digits. The display is much cleaner that way, with a contiguous printout. Both the e and pi calculations verify correctly to 22900 digits using external pi calculation programs on the PC.
Also, when you query for a slot to print (0 to 7), or enter 0 or return for no printout, I at first thought *any* screen or printer display would be suppressed, but the program still prints out to the screen (pr#0). From a purely benchmarking standpoint, wouldn't it make sense to implement an option to suppress any digit reporting entirely, e.g. calculate 10000 digits internally, do not print anything out, and give the elapsed time if a clock is present in the system?
Re: Apple II Benchmarking ? [message #263755 is a reply to message #263727] |
Fri, 08 August 2014 18:54 |
Egan Ford
On 8/8/14, 2:33 PM, bpiltz@gmail.com wrote:
> In testing, I came upon this bug! I tried 23000 digits, no go, but 22900 worked without problems. There was still some available memory in the $Bxxx range, so I wondered why I couldn't get it to go up to 25000+. Now I know.
When I started on this project two years ago I knew that someday it'd be
a problem. :-) When I rediscovered the bug, 22950 seemed vaguely
familiar, and it took me an hour of researching my notes and code to
remember the cause.
> Incidentally, I modified the program to not print a space between every ten digits nor a crlf + 2 spaces at the end of 7 blocks of ten digits. The display is much cleaner that way, with a contiguous printout. Both the e and pi calculations verify correctly to 22900 digits using external pi calculation programs on the PC.
That's how the code started. All my other versions have no fancy
output. Look at apple1pi.s in the same download directory.
I test all my benchmarks in simulators that output to standard out. To
compare byte for byte with tkdiff I use the following for the "fancy" print:
calc -d22950 pi | perl -pi -e 's/([0-9]{10})/$1 /g' | \
    perl -pi -e 's/(([0-9]{10} ){7})/$1\n /g'
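For anyone without perl handy, the same two substitutions can be sketched in Python. This mirrors the regular expressions exactly as quoted above (a space after every 10 digits, then a newline and a space after every 7 such groups); `fancy` is a made-up name and the reference digits are assumed to arrive as one string.

```python
import re

def fancy(digits):
    """Group digits in tens, then break the line after every 7 groups."""
    # First pass: append a space after each run of 10 digits.
    grouped = re.sub(r"([0-9]{10})", r"\1 ", digits)
    # Second pass: after 7 space-terminated groups, add a newline and a space.
    return re.sub(r"(([0-9]{10} ){7})", "\\1\n ", grouped)

if __name__ == "__main__":
    print(fancy("0123456789" * 8))
```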
> Also, when you query for a slot to print (0 to 7), or enter 0 or return for no printout, I at first thought *any* screen or printer display would be suppressed, but the program still prints out to the screen (pr#0). From a purely benchmarking standpoint, wouldn't it make sense to implement an option to suppress any digit reporting entirely, e.g. calculate 10000 digits internally, do not print anything out, and give the elapsed time if a clock is present in the system?
The printout was added for anyone who really wanted a printout. If I ever
do a 50000 version I'll probably print it out for fun and give the user
80 or 132 columns.
As for a no-print-to-screen feature, I do that manually now. When I
just want to time the computation, including conversion to base 10, I
just comment out prbyte.
In the article I've been working on for this, only 1000 digits are
printed for each micro (slowest: 8008, 1514 seconds; fastest: 8088, 15
seconds; the 68008 IIRC was actually the fastest, but I cannot find the
time, so I'll have to rerun it; I think it was 12 seconds). The overhead of
printing is not great enough to skew the results much. Lastly, this
benchmark, unlike synthetic benchmarks that do no real work, was
selected to test both performance and accuracy. Printing was required
for visual confirmation. I guess I could store a type of checksum and use
that to verify.
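A minimal sketch of the checksum idea in Python, using Fletcher-16 because it needs only two running 8-bit sums and so would be cheap on an 8-bit CPU. The choice of Fletcher-16 and the helper names are assumptions for illustration, not something the benchmark currently does.

```python
def fletcher16(data):
    """Fletcher-16 checksum of a bytes object (two running sums mod 255)."""
    sum1 = sum2 = 0
    for value in data:
        sum1 = (sum1 + value) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

def verify(digits, expected):
    """Compare the checksum of the computed digit string to a stored value."""
    return fletcher16(digits.encode("ascii")) == expected

if __name__ == "__main__":
    reference = "3.14159265358979323846264338327950288419716939937510"
    stored = fletcher16(reference.encode("ascii"))   # computed once, offline
    print(verify(reference, stored))                 # correct digits
    print(verify(reference[:-1] + "1", stored))      # last digit corrupted
```

The stored value would be computed once from a trusted digit file and compiled into the benchmark, so a run with printing suppressed could still report pass or fail.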
Re: Apple II Benchmarking ? [message #263776 is a reply to message #263682] |
Sat, 09 August 2014 05:53 |
Originally posted by: Denis Molony
On Saturday, 9 August 2014 01:16:29 UTC+10, Egan Ford wrote:
> On 8/6/14, 5:13 PM, Egan Ford wrote:
>
> I uploaded a new disk image (source included) with a max of 22950 digits
> supported.
Did you know that the apple2pi.dsk is (almost) empty?
Re: Apple II Benchmarking ? [message #263784 is a reply to message #263776] |
Sat, 09 August 2014 10:04 |
Egan Ford
On 8/9/14, 3:53 AM, Denis Molony wrote:
> Did you know that the apple2pi.dsk is (almost) empty?
It's not a standard DOS disk, I used c2d
(http://asciiexpress.net/files/c2d-0.1.zip) to create a disk that just
loads up apple2pi and runs, e.g.:
$ c2d apple2pi,800 apple2pi.dsk
output:
Reading apple2pi, type BINARY, start: $0800, length: 5785
Number of sectors: 23
Sector page range: $08 - $1E
After boot, jump to: $0800
Writing apple2pi to T:01/S:00 - T:02/S:06 on apple2pi.dsk
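The numbers in that output can be reproduced with a few lines of Python, assuming 256-byte sectors, 16 sectors per track, and sectors filled in ascending order from T:01/S:00. The sequential layout is inferred from the output above, not from the c2d source, and `layout` is a made-up helper.

```python
import math

SECTOR_SIZE = 256          # bytes per sector (one 6502 memory page)
SECTORS_PER_TRACK = 16     # standard 16-sector 5.25" format

def layout(start_addr, length, first_track=1, first_sector=0):
    """Return sector count, page range, and last track/sector for a binary."""
    sectors = math.ceil(length / SECTOR_SIZE)
    first_page = start_addr >> 8
    last_page = first_page + sectors - 1
    # Index of the last sector, counting sectors linearly across tracks.
    last_index = first_track * SECTORS_PER_TRACK + first_sector + sectors - 1
    last_track, last_sector = divmod(last_index, SECTORS_PER_TRACK)
    return sectors, first_page, last_page, last_track, last_sector

if __name__ == "__main__":
    sectors, first_page, last_page, track, sector = layout(0x0800, 5785)
    print(f"Number of sectors: {sectors}")
    print(f"Sector page range: ${first_page:02X} - ${last_page:02X}")
    print(f"Last sector: T:{track:02d}/S:{sector:02d}")
```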
If you just want the apple2pi binary to place on a DOS disk, then
download this instead:
http://asciiexpress.net/files/6502/apple2pi