Megalextoria
Retro computing and gaming, sci-fi books, tv and movies and other geeky stuff.

Counting from 1 to 1,000,000 on the FASTChip [message #354797] Fri, 20 October 2017 04:37 Go to next message
bpiltz is currently offline  bpiltz
Messages: 78
Registered: October 2012
Karma: 0
Member
Mysteries of the deep. I've got my spanking new FASTchip at a whopping 16.6 Mhz and my trusty Apple //e does not count to a million any faster than before!

Simple question, can anyone provide the quickest native machine language routine equivalent to

FOR I = 1 TO 1E6: PRINT I: NEXT

Of course a POKE 34,34 to turn off line feed will speed this up.

Let's make it a contest, like the recent HAT plotting routine, which I think someone got down to 59 seconds on a stock //e. I admit that my native machine language ability is so rusty that I can't remember how to code this simple "Hello world! Let's count to a million!" task.

The BASIC routines seem to be totally unaffected by the boost in speed. All BASIC code seems to run at 1 Mhz. Even compiled programs with Beagle Compiler only run about twice as fast as on a stock machine, when the speed is set at 16.6 Mhz.

All my binary games, even the LZMA decompression code to unpack those compressed games runs *lightning* fast compared to a stock 1.023 Mhz.

Any advice on getting and / or compiling Applesoft code to make better use (ie, scale in speed better) of the new FastChip?
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355002 is a reply to message #354797] Fri, 20 October 2017 10:37 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: John Brooks

On Friday, October 20, 2017 at 1:37:02 AM UTC-7, bpi...@gmail.com wrote:
> Mysteries of the deep. I've got my spanking new FASTchip at a whopping 16.6 Mhz and my trusty Apple //e does not count to a million any faster than before!
>
> Simple question, can anyone provide the quickest native machine language routine equivalent to
>
> FOR I = 1 TO 1E6: PRINT I: NEXT
>
> Of course a POKE 34,34 to turn off line feed will speed this up.
>
> Let's make it a contest, like the recent HAT plotting routine, which I think someone got down to 59 seconds on a stock //e. I admit that my native machine language ability is so rusty that I can't remember how to code this simple "Hello world! Let's count to a million!" task.
>
> The BASIC routines seem to be totally unaffected by the boost in speed. All BASIC code seems to run at 1 Mhz. Even compiled programs with Beagle Compiler only run about twice as fast as on a stock machine, when the speed is set at 16.6 Mhz.
>
> All my binary games, even the LZMA decompression code to unpack those compressed games runs *lightning* fast compared to a stock 1.023 Mhz.
>
> Any advice on getting and / or compiling Applesoft code to make better use (ie, scale in speed better) of the new FastChip?

The 'slowdown' in your example code is the PRINT, which has to write to (and scroll) 1 MHz video memory, effectively disabling the FASTChip during the PRINT.

Try this:

PRINT CHR$(7) : FOR I = 1 to 1E6 : NEXT : PRINT CHR$(7)

Use a stopwatch to measure the time between beeps at both 1MHz and 16MHz.

-JB
@JBrooksBSI
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355018 is a reply to message #354797] Fri, 20 October 2017 15:04 Go to previous messageGo to next message
scott is currently offline  scott
Messages: 4237
Registered: February 2012
Karma: 0
Senior Member
In article <f6a6cd5a-7366-4b7e-8aef-86bacf0e7947@googlegroups.com>,
<bpiltz@gmail.com> wrote:
> Mysteries of the deep. I've got my spanking new FASTchip at a whopping
> 16.6 Mhz and my trusty Apple //e does not count to a million any faster
> than before!
>
> Simple question, can anyone provide the quickest native machine language
> routine equivalent to
>
> FOR I = 1 TO 1E6: PRINT I: NEXT

A few minutes' banging into the mini-assembler in AppleWin yielded this:

0300 A9 00 LDA #0
0302 A2 00 LDX #0
0304 A0 00 LDY #0
0306 C8 INY
0307 D0 06 BNE $30F
0309 E8 INX
030A D0 03 BNE $30F
030C 18 CLC
030D 69 01 ADC #1
030F C0 40 CPY #$40
0311 D0 F3 BNE $306
0313 E0 42 CPX #$42
0315 D0 EF BNE $306
0317 C9 0F CMP #$F
0319 D0 EB BNE $306
031B 60 RTS

The count is maintained in the three registers, with the least-significant
byte in the Y register and the most-significant byte in the accumulator.
Since there's no increment-accumulator instruction, we have to fake it with
an add, which takes two instructions...perhaps the carry flag never changes
and it could be moved out of the loop? The loop counts up from 0 to end at
$F4240...oops, that's one more iteration.
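
As a cross-check of the end value, the same three-byte counter can be sketched in JavaScript (the language of the browser one-liners elsewhere in this thread). The function name countToMillion and the variable names are made up for illustration; only the register roles and the three compare constants come from the listing above.

```javascript
// Sketch of the three-register counter:
// y = low byte, x = middle byte, a = high byte (as in the listing).
function countToMillion() {
  let a = 0, x = 0, y = 0;
  let increments = 0;
  do {
    y = (y + 1) & 0xff;        // INY
    if (y === 0) {             // low byte wrapped around
      x = (x + 1) & 0xff;      // INX
      if (x === 0) {           // middle byte wrapped around
        a = (a + 1) & 0xff;    // CLC / ADC #1
      }
    }
    increments += 1;
    // CPY #$40 / CPX #$42 / CMP #$F: keep looping until all three match
  } while (y !== 0x40 || x !== 0x42 || a !== 0x0f);
  return { value: (a << 16) | (x << 8) | y, increments };
}

const result = countToMillion();
console.log(result.value.toString(16), result.increments); // f4240 1000000
```

Starting from zero, the counter is incremented exactly 1,000,000 times before the three compares all match.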

One more consideration: I just realized your loop prints each iteration.
Mine doesn't. Since I use the registers instead of memory, I'd have to save
the registers to the stack and call a Monitor routine (the entry point of
which I don't recall offhand) that prints hexadecimal numbers.

With AppleWin running at an emulated 1 MHz, this loop takes 11 seconds to
run.

_/_
/ v \ Scott Alfter (remove the obvious to send mail)
(IIGS( https://alfter.us/ Top-posting!
\_^_/ >What's the most annoying thing on Usenet?
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355019 is a reply to message #355018] Fri, 20 October 2017 15:13 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

On Friday, October 20, 2017 at 9:05:00 PM UTC+2, Scott Alfter wrote:
>
> With AppleWin running at an emulated 1 MHz, this loop takes 11 seconds to
> run.
>

LOL, nowadays in a browser that takes 8ms !!!

(function (i,t) {t=Date.now(); while(i<1e6) i+=1; alert(i+ ' -> '+ (Date.now()-t)+ 'ms') })(0)

--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355022 is a reply to message #355019] Fri, 20 October 2017 15:47 Go to previous messageGo to next message
anthonypaulo is currently offline  anthonypaulo
Messages: 531
Registered: September 2013
Karma: 0
Senior Member
No fair, the browser is taking advantage of the fact that 'i' is a 32-bit value and can do the addition natively on a single register; the Apple II needs three registers. Change the JavaScript code to use three byte counters and watch it slow down to a go-out-and-get-some-coffee 32ms! *note: timing just a guess
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355023 is a reply to message #355018] Fri, 20 October 2017 15:51 Go to previous messageGo to next message
bpiltz is currently offline  bpiltz
Messages: 78
Registered: October 2012
Karma: 0
Member
..
>
> One more consideration: I just realized your loop prints each iteration.
> Mine doesn't. Since I use the registers instead of memory, I'd have to save
> the registers to the stack and call a Monitor routine (the entry point of
> which I don't recall offhand) that prints hexadecimal numbers.
>
> With AppleWin running at an emulated 1 MHz, this loop takes 11 seconds to
> run.

Thanks for some insight into the problem. So, the PRINT is what slows my loop down? Granted, the video memory is not sped up, what is the ML way to "PRINT", and will putting that routine into your code above improve in any significant way displaying the 1 to 1E6 counting loop on the screen on the FASTChip? If scrolling is turned off, the loop should complete much faster, I think.

I had thought the reason Applesoft code is not sped up on the FastChip has to do with using the built-in ROM routines, and I (perhaps erroneously) thought that the FastChip does not speed those routines in $D000 to $FFFF up, by design. That is why I asked if there's a simple ML program to print from 1 to 1E6 on screen.

On that note, will programs like COMPACT, RENUMBER, and similar utilities speed up interpreted code on the FastChip? Which compiler, if any, will have the greatest effect?
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355024 is a reply to message #355023] Fri, 20 October 2017 16:14 Go to previous messageGo to next message
scott is currently offline  scott
Messages: 4237
Registered: February 2012
Karma: 0
Senior Member
In article <493b3e95-915d-469e-b033-0e043f70519f@googlegroups.com>,
<bpiltz@gmail.com> wrote:
> I had thought the reason Applesoft code is not sped up on the FastChip
> has to do with using the built-in ROM routines, and I (perhaps
> erroneously) thought that the FastChip does not speed those routines in
> $D000 to $FFFF up, by design. That is why I asked if there's a simple ML
> program to print from 1 to 1E6 on screen.

I can't imagine that those ROM routines aren't being sped up. What's
imposing an upper limit on the visible speed increase is the accelerator
having to push updates to screen memory across an expansion bus that runs at
1 MHz. Scrolling the screen imposes a particularly heavy hit, as the bottom
23 lines all have to be copied to the line above; while reads could be
serviced from a cache running at full speed, all those writes will trigger a
slowdown.
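
For a rough feel of the numbers, here is a back-of-the-envelope sketch in JavaScript. It assumes a 40-column, 24-row text screen and, as a simplification that is not taken from any FASTChip documentation, that each write to video memory costs one microsecond. The function names are invented for the example, and a real scroll routine also spends cycles on reads and loop overhead, so treat the result as a floor rather than a measurement.

```javascript
// Assumptions (not from the thread): 40 columns, 24 rows, and one
// microsecond per write that has to go out at 1 MHz.
const COLUMNS = 40;
const ROWS = 24;
const MICROSECONDS_PER_SLOW_WRITE = 1;

// Number of screen bytes written for one scroll of the full text window.
function scrollWriteCount() {
  const copied = (ROWS - 1) * COLUMNS; // bottom 23 lines move up one line
  const cleared = COLUMNS;             // the new bottom line is blanked
  return copied + cleared;
}

// Lower bound, in seconds, on time spent in slow writes while scrolling.
function minimumScrollSeconds(linesPrinted) {
  const microseconds =
    linesPrinted * scrollWriteCount() * MICROSECONDS_PER_SLOW_WRITE;
  return microseconds / 1e6;
}

console.log(scrollWriteCount());        // 960
console.log(minimumScrollSeconds(1e6)); // 960
```

Under those assumptions, a million scrolled lines cost at least 960 seconds in video writes alone, regardless of how fast the CPU runs between them.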

_/_
/ v \ Scott Alfter (remove the obvious to send mail)
(IIGS( https://alfter.us/ Top-posting!
\_^_/ >What's the most annoying thing on Usenet?
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355025 is a reply to message #355022] Fri, 20 October 2017 16:18 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

On Friday, October 20, 2017 at 9:47:17 PM UTC+2, Anthony Ortiz wrote:
> No fair, the browser is taking advantage of the fact that 'i' is a 32-bit value and can do the addition natively on a single register; the Apple II needs three registers. Change the JavaScript code to use three byte counters and watch it slow down to a go-out-and-get-some-coffee 32ms! *note: timing just a guess

True, let's compare apples to apples, assembly vs a high level language isn't fair!

--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355026 is a reply to message #355025] Fri, 20 October 2017 16:33 Go to previous messageGo to next message
anthonypaulo is currently offline  anthonypaulo
Messages: 531
Registered: September 2013
Karma: 0
Senior Member
Apples to apples is correct; ultimately that high level code gets compiled to machine language by a compiler that can often (but not always) do a better job than we can, and your JavaScript to machine language instructions do the calcs on a single register. Change it to three registers and it will be more of an apples to apples comparison. :)
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355030 is a reply to message #355024] Fri, 20 October 2017 16:50 Go to previous messageGo to next message
bpiltz is currently offline  bpiltz
Messages: 78
Registered: October 2012
Karma: 0
Member
On Friday, October 20, 2017 at 1:14:15 PM UTC-7, Scott Alfter wrote:
> In article <493b3e95-915d-469e-b033-0e043f70519f@googlegroups.com>,

>> I had thought the reason Applesoft code is not sped up on the FastChip
>> has to do with using the built-in ROM routines, and I (perhaps
>> erroneously) thought that the FastChip does not speed those routines in
>> $D000 to $FFFF up, by design. That is why I asked if there's a simple ML
>> program to print from 1 to 1E6 on screen.
>
> I can't imagine that those ROM routines aren't being sped up. What's
> imposing an upper limit on the visible speed increase is the accelerator
> having to push updates to screen memory across an expansion bus that runs at
> 1 MHz. Scrolling the screen imposes a particularly heavy hit, as the bottom
> 23 lines all have to be copied to the line above; while reads could be
> serviced from a cache running at full speed, all those writes will trigger a
> slowdown.
>

Alright, if video scrolling and refresh occurring at 1 MHz was the whole story, why does the following INTEGER BASIC routine complete 3-4x faster than its equivalent Applesoft one?

10 POKE 34,34
20 FOR I = 1 TO 32767
30 PRINT I
40 NEXT I

Now, I know IB is operating in integers only, but does this alone explain why the numbers 1 to 32767 can be printed 4x faster than in Applesoft interpreted mode? It seems the ML routines of Integer BASIC are *much faster* and / or more efficient than those in Applesoft. This I've known academically for 40 years, but the simple counting loop proves this beyond a doubt. How do we count by Integers only in Floating Point Applesoft BASIC? And how do we do this in machine language?
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355031 is a reply to message #355030] Fri, 20 October 2017 17:01 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

On Friday, October 20, 2017 at 10:50:21 PM UTC+2, bpi...@gmail.com wrote:
>
> How do we count by Integers only in Floating Point Applesoft BASIC?

There's no way, it can't. Thank Bill Gates.

--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355033 is a reply to message #354797] Fri, 20 October 2017 17:13 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

There you go:

(function (i,j,k,t) {t=Date.now(); while(i<1e6) i+=1, j+=1, k+=1; alert([i,j,k]+ ' -> '+ (Date.now()-t)+ 'ms') })(0,0,0)

11 ms

Bill Gates' Applesoft takes 1000s:

10 FOR I=1 TO 1E6 : NEXT : ?"<CTRL-G>"

Which is 1000/0.011 -> about 90909 times slower
--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355034 is a reply to message #355033] Fri, 20 October 2017 17:15 Go to previous messageGo to next message
anthonypaulo is currently offline  anthonypaulo
Messages: 531
Registered: September 2013
Karma: 0
Senior Member
No way, put the alert *inside* the loop! Muhahahahahahahahaha! :)
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355037 is a reply to message #355030] Fri, 20 October 2017 17:35 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: John Brooks

On Friday, October 20, 2017 at 1:50:21 PM UTC-7, bpi...@gmail.com wrote:
> On Friday, October 20, 2017 at 1:14:15 PM UTC-7, Scott Alfter wrote:
>
>>> I had thought the reason Applesoft code is not sped up on the FastChip
>>> has to do with using the built-in ROM routines, and I (perhaps
>>> erroneously) thought that the FastChip does not speed those routines in
>>> $D000 to $FFFF up, by design. That is why I asked if there's a simple ML
>>> program to print from 1 to 1E6 on screen.
>>
>> I can't imagine that those ROM routines aren't being sped up. What's
>> imposing an upper limit on the visible speed increase is the accelerator
>> having to push updates to screen memory across an expansion bus that runs at
>> 1 MHz. Scrolling the screen imposes a particularly heavy hit, as the bottom
>> 23 lines all have to be copied to the line above; while reads could be
>> serviced from a cache running at full speed, all those writes will trigger a
>> slowdown.
>>
>
> Alright, if video scrolling and refresh occurring at 1 MHz was the whole story, why does the following INTEGER BASIC routine complete 3-4x faster than its equivalent Applesoft one?
>
> 10 POKE 34,34
> 20 FOR I = 1 TO 32767
> 30 PRINT I
> 40 NEXT I
>
> Now, I know IB is operating in integers only, but does this alone explain why the numbers 1 to 32767 can be printed 4x faster than in Applesoft interpreted mode? It seems the ML routines of Integer BASIC are *much faster* and / or more efficient than those in Applesoft. This I've known academically for 40 years, but the simple counting loop proves this beyond a doubt. How do we count by Integers only in Floating Point Applesoft BASIC? And how do we do this in machine language?


>> How do we count by Integers only in Floating Point Applesoft BASIC?

vars & arrays can use % for 16-bit integers (but not as a FOR var):

10 I% = 3e4
20 HTAB 1
30 PRINT I%" ";
40 I% = I% - 1
50 ON SGN(I%)+1 GOTO 10,20

Applesoft allows 16-bit integer types primarily for storing compact arrays in memory, and does not have 'fast-path' optimizations for them. For speed, use Beagle Basic which runs <much> faster than Applesoft.

-JB
@JBrooksBSI
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355038 is a reply to message #354797] Fri, 20 October 2017 17:53 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Tom Porter

Sadly... support above 16bit for Idiot Compiler is Very limited... It can COUNT up beyond 65xxx but has no Print variable command above 16bit... here is its output for 0-65536 and print value...

7530- 4C 39 75 JMP $7539
7533- 00 BRK
7534- 00 BRK
7535- 00 BRK
7536- 00 BRK
7537- 00 BRK
7538- 00 BRK
7539- EE 36 75 INC $7536
753C- D0 03 BNE $7541
753E- EE 37 75 INC $7537
7541- AD 37 75 LDA $7537
7544- AE 36 75 LDX $7536
7547- 20 24 ED JSR $ED24
754A- A9 8D LDA #$8D
754C- 20 ED FD JSR $FDED
754F- 4C 39 75 JMP $7539
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355039 is a reply to message #355038] Fri, 20 October 2017 17:59 Go to previous messageGo to next message
anthonypaulo is currently offline  anthonypaulo
Messages: 531
Registered: September 2013
Karma: 0
Senior Member
0-65535 :P
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355044 is a reply to message #355033] Sat, 21 October 2017 00:00 Go to previous messageGo to next message
gids.rs is currently offline  gids.rs
Messages: 1395
Registered: October 2012
Karma: 0
Senior Member
On Friday, October 20, 2017 at 3:13:32 PM UTC-6, Jorge wrote:
> There you go:
>
> (function (i,j,k,t) {t=Date.now(); while(i<1e6) i+=1, j+=1, k+=1; alert([i,j,k]+ ' -> '+ (Date.now()-t)+ 'ms') })(0,0,0)
>
> 11 ms
>
> Bill Gates' Applesoft takes 1000s:
>
> 10 FOR I=1 TO 1E6 : NEXT : ?"<CTRL-G>"
>
> Which is 1000/0.011 -> about 90909 times slower
> --
> Jorge.


I totally hate the FOR/NEXT loop which pushes 18 bytes onto the stack. The DO/WHILE or REPEAT/UNTIL only pushes 7 bytes and takes up less code in applesoft ROM.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355049 is a reply to message #355018] Sat, 21 October 2017 03:03 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: awanderin

scott@alfter.diespammersdie.us (Scott Alfter) writes:

> In article <f6a6cd5a-7366-4b7e-8aef-86bacf0e7947@googlegroups.com>,
> <bpiltz@gmail.com> wrote:
>> Mysteries of the deep. I've got my spanking new FASTchip at a whopping
>> 16.6 Mhz and my trusty Apple //e does not count to a million any faster
>> than before!
>>
>> Simple question, can anyone provide the quickest native machine language
>> routine equivalent to
>>
>> FOR I = 1 TO 1E6: PRINT I: NEXT
>
> A few minutes' banging into the mini-assembler in AppleWin yielded this:
>
> 0300 A9 00 LDA #0
> 0302 A2 00 LDX #0
> 0304 A0 00 LDY #0
> 0306 C8 INY
> 0307 D0 06 BNE $30F
> 0309 E8 INX
> 030A D0 03 BNE $30F
> 030C 18 CLC
> 030D 69 01 ADC #1
> 030F C0 40 CPY #$40
> 0311 D0 F3 BNE $306
> 0313 E0 42 CPX #$42
> 0315 D0 EF BNE $306
> 0317 C9 0F CMP #$F
> 0319 D0 EB BNE $306
> 031B 60 RTS
>
> The count is maintained in the three registers, with the least-significant
> byte in the Y register and the most-significant byte in the accumulator.
> Since there's no increment-accumulator instruction, we have to fake it with
> an add, which takes two instructions...perhaps the carry flag never changes
> and it could be moved out of the loop? The loop counts up from 0 to end at
> $F4240...oops, that's one more iteration.
>
> One more consideration: I just realized your loop prints each iteration.
> Mine doesn't. Since I use the registers instead of memory, I'd have to save
> the registers to the stack and call a Monitor routine (the entry point of
> which I don't recall offhand) that prints hexadecimal numbers.
>
> With AppleWin running at an emulated 1 MHz, this loop takes 11 seconds to
> run.

Back in the Nibble magazine days they had an article on how to count to
a million (or some number) quickly. The fastest program used the
text-screen as its storage and directly incremented screen data. That
way you got to watch it count, without the overhead of generalized
number-printing routines in Applesoft.
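
The idea can be sketched in JavaScript with a byte array standing in for a row of text-screen memory. This is only an illustration of the technique, not the Nibble program. The values $B0 through $B9 are the normal-text screen codes for the digits 0 through 9; the function names makeScreenCounter, increment and readCounter are invented for the example.

```javascript
const ZERO = 0xb0;      // screen code for "0" (normal text, high bit set)
const PAST_NINE = 0xba; // one past the screen code for "9"

// A row of digit cells, all showing "0"; stands in for screen memory.
function makeScreenCounter(digits) {
  return new Uint8Array(digits).fill(ZERO);
}

// Add one to the number held as digit characters, rippling the carry
// leftwards. Returns false when the counter overflows back to all zeros.
function increment(cells) {
  for (let i = cells.length - 1; i >= 0; i--) {
    cells[i] += 1;
    if (cells[i] !== PAST_NINE) return true; // no carry, done
    cells[i] = ZERO;                         // wrap this digit, carry on
  }
  return false;
}

// Convert the screen codes back to ordinary characters for display.
function readCounter(cells) {
  return Array.from(cells, (c) => String.fromCharCode(c & 0x7f)).join("");
}

const cells = makeScreenCounter(6);
for (let n = 0; n < 1234; n++) increment(cells);
console.log(readCounter(cells)); // 001234
```

Nine increments out of ten touch a single byte, which is why this approach avoids most of the cost of a general number-printing routine.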

--
--
Jerry awanderin at gmail dot com
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355051 is a reply to message #355049] Sat, 21 October 2017 05:31 Go to previous messageGo to next message
bpiltz is currently offline  bpiltz
Messages: 78
Registered: October 2012
Karma: 0
Member
> Back in the Nibble magazine days they had an article on how to count to
> a million (or some number) quickly. The fastest program used the
> text-screen as its storage and directly incremented screen data. That
> way you got to watch it count, without the overhead of generalized
> number-printing routines in Applesoft.

Hey, that's just what I'm after. I knew something like this just had to be possible. "Directly incrementing screen data" via an ML routine bypasses the ROM routines, bypasses BASIC, and should be super quick at 16.6 MHz, even while allowing for the video memory still being unaccelerated.

Kudos to anyone who can locate either this Nibble Article, or the program contained therein. Is there a searchable database located anywhere?
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355056 is a reply to message #355051] Sat, 21 October 2017 09:40 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

On Saturday, October 21, 2017 at 11:31:05 AM UTC+2, bpi...@gmail.com wrote:
>> Back in the Nibble magazine days they had an article on how to count to
>> a million (or some number) quickly. The fastest program used the
>> text-screen as its storage and directly incremented screen data. That
>> way you got to watch it count, without the overhead of generalized
>> number-printing routines in Applesoft.
>
> Hey, that's just what I'm after. I knew something like this just had to be possible. "Directly incrementing screen data" via an ML routine bypasses the ROM routines, bypasses BASIC, and should be super quick at 16.6 MHz, even while allowing for the video memory still being unaccelerated.
>
> Kudos to anyone who can locate either this Nibble Article, or the program contained therein. Is there a searchable database located anywhere?



* = $300
digitos = 5
zero = 176
dospuntos = 186
base = 1024
kbd = $c000
cls = $fc58
bell = $ff3a

jsr cls
ldx #digitos
lda #zero
fill sta base,x
dex
bpl fill
jmp restart

step
.byte 0

restart jsr bell
begin ldx #digitos
loop lda step
bmi wait_key
lda kbd
bmi key
cont ldy base,x
iny
cpy #dospuntos
beq carry
tya
sta base,x
jmp loop
carry lda #zero
sta base,x
dex
bmi restart
ldy base,x
iny
cpy #dospuntos
beq carry
tya
sta base,x
jmp begin

wait_key lda kbd
bpl wait_key

key and #$7f
cmp #27 ;ESC quits
bne key2
rts

key2 cmp #83 ;S toggles single step
bne key3
lda #$ff
eor step
sta step

key3 sta $c010
jmp cont



0300:20 58 FC A2 05 A9 B0 9D
:00 04 CA 10 FA 4C 11 03
:00 20 3A FF A2 05 AD 10
:03 30 2B AD 00 C0 30 2B
:BC 00 04 C8 C0 BA F0 07
:98 9D 00 04 4C 16 03 A9
:B0 9D 00 04 CA 30 DA BC
:00 04 C8 C0 BA F0 F0 98
:9D 00 04 4C 14 03 AD 00
:C0 10 FB 29 7F C9 1B D0
:01 60 C9 53 D0 08 A9 FF
:4D 10 03 8D 10 03 8D 10
:C0 4C 20 03

300G

ESC is quit
S toggles single step
any other key steps if in single step mode or does nothing

--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355060 is a reply to message #355056] Sat, 21 October 2017 12:41 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

On Saturday, October 21, 2017 at 3:40:04 PM UTC+2, Jorge wrote:
>>
>> Kudos to anyone who can locate either this Nibble Article, or the program contained therein.
>

Miles better than the other. ESC quits.

0300:20 58 FC A2 05 A9 B0 9D
:00 04 CA 10 FA 20 3A FF
:A2 05 BD 00 04 18 9D 00
:04 AC 00 C0 30 1E 69 01
:C9 BA D0 F2 A9 B0 9D 00
:04 CA 30 E1 BD 00 04 18
:69 01 C9 BA F0 EE 9D 00
:04 4C 10 03 8C 10 C0 C0
:9B 18 D0 DA 60

300G




* = $300
digitos = 5
zero = 176
dospuntos = 186
base = 1024
kbd = $c000
kbd_stb = $c010
cls = $fc58
bell = $ff3a

jsr cls
ldx #digitos
lda #zero
fill
sta base,x
dex
bpl fill
restart
jsr bell
begin
ldx #digitos
lda base,x
clc
loop
sta base,x
ldy kbd
bmi key
sigue
adc #1
cmp #dospuntos
bne loop
carry
lda #zero
sta base,x
dex
bmi restart
lda base,x
clc
adc #1
cmp #dospuntos
beq carry
sta base,x
jmp begin
key
sty kbd_stb
cpy #155 ;ESC quits
clc
bne sigue
rts

--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355064 is a reply to message #355060] Sat, 21 October 2017 13:25 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

On Saturday, October 21, 2017 at 6:42:00 PM UTC+2, Jorge wrote:
>
> Miles better than the other. ESC quits.
>

Last one, I promise. But there's no need to poll the keyboard so often:

0300:20 58 FC A2 05 A9 B0 9D
:00 04 CA 10 FA 20 3A FF
:A2 05 BD 00 04 18 AC 00
:C0 30 21 9D 00 04 69 01
:C9 BA D0 F7 A9 B0 9D 00
:04 CA 30 E1 BD 00 04 18
:69 01 C9 BA F0 EE 9D 00
:04 4C 10 03 8C 10 C0 C0
:9B 18 D0 D7 60

300G

--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355068 is a reply to message #355051] Sat, 21 October 2017 14:49 Go to previous messageGo to next message
Michael AppleWin Debu is currently offline  Michael AppleWin Debu
Messages: 1262
Registered: March 2013
Karma: 0
Senior Member
This is rather trivial to write:

- - - 8< count.s - - -

ROW_00 = $400

SW_40COL = $C00C

SETTXT = $FB39
HOME = $FC58

KEY = $C000
KEYSTROBE = $C010
KEY_ESC = $1B ; Ctrl-[
KEY_SPC = $20

; 1,000,000
DIGITS = 6


ORG $300
Main
STA SW_40COL
JSR SETTXT
JSR HOME

LDX #DIGITS
LDA #'0' + $80
Init
STA ROW_00,X
DEX
BNE Init
STX gbStepping

Count
INC ROW_00 + DIGITS

LDX #DIGITS

IsCarry
LDA ROW_00,X
CMP #'9' + $81
BCC NoCarry
SBC #$0A
STA ROW_00,X
DEX
BEQ Exit
INC ROW_00,X
DB $A9 ; LDA #immed, skip next DEX
NoCarry
DEX
BNE IsCarry


; State Key Next
;-----------------------
; Stepping=0 No Count
; Yes OnKeyPress
; Stepping=1 No WaitKeyStep
; Yes OnKeyPress

LDY gbStepping
BEQ NoStepping
WaitKeyStep
LDA KEY
BPL WaitKeyStep
BMI OnKeyPress

NoStepping
LDA KEY
BPL Count
OnKeyPress
STA KEYSTROBE
AND #$7F

CMP #KEY_ESC
BEQ Exit
CMP #KEY_SPC
BNE Count
ToggleStepping
INY
TYA
AND #1
STA gbStepping
BPL Count ; Always
Exit
RTS

gbStepping
DB 0



0300:8D 0C C0 20 39 FB 20 58
0308:FC A2 06 A9 B0 9D 00 04
0310:CA D0 FA 8E 59 03 EE 06
0318:04 A2 06 BD 00 04 C9 BA
0320:90 0C E9 0A 9D 00 04 CA
0328:F0 2E FE 00 04 A9 CA D0
0330:EA AC 59 03 F0 07 AD 00
0338:C0 10 FB 30 05 AD 00 C0
0340:10 D4 8D 10 C0 29 7F C9
0348:1B F0 0D C9 20 D0 C7 C8
0350:98 29 01 8D 59 03 10 BE
0358:60 00
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355069 is a reply to message #355068] Sat, 21 October 2017 15:24 Go to previous messageGo to next message
Michael AppleWin Debu is currently offline  Michael AppleWin Debu
Messages: 1262
Registered: March 2013
Karma: 0
Senior Member
There are 2 optimizations that can be done:

* Once there is no carry, move directly to checking for keyboard input
* gbStepping can be inlined into a register

0300:8D 0C C0 20 39 FB 20 58
0308:FC A0 06 A9 B0 99 00 04
0310:88 D0 FA EE 06 04 A2 06
0318:BD 00 04 C9 BA 90 0F E9
0320:0A 9D 00 04 CA F0 2A FE
0328:00 04 A9 CA D0 EA 98 F0
0330:07 AD 00 C0 10 FB 30 05
0338:AD 00 C0 10 D6 8D 10 C0
0340:29 7F C9 1B F0 0B C9 20
0348:D0 C9 C8 98 29 01 A8 10
0350:C2 60

Instructions:

* ESC exits
* SPC toggles single-stepping.

While single-stepping, pressing any key other than (*) ESC or SPC advances one step.

(*) Thanks to whoever corrected me about "then vs than" a month or two ago. :-)
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355070 is a reply to message #355024] Sat, 21 October 2017 15:57 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: MG

On Friday, October 20, 2017 at 1:14:15 PM UTC-7, Scott Alfter wrote:
> I can't imagine that those ROM routines aren't being sped up. What's
> imposing an upper limit on the visible speed increase is the accelerator
> having to push updates to screen memory across an expansion bus that runs at
> 1 MHz. Scrolling the screen imposes a particularly heavy hit, as the bottom
> 23 lines all have to be copied to the line above; while reads could be
> serviced from a cache running at full speed, all those writes will trigger a
> slowdown.

The FastChip doesn't speed up the ROM. Copy AppleSoft into the language card, switch to it, and run the program again, you should see a significant boost.

To have a discussion about this, we must first consider the two prominent accelerator designs: the caching accelerator, which manages a small cache in the usual fashion, the main example being the Zip Chip; and the "substitute address space" type, of which the original TransWarp is the main example (the IIgs's fast side is also in this category). The latter doesn't cache anything; it simply replaces the existing address space with faster RAM and has functionality to recognize writes to sensitive spaces (screen buffers) and write them through to the main system RAM as well, with all reads coming from the fast RAM.
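
To make the write-through idea concrete, here is a minimal model in JavaScript. It is a hypothetical sketch, not the FASTChip's or TransWarp's actual logic: the class name, the slowWrites counter, and the choice to treat only text page 1 ($0400-$07FF) and hi-res page 1 ($2000-$3FFF) as "sensitive" are all assumptions made for illustration.

```javascript
// Hypothetical model of a "substitute address space" accelerator.
class SubstituteAddressSpace {
  constructor() {
    this.fastRam = new Uint8Array(0x10000); // on-card RAM, full speed
    this.slowRam = new Uint8Array(0x10000); // motherboard RAM, 1 MHz
    this.slowWrites = 0;                    // writes that paid the 1 MHz cost
  }

  // Simplification: only text page 1 and hi-res page 1 count as video.
  static isVideoAddress(addr) {
    const textPage1 = addr >= 0x0400 && addr <= 0x07ff;
    const hiresPage1 = addr >= 0x2000 && addr <= 0x3fff;
    return textPage1 || hiresPage1;
  }

  // All reads are served from the fast RAM.
  read(addr) {
    return this.fastRam[addr];
  }

  // Every write lands in fast RAM; video writes also go to slow RAM.
  write(addr, value) {
    this.fastRam[addr] = value;
    if (SubstituteAddressSpace.isVideoAddress(addr)) {
      this.slowRam[addr] = value;
      this.slowWrites += 1;
    }
  }
}

const bus = new SubstituteAddressSpace();
bus.write(0x0300, 0xea); // ordinary RAM: fast only
bus.write(0x0400, 0xb0); // text screen: written through
console.log(bus.read(0x0400).toString(16), bus.slowWrites); // b0 1
```

In this model a program that rarely touches the screen pays almost nothing, while one that prints and scrolls on every iteration pays the slow-write cost constantly.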

The FastChip is in the latter category. Now let's look at its additional feature set (512K model): up to 192K of RamWorks-style RAM, and up to 256K of slinky-style RAM. So that leaves us with 64K of RAM to work with, which happens to be the size of RAM in the main bank of an Apple //e. That doesn't leave any room for the additional 16K of ROM. I haven't tested it but I suspect the aux bank isn't sped up when the RamWorks functionality is off.

Accelerating the ROM could be complex business in the II, because the ROM isn't always the ROM when cards are present that assert /INH. I suspect that was part of the tradeoff. There's also the matter of copying the ROM into the fast RAM like the TransWarp does. That takes probably a fair bit of additional logic.

For a new-design accelerator that only has three chips on it, it's a really good first revision, and I am looking forward to what Plamen comes up with next.


MG
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355074 is a reply to message #355069] Sat, 21 October 2017 19:46 Go to previous messageGo to next message
bpiltz is currently offline  bpiltz
Messages: 78
Registered: October 2012
Karma: 0
Member
On Saturday, October 21, 2017 at 12:24:39 PM UTC-7, Michael 'AppleWin Debugger Dev' wrote:
> There are 2 optimizations that can be done:
>
> * Once there is no carry, move directly to checking for keyboard input
> * gbStepping can be inlined into a register
>
> 0300:8D 0C C0 20 39 FB 20 58
> 0308:FC A0 06 A9 B0 99 00 04
> 0310:88 D0 FA EE 06 04 A2 06
> 0318:BD 00 04 C9 BA 90 0F E9
> 0320:0A 9D 00 04 CA F0 2A FE
> 0328:00 04 A9 CA D0 EA 98 F0
> 0330:07 AD 00 C0 10 FB 30 05
> 0338:AD 00 C0 10 D6 8D 10 C0
> 0340:29 7F C9 1B F0 0B C9 20
> 0348:D0 C9 C8 98 29 01 A8 10
> 0350:C2 60
>
> Instructions:
>
> * ESC exits
> * SPC toggles single-stepping.
>
> While single-stepping, pressing any key other than (*) ESC or SPC advances one step.
>
> (*) Thanks to whoever corrected me about "then vs than" a month or two ago. :-)

There we go! Thanks for posting this. That gets us up to 1E6 (or 1E7, 1E8 ...) very, very quickly. Although it's "cheating", the end result is the same. It's actually an interesting exercise to go through, and this counting exercise does seem to scale with an increase in CPU speed in the FASTChip.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355075 is a reply to message #355070] Sat, 21 October 2017 19:57 Go to previous messageGo to next message
bpiltz is currently offline  bpiltz
Messages: 78
Registered: October 2012
Karma: 0
Member
> The FastChip doesn't speed up the ROM. Copy AppleSoft into the language card, switch to it, and run the program again, you should see a significant boost.
>
> To have a discussion about this, we must first consider the two prominent accelerator designs: the caching accelerator, which manages a small cache in the usual fashion, the main example being the Zip Chip; and the "substitute address space" type, of which the original TransWarp is the main example (the IIgs's fast side is also in this category). The latter doesn't cache anything; it simply replaces the existing address space with faster RAM and has functionality to recognize writes to sensitive spaces (screen buffers) and write them through to the main system RAM as well, with all reads coming from the fast RAM.
>
> The FastChip is in the latter category. Now let's look at its additional feature set (512K model): up to 192K of RamWorks-style RAM, and up to 256K of slinky-style RAM. So that leaves us with 64K of RAM to work with, which happens to be the size of RAM in the main bank of an Apple //e. That doesn't leave any room for the additional 16K of ROM. I haven't tested it but I suspect the aux bank isn't sped up when the RamWorks functionality is off.
>
> Accelerating the ROM could be complex business in the II, because the ROM isn't always the ROM when cards are present that assert /INH. I suspect that was part of the tradeoff. There's also the matter of copying the ROM into the fast RAM like the TransWarp does. That takes probably a fair bit of additional logic.
>
> For a new-design accelerator that only has three chips on it, it's a really good first revision, and I am looking forward to what Plamen comes up with next.
>
>
> MG

Thanks for the verification and explanation. I couldn't even get a simple FOR I = 1 TO 10000: NEXT to speed up, so I suspected the built-in ROM routines were indeed unaccelerated. I shall try moving FPBASIC into the language card ($D000-$DFFF) and see if I get a speed-up.

Having purchased the 1MB version of the card, I wonder if 16 KB of RAM in the 512k-1024k area could be used to cache or shadow the ROM routines, or if the option could be implemented in the settings menu somehow. But, as you state, it's probably a lot more complicated than that alone.

Your suggestion that turning off the Ramworks functionality might speed up the auxiliary memory bank is a good one, which I shall test. Probably it will have no effect on auxmem speed, I suspect.

I agree, for a rev 1 card, the FASTChip is fantastic. What a difference.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355077 is a reply to message #355075] Sat, 21 October 2017 22:56 Go to previous messageGo to next message
Michael J. Mahon is currently offline  Michael J. Mahon
Messages: 1767
Registered: October 2012
Karma: 0
Senior Member
<bpiltz@gmail.com> wrote:
>
>> The FastChip doesn't speed up the ROM. Copy AppleSoft into the language
>> card, switch to it, and run the program again, you should see a significant boost.
>>
>> To have a discussion about this, we must first consider the two
>> prominent accelerator designs: The caching accelerator which manages a
>> small cache in the usual fashion, the main example being the Zip Chip;
>> and the "substitute address space" type which the original TransWarp is
>> the main example (the IIgs's fast side is also in this category), which
>> doesn't cache anything, it simply replaces the existing address space
>> with faster RAM and has functionality to recognize writes to sensitive
>> spaces (screen buffers) and write them through to the main system RAM as
>> well with all reads coming from the fast RAM.
>>
>> The FastChip is in the latter category. Now let's look at its
>> additional feature set (512K model): up to 192K of RamWorks-style RAM,
>> and up to 256K of slinky-style RAM. So that leaves us with 64K of RAM
>> to work with, which happens to be the size of RAM in the main bank of an
>> Apple //e. That doesn't leave any room for the additional 16K of ROM.
>> I haven't tested it but I suspect the aux bank isn't sped up when the
>> RamWorks functionality is off.
>>
>> Accelerating the ROM could be complex business in the II, because the
>> ROM isn't always the ROM when cards are present that assert /INH. I
>> suspect that was part of the tradeoff. There's also the matter of
>> copying the ROM into the fast RAM like the TransWarp does. That takes
>> probably a fair bit of additional logic.
>>
>> For a new-design accelerator that only has three chips on it, it's a
>> really good first revision, and I am looking forward to what Plamen comes up with next.
>>
>>
>> MG
>
> Thanks for the verification and explanation. I couldn't even get a simple
> FOR I = 1 TO 10000: NEXT to speed up, so I suspected the built-in ROM
> routines were indeed unaccelerated. I shall try moving FPBASIC into the
> language card ($D000-$DFFF) and see if I get a speed-up.
>
> Having purchased the 1MB version of the card, I wonder if 16 KB of RAM in
> the 512k-1024k area could be used to cache or shadow the ROM routines, or
> if the option could be implemented in the settings menu somehow. But, as
> you state, it's probably a lot more complicated than that alone.
>
> Your suggestion that turning off the Ramworks functionality might speed
> up the auxiliary memory bank is a good one, which I shall test. Probably
> it will have no effect on auxmem speed, I suspect.
>
> I agree, for a rev 1 card, the FASTChip is fantastic. What a difference.
>

Most accelerators, including the Zip Chip, cache ROM and accelerate it.

Practically nothing pulls on /INH, and if something does, disable the
accelerator!

--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355079 is a reply to message #355074] Sat, 21 October 2017 23:59 Go to previous messageGo to next message
Michael AppleWin Debu is currently offline  Michael AppleWin Debu
Messages: 1262
Registered: March 2013
Karma: 0
Senior Member
How is it cheating? Optimization is doing the least work necessary to arrive at the correct answer at the right time.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355081 is a reply to message #355079] Sun, 22 October 2017 02:09 Go to previous messageGo to next message
bpiltz is currently offline  bpiltz
Messages: 78
Registered: October 2012
Karma: 0
Member
On Saturday, October 21, 2017 at 8:59:45 PM UTC-7, Michael 'AppleWin Debugger Dev' wrote:
> How is it cheating? Optimization is doing the least work necessary to arrive at the correct answer at the right time.

Indeed, your ingenious method of arriving at the right answer by thinking outside the box of a purely mathematical "x=x+1" formula is absolutely stellar!

Certainly, not cheating on my end goal, which was to get to 1e6 as quickly as possible.

Incidentally, on AppleWin on my machine, your little gem of ML programming counts at nearly 1E7 per second. And it displays the result onscreen. Fabulous!
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355084 is a reply to message #355069] Sun, 22 October 2017 08:57 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

But my program is 16 bytes less and counts twice as fast my friend :-)
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355089 is a reply to message #355075] Sun, 22 October 2017 11:09 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Alexandre Suaide

FastChip does not cache ROM in fast memory, purely as a design choice. Caching ROM has been a known technique for many years. Most of the common accelerators for the Apple II accelerate Applesoft by design. They know how to deal with the ROM mirror and what to do when interfacing with a peripheral card (external ROMs). Plamen mentioned that he could, eventually, implement it, but it seems this is not an issue for him at the moment. If we do not ask him for it we will never get it. What bothers me is the fact that this "feature" is not mentioned anywhere in the FastChip docs. In this case I cannot see this as a "feature" but as a bug that it is in everybody's interest to have fixed. Added to that, if you copy Applesoft to FastChip RAM you break ProDOS. All disk access with ProDOS becomes broken. At least it happens with the command sequence provided by Plamen.

Best

Alex
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355389 is a reply to message #355084] Sun, 22 October 2017 13:29 Go to previous messageGo to next message
Michael AppleWin Debu is currently offline  Michael AppleWin Debu
Messages: 1262
Registered: March 2013
Karma: 0
Senior Member
On Sunday, October 22, 2017 at 5:57:22 AM UTC-7, Jorge wrote:
> But my program is 16 bytes less and counts twice as fast my friend :-)

It also has fewer features.

Also, your version is a "bloated" 69 bytes. =P

Let's see -- if I rip out all the Input ... then we end up with 38 bytes:

0300:A0 06 A9 B0 99 00 04 88
0308:D0 FA EE 06 04 A2 06 BD
0310:00 04 C9 BA 90 F4 E9 0A
0318:9D 00 04 CA F0 07 FE 00
0320:04 A9 CA D0 EA 60

- - - 8< count_no_input.s - - -
ROW_00 = $400

; 1,000,000
DIGITS = 6


        ORG $300
Main
        LDY #DIGITS
        LDA #'0' + $80
Init
        STA ROW_00,Y
        DEY
        BNE Init

Count
        INC ROW_00 + DIGITS

        LDX #DIGITS
IsCarry
        LDA ROW_00,X
        CMP #'9' + $81
        BCC Count
        SBC #$0A
        STA ROW_00,X
        DEX
        BEQ Exit
        INC ROW_00,X
        DB $A9 ; LDA #immed, skip next DEX
NoCarry
        DEX
        BNE IsCarry
Exit
        RTS
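
For anyone who reads Python more easily than 6502, the routine above can be modeled roughly as follows. This is only a sketch: `screen` stands in for the first text row at $0400, the name `count_no_input` is made up for illustration, and the digits are kept as high-bit ASCII ($B0-$B9) the way they sit in screen memory.

```python
ZERO = 0xB0   # '0' with the high bit set, as stored on the text screen
LIMIT = 0xBA  # one past '9' ($B9); reaching it means the digit overflowed

def count_no_input(digits=6):
    """Model of the carry loop; returns the number of increments performed."""
    screen = bytearray(digits + 1)       # screen[0] is never written, like $0400
    for y in range(digits, 0, -1):       # Init: fill $0401..$0400+digits with '0'
        screen[y] = ZERO
    increments = 0
    while True:
        screen[digits] += 1              # Count: INC ROW_00 + DIGITS
        increments += 1
        x = digits
        while screen[x] >= LIMIT:        # IsCarry: CMP #$BA / BCC Count
            screen[x] -= 0x0A            # SBC #$0A wraps the digit back to '0'
            x -= 1
            if x == 0:                   # BEQ Exit: carried out of the top digit
                return increments
            screen[x] += 1               # INC ROW_00,X propagates the carry

print(count_no_input(3))   # → 1000
```

With the default of 6 digits the model performs 1,000,000 increments before the carry falls out of the leftmost digit, which is the point where the 6502 version reaches RTS.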
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355397 is a reply to message #355389] Sun, 22 October 2017 14:46 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

On Sunday, October 22, 2017 at 7:29:01 PM UTC+2, Michael 'AppleWin Debugger Dev' wrote:
> On Sunday, October 22, 2017 at 5:57:22 AM UTC-7, Jorge wrote:
>> But my program is 16 bytes less and counts twice as fast my friend :-)
>
> It also has fewer features.
>
> Also, your version is a "bloated" 69 bytes. =P
>
> Let's see -- if I rip out all the Input ... then we end up with 38 bytes:
>
> 0300:A0 06 A9 B0 99 00 04 88
> 0308:D0 FA EE 06 04 A2 06 BD
> 0310:00 04 C9 BA 90 F4 E9 0A
> 0318:9D 00 04 CA F0 07 FE 00
> 0320:04 A9 CA D0 EA 60


38 bytes???? That's bloated! :-P

Look, 36 and just as fast:

0300:A2 05 A9 B0 9D 00 04 CA
:10 FA A2 05 BD 00 04 C9
:B9 F0 06 FE 00 04 4C 0A
:03 A9 B0 9D 00 04 CA 10
:EB 60

300G


        * = $300
digitos = 5
zero = 176
nueve = 185
base = $400

        ldx #digitos
        lda #zero
fill
        sta base,x
        dex
        bpl fill
count
        ldx #digitos
carry
        lda base,x
        cmp #nueve
        beq diez
        inc base,x
        jmp count
diez
        lda #zero
        sta base,x
        dex
        bpl carry
        rts

--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355401 is a reply to message #355397] Sun, 22 October 2017 14:56 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

JMP count can be BNE count so 35 bytes :-P

0300:A2 05 A9 B0 9D 00 04 CA
:10 FA A2 05 BD 00 04 C9
:B9 F0 05 FE 00 04 D0 F2
:A9 B0 9D 00 04 CA 10 EC
:60

300G
--
Jorge.
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355415 is a reply to message #355401] Sun, 22 October 2017 16:27 Go to previous messageGo to next message
Anonymous
Karma:
Originally posted by: Jorge

Note to self: learn to count: 8*4=32 plus one, 33 not 35.
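
Byte counts like these are easy to check mechanically. A minimal sketch, assuming the dumps are pasted exactly as posted in this thread; the helper name `dump_length` is made up for illustration:

```python
def dump_length(dump):
    """Count the data bytes in an Apple II monitor-style hex dump."""
    total = 0
    for line in dump.strip().splitlines():
        _, _, data = line.partition(":")   # drop the optional address before ':'
        total += len(data.split())
    return total

MICHAEL_NO_INPUT = """
0300:A0 06 A9 B0 99 00 04 88
0308:D0 FA EE 06 04 A2 06 BD
0310:00 04 C9 BA 90 F4 E9 0A
0318:9D 00 04 CA F0 07 FE 00
0320:04 A9 CA D0 EA 60
"""

JORGE_JMP = """
0300:A2 05 A9 B0 9D 00 04 CA
:10 FA A2 05 BD 00 04 C9
:B9 F0 06 FE 00 04 4C 0A
:03 A9 B0 9D 00 04 CA 10
:EB 60
"""

JORGE_BNE = """
0300:A2 05 A9 B0 9D 00 04 CA
:10 FA A2 05 BD 00 04 C9
:B9 F0 05 FE 00 04 D0 F2
:A9 B0 9D 00 04 CA 10 EC
:60
"""

for name, dump in [("michael", MICHAEL_NO_INPUT),
                   ("jorge jmp", JORGE_JMP),
                   ("jorge bne", JORGE_BNE)]:
    print(name, dump_length(dump))
# → michael 38
# → jorge jmp 34
# → jorge bne 33
```

By the same count the JMP version comes to 34 bytes rather than 36, so the BNE change saves exactly one byte.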
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355436 is a reply to message #355401] Sun, 22 October 2017 17:49 Go to previous messageGo to next message
Michael AppleWin Debu is currently offline  Michael AppleWin Debu
Messages: 1262
Registered: March 2013
Karma: 0
Senior Member
Nicely done!

From a code golf POV you Win.
However from a run-time POV you Lose.

disasm6502 -a -c -o 0x300 count_michael

ORG $0300 ;
0300: LDY #$06 ; Cycles: 2
0302: LDA #$B0 ; Cycles: 2
0304: STA $0400,Y ; Cycles: 4/5
0307: DEY ; Cycles: 2
0308: BNE $0304 ; Cycles: 2/3
030A: INC $0406 ; Cycles: 6
030D: LDX #$06 ; Cycles: 2
030F: LDA $0400,X ; Cycles: 4/5
0312: CMP #$BA ; Cycles: 2
0314: BCC $030A ; Cycles: 2/3
0316: SBC #$0A ; Cycles: 2
0318: STA $0400,X ; Cycles: 4/5
031B: DEX ; Cycles: 2
031C: BEQ $0325 ; Cycles: 2/3
031E: INC $0400,X ; Cycles: 7
0321: LDA #$CA ; Cycles: 2
0323: BNE $030F ; Cycles: 2/3
0325: RTS ; Cycles: 6

Total:
Best.: 55
Worst: 62


disasm6502 -a -c -o 0x300 ../apple2_count/count_jorge

ORG $0300 ;
0300: LDX #$05 ; Cycles: 2
0302: LDA #$B0 ; Cycles: 2
0304: STA $0400,X ; Cycles: 4/5
0307: DEX ; Cycles: 2
0308: BPL $0304 ; Cycles: 2/3
030A: LDX #$05 ; Cycles: 2
030C: LDA $0400,X ; Cycles: 4/5
030F: CMP #$B9 ; Cycles: 2
0311: BEQ $0318 ; Cycles: 2/3
0313: INC $0400,X ; Cycles: 7
0316: BNE $030A ; Cycles: 2/3
0318: LDA #$B0 ; Cycles: 2
031A: STA $0400,X ; Cycles: 4/5
031D: DEX ; Cycles: 2
031E: BPL $030C ; Cycles: 2/3
0320: RTS ; Cycles: 6
Total:
Best.: 47
Worst: 55

So in theory your version should be faster.
However in practice it is slower.

Why?

You didn't optimize the inner loop:

Mine:  1,000,000 x INC $0406 = 6,000,000 cycles
Yours: 1,000,000 x INC $0400,X = 7,000,000 cycles
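
To put a number on the whole inner loop rather than the INC alone, here is a rough tally of the path taken when there is no carry, which is nine counts out of ten. It is a sketch that assumes the cycle counts printed in the two listings above, with a taken branch costing 3 cycles and a branch that falls through costing 2; nothing here crosses a page boundary.

```python
# Path through each version when the ones digit does not overflow.
MICHAEL_NO_CARRY = [
    ("INC $0406", 6),
    ("LDX #$06", 2),
    ("LDA $0400,X", 4),
    ("CMP #$BA", 2),
    ("BCC $030A (taken)", 3),
]
JORGE_NO_CARRY = [
    ("LDX #$05", 2),
    ("LDA $0400,X", 4),
    ("CMP #$B9", 2),
    ("BEQ $0318 (not taken)", 2),
    ("INC $0400,X", 7),
    ("BNE $030A (taken)", 3),
]

def path_cycles(path):
    """Sum the cycle column of a list of (instruction, cycles) pairs."""
    return sum(cycles for _, cycles in path)

michael = path_cycles(MICHAEL_NO_CARRY)
jorge = path_cycles(JORGE_NO_CARRY)
print(michael, jorge, jorge - michael)   # → 17 20 3
```

On this tally the gap on the common path is 3 cycles per count: the indexed INC costs one extra cycle and the fall-through BEQ adds two more.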


Now if there were only a way to get an accurate total cycle count (*) ...
(*) Left as an exercise for the reader.

Hmm, well, we can get an estimate instead. Running AppleWin (1.26.2.4) and using the debugger's TF (TraceFile) command:

BPC
BPX 300
BPX 325
0300:A0 06 A9 B0 99 00 04 88
0308:D0 FA EE 06 04 A2 06 BD
0310:00 04 C9 BA 90 F4 E9 0A
0318:9D 00 04 CA F0 07 FE 00
0320:04 A9 CA D0 EA 60
G 300
300G
TF "cycle_michael.txt" V
G
// ... Wait 8 mins for a 408 MB text file ...
TF
T

BPC
BPX 300
BPX 320
0300:A2 05 A9 B0 9D 00 04 CA
0308:10 FA A2 05 BD 00 04 C9
0310:B9 F0 05 FE 00 04 D0 F2
0318:A9 B0 9D 00 04 CA 10 EC
0320:60
G 300
300G
TF "cycle_jorge.txt" V
G
// .. Wait 9 mins for a 454 MB text file
TF
T

Comparing:

michael's version (408 MB)
Line: 6111125
009B 0039 05EE A0 B0 00 00 01F4 ..RB.IZC 031C:F0 07 BEQ $0325

jorge's version: (454 MB)
Line: 6777793
0043 000E 041B B6 B0 FF 00 01F4 N.RB.I.C 031E:10 EC BPL $030C


Your version has 6777793 - 6111125 = 666,668 MORE instructions executed than mine, which seems to confirm the hunch that your version is slower.
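
The trace totals can also be sanity-checked without generating a 400 MB file. The closed forms below are a derivation from the two listings, not something AppleWin reports; they exclude the final RTS, and the function names are made up for illustration.

```python
def carry_count(digits):
    """How many times some digit rolls over while counting 10**digits steps."""
    return sum(10**k for k in range(digits))

def michael_instructions(digits=6):
    init = 2 + 3 * digits                 # LDY, LDA, then STA/DEY/BNE per digit
    per_step = 5 * 10**digits             # INC, LDX, LDA, CMP, BCC every step
    c = carry_count(digits)
    # A carry adds SBC, STA, DEX, BEQ, INC, LDA#, BNE plus LDA, CMP, BCC on the
    # next digit (10 instructions); the last carry stops after BEQ (4).
    return init + per_step + 10 * (c - 1) + 4

def jorge_instructions(digits=6):
    init = 2 + 3 * digits                 # LDX, LDA, then STA/DEX/BPL per digit
    steps = 10**digits
    c = carry_count(digits)
    head = 4 * steps                      # LDX, LDA, CMP, BEQ every step
    tail = 2 * (steps - 1)                # INC, BNE ends every step but the last
    resets = 4 * c                        # LDA#, STA, DEX, BPL per rolled digit
    recheck = 3 * (c - 1)                 # LDA, CMP, BEQ after each taken BPL
    return init + head + tail + resets + recheck

m = michael_instructions()
j = jorge_instructions()
print(m, j, j - m)   # → 6111124 6777792 666668
```

Each result is one less than the corresponding trace line number quoted above, and the difference between the two versions comes out to the same 666,668.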


Q. Can we do better?
A. Yes! "Know Thy Data!"

Let's inline x += 10
Classic size vs speed trade-off.

- - - 8< count_turbo - - -
ROW_00 = $400

; 1,000,000
DIGITS = 6


        ORG $300
Main
        LDY #DIGITS
        LDA #'0' + $80
Init
        STA ROW_00,Y
        DEY
        BNE Init

Count
        INC ROW_00 + DIGITS ; +1
        INC ROW_00 + DIGITS ; +2
        INC ROW_00 + DIGITS ; +3
        INC ROW_00 + DIGITS ; +4
        INC ROW_00 + DIGITS ; +5
        INC ROW_00 + DIGITS ; +6
        INC ROW_00 + DIGITS ; +7
        INC ROW_00 + DIGITS ; +8
        INC ROW_00 + DIGITS ; +9
        INC ROW_00 + DIGITS ; +10

        LDX #DIGITS
        BNE HaveCarry
IsCarry
        LDA ROW_00,X
        CMP #'9' + $81
        BCC Count
HaveCarry
        LDA #'0' + $80
        STA ROW_00,X
        DEX
        BEQ Exit
        INC ROW_00,X
        BNE IsCarry
Exit
        RTS
- - - 8< count_turbo - - -


BPC
BPX 300
BPX 340
0300:A0 06 A9 B0 99 00 04 88
0308:D0 FA EE 06 04 EE 06 04
0310:EE 06 04 EE 06 04 EE 06
0318:04 EE 06 04 EE 06 04 EE
0320:06 04 EE 06 04 EE 06 04
0328:A2 06 D0 07 BD 00 04 C9
0330:BA 90 D7 A9 B0 9D 00 04
0338:CA F0 05 FE 00 04 D0 EC
0340:60
G 300
300G
TF "cycle_turbo.txt" V
G
// ... Wait 3 minutes for a 147 MB text file ...
TF
T

Turbo version (147MB):
Line: 2200015
00AA 0016 06CB A0 B0 00 00 01F4 ..RB.IZC 0339:F0 05 BEQ $0340

The turbo version is executing 6777793 - 2200015 = 4,577,778 fewer instructions than your version.
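
The turbo total can be cross-checked with a little arithmetic too. This is a sketch derived from the count_turbo listing, not AppleWin output; it excludes the final RTS, assumes at least two digits, and the function name is made up for illustration.

```python
def turbo_instructions(digits=6):
    """Instructions executed by count_turbo, excluding the final RTS (digits >= 2)."""
    init = 2 + 3 * digits                 # LDY, LDA, then STA/DEY/BNE per digit
    passes = 10 ** (digits - 1)           # one pass through Count per ten counts
    # Each pass: 10 x INC, then LDX, BNE, then the HaveCarry block
    # (LDA#, STA, DEX, BEQ, INC, BNE), then LDA, CMP, BCC on the tens digit.
    per_pass = 10 + 2 + 6 + 3
    # Rollovers of the digits above the ones digit; each costs another
    # HaveCarry block plus LDA, CMP, BCC (9), and the last one exits at BEQ (4).
    higher = sum(10**k for k in range(digits - 1))
    return init + passes * per_pass + 9 * (higher - 1) + 4

t = turbo_instructions()
print(t, 2200015 - t)   # → 2200014 1
```

That is about 2.2 instructions per count, and again one less than the trace line number.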

Maybe if someone is bored they can provide actual cycle timings.

Where did I place that "mic drop" meme ... =P
https://tenor.com/view/neil-degrasse-tyson-mic-drop-gif-5236150

QED. :-)
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355437 is a reply to message #355415] Sun, 22 October 2017 17:53 Go to previous messageGo to next message
Michael AppleWin Debu is currently offline  Michael AppleWin Debu
Messages: 1262
Registered: March 2013
Karma: 0
Senior Member
/Oblg. "What's an off-by-one-bug amongst programmers?" ;-)
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355439 is a reply to message #355437] Sun, 22 October 2017 18:03 Go to previous messageGo to next message
anthonypaulo is currently offline  anthonypaulo
Messages: 531
Registered: September 2013
Karma: 0
Senior Member
You know that one day someone's gonna optimize this down to one instruction... it's gonna happen!
Re: Counting from 1 to 1,000,000 on the FASTChip [message #355452 is a reply to message #355089] Sun, 22 October 2017 19:10 Go to previous messageGo to previous message
gids.rs is currently offline  gids.rs
Messages: 1395
Registered: October 2012
Karma: 0
Senior Member
On Sunday, October 22, 2017 at 9:09:19 AM UTC-6, Alexandre Suaide wrote:
> FastChip does not cache ROM in fast memory, purely as a design choice. Caching ROM has been a known technique for many years. Most of the common accelerators for the Apple II accelerate Applesoft by design. They know how to deal with the ROM mirror and what to do when interfacing with a peripheral card (external ROMs). Plamen mentioned that he could, eventually, implement it, but it seems this is not an issue for him at the moment. If we do not ask him for it we will never get it. What bothers me is the fact that this "feature" is not mentioned anywhere in the FastChip docs. In this case I cannot see this as a "feature" but as a bug that it is in everybody's interest to have fixed. Added to that, if you copy Applesoft to FastChip RAM you break ProDOS. All disk access with ProDOS becomes broken. At least it happens with the command sequence provided by Plamen.
>
> Best
>
> Alex


I uploaded to Asimov a revised ProDOS Split program that works with Applesoft in AuxLC and ProDOS in MainLC. One can also modify Applesoft and load it from a hard drive.