Megalextoria
Retro computing and gaming, sci-fi books, tv and movies and other geeky stuff.

Wanted: Assembler (long post) [message #94948] Mon, 27 December 2004 09:48
Jody Bruchon

I am seeking an assembler that probably does not exist. The specific
features I need are:

* Support for #ifdef, #include, #define, #undef, #else, #endif
(preprocessing conditional includes like lupo/luna/lld)
* Support for 6502, 6510 illegal ops, 65816 (emulation mode) ops, AND 65C02 ops

And I need them *in the same assembler*. The luna toolkit looked promising
until I saw that it doesn't do 65C02 and 6510 illegal ops. I can't program
in C (though I'm great in BASIC ;) so I'm not particularly up to adding this
support to luna. Any suggestions?

The reasoning behind this is: I'm making that darn C02 OS and I was
thinking, "Hey, it would be pretty smart to rewrite some parts of the system
in 65C02 ops and take advantage of 65C02/65816 new ops if the user wants to
build it that way!" I want the OS to become "portable and optimized" across
the entire available 6502 family.

For those who don't want to see lots of assembly code, close this message.

*** INSANELY LENGTHY CODE COMPARISON FOLLOWS ***

Take this code from the scheduler I posted on c.b.cbm:

irq
        pha             ; Save A so we won't lose it!
        txa             ; X too!
        pha
        ldx task        ; Find out what task we're on
        lda #$00        ; Init A for countdown

That was for a stock 6502/6510 NMOS CPU. 7 bytes, 13 cycles. Rework it for
the 65C02 and we can get:

irq
        pha             ; Save A so we won't lose it!
        phx             ; X too!
        ldx task        ; Find out what task we're on
        lda #$00        ; Init A for countdown

6 bytes, 11 cycles. Okay, okay, two cycles ain't a big deal, but that's
just the init for the IRQ routine, and that's triggering 60 times per
second at a minimum, so that's 120 cycles per second that can go to
something else.
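
A rough sketch of how the two entry sequences above might live in one
source file with the C-style conditionals I'm asking for. The symbol
CPU_65C02 is made up for illustration, and this assumes an assembler (or a
separate preprocessor pass) that honors #ifdef/#else/#endif; it is not the
syntax of any particular tool:

```asm
irq
        pha             ; Save A so we won't lose it!
#ifdef CPU_65C02
        phx             ; 65C02/65816 build: push X directly
#else
        txa             ; NMOS 6502/6510 build: route X through A
        pha
#endif
        ldx task        ; Find out what task we're on
        lda #$00        ; Init A for countdown
```

Building with CPU_65C02 defined would give the 6-byte version, and building
without it would give the 7-byte NMOS version.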

How about running on a 65816 in Emulation mode? NMOS has:

irqsav
        tax             ; Load index
        sty t1y,x       ; Store Y
        pla             ; Pull X
        sta t1x,x       ; Store X
        pla             ; Pull A
        sta t1a,x       ; Store A
        pla             ; Pull P (IRQ stored)
        sta t1p,x       ; Store P
        pla             ; Pull PC low byte
        sta t1pc,x      ; Store PC low
        pla             ; Pull PC high
        sta t1pc+1,x    ; Store PC high
        stx temp        ; Save index
        tsx             ; Get current SP
        txa             ; Save SP elsewhere
        ldx temp        ; Restore index
        sta t1sp,x      ; Save SP
        ldx task        ; Load task number
        inx             ; Increment task number
        cpx tasks       ; Compare with running task counter
        bne irqtsk      ; If not at max, proceed normally
        ldx #$01        ; Reset task number to 1

That counts out to 35 bytes and 70 cycles if the task counter hasn't maxed
out, and 72 cycles if it has. 65816 says:

irqsav
        tax             ; Load index
        sty t1y,x       ; Store Y
        pla             ; Pull X
        sta t1x,x       ; Store X
        pla             ; Pull A
        sta t1a,x       ; Store A
        pla             ; Pull P (IRQ stored)
        sta t1p,x       ; Store P
        pla             ; Pull PC low byte
        sta t1pc,x      ; Store PC low
        pla             ; Pull PC high
        sta t1pc+1,x    ; Store PC high
        tsc             ; Get SP
        sta t1sp,x      ; Save SP
        ldx task        ; Load task number
        inx             ; Increment task number
        cpx tasks       ; Compare with running task counter
        bne irqtsk      ; If not at max, proceed normally
        ldx #$01        ; Reset task number to 1

That replaced stx $xx, tsx, txa, ldx $xx with a single tsc, saving 5 bytes
and 8 cycles...only 30 bytes and 62 cycles to save a context on a 65816 in
emulation mode.
Don't forget the same silliness executes on the context restore operation as
well (X denotes lines that would be removed entirely):

irqload
        tax             ; Load index
X       stx temp        ; Save index to memory temporarily
        lda t1sp,x      ; Load SP
X       tax             ; Prepare SP for change
        txs             ; Change SP (becomes tcs on the 65816)
X       ldx temp        ; Restore index
        lda t1pc+1,x    ; Load PC high
        pha             ; Push PC high
        lda t1pc,x      ; Load PC low
        pha             ; Push PC low
        lda t1p,x       ; Load P
        pha             ; Push P
        lda t1a,x       ; Load A
        sta temp        ; Temporarily save A
        ldy t1y,x       ; Load Y
        lda t1x,x       ; Load X with A
        tax             ; Move X's value from A into X
        lda temp        ; Load A from temporary location
        bit $dc0d       ; c64: Silence the CIA 1 interrupts (bit reads without clobbering A)
        bit $dd0d       ; c64: Silence the CIA 2 interrupts
        rti             ; Return from IRQ into next task
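
One detail on the 65816 version: once the tax in front of it is gone, the
txs can't stay as it is, because X still holds the task index at that
point. A minimal sketch of the head of the restore path, assuming the saved
SP is loaded into A exactly as in the listing above and that tcs takes the
place of the tax/txs pair:

```asm
irqload
        tax             ; Load index
        lda t1sp,x      ; Load saved SP into A
        tcs             ; 65816: copy A to SP without touching X
        lda t1pc+1,x    ; Load PC high (index is still intact in X)
        pha             ; Push PC high
```

In emulation mode the high byte of the stack pointer stays at $01, so only
the low byte loaded into A matters here.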

I'm sure there are other things I could do to optimize more, such as
replacing the "sta temp" and "lda temp" with XBA instructions (no cycle
count gain, but drop two bytes of code size). It appears that I stand to
gain about 18 cycles per IRQ for this extremely minimal IRQ handler on a
65816, which gives me 1,080 cycles per second that can go to something
else...and the IRQ handler is definitely going to expand. If someone wanted
to port C02 to an Apple IIgs, an SNES, a VIC-20 with a 65C02 stuck in there
to replace the NMOS 6502, or even their own homebuilt computer system, they
would be able to reap the full benefits of their CPUs by being able to plug
in optimized versions of the most important pieces of code. Unfortunately,
it seems I would have to mix and match some sort of preprocessor with a
different assembler to achieve these results. Any suggestions?
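
For reference, a sketch of the XBA idea mentioned above, covering only the
register-restore tail of irqload on a 65816. It assumes the same t1a, t1y,
and t1x tables as the listing and that nothing else is using the hidden B
accumulator at this point:

```asm
        lda t1a,x       ; Load saved A
        xba             ; Park it in the hidden B accumulator
        ldy t1y,x       ; Load Y
        lda t1x,x       ; Load saved X into A
        tax             ; Move X's value from A into X
        xba             ; Swap the saved A back into A
```

Each xba is one byte and 3 cycles, against two bytes and 3 cycles for a
zero-page sta or lda, so the cycle count stays the same and the code shrinks
by two bytes.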

JodyZ