Megalextoria
Retro computing and gaming, sci-fi books, tv and movies and other geeky stuff.

Fast GS graphics programming techniques [message #363899] Mon, 19 February 2018 21:27
Originally posted by: John Brooks

In Open Apple podcast #75, Quinn Dunki had a nice segment discussing fast GS graphics programming techniques.

http://www.open-apple.net/2017/12/24/show-075-seth-sternberger-class-apples-gs-graphics/

After listening to the episode, I sent Quinn an email with some additional techniques and historical notes, which I've also posted below at the request of Huibert Aalbers.

Looking around online, I don't see my late-'80s BBS-posted picture disks, which helped popularize SHR stack techniques. I'll go through my floppy collection and put those demo disks online too:

1) 1988: 3200-color picture disk slideshow
2) 1989: 1500-color compressed pictures slideshow

-JB
@JBrooksBSI


On Sat, Dec 30, 2017 at 11:57 AM, Quinn Dunki wrote:
That’s really interesting, and you make a compelling point about saving backgrounds with the stack. Thanks for that!

Q


On Dec 30, 2017, at 11:48 AM, John Brooks wrote:

> Regarding PEA/PEI for fastest pixel writes, I found Brutal Deluxe’s writeup on this pretty informative, as they use register pushes to get slightly faster. If you can optimize the sprite into four bit-patterns on word boundaries (or as close to that as practical), you can load the patterns into X,Y,A, and D, and just push as needed. You get 1 cycle per pixel this way, versus 1.25 cycles/pixel with PEA.

Yes, PHA/PHX/PHY/PHD is a bit faster than PEA/PEI if shadowing is disabled, but if shadowing is enabled, they all go at the same speed: 3 usec for 2 bytes. It's basically 1 usec for the 2.8MHz insn fetch, then 2 usec to write 2x bytes to VRAM @ 1MHz. The 1MHz VRAM writes are the key bottleneck with SHR graphics.
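
To make the numbers in this exchange concrete, here is a small back-of-the-envelope sketch in Python (a model, not GS code). It assumes 320-mode SHR at 4 bits per pixel, so one 16-bit push covers 4 pixels, and a 32,000-byte pixel area (200 lines of 160 bytes). The function names are made up for illustration. The cycle counts used are 4 for a 16-bit register push and 5 for PEA, which match the 1 and 1.25 cycles/pixel figures quoted above.

```python
PIXELS_PER_WORD = 4           # assumption: 320 mode, 4 bits per pixel
SHR_PIXEL_BYTES = 200 * 160   # assumption: 32,000 bytes of pixel data


def cycles_per_pixel(cycles_per_push):
    """CPU cycles spent per pixel when each push writes one 16-bit word."""
    return cycles_per_push / PIXELS_PER_WORD


def shadowed_fill_ms(usec_per_word=3):
    """Time to write every pixel byte once at the shadowed-write rate."""
    words = SHR_PIXEL_BYTES // 2
    return words * usec_per_word / 1000


print(cycles_per_pixel(4))   # 16-bit PHA/PHX/PHY/PHD
print(cycles_per_pixel(5))   # PEA
print(shadowed_fill_ms())    # full screen at 3 usec per 2 bytes
```

Under these assumptions a full-screen repaint through shadowing takes about 48 ms regardless of which push instruction is used, or roughly 20 full redraws per second.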

IMO Mr. Sprite, while great for certain types of games, has several drawbacks compared to Rastan's sprite compiler. Rastan draws to the direct page which has these advantages compared to drawing to the stack:

1) DP writes allow random access without having to update the base register. Drawing to the stack requires pushes for fast writes, and pushes only write to the top of the stack. Stack offsets cannot be negative, which is where sprites want to skip-then-draw.

2) DP offsets are 1 cycle faster than stack offset writes if the DP register is page aligned.

3) DP writes have more opcodes than stack writes: STZ, TSB, TRB, INC, DEC, ASL, LSR, ROL, ROR

4) The biggest win is that DP writes free up the stack to be used as a backsave buffer so sprites can be quickly erased later. Stack draws tend to have slow erases. Using the stack for backsave is fast: either as a 4-cycle PHA as part of LDA DP, (PHA), AND #, OR #, STA DP, or as a 6-cycle PEI before a STZ (0), STA (const repeat), STX (const1), STY (const2).

I think Mr Sprite will be good when drawing large sprites with minimal masking or skipping, and where the background is static and can be restored quickly. Rastan is better for scrolling or animating backgrounds, small sprites, or sprites with a lot of holes or masking.

Hope that helps,
-JB


On Sat, Dec 30, 2017 at 10:46 AM, Quinn Dunki wrote:
Hey John!

Thanks for all the additional details. They will indeed help our listeners, and my own current project. ;-)

Regarding PEA/PEI for fastest pixel writes, I found Brutal Deluxe’s writeup on this pretty informative, as they use register pushes to get slightly faster. If you can optimize the sprite into four bit-patterns on word boundaries (or as close to that as practical), you can load the patterns into X,Y,A, and D, and just push as needed. You get 1 cycle per pixel this way, versus 1.25 cycles/pixel with PEA.

http://www.brutaldeluxe.fr/products/crossdevtools/mrspritetech/index.html

The historical context is great also- thanks for sharing!

Q


On Dec 29, 2017, at 9:48 AM, John Brooks wrote:

Hi Quinn. I just listened to Open-Apple podcast #75 and enjoyed your tech-talk about fast SHR drawing techniques.

Here is some additional info & references which might be useful to GS devs & listeners:

1) STATEREG at $C068 contains most of the Apple II & //e MMU bankswitch controls in a single byte.
So this:
LDA #$30
TSB $C068
Works the same as:
STA $C003
STA $C005
but is smaller & faster with a single 1MHz access stall.

2) MVN or 8-bit copies should be avoided for SHR drawing if possible since each 1MHz sync stalls for an extra ~0.5 usec on average.

3) Shadowing can be enabled in all banks, which allows 8x shadowed frame buffers per megabyte ($2000-$9FFF in each odd-numbered bank). Since higher banks can't be accessed via DPage/Stack though, the fastest way to blit to the $E1 bank is via 16-bit TSB absolute which will write two bytes every 4x usec.

4) A better method than all-bank shadowing is to use unrolled PEA code which can draw about 2.5 SHR screens every 128K of code. This is the technique I used in Tomahawk GS to race the beam changing SHR palettes to create 3200 color mode. PEA can draw 2 bytes every 3x usec.

PEA can also be used to draw multiple parallax planes as in the GTE engine & Super Mario demo:

http://iigs.dreamhosters.com/gte/gte.html

5) My preferred method for fast SHR drawing is PEI. Like PEA it can draw 2 bytes every 3x usec, but it can read from anywhere in banks 0 or 1 which enables scrolling (as used in Rastan GS). Note that if the DP reg is not page-aligned, draw speed drops to 2 bytes every 4x usec.

BTW, PEA takes 5 cycles and PEI is 6 cycles normally, but for drawing to shadowed memory, unrolled PEA seems to suffer slightly more 1MHz-sync stalls than PEI.

6) Tomahawk GS used fast SHR routines to draw a 'hardware' cursor. I used a scan-line interrupt on the line above the cursor, then saved the SHR bytes under the cursor and drew the cursor. Then I checked the beam position and restored lines already displayed. Once all lines had been restored, I exited the IRQ handler. Voila, a flicker-free 'sprite' cursor.

7) Rastan GS sprite drawing used the stack for bkg save/restore buffer, the dpage for drawing to SHR memory, and compiled code (K bank) for the source art. This allowed fast draw & erase of an arbitrary number of sprites over a variable (scrolling) background.
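
A few of the numbered points above can be checked with a short Python sketch (a model, not GS code). The bit names used below (bit 5 = RAMRD, bit 4 = RAMWRT) are not stated in the post and are an assumption about the STATEREG layout; `tsb` is a hypothetical helper that models only the memory side of the 65816 TSB instruction. The code-size estimate assumes each PEA is 3 bytes of code for 2 bytes of pixels and ignores any per-line overhead.

```python
RAMRD = 0x20    # assumed STATEREG bit 5: read aux memory (same switch as $C003)
RAMWRT = 0x10   # assumed STATEREG bit 4: write aux memory (same switch as $C005)


def tsb(memory_byte, accumulator):
    """Model TSB's effect on memory: set every bit that is set in A."""
    return (memory_byte | accumulator) & 0xFF


def shadowed_buffers_per_megabyte():
    """Point 3: one $2000-$9FFF buffer in each odd-numbered 64K bank."""
    banks = (1024 * 1024) // (64 * 1024)   # 16 banks per megabyte
    return banks // 2                      # half of them are odd-numbered


def pea_screens_per_128k(screen_bytes=32000):
    """Point 4: screens drawable by 128K of unrolled PEA code."""
    code_bytes = (screen_bytes // 2) * 3   # one 3-byte PEA per 2 pixel bytes
    return (128 * 1024) / code_bytes


statereg = tsb(0x00, 0x30)                 # LDA #$30 / TSB $C068
print(statereg == (RAMRD | RAMWRT))        # both switches set by one write
print(shadowed_buffers_per_megabyte())
print(round(pea_screens_per_128k(), 2))
```

The raw estimate comes out a little above the "about 2.5 screens" quoted in point 4. The gap is presumably overhead such as repositioning the stack pointer between scan lines, though that is a guess and not something the post states.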


As to date of first-use, as far as I know, I was the first GS dev to apply the stack/direct-page fast path techniques. I used them in Tomahawk GS in Sep 1987 and created the PEA-based 3200 color mode in Jan 1988 which was first made public in Tomahawk GS (3200-color title screen).

In mid-1988 I released a public demo 3200-color picture disk via BBS and reported the techniques via BBS posts and to Apple in hopes of getting a faster Finder & QuickDraw. I later found out that the FTA guys reverse-engineered Tomahawk GS and applied the fast SHR drawing techniques and zero-page-based sound streaming techniques to their products and demos.

Burger Heineman was also very inventive with the GS and may have independently found this technique or predated me, but it's safe to say that Apple's 1989 tech note was at least 2 years behind the rest of the Apple II dev community.

-JB
Fast GS graphics programming techniques [message #363907 is a reply to message #363899] Tue, 20 February 2018 00:19
Originally posted by: jgandersen

Thank you for sharing! Very cool stuff.
Re: Fast GS graphics programming techniques [message #363913 is a reply to message #363899] Tue, 20 February 2018 02:36
Originally posted by: as4565683

@John Brooks:
Do you remember or know whether Task Force uses the same save-the-background technique as Rastan? Judging from the number of sprites on the screen (especially in later levels and with the helicopter), I guess there is a point at which it would be simpler/faster to draw the entire background again...?