So I was looking into popslide by tepples, and while reading through the code another technique occurred to me. It doesn't work with the NES Stripe Image format, but it should be quite a lot faster (and thus able to push more VRAM data per vblank).
While popslide and similar techniques work like a sort of mini-interpreter, using an unused part of the stack as a blob of instructions and data, my technique drops the "interpreter" part in favor of the "Reverse RTS trick": filling the stack with the addresses of various video updater methods, each followed by its data.
So, it might look something like this:
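```
; StartUpdate, UpdateBuffer and SavedSP are hypothetical names: UpdateBuffer
; is the unused area of page $01, SavedSP a zero-page byte.
StartUpdate:               ; called with JSR
    TSX
    STX SavedSP            ; remember the real stack pointer
    LDX #<(UpdateBuffer-1)
    TXS                    ; the next pull reads the first byte of UpdateBuffer
    RTS                    ; "return" into the first method on the list

SetPalette1:
    ; Set VRAM address to $3F01 (colors 1-3 of the first palette)
    LDA #$3F
    STA $2006
    LDA #$01
    STA $2006
    ; Pull out the palette values and give them to VRAM
    PLA
    STA $2007
    PLA
    STA $2007
    PLA
    STA $2007
    RTS

FillArbitraryData:
    ; Set arbitrary address (high byte first)
    PLA
    STA $2006
    PLA
    STA $2006
    ; Assumed format: a count byte (0 means 256), then the data
    PLA
    TAX
@loop:
    PLA
    STA $2007
    DEX
    BNE @loop
    RTS

Terminator:
    ; Restore the stack to its normal self
    LDX SavedSP
    TXS
    RTS                    ; back to whoever called StartUpdate
```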
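To make the list layout concrete, here is a small Python model of how the update list could be laid out for the two data-taking handlers above. It is only a sketch: the handler addresses are made up (a real build would take them from the assembler), and FillArbitraryData is assumed to take its VRAM address (high byte first), then a count byte, then the data. The detail that matters is that each method address is stored minus one, low byte first, because that is what RTS expects.

```python
# Sketch: build the RTS-chain update list (handler addresses are hypothetical).

SET_PALETTE1 = 0xC000    # hypothetical ROM address of SetPalette1
FILL_ARBITRARY = 0xC020  # hypothetical ROM address of FillArbitraryData
TERMINATOR = 0xC040      # hypothetical ROM address of Terminator


def rts_target(addr: int) -> list[int]:
    """Bytes RTS must pull to continue at addr: (addr - 1), low byte first."""
    ret = (addr - 1) & 0xFFFF
    return [ret & 0xFF, ret >> 8]


def set_palette1(c1: int, c2: int, c3: int) -> tuple[int, list[int]]:
    # SetPalette1 pulls exactly three color bytes; no length is needed.
    return SET_PALETTE1, [c1, c2, c3]


def fill_arbitrary(vram_addr: int, data: list[int]) -> tuple[int, list[int]]:
    # Assumed format: VRAM address high byte first (the order $2006 wants),
    # then a count byte, then the data.
    if not 1 <= len(data) <= 255:
        raise ValueError("count must fit in one byte")
    return FILL_ARBITRARY, [vram_addr >> 8, vram_addr & 0xFF, len(data), *data]


def build_list(segments: list[tuple[int, list[int]]]) -> list[int]:
    """Each segment: its handler's return address, then its data; end with Terminator."""
    out: list[int] = []
    for handler, data in segments:
        out += rts_target(handler)
        out += data
    out += rts_target(TERMINATOR)
    return out


if __name__ == "__main__":
    update = build_list([
        set_palette1(0x0F, 0x16, 0x30),
        fill_arbitrary(0x2042, [0x48, 0x49]),
    ])
    print(" ".join(f"{b:02X}" for b in update))
```

Running it prints `FF BF 0F 16 30 1F C0 20 42 02 48 49 3F C0`. Copying those bytes into the unused part of page $01 and pointing S one byte below them is all the setup the chain needs.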
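The RTS at the end of each method pulls the next method's address off the stack and jumps there, so the methods chain into one another for only 6 cycles each, until the chain reaches the Terminator method, which takes us out of the loop. (RTS pulls the low byte first and adds 1 to the address it pulls, so each entry in the list is the method's address minus one, stored low byte first.) Unlike a mini-interpreter, there is no opcode to decode, so adding more alternatives doesn't make anything slower, and we can create very fast, highly specialized methods for specific kinds of VRAM updates (like the palette).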
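The chaining itself can be modelled in a few lines of Python: the stack page is a 256-byte array with an 8-bit stack pointer, PLA pre-increments S and wraps within page $01, and RTS pulls two bytes (low first) and adds 1. This is an illustrative simulation, not an emulator; the handler address, the Terminator address and the two palette segments are hypothetical, and only the 6-cycle RTS dispatches are counted.

```python
# Simulation of RTS chaining through the stack page (illustrative, not cycle-exact).

SET_PALETTE1 = 0xC000  # hypothetical handler address
TERMINATOR = 0xC040    # hypothetical terminator address
RTS_CYCLES = 6


class StackPage:
    """Page $01 as 256 bytes plus an 8-bit stack pointer S."""

    def __init__(self, contents: list[int], start: int = 0x00):
        self.mem = [0] * 256
        for i, b in enumerate(contents):
            self.mem[(start + i) & 0xFF] = b
        self.s = (start - 1) & 0xFF  # PLA/RTS read from S+1, so sit one below

    def pla(self) -> int:
        self.s = (self.s + 1) & 0xFF  # S wraps within page $01
        return self.mem[self.s]

    def rts(self) -> int:
        lo = self.pla()  # low byte is pulled first
        hi = self.pla()
        return (((hi << 8) | lo) + 1) & 0xFFFF  # RTS adds 1


def set_palette1(stack: StackPage, writes: list[tuple[int, int]]) -> None:
    # Mirrors SetPalette1: fixed address $3F01, then three PLA -> $2007.
    writes += [(0x2006, 0x3F), (0x2006, 0x01)]
    writes += [(0x2007, stack.pla()) for _ in range(3)]


HANDLERS = {SET_PALETTE1: set_palette1}


def run_chain(stack: StackPage) -> tuple[list[tuple[int, int]], int]:
    """Follow the chain until Terminator; return PPU writes and RTS dispatch cycles."""
    writes: list[tuple[int, int]] = []
    pc = stack.rts()  # the RTS that starts the chain
    cycles = RTS_CYCLES
    while pc != TERMINATOR:
        HANDLERS[pc](stack, writes)  # the handler pulls its own data
        pc = stack.rts()             # the handler's closing RTS
        cycles += RTS_CYCLES
    return writes, cycles


if __name__ == "__main__":
    # Two palette segments, then Terminator; each address stored minus one.
    page = StackPage([0xFF, 0xBF, 0x0F, 0x16, 0x30,
                      0xFF, 0xBF, 0x21, 0x2A, 0x3A,
                      0x3F, 0xC0])
    writes, cycles = run_chain(page)
    print(len(writes), cycles)
```

This prints `10 18`: ten PPU register writes and three 6-cycle dispatches. The handler never reads an opcode or a length, so the dispatch cost stays the same however many handlers exist.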
For static ROM data such as text, we can even use a macro to generate an unrolled "LDA # -> STA" sequence, which gives the fastest bandwidth for arbitrary static bytes: 1 byte per 6 cycles (2 for LDA immediate, 4 for STA $2007).
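As an illustration of what such a macro might expand to, here is a Python generator that emits the unrolled sequence (in ca65-style syntax) for a fixed run of tiles. The routine name, VRAM address and tile values are hypothetical. Since both the address and the data are baked into the code, the update list only needs this routine's address, with no data bytes at all.

```python
# Sketch: what an unrolled "LDA # -> STA" macro might expand to (ca65-style syntax).

def unrolled_static(label: str, vram_addr: int, tiles: list[int]) -> list[str]:
    """Emit a routine that writes fixed tiles to a fixed VRAM address, then RTS.

    Both the address and the data live in the code, so the update list only
    needs this routine's address (minus one), with no data bytes at all.
    """
    lines = [f"{label}:"]
    # Set the VRAM address: high byte first, then low byte.
    for part in (vram_addr >> 8, vram_addr & 0xFF):
        lines += [f"    lda #${part:02X}", "    sta $2006"]
    # One LDA immediate (2 cycles) + STA absolute (4 cycles) per tile.
    for t in tiles:
        lines += [f"    lda #${t:02X}", "    sta $2007"]
    lines.append("    rts")  # chains into the next routine on the list
    return lines


def unrolled_static_cycles(n_tiles: int) -> int:
    """Cycles inside the routine, excluding its closing RTS (the next segment's 6-cycle start)."""
    return 2 * (2 + 4) + n_tiles * (2 + 4)


if __name__ == "__main__":
    print("\n".join(unrolled_static("HelloText", 0x2042, [0x11, 0x12])))
```

Each tile costs 5 bytes of ROM (2 for LDA immediate, 3 for STA absolute), so this trades ROM space for speed.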
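Furthermore, since we can jump to arbitrary addresses, we can build one huge unrolled "PLA -> STA" loop and jump an arbitrary distance into it. Each PLA / STA $2007 pair is 4 bytes of code and 8 cycles, so to push n bytes we enter 4n bytes before the loop's closing RTS. Picking that starting point is just a matter of which address we put on the stack, so it comes at no additional cost!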
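The entry-point arithmetic is simple enough to sketch in Python. This helper assumes a hypothetical block of `max_n` PLA / STA $2007 pairs starting at `loop_start` and ending in RTS, and that the VRAM address has already been set before the block is entered.

```python
# Sketch: entry point into one big unrolled "PLA / STA $2007" block.
# Assumed (hypothetical) layout: max_n pairs starting at loop_start, then RTS.

PAIR_BYTES = 4    # PLA is 1 byte, STA $2007 (absolute) is 3 bytes
PAIR_CYCLES = 8   # PLA is 4 cycles, STA absolute is 4 cycles


def entry_point(loop_start: int, max_n: int, n: int) -> int:
    """Address at which exactly n pairs remain before the closing RTS."""
    if not 0 <= n <= max_n:
        raise ValueError(f"n must be between 0 and {max_n}")
    loop_end = loop_start + PAIR_BYTES * max_n  # address of the RTS
    return loop_end - PAIR_BYTES * n


def list_entry(loop_start: int, max_n: int, n: int) -> list[int]:
    """The two bytes to put in the update list: entry address minus one, low first."""
    ret = (entry_point(loop_start, max_n, n) - 1) & 0xFFFF
    return [ret & 0xFF, ret >> 8]


if __name__ == "__main__":
    # Hypothetical block of 64 pairs at $D000: write 3 bytes.
    print(hex(entry_point(0xD000, 64, 3)), list_entry(0xD000, 64, 3), 3 * PAIR_CYCLES)
```

For a 64-pair block at $D000, writing 3 bytes means entering at $D0F4, so the list holds $F3 $D0, and the bytes themselves cost 3 × 8 = 24 cycles.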
The possibilities are practically limitless. You can write methods that write exactly the amount of data they are meant to, without even needing a length byte on the stack.
The cost of this technique:
- Adjusting the stack and starting the process: 16 cycles (including the initial JSR)
- Static cost to start a segment: 6 cycles (the RTS that jumps into it)
- Cost per segment: variable (but lower than popslide's, thanks to specialization)
- Cost to end the process: 15 cycles (including the final RTS)
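Taking the figures above at face value, a rough cost model for one update pass looks like this. It assumes every segment sets its own VRAM address from the list (two PLA / STA $2006 pairs, 16 cycles) and then enters an unrolled PLA / STA $2007 block (8 cycles per byte). The usable budget per vblank depends on the rest of the NMI handler (OAM DMA alone takes over 500 cycles), so it is a parameter here.

```python
# Rough cost model using the per-item figures from the list above.
# Assumed segment shape: two PLA / STA $2006 (VRAM address from the list),
# then n PLA / STA $2007 via the unrolled block.

START_CYCLES = 16      # stack switch and initial JSR (figure from the post)
END_CYCLES = 15        # restore stack and final RTS (figure from the post)
DISPATCH_CYCLES = 6    # the RTS that starts each segment
ADDRESS_CYCLES = 16    # 2 x (PLA 4 + STA $2006 4)
BYTE_CYCLES = 8        # PLA 4 + STA $2007 4


def segment_cycles(n: int) -> int:
    return DISPATCH_CYCLES + ADDRESS_CYCLES + BYTE_CYCLES * n


def total_cycles(lengths: list[int]) -> int:
    """Cycles for one full pass over segments of the given byte counts."""
    return START_CYCLES + sum(segment_cycles(n) for n in lengths) + END_CYCLES


def list_bytes(lengths: list[int]) -> int:
    """Stack-page bytes used: per segment 2 (method) + 2 (VRAM addr) + n, plus Terminator."""
    return sum(4 + n for n in lengths) + 2


def max_single_segment(budget: int) -> int:
    """Largest n for which one segment of n bytes fits in `budget` cycles."""
    spare = budget - START_CYCLES - END_CYCLES - DISPATCH_CYCLES - ADDRESS_CYCLES
    return max(0, spare // BYTE_CYCLES)


if __name__ == "__main__":
    print(total_cycles([32]), total_cycles([32, 32]), list_bytes([32, 32]))
```

For example, two 32-byte segments come to 587 cycles and 74 bytes of list. The list also has to fit in whatever part of the 256-byte stack page is unused, which may become the limit before the cycle budget does.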