This code requires that your shadow PPU registers reside within a single 256-byte page in low RAM, in the same order as the physical PPU registers and without padding. It clobbers A, X, Y and D. It skips the OAM/VRAM/CGRAM address/data ports, since writing those moves data rather than setting state, and COLDATA, because each write only sets the colour channels selected by its top three bits, so a full colour can take up to three writes. If some registers are irrelevant to your game (e.g. BG4HOFS/BG4VOFS if you never use Mode 0), or if you always use HDMA to write to them (the M7 registers and window coordinates are likely candidates), then omit them.
Code:
; assume A/X/Y/D have just been stacked and m = x = 0
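lda #shadowPPUpage
tcd ; D = base of the shadow register page
ldx #($2100 - shadowPPUpage) ; D + X = $2100, so dp,x operands land on the PPU registers
sep #$20 ; 8-bit A for the write-twice registers; X/Y stay 16-bit for register pairs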
ldy z:<shadowINIDISP ; also gets OBJSEL
sty z:<INIDISP,x
ldy z:<shadowBGMODE ; also gets MOSAIC
sty z:<BGMODE,x
ldy z:<shadowBG1SC ; also gets BG2SC
sty z:<BG1SC,x
ldy z:<shadowBG3SC ; also gets BG4SC
sty z:<BG3SC,x
ldy z:<shadowBG12NBA ; also gets BG34NBA
sty z:<BG12NBA,x
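; this unrolled code is smaller than it looks--every instruction is direct page
; scroll and Mode 7 matrix registers are write-twice: low byte, then high byte, to the same address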
lda z:<shadowBG1HOFS
sta z:<BG1HOFS,x
lda z:<(shadowBG1HOFS+1)
sta z:<BG1HOFS,x
lda z:<shadowBG1VOFS
sta z:<BG1VOFS,x
lda z:<(shadowBG1VOFS+1)
sta z:<BG1VOFS,x
lda z:<shadowBG2HOFS
sta z:<BG2HOFS,x
lda z:<(shadowBG2HOFS+1)
sta z:<BG2HOFS,x
lda z:<shadowBG2VOFS
sta z:<BG2VOFS,x
lda z:<(shadowBG2VOFS+1)
sta z:<BG2VOFS,x
lda z:<shadowBG3HOFS
sta z:<BG3HOFS,x
lda z:<(shadowBG3HOFS+1)
sta z:<BG3HOFS,x
lda z:<shadowBG3VOFS
sta z:<BG3VOFS,x
lda z:<(shadowBG3VOFS+1)
sta z:<BG3VOFS,x
lda z:<shadowBG4HOFS
sta z:<BG4HOFS,x
lda z:<(shadowBG4HOFS+1)
sta z:<BG4HOFS,x
lda z:<shadowBG4VOFS
sta z:<BG4VOFS,x
lda z:<(shadowBG4VOFS+1)
sta z:<BG4VOFS,x
lda z:<shadowM7SEL
sta z:<M7SEL,x
lda z:<shadowM7A
sta z:<M7A,x
lda z:<(shadowM7A+1)
sta z:<M7A,x
lda z:<shadowM7B
sta z:<M7B,x
lda z:<(shadowM7B+1)
sta z:<M7B,x
lda z:<shadowM7C
sta z:<M7C,x
lda z:<(shadowM7C+1)
sta z:<M7C,x
lda z:<shadowM7D
sta z:<M7D,x
lda z:<(shadowM7D+1)
sta z:<M7D,x
lda z:<shadowM7X
sta z:<M7X,x
lda z:<(shadowM7X+1)
sta z:<M7X,x
lda z:<shadowM7Y
sta z:<M7Y,x
lda z:<(shadowM7Y+1)
sta z:<M7Y,x
ldy z:<shadowW12SEL ; also gets W34SEL
sty z:<W12SEL,x
lda z:<shadowWOBJSEL
sta z:<WOBJSEL,x
ldy z:<shadowWH0 ; also gets WH1
sty z:<WH0,x
ldy z:<shadowWH2 ; also gets WH3
sty z:<WH2,x
ldy z:<shadowWBGLOG ; also gets WOBJLOG
sty z:<WBGLOG,x
ldy z:<shadowTM ; also gets TS
sty z:<TM,x
ldy z:<shadowTMW ; also gets TSW
sty z:<TMW,x
ldy z:<shadowCGSWSEL ; also gets CGADSUB
sty z:<CGSWSEL,x
lda z:<shadowSETINI
sta z:<SETINI,x
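For reference, the layout requirement stated at the top might be declared along these lines in ca65. This is only a sketch: the segment name SHADOWPPU is hypothetical and would have to be placed on a 256-byte boundary in low RAM by your linker config, and only the first few registers are listed.

```asm
.segment "SHADOWPPU" ; hypothetical segment; must start on a 256-byte boundary
shadowPPUpage:
shadowINIDISP: .res 1 ; $2100
shadowOBJSEL: .res 1 ; $2101, directly after INIDISP so one 16-bit ldy gets both
; the OAM address/data ports ($2102-$2104) are skipped, so no shadow bytes for them
shadowBGMODE: .res 1 ; $2105
shadowMOSAIC: .res 1 ; $2106
shadowBG1SC: .res 1 ; $2107
shadowBG2SC: .res 1 ; $2108
shadowBG3SC: .res 1 ; $2109
shadowBG4SC: .res 1 ; $210A
shadowBG12NBA: .res 1 ; $210B
shadowBG34NBA: .res 1 ; $210C
shadowBG1HOFS: .res 2 ; $210D is write-twice, so the shadow holds both bytes
shadowBG1VOFS: .res 2 ; $210E, likewise
; ... remaining registers follow in PPU order ...

; the z:<label operands only resolve correctly if the page base has a low byte of zero
.assert .lobyte(shadowPPUpage) = 0, error, "shadow PPU registers must start on a page boundary"
```

Note that the shadow offsets within the page do not match the PPU register offsets once the two-byte shadows start. That doesn't matter, because every load names its shadow label and every store names its PPU register.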
If your shadow registers are in SA-1 IRAM or BWRAM you can still use this without changing anything. ($2100 - shadowPPUpage) will be negative, but direct page indexed addressing wraps within bank 00 so that's just fine.
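To see the wraparound with concrete numbers, suppose (purely as an example) that shadowPPUpage is $3000, the start of SA-1 IRAM as the S-CPU sees it. The .loword() is not in the code above; it just makes the 16-bit truncation explicit in case your assembler objects to a negative immediate.

```asm
shadowPPUpage = $3000 ; hypothetical location, for illustration only
shadowINIDISP = shadowPPUpage
INIDISP = $2100

.p816
.a16
.i16
; assume m = x = 0
lda #shadowPPUpage
tcd ; D = $3000
ldx #.loword(INIDISP - shadowPPUpage) ; $2100 - $3000 = -$0F00, which is $F100 in 16 bits
sep #$20
.a8
lda z:<shadowINIDISP ; D + $00 = $00:3000
; effective address of the store below:
; (D + operand + X) & $FFFF = ($3000 + $00 + $F100) & $FFFF = $2100, in bank 00
sta z:<INIDISP,x
```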
More generally, if you can spare the 16-bit X register, then you only have to set D once (to the page you access most often, or the one you need to do indirect addressing out of) and you can reach all of bank 00 with direct page indexed addressing. On the SNES, this is handy when you've got DB set to a bank that doesn't contain the MMIO registers. pea page; pld takes a distressing number of cycles and lda #page; tcd clobbers A and might require a rep #$20, but ldx #(page - directpage) is just one fast instruction.
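As a minimal sketch of the general case, suppose D stays at $0000 for your game variables and DB points at bank $7E, so absolute addressing can't reach the MMIO registers. The variable name and its address are made up for the example, and the cycle counts in the comments are the usual 65816 figures, ignoring memory speed.

```asm
HVBJOY = $4212 ; MMIO register in bank 00
frame_flag = $0010 ; hypothetical direct page variable

.p816
.a8
.i16
; assume native mode, x = 0 (16-bit X/Y), m = 1 (8-bit A), D = $0000, DB = $7E
ldx #$4200 ; X = page - directpage = $4200 - $0000 (3 cycles)
lda z:<HVBJOY,x ; D + $12 + X = $00:4212, whatever DB is
sta z:<frame_flag ; plain direct page access still works

; the alternatives mentioned above, for comparison:
; pea $4200 / pld ; 5 + 5 cycles
; lda #$4200 / tcd ; 3 + 2 cycles with m = 0, and clobbers A
```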
I'm pretty sure the only way to improve on this is by using self-modifying code in RAM, where the "shadow registers" are the immediate operands of some load instructions.
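A rough sketch of what that could look like, not something taken from the code above: the routine lives in RAM, D points at $2100 so the stores need no index, and each shadow register is the operand of an immediate load. All label names here are hypothetical.

```asm
INIDISP = $2100
BG1HOFS = $210D

.p816
.a16
.i16
; this routine has to be copied to (or assembled for) RAM so its operands can be patched
; assume m = x = 0 on entry; the caller saves and restores D
update_ppu:
lda #$2100
tcd ; D = $2100, so z:<REG reaches the PPU without X
sep #$20 ; 8-bit A, 16-bit X/Y
.a8
load_inidisp:
ldy #$0000 ; operand bytes are shadow INIDISP and OBJSEL
sty z:<INIDISP
load_bg1hofs_lo:
lda #$00 ; operand byte is the low byte of shadow BG1HOFS
sta z:<BG1HOFS
load_bg1hofs_hi:
lda #$00 ; operand byte is the high byte
sta z:<BG1HOFS
; ... remaining registers ...
rts

; game code writes new values straight into the operands
shadowINIDISP = load_inidisp + 1
shadowBG1HOFS_lo = load_bg1hofs_lo + 1
shadowBG1HOFS_hi = load_bg1hofs_hi + 1
```

If I have the cycle counts right, each immediate load is one cycle cheaper than the direct page load it replaces, and each unindexed store is one cycle cheaper than the indexed one. The cost is that the shadows are no longer contiguous, so game code has to address them individually.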