This code requires that your shadow PPU registers reside within a single 256-byte page in low RAM, in the same order as the physical PPU registers and without padding. Clobbers A, X, Y and D. Skips the OAM/VRAM/CGRAM address/data ports for obvious reasons, and COLDATA because of its specialness. If some registers are irrelevant to your game (e.g. BG4HOFS/BG4VOFS if you never use Mode 0) or if you always use HDMA to write to them (the M7 registers and window coordinates are likely candidates) then omit them.
    ; assume A/X/Y/D have just been stacked and m = x = 0
    lda #shadowPPUpage
    tcd
    ldx #($2100 - shadowPPUpage)
    sep #$20
    ldy z:<shadowINIDISP        ; also gets OBJSEL
    sty z:<INIDISP,x
    ldy z:<shadowBGMODE         ; also gets MOSAIC
    sty z:<BGMODE,x
    ldy z:<shadowBG1SC          ; also gets BG2SC
    sty z:<BG1SC,x
    ldy z:<shadowBG3SC          ; also gets BG4SC
    sty z:<BG3SC,x
    ldy z:<shadowBG12NBA        ; also gets BG34NBA
    sty z:<BG12NBA,x
    ; this unrolled code is smaller than it looks--every instruction is direct page
    lda z:<shadowBG1HOFS
    sta z:<BG1HOFS,x
    lda z:<(shadowBG1HOFS+1)
    sta z:<BG1HOFS,x
    lda z:<shadowBG1VOFS
    sta z:<BG1VOFS,x
    lda z:<(shadowBG1VOFS+1)
    sta z:<BG1VOFS,x
    lda z:<shadowBG2HOFS
    sta z:<BG2HOFS,x
    lda z:<(shadowBG2HOFS+1)
    sta z:<BG2HOFS,x
    lda z:<shadowBG2VOFS
    sta z:<BG2VOFS,x
    lda z:<(shadowBG2VOFS+1)
    sta z:<BG2VOFS,x
    lda z:<shadowBG3HOFS
    sta z:<BG3HOFS,x
    lda z:<(shadowBG3HOFS+1)
    sta z:<BG3HOFS,x
    lda z:<shadowBG3VOFS
    sta z:<BG3VOFS,x
    lda z:<(shadowBG3VOFS+1)
    sta z:<BG3VOFS,x
    lda z:<shadowBG4HOFS
    sta z:<BG4HOFS,x
    lda z:<(shadowBG4HOFS+1)
    sta z:<BG4HOFS,x
    lda z:<shadowBG4VOFS
    sta z:<BG4VOFS,x
    lda z:<(shadowBG4VOFS+1)
    sta z:<BG4VOFS,x
    lda z:<shadowM7SEL
    sta z:<M7SEL,x
    lda z:<shadowM7A
    sta z:<M7A,x
    lda z:<(shadowM7A+1)
    sta z:<M7A,x
    lda z:<shadowM7B
    sta z:<M7B,x
    lda z:<(shadowM7B+1)
    sta z:<M7B,x
    lda z:<shadowM7C
    sta z:<M7C,x
    lda z:<(shadowM7C+1)
    sta z:<M7C,x
    lda z:<shadowM7D
    sta z:<M7D,x
    lda z:<(shadowM7D+1)
    sta z:<M7D,x
    lda z:<shadowM7X
    sta z:<M7X,x
    lda z:<(shadowM7X+1)
    sta z:<M7X,x
    lda z:<shadowM7Y
    sta z:<M7Y,x
    lda z:<(shadowM7Y+1)
    sta z:<M7Y,x
    ldy z:<shadowW12SEL         ; also gets W34SEL
    sty z:<W12SEL,x
    lda z:<shadowWOBJSEL
    sta z:<WOBJSEL,x
    ldy z:<shadowWH0            ; also gets WH1
    sty z:<WH0,x
    ldy z:<shadowWH2            ; also gets WH3
    sty z:<WH2,x
    ldy z:<shadowWBGLOG         ; also gets WOBJLOG
    sty z:<WBGLOG,x
    ldy z:<shadowTM             ; also gets TS
    sty z:<TM,x
    ldy z:<shadowTMW            ; also gets TSW
    sty z:<TMW,x
    ldy z:<shadowCGSWSEL        ; also gets CGADSUB
    sty z:<CGSWSEL,x
    lda z:<shadowSETINI
    sta z:<SETINI,x
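For reference, a shadow page that meets the layout requirement could be declared roughly like this in ca65. This is only a sketch: the segment name SHADOWPPU, and the assumption that the linker config places it page-aligned in low RAM, are mine and not part of the routine above. Only the first few registers are shown. Write-twice registers get two bytes, low byte first, which is why the routine reads shadowBG1HOFS and then shadowBG1HOFS+1.

```asm
; Sketch only. The segment name and its placement are assumptions.
.segment "SHADOWPPU"        ; assumed: 256-byte aligned, in $0000-$1FFF of bank 00
shadowPPUpage:
shadowINIDISP:  .res 1      ; $2100
shadowOBJSEL:   .res 1      ; $2101, must directly follow shadowINIDISP
shadowBGMODE:   .res 1      ; $2105 (the OAM ports $2102-$2104 are skipped)
shadowMOSAIC:   .res 1      ; $2106
shadowBG1SC:    .res 1      ; $2107
shadowBG2SC:    .res 1      ; $2108
shadowBG3SC:    .res 1      ; $2109
shadowBG4SC:    .res 1      ; $210A
shadowBG12NBA:  .res 1      ; $210B
shadowBG34NBA:  .res 1      ; $210C
shadowBG1HOFS:  .res 2      ; $210D, write-twice: low byte, then high byte
shadowBG1VOFS:  .res 2      ; $210E, write-twice
; ... remaining registers continue in PPU order, ending with shadowSETINI

; The routine loads D with shadowPPUpage and addresses each shadow register
; by the low byte of its address, so the page base needs a low byte of zero.
.assert <shadowPPUpage = 0, error, "shadow PPU page must be 256-byte aligned"
```

The adjacency matters because each 16-bit ldy/sty pair moves two neighbouring shadow bytes into two neighbouring PPU registers in one go.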
If your shadow registers are in SA-1 IRAM or BWRAM you can still use this without changing anything. ($2100 - shadowPPUpage) will be negative, but direct page indexed addressing wraps within bank 00 so that's just fine.
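To see why the negative offset works out, here is the arithmetic for one assumed placement: shadowPPUpage = $3000, the first page of SA-1 I-RAM as the S-CPU sees it. The effective address of a direct page indexed access is D + offset + X, truncated to 16 bits in bank 00. The constant name PPU_X and the explicit & $FFFF mask are my additions; the mask is only there in case the assembler range-checks a negative immediate.

```asm
shadowPPUpage = $3000                       ; assumed: SA-1 I-RAM, bank 00
PPU_X = ($2100 - shadowPPUpage) & $FFFF     ; -$0F00 as a 16-bit value

.assert PPU_X = $F100, error, "unexpected X offset"

; sty z:<INIDISP,x  ->  D + $00 + X, wrapped to 16 bits within bank 00
.assert ((shadowPPUpage + $00 + PPU_X) & $FFFF) = $2100, error, "INIDISP missed"

; sta z:<SETINI,x   ->  D + $33 + X
.assert ((shadowPPUpage + $33 + PPU_X) & $FFFF) = $2133, error, "SETINI missed"
```

$3000 + $F100 is $12100, and dropping the carry out of bit 15 leaves $2100, so the stores land on the PPU registers exactly as they do with the shadow page in low RAM.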
More generally, if you can spare the 16-bit X register, then you only have to set D once (to the page you access most often, or the one you need to do indirect addressing out of) and you can reach all of bank 00 with direct page indexed addressing. On the SNES, this is handy when you've got DB set to a bank that doesn't contain the MMIO registers. "pea page; pld" takes a distressing 10 cycles, and "lda #page; tcd" clobbers A and might require a rep #$20, but "ldx #(page - directpage)" is just one fast 3-cycle instruction.
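As a small illustration of the general trick, the fragment below reaches two S-CPU registers in the $42xx page while D stays on the game's own page. It is a sketch under assumptions: directpage = $0300 and the label wait_autoread are made up for the example, and the & $FFFF mask is only a precaution against assembler range checks.

```asm
.p816
NMITIMEN   = $4200          ; S-CPU MMIO: interrupt and auto-read enable
HVBJOY     = $4212          ; S-CPU MMIO: blanking and auto-read status
directpage = $0300          ; assumed: the page D already points at

.a8
.i16
; assume D = directpage, m = 1, x = 0, DB = any bank (e.g. $7E)
    ldx #((NMITIMEN - directpage) & $FFFF)  ; X = $3F00
    lda #$81                                ; enable NMI and joypad auto-read
    sta z:<NMITIMEN,x                       ; D + $00 + X = $4200
wait_autoread:
    lda z:<HVBJOY,x                         ; D + $12 + X = $4212
    lsr a                                   ; bit 0 set = auto-read still busy
    bcs wait_autoread
```

One value of X covers a whole 256-byte page of targets, so registers in $21xx and $42xx need different X values, but D never has to move.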
I'm pretty sure the only way to improve on this is by using self-modifying code in RAM, where the "shadow registers" are the immediate operands of some load instructions.
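A rough sketch of what that self-modifying variant might look like, for the first two register pairs only. Everything here is an assumption on my part: the RAMCODE segment (load address in ROM, run address in RAM, copied at startup), the ppuUpdateSMC label, and pointing D directly at $2100. The shadow labels now name the operand bytes inside the load instructions, so the game writes them like ordinary variables.

```asm
.p816
INIDISP = $2100             ; OBJSEL is at $2101
BGMODE  = $2105             ; MOSAIC is at $2106

.segment "RAMCODE"          ; hypothetical: runs from RAM so operands are writable
.a16
.i16
; assume A/X/Y/D have just been stacked and m = x = 0
ppuUpdateSMC:
    lda #$2100
    tcd                     ; D points straight at the PPU registers
shadowINIDISP := * + 1      ; the "shadow" is the immediate operand below
    ldy #$0000              ; operand bytes: INIDISP, then OBJSEL
    sty z:<INIDISP
shadowBGMODE := * + 1
    ldy #$0000              ; operand bytes: BGMODE, then MOSAIC
    sty z:<BGMODE
    ; ... remaining registers follow the same pattern
```

Each pair becomes an immediate load plus a plain direct page store instead of a direct page load plus an indexed store, which by the usual 65816 cycle counts saves two cycles per pair and leaves X free. Whether that is a net win in practice also depends on the speed of the memory the code runs from.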