unregistered wrote: ↑Thu Jan 30, 2020 4:49 pm
tokumaru, this is my lack of experience response: But pla is 4 cycles. That’s twice as many cycles as lda #.
You're right, lda # is indeed as fast as it gets - I didn't know you were doing that. Few people use it for VRAM transfers because of how much memory it needs (5x the amount of data actually being transferred!).
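To put rough numbers on that trade-off, here is a small sketch that assumes each byte is sent with an LDA #imm / STA $2007 pair, compared against a PLA / STA $2007 pair. The sizes and cycle counts are the standard 6502 figures; the table and the `cost_per_byte` helper are made up for this illustration.

```python
# (size in bytes, cycles) for the 6502 instructions involved.
INSTRUCTIONS = {
    "LDA #imm": (2, 2),
    "PLA": (1, 4),
    "STA abs": (3, 4),  # e.g. STA $2007
}

def cost_per_byte(load):
    """Return (code bytes, cycles) needed to send one byte to $2007."""
    load_size, load_cycles = INSTRUCTIONS[load]
    store_size, store_cycles = INSTRUCTIONS["STA abs"]
    return load_size + store_size, load_cycles + store_cycles

immediate = cost_per_byte("LDA #imm")  # 5 bytes of code, 6 cycles
stack = cost_per_byte("PLA")           # 4 bytes of code, 8 cycles
```

The 5 bytes of code per data byte is where the 5x figure comes from: with immediate loads the data lives inside the code, so every block of data needs its own copy of the load/store pairs.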
8 extra bytes bc: 1 4kb CHR file is 4096 bytes. (4096 bytes / 256 tiles = 16 bytes per tile.) 4096 / 12 frames = 341.3333. So our function writes 342 bytes per frame... 342 bytes * 12 frames = 4104 bytes. 4104 - 4096 = 8 extra bytes.
Wait... so you have one unrolled function that writes 342 immediate values to VRAM? That function is 1710 bytes (plus 1 for the RTS) long! Do you have the RAM for that? Or do you have many such functions in ROM, meaning you're dealing with the x5 expansion of your CHR data?
If you have this function in RAM, I don't see why you can't call it by JSR'ing to FunctionStart+(5*8) to skip the first 8 bytes on the last transfer. And if you have these functions in ROM, why not just make the last one shorter?
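As a sanity check on those sizes, the sketch below builds the unrolled routine as raw bytes and computes where a later entry point would land. It assumes the routine is nothing but LDA #imm / STA $2007 pairs followed by one RTS; `build_routine` and `entry_offset` are hypothetical names, not anything from your code.

```python
import math

LDA_IMM = 0xA9  # LDA #imm
STA_ABS = 0x8D  # STA abs
RTS = 0x60
PPUDATA = 0x2007
BYTES_PER_TRANSFER = 5  # A9 xx 8D 07 20

def build_routine(data):
    """Emit one LDA #imm / STA $2007 pair per data byte, then an RTS."""
    code = bytearray()
    for value in data:
        code += bytes([LDA_IMM, value, STA_ABS, PPUDATA & 0xFF, PPUDATA >> 8])
    code.append(RTS)
    return code

def entry_offset(skipped_transfers):
    """Offset from the start of the routine that skips the first N transfers."""
    return skipped_transfers * BYTES_PER_TRANSFER

CHR_SIZE = 4096
FRAMES = 12
per_frame = math.ceil(CHR_SIZE / FRAMES)  # bytes written per vblank
extra = per_frame * FRAMES - CHR_SIZE     # bytes beyond the 4 KB on the last frame
routine = build_routine(bytes(per_frame))  # 5 bytes per transfer, plus the RTS
```

Each skipped single-byte transfer moves the entry point 5 bytes forward, so `entry_offset(8)` is 40, while skipping 8 whole tiles (8 * 16 transfers) would be `entry_offset(8 * 16)`, i.e. 640.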
The write2SaveRAM loop writes hex codes for 2 byte stores per iteration bc 342/2<256, so hmm... maybe it would be good to decrease the loop by 1 so 340 bytes are written, then manually write the other hex codes to SaveRAM so that 341 bytes are written per vblank, and then add even more code to write the last 4 bytes (341 * 12 = 4092 + 4 = 4096) to CHRRAM. I’ll do this
Since you have a "write2SaveRAM" loop, I assume your unrolled function is in RAM. So I suggest one of the following:
1- Start populating the transfer list after the first 8 bytes, and during vblank, skip these first transfers by JSR'ing directly to the 9th transfer. You can JSR to an indirect JMP to simulate an indirect JSR, so you can pre-calculate the entry point to the function (no conditional logic during vblank).
-OR-
2- When you finish buffering the 334 bytes of the last block, overwrite the following LDA # ($A9) with RTS ($60), causing the function to exit early and skip the last 8 bytes. You have to remember to change the RTS back into an LDA after the transfer ends. This solution is better IMO because you don't need to change the NMI handler at all.
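Both options can be tried out off-console with a toy model. The sketch below is only an illustration under a few assumptions: the routine is a plain run of LDA #imm / STA $2007 pairs ending in RTS, and `run` is a made-up mini-interpreter that understands just those three opcodes and records what would be written to $2007.

```python
LDA_IMM, STA_ABS, RTS = 0xA9, 0x8D, 0x60
STRIDE = 5            # bytes of code per transfer: A9 xx 8D 07 20
TOTAL, LAST = 342, 334  # transfers per frame, bytes left on the last frame

def build_routine(data):
    """Emit one LDA #imm / STA $2007 pair per data byte, then an RTS."""
    code = bytearray()
    for value in data:
        code += bytes([LDA_IMM, value, STA_ABS, 0x07, 0x20])
    code.append(RTS)
    return code

def run(code, start=0):
    """Interpret LDA #imm, STA abs and RTS only; return the bytes stored."""
    written, a, pc = [], 0, start
    while True:
        op = code[pc]
        if op == LDA_IMM:
            a = code[pc + 1]
            pc += 2
        elif op == STA_ABS:
            written.append(a)
            pc += 3
        elif op == RTS:
            return written
        else:
            raise ValueError(f"unexpected opcode ${op:02X} at offset {pc}")

payload = bytes(i % 256 for i in range(LAST))

# Option 1: fill only the last 334 slots and enter at the 9th transfer.
code1 = build_routine(bytes(TOTAL - LAST) + payload)
out1 = run(code1, start=(TOTAL - LAST) * STRIDE)

# Option 2: fill the first 334 slots, then patch the next LDA into an RTS.
code2 = build_routine(payload + bytes(TOTAL - LAST))
patch_at = LAST * STRIDE
code2[patch_at] = RTS        # exit early on the last frame
out2 = run(code2)
code2[patch_at] = LDA_IMM    # restore before the next full transfer
out_restored = run(code2)
```

In both cases exactly the 334 payload bytes are written, and after restoring the patched opcode the routine goes back to writing all 342.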
it would have been sweet if writing to $2007 set some flag once the VRAM address reached an $x000 byte.
You have to stop wishing for these oddly specific behaviors that would be useful only in your particular implementation of things. Blocking VRAM writes in such cases would cause much more harm than good, because you could end up inadvertently triggering this when clearing VRAM or updating random pattern or name table regions, both of which are very common tasks. Like it or not, this system architecture was set in stone with the release of the Famicom nearly 37 years ago, so there's no point in wishing for these things now. Instead of thinking about how the platform could be changed to meet your needs, think about how you can improve your code to make better use of the platform as it is.