Here is my file (the crc is used to assert that we are using the same ROM dump):
If your ROM is not the same, add the unknown addresses reported by the converted ROM and convert it again until it works.
Updating VRAM by column:
We update columns of tiles. 8 columns; they can be updated by DMA using a 32-word increment.
Updating VRAM by line:
We update lines. Same thing, but with an increment of one word; 30 lines.
This can be made through DMA (10 times faster than a loop).
However, the line or column must be identified. The background mirroring bit can help.
2 columns or lines will be updated at each NMI, so the console is not overloaded. It can be done by DMA or by HDMA.
Bit masks can determine what to update.
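To make the column/line idea above concrete, here is a small Python model of it (a sketch, not the converter's actual code). It assumes a 32-word-wide tilemap, a hypothetical base address, and one dirty bit per column or line; the names `column_addresses`, `row_addresses` and `pop_dirty` are made up for illustration. On the SNES side the stride would come from the VRAM address increment setting (1 or 32 words) rather than from a loop.

```python
TILEMAP_WIDTH = 32   # words per tilemap row (assumption: 32x32 tilemap)
NES_ROWS = 30        # NES nametable height in tiles
MAX_PER_NMI = 2      # at most 2 columns/lines per NMI, as described above

def column_addresses(base, column):
    """VRAM word addresses written for one column (stride of 32 words)."""
    return [base + column + row * TILEMAP_WIDTH for row in range(NES_ROWS)]

def row_addresses(base, row):
    """VRAM word addresses written for one line (stride of 1 word)."""
    start = base + row * TILEMAP_WIDTH
    return [start + col for col in range(TILEMAP_WIDTH)]

def pop_dirty(mask, limit=MAX_PER_NMI):
    """Take up to `limit` set bits out of a dirty mask.

    Returns (list of indices to update now, remaining mask for later NMIs).
    """
    picked = []
    bit = 0
    while mask >> bit and len(picked) < limit:
        if mask & (1 << bit):
            picked.append(bit)
            mask &= ~(1 << bit)
        bit += 1
    return picked, mask
```

With a mask of `0b10110`, the first NMI would handle columns 1 and 2 and leave column 4 for the next one.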
However, I checked whether it was really the BG update that was slowing down the program. I removed it from the NMI routine and it made no difference to the speed. Maybe it's the sprite zero emulation? Balloon Fight is always fluid, and it does not use it.
Btw, the Sprite0 emulation does not work well on the test ROM; something is wrong with it, or with writing to the scroll registers while rendering. Sometimes it works, sometimes it stays at (0, 0). That's why the screen is shaking so much.
Maybe the scrolling values should be always written using the HDMA.
However the full bank copy to VRAM seems to be fast enough. I assumed it was not and I was wrong, "assume nothing" when optimising. According to the documentation, the DMA can copy up to 6KB during vblank.
The Sprite0 flag and changing the scroll registers during rendering seem to be the main problem.
And by stopping the program randomly it shows that it updates the sprite CHR data in VRAM all the time.
I will fix this.
I tried the test ROM on the console and the picture is fine, and it is fast, while it flickers on bsnes plus.
On the other hand, SMB flickers on real hardware just like on the emulator, and it seems slow.
The title screen of SMB does not show. I read that it could be because of some missing delay in PPU reads.
Here are the reads performed by SMB:
.DW rlda_2007 <- it works, the latching was tested
The mirroring must be emulated by making a copy of the nametables bank 1 after bank 2.
Nametables bank 2 is not very clean, the lower part is missing.
Donkey Kong is still slow.
Battle City shows something.
Excitebike shows the title screen; I need to add more indirect addresses.
Galaga does nothing.
Ice Climber hangs.
So it looks like something is slowing everything down. But it no longer shows on the Sprite0 test ROM.
The nametable mirroring is achieved by making a copy of the first bank after the second one. And this is slooow, I must implement the line and column update.
The sprite 0 hit system shakes: it misses vblanks because of the nametable copy at each NMI. I will update the nametables only by columns or lines. The direction of mirroring is in the ROM file header, so it will be either vertical mirroring with column updates or horizontal mirroring with line updates.
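As a rough sketch of picking the update mode from the header: in the iNES format, bit 0 of byte 6 is the mirroring flag (0 = horizontal, 1 = vertical). The pairing of vertical mirroring with column updates is my reading of the plan above, and the function names are hypothetical. Four-screen carts (bit 3 of the same byte) are ignored here.

```python
INES_MAGIC = b"NES\x1a"

def mirroring_from_header(header):
    """Return 'vertical' or 'horizontal' from a 16-byte iNES header."""
    if len(header) < 16 or header[:4] != INES_MAGIC:
        raise ValueError("not an iNES header")
    return "vertical" if header[6] & 0x01 else "horizontal"

def update_mode(mirroring):
    """Assumption: vertical mirroring means horizontal scrolling, so update columns."""
    return "columns" if mirroring == "vertical" else "lines"
```

For example, a header whose byte 6 is `$01` would select column updates.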
Enabling the video sync on bsnes plus solves the problem of blinking images on the sprite 0 test ROM.
Another thing to add is a branch tree for the indirect jumps. There are hundreds of indirect jump addresses in SMB1, and testing each one in turn slows everything down.
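To show why a tree helps, here is the same idea as a binary search over a sorted table of known jump targets, written in Python as a model only. The addresses and handler names in the usage note are invented; the real converter would emit compare-and-branch code rather than call a library.

```python
from bisect import bisect_left

def build_table(targets):
    """targets: dict mapping NES address -> handler. Returns (sorted keys, handlers)."""
    keys = sorted(targets)
    return keys, [targets[k] for k in keys]

def lookup(keys, handlers, address):
    """Find the handler for an indirect jump target in O(log n) comparisons."""
    i = bisect_left(keys, address)
    if i < len(keys) and keys[i] == address:
        return handlers[i]
    return None  # unknown target: it would have to be added to the table
```

With a few hundred targets, this needs around 8 or 9 comparisons per jump instead of up to several hundred for a linear chain of tests. Usage would look like `keys, handlers = build_table({0x8000: "reset", 0x8123: "loop"})` followed by `lookup(keys, handlers, 0x8123)`.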
And a read of every byte in the program would count probable missing I/O accesses or missing indirect jumps, in order to check for missing patches.
However, even when reducing the cost of the transfer, the sprite zero emulation glitches, even without handling the indirect jumps. It is faster, like the original, but it glitches a lot. It is as if it misses the end of the VBlank or the collision. It could also be a bug.
I have to look at it.
I fixed that, but it still glitches.
edit: it seems to be the values written to the register that cause the problem: one time it is 4 and the other time it is 260 (the correct value). I must find out why incorrect values are written into the PPUSCROLL register. Maybe it is because 260 = $0104 and the highest bit is lost. This bit seems to be changing a lot.
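The arithmetic behind that guess, as a tiny Python check (the helper names are mine): 260 is $0104, so its low byte is 4 and its ninth bit is 1. If only the low byte gets through, the scroll reads as 4. On the NES the ninth bit of the horizontal scroll is not part of the $2005 write at all; it comes from the nametable select bit in $2000, so that may be where it gets lost.

```python
def split_scroll(value):
    """Split a 9-bit scroll value into (low byte, high part)."""
    return value & 0xFF, (value >> 8) & 0xFF

def combine(low, high):
    """Rebuild the scroll value from its two parts."""
    return (high << 8) | low

low, high = split_scroll(260)   # 260 = $0104
lost = combine(low, 0)          # what you get if the high part is dropped
kept = combine(low, high)       # what you get if both parts survive
```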
I will try to fix the title screen first. It does not show the "Super Mario Bros." picture, nor 1 or 2 players. Does anyone know what could be the cause? By cause I mean an I/O read having a peculiar behaviour.
When you read video memory using $2007, the PPU returns the last value read from video memory and then reads the next value from VRAM. This means that if I set the video memory address to $0F00, I have to read once and throw that value away.
lda #$0F
sta $2006
lda #$00
sta $2006  ; VRAM address = $0F00
lda $2007  ; A := last byte (usually garbage), and last byte := value at $0F00
lda $2007  ; A := last byte (value at $0F00), and last byte := value at $0F01
lda $2007  ; A := last byte (value at $0F01), and last byte := value at $0F02
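For anyone emulating this, the buffered read can be modelled in a few lines of Python. This is a sketch with made-up names (`PPUReadPort`, `write_2006`, `read_2007`), assuming an address increment of 1. It leaves out palette reads ($3F00 and up), which are not delayed by the buffer on the real PPU.

```python
class PPUReadPort:
    """Minimal model of $2006/$2007 with the one-byte read buffer."""

    def __init__(self, vram):
        self.vram = vram        # bytes-like, indexed by VRAM address
        self.addr = 0
        self.high_next = True   # $2006 takes the high byte first
        self.buffer = 0         # the "last byte" from the explanation above

    def write_2006(self, value):
        if self.high_next:
            self.addr = ((value & 0x3F) << 8) | (self.addr & 0x00FF)
        else:
            self.addr = (self.addr & 0xFF00) | (value & 0xFF)
        self.high_next = not self.high_next

    def read_2007(self):
        result = self.buffer                         # return the stale byte
        self.buffer = self.vram[self.addr & 0x3FFF]  # then refill the buffer
        self.addr = (self.addr + 1) & 0x3FFF
        return result
```

Setting the address to $0F00 and reading three times gives the old buffer contents first, then the bytes at $0F00 and $0F01, matching the comments in the assembly above.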
I tend to keep the data in NES format, but I am not sure about the CHR data; however, it works.
The title screen is all right now. Thanks.