upernes

Discuss emulation of the Nintendo Entertainment System and Famicom.

Patrick FR
Posts: 78
Joined: Tue Jan 19, 2010 10:35 am
Location: Lyon

Re: upernes

Post by Patrick FR »

I have found why it blinks: the SMB NonMaskableInterrupt code sets the nametable address to $2800 at the beginning.
Then it does its sprite 0 detection. But it does not restore that bit when it sets the scroll values; the next nametable is only used later, when rendering with bit 0 of PPU_CTRL_REG1 set to 1. So it renders correctly on the NES, but in my current Super NES emulation, whatever bit value is in effect at the sprite 0 position is the one used. Therefore I must also update the scroll registers whenever that bit changes in PPU_CTRL_REG1.
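A minimal C sketch of that fix, assuming the emulator lays out the two horizontal NES nametables side by side in a 512-pixel-wide SNES background, so that bit 0 of $2000 simply becomes bit 8 of the SNES horizontal offset. ScrollState, on_ppuctrl_write, on_scroll_x_write and snes_hofs are hypothetical names for illustration, not upernes code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulator-side scroll state (names are illustrative). */
typedef struct {
    uint8_t ppuctrl;    /* last value written to $2000 (PPU_CTRL_REG1) */
    uint8_t scroll_x;   /* last first-of-pair write to $2005 */
    uint16_t snes_hofs; /* value to load into the SNES BG horizontal offset */
} ScrollState;

/* Bit 0 of $2000 selects the right-hand nametable, so it acts as
   bit 8 of the horizontal scroll. */
static uint16_t nes_hscroll9(uint8_t ppuctrl, uint8_t scroll_x) {
    return (uint16_t)(((ppuctrl & 0x01u) << 8) | scroll_x);
}

static void on_scroll_x_write(ScrollState *s, uint8_t value) {
    s->scroll_x = value;
    s->snes_hofs = nes_hscroll9(s->ppuctrl, s->scroll_x);
}

/* Returns true when bit 0 changed, i.e. when the SNES offset was reloaded. */
static bool on_ppuctrl_write(ScrollState *s, uint8_t value) {
    bool nt_bit_changed = ((s->ppuctrl ^ value) & 0x01u) != 0;
    s->ppuctrl = value;
    if (nt_bit_changed) {
        s->snes_hofs = nes_hscroll9(s->ppuctrl, s->scroll_x);
    }
    return nt_bit_changed;
}
```

Writes that leave bit 0 alone (for example $91 after $11) do not trigger a reload, so the extra cost only appears when the nametable bit actually flips.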

Edit: I have far fewer glitches now, but it still jumps back to horizontal position 0.
Patrick FR

Post by Patrick FR »

It seems that I have a speed problem with the code that runs before the sprite 0 hit flag. Or maybe not: the top still glitches somehow if I make the hit occur at line 96. That means PPUCTRL1 does not always point to the first nametable at the end of vblank. It is much better when the HScroll value is below 256 (no glitches at all).
Is there a way to profile SNES code, to know how many cycles and scanlines were used between two points?

My way would be to read the V counter when PPUCTRL1 is accessed.
Anyway, it seems (from the glitch when scrolling past 256) that there are not enough cycles to complete the NMI routine before the end of vblank.
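Reading the V counter on the SNES means latching it through SLHV ($2137) and then reading OPVCT ($213D) twice: the first read returns bits 7-0, the second returns bit 8 in bit 0, with the other bits being open bus. The helpers below are a hypothetical C sketch of the bookkeeping on two such samples, allowing one wrap at the end of the frame (262 lines for NTSC, 312 for PAL).

```c
#include <stdint.h>

/* Combine the two successive OPVCT ($213D) reads into a 9-bit line number.
   Only bit 0 of the second read is meaningful; the rest is open bus. */
static uint16_t vcounter_from_reads(uint8_t first, uint8_t second) {
    return (uint16_t)(first | ((second & 0x01u) << 8));
}

/* Scanlines elapsed between two samples, allowing at most one frame wrap. */
static unsigned lines_elapsed(uint16_t start, uint16_t end, unsigned lines_per_frame) {
    if (end >= start) {
        return (unsigned)(end - start);
    }
    return lines_per_frame - start + end;
}
```

For example, a sample taken at line 246 on entry and another at line 290 on exit gives 44 lines spent in between.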

What does the "reset flip-flop" comment at the end of SMB's NMI mean?

Code:

               jsr OperModeExecutionTree ;otherwise do one of many, many possible subroutines
SkipMainOper:  lda PPU_STATUS            ;reset flip-flop
               pla
               ora #%10000000            ;reactivate NMIs
               sta PPU_CTRL_REG1
               rti                       ;we are done until the next frame!
The state of the nametable-select bits (bits 1-0) of PPU_CTRL_REG1 seems to be the cause of the glitches, because the scroll offset changes from 0 to 256 while their value is 00, hence switching nametables.
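As far as the NES hardware goes, the "flip-flop" is the write toggle shared by $2005 (PPU_SCROLL_REG) and $2006 (PPU_ADDRESS): each pair of writes is split into a first and a second half, and reading $2002 (PPU_STATUS) resets the toggle to "first". A simplified C sketch, with made-up names and with $2005 reduced to separate X and Y bytes (the real PPU routes them into its internal address registers), shows why the reset matters when a previous routine left the toggle halfway through a pair.

```c
#include <stdint.h>

/* Simplified NES PPU write toggle ("w"): shared by $2005 and $2006,
   cleared by any read of $2002. */
typedef struct {
    uint8_t w;         /* 0: next write is the first of a pair, 1: the second */
    uint8_t scroll_x;  /* first $2005 write */
    uint8_t scroll_y;  /* second $2005 write */
} PpuToggle;

static void ppu_read_status(PpuToggle *p) {
    p->w = 0; /* "reset flip-flop" */
}

static void ppu_write_scroll(PpuToggle *p, uint8_t value) {
    if (p->w == 0) {
        p->scroll_x = value;
    } else {
        p->scroll_y = value;
    }
    p->w ^= 1;
}
```

If an odd number of $2006 writes left w = 1, the next $2005 write lands in Y instead of X; one read of $2002 beforehand puts the pair back in order.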
Patrick FR

Post by Patrick FR »

I had many bugs in the vblank bits, and I had a wait-for-vblank in my sprite DMA I/O emulation.
Finally, instead of a profiler, I used the V line counter in the bsnes+ debugger view to see how much time was spent in the NES NMI routine.
I tried to enable fast mode by writing to the register and setting the ROM type in the header ($30). I do not know whether it worked. Does anyone know how to test whether a ROM is recognised as FastROM?
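For reference, the $30 written in the header is the map-mode byte (at $FFD5 in bank $00, file offset $7FD5 for a LoROM image without a copier header): bit 4 flags FastROM and a low nibble of 0 means LoROM. The header only declares the speed, though; the console runs ROM accesses at the fast rate only when bit 0 of MEMSEL ($420D) is set and the code executes from banks $80-$FF. A hypothetical pair of C helpers decoding the byte:

```c
#include <stdbool.h>
#include <stdint.h>

/* Map-mode byte of the internal SNES header. */
static bool map_mode_is_fastrom(uint8_t map_mode) {
    return (map_mode & 0x10u) != 0;   /* bit 4: FastROM */
}

static bool map_mode_is_lorom(uint8_t map_mode) {
    return (map_mode & 0x0Fu) == 0x00u; /* low nibble 0: LoROM, 1: HiROM */
}
```

So $30 reads as "FastROM LoROM" and $20 as plain "SlowROM LoROM"; an emulator or flash cart may use this, but the CPU timing still depends on MEMSEL and on the bank the code runs from.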

The glitch seems to come from

Code:

; UpdateScreen -> WriteBufferToScreen -> WritePPUReg1 (code from the SMB disassembly project on GitHub)

               lda Mirror_PPU_CTRL_REG1  ;load mirror of $2000,
               ora #%00000100            ;set ppu to increment by 32 by default
               bcs SetupWrites           ;if d7 of third byte was clear, ppu will
               and #%11111011            ;only increment by 1
SetupWrites:   jsr WritePPUReg1          ;write to register
               pla                       ;pull from stack and shift to left again
               asl
               bcc GetLength             ;if d6 of third byte was clear, do not repeat byte
It writes $11 to PPUCTRLREG1 during vblank, and this "pushes" the score bar to the left, because bit 0 is the high bit (bit 8) of HScroll. It is weird; maybe the screen is disabled on the NES when this UpdateScreen routine is called?

Apart from that, the SNES seems to have just enough CPU power to run the emulation. I hope to remove this scrolling glitch.
Patrick FR

Post by Patrick FR »

Hi,
I looked at the effect on PPUCTRLREG1 in FCEUX, and it behaves the same way (I had not noticed the scanline number until today).
Glitch case:

Code:

Write to PPUCTRL1                   NES line   SNES line
$10 @ Beginning of NMI routine      239        246 (later because of the prior DMA copy of sprites and BG)
$11 @ UpdateScreen (glitch)         239        290
$11 @ Sprite 0 hit                  31         31
$91 @ End of NMI routine            95         88 (slightly ahead)
Normal case:

Code:

Write to PPUCTRL1                   NES line   SNES line
$10 @ Beginning of NMI routine      239        246 (later because of the prior DMA copy of sprites and BG)
$11 @ Sprite 0 hit                  31         31
$91 @ End of NMI routine            95         88 (slightly ahead)
The UpdateScreen routine sets the nametable to $2400 at line 290, and therefore the score bar does not show up. The score bar is in nametable zero; PPUCTRLREG1 should be $10, as configured at line 239 at the beginning of the NMI.
However, the problem does not show up on the NES: there, the score bar does not disappear.

Can anyone explain why the score bar does not blink on the NES?
It is as if the PPUCTRL1 write were ignored when rendering is disabled in PPUCTRL2.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Are there writes to $2006 after the write to $2000? They would overwrite the value set in $2000.
Patrick FR

Post by Patrick FR »

It does both: it writes to PPU_ADDRESS and calls WritePPUReg1. Execution enters the routine at the UpdateScreen label.

Code:

WriteBufferToScreen:
               sta PPU_ADDRESS           ;store high byte of vram address
               iny
               lda ($00),y               ;load next byte (second)
               sta PPU_ADDRESS           ;store low byte of vram address
               iny
               lda ($00),y               ;load next byte (third)
               asl                       ;shift to left and save in stack
               pha
               lda Mirror_PPU_CTRL_REG1  ;load mirror of $2000,
               ora #%00000100            ;set ppu to increment by 32 by default
               bcs SetupWrites           ;if d7 of third byte was clear, ppu will
               and #%11111011            ;only increment by 1
SetupWrites:   jsr WritePPUReg1          ;write to register   <----------------- It writes $11 or $15 here
               pla                       ;pull from stack and shift to left again
               asl
               bcc GetLength             ;if d6 of third byte was clear, do not repeat byte
               ora #%00000010            ;otherwise set d1 and increment Y
               iny
GetLength:     lsr                       ;shift back to the right to get proper length
               lsr                       ;note that d1 will now be in carry
               tax
OutputToVRAM:  bcs RepeatByte            ;if carry set, repeat loading the same byte
               iny                       ;otherwise increment Y to load next byte
RepeatByte:    lda ($00),y               ;load more data from buffer and write to vram
               sta PPU_DATA
               dex                       ;done writing?
               bne OutputToVRAM
               sec          
               tya
               adc $00                   ;add end length plus one to the indirect at $00
               sta $00                   ;to allow this routine to read another set of updates
               lda #$00
               adc $01
               sta $01
               lda #$3f                  ;sets vram address to $3f00
               sta PPU_ADDRESS
               lda #$00
               sta PPU_ADDRESS
               sta PPU_ADDRESS           ;then reinitializes it for some reason
               sta PPU_ADDRESS
UpdateScreen:  ldx PPU_STATUS            ;reset flip-flop
               ldy #$00                  ;load first byte from indirect as a pointer
               lda ($00),y  
               bne WriteBufferToScreen   ;if byte is zero we have no further updates to make here
InitScroll:    sta PPU_SCROLL_REG        ;store contents of A into scroll registers
               sta PPU_SCROLL_REG        ;and end whatever subroutine led us here
               rts
Note the "reinitializes it for some reason" comment, which looks dubious.
Usually the mirror of PPUCTRL1 (the copy in RAM) is ANDed with $FE at the beginning of the vblank NMI routine to clear the low bit. But here it is written as-is, without masking.
Patrick FR

Post by Patrick FR »

I found this code in the FCEUX source (ppu.cpp).

Code:

static DECLFW(B2006) {
	FCEUPPU_LineUpdate();
	PPUGenLatch = V;
	if (!vtoggle) {
		TempAddr &= 0x00FF;
		TempAddr |= (V & 0x3f) << 8;
		ppur._vt &= 0x07;
		ppur._vt |= (V & 0x3) << 3;
		ppur._h = (V >> 2) & 1;
		ppur._v = (V >> 3) & 1;
		ppur._fv = (V >> 4) & 3;
	} else {
		TempAddr &= 0xFF00;
		TempAddr |= V;
		RefreshAddr = TempAddr;
		DummyRead = 1;
		if (PPU_hook)
			PPU_hook(RefreshAddr);
		ppur._vt &= 0x18;
		ppur._vt |= (V >> 5);
		ppur._ht = V & 31;
		ppur.install_latches();
	}
	vtoggle ^= 1;
}
Here _h is the value of PPUCTRLREG1 bit 0, and V is the byte written to $2006. It seems to say that bit 2 of the first write goes to the nametable-select bit of PPUCTRL1.
Last edited by Patrick FR on Sun Jun 25, 2017 9:53 am, edited 1 time in total.
tepples

Post by tepples »

Patrick FR wrote:It deos both writing in PPU_ADDRESS and WritePPUReg1. It enters in the routine at the UpdateScreen label.

Code:

               lda Mirror_PPU_CTRL_REG1  ;load mirror of $2000,
               ora #%00000100            ;set ppu to increment by 32 by default
               bcs SetupWrites           ;if d7 of third byte was clear, ppu will
               and #%11111011            ;only increment by 1
SetupWrites:   jsr WritePPUReg1          ;write to register   <----------------- It writes $11 or $15 here
               ; [snipped several lines]
               lda #$3f                  ;sets vram address to $3f00
               sta PPU_ADDRESS
               lda #$00
               sta PPU_ADDRESS
               sta PPU_ADDRESS           ;then reinitializes it for some reason
               sta PPU_ADDRESS
We have the "reinitializes it for some reason" which looks dubious.
The going theory is that a programmer saw that leaving rendering off with the VRAM address pointed at $3F01-$3F1F caused that color to be sent to the composite output block instead of the color at $3F00. The programmer internalized a (wrong but close enough) model of the hardware in which the CGRAM had a separate address pointer, and writing $3F00 then $0000 initialized both the CGRAM address and the VRAM address, just as it (actually) has separate OAM and VRAM pointers.

In 1999, loopy discovered the skinny on why the last two writes on $2006 keep the status bar from flickering. Let me summarize:

Bits 1-0 of the value written to $2000 get copied into bits 11-10 of t, the top-left corner address. This address is used to reset vertical parts of v, the VRAM address, during the pre-render line's hsync pulse and the horizontal parts of v at the start of hblank. But a pair of writes to $2006 overwrites both t and v. Here, writing $0000 clears both t and v to 0, causing the PPU to read from the first nametable.
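loopy's description translates into a small C model of the t, v, x and w registers. The bit layouts follow the NESdev wiki's PPU scrolling page; the Loopy struct and function names are invented for this sketch, and rendering itself is not modelled, only the copies from t to v.

```c
#include <stdint.h>

/* v = current VRAM address, t = temporary address (top-left corner),
   x = fine X scroll, w = write toggle shared by $2005/$2006. */
typedef struct {
    uint16_t v, t;
    uint8_t x, w;
} Loopy;

static void loopy_write_2000(Loopy *r, uint8_t d) {
    /* bits 1-0 of the value -> t bits 11-10 (nametable select) */
    r->t = (uint16_t)((r->t & ~0x0C00u) | ((d & 0x03u) << 10));
}

static void loopy_read_2002(Loopy *r) { r->w = 0; }

static void loopy_write_2005(Loopy *r, uint8_t d) {
    if (r->w == 0) { /* first write: coarse X into t, fine X into x */
        r->t = (uint16_t)((r->t & ~0x001Fu) | (d >> 3));
        r->x = (uint8_t)(d & 0x07u);
    } else {         /* second write: fine Y and coarse Y into t */
        r->t = (uint16_t)((r->t & ~0x73E0u) | ((d & 0x07u) << 12) | ((d & 0xF8u) << 2));
    }
    r->w ^= 1;
}

static void loopy_write_2006(Loopy *r, uint8_t d) {
    if (r->w == 0) { /* first write: high 6 bits, bit 14 of t cleared */
        r->t = (uint16_t)((r->t & 0x00FFu) | ((d & 0x3Fu) << 8));
    } else {         /* second write: low byte, then t is copied into v */
        r->t = (uint16_t)((r->t & 0xFF00u) | d);
        r->v = r->t;
    }
    r->w ^= 1;
}

/* Start of hblank (dot 257) on rendering lines: horizontal bits t -> v. */
static void loopy_copy_horizontal(Loopy *r) {
    r->v = (uint16_t)((r->v & ~0x041Fu) | (r->t & 0x041Fu));
}

/* Pre-render line, dots 280-304: vertical bits t -> v. */
static void loopy_copy_vertical(Loopy *r) {
    r->v = (uint16_t)((r->v & ~0x7BE0u) | (r->t & 0x7BE0u));
}

/* Nametable (0-3) currently selected in t. */
static unsigned loopy_nametable(const Loopy *r) { return (r->t >> 10) & 0x03u; }
```

Replaying the tail of WriteBufferToScreen/UpdateScreen ($2000 = $11, then $3F, $00, $00, $00 to $2006, a $2002 read and two zero writes to $2005) leaves t = 0, so the next frame starts from the first nametable. Without the $2006 pair, t still selects nametable 1 ($2400), which matches the glitch seen in the SNES port.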
Patrick FR

Post by Patrick FR »

Thanks, tepples, this one was rough :D. The wiki page is detailed; it will help me fix this.
I am currently reworking the transition from the patched PRG to emulation; I will post as soon as it works.
Patrick FR

Post by Patrick FR »

The score bar glitch is solved. I have found a speed problem causing a scrolling glitch: the code does not finish before the end of vblank, so it skips a frame while the scroll is at (0, 0). It recovers when the sprite 0 hit flag is asserted on line 30 of the next frame.
It seems to happen during scrolling when columns are updated in the nametables, but I have also seen it at the end of the map near the flagpole while not moving. One I/O emulation routine may be taking a lot of time; it takes 35 lines to complete. Maybe I should keep an array of counters per I/O call and reset it at vblank start; it would show what is being called.
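A hypothetical C sketch of such per-register counters, reset at the start of vblank and incremented by each emulated write. It assumes the counters are keyed by the NES address, folding the $2000-$3FFF mirrors onto the eight PPU registers; reads (for example $2002) would need their own array.

```c
#include <stdint.h>
#include <string.h>

enum { NES_PPU_REG_COUNT = 8 };

typedef struct {
    uint16_t writes[NES_PPU_REG_COUNT]; /* index = address & 7, i.e. $2000..$2007 */
} IoCallCounters;

/* Call at vblank start so each snapshot covers one frame. */
static void io_counters_reset(IoCallCounters *c) {
    memset(c->writes, 0, sizeof c->writes);
}

/* Call from the write-emulation entry point. */
static void io_counters_note_write(IoCallCounters *c, uint16_t nes_addr) {
    if (nes_addr >= 0x2000u && nes_addr <= 0x3FFFu) { /* registers mirror every 8 bytes */
        uint16_t *slot = &c->writes[nes_addr & 0x07u];
        if (*slot != UINT16_MAX) {
            (*slot)++;
        }
    }
}
```

Inspecting the array in the debugger just before the reset then shows, for example, how many $2006 and $2007 writes the last frame needed.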
Patrick FR

Post by Patrick FR »

I checked the calls to the I/O emulation, and sometimes I find 26 writes to PPUMEMDATA or 42 writes to PPUMEMADDR. These calls caused a missed vblank end.
My code was not in bank $80! Therefore fast mode was not being used. Now it is faster, and the 26 writes to PPUDATA no longer cause a glitch. But the 42 PPUMEMADDR writes still cause a missed frame.
This function changes the address of the routine called when writing. Maybe I could put it in RAM and only call the code in the other bank when the toggle is 0 after the write. This would halve the number of calls to bank 0; it should be enough here.
But I doubt that there are enough cycles to add sound on top of that and still have scrolling working, but I will try. The ratio between available cycles and the cost of an I/O call is low.
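Why the bank matters: on the SNES, ROM at $8000-$FFFF in banks $00-$3F is always accessed at 8 master cycles per byte, while the same ROM mirrored in banks $80-$BF (and banks $C0-$FF) drops to 6 master cycles only when MEMSEL ($420D) bit 0 is set. The C function below is a sketch of that access-timing table as commonly documented (for example in fullsnes); it ignores DRAM refresh and the CPU's internal-operation cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Master cycles for one CPU memory access at a 24-bit address. */
static unsigned snes_access_master_cycles(uint32_t addr, bool memsel_fast) {
    uint8_t bank = (uint8_t)(addr >> 16);
    uint16_t off = (uint16_t)(addr & 0xFFFFu);
    unsigned rom_speed = memsel_fast ? 6u : 8u;

    if (bank >= 0x40 && bank <= 0x7F) return 8;   /* ROM / WRAM banks, always slow */
    if (bank >= 0xC0) return rom_speed;           /* fast only with MEMSEL bit 0 */

    /* Banks $00-$3F and $80-$BF */
    if (off < 0x2000) return 8;                   /* WRAM mirror */
    if (off < 0x4000) return 6;                   /* PPU / APU ports */
    if (off < 0x4200) return 12;                  /* old joypad ports (XSlow) */
    if (off < 0x6000) return 6;                   /* CPU registers, DMA */
    if (off < 0x8000) return 8;                   /* expansion */
    return (bank & 0x80) ? rom_speed : 8u;        /* ROM: fast only in $80-$BF */
}
```

With MEMSEL set, the same ROM byte costs 8 master cycles through $00:8000 but 6 through $80:8000, which is why code left in bank $00 never benefits from FastROM.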
Patrick FR

Post by Patrick FR »

I moved part of the "sta $2006" emulation into RAM, reducing the calls from 42 to 21. The missing frame is still there, even though it is faster with the latest optimisation. It does not glitch when returning immediately for all 26 writes to PPU RAM, so it is close to working on Super Mario Bros. Maybe a 2KB jump table in the WRAM bank, indexed by PPUADDR, would save a ton of cycles in both the $2006 and $2007 emulation. But it looks like the SNES does not have enough power to emulate scrolling NES games at 100%. I have cut a lot of cycles, and the optimisation makes the code less clear. I could get more cycles during vblank by changing the background update to a FIFO instead of a rolling DMA transfer. But all in all, if everything must be optimised, development will be very slow. Also, on the real console it shows small rendering mistakes.
Anyway, it works with non-scrolling games, and Super Mario Bros can be played directly from the conversion.

I will take a look at what Memblers did with the APU emulation, to see how it could fit in, and I will stop here. It was interesting; it went further than I expected, but it is not an elegant conversion where everything fits (that was my goal, if possible). However, it is fast: despite the few missing frames, it really feels like the NES. I am not going to squeeze the cycle count of each I/O access. There is no room for a correct PPU emulation and therefore no room for improvement.
Thanks for your help, and for the amazing emulators and their integrated debuggers. It is impressive that even on such code, bsnes behaves like the real hardware.

edit:
I can't stop thinking... the Super FX or a custom FPGA design should be able to do PPU emulation. ...someday maybe.
Patrick FR

Post by Patrick FR »

I could not stop thinking about this cycle problem, and I found something really effective for PPUADDRESS. I use a table of routines in WRAM (4KB) to jump quickly to the routine for the current PPU address. It removes the cost of the address-increment routine. I also moved the PPUADDRESS I/O write access into the RAM code. This removed a shitload of cycles; I gained 20 rendering lines, but it still glitches. 2 more lines needed.. :lol: I could solve it by moving the sprite 0 hit routine into the RAM code area (using the timers or the DMA to update the flags). This last change will remove the cost of 10 calls and save 10 rendering lines. It should remove the glitch on SMB1.
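Dispatching on the current PPU address through a table, instead of a chain of range comparisons, can be sketched in C as follows. The granularity here (one handler per 256-byte page of $0000-$3FFF) and all names are assumptions for illustration; the actual upernes table lives in WRAM and its exact layout isn't given here. Nametable mirroring and the $3F10 palette mirrors are ignored for brevity.

```c
#include <stdint.h>

typedef struct Ppu Ppu;
typedef void (*PpuWriteFn)(Ppu *p, uint16_t addr, uint8_t value);

struct Ppu {
    uint8_t chr[0x2000];           /* pattern tables $0000-$1FFF */
    uint8_t nametables[0x1000];    /* $2000-$2FFF, mirroring ignored */
    uint8_t palette[0x20];         /* $3F00-$3F1F */
    PpuWriteFn page_handler[0x40]; /* one entry per 256-byte page */
};

static void write_chr(Ppu *p, uint16_t a, uint8_t v) { p->chr[a & 0x1FFFu] = v; }
static void write_nt(Ppu *p, uint16_t a, uint8_t v)  { p->nametables[a & 0x0FFFu] = v; }
static void write_pal(Ppu *p, uint16_t a, uint8_t v) { p->palette[a & 0x1Fu] = v; }

/* Build the table once at startup. */
static void ppu_build_dispatch(Ppu *p) {
    for (unsigned page = 0; page < 0x40; page++) {
        if (page < 0x20) {
            p->page_handler[page] = write_chr;
        } else if (page < 0x3F) {
            p->page_handler[page] = write_nt;  /* $2000-$3EFF, $3000+ mirrors $2000+ */
        } else {
            p->page_handler[page] = write_pal; /* $3F00-$3FFF */
        }
    }
}

/* Emulated $2007 write: one table lookup, then the PPUCTRL increment (1 or 32). */
static void ppu_write_data(Ppu *p, uint16_t *vram_addr, uint8_t value, uint8_t increment) {
    uint16_t a = (uint16_t)(*vram_addr & 0x3FFFu);
    p->page_handler[a >> 8](p, a, value);
    *vram_addr = (uint16_t)((*vram_addr + increment) & 0x3FFFu);
}
```

The range tests move to ppu_build_dispatch, which runs once, so each emulated write pays only for an index and an indirect call.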
However, I doubt that I can add sound to it. Any opinions on that? The sound takes 5 or 6 calls per frame; the cost of bank-switching is already included in the current code. It goes to the sound routine, but the routine does nothing.
Patrick FR

Post by Patrick FR »

Sound emulation could be fine if we can update the SPC700 registers whenever we want.
In SMB1 we have from line 80 to line 239 to do it. A timer will call the update routine on a given line taken from the romname.txt file, like SoundLine: 120.
LOLZ
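A hypothetical C sketch of that setup: parse the SoundLine entry from romname.txt, then compute the SNES vertical-IRQ registers. Setting bit 5 of NMITIMEN ($4200) with bit 4 clear enables a V-count IRQ on the line written to VTIMEL/VTIMEH ($4209/$420A). parse_sound_line and VIrqSetup are made-up names, and the file format is assumed to be one "Key: value" per line.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t nmitimen; /* value for $4200 */
    uint8_t vtimel;   /* value for $4209: line bits 7-0 */
    uint8_t vtimeh;   /* value for $420A: line bit 8 */
} VIrqSetup;

/* Parses a line such as "SoundLine: 120"; returns false if it does not match. */
static bool parse_sound_line(const char *text, unsigned *line) {
    return sscanf(text, " SoundLine: %u", line) == 1;
}

/* Keep the other NMITIMEN bits (NMI enable, auto-joypad), select V-IRQ only. */
static VIrqSetup virq_for_line(unsigned line, uint8_t nmitimen) {
    VIrqSetup s;
    s.nmitimen = (uint8_t)((nmitimen & ~0x30u) | 0x20u);
    s.vtimel = (uint8_t)(line & 0xFFu);
    s.vtimeh = (uint8_t)((line >> 8) & 0x01u);
    return s;
}
```

With NMI and auto-joypad already enabled ($81), a SoundLine of 120 gives $A1 for $4200 and 120 for the line registers.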
Patrick FR

Post by Patrick FR »

I managed to gain enough cycles to eradicate the glitches. It was not easy, but I used the IRQ to emulate the PPUSTATUS flag. The hard part is that the IRQ must not occur close to the NMI interrupt, or the program bank will be lost. The only way around that would be to make sei the first instruction of the NMI handler, but then the IRQ never fires, so you would need a variable flagging an IRQ that is due during the NMI, and then re-enable interrupts...
So I just inserted the PPUSTATUS update into the NMI :beer:.
The trace recording was very useful.

How can I update a column of tiles from RAM to VRAM? A row update is easy with DMA, but I do not see how to do it with an increment of 32 to transfer a column. With HDMA?
Any ideas?
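The SNES VRAM port does have an increment-by-32 mode: bits 1-0 of VMAIN ($2115) select a word-address step of 1, 32 or 128, and bit 7 makes the address advance after the high-byte write to $2119. Assuming a 32-tile-wide tilemap, one possible approach is a DMA to $2118/$2119 with VMAIN = $81, which would walk down one column. The sketch below only models the resulting address sequence in C, with the address-remapping bits 3-2 assumed to be zero; the function names are made up.

```c
#include <stdint.h>

/* VMAIN bits 1-0: word-address step of 1, 32, 128 or 128. */
static unsigned vmain_step(uint8_t vmain) {
    static const unsigned steps[4] = {1u, 32u, 128u, 128u};
    return steps[vmain & 0x03u];
}

/* Fill out[] with the VRAM word addresses that `count` word writes would hit. */
static void vram_word_addresses(uint16_t start, uint8_t vmain, unsigned count, uint16_t *out) {
    uint16_t a = start;
    for (unsigned i = 0; i < count; i++) {
        out[i] = a;
        a = (uint16_t)((a + vmain_step(vmain)) & 0x7FFFu); /* 15-bit word address */
    }
}
```

Starting at tilemap base + column x, thirty word writes with VMAIN = $81 land on base + x, base + x + 32, and so on down to row 29.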