"The frame and NMIs" - buffering issues

sav
Posts: 9
Joined: Fri Dec 04, 2015 7:26 pm

"The frame and NMIs" - buffering issues

Post by sav »

Hey everyone,

I was wondering if someone could help me out with an issue I'm seeing.
I'm trying to implement a horizontal-scrolling platformer. I've followed "The frame and NMIs" and kept the game logic and NMI logic separate. I implemented the scrolling the following way:
- Every 16 times the scroll is incremented I buffer two columns of background tiles (i.e. one column of 16x16 "metatiles")
- Every 32 times the scroll is incremented I buffer one column of attribute data

When I run my implementation, the screen glitches out - it kind of "jumps" up and down every now and then. It doesn't happen when I turn the PAL emulation on, so it makes me think that I'm doing too much during NMI and it causes these issues. Pretty much every 32 times the scroll is incremented I have to write 30 + 30 + 8 = 68 bytes. I should be able to do that, right? "The frame and NMIs" mentions it should be possible to write 160 bytes?

Maybe my drawing routine is too complicated, but I've really tried to make it as simple as possible (not an expert on assembly though). Here's my code:

Code:


; Input data has the following format:           
;   Byte 0  = length                             
;   Byte 1  = high byte of the PPU address       
;   Byte 2  = low byte of the PPU address        
;   Byte 3  = reserved for now
;   Byte 4+ = {length} bytes                     
;                                                
; Repeat until length == 0 is found.             
;
; Buffer starts at $0100, drawBuffer is declared as
;
;  .rsset $0100
; drawBuffer .rs 160

DoDrawing:

  LDX #$00 
  LDA $2002                   ; read PPU status to reset the high/low latch        
  
  .drawLoop:                    
    
    LDY drawBuffer, x         ; load the length of the data to the Y register
    BEQ ResetBuffer           ; length equal 0 means that the drawing is done  
    
    INX                       ; X = 1
    LDA drawBuffer, x         ; load the high byte of the target address
    STA $2006                 ; write the high byte to PPU
    
    INX                       ; X = 2
    LDA drawBuffer, x         ; load the low byte of the target address
    STA $2006                 ; write the low byte to PPU
    
    INX                       ; X = 3 (reserved for now)
        
    .setLoop:
      INX                     ; increment X so it points to the next byte
      LDA drawBuffer, x       ; load a byte of the data
      STA $2007               ; write it to PPU
      DEY                     ; decrement Y
      BNE .setLoop            ; if Y != 0 jump to .setLoop
      
    INX                       ; increment X so it points to the next segment      
    JMP .drawLoop             ; jump back to .drawLoop
 
ResetBuffer:

  CPX #$00
  BEQ DoDrawingDone           ; there was no data buffered
 
  LDA #$00 
  .resetBufferLoop:
    DEX                       ; decrement X so it points on the previous byte
    STA drawBuffer, x         ; reset to 0
    BNE .resetBufferLoop      ; X > 0 means there's more resetting to do

DoDrawingDone:  
  RTS
One thing that's not that great is that when buffering a column of attributes, I have to buffer them as 8 separate segments (since the address increments by 8). So in the case when I'm buffering two columns of tiles and one column of attributes, the buffer holds 108 bytes (34 bytes for tile column 1, 34 bytes for tile column 2, 8*5=40 bytes for attributes). But I don't know if that matters, or how it can be resolved.

I could probably fix this by buffering less stuff at one time - I could buffer one column of tiles when "scroll == (multiple of 16)", the second column of tiles a frame before that, and the attributes a frame before that - but before I make it too complicated I just wanted to make sure I'm not missing anything obvious.

Any help is appreciated! :)
dougeff
Posts: 3078
Joined: Fri May 08, 2015 7:17 pm

Re: "The frame and NMIs" - buffering issues

Post by dougeff »

I believe you should be able to transfer 100-130 ish bytes per vblank with this code. Yes.

Are you remembering to set the scroll after PPU updates? The PPU address and scroll registers affect each other.

Also, are you aware that the hardware stack is also in the $100 page? I've seen more than one official NES game where the stack fills / overflows from time to time.
nesdoug.com -- blog/tutorial on programming for the NES
sav
Posts: 9
Joined: Fri Dec 04, 2015 7:26 pm

Re: "The frame and NMIs" - buffering issues

Post by sav »

I've tried moving the buffer to $0600, didn't help.
As for updating PPU before the scroll - I believe I'm doing that. This is my NMI handler (it's basically the same as the one on the "The frame and NMIs" wiki page):

Code:

NMI:
  PHA                       ; back up registers
  TXA                       
  PHA                       
  TYA                       
  PHA                       
                            
  LDA needDma               
  BEQ DmaDone               
    LDA #SPRITES_LOW_BYTE   ; do sprite DMA
    STA $2003               ; conditional via the 'needDma' flag
    LDA #SPRITES_HIGH_BYTE  
    STA $4014               
    DEC needDma             
  DmaDone:                  
                            
  LDA needDraw              ; do other PPU drawing
  BEQ DrawDone              ; conditional via the 'needDraw' flag
    BIT $2002               ; clear VBl flag, reset $2005/$2006 toggle
    JSR DoDrawing           ; draw the stuff from the drawing buffer
    DEC needDraw            
  DrawDone:                 
                            
  LDA needPpuReg            ; PPU register updates
  BEQ PpuRegDone            ; conditional via the 'needPpuReg' flag
    LDA soft2001            ; copy buffered $2000/$2001
    STA $2001               
    LDA soft2000            
    STA $2000               
                            
    BIT $2002               ; set the scroll
    LDA scroll              ; set horizontal scroll
    STA $2005                
    LDA #$00                ; no vertical scroll
    STA $2005               
    DEC needPpuReg          
  PpuRegDone:               
                     
  LDA #$00                  ; clear the sleeping flag so that WaitForFrame will exit
  STA sleeping              
                            
  PLA                       ; restore regs and exit
  TAY                       
  PLA
  TAX
  PLA
  RTI
I guess the easiest way to fix would be to draw columns on "scroll == multiple of 8" - should fix it and won't be a huge code change. But still, I'm wondering why it doesn't work currently. I've noticed that if I only write 6 bytes of attributes instead of 8, it works (except for the fact that the colors on the bottom of the screen are wrong, obviously) - so it seems NMI is taking too long, but just a little bit :)
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: "The frame and NMIs" - buffering issues

Post by rainwarrior »

Code:

    .setLoop:
      INX                     ; increment X so it points to the next byte
      LDA drawBuffer, x       ; load a byte of the data
      STA $2007               ; write it to PPU
      DEY                     ; decrement Y
      BNE .setLoop            ; if Y != 0 jump to .setLoop
This takes 15 cycles per byte: INX (2) + LDA absolute,X (4) + STA absolute (4) + DEY (2) + taken BNE (3). It's 16 if the BNE is unfortunate enough to cross a page (you might want to verify this with an assert).
68 x 15 = 1020 cycles. You have about 2200 total in NMI. OAM DMA also eats about 513 on its own, and there's a bunch of other overhead in there. If you want to know exactly when $2005 gets set, just put a breakpoint on it in FCEUX or some other debugger that can display the current scanline etc.

After the "DoDrawing" routine you must always set the scroll, so I think it should bypass that test for "needPpuReg" (it's always needed) with a JMP, maybe? Are you certain the needPpuReg flag is always getting set whenever needDraw is set?

By the way "LDA #0, STA zp" takes the same number of cycles as "DEC zp", though DEC does save two bytes of code. I'm only mentioning it because they look more like booleans than counters, and it makes me wonder what happens if needDraw ended up getting incremented an extra time or something?
nostromo
Posts: 9
Joined: Tue Aug 12, 2014 11:25 pm
Location: Sonora, Mexico

Re: "The frame and NMIs" - buffering issues

Post by nostromo »

I think the ResetBuffer routine is something you could do after updating the PPU.
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: "The frame and NMIs" - buffering issues

Post by rainwarrior »

nostromo wrote:I think the ResetBuffer routine is something you could do after updating the PPU.
Or not do it at all. Why do you need to "reset" the buffer? It shouldn't be used again until needDraw is set, right?
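
As a hedged sketch of what dropping the clearing pass could look like: the list only needs a single zero length byte after the last segment, written by whichever code fills the buffer. `FinishBuffer`, and the convention that X holds the index of the first free byte in drawBuffer, are assumptions for illustration and not code from this thread.

```asm
; Game-logic side: close the buffer after appending segments.
; Assumes X = index of the first free byte in drawBuffer
; (a hypothetical convention, not shown in the thread).
FinishBuffer:
  LDA #$00
  STA drawBuffer, x         ; length byte of 0 marks the end of the list
  LDA #$01
  STA needDraw              ; ask the NMI handler to draw on the next vblank
  RTS
```

With a terminator written this way, the `BEQ ResetBuffer` in DoDrawing could branch straight to `DoDrawingDone`, since stale bytes past the terminator are never read.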
dougeff
Posts: 3078
Joined: Fri May 08, 2015 7:17 pm

Re: "The frame and NMIs" - buffering issues

Post by dougeff »

sav wrote:I've tried moving the buffer to $0600, didn't help.
When I was talking about the $100 page, my worry wasn't about a short-term fix to your current problem. It's a potential existential threat to the long-term integrity of the game.

Imagine a small hidden bug causes the stack to start growing, but only after playing the game for a long period of time. It won't show up in short testing sessions. When it grows enough, colliding with your buffer, it will crash your game. Just be aware of that potential before you put something in the $100 page. Not that this is forbidden.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: "The frame and NMIs" - buffering issues

Post by tepples »

I too put my buffers in $0100-$01BF or thereabouts. The only way I know to have a "silent" stack overflow like the one seen in Solar Wars (which had to be fixed for a mapper hack thereof to UNROM) is to try to JSR from your main loop into a subroutine but JMP back out without removing the return address from the stack. Otherwise, if you're just leaving values pushed with PHA, your game will fail fast when it tries to return to a garbage address.

But there are a couple defenses to that. One is to wrap your main loop inside a scoping structure, such as ca65's .proc, so that you're not tempted to JMP to internal labels. Another is to play-test your game in a debugging emulator with a write breakpoint on the stack's high water mark, such as $01C0. Watch $0100-$01FF in a debugging emulator's memory viewer (such as Debug > Hex Editor in FCEUX for Windows) to find the appropriate high water mark for your game.
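
Besides a debugger breakpoint, a debug build could also check the stack pointer itself. This is a minimal sketch, assuming $C0 as the high water mark (so $0100-$01BF stays free for buffers) and a hypothetical `CheckStack` routine called once per frame from the main loop; neither comes from the posts above.

```asm
; Debug-only check that the stack has not grown down into the buffer area.
CheckStack:
  TSX                       ; X = current stack pointer (offset into $01xx)
  CPX #$C0                  ; assumed high water mark
  BCS .stackOk              ; S >= $C0: stack has not reached the buffers
.stackOverflow:
  JMP .stackOverflow        ; hang here so the problem is obvious in a debugger
.stackOk:
  RTS
```

This only samples the depth at the moment it is called (including the two bytes pushed by its own JSR), so it complements the write breakpoint rather than replacing it.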
sav
Posts: 9
Joined: Fri Dec 04, 2015 7:26 pm

Re: "The frame and NMIs" - buffering issues

Post by sav »

Thanks everyone for replying! To sum up:

1) So the scroll must always be set after nametable updates? Didn't know that. I've updated my code to do just that - however, now I'm also always writing $2000 and $2001 in that case, even if they haven't changed - that shouldn't be an issue, right (it could be easily fixed, but I'm not sure if it's worth it)? Here's the new NMI code (without pushing registers to the stack):

Code:

  LDA needDma               
  BEQ DmaDone               
    LDA #SPRITES_LOW_BYTE   ; do sprite DMA
    STA $2003               ; conditional via the 'needDma' flag
    LDA #SPRITES_HIGH_BYTE  
    STA $4014
  DmaDone:                  
                            
  LDA needDraw              ; do other PPU drawing
  BEQ DrawDone              ; conditional via the 'needDraw' flag
    BIT $2002               ; clear VBl flag, reset $2005/$2006 toggle
    JSR DoDrawing           ; draw the stuff from the drawing buffer
    JMP DoPpuReg            ; after drawing PPU reg is required
  DrawDone:
                            
  LDA needPpuReg            ; PPU register updates
  BEQ PpuRegDone            ; conditional via the 'needPpuReg' flag
  DoPpuReg:
    LDA soft2001            ; copy buffered $2000/$2001
    STA $2001               
    LDA soft2000            
    STA $2000               
                            
    BIT $2002               ; set the scroll
    LDA scroll              ; set horizontal scroll
    STA $2005                
    LDA #$00                ; no vertical scroll
    STA $2005               
  PpuRegDone:
                     
  LDA #$00                  ; clear the sleeping flag so that WaitForFrame will exit, also clear all conditional flags
  STA needDma
  STA needDraw
  STA needPpuReg
  STA sleeping
2) Right, I don't need to zero-out the buffer. Duh. After removing that, the glitching is gone! Awesome. I guess that whole loop executed 68 times must have used like 600-700 cycles? That explains why it took too long. I'll still change it to draw a single column of background tiles every 8 times the scroll is incremented though (I guess it's better to do less, but more often).
3) About the stack - I guess I'll worry about overflows when they happen :) I don't know how much memory my game will be using, I may be able to move the buffer to a different place anyway

Thanks again for the help!
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: "The frame and NMIs" - buffering issues

Post by rainwarrior »

sav wrote:1) So the scroll must always be set after nametable updates? Didn't know that. I've updated my code to do just that - however, now I'm also always writing $2000 and $2001 in that case, even if they haven't changed - that shouldn't be an issue, right (it could be easily fixed, but I'm not sure if it's worth it)? Here's the new NMI code (without pushing registers to the stack):
You need to write $2000 and 2 x $2005 to completely set the scroll. The two $2005 writes are not enough to select the current nametable on their own.

Writing $2001 is irrelevant to scroll, but it's not like you need to optimize away one redundant write.
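
A rough sketch of a complete scroll write, for a level wider than one screen. `soft2000` and `scroll` are the variables from the NMI handler above; `SetScroll` and `scrollNametable` (holding 0 or 1, i.e. the 9th bit of the horizontal scroll) are assumed names for illustration.

```asm
; Set the full scroll position: nametable select via $2000, then X/Y via $2005.
SetScroll:
  LDA soft2000
  AND #%11111100            ; clear the two nametable-select bits
  ORA scrollNametable       ; assumed variable: 0 or 1 (9th bit of X scroll)
  STA $2000
  BIT $2002                 ; reset the $2005/$2006 write toggle
  LDA scroll
  STA $2005                 ; first write: horizontal scroll
  LDA #$00
  STA $2005                 ; second write: vertical scroll
  RTS
```

Calling something like this as the last PPU access in the NMI handler keeps the scroll correct regardless of which $2006/$2007 writes came before it.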
sav wrote:3) About the stack - I guess I'll worry about overflows when they happen :) I don't know how much memory my game will be using, I may be able to move the buffer to a different place anyway
For most NES games probably ~32 bytes of stack space is plenty. Every nested JSR is just 2 bytes. An IRQ just 3 (+3 more if you save A/X/Y). Unless you have very deep subroutine calls, or perhaps rely on some recursive structure for your game (e.g. a binary tree) your stack probably doesn't get very deep at all.

So, using the $100 page for extra storage can help if you're short on RAM, but also some people like to put NMI update data on the $100 page because you can temporarily swap out the stack pointer and use PLA instead of LDA $100, X + INX.
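
The stack-pointer trick could look roughly like the sketch below, which reads the buffer with PLA since the NMI side is consuming data. It assumes the data starts at $0100, that the PPU address has already been written to $2006, and that Y holds a nonzero byte count; `PullCopy` and the zero-page variable `savedSP` are hypothetical names.

```asm
; Copy Y bytes from $0100 onward to $2007 by pulling them off the stack page.
PullCopy:
  TSX
  STX savedSP               ; remember the real stack pointer
  LDX #$FF                  ; PLA increments S before reading, so S = $FF
  TXS                       ;   makes the first pull read $0100
.copyLoop:
  PLA                       ; 4 cycles: S = S + 1, then A = [$0100 + S]
  STA $2007                 ; 4 cycles
  DEY                       ; 2 cycles
  BNE .copyLoop             ; 3 cycles when taken
  LDX savedSP
  TXS                       ; restore the real stack pointer before RTS
  RTS
```

While the stack pointer is swapped, nothing may push or JSR, and no interrupt may fire. Inside an NMI handler IRQs are already masked, because the 6502 sets the I flag on interrupt entry, as long as the handler does not execute CLI.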