Optimizing scroll changes after MMC3 IRQs

Bregalad
Posts: 7889
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Re: Optimizing scroll changes after MMC3 IRQs

Post by Bregalad » Thu May 07, 2020 8:39 am

OK, what about this IRQ handler? It is 1 cycle faster than before in the part leading up to the $2006/2 writes.

Code: Select all

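IRQ:
   ; running cycle count in the comments (zero-page variables, no page crossing in the table reads)
   sta irq_zp_save_a    ;3
   stx irq_zp_save_x    ;6

   ldx irqnum           ;9
   lda scrolltableh,x   ;13
   sta $2006            ;17
   lda scrolltablel,x   ;21
   sta $2006            ;25
   dex
   bmi stop_irqs        ; no entries left in the table

   ; (code to reload MMC3 IRQ here)
   stx irqnum
   ldx irq_zp_save_x
   lda irq_zp_save_a
   rti

stop_irqs:
   ; (code to stop the chain of IRQ and set scroll/CHR bank for the HUD correctly)
   ldx irq_zp_save_x    ; A and X must be restored on this path too
   lda irq_zp_save_a
   rti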
There's absolutely no reason, ZERO, to wait a whole scanline within your IRQ. If you can't be in time to write to $2006/2 before the automatic PPU scroll update, then at worst you should wait for it to happen and write to $2006/2 just after this update. With jitter there is a small window to avoid, because inside it your write would land sometimes after and sometimes before the scroll update, depending on the jitter. So if you can't manage "always before", you should aim for "always after". But don't wait a full scanline for the "always before" of the next scanline; that's a waste of CPU time.

This is valid whether or not you need to write to $2005. The only advantage of writing to $2005 is that you can do it half as often, because you set the fine scroll to 6 and it automatically wraps around from 7 to 0. This means half as many interrupts, but significantly more overhead per interrupt. In the end it should be about the same in terms of performance.

tokumaru
Posts: 11742
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru » Thu May 07, 2020 10:11 am

I think you're missing the point. The problem with the late scroll changes is not that they are late, it's that they're *inconsistently* late. If they happened at (nearly) the same cycle in the scanline, you could simply adjust the X scroll to compensate.

The real problem is that there's a latency of up to 7 CPU cycles when calling the IRQ handler, depending on the instruction the CPU is running (and how far into it the CPU is) when the interrupt request is made. That's 7 CPU cycles, or 21 pixels in NTSC. The PPU auto-increments the X scroll every 8 pixels, so it may end up incrementing several times in that 21-pixel window, and since the latency varies each time you can't predict what the X scroll should be at the time you update it, resulting in bad horizontal jitter in the scanlines where the scroll changes. Your theory about "always before" and "always after" doesn't hold up when auto-increments happen every 8 pixels and the latency is between 0 and 21 pixels! There's just no way to make the scroll change happen consistently between 2 specific auto-increments, no matter how far into the scanline.

This is not a problem at the beginning of hblank, where the PPU doesn't auto-increment the scroll for several PPU cycles (60 or so, more than enough to accommodate a 21-pixel jitter). It's just that with MMC3 IRQs you just can't catch that nice window of PPU inactivity, you have to wait for the next one.
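
To put numbers on the argument above, here is a minimal sketch of the arithmetic. It assumes NTSC timing (3 PPU dots per CPU cycle) and takes the "60 or so" quiet dots at the start of hblank as an approximate figure; the function and constant names are made up for illustration.

```python
import math

NTSC_DOTS_PER_CPU_CYCLE = 3   # NTSC: 3 PPU dots (pixels) per CPU cycle
COARSE_X_STEP = 8             # coarse X auto-increments once per 8 pixels
HBLANK_QUIET_DOTS = 60        # "60 or so" from the post above; approximate


def jitter_pixels(latency_cycles, dots_per_cycle=NTSC_DOTS_PER_CPU_CYCLE):
    """Width in pixels of the window in which the scroll write can land."""
    return latency_cycles * dots_per_cycle


def increments_in_window(window_px, step=COARSE_X_STEP):
    """(fewest, most) auto-increments that can fall inside the window,
    depending on where the window starts relative to the 8-pixel grid."""
    return window_px // step, math.ceil(window_px / step)


window = jitter_pixels(7)
print(window, increments_in_window(window), window <= HBLANK_QUIET_DOTS)
```

With 7 cycles of latency the window is 21 pixels wide and always straddles at least two auto-increments during rendering, while it fits comfortably inside the quiet stretch at the start of hblank.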

lidnariq
Posts: 9491
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq » Thu May 07, 2020 10:14 am

tokumaru wrote:
Thu May 07, 2020 6:59 am
But then that introduces the problem of IRQ latency buildup, unless there's a mapper out there that automatically reloads the previous cycle count and keeps counting without interference from the programmer, but I'm not aware of any.
VRC4. That's why I suggested it.

tokumaru
Posts: 11742
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru » Thu May 07, 2020 10:16 am

lidnariq wrote:
Thu May 07, 2020 10:14 am
VRC4. That's why I suggested it.
Really? Reading the wiki I was under the impression you had to manually write to a register in order to reload the counter.

lidnariq
Posts: 9491
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq » Thu May 07, 2020 10:17 am

Yes, but it has a prescaler that makes sure that each IRQ is at the same X position.

calima
Posts: 1150
Joined: Tue Oct 06, 2015 10:16 am

Post by calima » Thu May 07, 2020 10:30 am

ctrl-f "scaler" on the vrc4 wiki page gives zero hits...

lidnariq
Posts: 9491
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq » Thu May 07, 2020 10:33 am

Do we need to transclude [[VRC IRQs]] into the VRC4 page?

tokumaru
Posts: 11742
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru » Thu May 07, 2020 10:38 am

I see. Using the VRC4 wouldn't be much of an improvement in PAL, though... right? I assume I'd have to set IRQs for 3 scanlines instead, busy wait the rest of the time (which is not a whole scanline, but still a lot), and manually reload the counter at the correct time because the next IRQ wouldn't fire at the same X in PAL. Is this correct?

lidnariq
Posts: 9491
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq » Thu May 07, 2020 11:05 am

Right. VRC4 won't work as-is on a 2A07. It will work fine on the PAL famiclones.

Even a perfect M2-based IRQ without slip will still have problems due to the noninteger M2 cycles per wanted IRQ.

Although the VRC4 can be configured to count raw M2 cycles instead of emulated scanlines, its counter has only 8 bits, so the widest spacing given a 2A07 is only 2.4 scanlines. But maybe for a PAL release, it'd be ok to configure it for exactly 213 M2, almost exactly two scanlines (fast by 0.4 px / IRQ), and maybe use the faster $2006/$2006 scroll changes?
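
The figures above can be checked with a few lines of exact arithmetic. This is only a sketch: it assumes 341 dots per scanline, 3.2 PPU dots per CPU cycle on PAL, and that an 8-bit counter spans at most 256 M2 cycles. The variable names are illustrative.

```python
from fractions import Fraction

DOTS_PER_SCANLINE = 341
PAL_DOTS_PER_CYCLE = Fraction(16, 5)   # 3.2 PPU dots per CPU (M2) cycle on PAL
COUNTER_MAX_CYCLES = 256               # assumed span of an 8-bit cycle counter

# M2 cycles in one PAL scanline: 341 / 3.2
cycles_per_scanline = Fraction(DOTS_PER_SCANLINE) / PAL_DOTS_PER_CYCLE

# Widest IRQ spacing the 8-bit counter allows, in scanlines
max_span_scanlines = COUNTER_MAX_CYCLES / cycles_per_scanline

# Counting exactly 213 M2 cycles instead of two true scanlines
two_scanlines = 2 * cycles_per_scanline
drift_px_per_irq = (two_scanlines - 213) * PAL_DOTS_PER_CYCLE

print(float(cycles_per_scanline))   # M2 cycles per scanline
print(float(max_span_scanlines))    # widest spacing in scanlines
print(float(drift_px_per_irq))      # pixels gained per IRQ
```

The drift is per IRQ, so it accumulates down the screen unless the software compensates for it.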

tepples
Posts: 22014
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Thu May 07, 2020 11:06 am

On PAL, if you are making your own cartridges with all new parts, you'll probably need to build a VRC4 clone on a CPLD. Then you can extend VRC4 to divide M2 by either 106.5625 (for PAL NES) or 113.6667 (for Famicom, NTSC NES, and PAL famiclones).
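
As an illustration of how a divider with a non-integer ratio can work, here is a sketch of a fractional accumulator. It is a hypothetical model written for this example, not a description of the actual VRC4 circuit or of any particular CPLD design: each M2 cycle adds the number of PPU dots that elapsed (scaled to integers), and a scanline tick is emitted whenever a full 341-dot line has accumulated.

```python
def count_scanline_ticks(m2_cycles, dots_num, dots_den=1, dots_per_line=341):
    """Count the scanline ticks a fractional divider emits over m2_cycles.

    dots_num / dots_den is the number of PPU dots per M2 cycle
    (3/1 for NTSC-style timing, 16/5 for PAL NES timing).
    """
    threshold = dots_per_line * dots_den   # one scanline, in scaled units
    acc = 0
    ticks = 0
    for _ in range(m2_cycles):
        acc += dots_num                    # dots elapsed during this M2 cycle
        if acc >= threshold:               # a whole scanline has gone by
            acc -= threshold               # keep the remainder so no drift builds up
            ticks += 1
    return ticks


# NTSC-style: 341 M2 cycles cover exactly 3 scanlines (average divisor 341/3)
print(count_scanline_ticks(341, 3))
# PAL NES: 1705 M2 cycles cover exactly 16 scanlines (average divisor 1705/16)
print(count_scanline_ticks(1705, 16, 5))
```

Because the remainder is carried over instead of being discarded, the average divisor comes out at 113.6667 or 106.5625 even though every individual tick is a whole number of M2 cycles apart.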

Bregalad
Posts: 7889
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Thu May 07, 2020 11:29 am

lidnariq wrote:
Thu May 07, 2020 11:05 am
Even a perfect M2-based IRQ without slip will still have problems due to the noninteger M2 cycles per wanted IRQ.
Nothing that can't be easily compensated for by software though.
tokumaru wrote:
Thu May 07, 2020 10:11 am
The real problem is that there's a latency of up to 7 CPU cycles when calling the IRQ handler, depending on the instruction the CPU is running (and how far into it the CPU is) when the interrupt request is made. That's 7 CPU cycles, or 21 pixels in NTSC. The PPU auto-increments the X scroll every 8 pixels, so it may end up incrementing several times in that 21-pixel window, and since the latency varies each time you can't predict what the X scroll should be at the time you update it, resulting in bad horizontal jitter in the scanlines where the scroll changes. Your theory about "always before" and "always after" doesn't hold up when auto-increments happen every 8 pixels and the latency is between 0 and 21 pixels! There's just no way to make the scroll change happen consistently between 2 specific auto-increments, no matter how far into the scanline.
Oh... I thought for some reason there was a window void of auto-increments both before and after the reload. I suppose I haven't done enough nesdev lately. So I suppose the following would work (if you go for an IRQ every 4 lines, add two more tables and two $2005 writes in between).

Code: Select all

IRQ:
   ; running cycle count in the comments (zero-page variables, no page crossing in the table reads)
   sta irq_zp_save_a    ;3
   stx irq_zp_save_x    ;6

   ldx irqnum           ;9
   lda scrolltableh,x   ;13
   sta $2006            ;17
                    ; 2 $2005 writes can optionally be added here if needed
   lda scrolltablel,x   ;21
   sta $2006            ;25
   lda #$00
   sta $2005        ; reset the horizontal scroll that might have been screwed up if the IRQ interrupted a long instruction
   sta $2005        ; (needed for latch alignment only; could be replaced with a bit $2002 at the start of the IRQ)
   dex
   bmi stop_irqs        ; no entries left in the table

   ; (code to reload MMC3 IRQ here)
   stx irqnum
   ldx irq_zp_save_x
   lda irq_zp_save_a
   rti

stop_irqs:
   ; (code to stop the chain of IRQ and set scroll/CHR bank for the HUD correctly)
   ldx irq_zp_save_x    ; A and X must be restored on this path too
   lda irq_zp_save_a
   rti
   
Last but not least, it seems rather simple to avoid the 6502 instructions that take 7 CPU cycles entirely:
  • LSR $abcd,X
  • ASL $abcd,X
  • ROL $abcd,X
  • ROR $abcd,X
  • INC $abcd,X
  • DEC $abcd,X
  • BRK
If you ever need to shift, rotate, increment or decrement an array, put it in zero page. This would reduce the possible jitter to 6 cycles, or 18 pixels.
Last edited by Bregalad on Thu May 07, 2020 11:47 am, edited 2 times in total.

tokumaru
Posts: 11742
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru » Thu May 07, 2020 11:45 am

Having this on cartridges is something I'd really like to happen, but I can't do it myself, not only because of my lack of experience with creating/building hardware but also because of the financial issues of doing it from the country I live in.

For this reason, I'm trying to stick to common cartridge configurations, taking advantage of widespread emulator support for digital distribution, and hopefully not destroying the chances of a future physical release. There's obviously the 4-Screen thing, which is not very common and doesn't work on many Famiclones, but I'm trying to find a balance between the mapper features I absolutely need in order to get this going and the current requirements for a physical release.

tokumaru
Posts: 11742
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru » Thu May 07, 2020 11:54 am

Bregalad wrote:
Thu May 07, 2020 11:29 am
sta $2005 ; reset the horizontal scroll that might have been screwed up if the IRQ interrupted a long instruction
This won't work. The coarse X scroll is buffered (in the temporary VRAM address register) and only applied at the beginning of hblank, when the X scroll is reset for the next scanline. Only the fine X scroll is changed immediately, and that isn't affected by auto-increments. But even if the coarse X scroll did change mid-scanline with $2005 writes, the timing of that write would still suffer from the IRQ latency problem.
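
A toy model makes the buffering easier to see. This is a minimal sketch assuming the commonly documented v/t/x/w behaviour of the PPU scroll registers; it only models CPU writes to $2005 and $2006 and leaves out the rendering-time increments and the hblank copy of the horizontal bits from t to v. The class and method names are made up for illustration.

```python
class ScrollRegs:
    """Toy model of the PPU's v/t/x/w scroll registers (CPU writes only)."""

    def __init__(self):
        self.v = 0   # current VRAM address, used while rendering
        self.t = 0   # temporary VRAM address (the buffer)
        self.x = 0   # fine X scroll, 3 bits
        self.w = 0   # write toggle shared by $2005 and $2006

    def write_2005(self, d):
        if self.w == 0:
            # first write: coarse X goes to t only, fine X takes effect at once
            self.t = (self.t & ~0x001F) | (d >> 3)
            self.x = d & 7
        else:
            # second write: fine Y and coarse Y go to t only
            self.t = (self.t & ~0x73E0) | ((d & 7) << 12) | ((d >> 3) << 5)
        self.w ^= 1

    def write_2006(self, d):
        if self.w == 0:
            # first write: high 6 bits of t, top bit cleared
            self.t = (self.t & 0x00FF) | ((d & 0x3F) << 8)
        else:
            # second write: low byte of t, then t is copied to v immediately
            self.t = (self.t & 0x7F00) | d
            self.v = self.t
        self.w ^= 1


regs = ScrollRegs()
regs.write_2006(0x21)
regs.write_2006(0x45)      # v becomes $2145 right away
regs.write_2005(0x7D)      # fine X changes now, coarse X only lands in t
regs.write_2005(0x00)
print(hex(regs.v), regs.x, hex(regs.t))
```

In this model the two $2005 writes leave v untouched, which is why they cannot repair a coarse X that has already auto-incremented during the current scanline.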

Bregalad
Posts: 7889
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Thu May 07, 2020 1:07 pm

Oh, I can't believe I forgot all this during the few years when I was a bit less involved in NESdev.

I suppose the only remaining option that wasn't discussed yet is to purposely clock the counter with 8x16 sprites and make your IRQ fire on those... This might get your counter clocked once more per line because of the sprites, and if you wait for an odd number of scanlines it will trigger during the sprite fetches, so hopefully earlier than normal, which allows you to write to the scrolling registers immediately, before the line gets incremented.

Needless to say, this will make your game hard to emulate and fail in most emulators. It also means spending part of your sprite budget on this, and one less "real" sprite per scanline.
Last edited by Bregalad on Thu May 07, 2020 1:38 pm, edited 1 time in total.

lidnariq
Posts: 9491
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq » Thu May 07, 2020 1:22 pm

And cleverly using 8x16 sprites that way only helps if it's useful to trigger the IRQ somewhere between dots 260 and 324; the 34 fetches from background tiles will always have A12 at the same value and so can only clock the MMC3's IRQ counter the once.
