Timing the change in rendering status

NewRisingSun
Posts: 1510
Joined: Thu May 19, 2005 11:30 am

Timing the change in rendering status

Post by NewRisingSun »

I am confused about the timing of the effect of a change in the rendering status via $2001.

Suppose that rendering is active, then a game writes $00 to $2001. The way I understand the code in Nintendulator is this:
  1. (MemSet): Run one CPU cycle and thus three PPU cycles, rendering three pixels.
  2. (MemSet): Call the $2xxx write handler, which is handled by the PPU emulation, immediately setting IsRendering to false.
  3. The next PPU clock will begin with IsRendering already set to false and all the behavioral changes that follow from that.
This leads to several glitches in Micro Machines, but makes Crash Dummies shake as it should with MMC3C.
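
The three steps above can be sketched as a toy model. This is only an illustration of the ordering described in this post, not Nintendulator's actual code: `ToyPPU`, `mem_set`, and the `log` list are hypothetical names, and the only PPU state modeled is the dot counter and the rendering flag.

```python
class ToyPPU:
    """Toy PPU that tracks only a dot counter and the rendering flag."""

    def __init__(self):
        self.dot = 0
        self.is_rendering = True
        self.log = []  # (dot, rendering flag as seen by that dot)

    def clock(self):
        # Each dot uses the flag as it stands when the dot begins.
        self.log.append((self.dot, self.is_rendering))
        self.dot += 1

    def write_2001(self, value):
        # Immediate model: the flag changes "between PPU clocks".
        # Bits 3 and 4 of $2001 enable background and sprites.
        self.is_rendering = (value & 0x18) != 0


def mem_set(ppu, value):
    # Step 1: one CPU cycle passes, i.e. three PPU dots.
    for _ in range(3):
        ppu.clock()
    # Step 2: only then is the $2xxx write handler called.
    ppu.write_2001(value)


ppu = ToyPPU()
mem_set(ppu, 0x00)  # the game writes $00 to $2001
ppu.clock()         # step 3: the next dot already sees rendering off
print(ppu.log)      # [(0, True), (1, True), (2, True), (3, False)]
```

In this model the fourth dot is the first one that behaves as "not rendering", with no dot in between that sees a partially applied write.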

As I understand Mesen, it basically does the same thing, but delays a change in the rendering status by one PPU cycle (variable _prevRendering). The delay is not complete, however: some operations query _isRendering while others query _prevRendering, which still confuses me greatly. Sour has conceded in that thread that this was not really hardware-validated.

How does this work on real hardware? Obviously, step 2 above cannot happen that way, because the PPU cannot actually set IsRendering to false "between PPU clocks". What this assumes is that the PPU, on the next clock, first checks its CPU bus for a pending write, processes the write, and then proceeds to render the next pixel. That is quite unrealistic, and it explains why some amount of delay would be needed for this (and possibly other) state change(s). I tried modifying the code so that the write handler, instead of immediately processing the write, just "notes it down", and the write is processed on the next PPU clock after the pixel has been rendered and the various increments have been done. That fixes some of the glitches in Micro Machines, but not all, and worse, causes Crash Dummies to no longer shake with MMC3C.
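
A minimal sketch of the "note it down" variant described here, again as a toy model rather than real emulator code. `DeferredPPU` and `pending` are hypothetical names; the only point is the ordering inside `clock()`: the dot is processed with the old state first, and the pending write is applied afterwards.

```python
class DeferredPPU:
    """Toy PPU whose $2001 writes take effect at the end of the next dot."""

    def __init__(self):
        self.dot = 0
        self.is_rendering = True
        self.pending = None  # value of a $2001 write not yet applied
        self.log = []        # (dot, rendering flag as seen by that dot)

    def write_2001(self, value):
        # Only note the write down; is_rendering is not touched yet.
        self.pending = value

    def clock(self):
        # 1. Render the pixel and do the increments with the old state.
        self.log.append((self.dot, self.is_rendering))
        self.dot += 1
        # 2. Then process the pending register write, if there is one.
        if self.pending is not None:
            self.is_rendering = (self.pending & 0x18) != 0
            self.pending = None


ppu = DeferredPPU()
for _ in range(3):      # the CPU cycle of the write: three dots
    ppu.clock()
ppu.write_2001(0x00)    # write handler runs, write is only recorded
ppu.clock()             # this dot still sees rendering on
ppu.clock()             # first dot that sees rendering off
print(ppu.log)
# [(0, True), (1, True), (2, True), (3, True), (4, False)]
```

Compared with applying the write immediately, the change in rendering status becomes visible exactly one dot later.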

So what is the real order of PPU operations within a cycle? Does it first render one pixel, then process any pending CPU writes to its registers, then perform the various increments? Or does that all happen simultaneously, because different parts of the chip are involved, and what does that mean for the timing of when the change in rendering status actually takes place?
Quietust
Posts: 1920
Joined: Sun Sep 19, 2004 10:59 pm

Re: Timing the change in rendering status

Post by Quietust »

As far as I understand it, PPU I/O port writes are totally asynchronous - it's technically possible for a $2000 write to happen in the middle of a pixel, but in practice they'll be aligned because the CPU and PPU are running from the same master clock. Of course, some of the registers might be buffered based on the PPU's pixel clock, which is probably why Mesen has that one-pixel delay.

One thing I should point out: while Nintendulator is pretty accurate, it is NOT 100% accurate, so you shouldn't be using it as a reference guide. Much of its PPU behavior actually dates back to before Visual 2C02, and I haven't updated it to all be 100% cycle-accurate in every way (e.g. things like Sprite 0 Hit and Sprite Overflow might be off by a few pixels because I don't emulate most of the pipelining that the real PPU has).
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Timing the change in rendering status

Post by lidnariq »

I spent a bit of time walking through Visual2C02, finding that a huge amount of the behavior is asynchronous.

At some point, the "not_rendering" signal gets synchronized via a "right half dot" then "left half dot" pair of transmission gates (t6297 and t6807 respectively), ultimately forming "++/in_visible_frame_and_rendering" ... but between all the asynchronous places that "not_rendering" goes and all the synchronous places that "in_visible_frame_and_rendering" goes, there are 60 different things to just start tracking down, never mind whatever branching factor each of them has.
Quietust wrote: As far as I understand it, PPU I/O port writes are totally asynchronous - it's technically possible for a $2000 write to happen in the middle of a pixel, but in practice they'll be aligned because the CPU and PPU are running from the same master clock
Not only in the middle of a pixel, but they will always happen across multiple pixels. Because M2 on the 2A03E/G/H (E & up) is true for 7.5 master clock cycles, that's just under 2 pixels. Additionally, when the 2C02 does synchronize things, it does so using two different non-overlapping 5.4MHz clocks - what I was referring to as "left half dot" and "right half dot" in the 2000/2005/2006 shoot-through bug. And the first write may contain garbage, depending on CPU/PPU alignment.

For several alignments, the write will even happen over 3 pixels (master clocks per pixel: 0.5 / 4 / 3; 1.5 / 4 / 2; 2.5 / 4 / 1). In only one alignment does a write occur during only 2 pixels.

On the letterless 2A03, where M2 is true for 9 master clock cycles, all alignments should result in taking 3 pixels.
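
The per-pixel breakdown above can be checked with a few lines of arithmetic. This is a sketch, assuming 4 master clocks per PPU dot and that each alignment is fully described by how many master clocks of the M2-high window fall into the first dot. `pixels_touched` is a hypothetical helper, and the whole-clock alignments used for the letterless 2A03 are an assumption, not something stated above.

```python
from fractions import Fraction
from math import ceil

DOT = Fraction(4)  # master clocks per PPU dot

def pixels_touched(first_part, m2_high):
    """Count the PPU dots overlapped by one M2-high window.

    first_part: master clocks of the window that fall in the first dot
                (0 < first_part <= 4)
    m2_high:    total master clocks for which M2 is high
    """
    remaining = Fraction(m2_high) - Fraction(first_part)
    return 1 + ceil(remaining / DOT)

# 2A03E/G/H: M2 high for 7.5 master clocks, half-clock alignments.
rev_e = [pixels_touched(Fraction(n, 2), Fraction(15, 2)) for n in (1, 3, 5, 7)]

# Letterless 2A03: M2 high for 9 master clocks.
# ASSUMPTION: the window starts on whole master clocks here.
letterless = [pixels_touched(n, 9) for n in (1, 2, 3, 4)]

print(rev_e)       # [3, 3, 3, 2]
print(letterless)  # [3, 3, 3, 3]
```

Under these assumptions, three of the four 7.5-clock alignments span 3 pixels and one spans 2, while every 9-clock alignment spans 3.
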
Last edited by lidnariq on Fri Jan 18, 2019 11:16 pm, edited 1 time in total.
NewRisingSun
Posts: 1510
Joined: Thu May 19, 2005 11:30 am

Re: Timing the change in rendering status

Post by NewRisingSun »

Thanks for investigating this in Visual2C02. At some point, cycle-accurate emulation will probably require a systematic analysis of all the (non-)buffering and pipelining of the 2C02, to replace the ad-hoc delays and buffers that several of the "accurate" emulators now have.
Quietust wrote: One thing I should point out: while Nintendulator is pretty accurate, it is NOT 100% accurate, so you shouldn't be using it as a reference guide.
Correct; I am using it as a starting point. And I managed to modify it to pass all those blargg test ROMs that claim to check various things down to one PPU clock, as well as that Battletoads game, so I thought I was getting close.

So far, only Micro Machines seems to require the IsRendering delay, while several games rely on the 2006 delay, which is easier to implement.
ap9
Posts: 43
Joined: Sat Jun 01, 2013 11:55 am
Location: Maine, U.S.A.

Re: Timing the change in rendering status

Post by ap9 »

Horizontal alignment in Micro Machines on hardware is pretty sound (no shaking), as seen in this capture video (presumably).

https://www.youtube.com/watch?v=BMpZznee74I

During cut screens the background color can be set to black mid-frame where it's only visible in horizontal blank area (something generally unimplemented in emulators), sometimes only a single small line, sometimes entirely black. So there's still a number of things to implement to get things genuinely accurate.

If only there were an unrolled list of what the PPU actually does on every dot of a scanline, at the master bus cycle or half-dot level. It would be a large text file, but possibly worth it for quickly pinpointing problem areas.
wasdwdsa
Posts: 3
Joined: Sat Mar 26, 2022 2:01 am

Re: Timing the change in rendering status

Post by wasdwdsa »

Sorry to dig up this thread.
I want to confirm: does Battletoads also rely on a proper delay when the rendering flag is set (by writing $2001)?

It seems this game sometimes (very seldom) writes $08 to $2001 at PPU pos [14,254].
If the PPU rendering flag is set to true without any delay, there will be a v-scroll increment at pos [14,256]. This is problematic, because it makes the v-scroll of the next scanline off by one and causes the sprite 0 hit check to fail.

CPU            : [       W $08 to $2001    ][
PPU [vpos,hpos]: [ 14,253][ 14,254][ 14,255][ 14,256][ 14,257]
                                            // v-scroll shall not increment here
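
The diagram above can be reduced to a single comparison. This is a sketch under stated assumptions: the CPU cycle of the write covers dots 253 to 255, so dot 256 is the first dot that could see the new $2001 value, and the vertical scroll increments at dot 256 only if rendering is seen as enabled on that dot. `y_increments_on_line` and its parameters are hypothetical names.

```python
Y_INCREMENT_DOT = 256  # dot at which v-scroll increments while rendering

def y_increments_on_line(first_dot_seen, delay=0):
    """Return True if dot 256 of this scanline sees rendering enabled.

    first_dot_seen: first dot that would see the new $2001 value if the
                    write took effect without any delay
    delay:          extra PPU dots before the rendering flag takes effect
    """
    return first_dot_seen + delay <= Y_INCREMENT_DOT

# Write spans dots 253..255, so dot 256 is the first candidate.
print(y_increments_on_line(256, delay=0))  # True: unwanted increment
print(y_increments_on_line(256, delay=1))  # False: scanline 14 is skipped
```

With a one-dot delay the flag only becomes visible at dot 257, so the increment on scanline 14 does not happen and the next scanline is not off by one.
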
I originally found the $2001 write at hpos 254 in my own emulator, but then I also observed it in Mesen's Event Viewer:
Attachment: 屏幕截图 2023-04-11 201029.png (screenshot)
My tiny cycle-accurate NES emulator https://github.com/wasdwdsa/tiny_nes
James
Posts: 431
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL

Re: Timing the change in rendering status

Post by James »

I also observed the same behavior in my emulator and fixed it by delaying PPU updates following a register write by one cycle. Here are the comments I have in my code:

//Battletoads will enable rendering at cycle 255 and break sprite 0 hit (because vertical update happens on cycle 256 and misaligns the background).
//Delaying the update by one cycle seems to fix this.  Unsure if this is actually how it works.
//Need to research how enabling things mid-screen affects PPU state.
//
//NMI at scanline 241, cycle 28
//enabled rendering at scanline 14, cycle 300
//enabled rendering at scanline 15, cycle 316
//NMI at scanline 241, cycle 29
//enabled rendering at scanline 14, cycle 271
//enabled rendering at scanline 15, cycle 287
//NMI at scanline 241, cycle 30
//enabled rendering at scanline 14, cycle 293
//enabled rendering at scanline 15, cycle 309
//NMI at scanline 241, cycle 25
//enabled rendering at scanline 14, cycle 255 <-- breaks here
I can't find my notes on this, so take it for what it's worth, but I did some additional investigation at the time (a couple of years ago?). I recall Mesen also delaying updates by one cycle, and I think I also wrote a test ROM to confirm this behavior in Visual 2C02.
get nemulator
http://nemulator.com