Suppose rendering is active and a game writes $00 to $2001. As I understand Nintendulator's code, the following happens:
- (MemSet): Run one CPU cycle and thus three PPU cycles, rendering three pixels.
- (MemSet): Call the $2xxx write handler, which is part of the PPU emulation and immediately sets IsRendering to false.
- The next PPU clock begins with IsRendering already false, and all the behavioral changes that follow from that apply from that dot on.
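To make this ordering concrete, here is a minimal Python sketch of the steps above. It assumes that only the rendering-enable bits of $2001 matter here (bit 3 = background, bit 4 = sprites). SimplePPU, clock, write_2001 and cpu_write_cycle are hypothetical names, not Nintendulator's actual code:

```python
# Hypothetical model of the "immediate" ordering; not Nintendulator's code.

class SimplePPU:
    def __init__(self):
        self.is_rendering = True  # rendering enabled ($2001 bit 3 or 4 set)
        self.dot = 0
        self.trace = []           # (dot, rendering flag seen during that dot)

    def clock(self):
        # One PPU dot: the pixel and all increments see the flag as it is now.
        self.trace.append((self.dot, self.is_rendering))
        self.dot += 1

    def write_2001(self, value):
        # Takes effect instantly, i.e. "between" two PPU clocks.
        self.is_rendering = (value & 0x18) != 0


def cpu_write_cycle(ppu, value):
    # MemSet: run one CPU cycle (three PPU dots on NTSC)...
    for _ in range(3):
        ppu.clock()
    # ...then call the $2xxx write handler.
    ppu.write_2001(value)


ppu = SimplePPU()
cpu_write_cycle(ppu, 0x00)
ppu.clock()
print(ppu.trace)  # [(0, True), (1, True), (2, True), (3, False)]
```

In this model the fourth dot is the first to see rendering disabled, with no delay at all.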
As I understand it, Mesen does essentially the same thing, but delays the change in rendering status by one PPU cycle (via the variable _prevRendering). The delay is not applied uniformly, however: some operations query _isRendering, while others query _prevRendering. This still confuses me greatly, and Sour has conceded in that thread that this behavior was not really hardware-validated.
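Roughly, this split can be modeled as two copies of the flag. This is a hypothetical sketch, not Mesen's code. It assumes the delayed copy catches up at the end of every dot. Which PPU operations should read which copy is exactly the open question, so the model just records both:

```python
# Hypothetical model of a one-dot-delayed rendering flag; not Mesen's code.

class DelayedFlagPPU:
    def __init__(self):
        self.is_rendering = True    # updated immediately by the $2001 write
        self.prev_rendering = True  # copy of is_rendering from the previous dot
        self.dot = 0
        self.trace = []             # (dot, immediate flag, delayed flag)

    def clock(self):
        # Some logic would read the immediate flag, other logic the delayed one.
        self.trace.append((self.dot, self.is_rendering, self.prev_rendering))
        # At the end of the dot, the delayed copy catches up.
        self.prev_rendering = self.is_rendering
        self.dot += 1

    def write_2001(self, value):
        self.is_rendering = (value & 0x18) != 0


ppu = DelayedFlagPPU()
for _ in range(3):
    ppu.clock()
ppu.write_2001(0x00)
ppu.clock()
ppu.clock()
print(ppu.trace)
# [(0, True, True), (1, True, True), (2, True, True), (3, False, True), (4, False, False)]
```

On dot 3 the two flags disagree, so anything reading prev_rendering still behaves as if rendering were on for one more dot.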
How does this work on real hardware? Obviously, step 2 above cannot happen that way, because the PPU cannot actually set IsRendering to false "between PPU clocks". What the emulation effectively assumes is that the PPU, at the start of the next clock, first checks its CPU bus for a pending write, processes it, and only then renders the next pixel. That seems unrealistic, and it would explain why some delay is needed for this state change (and possibly others). I tried modifying the code so that the write handler, instead of processing the write immediately, just "notes it down", and the write is then processed on the next PPU clock, after that pixel has been rendered and the various increments have been done. That fixes some of the glitches in Micro Machines, but not all; worse, it causes Crash Dummies to no longer shake with MMC3C.
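The deferred-write experiment amounts to moving the write from between two dots to the end of the next dot. Again a hypothetical sketch: DeferredWritePPU and pending_2001 are made-up names. The model assumes the write is applied only after that dot's pixel and increments are done:

```python
# Hypothetical model of the deferred-write experiment.

class DeferredWritePPU:
    def __init__(self):
        self.is_rendering = True
        self.pending_2001 = None  # value written by the CPU, not yet applied
        self.dot = 0
        self.trace = []           # (dot, rendering flag seen during that dot)

    def write_2001(self, value):
        # Only note the write down; the PPU picks it up on its own clock.
        self.pending_2001 = value

    def clock(self):
        # 1. Render this dot's pixel and do the increments with the old flag.
        self.trace.append((self.dot, self.is_rendering))
        # 2. Only then apply any pending CPU write.
        if self.pending_2001 is not None:
            self.is_rendering = (self.pending_2001 & 0x18) != 0
            self.pending_2001 = None
        self.dot += 1


ppu = DeferredWritePPU()
for _ in range(3):
    ppu.clock()
ppu.write_2001(0x00)
ppu.clock()
ppu.clock()
print(ppu.trace)  # [(0, True), (1, True), (2, True), (3, True), (4, False)]
```

Compared with the immediate model, rendering stays on for one extra dot for everything, rather than only for the operations that happen to read a delayed copy.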
So what is the real order of PPU operations within a cycle? Does it first render one pixel, then process any pending CPU writes to its registers, then perform the various increments? Or does all of that happen simultaneously, because different parts of the chip are involved? And what does that mean for when the change in rendering status actually takes effect?