All times are UTC - 7 hours





Posted: Fri Jan 18, 2019 11:50 am

Joined: Thu May 19, 2005 11:30 am
Posts: 975
I am confused about the timing of the effect of a change in the rendering status via $2001.

Suppose that rendering is active, then a game writes $00 to $2001. The way I understand the code in Nintendulator is this:
  1. (MemSet): Run one CPU cycle and thus three PPU cycles, rendering three pixels.
  2. (MemSet): Call the $2xxx write handler, which is handled by the PPU emulation, immediately setting IsRendering to false.
  3. The next PPU clock will begin with IsRendering already set to false and all the behavioral changes that follow from that.
This leads to several glitches in Micro Machines, but makes Crash Dummies shake as it should with MMC3C.
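
The three steps above can be sketched roughly as follows. This is a toy model, not Nintendulator's actual code: `Ppu`, `clock`, `write_2001` and `mem_set` are hypothetical names, and the only register behavior modeled is that rendering counts as enabled when bit 3 or bit 4 of $2001 is set.

```python
class Ppu:
    """Toy PPU that only records the rendering flag seen by each PPU clock."""

    def __init__(self):
        self.is_rendering = True
        self.log = []  # rendering state observed at each PPU clock

    def clock(self):
        # A real PPU would render a pixel here; we only record the flag.
        self.log.append(self.is_rendering)

    def write_2001(self, value):
        # Rendering is on if background (bit 3) or sprites (bit 4) are enabled.
        self.is_rendering = (value & 0x18) != 0


def mem_set(ppu, value):
    # Step 1: one CPU cycle passes, i.e. three PPU clocks (NTSC ratio).
    for _ in range(3):
        ppu.clock()
    # Step 2: the write takes effect immediately, "between" PPU clocks.
    ppu.write_2001(value)


ppu = Ppu()
mem_set(ppu, 0x00)
ppu.clock()  # Step 3: this clock already sees rendering disabled
```

In this model the fourth PPU clock is the first one to observe the disabled state.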

As I understand Mesen, it basically does the same thing, but delays the change in rendering status by one PPU cycle (variable _prevRendering). The delay is not complete, however: some operations query _isRendering while others query _prevRendering, which still confuses me greatly. Sour has conceded in that thread that this was not really hardware-validated.
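
A minimal sketch of that idea, assuming a delayed copy of the flag that is refreshed at the end of every PPU clock. This is not Mesen's code; `DelayedPpu` and its members are hypothetical, and which operations consult which flag is deliberately left out because the thread does not say.

```python
class DelayedPpu:
    """Toy PPU with an immediate flag and a copy delayed by one PPU clock."""

    def __init__(self):
        self.is_rendering = True    # changes as soon as $2001 is written
        self.prev_rendering = True  # value the flag had on the previous clock
        self.log = []               # (is_rendering, prev_rendering) per clock

    def write_2001(self, value):
        self.is_rendering = (value & 0x18) != 0

    def clock(self):
        self.log.append((self.is_rendering, self.prev_rendering))
        # Operations on this clock would consult one flag or the other here.
        self.prev_rendering = self.is_rendering


delayed = DelayedPpu()
delayed.clock()
delayed.write_2001(0x00)
delayed.clock()  # immediate flag is off, delayed flag is still on
delayed.clock()  # both flags are off
```

For exactly one clock after the write the two flags disagree, which is where the choice of flag per operation starts to matter.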

How does this work on real hardware? Obviously, step 2 above cannot happen that way, because the PPU cannot actually set IsRendering to false "between PPU clocks". What this model assumes is that the PPU, on the next clock, first checks its CPU bus for a pending write, processes the write, and then proceeds to render the next pixel. That seems quite unrealistic, and it would explain why some amount of delay is needed for this (and possibly other) state changes. I tried modifying the code so that the write handler, instead of processing the write immediately, just "notes it down" and has it processed on the next PPU clock, after the pixel has been rendered and the various increments have been done. That fixes some of the glitches in Micro Machines, but not all, and worse, Crash Dummies no longer shakes with MMC3C.
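
The "note it down" variant could look something like this. Again a hedged sketch with hypothetical names (`DeferredPpu`, `pending`); the pixel rendering and increments are reduced to a single log entry.

```python
class DeferredPpu:
    """Toy PPU that applies a $2001 write at the end of the next PPU clock."""

    def __init__(self):
        self.is_rendering = True
        self.pending = None  # value of a $2001 write not yet applied
        self.log = []        # rendering state used by each PPU clock

    def write_2001(self, value):
        # Only note the write down; do not change any state yet.
        self.pending = value

    def clock(self):
        # Pixel rendering and increments still use the old state.
        self.log.append(self.is_rendering)
        # Afterwards, apply any write that was noted down.
        if self.pending is not None:
            self.is_rendering = (self.pending & 0x18) != 0
            self.pending = None


deferred = DeferredPpu()
deferred.clock()
deferred.write_2001(0x00)
deferred.clock()  # still renders with the old state, then applies the write
deferred.clock()  # first clock that sees rendering disabled
```

Compared with the immediate model, the disabled state becomes visible one PPU clock later.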

So what is the real order of PPU operations within a cycle? Does it first render one pixel, then process any pending CPU writes to its registers, then perform the various increments? Or does it all happen simultaneously because different parts of the chip are involved, and what does that mean for when the change in rendering status actually takes effect?


Posted: Fri Jan 18, 2019 5:26 pm

Joined: Sun Sep 19, 2004 10:59 pm
Posts: 1476
As far as I understand it, PPU I/O port writes are totally asynchronous - it's technically possible for a $2000 write to happen in the middle of a pixel, but in practice they'll be aligned because the CPU and PPU are running from the same master clock. Of course, some of the registers might be buffered based on the PPU's pixel clock, which is probably why Mesen has that one-pixel delay.

One thing I should point out: while Nintendulator is pretty accurate, it is NOT 100% accurate, so you shouldn't be using it as a reference guide. Much of its PPU behavior actually dates back to before Visual 2C02, and I haven't updated it to all be 100% cycle-accurate in every way (e.g. things like Sprite 0 Hit and Sprite Overflow might be off by a few pixels because I don't emulate most of the pipelining that the real PPU has).

_________________
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.


Posted: Fri Jan 18, 2019 5:42 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8557
Location: Seattle
I spent a bit of time walking through Visual2C02, finding that a huge amount of the behavior is asynchronous.

At some point, the "not_rendering" signal gets synchronized via a "right half dot" then "left half dot" pair of transmission gates (t6297 and t6807 respectively), ultimately forming "++/in_visible_frame_and_rendering" ... but between all the asynchronous places that "not_rendering" goes and all the synchronous places that "in_visible_frame_and_rendering" goes, there are 60 different things to start tracking down, never mind whatever branching factor each of them has.
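
A rough software analogy for that two-stage synchronization, assuming each transmission gate can be treated as a latch that samples its input during its own half dot. `TwoPhaseSync` and its method names are hypothetical, the real gates are level-sensitive rather than sampled, and the "in visible frame" part of the final signal is not modeled.

```python
class TwoPhaseSync:
    """Two latches clocked on alternating half dots."""

    def __init__(self, initial=False):
        self.stage1 = initial  # latched during the right half dot
        self.stage2 = initial  # latched during the left half dot (the output)

    def right_half_dot(self, async_in):
        # The first stage samples the asynchronous input.
        self.stage1 = async_in

    def left_half_dot(self):
        # The second stage copies the first stage to the synchronous output.
        self.stage2 = self.stage1
        return self.stage2


sync = TwoPhaseSync(initial=False)
not_rendering = True                   # asynchronous change, e.g. after a $2001 write
before = sync.left_half_dot()          # output has not seen the change yet
sync.right_half_dot(not_rendering)
after = sync.left_half_dot()           # output now reflects the change
```

The output only changes after the input has passed through a right half dot and then a left half dot, so the synchronous consumers see the change later than the asynchronous ones.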

Quietust wrote:
As far as I understand it, PPU I/O port writes are totally asynchronous - it's technically possible for a $2000 write to happen in the middle of a pixel, but in practice they'll be aligned because the CPU and PPU are running from the same master clock
Not only in the middle of a pixel: they will always happen across multiple pixels. M2 on the 2A03E/G/H (E & up) is true for 7.5 master clock cycles, which is just under 2 pixels. Additionally, when the 2C02 does synchronize things, it does so using two different non-overlapping 5.4MHz clocks - what I was referring to as "left half dot" and "right half dot" in the 2000/2005/2006 shoot-through bug. And the first write may contain garbage, depending on CPU/PPU alignment.

For several alignments, the write will even happen over 3 pixels (master clocks: 0.5 / 4 / 3; 1.5 / 4 / 2; 2.5 / 4 / 1). In only one alignment does a write occur during only 2 pixels.

On the letterless 2A03, where M2 is true for 9 master clock cycles, all alignments should result in taking 3 pixels.
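
The 7.5-master-clock breakdown above can be checked with a little arithmetic, assuming 4 master clocks per PPU pixel and an M2-true window that begins partway through a pixel. `split_across_pixels` is a hypothetical helper that works in half master clocks to avoid fractions; the 3.5 / 4 case is inferred here as the single 2-pixel alignment and is not spelled out in the post.

```python
HALF_CLOCKS_PER_PIXEL = 8  # 4 master clocks per PPU pixel, in half-clock units


def split_across_pixels(first_partial_half, duration_half):
    """Split a window into per-pixel chunks.

    first_partial_half: half master clocks left in the pixel where the window starts.
    duration_half: total window length in half master clocks.
    Returns the chunk lengths in master clocks, one entry per pixel touched.
    """
    chunks = []
    remaining = duration_half
    chunk = min(first_partial_half, remaining)
    while remaining > 0:
        chunks.append(chunk / 2)
        remaining -= chunk
        chunk = min(HALF_CLOCKS_PER_PIXEL, remaining)
    return chunks


M2_TRUE_HALF = 15  # 7.5 master clocks on the 2A03E/G/H
alignments = {first / 2: split_across_pixels(first, M2_TRUE_HALF)
              for first in (1, 3, 5, 7)}
```

Each entry of `alignments` maps the length of the first partial pixel to the list of per-pixel chunks, so the number of pixels touched is simply the length of that list.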


Last edited by lidnariq on Fri Jan 18, 2019 11:16 pm, edited 1 time in total.

Posted: Fri Jan 18, 2019 9:26 pm

Joined: Thu May 19, 2005 11:30 am
Posts: 975
Thanks for investigating this in Visual2C02. At some point, cycle-accurate emulation will probably require a systematic analysis of all the (non-)buffering and pipelining of the 2C02, to replace the ad-hoc delays and buffers that several of the "accurate" emulators now have.

Quietust wrote:
One thing I should point out: while Nintendulator is pretty accurate, it is NOT 100% accurate, so you shouldn't be using it as a reference guide.
Correct; I am using it as a starting point. And I managed to modify it to pass all those blargg test ROMs that claim to check various things down to one PPU clock, as well as that Battletoads game, so I thought I was getting close.

So far, only Micro Machines seems to require the IsRendering delay, while several games rely on the 2006 delay, which is easier to implement.


Posted: Mon Jan 21, 2019 8:34 pm

Joined: Sat Jun 01, 2013 11:55 am
Posts: 36
Location: Maine, U.S.A.
Horizontal alignment in Micro Machines on hardware is pretty sound (no shaking), as seen in this video, which is presumably a hardware capture.

https://www.youtube.com/watch?v=BMpZznee74I

During cut screens the background color can be set to black mid-frame, where it is only visible in the horizontal blanking area (something generally unimplemented in emulators): sometimes only a single small line, sometimes entirely black. So there are still a number of things to implement to get things genuinely accurate.

If only there were an unrolled list of everything the PPU actually does on each dot of a scanline, at the master clock cycle or half-dot level. It would be a large text file, but possibly worth it for quickly pinpointing problem areas.
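
As a very coarse starting point, such a list could be generated rather than written by hand. The sketch below only labels the memory-fetch phase of each dot on a visible NTSC scanline with rendering enabled, following the commonly documented 341-dot layout; it works at whole-dot granularity, not half dots, and `describe_dot` and `unroll_scanline` are hypothetical names.

```python
FETCH_STEPS = ["nametable", "attribute", "pattern low", "pattern high"]


def describe_dot(dot):
    """Return a coarse label for what the PPU fetches on the given dot (0-340)."""
    if not 0 <= dot <= 340:
        raise ValueError("dot must be in 0..340")
    if dot == 0:
        return "idle"
    if 1 <= dot <= 256 or 321 <= dot <= 336:
        # Background tiles: four fetches of two dots each, repeating every 8 dots.
        step = FETCH_STEPS[((dot - 1) % 8) // 2]
        return f"background fetch: {step}"
    if 257 <= dot <= 320:
        return "sprite fetches for next scanline"
    return "unused nametable fetch"  # dots 337-340


def unroll_scanline():
    """Build one text line per dot of the scanline."""
    return [f"{dot:3d}: {describe_dot(dot)}" for dot in range(341)]


listing = unroll_scanline()
```

A real reference list would need far more columns (register increments, sprite evaluation, half-dot latching), but the same table-driven shape would still apply.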

