PPU Cycle 0 convention and frame timing

LightStruk
Posts: 45
Joined: Sat May 04, 2013 6:44 am

PPU Cycle 0 convention and frame timing

Post by LightStruk » Wed Sep 16, 2020 9:48 am

I am revisiting PPU emulation and aiming for cycle accuracy this time. There are a few things tripping me up. Some questions:
  1. The NTSC timing diagram on the wiki says that actual pixel output is delayed 4 ticks. Does this mean that changes to the mask or emphasis bits appear on-screen 4 pixels earlier than one might expect?
  2. Since each tile is fetched up to 12 cycles before it appears on-screen, does this mean up to two fully-fetched tiles can be waiting to be drawn in some shift registers inside of the PPU? Do I need to worry about this queuing behavior for emulation?
  3. Where does the convention of calling the idle tick Cycle 0 come from? Is there a literal cycle counter inside the PPU that goes to 0 at that tick? Does it come from some piece of official developer documentation? Or is it merely an agreed-upon convention in the emudev scene, the wiki, and Visual 2C02?

lidnariq
Posts: 9689
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: PPU Cycle 0 convention and frame timing

Post by lidnariq » Wed Sep 16, 2020 11:13 am

1. Emphasis bits: No. They're delayed at most a pixel. I forget the exact timing. Mask bits: I really don't remember at all.

2. Yes. Otherwise your behavior around raster splits will be wrong, and your MMC3 timing will require a compensatory bodge.
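
For point 2, here is a minimal sketch of the queuing, assuming the usual model of one 16-bit shift register per background bit plane: the high byte is the tile currently being drawn and the low byte is the next tile. `BgShifter` and `render_plane` are made-up names, and the reload timing is simplified rather than dot-exact.

```python
class BgShifter:
    """Hypothetical model of one background bit plane's 16-bit shifter."""

    def __init__(self):
        # High byte: tile row being drawn. Low byte: the queued next tile row.
        self.bits = 0

    def reload(self, pattern_byte):
        # A freshly fetched tile row lands in the low 8 bits.
        self.bits = (self.bits & 0xFF00) | (pattern_byte & 0xFF)

    def shift(self):
        # One shift per dot, dropping the bit that falls off the top.
        self.bits = (self.bits << 1) & 0xFFFF

    def pixel(self, fine_x):
        # fine_x (0..7) selects a bit counted down from the top of the register.
        return (self.bits >> (15 - fine_x)) & 1


def render_plane(tile_rows, n_pixels, fine_x=0):
    """Return n_pixels bits of one plane, prefetching two tiles first."""
    sh = BgShifter()
    # Prefetch: two tiles are already queued before the first visible pixel.
    sh.reload(tile_rows[0])
    for _ in range(8):
        sh.shift()
    sh.reload(tile_rows[1])
    next_tile = 2
    out = []
    for dot in range(n_pixels):
        out.append(sh.pixel(fine_x))
        sh.shift()
        if dot % 8 == 7:
            # Simplification: reload right after every 8th shift.
            row = tile_rows[next_tile] if next_tile < len(tile_rows) else 0
            sh.reload(row)
            next_tile += 1
    return out
```

With a non-zero `fine_x`, the output reads across the boundary between the two queued tiles, which is why the second tile has to be sitting in the register already.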

3. If you load Visual 2C02, the X coordinates are stored in the 9 signals "hpos0" through "hpos8".

Unfortunately, there are two conventions: one based on what we assumed (x=0 is where the PPU fetches the 3rd background sliver and the skippable pixel is x=340) and one based on what we saw after we decapped the 2C02 (x=0 is the skippable pixel and x=1 is where the PPU fetches the 3rd background sliver).

Someone begged us to please fix the wiki to be consistent, but ... I don't even know where to start.
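
As described above, the two conventions differ by a one-dot offset, so converting between them can be sketched as below. The function names are invented for illustration, and the scanline numbering (0..261, with the pre-render line as 261) is an assumption rather than something stated in this thread.

```python
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262  # NTSC; scanlines numbered 0..261 here (assumption)


def assumed_to_decap(scanline, x):
    """Old convention (skippable dot is x=340) to decap convention (x=0)."""
    x_new = (x + 1) % DOTS_PER_SCANLINE
    if x_new == 0:
        # Old dot 340 becomes dot 0 of the following scanline.
        scanline = (scanline + 1) % SCANLINES_PER_FRAME
    return scanline, x_new


def decap_to_assumed(scanline, x):
    """Decap convention (skippable dot is x=0) back to the old convention."""
    x_old = (x - 1) % DOTS_PER_SCANLINE
    if x == 0:
        # Decap dot 0 was dot 340 of the previous scanline.
        scanline = (scanline - 1) % SCANLINES_PER_FRAME
    return scanline, x_old
```

Under these assumptions, `assumed_to_decap(10, 0)` gives `(10, 1)`, matching "x=1 is where the PPU fetches the 3rd background sliver".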

Fiskbit
Posts: 179
Joined: Sat Nov 18, 2017 9:15 pm

Re: PPU Cycle 0 convention and frame timing

Post by Fiskbit » Wed Sep 16, 2020 2:13 pm

1. The delay for settings in PPUMASK varies. Greyscale is known to be the fastest, and toggling rendering is the slowest. A test I posted here shows a 4-pixel difference when toggling greyscale and rendering at the same time (also shown in a screenshot there). Kitrinx, who implemented MiSTer's NES APU, speculated based on the transistor map that blue emphasis may be a dot faster than red and green emphasis; apparently red and green emphasis are held until the write completes, while greyscale and blue emphasis (and NMI enable and slave mode) are sent immediately.

I've also found that timings can apparently vary by CPU/PPU alignment, using a test where the length of a greyscale region differs by 1 dot across alignment changes.
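
One way an emulator might model per-field delays is a small queue of pending $2001 writes, sketched below. `DelayedMask` and `FIELD_DELAY` are hypothetical names. The only number taken from this thread is the 4-dot gap between greyscale and rendering; the absolute delays are placeholders, and the alignment-dependent variation is not modelled.

```python
GREYSCALE = 0x01  # $2001 bit 0
RENDERING = 0x18  # $2001 bits 3 and 4 (show background, show sprites)

# Relative delays in dots. Only the 4-dot gap is from the thread;
# the absolute values are placeholders, not measurements.
FIELD_DELAY = {GREYSCALE: 0, RENDERING: 4}


class DelayedMask:
    """Hypothetical model: each field of a write takes effect after its own delay."""

    def __init__(self):
        self.effective = 0  # bits currently acted on by the pixel pipeline
        self.pending = []   # list of (due_dot, field_mask, field_bits)

    def write(self, dot, value):
        # Split one register write into per-field updates with separate due times.
        for field, delay in FIELD_DELAY.items():
            self.pending.append((dot + delay, field, value & field))

    def tick(self, dot):
        # Apply every update that is due, keep the rest queued.
        waiting = []
        for due, field, bits in self.pending:
            if due <= dot:
                self.effective = (self.effective & ~field) | bits
            else:
                waiting.append((due, field, bits))
        self.pending = waiting
        return self.effective
```

Writing `$19` at dot 100 in this model turns greyscale on at dot 100 and rendering on at dot 104, which reproduces the 4-pixel difference but says nothing about where either edge really lands.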

3. There was some discussion on the nesdev Discord about the skipped dot a while back. According to Visual 2C02, hpos skips from 339 to 0 when skipping the dot, so 340 is the one that gets skipped, but the work that would normally be done on 340 is instead completed on dot 0. Pretty sure this is functionally identical to skipping dot 0, though.
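
The hpos behaviour described above can be sketched as a simple counter. This assumes the skip happens on the pre-render line of odd frames with rendering enabled, and numbers the pre-render line as 261; `step` and `dots_in_frame` are illustrative names, not anything from Visual 2C02.

```python
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262  # NTSC
PRE_RENDER = 261           # assumed numbering for the pre-render line


def step(scanline, hpos, odd_frame, rendering):
    """Advance one dot, jumping from 339 straight to 0 when the dot is skipped."""
    skip = odd_frame and rendering and scanline == PRE_RENDER and hpos == 339
    if hpos == DOTS_PER_SCANLINE - 1 or skip:
        return (scanline + 1) % SCANLINES_PER_FRAME, 0
    return scanline, hpos + 1


def dots_in_frame(odd_frame, rendering):
    """Count dots from (0, 0) until the counter returns to (0, 0)."""
    scanline, hpos, count = 0, 0, 0
    while True:
        scanline, hpos = step(scanline, hpos, odd_frame, rendering)
        count += 1
        if (scanline, hpos) == (0, 0):
            return count
```

Whether the missing dot is labelled 340 or 0, the frame comes out one dot shorter, which fits the point that the two descriptions are functionally identical.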

lidnariq
Posts: 9689
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: PPU Cycle 0 convention and frame timing

Post by lidnariq » Wed Sep 16, 2020 2:42 pm

Fiskbit wrote (Wed Sep 16, 2020 2:13 pm):
"Kitrinx, who implemented MiSTer's NES APU, speculated based on the transistor map that blue emphasis may be a dot faster than red and green emphasis; apparently red and green emphasis are held until the write completes, while greyscale and blue emphasis (and NMI enable and slave mode) are sent immediately."

Why!? (Because they ran out of space on the metal layer.)

emph0: _io_db5 →(gate:write_2001_reg)→ 6256 →invert→ /emph0 →invert→ emph0 →(gate:/write_2001_reg)→ 6241 →invert→ /emph0_out
emph2: _io_db7 →(gate:write_2001_reg)→ 6254 →invert→ /emph2 →invert→ emph2 →invert→ /emph2_out

The difference should always be 1.5-ish dots long (the length of M2 high, minus whatever delay it takes for the PPU to drive the data bus). Red and green emphasis are latched on PPU de-selection, while blue emphasis is completely asynchronous. It should be possible to get some sparkle from repeatedly writing (e.g.) $80 to $2001: open bus will clear the bit, and then, once the 2A03 drives the data bus, the bit is set again.
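
A toy model of the two paths, to make the sparkle concrete. It treats the first-stage latch as transparent while the $2001 write strobe is high, lets the blue-like output follow it directly, and only updates the red/green-like output once the strobe ends. The inverters are ignored, and the number of bus samples is arbitrary rather than measured; `simulate_write` is a made-up name.

```python
def simulate_write(old_bit, bus_samples):
    """Return (blue_like, red_green_like) output traces for one $2001 write.

    bus_samples: data-bus bit values seen while the write strobe is high,
    e.g. open-bus values first, then the value the 2A03 drives.
    """
    blue_like = []       # output follows the transparent latch directly
    red_green_like = []  # output stage is closed while the strobe is high
    latch = old_bit
    for bus_bit in bus_samples:
        latch = bus_bit                 # transparent during the strobe
        blue_like.append(latch)
        red_green_like.append(old_bit)  # still showing the previous value
    # Strobe ends: the gated output stage opens and takes the settled value.
    blue_like.append(latch)
    red_green_like.append(latch)
    return blue_like, red_green_like


# Bit already set by an earlier write; open bus reads 0 before the CPU drives 1.
blue, red_green = simulate_write(old_bit=1, bus_samples=[0, 0, 1, 1])
print(blue)       # [0, 0, 1, 1, 1]
print(red_green)  # [1, 1, 1, 1, 1]
```

In this model the blue-like trace briefly drops to 0 even though the written value equals the old one, while the red/green-like trace never changes.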

The output of pal_mono is ultimately synchronized on dot edges, so it should be possible to get the wrong value latched for a single pixel.
