PPU Cycle 0 convention and frame timing
- LightStruk
- Posts: 45
- Joined: Sat May 04, 2013 6:44 am
I am revisiting PPU emulation and aiming for cycle accuracy this time. A few things are tripping me up. Some questions:
1. The NTSC timing diagram on the wiki says that actual pixel output is delayed 4 ticks. Does this mean that changes to the mask or emphasis bits appear on-screen 4 pixels earlier than one might expect?
2. Since each tile is fetched up to 12 cycles before it appears on-screen, does this mean up to two fully-fetched tiles can be waiting to be drawn in some shift registers inside of the PPU? Do I need to worry about this queuing behavior for emulation?
3. Where does the convention of calling the idle tick Cycle 0 come from? Is there a literal cycle counter inside the PPU that goes to 0 at that tick? Does it come from some piece of official developer documentation? Or is it merely an agreed-upon convention in the emudev scene, the wiki, and Visual 2C02?
Re: PPU Cycle 0 convention and frame timing
1. Emphasis bits: No. They're delayed at most a pixel. I forget the exact timing. Mask bits: I really don't remember at all.
2. Yes. Otherwise your behavior around raster splits will be wrong, and your MMC3 timing will require a compensatory bodge.
3. If you load visual2c02, the X coordinates are stored in the 9 signals "hpos0" through "hpos8".
Unfortunately, there are two conventions: one based on what we assumed (x=0 is where the PPU fetches the 3rd background sliver and the skippable pixel is x=340) and one based on what we saw after we decapped the 2C02 (x=0 is the skippable pixel and x=1 is where the PPU fetches the 3rd background sliver).
Someone begged us to please fix the wiki to be consistent, but ... I don't even know where to start.
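On the queuing in item 2, here is a minimal sketch of how emulators commonly model it. It is an assumption about emulator structure, not a transcription of the 2C02 die. One 16-bit shift register holds the tile row being drawn in its high byte and the next, already-fetched tile row in its low byte. The names `BgShifter` and `render_row` are hypothetical, and only a single pattern bit plane is shown.

```python
class BgShifter:
    """Hypothetical model of one 16-bit background pattern shift register."""

    def __init__(self):
        self.bits = 0  # bit 15 is the next pixel out when fine_x == 0

    def reload(self, next_tile_byte):
        # Once every 8 dots: queue the freshly fetched tile row behind
        # the one that is currently being shifted out.
        self.bits = (self.bits & 0xFF00) | (next_tile_byte & 0xFF)

    def pixel(self, fine_x=0):
        # fine_x (0-7) selects how far into the 16-bit window to read.
        return (self.bits >> (15 - fine_x)) & 1

    def shift(self):
        # Once per dot, after the pixel has been read.
        self.bits = (self.bits << 1) & 0xFFFF


def render_row(tile_bytes, fine_x=0):
    """Stream at least two tile rows through the shifter; return 8 pixels per tile."""
    sh = BgShifter()
    # Prefetch: two tiles are in the register before the first pixel is drawn.
    sh.reload(tile_bytes[0])
    for _ in range(8):
        sh.shift()
    sh.reload(tile_bytes[1])

    pixels = []
    # Two zero bytes of padding flush the tiles still queued at the end.
    for next_byte in list(tile_bytes[2:]) + [0, 0]:
        for _ in range(8):
            pixels.append(sh.pixel(fine_x))
            sh.shift()
        sh.reload(next_byte)
    return pixels


row = render_row([0xFF, 0x00, 0xAA])
```

Because the reload happens a full tile before the data is displayed, a mid-scanline scroll or pattern change affects pixels several dots later than the write itself, which is why ignoring the queue breaks raster splits.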
Re: PPU Cycle 0 convention and frame timing
1. The delay for settings in PPUMASK varies. Greyscale is known to be the fastest, and toggling rendering is the slowest. A test I posted here shows a 4-pixel difference when toggling greyscale and rendering at the same time (also shown in a screenshot there). Kitrinx, who implemented MiSTer's NES APU, speculated based on the transistor map that blue emphasis may be a dot faster than red and green emphasis; apparently red and green emphasis are held until the write completes, while greyscale and blue emphasis (and NMI enable and slave mode) are sent immediately.
I've also found that timings can apparently vary by CPU/PPU alignment, using a test where the length of a greyscale region differs by 1 dot across alignment changes.
3. There was some discussion on the nesdev Discord about the skipped dot a while back. According to Visual 2C02, hpos skips from 339 to 0 when skipping the dot, so 340 is the one that gets skipped, but the work that would normally be done on 340 is instead completed on dot 0. Pretty sure this is functionally identical to skipping dot 0, though.
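As a rough sketch of the counter behavior described in item 3, the following advances a dot counter in the Visual 2C02 `hpos` numbering and jumps from 339 to 0 on the short line. The conditions for the skip (odd frame, rendering enabled, pre-render line 261) are the commonly documented NTSC behavior; the function names are hypothetical. `old_to_hpos` converts from the older numbering, assuming the scanline label rolls over together with the dot label.

```python
DOTS_PER_LINE = 341
LINES_PER_FRAME = 262   # NTSC
PRERENDER_LINE = 261


def next_dot(hpos, vpos, odd_frame, rendering):
    """Advance one PPU dot; returns the new (hpos, vpos, odd_frame)."""
    # On the short line, hpos goes 339 -> 0, so dot 340 never occurs.
    skip = rendering and odd_frame and vpos == PRERENDER_LINE and hpos == 339
    if skip or hpos == DOTS_PER_LINE - 1:
        hpos = 0
        vpos += 1
        if vpos == LINES_PER_FRAME:
            vpos = 0
            odd_frame = not odd_frame
    else:
        hpos += 1
    return hpos, vpos, odd_frame


def frame_length(odd_frame, rendering):
    """Count the dots from (0, 0) until the counters return to (0, 0)."""
    hpos, vpos, odd = 0, 0, odd_frame
    dots = 0
    while True:
        hpos, vpos, odd = next_dot(hpos, vpos, odd, rendering)
        dots += 1
        if hpos == 0 and vpos == 0:
            return dots


def old_to_hpos(x, line):
    """Map the older convention (idle dot at x=340) onto hpos (idle dot at 0)."""
    if x == 340:
        return 0, (line + 1) % LINES_PER_FRAME  # assumed scanline rollover
    return x + 1, line


print(frame_length(odd_frame=False, rendering=True))  # 89342
print(frame_length(odd_frame=True, rendering=True))   # 89341
```

Whether an emulator labels the missing dot as 340 or as 0 does not change the frame length; it only changes which label the work done on that tick is filed under.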
Re: PPU Cycle 0 convention and frame timing
Fiskbit wrote (Wed Sep 16, 2020 2:13 pm):
"Kitrinx, who implemented MiSTer's NES APU, speculated based on the transistor map that blue emphasis may be a dot faster than red and green emphasis; apparently red and green emphasis are held until the write completes, while greyscale and blue emphasis (and NMI enable and slave mode) are sent immediately."

Why!? (Because they ran out of space on the metal layer)
emph0: _io_db5 →(gate:write_2001_reg)→ 6256 →invert→ /emph0 →invert→ emph0 →(gate:/write_2001_reg)→ 6241 →invert→ /emph0_out
emph2: _io_db7 →(gate:write_2001_reg)→ 6254 →invert→ /emph2 →invert→ emph2 →invert→ /emph2_out
The difference should always be 1.5-ish dots long (the length of M2 high, minus whatever delay it takes for the PPU to drive the data bus). Red and green emphasis are latched on PPU de-selection, while blue emphasis is completely asynchronous. It should be possible to get some sparkle by repeatedly writing (e.g.) $80 to $2001: open bus will clear the bit, and then, once the 2A03 drives the data bus, the bit is set again.
The output of pal_mono is ultimately synchronized on dot edges, so it should be possible to get the wrong value latched for a single pixel.
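The two netlist traces above can be approximated with a small behavioral model. This is a sketch only: inverter polarity and analog delays are ignored, the class names are hypothetical, and the event list assumes (as suggested above) that the bus reads the bit as 0 at the start of the write, before the 2A03 drives it. The emph0 path has a second pass gate controlled by /write_2001_reg, so its output only updates once the write ends; the emph2 path has no second gate, so its output follows the bus during the write.

```python
class LatchedPath:
    """emph0-style path: gate on write_2001_reg, then gate on /write_2001_reg."""

    def __init__(self, value=0):
        self.stage1 = value  # node behind the write_2001_reg gate
        self.stage2 = value  # node behind the /write_2001_reg gate

    def step(self, db_bit, write):
        if write:
            self.stage1 = db_bit       # first gate conducts during the write
        else:
            self.stage2 = self.stage1  # second gate conducts after the write
        return self.stage2


class AsyncPath:
    """emph2-style path: only the write_2001_reg gate, output follows at once."""

    def __init__(self, value=0):
        self.stage1 = value

    def step(self, db_bit, write):
        if write:
            self.stage1 = db_bit
        return self.stage1


def trace(path, events):
    """Feed (db_bit, write) samples through a path and collect its output."""
    return [path.step(bit, write) for bit, write in events]


# Rewriting a 1 over a bit that is already 1:
# idle, write starts with a stale 0 on the bus, CPU drives 1, write ends.
REWRITE_ONE = [(0, False), (0, True), (1, True), (1, False)]

red = trace(LatchedPath(1), REWRITE_ONE)
blue = trace(AsyncPath(1), REWRITE_ONE)
print(red, blue)
```

In this model the latched path never leaves 1, while the asynchronous path dips to 0 for the sample where the bus is stale, which is the "sparkle" described above.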