CPU/PPU timing


blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: CPU/PPU timing

Post by blargg » Mon Apr 22, 2013 6:48 pm

Well, here's what I've got for now, just figuring out some kind of test framework to make these measurements.

I've got code that synchronizes with VBL, then has a STA ABS instruction execute as late as possible just before NMI is vectored, and also has NMI get vectored just before that instruction (same code; which of the two you get depends on the CPU-PPU alignment after reset). On the scope are /NMI going low and, to show what the CPU is doing, A8. The STA ABS is at an address with A8 high, and the instruction just before it (a JMP to it) is at an address with A8 low. The STA ABS stores into $00xx, so A8 goes low during the store. During NMI vectoring, A8 is high. So you can see what's happening. I figured an address line would assert sooner than anything else. If anyone has better ideas, I'm open to them.

In the pictures there are 1250ns/division (250ns/subdivision dots). A=A8 B=/NMI. Sorry about leaving the vertical cursor bars up.

In the first one, /NMI goes low just too late, so the STA ABS executes. A8 goes high about 612ns after /NMI goes low. It does three fetches with A8 high (opcode, two bytes of address), 1780ns, then low for 560ns, then high for 3920ns, then low (the NMI handler code has A8 low).

In the second one, /NMI goes low just in time, so you get 3920ns of vectoring with A8=high, then A8=low for NMI handler.
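
As a rough sanity check, the durations above can be converted to CPU cycles. The sketch below assumes an NTSC console (master clock of 21.477272 MHz, CPU cycle = 12 master clocks); neither figure is stated in this post, and the helper name is made up for illustration.

```python
# Assumption: NTSC master clock, CPU cycle = 12 master clocks.
MASTER_HZ = 21_477_272
CPU_CYCLE_NS = 12 / MASTER_HZ * 1e9   # about 558.7 ns

def ns_to_cpu_cycles(ns):
    """Convert a scope reading in ns to (fractional) CPU cycles."""
    return ns / CPU_CYCLE_NS

# Durations read off the scope in the post above.
readings = [
    ("STA ABS fetches, A8 high", 1780),
    ("store to $00xx, A8 low", 560),
    ("NMI vectoring, A8 high", 3920),
]
for label, ns in readings:
    print(f"{label}: {ns_to_cpu_cycles(ns):.2f} cycles")
```

Under those assumptions the 3920ns span comes out very close to 7 cycles, which fits the 6502's 7-cycle interrupt sequence, and the 560ns store is about 1 cycle. The 1780ns reading is a little over 3 cycles, so it should be taken as approximate.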

(For triggering this, I run the scope in one-shot digital capture mode, and trigger it using EXT TRIG connected to the $4016 read strobe with a BIT $4016 a little before these events)
[Attachments: NMI after.JPG, NMI before.JPG]
I've figured out how to capture the scope's RS-232 plotter output for better pictures (hp2xx FTW!), but need to get a DB-9 F-F null modem adapter before it's convenient.

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: CPU/PPU timing

Post by ulfalizer » Mon Apr 22, 2013 10:04 pm

I just figured out that you can make reads start at an exact master cycle in Visual 2C02 (see the updated http://wiki.nesdev.com/w/index.php/Visual_2C02 for how; the io_ce node can be used to confirm where the read starts). The duty cycle is the same as the 2A03's, and it assumes the value is sampled at the end of the read cycle.

There seem to be four cycles where the read can start for which the interrupt line and the VBL flag setting get suppressed completely, so for those it's probably safe to say that no NMI would happen on the real thing. For other start cycles INT/VBL rises momentarily, and those might be trickier I guess. Maybe this info could be combined with some of the earlier stuff to figure out what's going on.

Brain dump below with results from Visual 2C02. The numbers at the top are PPU dots on the line after the post-render line. The 0's are the first phase of the read, the 1's the second (the duty cycle is 5/8).

Code: Select all

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
000000000111111111111111                                  CPU      Reads as clear
                          ----------------------------... VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
  000000000111111111111111                                CPU      Reads as clear
                          ----------------------------... VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
    000000000111111111111111                              CPU      Reads as clear
                            --------------------------... VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
      000000000111111111111111                            CPU      Reads as clear
                              ------------------------... VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
        000000000111111111111111                          CPU      Reads as clear
                                ----------------------... VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
          000000000111111111111111                        CPU      Reads as clear, INT and VBL completely suppressed 
                                                          VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
            000000000111111111111111                      CPU      Reads as clear, INT and VBL completely suppressed 
                                                          VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
              000000000111111111111111                    CPU      Reads as clear, INT and VBL completely suppressed
                                                          VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                000000000111111111111111                  CPU      Reads as clear, INT and VBL completely suppressed 
                                                          VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                  000000000111111111111111                CPU      Reads as set (note the short INT assertion)
                          -                               VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                    000000000111111111111111              CPU      Reads as set
                          ---                             VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                      000000000111111111111111            CPU      Reads as set
                          -----                           VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                        000000000111111111111111          CPU      Reads as set
                          -------                         VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                          000000000111111111111111        CPU      Reads as set
                          ---------                       VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                            000000000111111111111111      CPU      Reads as set
                          -----------                     VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                              000000000111111111111111    CPU      Reads as set
                          -------------                   VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                                000000000111111111111111  CPU      Reads as set
                          ---------------                 VBL/INT
Some kind of buffering is used in the PPU for the value "returned" by the read, which is why it can be read as set even though INT drops really quickly.

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: CPU/PPU timing

Post by ulfalizer » Tue Apr 23, 2013 5:45 am

Taking just the ticks from the "preferred" alignment, you get the following:

Code: Select all

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
  000000000111111111111111                                CPU      Reads as clear
                          ----------------------------... VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
          000000000111111111111111                        CPU      Reads as clear, INT and VBL completely suppressed
                                                          VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                  000000000111111111111111                CPU      Reads as set (note the short INT assertion)
                          -                               VBL/INT

  -2      -1      0       1       2       3       4
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * MASTER
                          000000000111111111111111        CPU      Reads as set
                          ---------                       VBL/INT
I'm guessing that those correspond, in order, to 2 before, 1 before, at, and 1 after, as specified in http://wiki.nesdev.com/w/index.php/PPU_frame_timing and the vbl_nmi_timing tests. If the NMI input is sampled near the end of a CPU cycle (not sure if it is), then it makes sense that the last two would miss the NMI even though it momentarily rises. Later starting points would have parts of the NMI assertion overlapping the end of the previous CPU cycle however, and so would not be missed.

Edit: I'm assuming this alignment corresponds to your 742. Is that correct? Just realized it's a bit tricky to see whether this one would be 742 or 743 in http://i.imgur.com/nq78U8I.gif . :?

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: CPU/PPU timing

Post by blargg » Thu Apr 25, 2013 9:45 am

I've done several new tests and am finishing them up before posting the readings. I'm forming a picture of the meta-situation here that isn't pleasing. I'm thinking there's no clear "this happens at this moment" situation, especially since the CPU can be at any master clock offset with respect to the PPU. A master clock is less than 50ns, so a few gate propagation delays can mean the difference between something being read on one cycle or another. Because of the buffers/counters between the clock and what the CPU reads/writes, timing is spread across cycles and dots. I think this means that there won't be a clear "this happens on this cycle, that happens on that", unless you're emulating to master clocks and emulating all four CPU-PPU alignments. Even then, you'll have to implement delays, where something can happen but not be visible until a master clock later or so.
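
To illustrate why there are exactly four alignments and why the alignment doesn't drift once set: assuming NTSC dividers (CPU cycle = 12 master clocks, PPU dot = 4 master clocks, which this post doesn't state), a CPU cycle is a whole number of dots, so the master-clock phase the CPU lands on at reset stays fixed. A minimal sketch with made-up names:

```python
CPU_DIV = 12   # master clocks per CPU cycle (NTSC assumption)
PPU_DIV = 4    # master clocks per PPU dot (NTSC assumption)

def cpu_cycle_phases(alignment, n_cycles=5):
    """Master-clock phase within a dot at which each CPU cycle starts."""
    return [(alignment + i * CPU_DIV) % PPU_DIV for i in range(n_cycles)]

# One row per possible alignment; the phase never changes within a row.
for alignment in range(PPU_DIV):
    print(alignment, cpu_cycle_phases(alignment))
```

An emulator that wants to cover all cases would then have to pick one of these four phases at reset and keep it for the whole run.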

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: CPU/PPU timing

Post by ulfalizer » Thu Apr 25, 2013 10:11 am

blargg wrote: I've done several new tests and am finishing them up before posting the readings. I'm forming a picture of the meta-situation here that isn't pleasing. I'm thinking there's no clear "this happens at this moment" situation, especially since the CPU can be at any master clock offset with respect to the PPU. A master clock is less than 50ns, so a few gate propagation delays can mean the difference between something being read on one cycle or another. Because of the buffers/counters between the clock and what the CPU reads/writes, timing is spread across cycles and dots. I think this means that there won't be a clear "this happens on this cycle, that happens on that", unless you're emulating to master clocks and emulating all four CPU-PPU alignments. Even then, you'll have to implement delays, where something can happen but not be visible until a master clock later or so.
What I'm doing above is picking a particular alignment for which the behavior is known (742 I believe - would be nice to have this confirmed), checking the results you get for different PPU clock offsets near a point (or region perhaps) of interest with that alignment in Visual 2C02 (VBL setting and INT assertion in this case), and then doing some elimination to figure out what these offsets correspond to ("1 before", "at", "1 after", etc.). Of course you can only be completely sure by testing on the real hw, but I think you could still come up with some pretty good guesses.

Barring me having messed up, there doesn't seem to be any other way to assign 1 before, etc., with the 742 alignment that would be likely to produce the observed behavior, even taking propagation delays and such into account.

Edit: s/master clock offsets/PPU clock offsets/

Edit 2: Calling it "1 before", etc. is a bit arbitrary. It's not a point, as you say, and depends on CPU/PPU interaction around the region of interest. I'm basically just trying to figure out what happens when a read starts at PPU tick n for the 742 alignment at the moment.

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: CPU/PPU timing

Post by blargg » Wed May 01, 2013 6:00 pm

OK, finally posting a lot of timing test results. Not complete by any measure, but if I don't post they might get lost in the shuffle of life. Pictures and some docs: nes-signal-timings.zip

The goal here is to get a clearer picture of when things happen and how these correspond to the various test ROMs and timings we see from a purely programming perspective.

* terms - lays out basic terms, the four alignments, why they aren't trivial, how we can synchronize to the PPU frame, and a common timing reference in software to use
* timing pictures - descriptions of all the scope pictures showing various timings
* CPU cycle timing - distillation of the timings into one diagram

One thing that is wanted is a picture of how CPU cycles correspond to PPU dots. It's still not clear when those begin.

I believe that for the CPU, a cycle may be considered as beginning/ending on the falling edges of PHI2. I've used the CPU address line changes as a 0 reference, since it's useful for showing which cycle is of interest. I measured PHI2's falling edge to be 60ns before the address line change, and have read that this might be the "proper" beginning of a CPU cycle.

Do we have verification that /VBL goes low on the first dot of a frame? If so, then we have enough to form a clear picture of the timings for the four alignments.

The PPU apparently latches internally when reading from it, so the CPU sees the state of things around the time PPU /CE is asserted. For example, even though /VBL goes high during the read, D7 to the CPU doesn't change, indicating that the PPU has a latch for outputs when reading from it.

For what I'm calling the +0 clocks alignment (in the previously posted list, the last is +0 clocks, the next-to-last +1 clocks, etc.), we have CPU A8 going low for the access at +0ns, then at +163ns PPU /CE going low, presumably latching the VBL flag internally. If the CPU had read even one clock sooner, it wouldn't have found VBL set. So if VBL is set on the first dot, then we have a dot beginning around +163ns, and the CPU cycle at +0ns. A PPU dot is 186.24ns, so this is slightly less than a dot. There's probably a little propagation delay from the dot beginning to VBL getting set internally, so the dot probably begins a little sooner.

Code: Select all

   ns -------          11111111112222222222333333333344444444445555555555
      7654321012345678901234567890123456789012345678901234567890123456789
      _______                                                         ___
A0-A15_______\_______________________________________________________/___
             0                                                      559
      _                     _________________________________
PHI2   \___________________/                                 \___________
      -60                 135                               487
          ___________________                                     _______
PPU /CE                      \___________________________________/
                            163                                 517

dots 0clk[                 ][VBL set         ][                 ][
dots 1clk             ][VBL set         ][                 ][
dots 2clk         ][VBL set         ][                 ][                ]
dots 3clk    ][VBL set         ][                 ][                ]
cpu a8      ][              Read                                    ][
  phi2 [                    Read                              ][
So there are the probable timings, with the two interpretations of a CPU cycle shown.

Note that if we examine only CPU reads, it's irrelevant where the CPU cycle is, since all the reads occur at the same point. The important thing is that an alignment of +0 clocks corresponds to the CPU reading the VBL flag the earliest possible in the frame.

I haven't looked as much at writes. I'm guessing that they occur on the rising edge of the /CE line, once the data lines have stabilized. So that would put them about 352ns after reads, or about 7.6 clocks later, slightly less than 2 dots.
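
The unit conversions in this post can be checked with a few lines. This is only a sketch: the NTSC master clock value of 21.477272 MHz is an assumption (it isn't given above, though it agrees with the 186.24ns dot), and the helper names are invented.

```python
MASTER_HZ = 21_477_272        # NTSC master clock (assumption)
CLOCK_NS = 1e9 / MASTER_HZ    # one master clock, about 46.56 ns
DOT_NS = 4 * CLOCK_NS         # one PPU dot, about 186.24 ns

def in_clocks(ns):
    """Express a duration in master clocks."""
    return ns / CLOCK_NS

def in_dots(ns):
    """Express a duration in PPU dots."""
    return ns / DOT_NS

# /CE going low after the address change, and the guessed read-to-write gap.
for label, ns in [("PPU /CE low", 163), ("write after read", 352)]:
    print(f"{label}: {ns} ns = {in_clocks(ns):.2f} clocks = {in_dots(ns):.2f} dots")
```

With these numbers 163ns is about 3.5 master clocks (a bit under one dot), and 352ns is about 7.6 master clocks (a bit under two dots), consistent with the figures quoted above.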

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: CPU/PPU timing

Post by ulfalizer » Wed May 22, 2013 11:48 am

About to look into this some more. I poked around a bit in Visual 2A03/2C02. Here's some stuff that might be useful before I forget it:

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: CPU/PPU timing

Post by ulfalizer » Thu Jun 06, 2013 9:22 am

I've figured out roughly what goes on with the VBlank in the PPU that causes the weird reading behavior.

Inside the PPU, there are two key players when it comes to signal timing:
  1. Read/write signals like /r2002 and /w2001.
  2. The _io_ce signal.
The read/write signals are generated directly by feeding the r/w line and the three address bus lines from the CPU into a decoder (around node 419 in Visual 2C02). The r/w line and the address bus are set at the beginning of a read/write CPU cycle, so the read/write signals will have the corresponding values without delay as soon as the read or write starts (ignoring propagation delays - not sure if they're significant here).

The _io_ce line comes from the address decoder (the chip marked "LS139" in http://wiki.nesdev.com/w/images/f/f3/Neswires.jpg). The address decoder is set up so that the chip enable signals are generated during the second, high phase of M2 (it basically has an "AND M2" condition on all the outputs, though some are inverted). The /DBE input to the PPU in the diagram corresponds to _io_ce, and is also inverted, so that _io_ce will be low during the second, high phase of M2. M2 has a modified non-50% duty cycle (see http://wiki.nesdev.com/w/index.php/CPU_ ... escription).

To summarize: For a read/write cycle from the CPU, the internal read/write signals in the PPU will get their values right away, while _io_ce will go low during the second (high) phase of M2.

Inside the PPU, the same signal, read_2002_output_vblank_flag, is used both to clear the VBlank flag and to hold the value of the read buffer (see the tutorial). While the read buffer is not held (closed off), it simply mirrors the VBlank flag. The relationship between the signals is

Code: Select all

read_2002_output_vblank_flag = /r2002 NOR _io_ce
This means that read_2002_output_vblank_flag goes high during the second phase of M2 during a read from $2002. Since this is the point where the value is held, we can make the following observation:
  • The value returned by the read is the value the VBlank flag has when the high phase of M2 starts.
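
Since both inputs are active-low, it may help to spell the NOR out as a truth table. The sketch below just evaluates the relation given above; the Python function and parameter names are made up for illustration.

```python
def nor(a, b):
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))

def read_2002_output_vblank_flag(n_r2002, n_io_ce):
    # n_r2002: /r2002, low (0) while the CPU is reading $2002
    # n_io_ce: _io_ce, low (0) during the high phase of M2
    return nor(n_r2002, n_io_ce)

print("/r2002 _io_ce -> read_2002_output_vblank_flag")
for n_r2002 in (0, 1):
    for n_io_ce in (0, 1):
        out = read_2002_output_vblank_flag(n_r2002, n_io_ce)
        print(f"   {n_r2002}      {n_io_ce}   ->  {out}")
```

The output is high only when both inputs are low, i.e. during the M2-high part of a $2002 read.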
To figure out the rest, there's one more piece needed: The signal that sets the VBlank flag (node 5807 - let's call it "set_vbl_flag") is high during the first PPU half-cycle (i.e., during pclk0) of vpos=241/hpos=1. This means that if a read ends before the end of set_vbl_flag, the VBlank flag will still get set and remain set. However, if a read ends after set_vbl_flag (e.g. in a case where the high phase of M2 completely overlaps set_vbl_flag), then the flag is held clear for the whole time it is being set and afterwards, so the setting is suppressed.

The NMI output is directly tied to the VBlank flag (it just has an additional "AND nmi_enabled" condition). Looking at the above, you can see how an NMI might be missed, as the high "clearing" phase of M2 might completely override set_vbl_flag, and thus the NMI.
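
The mechanism described above can be turned into a toy model and compared against the Visual 2C02 table posted earlier in the thread. This is a sketch under stated assumptions, not a netlist-accurate simulation: time is counted in the character columns of those diagrams, M2 is low for 9 columns and high for 15, set_vbl_flag starts at column 26 (the start of dot 1), and clearing wins over setting while M2 is high. The pulse width `SET_WIDTH = 8` was chosen only because it reproduces the table (7 works too); it is not derived from the netlist and may not correspond directly to the pclk0 width. All names are invented.

```python
READ_LOW = 9     # columns of M2 low at the start of the read (from the diagrams)
READ_HIGH = 15   # columns of M2 high
SET_START = 26   # column where dot 1 begins in the diagrams
SET_WIDTH = 8    # ASSUMPTION: fitted to the table, not taken from the netlist

def simulate(start, n_cols=58):
    """Return (value_read, columns_with_flag_high) for a $2002 read at `start`."""
    flag = 0
    value = None
    high_cols = []
    hi_begin = start + READ_LOW
    hi_end = hi_begin + READ_HIGH
    for t in range(n_cols):
        if hi_begin <= t < hi_end:
            if t == hi_begin:
                value = flag   # read buffer holds the flag as M2 goes high
            flag = 0           # clearing wins over setting while M2 is high
        elif SET_START <= t < SET_START + SET_WIDTH:
            flag = 1           # set_vbl_flag pulse, only effective outside the clear
        if flag:
            high_cols.append(t)
    return value, high_cols

for start in range(0, 34, 2):
    value, cols = simulate(start)
    state = "set" if value else "clear"
    note = f"flag high for {len(cols)} columns" if cols else "flag never rises"
    print(f"start={start:2d}: reads as {state}, {note}")
```

With these parameters the model gives "clear" for starts 0 through 16, complete suppression for starts 10 through 16, and "set" with a flag pulse of 1, 3, 5, ... columns from start 18 on, which is the pattern in the table. That agreement only shows the toy is consistent with the posted results, not that the real chip works this way.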

It's a bit confusing, but clicking around a bit helps. I also put some notes (mostly for myself) at http://wiki.nesdev.com/w/index.php/User:Ulfalizer .

Sorry for not looking into the timing stuff you posted yet btw, but attacking the issue from two angles is probably a good idea at least. :)

Maybe read_2002_output_vblank_flag should be renamed to something like "read_2002_clear_vblank_flag_and_hold_value" by the way.
