Instruction timings

Discussion of hardware and software development for Super NES and Super Famicom.

LilaQ
Posts: 25
Joined: Mon Sep 02, 2019 3:28 pm

Instruction timings

Post by LilaQ » Tue Sep 08, 2020 10:03 am

Hi guys,

I'm currently working on an SNES emulator, and I have now come across the information that instructions take different amounts of time depending on the memory they access.

I have seen the table from no$, and if I'm not mistaken, a memory access to fastROM takes 6 master cycles and an access to slowROM takes 8 master cycles.

I can't really make sense of how and where this fits together with the CPU cycles (I have developed GB, NES and C64 emulators so far, but haven't come across this scenario). Maybe it's the language barrier; I hope one of you can help me out here.

My CPU (which passes all of krom's/PeterLemon's tests) is based on undisbeliever's page: https://undisbeliever.net/snesdev/65816-opcodes.html
I implemented all of the CPU-cycle timings, including the extra cycles for branches, page crossings, a nonzero direct page low byte (DP.l != 0), etc., so I know how long an instruction takes in terms of CPU cycles.

But where and how do the memory accesses and their different timings come into play? Could someone give me 2-3 examples so I can make sense of it?

I hope I was able to get my problem across, sometimes it's hard to put it into words.


Thank you so much in advance!

LilaQ

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Instruction timings

Post by none » Tue Sep 08, 2020 11:28 am

There's a good document here that has a paragraph about it:

https://wiki.superfamicom.org/timing
For specifics on particular instructions, see any generic 65816 doc. The GTE datasheet is particularly nice, as it identifies the CPU activity for each cycle of the instruction.

To determine the exact length of any CPU instruction, you must examine its behavior for each cycle, and count 6, 8, or 12 master cycles as appropriate.
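A minimal Python sketch of that counting rule, assuming you already know which address each CPU cycle of an instruction touches. The speed-class names, `SPEED` and `instruction_master_cycles` are hypothetical helpers, and DMA and DRAM refresh stalls are ignored.

```python
# Master cycles per 65c816 bus cycle, by the speed class of the address
# (6 = fast, 8 = slow, 12 = extra slow). Internal operation (IO) cycles
# don't use the bus and count as 6.
SPEED = {"fast": 6, "slow": 8, "xslow": 12, "io": 6}


def instruction_master_cycles(cycle_classes):
    """Sum the master-cycle cost of one instruction, one speed class per CPU cycle."""
    return sum(SPEED[c] for c in cycle_classes)


# Hypothetical example: LDA $1000 (absolute addressing, 8-bit A) with DBR = $00,
# so the data read hits the WRAM mirror at $00:1000 (slow, 8 master cycles).
# Cycles: opcode fetch, address low byte, address high byte, data read.
lda_abs_from_fastrom = ["fast", "fast", "fast", "slow"]
lda_abs_from_slowrom = ["slow", "slow", "slow", "slow"]

print(instruction_master_cycles(lda_abs_from_fastrom))  # 26
print(instruction_master_cycles(lda_abs_from_slowrom))  # 32
```

The same 4-cycle instruction costs 26 or 32 master cycles depending only on where its code bytes live.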

The WAI instruction stops the processor. The processor restarts when either the /NMI or /IRQ line is low (or /RESET, but we don't care about that too much). It takes 12 master cycles (2 IO cycles) to end the WAI instruction, at which point the NMI or IRQ handler may actually be executed.
You can find a table that breaks down the instructions here:

https://archive.org/details/vl65c816dat ... 0/mode/1up

It has "Table 6 - Detailed Instruction Operation" on p. 267. Look at the "Data Bus" column there; it should help you identify what kind of cycle you are dealing with.

creaothceann
Posts: 253
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Instruction timings

Post by creaothceann » Tue Sep 08, 2020 11:41 am

As far as I know...

See the memmap.txt file here for the memory regions and their speed(s): https://www.romhacking.net/?page=docume ... csearch=Go
timing.txt explains, well, the timing. The SNES mainboard clock (X1) runs at 5*7*9/88 * 6 * 1,000,000 Hz.
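A quick check of that arithmetic in Python: the formula works out to about 21.477 MHz, and dividing it by 6, 8 or 12 gives the effective 65c816 clock for fast, slow and extra-slow accesses (roughly 3.58, 2.68 and 1.79 MHz). `MASTER_HZ` is just an illustrative name.

```python
from fractions import Fraction

# X1 master clock from timing.txt: 5*7*9/88 * 6 * 1,000,000 Hz (kept exact)
MASTER_HZ = Fraction(5 * 7 * 9, 88) * 6 * 1_000_000

print(f"master clock: {float(MASTER_HZ) / 1e6:.6f} MHz")

# Effective CPU clock if every bus cycle took the same number of master cycles
for master_cycles in (6, 8, 12):
    mhz = float(MASTER_HZ / master_cycles) / 1e6
    print(f"{master_cycles:2d} master cycles per CPU cycle -> {mhz:.4f} MHz")
```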

The mainboard CPU of the SNES is the Ricoh 5A22 (die shot). It has a WDC 65c816 core (die shot) and surrounds it with the circuits responsible for the $4xxx registers, e.g. the DMA engine.

The 5A22 controls the timing of the 65c816. Depending on what value the 65c816 puts on its address bus, it holds the 65c816's clock input line for 6, 8 or 12 X1 cycles. Then you can derive the 65c816's timing from that: a simple opcode like CLI takes two "CPU cycles", one for loading the opcode byte (6-12 "master cycles" depending on where it is loaded from) and one "internal operation" CPU cycle (6 master cycles). The 65c816 indicates via output pins if it's accessing the current byte as program code, data, or if it ignores the data bus. If you look at page 46 of the 65c816 data sheet you can see that CLI has an "IO" cycle, in which the VDA and VPA ("valid data/program address") output pins are zero. The 5A22 uses that info and doesn't add additional bus hold delay cycles to the bus access.
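A minimal sketch of that per-cycle decision, assuming the caller already knows the speed (6, 8 or 12) of the address on the bus. `bus_cycle_cost` and `cli_cost` are hypothetical names; the actual 5A22 circuit is not documented.

```python
def bus_cycle_cost(vda, vpa, region_speed):
    """Master cycles for one 65c816 bus cycle.

    vda, vpa: the VDA and VPA output pins for this cycle (0 or 1).
    region_speed: 6, 8 or 12, the speed of the address on the bus.
    With both pins at 0 the cycle is an internal operation (IO),
    which costs 6 master cycles regardless of the address.
    """
    if not vda and not vpa:
        return 6
    return region_speed


def cli_cost(opcode_region_speed):
    """CLI = opcode fetch (VDA=1, VPA=1) + one IO cycle (VDA=0, VPA=0)."""
    return (bus_cycle_cost(1, 1, opcode_region_speed)
            + bus_cycle_cost(0, 0, opcode_region_speed))


print(cli_cost(6))  # opcode fetched from FastROM: 12
print(cli_cost(8))  # opcode fetched from SlowROM: 14
```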


EDIT: was too slow :)
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10

LilaQ
Posts: 25
Joined: Mon Sep 02, 2019 3:28 pm

Re: Instruction timings

Post by LilaQ » Tue Sep 08, 2020 12:19 pm

Ahhh, awesome, that explains it a lot better; now I can actually make sense of it. So I could actually make a state machine out of it.

How exactly do the VDA and VPA pins relate to the hold delay, though? From your comment I guess 0/0 always makes the current subinstruction last 6 master clock cycles, but the opcode fetch of CLI, for example, has 1/1. What exactly happens there? How does the 5A22 know whether it has to wait 6 or 8 master clock cycles?

Thanks in advance, you have already helped me a bunch! :)

creaothceann
Posts: 253
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Instruction timings

Post by creaothceann » Tue Sep 08, 2020 1:16 pm

LilaQ wrote:
Tue Sep 08, 2020 12:19 pm
How exactly do the VDA and VPA pins relate to the hold delay, though? From your comment I guess 0/0 always makes the current subinstruction last 6 master clock cycles, but the opcode fetch of CLI, for example, has 1/1. What exactly happens there? How does the 5A22 know whether it has to wait 6 or 8 master clock cycles?
That's where the bits in the address bus get involved... specific bits are combined in various ways. Here's the memory map converted to bits (corrections welcome):

Code:

---------------------------------------------------------------------
bank            offset                  cycles          notes
---------------------------------------------------------------------
00xxxxxx        000xxxxxxxxxxxxx        8               WRAM mirror
                -----------------------------------------------------
                001xxxxxxxxxxxxx        6               $2000-$3FFF
                -----------------------------------------------------
                0100000xxxxxxxxx        12              $4000-$41FF
                -----------------------------------------------------
                0100001000000000-       6               $42xx / bus A
                0101111111111111
                -----------------------------------------------------
                011xxxxxxxxxxxxx        8               $6000-$7FFF
                -----------------------------------------------------
                1xxxxxxxxxxxxxxx        8               $8000-$FFFF
---------------------------------------------------------------------
01000000-       xxxxxxxxxxxxxxxx        8               $40-$7D:xxxx
01111101
---------------------------------------------------------------------
0111111x        xxxxxxxxxxxxxxxx        8               WRAM
---------------------------------------------------------------------
#####################################################################
---------------------------------------------------------------------
10xxxxxx        000xxxxxxxxxxxxx        8               WRAM mirror
                -----------------------------------------------------
                001xxxxxxxxxxxxx        6               $2000-$3FFF
                -----------------------------------------------------
                0100000xxxxxxxxx        12              $4000-$41FF
                -----------------------------------------------------
                0100001000000000-       6               $42xx / bus A
                0101111111111111
                -----------------------------------------------------
                011xxxxxxxxxxxxx        8               $6000-$7FFF
                -----------------------------------------------------
                1xxxxxxxxxxxxxxx        6/8             $8000-$FFFF
---------------------------------------------------------------------
11xxxxxx        xxxxxxxxxxxxxxxx        6/8             cartridge
---------------------------------------------------------------------
Each bit can be used as an input to a logic gate, and the outputs from several gates can in turn be used as inputs to other gates. (For example bit 7 of the bank address is somehow combined with bit 0 of register $420D.) How exactly the bus delays were implemented is unknown; we'd need someone who can take apart the layers of the chips and read the circuits.
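To make the table concrete, here is a Python sketch with two versions of the same lookup: `table_speed` follows the address ranges directly, and `access_speed_gates` derives the same result from single address bits combined AND/OR-style. The gate version is only one hypothetical way to express the table, not the real 5A22 circuit (which, as said above, is unknown); `fastrom` stands for bit 0 of $420D. The final loop checks that both agree on the first and last byte of every 256-byte page in every bank.

```python
def table_speed(bank, offset, fastrom):
    """Master cycles per access, following the address ranges in the table."""
    rom = 6 if fastrom else 8            # the "6/8" entries depend on $420D bit 0
    if 0x40 <= bank <= 0x7F:             # $40-$7D and WRAM $7E-$7F
        return 8
    if bank >= 0xC0:                     # cartridge
        return rom
    # Banks $00-$3F and $80-$BF share this layout below $8000
    for lo, hi, speed in ((0x0000, 0x1FFF, 8),    # WRAM mirror
                          (0x2000, 0x3FFF, 6),
                          (0x4000, 0x41FF, 12),
                          (0x4200, 0x5FFF, 6),
                          (0x6000, 0x7FFF, 8)):
        if lo <= offset <= hi:
            return speed
    return rom if bank >= 0x80 else 8    # $8000-$FFFF


def bit(value, n):
    return (value >> n) & 1


def access_speed_gates(address, fastrom):
    """Same lookup built from single address bits (hypothetical, not the real circuit)."""
    a23, a22, a15 = bit(address, 23), bit(address, 22), bit(address, 15)
    a14, a13 = bit(address, 14), bit(address, 13)
    a12_to_a9 = bit(address, 12) | bit(address, 11) | bit(address, 10) | bit(address, 9)

    if a22 | a15:                        # banks $40-$7F/$C0-$FF, or offset >= $8000
        return 6 if (a23 and fastrom) else 8
    if a14 == a13:                       # $0000-$1FFF or $6000-$7FFF
        return 8
    if a14 and not a12_to_a9:            # $4000-$41FF
        return 12
    return 6                             # $2000-$3FFF or $4200-$5FFF


def mismatches():
    """Compare both versions on the first and last byte of every page in every bank."""
    bad = []
    for fastrom in (False, True):
        for bank in range(0x100):
            for page in range(0x100):
                for offset in (page << 8, (page << 8) | 0xFF):
                    address = (bank << 16) | offset
                    if access_speed_gates(address, fastrom) != table_speed(bank, offset, fastrom):
                        bad.append((hex(address), fastrom))
    return bad


print(mismatches())  # []
```

Note how bank bit 7 (`a23`) only matters together with `fastrom`, which is the combination with $420D mentioned in the post.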
