Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

Post by tepples »

While trying to compose a reply to "Proper way to emulate DMA transfers", I thought of something:

The Super NES master clock is a 945/44 = 21.47 MHz crystal oscillator, making each cycle 1000/(945/44) = 46.56 ns long. A DMA copy takes 8 master clocks (372.5 ns) per byte, putting separate addresses on the A and B address buses but sending the read value over one data bus and holding it there while the other device accepts the write. Cartridge ROM is on the A bus, and WRAM in the Control Deck can be accessed through either bus (but not both simultaneously). But WRAM is slow memory, and cartridge memory is also potentially slow (200 ns nominal). So how can DMA from slow ROM to slow WRAM work? Wouldn't the 200 ns response time of cartridge ROM plus the 200 ns hold time of WRAM exceed the 372 ns DMA cycle?
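The arithmetic behind the question can be checked with a short Python sketch. The variable names are illustrative only, and the two 200 ns figures are the nominal values assumed in the question, not measurements.

```python
from fractions import Fraction

# Master clock frequency in MHz: 945/44, about 21.477 MHz.
MASTER_MHZ = Fraction(945, 44)

# One master clock period in nanoseconds: 1000 / (945/44).
MASTER_NS = Fraction(1000) / MASTER_MHZ

# DMA moves one byte every 8 master clocks.
DMA_BYTE_NS = 8 * MASTER_NS

# Naive sequential budget from the question (both values are assumptions).
ROM_NS = 200   # nominal slow ROM response time
WRAM_NS = 200  # supposed WRAM hold time, the premise being questioned

print(f"master period: {float(MASTER_NS):.2f} ns")
print(f"DMA byte time: {float(DMA_BYTE_NS):.2f} ns")
print(f"sequential sum exceeds DMA slot: {ROM_NS + WRAM_NS > DMA_BYTE_NS}")
```

Using Fraction keeps the 945/44 ratio exact, so rounding only happens when the values are printed.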
niconii
Posts: 219
Joined: Sun Mar 27, 2016 7:56 pm

Re: Timing of slow-to-slow DMA: How is 372 ns divided?

Post by niconii »

Access to WRAM is 2.6 MHz via the A bus, but 3.5 MHz through the IO port.
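As a rough check, those two rates match the master clock divided by 8 and by 6. The sketch below assumes that "2.6" and "3.5" are truncations of 2.68 MHz and 3.58 MHz; the function name is made up for illustration.

```python
from fractions import Fraction

# Master clock frequency in MHz (945/44, about 21.477 MHz).
MASTER_MHZ = Fraction(945, 44)

def access_rate_mhz(master_clocks_per_access):
    """Access rate in MHz for a given number of master clocks per access."""
    return MASTER_MHZ / master_clocks_per_access

a_bus_rate = access_rate_mhz(8)    # WRAM through the A bus (slow timing)
io_port_rate = access_rate_mhz(6)  # WRAM through the I/O port (fast timing)

print(f"A bus:    {float(a_bus_rate):.2f} MHz")
print(f"I/O port: {float(io_port_rate):.2f} MHz")
```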
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Timing of slow-to-slow DMA: How is 372 ns divided?

Post by tepples »

If WRAM is capable of fast access through the I/O port, then why does the 5A22's memory controller treat it as slow when accessed through the A bus? Is there some sort of internal prefetch to let it respond fast to I/O port access?
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

Post by nocash »

tepples wrote:Wouldn't the 200 ns response time of cartridge ROM plus the 200 ns hold time of WRAM exceed the 372 ns DMA cycle?
The math isn't 200ns+200ns=400ns. The read cycle & write cycle do occur simultaneously, so two 200ns chips communicating with each other should only need 200ns in total. Ie. the data is transferred directly from ROM-to-RAM (not from ROM-to-CPU, and then from CPU-to-RAM).

Or the other way around: Theoretically, a 350ns ROM should be fast enough for the SNES... but, for some reason, Nintendo has specified 200ns for slow ROM (and 120ns for fast ROM). Maybe the read cycle & write cycle do only partly overlap each other, or maybe there's some pause between the separate bytes.
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: Timing of slow-to-slow DMA: How is 372 ns divided?

Post by nocash »

tepples wrote:If WRAM is capable of fast access through the I/O port, then why does the 5A22's memory controller treat it as slow when accessed through the A bus? Is there some sort of internal prefetch to let it respond fast to I/O port access?
For the Why: That's a mystery.
For the Prefetch: I would have thought so, too, but I think there's no such thing. As far as I remember, I tested it by setting the WRAM address via I/O ports (which would load the prefetch), then changing the addressed byte by writing a NEW value via non-I/O, and then reading the byte via I/O, which returned the NEW value, i.e. no OLD prefetched value.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

Post by tepples »

nocash wrote:
tepples wrote:Wouldn't the 200 ns response time of cartridge ROM plus the 200 ns hold time of WRAM exceed the 372 ns DMA cycle?
The math isn't 200ns+200ns=400ns. The read cycle & write cycle do occur simultaneously, so two 200ns chips communicating with each other should only need 200ns in total. Ie. the data is transferred directly from ROM-to-RAM (not from ROM-to-CPU, and then from CPU-to-RAM).
But the memory controller still has to somehow tell the RAM that the value from ROM has settled on the RAM's data inputs and is ready to be written. According to this article, "setup time" is how long the data needs to be valid before the write enable is set, and "hold time" is how long the data needs to be valid after the write enable is set. If there's a buffer for the written value inside the WRAM, then the hold time is short, and actual completion of the write can overlap the next read from ROM. Otherwise, the data on the data bus has to remain stable for the whole write. It's sort of a latch vs. flip-flop thing: a latch's hold time is the entire time that write enable is asserted, while a flip-flop's hold time is only a few ns after write enable starts to be asserted.

I know the WRAM is a custom part in order to allow mirrored access and B bus access. But is there a datasheet for a similar contemporary COTS DRAM that specifies its setup and hold time characteristics?
nocash wrote:Or the other way around: Theoretically, a 350ns ROM should be fast enough for the SNES... but, for some reason, Nintendo has specified 200ns for slow ROM (and 120ns for fast ROM).
That's because when a 6502 family CPU is running (forget DMA for a moment), it takes up to half a cycle for the new address to settle on the address bus. So on a fast cycle, you get 3 master clocks (140 ns) for the new address, followed by 3 master clocks for the memory to put a value on the data bus. On a slow cycle, you get 3 master clocks for the new address, followed by 5 master clocks (232 ns) for the memory to put a value on the data bus. Subtract a margin of error to get 120 and 200 ns.
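The split described above can be worked out per cycle type. This is a minimal sketch assuming the 3-master-clock address phase from the paragraph above; cycle_budget is a hypothetical helper, and the 120/200 ns ratings are the figures quoted in this thread.

```python
from fractions import Fraction

# One master clock period in ns: 1000 / (945/44) = 44000/945.
MASTER_NS = Fraction(44000, 945)

def cycle_budget(total_clocks, address_clocks=3):
    """Return (address_ns, data_ns) for a CPU cycle of total_clocks master clocks."""
    data_clocks = total_clocks - address_clocks
    return address_clocks * MASTER_NS, data_clocks * MASTER_NS

for name, clocks, rated_ns in (("fast", 6, 120), ("slow", 8, 200)):
    address_ns, data_ns = cycle_budget(clocks)
    margin_ns = data_ns - rated_ns
    print(f"{name}: address {float(address_ns):.1f} ns, "
          f"data {float(data_ns):.1f} ns, "
          f"margin over {rated_ns} ns rating: {float(margin_ns):.1f} ns")
```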
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

Post by lidnariq »

tepples wrote:"setup time" is how long the data needs to be valid before the write enable is set, and "hold time" is how long the data needs to be valid after the write enable is set.
Brief pedantic moment: "setup time" is how long the data must be valid before the write condition ends (e.g. rising edge of /WR), and "hold time" is how long it must be valid after the write condition ends.
tepples wrote:I know the WRAM is a custom part in order to allow mirrored access and B bus access. But is there a datasheet for a similar contemporary COTS DRAM that specifies its setup and hold time characteristics?
For no good reason I have an anti-static bag full of 30-pin SIMMs. One of them contains two KM44C256AP FPM DRAMs which should be contemporary; I had no difficulty finding its datasheet. It appears to have negligible setup time, and a very quick (20ns) hold time.
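To make the setup/hold definition concrete, here is a minimal sketch that checks a write against the figures above (negligible setup, 20 ns hold). The function name and every timestamp in the example are hypothetical, chosen only to illustrate the check.

```python
def write_is_safe(data_valid_from, data_valid_until, write_end,
                  setup_ns=0.0, hold_ns=20.0):
    """Check setup and hold around the end of the write condition.

    All times are in ns from the start of the transfer slot.
    Setup is measured before write_end, hold is measured after it.
    """
    setup_met = write_end - data_valid_from >= setup_ns
    hold_met = data_valid_until - write_end >= hold_ns
    return setup_met and hold_met

# Hypothetical timeline: data valid from 340 ns, bus held until 372 ns.
ok = write_is_safe(340.0, 372.0, write_end=350.0)    # 22 ns of hold left
late = write_is_safe(340.0, 372.0, write_end=360.0)  # only 12 ns of hold left

print(ok, late)
```

With a short hold requirement, the write can end late in the slot and still leave most of the slot for the ROM to respond.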
tepples wrote:So on a fast cycle, you get 3 master clocks (140 ns) for the new address, followed by 3 master clocks for the memory to put a value on the data bus. On a slow cycle, you get 3 master clocks for the new address, followed by 5 master clocks (232 ns) for the memory to put a value on the data bus. Subtract a margin of error to get 120 and 200 ns.
Except that, in practice, the address from the CPU/DMA unit is stable within a single master clock. (Or at least, so say Poot36's logic analyzer traces.)
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

Post by nocash »

Hmmm, yeah, I've checked some random 200ns EPROM datasheet: http://www.ti.com/lit/ds/symlink/tms27c512.pdf and the 200ns refers to the setup time (counted from when address got stable). So the total access time might be as follows:

140ns (or less) for getting address stable
200ns for the 200ns ROM/EPROM's setup time
20ns (or whatever) for the hold time needed by the target chip
plus maybe a few ns for tolerance, or in case they've specified/rounded 200ns because it was the closest commonly manufactured type.

Subtracting 3 master clocks from the total access time does make sense (in terms of explaining why Nintendo specified 200ns/120ns for slow/fast ROMs). I don't know if all of those 3 master clocks are for the address, or if some are for hold time.
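The budget listed above can be summed against the 8-master-clock slot. This is only a sketch under the assumptions in the list: the full 3 master clocks go to address settling, and the 20 ns hold is the placeholder value, not a known WRAM figure.

```python
from fractions import Fraction

# One master clock period in ns: 1000 / (945/44) = 44000/945.
MASTER_NS = Fraction(44000, 945)
SLOT_NS = 8 * MASTER_NS  # slow access slot, about 372.5 ns

# Budget items from the list above (hold time is an assumed placeholder).
budget = {
    "address settle (3 master clocks)": 3 * MASTER_NS,
    "ROM response (200 ns part)": Fraction(200),
    "target hold (assumed 20 ns)": Fraction(20),
}

total_ns = sum(budget.values())
slack_ns = SLOT_NS - total_ns

for item, ns in budget.items():
    print(f"{item}: {float(ns):.1f} ns")
print(f"total {float(total_ns):.1f} ns, slack {float(slack_ns):.1f} ns")
```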

Are those logic analyzer traces for the SNES memory signals online somewhere, with some address lines and chip select etc?

Even if addresses seem to be stable after 1 master clock, Nintendo might have still designed the console to use 3 master clocks for getting perfectly stable addresses (with perfect HIGH and LOW levels, and working even if there's a lot of stuff connected to cartridge slot and expansion port or other worst-case conditions).
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

Post by lidnariq »

nocash wrote:Are those logic analyzer traces for the SNES memory signals online somewhere, with some address lines and chip select etc?
Yeah, look through Poot36's thread. They're from a damaged 2-1-3 console, but the only flaw seemed to be that the PLB and PLD instructions destroy the stack pointer.

Also I have a few traces of S-PPU bus activity here and here; these are from a 1-1-1 console.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: Timing of SlowROM-to-RAM DMA: How is 372 ns divided?

Post by AWJ »

nocash wrote:Hmmm, yeah, I've checked some random 200ns EPROM datasheet: http://www.ti.com/lit/ds/symlink/tms27c512.pdf and the 200ns refers to the setup time (counted from when address got stable). So the total access time might be as follows:

140ns (or less) for getting address stable
200ns for the 200ns ROM/EPROM's setup time
20ns (or whatever) for the hold time needed by the target chip
plus maybe a few ns for tolerance, or in case they've specified/rounded 200ns because it was the closest commonly manufactured type.

Subtracting 3 master clocks from the total access time does make sense (in terms of explaining why Nintendo specified 200ns/120ns for slow/fast ROMs). I don't know if all of those 3 master clocks are for the address, or if some are for hold time.

Are those logic analyzer traces for the SNES memory signals online somewhere, with some address lines and chip select etc?

Even if addresses seem to be stable after 1 master clock, Nintendo might have still designed the console to use 3 master clocks for getting perfectly stable addresses (with perfect HIGH and LOW levels, and working even if there's a lot of stuff connected to cartridge slot and expansion port or other worst-case conditions).
There are some SNES CPU bus traces in this thread, and one with a bit of annotation here.

The 65816, like the original 6502, is based on a two phase clock. The clock input goes through some inverters to produce two non-overlapping clock outputs, phi1 and phi2. Basically, phi1 high is the phase when the address is being put on the address bus, and phi2 high is the phase when memory is expected to put data on the data bus, or when the CPU puts data on the bus during a write cycle. Likewise, inside the CPU some steps of each instruction cycle occur during phi1 and others occur during phi2.

In a regular 65816, the address and data busses are actually multiplexed: during phi1 the "data" bus holds the upper 8 bits of the address (i.e. the program or data bank) and external hardware is required to latch the complete 24-bit address. In the SNES CPU the bank address latching is built in, and the external data and address busses are completely separate. In fact there are two address busses (as you know), each having its own RD and WR signals. Basically, on the SNES CPU, the 65816 bus signals are being "translated" by on-chip hardware to something that more resembles a Z80 bus.

The 65816 core inside the SNES CPU more or less runs off of the master clock divided by 6, so phi1 and phi2 each last for 3 master clocks. However, when accessing "slow" address ranges, phi2 is stretched by 2 master clocks to a total of 5--presumably by the same on-chip hardware that's responsible for demultiplexing the bank address, differentiating A-bus and B-bus addresses and activating the appropriate RD or WR, and generating the RAMSEL and ROMSEL signals.

The important thing to note from the bus traces is that this on-chip address decoding seems to occur at the falling edge of phi1/rising edge of phi2--you can see that CPURD or CPUWR (or PARD or PAWR for B-bus addresses) aren't asserted until 3 master clocks in.
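The cycle structure described above can be laid out per master clock. This is a simplified sketch, not a trace: it assumes the RD/WR strobe stays asserted for all of phi2, which the traces are only said to show for the start of phi2, and cpu_cycle is a made-up helper name.

```python
def cpu_cycle(slow):
    """Return one CPU cycle as a list of (master_clock, phase, strobe_asserted).

    phi1 lasts 3 master clocks; phi2 lasts 3 (fast) or 5 (slow, stretched by 2).
    Assumption: the RD/WR strobe is asserted for the whole of phi2.
    """
    phi1_clocks = 3
    phi2_clocks = 5 if slow else 3
    timeline = []
    for clock in range(phi1_clocks + phi2_clocks):
        phase = "phi1" if clock < phi1_clocks else "phi2"
        timeline.append((clock, phase, phase == "phi2"))
    return timeline

for clock, phase, strobe in cpu_cycle(slow=True):
    print(clock, phase, "RD/WR asserted" if strobe else "address phase")
```

In this model the strobe first appears at master clock 3 for both fast and slow cycles, which matches the observation that decoding happens at the phi1/phi2 boundary.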