Some basic questions...

tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Some basic questions...

Post by tepples »

Block copy DMA to PPU memory has to run during vertical blanking or forced blanking. On NTSC with tall display turned off, vertical blanking is lines 225 through 261, or 37 lines, or 37 * 1324 = 48988 qpels (master clock cycles). At 8 master clocks per byte, that is 48988 / 8 = 6123 bytes. But you probably won't be able to use all of those, for a few reasons:
  1. Interrupt latency
    It takes a few cycles to get into the interrupt handler, push all registers, and write to the block copy DMA start port ($420B).
  2. OAM and CGRAM
    OAM is 544 bytes, and CGRAM is 512 bytes. Writing to those if necessary takes time away from writing to VRAM.
  3. Non-contiguous transfers
    If there are more than a handful of DMA transfers, which is common when transferring tiles to various parts of VRAM, it takes vblank time to load addresses and lengths for each transfer into $43xx.
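The budget arithmetic above can be sketched in Python. The 37 lines, 1324 master clocks per line, 8 master clocks per byte, and the OAM and CGRAM sizes are the figures from this post; the function and constant names are made up for illustration. Interrupt latency and per-transfer setup time are left out because no numbers are given for them, so treat the result as an upper bound.

```python
# Rough VBlank DMA budget for NTSC, 224-line mode, using the figures quoted above.
VBLANK_LINES = 261 - 225 + 1      # lines 225 through 261 inclusive
CLOCKS_PER_LINE = 1324            # master clocks ("qpels") per line, per the post
CLOCKS_PER_BYTE = 8               # DMA moves one byte every 8 master clocks
OAM_BYTES = 544
CGRAM_BYTES = 512


def vblank_dma_bytes(lines=VBLANK_LINES):
    """Upper bound on bytes DMA can move in `lines` blanked scanlines."""
    return lines * CLOCKS_PER_LINE // CLOCKS_PER_BYTE


def vram_bytes_left(update_oam=True, update_cgram=True):
    """Bytes left for VRAM after optional full OAM and CGRAM uploads."""
    budget = vblank_dma_bytes()
    if update_oam:
        budget -= OAM_BYTES
    if update_cgram:
        budget -= CGRAM_BYTES
    return budget


print(vblank_dma_bytes())   # 6123
print(vram_bytes_left())    # 5067
```

So a frame that rewrites all of OAM and CGRAM has roughly 5 KB left for VRAM before any other overhead is counted.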
creaothceann
Posts: 610
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Some basic questions...

Post by creaothceann »

Señor Ventura wrote: I thought the DMA managed to transfer 6.14KB per frame because it works during the whole frame while the CPU works alongside it.
They can't work at exactly the same time because there's only one address bus and one data bus over which bytes can be transferred.

CPU instructions need several CPU cycles to execute (up to 8 for complicated instructions), and each CPU cycle lasts 6, 8 or 12 master clock cycles. When DMA is initiated, the CPU core doesn't resume until the DMA has finished.
Last edited by creaothceann on Tue Nov 21, 2017 5:15 am, edited 2 times in total.
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10
Señor Ventura
Posts: 233
Joined: Sat Aug 20, 2016 3:58 am

Re: Some basic questions...

Post by Señor Ventura »

Then could it be said that DMA is practically useless if I use the CPU itself to transfer tiles at 3.58 MHz?

If using DMA pauses the CPU, and using the CPU pauses DMA, why not have the CPU do that work instead of DMA? (It has to be stopped anyway, so I don't see the point.)
lidnariq
Posts: 11430
Joined: Sun Apr 13, 2008 11:12 am

Re: Some basic questions...

Post by lidnariq »

Señor Ventura wrote: Then could it be said that DMA is practically useless if I use the CPU itself to transfer tiles at 3.58 MHz?
The CPU cannot transfer tiles at 3.6 MHz.

The fastest way for the CPU to move data from one location to another would be a block move with the MVN/MVP instructions, and that takes 7 CPU cycles per byte, which is 42 master cycles at the fastest CPU speed. DMA, on the other hand, always takes 8 master cycles per byte, plus a little fixed overhead for synchronization and the more involved setup.
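To put numbers on that comparison, here is a small Python sketch. The per-byte costs are the ones quoted above; the 2 KiB block size and the function name are hypothetical, and DMA's small fixed overhead is ignored.

```python
# Master clocks needed to move a block, using the per-byte costs quoted above.
FAST_CPU_CYCLE = 6                 # master clocks per CPU cycle at the fastest speed
MVN_COST = 7 * FAST_CPU_CYCLE      # 7 CPU cycles per byte -> 42 master clocks
DMA_COST = 8                       # master clocks per byte (fixed overhead ignored)


def clocks_to_move(n_bytes, clocks_per_byte):
    """Master clocks spent moving n_bytes at a fixed per-byte cost."""
    return n_bytes * clocks_per_byte


block = 2048                       # hypothetical 2 KiB tile upload
print(clocks_to_move(block, DMA_COST))   # 16384
print(clocks_to_move(block, MVN_COST))   # 86016
print(MVN_COST / DMA_COST)               # 5.25
```

For comparison, the NTSC vertical blanking period worked out earlier in the thread is 48988 master clocks, so under these assumptions the 2 KiB block fits in VBlank with DMA but not with MVN.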
niconii
Posts: 219
Joined: Sun Mar 27, 2016 7:56 pm

Re: Some basic questions...

Post by niconii »

Yeah, the key here is that, although DMA runs on a 2.68 MHz clock, it takes only one DMA cycle per byte transferred. That's way faster than the CPU could ever transfer bytes, even when running at 3.58 MHz, because the CPU always takes multiple CPU cycles to transfer a byte.
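The same point as a quick calculation: throughput is clock rate divided by cycles per byte. This is only a sketch. The master clock value is an assumption (the thread never states it, though it is consistent with the 2.68 MHz and 3.58 MHz figures), and the 7 cycles per byte is the MVN/MVP cost quoted earlier in the thread.

```python
# Throughput = clock rate / cycles per byte.
MASTER_HZ = 21_477_272             # assumed NTSC master clock (not stated in this thread)

dma_hz = MASTER_HZ / 8             # DMA clock, about 2.68 MHz
cpu_hz = MASTER_HZ / 6             # fastest CPU clock, about 3.58 MHz


def bytes_per_second(clock_hz, cycles_per_byte):
    """Sustained transfer rate for a given clock and per-byte cycle cost."""
    return clock_hz / cycles_per_byte


dma_rate = bytes_per_second(dma_hz, 1)   # one DMA cycle per byte
mvn_rate = bytes_per_second(cpu_hz, 7)   # MVN/MVP: 7 CPU cycles per byte

print(f"DMA clock {dma_hz / 1e6:.2f} MHz -> {dma_rate / 1e6:.2f} MB/s")
print(f"CPU clock {cpu_hz / 1e6:.2f} MHz -> {mvn_rate / 1e6:.2f} MB/s")
```

The slower clock wins by a factor of about five because it needs only one cycle per byte.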
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Some basic questions...

Post by psycopathicteen »

Actually, the fastest non-DMA way to move bytes is PEI, which works out to 3 cycles per byte (6 cycles per 16-bit word). But that is only possible when moving data to and from bank 0, and it's not very useful for writing to a fixed register (like $2118), though you can use it to set up DMA.
Señor Ventura
Posts: 233
Joined: Sat Aug 20, 2016 3:58 am

Re: Some basic questions...

Post by Señor Ventura »

I see, thank you all :)

Well, the DMA clock thing is a bit of a disappointment... 6.14 KB per frame is good too, but one byte every 6 or 7 master clocks would probably have been one of the most wished-for features of the SNES nowadays...
creaothceann
Posts: 610
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Some basic questions...

Post by creaothceann »

Or 2 master clock cycles, to keep up with the SA1 (10.738{63} MHz).
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Some basic questions...

Post by psycopathicteen »

You can do something like this to make sure sprite animations never go beyond a certain limit.

Code:

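ldx new_metasprite                  ; has this object's metasprite changed?
cpx old_metasprite
beq +                               ; no change: nothing to upload

lda metasprite_size                 ; bytes this metasprite would add
clc
adc dma_size                        ; running DMA total for this frame
cmp #dma_limit+1
bcs +                               ; total > dma_limit: skip now, retry next frame
sta dma_size                        ; commit the new total

stx old_metasprite                  ; remember what was uploaded
jsr place_metasprite_on_dma_table

+; (dma_size is assumed to be reset to 0 elsewhere at the start of each frame)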
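For anyone who finds the branch logic easier to follow in a high-level language, here is a rough Python model of the same check. It is a sketch of the idea, not a translation of any real engine: the class names, the list standing in for the DMA table, and the byte sizes are all hypothetical, and it assumes a fresh queue is created each frame.

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    new_metasprite: int      # animation frame the object wants to show
    old_metasprite: int      # animation frame currently in VRAM
    metasprite_size: int     # bytes of tile data the new frame needs


@dataclass
class DmaQueue:
    limit: int                                   # plays the role of dma_limit
    size: int = 0                                # plays the role of dma_size
    table: list = field(default_factory=list)    # stand-in for the DMA table

    def try_queue(self, actor):
        """Queue the actor's new metasprite if it changed and fits the budget."""
        if actor.new_metasprite == actor.old_metasprite:
            return False                         # beq +: nothing changed
        total = self.size + actor.metasprite_size
        if total > self.limit:                   # cmp #dma_limit+1 / bcs +
            return False                         # over budget: retry on a later frame
        self.size = total
        actor.old_metasprite = actor.new_metasprite
        self.table.append(actor.new_metasprite)
        return True


queue = DmaQueue(limit=1024)
first = Actor(new_metasprite=5, old_metasprite=4, metasprite_size=768)
second = Actor(new_metasprite=9, old_metasprite=8, metasprite_size=512)
print(queue.try_queue(first))    # True
print(queue.try_queue(second))   # False
```

Because the second actor's old_metasprite is left untouched when it is rejected, it still counts as changed on the next frame and gets another chance then; it just keeps showing its old animation frame in the meantime.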
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: Some basic questions...

Post by 93143 »

Señor Ventura wrote: using CPU pauses DMA
No. The only thing that can pause DMA is HDMA (since HDMA generally has to happen during HBlank to avoid graphical corruption, the DMA unit prioritizes it, and will pause a bulk DMA transfer until the HDMA transfer completes).

If you want to do a bulk DMA transfer, you have the CPU set up the registers and tell the DMA unit to start the transfer. Once this is done, the CPU gets one more cycle, and then the DMA unit takes over the bus, pausing the CPU. Once the DMA transfer is finished, the CPU starts up again.

I don't know where your head is at with this "using CPU" business. As a programmer, you are simply providing a sequence of instructions to the CPU. There is no master controller that can "use" the CPU; the CPU is the master controller, and it's always running unless the DMA unit stalls it or it encounters a wai or stp instruction (the former can be cancelled by an interrupt provided by the raster timer, the latter can only be cancelled by a reset). I'm pretty sure there's nothing in the system that could conceivably halt a DMA transfer partway through, other than the RESET line, and as far as I know the only thing that's hooked up to that is the actual physical reset button.
Señor Ventura wrote: 6.14KB per frame
That's the absolute theoretical limit for a normal NTSC VBlank in 224-line mode; a real game won't hit that. (The corresponding numbers for PAL, which runs at 50 Hz and thus has 312 lines in a frame instead of 262, are 11.8 KB in 239-line mode or 14.2 KB in 224-line mode. Hopefully with the information you've been given, you can figure out why this is.)
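One way to reproduce those three figures, as a sketch: count every non-visible line as usable (including line 0), use the 1324 master clocks per line and 8 master clocks per byte given earlier in the thread, and read "KB" as 1024 bytes. Those readings are assumptions on my part; they happen to match the quoted numbers, but the post does not spell them out.

```python
USABLE_CLOCKS_PER_LINE = 1324   # figure used earlier in the thread
CLOCKS_PER_BYTE = 8             # DMA cost per byte


def vblank_kib(total_lines, visible_lines):
    """DMA capacity of the blanked lines, in units of 1024 bytes."""
    blank_lines = total_lines - visible_lines   # assumption: line 0 counts as usable
    return blank_lines * USABLE_CLOCKS_PER_LINE / CLOCKS_PER_BYTE / 1024


for name, total, visible in [
    ("NTSC, 224 lines", 262, 224),
    ("PAL, 239 lines", 312, 239),
    ("PAL, 224 lines", 312, 224),
]:
    print(f"{name}: {vblank_kib(total, visible):.2f}")
# NTSC, 224 lines: 6.14
# PAL, 239 lines: 11.80
# PAL, 224 lines: 14.22
```

PAL gets about twice the budget simply because its frame has 50 more lines, all of which are blanked.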

Furthermore, from what tepples said earlier I'm wondering if VRAM is actually open during line 0 (the last blanked line before the frame starts) like I've been assuming. OAM certainly isn't, since during line 0 the PPU is loading sprite data from OAM for line 1. But I imagine CGRAM is open during that last line, since no actual rendering is going on...

Like I said earlier, if you use forced blanking, you can transfer more data. The usual way to do this is to letterbox the screen a bit; during the extra blank lines, the PPU memories are all open for writes just like during VBlank, so you can transfer more data than you could normally.

The real theoretical DMA limit per frame is over 43 KB for NTSC, or over 50 KB for PAL. But to achieve that you need to leave forced blank on for the whole frame, so you get a black screen... and of course since the DMA unit is hogging the bus the whole time, you get no computing done...
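As a rough sketch of what extra blank lines buy, assuming the same 1324 master clocks per line and 8 master clocks per byte used earlier in the thread (the 12-line letterbox is a made-up example, and the function name is hypothetical):

```python
USABLE_CLOCKS_PER_LINE = 1324   # figure used earlier in the thread
CLOCKS_PER_BYTE = 8             # DMA cost per byte


def dma_bytes(blank_lines):
    """Upper bound on bytes DMA can move during `blank_lines` blanked lines."""
    return blank_lines * USABLE_CLOCKS_PER_LINE // CLOCKS_PER_BYTE


letterbox_gain = dma_bytes(2 * 12)   # hypothetical: 12 forced-blank lines top and bottom
ntsc_full_frame = dma_bytes(262)     # forced blank for a whole NTSC frame
pal_full_frame = dma_bytes(312)      # forced blank for a whole PAL frame

print(letterbox_gain, ntsc_full_frame, pal_full_frame)   # 3972 43361 51636
```

Each blanked line is worth about 165 bytes under these assumptions. The whole-frame results land in the same range as the limits quoted above; the exact values depend on how per-line overhead is counted.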
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Some basic questions...

Post by tepples »

Practically, 43K per frame on Super NES means you can turn rendering off for two frames and get your whole initial tile set and map loaded, reducing the visible impact of scene transitions. It's not like NES where you struggle to push 2K per frame in forced blanking, meaning you have to turn off rendering for 6 frames at a time to send 8K of tiles and 2K of maps. In addition, you can decompress most or all of the new scene to bank $7F before sending it to VRAM, unlike the NES where you have to decompress as you go, making the transition take even longer.
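A quick check of the two-frame claim, as a sketch: it reads "43K per frame" as 43 * 1024 bytes and uses the Super NES's 64 KiB of VRAM as the worst case. The 40 KiB scene is a hypothetical example, and the function name is made up.

```python
import math

BYTES_PER_FRAME = 43 * 1024   # "43K per frame" from the post, read as 43 * 1024 bytes


def blank_frames_needed(scene_bytes, per_frame=BYTES_PER_FRAME):
    """Whole frames of forced blanking needed to upload scene_bytes."""
    return math.ceil(scene_bytes / per_frame)


print(blank_frames_needed(64 * 1024))   # 2: filling all 64 KiB of VRAM
print(blank_frames_needed(40 * 1024))   # 1: a hypothetical 40 KiB scene
```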