DMA operation in APU


Zepper
Formerly Fx3
Posts: 3220
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper » Sun Jul 18, 2010 7:56 pm

- How did you manage to get this test to pass? I mean, do you control the number of "eaten" CPU cycles by the DMC DMA?

cpow
NESICIDE developer
Posts: 1097
Joined: Mon Oct 13, 2008 7:55 pm
Location: Minneapolis, MN

Post by cpow » Mon Jul 19, 2010 2:24 am

Zepper wrote:- How did you manage to get this test to pass? I mean, do you control the number of "eaten" CPU cycles by the DMC DMA?
Yes. My PPU/CPU/APU are all cycle driven. Most emulators (Nestopia 1.40, for one) do the sprite DMA all in one big chunk (a quick for loop) on any write to $4014 and just let the cycle accounting mechanisms take care of the held-off CPU time. My PPU does the sprite DMA inline with its cycle emulation, one cycle at a time, for 513 or 514 cycles. A write to $4014 simply sets up the internal state; the cycle-by-cycle emulate method does the rest.

Thus, whenever a DMC DMA occurs, I know whether or not I'm in the middle of a sprite DMA. And, because the PPU actually does the 513/514 cycle DMA one cycle at a time, I know exactly where in the sprite DMA I am when the DMC DMA occurs. My CPU also provides the necessary "read/fetch, or write" information about the most recent cycle (useful, of course, only if the PPU isn't in the middle of a sprite DMA). From there it's just a few easy checks to see whether to have 3, 2, or 1 wait state cycles before the DMC DMA cycle.
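
A rough sketch of what such checks might look like. This is not NESICIDE's code: the names are hypothetical, and the mapping is an assumption borrowed from the 2/3/4-cycle delay expression in ulfalizer's load_dmc_sample_byte() later in this thread (total stall minus the one DMC read cycle). The real checks may also depend on the exact position within the sprite DMA.

```cpp
// Hypothetical bus states; the names are not taken from any emulator.
enum class BusState { CpuRead, CpuWrite, OamDma };

// Wait-state cycles inserted before the DMC DMA read cycle itself.
// Assumption: total stall is 4 cycles after a CPU read, 3 after a CPU
// write, and 2 while a sprite (OAM) DMA is in progress.
unsigned dmc_wait_states(BusState state) {
    switch (state) {
        case BusState::OamDma:   return 1;
        case BusState::CpuWrite: return 2;
        case BusState::CpuRead:  return 3;
    }
    return 3;  // not reached for valid enum values
}

// Total CPU cycles lost to one DMC fetch: wait states plus the read cycle.
unsigned dmc_total_stall(BusState state) {
    return dmc_wait_states(state) + 1;
}
```

Keeping the decision in one small function makes it easy to swap in a finer-grained rule once the test ROMs say the coarse one is not enough.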

I now pass both of the sprdma_and_dmc_dma test ROMs and have shifted attention over to trying to pass the $4016/DMC effect test ROMs.

My code is all up on Gitorious if you're curious.

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg » Mon Jul 19, 2010 6:16 am

cpow wrote: And, because the PPU actually does the 513/514 cycle DMA one cycle at a time, I know exactly where in the sprite DMA I am when the DMC DMA occurs.
The 2A03 does the DMA, not the PPU, as far as I know. I'm pretty sure the PPU just sees it as a series of $2004 writes every other CPU cycle.
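
To make that picture concrete, here is a minimal sketch of a CPU-side sprite DMA stepped one CPU cycle at a time, with the PPU only seeing $2004 writes on every other cycle. Bus, OamDma and run_oam_dma are hypothetical names for illustration, the flat 64 KiB memory is a simplification, and the 513/514 split assumes one alignment cycle plus one more when the $4014 write lands on an odd CPU cycle.

```cpp
#include <array>
#include <cstdint>

// Hypothetical minimal bus: flat memory plus an OAM filled via $2004 writes.
struct Bus {
    std::array<uint8_t, 0x10000> mem{};
    std::array<uint8_t, 256> oam{};
    uint8_t oam_addr = 0;
    unsigned writes_2004 = 0;

    uint8_t read(uint16_t addr) const { return mem[addr]; }
    void write_2004(uint8_t v) { oam[oam_addr++] = v; ++writes_2004; }
};

// Sprite DMA owned by the CPU side, advanced one CPU cycle per step().
struct OamDma {
    bool     active = false;
    unsigned idle_left = 0;      // alignment cycles before the first read
    unsigned index = 0;          // byte index within the source page
    bool     have_latch = false; // true when a read is waiting to be written
    uint8_t  latch = 0;
    uint16_t base = 0;

    // Called on a write to $4014; odd_cycle selects the 514-cycle case.
    void start(uint8_t page, bool odd_cycle) {
        active = true;
        idle_left = odd_cycle ? 2 : 1;
        index = 0;
        have_latch = false;
        base = static_cast<uint16_t>(page << 8);
    }

    // Runs one CPU cycle of the transfer; returns true while the CPU is held.
    bool step(Bus& bus) {
        if (!active) return false;
        if (idle_left > 0) { --idle_left; return true; }
        if (!have_latch) {                 // read cycle
            latch = bus.read(static_cast<uint16_t>(base + index));
            have_latch = true;
        } else {                           // write cycle: a plain $2004 write
            bus.write_2004(latch);
            have_latch = false;
            if (++index == 256) active = false;
        }
        return true;
    }
};

// Runs a whole transfer and returns the number of held CPU cycles.
unsigned run_oam_dma(Bus& bus, uint8_t page, bool odd_cycle) {
    OamDma dma;
    dma.start(page, odd_cycle);
    unsigned cycles = 0;
    while (dma.step(bus)) ++cycles;
    return cycles;
}
```

Because the state lives outside the PPU, a DMC fetch arriving mid-transfer can inspect index and have_latch to see exactly which slot it would collide with.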


Post by Zepper » Mon Jul 19, 2010 3:08 pm

NESICIDE wrote: From there it's just a few easy checks to see whether to have 3, 2, or 1 wait state cycles before the DMC DMA cycle.
- That's my question, what checks do you do? I couldn't locate anything in your code. :)


Post by cpow » Mon Jul 19, 2010 7:14 pm

Zepper wrote:
NESICIDE wrote: From there it's just a few easy checks to see whether to have 3, 2, or 1 wait state cycles before the DMC DMA cycle.
- That's my question, what checks do you do? I couldn't locate anything in your code. :)
@Zepper: Look for calls to C6502::STEALCYCLES and calls to C6502::DMA. Also look at CPPU::PPU(unsigned short addr, unsigned char data) for IOSPRITEDMA (where $4014 is written and the transfer is set up).

@blargg: I agree the DMA cycles originate from the 6502...I just left the implementation in the PPU because I originally just did the quick 256-byte for-loop copy there. I only do a DMA or STEALCYCLES call once every 3 or 3.2 PPU cycles. I'll look into moving it to be more correct.
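
For readers wondering about the "3 or 3.2", a small sketch, assuming the two figures are the NTSC (3 PPU dots per CPU cycle) and PAL (3.2 dots per CPU cycle) clock ratios. CpuDivider and cpu_cycles_in are hypothetical names; the ratio is kept in fifths of a dot so no floating point is needed.

```cpp
// Decides, once per PPU dot, whether a CPU cycle is due.
// The ratio is stored in fifths: 15 means 3.0 dots per CPU cycle (NTSC),
// 16 means 3.2 dots per CPU cycle (PAL).
struct CpuDivider {
    unsigned fifths_per_cpu;
    unsigned acc;

    explicit CpuDivider(unsigned fifths) : fifths_per_cpu(fifths), acc(0) {}

    // Call once per PPU dot; returns true when a CPU cycle should run.
    bool ppu_dot() {
        acc += 5;
        if (acc >= fifths_per_cpu) {
            acc -= fifths_per_cpu;
            return true;
        }
        return false;
    }
};

// Counts how many CPU cycles fall within the given number of PPU dots.
unsigned cpu_cycles_in(unsigned dots, unsigned fifths_per_cpu) {
    CpuDivider divider(fifths_per_cpu);
    unsigned cycles = 0;
    for (unsigned i = 0; i < dots; ++i)
        if (divider.ppu_dot()) ++cycles;
    return cycles;
}
```

A DMA or cycle-stealing step would then run only on the dots where ppu_dot() returns true.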

EDIT: files are emulator/cnes6502.h and .cpp, emulator/cnesppu.h and .cpp, and emulator/cnesapu.h and .cpp.

I'm struggling with an APU/SDL sync problem at the moment tho so even though the APU is "flawless" according to blargg's tests it sounds like poop warmed over. I really hate the SDL callback interface. Not flexible *at all*!

jwdonal
Posts: 719
Joined: Sat Jun 27, 2009 11:05 pm
Location: New Mexico, USA

Post by jwdonal » Wed Dec 22, 2010 6:04 pm

Hey all, this is a bit of a bump but I think it's worth it. I recently posed my question #1 (from wayyyyy back in the original post that created this thread) to Kevtris over PM. We had a few back and forths, but this is the ultimate answer that he provided:
kevtris wrote:In my implementation, the sprite DMA takes precedence as it must, to prevent graphics corruption. The sample DMA is not stalled though and it will attempt to fetch a sample if it needs one, which happens to be sprite data at the time.

Be that as it may, I have not heard any audible artifacts from this happening even when it plays a byte of sprite data as sample data; probably because the samples are really noisy as-is, and 8 samples doesn't make much difference because the rest of the waveform is noisy as hell.

A test would be to play a continuous DPCM stream of 00 or ff to peg the DAC counter at one of the ends, then sprite DMA the opposite.

For example, DPCM 00h's, sprite DMA FFh's, and watch the output. If it blips, then I am doing it right. If not, either sprite fetches are getting screwed, or DPCM is being deferred or injecting extra reads (doubtful).
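
A back-of-the-envelope model of what that test should show, assuming the DMC output counter follows the same +2/-2 per-bit rule as the clock_dmc() code posted later in this thread. play_dmc_byte and play_dmc_stream are hypothetical helpers, and the single FFh byte stands in for a sprite byte picked up by the sample fetch.

```cpp
#include <cstdint>
#include <vector>

// Applies one sample byte to the 7-bit DMC output counter, LSB first:
// a 1 bit adds 2 (if that stays within range), a 0 bit subtracts 2.
unsigned play_dmc_byte(unsigned counter, uint8_t sample) {
    for (int bit = 0; bit < 8; ++bit) {
        if (sample & 1) {
            if (counter < 126) counter += 2;
        } else {
            if (counter > 1) counter -= 2;
        }
        sample >>= 1;
    }
    return counter;
}

// Returns the counter value after each byte of the stream has played.
std::vector<unsigned> play_dmc_stream(unsigned counter,
                                      const std::vector<uint8_t>& bytes) {
    std::vector<unsigned> levels;
    for (uint8_t b : bytes) {
        counter = play_dmc_byte(counter, b);
        levels.push_back(counter);
    }
    return levels;
}
```

In this model a run of 00h bytes pegs the counter at the bottom, one stray FFh byte lifts it by 16 steps, and the next 00h byte pulls it back down, which would be the "blip" to look for on a scope or in a recording.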
I think this is incredibly interesting. Essentially what Kevtris is saying is that a DMC DMA operation will never be stalled when an OAM DMA operation is in progress, nor will an OAM DMA operation be stalled as a result of an in-progress DMC DMA operation. The DMC DMA (at least in his design) will just receive corrupted data (which will actually be sprite data) whenever an OAM DMA transfer needs to occur. Given how accurate Kevtris' emulator has been touted to be, especially for the most bizarre and picky games, I would expect that this is the correct implementation. Not only that, but for some reason it just seems to me like the kind of thing Nintendo would do to save engineering time and money.

So it's pretty funny because the short, one-word answer to my original question #1 is: Neither. :)

Anyway, I just wanted to share that bit of info and make sure that it got out into the community. Kevtris said he was cool with me sharing his answer. I'm curious if anyone has any additional thoughts on it.

Pz!

Dwedit
Posts: 4354
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit » Fri Jun 08, 2012 11:27 pm

Bump, has the behavior (corrupted DMC samples during sprite DMA) been confirmed?


Post by jwdonal » Tue Jun 12, 2012 1:11 am

Hiya dwedit,

I'm pretty positive that Kevtris' implementation is correct. I tested the following three different implementations using Duck Hunt:

Method 1: Stop OAM DMA while DMC DMA is occurring
Result: duck/dog sprites visually corrupted on screen whenever "Quack!/Arff!" sounds occurred

Method 2: Stop DMC DMA while OAM DMA is occurring
Result: heard audible distortion in "Quack!/Arff!" sounds

Method 3: Do not pause DMC DMA if OAM DMA occurs, but DMC receives sprite data while OAM DMA is occurring (this is kevtris' method)
Result: no audible or visual distortion whatsoever

I also tested it with Kung Fu and received similar (and worse) results. Kung Fu will eventually freeze if you use method 1. Likely because it uses the DMC channel much more frequently than Duck Hunt.

Hope that helps!


Post by Dwedit » Tue Jun 12, 2012 1:15 am

After re-reading the thread, Blargg mentioned that DMC used two cycles if it happened during a 4014 sprite dma transfer. So does it interrupt the sprite DMA to perform the DMC read? What's it doing during those two cycles?

Why would stopping Sprite DMA to do a DMC read corrupt the sprites?

Also wondering because Nintendulator doesn't pass the timing tests for DMC reads during sprite DMA.


Post by cpow » Sun Jun 17, 2012 5:36 pm

Dwedit wrote:After re-reading the thread, Blargg mentioned that DMC used two cycles if it happened during a 4014 sprite dma transfer. So does it interrupt the sprite DMA to perform the DMC read? What's it doing during those two cycles?

Why would stopping Sprite DMA to do a DMC read corrupt the sprites?

Also wondering because Nintendulator doesn't pass the timing tests for DMC reads during sprite DMA.
I finally found the post I wanted to reply to with this topic post. I plan to implement what I'm observing from my Visual2A03 trials in my emulator. As you said, it does indeed interrupt the sprite DMA to perform the DMC read...for two cycles. The read/write/read beat of the sprite DMA is kept such that the DMC read occurs where the sprite read from the sprite memory page would otherwise occur. Then, in the cycle that should be a write to $2004, there's just a CPU read from the address the PC held before the sprite DMA started. Then the sprite DMA picks up where it left off.
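
One way to picture that beat is as a bus trace. The sketch below is only an illustration of the description above, not a model of the 2A03: Op and sprite_dma_trace are hypothetical names, the alignment cycles at the start of the transfer are left out, and the DMC fetch is simply assumed to land on the read slot of one chosen byte.

```cpp
#include <vector>

// Kinds of bus cycle seen during the read/write part of a sprite DMA.
enum class Op { SpriteRead, Write2004, DmcRead, DummyPcRead };

// Builds the cycle sequence for a sprite DMA of `bytes` bytes in which a
// DMC fetch takes over the read slot of byte `dmc_at`. Pass a value
// >= bytes for a transfer with no DMC fetch.
std::vector<Op> sprite_dma_trace(unsigned bytes, unsigned dmc_at) {
    std::vector<Op> trace;
    for (unsigned i = 0; i < bytes; ++i) {
        if (i == dmc_at) {
            trace.push_back(Op::DmcRead);      // where the sprite read would be
            trace.push_back(Op::DummyPcRead);  // where the $2004 write would be
        }
        trace.push_back(Op::SpriteRead);       // transfer resumes with this byte
        trace.push_back(Op::Write2004);
    }
    return trace;
}
```

The trace with a DMC fetch is exactly two cycles longer than the one without, and every sprite byte is still read and written once, so nothing in OAM is lost.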

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden


Post by ulfalizer » Mon Aug 26, 2013 6:46 am

Is source code available for the sprdma_and_dmc_dma test roms?


Post by blargg » Mon Aug 26, 2013 11:24 am

Looks like it was an off-the-cuff test. I've located the source and will package it the next time I'm on my old Mac.


Post by ulfalizer » Mon Aug 26, 2013 11:42 am

blargg wrote:Looks like it was an off-the-cuff test. I've located the source and will package it the next time I'm on my old Mac.
Would be appreciated. Could PM it too if you have it and feel like it - off-the-cuff state is better than nothing. :)


Post by ulfalizer » Tue Aug 27, 2013 12:51 am

Haven't traced through the code to figure out what's going on yet, but for some reason the output ends up way off. I pass all the apu_test, apu_reset, and cpu_interrupts_v2 tests (and all the ppu_vbl_nmi tests except 07-nmi_on_timing.nes (off by 1-2 ticks - might be some analog thing going on there)).

I end up with


tests/sprdma_and_dmc_dma/sprdma_and_dmc_dma.nes FAILED
T+ Clocks (decimal)
00 3768
01 3767
02 3768
03 3767
04 3768
05 3767
06 3766
07 3765
08 3766
09 3765
0A 3766
0B 3765
0C 3766
0D 3765
0E 3766
0F 3765

7461977F
SPRDMA and DMC DMA

Failed

tests/sprdma_and_dmc_dma/sprdma_and_dmc_dma_512.nes FAILED
T+ Clocks (decimal)
00 3766
01 3765
02 3766
03 3765
04 3766
05 3765
06 3766
07 3767
08 3768
09 3767
0A 3768
0B 3767
0C 3768
0D 3767
0E 3768
0F 3767

2EA11D4D
SPRDMA and DMC DMA

Failed
Here's the DMC code in case anyone can spot any obvious bugs/misunderstandings. The sample loading timing is ballparked at the moment and misses some corner cases, though I don't think by enough to warrant the huge error. channel_updated is for sample generation and can be ignored.

Registers:


// $4010
void write_dmc_reg_0(uint8_t value) {
    static uint16_t const dmc_ntsc_periods[] =
      { 428, 380, 340, 320, 286, 254, 226, 214, 190, 160, 142, 128, 106, 84, 72, 54 };

    if (!(dmc_irq_enabled = value & 0x80)) {
        dmc_irq = false;
        update_irq_status();
    }
    dmc_loop_sample = value & 0x40;
    dmc_period      = dmc_ntsc_periods[value & 0x0F];
}

// $4011
void write_dmc_reg_1(uint8_t value) {
    unsigned const old_dmc_counter = dmc_counter;

    dmc_counter = value & 0x7F;

    if (dmc_counter != old_dmc_counter)
        channel_updated = true;
}

// $4012
void write_dmc_reg_2(uint8_t value) {
    dmc_sample_start_addr = 0x4000 | (value << 6);
}

// $4013
void write_dmc_reg_3(uint8_t value) {
    dmc_sample_len = (value << 4) + 1;
}
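
As a quick sanity check on the $4012/$4013 arithmetic above: on hardware the sample starts at CPU address $C000 + value * 64 and is value * 16 + 1 bytes long. The helpers below are hypothetical, and the reading of the 0x4000 base is an assumption, namely that prg() in the posted code takes an offset from $8000 (which would also fit the & 0x7FFF wrap in the sample loader).

```cpp
#include <cstdint>

// CPU address of the first sample byte for a $4012 write.
uint16_t dmc_cpu_start_addr(uint8_t value) {
    return static_cast<uint16_t>(0xC000 + value * 64);
}

// Sample length in bytes for a $4013 write.
uint16_t dmc_length_bytes(uint8_t value) {
    return static_cast<uint16_t>(value * 16 + 1);
}

// The same start address as an offset from $8000, matching the
// expression used in write_dmc_reg_2() in the posted code.
// Assumption: prg() is indexed relative to $8000.
uint16_t dmc_prg_offset(uint8_t value) {
    return static_cast<uint16_t>(0x4000 | (value << 6));
}
```

So a $4012 value of $FF gives a start address of $FFC0, and a $4013 value of $FF gives the maximum length of 4081 bytes.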
Clocking:


void tick_apu() {
    ...
    if (--dmc_period_cnt == 0) {
        dmc_period_cnt = dmc_period;
        clock_dmc();
    }
    ...
}

static void clock_dmc() {
    if (dmc_bits_remaining > 0) {
        if (dmc_sample_buffer & 1) {
            if (dmc_counter < 126) {
                dmc_counter += 2;
                channel_updated = true;
            }
        }
        else
            if (dmc_counter > 1) {
                dmc_counter -= 2;
                channel_updated = true;
            }
        dmc_sample_buffer >>= 1;
        if (--dmc_bits_remaining == 0 && dmc_bytes_remaining > 0)
            load_dmc_sample_byte();
    }
}
Status:


// $4015
uint8_t read_apu_status() {
    uint8_t const res =
      (dmc_irq                   << 7) |
      (frame_irq                 << 6) |
      (cpu_data_bus            & 0x20) | // Open bus
      ((dmc_bytes_remaining > 0) << 4) |
      ((noise_len_cnt       > 0) << 3) |
      ((tri_len_cnt         > 0) << 2) |
      ((pulse[1].len_cnt    > 0) << 1) |
       (pulse[0].len_cnt    > 0);
    frame_irq = false;
    update_irq_status();
    return res;
}

// $4015
void write_apu_status(uint8_t value) {
    ...

    // We need to clear the DMC IRQ before handling the DMC enable/disable in
    // case a one-byte sample is loaded below, which will immediately fire a
    // DMC IRQ
    dmc_irq = false;
    update_irq_status();

    // DMC enable bit. We model DMC enabled/disabled through the number of
    // sample bytes that remain (greater than zero => enabled).
    if (!(value & 0x10))
        dmc_bytes_remaining = 0;
    else {
        if (dmc_bytes_remaining == 0) {
            dmc_sample_cur_addr = dmc_sample_start_addr;
            dmc_bytes_remaining = dmc_sample_len;

            // If a sample byte is currently being played, the sample is
            // restarted only after it has finished
            if (dmc_bits_remaining == 0)
                load_dmc_sample_byte();
        }
    }

    ...
}
Sample loading:


static void load_dmc_sample_byte() {
    // Timing: http://forums.nesdev.com/viewtopic.php?p=62690#p62690
    // TODO: Open bus?
    assert(dmc_bytes_remaining > 0);
    assert(dmc_bits_remaining == 0);

    dmc_sample_buffer = prg(dmc_sample_cur_addr);

    // We use tick() since the PPU as well as the rest of the APU should
    // keep ticking during the fetch.
    // TODO: Is this done before or after the IRQ is generated? (Should be
    // invisible though.)
    unsigned const delay = doing_oam_dma ? 2 : doing_read ? 4 : 3;
    for (unsigned i = 0; i < delay; ++i) tick();

    dmc_sample_cur_addr = (dmc_sample_cur_addr + 1) & 0x7FFF;

    // Putting this after the delay ensures that we can't get a recursive
    // invocation of load_dmc_sample_byte(), since it can only be called
    // through dmc_bits_remaining going from 1 to 0 while the CPU is stalled
    dmc_bits_remaining = 8;

    if (--dmc_bytes_remaining == 0) {
        if (dmc_loop_sample) {
            dmc_sample_cur_addr = dmc_sample_start_addr;
            dmc_bytes_remaining = dmc_sample_len;
        }
        else
            if (dmc_irq_enabled) {
                dmc_irq = true;
                update_irq_status();
            }
    }
}
Last edited by ulfalizer on Sat Aug 31, 2013 12:39 pm, edited 1 time in total.


Post by blargg » Tue Aug 27, 2013 2:21 pm

Full sources + rom: sprdma_and_dmc_dma.zip
