DMA operation in APU

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Wed Aug 28, 2013 10:55 pm

blargg wrote: Full sources + rom: sprdma_and_dmc_dma.zip
Thanks!

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Sat Aug 31, 2013 10:40 am

I'm still a bit confused about what this test is doing and what it expects to happen at different points.

For example, the test_ routine, which gets called 10 times to test different sample loading locations, starts like this:

Code:

test_:
	jsr print_a
	pha
	eor #$FF
	pha
	
	setb $4012,<((dmc_sample-$C000)/$40)
	
	jsr pre_test
	
	jsr time_code_begin
	
	; Start DMC
	setb $4015,$10 ; fill sample buffer
	setb $4015,dma*$10
   ...
time_code_begin is just an alias for begin_dmc_timer, and dma is 1, meaning #$10 gets written to $4015 twice.

Does this code expect the first write to $4015 to immediately start a new sample? When I get there the last byte of the sample used for synchronization in begin_dmc_timer is still playing, meaning the first write to $4015 queues up another sample and the second write to $4015 has no effect at all.

Similarly, there's a call to time_code_end (an alias for end_dmc_timer) at the end of test_. end_dmc_timer starts with:

Code:

.align 64
end_dmc_timer:
	; Restart
	lda #$1F
	sta SNDCHN
	nop
	sta SNDCHN
	
	; Rough sync
	ldy #-$45
@coarse:
	nop
	lda #$10
	bne :+
:   dey
	bit SNDCHN
	bne @coarse
	
	; DO NOT write to memory. It affects timing.
	
	; Fine sync
	ldx #-$2
@sync:
   ....
When this routine gets called, the sample has already finished playing with some margin. Is this expected? What's the significance of the constant -$45?

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: DMA operation in APU

Post by blargg » Sat Aug 31, 2013 11:55 am

I believe this repeatedly does sprite DMA and has a DMC DMA read occur at various relative times, then shows how long the sprite DMA took, in cycles.

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Sat Aug 31, 2013 12:02 pm

Yup, got that much. What I'm trying to understand is how the test code itself works and what assumptions it makes, since my output is so far off despite passing all the APU tests.

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: DMA operation in APU

Post by blargg » Sat Aug 31, 2013 3:45 pm

I added more comments in sync_dmc.s and dmc_timer.s: sprdma_and_dmc_dma2.zip

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Mon Sep 02, 2013 2:19 pm

blargg wrote: I added more comments in sync_dmc.s and dmc_timer.s: sprdma_and_dmc_dma2.zip
Thanks.

Was occupied for a while, but I'll start looking into it now.

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Tue Sep 03, 2013 9:06 pm

There is something I don't get in end_dmc_timer, which might also be a clue to what's wrong with my code/understanding. The coarse sync part looks as follows:

Code:

; Returns in XA number of cycles elapsed since call to
; time_code_begin, MOD dmc_timer_modulo. Unreliable if
; result is dmc_timer_max or greater.
.align 64
end_dmc_timer:
	; The arbitrary starting X and Y values for the
	; loops merely set an adjustment added to the
	; final count.
	
	; Restart sample, which will immediately
	; finish since nothing's playing, then
	; start again which will ensure the flag
	; stays set until the second one begins.
	; This means that bit 4 of SNDCHN will be set
	; a fixed amount of time after begin_dmc_timer
	; completed.
	lda #$1F
	sta SNDCHN
	nop
	sta SNDCHN
	
	; Coarse sync
	; Get within a few cycles of when DMC sample finishes.
	; Keep a count since each iter is 16 cycles.
	ldy #-$45
@coarse:
	; 16 cycles/iter
	nop
	lda #$10
	bne :+
:       dey
	bit SNDCHN
	bne @coarse
	...
ldy #-$45 is the same as ldy #187, and y will be decremented every 16 cycles until one sample byte has played.

Since the period is set to 428, playing one sample byte should take roughly between 428*7 and 428*8 cycles, depending on the current alignment with the DMC timer. However, (428*7)/16 = 187.25 and (428*8)/16 = 214, meaning y will either end up very small or underflow. For y << 4 | x to have the expected value of ~528, y would have to be about 32, which seems impossible.

Any ideas? Is this some off-by-one error? :|
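
For anyone following along, the numbers in this post can be checked with a few lines of Python. This is only a sketch of the arithmetic, not part of the test ROM; the names DMC_PERIOD, LOOP_CYCLES and as_u8 are made up for illustration.

```python
# Arithmetic check for the post above (illustrative names, not from the ROM).
DMC_PERIOD = 428    # CPU cycles per DMC bit at the slowest rate, as stated above
LOOP_CYCLES = 16    # cycles per @coarse iteration, per the source comment

def as_u8(value):
    """Wrap a signed immediate to the 8-bit value the 6502 actually loads."""
    return value & 0xFF

y_start = as_u8(-0x45)                      # ldy #-$45
min_iters = DMC_PERIOD * 7 / LOOP_CYCLES    # sample byte takes at least 7 bit periods
max_iters = DMC_PERIOD * 8 / LOOP_CYCLES    # and at most 8 bit periods

print(y_start, min_iters, max_iters)
```

Under the assumption made in this post (the loop always waits for 7 to 8 full bit periods), Y would indeed be driven to roughly zero or wrap around, which is why the expected result looked impossible.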

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: DMA operation in APU

Post by blargg » Wed Sep 04, 2013 12:30 am

OK, I've given in and done a careful timing analysis so we can see exactly how this works (and thankfully it all squares away):

Code:

begin_dmc_timer:
    php
    jsr sync_dmc
    
sync_dmc:
        ...
        bit SNDCHN      ;   reads just as bit 4 is cleared
                        ;   new bit cycle begins at current high rate
        bne @sync       ; 3
                        ; -1
        pla             ; 4
        rts             ; 6

    pha                 ; 3
    lda #$00            ; 2
    sta $4010           ; 4 switch to lowest rate for remaining 7 bits
    pla                 ; 4
    
    nop                 ; 2
    plp                 ; 4
    rts                 ; 6

    ... code
    
    jsr end_dmc_timer   ; 6
    
end_dmc_timer:
    lda #$1F            ; 2
    sta SNDCHN          ; 4 starts immediately, thus causes DMA read now
                        ; 4 DMC DMA read
    nop                 ; 2
    sta SNDCHN          ; 4
    
    ldy #-$45           ; 2
@coarse:
    ; 16 cycles/iter
    nop                 ; 2
    lda #$10            ; 2
    bne :+              ; 3
:   dey                 ; 2
    bit SNDCHN          ; 4 reads 74+timed code cycles after earlier BIT SNDCHN
    bne @coarse         ; 3
The DMC timer's rate doesn't change until the next bit, so the first bit is 54 cycles, and the remaining 7 bits are 428 cycles. So SNDCHN bit 4 will be cleared 54+428*7=3050 cycles later. Thus if the loop finds it clear the first time through, it's been at minimum 3050 cycles since the earlier BIT SNDCHN, or 2976 cycles of user code between calls. 2976/16 = 186, so Y should be 186 if the coarse loop never iterates. -$45 is 187, and the DEY always executes at least once, so this gives the correct Y.
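
The bookkeeping in that paragraph can be replayed mechanically. The sketch below just sums the per-instruction cycle counts from the annotated listing (with the untaken BNE counted as 2) and redoes the division; the variable names are illustrative and not taken from the test sources.

```python
# Cycle counts copied from the annotated listing above.
after_first_read = [
    2, 4, 6,        # bne @sync (not taken: 3 - 1), pla, rts
    3, 2, 4, 4,     # pha, lda #$00, sta $4010, pla
    2, 4, 6,        # nop, plp, rts
]
before_second_read = [
    6,              # jsr end_dmc_timer
    2, 4, 4,        # lda #$1F, sta SNDCHN, DMC DMA read
    2, 4, 2,        # nop, sta SNDCHN, ldy #-$45
    2, 2, 3, 2, 4,  # nop, lda #$10, bne, dey, bit SNDCHN
]

overhead = sum(after_first_read) + sum(before_second_read)
flag_clear_at = 54 + 428 * 7           # one fast bit, then seven slow bits
user_cycles = flag_clear_at - overhead
y_if_no_iteration = user_cycles // 16  # 16 cycles per coarse iteration
y_after_one_dey = ((-0x45) & 0xFF) - 1 # ldy #-$45, then the first DEY

print(overhead, flag_clear_at, user_cycles, y_if_no_iteration, y_after_one_dey)
```

The last two values agree, which is the "correct Y" the paragraph refers to.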

Code:

                        ; -1
    ldx #-$2            ; 2
@sync:
    lda #$1F            ; 2
    sta SNDCHN          ; 4
                        ; 4 DMC DMA
    
    lda #179            ; 3402 delay
:   nop
    nop
    nop
    nop
    nop
    nop
    sec
    sbc #1
    bne :-
    
    inx                 ; 2
    lda #$10            ; 2
    bit SNDCHN          ; 4 reads 3424 cycles after BIT SNDCHN in coarse loop
                        ;   then every 3423 cycles
    beq @sync           ; 3
The fine sync loop reads at the same relative time to the DMC sample ending as the coarse loop, thus it always reads after the sample ended on the first iteration and loops back at least once. If the user code took 2976 cycles, it will find the sample not ended on the second iteration and exit the loop, leaving X at 0 as desired.

If the user code took one cycle more, the two loops would have read one cycle later, and the fine sync loop would have run one time more, leaving X one greater at 1.
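
As a rough check of the fine sync numbers, the delay loop and the full @sync iteration can be added up from the listing. This is a sketch with made-up names; the reading that each pass samples SNDCHN one cycle earlier relative to the DMC byte period follows from the arithmetic and is my interpretation rather than something stated in the sources.

```python
# Delay loop: six NOPs, SEC, SBC #1 and a taken BNE per pass.
DELAY_ITERS = 179
delay_pass = 6 * 2 + 2 + 2 + 3
# LDA #179 costs 2 cycles; the final BNE falls through and costs 1 cycle less.
delay = 2 + DELAY_ITERS * delay_pass - 1

# One full @sync iteration: lda, sta, DMC DMA, delay, inx, lda, bit, taken beq.
sync_iter = 2 + 4 + 4 + delay + 2 + 2 + 4 + 3

dmc_byte = 428 * 8               # eight bits at the slowest rate
drift_per_pass = dmc_byte - sync_iter

print(delay, sync_iter, dmc_byte, drift_per_pass)
```

Because an iteration is one cycle shorter than a DMC byte, each extra cycle of user code should cost exactly one extra pass through the loop, which matches the description of X counting single cycles.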

I think I used negative initial values for X and Y because I tuned this empirically; I started out with 0 for X and Y, timed some zero-cycle user code, then took the resulting value, broke it into the high 8 bits and low 4 bits, negated these, and put them in Y and X, so that I'd get 0. I never did the above careful timing analysis, because in my experience it's time-consuming and can easily overlook something critical, while empirical testing and tuning is efficient and reliable since it involves writing edge-case tests to find the stable range. It was fun to verify this by careful analysis though, so thanks for the opportunity.

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Wed Sep 04, 2013 3:48 pm

But wouldn't the coarse loop in end_dmc_timer always iterate between (428*7)/16 = 187.25 and (428*8)/16 = 214 times, regardless of timing? end_dmc_timer starts a new sample, and that should set the number of remaining bits to 8, meaning it'll have to go through 8 DMC clocks before loading the final sample byte and clearing SNDCHN bit 4.

Iterating between 187 and 214 times makes y way out of range compared to the expected value, so I must be missing something.

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: DMA operation in APU

Post by blargg » Wed Sep 04, 2013 3:52 pm

ulfalizer wrote: end_dmc_timer starts a new sample, and that should set the number of remaining bits to 8, meaning it'll have to go through 8 DMC clocks before loading the final sample byte and clearing SNDCHN bit 4.
I think we've found the problem :)
APU DMC wrote:
When an output cycle ends, a new cycle is started as follows:
* The bits-remaining counter is loaded with 8.
* If the sample buffer is empty, then the silence flag is set; otherwise, the silence flag is cleared and the sample buffer is emptied into the shift register.

When the timer outputs a clock, the following actions occur in order:
[...]
* The bits-remaining counter is decremented. If it becomes zero, a new cycle is started.

Nothing can interrupt a cycle; every cycle runs to completion before a new cycle is started.
If there's no sample byte loaded in time, it'll output eight silence bits, even if you get a sample byte loaded just after the first silence bit begins.
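
For emulator authors, here is a minimal Python sketch of the quoted rules. The class and attribute names are hypothetical, and the steps elided by "[...]" in the quote (whatever the timer clock does before decrementing the counter) are deliberately left out, so this only models when the unit can leave silence.

```python
class DmcOutputUnit:
    """Toy model of the quoted output-cycle rules (names are illustrative)."""

    def __init__(self):
        self.bits_remaining = 8
        self.silence = True
        self.sample_buffer = None   # None means "empty"
        self.shift_register = 0

    def start_cycle(self):
        # "The bits-remaining counter is loaded with 8."
        self.bits_remaining = 8
        if self.sample_buffer is None:
            self.silence = True
        else:
            self.silence = False
            self.shift_register = self.sample_buffer
            self.sample_buffer = None

    def clock(self):
        # Steps elided in the quote are intentionally not modelled here.
        self.bits_remaining -= 1
        if self.bits_remaining == 0:
            self.start_cycle()


unit = DmcOutputUnit()
unit.clock()                 # one silence bit has already gone by
unit.sample_buffer = 0xAA    # a sample byte arrives just too late

clocks = 0
while unit.silence:
    unit.clock()
    clocks += 1
print(clocks)
```

The byte that arrives after the first silence bit still has to wait out the other seven bits of the silent cycle before it is moved into the shift register.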

Just to further verify, when I change end_dmc_timer to load Y and X with zero before the loops and uncomment the jsr print_y/jsr print_x lines, and run this,

Code:

jsr time_code_begin
jsr time_code_end
I get 45 02. So for zero user cycles, $45 needs to be subtracted from Y at that point, and $02 from X. So it's equivalent to just load Y with -$45 and X with -$02 before the loops.

(And can I get an amen! I've finally begun porting my console development programming setup to my Linux box, and was able to do this testing just now without having to power up my old Mac. Very convenient now.)

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Wed Sep 04, 2013 4:24 pm

Ahh, that explains it. Hadn't realized silence on the DPCM channel worked like that.

So the bits remaining count is updated each DMC clock regardless of whether a sample is playing or not, and the DPCM can only transition from silent to playing at the boundary after a "silent sample byte"?

Thanks!

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: DMA operation in APU

Post by blargg » Wed Sep 04, 2013 4:49 pm

ulfalizer wrote: So the bits remaining count is updated each DMC clock regardless of whether a sample is playing or not, and the DPCM can only transition from silent to playing at the boundary after a "silent sample byte"?
Right.

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Wed Sep 04, 2013 4:59 pm

A test for that would be nice if you ever update the apu_test tests. I pass all of them without implementing it.

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Sat Sep 07, 2013 12:36 am

Started digging into the circuitry a bit, and it looks like the 2A03 handles rdy for OAM DMA and PCM reads in a pretty clever way. Rather than just pulling rdy low for a fixed safe "minimum" time, it first pulls it low and then waits for a CPU read. Once it sees a read, it knows rdy must have kicked in, and moves on to doing the transfer.
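
A tiny sketch of that idea in Python, for illustration only. It assumes (as I understand the 6502) that pulling rdy low does not stop the CPU during write cycles, so the DMA logic has to wait for the next read. The function name and the 'R'/'W' string encoding are made up here and are not taken from the circuit.

```python
def cycles_until_halt(upcoming_cycles):
    """Count the bus cycles that still complete after rdy goes low.

    upcoming_cycles is a sequence of 'R' (read) and 'W' (write) cycles the
    CPU would perform next. The CPU only stops on a read, so every write
    before the first read still goes through.
    """
    for index, kind in enumerate(upcoming_cycles):
        if kind == 'R':
            return index
    raise ValueError("no read cycle in the given sequence")


print(cycles_until_halt("R"))     # halts right away
print(cycles_until_halt("WWR"))   # two writes finish first
```

Waiting for an observed read instead of a fixed worst-case delay means the transfer can start as early as the CPU allows in each case.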

ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Re: DMA operation in APU

Post by ulfalizer » Wed Sep 11, 2013 6:02 am

In case it's helpful to future people pulling their hair out over this one:

To pass, it's important that the first $4014 write lands on an even cycle (and so only adds a single dummy cycle). Since the test synchronizes to the DMC, getting this right depends on the DMC clocks happening on even cycles too, which in turn depends on the power-on value of the DMC timer (the one counting down and getting reloaded with the period when it reaches 0). On the real thing, I think the power-on value is 428 (or the equivalent at least - it uses a linear feedback shift register), though only the even/oddness should matter for this test.

If you get 528,527,528,527,... instead of 527,528,527,528,..., that's likely the problem.
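
A minimal sketch of the parity dependence, assuming the usual model where sprite DMA is 256 read/write pairs (512 cycles) plus one dummy cycle, or two when the $4014 write lands on the "wrong" parity. The function name is hypothetical, and "even" follows the convention used in this post; your emulator's cycle numbering may be offset by one.

```python
def oam_dma_cycles(write_cycle):
    """CPU cycles stolen by a $4014 write landing on the given cycle index."""
    dummy = 1 if write_cycle % 2 == 0 else 2   # extra alignment cycle on odd
    return dummy + 256 * 2                     # 256 read/write pairs


print(oam_dma_cycles(0), oam_dma_cycles(1))
```

If the DMC clocks in your emulator fall on the opposite parity from hardware, every $4014 write in the test flips between these two cases, which would swap the alternating results as described above.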
