DMA operation in APU
Re: DMA operation in APU
blargg wrote: Full sources + rom: sprdma_and_dmc_dma.zip
Thanks!
Still a bit confused as to what this test is doing and expects to happen at different points.
For example, the test_ routine, which gets called 10 times to test different sample loading locations, starts like this:
Code:
test_:
jsr print_a
pha
eor #$FF
pha
setb $4012,<((dmc_sample-$C000)/$40)
jsr pre_test
jsr time_code_begin
; Start DMC
setb $4015,$10 ; fill sample buffer
setb $4015,dma*$10
...
time_code_begin is just an alias for begin_dmc_timer, and dma is 1, meaning #$10 gets written to $4015 twice.
Does this code expect the first write to $4015 to immediately start a new sample? When I get there, the last byte of the sample used for synchronization in begin_dmc_timer is still playing, so the first write to $4015 queues up another sample and the second write to $4015 has no effect at all.
Similarly, there's a call to time_code_end (an alias for end_dmc_timer) at the end of test_. end_dmc_timer starts with:
Code:
.align 64
end_dmc_timer:
; Restart
lda #$1F
sta SNDCHN
nop
sta SNDCHN
; Rough sync
ldy #-$45
@coarse:
nop
lda #$10
bne :+
: dey
bit SNDCHN
bne @coarse
; DO NOT write to memory. It affects timing.
; Fine sync
ldx #-$2
@sync:
...
When this routine gets called, the sample has already finished playing with some margin. Is this expected? What's the significance of -$45 as a constant?
Re: DMA operation in APU
I believe this repeatedly does sprite DMA and has a DMC DMA read occur at various relative times, then shows how long the sprite DMA took, in cycles.
Re: DMA operation in APU
Yup, got that much. Trying to understand how the test code itself works and what assumptions it makes though since the output is so off despite passing all the APU tests.
Re: DMA operation in APU
I added more comments in sync_dmc.s and dmc_timer.s: sprdma_and_dmc_dma2.zip
Re: DMA operation in APU
blargg wrote: I added more comments in sync_dmc.s and dmc_timer.s: sprdma_and_dmc_dma2.zip
Thanks.
Was occupied for a while, but I'll start looking into it now.
Re: DMA operation in APU
There is something I don't get in end_dmc_timer, which might also be a clue to what's wrong with my code/understanding. The coarse sync part looks as follows:
Code:
; Returns in XA number of cycles elapsed since call to
; time_code_begin, MOD dmc_timer_modulo. Unreliable if
; result is dmc_timer_max or greater.
.align 64
end_dmc_timer:
; The arbitrary starting X and Y values for the
; loops merely set an adjustment added to the
; final count.
; Restart sample, which will immediately
; finish since nothing's playing, then
; start again which will ensure the flag
; stays set until the second one begins.
; This means that bit 4 of SNDCHN will be set
; a fixed amount of time after begin_dmc_timer
; completed.
lda #$1F
sta SNDCHN
nop
sta SNDCHN
; Coarse sync
; Get within a few cycles of when DMC sample finishes.
; Keep a count since each iter is 16 cycles.
ldy #-$45
@coarse:
; 16 cycles/iter
nop
lda #$10
bne :+
: dey
bit SNDCHN
bne @coarse
...
ldy #-$45 is the same as ldy #187, and Y is decremented every 16 cycles until one sample byte has played.
Since the period is set to 428, it should take roughly between 428*7 and 428*8 cycles to play one sample byte, depending on the current alignment with the DMC timer. However, (428*7)/16 = 187.25 and (428*8)/16 = 214, meaning Y will either end up very small or underflow. For y << 4 | x to have the expected value of ~528, Y would have to be about 33 (528 >> 4), which seems impossible.
Any ideas? Is this some off-by-one error?

Re: DMA operation in APU
OK, I've given in and done a careful timing analysis so we can see exactly how this works (and thankfully it all squares away):
The DMC timer's rate doesn't change until the next bit, so the first bit is 54 cycles, and the remaining 7 bits are 428 cycles. So SNDCHN bit 4 will be cleared 54+428*7=3050 cycles later. Thus if the loop finds it clear the first time through, it's been at minimum 3050 cycles since the earlier BIT SNDCHN, or 2976 cycles of user code between calls. 2976/16 = 186, so Y should be 186 if the coarse loop never iterates. -$45 is 187, and the DEY always executes at least once, so this gives the correct Y.
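The numbers in the paragraph above can be re-derived in a few lines of Python. This is only a sketch: every constant (54 cycles for the first bit at the fast sync rate, 428 per bit at rate $00, 74 cycles of fixed overhead between the two BIT SNDCHN reads, 16 cycles per coarse iteration) is copied from the analysis, and the variable names are my own.

```python
FIRST_BIT = 54   # first bit still runs at the fast rate sync_dmc synchronized with
SLOW_BIT = 428   # the other 7 bits at rate $00 (after sta $4010 with #$00)
OVERHEAD = 74    # fixed cycles between the two BIT SNDCHN reads, excluding user code
ITER = 16        # cycles per @coarse iteration

clear_after = FIRST_BIT + 7 * SLOW_BIT   # SNDCHN bit 4 clears this long after the earlier read
user_code = clear_after - OVERHEAD       # user-code length where the first coarse read lands on that moment
y_start = -0x45 & 0xFF                   # the 8-bit value ldy #-$45 actually loads
y_after = (y_start - 1) & 0xFF           # the DEY before the first read always runs once

print(clear_after, user_code, user_code // ITER, y_start, y_after)
# 3050 2976 186 187 186
```

So the "arbitrary" -$45 is exactly one more than 2976 / 16, which is what makes Y come out right when the coarse loop does not iterate.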
The fine sync loop reads at the same time relative to the end of the DMC sample as the coarse loop did, so on its first iteration it always reads after the sample has ended and loops back at least once. If the user code took 2976 cycles, it will find the sample not yet ended on the second iteration and exit the loop, leaving X at 0 as desired.
If the user code took one cycle more, the two loops would have read one cycle later, and the fine sync loop would have run one time more, leaving X one greater at 1.
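Putting both loops together, here is a Python model of what Y and X end up as for a given length of timed code. It is my reading of the analysis, not the routine itself, and it assumes: a read at or after the sample end sees bit 4 clear; the sample ends 3050 cycles after the earlier BIT SNDCHN, 74 of which are fixed overhead; coarse reads are 16 cycles apart; the first fine read lands one full 8-bit period (3424 cycles) after the last coarse read and later ones every 3423 cycles. The function name is hypothetical, and how the real routine folds Y and X into its XA result is elided in the excerpt.

```python
END_OFFSET = 3050 - 74  # 2976: user-code length at which the first coarse read hits the sample end


def end_dmc_timer_model(user_cycles):
    """Return (Y, X) as the coarse and fine loops would leave them (model only)."""
    # Coarse loop: Y starts at -$45 (187); DEY, then a BIT SNDCHN read, every 16 cycles.
    y = -0x45 & 0xFF
    t = user_cycles - END_OFFSET       # time of the read relative to the sample end
    while True:
        y = (y - 1) & 0xFF
        if t >= 0:                     # bit 4 clear: sample ended, BNE falls through
            break
        t += 16
    phase = t                          # 0..15 cycles past the end

    # Fine loop: X starts at -2; INX each pass. The first read is at the same
    # phase; each later pass is 3423 cycles, one cycle short of the 3424-cycle
    # byte period, so the read moves one cycle earlier relative to the end.
    x = -2 & 0xFF
    while True:
        x = (x + 1) & 0xFF
        if phase < 0:                  # sample not ended yet: BEQ falls through
            break
        phase -= 1
    return y, x


print(end_dmc_timer_model(2976))  # (186, 0)
print(end_dmc_timer_model(2977))  # (186, 1)
print(end_dmc_timer_model(528))   # (33, 0)
```

In this model Y*16 + X equals the timed code's length for anything from 0 to 2991 cycles, which fits the y << 4 | x ≈ 528 (Y = 33) expectation raised earlier in the thread.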
I think I used negative initial values for X and Y because I tuned this empirically: I started out with 0 for X and Y, timed some zero-cycle user code, then took the resulting value, broke it into the high 8 bits and low 4 bits, negated these, and put them in Y and X, so that I'd get 0. I never did the above careful timing analysis, because in my experience it's time-consuming and it's easy to overlook something critical, while empirical testing and tuning is efficient and reliable since it involves writing edge-case tests to find the stable range. It was fun to verify this by careful analysis though, so thanks for the opportunity.
Code:
begin_dmc_timer:
php
jsr sync_dmc
sync_dmc:
...
bit SNDCHN ; reads just as bit 4 is cleared
; new bit cycle begins at current high rate
bne @sync ; 3
; -1
pla ; 4
rts ; 6
pha ; 3
lda #$00 ; 2
sta $4010 ; 4 switch to lowest rate for remaining 7 bits
pla ; 4
nop ; 2
plp ; 4
rts ; 6
... code
jsr end_dmc_timer ; 6
end_dmc_timer:
lda #$1F ; 2
sta SNDCHN ; 4 starts immediately, thus causes DMA read now
; 4 DMC DMA read
nop ; 2
sta SNDCHN ; 4
ldy #-$45 ; 2
@coarse:
; 16 cycles/iter
nop ; 2
lda #$10 ; 2
bne :+ ; 3
: dey ; 2
bit SNDCHN ; 4 reads 74+timed code cycles after earlier BIT SNDCHN
bne @coarse ; 3
Code:
; -1
ldx #-$2 ; 2
@sync:
lda #$1F ; 2
sta SNDCHN ; 4
; 4 DMC DMA
lda #179 ; 3402 delay
: nop
nop
nop
nop
nop
nop
sec
sbc #1
bne :-
inx ; 2
lda #$10 ; 2
bit SNDCHN ; 4 reads 3424 cycles after BIT SNDCHN in coarse loop
; then every 3423 cycles
beq @sync ; 3
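As a cross-check of the per-instruction annotations in the two listings, this Python snippet just sums them. It assumes standard 6502 timings with no page-crossing penalties (taken branch 3 cycles, not taken 2) and the 4-cycle DMC DMA stall annotated after each sta SNDCHN; the list names are mine.

```python
# Coarse loop body: nop, lda #$10, bne (taken), dey, bit SNDCHN, bne (taken)
COARSE_ITER = [2, 2, 3, 2, 4, 3]

# Delay: lda #179 (2), then 179 passes of 6x nop + sec + sbc #1 + bne (taken);
# the final bne falls through, saving one cycle.
DELAY = 2 + 179 * (6 * 2 + 2 + 2 + 3) - 1

# Coarse BIT read -> first fine BIT read:
# bne not taken, ldx, lda #$1F, sta SNDCHN, DMC DMA, delay, inx, lda #$10, bit
FIRST_READ = [2, 2, 2, 4, 4, DELAY, 2, 2, 4]
# Later passes start with a taken beq instead of bne/ldx:
LATER_READ = [3, 2, 4, 4, DELAY, 2, 2, 4]

print(sum(COARSE_ITER), DELAY, sum(FIRST_READ), sum(LATER_READ))
# 16 3402 3424 3423
```

3424 is exactly 8 * 428, one sample byte at the slow rate, so the first fine read has the same phase as the last coarse read; each later pass is one cycle shorter, which is what walks the read point back one cycle at a time.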
Re: DMA operation in APU
But wouldn't the coarse loop in end_dmc_timer always iterate between (428*7)/16 = 187.25 and (428*8)/16 = 214 times, regardless of timing? end_dmc_timer starts a new sample, and that should set the number of remaining bits to 8, meaning it'll have to go through 8 DMC clocks before loading the final sample byte and clearing SNDCHN bit 4.
Iterating between 187 and 214 times makes y way out of range compared to the expected value, so I must be missing something.
Re: DMA operation in APU
Quote:
end_dmc_timer starts a new sample, and that should set the number of remaining bits to 8, meaning it'll have to go through 8 DMC clocks before loading the final sample byte and clearing SNDCHN bit 4.
I think we've found the problem.

If there's no sample byte loaded in time, it'll output eight silence bits, even if you get a sample byte loaded just after the first silence bit begins.
APU DMC wrote:
When an output cycle ends, a new cycle is started as follows:
* The bits-remaining counter is loaded with 8.
* If the sample buffer is empty, then the silence flag is set; otherwise, the silence flag is cleared and the sample buffer is emptied into the shift register.
When the timer outputs a clock, the following actions occur in order:
[...]
* The bits-remaining counter is decremented. If it becomes zero, a new cycle is started.
Nothing can interrupt a cycle; every cycle runs to completion before a new cycle is started.
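To make the quoted rules concrete, here is a toy Python model of just the output unit: timer clocks, the bits-remaining counter, the silence flag, the shift register and the one-byte sample buffer. It is a simplification for illustration (the memory reader, DMA stalls and output level are left out, and the class and method names are made up), but it shows the behavior in question: a byte that arrives just after a silent cycle has begun is not played until all eight silent bits have elapsed.

```python
class DmcOutputUnit:
    """Toy model of the DMC output-unit rules quoted above (reader/DMA omitted)."""

    def __init__(self):
        self.bits_remaining = 8
        self.silence = True      # start inside a silent output cycle
        self.shift = 0
        self.buffer = None       # None means the sample buffer is empty

    def load_buffer(self, byte):
        """Stand-in for the memory reader filling the sample buffer."""
        self.buffer = byte

    def _start_cycle(self):
        # "When an output cycle ends, a new cycle is started as follows"
        self.bits_remaining = 8
        if self.buffer is None:
            self.silence = True
        else:
            self.silence = False
            self.shift = self.buffer
            self.buffer = None

    def clock(self):
        """One timer clock (the output-level update is omitted)."""
        self.shift >>= 1
        self.bits_remaining -= 1
        if self.bits_remaining == 0:
            self._start_cycle()


unit = DmcOutputUnit()   # a silent cycle has just begun
unit.load_buffer(0xAA)   # a byte arrives right after that
clocks = 0
while unit.silence:
    unit.clock()
    clocks += 1
print(clocks)            # 8: the byte waits out the whole silent cycle
```

Restarting the sample via $4015 only affects the reader side; in this model nothing resets bits_remaining except the end of an output cycle, which is why end_dmc_timer's restart doesn't start a fresh 8-bit count.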
Just to further verify, when I change end_dmc_timer to load Y and X with zero before the loops and uncomment the jsr print_y/jsr print_x lines, and run this,
Code:
jsr time_code_begin
jsr time_code_end
(And can I get an amen! I've finally begun porting my console development programming setup to my Linux box, and was able to do this testing just now without having to power up my old Mac. Very convenient now.)
Re: DMA operation in APU
Ahh, that explains it. Hadn't realized silence on the DPCM channel worked like that. 
So the bits remaining count is updated each DMC clock regardless of whether a sample is playing or not, and the DPCM can only transition from silent to playing at the boundary after a "silent sample byte"?
Thanks!

Re: DMA operation in APU
Quote:
So the bits remaining count is updated each DMC clock regardless of whether a sample is playing or not, and the DPCM can only transition from silent to playing at the boundary after a "silent sample byte"?
Right.
Re: DMA operation in APU
A test for that would be nice if you ever update the apu_test tests. I pass all of them without implementing it.
Re: DMA operation in APU
Started digging into the circuitry a bit, and it looks like the 2A03 handles rdy for OAM DMA and PCM reads in a pretty clever way. Rather than just pulling rdy low for a fixed safe "minimum" time, it first pulls it low and then waits for a CPU read. Once it sees a read, it knows rdy must have kicked in, and moves on to doing the transfer.
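The handshake described above can be sketched in Python. It leans on one documented NMOS 6502 property: RDY is ignored during write cycles, so pulling it low only stops the CPU on its next read cycle (at most three writes occur in a row, as in an interrupt's pushes). The function name and the "r"/"w" string encoding are made up for the illustration; this models only the behavior described in the post, not the 2A03's actual circuitry.

```python
def cycles_until_halt(upcoming):
    """Given the kinds of the CPU's next bus cycles ('r' = read, 'w' = write)
    after RDY is pulled low, return how many cycles the CPU keeps running.

    Writes ignore RDY, so the DMA logic waits until it observes a read; that
    read is the first cycle on which the CPU is actually halted, and only then
    does it move on to the transfer.
    """
    for i, kind in enumerate(upcoming):
        if kind == "r":
            return i
    raise ValueError("CPU never performed a read cycle")


print(cycles_until_halt("r"))     # 0: the next cycle is already a read
print(cycles_until_halt("wwwr"))  # 3: e.g. three interrupt pushes, then a read
```

Waiting for an observed read rather than a fixed worst-case delay means the stall is never longer than needed.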
Re: DMA operation in APU
In case it's helpful to future people pulling their hair out over this one:
To pass, it's important that the first $4014 write lands on an even cycle (and so only adds a single dummy cycle). Since the test synchronizes to the DMC, getting this right depends on the DMC clocks happening on even cycles too, which in turn depends on the power-on value of the DMC timer (the one counting down and getting reloaded with the period when it reaches 0). On the real thing, I think the power-on value is 428 (or the equivalent at least - it uses a linear feedback shift register), though only the even/oddness should matter for this test.
If you get 528,527,528,527,... instead of 527,528,527,528,..., that's likely the problem.
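A minimal Python sketch of the parity reasoning. The assumptions are simplifications for illustration: OAM DMA costs 513 cycles when the $4014 write lands on an even cycle and 514 on an odd one (following the post's convention), and the DMC timer is modeled as firing after its power-on count and then every 428 CPU cycles. Since 428 is even, every DMC clock inherits the parity of the power-on count, which is why only its even/oddness matters. The function names are hypothetical.

```python
def oam_dma_cycles(write_cycle):
    """Stall for a $4014 write: even cycle -> 1 dummy + 512 = 513, odd -> 514."""
    return 513 if write_cycle % 2 == 0 else 514


def dmc_clock_cycles(power_on_count, period, n):
    """First n CPU cycles on which a down-counting timer fires, assuming it
    fires after power_on_count cycles and then every period cycles."""
    return [power_on_count + k * period for k in range(n)]


for power_on in (428, 427):
    clocks = dmc_clock_cycles(power_on, 428, 4)
    print(power_on, clocks, {c % 2 for c in clocks})
# 428 [428, 856, 1284, 1712] {0}
# 427 [427, 855, 1283, 1711] {1}
```

If code synchronized to those clocks issues its first $4014 write a fixed number of cycles later, flipping the power-on parity flips which writes pay the extra cycle.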