2A03-D+PCM - Method for high-quality low-CPU sample playback

VitinhoCarneiro
Posts: 2
Joined: Sun Oct 09, 2016 7:22 am
Location: Rio de Janeiro, Brazil

Post by VitinhoCarneiro » Sun Oct 09, 2016 7:48 am

So, I was brainstorming yesterday and came up with a method for sample playback on the NES that offers high quality, little CPU overhead and a small size (2 bits per sample on average).

And this is the result: https://github.com/VitinhoCarneiro/2a03-d-plus-pcm

Unfortunately I'm not any good at 6502 ASM programming (though I might give it a try), so I only simulated the encoding/decoding process in C. There are no ROMs or demos yet, but I've uploaded a sample of the program output here: https://www.youtube.com/watch?v=qgnXLjX4EMI

This codec is based on ideas from za909's MintaBOOM sample engine, available here: viewtopic.php?f=22&t=14520

How it works:
-The codec combines 7-bit PCM with DPCM deltas and uses the DMC interrupt to trigger samples.
-The audio is divided into 2-byte blocks, each containing 8 samples.
-The first byte is a 7-bit PCM value to be written to the DMC delta counter ($4011).
-The second byte is a pointer into a table of 256 1-byte DPCM samples. The table is simply the sequence 0x00 to 0xFF, so the value of the pointer is equal to the value stored at that entry.
-On playback, the PCM value is written to the delta counter, then the 1-byte DPCM sample is triggered with the interrupt set to fire at the end of the sample; the CPU then goes back to running normal code until the interrupt fires.
-The sample rate can be varied simply by changing the DMC frequency value, since that is what drives sample playback; this also opens up possibilities for sample re-pitching.
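A minimal C sketch of the block format described above, assuming the standard 2A03 DMC delta rules (bits consumed least significant bit first; a 1 bit adds 2 and a 0 bit subtracts 2, with no change at the 0..127 limits). The helper names are hypothetical, and the brute-force `encode_block` is an illustrative stand-in, not the encoder in the linked repository.

```c
#include <stdint.h>

/* One DMC output step: a 1 bit raises the 7-bit level by 2 unless it is
   above 125, a 0 bit lowers it by 2 unless it is below 2. */
static uint8_t dmc_step(uint8_t level, int bit)
{
    if (bit) {
        if (level <= 125) level += 2;
    } else {
        if (level >= 2) level -= 2;
    }
    return level;
}

/* Decode one 2-byte block: pcm goes to the delta counter ($4011), then the
   8 bits of the 1-byte DPCM sample are applied, least significant bit first. */
static void decode_block(uint8_t pcm, uint8_t deltas, uint8_t out[8])
{
    uint8_t level = pcm & 0x7F;
    for (int i = 0; i < 8; i++) {
        level = dmc_step(level, (deltas >> i) & 1);
        out[i] = level;
    }
}

/* Squared error between a decoded block and 8 target levels (0..127). */
static uint32_t block_error(uint8_t pcm, uint8_t deltas, const uint8_t target[8])
{
    uint8_t out[8];
    uint32_t err = 0;
    decode_block(pcm, deltas, out);
    for (int i = 0; i < 8; i++) {
        int d = (int)out[i] - (int)target[i];
        err += (uint32_t)(d * d);
    }
    return err;
}

/* Hypothetical brute-force encoder: try all 128 x 256 (pcm, deltas) pairs
   and keep the first pair with the lowest squared error. */
static void encode_block(const uint8_t target[8], uint8_t *pcm, uint8_t *deltas)
{
    uint32_t best = UINT32_MAX;
    for (int p = 0; p < 128; p++) {
        for (int d = 0; d < 256; d++) {
            uint32_t e = block_error((uint8_t)p, (uint8_t)d, target);
            if (e < best) {
                best = e;
                *pcm = (uint8_t)p;
                *deltas = (uint8_t)d;
            }
        }
    }
}
```

For example, a rising ramp 10, 12, ..., 24 encodes exactly as PCM byte 8 followed by delta byte 0xFF. An exhaustive search is only 32768 trials per block, which is cheap on a PC encoder; a real encoder would likely also account for the error carried into the next block.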

Based on these ideas, this method of sample playback should be very easy to integrate into games/demos (as long as they don't require cycle-accurate execution, since the interrupt will disrupt their timing), because triggering a new block of samples each time the interrupt fires should take very few CPU cycles.

Besides its low overhead, this codec also has a high compression rate (2 bits per sample, as opposed to the usual 8-bit padded PCM) while keeping high quality (even at ~32 kbps ($C playback rate), it sounds much better than plain DPCM at $F).
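The bitrate comparison can be checked with a little arithmetic. This sketch assumes the NTSC DMC rate table (CPU cycles per output bit) and a 1.789773 MHz CPU clock; the function names are hypothetical. At 2 bits per sample, rate $C works out to roughly 33.8 kbps, about the same as plain 1-bit DPCM at $F (roughly 33.1 kbps).

```c
/* NTSC 2A03 DMC rate table: CPU cycles per output bit for rate index $0-$F. */
static const unsigned dmc_period_ntsc[16] = {
    428, 380, 340, 320, 286, 254, 226, 214,
    190, 160, 142, 128, 106,  84,  72,  54
};

#define NTSC_CPU_HZ 1789773.0

/* Output sample rate in Hz for a given $4010 rate index. */
static double dmc_sample_rate(unsigned rate_index)
{
    return NTSC_CPU_HZ / dmc_period_ntsc[rate_index & 15u];
}

/* Data rate in kbit/s: 1 bit/sample for plain DPCM,
   2 bits/sample for D+PCM (16 bits per 8-sample block). */
static double bitrate_kbps(unsigned rate_index, double bits_per_sample)
{
    return dmc_sample_rate(rate_index) * bits_per_sample / 1000.0;
}

/* Blocks (and therefore IRQs) per second: one per 8 samples. */
static double block_rate(unsigned rate_index)
{
    return dmc_sample_rate(rate_index) / 8.0;
}
```

`block_rate(0xF)` comes out at about 4.1 kHz, which matches the ~4 kHz jump rate discussed in the replies.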

I might try making a simple NES decoder for sample playback as a proof of concept for this method. But anyone is free to study and modify the source code, improve on it, and even make demos out of it.

I'd love to hear your feedback on this. I feel like this could be a game-changer for sample playback on the NES.

(PS: I'm thinking about a version of this that will use only 1 byte per 8 samples, using 4-bit ADPCM and vector-quantized DPCM sample blocks. I'll have to test whether it decompresses well, though...)

rainwarrior
Posts: 7879
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior » Sun Oct 09, 2016 10:19 am

So... this is the same as the regular 1-bit DPCM format but augmented by letting it do an arbitrary jump every 8 samples?

Hmm, I would guess that the jumps at ~4kHz would split the bandwidth, giving more fidelity to lower frequencies (under ~2kHz) via the added jumps, and possibly make more headroom for higher frequencies in the 1-bit DPCM stream? I could see this being a significant improvement, though it would be easier to hear with an example if you made a video/recording comparing plain 1-bit DPCM encoding to your method, side by side.


Some thoughts about implementing it:

DPCM samples have a 64-byte memory alignment, so a table of 256 1-byte DPCM samples actually requires 16k of space. Something with RAM in the DPCM area (e.g. FDS) could avoid having a table, though. (You could stick other data in the 63 bytes between samples if you needed to use that space, but it's quite inconvenient.)
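The alignment follows from how the DMC registers encode a sample: $4012 holds (address - $C000) / 64 and $4013 holds (length - 1) / 16. A sketch of that arithmetic, plus a hypothetical builder for the 256-entry table, shows that with entry v holding the byte v, the table spans $C000-$FFC0 and the delta byte from the stream can be written to $4012 unchanged. The function names are illustrative assumptions.

```c
#include <stdint.h>

/* $4012: sample start address = $C000 + value * 64. */
static uint16_t dmc_address(uint8_t reg4012)
{
    return (uint16_t)(0xC000u + (unsigned)reg4012 * 64u);
}

/* $4013: sample length in bytes = value * 16 + 1. */
static unsigned dmc_length(uint8_t reg4013)
{
    return (unsigned)reg4013 * 16u + 1u;
}

/* Hypothetical image of $C000-$FFFF: entry v of the one-byte sample table
   sits at offset v * 64 and contains v itself, leaving 63 free bytes
   after each entry. */
static void build_delta_table(uint8_t region[0x4000])
{
    for (unsigned v = 0; v < 256u; v++)
        region[v * 64u] = (uint8_t)v;
}
```

With $4013 = 0 each entry plays as a 1-byte sample, which is the length this scheme relies on.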

Also the IRQ happens when the DPCM sample byte is fetched, not when it's finished playing, so the stream should have the DPCM data shifted ahead by one. (This might make using the IRQ trickier.)

Your IRQ will happen every 432 cycles (8 samples at 54 cycles each, at rate $F), so I'd guess this would take up at least 10% of the CPU?
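A rough way to reproduce that estimate is to divide the handler's cost by the cycles between IRQs (8 DMC periods). The 43-cycle handler cost used below is an assumed figure for illustration, not a measured one; at rate $F it gives just under 10%, and at $C about half that. The macro and function names are hypothetical.

```c
/* CPU cycles per DMC output bit at rates $C and $F (NTSC). */
#define DMC_PERIOD_C 106u
#define DMC_PERIOD_F 54u

/* Fraction of CPU time spent in the IRQ handler when it runs once per
   8-sample block; handler_cycles should include interrupt entry and RTI. */
static double irq_cpu_share(unsigned dmc_period, unsigned handler_cycles)
{
    unsigned cycles_between_irqs = 8u * dmc_period; /* 432 at $F, 848 at $C */
    return (double)handler_cycles / (double)cycles_between_irqs;
}
```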

Games also have the problem of needing to do an OAM DMA once per frame, which takes 514 cycles and overlaps at least one 1-byte sample, so there's a problem to solve regarding that interruption. (If you aren't accommodating sprite animation, it might be acceptable to use a lot more CPU?)
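The overlap can be counted directly: a window of W cycles always contains at least floor(W/B) and at most ceil(W/B) block boundaries, where B is the block length in cycles. Using the 514-cycle OAM DMA figure quoted above, at rate $F (432-cycle blocks) every DMA contains at least one IRQ, which then has to wait for the DMA to finish; at $C (848-cycle blocks) a DMA contains at most one. The function name is hypothetical.

```c
#define OAM_DMA_CYCLES 514u /* OAM DMA length as quoted in the thread */

/* Fewest and most block boundaries (DMC IRQs) that can land inside a
   window of window_cycles, for blocks of block_cycles each. */
static void irqs_in_window(unsigned window_cycles, unsigned block_cycles,
                           unsigned *min_irqs, unsigned *max_irqs)
{
    *min_irqs = window_cycles / block_cycles;
    *max_irqs = (window_cycles + block_cycles - 1u) / block_cycles;
}
```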

Edit: samples are 64 byte aligned, not 16, as tepples points out below.
Last edited by rainwarrior on Sun Oct 09, 2016 7:39 pm, edited 1 time in total.

tepples
Posts: 22092
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Sun Oct 09, 2016 11:15 am

It's worse than that. DPCM is 64-byte aligned, meaning a 256-entry table would use the first byte of every 64-byte segment across the whole fixed bank, reducing the whole fixed bank to carefully chopped-up 63-byte segments. (The lengths are 1 plus a multiple of 16 bytes.) This makes the vector quantization angle even more attractive, as only the last kilobyte would need the 63-byte treatment.
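To make the vector-quantization idea concrete: the last kilobyte ($FC00-$FFFF) holds 16 aligned slots, reachable with $4012 values $F0-$FF, so a 16-entry codebook of delta bytes would fit there and each block would need only a 4-bit index besides its PCM byte. The sketch below picks the closest codebook entry for one block; the codebook contents, the squared-error criterion and the function names are assumptions for illustration, not something settled in the thread.

```c
#include <stdint.h>

/* Same DMC step rule as the hardware: +2 on a 1 bit (up to 127),
   -2 on a 0 bit (down to 0). */
static uint8_t dmc_step(uint8_t level, int bit)
{
    if (bit) {
        if (level <= 125) level += 2;
    } else {
        if (level >= 2) level -= 2;
    }
    return level;
}

/* $4012 value for codebook slot i: slots live at $FC00 + i * 64. */
static uint8_t vq_register_value(unsigned index)
{
    return (uint8_t)(0xF0u + (index & 15u));
}

/* Hypothetical VQ encoder step: given the block's PCM start level, return
   the index of the codebook delta byte whose decoded output is closest
   (squared error) to the 8 target levels. */
static unsigned vq_pick(uint8_t pcm, const uint8_t codebook[16],
                        const uint8_t target[8])
{
    unsigned best_index = 0;
    uint32_t best_err = UINT32_MAX;
    for (unsigned i = 0; i < 16u; i++) {
        uint8_t level = pcm & 0x7F;
        uint32_t err = 0;
        for (int b = 0; b < 8; b++) {
            level = dmc_step(level, (codebook[i] >> b) & 1);
            int d = (int)level - (int)target[b];
            err += (uint32_t)(d * d);
        }
        if (err < best_err) {
            best_err = err;
            best_index = i;
        }
    }
    return best_index;
}
```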

Still, 10% of the CPU is a lot better than 99%.

And I wonder if VQing DPCM alone could be a good way to save some space.

VitinhoCarneiro
Posts: 2
Joined: Sun Oct 09, 2016 7:22 am
Location: Rio de Janeiro, Brazil

Post by VitinhoCarneiro » Sun Oct 09, 2016 11:37 am

rainwarrior wrote:Hmm, I would guess that the jumps at ~4kHz would split the bandwidth, giving more fidelity to lower frequencies (under ~2kHz) via the added jumps, and possibly make more headroom for higher frequencies in the 1-bit DPCM stream? I could see this being a significant improvement, though it would be easier to hear with an example if you made a video/recording comparing plain 1-bit DPCM encoding to your method, side by side.
As originally posted, I've uploaded a sample of the program output here: https://www.youtube.com/watch?v=qgnXLjX4EMI

Definitely much better than plain DPCM.

EDIT: I've uploaded a plain DPCM version encoded by a modified version of my program: https://drive.google.com/file/d/0B4aSs6 ... sp=sharing
The quality difference is pretty noticeable, especially in the snare drums, which sound pretty muffled.
tepples wrote:It's worse than that. DPCM is 64-byte aligned, meaning a 256-entry table would use the first byte of the whole fixed bank, reducing the whole fixed bank to carefully chopped up 63-byte segments. (The lengths are 1 plus a multiple of 16 bytes.) This makes the vector quantization angle even more attractive, as only the last kilobyte needs the 63-byte treatment.
Well, that sucks, but there can always be some use for 256 63-byte segments (who knows... I could even interleave the samples there if I do it right)... Or I could just go with VQ, since there's probably a lot of redundancy within the delta blocks.

rainwarrior
Posts: 7879
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior » Sun Oct 09, 2016 7:43 pm

VitinhoCarneiro wrote:The quality difference is pretty noticeable, especially in the snare drums, which sound pretty muffled.
Yeah, thanks for the DPCM example to compare against.

Less muffled snares makes sense to me. One of 1-bit DPCM's big failings is that loud low frequencies tend to mask higher frequencies, a problem which I believe your method solves very well (that's what I meant about headroom before).

thefox
Posts: 3141
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland

Post by thefox » Sun Oct 09, 2016 9:49 pm

Interesting idea.

tokumaru
Posts: 11895
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru » Sun Oct 09, 2016 10:20 pm

Cool idea indeed. I wonder if this can be done without stealing too much CPU time.

Bregalad
Posts: 7965
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Mon Oct 10, 2016 12:20 pm

Very cool idea, but
tepples wrote:It's worse than that. DPCM is 64-byte aligned, meaning a 256-entry table would use the first byte of the whole fixed bank, reducing the whole fixed bank to carefully chopped up 63-byte segments. (The lengths are 1 plus a multiple of 16 bytes.) This makes the vector quantization angle even more attractive, as only the last kilobyte needs the 63-byte treatment.
For code, I think you can manage to deal with it by using branch/jump instructions accordingly, but for lookup tables bigger than 63 entries... it's really unusable. And I fear lookup tables are necessary for a lot of applications.

rainwarrior
Posts: 7879
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior » Mon Oct 10, 2016 12:29 pm

Storing the PCM data in there would be no problem though. Very easy to just skip every 64th byte.
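Packing a byte stream around the table is indeed simple. A sketch, assuming the table occupies offset v * 64 of the $C000-$FFFF region and that the last 6 bytes ($FFFA-$FFFF) must stay free for the interrupt vectors; under those assumptions 16,122 bytes remain for PCM data (before any code that also has to live in the fixed bank). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_SIZE 0x4000u /* $C000-$FFFF */
#define VECTORS_AT  0x3FFAu /* $FFFA: NMI, reset and IRQ vectors */

/* Copy up to len bytes of data into region, skipping every offset that is
   a multiple of 64 (delta-table entries) and stopping before the vectors.
   Returns the number of bytes actually stored. */
static size_t pack_around_table(uint8_t region[REGION_SIZE],
                                const uint8_t *data, size_t len)
{
    size_t stored = 0;
    for (size_t off = 0; off < VECTORS_AT && stored < len; off++) {
        if (off % 64u == 0)
            continue;
        region[off] = data[stored++];
    }
    return stored;
}

/* Region offset of the n-th packed byte (63 data bytes per 64-byte slot). */
static size_t packed_offset(size_t n)
{
    return (n / 63u) * 64u + (n % 63u) + 1u;
}
```

On the playback side, the pointer into the PCM stream just needs one extra increment whenever it lands on a 64-byte boundary, which is what `packed_offset` expresses.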

B00daW
Posts: 586
Joined: Thu Jan 03, 2008 1:48 pm

Post by B00daW » Tue Nov 01, 2016 10:36 pm

Is it feasible to consider multiple virtual channels for sample playback, in the vein of SuperNSF, using this technique with 2-byte blocks of 8 samples each?

I guess for sample retriggering and sample offset commands, the chunks would be a lot longer and you'd have less control. Also, an instance of the "decoding" would run on every sample initialization, so if you were running 2-4 additional virtual channels mixed over one another, you'd also have to deal with volume control and the additional CPU overhead...
