It is currently Tue May 23, 2017 3:39 pm

All times are UTC - 7 hours





Post new topic Reply to topic  [ 10 posts ] 
PostPosted: Sun Oct 09, 2016 7:48 am 

Joined: Sun Oct 09, 2016 7:22 am
Posts: 2
Location: Rio de Janeiro, Brazil
So, I was brainstorming yesterday and came up with a method of sample playback on the NES that offers high quality, little CPU overhead and a small size (2 bits per sample on average).

And this is the result: https://github.com/VitinhoCarneiro/2a03-d-plus-pcm

Unfortunately I'm not very good at 6502 assembly programming (though I might give it a try), so I only simulated the encoding/decoding process in C. There are no ROMs or demos yet, but I've uploaded a sample of the program output here: https://www.youtube.com/watch?v=qgnXLjX4EMI

This codec is based on ideas from za909's MintaBOOM sample engine, available here: viewtopic.php?f=22&t=14520

How it works:
-The codec combines 7-bit PCM with DPCM deltas and uses the DMC interrupt for triggering samples.
-The audio is divided into 2-byte blocks, containing 8 samples.
-The first byte is a 7-bit PCM value to be written to the DMC delta counter.
-The second byte is an index into a table of 256 one-byte DPCM samples. Entry n of the table holds the byte value n (a sequence from 0x00 to 0xFF), so the index is effectively the 8 delta bits themselves.
-On playback, the PCM value is written to the delta counter, then the 1-byte DPCM sample is started with the interrupt set to fire at the end of the sample; the CPU then returns to executing normal code until the interrupt fires.
-The sample rate can be varied by simply changing the DMC frequency value, since that's what triggers the sample playback - this also opens possibilities for sample repitching.
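To make the block format concrete, here is a minimal C sketch of one 2-byte block. It assumes the standard DMC output rule (each delta bit moves the 7-bit level up or down by 2, clamped to 0..127, least significant bit first) and, as a simplification, counts the 8 output samples as the levels after each bit. The names `Block`, `decode_block` and `encode_block`, and the exhaustive search, are hypothetical illustrations and are not taken from the linked repository.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* One block: a 7-bit PCM start level (written to $4011) plus one
   DPCM byte whose 8 delta bits are played LSB first. */
typedef struct {
    uint8_t pcm;    /* 0..127 */
    uint8_t deltas; /* 8 one-bit deltas, LSB played first */
} Block;

/* Simulate the DMC output unit for one block: start at the PCM level,
   then for each bit add 2 (only if level <= 125) or subtract 2
   (only if level >= 2), recording the level after every bit. */
static void decode_block(Block b, uint8_t out[8]) {
    assert(b.pcm <= 127);
    int level = b.pcm;
    for (int i = 0; i < 8; i++) {
        if ((b.deltas >> i) & 1) {
            if (level <= 125) level += 2;
        } else {
            if (level >= 2) level -= 2;
        }
        out[i] = (uint8_t)level;
    }
}

/* Exhaustive search over all 128 x 256 (pcm, deltas) pairs, keeping the
   pair with the smallest squared error against 8 target levels (0..127). */
static Block encode_block(const uint8_t target[8]) {
    Block best = {0, 0};
    long best_err = LONG_MAX;
    for (int pcm = 0; pcm < 128; pcm++) {
        for (int d = 0; d < 256; d++) {
            Block cand = {(uint8_t)pcm, (uint8_t)d};
            uint8_t out[8];
            decode_block(cand, out);
            long err = 0;
            for (int i = 0; i < 8; i++) {
                long e = (long)out[i] - (long)target[i];
                err += e * e;
            }
            if (err < best_err) {
                best_err = err;
                best = cand;
            }
        }
    }
    return best;
}
```

Brute force costs 32,768 candidate decodes per block, which is slow but acceptable for an offline encoder; a smarter search would be the obvious speedup.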

Because triggering a new block of samples should take very few CPU cycles each time the interrupt fires, this method of sample playback should be easy to integrate into games and demos, as long as they don't require cycle-accurate execution (the interrupt will disrupt their timing).

Besides low overhead, this codec also has a high compression rate (2 bits per sample, as opposed to the usual 8 bits per sample of padded PCM) while keeping high quality: even at ~32kbps ($C playback rate), it sounds way better than plain DPCM at $F.
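The rate figures can be checked with a little arithmetic. This C sketch assumes the NTSC 2A03 DMC period table (CPU cycles per output bit) and a CPU clock of about 1.789773 MHz; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* NTSC 2A03 DMC rate table: CPU cycles per output bit for rate index $0-$F. */
static const uint16_t kDmcPeriodNtsc[16] = {
    428, 380, 340, 320, 286, 254, 226, 214,
    190, 160, 142, 128, 106,  84,  72,  54
};

/* Approximate NTSC CPU clock in Hz. */
#define NTSC_CPU_HZ 1789773L

/* Output sample rate for a DMC rate index (one sample per delta bit). */
static long dmc_sample_rate(int rate_index) {
    assert(rate_index >= 0 && rate_index < 16);
    return NTSC_CPU_HZ / kDmcPeriodNtsc[rate_index];
}

/* Data rate of the block format: 16 bits (2 bytes) per 8 samples,
   i.e. 2 bits per sample. Plain 1-bit DPCM would be 1 bit per sample. */
static long block_format_bitrate(int rate_index) {
    return dmc_sample_rate(rate_index) * 16 / 8;
}
```

At rate $C this gives about 16.9 kHz and roughly 33.8 kbit/s, in the same ballpark as the ~32kbps quoted above, while plain 1-bit DPCM at $F runs at about 33.1 kbit/s.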

I might try making a simple NES decoder for sample playback as a proof-of-concept for this method. But anyone is free to study and modify the source code, improve on it, and even make demos out of it.

I'd love to hear your feedback on this. I feel like this could be a game-changer for sample playback on the NES.

(PS: I'm thinking about a version of this that will use only 1 byte per 8 samples, using 4-bit ADPCM and vector-quantized DPCM sample blocks. I might have to test if it will decompress well, though...)


PostPosted: Sun Oct 09, 2016 10:19 am 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5175
Location: Canada
So... this is the same as the regular 1-bit DPCM format but augmented by letting it do an arbitrary jump every 8 samples?

Hmm, I would guess that the jumps at ~4kHz would split the bandwidth, giving more fidelity to lower frequencies (under ~2kHz) via the added jumps, and possibly make more headroom for higher frequencies in the 1-bit DPCM stream? I could see this being a significant improvement, though it would be easier to judge by example if you made a video/recording comparing plain 1-bit DPCM encoding to your method, side by side.


Some thoughts about implementing it:

DPCM samples have a 64-byte memory alignment, so a table of 256 1-byte DPCM samples actually requires 16k of space. Something with RAM in the DPCM area (e.g. FDS) could avoid having a table, though. (You could stick other data in the 63 bytes between samples, though, if you needed to take up that space, but it's quite inconvenient.)
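The alignment follows from how the DMC registers describe a sample: $4012 selects the start address in 64-byte steps from $C000, and $4013 selects the length in 16-byte steps plus one. A minimal C sketch, assuming a flat 16 KiB image of $C000-$FFFF (the function names and the table layout are hypothetical), shows why a 256-entry table of 1-byte samples spans the whole range and leaves 63 free bytes after each entry:

```c
#include <assert.h>
#include <stdint.h>

/* $4012: sample start address = $C000 + A * 64. */
static uint16_t dmc_sample_address(uint8_t a) {
    return (uint16_t)(0xC000 + a * 64);
}

/* $4013: sample length in bytes = L * 16 + 1. */
static uint16_t dmc_sample_length(uint8_t l) {
    return (uint16_t)(l * 16 + 1);
}

/* Hypothetical layout: entry i is a 1-byte sample containing the byte i,
   placed at the start of the i-th 64-byte slot of a 16 KiB image of
   $C000-$FFFF. Returns the number of free bytes left in each slot. */
static int build_identity_table(uint8_t rom[0x4000]) {
    for (int i = 0; i < 256; i++) {
        rom[dmc_sample_address((uint8_t)i) - 0xC000] = (uint8_t)i;
    }
    return 64 - dmc_sample_length(0);
}
```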

Also the IRQ happens when the DPCM sample byte is fetched, not when it's finished playing, so the stream should have the DPCM data shifted ahead by one. (This might make using the IRQ trickier.)

Your IRQ will happen every 432 cycles (8 bits × 54 CPU cycles per bit at the fastest rate, $F), so I'd guess this would take up at least 10% of the CPU?
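The 432-cycle interval is one 8-bit DPCM byte at 54 cycles per bit. This C sketch, assuming the NTSC period table, turns a handler cost into a CPU percentage; the 43-cycle handler used below is a hypothetical figure, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* NTSC 2A03 DMC rate table: CPU cycles per output bit for rate index $0-$F. */
static const uint16_t kDmcPeriodNtsc[16] = {
    428, 380, 340, 320, 286, 254, 226, 214,
    190, 160, 142, 128, 106,  84,  72,  54
};

/* Each block plays one 8-bit DPCM byte, so one IRQ per 8 DMC periods. */
static long irq_interval_cycles(int rate_index) {
    assert(rate_index >= 0 && rate_index < 16);
    return 8L * kDmcPeriodNtsc[rate_index];
}

/* Percentage of CPU time spent in an IRQ handler that costs handler_cycles
   cycles per block (interrupt entry and RTI included). */
static double cpu_load_percent(long handler_cycles, int rate_index) {
    return 100.0 * (double)handler_cycles / (double)irq_interval_cycles(rate_index);
}
```

A 43-cycle handler would come out just under 10% at $F; at $C the interval grows to 848 cycles and the same handler would cost about 5%.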

Games also have the problem of needing to do an OAM DMA once per frame, which will take 514 cycles, overlapping at least 1 sample, so there's a problem to solve regarding that interruption. (If not accommodating sprite animation, might be acceptable to use a lot more CPU?)

Edit: samples are 64 byte aligned, not 16, as tepples points out below.


Last edited by rainwarrior on Sun Oct 09, 2016 7:39 pm, edited 1 time in total.

PostPosted: Sun Oct 09, 2016 11:15 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 18338
Location: NE Indiana, USA (NTSC)
It's worse than that. DPCM is 64-byte aligned, meaning a 256-entry table would use the first byte of the whole fixed bank, reducing the whole fixed bank to carefully chopped up 63-byte segments. (The lengths are 1 plus a multiple of 16 bytes.) This makes the vector quantization angle even more attractive, as only the last kilobyte needs the 63-byte treatment.

Still, 10% of the CPU is a lot better than 99%.

And I wonder if VQing DPCM alone could be a good way to save some space.
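One way to probe that redundancy is a plain vector quantizer over the 8-bit delta bytes. The C sketch below is a hypothetical illustration, not something proposed in the thread: it builds a k-entry codebook from the most frequent delta bytes and maps each byte to its nearest entry. As the distance it assumes the squared difference between the two 8-sample waveforms decoded from mid-scale (level 64), which ignores clamping near the rails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Waveform distance between two delta bytes: decode both from level 64
   (where 8 steps of +/-2 never clamp) and sum the squared differences. */
static long delta_distance(uint8_t a, uint8_t b) {
    int la = 64, lb = 64;
    long err = 0;
    for (int i = 0; i < 8; i++) {
        la += ((a >> i) & 1) ? 2 : -2;
        lb += ((b >> i) & 1) ? 2 : -2;
        err += (long)(la - lb) * (la - lb);
    }
    return err;
}

/* Hypothetical codebook builder: keep the k most frequent delta bytes.
   Returns how many entries were filled (fewer if the stream has fewer
   than k distinct values). */
static int build_codebook(const uint8_t *deltas, size_t n, uint8_t *book, int k) {
    size_t count[256];
    memset(count, 0, sizeof count);
    for (size_t i = 0; i < n; i++) count[deltas[i]]++;
    int used = 0;
    while (used < k) {
        int best = -1;
        for (int v = 0; v < 256; v++)
            if (count[v] > 0 && (best < 0 || count[v] > count[best])) best = v;
        if (best < 0) break;   /* no unused values left */
        book[used++] = (uint8_t)best;
        count[best] = 0;       /* never pick the same value twice */
    }
    return used;
}

/* Map a delta byte to the index of the closest codebook entry. */
static int vq_index(uint8_t d, const uint8_t *book, int k) {
    int best = 0;
    for (int j = 1; j < k; j++)
        if (delta_distance(d, book[j]) < delta_distance(d, book[best])) best = j;
    return best;
}
```

With k = 16 each delta byte would shrink to a 4-bit index; a real encoder would probably train the codebook and choose the PCM start values with the codebook in the loop rather than quantizing after the fact.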


PostPosted: Sun Oct 09, 2016 11:37 am 

Joined: Sun Oct 09, 2016 7:22 am
Posts: 2
Location: Rio de Janeiro, Brazil
rainwarrior wrote:
Hmm, I would guess that the jumps at ~4kHz would split the bandwidth, giving more fidelity to lower frequencies (under ~2kHz) via the added jumps, and possibly make more headroom for higher frequencies in the 1-bit DPCM stream? I could see this being a significant improvement, though it would be easier to judge by example if you made a video/recording comparing plain 1-bit DPCM encoding to your method, side by side.


As originally posted, I've uploaded a sample of the program output here: https://www.youtube.com/watch?v=qgnXLjX4EMI

Definitely much better than plain DPCM.

EDIT: I've uploaded a plain DPCM version encoded by a modified version of my program: https://drive.google.com/file/d/0B4aSs6 ... sp=sharing
The quality difference is pretty noticeable, especially in the snare drums, which sound pretty muffled.

Quote:
It's worse than that. DPCM is 64-byte aligned, meaning a 256-entry table would use the first byte of the whole fixed bank, reducing the whole fixed bank to carefully chopped up 63-byte segments. (The lengths are 1 plus a multiple of 16 bytes.) This makes the vector quantization angle even more attractive, as only the last kilobyte needs the 63-byte treatment.


Well, that sucks, but there can always be some use for 256 63-byte segments (who knows... I could even interleave the samples there if I do it right)... Or I could just go with VQ, since there's probably a lot of redundancy within the delta blocks.


PostPosted: Sun Oct 09, 2016 7:43 pm 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5175
Location: Canada
VitinhoCarneiro wrote:
The quality difference is pretty noticeable, especially in the snare drums, which sound pretty muffled.

Yeah, thanks for the DPCM example to compare against.

Less muffled snares make sense to me. One of 1-bit DPCM's big failings is that loud low frequencies tend to mask higher frequencies, a problem which I believe your method solves very well (that's what I meant about headroom before).


PostPosted: Sun Oct 09, 2016 9:49 pm 

Joined: Mon Jan 03, 2005 10:36 am
Posts: 2814
Location: Tampere, Finland
Interesting idea.



PostPosted: Sun Oct 09, 2016 10:20 pm 

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 9641
Location: Rio de Janeiro - Brazil
Cool idea indeed. I wonder if this can be done without stealing too much CPU time.


PostPosted: Mon Oct 10, 2016 12:20 pm 

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7034
Location: Jongny, VD, Switzerland
Very cool idea, but

tepples wrote:
It's worse than that. DPCM is 64-byte aligned, meaning a 256-entry table would use the first byte of the whole fixed bank, reducing the whole fixed bank to carefully chopped up 63-byte segments. (The lengths are 1 plus a multiple of 16 bytes.) This makes the vector quantization angle even more attractive, as only the last kilobyte needs the 63-byte treatment.

For code, I think you can manage to deal with it by using branch/jump instructions accordingly, but for lookup tables bigger than 63 entries... it's really unusable. And I fear lookup tables are necessary for a lot of applications.


PostPosted: Mon Oct 10, 2016 12:29 pm 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5175
Location: Canada
Storing the PCM data in there would be no problem though. Very easy to just skip every 64th byte.
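In code, skipping the table bytes just means mapping the i-th PCM byte past every 64th offset. A minimal C sketch of that mapping, assuming the 256-entry table fills $C000-$FFFF with one entry at the start of each 64-byte slot (the names and layout are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE     64               /* DPCM sample alignment */
#define SLOTS         256              /* one 1-byte table entry per slot */
#define FREE_PER_SLOT (SLOT_SIZE - 1)  /* bytes 1..63 of each slot are free */

/* Offset (from $C000) of the i-th PCM byte when byte 0 of every
   64-byte slot is reserved for a DPCM table entry. */
static long pcm_offset(long i) {
    return (i / FREE_PER_SLOT) * SLOT_SIZE + 1 + (i % FREE_PER_SLOT);
}

/* Interleave a PCM byte stream into the gaps between table entries.
   Returns the number of bytes stored (capped at 256 * 63). */
static long pack_pcm(uint8_t rom[SLOT_SIZE * SLOTS], const uint8_t *pcm, long n) {
    long cap = (long)FREE_PER_SLOT * SLOTS;
    if (n > cap) n = cap;
    for (long i = 0; i < n; i++) rom[pcm_offset(i)] = pcm[i];
    return n;
}
```

On the 6502 side the equivalent would be to increment the stream pointer and, whenever its low six bits wrap to zero, increment it once more.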


PostPosted: Tue Nov 01, 2016 10:36 pm 

Joined: Thu Jan 03, 2008 1:48 pm
Posts: 439
Is it feasible to consider multiple virtual channels for sample playback in the vein of SuperNSF using this technique with 2 byte blocks of 8 samples each?

I guess for sample retriggering and sample offset commands, the chunks would be a lot longer and you'd have less control. Also, an instance of the "decoding" would have to run at every sample initialization, so if you were running 2-4 additional virtual channels mixed over one another, you'd also have to deal with volume control and the additional CPU overhead...
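Two delta streams cannot simply be added bit by bit (two opposite deltas cancel, two equal ones give a step of 4), so mixing would presumably have to happen on decoded levels, with the result re-encoded into blocks. A minimal C sketch of the level-mixing step, assuming a 0-16 volume scale and 64 as the silence level (both hypothetical choices):

```c
#include <assert.h>
#include <stdint.h>

#define MID 64  /* hypothetical silence level of the 7-bit DAC */

/* Mix one sample from each of n channels. Each level is 0..127 and each
   volume is 0..16 (16 = full scale). Deviations from MID are scaled by
   volume/16, summed, and the result is clamped back into 0..127. */
static uint8_t mix_levels(const uint8_t *levels, const uint8_t *volumes, int n) {
    int acc = 0;
    for (int c = 0; c < n; c++) {
        assert(levels[c] <= 127 && volumes[c] <= 16);
        acc += ((int)levels[c] - MID) * volumes[c] / 16;
    }
    int out = MID + acc;
    if (out < 0) out = 0;
    if (out > 127) out = 127;
    return (uint8_t)out;
}
```

The mixed levels would then need another pass through a block encoder, which is where most of the extra CPU cost would come from if it were done at runtime.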

