What if Sinistar's voice sample chip was used to play music?

psycopathicteen
Posts: 2980
Joined: Wed May 19, 2010 6:12 pm

Post by psycopathicteen » Sat Nov 21, 2020 4:12 pm

I was looking through YouTube and found this video that uses the NES's DPCM channel to play music:

https://m.youtube.com/watch?v=YvRqV9tBI7c&t=72s

...and it sounds pretty lousy. I think Sinistar's chip would be similar, but slightly better, just because it uses 1-bit ADPCM instead of 1-bit DPCM.
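
For context on what "1-bit DPCM" means here: on the NES, each bit of a sample byte nudges a 7-bit output level up or down by a fixed 2 units, so the slope can never adapt. A minimal Python sketch of that decoding rule follows. The function name and the default starting level of 64 are illustrative assumptions (on hardware the starting level is whatever the DAC register already holds).

```python
def decode_nes_dpcm(data: bytes, start_level: int = 64) -> list[int]:
    """Decode NES-style 1-bit delta samples into 7-bit output levels (0..127).

    start_level is an assumed default for illustration only.
    """
    level = start_level
    out = []
    for byte in data:
        for bit_index in range(8):          # bits are consumed LSB first
            bit = (byte >> bit_index) & 1
            if bit and level <= 125:
                level += 2                  # fixed step up, skipped near the top
            elif not bit and level >= 2:
                level -= 2                  # fixed step down, skipped near the bottom
            out.append(level)
    return out
```

Because the step is fixed, a run of identical bits produces a straight ramp, which is the slope limit that an adaptive scheme tries to relax.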

lidnariq
Posts: 10265
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Post by lidnariq » Sat Nov 21, 2020 5:32 pm

They turned the volume up a lot in those encodings, favoring clipping over hiss. None of those should sound anywhere near that bad.

Actually, the same clip encoded at 33143 bit/sec in NES DPCM and in sox's "CVU" format sounds pretty comparable side by side.

Post by psycopathicteen » Thu Nov 26, 2020 3:09 pm

I'm actually surprised how easy it was to find a WAV to CVSD converter. https://convertio.co/wav-cvsd/

I've played around with some samples, and I noticed that 32000 Hz CVSD sounds about as muffled as 8000 Hz PCM.

Post by lidnariq » Thu Nov 26, 2020 3:15 pm

sox's implementation of CVSD only supports 8kHz output, so I wouldn't be surprised if that's why you're hearing that muffling. That's also why I ended up going with "cvu" format for an apples-to-apples comparison.

That said, it does support arbitrary bitrates regardless of the filtering.

Post by psycopathicteen » Thu Nov 26, 2020 4:58 pm

What's weird is that even though the file comes out as 8 kHz, it plays in slow motion and I have to speed it up to 16 kHz for it to be at the correct speed. To do 32 kHz, I have to slow it down 2x, convert it, then speed it back up 4x.

I'm saying that even at 32 kHz, it sounds muffled like 8 kHz PCM. I'm guessing the muffling happens because the slope rate automatically decreases whenever there are fewer than 3 bits of the same value in a row.

Post by psycopathicteen » Fri Nov 27, 2020 10:01 am

I was doing experiments with the CVU converter and I had a few observations on how the algorithm worked.

- There is a max slope rate that appears to be about 1/32
- The previous sample seems to get multiplied by a value slightly less than 1 (maybe 63/64?)
- When sampling a 6kHz sine wave at 48kHz, the amplitude gradually increases
- When sampling a 12kHz sine wave at 48kHz, the amplitude gradually decreases
- It must be incrementing and decrementing the slope at different rates, based on the previous two observations

tepples
Posts: 22286
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Fri Nov 27, 2020 11:58 am

In backward-adaptive[1] DPCM in general, it is common to increment scale in response to slope overload faster than decrementing in response to granular noise. This ensures an adequately fast response to slope overload and decay over the course of a syllable without triggering excessive slope overload for multiple cycles in a single syllable.

I'd draw an analogy to IMA ADPCM (4-bit), as described in "DS Sound Notes" by Martin Korth. It uses a table of step sizes where each is about 10% bigger than the previous, and the position in the table is updated after each sample. Sample values less than half full scale cause the next sample to use the previous entry in the table, whereas sample values of half full scale or more cause jumping forward by 2, 4, 6, or 8 entries.
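
The table-position update described above can be sketched in a few lines of Python. The adjustment values are the standard IMA ADPCM ones; the function name is only illustrative, and the 89-entry step table itself is left out.

```python
# Index adjustment per 3-bit magnitude of the 4-bit code (bit 3 is the sign).
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]

def next_step_index(index: int, code: int) -> int:
    """Return the step-table index to use for the next sample."""
    magnitude = code & 0x7              # ignore the sign bit
    index += INDEX_ADJUST[magnitude]    # small codes shrink, large codes grow
    return max(0, min(88, index))       # clamp to the 89-entry table
```

Note the asymmetry: the index can climb by up to 8 entries in one sample but only ever falls by 1.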


[1] "Backward-adaptive" refers to an encoder that decides how to encode the next sample based only on previous output. Distinguish from a "forward-adaptive" method such as SNES BRR, in which the encoder looks at future samples and tells the decoder in advance how they will be encoded. Source: "Research Interests: Adaptive Quantization" by Youngjun Yoo

Post by psycopathicteen » Sat Nov 28, 2020 9:41 am

I just tested an 8kHz wave (3 samples up, 3 samples down) and I found out that it does work, so there are two possibilities:

1) it increases slope every 2 same bits in a row, by a rate smaller than it decreases, but larger than half the rate
2) it increases slope every 3 same bits in a row, by a rate more than double the size it decreases

I wonder if there is an equilibrium frequency where the slope rate can sustain itself but can't increase on its own, where the amount of decreasing is exactly the same as the amount of increasing.
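
One way to explore that question is to feed a fixed repeating bit pattern into a toy model of the step adaptation and see where the step size settles. The Python sketch below assumes a rule of the form "decay the step by a constant factor every sample, and add a constant whenever the last run_length bits match". The function name and the decay and increment values are made up for illustration and are not measured from the converter.

```python
def settled_step(bits, run_length=3, decay=0.99, increment=0.001, cycles=2000):
    """Repeat a bit pattern many times and return the step size it settles at.

    decay and increment are hypothetical values, not taken from any real codec.
    """
    step = 0.0
    history = []
    for _ in range(cycles):
        for bit in bits:
            history = (history + [bit])[-run_length:]   # keep the newest bits
            step *= decay                               # decay on every sample
            if len(history) == run_length and len(set(history)) == 1:
                step += increment                       # grow on a matching run
    return step

three_up_three_down = settled_step([1, 1, 1, 0, 0, 0])  # 8 kHz at a 48 kHz bit rate
two_up_two_down = settled_step([1, 1, 0, 0])            # 12 kHz at a 48 kHz bit rate
```

In this toy model a pattern that never produces a matching run, such as 2 up / 2 down with run_length 3, decays to zero, while 3 up / 3 down settles at a nonzero level. A real encoder picks its bits from the signal rather than from a fixed pattern, so this only hints at the behavior.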

Post by lidnariq » Sat Nov 28, 2020 11:44 am

You could look at the implementation in SoX: https://sourceforge.net/p/sox/code/ci/m ... src/cvsd.c

Post by tepples » Sat Nov 28, 2020 12:36 pm

Overview of CVSD:

tc0 = e^(-200 / rate)  # Decay to 1/e-th in 1/200 s
tc1 = 0.1 * (1 - tc0)  # Increase stepsize to full scale in 1/20 s

for each sample:
  stepsize *= tc0  # exponential decay
  if last 3 are 000 or 111: stepsize += tc1  # linear increase
  output +stepsize if sample else -stepsize through a lowpass filter

It seems SoX also upsamples by 4 when encoding and downsamples by 4 when decoding. So if you encode an 8 kHz narrowband wave, you get 32 kbps output.
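
For anyone who wants to experiment, here is a rough Python sketch of the overview above. It is not SoX's code: it swaps the lowpass filter for a plain running sum (an assumption on my part), skips the 4x resampling, and starts the 3-bit history at 000. The class and function names are made up for illustration.

```python
import math

class CvsdState:
    """Step-size adaptation shared by the toy encoder and decoder."""

    def __init__(self, rate: float):
        self.tc0 = math.exp(-200.0 / rate)   # per-sample exponential decay
        self.tc1 = 0.1 * (1.0 - self.tc0)    # linear increase on a 3-bit run
        self.stepsize = 0.0
        self.history = 0                     # last 3 bits, assumed to start as 000
        self.estimate = 0.0                  # running sum standing in for the filter

    def update(self, bit: int) -> float:
        self.history = ((self.history << 1) | bit) & 0b111
        self.stepsize *= self.tc0
        if self.history in (0b000, 0b111):
            self.stepsize += self.tc1
        self.estimate += self.stepsize if bit else -self.stepsize
        return self.estimate

def encode(samples, rate):
    """Emit 1 when the input is above the running estimate, else 0."""
    state = CvsdState(rate)
    bits = []
    for x in samples:
        bit = 1 if x > state.estimate else 0
        state.update(bit)
        bits.append(bit)
    return bits

def decode(bits, rate):
    """Rebuild the estimate from the bits alone (backward-adaptive)."""
    state = CvsdState(rate)
    return [state.update(bit) for bit in bits]
```

The decoder runs exactly the same state update as the encoder, so it needs no side information beyond the bitstream. With these constants the step size can never exceed 0.1, which caps how fast the output can slew.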

Post Reply