Linear interpolation overtones

za909
Posts: 216
Joined: Fri Jan 24, 2014 9:05 am
Location: Hungary

Linear interpolation overtones

Post by za909 » Wed May 06, 2020 2:23 pm

I have attempted to add linear interpolation samples to my now-working PCM stream. I believe I have run into a strange wave-physics phenomenon, although there is a possible explanation that stems from the code; I just don't know whether it is the correct one.

To demonstrate the horrible noise I created without subjecting any of you to it, I have saved spectral analyses of the same music track with and without the interpolated samples:

Without interpolation (the visible spike comes from the underlying DMC rate $F):
[Image: notinterpolated.png, spectrum without interpolation]
With interpolation (new overtone-like spikes added):
[Image: interpolatedpng.png, spectrum with interpolation]
Each interpolated sample is the sum of the last and the next output sample from ROM, divided by 2. The catch is that an interpolated sample can only be output on every second occasion where one is needed. The other point in time at which a similar averaged value would have to be output lies outside the time intended to be spent in the DMC IRQ (which already eats 50% of the CPU time). Overall, this gives the following output interval pattern:
216 cycles (data) -> 108 cycles (calculated) -> 108 cycles (data) -> cannot output here (outside the IRQ) -> ...
So, could this 2-1-1 ratio be responsible for the overtones that are created (and greatly amplified)?
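To see how a 2-1-1 pattern could differ from ordinary interpolation, here is a minimal C sketch (not the actual NES code) that models the output as described, in 108-cycle ticks, and compares it with full 2x linear interpolation. The function names are hypothetical, and the model assumes each ROM pair (a, b) is written as a, a, (a+b)/2, b.

```c
#include <stddef.h>

/* Hypothetical model of the 2-1-1 pattern, in 108-cycle ticks:
   each pair of ROM samples (a, b) becomes a, a, (a+b)/2, b,
   i.e. 216 + 108 + 108 = 432 cycles. The midpoint between b and
   the following sample is never written. */
size_t render_211(const int *rom, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        int a = rom[i], b = rom[i + 1];
        out[k++] = a;            /* data, first 108 cycles */
        out[k++] = a;            /* data, next 108 cycles */
        out[k++] = (a + b) >> 1; /* calculated midpoint */
        out[k++] = b;            /* data */
    }
    return k;
}

/* Reference: full 2x linear interpolation, a midpoint after every sample. */
size_t render_linear(const int *rom, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        out[k++] = rom[i];
        out[k++] = (rom[i] + rom[i + 1]) >> 1;
    }
    out[k++] = rom[n - 1];
    return k;
}
```

On a ramp (0, 8, 16, ...) the 2-1-1 output equals the linear one delayed by one tick, except that every fourth tick shows the next ROM sample where the missing midpoint should be. That error repeats every 432 cycles and scales with the signal's slope, which is one way such a pattern could add tones at multiples of its repetition rate.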

To any moderators: if I am creating too many topics related to my raw PCM experiments, please let me know and merge topics if possible. I feel these are separate subjects that stand on their own, but they all center on PCM streaming on the NES, so I would understand cutting down on the topics.

rainwarrior
Posts: 7893
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Linear interpolation overtones

Post by rainwarrior » Wed May 06, 2020 7:11 pm

I don't think that's a result I'd expect. The extra spikes seem to be harmonically related to your sample rate? 4, 8, 12, 16, 20 kHz? I wonder what that indicates.
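The roughly 4 kHz spacing can be checked with a little arithmetic. Assuming an NTSC console (CPU clock 1,789,773 Hz) and DMC rate $F, which clocks one delta bit every 54 CPU cycles, a 1-byte DMC sample runs out (and the IRQ fires) every 8 × 54 = 432 cycles, which is also the length of one 216 + 108 + 108 output cycle. The C sketch below computes that repetition rate and its harmonics; the names are illustrative.

```c
/* Assumption: NTSC console. PAL uses a different CPU clock and rate table. */
#define NTSC_CPU_HZ 1789773.0

/* DMC rate $F on NTSC: one delta bit every 54 CPU cycles. */
#define DMC_RATE_F_CYCLES_PER_BIT 54

/* Repetition rate of one IRQ period (8 bits of a 1-byte sample). */
double pattern_hz(void)
{
    int cycles_per_irq = 8 * DMC_RATE_F_CYCLES_PER_BIT; /* 432 = 216 + 108 + 108 */
    return NTSC_CPU_HZ / cycles_per_irq;
}

/* n-th harmonic of that repetition rate, in Hz. */
double pattern_harmonic_hz(int n)
{
    return n * pattern_hz();
}
```

pattern_hz() comes out at roughly 4143 Hz, with harmonics near 8.3, 12.4, 16.6 and 20.7 kHz, close to the 4, 8, 12, 16, 20 kHz spacing noted above.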

Maybe to help diagnose it, try making an impulse: 1 sample of some high value surrounded by 0s. See how that responds to the process (in both spectrum and waveform), and maybe compare it against a more ideal version of the same, just to have something to check against. Also make sure there isn't a difference between the high sample landing on an odd or an even position, in case a bug has made it asymmetrical. Alternatively, try a sine wave and see how that responds.
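The impulse test can also be tried offline. This sketch uses a C model of the 2-1-1 scheme as za909 describes it (each pair a, b becomes a, a, (a+b)/2, b) and measures the total output produced by a single impulse at an even versus an odd position; a plain 2x linear interpolator is included for reference. All names are hypothetical.

```c
#include <stddef.h>

typedef size_t (*renderer)(const int *rom, size_t n, int *out);

/* Hypothetical 2-1-1 model: pair (a, b) -> a, a, (a+b)/2, b */
size_t render_211(const int *rom, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        out[k++] = rom[i];
        out[k++] = rom[i];
        out[k++] = (rom[i] + rom[i + 1]) >> 1;
        out[k++] = rom[i + 1];
    }
    return k;
}

/* Reference: midpoint after every sample. */
size_t render_linear(const int *rom, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        out[k++] = rom[i];
        out[k++] = (rom[i] + rom[i + 1]) >> 1;
    }
    out[k++] = rom[n - 1];
    return k;
}

/* Put one impulse of height h at index pos of an otherwise silent
   n-sample stream and return the sum of all output samples. */
int impulse_area(renderer r, size_t n, size_t pos, int h)
{
    int rom[32] = {0};
    int out[64];
    if (n < 2 || n > 32 || pos >= n)
        return -1; /* stay inside the fixed buffers */
    rom[pos] = h;
    size_t k = r(rom, n, out);
    int sum = 0;
    for (size_t i = 0; i < k; i++)
        sum += out[i];
    return sum;
}
```

In this model an even-position impulse yields H, H, H/2 while an odd-position one yields H/2, H, so the response depends on where the sample lands; the linear reference gives the same total either way. That parity dependence is the kind of asymmetry the test is meant to expose.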

Drag
Posts: 1327
Joined: Mon Sep 27, 2004 2:57 pm

Re: Linear interpolation overtones

Post by Drag » Wed May 06, 2020 8:03 pm

I suspect there's something wrong with how the interpolated sample is being generated, could you post a snippet of that code?

coto
Posts: 50
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: Linear interpolation overtones

Post by coto » Wed May 06, 2020 8:19 pm

za909 wrote:
Wed May 06, 2020 2:23 pm
The interpolated samples are a sum and division by 2 of the last, and the next output sample from ROM.
I know it's NES related and this response is somewhat more general, but:

Does the PCM sound OK without the interpolation? Is it signed PCM16? Signed or unsigned PCM8? How many channels?
To add interpolation you need to read each sample correctly for the format used, then process the sample.


Processing samples: PCM8 data can be signed (centered on 0, range -128 to 127) or unsigned (centered on 128, range 0 to 255).

Here's how to interpolate signed PCM8 samples:


#include <math.h> // round()

// signed char is guaranteed to be a signed byte; plain char may be unsigned on some compilers
signed char srcA[pcm8SignedSize];
signed char srcB[pcm8SignedSize];
signed char bufferOut[sizeof(srcA)+sizeof(srcB)]; // srcA and srcB hold the same size and are signed PCM8, so the output is as well
for(int i = 0; i < (int)sizeof(srcA); i++)
{
    // combine srcA[i] and srcB[i] element by element
    bufferOut[i] = round(((srcA[i] + srcB[i])*2) - ((srcA[i]*srcB[i])/128) - 256);
}

Note: for unsigned PCM8, use the same formula, but divided by 256 and without the final - 256.

lidnariq
Posts: 9859
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Linear interpolation overtones

Post by lidnariq » Thu May 07, 2020 12:10 am

za909 wrote:
Wed May 06, 2020 2:23 pm
It's just that this interpolated sample can only be output on every second one of the required occasions.
I strongly suspect this is your problem.

Quickly testing this in Octave, I took a pure 150 Hz sine wave sampled at 16 kHz, upsampled it to 32 kHz using plain sample-and-hold, and got this:
[Image: snh.png, sample-and-hold upsampling]
Then I used your "alternating linear and sample-and-hold" scheme and got this:
[Image: half-interpolated.png, alternating linear and sample-and-hold]

za909

Re: Linear interpolation overtones

Post by za909 » Thu May 07, 2020 2:23 am

Thank you for the answers. I am using unsigned 8-bit PCM, which is shifted right once (LSR) to discard the lowest bit, since it can't fit in the $4011 register. The samples are summed and divided by 2; I will post the relevant code later. (The next improvement is to add compatibility with the output of my external pre-processor, which also applies primitive RLE to fill the unused bit.)

If the problem is related to the alternating output pattern, though, I might have to drop this idea altogether and use the delta samples to bridge the gap between two successive samples. This would make the delta modulation steps more than just a passive byproduct of the timing. It would require me to place many different interpolation DPCM samples 64 bytes apart from each other, due to the restriction on starting addresses. This would keep going even outside the IRQ, and even OAM DMA could be interrupted by it. There are only 4 delta bits played between every two PCM outputs, but that might be enough to smooth out the edges, even for larger jumps of more than 8.
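The "only 4 delta bits" figure and the limit of 8 both follow from the DMC output unit: bits are shifted out LSB first, and each one moves the 7-bit level by +2 (bit set, if the level is at most 125) or -2 (bit clear, if the level is at least 2). A minimal C model, assuming NTSC timing and rate $F (54 CPU cycles per bit) with $4011 writes 216 cycles apart; the function names are hypothetical.

```c
#include <stdint.h>

/* Play the first nbits of a DMC byte, LSB first, from a starting level. */
int dmc_play_bits(int level, uint8_t byte, int nbits)
{
    for (int i = 0; i < nbits; i++) {
        if ((byte >> i) & 1) {
            if (level <= 125)
                level += 2; /* 1 bit: step up, unless near the top */
        } else {
            if (level >= 2)
                level -= 2; /* 0 bit: step down, unless near the bottom */
        }
    }
    return level;
}

/* Whole delta bits that fit between two $4011 writes. */
int bits_between_writes(int cycles_between_writes, int cycles_per_bit)
{
    return cycles_between_writes / cycles_per_bit;
}
```

With 216 / 54 = 4 bits, a ramp can move the level by at most 8 before the next PCM write; an alternating 0101 pattern returns to where it started.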

Edit: Here is the code that creates and outputs the interpolated sample:


	lda (samplevector+0),y ; this reads the next sample from ROM
	lsr 
	pha ; recover the sample at the end 
	clc ; output an interpolated sample
	adc samplebufferprev ; this contains the sample from ROM that was output last time
	lsr ; divide it by 2
	sta $4011
	sta $4011 ; DMC is already running, do two writes to clobber a conflicting amplitude change by the DPCM unit
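The arithmetic of this routine can be restated in C to check the ranges involved. A minimal sketch; the function name is illustrative, and prev7 stands for samplebufferprev, assumed to already hold a 7-bit value.

```c
#include <stdint.h>

/* Value written to $4011 by the routine: midpoint of the previous 7-bit
   sample and the next ROM byte after its own LSR. */
uint8_t interpolated_4011(uint8_t prev7, uint8_t rom_byte)
{
    uint8_t next7 = rom_byte >> 1;          /* lda (samplevector),y / lsr */
    unsigned sum = (unsigned)next7 + prev7; /* clc / adc samplebufferprev: at most 254 */
    return (uint8_t)(sum >> 1);             /* lsr: divide by 2, rounding down */
}
```

Because both operands are at most 127, the sum never exceeds 254, so the ADC never sets the carry and the final LSR only drops the rounding bit; the midpoint is simply rounded down.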

Garth
Posts: 195
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California

Re: Linear interpolation overtones

Post by Garth » Thu May 07, 2020 10:46 am

I think the legitimate purpose of the interpolation would be to remedy some aliasing trouble, at the expense of accepting some distortion. Is a better anti-alias filter an option? Up to 4th- or 5th-order is pretty easy with common parts, not needing precision capacitors. (Precision resistors are common and cheap today.)

Drag

Re: Linear interpolation overtones

Post by Drag » Thu May 07, 2020 11:40 am

I think lidnariq has the right answer. Otherwise, make sure the sample you write to samplebufferprev is indeed a 7-bit sample and not an 8-bit sample.

My original hunch (precision loss from 8-bit + 8-bit = 9-bit, with a mistaken LSR instead of ROR) doesn't apply if the two samples you're dealing with are 7-bit. :P
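That hunch is easy to check in C: with two full 8-bit operands the ADC can carry out of bit 7, and LSR then shifts a 0 into bit 7, losing it, whereas ROR rotates the carry back in. With 7-bit operands the sum never exceeds 254, so both give the same result. A small model; the function names are illustrative.

```c
#include <stdint.h>

/* CLC / ADC b / LSR on 8-bit values: bit 8 of the sum ends up in the
   carry, and LSR discards it. */
uint8_t average_lsr(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)((sum & 0xFF) >> 1);
}

/* CLC / ADC b / ROR: the carry from ADC is rotated into bit 7, giving the
   exact 9-bit sum divided by 2. */
uint8_t average_ror(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    uint8_t carry = (sum > 0xFF) ? 1 : 0;
    return (uint8_t)(((sum & 0xFF) >> 1) | (carry << 7));
}
```

For example, averaging 200 and 200 gives 72 with LSR but 200 with ROR, while 7-bit inputs such as 100 and 20 give 60 either way.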

coto

Re: Linear interpolation overtones

Post by coto » Thu May 07, 2020 1:21 pm

za909 wrote:
Thu May 07, 2020 2:23 am
Thank you for the answers. I am using unsigned 8-bit PCM, which is lsr-d once to discard the lowest bit that can't fit in the $4011 register. These are summed and divided by 2 [...]
If so, you need an interpolation algorithm that also works on unsigned PCM, in this case 7-bit, which is what you are handling. I wonder whether the earlier formula would work for you if you use the unsigned variant from my note, but with 128 instead of 256 to cover the 7-bit range.

za909

Re: Linear interpolation overtones

Post by za909 » Thu May 07, 2020 1:29 pm

Drag wrote:
Thu May 07, 2020 11:40 am
I think lidnariq has the right answer.
I agree. The samplebufferprev variable is a copy of the sample that was sent to $4011 about 90 cycles earlier, and it had already gone through an LSR long before that (in the previous IRQ). I have a rather odd mixed method of buffering every odd sample and holding on to every even sample until the right time arrives to write it to the output and exit the IRQ. This is probably best rewritten, especially to make adding the RLE bit 7 recognition simpler.

I think using the deltas is a valid option, though. It's definitely worth a shot, because it will only ever be stopped when the DPCM bits run out during an OAM DMA, which I already can't do anything about anyway, unless I start trying to schedule a looped byte that could keep playing during that time.

This will have to wait, though, because saving space takes priority, and I have to see how many cycles I'll have left for the calculation that determines the right "gap bridging" DPCM byte. I've looked into it, and it already looks like it will cause a mild code-writing annoyance around the aligned bytes:


linint_addr_TBL: 
; 2D matrix of values that go into DPCM_ADDR
; Two bridges have to be found, involving three bytes from ROM
; Examine the difference between byte 0 and byte 1, then between byte 1 and byte 2
; if a difference is less than 8, that bridge is a "none (n)" bridge, which uses alternating bits (1010)
; if a difference is more than or equal to 8, determine if the bridge is an "up (u) (bits 1111)" or "down (d) (bits 0000)" bridge
; do this for both bridges and select from the matrix as needed

.db (linint_u_u-$C000)/64, (linint_u_d-$C000)/64, (linint_u_n-$C000)/64
.db (linint_d_u-$C000)/64, (linint_d_d-$C000)/64, (linint_d_n-$C000)/64
.db (linint_n_u-$C000)/64, (linint_n_d-$C000)/64, (linint_n_n-$C000)/64

.align 64
linint_u_u:
.db %11111111

.align 64
linint_u_d:
.db %00001111

.align 64
linint_u_n:
.db %10101111

.align 64
linint_d_u:
.db %11110000

.align 64
linint_d_d:
.db %00000000

.align 64
linint_d_n:
.db %10100000

.align 64
linint_n_u:
.db %11111010

.align 64
linint_n_d:
.db %00001010

.align 64
linint_n_n:
.db %10101010
But this list of samples could be expanded to allow bridging smaller gaps as well. Something to consider is that this has to be determined one IRQ ahead of time. Whenever an IRQ fires, I know for certain that there are still 8 bits being played from the DPCM unit's output buffer, which I cannot change. I can only affect the address the next 8 bits will come from, by writing to $4012 before writing to $4015 to acknowledge the IRQ, reset the DPCM bytes-remaining counter to 1, and set the new address.
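The selection rule in the table comments can be written out as a small C sketch. It assumes, consistent with the byte values above (for example linint_u_d = %00001111) and the DMC's LSB-first bit order, that the first bridge occupies the low nibble and the second the high nibble; classify, nibble and bridge_byte are illustrative names, not part of the original code.

```c
#include <stdint.h>

/* A jump of 8 or more up is "up" (1111), 8 or more down is "down" (0000),
   anything smaller is "none" (1010). */
enum bridge { UP, DOWN, NONE };

enum bridge classify(int from, int to)
{
    int diff = to - from;
    if (diff >= 8)
        return UP;
    if (diff <= -8)
        return DOWN;
    return NONE;
}

static uint8_t nibble(enum bridge b)
{
    return b == UP ? 0x0F : b == DOWN ? 0x00 : 0x0A;
}

/* Low nibble plays first: bridge byte 0 -> byte 1, then byte 1 -> byte 2. */
uint8_t bridge_byte(int b0, int b1, int b2)
{
    return (uint8_t)(nibble(classify(b0, b1)) | (nibble(classify(b1, b2)) << 4));
}
```

For instance, bytes 10, 30, 20 give an up bridge followed by a down bridge, which is %00001111, the value stored at linint_u_d.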

za909

Re: Linear interpolation overtones

Post by za909 » Thu May 14, 2020 1:09 am

I have spent a few days implementing this idea, and I got mixed results. Overall the sound is much noisier than before, though it seems to be better with low frequencies. Sometimes those harmonics from before are still present, even though this time I can interpolate continuously with the DPCM deltas. I will post the code and an audio example later (with and without the interpolation), because I might have the right idea, just not implemented correctly. I have created a concept illustration of what the waveform should look like, and while the audio examples contain many such shapes, there are also many ramps that appear "too late" compared to the PCM data they should start from. I suspect the timing is never consistent enough due to the many branches in the code.
[Image: IMG_20200514_095741.jpg, concept illustration of the intended waveform]
EDIT: Here is the code for finding the right DPCM ramps. Two 4-bit ramps have to be found, because approximately 4 delta bits can be played between successive $4011 writes:


; Try to interpolate with delta bits in 50 cycles?
; Y = 0, X = not used, contains the number of cycles left until the second $4011 write

		sty interwork+1 ; +3 clear matrix index
		lda samplebuffer+0 ; +3 
		bpl @buff0notrept ; +2/+3 if bit 7 is set in this, default to "none" bridge
		lda #%00001100 ; +2 select row 3 of the matrix
		sta interwork+1 ; +3
		jmp @calcbridge2 ; +3 go to calculating the second bridge
		@buff0notrept:
		; +1 branch taken
		and #$7F ; +2
		sta interwork+0 ; +3
		lda samplebuffer+1 ; +3
		and #$7F ; +2
		; assume a set carry here
		sbc samplebuffer+0 ; +3 c=0 n=1 (down bridge) if Byte 0 > Byte 1 / c=1 n=0 (up bridge) if Byte 0 <= Byte 1
		bmi @byte0gtbyte1 ; +2/+3
		@byte1gtbyte0:
		; Byte 0 is less than Byte 1 (up bridge)
		cmp #8 ; +2
		bcs @calcbridge2 ; +2/+3 a difference of 8 or more is an up bridge
		; matrix index is still 0, and we'd need row 0 anyway for an up bridge, so no write needed!
		lda #%00001100 ; +2 less than 8: none bridge, select row 3 of the matrix
		sta interwork+1 ; +3
		jmp @calcbridge2 ; +3 go to calculating the second bridge
		@byte0gtbyte1:
		; Byte 0 is greater than Byte 1 (down bridge)
		; +1 branch taken
		cmp #-8 ; +2
		bcs @selectdown1 ; +2/+3 if the difference is less than 8, it's a none bridge
		lda #%00001100 ; +2 select row 3 of the matrix
		sta interwork+1 ; +3
		jmp @calcbridge2 ; +3 go to calculating the second bridge		
		@selectdown1:
		lda #%00000100 ; +2
		sta interwork+1 ; +3
		;-----------------
		@calcbridge2:
		lda samplebuffer+1 ; +3
		bpl @bridge2notnone ; +2/+3 
		@bridge2none:
		lda interwork+1 ; +3
		ora #%00000011 ; +2 select column 3
		sta interwork+1 ; +3
		jmp @applyinter ; +3
		@bridge2notnone:
		and #$7F ; +2
		sta interwork+2
		lda (pcmptr+0),y ; +5 if the next byte is 00, default to none bridge
		beq @bridge2none ; +2/+3
		and #$7F ; +2
		sec ; +2
		sbc interwork+2
		bmi @byte1gtbyte2 ; +2/+3
		@byte2gtbyte1:
		; Byte 1 is less than Byte 2 (up bridge)
		cmp #8 ; +2
		bcs @applyinter ; +2/+3 a difference of 8 or more is an up bridge
		; we'd need column 0 anyway for an up bridge, so no write is necessary!
		lda interwork+1 ; +3
		ora #%00000011 ; +2 select column 3 of the matrix if less than 8
		sta interwork+1 ; +3
		jmp @applyinter ; +3 go to applying the bridges
		@byte1gtbyte2:
		; Byte 1 is greater than Byte 2 (down bridge)
		; +1 branch taken
		cmp #-7 ; +2 C=1 if the difference is only -1 to -7
		bcs @bridge2none ; +2/+3 a drop of less than 8 is a none bridge
		@selectdown2:
		inc interwork+1 ; +5 a drop of 8 or more: select column 1 of the matrix
		@applyinter:
		ldy interwork+1 ; +3
		lda linint_addr_TBL,y ; +4
		sta $4012 ; +4

	lda #$1F ; +2
	sta $4015 ; +4


linint_addr_TBL: 
; 2D matrix of values that go into DPCM_ADDR
; Two bridges have to be found, involving three bytes from ROM
; Examine the difference between byte 0 and byte 1, then between byte 1 and byte 2
; if a difference is less than 8, that bridge is a "none (n)" bridge, which uses alternating bits (1010)
; if a difference is more than or equal to 8, determine if the bridge is an "up (u)" or "down (d)" bridge
; do this for both bridges and select from the matrix as needed

.db (linint_u_u-$C000)/64, (linint_d_u-$C000)/64, (linint_n_u-$C000)/64, (linint_n_u-$C000)/64
.db (linint_u_d-$C000)/64, (linint_d_d-$C000)/64, (linint_n_d-$C000)/64, (linint_n_d-$C000)/64
.db (linint_u_n-$C000)/64, (linint_d_n-$C000)/64, (linint_n_n-$C000)/64, (linint_n_n-$C000)/64
.db (linint_u_n-$C000)/64, (linint_d_n-$C000)/64, (linint_n_n-$C000)/64, (linint_n_n-$C000)/64

.align 64
linint_u_u:
.db %11111111

.align 64
linint_u_d:
.db %00001111

.align 64
linint_u_n:
.db %10101111

.align 64
linint_d_u:
.db %11110000

.align 64
linint_d_d:
.db %00000000

.align 64
linint_d_n:
.db %10100000

.align 64
linint_n_u:
.db %11111010

.align 64
linint_n_d:
.db %00001010

.align 64
linint_n_n:
.db %10101010
There is also no way this can reliably run in fewer than 50-55 cycles; it exceeds that limit by 10-15 cycles. It shouldn't have a severe effect on the sound, but I don't want the IRQ handler to overshoot the idealized 50% CPU usage limit by too much.
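To check which entry of the 4×4 linint_addr_TBL the routine ends up reading, the index it builds in interwork+1 can be modelled in C. This is a sketch of the encoding as the code writes it: the row (bits 3-2) comes from the first bridge and the column (bits 1-0) from the second, with up = 0, down = 1 and none = 3 (row and column 2 are unused duplicates of 3). The enum and function names are illustrative.

```c
/* Bridge codes as the 6502 routine encodes them. */
enum bridge { UP = 0, DOWN = 1, NONE = 3 };

/* Index loaded into Y before "lda linint_addr_TBL,y". */
int matrix_index(enum bridge first, enum bridge second)
{
    return ((int)first << 2) | (int)second;
}
```

The routine then reads linint_addr_TBL[4 * row + column], so each of the nine combinations can be checked against the linint_* name it should land on.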

I have uploaded the example song (MediEvil: The Time Device by Paul Arnold & Andrew Barnabas), recorded from Mesen, to SoundCloud. I hope the upload hasn't affected the quality.
DPCM-Interpolated recording
Recording without DPCM interpolation

Another image, showing the interpolated waveform (top) and the non-interpolated waveform (bottom) in Audacity. You can see where the DPCM ramps kick in:
[Image: Annotation 2020-05-15 190141.png, waveform comparison in Audacity]
