## Some advice of DSP here...

Discuss emulation of the Nintendo Entertainment System and Famicom.

Moderator: Moderators

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL
Contact:

### Re: Some advice of DSP here...

tepples wrote:
zeroone wrote:Why every 3rd CPU cycle?
Probably someone's idea of the best compromise between speed and accuracy. I would have done every 2 CPU cycles, as the majority of the APU changes its output every other cycle, and the one channel that does allow odd periods (triangle) only changes it by a comparatively small amount.
It's an optimization to reduce filter processing requirements. Every 3rd cycle is the most you can reduce it while guaranteeing that there won't be any aliasing of high frequencies into the audible range. re: 2 vs. 3, I don't hear any difference. Besides, does it matter? According to the sampling theorem, it should be fine, no?
tepples wrote:
Also, is the b-spline step necessary? Once the high frequencies are filtered away, shouldn't you just be able to sub-sample the data (i.e. just throw away samples).
That'd be fine if 48 kHz divided evenly into the 315/176 MHz clock rate of the CPU. But it doesn't. There are 37.2869 CPU cycles for each output sample. So you end up having to interpolate when you decimate, inferring output samples that lie between the samples of the filter's output. You can use nearest-neighbor resampling, choosing (say) one output sample every 37, 37, 38, 37, 37, 37, 38, ... input samples, but it adds a bit of jitter. Linear interpolation shouldn't pose a problem though with this much margin.
The interpolation method was chosen based on the work in this paper: http://yehar.com/blog/wp-content/upload ... 8/deip.pdf. The signal is approximately 12x oversampled. According to the results in the paper (interpolation between the 8x and 16x numbers...), linear interpolation would result in ~50dB SNR, while 3rd order b-spline is ~106dB SNR.
get nemulator
http://nemulator.com

tepples
Posts: 22044
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

### Re: Some advice of DSP here...

James wrote:
tepples wrote:
zeroone wrote:Why every 3rd CPU cycle?
Probably someone's idea of the best compromise between speed and accuracy. I would have done every 2 CPU cycles, as the majority of the APU changes its output every other cycle, and the one channel that does allow odd periods (triangle) only changes it by a comparatively small amount.
It's an optimization to reduce filter processing requirements. Every 3rd cycle is the most you can reduce it while guaranteeing that there won't be any aliasing of high frequencies into the audible range. re: 2 vs. 3, I don't hear any difference. Besides, does it matter? According to the sampling theorem, it should be fine, no?
Short-period noise at the highest frequency (\$400E=\$80) changes its output level every four cycles. (source) If you point-sample it every 3 cycles, then you'll pick up 1 of some levels and 2 of others. In sampling theorem terms, there's some ultrasonic aliasing that you're not completely rejecting. But I haven't tested sampling every 3 cycles, so I don't know either how it would sound. I chose 2 based on theory, as it's guaranteed to give equal weight to every output sample from square and noise.

lidnariq
Posts: 9655
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

### Re: Some advice of DSP here...

I remember this discussion from the first time he mentioned it.

Decimating by 3 without filtering folds the frequency content from 597kHz±20kHz down over the audible range.

Noise rate 0 (highest frequency) produces a moderate amount of aliasing from that range: -11 to -13 dBFS; so does noise rate 1 (-17dbFS to -21dBFS). Noise rate 2 (-23dBFS to -29dBFS) and beyond don't alias too much...

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL
Contact:

### Re: Some advice of DSP here...

lidnariq wrote:Noise rate 0 (highest frequency) produces a moderate amount of aliasing from that range: -11 to -13 dBFS; so does noise rate 1 (-17dbFS to -21dBFS). Noise rate 2 (-23dBFS to -29dBFS) and beyond don't alias too much...
Noise rate 0 is ~447kHz. Wouldn't that be aliased to ~149kHz at Fs=596kHz? And noise rate 1 (~223kHz) shouldn't be aliased at all. Am I missing something?
get nemulator
http://nemulator.com

lidnariq
Posts: 9655
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

### Re: Some advice of DSP here...

The spectrum of the noise is quite broad:
4-sample-boxcar-FIR-spectrum-at-1789773Hz.png (3.94 KiB) Viewed 2325 times
So when you decimate by three, you fold the entire spectrum here into thirds:
The bend at ≈180kHz is because of the complex numbers producing constructive and destructive interference
decimated.png (3.54 KiB) Viewed 2325 times
Now, can we hear that aliasing as aliasing? I have no idea. I kinda bet the answer is no.

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL
Contact:

### Re: Some advice of DSP here...

Where does the >447kHz content come from? Harmonics?
get nemulator
http://nemulator.com

lidnariq
Posts: 9655
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

### Re: Some advice of DSP here...

Take a discrete-time white noise source. LFSRs are white, so they produce equal energy at all frequencies. It sounds different from an impulse train because the phases differ from time period to time period.
( 1001 )

Define the sample rate to be the highest rate the APU generates, 447kHz. Now upsample by a factor of four to generate the CPU's "true" sample rate.
(Upsampling inserts zeros between each original sample. Upsampling also copies the spectrum, with reflection, so ABCD becomes ABCDDCBAABCDDCBA)
( 1000000000001000 )

All input magnitudes are still 1, even after upsampling.

Apply the four-sample FIR boxcar "sample-and-hold" filter. That multiplies by the spectrum in my previous post.
( 1111000000001111 )

This is true for all sample-and-hold filters ("boxcar" FIRs): they fail to filter out almost any harmonics.

zeroone
Posts: 934
Joined: Mon Dec 29, 2014 1:46 pm
Location: New York, NY
Contact:

### Re: Some advice of DSP here...

James wrote:
rainwarrior wrote:Is there a practical IIR for resampling? (I've seen the analog equivalent in ADCs, but not in a digital resampler.)
I use an IIR-based resampler in nemulator. I generate an audio sample every 3rd CPU cycle (~596600Hz), then filter it with a 16th order IIR lowpass (equivalent to ~512th order FIR). I then interpolate (3rd order b-spline) between samples for arbitrary sample rate conversion.

The IIR filter is implemented with SSE instructions and computes biquad sections in parallel. It's fast and cache-friendly.
Did you use a Butterworth filter? I read that high order, digital IIR filters require maintaining a large number of decimal places to avoid aliasing as a consequence of the feedback mechanism. Are you computing everything with 64-bit doubles? How CPU intensive is this process?

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL
Contact:

### Re: Some advice of DSP here...

zeroone wrote:
James wrote:
rainwarrior wrote:Is there a practical IIR for resampling? (I've seen the analog equivalent in ADCs, but not in a digital resampler.)
I use an IIR-based resampler in nemulator. I generate an audio sample every 3rd CPU cycle (~596600Hz), then filter it with a 16th order IIR lowpass (equivalent to ~512th order FIR). I then interpolate (3rd order b-spline) between samples for arbitrary sample rate conversion.

The IIR filter is implemented with SSE instructions and computes biquad sections in parallel. It's fast and cache-friendly.
Did you use a Butterworth filter? I read that high order, digital IIR filters require maintaining a large number of decimal places to avoid aliasing as a consequence of the feedback mechanism. Are you computing everything with 64-bit doubles? How CPU intensive is this process?
I use a 12th order Elliptic followed by a 4th order Butterworth. 32-bit single precision, no stability problems. It accounts for about 5% of the process' CPU utilization (which is ~8% on my 2.7GHz 2015 MacBook Pro).
get nemulator
http://nemulator.com

zeroone
Posts: 934
Joined: Mon Dec 29, 2014 1:46 pm
Location: New York, NY
Contact:

### Re: Some advice of DSP here...

James wrote:I use a 12th order Elliptic followed by a 4th order Butterworth. 32-bit single precision, no stability problems. It accounts for about 5% of the process' CPU utilization (which is ~8% on my 2.7GHz 2015 MacBook Pro).
Why 2 filters? How did you test for stability problems?

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL
Contact:

### Re: Some advice of DSP here...

zeroone wrote:
James wrote:I use a 12th order Elliptic followed by a 4th order Butterworth. 32-bit single precision, no stability problems. It accounts for about 5% of the process' CPU utilization (which is ~8% on my 2.7GHz 2015 MacBook Pro).
Why 2 filters? How did you test for stability problems?
The elliptic has a sharp cutoff, so I use it to kill frequencies > 20kHz. The Butterworth is used to more gently rolloff high (audible) frequencies.

re: stability - I designed it in Matlab, which will tell you if it's stable or not. It's not implemented as a single high-order filter (which would be unstable), but as a series of biquad filters.
get nemulator
http://nemulator.com

tepples
Posts: 22044
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

### Re: Some advice of DSP here...

I wonder if you can save any more CPU time by decimating the input somewhat between the elliptic and Butterworth stages, such as by providing only one out of every six outputs from elliptic to Butterworth.

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL
Contact:

### Re: Some advice of DSP here...

tepples wrote:I wonder if you can save any more CPU time by decimating the input somewhat between the elliptic and Butterworth stages, such as by providing only one out of every six outputs from elliptic to Butterworth.
My implementation is vectorized, so the Butterworth stage adds no additional CPU time. In general though, yes, you could do that.
get nemulator
http://nemulator.com

zeroone
Posts: 934
Joined: Mon Dec 29, 2014 1:46 pm
Location: New York, NY
Contact:

### Re: Some advice of DSP here...

James wrote:
tepples wrote:I wonder if you can save any more CPU time by decimating the input somewhat between the elliptic and Butterworth stages, such as by providing only one out of every six outputs from elliptic to Butterworth.
My implementation is vectorized, so the Butterworth stage adds no additional CPU time. In general though, yes, you could do that.
Can you explain what you mean by "vectorized"?

zeroone
Posts: 934
Joined: Mon Dec 29, 2014 1:46 pm
Location: New York, NY
Contact:

### Re: Some advice of DSP here...

I decided to give some of the ideas discussed in this thread a try. Below is a decimator implemented in Java. It uses a low-pass Butterworth filter to strip out frequencies above the Nyquist frequency (half the output sampling frequency) and it uses cubic splines to generate the output samples. Its constructor allows you to configure the input sampling frequency, the output sampling frequency and the filter order. This image shows its frequency response curve for an order 16 filter with an output sampling frequency of 48000 Hz. The decimator uses 64-bit doubles, providing a reasonable range of stability. And, timing tests suggest that it does not consume a lot of CPU.

Samples to be decimated are added at the input sampling frequency via the addInputSample() method. The processOutputSample() method of the DecimatorListener, which is passed into the constructor, is called back at the output sampling rate with the decimated samples.

Per the discussion on this thread, I recommend generating audio samples every 3 CPU cycles. I.e. for NTSC, the input sampling rate would be 19687500.0 / (3.0 * 11.0) and for PAL, it would be 53203425.0 / (3.0 * 32.0). Unfortunately, if you try a higher rate, such as every other CPU cycle or every CPU cycle, at a filter order of 16, the Butterworth filter becomes unstable. For the output sampling frequency, I suggest either 44100.0 or 48000.0. The filter order needs to be a base-2 number and as mentioned, certain combinations of these values will result in an unusable filter.

Code: Select all

``````public class Decimator {

private static final double I2 = 1.0 / 2.0;
private static final double I3 = 1.0 / 3.0;
private static final double I6 = 1.0 / 6.0;

private final double[] inputSamples;
private final double[] outputSamples;
private final double[] A;
private final double[] B;
private final int filterOrder;
private final double cutoff;
private final double inputSamplingPeriod;
private final double ouputSamplingPeriod;
private final DecimatorListener listener;

private double time;
private int index;

// frequencies in Hz
public Decimator(
double inputSamplingFrequency,
double outputSamplingFrequency,
int filterOrder,
DecimatorListener listener) {

this.filterOrder = filterOrder;
this.listener = listener;

cutoff = outputSamplingFrequency / inputSamplingFrequency;
inputSamplingPeriod = 1.0 / inputSamplingFrequency;
ouputSamplingPeriod = 1.0 / outputSamplingFrequency;

inputSamples = new double[filterOrder << 1];
outputSamples = new double[inputSamples.length];

A = computeAs();
B = computeBs();
}

inputSamples[index] = value;

double outputSample = B[0] * value;
for (int i = filterOrder; i >= 1; i--) {
outputSample += B[i] * inputSamples[(index - i) & lengthMask]
- A[i] * outputSamples[(index - i) & lengthMask];
}
outputSamples[index] = outputSample;

time += inputSamplingPeriod;
if (time >= ouputSamplingPeriod) {
listener.processOutputSample(computeSpline(
1.0 - (time - ouputSamplingPeriod) / inputSamplingPeriod,
outputSamples[index]));
time -= ouputSamplingPeriod;
}

index = (index + 1) & lengthMask;
}

private double computeSpline(double t, double y0, double y1, double y2,
double y3) {
double c0 = y1;
double c1 = y2 - I3 * y0 - I2 * y1 - I6 * y3;
double c2 = I2 * (y0 + y2) - y1;
double c3 = I6 * (y3 - y0) + I2 * (y1 - y2);
return ((c3 * t + c2) * t + c1) * t + c0;
}

private double[] computeAs() {

final double theta = Math.PI * cutoff;
final double sinTheta = Math.sin(theta);
final double cosTheta = Math.cos(theta);

double[] binomials = new double[filterOrder << 1];

for (int i = filterOrder - 1; i >= 0; i--) {
int i2 = i << 1;
double poleAngle = Math.PI * (i2 + 1.0) / (filterOrder << 1);
double sinPoleAngle = Math.sin(poleAngle);
double cosPoleAngle = Math.cos(poleAngle);
double v = 1.0 + sinTheta * sinPoleAngle;
binomials[i2] = -cosTheta / v;
binomials[i2 + 1] = -sinTheta * cosPoleAngle / v;
}

binomials = computeBinomials(binomials);
double[] as = new double[filterOrder + 1];
as[0] = 1.0;
as[1] = binomials[0];
as[2] = binomials[2];
for (int i = filterOrder; i >= 3; i--) {
as[i] = binomials[(i << 1) - 2];
}

return as;
}

private double[] computeBs() {

final double[] bs = new double[filterOrder + 1];

bs[0] = 1.0;
bs[1] = filterOrder;

for (int i = 2; i <= filterOrder >> 1; i++) {
bs[i] = (filterOrder - i + 1) * bs[i - 1] / i;
bs[filterOrder - i] = bs[i];
}

bs[filterOrder - 1] = filterOrder;
bs[filterOrder] = 1;

double scale = computeScale();

for (int i = 0; i < bs.length; ++i) {
bs[i] *= scale;
}

return bs;
}

private double computeScale() {
double omega = Math.PI * cutoff;
double sinOmega = Math.sin(omega);
double poleAngle = Math.PI / (filterOrder << 1);

double scale = 1.0;
for (int i = (filterOrder >> 1) - 1; i >= 0; i--) {
scale *= 1.0 + sinOmega * Math.sin(((i << 1) + 1.0) * poleAngle);
}

omega /= 2.0;
sinOmega = Math.sin(omega);
if ((filterOrder & 1) == 1) {
scale *= sinOmega + Math.cos(omega);
}
scale = Math.pow(sinOmega, filterOrder) / scale;

return scale;
}

private double[] computeBinomials(final double[] C) {
final int N = C.length >> 1;
final double[] D = new double[N << 1];

for (int i = 0; i < N; i++) {
int i2 = i << 1;
for (int j = i; j > 0; j--) {
int j2 = j << 1;
D[j2] += C[i2] * D[j2 - 2] - C[i2 + 1] * D[j2 - 1];
D[j2 + 1] += C[i2] * D[j2 - 1] + C[i2 + 1] * D[j2 - 2];
}
D[0] += C[i2];
D[1] += C[i2 + 1];
}

return D;
}
}
``````

Code: Select all

``````public interface DecimatorListener {
void processOutputSample(double sample);
}``````