Yeah, the noise channel will never sound right unless the high frequencies are accounted for in some way. Averaging them is pretty good for that.
For a simple but pretty good sounding setup, I've used this in a few projects:
- 1. Sample most channels at 192kHz (4 x 48kHz), but for the noise channel take an average of the full 1.8MHz output.
- 2. Apply simple highpass/lowpass IIR to the 192kHz output to approximate the NES' internal filters.
- 3. Average every 4 samples to downsample to 48kHz.
That will have some minor aliasing, especially in some edge cases, but it's a pretty good tradeoff of complexity/computation vs. sound quality. 32-bit integers are perfectly fine for this kind of approach. 2x instead of 4x might be an acceptable trade as well, but in my tests 4x sounded enough better to be worth the extra cost.
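A rough sketch of those three steps in Python, using integer math only. All names here (`NoiseAverager`, `lowpass`, `highpass`, `decimate_by_average`, `render`) are made up for illustration, and the two shift constants are untuned placeholders, not values matched to the NES' real filter cutoffs. It assumes the caller has already mixed the channels into a list of integers at 192kHz.

```python
HP_SHIFT = 9  # placeholder highpass strength (assumption, not tuned)
LP_SHIFT = 1  # placeholder lowpass strength (assumption, not tuned)


class NoiseAverager:
    """Step 1: average the noise output over every CPU clock between samples."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def clock(self, level):
        # Call once per CPU cycle (about 1.8MHz). Pass the level pre-scaled
        # (e.g. level << 8) so the integer average keeps some precision.
        self.total += level
        self.count += 1

    def take(self):
        # Call once per 192kHz output sample; returns the mean and resets.
        avg = self.total // self.count if self.count else 0
        self.total = 0
        self.count = 0
        return avg


def lowpass(samples, shift):
    """Step 2: one-pole IIR lowpass with a coefficient of 1 / 2**shift."""
    acc = 0  # filter state scaled by 2**shift so small inputs are not lost
    out = []
    for x in samples:
        acc += x - (acc >> shift)
        out.append(acc >> shift)
    return out


def highpass(samples, shift):
    """Step 2: one-pole highpass, computed as input minus its lowpass."""
    low = lowpass(samples, shift)
    return [x - l for x, l in zip(samples, low)]


def decimate_by_average(samples, factor=4):
    """Step 3: average each group of `factor` samples (192kHz -> 48kHz)."""
    last_start = len(samples) - factor + 1
    return [sum(samples[i:i + factor]) // factor
            for i in range(0, last_start, factor)]


def render(mixed_192k):
    """Run steps 2 and 3 over an already-mixed 192kHz integer stream."""
    filtered = lowpass(highpass(mixed_192k, HP_SHIFT), LP_SHIFT)
    return decimate_by_average(filtered, 4)
```

Since 1.8MHz / 192kHz is not a whole number, the averager just counts however many CPU clocks landed in each output sample rather than assuming a fixed group size.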
The periodic noise mode is one thing that tends to have harsh aliasing without some more thorough high frequency emulation / better downsampling, but I think the quality even for this simple version is quite acceptable. There are other edge cases, e.g. the triangle being set to its highest frequency in some games in lieu of being "silenced", or how MMC5 pulses can go to very high frequencies that the 2A03 would mute. You can work around these by just muting the channels when they're at high frequencies, which might be an easier solution than trying to generate them and then filter them out.
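A minimal sketch of that muting workaround, in Python. The function names and the 20kHz cutoff are assumptions for illustration, and the clock rate is the NTSC one. What "muted" should mean for a given channel (outputting zero vs. holding a level) is left to the caller.

```python
CPU_HZ = 1789773  # NTSC 2A03 CPU clock


def pulse_hz(timer):
    """Pulse channel frequency for an 11-bit timer period."""
    return CPU_HZ / (16 * (timer + 1))


def triangle_hz(timer):
    """Triangle channel frequency for an 11-bit timer period."""
    return CPU_HZ / (32 * (timer + 1))


def should_mute(freq_hz, cutoff_hz=20000.0):
    """True if the channel is pitched above the (assumed) audible cutoff."""
    return freq_hz > cutoff_hz


# Triangle parked at its highest frequency: ultrasonic, so skip generating it.
print(should_mute(triangle_hz(0)))

# An MMC5 pulse with a very small timer period. The 2A03 pulses are muted
# by hardware for timer values below 8, but the MMC5 ones are not.
print(should_mute(pulse_hz(0)))
```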
For higher accuracy, I'd want to do everything at 1.8MHz and have a robust downsampler, probably a windowed sinc of some sort, but there are a lot of options. Blargg's blip-buffer is an interesting implementation: it approximates a FIR filter by replacing each step change in the input with a frequency-appropriate filtered step curve.
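To give a feel for the filtered-step idea, here is a small Python sketch. It is not blip-buffer's code or API, just an illustration under my own assumptions: each amplitude change is spread over nearby output samples as a Hann-windowed sinc impulse, and a running sum turns those impulses into smoothed steps. `HALF_WIDTH`, `bandlimited_impulse`, and `render_steps` are hypothetical names, and the sinc cutoff sits right at Nyquist, where a real implementation would likely go lower.

```python
import math

HALF_WIDTH = 8  # kernel half-width in output samples (assumed value)


def bandlimited_impulse(frac):
    """Windowed-sinc taps for an impulse at fractional offset `frac`.

    Taps are normalized to sum to 1 so that, after integration, a step
    of height `delta` settles at exactly `delta`.
    """
    taps = []
    for n in range(-HALF_WIDTH + 1, HALF_WIDTH + 1):
        x = n - frac
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 + 0.5 * math.cos(math.pi * x / HALF_WIDTH)  # Hann
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def render_steps(steps, length):
    """Render (time, delta) amplitude changes into `length` output samples.

    `time` is measured in output samples, may be fractional, and must lie
    in [0, length). The result is delayed by HALF_WIDTH - 1 samples.
    """
    diff = [0.0] * (length + 2 * HALF_WIDTH)
    for time, delta in steps:
        whole = math.floor(time)
        frac = time - whole
        for i, tap in enumerate(bandlimited_impulse(frac)):
            diff[whole + i] += delta * tap
    out = []
    level = 0.0
    for d in diff[:length]:
        level += d  # integrating the impulses gives band-limited steps
        out.append(level)
    return out


# One pulse: up by 1.0 at t=10.25, back down at t=30.5.
wave = render_steps([(10.25, 1.0), (30.5, -1.0)], 64)
```

The work here scales with the number of amplitude changes rather than the number of 1.8MHz clocks.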