Emulation and streaming asynchronous audio resampling

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg »

I simply can't reliably detect the start of vblank and have the video code blitted to the screen. No matter how many places I tell bsnes to check to see if we've reached vblank (over 40,000 times a second), it's still not enough and I miss entire vblank periods 20% of the time. Most likely, there are too many parts in bsnes that eat up so much CPU time that it jumps right over vblank.
Can't you just add another pseudo hardware device to the emulated SNES that claims to be able to affect the CPU just after the beginning of every frame? Then that device would get control at the right spot every frame, without adding any checks in the emulator (since I'm assuming you already have the framework for this sort of event).
It's too bad video drivers and/or API developers are too incompetent to design APIs that handle page flipping completely transparently in the background without deadlocking your applications when you request to blit the image to the screen.
Could this be because your application is asking for another page flip before the first one has occurred, and the API must block that thread until the first completes?
Anyway, I've given up, sadly. I can't think of any way to do this, and I've exceeded my patience and run out of ideas.
This always happens to me when I don't want to slow down and approach the problem in isolation from the main project. At some point in the future I eventually do that, then spend a week or more experimenting with the concepts alone and figure it out. I have to become interested in the topic for its own sake, rather than as a mere problem to be solved and forgotten.
Funny, I only just now realized this: when sinimas commented that the number of audio samples generated per video frame should be constant, I immediately thought "no, that's not true", and then realized, "wait... actually, yes, that should be true".
The number of samples has to vary by one or two, since there is some fraction of a sample extra each frame.
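A minimal sketch of why the per-frame count cannot be constant, assuming a 32000 Hz stream (the rate used in the graphs later in this post) and an illustrative frame rate of 60000/1001 fps. The frame rate and the helper name `samples_per_frame` are assumptions for the example, not values taken from bsnes. The fractional remainder is carried from frame to frame, so each frame emits either the floor or the floor plus one.

```python
from fractions import Fraction

def samples_per_frame(sample_rate, frame_rate, frames):
    """Yield a whole sample count for each frame, carrying the fraction forward."""
    per_frame = Fraction(sample_rate) / Fraction(frame_rate)  # exact, not rounded
    owed = Fraction(0)  # fractional samples not yet emitted
    for _ in range(frames):
        owed += per_frame
        whole = int(owed)  # owed is never negative, so int() is a floor here
        owed -= whole
        yield whole

# Assumed rates for illustration only.
counts = list(samples_per_frame(32000, Fraction(60000, 1001), 1000))
```

With these assumed rates every frame gets either 533 or 534 samples, and the running total never drifts by more than one sample from the exact value.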
I like cosine, because the graphs for it are prettier: they look almost identical to Hermite and, besides the end points, very similar to cubic as well.
Cosine introduces discontinuities at each point, and it shows in frequency graphs. Here are the four compared (FIR using 11 point kernels), using a sweep from 16 kHz to 0 kHz in a 32 kHz sampled stream, resampled to 44.1 kHz by these.

[Image: frequency graphs comparing linear, cosine, Hermite, and FIR resampling]

You can see the low frequency aliases in linear and cosine (cosine comes out worse in some ways), while Hermite and FIR have one that is mostly inaudible in the upper range.
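For reference, here is a rough sketch of the two simplest interpolators being compared, plus a naive resampling loop that can use either one. This is the textbook form of linear and cosine interpolation, not the code used to produce the graphs; the function names and the `resample` helper are made up for the example.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between samples a and b, with t in [0, 1]."""
    return a + (b - a) * t

def cosine_interp(a, b, t):
    """Cosine interpolation: same end points as lerp, but an S-shaped weight."""
    w = (1.0 - math.cos(math.pi * t)) / 2.0
    return a + (b - a) * w

def resample(samples, in_rate, out_rate, interp):
    """Naive two-point resampler; stops when no right-hand neighbour remains."""
    out = []
    i = 0
    while True:
        pos = i * in_rate / out_rate  # position in input samples
        idx = int(pos)
        if idx + 1 >= len(samples):
            break
        out.append(interp(samples[idx], samples[idx + 1], pos - idx))
        i += 1
    return out

ramp = resample([0.0, 1.0, 2.0, 3.0], 2, 4, lerp)
```

Neither function does any low-pass filtering, which is consistent with the aliases visible for both in the graphs.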
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

Can't you just add another pseudo hardware device to the emulated SNES that claims to be able to affect the CPU just after the beginning of every frame? Then that device would get control at the right spot every frame, without adding any checks in the emulator (since I'm assuming you already have the framework for this sort of event).
I like the idea. However, I have kind of an odd setup. I wanted to account for possibly adding chips with their own clock rates to the emulator in the future (e.g. DSP-1, SuperFX, SA-1, etc., though they mostly use the S-CPU clock rate fed to the cartridge pins anyway). So what I have is one variable for each pair of clocks that need to synchronize. Right now, there's just one, for S-CPU <> S-SMP. Since the S-PPU1/2 and S-DSP are not emulated at the clock level, they are just enslaved to the CPU and SMP. Therefore, for CPU<>SMP, I keep one 64-bit variable. Whenever the CPU adds clocks, I subtract clocks * smpclockrate from this value. Whenever the SMP adds clocks, I add clocks * cpuclockrate to it. If the clock rates were identical (e.g. CPU<>PPU), then the multiplication wouldn't be necessary.
I can detect if one processor is ahead by seeing if this value is >=0 or <0, respectively.
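The scheme described above could be sketched roughly like this. It is an illustration in Python, not the actual bsnes code (which is C++ and uses a 64-bit integer); the class name, method names, and the two clock rates are assumptions for the example. Under the sign convention described, a negative counter means the CPU has run further ahead in real time, and a value >= 0 means the SMP is level or ahead.

```python
class ClockSync:
    """One signed counter tracking the relative time of two processors."""

    def __init__(self, cpu_rate, smp_rate):
        self.cpu_rate = cpu_rate
        self.smp_rate = smp_rate
        self.counter = 0  # a 64-bit integer in the setup described above

    def add_cpu_clocks(self, clocks):
        # Scaling by the *other* rate puts both clocks in a common unit.
        self.counter -= clocks * self.smp_rate

    def add_smp_clocks(self, clocks):
        self.counter += clocks * self.cpu_rate

    def cpu_is_ahead(self):
        return self.counter < 0

# Assumed, illustrative clock rates in Hz.
CPU_RATE = 21_477_272
SMP_RATE = 24_576_000

sync = ClockSync(CPU_RATE, SMP_RATE)
sync.add_cpu_clocks(CPU_RATE)       # CPU runs one second's worth of clocks
cpu_ahead_after_cpu = sync.cpu_is_ahead()
sync.add_smp_clocks(SMP_RATE)       # SMP runs one second's worth of clocks
```

After one second of each, the counter is back at exactly zero, because both sides contribute cpu_rate * smp_rate units per second. No division or floating point is needed.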
Now, the fun part is that to save speed, when I ask one processor to sync to the other, it will run nonstop until that processor needs to access the other processor, in which case it switches contexts and runs the other processor.
The way I break out is that each time the S-CPU vcounter reaches 240 (where no video can be rendered, regardless of region or overscan settings), I context switch back to the main thread to end the "run_frame();" call.
I could add a function that does something like "keep running the SMP until the clock rate is as close to equal as possible", but the only way I could prevent it from running forever and/or switching back to the other processor is to add a break-out check right inside the core "add_clocks();" function for each processor. This would add a ton of overhead, since these functions are called millions of times a second. The same goes for replacing the function with a function pointer that I swap out: the indirect function call overhead would add up. Right now the add_clocks functions that sync the two processors are force-inlined and mainly consist of one add, one mul and one compare.
Lastly, this could get a lot more complex if and when more clock syncs were thrown into the mix. Still, it's a good idea, and the most viable one ...
Could this be because your application is asking for another page flip before the first one has occurred, and the API must block that thread until the first completes?
That's a very real possibility, however neither DDraw nor D3D give you a way to see if you already have a page flip that is pending, so you can hold off. If it did, that would be absolutely perfect.
This always happens to me when I don't want to slow down and approach the problem in isolation from the main project. At some point in the future I eventually do that, then spend a week or more experimenting with the concepts alone and figure it out. I have to become interested in the topic for its own sake, rather than as a mere problem to be solved and forgotten.
If you're still interested in this topic, then I certainly don't mind continuing to discuss it with you. I'd like to have a definitive solution for this problem as well :D
The number of samples has to vary by one or two, since there is some fraction of a sample extra each frame.
For the aforementioned reasons, I'm getting a lot more than that, sadly. Resampling by one sample or two per 25ms audio buffer should be quite easy.
Cosine introduces discontinuities at each point, and it shows in frequency graphs. Here are the four compared (FIR using 11 point kernels), using a sweep from 16 kHz to 0 kHz in a 32 kHz sampled stream, resampled to 44.1 kHz by these.

...

You can see the low frequency aliases in linear and cosine (cosine comes out worse in some ways), while Hermite and FIR have one that is mostly inaudible in the upper range.
To be honest, I really don't understand the graph or what you mean, but I have virtually no experience with audio. No need to explain it in layman's terms, though. I'll take your word (and picture) for it that the FIR resampler is best. By the way, how does cubic look on that graph? Comparable, or worse than Hermite?
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Cubic and Hermite are often used to mean the same thing: interpolation based on the value and first differential at the start and end of each interval.
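As a rough sketch of that definition: a cubic Hermite segment is built from the value and slope at each end of the interval. The slopes have to come from somewhere; the example below assumes central differences of the neighbouring samples (the Catmull-Rom choice), which is one common option and not necessarily what any resampler in this thread uses. Function names are made up for the example.

```python
def hermite(p0, m0, p1, m1, t):
    """Cubic Hermite segment from values p0, p1 and slopes m0, m1, t in [0, 1]."""
    t2 = t * t
    t3 = t2 * t
    return ((2 * t3 - 3 * t2 + 1) * p0
            + (t3 - 2 * t2 + t) * m0
            + (-2 * t3 + 3 * t2) * p1
            + (t3 - t2) * m1)

def catmull_rom(y0, y1, y2, y3, t):
    """Interpolate between y1 and y2, estimating slopes from the neighbours."""
    m1 = (y2 - y0) / 2.0  # assumed slope at y1: central difference
    m2 = (y3 - y1) / 2.0  # assumed slope at y2: central difference
    return hermite(y1, m1, y2, m2, t)

midpoint = catmull_rom(0.0, 1.0, 2.0, 3.0, 0.5)
```

On a straight ramp the result stays on the line, and at t = 0 and t = 1 the curve passes exactly through the two middle samples.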
James
Posts: 431
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL

Post by James »

you can interpolate per sample rather than outright drop frames. I've been completely unable to think of how to do this, however.
My NES emulator does just what you're talking about. The emulation loop is synced to 60Hz (using my monitor's vsync rate, but it could use any timer). Every other frame, I check how full the DirectSound buffer is and adjust the playback frequency to compensate. I keep the buffer about 70% full so I never have to drop video frames to catch up or block waiting for the buffer to accept more data. The adjustments are small enough that there's no (obvious) audible frequency changes. This is with an 80ms buffer (at 70% full, an average of 3-4 frames latency) on a SoundBlaster X-Fi (so not sure how well it works with onboard audio).
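Until the real code is posted, here is a guess at the general shape of such an adjustment: a simple proportional controller that nudges the playback frequency toward whatever keeps the buffer at the target fill level. The 70% target comes from the description above; the function name, `gain`, and `max_adjust` are hypothetical values chosen for the example, not taken from nemulator.

```python
def adjusted_rate(base_rate, fill, target=0.70, gain=0.02, max_adjust=0.005):
    """Return a playback rate nudged so the buffer drifts back toward target.

    fill and target are fractions of the buffer (0.0 to 1.0). If the buffer
    is fuller than the target, play slightly faster to drain it; if it is
    emptier, play slightly slower so the emulator can catch up.
    """
    adjust = gain * (fill - target)
    # Clamp so the pitch change stays small (hypothetical limit of 0.5%).
    adjust = max(-max_adjust, min(max_adjust, adjust))
    return base_rate * (1.0 + adjust)

on_target = adjusted_rate(44100, 0.70)
too_full = adjusted_rate(44100, 1.00)
too_empty = adjusted_rate(44100, 0.00)
```

The returned value would then be handed to whatever call the audio API provides for changing the playback frequency, once every couple of frames as described above.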

If you're interested, I'll clean up and post the current version of the code I use (the version on the web site uses a different technique, though the idea is similar).

James
get nemulator
http://nemulator.com