Understanding sound for the first time
-
- Posts: 11
- Joined: Sun Aug 16, 2015 9:02 am
Understanding sound for the first time
I have my NES emulator working great (Input, graphics, etc.) and am moving on to getting sound. I read through the wiki to understand how the APU works. There are a lot of technical terms like sampling, length counters, envelopes, pulse/triangle waves, etc.
Up to this point, everything has made sense to me. My issue is that I have no experience working with sound engineering or signal processing, so I'm not even sure where to start.
Is there some kind of book or class I can sign up for that would give me a basic understanding of what's going on here? The books I've skimmed through (mainly undergraduate DSP textbooks) have so much prerequisite knowledge and advanced math that isn't relevant to what is needed to get sound working. I just don't want to spend months reading through those kinds of books only to discover that they aren't applicable to doing APU emulation.
I'm just looking for a starting point... Has anyone had the same issue, and if so, what books/articles/websites did you turn to for understanding the prerequisite knowledge?
Re: Understanding sound for the first time
Don't worry about DSP for now. Once you get it working, you may choose to learn some DSP stuff to implement better resampling, etc., but there's no DSP involved in generating the APU output (well, unless you count downsampling the signal to e.g., 48kHz, but that's as simple as taking every ~37th sample -- no complex math involved).
As far as emulating the APU goes... it was hard for me to wrap my head around it as well and I'm sure someone else will explain it better than I can. Blargg's APU reference (http://nesdev.com/apu_ref.txt), particularly the diagrams, was probably what helped me the most. Check that out if you haven't already.
get nemulator
http://nemulator.com
Re: Understanding sound for the first time
@Bowie90333212391 Do you have access to an API that can convert a stream of samples into sound? For instance, are you able to generate a pure tone from the points of a sine wave?
- Jarhmander
- Formerly ~J-@D!~
- Posts: 569
- Joined: Sun Mar 12, 2006 12:36 am
- Location: Rive nord de Montréal
Re: Understanding sound for the first time
James wrote: Don't worry about DSP for now. Once you get it working, you may choose to learn some DSP stuff to implement better resampling, etc., but there's no DSP involved in generating the APU output (well, unless you count downsampling the signal to e.g., 48kHz, but that's as simple as taking every ~37th sample -- no complex math involved).
If you proceed to downsample like this, it will work, but the resulting sound will be a bit harsh because of aliasing. The good news is that DSP knowledge is only required for the downsampling operation, and even then, there are libraries available that do just that.
Do you have trouble playing a software-generated sound, or understanding how the APU makes sound waves from the APU registers?
((λ (x) (x x)) (λ (x) (x x)))
- rainwarrior
- Posts: 8734
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Understanding sound for the first time
The simplest way to downsample is just take several samples and average them. If you generate samples at 4x your samplerate, this is enough to knock off the most unpleasant parts of the aliasing. (It's not perfect, by any means, but it's very easy to implement and has passable quality.)
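As a rough illustration of the averaging described above, here is a minimal Python sketch. It assumes the samples were already generated at an exact integer multiple of the output rate (4x here); the function name `average_blocks` is made up for this example.

```python
def average_blocks(samples, factor=4):
    """Downsample by averaging each run of `factor` consecutive samples.

    Assumes the input was generated at exactly `factor` times the output
    rate. A trailing partial block is dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


# Eight oversampled values become two output samples.
downsampled = average_blocks([0, 4, 0, 4, 8, 8, 8, 8])
```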
To get noise to sound correct, I recommend manually averaging it over the cycles run per sample (i.e. having a "mini" downsampler inside the noise generator). It's the one thing on the NES that really won't sound right without significant oversampling.
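A sketch of what such a "mini" downsampler inside the noise generator might look like. The shift-register details (15-bit register, feedback from bit 0 XOR bit 1, output muted while bit 0 is set) follow my understanding of the NESdev wiki description. The class name, `period_cycles` and the fixed `volume` are illustrative assumptions, and the envelope, length counter and mode flag are left out entirely.

```python
class NoiseChannel:
    """Simplified noise generator that averages its own output per sample."""

    def __init__(self, period_cycles, volume=15):
        self.period = period_cycles   # CPU cycles between shift-register clocks
        self.timer = period_cycles
        self.volume = volume          # fixed volume; no envelope in this sketch
        self.shift = 1                # 15-bit shift register, must be nonzero

    def _clock_shift_register(self):
        # Feedback is bit 0 XOR bit 1; it is shifted in at bit 14.
        feedback = (self.shift ^ (self.shift >> 1)) & 1
        self.shift = (self.shift >> 1) | (feedback << 14)

    def _level(self):
        # The output is silenced while bit 0 of the shift register is set.
        return 0 if self.shift & 1 else self.volume

    def run(self, cycles):
        """Advance by `cycles` CPU cycles (must be > 0) and return the
        average output level over that span."""
        total = 0
        for _ in range(cycles):
            self.timer -= 1
            if self.timer == 0:
                self.timer = self.period
                self._clock_shift_register()
            total += self._level()
        return total / cycles
```

The emulator would call `run()` once per output sample with the number of CPU cycles that elapsed, so fast noise gets averaged instead of being picked at one arbitrary instant.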
-
- Posts: 11
- Joined: Sun Aug 16, 2015 9:02 am
Re: Understanding sound for the first time
Okay, I'll take a look at the resources you guys posted here. I'm using SDL, and I'm just trying to understand the basics, including the terminology. I guess my issue is that I lack the prerequisite knowledge to understand sound (literally zero knowledge). That was pretty clear when I referred to DSP, which I erroneously thought was necessary just to play the sound.
I also found this post viewtopic.php?t=8491 which is a very useful resource.
I'll read through everything and see what I can get from all this. Thanks guys.
Re: Understanding sound for the first time
THE BASICS:
If you've ever used a music player that has an oscilloscope view, you'll be able to get the idea easier. Here's a video of an actual oscilloscope:
https://youtu.be/pdC_aITNFG0
Basically an oscilloscope lets you see the movement of the sound wave. These waves roughly correspond to the movement of the cone inside your speaker. No movement = no sound.
"Wider" waves are lower pitch.
"Taller" waves are louder.
"PCM" is a digital recording of such a sound wave. If you open a wav or other kind of PCM audio file in a wave editor like GoldWave or SoundForge, you'll be able to see the entire wave all at once on a graph. Here's an example:
http://imgur.com/a/uSTb5
Digital audio is just a really long wave/graph like that.
If you think of it in terms of speaker movement, the 'Y' axis is the position of the speaker, and the 'X' axis is time.
A "sample" is merely 1 point on that graph. Since computers can't really record the analog movement of a speaker, they "sample" it -- or take a snapshot of it -- many thousands of times per second. You can think of it the same way as video --- in video you have a "frame" which is a snapshot of what is being output to the monitor at the given time.... whereas with audio you have a "sample" which is a snapshot of the speaker cone position at the given time.
Video needs to have a 'framerate' indicating the time between individual frames. 60 FPS is going to move through individual frames twice as fast as 30 FPS, and thus produce smoother output for the user. Similarly, audio has a 'samplerate' indicating the time between individual samples. 44100 samples per second is going to move through individual samples twice as fast as 22050 samples per second -- and just like with video, higher samplerates typically produce smoother output*. Samplerates are measured in hertz (Hz), which basically is just "times per second". So 44100 Hz == 44100 samples per second.
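To make "sample" and "samplerate" concrete, here is a small Python sketch that computes the points of a pure sine tone and packs them as 16-bit PCM. The function names and the choice of signed 16-bit little-endian output are assumptions for illustration. Handing the bytes to an audio API such as SDL is left out.

```python
import math
import struct

SAMPLE_RATE = 44100  # samples per second (Hz)


def sine_samples(freq_hz, seconds, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Return signed 16-bit sample values for a pure tone.

    Each value is one point on the wave graph (one speaker position),
    taken every 1/sample_rate seconds.
    """
    count = int(seconds * sample_rate)
    peak = int(amplitude * 32767)
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]


def to_pcm_bytes(samples):
    """Pack the samples as little-endian signed 16-bit PCM."""
    return struct.pack("<%dh" % len(samples), *samples)


tone = sine_samples(440, 0.01)   # 10 ms of a 440 Hz tone
pcm = to_pcm_bytes(tone)         # 2 bytes per sample
```

A larger `freq_hz` gives narrower waves (higher pitch); a larger `amplitude` gives taller waves (louder).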
The hardest part with NES emulation -- and the reason for all this talk about "downsampling" the others are throwing at you -- is that the NES will output 1 sample every CPU cycle. This means it outputs a samplerate of 1789772.7272 Hz ... much higher than your computer outputs. Your emulator will probably only want to output 44100 Hz, which means you have to find a way to transform the NES samplerate down to the more reasonable PC samplerate.
The easiest way to do this is a "nearest neighbor" approach. 1789772.7272 / 44100 = ~40.58 ... so you can get away with outputting 1 sample every ~40.58 CPU cycles and simply dropping the rest. The sound quality won't be great, but it'll be recognizable.
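A minimal Python sketch of this sample-dropping approach, assuming you already have a sequence with one APU output value per CPU cycle. The function name is made up for this example. Because ~40.58 is not a whole number, a fractional position is tracked so the kept samples are spaced 40 or 41 cycles apart.

```python
NES_CPU_HZ = 1789772.7272   # NTSC CPU clock, as given above
OUTPUT_HZ = 44100


def decimate_nearest(cycle_samples, in_rate=NES_CPU_HZ, out_rate=OUTPUT_HZ):
    """Keep roughly one input sample per in_rate/out_rate cycles; drop the rest."""
    step = in_rate / out_rate      # ~40.58 cycles per output sample
    out = []
    next_pick = 0.0                # fractional index of the next sample to keep
    for i, s in enumerate(cycle_samples):
        if i >= next_pick:
            out.append(s)
            next_pick += step
    return out


# Small-number example: 10 input samples at "10 Hz" down to "4 Hz".
picked = decimate_nearest(list(range(10)), in_rate=10, out_rate=4)
```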
A better approach would be averaging (sometimes loosely called "linear interpolation") -- creating an average of the output over each ~40.58 cycles and outputting that. This is a bit more CPU intensive, but will sound MUCH better and is totally passable as far as quality goes.
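The averaging idea can be sketched in Python the same way, again assuming one input value per CPU cycle and a made-up function name. Each output sample is the mean of a window of 40 or 41 input samples. This is plain averaging with whole-sample windows, not a properly band-limited resampler.

```python
def downsample_average(cycle_samples, in_rate=1789772.7272, out_rate=44100):
    """Average each ~in_rate/out_rate run of input samples into one output."""
    step = in_rate / out_rate   # ~40.58 input samples per output sample
    out = []
    boundary = step             # input position where the current window ends
    total = 0.0
    count = 0
    for i, s in enumerate(cycle_samples):
        total += s
        count += 1
        if i + 1 >= boundary:
            out.append(total / count)
            total, count = 0.0, 0
            boundary += step
    return out


# Small-number example: windows of 3, 2, 3 and 2 input samples.
averaged = downsample_average([0, 0, 0, 4, 4, 0, 0, 0, 2, 2],
                              in_rate=10, out_rate=4)
```

Any samples left over after the last complete window are discarded here; a real emulator would carry them into the next audio buffer.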
There are other techniques that are more complicated, but get really good quality with lower CPU demand. Blargg wrote a document on "Band limited synthesis" which is the technique I use. It's worth checking out later, but I would not recommend trying it for your very first time.
* There's a limit as to how high of a samplerate/framerate you need, though. Human hearing is not perfect and once you reach a certain samplerate you don't really gain anything by going any higher. 44100 Hz is generally accepted to be that threshold which is why it's the common standard.
- rainwarrior
- Posts: 8734
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Understanding sound for the first time
Disch wrote: * There's a limit as to how high of a samplerate/framerate you need, though. Human hearing is not perfect and once you reach a certain samplerate you don't really gain anything by going any higher. 44100 Hz is generally accepted to be that threshold which is why it's the common standard.
The recommended samplerate these days is 48000 Hz.
This doesn't have much to do with human hearing, it's just that most devices these days will use that by default instead of 44100 Hz (which used to be the most common), and a lot of drivers have crummy resamplers that add unpleasant ringing/distortion to source audio that is delivered to the driver at 44100 Hz.
Re: Understanding sound for the first time
I thought they settled on 44 kHz back in CD days because human hearing can't pick up frequencies above 22 kHz.
But whatever, I'll take your word for it. =)
Re: Understanding sound for the first time
Wikipedia has, of course, much ink spilled about the origin of the 44.1kHz sample rate.
- rainwarrior
- Posts: 8734
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Understanding sound for the first time
I meant that the standard samplerate was changed to 48000 Hz, but not to accommodate human hearing (44100 Hz already encodes enough information to cover human hearing, but 48000 Hz had some implementation advantages).
- TmEE
- Posts: 960
- Joined: Wed Feb 13, 2008 9:10 am
- Location: Norway (50 and 60Hz compatible :P)
- Contact:
Re: Understanding sound for the first time
Since the introduction of AC97, pretty much all sound cards have a 24.576000 MHz clock as the only thing connected to the codec chip, ADC or DAC, which makes them based on 32000 Hz, 48000 Hz and their multiples (32000 * 768, 48000 * 512). 44100 Hz (and its multiples) are based on a 33.868800 MHz or 16.934400 MHz clock (44100 * 768 or 384).
If possible, 48/64/96/128/144/192 kHz should be the preferred choice; it will avoid some most probably nasty resampling on the OS/driver side on most hardware.
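The clock arithmetic above can be checked with a few lines of Python. The helper name `divider` is made up for this example. It only tests whether a sample rate divides a master clock evenly, and says nothing about how a particular codec actually derives its rates.

```python
AC97_CLOCK_HZ = 24_576_000       # 24.576 MHz
CD_CLOCK_HZ = 33_868_800         # 33.8688 MHz
CD_HALF_CLOCK_HZ = 16_934_400    # 16.9344 MHz


def divider(clock_hz, sample_rate_hz):
    """Return the integer ratio clock/sample rate, or None if it is not exact."""
    if clock_hz % sample_rate_hz == 0:
        return clock_hz // sample_rate_hz
    return None


ratios = {
    "48000 from 24.576 MHz": divider(AC97_CLOCK_HZ, 48000),
    "32000 from 24.576 MHz": divider(AC97_CLOCK_HZ, 32000),
    "44100 from 24.576 MHz": divider(AC97_CLOCK_HZ, 44100),
    "44100 from 33.8688 MHz": divider(CD_CLOCK_HZ, 44100),
    "44100 from 16.9344 MHz": divider(CD_HALF_CLOCK_HZ, 44100),
}
```

44100 Hz has no exact integer ratio to the 24.576 MHz clock, which is consistent with the resampling concern raised above.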