It is currently Thu Jul 19, 2018 2:33 pm

 All times are UTC - 7 hours

 Page 1 of 2 [ 16 posts ] Go to page 1, 2  Next
 Print view Previous topic | Next topic
Author Message
 Post subject: APU mixer function efficiencyPosted: Tue Oct 11, 2016 2:06 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 774
Location: New York, NY
The wiki contains an approximation that uses 2 lookup tables. However, since pulse, triangle and noise are 4-bit channels and DMC is a 7-bit channel, doesn't that mean a single lookup table with 2^23 = 8,388,608 entries could provide exact values? Assuming that 16-bit values are required, the lookup table only needs to be 16 MB. Are there any cache considerations for randomly reading from an array of that size?

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 2:13 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7316
Location: Seattle
... Yes?
Enormous ones?

The largest in-die L3 cache I've heard of is on the order of 8 MiB, and you're talking about using more than twice that just for fixing APU non-linearity?

... That said, where did 2²³ come from? 4+4+7 = 15...

Honestly, even using 32 KiW isn't worthwhile. The lookup table approach is only useful on architectures where floating point or division on integers is really expensive, i.e. "before the pentium".

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 3:04 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6416
Location: Canada
The lookup tables are only one way to implement the given non-linearity formulae, which themselves are only an approximation of the actual output (which still needs more research, I think).

I don't think the wiki really needs to prescribe how to implement the non-linearity; the formula is the important part. The lookup table is just an implementation detail. Whether or not it's worth doing depends entirely on the context of implementation.

...and yes, those tables are much, much smaller than what you're saying.

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 3:15 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 774
Location: New York, NY
lidnariq wrote:
Honestly, even using 32 KiW isn't worthwhile. The lookup table approach is only useful on architectures where floating point or division on integers is really expensive, i.e. "before the pentium".

Let's put that claim to the test. Below is a Java application that compares 32-bit floating-point operations to lookup tables:

Code:
public final class BenchmarkApuMixers {

private static final float[] pulseTable = new float[31];
private static final float[] tndTable = new float[203];

static {
for(int i = pulseTable.length - 1; i >= 0; i--) {
pulseTable[i] = 95.52f / (8128f / i + 100f);
}
for(int i = tndTable.length - 1; i >= 0; i--) {
tndTable[i] = 163.67f / (24329f / i + 100f);
}
}

public static float noTableTest(final float pulse1, final float pulse2,
final float triangle, final float noise, final float dmc) {
return 95.88f / (8128f / (pulse1 + pulse2) + 100f) + 159.79f
/ ((1f / (triangle / 8227f + noise / 12241f + dmc / 22638f)) + 100f);
}

public static float lookupTableTest(final int pulse1, final int pulse2,
final int triangle, final int noise, final int dmc) {
return pulseTable[pulse1 + pulse2] + tndTable[3 * triangle + (noise << 1)
+ dmc];
}

public static void main(final String... args) throws Throwable {
float result = 0;
int x = 0;
int y = 0;
long startTime = System.nanoTime();
for(int i = 200_000_000; i >= 0; i--) {
if (++x == 16) {
x = 0;
}
if (++y == 128) {
y = 0;
}
result += noTableTest(x, x, x, x, y);
}
System.out.format("%f nanos/iteration%n", (System.nanoTime() - startTime)
/ 200_000_000.0);
result = 0;
x = 0;
y = 0;
startTime = System.nanoTime();
for(int i = 200_000_000; i >= 0; i--) {
if (++x == 16) {
x = 0;
}
if (++y == 128) {
y = 0;
}
result += lookupTableTest(x, x, x, x, y);
}
System.out.format("%f nanos/iteration%n", (System.nanoTime() - startTime)
/ 200_000_000.0);
System.out.println(result);
}
}

The results:

floating-point: 30.239289 nanos/iteration

lookup tables: 2.060711 nanos/iteration

That can't be right. The iterations are way too brief. This is only a 3 GHz machine.

Any suggestions for better ways to benchmark this?

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 6:24 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 774
Location: New York, NY
I tried both ways in my emulator and I can't visually see a difference in the task manager. This is a i5-2400 @ 3.1 GHz 64-bit Win7 box from mid-2012. My emulator is not particularly efficient. One of the 4 cores that appear in the task manager bounces between 30% and 40% load. Overall, it uses up about 15% of the available CPU. Maybe that's fine considering today's desktops.

To my ears, there is no audible difference. But, my hearing is not that great

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 7:49 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7316
Location: Seattle
A 203-entry LUT is short enough to fit into a few lines of L1 cache, so that might be the difference you're seeing.

A tiny LUT is definitely a very different story than a multi-megabyte one.

(FWIW, adapting your core loop to C and compiling it with gcc -O3 -ffast-math produces 15ns/loop with floats, 1.9ns/loop with LUTs, and 14ns/loop (as well as wrong results) with integers)

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 7:56 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 774
Location: New York, NY
lidnariq wrote:
(FWIW, adapting your core loop to C and compiling it with gcc -O3 -ffast-math produces 15ns/loop with floats, 1.9ns/loop with LUTs, and 14ns/loop (as well as wrong results) with integers)

On what box?

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 8:05 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7316
Location: Seattle
Haswell i3-4330. (nominally 3.5GHz peak, and 64B/line of cache)

Top

 Post subject: Re: APU mixer function efficiencyPosted: Tue Oct 11, 2016 8:32 pm

Joined: Fri Nov 19, 2004 7:35 pm
Posts: 4056
Huge lookup tables = tons of cache misses = slow

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!

Top

 Post subject: Re: APU mixer function efficiencyPosted: Wed Oct 12, 2016 6:50 am

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 774
Location: New York, NY
The reason that I am reviewing the mixer code is that I want to provide sound options that enable the user to adjust the mix via sliders similar to many other emulators. Using floats, the code is straightforward and clean. Using integer math, things become messy. For ints only, a pre-scaled LUT could be used for each channel. Or, I could multiply by a constant and right-shift. But, it's not clear if either is really faster than a multiplication by a 32-bit float. Or, in my case, if it is even worth it; after mixing, my emulator applies filtering and decimation, both of which use a ton of 64-bit float multiplications, additions and table look-ups. In the fancy mixer formula, it's really the divisions that make me cringe. We're all taught to stay away from them for speed reasons, but everyone owns a personal super computer these days.

Top

 Post subject: Re: APU mixer function efficiencyPosted: Wed Oct 12, 2016 7:16 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20270
Location: NE Indiana, USA (NTSC)
zeroone wrote:
In the fancy mixer formula, it's really the divisions that make me cringe. We're all taught to stay away from them for speed reasons, but everyone owns a personal super computer these days.

Perhaps it's a matter of whether you want to target PCs or smartphones. On the one hand, people tend to want to play simpler games on the go rather than at home, and mobile devices with ARM CPUs have tended to perform far worse than desktop PCs at floating-point division in the past. On the other hand, the mobile devices that people are carrying nowadays have a touch screen and tilting the whole device as their only input methods, and these don't map quite so well to the twitchy actions that NES games tend to require on the Control Pad and A and B Buttons. Clip-on gamepads for phones exist, but I've never seen one in use in person.

Top

 Post subject: Re: APU mixer function efficiencyPosted: Wed Oct 12, 2016 10:30 am

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7316
Location: Seattle
I'm not certain exactly how the reciprocal-of-reciprocals model came about, but given how MOSFETs work it really "ought" to involve a square root instead. (wikipedia)

Top

 Post subject: Re: APU mixer function efficiencyPosted: Wed Oct 12, 2016 2:28 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 774
Location: New York, NY
The LUT approximation deviates from the float version by more than 10% when DMC is not being used. For instance, plug-in the max values for pulse1, pulse2, triangle, and noise, and set dmc to 0. Maybe there is an audible difference.

Top

 Post subject: Re: APU mixer function efficiencyPosted: Wed Oct 12, 2016 2:48 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6416
Location: Canada
zeroone wrote:
The LUT approximation deviates from the float version by more than 10% when DMC is not being used. For instance, plug-in the max values for pulse1, pulse2, triangle, and noise, and set dmc to 0. Maybe there is an audible difference.

Yeah, it's unclear why the lookup table implementation on the wiki is using different numbers than blargg's formula, maybe mostly because it's using 3 magic values instead of summing triangle + noise + dmc. (Is the actual mix an analog combination of 3 DACs or a digital sum output to 1 DAC?)

I hinted at this earlier, but you should know that this formula is an old approximation that blargg made up years ago to fit the data available at that time. It's not really that accurate to begin with, but it's better than the linear approximation that was used before it. Between Visual2A03, and the wealth of knowledge and available expertise this community has accumulated since then I'm sure someone could do a lot better if they were to take a crack at it now. (I intend to try once I'm finished my game project, if nobody else gets to it before then.)

As for whether you can hear a difference, probably you won't. The analog mix between the two APU outputs is highly variable from system to system anyway; every release of NSFPlay I get complaints on both sides of "this channel is too loud, it's quieter on my NES" and the other way around. Every NES is a little different in that way, and it's a much stronger difference than the kind of thing you'll notice between these approximations of the same formula.

Top

 Post subject: Re: APU mixer function efficiencyPosted: Wed Oct 12, 2016 3:05 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 774
Location: New York, NY
According to the wiki, "When the values for one of the groups are all zero, the result for that group should be treated as zero rather than undefined due to the division by 0 that otherwise results."

This is not the case. In the limit, as the pulse channels approach 0, pulse_out smoothly approaches 0. Similarly, tnd_out smoothly approaches a constant. Consequentially, the checks are not necessary if the formulas are reduced to the expression below:

Code:
public static float mix(final int pulse1, final int pulse2,
final int triangle, final int noise, final int dmc) {

return (0.9588f * (pulse1 + pulse2)) / (pulse1 + pulse2 + 81.28f)
- 361.733f / (2.75167f * triangle + 1.84936f * noise + dmc + 226.38f)
+ 1.5979f;
}

Also, with fewer divisions, it should run a lot faster.

Top

 Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending
 Page 1 of 2 [ 16 posts ] Go to page 1, 2  Next

 All times are UTC - 7 hours

#### Who is online

Users browsing this forum: Google Adsense [Bot] and 4 guests

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum

Search for:
 Jump to:  Select a forum ------------------ NES / Famicom    NESdev    NESemdev    NES Graphics    NES Music    Homebrew Projects       2018 NESdev Competition       2017 NESdev Competition       2016 NESdev Competition       2014 NESdev Competition       2011 NESdev Competition    Newbie Help Center    NES Hardware and Flash Equipment       Reproduction    NESdev International       FCdev       NESdev China       NESdev Middle East Other    General Stuff    Membler Industries    Other Retro Dev       SNESdev       GBDev    Test Forum Site Issues    phpBB Issues    Web Issues    nesdevWiki
Powered by phpBB® Forum Software © phpBB Group