All times are UTC - 7 hours





Post new topic Reply to topic  [ 37 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
PostPosted: Tue Feb 21, 2017 11:50 am 
Offline

Joined: Sun Mar 27, 2011 10:49 am
Posts: 192
A computed goto is (slightly) faster than a switch because each opcode handler ends by jumping straight to the next handler, instead of jumping back to the top of the loop and dispatching again from there. One jump per instruction instead of two.

Of course, Java doesn't have any facility for doing a computed goto (AFAIK). And unconditional branches are all but free on a modern CPU anyway. On a CPU that executes two billion cycles a second, a million extra unconditional jumps per second really, really doesn't matter.
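For reference, here is roughly what the switch-based loop looks like in Java. This is only a sketch: the class name `TinyCpu` and the three-opcode subset are made up for illustration (the encodings LDA #imm = $A9, INX = $E8 and BRK = $00 are the real 6502 ones), and flags are not modelled at all. Every handler falls back to the top of the `while` loop before the next dispatch, which is the extra jump a computed goto would avoid.

```java
/** Hypothetical three-opcode interpreter showing switch dispatch. */
final class TinyCpu {
    int a, x, pc;
    long instructions;
    final int[] mem = new int[0x10000];

    /** Runs until BRK ($00); returns the number of instructions executed. */
    long run() {
        boolean running = true;
        while (running) {                 // every handler jumps back here...
            int opcode = mem[pc] & 0xFF;
            pc = (pc + 1) & 0xFFFF;
            instructions++;
            switch (opcode) {             // ...and is dispatched again from here
                case 0xA9:                // LDA #imm (flags not modelled)
                    a = mem[pc] & 0xFF;
                    pc = (pc + 1) & 0xFFFF;
                    break;
                case 0xE8:                // INX (flags not modelled)
                    x = (x + 1) & 0xFF;
                    break;
                case 0x00:                // BRK, used here only as "stop"
                    running = false;
                    break;
                default:
                    throw new IllegalStateException(
                            String.format("unimplemented opcode $%02X", opcode));
            }
        }
        return instructions;
    }
}
```

The JIT is free to compile the `switch` into a jump table, so the practical gap to a hand-written computed goto may be smaller than the source suggests.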


PostPosted: Tue Feb 21, 2017 11:57 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
I'd like fceux to be more efficient, if only from a power saving standpoint. Currently it takes about 35% of one core, with X taking an additional 20% (apparently it draws inefficiently?).


PostPosted: Tue Feb 21, 2017 12:11 pm 
Online

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6303
Location: Seattle
calima wrote:
I'd like fceux to be more efficient, if only from a power saving standpoint. Currently it takes about 35% of one core, with X taking an additional 20% (apparently it draws inefficiently?).
Er. Even on my Athlon/1333 it was tremendously lighter weight than that...


PostPosted: Tue Feb 21, 2017 12:20 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19122
Location: NE Indiana, USA (NTSC)
Overall, Atom and ARM cores tend to be slower than an Athlon. An Atom's work per clock is close to that of a Pentium 4.


PostPosted: Tue Feb 21, 2017 12:32 pm 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
This is on a Phenom II, but the core is likely not at full speed.


PostPosted: Tue Feb 21, 2017 1:47 pm 
Offline

Joined: Fri Apr 29, 2011 9:44 pm
Posts: 267
Honestly, if 6502 emulation performance is a problem, then your plain serial implementation probably just isn't as good as it could be.

The PPU is usually far more of a bottleneck on slow CPUs.


PostPosted: Tue Feb 21, 2017 1:58 pm 
Offline

Joined: Sun Nov 09, 2008 9:18 pm
Posts: 984
Location: Pennsylvania, USA
It's actually working really well, like I said. I'm kind of casting a wide net for ways to improve performance further. The actual issue I'm experiencing, I think, has to do with the power-saving features on Android devices: it mysteriously throttles down to about 1/4 speed even though my thread is doing the exact same work every time.

I don't even have a PPU :lol: Not really, anyway. It just renders the tiles directly rather than scanline by scanline, i.e. it's not a real NES emulator. See the GGVm thread if curious.


PostPosted: Tue Feb 21, 2017 3:38 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19122
Location: NE Indiana, USA (NTSC)
In other words, you have an HLE PPU (high-level emulated Picture Processing Unit). The best comparison, I guess, would be PocketNES for Game Boy Advance, which also has an HLE PPU because it maps the NES's tiled backgrounds and sprites onto those of the GBA. Yet it somehow runs at full speed on a 16.8 MHz ARM7TDMI.


PostPosted: Tue Feb 21, 2017 7:12 pm 
Offline

Joined: Sun Nov 09, 2008 9:18 pm
Posts: 984
Location: Pennsylvania, USA
I have another question relevant to what I'm working on: I know that the CPU clock speed from the wiki is 1.789773 MHz, which amounts to about 29829.55 cycles per 1/60th of a second, correct? How many actual instructions per frame does that amount to, on average? I'm going to guess the average number of cycles per instruction is about 3 (I see some as low as 2 and some as high as 7).

Right now, one of the metrics my CPU spits out is "instructions per second." It has no concept of ticks or cycles; it is a purely high-level CPU simulator. So when I see a figure such as 2,570,423 instructions per second, that is ridiculously faster than the NES needs to be (at 3 cycles each, the real CPU would manage roughly 600,000), if I'm correct in the above paragraph at all.

One thing I learned today is that Android throttles CPUs down when they get hot. I do seem to observe performance degrading over time, and if I let the device sit and start over, the speed is restored.

Perhaps all I need to do is manually throttle the CPU thread with Thread.sleep, at least on mobile devices, in an attempt to stress the CPU less? After all, there's absolutely no reason to be "overclocking" to the extreme degree that it is right now.
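To put rough numbers on that, here is a small Java sketch of the arithmetic. The class name `NesTiming` is made up for illustration, and the 3 cycles per instruction is only the guess from above, not a measured average.

```java
/** Back-of-the-envelope NES CPU timing, using the figures quoted in the post. */
final class NesTiming {
    static final double CPU_HZ = 1_789_773.0;      // NTSC CPU clock from the wiki
    static final double FRAMES_PER_SECOND = 60.0;  // rounded, as in the post

    static double cyclesPerFrame() {
        return CPU_HZ / FRAMES_PER_SECOND;
    }

    static double instructionsPerFrame(double avgCyclesPerInstruction) {
        return cyclesPerFrame() / avgCyclesPerInstruction;
    }

    static double instructionsPerSecond(double avgCyclesPerInstruction) {
        return CPU_HZ / avgCyclesPerInstruction;
    }

    public static void main(String[] args) {
        double avg = 3.0;  // assumption: guessed average cycles per instruction
        double measured = 2_570_423.0;  // instructions per second reported above
        System.out.printf("cycles/frame:       %.2f%n", cyclesPerFrame());
        System.out.printf("instructions/frame: %.2f%n", instructionsPerFrame(avg));
        System.out.printf("instructions/sec:   %.0f%n", instructionsPerSecond(avg));
        System.out.printf("speed vs. real NES: %.2fx%n",
                measured / instructionsPerSecond(avg));
    }
}
```

Under that guess the real CPU gets through a little under 10,000 instructions per frame, so 2,570,423 instructions per second is a bit over four times what is needed.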


PostPosted: Tue Feb 21, 2017 7:38 pm 
Offline

Joined: Sun Nov 09, 2008 9:18 pm
Posts: 984
Location: Pennsylvania, USA
...hmm...from what I'm reading, sleep will still use the CPU. Sounds like I want wait. The problem is I can't know when I should do that. Since GGVm already has game-specific knowledge, perhaps I can tell it where all the NMI wait spin loops are and use wait/notify to give the thread some rest between frames.


PostPosted: Tue Feb 21, 2017 7:43 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19122
Location: NE Indiana, USA (NTSC)
Dots per frame = 341 * 262 - 0.5 = 89341.5
Cycles per frame = 89341.5 / 3 = 29780.5
where each "cycle" is one CPU memory read or write, including the dummy reads that certain instructions perform.

So how hard would it be to make your CPU spit out the metric "memory reads and writes per second"?

Yes, once you run out of memory accesses, blocking until the next host vblank would be a good idea.
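A minimal Java sketch of that idea, assuming every CPU memory access goes through a single bus object. `CountingBus` and its method names are made up for illustration; a real core would route I/O registers elsewhere instead of using a flat 64 KiB array.

```java
/** Hypothetical bus that counts accesses and tracks a per-frame budget. */
final class CountingBus {
    // (341 dots * 262 lines - 0.5 dot on average) / 3 dots per CPU cycle
    static final double CYCLES_PER_FRAME = (341 * 262 - 0.5) / 3.0;

    private final byte[] ram = new byte[0x10000];
    private long accesses;                         // total reads + writes so far
    private double budgetEnd = CYCLES_PER_FRAME;   // access count ending this frame

    int read(int addr) {
        accesses++;
        return ram[addr & 0xFFFF] & 0xFF;
    }

    void write(int addr, int value) {
        accesses++;
        ram[addr & 0xFFFF] = (byte) value;
    }

    long accesses() {
        return accesses;
    }

    /** True once this frame's worth of memory accesses has been used up. */
    boolean frameBudgetSpent() {
        return accesses >= budgetEnd;
    }

    /** Call at host vblank; the half cycle carries over instead of being lost. */
    void startNextFrame() {
        budgetEnd += CYCLES_PER_FRAME;
    }
}
```

The CPU loop would check `frameBudgetSpent()` after each instruction and block until the host vblank calls `startNextFrame()`. Sampling `accesses()` once a second gives the "memory reads and writes per second" figure directly.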

Another good idea is automatic speed hacking, as implemented in PocketNES: if the system reads a location in a tight loop, stop the CPU until the next interrupt. A typical loop looks like this:
Code:
  lda nmis     ; remember the current NMI count
  :
    cmp nmis   ; has the NMI handler changed it yet?
    beq :-     ; no: keep spinning


"At least one new post has been made to this topic. You may wish to review your post in light of this."

GradualGames wrote:
Since GGVm already has game-specific knowledge perhaps I can tell it where all nmi wait spin loops are

Can you tell it to automatically recognize patterns like that?
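One way such recognition might look in Java, as a sketch only: it matches exactly the `cmp addr` / `beq` back-to-the-`cmp` shape shown above and nothing else. `IdleLoopDetector` is a made-up name, and the check assumes the compared location is plain RAM that only an interrupt handler changes; a loop polling an I/O register with read side effects would need different handling.

```java
/** Hypothetical detector for the "cmp addr / beq back" NMI wait loop. */
final class IdleLoopDetector {
    private static final int CMP_ZP = 0xC5;   // cmp zp   (2 bytes)
    private static final int CMP_ABS = 0xCD;  // cmp abs  (3 bytes)
    private static final int BEQ = 0xF0;      // beq rel  (2 bytes)

    /** True if pc points at a cmp followed by a beq that branches back to it. */
    static boolean isNmiWaitLoop(int[] mem, int pc) {
        int opcode = mem[pc & 0xFFFF] & 0xFF;
        int cmpLength;
        if (opcode == CMP_ZP) {
            cmpLength = 2;
        } else if (opcode == CMP_ABS) {
            cmpLength = 3;
        } else {
            return false;
        }
        int branchAt = pc + cmpLength;
        if ((mem[branchAt & 0xFFFF] & 0xFF) != BEQ) {
            return false;
        }
        // The branch offset is a signed byte relative to the address after the beq.
        int offset = (byte) mem[(branchAt + 1) & 0xFFFF];
        int target = branchAt + 2 + offset;
        return target == pc;
    }
}
```

When the check succeeds and the branch is about to be taken, the core can skip straight to the next interrupt instead of spinning through the loop.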


PostPosted: Tue Feb 21, 2017 8:34 pm 
Offline

Joined: Sun Nov 09, 2008 9:18 pm
Posts: 984
Location: Pennsylvania, USA
I think I have a proof of concept working for a hard-coded example (pattern recognition can wait, especially if this doesn't end up solving the hot-CPU problem on Android devices). Now I just need to figure out how to measure instructions per second correctly, taking into account the time the thread spends sleeping, so I can see whether this really gets me an improvement or not.
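One possible way to do that measurement, sketched in Java: keep two rates, one over wall-clock time and one over only the time the thread was actually running. `ThroughputMeter` and its methods are invented names. Timestamps are passed in so the class is easy to test; in real use they would come from `System.nanoTime()`.

```java
/** Hypothetical meter that separates running time from blocked time. */
final class ThroughputMeter {
    private final long startNanos;
    private long instructions;
    private long blockedNanos;
    private long blockStartNanos;
    private boolean blocked;

    ThroughputMeter(long nowNanos) {
        startNanos = nowNanos;
    }

    void countInstruction() {
        instructions++;
    }

    /** Call just before the thread waits or sleeps. */
    void blockStarted(long nowNanos) {
        blocked = true;
        blockStartNanos = nowNanos;
    }

    /** Call right after the thread wakes up again. */
    void blockEnded(long nowNanos) {
        if (blocked) {
            blockedNanos += nowNanos - blockStartNanos;
            blocked = false;
        }
    }

    /** Instructions per wall-clock second: what the emulated game actually gets. */
    double wallRate(long nowNanos) {
        return instructions * 1e9 / (nowNanos - startNanos);
    }

    /** Instructions per second of running time: how fast the core is when awake. */
    double busyRate(long nowNanos) {
        long busyNanos = (nowNanos - startNanos) - blockedNanos;
        return instructions * 1e9 / busyNanos;
    }
}
```

If blocking works as intended, `wallRate` should settle near what the game needs while `busyRate` stays at the old unthrottled figure; the ratio between the two is the fraction of time the thread spends awake.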


PostPosted: Wed Feb 22, 2017 5:16 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
Here are the top oprofile results from fceux, if anyone's interested. Indeed, it's not the 6502 core: sound and drawing take most of the time.

Code:
samples  %        image name               symbol name
12098    31.7391  fceuxg                   NeoFilterSound(int*, int*, unsigned int, int*)
10644    27.9245  fceuxg                   Blit8ToHigh(unsigned char*, unsigned char*, int, int, int, int, int)
2273      5.9632  fceuxg                   RefreshLine(int)
1948      5.1106  libasound_module_rate_speexrate.so resampler_basic_interpolate_single
1869      4.9033  fceuxg                   X6502_RunDebug(int)
979       2.5684  fceuxg                   RDoSQ1()
843       2.2116  fceuxg                   RDoSQ2()
809       2.1224  fceuxg                   FCEUPPU_Loop(int)
749       1.9650  fceuxg                   FlushEmulateSound()
741       1.9440  libc-2.7.so              memset


PostPosted: Wed Feb 22, 2017 8:32 pm 
Online

Joined: Fri Nov 19, 2004 7:35 pm
Posts: 3944
You get the biggest CPU emulation speedup from *idle loop skipping*. But unless you're on a 16MHz ARM or something, you probably won't notice.

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


PostPosted: Wed Mar 01, 2017 9:07 am 
Offline

Joined: Sun Nov 09, 2008 9:18 pm
Posts: 984
Location: Pennsylvania, USA
Welp, as a quick update, it turns out my 6502 core wasn't the bottleneck after all; it was how I was using OpenGL. One game was running at around 23 fps on a three-year-old phone of mine. I changed some things in how I'm using OpenGL and now it's 60 fps. Crazy stuff...

That said... I actually found a simpler way to use wait/notify on my CPU thread. I just count instructions, and when the count reaches 10,000 (a rough estimate based on roughly 30,000 cycles per frame at about 3 cycles per instruction) the thread blocks; I then notify it on every NMI.
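For anyone curious, the scheme can be sketched in Java along these lines. This is not the actual GGVm code: `FrameGate` and its method names are invented for the example, and the budget is whatever estimate gets passed in.

```java
/** Hypothetical gate: the CPU thread blocks after N instructions until an NMI. */
final class FrameGate {
    private final int instructionsPerFrame;
    private int executed;      // touched only by the CPU thread
    private long nmiCount;     // guarded by this
    private long lastSeenNmi;  // guarded by this

    FrameGate(int instructionsPerFrame) {
        this.instructionsPerFrame = instructionsPerFrame;
    }

    /** CPU thread: call after every instruction; blocks once the budget is used. */
    void afterInstruction() throws InterruptedException {
        if (++executed < instructionsPerFrame) {
            return;
        }
        synchronized (this) {
            // Loop rather than a bare wait(): wakeups can be spurious, and an
            // NMI that already arrived must not be missed.
            while (nmiCount == lastSeenNmi) {
                wait();
            }
            lastSeenNmi = nmiCount;
        }
        executed = 0;
    }

    /** Render or timer thread: call once per frame when the NMI is raised. */
    synchronized void signalNmi() {
        nmiCount++;
        notifyAll();
    }
}
```

Counting NMIs instead of calling a bare `notify()` means a signal that arrives while the CPU thread is still busy is remembered, so a slow frame catches up instead of stalling for an extra frame.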

