It is currently Fri Oct 20, 2017 2:01 am

All times are UTC - 7 hours





Post new topic Reply to topic  [ 19 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: emulator performance?
PostPosted: Wed Nov 10, 2010 12:32 pm 
Joined: Sat Jan 22, 2005 8:51 am
Posts: 427
Location: Chicago, IL
(In the interest of not turning this into a speed vs. accuracy debate, let's just assume that all cycle-accurate emulators that can handle traditionally tough-to-emulate games without hacks are equal.)

I've been working on optimizing my emulator's code recently and have bumped up the performance of the cycle accurate mode to >500fps (Intel i7 920 w/ Nvidia 9800 GTX). Just wondering how this compares to other emulators out there. In other words, have I been successful or do I have a lot of room for improvement?

James


 Post subject:
PostPosted: Wed Nov 10, 2010 12:53 pm 
Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
It depends on how you define success. What requirements do you need to meet to succeed? Put another way, what does this performance allow that the previous less-optimized one didn't? Faster fast-forward? More simultaneous emulators at once?


 Post subject:
PostPosted: Wed Nov 10, 2010 1:01 pm 
Joined: Sat Jan 22, 2005 8:51 am
Posts: 427
Location: Chicago, IL
Quote:
What requirements do you need to meet to succeed?...More simultaneous emulators at once?

My emulator drops to a scanline-accurate emulation mode in the selection menu because cycle-accurate is too slow. Ultimately, I'd like to run the cycle-accurate mode throughout. Still not there today (at 60 fps, at least), but there is some point where software efficiency and hardware speed will allow it. I can help one of those along.

Beyond that, there isn't a particular target in mind. I like the optimization process and am just curious as to how my work compares to others.


 Post subject:
PostPosted: Wed Nov 10, 2010 1:06 pm 
Joined: Thu Jul 13, 2006 3:15 pm
Posts: 177
Sounds extremely good to me. My emulator is nowhere near that.

On my machine, I typically get 40 fps, and my emulator does not yet support sound, has major PPU issues (like SMB title screen) and only about 6 mappers implemented.

For me, I've learned a ton and have enjoyed a lot of the time spent on the emulator, so even if I can never get Battletoads working and playing at full rate, I'm still happy.

Al


 Post subject:
PostPosted: Wed Nov 10, 2010 1:23 pm 
Joined: Fri Nov 19, 2004 7:35 pm
Posts: 3943
I think Nesticle and Famtasia are the fastest Windows-based emulators right now, mainly because nobody ever ported LoopyNES to Windows and brought its accuracy up a few notches.

What do you consider to be "cycle-accurate"? Does that mean that it would simulate explicit reads and writes for each cycle within the instruction, and possibly execute something triggered for each access? Does that mean merely getting page-crossing timing correct?

What do you consider to be "hacks"? Detecting a game and tweaking the timing slightly? Idle loop skipping?

Idle loop skipping is some really good stuff, especially when you don't need to emulate the PPU.

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


 Post subject:
PostPosted: Wed Nov 10, 2010 1:38 pm 
Joined: Sat Jan 22, 2005 8:51 am
Posts: 427
Location: Chicago, IL
Dwedit wrote:
What do you consider to be "cycle-accurate"?

Yeah, I guess that's a little vague. PPU cycle accurate. Enough for mid-scanline effects to work properly.

Dwedit wrote:
What do you consider to be "hacks"?

For example (from this thread: http://nesdev.com/bbs/viewtopic.php?t=6736), detecting Battletoads and forcing sprite 0 hits at a specific time to work around timing issues.


 Post subject:
PostPosted: Wed Nov 10, 2010 2:30 pm 
Joined: Sat Jan 22, 2005 8:51 am
Posts: 427
Location: Chicago, IL
albailey wrote:
For me, I've learned a ton and have enjoyed a lot of the time spent on the emulator, so even if I can never get Battletoads working and playing at full rate, I'm still happy.

That's the attitude that's kept me going all these years. It was a long time before I could get Battletoads working, but all I learned along the way was the real reward. Keep it up!

_________________
get nemulator
http://nemulator.com


 Post subject:
PostPosted: Wed Nov 10, 2010 7:13 pm 
NESICIDE developer

Joined: Mon Oct 13, 2008 7:55 pm
Posts: 1026
Location: Minneapolis, MN
James wrote:
albailey wrote:
For me, I've learned a ton and have enjoyed a lot of the time spent on the emulator, so even if I can never get Battletoads working and playing at full rate, I'm still happy.

That's the attitude that's kept me going all these years. It was a long time before I could get Battletoads working, but all I learned along the way was the real reward. Keep it up!


I couldn't agree with this more. My emulator is getting more and more accurate as the days go by--141 of 163 test ROMs passing! At least for me it runs sufficiently fast, but others who use Win7 64-bit are getting sub-par performance.

The quest for accuracy and performance is most of the fun!


PostPosted: Mon Nov 15, 2010 8:09 am 
Joined: Mon Oct 06, 2008 6:03 pm
Posts: 40
James wrote:
I've been working on optimizing my emulator's code recently and have bumped up the performance of the cycle accurate mode to >500fps (Intel i7 920 w/ Nvidia 9800 GTX). Just wondering how this compares to other emulators out there. In other words, have I been successful or do I have a lot of room for improvement?
James

My emulator is not exactly cycle-accurate (though it can handle most mid-frame PPU effects) and it runs at > 1000 FPS on an Intel i5-760 processor, for what it's worth. (This is without actually copying the PPU/APU output to the screen/sound card; i.e. just calling my "calc frame" function inside a timed loop.)
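For anyone who wants to compare numbers measured the same way, here is a rough sketch of that kind of timed loop. It is written in Python purely for illustration; `measure_fps` and `DummyCore` are made-up names standing in for a real emulator core, and no video or audio output is involved.

```python
import time


def measure_fps(calc_frame, frames=200):
    """Call calc_frame() `frames` times and return frames per second."""
    start = time.perf_counter()
    for _ in range(frames):
        calc_frame()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return frames / elapsed if elapsed > 0 else float("inf")


class DummyCore:
    """Hypothetical stand-in for an emulator core (assumption, not real code)."""

    def __init__(self):
        self.frames = 0

    def calc_frame(self):
        # A real core would run the CPU/PPU/APU for one frame here.
        self.frames += 1


core = DummyCore()
fps = measure_fps(core.calc_frame, frames=200)
```

Numbers obtained this way leave out blitting and audio, so they are not directly comparable to figures measured with rendering enabled.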

What areas of your code have you been optimizing? Find any good tricks? I've been working on speeding up my emulation core over the past month and have made about a 20% improvement. I still have some more areas I want to look into, but when I'm done I was planning on posting a list of things that happened to boost performance for my particular emulator implementation. For example, I profiled a lot of games and found that LDA (zero page) was by far the most frequent instruction (accounting for about 16% of all instructions) and added a special case for that particular opcode which sped things up. Not exactly ground-breaking stuff, but it was helpful to me so maybe it will be helpful for someone else. :)
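To make the LDA (zero page) special case concrete, here is a minimal sketch of what such a fast path could look like. It is in Python for illustration only and is not the actual code from my emulator; the `CPU` class, its fields and the tiny handler table are all assumptions. The idea is to test for opcode $A5 before the general dispatch and read the operand straight from RAM, which is safe on the NES because the zero page is always internal RAM.

```python
LDA_ZP = 0xA5  # 6502 opcode for LDA zero page


class CPU:
    """Hypothetical, heavily reduced 6502 core (illustration only)."""

    def __init__(self, memory):
        self.mem = memory          # 64 KiB bytearray
        self.pc = 0
        self.a = 0
        self.zero = False
        self.negative = False
        self.cycles = 0
        # General dispatch path: opcode -> handler (only two shown here).
        self.handlers = {0xA9: self._lda_imm, 0xEA: self._nop}

    def _set_zn(self, value):
        self.zero = value == 0
        self.negative = (value & 0x80) != 0

    def _lda_imm(self):
        self.a = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        self._set_zn(self.a)
        self.cycles += 2

    def _nop(self):
        self.cycles += 2

    def step(self):
        opcode = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        if opcode == LDA_ZP:
            # Fast path: skip the handler lookup and read RAM directly.
            addr = self.mem[self.pc]
            self.pc = (self.pc + 1) & 0xFFFF
            self.a = self.mem[addr]
            self._set_zn(self.a)
            self.cycles += 3
            return
        self.handlers[opcode]()
```

How much this helps depends entirely on how expensive the general dispatch and memory-read paths are in a given implementation, so profile before and after.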


Quote:
At least for me it runs sufficiently fast, but others who use Win7 64-bit are getting sub-par performance.

I just bought a new computer with Windows 7 64-bit and was disappointed to see that my emulator ran significantly worse than on a lesser machine running XP. Very frustrating. I think it is because I only have GDI and DirectDraw-based renderers, and neither appears to be hardware accelerated in Windows 7. Hopefully a Direct2D renderer will perform better.


 Post subject:
PostPosted: Mon Nov 15, 2010 7:36 pm 
Joined: Sat Jan 22, 2005 8:51 am
Posts: 427
Location: Chicago, IL
Quote:
My emulator is not exactly cycle-accurate

What method are you using? Looks like it might be scanline-based and, if so, I'm interested in hearing about how you handle mid-frame effects. My scanline-based renderer is a lot faster than the cycle-accurate one, but it can't handle, for example, Marble Madness.

Quote:
What areas of your code have you been optimizing? Find any good tricks?

Nothing especially fancy. I've been doing stuff like using look up tables where it makes sense (pattern bit interleaving, attribute table stuff, etc.), and, in general, just running under a profiler and focusing on hot spots. The biggest improvements have come from rethinking stuff that's specific to my implementation.
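As an illustration of the pattern bit interleaving table, here is a small sketch. It is in Python for readability and is not lifted from my emulator; `SPREAD`, `interleave_row` and `row_pixels` are made-up names. A 256-entry table spreads the bits of one bitplane byte onto the even bit positions, so merging the two planes of a tile row into eight 2-bit pixels takes two lookups, a shift and an OR.

```python
def _spread_bits(byte):
    """Move bit k of `byte` to bit 2k of the result."""
    out = 0
    for bit in range(8):
        if byte & (1 << bit):
            out |= 1 << (2 * bit)
    return out


# Built once at startup: 256 entries, one per possible bitplane byte.
SPREAD = [_spread_bits(b) for b in range(256)]


def interleave_row(low_plane, high_plane):
    """Pack one tile row into 16 bits, two bits per pixel."""
    return SPREAD[low_plane] | (SPREAD[high_plane] << 1)


def row_pixels(low_plane, high_plane):
    """Return the 8 pixel values (0-3) of a tile row, leftmost first."""
    packed = interleave_row(low_plane, high_plane)
    return [(packed >> (2 * (7 - x))) & 3 for x in range(8)]
```

For example, `row_pixels(0b10000001, 0b11000000)` gives `[3, 2, 0, 0, 0, 0, 0, 1]`.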

Quote:
DirectDraw-based renderers, and neither appears to be hardware accelerated in Windows 7.

This was why I switched from DirectDraw to Direct3D -- not just for performance reasons, but also because blits on Vista+ are no longer bilinearly filtered (yeah, I could roll my own, but...). With Direct3D, I'm simply rendering a texture-mapped quad and it's quite fast. I haven't tried Direct2D.


 Post subject:
PostPosted: Tue Nov 16, 2010 7:48 am 
Joined: Mon Oct 06, 2008 6:03 pm
Posts: 40
James wrote:
What method are you using? Looks like it might be scanline-based and, if so, I'm interested in hearing about how you handle mid-frame effects. My scanline-based renderer is a lot faster than the cycle-accurate one, but it can't handle, for example, Marble Madness.

My approach is almost tile-based; I try to do the cycle-accurate "catch-up" design but I only sync between CPU instructions; I do not sync between all of the individual stages of a single instruction. I also do some cheating in the PPU emulation to try to make the code run a little faster. It's good enough to run games like Marble Madness and Rad Racer but it's definitely a step below the most accurate emulators out there now. A re-design is probably about 6 years overdue. :D
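A stripped-down sketch of the catch-up idea, with syncing only at instruction boundaries, might look like the following. It is Python for illustration only and does not reproduce my real code; `LazyPPU`, `Machine` and the `touches_ppu` flag are assumptions. The PPU is left behind while the CPU runs and is only brought up to date when an instruction is about to touch a PPU register, or at the end of the frame.

```python
PPU_DOTS_PER_CPU_CYCLE = 3  # NTSC ratio


class LazyPPU:
    """Hypothetical PPU that only advances when asked to catch up."""

    def __init__(self):
        self.dot_clock = 0   # PPU dots emulated so far
        self.catch_ups = 0   # how many times a catch-up actually ran

    def run_until(self, target_dot):
        if target_dot > self.dot_clock:
            # A real PPU would render every dot in this range here.
            self.dot_clock = target_dot
            self.catch_ups += 1


class Machine:
    def __init__(self):
        self.cpu_cycles = 0
        self.ppu = LazyPPU()

    def execute(self, cycles, touches_ppu):
        """Run one (hypothetical) instruction costing `cycles` CPU cycles."""
        if touches_ppu:
            # Sync at the instruction boundary, not at the exact cycle of
            # the register access inside the instruction.
            self.ppu.run_until(self.cpu_cycles * PPU_DOTS_PER_CPU_CYCLE)
        self.cpu_cycles += cycles

    def end_frame(self):
        self.ppu.run_until(self.cpu_cycles * PPU_DOTS_PER_CPU_CYCLE)
```

The saving comes from the PPU running in long batches. The cost is that a register write can land a few dots away from where it would on hardware, which is the accuracy gap mentioned above.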


Quote:
This was why I switched from DirectDraw to Direct3D -- not just for performance reasons, but also because blits on Vista+ are no longer bilinearly filtered (yeah, I could roll my own, but...). With Direct3D, I'm simply rendering a texture-mapped quad and it's quite fast. I haven't tried Direct2D.

It's encouraging to hear that you are getting good performance with Direct3D. As I understand it, Direct2D is just a wrapper on top of Direct3D, so it should perform similarly well.


 Post subject:
PostPosted: Tue Nov 16, 2010 9:59 am 
Joined: Sat Jan 22, 2005 8:51 am
Posts: 427
Location: Chicago, IL
Hmm... it would be easy enough to convert my scanline engine into a tile-based one. Might give that a try for the boost in compatibility.

Quote:
I try to do the cycle-accurate "catch-up" design but I only sync between CPU instructions; I do not sync between all of the individual stages of a single instruction.

FWIW, I'm using PPU cycles as my timebase and am calling my CPU code every 3 ticks (NTSC only). It was easy to implement and, while I could probably get the biggest boost in performance by converting this to a catch-up design, it's not as slow as I thought it would be (heck, I think it's actually pretty fast).
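In outline, that loop looks something like the sketch below. It is Python for illustration rather than my actual code; `TickCounter` and `run_ppu_ticks` are made-up names, and which of the three PPU ticks the CPU call falls on is an arbitrary choice here.

```python
class TickCounter:
    """Hypothetical stand-in for a PPU or CPU core; it only counts ticks."""

    def __init__(self):
        self.ticks = 0

    def tick(self):
        self.ticks += 1


def run_ppu_ticks(ppu, cpu, total_ppu_ticks):
    """Drive the machine from the PPU clock, NTSC 3:1 ratio assumed."""
    for t in range(total_ppu_ticks):
        ppu.tick()
        if t % 3 == 2:
            # Every third PPU tick, advance the CPU by one cycle.
            cpu.tick()


ppu = TickCounter()
cpu = TickCounter()
# One NTSC frame is 341 dots x 262 scanlines (ignoring the odd-frame skip).
run_ppu_ticks(ppu, cpu, 341 * 262)
```

Every component is stepped in lockstep, so mid-scanline effects need no special handling. The price is a function call per tick, which a catch-up design avoids.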

Quote:
It's encouraging to hear that you are getting good performance with Direct3D. As I understand it, Direct2D is just a wrapper on top of Direct3D, so it should perform similarly well.

Yeah, I'm sure it will work well. My benchmarks are done with rendering enabled and I'm getting >1700 fps with the scanline engine. It's definitely not a bottleneck!


 Post subject:
PostPosted: Tue Nov 16, 2010 4:07 pm 
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
James wrote:
FWIW, I'm using PPU cycles as my timebase and am calling my CPU code every 3 ticks (NTSC only).


- Odd. I thought you should run 1 CPU cycle, then call the PPU to run 3 dots (pixels). You do the reverse... :) Interesting, anyway.

- My emu gets around 120 FPS on my Core2Duo 2GHz. On a Pentium 4, it doesn't run at full speed if I use the blitter to double the image size & stretch it.


 Post subject:
PostPosted: Wed Nov 17, 2010 7:01 pm 
NESICIDE developer

Joined: Mon Oct 13, 2008 7:55 pm
Posts: 1026
Location: Minneapolis, MN
Zepper wrote:
James wrote:
FWIW, I'm using PPU cycles as my timebase and am calling my CPU code every 3 ticks (NTSC only).


- Odd. I thought you should run 1 CPU cycle, then call the PPU to run 3 dots (pixels). You do the reverse... :) Interesting, anyway.

- My emu gets around 120 FPS on my Core2Duo 2GHz. On a Pentium 4, it doesn't run at full speed if I use the blitter to double the image size & stretch it.


I also do it by PPU cycles, running one CPU and APU cycle every third PPU cycle...seems the most logical way. :shock:


 Post subject:
PostPosted: Wed Nov 17, 2010 7:12 pm 
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
NESICIDE wrote:
I also do it by PPU cycles, running one CPU and APU cycle every third PPU cycle...seems the most logical way. :shock:


- You mean after the third PPU cycle...?

- Why "most logical way"? Indeed, I use PPU cycles to control the emulation timing. The only cycle counter used here is for PPU: from 0 to 341, plus the scanline counter, obviously.

I smell an offtopic discussion

