All times are UTC - 7 hours





 Post subject: cpu emu implementation
PostPosted: Wed Aug 10, 2005 10:55 pm 

Joined: Tue Dec 21, 2004 8:35 pm
Posts: 600
Location: Argentina
My emulator was horrible: the timing was very inaccurate and it ran slowly on a 933 MHz Pentium III, even though I wrote it with DirectX.
So I decided to rewrite my CPU core, and I have a question:

Let's say that we have a big switch:

Code:
...
switch (opcode)
{
    case X:
        OpcodeEmuFunctionX();
        break;

    ...
    case N:
        OpcodeEmuFunctionN();
        break;
}
...


Wouldn't it be faster to have an array of function pointers and call the appropriate function?
Assume that the addressing mode is handled inside "OpcodeEmuFunction()".
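For reference, the array-of-function-pointers approach asked about here could look roughly like the sketch below. Everything in it is made up for illustration: the `Cpu` struct, the flat 64 KiB memory, the handler names, and the two-opcode table. Flag updates are left out, and whether this beats a switch is something to measure rather than assume.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state, just enough for the example. */
typedef struct {
    uint8_t       a;             /* accumulator */
    uint16_t      pc;            /* program counter */
    uint8_t       mem[0x10000];  /* flat 64 KiB address space (assumption) */
    unsigned long cycles;        /* cycles elapsed so far */
} Cpu;

typedef void (*OpHandler)(Cpu *cpu);

/* LDA #imm (0xA9): load the byte after the opcode into A (flags omitted). */
static void op_lda_imm(Cpu *cpu)
{
    cpu->a = cpu->mem[cpu->pc++];
    cpu->cycles += 2;
}

/* NOP (0xEA): only consumes cycles. */
static void op_nop(Cpu *cpu)
{
    cpu->cycles += 2;
}

/* Placeholder for every opcode this sketch does not fill in. */
static void op_unimplemented(Cpu *cpu)
{
    (void)cpu;
}

static OpHandler op_table[256];

/* Fill the table once at startup so that every slot is callable. */
static void init_op_table(void)
{
    for (int i = 0; i < 256; i++)
        op_table[i] = op_unimplemented;
    op_table[0xA9] = op_lda_imm;
    op_table[0xEA] = op_nop;
}

/* Fetch one opcode and dispatch through the table: one indirect call per instruction. */
static void cpu_step(Cpu *cpu)
{
    uint8_t opcode = cpu->mem[cpu->pc++];
    op_table[opcode](cpu);
}
```

Call `init_op_table()` once, then call `cpu_step()` in the main loop. Each handler does its own operand fetch, which matches the assumption that the addressing mode is handled inside the opcode function.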

_________________
ANes


 Post subject:
PostPosted: Thu Aug 11, 2005 12:56 am 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Now would be a great time to check out profiling tools. With them you can write different test programs and find out which is fastest (and by how much). For simple cases like this you can look up the answer, but for a real program there is no substitute for a profiler (even an expert programmer can give you no more than speculation based on experience with past programs).
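A real profiler is the better tool, but for quick comparisons of two small test programs a crude timing harness can be enough. The sketch below uses only the standard `clock()` function; `time_run` and `dummy_core` are hypothetical names, and `dummy_core` is just a stand-in for "run the CPU core for N instructions".

```c
#include <time.h>

/*
 * Run work(iterations) once and return the CPU time it took, in seconds.
 * The work function's return value is stored through `result` so the
 * compiler cannot discard the loop as dead code.
 */
static double time_run(unsigned long (*work)(unsigned long),
                       unsigned long iterations,
                       unsigned long *result)
{
    clock_t start = clock();
    unsigned long r = work(iterations);
    clock_t end = clock();

    if (result)
        *result = r;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical stand-in for one variant of the code being compared. */
static unsigned long dummy_core(unsigned long iterations)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; i++)
        acc += (i ^ (acc >> 3)) & 0xFF;
    return acc;
}
```

To compare two dispatch styles, write one work function per variant, time each with the same iteration count, and repeat the runs a few times, since a single measurement is noisy.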


 Post subject:
PostPosted: Thu Aug 11, 2005 10:47 am 

Joined: Thu Oct 21, 2004 4:02 pm
Posts: 210
Location: San Diego
I've had performance issues with my emulator as well, and have redesigned it multiple times. I've had incarnations of the CPU core of both the "giant switch/case" type and the "array of function pointers" type. What I found was that the switch was faster, provided the compiler was smart enough to implement it as a jump table (Visual Studio did, as did gcc with -O3). This is because the function-pointer solution always makes a real function call (the calls cannot be inlined in this solution), and each call uses the stack, which wastes time.
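As a rough illustration of the switch variant, the sketch below keeps each opcode in a small `static inline` function so the compiler is free to inline the bodies into the switch. The `CpuState` struct and handler names are hypothetical, and only three opcodes are filled in. With all 256 cases present, compilers commonly emit a jump table for a dense switch like this, but that is not guaranteed and is worth checking in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state for the example. */
typedef struct {
    uint8_t        a, x;    /* accumulator and X index register */
    uint16_t       pc;      /* program counter */
    const uint8_t *mem;     /* program memory (read-only in this sketch) */
    unsigned long  cycles;  /* cycles elapsed so far */
} CpuState;

/* Handlers are static inline so they can be folded into the switch below. */
static inline void op_lda_imm(CpuState *c) { c->a = c->mem[c->pc++]; c->cycles += 2; }
static inline void op_inx(CpuState *c)     { c->x++;                 c->cycles += 2; }
static inline void op_nop(CpuState *c)     { c->cycles += 2; }

/* Executes one instruction. Returns 1 if the opcode was handled, 0 otherwise. */
static int cpu_step_switch(CpuState *c)
{
    uint8_t opcode = c->mem[c->pc++];

    switch (opcode) {
    case 0xA9: op_lda_imm(c); break;  /* LDA #imm (flags omitted) */
    case 0xE8: op_inx(c);     break;  /* INX (flags omitted) */
    case 0xEA: op_nop(c);     break;  /* NOP */
    default:   return 0;              /* not covered by this sketch */
    }
    return 1;
}
```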

However, the most important thing I learned was that the CPU was NOT the source of my sluggishness. In fact, the CPU core was running at over 900% of real-time speed on a 350 MHz Pentium II in either case. The PPU is actually much harder to implement efficiently, as it does lots of operations that don't map well to x86 instructions. However, my PPU was "fast enough" as well. In my situation the culprit was actually the code that drew the pixels on the screen! This was true for both the DirectX and SDL graphics libraries. I'm pretty sure I am doing the pixel drawing as efficiently as possible (drawing to the back buffer, locking before accesses, etc.), but it is still insanely slow.
So I second the above advice: use a profiler and figure out where the bottleneck is. It is likely not where you think.


 Post subject:
PostPosted: Thu Aug 11, 2005 12:13 pm 

Joined: Sun Jul 10, 2005 10:40 am
Posts: 8
Google has a good profiler at code.google.com under perftools.

As for switch vs. function pointers: the best thing to do is put the most frequently used opcodes in the switch and the less frequently used opcodes in a function pointer array. The reason for this is to prevent cache busting. The CPU is probably not the bottleneck, though, unless your addressing calls are not inlined. The PPU rendering is my biggest bottleneck, and I greatly improved it by creating a meta bitmap and rendering the whole thing at one time: 1 function call vs. 240, for a 35% performance boost. All in all, everything SHOULD take less than 500,000,000 cycles/sec.
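One way to read the hybrid suggestion is sketched below: a short switch for the opcodes assumed to be hot, with everything else falling through to a function pointer table. Which opcodes are actually hot is an assumption here and should come from profiling real games. The `Core` struct and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal CPU state for the example. */
typedef struct {
    uint8_t        a, x;  /* accumulator and X index register */
    uint16_t       pc;    /* program counter */
    const uint8_t *mem;   /* program memory (read-only in this sketch) */
} Core;

typedef void (*RareOp)(Core *c);

/* TAX (0xAA), used here as a stand-in for a rarely executed opcode. */
static void op_tax(Core *c)    { c->x = c->a; }

/* Placeholder for opcodes this sketch does not implement. */
static void op_ignore(Core *c) { (void)c; }

static RareOp rare_ops[256];

/* Must be called once before step(), so every table slot is callable. */
static void init_rare_ops(void)
{
    for (size_t i = 0; i < 256; i++)
        rare_ops[i] = op_ignore;
    rare_ops[0xAA] = op_tax;
}

static void step(Core *c)
{
    uint8_t opcode = c->mem[c->pc++];

    switch (opcode) {
    /* Opcodes assumed to be hot stay inline in the switch. */
    case 0xA9: c->a = c->mem[c->pc++]; break;  /* LDA #imm (flags omitted) */
    case 0xEA:                         break;  /* NOP */

    /* Everything else goes through the function pointer table. */
    default:
        assert(rare_ops[opcode] != NULL);
        rare_ops[opcode](c);
        break;
    }
}
```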

Also, be careful with the -O flags. -Os gives me the best performance, while -O3 eats too much memory and slows things down a lot. But this is all with gcc, not MSVC.


 Post subject:
PostPosted: Thu Aug 11, 2005 5:40 pm 

Joined: Tue Dec 21, 2004 8:35 pm
Posts: 600
Location: Argentina
Thanks to you both for the advice.

_________________
ANes


 Post subject:
PostPosted: Thu Aug 25, 2005 5:39 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
I'm posting my reply here since it seems appropriate to this thread. There has been recent blurring of thread topics (that NES Snake thread was the worst heh) and I'm trying to keep things relevant to the topic.

Fx3 wrote:
(originally posted to the thread "Reading opcodes directly without read function")

Opcodes that access RAM (or the stack) have a pointer, rather than a (*hook)(). After fetching the opcode, it jumps to the proper addressing mode (goto _address_mode_XX), and then through another jump table to execute the opcode. I was able to reduce the code size by more than 75% because several addressing modes (after the proper opcode/address data fetching) do the same work as others.


I see, you're decoding the addressing mode of the instruction, then the opcode. Due to the 6502's very regular opcode layout, this is fairly easy to do. This way the address mode is handled in one place and without any function calls.
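To make the "regular opcode layout" point concrete: 6502 opcodes are often described as the bit pattern aaabbbcc, and for the cc == 01 group (ORA, AND, EOR, ADC, STA, LDA, CMP, SBC) the bbb field selects the addressing mode. The sketch below decodes only that group. It is not Fx3's code, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Addressing modes of the cc == 01 group, in bbb order (0..7). */
typedef enum {
    AM_IND_X,  /* (zp,X) */
    AM_ZP,     /* zp     */
    AM_IMM,    /* #imm   */
    AM_ABS,    /* abs    */
    AM_IND_Y,  /* (zp),Y */
    AM_ZP_X,   /* zp,X   */
    AM_ABS_Y,  /* abs,Y  */
    AM_ABS_X   /* abs,X  */
} AddrMode;

/* Operations of the cc == 01 group, in aaa order (0..7). */
typedef enum {
    OP_ORA, OP_AND, OP_EOR, OP_ADC, OP_STA, OP_LDA, OP_CMP, OP_SBC
} AluOp;

/*
 * Split an opcode of the form aaabbbcc into operation and addressing mode.
 * Returns 1 and fills *op and *mode if cc == 01, otherwise returns 0.
 * Note: the pattern also matches 0x89 (which would be "STA #imm"), which is
 * not an official instruction, so a real decoder needs to special-case it.
 */
static int decode_group1(uint8_t opcode, AluOp *op, AddrMode *mode)
{
    if ((opcode & 0x03) != 0x01)
        return 0;
    *op   = (AluOp)(opcode >> 5);             /* aaa */
    *mode = (AddrMode)((opcode >> 2) & 0x07); /* bbb */
    return 1;
}
```

For example, 0xA9 is 101 010 01 in binary, which decodes to LDA with immediate addressing. The other groups (cc == 00 and cc == 10) are less regular, which is where the restricted or altered mode sets come from.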

How do you efficiently determine whether the instruction has an addressing mode, and which set of modes it uses? (Some instructions have a restricted set, or slight changes, e.g. LDX zp,Y.)


 Post subject:
PostPosted: Sun Aug 28, 2005 11:26 am 
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
Example: LDA ( A = value ). So, value = readvalue(offset), where offset is BYTE (immediate byte, zero-page) or WORD (absolute). ^_^;;
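One possible reading of this, as a sketch: the addressing mode decides whether the operand is a byte or a word and yields an offset, and LDA itself is then a single read. Only `readvalue` comes from the post; the `Machine` struct, the `Mode` enum, `fetch_offset`, and the flat memory array are assumptions for illustration. A real NES core would route `readvalue` through its memory map instead.

```c
#include <stdint.h>

typedef enum { MODE_IMMEDIATE, MODE_ZERO_PAGE, MODE_ABSOLUTE } Mode;

/* Hypothetical minimal machine state with a flat 64 KiB memory. */
typedef struct {
    uint8_t  a;             /* accumulator */
    uint16_t pc;            /* program counter, pointing at the operand */
    uint8_t  mem[0x10000];
} Machine;

/* Plain array read in this sketch. */
static uint8_t readvalue(const Machine *m, uint16_t offset)
{
    return m->mem[offset];
}

/* Consume the operand bytes and return the offset the value is read from. */
static uint16_t fetch_offset(Machine *m, Mode mode)
{
    uint16_t lo, hi;

    switch (mode) {
    case MODE_IMMEDIATE:   /* the operand byte itself is the value */
        return m->pc++;
    case MODE_ZERO_PAGE:   /* BYTE operand: an address in page zero */
        return m->mem[m->pc++];
    case MODE_ABSOLUTE:    /* WORD operand, stored low byte first */
    default:
        lo = m->mem[m->pc++];
        hi = m->mem[m->pc++];
        return (uint16_t)(lo | (hi << 8));
    }
}

/* LDA: A = value (flag updates omitted). */
static void lda(Machine *m, Mode mode)
{
    m->a = readvalue(m, fetch_offset(m, mode));
}
```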

_________________
Zepper
RockNES developer

