Multiple switch for CPU emulation

Discuss emulation of the Nintendo Entertainment System and Famicom.

Moderator: Moderators

tepples
Posts: 22281
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Fri Dec 04, 2009 12:15 pm

essial wrote:I also help my friend with developing an x86 operating system (for fun of course), and I've sort of gotten used to the compiler doing bad things when optimizations were turned on
Perhaps you just forgot to express the correct semantics of your code to the compiler. For example, consider carefully which variables need to be marked "volatile" or "const" or both. Turning optimizations off worked because an unoptimized build keeps every variable in memory and reloads it on each access, which behaves much as if all variables were volatile.
But secondly, it's a matter of style. I like my style and you like yours
The style of doing as much as possible in C makes your code more portable to other platforms with a different instruction set.
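A minimal sketch of the volatile/const point above, assuming a hypothetical status register that is simulated here by an ordinary variable so the code runs anywhere. The names fake_status_reg, STATUS, simulate_device_ready and wait_until_ready are made up for illustration and do not come from any real OS or emulator.

```cpp
#include <cstdint>

// Stand-in for a hardware register. On real hardware this would be a fixed
// address supplied by the platform (assumption for illustration only).
static volatile std::uint8_t fake_status_reg = 0;

// "const volatile": this code never writes through the pointer, but the
// value may still change behind the compiler's back, so every read must
// actually be performed.
static const volatile std::uint8_t* const STATUS = &fake_status_reg;

// Simulates the device setting its "ready" bit (bit 7 is a hypothetical choice).
void simulate_device_ready() { fake_status_reg = 0x80; }

// Polls the register up to max_polls times. Without volatile, an optimizer
// is allowed to hoist the load out of the loop and keep testing a stale value.
bool wait_until_ready(int max_polls) {
    for (int i = 0; i < max_polls; ++i) {
        if (*STATUS & 0x80) return true;
    }
    return false;
}
```

With the qualifier in place the loop should behave the same with optimizations on or off, which is the property that matters for OS-level code.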

essial
Posts: 72
Joined: Thu Dec 03, 2009 8:20 am

Post by essial » Fri Dec 04, 2009 12:36 pm

tepples wrote:Perhaps you just forgot to express the correct semantics of your code to the compiler. For example, consider carefully which variables need to be marked "volatile" or "const" or both. Turning optimizations off worked because an unoptimized build keeps every variable in memory and reloads it on each access, which behaves much as if all variables were volatile.
That's completely right. I never went to college for programming, nor did I learn from anyone -- just manuals, specifications, and occasionally (VERY occasionally) tutorials. I didn't know that the volatile keyword even existed until I started googling the problem. Even then I still didn't fully understand what was going on until later on (age, experience, whatever).

Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near » Fri Dec 04, 2009 3:01 pm

Code: Select all

if (!opFunc[opcode](this, opcode, opFuncParam[opcode]))

Code: Select all

parent->regs.PC = addr;
C++ has member function pointers, which will make your code a lot more terse (and readable) and also avoid crushing the global namespace.

Code: Select all

struct CNES2A03 {
  void (CNES2A03::*opFunc[256])();

  void processOpcode() {
    //...
    (this->*opFunc[opcode])();
  }

  void op_nop() {
    regs.PC++;
  }

  CNES2A03() {
    opFunc[0xea] = &CNES2A03::op_nop;
  }
};
Also, void func(void) is redundant; you only need void func().

Lastly, the debate over whether switch() or a jump table is faster is really pedantic. I have used a generator that switched between the two and I noticed virtually no speed difference. The switch statement compiles faster and produces a smaller binary (as it doesn't need as many function prologues/epilogues), while the jump table produces code that is much easier on the eyes and that can be templatized, e.g.:

Code: Select all

opFunc[0xa9] = &CNES2A03::read_const<&CNES2A03::flag_lda>;
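To make the templatized table entry concrete, here is a self-contained sketch under stated assumptions: the struct is called Cpu rather than CNES2A03, the register layout and the step() function are invented for the example, and the opcode fetch itself advances the program counter. Only the idea of read_const taking a flag-setting member function as a template argument comes from the post.

```cpp
#include <cstdint>

struct Cpu {
    std::uint8_t a = 0;                  // accumulator
    std::uint16_t pc = 0;                // program counter
    bool zero = false, negative = false; // Z and N flags
    std::uint8_t mem[0x10000] = {};      // flat 64 KiB memory (simplification)

    using Op = void (Cpu::*)();
    Op opFunc[256];

    // Flag helper for LDA: store into A, then update N and Z.
    void flag_lda(std::uint8_t v) {
        a = v;
        zero = (v == 0);
        negative = (v & 0x80) != 0;
    }

    // One template covers every "read an immediate operand" opcode; the
    // member function passed as Flag decides what happens to the value.
    template <void (Cpu::*Flag)(std::uint8_t)>
    void read_const() {
        std::uint8_t v = mem[pc++];
        (this->*Flag)(v);
    }

    void op_nop() {}
    void op_unimplemented() {}

    Cpu() {
        for (auto& f : opFunc) f = &Cpu::op_unimplemented;
        opFunc[0xea] = &Cpu::op_nop;
        opFunc[0xa9] = &Cpu::read_const<&Cpu::flag_lda>;
    }

    // Fetch one opcode and dispatch through the member function pointer table.
    void step() {
        std::uint8_t opcode = mem[pc++];
        (this->*opFunc[opcode])();
    }
};
```

Adding ORA immediate in this scheme would need only a flag_ora helper and one more table line, which is the terseness being argued for.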

tepples
Posts: 22281
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Fri Dec 04, 2009 3:39 pm

byuu wrote:Also, void func(void) is redundant; you only need void func().
Within a C++ class, this is true. But within C code, or within code intended to be compiled under both C and C++ compilers, you need to use void func(void), because C compilers interpret void func() as a function taking an unspecified number of arguments (much like void func(...)), so calls are not checked against a prototype.
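The C++ half of this can be checked at compile time. The sketch below uses two hypothetical function names, with_void and without_void, and only shows that a C++ compiler sees the two spellings as the same type; it cannot demonstrate the C behaviour, which needs a C compiler.

```cpp
#include <type_traits>

void with_void(void);   // spelling that is also a full prototype in C
void without_void();    // idiomatic C++ spelling

// In C++ both declarations have the identical function type "void()".
static_assert(std::is_same<decltype(with_void), decltype(without_void)>::value,
              "C++ treats f() and f(void) the same");

void with_void(void) {}
void without_void() {}
```

For a header shared between C and C++, writing (void) costs nothing on the C++ side and keeps the C side prototyped.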

Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near » Fri Dec 04, 2009 4:04 pm

tepples wrote:Within a C++ class, this is true.
Yes, I'm aware of C89's restrictions (which almost every modern compiler avoids via C99 backporting). I didn't want to elaborate and make the post too long. Do you think he's worried about C89 compatibility when he's writing functions with class qualifiers on them? :P

tepples
Posts: 22281
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Fri Dec 04, 2009 4:35 pm

I guess my perspective is just distorted by trying to make C++-style interfaces (with RAII and the like) to a C library.

Petruza
Posts: 311
Joined: Mon Dec 22, 2008 10:45 pm
Location: Argentina

Re: Multiple switch for CPU emulation

Post by Petruza » Mon Jan 25, 2010 7:28 pm

Nessie wrote:If you're thinking of performance, I would guess that if the total size of two switches is smaller than one switch, they could possibly run faster. The few cycles wasted on table lookup are probably insignificant as long as your code fits in the cache.
Is fitting in the cache the only reason a smaller switch would run faster?
Is there another reason?
Apart from caching, which I don't know much about, I understand that if the switch gets translated into a jump table, it doesn't matter whether the code is longer or shorter; the jump speed shouldn't be affected. Or am I wrong?
I prefer speed and source readability rather than smaller executable code, for my emulator.

And about the cache, I guess it's very platform-dependent, right?

Petruza
Posts: 311
Joined: Mon Dec 22, 2008 10:45 pm
Location: Argentina

Post by Petruza » Fri Jan 29, 2010 8:12 am

The SNES is a 16 bit console made by Sony.
Last edited by Petruza on Fri Jan 29, 2010 9:44 am, edited 1 time in total.

Zepper
Formerly Fx3
Posts: 3223
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper » Fri Jan 29, 2010 8:41 am

...
Last edited by Zepper on Fri Jan 29, 2010 9:13 am, edited 1 time in total.

Petruza
Posts: 311
Joined: Mon Dec 22, 2008 10:45 pm
Location: Argentina

Post by Petruza » Fri Jan 29, 2010 9:06 am

LOL! I knew that if I posted a wrong statement as obvious as this, someone would reply :D
Ok, now that I have your attention, don't you feel like reading my previous post and answering it? :D

tepples
Posts: 22281
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Fri Jan 29, 2010 9:15 am

Petruza wrote:The SNES is a 16 bit console made by Sony.
There, fixed that for you. The SPC700 CPU and DSP are Sony parts.

Now to your original question, which pertains to emulation of both the NES and Super NES: Modern x86 compilers and CPUs are so complex that the only real answer is to code it both ways, time both approaches, and optionally look at the generated assembly language code to understand the reasons behind the measured times. But yes, level 1 cache performance is one of the big reasons for a timestamp and catch-up approach compared to strict alternation of 3 PPU cycles and 1 CPU cycle like Nintendulator does.
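A rough sketch of "code it both ways and time both", assuming a toy four-operation instruction set rather than a real 6502 core. Every name here (State, exec_switch, exec_table, make_program, time_run) is invented for the example, and the timings it produces say nothing about a full emulator; the point is only the shape of the comparison.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy CPU state: a single accumulator keeps the work observable.
struct State { std::uint32_t acc = 0; };

// Variant 1: one switch over a 2-bit toy opcode.
void exec_switch(State& s, std::uint8_t op) {
    switch (op & 3) {
        case 0: s.acc += 1; break;
        case 1: s.acc ^= 0x55; break;
        case 2: s.acc <<= 1; break;
        default: s.acc -= 3; break;
    }
}

// Variant 2: the same operations behind a table of function pointers.
void op_inc(State& s) { s.acc += 1; }
void op_xor(State& s) { s.acc ^= 0x55; }
void op_shl(State& s) { s.acc <<= 1; }
void op_sub(State& s) { s.acc -= 3; }
using Handler = void (*)(State&);
static const Handler kTable[4] = {op_inc, op_xor, op_shl, op_sub};
void exec_table(State& s, std::uint8_t op) { kTable[op & 3](s); }

// Deterministic pseudo-random opcode stream (simple LCG), so both variants
// execute exactly the same work.
std::vector<std::uint8_t> make_program(std::size_t n) {
    std::vector<std::uint8_t> p(n);
    std::uint32_t x = 12345;
    for (auto& b : p) {
        x = x * 1664525u + 1013904223u;
        b = static_cast<std::uint8_t>(x >> 24);
    }
    return p;
}

// Runs the program through exec, stores the final accumulator in result,
// and returns the elapsed wall-clock time in milliseconds.
template <typename Exec>
double time_run(const std::vector<std::uint8_t>& prog, Exec exec,
                std::uint32_t& result) {
    State s;
    auto t0 = std::chrono::steady_clock::now();
    for (std::uint8_t op : prog) exec(s, op);
    auto t1 = std::chrono::steady_clock::now();
    result = s.acc;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Comparing the two result values first confirms that both variants do the same work; only then are the two times worth looking at, ideally over several runs and alongside the generated assembly.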

Petruza
Posts: 311
Joined: Mon Dec 22, 2008 10:45 pm
Location: Argentina

Post by Petruza » Fri Jan 29, 2010 9:47 am

Haha thanks for the post fix, I fixed it in the original post.

About the switch cache question, ok thanks. Anyway, as I have absolutely zero experience in emulator programming, I'll go with the most straightforward approach that comes to mind, and once I get a working emulator, I'll test and see if it badly needs rewrites (which it surely will).

Mednafen
Posts: 60
Joined: Wed Sep 13, 2006 12:45 pm

Post by Mednafen » Fri Jan 29, 2010 11:42 am

If you only care whether your code compiles with ICC and GCC, you can use goto tables:

static void *const op_goto_table[256] =
{
&&Op00, &&Op01, [etc]
};

goto *op_goto_table[opcode];

Op00: blahblahblah; goto Finish;

Op01: vlahvlahvlah; goto Finish;

Finish: ;


Long live goto! ;p
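A compilable sketch of the goto-table idea, assuming a made-up three-opcode bytecode (0 = halt, 1 = increment, 2 = double) instead of the 256 6502 opcodes. It relies on the GNU "labels as values" extension, so a plain switch is kept as a fallback for other compilers; the function name run is hypothetical.

```cpp
#include <cstdint>

// Interprets the toy bytecode and returns the accumulator at the halt opcode.
// Any opcode above 2 is treated as halt.
int run(const std::uint8_t* code) {
    int acc = 0;
    for (;;) {
        std::uint8_t op = *code++;
        if (op > 2) op = 0;
#if defined(__GNUC__)
        // GNU extension: &&label yields the label's address as a void*.
        static void* const table[3] = { &&op_halt, &&op_inc, &&op_dbl };
        goto *table[op];
    op_inc:  acc += 1; goto finish;
    op_dbl:  acc *= 2; goto finish;
    op_halt: return acc;
    finish: ;
#else
        // Portable fallback with the same behaviour.
        switch (op) {
            case 1: acc += 1; break;
            case 2: acc *= 2; break;
            default: return acc;
        }
#endif
    }
}
```

The fallback branch is the price of using the extension: standard C++ has no equivalent, so a portable core has to carry both paths or stay with the switch.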

Petruza
Posts: 311
Joined: Mon Dec 22, 2008 10:45 pm
Location: Argentina

Post by Petruza » Fri Jan 29, 2010 11:52 am

Yes, I like goto in this kind of low level programming too.
Unfortunately, I plan to write compiler- and OS-independent, plain standard C++ code, maybe at some cost in speed, but fully portable.
The main idea will be to make a 100% portable and reusable emulator core without the need of touching a single line of it. I.e. include, instantiate and run.

Tom
Posts: 68
Joined: Wed Apr 06, 2005 5:36 am
Location: Massachusetts

Re: Multiple switch for CPU emulation

Post by Tom » Fri Jan 29, 2010 1:45 pm

Disch wrote:So I thought about it... and what if you made two separate switches... one for the addressing mode lookup, and another for the opcode execution?
A very long time ago I tried having two switches, as you described, in my emulator. It was slower than one giant switch, which didn't use any fall-through or goto tricks, just repeated code. But I only ever ran it on a 300 MHz PowerBook G3.
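For readers who have not seen the two-switch layout, here is a small sketch of the idea as summarized in the quote, not a reconstruction of anyone's actual emulator. It handles only four real 6502 opcodes (LDA and ORA, immediate and zero page); the Cpu struct, its flat memory array and the step() function are assumptions made for the example.

```cpp
#include <cstdint>

struct Cpu {
    std::uint8_t a = 0;             // accumulator
    std::uint16_t pc = 0;           // program counter
    std::uint8_t mem[0x10000] = {}; // flat 64 KiB memory (simplification)

    // Executes one instruction; returns false for an opcode not handled here.
    bool step() {
        std::uint8_t opcode = mem[pc++];
        std::uint8_t value = 0;

        // Switch 1: the addressing mode fetches the operand value.
        switch (opcode) {
            case 0xA9: case 0x09:           // immediate
                value = mem[pc++];
                break;
            case 0xA5: case 0x05:           // zero page
                value = mem[mem[pc++]];
                break;
            default:
                return false;
        }

        // Switch 2: the operation itself, shared across addressing modes.
        switch (opcode) {
            case 0xA9: case 0xA5: a = value; break;   // LDA
            case 0x09: case 0x05: a |= value; break;  // ORA
        }
        return true;
    }
};
```

Each operation body appears once instead of once per addressing mode, at the cost of a second dispatch per instruction, which is the trade-off the measurement above is about.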
