Emulator MahNES with a different approach

HLorenzi
Posts: 23
Joined: Thu Jun 28, 2012 9:10 pm
Location: São Paulo, Brazil

Post by HLorenzi » Sat Feb 07, 2015 12:05 pm

I've rewritten my emulator from scratch with decoupling in mind. Every part of the emulator is isolated, and the caller has to set up callback/hook functions so the parts can communicate. I've also tried some other approaches to emulation, namely:
  - The CPU is emulated on every clock tick. Instead of executing each instruction instantly and then sleeping for its cycle count, I mimic the real hardware and emulate every cycle of every instruction, so memory reads and writes fall on exactly the clock they're supposed to. I expected this to be slow and cache-unfriendly, but on my system it uses about as much CPU time as any other emulator.

  - The PPU also runs on every clock tick. There's no batch-running that only catches up when state changes; the PPU alternates execution with the CPU at precise clock timings. Again, on my system this uses about as much CPU time as any other emulator.

  - Mappers are isolated behind hook functions as well, and they mimic a real cartridge's pinout: there are the CIRAM Enable and Mirror pins, and a mapper can also receive CPU/PPU reads and writes.
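A minimal C sketch of the hook-based layout described in the list above, assuming NTSC timing (three PPU dots per CPU cycle). All names (`Mapper`, `Ppu`, `run`, the `nrom_*` helpers) are hypothetical, not MahNES's actual code. The PPU resolves nametable addresses only through two pin-like callbacks: CIRAM enable (tied to PPU A13 on an NROM board) and CIRAM A10 (PPU A10 for vertical mirroring, PPU A11 for horizontal). The exact ordering of CPU and PPU work inside one CPU cycle is simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapper hooks modeled on two cartridge pins. */
typedef struct {
    int (*ciram_enable)(uint16_t ppu_addr); /* nonzero: console nametable RAM responds */
    int (*ciram_a10)(uint16_t ppu_addr);    /* selects one of the two 1 KB CIRAM pages */
} Mapper;

/* NROM-style wiring: CIRAM is enabled by PPU A13, and the board routes
   PPU A10 (vertical mirroring) or PPU A11 (horizontal) to CIRAM A10. */
static int nrom_ciram_enable(uint16_t a)   { return (a & 0x2000) != 0; }
static int nrom_a10_vertical(uint16_t a)   { return (a >> 10) & 1; }
static int nrom_a10_horizontal(uint16_t a) { return (a >> 11) & 1; }

typedef struct {
    uint8_t ciram[2048];   /* console's 2 KB nametable RAM */
    const Mapper *mapper;  /* supplied by the caller; the PPU knows nothing else */
    long dots;             /* PPU dots executed so far */
} Ppu;

typedef struct { long cycles; } Cpu;

/* Resolve a PPU address to a CIRAM offset, or -1 if the cartridge drives the bus.
   Palette addresses ($3F00+) are handled inside the PPU and ignored here. */
static int ppu_ciram_offset(const Ppu *ppu, uint16_t addr) {
    assert(addr < 0x4000);  /* the PPU address bus is 14 bits wide */
    if (!ppu->mapper->ciram_enable(addr)) return -1;
    return (ppu->mapper->ciram_a10(addr) << 10) | (addr & 0x3FF);
}

static void cpu_tick(Cpu *cpu) { cpu->cycles++; /* one bus access would happen here */ }
static void ppu_tick(Ppu *ppu) { ppu->dots++;   /* one dot of rendering would happen here */ }

/* NTSC: three PPU dots per CPU cycle, strictly interleaved. */
static void run(Cpu *cpu, Ppu *ppu, long cpu_cycles) {
    for (long i = 0; i < cpu_cycles; i++) {
        cpu_tick(cpu);
        for (int d = 0; d < 3; d++) ppu_tick(ppu);
    }
}
```

Swapping `Mapper` tables changes mirroring without the PPU code knowing which board it is attached to, which is the point of the pin-level hooks.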
While I've only implemented basic features (no undocumented opcodes, immediate interrupt handling, basic PPU emulation, the same old bad APU emulator), it seems pretty solid. For instance, it seems to run tests like scanline.nes more precisely than Nintendulator or FCEUX. Of course, it could be that MY emulator is the wrong one, that the test is broken, or that these emulators just don't care about such precise details. Does anyone happen to know which is the case? :P

Have there been any other emulator projects using precise instruction/PPU timing like this?

Here's a Windows build.
It immediately opens a file prompt to select a ROM; press the O key to select another ROM. Controls: Arrow keys = D-Pad, Space = A Button, X = B Button, Enter = Start Button, Ctrl = Select Button. The only working mappers are 0 and 2 (NROM and UxROM). Some games still freeze for unknown reasons.

Also, here's a peek at the CPU source file. You can see a big switch block that takes the instruction cycle as its parameter (line 186). Then, in each case, another switch takes the instruction's addressing mode as its parameter (e.g., line 213). On an instruction's final cycle, it branches to yet another switch that handles opcode-specific behavior (e.g., line 490 and beyond).
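A much smaller C sketch of that nested-switch structure, covering only LDA #imm ($A9), LDA abs ($AD) and STA abs ($8D). It is hypothetical, not the MahNES source: each call to `cpu_tick` performs exactly one bus access, the outer switch is on the cycle number, the inner switch is on the addressing mode, and `finish` plays the role of the per-opcode switch. Real 6502 details such as overlapping the last cycle with the next opcode fetch, dummy reads and interrupt polling are left out, and the flat 64 KB `ram` is a test stand-in for the real bus.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_IMM, MODE_ABS } Mode;
typedef enum { OP_LDA, OP_STA } Op;

typedef struct {
    uint8_t (*read)(void *ctx, uint16_t addr);           /* bus hooks set by the caller */
    void (*write)(void *ctx, uint16_t addr, uint8_t v);
    void *ctx;
    uint16_t pc, ea;   /* program counter, effective address being assembled */
    uint8_t a, p;      /* accumulator, status (only N and Z are touched here) */
    uint8_t opcode;
    int cycle;         /* index of the next cycle within the current instruction */
    long total_cycles;
} Cpu;

static void decode(uint8_t opcode, Mode *mode, Op *op) {
    *mode = MODE_IMM; *op = OP_LDA;
    switch (opcode) {
    case 0xA9: *mode = MODE_IMM; *op = OP_LDA; break;  /* LDA #imm */
    case 0xAD: *mode = MODE_ABS; *op = OP_LDA; break;  /* LDA abs  */
    case 0x8D: *mode = MODE_ABS; *op = OP_STA; break;  /* STA abs  */
    default: assert(0 && "opcode not covered by this sketch");
    }
}

static void set_nz(Cpu *c, uint8_t v) {
    c->p = (uint8_t)((c->p & ~0x82) | (v & 0x80) | (v == 0 ? 0x02 : 0));
}

/* Final cycle: opcode-specific behavior (the "third switch"). */
static void finish(Cpu *c, Op op, uint16_t addr) {
    switch (op) {
    case OP_LDA: c->a = c->read(c->ctx, addr); set_nz(c, c->a); break;
    case OP_STA: c->write(c->ctx, addr, c->a); break;
    }
    c->cycle = 0;   /* next tick fetches a new opcode */
}

/* One CPU clock = exactly one bus access. */
static void cpu_tick(Cpu *c) {
    Mode mode; Op op;
    c->total_cycles++;
    switch (c->cycle) {                 /* outer switch: cycle number */
    case 0:
        c->opcode = c->read(c->ctx, c->pc++);
        break;
    case 1:
        decode(c->opcode, &mode, &op);
        switch (mode) {                 /* inner switch: addressing mode */
        case MODE_IMM: finish(c, op, c->pc++); return;          /* operand read ends it */
        case MODE_ABS: c->ea = c->read(c->ctx, c->pc++); break; /* low address byte */
        }
        break;
    case 2:                             /* only absolute mode gets here */
        c->ea |= (uint16_t)(c->read(c->ctx, c->pc++) << 8);     /* high address byte */
        break;
    case 3:
        decode(c->opcode, &mode, &op);
        finish(c, op, c->ea);
        return;
    }
    c->cycle++;
}

/* Test stand-in: flat 64 KB RAM instead of the real NES bus. */
static uint8_t ram[0x10000];
static uint8_t ram_read(void *ctx, uint16_t a) { return ((uint8_t *)ctx)[a]; }
static void ram_write(void *ctx, uint16_t a, uint8_t v) { ((uint8_t *)ctx)[a] = v; }
```

Feeding it `A9 42 8D 00 02` from address 0 takes 2 + 4 = 6 calls to `cpu_tick` and leaves $42 at $0200, with the store landing on the sixth call rather than "somewhere inside" a 4-cycle instruction.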

rainwarrior
Posts: 8000
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior » Sat Feb 07, 2015 12:50 pm

CPU emulation typically requires a large amount of code that is accessed in a fairly random pattern. I think this limits the speed increase you can get by processing cycles in "batches". Then again, the typical modern L1 cache is probably big enough. PPU emulation probably has more potential to benefit from batching. Batch size matters a lot, too: if your batch is, say, one scanline of about 25 instructions, that's very different from being able to batch 1000 instructions.
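One common way to batch the PPU is "catch-up": the PPU is left idle and only run forward, in one uninterrupted loop, when the CPU does something that could observe it. A minimal C sketch, assuming NTSC timing (341 dots by 262 scanlines, 3 dots per CPU cycle) and ignoring the skipped dot on odd frames. `LazyPpu`, `ppu_catch_up` and `ppu_observe_scanline` are hypothetical names; a real renderer would also draw whole scanlines inside the batch instead of stepping a counter.

```c
#include <assert.h>

typedef struct {
    long dots_done;    /* dots actually emulated so far */
    int scanline, dot; /* current beam position */
    int batches;       /* number of catch-up runs, just for illustration */
} LazyPpu;

static void ppu_step_dot(LazyPpu *p) {
    p->dots_done++;
    if (++p->dot == 341) {
        p->dot = 0;
        if (++p->scanline == 262) p->scanline = 0;
    }
}

/* Bring the PPU up to the CPU's time (3 dots per CPU cycle) in one batch. */
static void ppu_catch_up(LazyPpu *p, long cpu_cycle) {
    long target = cpu_cycle * 3;
    assert(target >= p->dots_done);   /* emulated time never runs backwards */
    if (target == p->dots_done) return;
    p->batches++;
    while (p->dots_done < target) ppu_step_dot(p);
}

/* Anything that can observe PPU state (a register read, say) catches up first. */
static int ppu_observe_scanline(LazyPpu *p, long cpu_cycle) {
    ppu_catch_up(p, cpu_cycle);
    return p->scanline;
}
```

The accuracy question then moves entirely into deciding when to call `ppu_catch_up`: miss an observation point (a mapper IRQ, a sprite 0 hit) and the batched result diverges from per-cycle emulation.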

There are other ways to approach emulator optimization than just this, though.

Don't presume that inaccuracy has anything to do with batch optimization, though. They are separate issues. FCEUX's PPU inaccuracies have a lot more to do with age and legacy than anything else.

I don't know which emulators take what approach, but what are you using to measure performance? I notice a pretty significant difference in performance between Nintendulator (for which accuracy is the primary goal) and FCEUX.

Dwedit
Posts: 4408
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit » Sat Feb 07, 2015 1:04 pm

PocketNES processes code in very big batches. It's all fine as long as you get the timing of events correct. The batches can be many scanlines long. Sometimes the only event that the CPU has to stop for is the APU frame counter (about 4 times a frame). Events include things like Sprite 0 coming up soon, estimated time of sprite 0 hit, render start, render end, vblank start, MMC3 IRQs, etc.
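A hedged C sketch of that event-driven batching: the CPU runs uninterrupted up to the earliest pending event, the event is handled, and the loop repeats. This is not PocketNES's actual implementation; the event kinds, `Scheduler`, `run_for` and the fixed 4-cycles-per-instruction interpreter stand-in are all hypothetical. The frame-counter period of 7457 CPU cycles is roughly the NTSC 4-step spacing, which gives the "about 4 times a frame" mentioned above.

```c
#include <assert.h>

/* Hypothetical event kinds (a few from the list in the post). */
typedef enum { EV_SPRITE0_HIT, EV_VBLANK_START, EV_FRAME_COUNTER, EV_COUNT } EventKind;

typedef struct {
    long when[EV_COUNT];    /* CPU cycle when each event is due; -1 = not scheduled */
    long period[EV_COUNT];  /* >0: re-arm this many cycles later; 0: one-shot */
    int  fired[EV_COUNT];   /* how often each event was handled */
    int  batches;           /* uninterrupted CPU runs so far */
    long now;               /* current CPU cycle */
} Scheduler;

static int next_event(const Scheduler *s) {
    int best = -1;
    for (int k = 0; k < EV_COUNT; k++)
        if (s->when[k] >= 0 && (best < 0 || s->when[k] < s->when[best])) best = k;
    return best;
}

/* Stand-in interpreter: whole instructions only, assuming each takes 4 cycles,
   so it may overshoot the deadline by up to 3 cycles. */
static long run_cpu_until(long now, long deadline) {
    while (now < deadline) now += 4;
    return now;
}

static void run_for(Scheduler *s, long end) {
    while (s->now < end) {
        int ev = next_event(s);
        long deadline = (ev >= 0 && s->when[ev] < end) ? s->when[ev] : end;
        s->now = run_cpu_until(s->now, deadline);   /* one big batch, no per-cycle checks */
        s->batches++;
        if (ev >= 0 && s->when[ev] <= s->now) {     /* event is due: handle it */
            assert(s->period[ev] >= 0);
            s->fired[ev]++;
            s->when[ev] = s->period[ev] > 0 ? s->when[ev] + s->period[ev] : -1;
        }
    }
}
```

With a sprite 0 hit at cycle 1000 and the frame counter every 7457 cycles, one 29781-cycle frame runs in just 5 batches, which is where the speed comes from; correctness then depends on predicting event times (like the sprite 0 hit) accurately.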

HLorenzi
Posts: 23
Joined: Thu Jun 28, 2012 9:10 pm
Location: São Paulo, Brazil

Post by HLorenzi » Sat Feb 07, 2015 3:18 pm

Thanks, guys, but I don't think I was clear enough. The idea is not to make a fast emulator; not at all. I tried to make it as precise, decoupled and customizable as possible (as far as it goes right now, anyway...). I was just trying to figure out where those scanline.nes discrepancies come from, i.e. whether it's my emulator's problem.

By the way, I was also thinking about a code profile viewer (for seeing hot paths in code, stack traces, etc.), but as an entertaining visual approach (like a zoomed-out view of code bytes as pixels colored by processing time, and arrows showing jumps and calls -- that could get messy). Also, a sound output viewer that showed notes' pitches, and possibly even displayed a dynamic music sheet.
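For the sound viewer idea, the core step is turning an APU timer period into a note. A minimal C sketch, assuming the NTSC CPU clock of about 1.789773 MHz and the standard formulas f = CPU / (16 × (t + 1)) for the pulse channels and f = CPU / (32 × (t + 1)) for the triangle. The function names are hypothetical; the note search steps in semitones from A4 (MIDI 69) so it doesn't need libm. Pulse periods below 8 are silenced by the hardware, which a viewer would need to check separately.

```c
#include <assert.h>
#include <stdio.h>

#define NTSC_CPU_HZ   1789773.0           /* NTSC 2A03 CPU clock, about 1.789773 MHz */
#define SEMITONE      1.0594630943592953  /* 2^(1/12) */
#define HALF_SEMITONE 1.0293022366434921  /* 2^(1/24), the rounding boundary */

/* 11-bit timer period t to output frequency. */
static double pulse_hz(unsigned t)    { return NTSC_CPU_HZ / (16.0 * (t + 1)); }
static double triangle_hz(unsigned t) { return NTSC_CPU_HZ / (32.0 * (t + 1)); }

/* Nearest MIDI note (A4 = 440 Hz = 69), found by stepping one semitone at a time. */
static int hz_to_midi(double hz) {
    assert(hz > 0.0);
    int midi = 69;
    double ref = 440.0;
    while (hz >= ref * HALF_SEMITONE) { ref *= SEMITONE; midi++; }
    while (hz <  ref / HALF_SEMITONE) { ref /= SEMITONE; midi--; }
    return midi;
}

/* Format a note label such as "C4" (buf should hold at least 5 bytes). */
static void midi_name(int midi, char *buf, size_t len) {
    static const char *names[12] = { "C", "C#", "D", "D#", "E", "F",
                                     "F#", "G", "G#", "A", "A#", "B" };
    snprintf(buf, len, "%s%d", names[((midi % 12) + 12) % 12], midi / 12 - 1);
}
```

So period $0FD on a pulse channel (or $07E on the triangle) comes out near 440.4 Hz and lands on A4, which a viewer could then place on a staff.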

tepples
Posts: 22277
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Sat Feb 07, 2015 4:37 pm

rainwarrior wrote: Then again, the typical modern L1 cache is probably big enough.
On PCs or phones?
Dwedit wrote: PocketNES processes code in very big batches. It's all fine as long as you get the timing of events correct.
Are you referring to catch-up techniques like those described in this wiki page?
HLorenzi wrote: By the way, I was also thinking about a code profile viewer (for seeing hot paths in code, stack traces, etc.), but as an entertaining visual approach (like a zoomed-out view of code bytes as pixels colored by processing time, and arrows showing jumps and calls -- that could get messy). Also, a sound output viewer that showed notes' pitches, and possibly even displayed a dynamic music sheet.
I'd be interested to see what you come up with.

rainwarrior
Posts: 8000
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior » Sun Feb 08, 2015 3:29 am

tepples wrote:
rainwarrior wrote: Then again, the typical modern L1 cache is probably big enough.
On PCs or phones?
I was referring to PCs, but a lot of phones have sizable L1 caches as well.

Intel Atom: 24k
Intel i7: 64k
ARM Cortex-A15: 64k (Galaxy S5)
Apple A8: 128k (iPhone 6)
ARM Cortex-A9 MPCore: 32k (Ouya)
