Performance suggestions

foobaz
Posts: 20
Joined: Sat Dec 24, 2011 11:32 am

Post by foobaz » Tue Jan 24, 2012 5:25 pm

I've been writing an emulator in Java, and one of its primary targets is low-power, mobile-class hardware. The core CPU/PPU emulation runs at ~750 fps on my desktop, but I'm only getting ~15 fps on the slowest targets.

I've already put in a lot of optimization work, and at this rate I don't see enough opportunities left. I'm wondering what corners I can cut to get the serious improvements I need with the smallest impact on overall compatibility. While I would love to have a perfectly compatible emulator, I've pretty much resigned myself to breaking some things to reach playable speeds on current-gen hardware.

Any suggestions?

tepples
Posts: 22277
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Wed Jan 25, 2012 8:41 am

Have you profiled it to see whether the CPU or the PPU is taking the most time?

Dwedit
Posts: 4408
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit » Wed Jan 25, 2012 10:24 am

Is this regular Java or J2ME?
If you don't have CPU speed hacks (idle-loop skipping) in there yet, that's a good feature to add. I could explain the algorithm I use in PocketNES if you're interested.

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL

Post by James » Wed Jan 25, 2012 11:37 am

Dwedit wrote: I could explain the algorithm I use in PocketNES if you're interested.
I'm interested. If you don't mind typing it up, I'd love to hear how you're doing this.

Thanks

foobaz
Posts: 20
Joined: Sat Dec 24, 2011 11:32 am

Post by foobaz » Wed Jan 25, 2012 3:36 pm

tepples: For a given scanline, I spend about 4x more time in the PPU than in the CPU. I was a bit surprised that it was that close, given that you have to render 256 pixels per 25 CPU instructions or so. That's after spending most of my effort trying to make the PPU faster, though.

Dwedit: The CPU and PPU are standard J2SE with different frontends for different targets, like Swing and Android. I'd love to hear about your speedhacks.

Zepper
Formerly Fx3
Posts: 3223
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper » Wed Jan 25, 2012 4:06 pm

James wrote:
Dwedit wrote: I could explain the algorithm I use in PocketNES if you're interested.
I'm interested. If you don't mind typing it up, I'd love to hear how you're doing this.

Thanks
Same here. :)

tepples
Posts: 22277
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Wed Jan 25, 2012 6:06 pm

PocketNES implements two main kinds of speed hacks: jump hack and branch hack. Both in effect freeze the emulated CPU until an interrupt occurs. There are four sources of interrupts in the NES: NMI from the PPU, completion IRQ from the DMC, timer IRQ from the APU frame counter, and an IRQ from the mapper. Barring certain kinds of heavy raster effects, you won't get more than about two interrupts per frame, so you can freeze the CPU for a relatively long time.

The "jump" hack is for games that use the "superloop" structure such as Super Mario Bros. In these games, the entire game runs as NMI and IRQ handlers. The NMI handler updates VRAM and then runs the next frame of game logic.

Code:

  ; initialize the registers and the game loop variables
  ; for the first time, and once that's done, just
  ; jump in place forever
forever:
  jmp forever

nmihandler:
  pha
  txa
  pha
  ; ...
  pla
  tax
  pla
  rti
For this, the CPU can just stop until the next interrupt and then adjust its timing based on which cycle of the JMP instruction the interrupt hit.
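
As a rough illustration of the jump hack in Java (a sketch, not PocketNES's actual code): the class name, the `int[]` memory model, and the cycle-counter parameters are all assumptions made for the example. It checks whether the instruction at the program counter is a `JMP` to its own address, then computes how many cycles can be skipped in whole 3-cycle `JMP` iterations. The leftover cycles before the interrupt are left for the normal interpreter, which is one simple way to keep the timing aligned.

```java
/** Hypothetical sketch of a "jump hack": fast-forward a JMP-to-self up to the next interrupt. */
final class JumpHack {
    static final int OP_JMP_ABS = 0x4C; // 6502 JMP absolute: 3 bytes, 3 cycles
    static final int JMP_CYCLES = 3;

    /** True if the instruction at pc is "JMP pc" (operand is little-endian). */
    static boolean isJumpToSelf(int[] mem, int pc) {
        pc &= 0xFFFF;
        int target = mem[(pc + 1) & 0xFFFF] | (mem[(pc + 2) & 0xFFFF] << 8);
        return mem[pc] == OP_JMP_ABS && target == pc;
    }

    /**
     * Number of cycles that can be skipped without passing the interrupt.
     * Only whole JMP iterations are skipped, so the CPU stays on an
     * instruction boundary; the remainder (0 to 2 cycles) is emulated normally.
     */
    static long skippableCycles(long now, long nextInterrupt) {
        long remaining = nextInterrupt - now;
        if (remaining <= 0) {
            return 0;
        }
        return (remaining / JMP_CYCLES) * JMP_CYCLES;
    }
}
```

In an interpreter loop, the check would typically run only when a `JMP` opcode is fetched, so ordinary instructions pay nothing for it.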

The other is for games that repeatedly read a variable that the NMI handler updates and branch based on it. For example, LJ65, Concentration Room, Lawn Mower, and Thwaite all use this structure:

Code:

  ; ...
  lda retraces
nmiwaitloop:
  cmp retraces
  beq nmiwaitloop
  ; ...

nmihandler:
  inc retraces
  rti
Some games' NMI handlers are much longer than this, for example doing all the VRAM and audio updates and only signaling at the end that NMI has occurred. But this example illustrates the sort of tight loop that a "branch" speed hack exploits. The emulator can look for short loops that contain no store instructions, detect which address the loop is spinning on, skip running the CPU until an interrupt occurs, and then adjust the CPU timing based on where in the loop the interrupt occurred.
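
A minimal Java sketch of the detection step for the branch hack, under stated assumptions: the class name, the 8-byte loop limit, the small opcode whitelist, and the restriction to addresses below $2000 (plain RAM, where reads have no side effects) are choices made for this example and are not taken from PocketNES. Given the address of a conditional branch, it walks the loop body and returns the single address the loop reads, or -1 if the loop does not qualify.

```java
/** Hypothetical sketch of idle-loop detection for a "branch hack". */
final class BranchHack {
    static final int MAX_LOOP_BYTES = 8; // assumed cutoff for a "short" loop

    /** Byte length of a whitelisted read-only opcode, or 0 if the opcode is not allowed. */
    static int safeLength(int opcode) {
        switch (opcode) {
            case 0xA5: case 0xA6: case 0xA4: case 0xC5: case 0x24: // LDA/LDX/LDY/CMP/BIT zero page
            case 0x29: case 0xC9:                                   // AND/CMP immediate
                return 2;
            case 0xAD: case 0xAE: case 0xAC: case 0xCD: case 0x2C: // LDA/LDX/LDY/CMP/BIT absolute
                return 3;
            default:
                return 0; // stores, jumps, and everything else disqualify the loop
        }
    }

    static boolean isImmediate(int opcode) {
        return opcode == 0x29 || opcode == 0xC9;
    }

    /** Returns the RAM address a backward-branching idle loop spins on, or -1. */
    static int spinAddress(int[] mem, int branchPc) {
        if ((mem[branchPc] & 0x1F) != 0x10) {
            return -1; // all 6502 conditional branches have the form xxx10000
        }
        int offset = (byte) mem[branchPc + 1]; // signed relative offset
        int target = branchPc + 2 + offset;
        if (target >= branchPc || branchPc - target > MAX_LOOP_BYTES) {
            return -1; // not a short backward loop
        }
        int watched = -1;
        int pc = target;
        while (pc < branchPc) {
            int op = mem[pc];
            int len = safeLength(op);
            if (len == 0) {
                return -1;
            }
            if (!isImmediate(op)) {
                int addr = (len == 2) ? mem[pc + 1] : (mem[pc + 1] | (mem[pc + 2] << 8));
                if (addr >= 0x2000) {
                    return -1; // hardware registers: reads may have side effects
                }
                if (watched != -1 && watched != addr) {
                    return -1; // loop depends on more than one address
                }
                watched = addr;
            }
            pc += len;
        }
        return (pc == branchPc) ? watched : -1;
    }
}
```

Once a loop qualifies, the emulator can skip ahead to the next interrupt, since nothing except an interrupt handler can change the watched address.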

PocketNES gets a lot of mileage out of its speed hacks because it delegates most of the work of drawing tiles to the GBA's PPU and most of the work of playing audio to the GBC's APU. This leaves the CPU as by far the biggest item on the profile. On a platform with a dumb frame buffer, such as your PCs and Android devices, your mileage may vary.

foobaz
Posts: 20
Joined: Sat Dec 24, 2011 11:32 am

Post by foobaz » Wed Jan 25, 2012 6:28 pm

How can you delegate tile drawing to the hardware? Isn't there too much state that can change per scanline to make that possible? One of my earlier iterations worked like that, and it resulted in all sorts of glitches.

tepples
Posts: 22277
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Wed Jan 25, 2012 6:32 pm

foobaz wrote: How can you delegate tile drawing to the hardware? Isn't there too much state that can change per scanline to make that possible?
PocketNES sets up three of the GBA's four DMA channels for HDMA, pointing at the GBA's equivalents of PPUSCROLL, PPUCTRL, and PPUMASK (BG0XOFS/BG0YOFS, BG0CNT, and DISPCNT respectively). The fourth is used to stream decoded DPCM (but not $4011 PCM, unfortunately for Big Bird fans).
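
On a platform without HDMA, a loosely analogous idea is to record register values per scanline and replay them when rendering. The Java sketch below is only an illustration of that table-per-scanline approach, not how PocketNES works internally. The class and method names are hypothetical, and it makes the simplifying assumption that a write during line N takes effect from line N onward.

```java
/** Hypothetical per-scanline register log, a software stand-in for an HDMA table. */
final class ScanlineRegs {
    static final int LINES = 240; // visible NES scanlines

    final int[] scrollX = new int[LINES];
    final int[] ctrl = new int[LINES];
    final int[] mask = new int[LINES];

    private int curScrollX, curCtrl, curMask;
    private int filledTo = 0; // first line whose values are not yet latched

    /** Latch the current register values into every line before the given one. */
    private void catchUp(int line) {
        int end = Math.min(line, LINES);
        for (int i = filledTo; i < end; i++) {
            scrollX[i] = curScrollX;
            ctrl[i] = curCtrl;
            mask[i] = curMask;
        }
        filledTo = Math.max(filledTo, end);
    }

    void writeScrollX(int line, int value) { catchUp(line); curScrollX = value & 0xFF; }
    void writeCtrl(int line, int value)    { catchUp(line); curCtrl = value & 0xFF; }
    void writeMask(int line, int value)    { catchUp(line); curMask = value & 0xFF; }

    /** Fill the rest of the frame; the tables are then ready for the renderer. */
    void endFrame() {
        catchUp(LINES);
        filledTo = 0;
    }
}
```

A renderer can then draw each scanline from `scrollX[y]`, `ctrl[y]`, and `mask[y]` after the CPU has finished the frame, instead of interleaving CPU and PPU emulation line by line.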
