PPU Rendering Techniques

Discuss emulation of the Nintendo Entertainment System and Famicom.

PPU Rendering Techniques

Post by been_jamin » Fri Jan 20, 2017 11:41 pm

Hi everyone, I'm writing an NES emulator and just finished the CPU. I was looking at general structures for updating the screen; so far I have seen:

1. Render 3 pixels every CPU cycle; supposed to be very accurate, but also slow
2. Render a scanline at a time; maintains some accuracy and speeds up the display
3. Render the entire frame before the start of VBlank; very fast, but won't handle any games that use PPU trickery
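
Approach #2 might look roughly like the minimal Python sketch below. It assumes NTSC timing (341 PPU dots per scanline, 262 scanlines per frame, 3 dots per CPU cycle) and a CPU core that steps one whole instruction at a time; `run_frame_by_scanline`, `step_cpu` and `render_scanline` are hypothetical names. Keeping the budget in PPU dots avoids dealing with the fractional 113⅔ CPU cycles per scanline.

```python
# Minimal sketch of scanline-at-a-time rendering (approach #2), NTSC timing assumed.
DOTS_PER_SCANLINE = 341     # PPU dots per scanline
SCANLINES_PER_FRAME = 262   # NTSC: 240 visible + 1 post-render + 20 vblank + 1 pre-render
PPU_DOTS_PER_CPU_CYCLE = 3  # NTSC PPU:CPU clock ratio


def run_frame_by_scanline(step_cpu, render_scanline):
    """Run the CPU for one scanline's worth of time, then draw that whole line.

    step_cpu() is a hypothetical hook that executes one CPU instruction and
    returns how many CPU cycles it took; render_scanline(n) draws line n from
    whatever PPU state the CPU left behind (and ignores lines >= 240).
    """
    budget = 0  # measured in PPU dots, so 341 / 3 never has to be a fraction
    for line in range(SCANLINES_PER_FRAME):
        budget += DOTS_PER_SCANLINE
        while budget > 0:
            budget -= step_cpu() * PPU_DOTS_PER_CPU_CYCLE
        # Any overshoot (budget < 0) carries into the next line instead of being lost.
        render_scanline(line)
```

The catch is that a register write landing mid-scanline takes effect for the whole line at once, which is the accuracy traded away for speed.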

And here's an idea I had: process PPU writes on a per-CPU-cycle basis, except that instead of actually drawing to the display, just log which value was written to which address. Then render the entire frame right before VBlank. Would this be any faster than #1 above, or not?
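
The logging idea could be sketched in Python roughly as follows. `DeferredPPU`, `cpu_write`, `render_frame`, `apply_write` and `draw_dots` are hypothetical names, and the sketch assumes each logged write also records the PPU dot at which it happened; without that, a mid-frame scroll split could not be replayed at the right spot. Only writes are modeled; reads from the PPU are not.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredPPU:
    """Logs CPU writes to PPU registers during the frame; draws nothing until VBlank."""
    log: list = field(default_factory=list)  # (ppu_dot, register, value) tuples

    def cpu_write(self, ppu_dot, address, value):
        # $2000-$3FFF mirrors the eight PPU registers every 8 bytes.
        if 0x2000 <= address <= 0x3FFF:
            self.log.append((ppu_dot, 0x2000 + (address & 7), value))

    def render_frame(self, frame_dots, apply_write, draw_dots):
        """Replay the log in order: draw up to each write's dot, then apply it."""
        drawn = 0
        for dot, register, value in self.log:
            draw_dots(drawn, dot)        # everything before this write uses the old state
            apply_write(register, value)
            drawn = dot
        draw_dots(drawn, frame_dots)     # finish the rest of the frame
        self.log.clear()
```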

Am I wrong about any of these descriptions, are there any other common methods, and what do you think the best method is?

Re: PPU Rendering Techniques

Post by Disch » Sat Jan 21, 2017 1:39 am

#3 is unworkable, since most games split the screen at least once (for a status bar, for instance), and this approach will fail miserably for those games.

The "logging writes" approach sounds good until you realize that it doesn't solve the problem of $2002 reads for things like sprite-0 hit: the game needs that answer mid-frame, before anything has been rendered. Trying to work those reads into a "logging writes" system ends up being really hard (I've tried it).
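
To see why reads are the sticking point, here is a hypothetical Python illustration of a game waiting for sprite-0 hit (bit 6 of $2002) with a `BIT $2002` / `BVC` polling loop. The class names are made up, and the catch-up version simply assumes the dot where the hit occurs is already known; a real PPU would discover it while rendering.

```python
SPRITE0_HIT = 0x40  # bit 6 of PPUSTATUS ($2002)
DOTS_PER_POLL = 21  # BIT abs (4 cycles) + taken BVC (3 cycles) = 7 CPU cycles = 21 NTSC dots


class DeferredStatus:
    """Write-logging PPU: nothing is rendered until VBlank, so the flag never changes mid-frame."""
    def read_2002(self):
        return 0


class CatchUpStatus:
    """Catch-up PPU: advances to the CPU's current time before answering a read."""
    def __init__(self, hit_dot):
        self.hit_dot = hit_dot  # assumed known here; a real PPU finds it while rendering
        self.now = 0

    def read_2002(self):
        self.now += DOTS_PER_POLL
        return SPRITE0_HIT if self.now >= self.hit_dot else 0


def wait_for_sprite0(ppu, max_polls):
    """Return how many polls it took to see the hit, or None if it never appears."""
    for polls in range(1, max_polls + 1):
        if ppu.read_2002() & SPRITE0_HIT:
            return polls
    return None  # a real game would spin here forever
```

With the deferred PPU the loop never exits, while the catch-up PPU reports the hit as soon as emulated time reaches it.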

The two ways I've tried with great success are:

1) Effectively your #1: run the CPU for one cycle, then run all the other subsystems enough to catch up. Easy and reliable, but slow.

2) Use the tried-and-true 'catch up' system. Run the CPU until it does something 'interesting', like a register read or write, then run the appropriate subsystems just enough to 'catch up' to the CPU's timestamp. IRQs/NMIs and other things that cut into CPU behavior (like cycles stolen by the DMC) can be predicted, so you run the CPU up until the next interesting event (end of frame, NMI, or whatever comes first).
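
The "run until the next interesting event" part can be sketched with a priority queue of predicted events. This is a minimal Python sketch with hypothetical names (`Scheduler`, `run_until_next_event`, `cpu_step`) and timestamps in master-clock units. It assumes at least one event (such as end of frame) is always queued, and it lets the CPU overshoot a deadline by part of an instruction rather than modeling exactly when interrupts are polled. Register reads and writes would break out of this loop to catch subsystems up; that part is not shown.

```python
import heapq


class Scheduler:
    """Predicted events (NMI, IRQ, end of frame, ...) ordered by timestamp."""
    def __init__(self):
        self.events = []  # min-heap of (timestamp, name)

    def schedule(self, timestamp, name):
        heapq.heappush(self.events, (timestamp, name))

    def next_deadline(self):
        return self.events[0][0]  # assumes the queue is never empty


def run_until_next_event(sched, cpu_step, now):
    """Let the CPU run freely up to the earliest predicted event, then pop it.

    cpu_step() is a hypothetical hook: execute one instruction and return the
    master-clock units it took. Returns (new_timestamp, event_name).
    """
    deadline = sched.next_deadline()
    while now < deadline:
        now += cpu_step()  # may overshoot the deadline by part of an instruction
    _, name = heapq.heappop(sched.events)
    return now, name
```

If the game disables NMI or changes an IRQ source before the deadline, the queued prediction is stale and has to be removed and rescheduled.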

This can be accomplished with a timestamp system, where you scale all subsystems up to a common time base. On NTSC, you can give your CPU cycles a time base of 3 (every CPU cycle increments the timestamp by 3) and the PPU a corresponding time base of 1. On PAL, you can use CPU=16 and PPU=5, giving the appropriate 3.2:1 ratio.
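
The time bases above can be checked with a few lines of Python. `TIME_BASES` and the helper functions are hypothetical names; the numbers are the ones from the post.

```python
from fractions import Fraction

# Master-clock units per cycle, as suggested in the post.
TIME_BASES = {
    "NTSC": {"cpu": 3, "ppu": 1},
    "PAL": {"cpu": 16, "ppu": 5},
}


def ppu_dots_per_cpu_cycle(region):
    tb = TIME_BASES[region]
    return Fraction(tb["cpu"], tb["ppu"])


def cpu_cycles_to_timestamp(region, cpu_cycles):
    return cpu_cycles * TIME_BASES[region]["cpu"]


def ppu_dots_at(region, timestamp):
    """Number of whole PPU dots elapsed by the given timestamp."""
    return timestamp // TIME_BASES[region]["ppu"]
```

For example, `ppu_dots_per_cpu_cycle("PAL")` gives `Fraction(16, 5)` (3.2), and 5 PAL CPU cycles come to timestamp 80, which is exactly 16 PPU dots. Because every timestamp is an integer, PAL's fractional ratio never accumulates rounding error.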

For example, you run the CPU for a while, and after 100 cycles it writes to $2000, putting it at a timestamp of 300. Your $2000 handler then runs the PPU up to that timestamp (effectively running it for 300 PPU cycles) to catch up, then performs the write, and then you continue with the CPU.
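
The $2000 example translates into a write handler that catches the PPU up before applying the write. This is a minimal NTSC-only Python sketch; `PPU`, `catch_up` and `write_2000` are hypothetical names, and the per-dot rendering work is reduced to a counter.

```python
CPU_TIME_BASE = 3  # NTSC: one CPU cycle = 3 master units
PPU_TIME_BASE = 1  # one PPU dot = 1 master unit


class PPU:
    def __init__(self):
        self.timestamp = 0  # master units already emulated
        self.dots_run = 0   # stand-in for real rendering work
        self.ctrl = 0       # last value written to $2000 (PPUCTRL)

    def catch_up(self, target):
        """Run whole PPU dots until the PPU reaches the target timestamp."""
        while self.timestamp + PPU_TIME_BASE <= target:
            self.dots_run += 1  # a real PPU would render one dot here
            self.timestamp += PPU_TIME_BASE

    def write_2000(self, cpu_timestamp, value):
        self.catch_up(cpu_timestamp)  # 1) bring the PPU up to the CPU's time
        self.ctrl = value             # 2) only then let the write take effect
```

Calling `write_2000(100 * CPU_TIME_BASE, 0x80)` on a fresh `PPU` runs 300 dots with the old `ctrl` value before the new one takes effect.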

The tricky parts are IRQ/NMI prediction and the skipped cycle on odd NTSC frames (which only happens when rendering is enabled). But those are all workable with some effort.
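
Both tricky parts meet when predicting the next VBlank (and so the NMI, if enabled), because the distance depends on whether the current frame loses a dot. A minimal NTSC Python sketch, assuming scanlines numbered 0-261 with 261 as the pre-render line, the VBlank flag set at scanline 241, dot 1, and odd frames one dot shorter when rendering is enabled. The function names are hypothetical, and the prediction may need to be recomputed whenever the game toggles rendering.

```python
DOTS_PER_LINE = 341
LINES_PER_FRAME = 262             # scanlines 0-261; 261 is the pre-render line
VBLANK_LINE, VBLANK_DOT = 241, 1  # VBlank flag (and NMI, if enabled) starts here


def frame_length(odd_frame, rendering_enabled):
    """Dots in the current NTSC frame; odd frames with rendering on are one dot short
    (the PPU jumps from dot 339 of the pre-render line straight to line 0, dot 0)."""
    full = DOTS_PER_LINE * LINES_PER_FRAME  # 89342
    return full - 1 if (odd_frame and rendering_enabled) else full


def dots_until_vblank(line, dot, odd_frame, rendering_enabled):
    """PPU dots from (line, dot) until the next VBlank start."""
    here = line * DOTS_PER_LINE + dot
    target = VBLANK_LINE * DOTS_PER_LINE + VBLANK_DOT
    if here < target:
        return target - here  # VBlank is still ahead in this frame
    # Otherwise wrap past the end of this (possibly shortened) frame.
    return frame_length(odd_frame, rendering_enabled) - here + target
```

With an NTSC PPU time base of 1, the result can be added directly to the current timestamp to schedule the event.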
