Stuck on PPU Implementation

RobertLoggia
Posts: 4
Joined: Sun Apr 05, 2015 7:16 am

Post by RobertLoggia » Sun Apr 05, 2015 7:21 am

I just finished writing my CPU and tested it against Klaus' 6502 test program. Everything works except BCD mode. I'm now starting to write the PPU. The CPU seems like a walk in the park compared to what I'm trying to grok on the nesdev wiki.

What's a great/easy ROM for testing while programming the PPU?

Also, any recommended documentation (for beginners) for writing the PPU? I don't have any experience in game programming which is why I might be struggling in understanding the concepts.

nIghtorius
Posts: 48
Joined: Tue Apr 29, 2014 1:31 pm

Re: Stuck on PPU Implementation

Post by nIghtorius » Sun Apr 05, 2015 12:25 pm

Do you also keep track of instruction cycles? That is required, because the PPU (NTSC) runs 3 cycles per CPU cycle. This synchronization is very important, as a lot of games depend on it.
You could try Donkey Kong. It is the easiest game to work with during PPU development.

Super Mario Bros. is actually a lot harder to emulate.
Also, test with NROM games rather than games that use MMCx or other mappers, because you would need to emulate those mappers too to get those games running.

mkwong98
Posts: 227
Joined: Mon May 30, 2011 9:01 pm

Re: Stuck on PPU Implementation

Post by mkwong98 » Mon Apr 06, 2015 7:40 am

I started with BKG Graphics Test, which can be found on the projects page of the NesDev wiki. It is very simple, so it is easy to see what's wrong.

MottZilla
Posts: 2832
Joined: Wed Dec 06, 2006 8:18 pm

Re: Stuck on PPU Implementation

Post by MottZilla » Mon Apr 06, 2015 9:53 am

First, do you understand the general concept of how the PPU generates an image? For example do you understand the NameTables and Pattern Tables? It would help anyone trying to help you if you can detail what you understand so far.

tokumaru
Posts: 11858
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Stuck on PPU Implementation

Post by tokumaru » Mon Apr 06, 2015 10:10 am

Just like the CPU, the PPU repeats a series of operations over and over. While the CPU is stuck fetching, decoding and executing instructions, the PPU is stuck on a more complex loop, generating video based on some internal parameters (which can be changed by the program) and VRAM/ROM.

This wiki page describes what happens every frame, and that's what the PPU does over and over. There's also the sprite evaluation, which runs in parallel with the background rendering.

The CPU and the PPU run in parallel, so you have to find a way to emulate that. People who are worried about speed often switch between the PPU and CPU every scanline (i.e. run the CPU for 1 scanline, then run the PPU for 1 scanline), but those who are worried about accuracy may switch every cycle. Another option is the "catch up" method, which lets one chip run ahead until it does something that affects the other, at which point the other chip catches up to that point. The CPU can affect the PPU by writing to its registers ($2000-$2007), and the PPU can affect the CPU with NMIs and the flags in the status register (VBlank, sprite 0 hit, etc).
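To make the "catch up" idea concrete, here is a minimal sketch in C. All names (Machine, cpu_ran, ppu_catch_up, ppu_register_write) are made up for illustration, it assumes the NTSC ratio of 3 PPU dots per CPU cycle, and it only covers the CPU-to-PPU direction. A real emulator would also have to catch up before the CPU can observe NMIs or status flags.

```c
#include <stdint.h>

/* Hypothetical state: a sketch, not a real emulator API. */
typedef struct {
    uint64_t cpu_cycles;  /* CPU cycles executed so far */
    uint64_t ppu_dots;    /* PPU dots processed so far */
} Machine;

/* Called by the CPU core after each instruction; the PPU is NOT run here. */
void cpu_ran(Machine *m, unsigned cycles) {
    m->cpu_cycles += cycles;
}

/* Run the PPU until it has processed 3 dots per elapsed CPU cycle (NTSC). */
void ppu_catch_up(Machine *m) {
    uint64_t target = m->cpu_cycles * 3;
    while (m->ppu_dots < target) {
        /* a real emulator would process one dot here */
        m->ppu_dots++;
    }
}

/* A CPU write to $2000-$2007 first brings the PPU up to date. */
void ppu_register_write(Machine *m, uint16_t addr, uint8_t value) {
    ppu_catch_up(m);
    (void)addr;   /* register side effects would go here */
    (void)value;
}
```

The point of the design is that the PPU does no work until something forces it to, so long stretches of CPU code that never touch the PPU cost one batch of PPU work instead of many small ones.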

A very very very crude way to get a basic PPU working is to draw an entire picture at once using the current state of the PPU (palettes, name tables, attribute tables, pattern tables, scroll, etc). This will help you understand how the different parts are combined to form the picture before you have to worry about timing details, but this is so rudimentary that it will fail in any games that modify PPU parameters mid-frame (i.e. raster effects). Note that you still have to implement the basic CPU-PPU communication with some degree of timing accuracy, such as NMIs, the VBlank flag and the sprite 0 hit flag, otherwise the program might get stuck waiting on those.
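As one small piece of that "draw everything at once" approach, here is a sketch of decoding a single 8x8 tile from the pattern table. It assumes the tile layout documented on the NesDev wiki (16 bytes per tile: 8 bytes for bit plane 0 followed by 8 bytes for bit plane 1, with bit 7 as the leftmost pixel). The function name decode_tile is made up for illustration.

```c
#include <stdint.h>

/* Decode one tile (16 bytes of pattern data) into 2-bit colour indices.
 * The result still has to be combined with a palette chosen through the
 * attribute table before it becomes an actual colour. */
void decode_tile(const uint8_t tile[16], uint8_t out[8][8]) {
    for (int row = 0; row < 8; row++) {
        uint8_t lo = tile[row];      /* bit plane 0 */
        uint8_t hi = tile[row + 8];  /* bit plane 1 */
        for (int col = 0; col < 8; col++) {
            int bit = 7 - col;       /* bit 7 is the leftmost pixel */
            out[row][col] =
                (uint8_t)(((lo >> bit) & 1) | (((hi >> bit) & 1) << 1));
        }
    }
}
```

A crude full-frame renderer would call something like this for each of the 32x30 name table entries and then draw the sprites on top.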

zeroone
Posts: 934
Joined: Mon Dec 29, 2014 1:46 pm
Location: New York, NY

Re: Stuck on PPU Implementation

Post by zeroone » Wed Apr 08, 2015 7:31 am

Based on my own experience and the experiences that I read about on this forum, most NES emulator developers end up writing the PPU in iterative stages of complexity. In other words, the PPU is virtually rewritten several times such that each version approximates the actual hardware better. There are several reasons to do this.

First, it takes time to comprehend each aspect of the PPU, and part of the learning process involves coding. Working with multiple versions may be the only practical way to learn the material.

Another reason is the amount of time that you have to spend on the project, which, depending on your current programming knowledge and experience, may be quite immense. Each version of the PPU will be able to play some subset of games, and you can call it quits at any of the iterative stages. But if you jump right into the most complex PPU design, you might never get anything playable completed.

Related to that is motivation. Once you see some games running with a simple PPU implementation, you'll probably find it a lot easier to work on the next version, as opposed to waiting and waiting for the complex one to get done.

Understanding timing is key to making the emulator work. On NTSC, a new frame needs to be displayed approximately every 16.6 milliseconds (about 60 frames per second). You'll need some sort of sleep function that delays until it is time to generate the next frame:

while(true) {
  renderFrame();
  waitForNextFrameTime();
}

The CPU and PPU execute in parallel and they are synchronized by a common clock. But, the first approximation of this might look like:

void renderFrame() {
  renderBackground();
  renderSprites();
  generateNMI();
  runCpuForNumberOfCyclesInFrame();
}

That is sufficient for the simplest games like Donkey Kong and Popeye.

The next approximation of PPU is scanline based:

void renderFrame() {
  for(int i = -1; i < 240; i++) {
    renderScanline(i);
    runCpuForNumberOfCyclesInScanline();
  }
  generateNMI();
  for(int i = 240; i < 262; i++) {
    runCpuForNumberOfCyclesInScanline();
  }
}


Ultimately, you should create a PPU function that renders a single pixel:

void renderFrame() {
  for(int i = -1; i < 262; i++) {
    for(int j = 0; j < 341; j++) {
      renderDot(i, j);
    }
  }
}

In this model, the PPU drives the CPU. For NTSC, the ratio is 3:1 (3 dots per CPU cycle). For PAL, the ratio is 16:5 (16 dots per 5 CPU cycles) and there are additional vblank scanlines. The sleep delay between frames will also be slightly different. These ratios can be maintained by using floats or by an integer accumulator that overflows.
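Here is one way the integer approach could look, as a rough sketch. The ClockDivider struct and cpu_cycles_for_dot function are made-up names. The idea is to add the numerator once per dot and run a CPU cycle whenever the accumulator reaches the denominator, so the remainder carries over and no fractional cycles are lost.

```c
#include <stdint.h>

/* Hypothetical helper: keeps the CPU:PPU clock ratio using integers only. */
typedef struct {
    unsigned num;  /* CPU cycles per `den` dots: 1 for NTSC, 5 for PAL */
    unsigned den;  /* 3 for NTSC, 16 for PAL */
    unsigned acc;  /* running remainder, always less than den */
} ClockDivider;

/* Call once per PPU dot; returns how many CPU cycles to run now (0 or 1). */
unsigned cpu_cycles_for_dot(ClockDivider *d) {
    d->acc += d->num;
    if (d->acc >= d->den) {
        d->acc -= d->den;  /* keep the remainder for later dots */
        return 1;
    }
    return 0;
}
```

With {1, 3, 0} this yields one CPU cycle every third dot. With {5, 16, 0} it yields 5 CPU cycles for every 16 dots, spread as evenly as integers allow.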

The PPU does several things in parallel. Such a renderDot() function will contain a lot of switching logic that decides what to do based on the current scanline and the current dot index. The wikis that describe the PPU are not written in procedural pseudo code. Instead, they are written as a bunch of possible cases. You'll need the switching logic to direct execution to each case.
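The top level of that switching logic might be sketched as below. The enum and the classify_dot function are made-up names, and the boundaries are my reading of the NTSC timing described on the NesDev wiki (visible scanlines 0-239, post-render scanline 240, VBlank flag set at dot 1 of scanline 241, pre-render scanline numbered -1 or 261), so check them against the timing diagram before relying on them.

```c
/* Hypothetical coarse classification of a (scanline, dot) position. */
typedef enum {
    PHASE_VISIBLE_PIXEL,  /* scanlines 0-239, dots 1-256: output a pixel */
    PHASE_VISIBLE_OTHER,  /* rest of a visible scanline: fetches, sprite work */
    PHASE_POST_RENDER,    /* scanline 240: PPU idles */
    PHASE_VBLANK_START,   /* scanline 241, dot 1: set VBlank flag, maybe NMI */
    PHASE_VBLANK,         /* remainder of scanlines 241-260 */
    PHASE_PRE_RENDER      /* scanline -1 (also numbered 261) */
} DotPhase;

DotPhase classify_dot(int scanline, int dot) {
    if (scanline == -1 || scanline == 261)
        return PHASE_PRE_RENDER;
    if (scanline < 240)
        return (dot >= 1 && dot <= 256) ? PHASE_VISIBLE_PIXEL
                                        : PHASE_VISIBLE_OTHER;
    if (scanline == 240)
        return PHASE_POST_RENDER;
    if (scanline == 241 && dot == 1)
        return PHASE_VBLANK_START;
    return PHASE_VBLANK;
}
```

A renderDot() built this way would switch on the returned phase and then, inside each case, switch again on the dot index to pick the specific fetch or evaluation step.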

Finally, do not optimize early. Modern CPUs are insanely fast. Write your code clean and readable and your emulator will likely run perfectly with plenty of time to spare for each frame.

tepples
Posts: 22044
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Stuck on PPU Implementation

Post by tepples » Wed Apr 08, 2015 7:59 am

zeroone wrote: Modern CPUs are insanely fast.

Only PC or also mobile?

zeroone
Posts: 934
Joined: Mon Dec 29, 2014 1:46 pm
Location: New York, NY

Re: Stuck on PPU Implementation

Post by zeroone » Wed Apr 08, 2015 8:41 am

tepples wrote: Only PC or also mobile?

Who is the audience for an emulator project? The world is saturated with super optimized and accurate emulators for all possible devices. The reality is that these projects are done for the experience. The only relevant platform is the machine that the developer develops on.

Besides, someone will likely stumble upon this post 5 years from now (hello future person), in a world where mobile devices run just as fast as a typical desktop does today.

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL

Re: Stuck on PPU Implementation

Post by James » Wed Apr 08, 2015 6:15 pm

zeroone wrote:

void renderFrame() {
  for(int i = -1; i < 262; i++) {
    for(int j = 0; j < 341; j++) {
      renderDot(i, j);
    }
  }
}

That's one scanline too many. Should be:

for(int i = 0; i < 262; i++)

Edit: oops
Last edited by James on Wed Apr 08, 2015 7:55 pm, edited 2 times in total.
get nemulator
http://nemulator.com

Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: Stuck on PPU Implementation

Post by Sik » Wed Apr 08, 2015 7:43 pm

Erm, that's identical...

James
Posts: 429
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL

Re: Stuck on PPU Implementation

Post by James » Wed Apr 08, 2015 7:56 pm

Sik wrote: Erm, that's identical...

Oops. Fixed.
get nemulator
http://nemulator.com

tokumaru
Posts: 11858
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Stuck on PPU Implementation

Post by tokumaru » Thu Apr 09, 2015 6:33 am

I think the -1 was for the pre-render scanline, in which case 0-239 would be the visible picture, 240 would be the post-render scanline, and 241-260 would be VBlank, so the for should be for(int i = -1; i < 261; i++). I think that numbering a scanline as -1 is a bit confusing though.
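A quick way to sanity-check those bounds is to count what the loop visits. This is only a counting sketch in C with made-up names. It assumes 341 dots per scanline, as in the loop quoted earlier, and numbers the pre-render scanline as -1.

```c
enum {
    DOTS_PER_SCANLINE = 341,
    FIRST_SCANLINE = -1,   /* pre-render scanline */
    LAST_SCANLINE = 260    /* last VBlank scanline */
};

/* Count the dots visited with the corrected bounds (-1 through 260). */
long dots_per_frame(void) {
    long dots = 0;
    for (int i = FIRST_SCANLINE; i <= LAST_SCANLINE; i++) {
        for (int j = 0; j < DOTS_PER_SCANLINE; j++) {
            dots++;  /* renderDot(i, j) would go here */
        }
    }
    return dots;
}
```

That covers 262 scanlines (1 pre-render + 240 visible + 1 post-render + 20 VBlank), or 262 x 341 = 89342 dots per frame. Looping to i < 262 would visit 263 scanlines instead.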

zeroone
Posts: 934
Joined: Mon Dec 29, 2014 1:46 pm
Location: New York, NY

Re: Stuck on PPU Implementation

Post by zeroone » Thu Apr 09, 2015 7:19 am

My bad. Yep, there was an extra scanline in there. Some of the docs refer to the pre-render scanline as -1; so, I put that into the for-loop to stress that.

tokumaru
Posts: 11858
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Stuck on PPU Implementation

Post by tokumaru » Thu Apr 09, 2015 7:41 am

I never coded an emulator, but I guess it would make sense to start the render loop at the same point a real PPU would. According to this wiki page, "The PPU comes out of reset at the top of the picture", but to me it's unclear if the top of the picture is the pre-render scanline or scanline 0.

EDIT: forgot to link to the page.
Last edited by tokumaru on Thu Apr 09, 2015 9:57 am, edited 1 time in total.

thefox
Posts: 3141
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland

Re: Stuck on PPU Implementation

Post by thefox » Thu Apr 09, 2015 7:45 am

tokumaru wrote: I never coded an emulator, but I guess it would make sense to start the render loop at the same point a real PPU would. According to this wiki page, "The PPU comes out of reset at the top of the picture", but to me it's unclear if the top of the picture is the pre-render scanline or scanline 0.

It's scanline 0: http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png

I'd also suggest going by the counters actually used by the PPU (since we now know them), since it makes it easier to compare the implementation to that diagram and Visual 2C02.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
