Need some help with the PPU

Dartht33bagger
Posts: 59
Joined: Sat Jan 03, 2009 3:28 pm
Location: Oregon

Post by Dartht33bagger »

Alright guys, bear with me please. I've read through all of the documents on the wiki and a ton of threads here, and there are still some things about the PPU that I just don't understand. Maybe you guys can clear these things up for me.

1. How often are the PPU registers ($2000-$2007) looked at/updated? Does the PPU look at them all every cycle or every frame? Also, when are they checked? E.g. first thing at the start of a cycle/frame, or after a cycle/frame?
2. I'm still fuzzy on how some I/O registers work. For example $2007 is either a read or a write to PPU memory. From what I understand, the PPU checks $2006 twice to get the address that the CPU wants to read/write to and stores the address in a temporary VRAM address. Then the data written to $2007 is written into VRAM by the address specified in the temp address. After the write, the temporary address is incremented. If that's correct, then I get that part. How then, can $2007 be used as a read?
3. What is the base nametable that is referred to in register $2000? Does that just tell the PPU which nametable to read from first?
4. How does horizontal and vertical mirroring work? I've read multiple documents on this and I just don't understand how the nametables are arranged. Are the nametables physically mirrored, physically moved to different nametable locations in VRAM, or does the PPU just read from them in a different order? My original thought for horizontal was that nametable 1 would be on the left side (top and bottom) and nametable 2 would be on the right side (top and bottom). The wiki seems to tell me that all four nametables are used somehow.
5. My current idea on how to run my emulator would be to run the CPU and return the number of cycles the opcode that was executed took. Then I would take that information and run the PPU for 3 * cycles to catch up. For example, say some operation took 7 cycles, I would run the PPU for 21 cycles before executing another opcode. Is this an ok way of doing it?

This is all I have for now. The PPU has been worrying me since I started this project and it's proving to be quite confusing. I'm slowly getting there, though.

Thanks
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Dartht33bagger wrote:1. How often are the PPU registers ($2000-$2007) looked at/updated? Does the PPU look at them all every cycle or every frame? Also, when are they checked? E.g. first thing at the start of a cycle/frame, or after a cycle/frame?
Writes to PPU registers take effect immediately. The CPU and PPU run in parallel, so as soon as the CPU writes to one of these registers, the value is sent to the PPU. How long it takes for a write to affect the rendered picture varies, though. For example, the first write to $2005 during rendering changes the fine X scroll immediately, but the coarse X scroll doesn't change until the scanline ends. The second write to $2005 (Y scroll) has no effect on rendering until the end of the frame. You can only emulate these effects correctly if you perform all the PPU tasks in the same order the PPU does, in parallel with the CPU.
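A minimal Python sketch of the internal registers behind this behavior, using the v/t/x/w names from the NESdev wiki's scrolling documentation. The class and method names are hypothetical, and only the address/scroll bits are modeled (no status flags, no actual rendering).

```python
class ScrollRegs:
    """Hypothetical model of the PPU's internal scroll/address latches."""

    def __init__(self):
        self.v = 0  # current VRAM address (15 bits), used while rendering
        self.t = 0  # temporary VRAM address (15 bits)
        self.x = 0  # fine X scroll (3 bits)
        self.w = 0  # first/second write toggle shared by $2005 and $2006

    def write_2000(self, value):
        # Nametable select bits go into t bits 10-11.
        self.t = (self.t & ~0x0C00) | ((value & 0x03) << 10)

    def write_2005(self, value):
        if self.w == 0:
            # First write: coarse X goes into t; fine X takes effect at once.
            self.t = (self.t & ~0x001F) | (value >> 3)
            self.x = value & 0x07
        else:
            # Second write: fine Y and coarse Y only go into t.
            self.t = (self.t & ~0x73E0) | ((value & 0x07) << 12) | ((value >> 3) << 5)
        self.w ^= 1

    def write_2006(self, value):
        if self.w == 0:
            # High byte: t bits 8-13 (bit 14 is cleared).
            self.t = (self.t & 0x00FF) | ((value & 0x3F) << 8)
        else:
            # Low byte, after which t is copied into v.
            self.t = (self.t & 0x7F00) | value
            self.v = self.t
        self.w ^= 1

    def read_2002(self):
        # Reading $2002 resets the write toggle (status flags omitted).
        self.w = 0

    def copy_horizontal(self):
        # Near the end of each scanline: coarse X and horizontal nametable bit.
        self.v = (self.v & ~0x041F) | (self.t & 0x041F)

    def copy_vertical(self):
        # During the pre-render scanline: fine Y, coarse Y, vertical nametable bit.
        self.v = (self.v & ~0x7BE0) | (self.t & 0x7BE0)
```

Note that `write_2005` never touches v: the coarse scroll only reaches v when `copy_horizontal` runs at the end of a scanline, and the Y scroll only when `copy_vertical` runs before the next frame.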
2. I'm still fuzzy on how some I/O registers work. For example $2007 is either a read or a write to PPU memory. From what I understand, the PPU checks $2006 twice to get the address that the CPU wants to read/write to and stores the address in a temporary VRAM address. Then the data written to $2007 is written into VRAM by the address specified in the temp address. After the write, the temporary address is incremented. If that's correct, then I get that part. How then, can $2007 be used as a read?
Exactly the same way. Reads, just like writes, cause the address to increment. The only difference is that reads are delayed: when you read from $2007, you get the value held in an internal buffer, and the byte at the current address goes into that buffer, so it is only returned by the next read. Because of this, games that read from $2007 will often throw away the first value read. This delay does not happen when reading from the palettes, though.
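A minimal Python sketch of that read buffer. It assumes a flat 16 KB bytearray for the whole PPU address space (no nametable or palette mirroring modeled) and a hypothetical `increment` field standing in for bit 2 of $2000. The palette case follows the NESdev wiki's description: the palette byte is returned directly, and the buffer is filled with the nametable byte that sits "underneath" it.

```python
class PPUData:
    """Hypothetical sketch of $2007 reads and writes."""

    def __init__(self):
        self.vram = bytearray(0x4000)  # flat stand-in for the PPU address space
        self.v = 0                     # VRAM address set through $2006
        self.buffer = 0                # internal read buffer
        self.increment = 1             # 1 or 32, from bit 2 of $2000

    def write_2007(self, value):
        self.vram[self.v & 0x3FFF] = value
        self.v = (self.v + self.increment) & 0x7FFF

    def read_2007(self):
        addr = self.v & 0x3FFF
        if addr >= 0x3F00:
            # Palette: returned immediately; the buffer gets the byte at addr - $1000.
            result = self.vram[addr]
            self.buffer = self.vram[addr - 0x1000]
        else:
            # Everything else: return the old buffer, then refill it.
            result = self.buffer
            self.buffer = self.vram[addr]
        self.v = (self.v + self.increment) & 0x7FFF
        return result
```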
3. What is the base nametable that is referred to in register $2000? Does that just tell the PPU which nametable to read from first?
Yes, that's the name table where rendering starts (i.e. the name table where the pixel that will show up at the top left corner of the screen is). Unless the scroll is (0, 0), you will see more than one name table at once.
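A small Python sketch of what those two bits mean. The second helper is hypothetical: it combines the bits with the $2005 scroll values into a position on the 512x480 pixel grid formed by the four name tables, treating each bit as the extra high bit of a scroll coordinate.

```python
def base_nametable_address(ctrl):
    """Start of the nametable selected by bits 0-1 of $2000."""
    return 0x2000 + 0x400 * (ctrl & 0x03)


def full_scroll(ctrl, scroll_x, scroll_y):
    """Combine the nametable bits with the $2005 scroll values.

    Bit 0 adds 256 pixels horizontally and bit 1 adds 240 pixels
    vertically, because one nametable covers 256x240 pixels.
    """
    return scroll_x + 256 * (ctrl & 0x01), scroll_y + 240 * ((ctrl >> 1) & 0x01)
```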
4. How does horizontal and vertical mirroring work? I've read multiple documents on this and I just don't understand how the nametables are arranged.
The NES has a 4096-byte addressing range dedicated to name tables (which are displayed as a 2x2 grid), but it only has 2048 bytes of memory for them, so the other 2048 bytes are mirrors. Games get to pick whether the 2 available name tables are arranged horizontally (and mirrored vertically) or vertically (and mirrored horizontally):

Vertical mirroring (top and bottom look the same):
A B
A B

Horizontal mirroring (left and right look the same):
A A
B B
Are the nametables physically mirrored, physically moved to different nametable locations in VRAM, or does the PPU just read from them in a different order?
The PPU address lines are manipulated so that the PPU reads from different parts of the available 2KB of memory.
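A minimal Python sketch of that address-line trick, mapping a PPU nametable address to an offset in the 2KB of internal VRAM. The function name and the mirroring strings are hypothetical; the selection logic mirrors the usual wiring, where the VRAM's A10 line follows PPU A10 for vertical mirroring and PPU A11 for horizontal mirroring.

```python
def ciram_offset(addr, mirroring):
    """Map a PPU nametable address ($2000-$3EFF) to an offset in 2 KB of VRAM."""
    addr = (addr - 0x2000) & 0x0FFF  # $3000-$3EFF mirrors $2000-$2EFF
    table = addr // 0x400            # logical nametable 0-3 (2x2 grid)
    offset = addr & 0x3FF            # position inside that nametable
    if mirroring == "vertical":
        physical = table & 1         # A B / A B: PPU A10 picks the physical table
    elif mirroring == "horizontal":
        physical = table >> 1        # A A / B B: PPU A11 picks the physical table
    else:
        raise ValueError(f"unknown mirroring: {mirroring}")
    return physical * 0x400 + offset
```

The PPU still "sees" four name tables; the mapping just sends two of the four addresses to the same physical kilobyte.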
My original thought for horizontal was that nametable 1 would be on the left side (top and bottom) and nametable 2 would be on the right side (top and bottom). The wiki seems to tell me that all four nametables are used somehow.
They are arranged that way when vertical mirroring is used. The PPU doesn't know that there are only 2KB; it thinks it's accessing 4KB of data, so as far as it's concerned there are 4 name tables. But since the extra 2KB don't exist, the 2KB that do exist are used twice. You should also know that carts have the option of disabling the internal 2KB of VRAM and supplying 4KB of their own for the name tables (this is called 4-screen "mirroring", with quotes because there's no mirroring involved, despite the name).
5. My current idea on how to run my emulator would be to run the CPU and return the number of cycles the opcode that was executed took. Then I would take that information and run the PPU for 3 * cycles to catch up. For example, say some operation took 7 cycles, I would run the PPU for 21 cycles before executing another opcode. Is this an ok way of doing it?
That should be OK for most games, but it's not how an actual console works. For example, if a program is waiting for a sprite 0 hit, it will constantly read $2002, waiting for the sprite hit flag to get set. If an LDA $2002 instruction starts before the sprite hit, and the hit happens before that instruction ends (i.e. within 12 pixels, since LDA $2002 takes 4 CPU cycles, or 12 PPU cycles), your emulator will not see the flag set, even though the real console would. You'd have to run things cycle by cycle to catch this.
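A toy Python sketch of exactly this case. The PPU here is hypothetical and only counts dots (PPU cycles), setting the sprite 0 flag at an arbitrary dot. LDA $2002 is modeled as 4 CPU cycles with the read on the last one, at 3 dots per CPU cycle (NTSC); the order of CPU and PPU work inside one CPU cycle is a simplification.

```python
class ToyPPU:
    """Hypothetical PPU that only counts dots and raises the sprite 0 flag."""

    HIT_DOT = 100  # arbitrary dot where the "hit" happens

    def __init__(self, dot=0):
        self.dot = dot
        self.sprite0_hit = dot >= self.HIT_DOT

    def tick(self):
        self.dot += 1
        if self.dot >= self.HIT_DOT:
            self.sprite0_hit = True


def lda_2002_catch_up(ppu):
    # The whole instruction runs against the PPU state from its first cycle...
    value = ppu.sprite0_hit
    # ...and the PPU only catches up afterwards: 4 CPU cycles * 3 dots.
    for _ in range(4 * 3):
        ppu.tick()
    return value


def lda_2002_interleaved(ppu):
    value = None
    for cycle in range(1, 5):
        if cycle == 4:      # LDA absolute reads its operand on cycle 4
            value = ppu.sprite0_hit
        for _ in range(3):  # 3 PPU dots per CPU cycle (NTSC)
            ppu.tick()
    return value
```

Starting 5 dots before the hit, the catch-up version returns False (the game has to loop again), while the interleaved version already sees the flag.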
Dartht33bagger
Posts: 59
Joined: Sat Jan 03, 2009 3:28 pm
Location: Oregon

Post by Dartht33bagger »

Thank you! Things are starting to make a lot more sense now. Three more quick questions.

1. How does the PPU know if $2007 wants a read or a write? Does the CPU write a nonzero value for a write and a zero value for a read?
2. When a $2007 read occurs, where does the data read go to? Does the PPU just send the read byte to $2007 for the CPU?
3. I want to try to get SMB1 running since it has no mapper, but I know it's a tricky game to emulate. I also know it uses the sprite 0 hit flag. I'm assuming I need to use something like OpenMP to get the CPU and PPU running in parallel, but I'm stumped on how to run the CPU cycle by cycle. How do I, for example, split up an add with carry operation into multiple steps so that only a little of the operation happens per cycle?

I won't work on question 3 for a while since I just want to get a game working first.
3gengames
Formerly 65024U
Posts: 2284
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames »

1. There's an R/W pin on the CPU. On every bus access, that pin tells the PPU whether the CPU is reading or writing, and the PPU performs the corresponding action.

2. The byte first goes to a buffer and is not directly read out. All reads are delayed by one read. I'm not exactly sure where or how it's stored, but that's how it works.

3. You do that by looking at the 6502 cycle-by-cycle operations for the instructions and implement them like that. :)
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

3gengames wrote:2. The byte first goes to a buffer and is not directly read out. All reads are delayed by one read. I'm not exactly sure where or how it's stored, but that's how it works.
Except for reads from palette memory (Those aren't delayed). I don't know if any games care, though.
3gengames
Formerly 65024U
Posts: 2284
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames »

Trust me, games probably care; that's a big difference. But yep, that is right.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Dartht33bagger wrote:1. How does the PPU know if $2007 wants a read or a write? Does the CPU write a nonzero value for a write and a zero value for a read?
What? Where did you get this zero/non-zero idea from? Like 3gengames said, one of the CPU pins indicates whether it's trying to read or write data, and the PPU uses that to tell reads and writes apart.

Fun fact: the Atari 2600 doesn't send the R/W signal to the cart, so carts with extra RAM use one address range for writing and another range for reading. For example, a cart with 256 bytes of RAM will make this memory writable at $1000-$10FF and readable at $1100-$11FF (it uses an address line to select between reading/writing).
2. When a $2007 read occurs, where does the data read go to? Does the PPU just send the read byte to $2007 for the CPU?
When $2007 is read, the buffered value is sent to the CPU and the value read from the PPU goes to the buffer.
I'm assuming I need to use something like OpenMP to get the CPU and PPU running in parallel
It doesn't have to be anything fancy; you can simply alternate between emulating the 2 chips. Things only get complex if you want both accuracy AND speed. Things get much simpler if you only care about one or the other.
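A minimal Python sketch of simply alternating between the two chips, driven by a shared master clock. It assumes NTSC dividers (the CPU runs at master/12 and the PPU at master/4, so 3 PPU cycles per CPU cycle); `cpu_step` and `ppu_step` are hypothetical callbacks that advance each chip by one of its own cycles.

```python
def run(cpu_step, ppu_step, master_cycles):
    """Advance both chips in lockstep for a number of master clock ticks."""
    for tick in range(master_cycles):
        if tick % 12 == 0:
            cpu_step()  # one CPU cycle every 12 master ticks (NTSC)
        if tick % 4 == 0:
            ppu_step()  # one PPU cycle (dot) every 4 master ticks (NTSC)
```

Swapping in other dividers would cover PAL, where the CPU runs at master/16 and the PPU at master/5 (3.2 PPU cycles per CPU cycle).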
but I'm stumped on how to run the CPU cycle by cycle. How do I, for example, split up an add with carry operation into multiple steps so that only a little of the operation happens per cycle?
There are documents such as this one (scroll down) that explain what happens on each cycle of the various instructions.
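As an illustration, here is a Python sketch of ADC absolute (opcode $6D) split into its 4 cycles with a generator, where each `yield` ends one CPU cycle. The CPU class is hypothetical and holds only the state this instruction needs; it does binary addition only, since the NES CPU has no decimal mode. It is a simplification: the exact cycle on which the real chip updates A and the flags is not modeled.

```python
class CPU:
    """Hypothetical minimal 6502 state, just enough for ADC absolute."""

    def __init__(self, memory):
        self.mem = memory  # 64 KB bytearray standing in for the CPU bus
        self.pc = 0
        self.a = 0
        self.c = 0  # carry flag
        self.z = 0  # zero flag
        self.n = 0  # negative flag
        self.v = 0  # overflow flag

    def read(self, addr):
        return self.mem[addr & 0xFFFF]

    def adc(self, value):
        total = self.a + value + self.c
        result = total & 0xFF
        self.c = 1 if total > 0xFF else 0
        # Overflow: both inputs have the same sign and the result's sign differs.
        self.v = 1 if (~(self.a ^ value) & (self.a ^ result) & 0x80) else 0
        self.a = result
        self.z = 1 if result == 0 else 0
        self.n = result >> 7

    def adc_absolute(self):
        """ADC $nnnn, one yield per CPU cycle."""
        # Cycle 1: fetch the opcode, increment PC.
        self.read(self.pc)
        self.pc += 1
        yield
        # Cycle 2: fetch the low byte of the operand address.
        lo = self.read(self.pc)
        self.pc += 1
        yield
        # Cycle 3: fetch the high byte of the operand address.
        hi = self.read(self.pc)
        self.pc += 1
        yield
        # Cycle 4: read the operand and do the addition.
        self.adc(self.read(lo | (hi << 8)))
        yield
```

An emulator loop could call `next()` on the generator once per CPU cycle and run 3 PPU cycles in between.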
I won't work on question 3 for a while since I just want to get a game working first.
Yeah, you shouldn't bother with cycle accurate emulation for now.
Dartht33bagger
Posts: 59
Joined: Sat Jan 03, 2009 3:28 pm
Location: Oregon

Post by Dartht33bagger »

Thank you!

One final question for now: How do I emulate the CPU pin for reads and writes on $2007? Do certain opcodes tell the CPU to read or write from that register?
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

The most commonly encountered CPU instructions for this are LDA $2007 (read) and STA $2007 (write). There are others.


EDIT: mislead less
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Dartht33bagger wrote:How do I emulate the CPU pin for reads and writes on $2007? Do certain opcodes tell the CPU to read or write from that register?
Emulators don't usually emulate individual pins. This would actually be a good idea if it weren't so painfully slow.

You will have to emulate all the instructions one by one, so you absolutely MUST know whether an instruction is reading or writing. Emulators usually have a method that handles CPU writes (that all store instructions can call) and another one that handles CPU reads (that all load instructions can call), and these methods perform a range check to know what to do when particular addresses are accessed. If you detect that a write is being made to $2000-$2007 (or mirrors of that range) you call the appropriate PPU methods to process the write. The same goes for reads, and you pass along to the CPU whatever the PPU returns.
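A minimal Python sketch of that kind of dispatch. The class and method names (`Bus`, `read_register`, `write_register`) are hypothetical, and `StubPPU` only exists so the example runs on its own. The mask `addr & 7` handles the mirrors of $2000-$2007 across $2008-$3FFF.

```python
class StubPPU:
    """Hypothetical stand-in that records register accesses."""

    def __init__(self):
        self.writes = []

    def write_register(self, reg, value):
        self.writes.append((reg, value))

    def read_register(self, reg):
        return 0x80 if reg == 2 else 0  # pretend the VBlank flag is set in $2002


class Bus:
    """Every load/store instruction goes through read()/write()."""

    def __init__(self, ppu):
        self.ram = bytearray(0x800)  # 2 KB internal RAM
        self.ppu = ppu

    def read(self, addr):
        addr &= 0xFFFF
        if addr < 0x2000:
            return self.ram[addr & 0x07FF]  # RAM, mirrored up to $1FFF
        if addr < 0x4000:
            return self.ppu.read_register(addr & 0x0007)  # $2000-$2007, mirrored to $3FFF
        return 0  # APU, controllers and cartridge space are not modeled here

    def write(self, addr, value):
        addr &= 0xFFFF
        value &= 0xFF
        if addr < 0x2000:
            self.ram[addr & 0x07FF] = value
        elif addr < 0x4000:
            self.ppu.write_register(addr & 0x0007, value)
        # Writes to other regions are ignored in this sketch.
```

An STA $2007 handler would then just call `bus.write(0x2007, a)`, and LDA $2007 would call `bus.read(0x2007)`.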
fred
Posts: 67
Joined: Fri Dec 30, 2011 7:15 am
Location: Sweden

Post by fred »

Speaking of $2007, there's something I'm unsure of. Looking at the "skinny on nes scrolling" page, I can see that X and Y scrolling updates every now and then during a frame, sometimes resulting in a wrap-around and nametable toggle. Does this apply to reads and writes to $2007? Based on most text on the wiki, it seems like 1 or 32 just gets added to the vram address, with no wrapping or anything. Is that correct?

Oh, and this: "If rendering is enabled" - is this if certain bits are set in $2001? 0x1E? Or is 0xA enough (for BG rendering)?
Quietust
Posts: 1920
Joined: Sun Sep 19, 2004 10:59 pm

Post by Quietust »

fred wrote:Speaking of $2007, there's something I'm unsure of. Looking at the "skinny on nes scrolling" page, I can see that X and Y scrolling updates every now and then during a frame, sometimes resulting in a wrap-around and nametable toggle. Does this apply to reads and writes to $2007? Based on most text on the wiki, it seems like 1 or 32 just gets added to the vram address, with no wrapping or anything. Is that correct?
Yes.
fred wrote:Oh, and this: "If rendering is enabled" - is this if certain bits are set in $2001? 0x1E? Or is 0xA enough (for BG rendering)?
Enabling either background or sprite rendering is sufficient - even if only one is enabled, the PPU still does all of the work to render both (it just discards whichever one is turned off when it comes time to output the pixels themselves).
Quietust, QMT Productions
ulfalizer
Posts: 349
Joined: Fri Mar 08, 2013 9:55 pm
Location: Linköping, Sweden

Post by ulfalizer »

fred wrote:Speaking of $2007, there's something I'm unsure of. Looking at the "skinny on nes scrolling" page, I can see that X and Y scrolling updates every now and then during a frame, sometimes resulting in a wrap-around and nametable toggle. Does this apply to reads and writes to $2007? Based on most text on the wiki, it seems like 1 or 32 just gets added to the vram address, with no wrapping or anything. Is that correct?
Accessing $2007 during VBlank or when rendering is disabled (meaning that neither background nor sprite rendering is enabled in $2001, i.e. bits 3 and 4 are both zero) increments the address linearly by either 1 or 32. Accessing $2007 outside of VBlank with rendering enabled (this is seldom done) performs a glitchy update built from the increments normally done during rendering: coarse X and Y are both incremented at once.
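A minimal Python sketch of both cases, following the coarse X and Y increment rules described on the NESdev wiki. `advance_after_2007` and its `on_render_line` argument are hypothetical; `on_render_line` stands for however the emulator tracks that the PPU is on a visible or pre-render scanline, i.e. a line where it is actually using the address register.

```python
def rendering_enabled(mask):
    """Background (bit 3) or sprite (bit 4) rendering enabled in $2001."""
    return (mask & 0x18) != 0


def increment_coarse_x(v):
    if (v & 0x001F) == 31:
        return (v & ~0x001F) ^ 0x0400  # wrap coarse X, switch horizontal nametable
    return v + 1


def increment_y(v):
    if (v & 0x7000) != 0x7000:
        return v + 0x1000  # fine Y < 7: just increment it
    v &= ~0x7000           # fine Y wraps to 0
    coarse_y = (v & 0x03E0) >> 5
    if coarse_y == 29:
        coarse_y = 0
        v ^= 0x0800        # last tile row: switch vertical nametable
    elif coarse_y == 31:
        coarse_y = 0       # row 31 wraps to 0 without switching nametables
    else:
        coarse_y += 1
    return (v & ~0x03E0) | (coarse_y << 5)


def advance_after_2007(v, ctrl, mask, on_render_line):
    """Update the address register after a $2007 access."""
    if not (on_render_line and rendering_enabled(mask)):
        step = 32 if ctrl & 0x04 else 1  # bit 2 of $2000
        return (v + step) & 0x7FFF
    # Glitchy case: coarse X and Y are both incremented at once.
    return increment_y(increment_coarse_x(v))
```

`rendering_enabled` also answers the $2001 question above: 0x1E and 0x0A both count as "rendering enabled", because bit 3 is set in each.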

Internally, the same register (loopy_v) is used both to hold the address for $2006/$2007 and, during rendering, to keep track of the current nametable location being rendered (along with fine x). This saves hardware.
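A short Python sketch of that double use, assuming the loopy_v bit layout documented on the NESdev wiki (the function names are hypothetical). The low 12 bits of v are a nametable offset, so the same value can be read either as a plain $2006 address or as a set of scroll fields.

```python
def decode_v(v):
    """Split a 15-bit loopy_v value into its rendering fields.

    Layout (bit 14 down to bit 0): yyy NN YYYYY XXXXX
    = fine Y, nametable select, coarse Y, coarse X.
    """
    return {
        "coarse_x": v & 0x1F,
        "coarse_y": (v >> 5) & 0x1F,
        "nametable": (v >> 10) & 0x03,
        "fine_y": (v >> 12) & 0x07,
    }


def tile_address(v):
    """Nametable byte fetched for the tile v currently points at."""
    return 0x2000 | (v & 0x0FFF)


def attribute_address(v):
    """Attribute byte covering that tile (one byte per 4x4 tile block)."""
    return 0x23C0 | (v & 0x0C00) | ((v >> 4) & 0x38) | ((v >> 2) & 0x07)
```

For example, after writing $21 and $08 to $2006, `decode_v(0x2108)` gives coarse X 8, coarse Y 8, nametable 0 and fine Y 2, while `tile_address(0x2108)` is $2108 again.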
fred
Posts: 67
Joined: Fri Dec 30, 2011 7:15 am
Location: Sweden

Post by fred »

Ah, that clears it up. Thanks to you both!
Zepper
Formerly Fx3
Posts: 3262
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper »

tokumaru wrote:
Dartht33bagger wrote:1. How does the PPU know if $2007 wants a read or a write? Does the CPU write a nonzero value for a write and a zero value for a read?
What? Where did you get this zero/non-zero idea from? Like 3gengames said, one of the CPU pins indicates whether it's trying to read or write data, and the PPU uses that to tell reads and writes apart.
Avoid technical or low-level details. He's writing an emulator.

How does the PPU know if $2007 wants a read or a write?
- First, you need the CPU (6502) program code. Trap reads made by the LDA $2007 instruction and writes made by STA $2007. Look for the LDA/STA timing diagrams. Easy.