Questions about NES programming and architecture

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

Oziphantom
Posts: 913
Joined: Tue Feb 07, 2017 2:03 am

Re: Questions about NES programming and architecture

Post by Oziphantom » Tue Sep 01, 2020 1:36 am

lidnariq wrote:
Mon Aug 31, 2020 11:30 pm
The C64 and VIC-20 do this, and as a result are barely half the speed of the NES.
The NES runs at about 1.8 MHz and the C64 at 1 ± 0.02 MHz. It's not barely half; it's clearly over half.

This limits the VIC to fetching 1 byte per clock. It does need more, so every 8 lines it blocks the CPU to get 2 bytes per "cycle", and it has a 10-bit bus.
You can't do the clock sharing on the NES, as the PPU needs more bytes per "clock". You could expand the PPU to a 24-bit bus and fetch in parallel, or use a 16-bit bus and run the ROMs at quad speed, so that the CPU runs for 1 cycle over 2 FSB clocks and then the PPU runs for 2 cycles of 1 FSB clock each. Either option would have cost a lot more.

Secamline
Posts: 39
Joined: Sat Aug 15, 2020 4:25 pm

Re: Questions about NES programming and architecture

Post by Secamline » Tue Sep 01, 2020 5:39 am

I knew one of the main design goals of the NES was to be cheap (which also explains why the Famicom controllers couldn't be detached from the console if I'm not mistaken). I didn't consider the lack of experience of the company in electronics back then though, thanks for pointing that out!
I thought the limited graphics of the C64 were due to the limited RAM (From what I recall, the tilesets, color information, layouts and the code itself if running a game from disk, have to coexist in the same 30K or so of contiguous RAM space).
Also, a somewhat unrelated question: what happens when trying to write data to a read-only register? I've read several answers, but some of them seem contradictory.

Bregalad
Posts: 7951
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Re: Questions about NES programming and architecture

Post by Bregalad » Tue Sep 01, 2020 5:43 am

Also, a somewhat unrelated question: what happens when trying to write data to a read-only register? I've read several answers, but some of them seem contradictory.
Most of the time, nothing happens.

In some rare cases, with certain mappers, bus conflicts could appear. This happens when the mapper decodes only the address lines and not the R/W line, and so drives the data lines whenever address X is accessed. This would be rare, though, and only applies to mappers with registers at $4020-$7FFF.
Last edited by Bregalad on Tue Sep 01, 2020 5:51 am, edited 1 time in total.
Useless, lumbering half-wits don't scare us.

Oziphantom
Posts: 913
Joined: Tue Feb 07, 2017 2:03 am

Re: Questions about NES programming and architecture

Post by Oziphantom » Tue Sep 01, 2020 5:47 am

The VIC-II chip can see 16.5K at once. However, a screen takes 1K and a character set takes 2K, leaving you with 13K for sprites. Hardly a limit. It's down to bandwidth: it could stall the CPU for the entire visible frame to get 2 bytes per 8 pixels, but then the CPU would be limited to VBlank. However, it also has a bitmap mode, which eats 9K.
The C64 has 64K of RAM, and you are free to put code and data wherever you want in that 64K. If you have a cart, you can bank in 16K at a time and have as many banks as you want. We now have carts of up to 16MB, though 512K and 1MB are more standard.

Writing to a read-only register just causes the data lines to hit a wall, and the energy gets dissipated as a tiny amount of heat.

lidnariq
Posts: 9676
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Questions about NES programming and architecture

Post by lidnariq » Tue Sep 01, 2020 10:56 am

Oziphantom wrote:
Tue Sep 01, 2020 1:36 am
The NES runs at about 1.8 MHz and the C64 at 1 ± 0.02 MHz. It's not barely half; it's clearly over half.
Sorry. Hadn't meant to bait you. I was just trying to demonstrate how much overhead a multiplexed bus costs.

In the case of the C64 and VIC-20, they were designed from the ground up to target this multiplexed bus. Trying to get something similar out of the NES would be a challenge, even if we allow as much caching as possible.

The PPU issues 34 background fetches and 8 sprite fetches on every scanline. One of the background fetches is never used. All of these always consist of four bytes, even though two fetches are entirely wasted for each sprite, and two fetches are very cacheable for the background. So we could do something like the C64's "badlines" and cache name- and attribute- table data, reducing our hypothetical NES64 to only needing (33+8)×2 = 82 bytes per scanline. That's only 1.3MB/s, so a multiplexed bus would only need to run at 2.6MB/s, which is the same speed as the PPU currently does run.

But.

This is much more complex, and requires a more complex design to support it. Commodore bought their own silicon foundry (MOS) and the skills that went along with it, but Nintendo at best had a contract with Ricoh. And both the C64 and VIC-20 were more expensive than the NES/Famicom.
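
The bandwidth estimate earlier in this post can be checked with a few lines of Python. This is only a rough sketch: the NTSC dot clock (21.477272 MHz / 4) and the 341 dots per scanline are assumptions brought in for the calculation and are not stated in the post, and the function names are made up for illustration.

```python
# Rough check of the "82 bytes per scanline" bandwidth estimate.
# Assumed NTSC figures (not from the post): the PPU dot clock is the
# 21.477272 MHz master clock divided by 4, and a scanline is 341 dots.
PPU_DOT_CLOCK_HZ = 21_477_272 / 4
DOTS_PER_SCANLINE = 341


def bytes_per_scanline(bg_fetches=33, sprite_fetches=8, bytes_per_fetch=2):
    """Bytes needed per scanline if name/attribute data were cached."""
    return (bg_fetches + sprite_fetches) * bytes_per_fetch


def bandwidth_bytes_per_second(per_line):
    """Convert a per-scanline byte count into bytes per second."""
    scanlines_per_second = PPU_DOT_CLOCK_HZ / DOTS_PER_SCANLINE
    return per_line * scanlines_per_second


per_line = bytes_per_scanline()
ppu_only = bandwidth_bytes_per_second(per_line)
shared_bus = 2 * ppu_only                # CPU and PPU alternate on one bus
current_ppu_rate = PPU_DOT_CLOCK_HZ / 2  # one PPU memory access per two dots

print(f"{per_line} bytes/line, {ppu_only / 1e6:.2f} MB/s for the PPU alone")
print(f"{shared_bus / 1e6:.2f} MB/s shared, vs {current_ppu_rate / 1e6:.2f} M accesses/s today")
```

Under those assumptions the numbers land close to the 1.3 MB/s and 2.6 MB/s quoted above.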

Quietust
Posts: 1603
Joined: Sun Sep 19, 2004 10:59 pm

Re: Questions about NES programming and architecture

Post by Quietust » Tue Sep 01, 2020 11:44 am

lidnariq wrote:
Tue Sep 01, 2020 10:56 am
The PPU issues 34 background fetches and 8 sprite fetches on every scanline. One of the background fetches is never used. All of these always consist of four bytes, even though two fetches are entirely wasted for each sprite, and two fetches are very cacheable for the background. So we could do something like the C64's "badlines" and cache name- and attribute- table data, reducing our hypothetical NES64 to only needing (33+8)×2 = 82 bytes per scanline.
While inefficient, the PPU's simplistic behavior allowed for significant improvement by external hardware - if it had cached the attribute table data, then the MMC5's extended attribute mode wouldn't have been possible.
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.

Secamline
Posts: 39
Joined: Sat Aug 15, 2020 4:25 pm

Re: Questions about NES programming and architecture

Post by Secamline » Wed Sep 02, 2020 4:03 am

I'm actually curious about how the PPU works now! In fact, all those old systems that work with a CRT screen are a mystery to me. Like I said in my first post, I'm familiar with programming for the Atari 2600, so I pretty much know how it works and how the CPU actually generates the graphics through registers, but I have no idea how the system manages to put the right sprite and playfield pixel at the right position. From what I know, it's just a matter of timing, but it all still seems kind of obscure to me. Same goes for the NES: how does the PPU know that a sprite should be displayed on top of part of the background? There's no screen buffer like on modern systems, and of course it can't just draw the background first with the sprites on top of it, since everything has to be drawn in one frame.

Oziphantom
Posts: 913
Joined: Tue Feb 07, 2017 2:03 am

Re: Questions about NES programming and architecture

Post by Oziphantom » Wed Sep 02, 2020 4:34 am

Well, imagine the PPU has a very simple CPU that only has operations to sort the graphics.

Basically, it has a small FSM that loops through the sprite RAM, does Y checks, then enters the sprite number into a buffer.

Then, when the scanline is drawn, it checks the X entries of the buffered sprites to see if this X position has a sprite, and if so it outputs the sprite data.

Basically, it doesn't have a frame buffer; it has a "line buffer".

See here for in-depth details: https://github.com/martinpiper/BombJack
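
The two steps described in this post (a Y check over sprite RAM, then an X check while drawing) can be sketched roughly as follows. This is a simplified model and not how the hardware is wired: the function names are hypothetical, the 8-sprite limit and the Y, tile, attribute, X byte order are taken from general NES documentation rather than from this post, and details such as the one-line Y offset and the sprite overflow behavior are ignored.

```python
SPRITES_PER_LINE = 8  # assumed per-scanline limit of the NES PPU


def evaluate_sprites(oam, scanline, height=8):
    """Y check: collect up to 8 OAM entries whose rows cover `scanline`.

    `oam` is a list of (y, tile, attr, x) tuples, one per sprite.
    """
    in_range = []
    for index, (y, tile, attr, x) in enumerate(oam):
        if y <= scanline < y + height:
            in_range.append((index, y, tile, attr, x))
            if len(in_range) == SPRITES_PER_LINE:
                break  # the line buffer is full; later sprites are dropped
    return in_range


def sprites_covering_x(line_sprites, x, width=8):
    """X check: which buffered sprites overlap horizontal position `x`."""
    return [s for s in line_sprites if s[4] <= x < s[4] + width]


# Example: three sprites, two of which cross scanline 13.
oam = [(10, 0, 0, 5), (20, 1, 0, 50), (12, 2, 0, 8)]
found = evaluate_sprites(oam, scanline=13)
print([s[0] for s in found])                         # OAM indices in range
print([s[0] for s in sprites_covering_x(found, 9)])  # indices overlapping x=9
```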

tokumaru
Posts: 11859
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Questions about NES programming and architecture

Post by tokumaru » Wed Sep 02, 2020 5:14 am

Graphics are just video signals generated from data by a processing unit. This data comes from external memory and/or registers internal to the processing unit.

On the 2600, the TIA outputs playfield pixels in a loop. In parallel to that, each object (players, missiles and ball) has a counter that lasts one scanline, and when that counter expires, the object is drawn and the counter is reset, so the object is drawn at the same position every scanline. You position the objects by manually resetting these counters, and fine adjustments to the positions are made by clocking the counters additional times, forcing the objects to move specific numbers of pixels.

On the NES, the PPU is programmed to combine data from the name, attribute and pattern tables in a loop, using the scroll to calculate which parts of the name table to read. Background pixels are formed from that data and from the colors stored in palette RAM. Parallel to that, the PPU scans all 64 sprites in the OAM looking for the ones that will be visible on the next scanline, and it copies the graphics of the ones it finds to an internal buffer. While rendering scanlines, for each position the PPU selects either the computed background pixel or a buffered sprite pixel, taking the sprite priorities into consideration.

This is an oversimplification of the process, which is described in detail in our wiki. There you can read about everything that the PPU does during each cycle of a frame.
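
As a rough illustration of the last step (choosing between the background pixel and a buffered sprite pixel), here is a minimal sketch. It assumes that a 2-bit pattern value of 0 means "transparent" and that each sprite carries a "behind background" priority flag; both come from general NES documentation rather than from this post, and the function name is made up.

```python
def select_pixel(bg_pattern, sprite_pattern, sprite_behind_bg):
    """Pick the visible layer for one screen position.

    Both patterns are 2-bit values (0-3); 0 is treated as transparent.
    Returns a (layer, pattern) tuple.
    """
    if sprite_pattern == 0:
        # No opaque sprite pixel here, so the background (or backdrop) shows.
        return ("bg", bg_pattern)
    if bg_pattern == 0:
        # Background is transparent, so the sprite shows regardless of priority.
        return ("sprite", sprite_pattern)
    # Both are opaque: the sprite's priority flag decides.
    if sprite_behind_bg:
        return ("bg", bg_pattern)
    return ("sprite", sprite_pattern)


print(select_pixel(2, 3, sprite_behind_bg=False))
print(select_pixel(2, 3, sprite_behind_bg=True))
```

Because the choice is made per pixel as the scanline is output, no frame buffer is needed to get sprites "on top of" the background.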

Secamline
Posts: 39
Joined: Sat Aug 15, 2020 4:25 pm

Re: Questions about NES programming and architecture

Post by Secamline » Wed Sep 02, 2020 6:52 am

So the PPU has an internal buffer for a whole scanline worth of pixels? How is the color information stored inside the buffer then?

Dwedit
Posts: 4352
Joined: Fri Nov 19, 2004 7:35 pm

Re: Questions about NES programming and architecture

Post by Dwedit » Wed Sep 02, 2020 7:06 am

The buffer isn't a whole scanline, just 16 pixels. It's refilled every time it goes down to 8 pixels.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!

tokumaru
Posts: 11859
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Questions about NES programming and architecture

Post by tokumaru » Wed Sep 02, 2020 7:11 am

Secamline wrote:
Wed Sep 02, 2020 6:52 am
So the PPU has an internal buffer for a whole scanline worth of pixels?
No, it has a buffer for 16 background pixels (which gets constantly rotated and fed with more pixels as the scanline is rendered) and 64 sprite pixels, stored separately as 8 lines of 8 pixels, copied from the patterns of the sprites that were found to be in range during the most recent OAM scan.
How is the color information stored inside the buffer then?
Color is not stored in the buffer, only the 2-bit patterns. Right before the output, these patterns are combined with 2-bit attributes (buffered separately and reused for 8 consecutive pixels) and the 4-bit results are used to look up the final color values in palette RAM.
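
The lookup described above can be sketched like this. It is an illustrative model only: the function name is hypothetical, the palette contents are placeholder values, and two details are assumptions taken from general NES documentation rather than from this post (sprite palettes occupying the upper 16 entries of the 32-byte palette RAM, and pattern 0 falling back to the shared backdrop entry).

```python
def palette_index(pattern, attribute, is_sprite=False):
    """Combine a 2-bit pattern and a 2-bit attribute into a palette RAM index."""
    if pattern == 0:
        return 0  # transparent: use the shared backdrop entry (assumption)
    index = (attribute << 2) | pattern  # the 4-bit result described above
    return (index | 0x10) if is_sprite else index  # sprites use entries 16-31


# Placeholder palette RAM: 32 entries, each a 6-bit color value.
palette_ram = [(i * 3) & 0x3F for i in range(32)]

bg_color = palette_ram[palette_index(pattern=3, attribute=2)]
sprite_color = palette_ram[palette_index(pattern=1, attribute=0, is_sprite=True)]
print(bg_color, sprite_color)
```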

Fiskbit
Posts: 162
Joined: Sat Nov 18, 2017 9:15 pm

Re: Questions about NES programming and architecture

Post by Fiskbit » Wed Sep 02, 2020 7:25 am

The upper half of this picture shows some of the buffering in action. The grey and black line is a period where rendering is disabled, and after it's enabled again, 8 buffered sprite pixels (the red and blue section sticking out) are drawn that would have been drawn at the start of the grey region. I think the following white section is buffered background that wasn't completely fetched. The lower half is from Mesen; actually emulating this behavior is very slow and I'm not aware of any emulators that do it. Sour did write an implementation of some of this for Mesen, but it's not checked in because of the performance impact.

Image

Quietust
Posts: 1603
Joined: Sun Sep 19, 2004 10:59 pm

Re: Questions about NES programming and architecture

Post by Quietust » Wed Sep 02, 2020 12:59 pm

tokumaru wrote:
Wed Sep 02, 2020 7:11 am
Secamline wrote:
Wed Sep 02, 2020 6:52 am
So the PPU has an internal buffer for a whole scanline worth of pixels?
No, it has a buffer for 16 background pixels (which gets constantly rotated and fed with more pixels as the scanline is rendered) and 64 sprite pixels, stored separately as 8 lines of 8 pixels, copied from the patterns of the sprites that were found to be in range during the most recent OAM scan.
Each sprite also has an 8-bit counter that's initialized with the X position, and those counters get decremented once per pixel - once each one reaches zero, then it starts outputting the pixel data from a pair of shift registers (and palette+priority data from additional registers), and an 8-to-1 priority encoder picks which sprite is going to be combined with the background data.

In a way, it's similar to what the TIA does, except that the 8 sprite units are dynamically loaded from sprite RAM based on their Y coordinates rather than being initialized manually.
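
A rough software model of the counters and priority encoder described above might look like the following. It is a sketch under simplifying assumptions: each unit holds a plain list of eight 2-bit pixels instead of a pair of shift registers, the lowest-numbered unit with a non-zero pixel is assumed to win, and the class and function names are invented for illustration.

```python
class SpriteUnit:
    """One of the 8 per-scanline sprite output units (simplified)."""

    def __init__(self, x, pattern_row, palette=0, behind_bg=False):
        self.counter = x                 # down counter loaded with the X position
        self.pixels = list(pattern_row)  # 8 two-bit pixels, leftmost first
        self.palette = palette
        self.behind_bg = behind_bg

    def clock(self):
        """Advance one pixel and return this unit's output (0 = transparent)."""
        if self.counter > 0:
            self.counter -= 1            # still counting down to the X position
            return 0
        if self.pixels:
            return self.pixels.pop(0)    # shift out the next pixel
        return 0                         # all 8 pixels already output


def render_sprite_line(units, width=256):
    """Clock every unit once per pixel and pick the first opaque output."""
    line = []
    for _ in range(width):
        outputs = [unit.clock() for unit in units]
        winner = next((i for i, p in enumerate(outputs) if p != 0), None)
        line.append(None if winner is None else (winner, outputs[winner]))
    return line


units = [SpriteUnit(2, [1, 0, 3, 0, 0, 0, 0, 0]), SpriteUnit(3, [2] * 8)]
line = render_sprite_line(units, width=12)
print(line)
```

Each entry of `line` is either `None` (no sprite pixel) or a `(unit index, pattern)` pair that would then be combined with the background pixel.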

tokumaru
Posts: 11859
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Questions about NES programming and architecture

Post by tokumaru » Wed Sep 02, 2020 5:27 pm

Quietust wrote:
Wed Sep 02, 2020 12:59 pm
In a way, it's similar to what the TIA does, except that the 8 sprite units are dynamically loaded from sprite RAM based on their Y coordinates rather than being initialized manually.
That's true. And I'm sure I've heard of kernels for the 2600 that dynamically allocate the hardware sprites in order to support more than 2 software sprites per scanline.
