More detail on SNES PPU tile-to-pixel pipeline/compositing?


Post by ittyBittyByte »

So how does the PPU actually convert sprites and background into pixels? Does it directly loop through each individual pixel on each line, then loop through tiles/sprites to calculate the final pixel that should end up in that spot? Or does it loop through each tile or sprite, draw a row of pixels all at once (assuming there isn't a pixel there already)? Both of those sound like they'd be a very intensive process, especially for such old hardware; there's just so many pixels (and layers!). Not to even mention the byte or word of VRAM data -> bits -> pixel value + palette -> final color conversion, which sounds time-consuming as well considering the sheer amount of pixels on screen. So how is everything blitted onto the screen so efficiently?

Also, clipping: transparent pixels. Is it as simple as if (pixel_value != 0) draw_pixel() else continue? Retro Game Mechanics Explained mentions clipping behavior here (without going into any real detail); it seems to decide which sprite pixels to render and which to "clip away" in a convoluted way.

Finally, compositing (semi-related to above) is something I'm wondering about as well. According to the video I linked above, the various types of graphics are rendered separately. Are objects and background layers all rendered and stored into their own personal "buffers" before compositing? If so, do the "high-priority" bg tiles and sprites have their own separate buffers from low-priority, or are they separated from lower-priority tiles in a different way?

It is also not clear to me whether the compositing process is done per-pixel one after the other, or if each scanline is rendered in its entirety before being composited together.

Can someone help me understand, at least on a slightly lower level than the video I linked, how the "internal logic" of clipping, drawing and compositing everything works, and how it is so efficient?

Post by tepples »

The S-PPU is in fact too big to fit on one chip using the standard cell process Ricoh had in 1990. So during hblank time, the sprite tiles are retrieved from video memory and composited to a 256x1-pixel buffer in one chip. During draw time, it feeds a stream of sprite pixels to the other PPU and clears the buffer while scanning OAM for sprites to be retrieved next hblank. This other PPU reads tilemap and CHR for four backgrounds, decodes them on the fly through four shift registers, and feeds them and the sprite stream pixel-by-pixel into a 5-input priority encoder. (For comparison, the NES's sprite unit is eight shift registers and an 8-input priority encoder, followed by a 2-input priority encoder to combine them with the background.) This priority encoder is double-pumped, meaning it can return two results per pixel, to be displayed either in the left and right halves of a pixel (hi-res backgrounds) or blended using addition, average, or subtraction.
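As a rough illustration of the priority encoder and the blending step described above, here is a minimal Python sketch. It is only a software analogy: the names `priority_select` and `blend`, the layer labels, and the fixed front-to-back order are assumptions made for the example (on real hardware the order depends on the background mode and per-tile priority bits, and all inputs are examined simultaneously rather than in a loop).

```python
from typing import Optional, Sequence, Tuple

# Hypothetical pixel record: (layer name, palette index). Index 0 = transparent.
Pixel = Tuple[str, int]


def priority_select(front_to_back: Sequence[Pixel]) -> Optional[Pixel]:
    """Return the first opaque pixel in front-to-back order, or None.

    A hardware priority encoder looks at every input at once; the loop
    here only models the outcome, not the timing.
    """
    for pixel in front_to_back:
        _name, index = pixel
        if index != 0:
            return pixel
    return None  # every input transparent: the backdrop color would show


def blend(main: int, sub: int, mode: str) -> int:
    """Blend two 5-bit color channels (0..31), clamping to that range."""
    if mode == "add":
        return min(main + sub, 31)
    if mode == "average":
        return (main + sub) // 2
    if mode == "subtract":
        return max(main - sub, 0)
    raise ValueError(f"unknown blend mode: {mode}")


# Five inputs for one pixel position: the sprite stream plus four backgrounds.
inputs = [("OBJ", 0), ("BG1", 0), ("BG2", 5), ("BG3", 2), ("BG4", 0)]
winner = priority_select(inputs)
```

With these inputs the sprite and BG1 pixels are transparent, so `winner` is the BG2 pixel. A "double-pumped" encoder would simply produce two such winners per pixel, which could then be fed through something like `blend` channel by channel.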

Sega instead chose to keep its Genesis VDP in one chip, which limited the palette memory it could address, and pushed more advanced features like texture compression and affine mapping out to a video coprocessor that was delayed until the Sega CD.

Post by srg320 »

tepples wrote:This priority encoder is double-pumped, meaning it can return two results per pixel
Does Mode 7 also double-pump a single multiplier to get its two multiplication results, or does it use two separate multipliers?

Thanks, very helpful information.

Post by creaothceann »

ittyBittyByte wrote:So how does the PPU actually convert sprites and background into pixels? Does it directly loop through each individual pixel on each line, then loop through tiles/sprites to calculate the final pixel that should end up in that spot? Or does it loop through each tile or sprite, draw a row of pixels all at once (assuming there isn't a pixel there already)? Both of those sound like they'd be a very intensive process, especially for such old hardware; there's just so many pixels (and layers!). Not to even mention the byte or word of VRAM data -> bits -> pixel value + palette -> final color conversion, which sounds time-consuming as well considering the sheer amount of pixels on screen. So how is everything blitted onto the screen so efficiently?
There is no framebuffer holding the rendered picture. Each pair of the 512 pixels per line is rendered and output on the fly as the electron beam scans over the screen.

Background tiles are loaded from VRAM a few tiles in advance. The sprites that are visible on a line are determined outside of HBLANK and, as tepples said, loaded and rendered into a line buffer (ordered by their index, not their priority) during HBLANK.
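To make the sprite line buffer a bit more concrete, here is a small Python sketch. The record layout (`SpriteRow`), the function name, and the rule that a lower-index sprite stays in front are assumptions for illustration; the real buffer also carries palette and priority bits alongside each pixel.

```python
from typing import List, Tuple

LINE_WIDTH = 256

# Hypothetical record: (x position, palette indices of the sprite row that
# intersects the current scanline).
SpriteRow = Tuple[int, List[int]]


def render_sprite_line(sprites_in_index_order: List[SpriteRow]) -> List[int]:
    """Fill a one-line buffer; index 0 means transparent / empty.

    Sprites are visited in index order and a pixel is written only where
    the buffer is still empty, so a lower-index sprite stays in front.
    """
    line = [0] * LINE_WIDTH
    for x, row in sprites_in_index_order:
        for offset, index in enumerate(row):
            pos = x + offset
            if 0 <= pos < LINE_WIDTH and index != 0 and line[pos] == 0:
                line[pos] = index
    return line


# Sprite 0 overlaps sprite 1 by two pixels and has one transparent pixel.
buffer = render_sprite_line([(10, [1, 0, 1, 1]), (12, [2, 2, 2, 2])])
```

Here `buffer[10:16]` ends up as `[1, 0, 1, 1, 2, 2]`: sprite 0 wins where the two overlap, and its transparent pixel at position 11 stays empty because no other sprite covers it.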
ittyBittyByte wrote:Also, clipping: transparent pixels. Is it as simple as if (pixel_value != 0) draw_pixel() else continue? Retro Game Mechanics Explained mentions clipping behavior here (without going into any real detail); it seems to decide which sprite pixels to render and which to "clip away" in a convoluted way.
Clipping is not transparency. Transparency means that the output of a background layer's renderer controls whether that layer's result makes it into the final output. Clipping, on the other hand, makes the pixel output black; it basically controls whether the colors loaded from CGRAM are blocked or not.
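The difference can be sketched in a few lines of Python. This is a simplified model, not the actual circuit: `output_color`, the flat `cgram` list, and the `clip` flag are hypothetical names, and palette indices are assumed to point directly into CGRAM.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]  # 5-bit red, green, blue
BLACK: Color = (0, 0, 0)


def output_color(layer_indices: List[int], cgram: List[Color],
                 clip: bool) -> Color:
    """Pick a palette entry for one pixel, then optionally clip it.

    Transparency: a layer whose index is 0 is skipped, so the next layer
    (or the backdrop, assumed here to be CGRAM entry 0) shows through.
    Clipping: the looked-up color is replaced by black afterwards.
    """
    chosen = 0  # backdrop
    for index in layer_indices:  # front-to-back order
        if index != 0:
            chosen = index
            break
    color = cgram[chosen]
    return BLACK if clip else color


cgram = [(1, 1, 1), (31, 0, 0), (0, 31, 0)]
visible = output_color([0, 2], cgram, clip=False)
clipped = output_color([0, 2], cgram, clip=True)
```

In this model `visible` is the green entry `(0, 31, 0)` because the front layer is transparent, while `clipped` is black even though the same layer won: transparency decides which layer is chosen, clipping decides whether its color gets through.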
ittyBittyByte wrote:how is it so efficient?
The amount of data to process is much less than what is usual today: the graphics are mostly just indices into a palette. Transparency is just a palette index of zero, which is very easy to do. Tiles are composed of bitplanes (bit 0 of every pixel in a row is stored together in one byte, then bit 1, bit 2, ...), so for the programmer, expanding a 3bpp tile from ROM to a 4bpp tile in VRAM is as easy as adding one byte per 8-pixel row.
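A short Python sketch of the bitplane idea and the 3bpp-to-4bpp expansion follows. The layouts are assumptions stated in the comments (a 24-byte 3bpp tile with planes 0/1 interleaved per row followed by 8 plane-2 bytes, and a 32-byte 4bpp tile with planes 2/3 interleaved in its second half); the function names are made up for the example.

```python
from typing import List


def expand_3bpp_to_4bpp(tile: bytes) -> bytes:
    """Expand one 24-byte 3bpp tile to a 32-byte 4bpp tile.

    Assumed layout: 16 bytes of interleaved plane 0/1 rows, then 8 bytes
    of plane 2 rows. The 4bpp layout interleaves planes 2 and 3, so a zero
    plane-3 byte is added after each plane-2 byte.
    """
    if len(tile) != 24:
        raise ValueError("a 3bpp tile is expected to be 24 bytes")
    out = bytearray(tile[:16])
    for plane2 in tile[16:]:
        out += bytes([plane2, 0])
    return bytes(out)


def decode_row_4bpp(tile: bytes, row: int) -> List[int]:
    """Return the 8 palette indices of one row; bit 7 is the leftmost pixel."""
    planes = (tile[2 * row], tile[2 * row + 1],
              tile[16 + 2 * row], tile[17 + 2 * row])
    pixels = []
    for bit in range(7, -1, -1):
        index = 0
        for n, plane in enumerate(planes):
            index |= ((plane >> bit) & 1) << n
        pixels.append(index)
    return pixels


# Row 0: plane 0 = 0b10000000, plane 1 = 0b11000000, plane 2 = 0b00000001.
tile_3bpp = bytes([0x80, 0xC0] + [0] * 14 + [0x01] + [0] * 7)
tile_4bpp = expand_3bpp_to_4bpp(tile_3bpp)
row0 = decode_row_4bpp(tile_4bpp, 0)
```

For this test tile, `row0` is `[3, 2, 0, 0, 0, 0, 0, 4]`: the expansion only inserted zero bytes, and each pixel value is assembled by taking one bit from each plane.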

The graphics system is basically just advancing counters and shifting groups of bits around. When speed is important, things are done in parallel by duplicated hardware. For example the 4 background layer pixels for the current screen position are most likely calculated all at once.
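The "shifting groups of bits around" part can be pictured with a toy shift-register model in Python. `PlaneShifter` is a hypothetical class for a 2bpp layer, and running four of them in a list comprehension only stands in for hardware that would do the same work in parallel.

```python
class PlaneShifter:
    """Two 8-bit shift registers that emit one 2bpp pixel per clock."""

    def __init__(self) -> None:
        self.plane0 = 0
        self.plane1 = 0

    def load(self, plane0: int, plane1: int) -> None:
        """Latch one row of tile data (one byte per bitplane)."""
        self.plane0 = plane0 & 0xFF
        self.plane1 = plane1 & 0xFF

    def clock(self) -> int:
        """Output the leftmost pixel, then shift both planes left by one."""
        pixel = (((self.plane1 >> 7) & 1) << 1) | ((self.plane0 >> 7) & 1)
        self.plane0 = (self.plane0 << 1) & 0xFF
        self.plane1 = (self.plane1 << 1) & 0xFF
        return pixel


# Four background layers, each with its own duplicated shifter.
layers = [PlaneShifter() for _ in range(4)]
for shifter in layers:
    shifter.load(0b10100000, 0b11000000)  # same test pattern on every layer

# Each "clock" yields one pixel from every layer at the same screen position.
first_three = [[s.clock() for s in layers] for _ in range(3)]
```

With this test pattern `first_three` is `[[3, 3, 3, 3], [2, 2, 2, 2], [1, 1, 1, 1]]`. No per-pixel arithmetic is needed; the pixel value just falls out of the top bits as the registers shift.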

Remember that the hardware is basically just one big circuit. The bits of the data stored in hardware registers are hooked up directly to switches (transistors) that open and close certain lines, directing the power flowing from VCC to ground while the system has some time (one of the two clock phases of a master clock cycle) to stabilize the voltages. This is what the BG pixel combiner could look like:

EDIT: incorrect drawing removed
Attachments
BG combiner (v2).pdf