ittyBittyByte wrote:So how does the PPU actually convert sprites and background into pixels? Does it directly loop through each individual pixel on each line, then loop through tiles/sprites to calculate the final pixel that should end up in that spot? Or does it loop through each tile or sprite, draw a row of pixels all at once (assuming there isn't a pixel there already)? Both of those sound like they'd be a very intensive process, especially for such old hardware; there's just so many pixels (and layers!). Not to even mention the byte or word of VRAM data -> bits -> pixel value + palette -> final color conversion, which sounds time-consuming as well considering the sheer amount of pixels on screen. So how is everything blitted onto the screen so efficiently?
There is no framebuffer holding the rendered picture. Each pair of the 512 pixels per line is rendered and output on-the-fly as the electron beam scans over the screen.
Background tiles are loaded from VRAM a few tiles in advance. The sprites that are visible on a line are determined outside of HBLANK and, as tepples said, loaded and rendered into a line buffer (ordered by their index, not their priority) during HBLANK.
Also, clipping: transparent pixels. Is it as simple as if (pixel_value != 0) draw_pixel() else continue? Retro Game Mechanics Explained mentions clipping behavior here (without going into any real detail), it seems to decide which sprite pixels to render and which to "clip away" in a convoluted way.
Clipping is not transparency. The latter means that the output of one background layer renderer controls if this result makes it into the final output. Clipping on the other hand makes the pixel output black; it basically controls if the colors loaded from CGRAM are blocked or not.
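To make the difference concrete, here is a rough Python sketch of the two steps as described above. It is only a toy model, not the real circuit: `resolve_pixel`, its parameters, and the idea that each layer hands over a ready-made CGRAM index are my own simplifications (on real hardware the tile's palette number also goes into the index).

```python
def resolve_pixel(layer_indices, cgram, clip_to_black):
    """Toy model of transparency vs. clipping for one screen pixel.

    layer_indices: palette indices from each layer, highest priority first
                   (hypothetical input; assumed to index CGRAM directly).
    cgram:         list of colors; entry 0 is used as the backdrop here.
    clip_to_black: True if the clipping logic blocks the color output.
    """
    # Transparency: a layer only contributes if its index is non-zero.
    index = 0  # fall back to the backdrop if every layer is transparent
    for candidate in layer_indices:
        if candidate != 0:
            index = candidate
            break

    color = cgram[index]

    # Clipping: happens after the lookup and simply blocks the color,
    # so the output is black no matter which layer won.
    return 0x0000 if clip_to_black else color
```

So transparency decides which layer's index reaches the palette lookup, while clipping decides whether the looked-up color gets out at all.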
ittyBittyByte wrote:how is it so efficient?
The amount of data to process is much less than what is usual today: the graphics are mostly just indices into a palette. Transparency is just a palette index of zero, which is very easy to check. Tiles are composed of bitplanes (the 0th, 1st, 2nd, ... bits of a pixel are stored together), so for the programmer, expanding a 3bpp tile from ROM to a 4bpp tile in VRAM is as easy as adding one byte per 8-pixel row.
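A small Python sketch of the bitplane idea, in case it helps. The function names are made up for illustration, and I am assuming the leftmost pixel sits in bit 7 of each plane byte; the sketch also ignores the order in which the plane bytes are actually laid out in VRAM and just takes one byte per plane for a single 8-pixel row.

```python
def decode_row(plane_bytes):
    """Turn one 8-pixel row of bitplane bytes into 8 palette indices.

    plane_bytes[n] holds bit n of all 8 pixels in the row
    (assumption: bit 7 is the leftmost pixel).
    """
    pixels = []
    for x in range(8):
        shift = 7 - x
        index = 0
        for plane, byte in enumerate(plane_bytes):
            # Take this pixel's bit from the plane and put it at bit `plane`.
            index |= ((byte >> shift) & 1) << plane
        pixels.append(index)
    return pixels


def expand_3bpp_row_to_4bpp(row):
    """Expand one 3bpp row to 4bpp by appending an all-zero fourth plane."""
    if len(row) != 3:
        raise ValueError("expected exactly 3 bitplane bytes")
    return list(row) + [0x00]
```

Decoding the expanded row gives the same indices as decoding the 3bpp row, because the added plane only contributes zero bits.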
The graphics system is basically just advancing counters and shifting groups of bits around. When speed is important, things are done in parallel by duplicated hardware. For example the 4 background layer pixels for the current screen position are most likely calculated all at once.
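As a loose illustration of "advancing counters and shifting bits", here is a Python model of one background layer feeding pixels out of per-plane shift registers. This is a guess at the general shape of such logic, not the documented PPU design: `scan_pixels`, the 8-bit register width, and reloading exactly every 8 pixels are all assumptions of mine.

```python
def scan_pixels(tile_rows):
    """Yield palette indices for one scanline of one layer.

    tile_rows: one entry per tile on the line, each a list of bitplane
               bytes for that tile's current row (hypothetical input).
    """
    for plane_bytes in tile_rows:
        # Reload: one 8-bit shift register per bitplane.
        regs = list(plane_bytes)
        # Pixel counter: 8 pixels until the next reload.
        for _ in range(8):
            index = 0
            for plane in range(len(regs)):
                # The top bit of each register is this pixel's bit `plane`.
                index |= ((regs[plane] >> 7) & 1) << plane
                # Shift left so the next pixel's bit moves to the top.
                regs[plane] = (regs[plane] << 1) & 0xFF
            yield index
```

In software the planes are visited one after another; in hardware all registers would shift on the same clock edge, which is where the parallelism comes from.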
Remember that the hardware is basically just one big circuit. The bits of the data stored in hardware registers are hooked up directly to switches (transistors) that open and close certain lines, directing the power flowing from VCC to ground while the system has some time (one of the two clock phases of a master clock cycle) to stabilize the voltages. This is what the BG pixel combiner could look like:
EDIT: incorrect drawing removed