For Lizard, I handled this by dividing the screen into halves and quarters, and using that as a sort of parity for whether to draw a tile or not. I'll talk about this in one dimension, but you could do the same process independently in both dimensions.
The 16-bit position that goes in is generally the centre of the object, partly because that makes flipping simple, but the main requirement here is that the sprite should not extend more than 64 pixels to either side of this centre position. (Internally I call this centre position a "pivot point", being used to that term from 3D animation.)
In this example I will use ^ to represent an exclusive-or operation (EOR).
Step 1: Figure out its relationship to the screen. Subtract the camera position from the pivot to find the screen-relative position, with a signed 16-bit result X. This leaves a few categories of where it might appear:
1. X < -64 or X >= 256+64 --- completely offscreen, return early (far outside screen)
2. 64 <= X < 192 --- completely onscreen, draw with no culling (middle two quadrants)
3. 0 <= X < 64 or 192 <= X < 256 --- onscreen but could be partially culled (outer two quadrants)
4. -64 <= X < 0 or 256 <= X < 320 --- offscreen but could be partially onscreen, i.e. inverse culling (two quadrants adjacent to visible screen)
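As a rough sketch of the four cases (in Python rather than 6502, and not the actual Lizard code), the classification could look like this. The function name classify and its return format are made up for illustration, and the thresholds assume a 256-pixel-wide screen split into 64-pixel quadrants.

```python
def classify(pivot_x, camera_x):
    """Return (case, X) for a pivot and camera given as 16-bit world positions."""
    # Emulate a signed 16-bit subtraction: X = pivot - camera.
    X = (pivot_x - camera_x) & 0xFFFF
    if X >= 0x8000:
        X -= 0x10000
    if X < -64 or X >= 256 + 64:
        return 1, X   # far offscreen: skip the whole metasprite
    if 64 <= X < 192:
        return 2, X   # middle two quadrants: draw every tile
    if 0 <= X < 256:
        return 3, X   # outer quadrants: pivot onscreen, tiles may be culled
    return 4, X       # just offscreen: some tiles may still be visible
```

The checks are ordered so that each later test only has to rule out what the earlier ones left over.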
At this point, reduce X to an 8-bit value x by keeping only its low byte. 16-bit math is no longer required.
The high bit of x (i.e. representing x >= 128) will become the parity we will use for culling. We want to create a parity value p so that if the high bit of x ^ p is 0, the position is considered offscreen (culled), and if it is 1, the position is onscreen.
So, if X was onscreen (case 3), p = x ^ $80. If it was offscreen (case 4), p = x. Store p on the zero page for easy use.
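A minimal sketch of the parity setup, again in Python for illustration only. The names parity and high_bit are hypothetical, and pivot_onscreen is assumed to be True for case 3 and False for case 4.

```python
def parity(X, pivot_onscreen):
    """Return (x, p) for a screen-relative X that fell into case 3 or 4."""
    x = X & 0xFF                            # keep only the low byte
    p = (x ^ 0x80) if pivot_onscreen else x
    return x, p

def high_bit(v):
    """Return bit 7 of an 8-bit value."""
    return (v >> 7) & 1

x, p = parity(10, True)      # case 3: pivot at X = 10
print(high_bit(x ^ p))       # 1, the pivot itself counts as onscreen
x, p = parity(-30, False)    # case 4: pivot at X = -30
print(high_bit(x ^ p))       # 0, the pivot itself counts as offscreen
```

Only the high bit of p ever matters to the test, so the low seven bits are just along for the ride.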
Step 2: Cull the tiles of the metasprite. (Cases 3 and 4 only.)
For each tile, add its signed 8-bit position to x. I'll call this result u. The result is still 8-bit; we never need to go back to 16 bits, because the only additional information we needed was p.
So, calculate u ^ p and inspect the high bit of the result (BMI/BPL). If the high bit is 1, add the tile to your OAM buffer. If it's 0, skip it (culled).
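The per-tile test can be sketched like this (Python for illustration; visible_tiles is a made-up name, and the tile offsets are assumed to be signed values in the range -64..63 relative to the pivot).

```python
def visible_tiles(x, p, offsets):
    """Yield (offset, u) for each tile whose parity check says 'onscreen'."""
    for d in offsets:                  # d is a signed offset in -64..63
        u = (x + d) & 0xFF             # 8-bit add; wrapping is intentional
        if (u ^ p) & 0x80:             # high bit set: the BMI branch
            yield d, u

# Case 3: pivot at X = 10, so x = 10 and p = 10 ^ 0x80.
print(list(visible_tiles(10, 10 ^ 0x80, [-16, -8, 0, 8])))
# [(-8, 2), (0, 10), (8, 18)]

# Case 4: pivot at X = -4, so x = 252 and p = 252.
print(list(visible_tiles(252, 252, [-16, -8, 0, 8])))
# [(8, 4)]
```

In the first call the tile at offset -16 wraps around to 250 and is rejected; in the second, only the tile that wraps back onto the screen survives.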
That's basically it. The whole point is doing 16-bit operations only once (on the pivot point). From there, you take advantage of the constraint that a sprite is never wide enough to cover more than half of the screen's width (two quadrants' worth), so the leftover information needed to decide onscreen/offscreen can be encoded in that single parity.
Even differentiating the 4 cases doesn't require more 16-bit compares beyond the initial pivot - camera subtraction (i.e. you can obtain the quadrant from the high 2 bits of the low byte of the result, and a little bit of logic on the high byte). I didn't outline all of that here to keep it simple, but really there's only one 16-bit subtract (per axis) for the whole metasprite.
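One way that byte-level classification might work, sketched in Python. This is a guess at the kind of logic described, not the routine from Lizard; classify_bytes is a hypothetical name, and hi and lo are assumed to be the high and low bytes of the signed 16-bit pivot - camera result.

```python
def classify_bytes(hi, lo):
    """Classify a 16-bit pivot - camera result given as two bytes."""
    quadrant = lo >> 6                 # top two bits of the low byte: 0..3
    if hi == 0x00:                     # 0 <= X < 256: pivot is onscreen
        return 2 if quadrant in (1, 2) else 3
    if hi == 0xFF and quadrant == 3:   # -64 <= X < 0
        return 4
    if hi == 0x01 and quadrant == 0:   # 256 <= X < 320
        return 4
    return 1                           # everything else is far offscreen
```

Nothing here compares a 16-bit value: it only looks at whether the high byte is $00, $01 or $FF, plus two bits of the low byte.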
Case 1 results in skipping the whole metasprite. Case 2 requires parity to be ignored and just to draw all tiles; this can either be a separate routine, or maybe an extra ORA with a second flag to force the result to always say "onscreen". Cases 3 and 4 can be rendered with the same code: the input p controls which half of the 8-bit "screen" (reduced by wrapping) is considered "onscreen" or "offscreen".
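A sketch of the shared routine with the extra ORA flag, in Python for illustration. The name draw_metasprite and the force parameter are assumptions: force is taken to be $80 for case 2 and $00 for cases 3 and 4, and the returned list stands in for the OAM buffer.

```python
def draw_metasprite(x, p, force, offsets):
    """Collect 8-bit tile positions; force is 0x80 for case 2, else 0x00."""
    oam = []
    for d in offsets:
        u = (x + d) & 0xFF
        if ((u ^ p) | force) & 0x80:   # EOR, then ORA with the force flag
            oam.append(u)
    return oam

# Case 2: pivot at X = 100, every tile is drawn whatever p holds.
print(draw_metasprite(100, 0, 0x80, [-64, 0, 63]))     # [36, 100, 163]
# Case 3: pivot at X = 10, the tile at offset -16 is culled.
print(draw_metasprite(10, 10 ^ 0x80, 0x00, [-16, 0]))  # [10]
```

The trade-off is one extra instruction per tile in exchange for not maintaining a second copy of the loop.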