Culling

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

Post Reply
User avatar
bleubleu
Posts: 108
Joined: Wed Apr 04, 2018 7:29 pm
Location: Montreal, Canada

Culling

Post by bleubleu »

Hi.

My objects keep their position in 16-bit, world space. When it comes time to draw them, I simply subtract the camera position and draw the metasprite.

how do you guys handle culling off-screen objects? There seems to be 2 levels :
  • First cull the object using a simple bounding box, to see if the whole thing is off camera.
  • If it is at least partially on screen, draw it, and cull the individual 8x8 tiles to avoid stuff wrapping around the screen.
Do you guys have a clever way of doing this while minimizing the number of 16-bit operations (ideally after the camera transform, everything would be back to 8-bit...).

-Mat
User avatar
Kasumi
Posts: 1293
Joined: Wed Apr 02, 2008 2:09 pm

Re: Culling

Post by Kasumi »

Assume the position is the top left corner. Assume the widest or tallest an object can be is 256 pixels. Assume that sprite offsets are always positive. (If you need negative offsets, add the maximum negative offset before drawing, and subtract it after drawing. Then offset your offset data accordingly so it can be exclusively positive.)

Subtract the camera position from the object's position and store the low byte. (The high byte is in A. I guess store this too if you don't want a lot of duplicated similar code.)

If the high byte is zero, add the width of the metasprite to the low byte. If the carry is clear, individual clipping is not needed for the sprites in that dimension. It is fully on screen for that dimension. Otherwise it is partially off screen on the right.

If the high byte is 255, add the width of the metasprite to the low byte. If the carry is set, the object is partially on screen on the left, otherwise it is fully offscreen. Return and do nothing if fully offscreen.

If the high byte was anything but 0 or 255, the object is fully offscreen. Return and do nothing.

The high byte check could look like this:

Code: Select all

lda OBJxlow,x
sec
sbc scrollxlow
sta templow

lda OBJxhigh,x
sbc scrollxhigh
beq partiallyonscreenleft
cmp #255
bne fullyoffscreen
lda templow
clc
adc OBJwidth,x
bcc fullyoffscreen
;If here, object is partially on screen on the right


partiallyonscreenleft:
lda templow
clc
adc OBJwidth,x
bcc fullyonscreenx
;If here, object is partially on screen on the left


fullyonscreenx:
;If here, the object is fully on screen in the X dimension

fullyoffscreen:
rts
If you know the object is fully onscreen, you can just add and store your offsets.
If you know the object is partially on screen with a high byte of zero, you add the low byte to the offset, and branch on carry set to not drawing the sprite.
If you know the object is partially on screen with a high byte of 255, you add the low byte to the offset, and branch on carry clear to not drawing the sprite.

So: You use the high byte of the initial 16bit subtract to find out if the object is fully on screen, fully offscreen, partially off screen on the right, or partially off screen on the left.
If fully offscreen, you don't even need to check the other dimension. No drawing should happen.
If fully on screen, you only need to add the low byte to the offset and store it. No high byte check at all.
If partially off screen on the left, you need to add the low byte to the offset but only store if the carry is set.
If partially on screen on the right, you need to add the low byte to the offset but only store if the carry is clear.

Obviously this requires multiple drawing algorithms for the fastest possible result. I had two. One for fully on screen (the high byte in both dimensions was zero, and there was no carry set after adding the width for either) and one for partially on screen. (Which meant I had to do the 16 bit add even if one of the dimensions was fully on screen.) I figured that was fine.

Edit: If you don't have duplicate routines for each type of partial offscreen, you still have to do a 16bit add for every sprite, but you just do a non zero branch.

Code: Select all

	lda [reserved4],y;X Offset
	clc
	adc <reserved0;X Position
	sta OAM+3,x
	
	lda <reserved1
	adc #$00
	bne dms.partial.skipsprite.twoiny
Edit2: Knowing the direction of partially offscreen saves you the load of the high byte and the add. (You just need to branch depending on carry from the low byte.)
Fully on screen is just this:

Code: Select all

;35 bytes
	iny
	
	lda [reserved4],y;Y position
	clc
	adc <reserved2
	sta OAM,x

	iny
	
	lda [reserved4],y;X Position
	;clc
	adc <reserved0
	sta OAM+3,x
	
	iny

	
	lda [reserved4],y
	;clc
	adc <reserved7;This should guarantee a clear carry
	sta OAM+1,x
	iny
	
	lda [reserved4],y
	sta OAM+2,x
	
	txa
	;clc;guaranteed clear above
	adc #4
	tax;Carry not guaranteed anything after that add since oampos wraps
Which I also unrolled.
Last edited by Kasumi on Tue Apr 24, 2018 6:18 pm, edited 2 times in total.
User avatar
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Culling

Post by tokumaru »

I always assume the object is on screen, and do the following:
  • Subtract the camera's coordinates from the object's to find the "center" of the sprite, while also adjusting for flipping and the excess-128 format that will be used later;
  • Set up masks to invert the sprite offsets in case of flipping (EOR masks that are either $00 or $FF);
  • Iterate over the individual sprites of the metasprite, EOR'ing the excess-128 offsets against the respective masks and adding the result to the reference coordinates;
  • A final coordinate with a high byte other then $00 means the sprite is off screen, so just move on to the next sprite;
  • If the sprite is on screen, copy the remaining bytes (attributes - EOR'ed against another flipping mask - and the tile index) to the current OAM position and advance to the next position;
In the end I compare the OAM index to what it was in the beginning, to detect whether any sprites were output, and use this information when deciding whether to deactivate objects (I have two conditions for this to happen: the object has to be off screen and its spawn point outside of the active area).

My biggest tip for reducing the overhead of 16-bit math is using the excess-128 format in the metasprite definitions, so you don't have to worry about selecting/forming a high byte for them (it will always be 0 in excess-128 format, even when there's flipping), but you still need the 16-bit result to tell whether the resulting coordinates are on screen (you can obviously throw the high byte away after detecting that, there's no need to save it anywhere).

Of course things will have to be a little different if your metasprites are stored in grids rather than free sprites.

EDIT: Kasumi's method has some similarities to mine, but there are a lot of things that differ depending on the format in which you want to store the data and the limitations you impose on how the sprites can be formed. I personally don't like to use the top left corner as the reference point because I find that too restrictive. For example, you can't easily make a character that grows upward (such as James Pond) because you'd have to awkwardly manipulate the object's position in addition to its size while it's actually standing still. I don't think this is always good for physics either, specially if you're doing any sort of rotation - I prefer to have the hotspot at the center of the character's feet, so I can easily rotate around that when running up steep slopes and such. The best thing about metasprites made of free sprites IMO is that the hotspot can be anywhere really, so you can use whatever makes the most sense for each character, taking into account how they behave and interact with the level map.
User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Culling

Post by rainwarrior »

For Lizard, I handled this by dividing the screen into halves and quarters, and using that as sort of a parity for whether draw a tile or not. I'll talk about this in one dimension, but you could do the same process independently in both dimensions.

The 16-bit position that goes in is generally the centre of the object, partly because that makes flipping simple, but the main requirement here is that the sprite should not extend more than 64 pixels -/+ this centre position. (Internally I call this centre position a "pivot point", being used to that term from 3D animation.)

In this example I will use ^ to represent an exclusive-or operation (EOR).


Step 1: Figure out its relationship to the screen. Subtract pivot from camera to find the screen relative position, with 16 bit result X. This leaves a few categories of where it might appear:

1. X < -64 or X >= 256+64 --- completely offscreen, return early (far outside screen)
2. 64 =< X < 196 --- completely onscreen, draw with no culling (middle two quadrants)
3. X < 64 or X >= 196 --- onscreen but could be partially culled (outer two quadrants)
4. X < 0 or X >= 256 --- offscreen but could be partially onscreen, i.e. inverse culling (two quadrants adjacent to visible screen)

At this point, reduce X to an 8-bit value x. 16-bit math is no longer required.

The high bit of x (i.e. representing x >= 128) will become the parity we will use for culling. We want to create a parity value p so that if the high bit of x ^ p = 0, the position is considered offscreen (culled), and if it = 1, the position is onscreen.

So, if X was onscreen (case 3), p = x ^ $80. If it was offscreen (case 4), p = x. Store p on the zero page for easy use.


Step 2: Cull the metatiles. (Case 3 and 4 only.)

For each tile, add its signed 8-bit position to x. I'll call this result u. The result is still 8-bit; we never need to go back to 16-bits, the only additional information we needed was p.

So, calculate u ^ p and inspect the high bit of the result (BMI/BPL). If the high bit is 1, add the tile to your OAM buffer. If it's 0, skip it (culled).


That's basically it. The whole point is doing 16-bit operations only once (on the pivot point), and then from there taking advantage of the constraint that sprites won't ever be wide enough to end up on more than half of the quadrants of the screen to encode the leftover information needed to decide onscreen/offscreen, represented with that single parity.

Even differentiating the 4 cases doesn't require more 16-bit compares beyond the intial pivot - camera subtraction (i.e. you can obtain quadrant from the high 2 bits of the low 8-bit result, and a little bit of logic on the high result). I didn't outline all of that here to keep it simple, but really there's only one 16-bit subtract (per axis) for the whole metasprite.

Case 1 resulted in skipping the whole metasprite. Case 2 requires parity to be ignored and just to draw all tiles; this can either be a separate routine, or maybe an extra ORA with a second flag to force the result to always say "onscreen". Case 3 and 4 can be rendered with the same code, the input p controls which half of the 8-bit "screen" (reduced by wrapping) is considered "onscreen" or "offscreen".
User avatar
bleubleu
Posts: 108
Joined: Wed Apr 04, 2018 7:29 pm
Location: Montreal, Canada

Re: Culling

Post by bleubleu »

Thank a lot everybody.

I think ill be able to figure out an algorithm using some of the ideas you suggested. rainwarrior's suggestion seems more in-line with how my data is stored right now and is simple to implement.

As a somewhat related question, something obvious i *wasn't* doing (since all i had was one guy in a room) is clear the OAM every frame.

What do you guys do? Do you take the lazy way and clear it at the beginning of the frame? Of clear the unused entries at the end? Whats the fastest OAM-clearning loop you guys can think of ? ;)

-Mat
User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Culling

Post by rainwarrior »

I have an index to the OAM buffer as it gets filled up. It starts at 0. Every tile added increases it by 4, etc. This tells me exactly where the leftover OAM that needs to be cleared is. So, when I'm done adding sprites, I call a "finish" routine that just puts Y=$FF in all the remaining tiles.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Culling

Post by tepples »

Everything I've made since Zap Ruder has used the following finish routine to clear remaining OAM entries (source: ppuclear.s).

Code: Select all

OAM=$0200
;;
; Moves all sprites starting at address X (e.g, $04, $08, ..., $FC)
; below the visible area.
; X is 0 at the end.
.proc ppu_clear_oam

  ; First round the address down to a multiple of 4 so that it won't
  ; freeze should the address get corrupted.
  txa
  and #%11111100
  tax
  lda #$FF  ; Any Y value from $EF through $FF will work
loop:
  sta OAM,x
  inx
  inx
  inx
  inx
  bne loop
  rts
.endproc
On the Game Boy I use something similar:

Code: Select all

;;
; Moves sprites in the display list from SOAM+a through
; SOAM+$9C offscreen by setting their Y coordinate to 0, which is
; completely above the screen top (16).
lcd_clear_oam::
  ; Destination address in shadow OAM
  ld h,high(SOAM)
  and $FC
  ld l,a

  ; iteration count
  rrca
  rrca
  add 256 - 40
  ld c,a

  xor a
.rowloop:
  ld [hl+],a
  inc l
  inc l
  inc l
  inc c
  jr nz, .rowloop
  ret
User avatar
bleubleu
Posts: 108
Joined: Wed Apr 04, 2018 7:29 pm
Location: Montreal, Canada

Re: Culling

Post by bleubleu »

rainwarrior wrote:I have an index to the OAM buffer as it gets filled up. It starts at 0. Every tile added increases it by 4, etc. This tells me exactly where the leftover OAM that needs to be cleared is. So, when I'm done adding sprites, I call a "finish" routine that just puts Y=$FF in all the remaining tiles.
Yeah that's what i just did. If i were really motivated I would keep track how how far I need to clear based on the previous frame, but that's not gonna happen tonight. You culling algorithm works great. It's all implemented and working. Real cool stuff.

-Mat
Post Reply