It is currently Fri Feb 22, 2019 6:57 am

 All times are UTC - 7 hours

 Page 1 of 1 [ 8 posts ]
 Print view Previous topic | Next topic
Author Message
 Post subject: CullingPosted: Tue Apr 24, 2018 4:11 pm

Joined: Wed Apr 04, 2018 7:29 pm
Posts: 58
Hi.

My objects keep their position in 16-bit, world space. When it comes time to draw them, I simply subtract the camera position and draw the metasprite.

how do you guys handle culling off-screen objects? There seems to be 2 levels :

• First cull the object using a simple bounding box, to see if the whole thing is off camera.
• If it is at least partially on screen, draw it, and cull the individual 8x8 tiles to avoid stuff wrapping around the screen.

Do you guys have a clever way of doing this while minimizing the number of 16-bit operations (ideally after the camera transform, everything would be back to 8-bit...).

-Mat

Top

 Post subject: Re: CullingPosted: Tue Apr 24, 2018 6:08 pm

Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1273
Assume the position is the top left corner. Assume the widest or tallest an object can be is 256 pixels. Assume that sprite offsets are always positive. (If you need negative offsets, add the maximum negative offset before drawing, and subtract it after drawing. Then offset your offset data accordingly so it can be exclusively positive.)

Subtract the camera position from the object's position and store the low byte. (The high byte is in A. I guess store this too if you don't want a lot of duplicated similar code.)

If the high byte is zero, add the width of the metasprite to the low byte. If the carry is clear, individual clipping is not needed for the sprites in that dimension. It is fully on screen for that dimension. Otherwise it is partially off screen on the right.

If the high byte is 255, add the width of the metasprite to the low byte. If the carry is set, the object is partially on screen on the left, otherwise it is fully offscreen. Return and do nothing if fully offscreen.

If the high byte was anything but 0 or 255, the object is fully offscreen. Return and do nothing.

The high byte check could look like this:
Code:
lda OBJxlow,x
sec
sbc scrollxlow
sta templow

lda OBJxhigh,x
sbc scrollxhigh
beq partiallyonscreenleft
cmp #255
bne fullyoffscreen
lda templow
clc
bcc fullyoffscreen
;If here, object is partially on screen on the right

partiallyonscreenleft:
lda templow
clc
bcc fullyonscreenx
;If here, object is partially on screen on the left

fullyonscreenx:
;If here, the object is fully on screen in the X dimension

fullyoffscreen:
rts

If you know the object is fully onscreen, you can just add and store your offsets.
If you know the object is partially on screen with a high byte of zero, you add the low byte to the offset, and branch on carry set to not drawing the sprite.
If you know the object is partially on screen with a high byte of 255, you add the low byte to the offset, and branch on carry clear to not drawing the sprite.

So: You use the high byte of the initial 16bit subtract to find out if the object is fully on screen, fully offscreen, partially off screen on the right, or partially off screen on the left.
If fully offscreen, you don't even need to check the other dimension. No drawing should happen.
If fully on screen, you only need to add the low byte to the offset and store it. No high byte check at all.
If partially off screen on the left, you need to add the low byte to the offset but only store if the carry is set.
If partially on screen on the right, you need to add the low byte to the offset but only store if the carry is clear.

Obviously this requires multiple drawing algorithms for the fastest possible result. I had two. One for fully on screen (the high byte in both dimensions was zero, and there was no carry set after adding the width for either) and one for partially on screen. (Which meant I had to do the 16 bit add even if one of the dimensions was fully on screen.) I figured that was fine.

Edit: If you don't have duplicate routines for each type of partial offscreen, you still have to do a 16bit add for every sprite, but you just do a non zero branch.
Code:
lda [reserved4],y;X Offset
clc
sta OAM+3,x

lda <reserved1
bne dms.partial.skipsprite.twoiny

Edit2: Knowing the direction of partially offscreen saves you the load of the high byte and the add. (You just need to branch depending on carry from the low byte.)
Fully on screen is just this:
Code:
;35 bytes
iny

lda [reserved4],y;Y position
clc
sta OAM,x

iny

lda [reserved4],y;X Position
;clc
sta OAM+3,x

iny

lda [reserved4],y
;clc
adc <reserved7;This should guarantee a clear carry
sta OAM+1,x
iny

lda [reserved4],y
sta OAM+2,x

txa
;clc;guaranteed clear above
tax;Carry not guaranteed anything after that add since oampos wraps

Which I also unrolled.

_________________
https://kasumi.itch.io/indivisible

Last edited by Kasumi on Tue Apr 24, 2018 6:18 pm, edited 2 times in total.

Top

 Post subject: Re: CullingPosted: Tue Apr 24, 2018 6:12 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11170
Location: Rio de Janeiro - Brazil
I always assume the object is on screen, and do the following:

• Subtract the camera's coordinates from the object's to find the "center" of the sprite, while also adjusting for flipping and the excess-128 format that will be used later;
• Set up masks to invert the sprite offsets in case of flipping (EOR masks that are either \$00 or \$FF);
• Iterate over the individual sprites of the metasprite, EOR'ing the excess-128 offsets against the respective masks and adding the result to the reference coordinates;
• A final coordinate with a high byte other then \$00 means the sprite is off screen, so just move on to the next sprite;
• If the sprite is on screen, copy the remaining bytes (attributes - EOR'ed against another flipping mask - and the tile index) to the current OAM position and advance to the next position;

In the end I compare the OAM index to what it was in the beginning, to detect whether any sprites were output, and use this information when deciding whether to deactivate objects (I have two conditions for this to happen: the object has to be off screen and its spawn point outside of the active area).

My biggest tip for reducing the overhead of 16-bit math is using the excess-128 format in the metasprite definitions, so you don't have to worry about selecting/forming a high byte for them (it will always be 0 in excess-128 format, even when there's flipping), but you still need the 16-bit result to tell whether the resulting coordinates are on screen (you can obviously throw the high byte away after detecting that, there's no need to save it anywhere).

Of course things will have to be a little different if your metasprites are stored in grids rather than free sprites.

EDIT: Kasumi's method has some similarities to mine, but there are a lot of things that differ depending on the format in which you want to store the data and the limitations you impose on how the sprites can be formed. I personally don't like to use the top left corner as the reference point because I find that too restrictive. For example, you can't easily make a character that grows upward (such as James Pond) because you'd have to awkwardly manipulate the object's position in addition to its size while it's actually standing still. I don't think this is always good for physics either, specially if you're doing any sort of rotation - I prefer to have the hotspot at the center of the character's feet, so I can easily rotate around that when running up steep slopes and such. The best thing about metasprites made of free sprites IMO is that the hotspot can be anywhere really, so you can use whatever makes the most sense for each character, taking into account how they behave and interact with the level map.

Top

 Post subject: Re: CullingPosted: Tue Apr 24, 2018 8:37 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7224
For Lizard, I handled this by dividing the screen into halves and quarters, and using that as sort of a parity for whether draw a tile or not. I'll talk about this in one dimension, but you could do the same process independently in both dimensions.

The 16-bit position that goes in is generally the centre of the object, partly because that makes flipping simple, but the main requirement here is that the sprite should not extend more than 64 pixels -/+ this centre position. (Internally I call this centre position a "pivot point", being used to that term from 3D animation.)

In this example I will use ^ to represent an exclusive-or operation (EOR).

Step 1: Figure out its relationship to the screen. Subtract pivot from camera to find the screen relative position, with 16 bit result X. This leaves a few categories of where it might appear:

1. X < -64 or X >= 256+64 --- completely offscreen, return early (far outside screen)
2. 64 =< X < 196 --- completely onscreen, draw with no culling (middle two quadrants)
3. X < 64 or X >= 196 --- onscreen but could be partially culled (outer two quadrants)
4. X < 0 or X >= 256 --- offscreen but could be partially onscreen, i.e. inverse culling (two quadrants adjacent to visible screen)

At this point, reduce X to an 8-bit value x. 16-bit math is no longer required.

The high bit of x (i.e. representing x >= 128) will become the parity we will use for culling. We want to create a parity value p so that if the high bit of x ^ p = 0, the position is considered offscreen (culled), and if it = 1, the position is onscreen.

So, if X was onscreen (case 3), p = x ^ \$80. If it was offscreen (case 4), p = x. Store p on the zero page for easy use.

Step 2: Cull the metatiles. (Case 3 and 4 only.)

For each tile, add its signed 8-bit position to x. I'll call this result u. The result is still 8-bit; we never need to go back to 16-bits, the only additional information we needed was p.

So, calculate u ^ p and inspect the high bit of the result (BMI/BPL). If the high bit is 1, add the tile to your OAM buffer. If it's 0, skip it (culled).

That's basically it. The whole point is doing 16-bit operations only once (on the pivot point), and then from there taking advantage of the constraint that sprites won't ever be wide enough to end up on more than half of the quadrants of the screen to encode the leftover information needed to decide onscreen/offscreen, represented with that single parity.

Even differentiating the 4 cases doesn't require more 16-bit compares beyond the intial pivot - camera subtraction (i.e. you can obtain quadrant from the high 2 bits of the low 8-bit result, and a little bit of logic on the high result). I didn't outline all of that here to keep it simple, but really there's only one 16-bit subtract (per axis) for the whole metasprite.

Case 1 resulted in skipping the whole metasprite. Case 2 requires parity to be ignored and just to draw all tiles; this can either be a separate routine, or maybe an extra ORA with a second flag to force the result to always say "onscreen". Case 3 and 4 can be rendered with the same code, the input p controls which half of the 8-bit "screen" (reduced by wrapping) is considered "onscreen" or "offscreen".

Top

 Post subject: Re: CullingPosted: Wed Apr 25, 2018 5:26 pm

Joined: Wed Apr 04, 2018 7:29 pm
Posts: 58
Thank a lot everybody.

I think ill be able to figure out an algorithm using some of the ideas you suggested. rainwarrior's suggestion seems more in-line with how my data is stored right now and is simple to implement.

As a somewhat related question, something obvious i *wasn't* doing (since all i had was one guy in a room) is clear the OAM every frame.

What do you guys do? Do you take the lazy way and clear it at the beginning of the frame? Of clear the unused entries at the end? Whats the fastest OAM-clearning loop you guys can think of ?

-Mat

Top

 Post subject: Re: CullingPosted: Wed Apr 25, 2018 5:32 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7224
I have an index to the OAM buffer as it gets filled up. It starts at 0. Every tile added increases it by 4, etc. This tells me exactly where the leftover OAM that needs to be cleared is. So, when I'm done adding sprites, I call a "finish" routine that just puts Y=\$FF in all the remaining tiles.

Top

 Post subject: Re: CullingPosted: Wed Apr 25, 2018 5:33 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21115
Location: NE Indiana, USA (NTSC)
Everything I've made since Zap Ruder has used the following finish routine to clear remaining OAM entries (source: ppuclear.s).

Code:
OAM=\$0200
;;
; Moves all sprites starting at address X (e.g, \$04, \$08, ..., \$FC)
; below the visible area.
; X is 0 at the end.
.proc ppu_clear_oam

; First round the address down to a multiple of 4 so that it won't
; freeze should the address get corrupted.
txa
and #%11111100
tax
lda #\$FF  ; Any Y value from \$EF through \$FF will work
loop:
sta OAM,x
inx
inx
inx
inx
bne loop
rts
.endproc

On the Game Boy I use something similar:
Code:
;;
; Moves sprites in the display list from SOAM+a through
; SOAM+\$9C offscreen by setting their Y coordinate to 0, which is
; completely above the screen top (16).
lcd_clear_oam::
ld h,high(SOAM)
and \$FC
ld l,a

; iteration count
rrca
rrca
ld c,a

xor a
.rowloop:
ld [hl+],a
inc l
inc l
inc l
inc c
jr nz, .rowloop
ret

Top

 Post subject: Re: CullingPosted: Wed Apr 25, 2018 6:02 pm

Joined: Wed Apr 04, 2018 7:29 pm
Posts: 58
rainwarrior wrote:
I have an index to the OAM buffer as it gets filled up. It starts at 0. Every tile added increases it by 4, etc. This tells me exactly where the leftover OAM that needs to be cleared is. So, when I'm done adding sprites, I call a "finish" routine that just puts Y=\$FF in all the remaining tiles.

Yeah that's what i just did. If i were really motivated I would keep track how how far I need to clear based on the previous frame, but that's not gonna happen tonight. You culling algorithm works great. It's all implemented and working. Real cool stuff.

-Mat

Top

 Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending
 Page 1 of 1 [ 8 posts ]

 All times are UTC - 7 hours

#### Who is online

Users browsing this forum: No registered users and 8 guests

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum

Search for:
 Jump to:  Select a forum ------------------ NES / Famicom    NESdev    NESemdev    NES Graphics    NES Music    Homebrew Projects       2018 NESdev Competition       2017 NESdev Competition       2016 NESdev Competition       2014 NESdev Competition       2011 NESdev Competition    Newbie Help Center    NES Hardware and Flash Equipment       Reproduction    NESdev International       FCdev       NESdev China       NESdev Middle East Other    General Stuff    Membler Industries    Other Retro Dev       SNESdev       GBDev    Test Forum Site Issues    phpBB Issues    Web Issues    nesdevWiki