Sprite data caching or reuse?

Drakim
Posts: 97
Joined: Mon Apr 04, 2016 3:19 am

Post by Drakim »

So, after some profiling I've realized that a lot of my precious CPU time is being spent on calculating sprites. Not just the reads and writes, but also all the meta stuff: picking the right palette, doing offsets to read out animation data, flipping the sprite if it's turned around, etc. I've got a neat animation system that I'm very happy with, but it's a little costly.

I've been optimizing all of this to be as fast and clever as possible, but the one thing that really irks me is that most objects' sprites end up being exactly the same each frame, so I'm constantly recalculating the same results over and over.

So I've been trying to implement some sort of caching mechanism, so that I don't have to recalculate everything if nothing has changed. I usually know when things have changed (changed object state, scrolled the background, etc) so I know exactly when to reuse the cache and when to refresh it.

But building an efficient and lightweight cache has proved difficult.

The simplest and fastest way would be to simply reuse my DMA's Sprite RAM, and not clear it every frame, maybe adjusting some x and y positions if the game has scrolled. But all the sprite flickering techniques I know involve scrambling the order of the sprites every frame, which means no object ever gets the same Sprite RAM position twice in a row. This ruins everything.

Next up I tried allocating some more RAM as a temporary buffer, so that objects could put their sprite data there, and then it could be copy-scrambled over to the real Sprite RAM right before the DMA. But, to allow for all 64 sprites that's 256 bytes of RAM down the drain. Ouch. Not sure I want to spend that much memory.

According to the wiki, a "simple OAM cycling technique" can be implemented by using a write to OAMADDR before the DMA transfer. However, because OAMADDR writes also have a "corruption" effect, this technique is not recommended. Also, if the technique works the way I think it does, the OAM cycling would be very crude and might leave objects invisible for several frames.

So, is my quest impossible, or are there some other ideas or techniques? :D
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

Re: Sprite data caching or reuse?

Post by pubby »

Post some code? It's surprising to me that sprites would be so expensive.
Drakim wrote:But, all the sprite flickering techniques I know involve scrambling the order of the sprites every frame
I've never implemented flickering, but I thought you could write OAMADDR before OAMDMA to do the shuffle on the DMA. I don't think you have to reorder the sprites in CPU RAM, but maybe I'm wrong.
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Re: Sprite data caching or reuse?

Post by GradualGames »

Metasprite rendering continues to be the bottleneck for me, too. There are some improvements I could make, but the biggest I got so far was to simply move to 8x16 sprites, halving the number of iterations the meta sprite drawing routines must do. That's been good enough for now and has given me the performance I want for the game I'm building.
Drakim
Posts: 97
Joined: Mon Apr 04, 2016 3:19 am

Re: Sprite data caching or reuse?

Post by Drakim »

pubby wrote:Post some code? It's surprising to me that sprites would be so expensive.
I'll write up an explanation, it's a tad complex so it might take some minutes.
pubby wrote:I've never implemented flickering, but I thought you could write OAMADDR before OAMDMA to do the shuffle on the DMA. I don't think you have to reorder the sprites in CPU RAM, but maybe I'm wrong.
The wiki recommends against this technique, but maybe it's too conservative? Does anybody know the ups and downs in more detail?
Bregalad
Posts: 8055
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Sprite data caching or reuse?

Post by Bregalad »

Drakim wrote:But, all the sprite flickering techniques I know involve scrambling the order of the sprites every frame, which means no object ever gets the same Sprite RAM position twice in a row. This ruins everything.
Well, as usual in computing, and in retro-computing in particular, you have to sacrifice things in order to get the desired features. You should just have two OAM pages: one where the sprites are not shuffled, which is your cache, and one into which you shuffle the sprites from the cache, so they're re-ordered and flicker properly when there are more than 8 per line instead of disappearing. That sounds rather simple to do.
Drakim wrote:Ouch. Not sure I want to spend that much memory.
Well, that's the price for your sprite caching system. You can save memory by caching only some of the 4 parameters if RAM usage is really that much of a problem.
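
A minimal sketch of the two-page idea, assuming the unshuffled cache lives at $0300 and the page sent by OAMDMA at $0200. The addresses, the `oam_start` variable, and the step of 9 sprites per frame are all illustrative choices, not anything taken from an actual game. The cache is copied into the DMA page starting at a destination offset that rotates every frame, which is one simple form of sprite cycling:

```asm
OAM       = $0200   ; page sent by OAMDMA (assumed address)
OAM_CACHE = $0300   ; unshuffled cache page (assumed address)
oam_start = $10     ; hypothetical zero-page variable: rotating destination offset

CopyCacheRotated:
	lda <oam_start
	clc
	adc #36          ; advance 9 sprites (9 * 4 bytes) per frame
	sta <oam_start   ; wraps mod 256 and stays a multiple of 4
	tax              ; X = destination offset in the DMA page
	ldy #0           ; Y = source offset in the cache
CopyCacheLoop:
	lda OAM_CACHE,y
	sta OAM,x
	inx              ; X wraps from $FF to $00, so the copy wraps too
	iny
	bne CopyCacheLoop ; all 256 bytes are copied once Y wraps to 0
	rts
```

Because 9 and 64 share no common factor, the starting slot visits all 64 positions before repeating. As written the loop costs about 16 cycles per byte, so unrolling it or copying only the entries in use would likely be worthwhile.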
Kasumi
Posts: 1293
Joined: Wed Apr 02, 2008 2:09 pm

Re: Sprite data caching or reuse?

Post by Kasumi »

Sprite updates are indeed pretty expensive. Some switch to 8x16 sprites purely so less time is spent rendering an object.

If your game has larger objects, having a separate render routine when you know the object is entirely onscreen can skip a lot of extra logic for checking offscreen per sprite. (Alternatively... don't check for offscreen per sprite.)

Code:

.macro DMSNORMALBODY
	;35 bytes
	iny
	
	lda [reserved4],y;Y position
	clc
	adc <reserved2
	sta OAM,x

	iny
	
	lda [reserved4],y;X Position
	;clc
	adc <reserved0
	sta OAM+3,x
	
	iny

	
	lda [reserved4],y
	;clc
	adc <reserved7;This should guarantee a clear carry
	sta OAM+1,x
	iny
	
	lda [reserved4],y
	sta OAM+2,x
	
	txa
	;clc;guaranteed clear above
	adc #4
	tax;Carry not guaranteed anything after that add since oampos wraps
	.endm
Reserved0 is low X, reserved2 is low Y, reserved7 is a tile offset. You can totally get rid of that if the tiles used to render your object don't "move" in CHR.

Checking offscreen is not much harder, but you lose the guarantee of the clear carry.

Code:

dms.partial.o.loop:;{
	iny
	
	lda [reserved4],y;Y position
	clc
	adc <reserved2
	sta OAM,x
	
	lda <reserved3
	adc #$00
	bne dms.partial.o.yoffscreen

	iny
	
	lda [reserved4],y;X Position
	clc
	adc <reserved0
	sta OAM+3,x
	
	lda <reserved1
	adc #$00
	bne dms.partial.o.skipsprite.twoiny
	
	iny

	
	lda [reserved4],y
	clc
	adc <reserved7
	sta OAM+1,x
	iny
	
	lda [reserved4],y
	and #%11111100;Clear out the palette
	ora <reserved8
	sta OAM+2,x
	
	txa
	;clc;guaranteed clear above
	adc #4
	tax;Carry not guaranteed anything after that add since oampos wraps

	dec <reserved6
	bne dms.partial.o.loop
dms.partial.o.end:
	rts
dms.partial.o.yoffscreen:
	iny;move to next OAM entry
dms.partial.o.skipsprite.twoiny:
	iny
	iny
	
	lda #$FF
	sta OAM,x;Set it offscreen

	;Just repeat what we'd branch to and save a branch
	dec <reserved6
	bne dms.partial.o.loop
	rts;}
Reserved1 is high X, Reserved3 is high Y. Reserved6 is how many sprites are left.

But I made this note as an optimization (untested, so I'm not including it with the code I know works)
Indivisible's rolled drawmetasprite loop can probably be made faster. They end with

dec <reserved6
bne dms.o.loop

But:
cpy <reserved6; (or some other zero page variable, since reserved6 can't really be changed
bne dms.o.loop; without making the other code slower)

cpy on a zero page variable (3 cycles) is 2 cycles faster than dec (5 cycles), and it also ensures a clear carry when the loop begins again.

Basically the setup code should just add <reserved6 *4 to y and store it somewhere. I imagine the reason I didn't is because reserved6 is technically variable (due to the greater than 64 sprite stuff), but it wouldn't really affect this if the loop were set up properly.
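
A sketch of that setup under stated assumptions: `loop_end` is a hypothetical zero-page variable, the address given for `reserved6` is a guess, and Y plus four times the sprite count is assumed not to wrap past 255. The four `iny` instructions stand in for the real loop body, which advances Y four times per sprite:

```asm
reserved6 = $06   ; sprite count, as in the routines above (address assumed)
loop_end  = $11   ; hypothetical zero-page variable: Y value that ends the loop

dms.o.setup:
	sty <loop_end    ; Y = data index before the first sprite
	lda <reserved6
	asl a
	asl a            ; A = count * 4 (four data bytes per sprite)
	clc
	adc <loop_end    ; A = Y + count * 4
	sta <loop_end
dms.o.loop:
	iny              ; placeholder body: the real loop reads one byte
	iny              ; after each of these four increments
	iny
	iny
	cpy <loop_end    ; sets carry only when Y >= loop_end
	bne dms.o.loop   ; branch taken means Y < loop_end, so carry is clear
	rts
```

The setup runs once per metasprite, so its cost is quickly repaid by the 2 cycles saved per sprite plus any `clc` the loop body no longer needs.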
I also have a separate subroutine when I want to do "versatile" things like dynamically changing the palette of every sprite in the object.
Edit: Oh wait, no it's just the one above. That's what the reserved8 thing is. So basically I have a fast and a slow one.

Basically I recommend having a different routine for every case. Usually you don't want to do anything advanced, so at least have one for the fastest possible case. (Guaranteed on screen, no dynamic anything.)

But post some of your advanced code, maybe we can improve it.
Last edited by Kasumi on Fri Jan 12, 2018 11:34 am, edited 1 time in total.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Sprite data caching or reuse?

Post by tokumaru »

The use of OAMADDR with values other than zero is heavily discouraged, since that can result in sprite corruption.

The best method for caching sprites I can think of is indeed using another 256 bytes for a second OAM shadow, so you can alternate between them every frame and copy data from one to the other if the sprites are known to not have changed.

What kind of sprite cycling method are you currently using? Are you willing to change that to accommodate the sprite caching? Maybe you can come up with a solution that swaps individual OAM entries when they need to be kept, and simply overwrites the ones that don't.
Drakim
Posts: 97
Joined: Mon Apr 04, 2016 3:19 am

Re: Sprite data caching or reuse?

Post by Drakim »

Bregalad wrote:
Drakim wrote:Ouch. Not sure I want to spend that much memory.
Well, that's the price for your sprite caching system. You can save memory by caching only some of the 4 parameters if RAM usage is really that much of a problem.
That's fair enough, and it's probably what I will fall back to if no other secret technique pops up. I've been playing around with doing an "in place" shuffle of the original Sprite RAM so that objects have a new position in the buffer every frame, yet still retain their old values. The shuffling process is fairly expensive though, since you gotta shuffle 256 different values.
Drakim
Posts: 97
Joined: Mon Apr 04, 2016 3:19 am

Re: Sprite data caching or reuse?

Post by Drakim »

pubby wrote:Post some code? It's surprising to me that sprites would be so expensive.
Now, I haven't posted any code in this post. It's not like my code is secret, but I'd much rather explain what it does (and why it's slow) instead of posting a big blob of asm and forcing everybody to decrypt what's going on. Also, I haven't commented it yet :oops:

Some games like Super Mario Bros 3 have a lot of restrictions on what game objects can exist where, doing tricks like hardcoding the available palettes and CHR banks to the level. So if this is a Goomba+Koopa Troopa level, you simply can't use the Boo or Thwomp enemies or they will look strange and miscolored, and vice versa.

I've been working on a system to defeat such restrictions, by dynamically loading and unloading CHR and palettes as they are needed. The way things work, when an ingame object is created, it attempts to grab an 8k sprite CHR page and a palette for itself. I use a lot of techniques to maximize their potential like reusing as much as possible, having optional alternative graphics and color schemes, and even splitting palettes in two (and if it's just utterly impossible to fit in, the object simply despawns before it's seen).

But all of this only happens when the object is created, not every frame, so it's not the expensive part. The point is, any object might end up with any of the CHR pages or any of the palettes. So, while SMB3 can optimize its Koopa Troopa drawing routine by always referring to palette 2, my system has to do a lookup to see which of the four palettes my object was assigned.

Then there is animation data, and some extra goodies I've baked in there like allowing small x/y offsets on the sprites or vflip/hflip flags on individual sprites in the meta-sprite. Despite having so much more stuff than SMB3, my system is still faster due to better coding.

Still, I can see the potential for massive gains by reusing those values rather than having to recalculate them every frame.
Last edited by Drakim on Fri Jan 12, 2018 11:40 am, edited 1 time in total.
Drakim
Posts: 97
Joined: Mon Apr 04, 2016 3:19 am

Re: Sprite data caching or reuse?

Post by Drakim »

Kasumi wrote:Sprite updates are indeed pretty expensive...
Thanks for the routines, I'll compare them to my own and see if there are any places I can shave off some cycles.

I do indeed have different drawing routines, with varying levels of functionality. Objects, in their initialization routine, pick one that they know will be enough for them.
Kasumi
Posts: 1293
Joined: Wed Apr 02, 2008 2:09 pm

Re: Sprite data caching or reuse?

Post by Kasumi »

Drakim wrote:Then there is animation data, and some extra goodies I've baked in there like allowing small x/y offsets on the sprites or vflip/hflip flags on individual sprites in the meta-sprite.
That shouldn't really affect rendering at all. Whether an individual sprite is flipped or not doesn't matter to the block copy, whether an individual sprite is offset a little doesn't matter to the block copy.

Is the issue that you're also trying to save space? I stored every frame twice, once flipped and once not, rather than flipping it at runtime and I don't feel bad about it.
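
A sketch of how storing each frame twice could look, assuming a hypothetical table with two pointers per animation frame and a `facing` flag of 0 or 1. The variable names, addresses, tile numbers, and the plain y, x, tile, attribute record layout are made up for illustration. The draw routine then block-copies whichever copy was selected, with no XOR at runtime:

```asm
reserved4 = $04   ; metasprite data pointer, as in the routines above (address assumed)
facing    = $12   ; hypothetical: 0 = facing right, 1 = facing left
frame     = $13   ; hypothetical: current animation frame index

SelectFrame:
	lda <frame
	asl a            ; two pointers per frame: normal, then flipped
	ora <facing      ; bit 0 picks the pre-flipped copy
	asl a            ; two bytes per pointer
	tax
	lda frame_table,x
	sta <reserved4
	lda frame_table+1,x
	sta <reserved4+1
	rts

frame_table:
	.dw walk0, walk0_flipped

walk0:               ; y, x, tile, attributes
	.db $00, $00, $10, $00
	.db $00, $08, $11, $00
walk0_flipped:       ; same tiles, X offsets mirrored, hflip bit ($40) set
	.db $00, $08, $10, $40
	.db $00, $00, $11, $40
```

The flipped copies can be generated by a macro or a build script rather than typed by hand, so the only real cost is ROM space.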
Drakim
Posts: 97
Joined: Mon Apr 04, 2016 3:19 am

Re: Sprite data caching or reuse?

Post by Drakim »

Kasumi wrote: That shouldn't really affect rendering at all. Whether an individual sprite is flipped or not doesn't matter to the block copy, whether an individual sprite is offset a little doesn't matter to the block copy.
The thing is, I don't block copy my animation data right to the sprite buffer, I do stuff like XOR the global flip flags with the individual flip flags, and add the global x/y coordinate with the sprite's local x/y offset. I also have to fish out the correct palette since it's not hardcoded.
Kasumi wrote:Is the issue that you're also trying to save space? I stored every frame twice, once flipped and once not, rather than flipping it at runtime and I don't feel bad about it.
Huh....I hadn't thought about that. That's genius! It totally saves me the XOR of the flip flags. I could even use a macro, and potentially do it for other things too. Thanks mate!
Kasumi
Posts: 1293
Joined: Wed Apr 02, 2008 2:09 pm

Re: Sprite data caching or reuse?

Post by Kasumi »

Drakim wrote:I also have to fish out the correct palette since it's not hardcoded.
The palette thing is easy, if the object only uses one palette. (Which, if you're dynamically allocating palettes is probably the common case) It's one instruction:

Code:

lda [reserved4],y
ora SPRpalette
sta OAM+2,x
You store the palette the object wants to SPRpalette before rendering and lose 3 cycles per sprite, oh well.

In a case where palette 0 is like... a shared palette (reserved for player one or something, that enemies can also use)... you could maybe get cute with the data and store it shifted right one bit.

Now the highest bit is free to use as a flag for that.
Essentially:
Bit 7: Use Palette 0
Bit 6: Flip Sprite Vertically
Bit 5: Flip Sprite Horizontally
etc.

Code:

lda [reserved4],y
asl a;Whether to use palette 0 is now in the carry, flip sprite vertically is in bit 7 where OAM expects it, etc.
bcs storepalette;If the high bit was set, we use palette zero
ora SPRpalette
storepalette:
sta OAM+2,x
Admittedly that's still a bit heavy 64 times, but well...
Edit: Just to say it, I'm not sure how much help caching sprite data will be in a game that scrolls. But I'll think about it.
Drakim
Posts: 97
Joined: Mon Apr 04, 2016 3:19 am

Re: Sprite data caching or reuse?

Post by Drakim »

Kasumi wrote:The palette thing is easy, if the object only uses one palette. (Which, if you're dynamically allocating palettes is probably the common case) It's one instruction.
I am saving the palette per sprite per frame, for the entire metasprite, but in hindsight it's just as you say, maybe 90% of all metasprites use only one palette. I'll make a faster drawing routine they can use instead of the full one, that doesn't need to look up the palette byte each time. :beer:
Kasumi wrote:Edit: Just to say it, I'm not sure how much help caching sprite data will be in a game that scrolls. But I'll think about it.
That's a good point, but I think it could be worked around since scrolling most of the time only happens on one axis even for games with multi-directional scrolling like SMB3. You'd simply have to loop over every 4th byte and do an addition :D The carry flag will tell you if the sprite is now off-screen.
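
A rough sketch of that pass for a camera moving right, so sprites shift left on screen. The OAM shadow address and the `scroll_delta` variable are assumptions. Here the shift is a subtraction, and a clear carry (a borrow) afterwards means the sprite crossed the left edge:

```asm
OAM          = $0200  ; OAM shadow page (assumed address)
scroll_delta = $14    ; hypothetical: pixels the camera moved right this frame

ScrollCachedSprites:
	ldx #0
ScrollLoop:
	lda OAM,x          ; Y position
	cmp #$EF
	bcs ScrollNext     ; already hidden below the screen, leave it alone
	lda OAM+3,x        ; X position
	sec
	sbc <scroll_delta
	sta OAM+3,x
	bcs ScrollNext     ; no borrow: still on screen
	lda #$FF
	sta OAM,x          ; borrow: crossed the left edge, hide the sprite
ScrollNext:
	inx
	inx
	inx
	inx
	bne ScrollLoop     ; X wraps to 0 after all 64 entries
	rts
```

Hiding a sprite this way throws away its cached Y position, so an object that scrolls back into view, or that enters from the right edge, would still need a full redraw. A camera moving left would use `adc` and treat a set carry as off-screen instead.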

But even if it's not viable, just being able to reuse the pattern and attribute bytes would still be a boon.
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Sprite data caching or reuse?

Post by psycopathicteen »

For each object, do you have a register holding the palette bits?