All times are UTC - 7 hours





Posted: Fri Jan 12, 2018 1:01 pm
Joined: Mon Apr 04, 2016 3:19 am
Posts: 67
psycopathicteen wrote:
For each object, do you have a register holding the palette bits?


Yup! It's assigned when the object spawns. An object can actually "claim" up to three palettes.

Furthermore, in each object's animation data, every sprite in every frame has 2 bits that say which palette that sprite uses. My dynamic palette allocation makes this a bit tricky, though: even if the animation data says that the first four sprites all use palette 0, the game scene may have been pretty crowded when the object spawned, so it was actually assigned palette 3.

That means I have to translate the "frame palette" to "actual palette" on the fly as the sprites are being plotted out.

But as Kasumi pointed out, for objects that only use one palette (which is MOST objects), that lookup logic can be greatly simplified. I don't mind if bosses and the like are a little more expensive to draw.
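To make the "frame palette" to "actual palette" translation concrete, here is a minimal sketch in Python rather than 6502 assembly. It assumes the standard NES OAM attribute layout (palette in the low two bits); the `claimed` table and both function names are hypothetical, and frame palette indices are assumed to stay within the palettes the object has claimed.

```python
PALETTE_MASK = 0b00000011  # low two bits of an NES OAM attribute byte


def translate_attr(frame_attr: int, claimed: list) -> int:
    """General path: map the 2-bit frame palette through the object's claimed palettes."""
    frame_pal = frame_attr & PALETTE_MASK
    actual_pal = claimed[frame_pal] & PALETTE_MASK
    return (frame_attr & ~PALETTE_MASK & 0xFF) | actual_pal


def translate_attr_single(frame_attr: int, actual_pal: int) -> int:
    """Simplified path for objects that only claim one palette: no table lookup."""
    return (frame_attr & ~PALETTE_MASK & 0xFF) | (actual_pal & PALETTE_MASK)
```

The single-palette version is the cheap case: the flip and priority bits pass through untouched and the palette bits are simply overwritten.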


Posted: Fri Jan 12, 2018 1:26 pm
Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1093
I actually think that "data shifted right" thing I posted is a great idea. Maybe not for your game, but I'm happy with it even an hour later. :lol:

One other thing: if an object uses, say, two palettes, you could split it into two metasprites. It adds... two bytes for the extra address? And probably not that much overhead. It may only save time if your objects are biggish, though.

Edit: To be more clear, split it into two metasprites and change SPRpalette after the first is rendered. You could even create a routine that handles anything that'd need to switch to avoid the hit of the second subroutine call/return.
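A rough sketch of the split-metasprite idea, in Python for readability. The data layout (lists of `(dx, dy, tile)` tuples) and both function names are made up for illustration; only the order of operations (render the first part, switch the palette, render the second) comes from the post above.

```python
def draw_metasprite(oam, sprites, base_x, base_y, palette):
    """Append one OAM entry (Y, tile, attributes, X) per child sprite."""
    for dx, dy, tile in sprites:
        oam.append(((base_y + dy) & 0xFF, tile, palette & 3, (base_x + dx) & 0xFF))


def draw_two_palette_object(oam, part_a, part_b, x, y, pal_a, pal_b):
    """One routine handles the palette switch, avoiding a second call from the caller."""
    draw_metasprite(oam, part_a, x, y, pal_a)
    # The equivalent of changing SPRpalette between the two halves.
    draw_metasprite(oam, part_b, x, y, pal_b)
```

Within each half the palette is constant, so the inner loop never has to look at per-sprite palette bits.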

_________________
https://kasumi.itch.io/indivisible


Posted: Fri Jan 12, 2018 1:34 pm
Joined: Mon Apr 04, 2016 3:19 am
Posts: 67
Kasumi wrote:
I actually think that "data shifted right" thing I posted is a great idea. Maybe not for your game, but I'm happy with it even an hour later. :lol:


I already use some similar techniques for packing some of my animation data pretty tight. Who needs the background priority flag anyways?

Quote:
One other thing: if an object uses, say, two palettes, you could split it into two metasprites. It adds... two bytes for the extra address? And probably not that much overhead. It may only save time if your objects are biggish, though.


You could, but it would complicate a lot of internal logic, so I'm trying to avoid it. It's already possible by writing a custom drawing routine for the object, but I really don't wanna bake support for it into the standard drawing path.


Posted: Fri Jan 12, 2018 3:41 pm
Joined: Wed May 19, 2010 6:12 pm
Posts: 2540
You can organize the metasprite data into color groups.


Posted: Sat Jan 13, 2018 7:26 am
Joined: Mon Apr 04, 2016 3:19 am
Posts: 67
Alright, I've been hard at work incorporating some of the suggestions I've gotten, which made rendering about 10% faster. I also managed to implement caching, which cut the rendering time in half again.

My caching technique is this:

Every frame the Sprite RAM is shuffled, but it's shuffled so that metasprites still have all their child sprites grouped together. The shuffling process also updates sprite pointers for all objects to their new locations. I use some macros and loop unrolling to ensure that this shuffling is fast.

When it's time to draw an object, the object knows that what it wrote to the Sprite RAM on the previous frame is still there, so it can skip a lot of work unless something has changed. If nothing except the object's position has changed since the last frame, a fast replacement drawing routine is invoked that adjusts the x/y of all child sprites and then calls it a day. Even a non-rectangular metasprite keeps its shape perfectly. If adjusting any sprite's position produces a carry, that sprite is removed, as it has most likely gone off-screen.
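The fast path might look roughly like the following Python sketch. It assumes the usual 4-byte OAM entry order (Y, tile, attributes, X) and hides a sprite by giving it a Y of $FF; the function name and the choice of hiding value are assumptions, not the author's actual routine.

```python
HIDDEN_Y = 0xFF  # a Y of 0xEF or above keeps an NES sprite off the visible screen


def fast_move(oam: bytearray, first: int, count: int, dx: int, dy: int) -> None:
    """Shift cached sprites by (dx, dy); hide any whose 8-bit position over/underflows."""
    for i in range(first, first + count):
        base = i * 4                      # OAM entry: Y, tile, attributes, X
        if oam[base] >= 0xEF:             # already hidden, leave it alone
            continue
        new_y = oam[base] + dy
        new_x = oam[base + 3] + dx
        if 0 <= new_x <= 0xFF and 0 <= new_y <= 0xFF:
            oam[base] = new_y
            oam[base + 3] = new_x
        else:
            oam[base] = HIDDEN_Y          # carry/borrow: treat as gone off-screen
```

Tile and attribute bytes are never touched, which is where the saving comes from. Note that once a sprite is hidden in this sketch its offset within the metasprite is lost, so this routine alone cannot bring it back.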


Posted: Sat Jan 13, 2018 9:28 am
Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1093
One potential issue. (Rare case, and probably an acceptable loss but still.)

What if the screen scrolls right, then scrolls left? Scroll right, some sprites move offscreen due to carry change, scroll left, those sprites should be back on screen. Is that case covered?

I'm not clear, is this two pages of RAM (one for cache, one for not?) or just one?

Can objects have a variable number of sprites? (Some frames take 12, some take 8.) Edit: To ask a more specific question: for the frames that take 8 sprites, are 12 child slots still needed for the object?

Does the halved time include the added time for the actual shuffle? I assume half on average, is your worst case much worse? (Though I guess it wouldn't be.)

Sorry for all the questions, but it sounds brilliant. If it uses one page of sprite RAM, I'm all in. If it's two pages of sprite RAM, I'm half in. :D



Posted: Sat Jan 13, 2018 9:40 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10299
Location: Rio de Janeiro - Brazil
What happens when an object that has moved partially off screen and had some of its sprites removed moves back on screen? Does it still trigger the "fast mode" where the existing sprites are moved (causing parts of the object to be missing) or does it know it has to generate the missing sprites again?


Posted: Sat Jan 13, 2018 9:52 am
Joined: Mon Apr 04, 2016 3:19 am
Posts: 67
Kasumi wrote:
What if the screen scrolls right, then scrolls left? Scroll right, some sprites move offscreen due to carry change, scroll left, those sprites should be back on screen. Is that case covered?


tokumaru wrote:
What happens when an object that has moved partially off screen and had some of its sprites removed moves back on screen? Does it still trigger the "fast mode" where the existing sprites are moved (causing parts of the object to be missing) or does it know it has to generate the missing sprites again?


I was thinking about the exact same issue and trying to come up with a fix that doesn't overly complicate things. I think I'll solve it by never allowing the fast replacement drawing routine for objects that are being clipped by the edge of the screen, or that were clipped by (or outside) the screen on the previous frame. There's too much special logic there to account for in the caching mechanism.

I might refine it later to narrow down, since in theory there is nothing wrong with "clipping away" sprites just as long as no sprites need to reappear.
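The eligibility rule described above can be sketched as a simple predicate. This is Python for illustration only; the `ObjState` fields, the bounding-box test, and the 256x240 screen size used for clipping are assumptions about how such a check could be structured.

```python
from dataclasses import dataclass


@dataclass
class ObjState:
    clipped_now: bool     # touches or is beyond a screen edge this frame
    clipped_prev: bool    # was clipped (or fully outside) on the previous frame
    frame_changed: bool   # animation frame, palette, or flip changed


def is_clipped(x: int, y: int, w: int, h: int) -> bool:
    """True if the object's bounding box is not fully inside a 256x240 screen."""
    return x < 0 or y < 0 or x + w > 256 or y + h > 240


def can_use_fast_path(o: ObjState) -> bool:
    """Only reuse cached sprites when nothing but an unclipped position changed."""
    return not (o.clipped_now or o.clipped_prev or o.frame_changed)
```

Checking the previous frame as well as the current one is what prevents a partially removed object from being "fast moved" back on screen with sprites missing.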

Kasumi wrote:
I'm not clear, is this two pages of RAM (one for cache, one for not?) or just one?


I'm just using one page of RAM, it's being shuffled "in place". If I used two pages of RAM I'd have to waste time copying over the values I wanted to reuse, and I'd be spending a lot more memory, so that solution would be subpar.

By cleverly using LDA/STA and LDX/STX in combination we can shuffle around the RAM "in place" without any performance penalties for not having a shadow copy.
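One plausible reading of the LDA/STA plus LDX/STX remark is a two-register swap: load both bytes first, then store each to the other's address, so no temporary buffer is needed. The sketch below models that in Python with fixed groups of 8 sprites; the group size constant and function name are assumptions, and the author's real shuffle order is not shown in the thread.

```python
GROUP_BYTES = 8 * 4  # assumed: 8 sprites per object, 4 bytes per OAM entry


def swap_groups(oam: bytearray, g1: int, g2: int) -> None:
    """Swap two sprite groups in place, one byte pair at a time."""
    a0, b0 = g1 * GROUP_BYTES, g2 * GROUP_BYTES
    for k in range(GROUP_BYTES):
        a = oam[a0 + k]      # LDA group1+k
        x = oam[b0 + k]      # LDX group2+k
        oam[b0 + k] = a      # STA group2+k
        oam[a0 + k] = x      # STX group1+k
```

Each byte costs one load and one store, the same as copying it into a shadow page would, which matches the "no performance penalty" claim as long as all groups are the same size.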

Kasumi wrote:
Can objects have a variable number of sprites? (Some frames take 12, some take 8.) Edit: To ask a more specific question: for the frames that take 8 sprites, are 12 child slots still needed for the object?


Right now I'm just giving 8 sprites to each object. I was really curious about how much CPU time the caching would save me, so I cheated a bit on sprite assignment. However, I plan to change that so an object uses exactly the sprites it needs that frame, by making a routine that hands out sprites whenever it's called, divided in such a way that my shuffling mechanism is still fast.

Kasumi wrote:
Does the halved time include the added time for the actual shuffle? I assume half on average, is your worst case much worse? (Though I guess it wouldn't be.)


Well, my claim was perhaps a bit simplistic. The shuffling adds a static cost, but the caching makes every object cheaper. That means it's actually slower if you only have one object on the scene, but a lot faster if you have eight objects on the scene. The cost per object was more than cut in half, but I eyeballed it a bit since I was just measuring by tinting the screen with colors as the code was executing.

This is a very good kind of tradeoff though, as we don't really care about saving that much CPU when there is only one object on the scene. We want to optimize for the worst case scenario after all.
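The tradeoff can be written as a tiny cost model: a fixed shuffle cost plus a cheaper per-object cost, compared against the uncached per-object cost. The cycle figures in the usage note are invented purely to show the shape of the curve; the thread gives no measured numbers beyond "more than cut in half".

```python
def cost_cached(n_objects: int, shuffle: int, per_object_cached: int) -> int:
    """Frame cost with caching: static shuffle cost plus cheap per-object work."""
    return shuffle + n_objects * per_object_cached


def cost_plain(n_objects: int, per_object_plain: int) -> int:
    """Frame cost without caching: every object is drawn from scratch."""
    return n_objects * per_object_plain


def break_even(shuffle: int, per_object_plain: int, per_object_cached: int) -> int:
    """Smallest object count at which caching is strictly cheaper."""
    return shuffle // (per_object_plain - per_object_cached) + 1
```

With made-up values of 1500 cycles for the shuffle, 1000 per uncached object and 450 per cached object, one object is slower with caching (1950 vs 1000), eight objects are faster (5100 vs 8000), and the crossover is at three objects.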

Kasumi wrote:
Sorry for all the questions, but it sounds brilliant. If it uses one page of sprite RAM, I'm all in. If it's two pages of sprite RAM, I'm half in.


It's definitely just one page. I have big plans and it's gonna need a lot of spare RAM.


Posted: Sat Jan 13, 2018 12:31 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10299
Location: Rio de Janeiro - Brazil
Drakim wrote:
Kasumi wrote:
I'm not clear, is this two pages of RAM (one for cache, one for not?) or just one?


I'm just using one page of RAM, it's being shuffled "in place". If I used two pages of RAM I'd have to waste time copying over the values I wanted to reuse, and I'd be spending a lot more memory, so that solution would be subpar.

You'd waste no more time than you do by shuffling in place. Using a second page would make the logic more straightforward, since you'd be essentially filling a new OAM page from scratch every time, generating new data when necessary and copying from the previous table when possible.

Quote:
By cleverly using LDA/STA and LDX/STX in combination we can shuffle around the RAM "in place" without any performance penalties for not having a shadow copy.

Doing things in place may save memory, but it can be significantly more complex, and somewhat slower, since you have to deal with fragmentation due to shuffling groups of sprites of different lengths and clipping.

Quote:
Right now I'm just giving 8 sprites to each object, I was really curious about how much CPU time the caching would save me so I cheated a bit on sprite assignment.

So that's why fragmentation isn't a problem for you... I'm not a fan of this solution, since being limited to using multiples of 8 sprites can be very wasteful.

Kasumi wrote:
Sorry for all the questions, but it sounds brilliant. If it uses one page of sprite RAM, I'm all in. If it's two pages of sprite RAM, I'm half in.

Using two pages might not be so bad if you can reuse those pages for more than just the OAM shadow. For example, you could use pages $0200 and $0300, and alternate which page is used for the OAM shadow and which is used for the VRAM update buffer every frame. As long as you handle all the sprites before buffering NT/AT/PT/etc. updates, you can copy OAM entries that were used last time, and then you can overwrite the old OAM completely with buffered VRAM updates so the space doesn't go to waste.
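A small Python model of the alternating-page scheme, assuming pages $02 and $03 as in the example above. The class and method names are hypothetical; the point is only the per-frame role swap and the order of operations (sprites first, VRAM buffering second).

```python
class FrameBuffers:
    """Two 256-byte pages that swap roles (OAM shadow / VRAM update buffer) each frame."""

    def __init__(self):
        self.ram = {0x02: bytearray(256), 0x03: bytearray(256)}
        self.oam_page = 0x02          # high byte that would be written to $4014

    @property
    def vram_page(self) -> int:
        return self.oam_page ^ 0x01   # the other page: 0x02 <-> 0x03

    def begin_frame(self) -> None:
        """Swap roles; last frame's OAM now sits in the page about to hold VRAM updates."""
        self.oam_page ^= 0x01

    def copy_sprite(self, src_index: int, dst_index: int) -> None:
        """Reuse an entry from last frame's OAM before VRAM buffering overwrites it."""
        old = self.ram[self.vram_page]
        new = self.ram[self.oam_page]
        new[dst_index * 4:dst_index * 4 + 4] = old[src_index * 4:src_index * 4 + 4]
```

In this model all `copy_sprite` calls for a frame must happen before anything is written into `vram_page`, which mirrors the "handle all the sprites before buffering updates" condition.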


Posted: Sat Jan 13, 2018 2:26 pm
Joined: Mon Apr 04, 2016 3:19 am
Posts: 67
You are right, tokumaru: without a shadow copy, fragmentation becomes an issue. I've been twisting my brain around the problem, and while it's possible to write some clever loops to defrag things while shuffling (as long as all "groups" are 2, 4, 8 or 16), the loops would be massive and not amenable to loop unrolling, which makes them a lot slower. I guess one could live with the 8 sprite tradeoff, but I'm starting to realize the shadow copy is probably worth it and could bring other benefits as well.

Edit: Another possible setup if you really wanna save that memory is to have groups of 4 sprites in the shuffling process and then use more than one metasprite for objects that are bigger than 4 sprites.


Posted: Sat Jan 13, 2018 3:13 pm
Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1093
I did a lot of math on this. As for fragmentation, I just assumed a totally unrolled shuffle before any rendering: 256 loads and 256 stores, 2048 cycles. But any changes would then need a duplicate load and store later in the frame.

Even with the duplicate load and store, it actually still beat my unrolled thing so long as most sprites didn't need more than one byte changed. (But I made a lot of assumptions, so take that with a grain of salt.)

I didn't think too deeply about it, but I think with two pages you start to really win. I might play around with it for a non scrolling game I'm thinking about.

If anyone wants to check some stuff themselves:
64 fast sprites (my method) is 4160 cycles (Well... not always because page cross stuff).
32 fast sprites (my method) is 2080 + 607= 2687 cycles. (607 is for moving the remaining sprites offscreen)
I assumed always fastest method for both, all sprites in one go. Obviously there'd also be overhead in places but the overhead (deciding whether to use the fast function, navigating to the next object) would be a bit similar for either method.
Edit: Oh. I timed with adding the tile offset, so 3 cycles per sprite could be taken off the above counts for some games. Also the 607 could be made a cycle faster for every sprite. But probably take the counts as they are, because obviously there's still a check to skip the offscreen loop when there are 64 sprites and that's not counted. And there are similar things for the 32 sprite one.
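For anyone checking the arithmetic, the figures in this post are internally consistent. The short Python check below assumes 4 cycles each for an absolute-addressed LDA and STA on the 6502, which is what the 2048-cycle shuffle figure implies; the variable names are just labels.

```python
LDA_ABS_CYCLES = 4   # LDA with absolute addressing
STA_ABS_CYCLES = 4   # STA with absolute addressing

# Fully unrolled shuffle: 256 loads plus 256 stores.
shuffle_cycles = 256 * (LDA_ABS_CYCLES + STA_ABS_CYCLES)

# 64 fast sprites at 4160 cycles implies a per-sprite cost.
per_sprite_cycles = 4160 // 64

# 32 fast sprites plus 607 cycles to move the remaining sprites off-screen.
half_screen_cycles = 32 * per_sprite_cycles + 607
```

So the quoted counts work out to 65 cycles per fast sprite, and the 32-sprite case is 2080 + 607 = 2687 cycles.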
tokumaru wrote:
Using two pages might not be so bad if you can reuse those pages for more than just the OAM shadow. For example, you could use pages $0200 and $0300, and alternate which page is used for the OAM shadow and which is used for the VRAM update buffer every frame.

True, but I'd usually prefer PLA; STA $2007 VRAM updates. There are definitely games where I'd be fine just dedicating the second page, though.



Posted: Sat Jan 13, 2018 4:01 pm
Joined: Mon Apr 04, 2016 3:19 am
Posts: 67
Has anybody thought about using a mapper with RAM bankswitching and putting the two OAM pages at the exact same address but in two different banks? Instead of writing the address high byte to $4014 to select between your two OAM pages you'd simply switch banks instead.

The advantage would be that all the drawing code that refers to your Sprite OAM address would just magically work on both copies, without the need for indirect addressing, duplicate drawing methods, or copying from one page to the other.
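To illustrate the idea, here is a toy Python model of one 256-byte window backed by two banks. It does not model any real mapper's registers; `BankedRam`, `select`, and `draw_sprite` are hypothetical names used only to show that the drawing code never needs to know which copy it is writing to.

```python
class BankedRam:
    """A single address window that can show one of several RAM banks."""

    def __init__(self, banks: int = 2, size: int = 256):
        self.banks = [bytearray(size) for _ in range(banks)]
        self.current = 0

    def select(self, bank: int) -> None:
        self.current = bank               # stands in for a mapper bank-select write

    def write(self, offset: int, value: int) -> None:
        self.banks[self.current][offset] = value & 0xFF

    def read(self, offset: int) -> int:
        return self.banks[self.current][offset]


def draw_sprite(ram: BankedRam, slot: int, y: int, tile: int, attr: int, x: int) -> None:
    """Same code and same addresses, whichever bank is currently mapped in."""
    for k, value in enumerate((y, tile, attr, x)):
        ram.write(slot * 4 + k, value)
```

The drawing routine uses fixed offsets only; choosing the target copy is entirely a matter of which bank was selected before the call.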


Posted: Sat Jan 13, 2018 4:55 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10299
Location: Rio de Janeiro - Brazil
Drakim wrote:
Has anybody thought about using a mapper with RAM bankswitching and putting the two OAM pages at the exact same address but in two different banks? Instead of writing the address high byte to $4014 to select between your two OAM pages you'd simply switch banks instead.

That's overkill! The difference between LDA #IMM; STA $4014 and LDA $ZP; STA $4014 is just 1 cycle... that's hardly worth the trouble.

Quote:
The advantage would be that all the drawing code that refers to your Sprite OAM address would just magically work on both copies, without the need for indirect addressing, duplicate drawing methods, or copying from one page to the other.

Yes, but copying from one set to the other becomes slower, because you have to constantly switch back and forth between the banks.

If you can spare a little ROM space, duplicating the sprite drawing routine is probably the best choice to avoid indirection, since each copy of the routine will know which buffer is the primary one.
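The duplicated-routine approach can be mimicked in Python with a factory that binds each copy of the routine to a fixed pair of buffers, much as a macro would stamp out two versions with hard-coded addresses. The names `make_draw_routine`, `draw_into_a`, and `draw_into_b` are made up for this sketch.

```python
def make_draw_routine(primary: bytearray, previous: bytearray):
    """Build a drawing routine that 'knows' its buffers, so no indirection is needed."""

    def draw(slot: int, entry=None) -> None:
        base = slot * 4
        if entry is None:
            # Nothing changed: reuse last frame's entry from the other buffer.
            primary[base:base + 4] = previous[base:base + 4]
        else:
            # Something changed: write a fresh (Y, tile, attributes, X) entry.
            primary[base:base + 4] = bytes(entry)

    return draw


page_a, page_b = bytearray(256), bytearray(256)
draw_into_a = make_draw_routine(page_a, page_b)   # used on frames where A is primary
draw_into_b = make_draw_routine(page_b, page_a)   # used on frames where B is primary
```

Each frame the game would call whichever copy matches the current primary buffer; the cost is the duplicated code, not extra work per sprite.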


Posted: Sun Jan 14, 2018 2:37 am
Joined: Mon Apr 04, 2016 3:19 am
Posts: 67
tokumaru wrote:
That's overkill! The difference between LDA #IMM; STA $4014 and LDA $ZP; STA $4014 is just 1 cycle... that's hardly worth the trouble.


Hehe, I didn't mean that particular aspect as a cost saving measure. I was just introing my explanation so people would know what I was talking about.

Quote:
Yes, but copying from one set to the other becomes slower, because you have to constantly switch back and forth between the banks.


I realized that for the MMC5 mapper at least, you can mount the RAM banks in several places, so if you need to work on both of them at once (copying back and forth) you could have them neatly side by side for the operation.

Quote:
If you can spare a little ROM space, duplicating the sprite drawing routine is probably the best choice to avoid indirection, since each copy of the routine will know which buffer is the primary one.


I guess it's not so ugly if you are using a macro to duplicate everything, but still, I like my way better :D

The advantage would be that you aren't using up your "always available RAM" but instead using the paged RAM which can be a little more messy to access for global variables.

You could theoretically even have more than one shadow copy if you have some tricks in mind (splitscreen? I dunno).

