In its current state, object management has more or less fallen into place. I have a system where objects set up function pointers to thinker functions where they can do as much or as little work as they need. I also have efficient object creation/deletion working with an object pool system for fast constant-time slot allocation - the overhead is small enough that creating an object takes less than 1% of CPU. Additionally, you will be able to slice the object list and pool (total size up to 128 objects) into several smaller regions for more efficient iteration over, say, bullets or particles.
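To make the thinker idea concrete, here is a minimal sketch in C rather than 65816 assembly. It is not the framework's actual code: the `Object` fields, the `NULL`-means-unused convention, and the names `think_move`, `think_idle`, and `run_thinkers` are all assumptions made for illustration. Each object carries a function pointer, and the per-frame loop simply calls it, optionally over a sub-range of slots the way a "slice" for bullets or particles might be iterated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 128          /* pool size mentioned in the post */

struct Object;
typedef void (*Thinker)(struct Object *self);

typedef struct Object {
    Thinker think;               /* NULL marks an unused slot (assumption) */
    int16_t x, y;
    int16_t vx, vy;
} Object;

static Object objects[MAX_OBJECTS];

/* Example thinker: moves the object by its velocity. */
static void think_move(Object *self) {
    self->x += self->vx;
    self->y += self->vy;
}

/* Example thinker that does no work at all. */
static void think_idle(Object *self) {
    (void)self;
}

/* Call the thinker of every live object in the slot range [first, last).
   Passing a sub-range models iterating over one slice of the list.
   Returns how many thinkers were called. */
static int run_thinkers(size_t first, size_t last) {
    assert(first <= last && last <= MAX_OBJECTS);
    int called = 0;
    for (size_t i = first; i < last; i++) {
        if (objects[i].think != NULL) {
            objects[i].think(&objects[i]);
            called++;
        }
    }
    return called;
}
```

Calling `run_thinkers(0, MAX_OBJECTS)` once per frame visits every slot; a narrower range only touches the slots reserved for one kind of object.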
I'd say only one or two major obstacles are left before I would consider my framework "fully-featured". The next one is a level format and scrolling system, for which I am taking heavy inspiration from Sonic 2 (8x8 tiles form 16x16 tiles, which in turn form 128x128 tiles). The scrolling logic for updating the on-screen tilemap might be complex, but I have a few ideas involving lookup tables and indirect access to simplify and speed it up. My goal is bi-directional scrolling with a maximum speed of at least 16 pixels per frame along both axes. If it becomes efficient enough I may raise the limit to 32 pixels per frame per axis.
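As a rough illustration of how a three-level tile hierarchy like that can be resolved with plain table lookups, here is a sketch in C. The level format itself has not been designed yet, so every table name and dimension here (`level_map`, `chunk_defs`, `block_defs`, the level size) is a made-up assumption. The only thing taken from the post is the 8x8 / 16x16 / 128x128 nesting.

```c
#include <assert.h>
#include <stdint.h>

/* All sizes and table names below are illustrative assumptions. */
#define LEVEL_W_CHUNKS 4         /* level width in 128x128 chunks  */
#define LEVEL_H_CHUNKS 2         /* level height in 128x128 chunks */
#define NUM_CHUNKS     4
#define NUM_BLOCKS     8

static uint8_t  level_map[LEVEL_H_CHUNKS][LEVEL_W_CHUNKS]; /* chunk ids */
static uint8_t  chunk_defs[NUM_CHUNKS][8][8];  /* 8x8 block ids per chunk */
static uint16_t block_defs[NUM_BLOCKS][2][2];  /* 2x2 tile ids per block  */

/* Resolve the 8x8 tile that covers level pixel (px, py).
   Each level is one shift, one mask, and one table lookup. */
static uint16_t tile_at(unsigned px, unsigned py) {
    assert(px < LEVEL_W_CHUNKS * 128u && py < LEVEL_H_CHUNKS * 128u);
    uint8_t chunk = level_map[py >> 7][px >> 7];              /* /128 */
    uint8_t block = chunk_defs[chunk][(py >> 4) & 7][(px >> 4) & 7];
    return block_defs[block][(py >> 3) & 1][(px >> 3) & 1];   /* /8   */
}
```

Because all the sizes are powers of two, the divisions reduce to shifts and masks, which is the kind of arithmetic that stays cheap on the 65816.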
Here's the latest build of my engine demonstrating object creation (hold A to spawn objects, move them off-screen to despawn them):
And now, I'd like to take this space to document a number of the observations I made while working on this homebrew, in the hope that it will benefit someone:
- I've found that there are two general themes to the work I've been doing: 1. Provide convenience while minimizing overhead, and 2. Approach the same problem from multiple angles - after designing a few implementations you can decide on which one performs the best.
- Try to design around the worst-case performance scenario. So far, whenever I've implemented something, I've tested it by applying it to 128 objects at once every frame. This effectively amplifies any overhead introduced by your systems, making it easier to see just how fast or slow your code is, and it also stress-tests the system to show how much you can throw at it. If your code is well-written you'll have all 128 objects moving and thinking at full speed with significant headroom for some extra processing and it will feel really good.
- Know what to optimize. Functions that are called many times per frame are the most obvious candidates. For example, in my CreateObject, MoveObject, and DeleteObject routines, I removed the php and plp instructions and documented the fact that these functions simply expect the A/X/Y registers to already be set to 16-bit size. However, for my SPC upload functions I let loose with the 24-bit long addressing for developer convenience, because the overhead there comes from the main CPU waiting on the audio CPU to acknowledge each byte sent. The developer convenience here is that you can reference audio data from a different bank to upload (something that's much easier to do with DMA, but you can't use that with the SPC, so...)
- Familiarize yourself with the processor status flags and the different addressing modes. There are a number of features that will simplify your code and thus let you achieve more efficient designs. In two of my routines (CreateObject and AddSprite), I set the overflow flag in the status register if the object doesn't fit. This is a convenient way to return a boolean value from a function and works hand in hand with branching since it also operates on the status flags.
- As for addressing modes, indirect addressing can be very useful. If you've ever worked with function pointers or similar concepts in languages such as C, you'll know what I'm talking about. This can save you time and code complexity trying to decide how to do X - instead you can just go ahead and tell the CPU to do X. I haven't fully explored the possibilities, but between the direct page, indirect addressing, and the index registers there is a lot that you can effectively automate.
- Find ways to exploit your routines to minimize further processing later on. I mentioned setting the overflow flag in AddSprite above - this happens if the sprite is outside the screen. The check is meant to prevent placing a pointless off-screen sprite into one of the OAM slots, but it also doubles as a way to perform a rough despawning check - if overflow is set you can do a more precise "out of screen" check only in that scenario, instead of every frame.
- Set your Break interrupt handler to something that halts the game. I only mention this because I know some tutorial projects will assign this handler to an EmptyHandler function that immediately returns. This is an extremely bad idea: a zero byte decodes as BRK (opcode $00), so if the program counter gets lost, it will go right over any zero bytes and keep on marching, causing havoc in your program without you knowing what went wrong.
- On that same note, you should definitely be using an emulator with good debugging facilities. Memory viewing, breakpoints, and single-stepping through the program have been the most useful features for me.
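The "flag as boolean return" and "rough despawn check" notes above can be sketched together. This is a C model, not the actual AddSprite routine: a `bool` return value stands in for the overflow flag, and the screen size constants, `DESPAWN_MARGIN`, and the `draw_object` helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCREEN_W       256
#define SCREEN_H       224
#define DESPAWN_MARGIN 64        /* hypothetical slack around the screen */

typedef struct {
    int16_t x, y;                /* position in level coordinates */
    bool alive;
} Object;

static int sprites_queued;       /* stand-in for the OAM slot counter */

/* Returns true when the sprite was NOT queued because it is off-screen.
   The return value plays the role of the overflow flag. */
static bool add_sprite(int16_t sx, int16_t sy) {
    if (sx < 0 || sx >= SCREEN_W || sy < 0 || sy >= SCREEN_H)
        return true;
    sprites_queued++;
    return false;
}

/* Draw an object and reuse the cheap off-screen result as a first-pass
   despawn test; the precise margin check only runs when flagged. */
static void draw_object(Object *obj, int16_t cam_x, int16_t cam_y) {
    assert(obj);
    int16_t sx = (int16_t)(obj->x - cam_x);
    int16_t sy = (int16_t)(obj->y - cam_y);
    if (add_sprite(sx, sy)) {
        if (sx < -DESPAWN_MARGIN || sx >= SCREEN_W + DESPAWN_MARGIN ||
            sy < -DESPAWN_MARGIN || sy >= SCREEN_H + DESPAWN_MARGIN)
            obj->alive = false;
    }
}
```

On-screen objects never reach the margin comparison, so the more expensive test only costs anything for the few objects that are already outside the visible area.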
Hopefully you'll see more from me soon!
"Additionally, you will be able to slice the object list and pool (total size up to 128 objects) into several smaller regions for more efficient iteration over, say, bullets or particles."
So does that mean you can save memory with simple objects such as bullets? How do you get around fragmentation?
How much memory are you devoting to objects? I use direct page, but I'm limited to ~52 objects because I need to fit them into 8kB and I need room for other stuff.
- Initialize a region of memory to include all the possible object indexes
- Use it like a queue (FIFO), so have two values to keep track of the current head and tail positions of this queue
- When creating an object, pop the value at the head of the queue and use it as the index of the new object
- When deleting an object, push the index of the deleted object at the tail of the queue
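The four steps above can be sketched as a small ring buffer of free slot indexes. This is a C model rather than the real 65816 routines. The `free_count` variable is an addition of this sketch (the post only mentions head and tail), used here to detect an empty pool, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS 128   /* power of two, so positions wrap with a mask */

static uint8_t free_queue[MAX_OBJECTS]; /* FIFO of unused slot indexes   */
static uint8_t head, tail;              /* read and write positions      */
static uint8_t free_count;              /* assumption: not in the post   */

/* Step 1: fill the queue with every possible object index. */
static void pool_init(void) {
    for (int i = 0; i < MAX_OBJECTS; i++)
        free_queue[i] = (uint8_t)i;
    head = 0;
    tail = 0;
    free_count = MAX_OBJECTS;
}

/* Step 3: pop the index at the head. Returns -1 when the pool is full. */
static int create_object(void) {
    if (free_count == 0)
        return -1;
    int slot = free_queue[head];
    head = (uint8_t)((head + 1) & (MAX_OBJECTS - 1));
    free_count--;
    return slot;
}

/* Step 4: push the freed index at the tail. */
static void delete_object(int slot) {
    assert(slot >= 0 && slot < MAX_OBJECTS && free_count < MAX_OBJECTS);
    free_queue[tail] = (uint8_t)slot;
    tail = (uint8_t)((tail + 1) & (MAX_OBJECTS - 1));
    free_count++;
}
```

Both operations touch one array element and one counter regardless of how full the pool is, which is where the constant-time allocation comes from.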
Essentially you're keeping track of a list of unused slots with this technique, so there is no need to look for an empty slot by iterating over the main object list. There's no fragmentation for creating/deleting objects, but there is fragmentation for processing each living object per frame (as it has to iterate over any empty slots). My justification for not avoiding that fragmentation:
1. "Moving" an existing object to another index would require updating every single reference to that object's index, which to me isn't desirable or even feasible
2. Because of 1, we have gaps formed between still-living objects - the only way I can think of to optimize this, besides keeping the object pool sorted (huge overhead), would be dynamically altering the size of the object list, and that won't improve the worst-case scenario, which is largely what I'm optimizing for
Basically my design philosophy for this is simple solutions to simple problems, unless a complex solution provides a sufficient convenience-to-performance ratio.
Edit: Thinking about this some more, I suppose one strategy would be to take a linked list approach to reduce overhead on having just a partially filled list, but again I am unsure what effect this would have on worst case.
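For what it's worth, one way such a linked list could look is an index-based doubly linked list threaded through the live slots. This is only a sketch in C of the general technique, not something the framework does. The `NIL` marker, the `next_live`/`prev_live` arrays, and the function names are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS 128
#define NIL 0xFF                         /* "no slot" marker (assumption) */

static uint8_t next_live[MAX_OBJECTS];   /* following live slot, or NIL  */
static uint8_t prev_live[MAX_OBJECTS];   /* preceding live slot, or NIL  */
static uint8_t first_live = NIL;         /* head of the live list        */

/* Link a freshly created slot at the front of the live list. */
static void live_insert(uint8_t slot) {
    assert(slot < MAX_OBJECTS);
    prev_live[slot] = NIL;
    next_live[slot] = first_live;
    if (first_live != NIL)
        prev_live[first_live] = slot;
    first_live = slot;
}

/* Unlink a slot when its object is deleted; no object changes index. */
static void live_remove(uint8_t slot) {
    assert(slot < MAX_OBJECTS);
    uint8_t p = prev_live[slot];
    uint8_t n = next_live[slot];
    if (p != NIL) next_live[p] = n; else first_live = n;
    if (n != NIL) prev_live[n] = p;
}

/* Walk only the live slots, skipping every empty one. */
static int live_count(void) {
    int n = 0;
    for (uint8_t s = first_live; s != NIL; s = next_live[s])
        n++;
    return n;
}
```

The walk costs one extra index load per live object, so it helps a partially filled list but does not remove any work when all 128 slots are in use.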
Of course, objects don't necessarily need to use 33 bytes each. I can increase or decrease the number of fields available for each object based on the remaining amount of the first 8kB of RAM.
I'm still doing work on the framework itself simultaneously. I recently found a way to speed up the object thinker iteration to the point where I'm happy with the overhead involved in doing two passes over 128 objects! For a bit I was concerned I would have to lower the default limit to 96 when I previously said the engine was capable of 128 objects, but then I realized that I was effectively doing this:
```asm
jsr (objThinker, x)
; Object thinker code
rts
inx
inx
inx
jsr (objThinker, x)
; Object thinker code
rts
inx
inx
inx
; etc...
```
Additionally, the framework has been ported to the ca65 assembler, and I am now working on the animation system.