So, I've found a neat little technique. I'm sure it isn't new, but I haven't seen much discussion about it, so I figured I'd post it here for comments and to gather my thoughts on the matter. The TL;DR is that you use WRAM bankswitching to create a sort of pseudo hardware-accelerated array, saving cycles and freeing up registers. Be aware that the technique has some limitations, so it's not something you want to use for everything.
The MMC5 can have up to 64K of WRAM, divided into 8K pages. Since 64/8 = 8, I'll be working on the assumption that we have 8 separate WRAM banks we can utilize. The technique becomes weaker the fewer banks we have.
The most straightforward way to utilize this might be to organize your gameobjects/entities. For comparison, the traditional way to do gameobjects is to dedicate some RAM to each attribute, like so "Object_Health: DBS 8", giving you a whole bunch of parallel arrays. Then we use indexed absolute access, like so "LDA Object_Health,X", to read/write the various values that make up the gameobject, where register X decides which of the 8 gameobjects we want to access (by being a value from 0 to 7).
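To make the comparison concrete, here's a rough sketch of the traditional layout. The "DBS" directive is used the way it's written above (substitute your assembler's reserve-bytes directive, for example ".res" in ca65), and the object index 3 is just an arbitrary choice for illustration.

```
; Traditional parallel arrays: one 8-entry array per attribute.
Object_Health:     DBS 8
Object_XPosition:  DBS 8

; Take one point of health from gameobject 3.
; X has to keep holding the object index for as long as we work on it.
    LDX #3                  ; select gameobject 3
    LDA Object_Health,X     ; absolute indexed read
    SEC
    SBC #1
    STA Object_Health,X     ; absolute indexed write
```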
But what we do instead is only allocate one byte of RAM, "Object_Health: DBS 1", and make sure that it's allocated identically across all 8 WRAM banks, at the exact same address. Then, to switch between accessing the health of the 8 different gameobjects, we bankswitch the WRAM bank from 0 to 7 in lieu of using the X register, and use a vanilla "LDA Object_Health" to load the health value. That should give you the idea of how this works and why.
Now, at this point the technique might seem like overkill. Why go through so much trouble? Let's dig into the advantages!
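Here's a minimal sketch of what that could look like on MMC5. The register addresses ($5102/$5103 for the WRAM write protect, $5113 for the 8K WRAM bank at $6000-$7FFF) are the standard MMC5 ones, but the constant names are my own, and I'm assuming a board or emulator setup where writing 0 to 7 to $5113 really does select eight distinct banks. Object_Health is assumed to be placed somewhere inside $6000-$7FFF.

```
MMC5_RAM_PROTECT1 = $5102   ; write %10 here ...
MMC5_RAM_PROTECT2 = $5103   ; ... and %01 here to allow WRAM writes
MMC5_WRAM_BANK    = $5113   ; selects the 8K WRAM bank at $6000-$7FFF

; One byte, at the same address in every WRAM bank
; (assumed to be located inside $6000-$7FFF).
Object_Health:  DBS 1

; Once at startup: enable writes to WRAM.
    LDA #$02
    STA MMC5_RAM_PROTECT1
    LDA #$01
    STA MMC5_RAM_PROTECT2

; Take one point of health from gameobject 3.
    LDA #3
    STA MMC5_WRAM_BANK      ; the bank number plays the role of the index
    LDA Object_Health       ; plain absolute read, X and Y stay free
    SEC
    SBC #1
    STA Object_Health
```

The bank write costs a few cycles up front, but every access after it is non-indexed, which is where the savings come from when you touch many attributes of the same object.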
- When updating, animating, and moving a gameobject, you frequently access that gameobject's values. You'll probably be accessing values like Object_XPosition,X, Object_YPosition,X, Object_Width,X, Object_Height,X, Object_XVelocity,X, Object_YVelocity,X, Object_Attributes,X, Object_Direction,X, and you'll probably be accessing them quite a lot. Normally to do this, you have to either ensure that X stays unmolested as the object index, or you have to copy out all those values to the Zero Page RAM before working on them. But, with the WRAM bankswitching you free up the X register, as the currently active WRAM Bank takes up the role of being your "index". That means you are free to write better and faster code utilizing the A, X and Y registers freely.
- A common technique for gameobject behavior is to have pointers/addresses to subroutines stored in RAM, ready to be called regularly or under certain conditions. For instance, you might have "Object_DeathAddress: DSW 8" that you call through an indirect jump or the RTS trick when the object dies. Or you might have one for "update" that you call every frame. With the WRAM bankswitching, you can do it a lot more efficiently. Just call "JSR Object_DeathAddress" and you are done. Note that JSR jumps to the label itself, so for this to work the slot has to hold executable code, for example a three-byte JMP instruction (opcode $4C followed by the target address), rather than a bare two-byte pointer. If you'd rather store only the pointer, "JMP (Object_DeathAddress)" works too. Either way, since we don't need absolute indexed mode (which JSR and JMP lack), we don't need to jump through all those hoops to call the subroutine.
- While switching RAM banks to select between gameobjects might sound annoying and inefficient, you actually do this a lot less than you'd think. When looping over and updating the 8 gameobjects every frame, we start by switching the RAM bank at the top of the loop, and that's it. We can now invoke all manner of code, for moving the object, checking object collisions against tiles, drawing out sprites, and all of the code simply assumes that the various Object_* labels are pointing to the one object they should be working on. It actually makes your code a lot more clean and straightforward.
- One thing you might get hung up on is when two gameobjects need to interact, for example to check collision. There are a number of easy ways to solve this. If your mapper allows for more than one place to bankswitch WRAM (like MMC5 does), simply bankswitch in two banks of WRAM at the same time in two separate locations. You can have gameobject 1 at $6000-$7FFF and gameobject 2 in an 8K slot somewhere in $8000-$DFFF (for example $8000-$9FFF) to easily compare them. You haven't seen a fast and clean collision check routine until you have seen one that doesn't need register X for gameobject 1 and register Y for gameobject 2. If your mapper can only bankswitch WRAM in one location, you can quickly copy gameobject 2's values temporarily into Zero Page RAM for the same experience.
- With "LDA MyAddress,X" you can access 256 bytes indexed by X. Sometimes 256 is clearly not enough though, such as for the tiledata of your level. In cases like that, you gotta use some other trick like "LDA (CalculatedAddress),Y" indirect addressing, which is expensive and clunky to set up for each access. However, with WRAM bankswitching, you can stack 256-byte pages in parallel across the WRAM banks for up to 256*8 = 2048 bytes while still using "LDA MyAddress,X", at almost the same cost. It's not enough for everything but it's still neat.
- In Super Mario Bros 3, the tiledata buffer is 6480 bytes (27 rows * 16 columns * 15 screens). Let's imagine that for whatever reason, we need to increase the data size of each tile from 1 byte to 2 bytes, so that each tile has an additional byte of metadata. That's going to be extremely hard. Even if we adjust all our code so that the tile calculations take this new double offset into account, the fact is that we don't have 12960 contiguous WRAM bytes. But you guessed it: just put two 6480-byte tiledata buffers in parallel in two WRAM banks, so that to access the second metadata byte we just switch the bank after accessing the first byte.
Of course, there are limitations too:
- Obviously, this doesn't work on most mappers. Therefore, it's always going to be a niche technique.
- While absolute indexed LDA can index up to 256 different gameobjects, this technique can only do 8. That might not be enough for your needs. Super Mario Bros 3 only allows for 4 enemies onscreen at once, but it obviously depends on each individual project. Luckily you can do the trick more than once, and have 8 important gameobjects, 8 projectile objects, 8 sfx objects, etc.
- Maybe you have other important things in WRAM that you occasionally need to switch to, such as tiledata. In such cases you have to back up which gameobject was currently "active" in your code (akin to how you'd back up register X in the vanilla setup).
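To tie a few of the points above together, here's a rough sketch of an update loop, a temporary switch to a tiledata bank, and a second WRAM window for comparing two gameobjects. Everything named here is hypothetical except the MMC5 register addresses: ActiveObject, TILEDATA_BANK, UpdateObject, ReadTiles and Other_Health are made up for illustration. It assumes PRG mode 3 (four 8K slots, where $5114 controls $8000-$9FFF and a value with bit 7 clear maps RAM instead of ROM), that Object_Health lives in $6000-$7FFF, and that this code itself is not running from $8000-$9FFF.

```
MMC5_PRG_MODE   = $5100   ; %11 = mode 3, four 8K PRG slots
MMC5_WRAM_BANK  = $5113   ; 8K WRAM bank at $6000-$7FFF
MMC5_PRG_8000   = $5114   ; 8K bank at $8000-$9FFF (mode 3), bit 7 clear = RAM

ActiveObject    = $10     ; hypothetical zero page byte: current object bank
TILEDATA_BANK   = 0       ; hypothetical: WRAM bank that also holds tiledata

; The same variable as seen through the second window, 8K higher up.
Other_Health    = Object_Health + $2000

; Update all 8 gameobjects. The bank is switched once per object.
UpdateAllObjects:
    LDA #0
UpdateLoop:
    STA ActiveObject        ; remember who is active (our "backup")
    STA MMC5_WRAM_BANK
    JSR UpdateObject        ; hypothetical; uses plain Object_* labels
    LDA ActiveObject
    CLC
    ADC #1
    CMP #8
    BNE UpdateLoop
    RTS

; Borrow the $6000 window for tiledata, then restore the active object.
PeekTiles:
    LDA #TILEDATA_BANK
    STA MMC5_WRAM_BANK
    JSR ReadTiles           ; hypothetical tile lookup routine
    LDA ActiveObject
    STA MMC5_WRAM_BANK      ; active object is back at $6000-$7FFF
    RTS

; Map another gameobject (index in A, 0-7) into $8000-$9FFF and
; compare health values without touching X or Y.
; Returns with carry set if the active object's health >= the other's.
CompareHealth:
    AND #$07                ; keep bit 7 clear so RAM is mapped, not ROM
    STA MMC5_PRG_8000
    LDA Object_Health       ; active object, via $6000-$7FFF
    CMP Other_Health        ; other object, via $8000-$9FFF
    RTS
```

MMC5_PRG_MODE is only listed for reference; you'd write %11 to it once at startup. A real collision routine would compare positions and sizes the same way, reading one object through each window.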