Why I've decided an x/y indexed object table is best
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Being the indecisive person I am, I've battled back and forth as to whether I want my object table indexed by direct page or by x/y. I've thought hard about it, and now I'm kind of curious why I ever wanted it the other way, because x/y indexing is faster and overall just better for every application I can think of:
For any routine where you are constantly searching through object slots, each load or store through x or y will take an extra cycle compared to direct page (2 extra cycles if the index registers are 16 bit, but you only need 8 bit for this, because 128 objects with 2 byte object slots will just fit in 256 bytes), but to move to the next slot you only need 2 inx or iny's instead of tdc, clc, adc, tcd. Heck, the accumulator might not even be 16 bit at that point, so that's an extra instruction or two to switch it.
Direct page can be freed up to actually be useful as a 16 bit index register, and as I said earlier, there's no reason to index the object table by a 16 bit number. I've had situations where x and y were both 16 bit even though only one of them had to be, because you can't set their sizes separately. Now, though, you just make the thing that needs a 16 bit index be indexed by direct page.
The object table no longer has to be in the first 8KB of ram and can even have some parts of it outside the first 8KB of ram.
So yeah, I'm pretty dumb. I found I could make my metasprite routine significantly faster now, because what I'm mostly loading in that routine is metasprite data, which will now be accessed through direct page. That's 1 cycle faster per load, but the data all has to be in 32KB of rom. I can actually shrink each sprite in the metasprite from 8 to 6 bytes with a little more processing time, but it's still much faster than metasprite data being indexed by x or y. X and y can now be 8 bit, because the object table can be indexed by an 8 bit number, and I could easily index oam by an 8 bit number if I create an otherwise duplicate routine, which I'm more than willing to do to save CPU time.
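To put rough numbers on the slot-stepping trade-off, here is a small Python sketch (Python is used only for the arithmetic; it is not SNES code). The instruction costs are the usual 65816 cycle counts for inx/iny, tdc, clc, adc #imm with a 16 bit accumulator, and tcd. The "one extra cycle per x/y access" figure is taken from the post as-is, and the function names and the dp_low_byte_zero parameter are made up for illustration.

```python
# Cycle counts per instruction (65816, native mode).
INX = 2        # inx or iny
TDC = 2        # tdc
CLC = 2        # clc
ADC_IMM16 = 3  # adc #imm with a 16 bit accumulator
TCD = 2        # tcd

SLOT_SIZE = 2  # bytes per object slot entry, as in the post


def advance_cost_xy(slot_size=SLOT_SIZE):
    """Cycles to step an 8 bit x or y to the next slot with inx/iny."""
    return INX * slot_size


def advance_cost_dp():
    """Cycles to step the direct page register: tdc, clc, adc #size, tcd."""
    return TDC + CLC + ADC_IMM16 + TCD


def relative_cost_xy(accesses):
    """Stepping cost plus the post's 1 extra cycle per x/y indexed access."""
    return advance_cost_xy() + accesses


def relative_cost_dp(accesses, dp_low_byte_zero):
    """Stepping cost plus 1 extra cycle per access when D's low byte is nonzero."""
    return advance_cost_dp() + (0 if dp_low_byte_zero else accesses)


def break_even_accesses():
    """Accesses per slot at which x/y stops being cheaper than an aligned direct page."""
    n = 0
    while relative_cost_xy(n) < relative_cost_dp(n, dp_low_byte_zero=True):
        n += 1
    return n


if __name__ == "__main__":
    print("step x/y:", advance_cost_xy(), "cycles")
    print("step dp: ", advance_cost_dp(), "cycles")
    print("break-even accesses per slot:", break_even_accesses())
```

By these figures, a search loop that touches each slot only once or twice comes out ahead with x/y, and the comparison only counts the cycles that differ between the two approaches, not the full cost of each load or store.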
Last edited by Drew Sebastino on Sun Sep 25, 2016 10:09 am, edited 1 time in total.
Re: Why I've decided an x/y indexed object table is best
Some of what you said is confusing to me.
"Indexed by direct page" instead of x/y.
Direct page just gives a constant value to the upper byte, so you can access an address with just the lower byte. But to 'index' you will still need to use x or y. Even an indirect index needs a y of zero.
By index, I mean 'treat an address as the start of an array, using x or y as the offset # of bytes'.
And...to be blunt...this is the kind of 'worrying about efficiency' issue that will slow down development. I know that you (Espozo) are trying to squeeze as many sprite objects onto the screen as humanly possible. But I personally never worry about which indexing method is slightly faster than the next.
nesdoug.com -- blog/tutorial on programming for the NES
- Drew Sebastino
Re: Why I've decided an x/y indexed object table is best
dougeff wrote: Some of what you said is confusing to me. "Indexed by direct page" instead of x/y. Direct page just gives a constant value to the upper byte, so you can access an address with just the lower byte. But to 'index' still will need to use the x or y. Even an indirect index needs a y of zero. By index, I mean 'treat an address as the start of an array, using x or y as the offset # of bytes'.
I'm actually not quite sure what you're trying to say. Unless I'm mistaken, direct page is a 16 bit register (that holds a 16 bit value, obviously) that has its number added to an 8 bit address. While you can only access 256 bytes at a time for wherever direct page is, for 99% of applications where you need a 16 bit offset, this shouldn't matter.
So yeah, it's not quite as good as a 16 bit x or y in that a 16 bit number is only added to an 8 bit number (and not a 16 bit one), but it's also 2 cycles faster, which adds up very quickly.
dougeff wrote: And...to be blunt...this is the kind of 'worrying about efficiency' issue that will slow down development. I know that you (Espozo) are trying to squeeze every last possible sprite object onto screen as humanly possible. But, I personally never worry about which indexing method is slightly faster than the next.
While it is true that I'm kind of being a freak about making things as fast as possible and not really getting anywhere (you can also thank school for that), this is a pretty major thing: if I decide later down the road that I want to change (and like I said, there are several advantages to doing so), then I have to fix every routine that uses the object table, which is a hell of a lot.
-
- Posts: 3140
- Joined: Wed May 19, 2010 6:12 pm
Re: Why I've decided an x/y indexed object table is best
The downside to using dp as index is you can't increment/decrement in one instruction.
- Drew Sebastino
Re: Why I've decided an x/y indexed object table is best
True. The speed advantage should still be greater though, unless it's something like searching for vram slots where you're only doing one instruction and then incrementing. Most of the time for this though, you can use the other index register.
Here's the "ideal" way each register will be used. X and y are 8 bit. (x and y are interchangeable here. There's some minor difference between them, but I forget what it is, so I'm not taking that into account.)
x or y: object table
x or y: ram table
direct page: large rom table (like metasprite data)
If nothing needs 2 additional index registers (I'm going to assume we always need one for object slots), then you can make the second one either x/y or direct page. Direct page is faster, except when you have to increment or decrement it, and if direct page is pointed somewhere else, loading variables from the first 256 bytes of ram (that's where I have it by default) takes one extra cycle.
Re: Why I've decided an x/y indexed object table is best
Drew Sebastino wrote: direct page is a 16 bit register
I guess I had always seen a round number stored to the direct page register... 0100, 3f00, fe00, etc.
This is probably why...
-- Programming the 65816, WDC, 2007, Fragment 11.4
You will for the most part, however, want to set the direct page to begin on a page boundary: it saves one
cycle for every direct page addressing operation. This is because the processor design includes logic that, when
the direct page register’s low byte is zero, concatenates the direct page register’s high byte to the direct page
offset – instead of adding the offset to the entire direct page register – to form the effective direct page address;
concatenation saves a cycle over addition.
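A minimal sketch of what that passage describes, written in Python purely to show the arithmetic. The function names are made up for illustration; the behavior modeled is that a direct page access lands in bank 0 at D plus the 8 bit offset (wrapping at 64KB), and that it costs one extra cycle whenever the low byte of D is nonzero.

```python
def direct_page_address(d, offset):
    """Bank 0 address of a direct page access: D + 8 bit offset, wrapped to 16 bits."""
    if not 0 <= d <= 0xFFFF:
        raise ValueError("D must be a 16 bit value")
    if not 0 <= offset <= 0xFF:
        raise ValueError("direct page offset must be an 8 bit value")
    return (d + offset) & 0xFFFF


def direct_page_penalty(d):
    """Extra cycles per direct page access: 1 if D's low byte is nonzero, else 0."""
    return 1 if (d & 0xFF) != 0 else 0


if __name__ == "__main__":
    for d in (0x0100, 0x3F00, 0x1234):
        addr = direct_page_address(d, 0x10)
        print(f"D=${d:04X}: offset $10 -> ${addr:04X}, "
              f"extra cycles: {direct_page_penalty(d)}")
```

So the "round numbers" like 0100 or 3f00 are the ones that avoid the penalty, while a D value that is stepped through a table in small increments will usually pay it.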
nesdoug.com -- blog/tutorial on programming for the NES
- Drew Sebastino
Re: Why I've decided an x/y indexed object table is best
Well, that's weird... That makes me sad, although it's still faster. (one less cycle than 16 bit index, which it's pretty much equivalent to.)
- Drew Sebastino
Re: Why I've decided an x/y indexed object table is best
I'm not going to lie: after helping dougeff, I realized that I had completely forgotten about 24 bit addressing. That would have saved me a lot of headache. I guess it's better how I have it though, because long addressing is a cycle slower than absolute addressing. I don't actually have a single instance of long addressing in my code. I might actually be able to make it faster in a couple of places where I unnecessarily switched banks.
Re: Why I've decided an x/y indexed object table is best
Absolute long on the Super NES is no worse than absolute on NES.
On a 1.8 MHz 6502 clocked by a divided 21.5 MHz crystal, reading an 8-bit variable using a,X mode costs 48 master clocks: 12 to read the opcode, 24 (2*12) to read the address, and 12 to read the data.
On a 3.6 MHz 65816 clocked by the same crystal, reading a 16-bit variable using al,X mode costs only 40 master clocks: 6 for the opcode, 18 (3*6) for the address, and 16 (2*8) for the data.
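The arithmetic in those two examples can be checked with a few lines of Python. This is only a sketch of the sums given in the post: the per-access clock counts (12 on the NES, 6 for the opcode and address fetches and 8 for the data accesses on the Super NES) are taken from the text, and the helper name is made up.

```python
def master_clocks(accesses):
    """Sum master clocks over (count, clocks_per_access) pairs."""
    return sum(count * clocks for count, clocks in accesses)


# 6502 on NES, lda a,X of an 8-bit variable:
# 1 opcode fetch, 2 address bytes, 1 data byte, 12 master clocks each.
nes_abs_x = master_clocks([(1, 12), (2, 12), (1, 12)])

# 65816 on Super NES, lda al,X of a 16-bit variable:
# 1 opcode fetch and 3 address bytes at 6 clocks, 2 data bytes at 8 clocks.
snes_long_x = master_clocks([(1, 6), (3, 6), (2, 8)])

if __name__ == "__main__":
    print("NES a,X, 8-bit read:        ", nes_abs_x, "master clocks")
    print("Super NES al,X, 16-bit read:", snes_long_x, "master clocks")
```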
- Drew Sebastino
Re: Why I've decided an x/y indexed object table is best
tepples wrote: Absolute long on the Super NES is no worse than absolute on NES.
A Porsche with its tires removed is no worse than a brick.