Why I've decided an x/y indexed object table is best

Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Post by Drew Sebastino »

Being the indecisive person I am, I've gone back and forth on whether I want my object table to be indexed by direct page or by X/Y. I've thought hard about it, and now I'm curious why I ever wanted it the other way, because X/Y indexing is faster and overall just better for every application I can think of:

For any routine where you are constantly searching through object slots, each load or store will take an extra cycle. (It takes 2 extra cycles if the index registers are 16-bit, but you only need 8-bit here, because 128 objects with 2 bytes per object in each table will just fit in 256 bytes.) In exchange, you only need two INX or INY instructions to move to the next slot instead of TDC, CLC, ADC, TCD. Heck, the accumulator might not even be 16-bit at that point, so that's an extra instruction or two (REP/SEP) on top of that.
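To put rough numbers on that trade-off, here is a minimal Python cycle-count sketch. It assumes native-mode 65816 timings counted in CPU cycles (memory speed ignored), LDA loads with no page crossings, a table split into per-field arrays with 2-byte entries for the X version (two INX per slot), and a 16-bit ADC of some slot size for the direct page version. The `dl_nonzero` flag models the extra cycle the 65816 charges when the low byte of D is not zero, which is usually the case once D is stepped through object slots. All names are illustrative.

```python
# Rough 65816 (native mode) cycle model for walking an object table.
# Assumptions: CPU cycles only (no memory-speed effects), no page
# crossings on abs,x, and every field access is an LDA.

INX = 2          # inx
TDC = TCD = 2    # tdc / tcd
CLC = 2          # clc
REP = SEP = 3    # rep #$20 / sep #$20


def lda_dp(a16: bool, dl_nonzero: bool) -> int:
    """lda dp: 3 cycles, +1 for 16-bit A, +1 if the low byte of D is non-zero."""
    return 3 + int(a16) + int(dl_nonzero)


def lda_abs_x(a16: bool, x16: bool) -> int:
    """lda abs,x: 4 cycles, +1 for 16-bit A, +1 for a 16-bit index register."""
    return 4 + int(a16) + int(x16)


def per_slot_cycles(n_loads: int, mode: str, a16: bool = False,
                    x16: bool = False, dl_nonzero: bool = False) -> int:
    """Cycles for n_loads field reads from one slot plus the step to the next slot."""
    if mode == "x":
        # lda field,x for each field, then inx / inx (2-byte entries per field).
        return n_loads * lda_abs_x(a16, x16) + 2 * INX
    if mode == "dp":
        # lda field for each field, then tdc / clc / adc #SLOT_SIZE / tcd.
        step = TDC + CLC + 3 + TCD  # adc #imm with a 16-bit accumulator is 3 cycles
        if not a16:
            step += REP + SEP       # widen the accumulator for the add, then restore it
        return n_loads * lda_dp(a16, dl_nonzero) + step
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print(n, per_slot_cycles(n, "x"), per_slot_cycles(n, "dp"),
              per_slot_cycles(n, "dp", dl_nonzero=True))
```

Under these assumptions, two INX (4 cycles) against REP/TDC/CLC/ADC/TCD/SEP (15 cycles) means the direct page version only pulls ahead at 12 or more loads per slot, and never does if D's low byte is non-zero.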

Direct page can be freed up to act as a useful 16-bit index register, and as I said earlier, there's no reason to index the object table by a 16-bit number. I've had situations where X and Y were both 16-bit when only one of them needed to be, because you can't set their widths separately. Now, though, you just index whatever needs a 16-bit offset through direct page.

The object table no longer has to be in the first 8KB of RAM, and parts of it can even sit outside the first 8KB.

So yeah, I'm pretty dumb. :lol: I found I could make my metasprite routine significantly faster, because most of what that routine loads is metasprite data, which will now be accessed through direct page. That's 1 cycle faster per load, but the data all has to be in 32KB of ROM. (I could actually shrink each sprite in the metasprite from 8 to 6 bytes with a little more processing time, and it would still be much faster than indexing the metasprite data by X or Y.) That means X and Y can now be 8-bit, because the object table can be indexed by an 8-bit number, and I could easily index OAM by an 8-bit number if I create an otherwise duplicate routine, which I'm more than willing to do to save CPU time.
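The 8KB and 32KB limits above both come from direct page always pointing into bank $00. A small classifier makes this concrete. It is a simplified sketch assuming a LoROM cartridge (bank $00 $8000-$FFFF maps the first 32KB of ROM); everything from $2000 to $7FFF is lumped together as registers or unmapped space, and the function names are hypothetical.

```python
from typing import Optional

# Simplified map of bank $00 on a LoROM Super NES, where direct page always points.
# Assumption: LoROM mapping; $2000-$7FFF (registers, expansion, unmapped) lumped together.


def bank0_region(addr: int) -> str:
    addr &= 0xFFFF
    if addr < 0x2000:
        return "wram"      # mirror of the first 8KB of WRAM ($7E:0000-$7E:1FFF)
    if addr < 0x8000:
        return "io/other"  # hardware registers, expansion, unmapped
    return "rom"           # LoROM: first 32KB of ROM


def dp_window_region(d: int) -> Optional[str]:
    """Region covered by the 256-byte window D..D+$FF, or None if it straddles regions."""
    d &= 0xFFFF
    first = bank0_region(d)
    last = bank0_region((d + 0xFF) & 0xFFFF)  # native-mode direct page wraps within bank $00
    return first if first == last else None


if __name__ == "__main__":
    for d in (0x0000, 0x1F00, 0x1F80, 0x9000):
        print(hex(d), dp_window_region(d))
```

For example, a direct page at $1F00 stays in low RAM, while one at $1F80 runs past the end of the 8KB mirror and is reported as straddling.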
Last edited by Drew Sebastino on Sun Sep 25, 2016 10:09 am, edited 1 time in total.
dougeff
Posts: 3079
Joined: Fri May 08, 2015 7:17 pm

Post by dougeff »

Some of what you said is confusing to me.

"Indexed by direct page" instead of x/y.

Direct page just gives a constant value to the upper byte, so you can access an address with just the lower byte. But to 'index', you still need to use X or Y. Even indirect indexed addressing needs a Y of zero.

By index, I mean 'treat an address as the start of an array, using x or y as the offset # of bytes'.

And...to be blunt...this is the kind of 'worrying about efficiency' issue that will slow down development. I know that you (Espozo) are trying to squeeze as many sprite objects onto the screen as humanly possible. But I personally never worry about which indexing method is slightly faster than the next.
nesdoug.com -- blog/tutorial on programming for the NES

Post by Drew Sebastino »

dougeff wrote:
Some of what you said is confusing to me.
"Indexed by direct page" instead of x/y.
Direct page just gives a constant value to the upper byte, so you can access an address with just the lower byte. But to 'index', you still need to use X or Y. Even indirect indexed addressing needs a Y of zero.
By index, I mean 'treat an address as the start of an array, using x or y as the offset # of bytes'.
I'm actually not quite sure what you're trying to say. :lol: Unless I'm mistaken, direct page is a 16-bit register whose value is added to the 8-bit operand address. While you can only reach 256 bytes from wherever direct page points, for 99% of applications where you need a 16-bit offset, this shouldn't matter.

So yeah, it's not quite as good as a 16-bit X or Y, since a 16-bit number is only added to an 8-bit number (not a 16-bit one), but it's also 2 cycles faster than absolute indexed addressing with a 16-bit index, which adds up very quickly.
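Since the two posts above use "index" in different senses, a minimal Python sketch of how the 65816 forms effective addresses may help. It follows the native-mode rules as I understand them: direct page addresses are D plus an 8-bit offset, wrapped within bank $00; `abs,X` adds X to a 24-bit address built from the data bank register (DBR) and may carry into the next bank; `(dp),Y` reads a 16-bit pointer from the direct page and adds Y. Function names and the `mem` dictionary are illustrative only, and emulation-mode wrapping is not modeled.

```python
# Effective-address helpers for a few 65816 native-mode addressing modes.
# Addresses are 24-bit integers: bank << 16 | offset.


def ea_dp(d: int, offset: int) -> int:
    """lda $nn: D (16-bit) + 8-bit offset, always in bank $00."""
    return (d + (offset & 0xFF)) & 0xFFFF


def ea_dp_x(d: int, offset: int, x: int) -> int:
    """lda $nn,x: D + offset + X, wrapped within bank $00."""
    return (d + (offset & 0xFF) + x) & 0xFFFF


def ea_abs_x(dbr: int, addr: int, x: int) -> int:
    """lda $nnnn,x: (DBR:addr) + X; may carry into the next bank."""
    return (((dbr << 16) | (addr & 0xFFFF)) + x) & 0xFFFFFF


def ea_dp_ind_y(mem: dict, d: int, offset: int, dbr: int, y: int) -> int:
    """lda ($nn),y: read a 16-bit pointer at D+offset in bank $00, then DBR:ptr + Y."""
    p = ea_dp(d, offset)
    ptr = mem.get(p, 0) | (mem.get((p + 1) & 0xFFFF, 0) << 8)
    return (((dbr << 16) | ptr) + y) & 0xFFFFFF


if __name__ == "__main__":
    print(hex(ea_dp(0x1234, 0x10)))           # 0x1244: D acts as a 16-bit base
    print(hex(ea_abs_x(0x7E, 0x2000, 0x10)))  # 0x7e2010
    mem = {0x0010: 0x00, 0x0011: 0x80}
    print(hex(ea_dp_ind_y(mem, 0x0000, 0x10, 0x7E, 0)))  # 0x7e8000
```

In the direct page case the operand only supplies 0 to 255 and D supplies the 16-bit base; in the `(dp),Y` case Y still does the indexing, which is the distinction dougeff is drawing.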
dougeff wrote:
And...to be blunt...this is the kind of 'worrying about efficiency' issue that will slow down development. I know that you (Espozo) are trying to squeeze as many sprite objects onto the screen as humanly possible. But I personally never worry about which indexing method is slightly faster than the next.
It's true that I'm being kind of a freak about making things as fast as possible and not really getting anywhere (you can also thank school for that), but this is a pretty major decision. If I ever decide down the road that I want to change it (and like I said, there are several advantages to doing so), I'll have to fix every routine that uses the object table, which is a hell of a lot.
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Post by psycopathicteen »

The downside to using dp as index is you can't increment/decrement in one instruction.

Post by Drew Sebastino »

True. The speed advantage should still be greater, though, unless it's something like searching for VRAM slots, where you're only doing one instruction and then incrementing. Most of the time, though, you can use the other index register for that.

Here's the "ideal" way each register will be used. X and Y are 8-bit. (X and Y are interchangeable here. There's some difference between them, but I forgot what it is, so I'm not taking it into account.)

x or y: object table
x or y: ram table
direct page: large rom table (like metasprite data)

If nothing needs two additional index registers (I'm assuming we always need one for object slots), then the second one can be either X/Y or direct page. Direct page is faster, except when you have to increment or decrement it. Also, if direct page is pointed somewhere else, loading variables from the first 256 bytes of RAM (where I keep it by default) takes one extra cycle, since they then have to be accessed with absolute addressing.

Post by dougeff »

Drew Sebastino wrote:
direct page is a 16 bit register
I guess I had always seen a round number stored to the direct page register: 0100, 3f00, fe00, etc.

This is probably why...

Fragment 11.4
You will for the most part, however, want to set the direct page to begin on a boundary: it saves one cycle for every direct page addressing operation. This is because the processor design includes logic that, when the direct page register’s low byte is zero, concatenates the direct page register’s high byte to the direct page offset – instead of adding the offset to the entire direct page register – to form the effective direct page address; concatenation saves a cycle over addition.
-- Programming the 65816, WDC, 2007
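The point of the quoted passage is that when D's low byte is zero, adding an 8-bit offset can never carry into the high byte, so placing D's high byte above the offset gives the same address as a full addition. A short Python check of that equivalence (it models the arithmetic only, not the hardware timing; function names are illustrative):

```python
def ea_add(d: int, offset: int) -> int:
    """General case: D + 8-bit offset, wrapped to 16 bits (the slower path)."""
    return (d + (offset & 0xFF)) & 0xFFFF


def ea_concat(d: int, offset: int) -> int:
    """Shortcut: D's high byte followed by the offset as the low byte."""
    return (d & 0xFF00) | (offset & 0xFF)


def concat_is_valid(d: int) -> bool:
    """True if concatenation matches addition for every possible 8-bit offset."""
    return all(ea_add(d, off) == ea_concat(d, off) for off in range(256))


if __name__ == "__main__":
    for d in (0x0000, 0x0100, 0x0120, 0x1F00):
        print(hex(d), concat_is_valid(d))
```

Only page-aligned values of D pass, which is why the cheaper path is limited to a zero low byte.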

Post by Drew Sebastino »

Well, that's weird... That makes me sad, although it's still faster: one cycle less than a 16-bit index, which is pretty much what it's equivalent to.

Post by Drew Sebastino »

I'm not going to lie: after helping dougeff, I realized that I had completely forgotten about 24-bit (long) addressing. That would have saved me a lot of headaches. :lol: I guess it's better the way I have it, though, because long addressing is a cycle slower than absolute addressing. I don't actually have a single instance of long addressing in my code. I might be able to make it faster in a couple of places where I unnecessarily switched banks.
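For reference, the per-access costs compared in this thread can be put in one table. This is a sketch based on the standard 65816 cycle rules for LDA with an 8-bit accumulator in native mode, counted in CPU cycles with memory speed ignored. It is not measured data, and the mode labels are just descriptive strings.

```python
# Base cycle counts for LDA with an 8-bit accumulator (native mode).
# Add 1 to every entry for a 16-bit accumulator.
LDA_CYCLES = {
    "dp, DL = 0": 3,
    "dp, DL != 0": 4,        # extra cycle when D's low byte is non-zero
    "abs": 4,
    "abs,x (8-bit X)": 4,    # +1 more if the index crosses a page
    "abs,x (16-bit X)": 5,
    "long": 5,
    "long,x": 5,             # no extra cycle for the index
}


def savings(a: str, b: str, accesses: int = 1) -> int:
    """Cycles saved by using mode a instead of mode b for the given number of accesses."""
    return (LDA_CYCLES[b] - LDA_CYCLES[a]) * accesses


if __name__ == "__main__":
    for mode, cycles in LDA_CYCLES.items():
        print(f"{mode:18} {cycles}")
```

With these numbers, a page-aligned direct page saves 2 cycles over `abs,x` with a 16-bit index, a non-aligned one saves 1, and long addressing costs 1 more than absolute.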
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Absolute long on the Super NES is no worse than absolute on NES.

On a 1.8 MHz 6502 clocked by a divided 21.5 MHz crystal, reading an 8-bit variable using a,X mode costs 48 master clocks: 12 to read the opcode, 24 (2*12) to read the address, and 12 to read the data.

On a 3.6 MHz 65816 clocked by the same crystal, reading a 16-bit variable using al,X mode costs only 40 master clocks: 6 for the opcode, 18 (3*6) for the address, and 16 (2*8) for the data.
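The arithmetic above can be reproduced by summing the master clocks of each byte fetched. This sketch assumes, as the post does, 12 master clocks per NES CPU cycle, and on the Super NES 6 master clocks per FastROM byte and 8 per WRAM byte; it also assumes neither instruction has internal (non-bus) cycles. The SlowROM rate of 8 clocks is added for comparison.

```python
# Master-clock cost of one load, summed per bus access.
# Assumptions: NES CPU cycle = 12 master clocks; SNES FastROM byte = 6,
# SlowROM byte = 8, WRAM byte = 8 master clocks; no internal cycles.

NES_CYCLE = 12
SNES_FASTROM = 6
SNES_SLOWROM = 8
SNES_WRAM = 8


def nes_lda_abs_x() -> int:
    """6502 lda abs,x without page crossing: opcode, 2 address bytes, 1 data byte."""
    return (1 + 2 + 1) * NES_CYCLE


def snes_lda_long_x_16bit(rom_speed: int = SNES_FASTROM) -> int:
    """65816 lda long,x with 16-bit A: opcode + 3 address bytes from ROM, 2 data bytes from WRAM."""
    return (1 + 3) * rom_speed + 2 * SNES_WRAM


if __name__ == "__main__":
    print("NES  lda abs,x           :", nes_lda_abs_x())
    print("SNES lda long,x (FastROM):", snes_lda_long_x_16bit())
    print("SNES lda long,x (SlowROM):", snes_lda_long_x_16bit(SNES_SLOWROM))
```

With FastROM this gives 48 versus 40 master clocks; with SlowROM the Super NES figure rises to 48, which still matches the NES.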

Post by Drew Sebastino »

tepples wrote:
Absolute long on the Super NES is no worse than absolute on NES.
A Porsche with its tires removed is no worse than a brick. :lol: