All times are UTC - 7 hours



Posted: Sat Sep 24, 2016 7:25 pm
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3153
Location: Nacogdoches, Texas
Being the indecisive person I am, I've battled back and forth as to whether I want to have my object table be indexed by direct page or x/y. I've thought hard about it, and now I'm kind of curious why I ever wanted it the other way, because indexing by x/y is faster and overall just better for every application I can think of:

For any routine where you are constantly searching through object slots, each load or store will take an extra cycle (2 extra cycles if the index registers are 16 bit, but you only need 8 bit for this, because 128 objects with 2 byte object slots will just fit), but you only need to do two inx or iny instructions instead of tdc, clc, adc, tcd. Heck, the accumulator might not even be 16 bit at that point, so that's an extra instruction or two.
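As a rough sanity check on the trade-off described above, the per-slot cost of both schemes can be tallied. This is only a sketch: the cycle counts are the commonly documented 65816 figures for loads with a 16 bit accumulator and no page crossing (stores and page crossings are ignored), and every function and constant name here is made up for illustration.

```python
# Rough per-slot cycle tally for walking an object table on the 65816.
# Assumed counts (16-bit accumulator, loads only, no page crossing):
LDA_DP = 4        # lda dp, with D on a page boundary
LDA_ABS_X8 = 5    # lda abs,x with 8-bit index registers
LDA_ABS_X16 = 6   # lda abs,x with 16-bit index registers

# Cost of stepping to the next 2-byte slot.
ADVANCE_XY = 2 + 2           # inx, inx
ADVANCE_DP = 2 + 2 + 3 + 2   # tdc, clc, adc #2 (16-bit A), tcd


def slot_cost(loads_per_slot, load_cycles, advance_cycles):
    """Cycles spent on one slot: its loads plus the step to the next slot."""
    return loads_per_slot * load_cycles + advance_cycles


def break_even_loads(load_cycles_xy=LDA_ABS_X8):
    """Smallest number of loads per slot at which direct page is strictly cheaper."""
    n = 0
    while slot_cost(n, LDA_DP, ADVANCE_DP) >= slot_cost(n, load_cycles_xy, ADVANCE_XY):
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 3, 6):
        xy = slot_cost(n, LDA_ABS_X8, ADVANCE_XY)
        dp = slot_cost(n, LDA_DP, ADVANCE_DP)
        print(f"{n} loads/slot: x/y = {xy} cycles, direct page = {dp} cycles")
    print("break-even loads per slot:", break_even_loads())
```

Under these assumptions, a search loop that only touches each slot once or twice comes out ahead with x/y, and direct page only wins once a routine does six or more loads per slot with 8 bit index registers.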

Direct page can be freed up to actually be useful as a 16 bit index register, and as I said earlier, there's no reason to index the object table by a 16 bit number. I've had situations where x and y are 16 bit but only one of them has to be; it's just that you can't set their sizes separately. Now, though, you just make the thing that needs a 16 bit index be indexed by direct page.

The object table no longer has to be in the first 8KB of ram and can even have some parts of it outside the first 8KB of ram.

So yeah, I'm pretty dumb. :lol: I found I could make my metasprite routine significantly faster now, because what I'm mostly loading in that routine is metasprite data, which will now be accessed through direct page. (That's 1 cycle faster, but the data all has to be in 32KB of rom. I can actually shrink each sprite in the metasprite from 8 to 6 bytes with a little more processing time, and it's still much faster than metasprite data being indexed by x or y.) X and y can now be 8 bit, because the object table can be indexed by an 8 bit number, and I could easily index oam by an 8 bit number if I create an otherwise duplicate routine, which I'm more than willing to do to save CPU time.


Last edited by Espozo on Sun Sep 25, 2016 10:09 am, edited 1 time in total.

Posted: Sun Sep 25, 2016 4:27 am
Joined: Fri May 08, 2015 7:17 pm
Posts: 1866
Location: DIGDUG
Some of what you said is confusing to me.

"Indexed by direct page" instead of x/y.

Direct page just gives a constant value to the upper byte, so you can access an address with just the lower byte. But to 'index' still will need to use the x or y. Even an indirect index needs a y of zero.

By index, I mean 'treat an address as the start of an array, using x or y as the offset # of bytes'.

And...to be blunt...this is the kind of 'worrying about efficiency' issue that will slow down development. I know that you (Espozo) are trying to squeeze every last possible sprite object onto screen as humanly possible. But, I personally never worry about which indexing method is slightly faster than the next.

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Posted: Sun Sep 25, 2016 10:09 am
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3153
Location: Nacogdoches, Texas
dougeff wrote:
Some of what you said is confusing to me. "Indexed by direct page" instead of x/y. Direct page just gives a constant value to the upper byte, so you can access an address with just the lower byte. But to 'index' still will need to use the x or y. Even an indirect index needs a y of zero. By index, I mean 'treat an address as the start of an array, using x or y as the offset # of bytes'.

I'm actually not quite sure what you're trying to say. :lol: Unless I'm mistaken, direct page is a 16 bit register (that holds a 16 bit value, obviously) that has its number added to an 8 bit address. While you can only access 256 bytes at a time for wherever direct page is, for 99% of applications where you need a 16 bit offset, this shouldn't matter.
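To make the addressing being discussed concrete, here is a small model of how a direct page operand is resolved in native mode: the 8 bit operand is added to the 16 bit D register, and the result always lands in bank 0. The function name is hypothetical, and the sketch deliberately ignores emulation-mode quirks.

```python
def direct_page_address(d_register, operand):
    """Return the bank-0 address reached by a direct page access.

    Native-mode model: 16-bit D plus 8-bit operand, wrapping within bank 0.
    """
    if not 0 <= d_register <= 0xFFFF:
        raise ValueError("D is a 16-bit register")
    if not 0 <= operand <= 0xFF:
        raise ValueError("a direct page operand is a single byte")
    return (d_register + operand) & 0xFFFF


if __name__ == "__main__":
    # D pointing at a hypothetical table at $8123: operand $10 reaches $8133.
    print(hex(direct_page_address(0x8123, 0x10)))
```

Moving D moves the whole 256 byte window, which is why it can stand in for a 16 bit index as long as each access stays within that window.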

So yeah, it's not quite as good as a 16 bit x or y in that a 16 bit number is only added to an 8 bit number (and not a 16 bit one), but it's also 2 cycles faster, which adds up very quickly.

dougeff wrote:
And...to be blunt...this is the kind of 'worrying about efficiency' issue that will slow down development. I know that you (Espozo) are trying to squeeze every last possible sprite object onto screen as humanly possible. But, I personally never worry about which indexing method is slightly faster than the next.

While it is true I'm kind of being a freak about how to make things as fast as possible and not really getting anywhere (you can also thank school for that), this is a pretty major thing, because if I decide later down the road that I want to change (and like I said, there are several advantages to doing so), then I have to fix every routine that uses the object table, which is a hell of a lot.


Posted: Sun Sep 25, 2016 10:53 am
Joined: Wed May 19, 2010 6:12 pm
Posts: 2421
The downside to using dp as index is you can't increment/decrement in one instruction.


Posted: Sun Sep 25, 2016 11:14 am
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3153
Location: Nacogdoches, Texas
True. The speed advantage should still be greater though, unless it's something like searching for vram slots where you're only doing one instruction and then incrementing. Most of the time for this though, you can use the other index register.

Here's the "ideal" way of how each register will be used. X and y are 8 bit. (x or y are interchangeable here. There's some random difference, but I forgot, so I'm not taking that into account)

x or y: object table
x or y: ram table
direct page: large rom table (like metasprite data)

If nothing needs 2 additional index registers (I'm going to assume we always need one for object slots), then you can make the second one either x/y or direct page. Direct page is faster, except when you have to increment or decrement it, and if direct page is pointing somewhere else, loading variables from the first 256 bytes of ram (that's where I have it by default) takes one extra cycle.


Posted: Sun Sep 25, 2016 12:11 pm
Joined: Fri May 08, 2015 7:17 pm
Posts: 1866
Location: DIGDUG
Quote:
direct page is a 16 bit register

I guess I had always seen a round number stored to the direct page register... 0100, 3f00, fe00, etc.

This is probably why...
Quote:
Fragment 11.4
You will for the most part, however, want to set the direct page to begin on a page boundary: it saves one cycle for every direct page addressing operation. This is because the processor design includes logic that, when the direct page register’s low byte is zero, concatenates the direct page register’s high byte to the direct page offset – instead of adding the offset to the entire direct page register – to form the effective direct page address; concatenation saves a cycle over addition.

-- Programming the 65816, WDC, 2007
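The rule in the quoted passage can be written out as a small cycle model, next to the 16 bit index case it is being compared against in this thread. The counts are the usual documented 65816 figures for `lda`; the function names are invented for this sketch, and only loads are modeled.

```python
def lda_dp_cycles(d_register, a16=True):
    """Cycles for `lda dp`: 3, +1 for a 16-bit accumulator, +1 if D's low byte is nonzero."""
    cycles = 3
    if a16:
        cycles += 1
    if d_register & 0x00FF:
        cycles += 1  # offset has to be added to D instead of concatenated
    return cycles


def lda_abs_x_cycles(x16, a16=True, page_cross=False):
    """Cycles for `lda abs,x`: 4, +1 for 16-bit A, +1 for a 16-bit index or a page crossing."""
    cycles = 4
    if a16:
        cycles += 1
    if x16 or page_cross:
        cycles += 1
    return cycles


if __name__ == "__main__":
    print("dp, D = $0100:", lda_dp_cycles(0x0100))
    print("dp, D = $0123:", lda_dp_cycles(0x0123))
    print("abs,x with 16-bit x:", lda_abs_x_cycles(x16=True))
```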



Posted: Sun Sep 25, 2016 12:58 pm
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3153
Location: Nacogdoches, Texas
Well, that's weird... That makes me sad, although it's still faster. (one less cycle than 16 bit index, which it's pretty much equivalent to.)


Posted: Sat Oct 22, 2016 8:15 am
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3153
Location: Nacogdoches, Texas
I'm not even going to lie: after helping dougeff, I realized that I completely forgot about 24 bit addressing. That would have saved me a lot of headache. :lol: I guess it's better how I have it though, because long addressing is a cycle slower than absolute addressing. I don't actually have a single instance of long addressing in my code. I might actually be able to make it faster in a couple of places where I unnecessarily switched banks.


Posted: Sat Oct 22, 2016 11:13 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19334
Location: NE Indiana, USA (NTSC)
Absolute long on the Super NES is no worse than absolute on NES.

On a 1.8 MHz 6502 clocked by a divided 21.5 MHz crystal, reading an 8-bit variable using a,X mode costs 48 master clocks: 12 to read the opcode, 24 (2*12) to read the address, and 12 to read the data.

On a 3.6 MHz 65816 clocked by the same crystal, reading a 16-bit variable using al,X mode costs only 40 master clocks: 6 for the opcode, 18 (3*6) for the address, and 16 (2*8) for the data.
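The two totals can be checked by adding up the per-access costs given above. This sketch uses only the figures from this post (12 master clocks per access on the NES; 6 per opcode or address byte and 8 per data byte on the Super NES), and the names in it are made up.

```python
def master_clocks(accesses):
    """Sum master clocks over (access_count, clocks_per_access) pairs."""
    return sum(count * clocks for count, clocks in accesses)


# NES, a,X mode, 8-bit variable: opcode, 2 address bytes, 1 data byte.
NES_ABS_X = [(1, 12), (2, 12), (1, 12)]

# Super NES, al,X mode, 16-bit variable: opcode, 3 address bytes, 2 data bytes.
SNES_LONG_X = [(1, 6), (3, 6), (2, 8)]

if __name__ == "__main__":
    print("NES a,X:", master_clocks(NES_ABS_X))
    print("Super NES al,X:", master_clocks(SNES_LONG_X))
```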


Posted: Sat Oct 22, 2016 1:46 pm
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3153
Location: Nacogdoches, Texas
tepples wrote:
Absolute long on the Super NES is no worse than absolute on NES.

A Porsche with its tires removed is no worse than a brick. :lol:

