It is currently Tue Jul 16, 2019 5:43 pm

All times are UTC - 7 hours





Post new topic Reply to topic  [ 28 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Sun Jun 16, 2019 12:50 pm 
Offline
User avatar

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1884
I'm tying to access an array with a calculated index.

Variables:
Code:
unsigned char x, y, index, value;
unsigned char array[100];

This code of course transforms the index into an integer and therefore accesses the array in a complicated way, by copying it to ptr1, using ADC to calculate the index and all that stuff:
Code:
void Test1(void)
{
    array[y * 16 + x] = value;
}

This one makes the array access simple since the index is just a byte:
Code:
void Test2(void)
{
    index = y * 16 + x;
    array[index] = value;
}
-->
Code:
   ldy     _index
   lda     _value
   sta     _array,y

However, these two make it complicated again, even though the compiler should know that the index is a single byte:
Code:
void Test3(void)
{
    array[(unsigned char)(y * 16 + x)] = value;
}

void Test4(void)
{
    array[index = y * 16 + x] = value;
}

Is there a way to get the compiler to access the array in the simple way with a byte index, without having to put the index into a variable first?

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


Top
 Profile  
 
PostPosted: Sun Jun 16, 2019 1:46 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7526
Location: Canada
Hmm, I did something like that in NESert golfing. The sprite_add function casted index expressions like that to access the oam buffer... I never checked its generated code, but am disappointed to notice now that it promoted them all to 16-bit, even the +0!

It seems that if the expression is anything more complicated than a single term it fails to reduce it from the 16-bit pointer access.

I dunno, the ability of this compiler to optimize stuff like this is pretty poor. I suspect the cases where it does recognize it can use an absolute index are due to an optimizer pass that looks for very specific patterns, and the addition of extra terms breaks up that pattern. Maybe more patterns could be recognized if you wanted to research that and add code to the compiler to address them, but in general it's not a very versatile system.


Trying to do this with C syntax mangling might be squeezing blood from a stone... usually I would just ignore the problem until it comes up as a real performance issue (i.e. why I didn't notice sprite_add wasn't more efficient: it was still efficient enough for that game), and then when it does I rewrite the affected thing in assembly.

Storing the computed index into a char variable that has #pragma zpsym is maybe about as reasonable a workaround as you'll get if your solution is restricted to C-side only.


Top
 Profile  
 
PostPosted: Sun Jun 16, 2019 1:53 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7526
Location: Canada
Looking at that example I linked, I found:

Code:
// original: generates 16-bit indirect pointer for each store
void sprite_add(uint8 tile, uint8 x, uint8 y, uint8 attrib)
{
   oam[(uint8)(oam_pos+2)] = attrib;
   oam[(uint8)(oam_pos+0)] = (uint8)(y-1);
   oam[(uint8)(oam_pos+3)] = x;
   oam[(uint8)(oam_pos+1)] = tile;
   oam_pos += 4;
}

// better: gets an absolute indexed store for 3 of them, at least
void sprite_add(uint8 tile, uint8 x, uint8 y, uint8 attrib)
{
   i = oam_pos; // i is uint8 with #pragma zpsym
   oam[i] = (uint8)(y-1); ++i; // this one still fails to optimize because the y-1 expression confuses the pattern
   oam[i] = tile; ++i;
   oam[i] = attrib; ++i;
   oam[i] = x;
   oam_pos = i+1;
}


Ultimately though, I think the second solution is kinda worthless. It would have taken much less time to write this function in assembly than trying to deduce how to mangle my C to fit the compiler's optimization patterns, and it would perform better too.

As a shortcut I could even just copy and paste the generated assembly for the bad function, so I wouldn't have to look up how to fetch the parameters off the stack, etc. and just fix the actual stores.


Top
 Profile  
 
PostPosted: Mon Jun 17, 2019 8:09 am 
Offline
User avatar

Joined: Fri Nov 24, 2017 2:40 pm
Posts: 168
For the OPs question, I've never found a particularly better way to do it either. I have a global zero page variable 'idx' that I use exclusively as a temporary value for indexes. You can use it with the comma operator to make it a little less obtuse, if a little more obscure. I like doing it this way because I can't forget that it's not really a local variable if I call another function that uses it.

Code:
some_array[idx = expression, idx] = whatever;


You still get a wasted zero page load/store, but I think that's the best you are going to get.

@rainwarrior You can optimize your function in C the same way you do it in assembly with an absolute address + an offset. I use the following pattern all the time. It produces exactly the assembly you would expect.

Code:
// better: gets an absolute indexed store for 3 of them, at least
void sprite_add(uint8 tile, uint8 x, uint8 y, uint8 attrib)
{
   i = oam_pos; // i is uint8 with #pragma zpsym
   (oam + 0)[i] = y-1;
   (oam + 1)[i] = tile;
   (oam + 2)[i] = attrib;
   (oam + 3)[i] = x;
   oam_pos = i+4;
}


Top
 Profile  
 
PostPosted: Mon Jun 17, 2019 11:40 am 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7526
Location: Canada
Ah! Hadn't thought of that one. That's almost how it would look in assemble, heh.


Top
 Profile  
 
PostPosted: Mon Jun 17, 2019 4:02 pm 
Offline
User avatar

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1884
slembcke wrote:
Code:
some_array[idx = expression, idx] = whatever;

That's actually a pretty creative use. Thanks for the suggestion.

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


Top
 Profile  
 
PostPosted: Mon Jun 17, 2019 4:27 pm 
Offline
User avatar

Joined: Fri May 08, 2015 7:17 pm
Posts: 2534
Location: DIGDUG
If optimization is important (and it always is) I would write all this in assembly.

Perhaps an inline assembly macro.

I've seen na_th_an do something like this.

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Top
 Profile  
 
PostPosted: Tue Jun 18, 2019 10:03 am 
Offline
User avatar

Joined: Fri Nov 24, 2017 2:40 pm
Posts: 168
Related tip:
you can use the pseudo variables __A__ and __AX__ to make shuffling data in and out of inline assembly much easier. Quite handy when you pair it with (abuse?) the comma operator.

Dumb example:
Code:
#define DUMB_MACRO(value) (__A__ = value, asm("xor #$0F"), __A__)
my_variable = DUMB_MACRO(45);


Using those pseudo variables make the compiler select the correct way to load the value you wanted into the register instead of needing to do it with those % format chars. Then at the end of the sequence, you tell the compiler the result is already in registers and handle the store too.


Top
 Profile  
 
PostPosted: Tue Jun 18, 2019 10:17 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 956
Is xor a valid alias to eor?


Top
 Profile  
 
PostPosted: Tue Jun 18, 2019 11:04 am 
Offline
User avatar

Joined: Fri May 08, 2015 7:17 pm
Posts: 2534
Location: DIGDUG
no.

But ca65 has a .XOR operator (for calculations at assemble time of constants)

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Top
 Profile  
 
PostPosted: Tue Jun 18, 2019 11:20 pm 
Offline
User avatar

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1884
I didn't know about __A__. This might come in handy.


By the way, I noticed another thing:
Not only does a calculated array index produce bad code. Even setting a calculated value to the array isn't really effective, even if the array is a byte type.


Stuff like this is fine (we assume all variables are global or static):
array[index] = value;

But this isn't:
array[index] = value + 1;

And this time, the comma operator won't help you:
array[index] = (temp = value + 1, temp);

You have to do this for nice short code:
temp = value + 1;
array[index] = temp;


However, the code is perfectly fine when you use a constant array index since it acts like a regular non-array variable. And for some reason, regular non-array variables don't make such a fuss:
array[5] = value + 1;

Using __A__ doesn't help either:
__A__ = value + 1;
array[index] = __A__;


And type conversion are unhelpfull as well:
array[index] = (unsigned char)(value + 1);

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


Top
 Profile  
 
PostPosted: Wed Jun 19, 2019 7:03 am 
Offline
User avatar

Joined: Sat Jan 09, 2016 9:21 pm
Posts: 618
Location: Central Illinois, USA
Ugh. I think the important lesson here seems to be: cc65 isn't very efficient. If performance matters in the section of code you're working on, assembly is the answer.

_________________
My games: http://www.bitethechili.com


Top
 Profile  
 
PostPosted: Wed Jun 19, 2019 8:05 am 
Offline
User avatar

Joined: Fri Nov 24, 2017 2:40 pm
Posts: 168
calima wrote:
Is xor a valid alias to eor?


Uh... probably not. That's just the sort of shoddy off the top of my head examples I'm known for. LOL


Top
 Profile  
 
PostPosted: Wed Jun 19, 2019 9:30 am 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7526
Location: Canada
DRW wrote:
But this isn't:
array[index] = value + 1;

So, something I was saying above, cc65's optimizations don't occur at a high level. It only has one way to generate the base assembly code, and then only after it's already in assembly form it matches patterns in the assembly code to reduce them.

Basically what you're looking at is that the code generated by assigning a value to an array element generates a known pattern. As soon as you put some extra operations on the value to be assigned, more code ends up inside that doesn't match the pattern. The problem is the order in which things are done, I guess. It creates the pointer to the array, constructs the expression, then assigns it. When the code in the middle of this gets more complex, it can't match the pattern anymore.

You can use --debug-opt-output to observe the process if you're curious.

So... if you expect cc65 to ever be able to handle this optimization step, it probably either requires a whole new high-level optimization system (big overhaul) or a bunch of pattern matches for every kind of intermediate expression you can think of (very cumbersome). I feel like the latter isn't worth doing, and the former requires someone with a lot of interest.


Top
 Profile  
 
PostPosted: Wed Jun 19, 2019 12:00 pm 
Offline
User avatar

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1884
gauauu wrote:
Ugh. I think the important lesson here seems to be: cc65 isn't very efficient. If performance matters in the section of code you're working on, assembly is the answer.

Well, performance matters everywhere in the actual action gameplay. But I still wouldn't want to write all this in pure Assembly. That's why I use inline Assembly when necessary and C for the rest. (And a bunch of low level pure Assembly code files, mostly for general purpose functions or NES-hardware-related things.)

However, arrays and pointers are really the only stuff that I noticed that are turned into unnecessarily complicated code.
Pointers are always copied to ptr1, even if your pointer is in the zeropage itself.
And arrays, as we see now, produce bad code whenever either the index or the assigned value isn't an exact byte value, but something that gets promoted to int due to calculations and stuff.

But other than arrays and pointers, I have yet to see a situation where the generated Assembly code from C is so bad that I'd have to use Assembly to cut down the required ROM size and CPU time by a huge amount.

rainwarrior wrote:
Basically what you're looking at is that the code generated by assigning a value to an array element generates a known pattern. As soon as you put some extra operations on the value to be assigned, more code ends up inside that doesn't match the pattern.

I'm just a bit confused why even the assigned value makes a difference in calculating the array itself.

The array index, o.k., this one makes sense: Having a byte index or an int index makes a huge difference in the processing. And calculations always get promoted to int, so yeah, I understand why array[byteIndex + 1] has an issue over array[byteIndex].

But I just don't understand why the assigned value makes a difference on how the array's offset is calculated.

Well, looks like I have to create a macro:
Code:
/* Only use this if the calculated value is actually calculated at runtime.
   Example: array[index] = variable + 1;
   If you assign an unaltered variable or a constant value known at compile time:
   array[index] = variable;
   array[index] = 1;
   you can access the array directly.
   This macro is only there because the generated code gets large otherwise. */
#define SetArrayValueFromCalculatedValue(array, index, calculatedValue, tempVariable)\
{\
    (tempVariable) = (calculatedValue);\
    (array)[index] = (tempVariable);\
}

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 28 posts ]  Go to page 1, 2  Next

All times are UTC - 7 hours


Who is online

Users browsing this forum: No registered users and 4 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group