
All times are UTC - 7 hours





PostPosted: Sun Jun 24, 2018 4:42 am 

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1648
Inspired by Banshaku's recent threads, I did some analysis and found that the cc65 compiler is not very efficient when it comes to pointer access, even though, if I'm not mistaken, this has nothing to do with the architecture and could easily be avoided.

When I have this simple code snippet:
Code:
extern unsigned char *pNumber;
#pragma zpsym("pNumber")

void __fastcall__ Test(void)
{
    *pNumber = 5;
}

then this is what the compiler turns it into:

Code:
   lda     _pNumber+1
   sta     ptr1+1
   lda     _pNumber
   sta     ptr1
   lda     #$05
   ldy     #$00
   sta     (ptr1),y
   rts

My own pointer is clearly declared as being located in the zeropage:
Code:
#pragma zpsym("pNumber")
--> .importzp   _pNumber

And yet, the compiler feels the need to always copy the pointer values to its own pointer instead of simply doing this:
Code:
   lda     #$05
   ldy     #$00
   sta     (_pNumber),y
   rts

Why is this the case at all? Is there any technical reason for it or is it simply an oversight by the programmer who created the parser?

Is there any way to get the compiler to change this behavior without adding inline Assembly manually?

I compiled with
cc65 -O Test.c
and the situation is the same in the old cc65 from cc65.org as well as the newer version from github.


By the way, if you do more than one variable access, like this:
Code:
    *pNumber = 5;
    *pNumber = 6;

Guess what:
Code:
   lda     _pNumber+1
   sta     ptr1+1
   lda     _pNumber
   sta     ptr1
   lda     #$05
   ldy     #$00
   sta     (ptr1),y
   lda     _pNumber+1
   sta     ptr1+1
   lda     _pNumber
   sta     ptr1
   lda     #$06
   sta     (ptr1),y

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


PostPosted: Sun Jun 24, 2018 4:53 am 

Joined: Fri May 08, 2015 7:17 pm
Posts: 2156
Location: DIGDUG
I just avoid using pointers like this.

I only used pointers to access individual enemies: I passed the pointer as a parameter to a function that I wrote in assembly to draw that enemy's sprite.

I didn't write it until I needed to save cycles.

So, anything that takes too many cycles, I rewrote in assembly, as a fastcall function.

So basically, it translates to...

Code:
function(&enemy1);

Code:
   lda     #<(_enemy1)   ; low byte of the address goes in A
   ldx     #>(_enemy1)   ; high byte of the address goes in X
   jsr     _function

_________________
nesdoug.com -- blog/tutorial on programming for the NES


PostPosted: Sun Jun 24, 2018 5:04 am 

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1648
dougeff wrote:
I just avoid using pointers like this.

Well, sometimes you can't avoid using pointers.

(Of course, my current example of accessing a single number through a pointer would be nonsense in a real situation, but it was just a simple minimalistic example to demonstrate the concept.)

For example, my new game will have a whole bunch of enemies, so you cannot program each enemy behavior individually.
Instead, I created a script-based function. It reads the first item from an array and depending on the contents, it reads the next values in a certain way.

For example:
If the current array value is "Move forward", then read the next value as "direction" and the value after that as "number of tiles".
If, instead, the current value is "Wait", read the next value as the number of frames to wait.

Etc.
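
A minimal sketch of such a script reader in plain C, to make the idea concrete. The opcode values, the EnemyState struct and the run_command name are all made up for illustration; the actual data format of the game isn't shown here.

```c
#include <stddef.h>

/* Hypothetical opcodes: only "Move forward" and "Wait" are named in the
   description; the numeric values and CMD_END are assumptions. */
enum { CMD_END = 0, CMD_MOVE_FORWARD = 1, CMD_WAIT = 2 };

/* Hypothetical per-enemy state that the script fills in. */
typedef struct {
    unsigned char direction;
    unsigned char tiles;
    unsigned char wait_frames;
} EnemyState;

/* Executes the command at `script` and returns a pointer to the next
   command, or NULL when the script has ended (or the opcode is unknown). */
const unsigned char *run_command(const unsigned char *script, EnemyState *enemy)
{
    switch (*script++) {
    case CMD_MOVE_FORWARD:
        /* Two operands follow: direction, then number of tiles. */
        enemy->direction = *script++;
        enemy->tiles = *script++;
        return script;
    case CMD_WAIT:
        /* One operand follows: number of frames to wait. */
        enemy->wait_frames = *script++;
        return script;
    default:
        return NULL;
    }
}
```

Because each command has a different number of operands, the read position has to advance by a variable amount, which is why a pointer (or at least a running index) is hard to avoid here.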

Same with the level buildup function: Each screen is stored in an array of arbitrary size because each screen can have an arbitrary number of background objects, NPCs, enemies etc. So, I need a pointer to iterate through it until the pointer reads the screen end byte.


How would you do these things without using pointers?


dougeff wrote:
So, anything that takes too many cycles, I rewrote in assembly, as a fastcall function.

Well, yeah, writing directly in Assembly is always the best solution, but not wanting to do that is pretty much the reason why people use C to begin with.

And in the current situation, we're not even discussing something that a C compiler cannot optimize because of the architecture.
At the moment, the question is simply: Why does the compiler always copy the pointer to its own pointer? Is there any reason for it? And can it be avoided (either by command line options or by a certain code style that we simply remember to always apply to C programs for the NES)?

PostPosted: Sun Jun 24, 2018 5:29 am 

Joined: Fri May 08, 2015 7:17 pm
Posts: 2156
Location: DIGDUG
Well, I used to write inline assembly just like na_th_an's example*, but I find it "ugly" to see C code with lots of assembly.

You could write a macro that inserts inline assembly to make it "pretty" and more C like.

edit
*example
https://github.com/mojontwins/MK1_NES/b ... enengine.h

PostPosted: Sun Jun 24, 2018 6:11 am 

Joined: Mon Jan 03, 2005 10:36 am
Posts: 3090
Location: Tampere, Finland
DRW wrote:
Why is this the case at all? Is there any technical reason for it or is it simply an oversight by the programmer who created the parser?

I wouldn't expect any compiler to generate optimal code in all scenarios. If I was writing a code generator I, too, would definitely start by handling the general case (in this case, a pointer from anywhere in the memory space), and only then start thinking about case-specific optimizations like this.

(By the way, no compiler would be doing optimizations like this in the parsing phase. Parsing simply checks the input against the grammar of the language.)

_________________
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi


PostPosted: Sun Jun 24, 2018 10:18 am 

Joined: Tue Oct 06, 2015 10:16 am
Posts: 761
The compiler lacks optimizations for this case. Nothing you can do, except write a patch.


PostPosted: Sun Jun 24, 2018 1:12 pm 

Joined: Mon May 27, 2013 9:40 am
Posts: 473
I avoid using pointers in cc65 as well, as I know they tend to behave worse than arrays. Sometimes you have to, as pointed out. But it's funny how you're better off using array access when possible when targeting the 6502 via cc65, but better off using pointer-based access when possible when targeting the Z80 via z88dk or SDCC. Sometimes porting is a nightmare because of this :-D
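
To illustrate the two styles being compared, here is the same loop written once with array indexing and once with a pointer. This is only a sketch: the enemy_x array, ENEMY_COUNT and the function names are hypothetical, and which form a given compiler handles better has to be checked in its assembly output.

```c
#define ENEMY_COUNT 4   /* hypothetical table size */

static unsigned char enemy_x[ENEMY_COUNT];   /* hypothetical data */

/* Array style: a fixed table accessed with an 8-bit index. */
void move_all_indexed(void)
{
    unsigned char i;
    for (i = 0; i < ENEMY_COUNT; ++i) {
        enemy_x[i] += 1;
    }
}

/* Pointer style: walk the same table through a pointer. */
void move_all_pointer(void)
{
    unsigned char *p;
    for (p = enemy_x; p != enemy_x + ENEMY_COUNT; ++p) {
        *p += 1;
    }
}
```

Both functions do the same thing in C; the difference only shows up in the generated code.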

_________________
http://www.mojontwins.com


PostPosted: Sun Jun 24, 2018 6:15 pm 

Joined: Tue Jun 24, 2008 8:38 pm
Posts: 1630
Location: Fukuoka, Japan
@DRW

I checked the code for the array of structures. Saving the reference was not so bad, BUT accessing the data that is referenced by the pointer (2 arrays) causes the compiler to move the pointer into ptr1 again, even though it had that information in the statement just before.

I guess even though it looked "nicer" code-wise at first, I will avoid that pattern after all. I do not really need the array of structures; it just looked better to me.


PostPosted: Mon Jun 25, 2018 12:39 am 

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1648
Yeah, looks like every pointer access of any kind does that.

Unfortunately, I still need pointers if a character has a certain movement pattern that is stored in an array.

I wrote some macros for this kind of stuff now, like this:
Code:
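/* variable = zpArrayPointer[index];
   All three arguments must be global variables (required by %v),
   index must be a byte, and zpArrayPointer must be in the zeropage. */
#define AsmSetVariableFromZpArrayPointer(variable, zpArrayPointer, index)\
{\
   __asm__("LDY %v", index);\
   __asm__("LDA (%v), Y", zpArrayPointer);\
   __asm__("STA %v", variable);\
}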

PostPosted: Tue Jun 26, 2018 9:24 am 

Joined: Sat Jan 09, 2016 9:21 pm
Posts: 416
Location: Central Illinois, USA
Just to chime in because I had this same issue with my project: yes, cc65 generates terrible code for pointers. Anything using pointers in a loop will probably need to be written in assembly.

In Robo Ninja Climb, I had a simple loop with some pointers that literally used 80% of a frame with cc65's version. Rewriting in assembly with a tiny bit of optimization dropped it to less than 5% of my frame.

_________________
My games: http://www.bitethechili.com


PostPosted: Wed Jun 27, 2018 3:49 pm 

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1648
Here's another strange cc65 behavior:

This:
Code:
dest = (src + 3) >> 2;

gets turned into this:
Code:
   ldx     #$00
   lda     _src
   jsr     incax3
   jsr     shrax2
   sta     _dest

Why doesn't the compiler simply use LSR?
It creates perfectly fine code when you turn the shift operator around:
Code:
   lda     _src
   clc
   adc     #$03
   asl     a
   asl     a
   sta     _dest

And if you use the right shift operator, but remove the + 3, then it's fine as well:
Code:
   lda     _src
   lsr     a
   lsr     a
   sta     _dest

PostPosted: Wed Jun 27, 2018 3:57 pm 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6431
Location: Canada
That's actually correct. The temporary result src + 3 is implicitly a 16-bit int. The high bits of the result can matter when you shift them down, but they won't matter when you shift them up. Think of (255+3)>>2.

How does it deal with:
Code:
dest = (unsigned char)(src + 3) >> 2;
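
To make the (255+3)>>2 case concrete, here is a small sketch in standard C comparing the two expressions. The function names are made up for illustration. In the first function the sum is computed as an int, so bit 8 survives and is shifted down; in the second the cast throws bit 8 away before the shift.

```c
/* The sum is promoted to int: 255 + 3 is 258, and 258 >> 2 is 64. */
unsigned char shift_promoted(unsigned char src)
{
    return (unsigned char)((src + 3) >> 2);
}

/* The cast truncates the sum to 8 bits first: 258 becomes 2, and 2 >> 2 is 0. */
unsigned char shift_truncated(unsigned char src)
{
    return (unsigned char)((unsigned char)(src + 3) >> 2);
}
```

The two functions agree for every src up to 252 and differ only for 253, 254 and 255, where the sum no longer fits in a byte. That is the case the compiler has to stay correct for when no cast is given.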


Last edited by rainwarrior on Wed Jun 27, 2018 4:00 pm, edited 1 time in total.

PostPosted: Wed Jun 27, 2018 3:59 pm 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20291
Location: NE Indiana, USA (NTSC)
Is this also incorrect?
Code:
clc
lda src
adc #3  ; C:A ranges from 3 to 258
ror a
lsr a
sta dest


PostPosted: Wed Jun 27, 2018 4:01 pm 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6431
Location: Canada
tepples wrote:
Is this also incorrect?

No, that's fine, but that's a whole new class of optimization that you've ordered here. (Something about keeping track of not just 8 and 16 bit results, but 9 bit as well...)
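
The equivalence can be checked by modeling the instruction sequence in C. This is only a sketch of what ADC, ROR and LSR do to the accumulator and the carry flag; the function name is made up.

```c
/* C model of: clc / lda src / adc #3 / ror a / lsr a */
unsigned char shift_via_carry(unsigned char src)
{
    unsigned int sum = (unsigned int)src + 3u;        /* clc, lda src, adc #3 */
    unsigned char a = (unsigned char)sum;             /* low 8 bits end up in A */
    unsigned char carry = (unsigned char)(sum >> 8);  /* bit 8 ends up in the carry */

    a = (unsigned char)((carry << 7) | (a >> 1));     /* ror a: carry enters at bit 7 */
    a = (unsigned char)(a >> 1);                      /* lsr a: zero enters at bit 7 */
    return a;
}
```

Looping over all 256 values of src and comparing against (src + 3) >> 2 computed in a wider type shows the two always agree, since the 9-bit sum is fully preserved in carry plus A before the first shift.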


PostPosted: Wed Jun 27, 2018 4:04 pm 

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1648
Is there any way I can force the compiler to treat this as a byte?



Last edited by DRW on Wed Jun 27, 2018 4:19 pm, edited 1 time in total.
