cc65: Array access with int index even if it should be abyte

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

Moderator: Moderators

User avatar
rainwarrior
Posts: 7680
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: cc65: Array access with int index even if it should be a

Post by rainwarrior » Wed Jun 19, 2019 12:31 pm

DRW wrote:I'm just a bit confused why even the assigned value makes a difference in calculating the array itself.
Well, honestly it's not like the operation of cc65's optimizer is at all obvious. It generates the same initial assembly code whether or not -O is used, and then after doing this -O does a series of pattern match steps to refactor the already generated assembly code. It's a bit of a backward approach to the problem, and really weird. It would be much better to start optimizing at a higher level, but I'm not going to fantasize too much about that. This is what the compiler we have does.

If you want to understand what it does, like I said just above, you can use --debug-opt-output to observe the process if you're curious. It will show you exactly what the initial generated assembly code is, and every step it takes to optimize.

If you want an example:

Code: Select all

// starting C line

(oam+3)[oam_pos] = x+5;

; 1. initial generated assembly:

	lda     #<(_oam+3)
	ldx     #>(_oam+3)
	clc
	adc     _oam_pos
	bcc     L02DF
	inx
L02DF:	jsr     pushax
	ldy     #$04
	ldx     #$00
	lda     (sp),y
	jsr     incax5 ; this is the +5 operation, which critically breaks up the optimization pattern (see below)
	ldx     #$00
	ldy     #$00
	jsr     staspidx

;  2. OptAdd5

	lda     #<(_oam+3)
	ldx     #>(_oam+3)
	clc
	adc     _oam_pos
	bcc     L02DF
	inx
L02DF:	jsr     pushax
	ldy     #$04
	ldx     #$00
	lda     (sp),y
	clc
	adc     #$05 ; jsr incax5 replaced with an inline add
	ldx     #$00
	ldy     #$00
	jsr     staspidx

; 3. OptUnusedLoads

	lda     #<(_oam+3)
	ldx     #>(_oam+3)
	clc
	adc     _oam_pos
	bcc     L02DF
	inx
L02DF:	jsr     pushax
	ldy     #$04 ; unused ldx #00 eliminated
	lda     (sp),y
	clc
	adc     #$05
	ldy     #$00 ; unused ldx #00 eliminated
	jsr     staspidx

; 4. OptStackOps

	lda     #<(_oam+3)
	ldx     #>(_oam+3)
	clc
	adc     _oam_pos
	bcc     L02DF
	inx
L02DF:	sta     ptr1 ; jsr pushax / staspidx (temporary pointer on stack) replaced by keeping it in ptr1
	stx     ptr1+1
	ldy     #$02
	lda     (sp),y
	clc
	adc     #$05
	ldy     #$00
	sta     (ptr1),y
At that point, it's out of patterns to match and stops optimizing.

For comparison, without the intervening +5:

Code: Select all

// starting C line

(oam+3)[oam_pos] = x;

; 1. initial generated assembly

	lda     #<(_oam+3)
	ldx     #>(_oam+3)
	clc
	adc     _oam_pos
	bcc     L02DF
	inx
L02DF:	jsr     pushax
	ldy     #$04
	ldx     #$00
	lda     (sp),y
	ldy     #$00
	jsr     staspidx

;  2. OptPtrStore2

	lda     #<(_oam+3)
	ldx     #>(_oam+3)
	ldy     #$02
	ldx     #$00
	lda     (sp),y
	ldy     _oam_pos
	sta     _oam+3,y ; the temporary pointer on stack is eliminated, and the index add is replaced with Y index

; 3. OptUnusedLoads

	ldy     #$02 ; unused ldx #00 eliminated
	lda     (sp),y
	ldy     _oam_pos
	sta     _oam+3,y
So you can see the pattern that failed to apply is called OptPtrStore2.

The optimization patterns are each a function in the cc65 source. They tend to have a good explanation of the operation. Looking up OptPtrStore2:

Code: Select all

unsigned OptPtrStore2 (CodeSeg* S)
/* Search for the sequence:
**
**      clc
**      adc     xxx
**      bcc     L
**      inx
** L:   jsr     pushax
**      ldy     yyy
**      ldx     #$00
**      lda     (sp),y
**      ldy     #$00
**      jsr     staspidx
**
** and replace it by:
**
**      sta     ptr1
**      stx     ptr1+1
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     (ptr1),y
**
** or by
**
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     (zp),y
**
** or by
**
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     label,y
**
** or by
**
**      ldy     yyy-2
**      ldx     #$00
**      lda     (sp),y
**      ldy     xxx
**      sta     $xxxx,y
**
** depending on the code preceeding the sequence above.
*/
As you can see, this pattern was written to match specifically a simple fetch from the stack and store. There's two other variations of this type of pattern, you can read about them all in the header.

The problem is simply that each of them can only match a very simple pattern. It's looking for a specific beginning, middle, and end. You can't just start adding arbitrary extra code in the middle (i.e. the expression to be resolved), that new stuff in the middle has to fit a pattern that's known to be safe to optimize. Probably possible to write such a thing, but it's complicated and difficult... so, instead only this bunch of simpler cases were written.

When you put an expression in a temporary variable first, that code gets generated before the array access portion, so doesn't interfere with the simple "generate address, fetch, store" pattern that these are capable of matching.


Anyhow, that's just one example of how to analyze cc65's optimizer. It's really all spelled out by --debug-opt-output, so if you want to know about any specific cases you're dealing with, that's the tool to use.

Still probably easier to just rewrite offending code in assembly as needed* than it is to try and understand whatever this byzantine optimizer is doing, though.

* ...and don't worry about it when it's not needed.

User avatar
DRW
Posts: 1914
Joined: Sat Sep 07, 2013 2:59 pm

Re: cc65: Array access with int index even if it should be a

Post by DRW » Wed Jun 19, 2019 1:17 pm

Thanks for the explanation. I'll have a more detailed look at it to understand how the optimization in compilers works.
rainwarrior wrote:Still probably easier to just rewrite offending code in assembly as needed* than it is to try and understand whatever this byzantine optimizer is doing, though.

* ...and don't worry about it when it's not needed.
Yeah, that's what I'm doing anyway.
Although, in this case, I don't even need Assembly. I simply need to put a temporary variable in between and it's good enough.

Changing it into Assembly would mean that I also have to write the calculation itself in Assembly, so it would be different code for different situations. And sometimes, it is stuff like this:
array[index] = Tile16XToAbsoluteX(heroTile16X) + (width16Block - 1) * 16 + 1;
Which I don't feel at all like turning into inline Assembly.

But with my C macro, I can still write:

Code: Select all

SetArrayValueFromCalculatedValue(
    array, index,
    Tile16XToAbsoluteX(heroTile16X) + (width16Block - 1) * 16 + 1,
    temp);
The only downside is one single additional STA temp in the generated output code.
Available now: My game "City Trouble".
Sales website: https://megacatstudios.com/products/city-trouble
Gameplay: https://youtu.be/Eee0yurkIW4
Download website: http://www.denny-r-walter.de/city.htm

User avatar
rainwarrior
Posts: 7680
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: cc65: Array access with int index even if it should be a

Post by rainwarrior » Wed Jun 19, 2019 1:25 pm

So, like, to address this specific case, you might be able to augment each of the OptPtrStore functions to look for 1 or 2 extra instructions between the fetch and the store. INC (+1), DEC (-1), CLC/ADC (+), SEC/SBC (-), and if found leave them in there. Of course you might look for EOR, ORA, etc. as well, there's a lot of possibilities here.

That might take care of most expressions that are just 2 terms. 3 term expressions will still be unmatchable, but maybe that optimization would be worth writing code for since + a constant is such a common case.

Similarly for the [a+1] case you could try to pattern match the "generate address + add a constant to an index" thing, and other similar cases...

Of course after writing these you must make sure they still passes all the unit tests, and try it out on whatever you can to make sure it doesn't have some unexpected interaction with one of the other 100 optimization patterns that breaks stuff. Testing itself is a lot of work, and if you do happen to find a conflict, every attempt to resolve that multiplies your testing too. :P


Maybe you can see how this creates a combinatorial explosion of work for a fairly marginal gain. In a lot of ways, even if you could write these pattern matchers correctly and they worked, it's just not worth adding that much code to the optimizer to solve the problem backwards like this. The better solution is high level optimization before the assembly begins, but that's a whole project of its own.

User avatar
rainwarrior
Posts: 7680
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: cc65: Array access with int index even if it should be a

Post by rainwarrior » Wed Jun 19, 2019 1:29 pm

DRW wrote:Changing it into Assembly would mean that I also have to write the calculation itself in Assembly, so it would be different code for different situations. And sometimes, it is stuff like this:
array[index] = Tile16XToAbsoluteX(heroTile16X) + (width16Block - 1) * 16 + 1;
Which I don't feel at all like turning into inline Assembly.
The quick way is: write it in C, compile, copy paste the generated assembly, then just rewrite the bits you're interested in improving.

User avatar
DRW
Posts: 1914
Joined: Sat Sep 07, 2013 2:59 pm

Re: cc65: Array access with int index even if it should be a

Post by DRW » Thu Jun 20, 2019 12:18 pm

rainwarrior wrote:Maybe you can see how this creates a combinatorial explosion of work for a fairly marginal gain.
Sure. Besides, I never intended that anybody should actually write an optimization in the compiler. I merely asked about this to find out whether there are tricks to make it smaller by doing some stuff in the C code that I haven't thought of yet.
rainwarrior wrote:The quick way is: write it in C, compile, copy paste the generated assembly, then just rewrite the bits you're interested in improving.
Yeah, but I also want to keep my code readable. If I intended to clutter everything with inline Assembly, just to save one byte or two, I would write the whole thing in Assembly in the first place.

I mean, if this code:

Code: Select all

Pointer[Index] = Value;
gets turned into this:

Code: Select all

	lda     _Pointer
	ldx     _Pointer+1
	clc
	adc     _Index
	bcc     L0005
	inx
L0005:	sta     ptr1
	stx     ptr1+1
	lda     _Value
	ldy     #$00
	sta     (ptr1),y
then it really pays off if I have a general purpose inline Assembly macro that simply does this:

Code: Select all

LDA Value
LDY Index
STA (Pointer), Y
Especially if I can reuse it many times and the code is still readable if I put it into a macro called

Code: Select all

AsmSetVarFromPtrAtIdxVar(variable, pointer, indexVariable)
But I don't think it's a good idea to analyze every little code line to see if I can turn it into slightly better Assembly than the C compiler does.
After all, there's a reason why I write in C. Because this:

Code: Select all

Value = heroTile16X * 16 + (width16Block - 1) * 16 + 1;
is much more readable than its Assembly version.

Also, I wouldn't even know how to start optimizing the output.

For example, if I had to write the above code in Assembly, I would probably do it like this:

Code: Select all

LDA heroTile16X
ASL
ASL
ASL
ASL
STA tmp1
LDX width16Block
DEX
TXA
ASL
ASL
ASL
ASL
CLC
ADC tmp1
TAX
INX
STX Value
On the other hand, this is what the compiler creates:

Code: Select all

	ldx     #$00
	lda     _heroTile16X
	jsr     shlax4
	sta     ptr1
	stx     ptr1+1
	ldx     #$00
	lda     _width16Block
	jsr     decax1
	jsr     shlax4
	clc
	adc     ptr1
	pha
	txa
	adc     ptr1+1
	pla
	clc
	adc     #$01
	sta     _Value
This doesn't look even slightly like my code, so I have no idea how I should take this one as the base and then convert it into something better. And if I had the patience to write my own code above for everything, I wouldn't have used C in the first place.

So, yeah, not really the place to manually fiddle with one-time code expressions that never get repeated again.
I have a bunch of inline Assembly macros for common performance issues, like pointer and array access. And for most of the rest, I simply swallow the CPU and ROM penalty that is included in C. I mean, in the moment, I can have about 12 characters on screen without any lag, so it looks like the C code isn't that bad.
Available now: My game "City Trouble".
Sales website: https://megacatstudios.com/products/city-trouble
Gameplay: https://youtu.be/Eee0yurkIW4
Download website: http://www.denny-r-walter.de/city.htm

User avatar
dougeff
Posts: 2617
Joined: Fri May 08, 2015 7:17 pm
Location: DIGDUG
Contact:

Re: cc65: Array access with int index even if it should be a

Post by dougeff » Thu Jun 20, 2019 12:42 pm

This doesn't look even slightly like my code
Implied promotion to INT on many C library functions.
in the moment, I can have about 12 characters on screen without any lag, so it looks like the C code isn't that bad.
Depends on what kind of game you want. When I made a card game, C greatly sped up dev time. If you want a shoot em up with dozens of objects on screen, I would do it in ASM.
nesdoug.com -- blog/tutorial on programming for the NES

User avatar
rainwarrior
Posts: 7680
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: cc65: Array access with int index even if it should be a

Post by rainwarrior » Thu Jun 20, 2019 1:55 pm

dougeff wrote:
This doesn't look even slightly like my code
Implied promotion to INT on many C library functions.
Yeah, compounded by the fact that it does not attempt to identify that it only needs an 8-bit result at the high level. That reduction is sometimes made after the fact by assembly pattern recognition, but like we saw above it can't really identify 8-bit result patterns from expressions that aren't rather simple.

In many cases breaking an expression into intermediate results might actually help:

Code: Select all

// C code, value/ht/wb are char
value = ht * 16;
value += (wb - 1) * 16;
value += 1;

;
; value = ht * 16;
;
	lda     _ht
	asl     a
	asl     a
	asl     a
	asl     a
	sta     _value
;
; value += (wb - 1) * 16;
;
	lda     _wb
	sec
	sbc     #$01
	asl     a
	asl     a
	asl     a
	asl     a
	clc
	adc     _value
	sta     _value
;
; value += 1;
;
	inc     _value
;
; }
;
	rts
There's not much rhyme or reason to this, though. What works or doesn't isn't very easy to predict.

User avatar
DRW
Posts: 1914
Joined: Sat Sep 07, 2013 2:59 pm

Re: cc65: Array access with int index even if it should be a

Post by DRW » Thu Jun 20, 2019 2:55 pm

dougeff wrote:When I made a card game, C greatly sped up dev time. If you want a shoot em up with dozens of objects on screen, I would do it in ASM.
This comparison comes up so often: "If you want to use C, you might be able to program a "Pac-Man" clone. For anything else, you should use Assembly."

Well, I programmed a lagless platformer. And I'm in the middle of programming a "Zelda"-like action adventure.
Yes, I had to use some Assembly functions, usually when it comes to array-based stuff.
However, the general gameplay algorithm is still mostly in C.

And I'm pretty sure a shoot'em up would be doable in C, especially since enemies in these games mostly move in fixed patterns and don't even care about the environment. (Unlike in many other games, where each character always has to check for walls.)

So, yeah, whoever is fluent in Assembly should write the game directly in this language. However, it is not true that C only enables you to write simple 1985-style games. I'm pretty positive that a game like "Ninja Gaiden" would largely be possible in C.
Available now: My game "City Trouble".
Sales website: https://megacatstudios.com/products/city-trouble
Gameplay: https://youtu.be/Eee0yurkIW4
Download website: http://www.denny-r-walter.de/city.htm

supercat
Posts: 161
Joined: Thu Apr 18, 2019 9:13 am

Re: cc65: Array access with int index even if it should be a

Post by supercat » Thu Jun 20, 2019 3:30 pm

rainwarrior wrote:Yeah, compounded by the fact that it does not attempt to identify that it only needs an 8-bit result at the high level. That reduction is sometimes made after the fact by assembly pattern recognition, but like we saw above it can't really identify 8-bit result patterns from expressions that aren't rather simple.
The authors of the Standard noted in the Rationale that on commonplace implementations, if the result of an additive, multiplicative, or bitwise is coerced to a type which is not larger, it won't matter if the operation is performed as signed or unsigned(*). For the same reasons, it wouldn't matter if the operand were performed on a type larger than the result type. A good compiler for 8-bit machines really should use a production rules for assignment operators that evaluates the right side as an 8-bit quantity if the left side is likewise, and for a production rule for 8-bit expressions which processes additive, multiplicative, and bitwise operators by evaluating their operands as 8-bit quantities.

(*) The authors of the Standard expected that commonplace implementations would do this even for values between INT_MAX+1u and UINT_MAX, but did not want to require such behavior on platforms where it would be impractical. They didn't think should it necessary to say that commonplace implementations should handle such values in the expected fashion, but gcc exploits the Standard's laxity by generating code that doesn't.

User avatar
rainwarrior
Posts: 7680
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: cc65: Array access with int index even if it should be a

Post by rainwarrior » Thu Jun 20, 2019 3:38 pm

supercat wrote:The authors of the Standard noted... A good compiler for 8-bit machines really should...
What I said has nothing to do with the standard. I was talking about cc65. cc65 just doesn't have the capability to recognize it, and the many cases that it manages to generate succinctly are kind of miraculous given what it's working with as an optimizer. What it "should" do is something completely different than what it does, but I doubt this will change any time soon, unfortunately. (Fortunately, it's still pretty useful in this compromised state anyway.)
Last edited by rainwarrior on Thu Jun 20, 2019 3:46 pm, edited 1 time in total.

supercat
Posts: 161
Joined: Thu Apr 18, 2019 9:13 am

Re: cc65: Array access with int index even if it should be a

Post by supercat » Thu Jun 20, 2019 3:45 pm

DRW wrote:Yes, I had to use some Assembly functions, usually when it comes to array-based stuff.
However, the general gameplay algorithm is still mostly in C.
If 25% of the program's non-vblank time is spent waiting for vblank, 70% is spent in 10% of the code, and the remaining 5% of the time is spent running the other 90% of the code, the performance of that 90% of the code would simply not matter unless something caused it to run more than five times as slowly. Some games have their execution time spread out among large parts of the program, but others have it concentrated in a few tiny places. Micro-optimizing parts of the code that dominate execution time can be much easier and more effective than worrying about the performance of anything else.

User avatar
gauauu
Posts: 660
Joined: Sat Jan 09, 2016 9:21 pm
Location: Central Illinois, USA
Contact:

Re: cc65: Array access with int index even if it should be a

Post by gauauu » Fri Jun 21, 2019 7:36 am

DRW wrote: So, yeah, whoever is fluent in Assembly should write the game directly in this language. However, it is not true that C only enables you to write simple 1985-style games. I'm pretty positive that a game like "Ninja Gaiden" would largely be possible in C.
As somebody who likes using C, I partially agree. If you're willing to optimize a few routines in assembly, write your C carefully and evaluate bottlenecks, and use a library of assembly routines for interacting with the hardware, you could do Ninja Gaiden in C.

That said, you can definitely run into trouble using C if you aren't careful. Both my Robo-Ninja Climb and Super Homebrew War were written mostly in C (For the sake of development time), but my first pass was too slow in both of them, requiring some optimization (both re-writing some C parts to better take advantage of how cc65 does things, and also rewriting some critical bits into assembly)

User avatar
slembcke
Posts: 170
Joined: Fri Nov 24, 2017 2:40 pm

Re: cc65: Array access with int index even if it should be a

Post by slembcke » Fri Jun 21, 2019 8:18 am

I've skimmed some of the cc65 optimizer code, and there's really not that much to it. It simply doesn't recognize many cases where it can optimize, and those are the only places where it really seems to relax the spec.

I'm not going to tell you "just write it all in assembly", because I don't. I just do what you do: Spend some minimal effort to trick the compiler into doing what you want. Once you know what you really want, rewrite parts using inline or regular assembly. Rinse and repeat.

Post Reply