cc65 - Are % and * supported?

Goose2k
Posts: 148
Joined: Wed Dec 11, 2019 9:38 pm

Post by Goose2k » Fri May 22, 2020 1:50 pm

I am using the cc65 compiler to build an NES game, and I thought that I was not able to use the %, *, and / operators.

However, on a whim, I just tried it, and it worked. Is it supported, but just very slow or something?

C:

Code:

return game_board[(y * 10) + (x % 10)];

Generated ASM:

Code:

	ldx     #$00
	lda     (sp),y
	jsr     mulax10
	jsr     pushax
	ldy     #$03
	lda     (sp),y
	jsr     pusha0
	lda     #$0A
	jsr     tosumoda0
	jsr     tosaddax
	sta     ptr1
	txa
	clc
	adc     #>(_game_board)
	sta     ptr1+1
	ldy     #<(_game_board)
	ldx     #$00
	lda     (ptr1),y

dougeff
Posts: 2778
Joined: Fri May 08, 2015 7:17 pm
Location: DIGDUG

Post by dougeff » Fri May 22, 2020 1:53 pm

It is slow, except for multiplying or dividing a char by a power of 2, which is optimized to bit shifts.

var = var * 2

lda var
asl a
sta var

var = var / 2

lda var
lsr a
sta var

(I'm not 100% sure this is true at the moment)
Last edited by dougeff on Fri May 22, 2020 2:05 pm, edited 1 time in total.
nesdoug.com -- blog/tutorial on programming for the NES

rainwarrior
Posts: 7901
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior » Fri May 22, 2020 2:02 pm

"* 10" became mulax10, which is a dedicated subroutine for multiplying by 10, and is pretty efficient. CC65 does have dedicated ones for 3, 5, 6, 7, 9, and 10. Powers of two are generally handled with reasonably efficient shifts. (More simply: 0-10 and other powers of 2 are efficient.) Other numbers will use a more generic and slower iterative multiply subroutine.
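
As a rough illustration of why multiplying by a small constant can be cheap: 10 is 8 + 2, so a multiply by 10 can be built from two shifts and one add. This is only a sketch of the idea in C, not a claim about how cc65's mulax10 is actually written; mul10 is a hypothetical helper name.

```c
/* Hypothetical helper: multiply by 10 using only shifts and one add.
   n * 10 == n * 8 + n * 2 == (n << 3) + (n << 1). */
unsigned int mul10(unsigned int n)
{
    return (n << 3) + (n << 1);
}
```

The same trick works for the other small constants, for example n * 5 as (n << 2) + n.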

"% 10" became tosumoda0, which in turn calls a generic iterative division subroutine (udiv16). This is relatively slow.
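
For a sense of what a generic iterative division involves, here is a minimal shift-and-subtract sketch in C. It is not cc65's udiv16 source, just the textbook algorithm; udivmod16 and its parameters are names made up for this example. The point is that it always loops 16 times, once per bit, regardless of the operands.

```c
/* Textbook shift-and-subtract (restoring) division for 16-bit values.
   Hypothetical helper, not cc65's runtime code.
   Returns the quotient and stores the remainder in *rem.
   The divisor must be nonzero. */
unsigned int udivmod16(unsigned int num, unsigned int den, unsigned int *rem)
{
    unsigned long r = 0;   /* running remainder, wide enough for the shift */
    unsigned int q = 0;    /* quotient, built one bit per iteration */
    unsigned char i;

    num &= 0xFFFFu;        /* treat inputs as 16-bit even where int is wider */
    den &= 0xFFFFu;

    for (i = 0; i < 16; ++i) {
        /* Bring the next bit of the dividend (MSB first) into the remainder. */
        r = (r << 1) | ((num >> 15) & 1u);
        num = (num << 1) & 0xFFFFu;
        q <<= 1;
        if (r >= den) {    /* divisor fits: subtract it and set this quotient bit */
            r -= den;
            q |= 1u;
        }
    }
    *rem = (unsigned int)r;
    return q;
}
```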

It's not at all unreasonable to do some multiplication or division in your program. Just because it's slower than addition or subtraction doesn't mean it can't be fit into a performance budget.

You can analyze the code and understand the algorithms to get a sense of their (in)efficiency, but the easiest way to figure this stuff out is just to measure the code and see how many cycles it takes. If it's too many, look at a different approach; if not, just carry on.

However, an immediate and simple suggestion is to make the game board 16 wide, because multiply and modulo by 16 are generally fast (power of 2). Even if you don't use the extra 6 bytes on the end of each row, the extra 60 wasted bytes will still get you back a lot of performance if you're going into that array a lot.
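
A minimal sketch of the 16-wide layout in C, assuming a board of 10 rows (which is what 60 wasted bytes at 6 per row implies). board_get, BOARD_ROWS and BOARD_STRIDE are names made up for this example.

```c
#define BOARD_ROWS   10   /* assumed row count */
#define BOARD_STRIDE 16   /* padded row width; only the first 10 columns are used */

unsigned char game_board[BOARD_ROWS * BOARD_STRIDE];

/* Hypothetical accessor: y * 16 is a shift by 4, x % 16 is a mask with 15. */
unsigned char board_get(unsigned char x, unsigned char y)
{
    return game_board[(unsigned char)(y << 4) + (x & 15)];
}
```

With 10 rows the largest index is 159, so the whole computation stays within 8 bits.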
Last edited by rainwarrior on Fri May 22, 2020 2:31 pm, edited 1 time in total.

Post by Goose2k » Fri May 22, 2020 2:15 pm

Thanks for the replies. Very informative!
rainwarrior wrote:
Fri May 22, 2020 2:02 pm
However, an immediate and simple suggestion is to make game board 16 wide, because multiply and modulo by 16 is generally fast (power of 2). Even if you don't use the extra 6 bytes on the end of each row, the extra 60 wasted bytes will still get you back a lot of performance if you're going into that array a lot.

That's what I was thinking as well (and something Doug did in his tutorial for breakout), prior to trying mod/mult.

tokumaru
Posts: 11909
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil


Post by tokumaru » Fri May 22, 2020 2:42 pm

The fact that the 6502 doesn't do multiplication and division natively doesn't mean that those operations are impossible; it just means that they'll be slow, because they are implemented algorithmically, through combinations and repetitions of several instructions that the 6502 does have.
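
To make "implemented algorithmically" concrete, here is a minimal shift-and-add multiply written in C. It is only a sketch of the general technique, not code taken from any particular 6502 library; mul8x8 is a hypothetical name.

```c
/* Shift-and-add multiplication of two 8-bit values into a 16-bit result.
   Hypothetical helper illustrating the idea. */
unsigned int mul8x8(unsigned char a, unsigned char b)
{
    unsigned int result = 0;
    unsigned int addend = a;     /* a, shifted left once per loop iteration */

    while (b != 0) {
        if (b & 1) {
            result += addend;    /* this bit of b contributes a << bit_index */
        }
        addend <<= 1;
        b >>= 1;
    }
    return result;
}
```

The loop runs up to 8 times, and each pass is several instructions on a 6502, which is where the cost comes from.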

The same goes for manipulating 16-bit (and larger) values, which the 6502 can do just fine in spite of being an 8-bit CPU... It just has to do it one byte at a time, making it slower than CPUs that can manipulate multiple bytes at a time.
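
As a sketch of what "one byte at a time" means, the C below adds two 16-bit values stored as separate low and high bytes, carrying from the low byte into the high byte (roughly what a CLC / ADC / ADC sequence does in assembly). The struct and function names are invented for this example.

```c
/* A 16-bit value held as two separate bytes, the way an 8-bit CPU sees it. */
struct u16 {
    unsigned char lo;
    unsigned char hi;
};

/* Hypothetical helper: add a and b one byte at a time and store the sum in *out. */
void add16(const struct u16 *a, const struct u16 *b, struct u16 *out)
{
    unsigned int low_sum = (unsigned int)a->lo + b->lo;   /* may exceed 255 */

    out->lo = (unsigned char)(low_sum & 0xFF);
    /* The carry out of the low byte (0 or 1) is added into the high byte. */
    out->hi = (unsigned char)((a->hi + b->hi + (low_sum >> 8)) & 0xFF);
}
```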

It's perfectly okay to use multiplications and divisions in NES programs sparingly, in the few cases where you can't use powers of 2. As long as you don't do dozens of these operations per frame you should be fine.

Post by Goose2k » Fri Aug 07, 2020 2:34 pm

rainwarrior wrote:
Fri May 22, 2020 2:02 pm
"* 10" became mulax10, which is a dedicated subroutine for multiplying by 10, and is pretty efficient.
...
However, an immediate and simple suggestion is to make game board 16 wide, because multiply and modulo by 16 is generally fast (power of 2).

I got curious about how much CPU time I would save by switching to rows of 16, vs rows of 10, so I did a quick profiling test, and it looks like n << 4 is roughly 3x faster than n * 10.

Code:

PROFILE_POKE(PROF_R)
	for (in_x = 0; in_x < 255; ++in_x)
	{
		in_y = in_x << 4;
	}
PROFILE_POKE(PROF_G)
	for (in_x = 0; in_x < 255; ++in_x)
	{
		in_y = in_x * 16;
	}
PROFILE_POKE(PROF_CLEAR);

[Attachment: mult10perf.png]
Last edited by Goose2k on Fri Aug 07, 2020 3:08 pm, edited 1 time in total.

Post by tokumaru » Fri Aug 07, 2020 3:00 pm

There are other ways to avoid multiplications and divisions that don't result in structural or graphical changes in your game... If all you need is to access the rows of your playfield, you can use a lookup table of pointers to each row.

lidnariq
Posts: 9878
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle


Post by lidnariq » Fri Aug 07, 2020 3:11 pm

Yeah, the 6502 makes space-for-time optimizations really easy, as long as the input type is an (8-bit) unsigned char.
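
One common shape this takes is a pair of byte tables indexed by an 8-bit value, one holding the low byte and one the high byte of a precomputed result. The sketch below is an illustration only; the table names and the choice of multiplying by 40 are made up for the example.

```c
/* Precomputed n * 40 for n = 0..9, split into low and high bytes so each
   lookup is a single 8-bit indexed load. Names and the factor 40 are
   hypothetical. */
const unsigned char mul40_lo[10] = {
    0, 40, 80, 120, 160, 200, 240, 24, 64, 104
};
const unsigned char mul40_hi[10] = {
    0, 0, 0, 0, 0, 0, 0, 1, 1, 1
};

/* Rebuild the 16-bit product from the two tables. */
unsigned int mul40(unsigned char n)
{
    return ((unsigned int)mul40_hi[n] << 8) | mul40_lo[n];
}
```

The tables cost 20 bytes of ROM and replace the multiply entirely for the inputs they cover.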

Post by Goose2k » Fri Aug 07, 2020 3:34 pm

Interesting. So something like:

Code:

unsigned char* RowLookup[] =
{
	&game_board[0],
	&game_board[10],
	&game_board[20],
	&game_board[30],
...
};

// later...

	unsigned char* RowData = RowLookup[row_num];
	unsigned char CellVal = RowData[col_num];

You would expect that to result in significant perf gains?

Post by tokumaru » Fri Aug 07, 2020 3:51 pm

Goose2k wrote:
Fri Aug 07, 2020 3:34 pm
So something like:

Yeah, something like that.

Goose2k wrote:
You would expect that to result in significant perf gains?

I honestly don't know how cc65 handles multiplications, and I don't code for the NES in C, so I can't say for sure... but if this was assembly, replacing a multiplication by a couple of indexed loads should make a significant difference.

Are you getting dropped frames in your game or are you just doing some general optimizations?

Post by Goose2k » Fri Aug 07, 2020 4:02 pm

Thinking about it a bit more, I think just storing the index for the common offsets rather than pointers into the array would be faster (and simpler).
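
A minimal sketch of that idea in C: a small table holds the index of the first cell of each row, so the multiply becomes one table read plus an add. The 4-row height and the names row_offset and cell_at are assumptions chosen to keep the example short.

```c
#define BOARD_WIDTH  10
#define BOARD_HEIGHT 4    /* assumed height, chosen only for brevity */

unsigned char game_board[BOARD_WIDTH * BOARD_HEIGHT];

/* Index of the first cell of each row. An 8-bit table is enough as long
   as the whole board is at most 256 bytes. */
const unsigned char row_offset[BOARD_HEIGHT] = { 0, 10, 20, 30 };

/* Hypothetical accessor: one table read and one add instead of y * 10. */
unsigned char cell_at(unsigned char x, unsigned char y)
{
    return game_board[row_offset[y] + x];
}
```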

I am dropping frames, yah (my original post was just a general question, but I came back to it after trying to fix some sprite flicker). Only in extreme cases, but I'm hoping to make this game as polished as possible. :shock:

Post by rainwarrior » Fri Aug 07, 2020 4:22 pm

Goose2k wrote:
Fri Aug 07, 2020 2:34 pm
I got curious about how much CPU time I would save by switching to rows of 16, vs rows of 10, so I did a quick profiling test, and it looks like n << 4 is roughly 3x faster than n * 10.

Just in case it isn't known: n * 16 should generate the same code as n << 4 when optimizations are enabled.

tokumaru wrote:
Fri Aug 07, 2020 3:00 pm
There are other ways to avoid multiplications and divisions that don't result in structural or graphical changes in your game... If all you need is to access the rows of your playfield, you can use a lookup table of pointers to each row.

I think that works very effectively in assembly, but unfortunately I don't know of a way to do that effectively in CC65.

Code:

char d[3*10];
const char lookup[30] = {
    0,0,0,0,0,0,0,0,0,0,
    10,10,10,10,10,10,10,10,10,10,
    20,20,20,20,20,20,20,20,20,20,
};
char lookdata(char x)
{
    return d[lookup[x]];
}

unfortunately becomes (with -O2)

Code:

.proc   _lookdata: near
        jsr     pusha
        ldy     #$00
        lda     (sp),y
        tay
        lda     _lookup,y
        sta     ptr1
        lda     #$00
        clc
        adc     #>(_d)
        sta     ptr1+1
        ldy     #<(_d)
        ldx     #$00
        lda     (ptr1),y
        jmp     incsp1

When what you really want to see is more like:

Code:

.proc _lookdata:
    tay
    ldx _lookup, Y
    lda _d, X
    ldx #0 ; all 8-bit results are required to return #0 in X for cc65
    rts

Array access, especially through multiple levels of indirection, is one of the notoriously poor areas for cc65's optimizer. You could write that lookup function in assembly... or you could even do some weird macro with inline assembly, but I don't think there's a natural/native C version of this that will perform well with this compiler.

Post by Goose2k » Fri Aug 07, 2020 4:31 pm

For the multiplication case, I ended up just doing this to my macro:

Code:

#define TILE_TO_BOARD_INDEX(x,y) (((y) * 10) + (x))

// is now:

#define TILE_TO_BOARD_INDEX(x,y) ((board_lookup_y[(y)]) + (x))

And it appears to be the fastest out of all 3 versions (saving 13 scan lines for every 254 calls over the power of 2 change).

Hopefully I can do similar for my divide/mod cases.
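
One possible way to apply the same trick to divide/mod, sketched under assumptions: if the values being divided are board indices, two small tables can map each index straight to its column and row. The 40-cell board size and all names below are hypothetical.

```c
#define BOARD_WIDTH 10
#define BOARD_CELLS 40   /* assumed board size, kept small for the example */

unsigned char index_to_x[BOARD_CELLS];   /* holds index % 10 */
unsigned char index_to_y[BOARD_CELLS];   /* holds index / 10 */

/* Fill both tables once at startup without any divide: walk the board
   and wrap the column counter by hand. */
void init_index_tables(void)
{
    unsigned char i, x = 0, y = 0;

    for (i = 0; i < BOARD_CELLS; ++i) {
        index_to_x[i] = x;
        index_to_y[i] = y;
        if (++x == BOARD_WIDTH) {
            x = 0;
            ++y;
        }
    }
}
```

After init_index_tables() runs, index_to_x[i] and index_to_y[i] replace i % 10 and i / 10. Since RAM is tight on the NES, writing the same values out as const arrays in ROM would probably be the better choice in a real project.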
[Attachment: perf.png]

Post by lidnariq » Fri Aug 07, 2020 4:56 pm

rainwarrior wrote:
Fri Aug 07, 2020 4:22 pm
unfortunately becomes

I was thinking one might be able to coerce it into being something less stupid with an explicit temporary, and it's ... ok ...

Code:

	static unsigned char t;
	t = lookup[x];
	return d[t];

becomes

Code:

	jsr     pusha
	ldy     #$00
	lda     (sp),y
	tay
	lda     _lookup,y
	sta     L0022
	ldy     L0022
	ldx     #$00
	lda     _d,y
	jmp     incsp1

drludos
Posts: 56
Joined: Mon Dec 11, 2017 4:01 pm


Post by drludos » Mon Aug 17, 2020 2:18 pm

Goose2k wrote:
Fri Aug 07, 2020 4:02 pm
I am dropping frames, yah (my original post was just a general question, but I came back to it after trying to fix some sprite flicker). Only in extreme cases, but I'm hoping to make this game as polished as possible. :shock:

I'm just a CC65 newbie, but if you are looking for useful tips / best practices for optimizations specifically tailored to CC65, I highly recommend the following article (it's written by an Atari 5200 dev, but all the code optimizations still apply to the NES):
https://github.com/ilmenit/CC65-Advanced-Optimizations

Your case is mentioned in step 9, where the article advises using lookup tables whenever possible (but you already had this info ;)). I don't know if it's applicable to your game, but I personally got a huge speed boost in my latest project with the "struct of arrays vs array of structs" advice.
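
For anyone unfamiliar with the term, here is a minimal sketch of the two layouts in C. The enemy fields and all names are invented for illustration and are not taken from the article.

```c
#define MAX_ENEMIES 8   /* hypothetical entity count */

/* Array of structs: reaching enemies_aos[i].y means scaling i by the
   struct size before the field can be addressed. */
struct enemy {
    unsigned char x;
    unsigned char y;
    unsigned char hp;
};
struct enemy enemies_aos[MAX_ENEMIES];

/* Struct of arrays: each field is its own byte array, so element i of any
   field is a plain 8-bit indexed access. */
unsigned char enemy_x[MAX_ENEMIES];
unsigned char enemy_y[MAX_ENEMIES];
unsigned char enemy_hp[MAX_ENEMIES];

/* Move every live enemy one pixel to the right using the SoA layout. */
void move_enemies_right(void)
{
    unsigned char i;

    for (i = 0; i < MAX_ENEMIES; ++i) {
        if (enemy_hp[i] != 0) {
            ++enemy_x[i];
        }
    }
}
```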
Download ROMs of my games: https://drludos.itch.io/
Support my work and get access to beta and prototypes: https://www.patreon.com/drludos
