cc65 Code Generation Tips

Post by rox_midge » Sat May 30, 2020 5:45 pm

After about two weeks of hacking in my free time, I managed to get horizontal and vertical scrolling working using pure assembly, which is great. But updating the attributes was turning out to be far more of a pain in the neck - due to the way I'm partitioning the screen, I have to sometimes swap the upper and lower nybbles of adjacent attribute bytes.
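As a minimal sketch of the nybble swap itself: the helper name `SwapNybbles` is made up for illustration, and deciding which attribute bytes need swapping depends on the screen partitioning, which isn't shown here.

```c
/* Hypothetical helper: exchange the upper and lower 4 bits of one attribute byte. */
unsigned char SwapNybbles(unsigned char attr)
{
    /* The low nybble moves up and the high nybble moves down.
       The cast discards anything shifted above bit 7. */
    return (unsigned char)((attr << 4) | (attr >> 4));
}
```

For example, `SwapNybbles(0xA5)` yields `0x5A`.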

This turned out to be too much to keep track of, so I decided I'd rewrite at least the scrolling routines using C. This turned out to be relatively straightforward (or, at least, it has been so far), and the C code is far easier to understand than the assembly.

Part of my assembly code uses a table of addresses, like this:

Code: Select all

    .word WramChunkBase + 0
    .word WramChunkBase + 1
    .word WramChunkBase + 2
    .word WramChunkBase + 3

And I load this into a pointer like so:

Code: Select all

    lda _wramX
    asl                             ; two bytes per table entry
    tay
    lda WramColumnAddresses, y
    sta _tmpAddr
    lda WramColumnAddresses+1, y
    sta _tmpAddr+1

Later on, I use this to retrieve the bytes that will be written to the nametable. This has a straightforward translation to C:

Code: Select all

    static unsigned char* const WramColumnAddresses[] =
    {
        (unsigned char*)WramChunkBase + 0,
        (unsigned char*)WramChunkBase + 1,
        (unsigned char*)WramChunkBase + 2,
        (unsigned char*)WramChunkBase + 3,
    };

    void DoSomething(void)
    {
        tmpAddr = WramColumnAddresses[wramX];
    }

But cc65 doesn't do the obvious thing, instead insisting on inserting a temporary pointer that it uses instead:

Code: Select all

	ldx     #$00
	lda     _wramX
	asl     a
	bcc     L004C
	inx
	clc
L004C:	adc     #<(_WramColumnAddresses)
	sta     ptr1
	txa
	adc     #>(_WramColumnAddresses)
	sta     ptr1+1
	ldy     #$01
	lda     (ptr1),y
	sta     _pTmpAddr+1
	dey
	lda     (ptr1),y
	sta     _pTmpAddr

I mean... this works, but it's way more cycles than the hand-written way.

Is there another way I can write the equivalent in C to produce a better assembled version?

Re: cc65 Code Generation Tips

Post by lidnariq » Sat May 30, 2020 6:04 pm

Your code relies on the assumption that _wramX fits in 7 unsigned bits, which means you can safely assume that 2*_wramX fits in an 8-bit register. cc65 can't make that assumption, and the extra work it's doing is to handle the case where the doubled index overflows 8 bits.
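To make the overflow concrete, here is a small sketch in plain C. The function names are invented for illustration. The first function mimics shifting inside a single 8-bit register, and the second follows the C promotion rules that the compiler has to honor.

```c
/* Byte offset into a table of 16-bit entries, computed the way the
   hand-written assembly does it: shifted within one 8-bit register. */
unsigned char OffsetIn8Bits(unsigned char index)
{
    /* Bit 7 of index is lost here (on the 6502 it would land in carry). */
    return (unsigned char)(index << 1);
}

/* The same offset computed the way C semantics require: index is
   widened before the shift, so no bits are lost. */
unsigned int OffsetPromoted(unsigned char index)
{
    return (unsigned int)index << 1;
}
```

The two agree for every index up to 127 and diverge from 128 onward, which is exactly the case the `bcc` in the generated code is guarding against.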

I think I remember someone else (possibly DRW?) asking sometime in the past year or two about this same thing. I don't think we had a good option at the time, other than maybe explicitly pre-striping your arrays so there's no type promotion.
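A rough sketch of what pre-striping could look like, assuming the table addresses are known as numeric constants. `WRAM_CHUNK_BASE`, its value, and the array and function names are placeholders, not anything from the original project. I haven't checked what cc65 generates for this, so treat it as something to try rather than a guaranteed win.

```c
/* Assumed base address, for illustration only; the real value of
   WramChunkBase is not given in this thread. */
#define WRAM_CHUNK_BASE 0x6000u

/* "Striped" table: low bytes and high bytes live in separate 8-bit
   arrays, so each half is reached with an unscaled 8-bit index. */
static const unsigned char WramColumnAddrLo[4] = {
    (WRAM_CHUNK_BASE + 0) & 0xFF,
    (WRAM_CHUNK_BASE + 1) & 0xFF,
    (WRAM_CHUNK_BASE + 2) & 0xFF,
    (WRAM_CHUNK_BASE + 3) & 0xFF,
};

static const unsigned char WramColumnAddrHi[4] = {
    (WRAM_CHUNK_BASE + 0) >> 8,
    (WRAM_CHUNK_BASE + 1) >> 8,
    (WRAM_CHUNK_BASE + 2) >> 8,
    (WRAM_CHUNK_BASE + 3) >> 8,
};

/* Rebuild the 16-bit address for column x from the two byte tables. */
unsigned int WramColumnAddress(unsigned char x)
{
    return (unsigned int)WramColumnAddrLo[x]
         | ((unsigned int)WramColumnAddrHi[x] << 8);
}
```

Because neither array needs the index doubled, the 7-bit restriction on the index goes away as well.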

Looking through cc65's source, I don't see many ways for it to emit LDA abs,Y at all, nor many ways to get its optimizer to emit that opcode either. Some of its built-ins (memcmp, memcpy) do.

Probably the best thing to do right now is leave a note for your future self that this is some low-hanging fruit for optimization, should you need better performance in the future.

Re: cc65 Code Generation Tips

Post by rainwarrior » Sat May 30, 2020 7:15 pm

You really can't get cc65 to read from an array of 16-bit (or larger) values using absolute indexed addressing.

You can get it to read from an array of 8-bit values using an index that is also 8-bit. Promotion rules can be tricky, though. If the index is i+1, then even if i is a char, the compiler still has to promote it to 16 bits so that i=255 is still valid. Explicitly casting the index expression to char can help sometimes.
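A sketch of the cast, using a made-up 256-entry table and function name. Note that the cast changes behavior at i=255: the index wraps to 0 instead of becoming 256. Whether a given cc65 version then picks absolute indexed addressing is something to confirm in the generated assembly.

```c
/* Hypothetical 256-entry byte table; zero-filled unless written to. */
static unsigned char Table[256];

unsigned char NextEntryWrapped(unsigned char i)
{
    /* Cast the whole index expression, not just i: the sum is computed
       as int and then truncated to 8 bits, so i == 255 wraps to 0. */
    return Table[(unsigned char)(i + 1)];
}
```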

Though as an easy first stage you can replace the array lookup with an assembly function:

Code: Select all

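extern unsigned char * __fastcall__ lookup( unsigned char index ); // C prototype

; in assembly
.import _WramColumnAddresses        ; C symbols get a leading underscore; the table must not be static
.export _lookup
.proc _lookup
	asl                             ; index arrives in A (fastcall); two bytes per entry, so index < 128
	tay
	lda _WramColumnAddresses+0, Y   ; low byte of the result is returned in A
	ldx _WramColumnAddresses+1, Y   ; high byte of the result is returned in X
	rts
.endproc

That'll do a bit better, though if your problem is that you're looking up this array a lot, ultimately the solution is going to be rewriting the function that uses the array (or part of it) in assembly directly, rather than targeting just the lookup.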
