Assembly optimization

Kitty_Space_Program
Posts: 27
Joined: Mon Sep 21, 2020 7:42 am

Assembly optimization

Post by Kitty_Space_Program » Mon Sep 28, 2020 8:16 pm

Any tips, tricks, or articles/videos on how to optimize assembly code?

tokumaru
Posts: 11891
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Assembly optimization

Post by tokumaru » Mon Sep 28, 2020 9:39 pm

First you have to decide whether you're optimizing for space or speed. Most people are interested in optimizing for speed, in which case the following tips apply:

- Count down instead of up: if possible, have your loop counters start from the highest value and decrement them down to 0 or -1, so you can use the CPU status flags Z or N, respectively, to detect the end condition, without the need for a CMP/CPX/CPY instruction per iteration.

- Unroll your loops: checking for the end condition of a loop and branching back to repeat it are steps that can really slow down a loop, especially if each iteration consists of a quick operation, like copying a byte. If possible, reduce the number of iterations by doing more work in each iteration (e.g. copy 4 bytes instead of 1). This is key for increasing the amount of data you can write to VRAM during vblank.

- Avoid setting flags when you know their state: this is a bit controversial, because when not well documented it can lead to bugs that are hard to catch... but here's an example: if you know that the carry flag is set (e.g. at the target of a BCS instruction) before a subtraction, don't waste time with a SEC instruction, just do the subtraction. Even if the known state of the flag is the opposite of what you need (e.g. you need to do a subtraction but the carry is clear), you can sometimes adjust the operation to compensate: a subtraction with the carry clear subtracts one extra unit, so if you control the value being subtracted, make it one less than the value you actually have to subtract. If you do this, I suggest that you still write the instruction that would put the flag in the state you need, in the place where it would go, but comment out that line so it's not assembled. That way you can easily tell what state the flag is supposed to have at that point, and notice when a later change breaks that assumption.

- Use a structure of arrays instead of an array of structures: when you need to represent multiple instances of a specific kind of entity, it's better to group the data by field rather than by instance. For example, if you had an array of enemies, each with their data fields holding different values, in order to access this data you'd have to create a pointer to the instance you wanted to access, and then load Y with the offset of each field you wanted to access. The faster alternative is to have a separate array for each field, holding the values for all instances. If your engine supports 12 active enemies, have a 12-byte array to hold all the health values, another 12-byte array for all their X coordinates, and so on. This way, all you need to do in order to access a specific enemy is load its index (0 to 11) into one of the index registers (X or Y) and access the separate arrays according to the fields you need (e.g. lda EnemyHealth, x while X is 4 loads the health of the 5th active enemy).
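As a rough illustration of the last tip, here is a minimal sketch in ca65 syntax. The names MAX_ENEMIES, EnemyHealth, EnemyX, EnemyY and HurtEnemy are made up for this example, and the segment names are just the ca65 defaults; adapt them to your own project.

```asm
; Sketch only (ca65 syntax). All names here are illustrative assumptions.
MAX_ENEMIES = 12

.segment "BSS"
EnemyHealth: .res MAX_ENEMIES   ; one byte per enemy, grouped by field
EnemyX:      .res MAX_ENEMIES
EnemyY:      .res MAX_ENEMIES

.segment "CODE"
; In:  X = enemy index (0 to MAX_ENEMIES-1)
; Out: that enemy's health reduced by 1, unless it was already 0
.proc HurtEnemy
    lda EnemyHealth, x      ; absolute,X: 4 cycles, no pointer setup
    beq done                ; health is already 0, nothing to do
    dec EnemyHealth, x
done:
    rts
.endproc

; For comparison, the array-of-structures version needs a zero page
; pointer to the instance plus a field offset in Y:
;   ldy #ENEMY_HEALTH       ; hypothetical field offset
;   lda (EnemyPtr), y       ; 5 cycles, and EnemyPtr must be set up first
```

The same index in X works for every field array, so a routine that touches several fields of one enemy never has to recompute an address.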

Garth
Posts: 194
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California

Re: Assembly optimization

Post by Garth » Mon Sep 28, 2020 11:04 pm

Good points so far. I have a lot more tips at http://wilsonminesco.com/6502primer/PgmTips.html.

Bregalad
Posts: 7963
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Re: Assembly optimization

Post by Bregalad » Tue Sep 29, 2020 8:47 am

tokumaru wrote:
Mon Sep 28, 2020 9:39 pm
First you have to decide whether you're optimizing for space or speed.
Many optimisations do both at the same time; some improve space or speed without hurting the other.
tokumaru wrote:
Mon Sep 28, 2020 9:39 pm
Avoid setting flags when you know their state: this is a bit controversial, because when not well documented it can lead to bugs that are hard to catch... but here's an example: if you know that the carry flag is set (e.g. at the target of a BCS instruction) before a subtraction, don't waste time with a SEC instruction, just do the subtraction.
What I do, and recommend everyone do, is add a commented-out SEC instruction, so that it is clear that it is supposed to be there and that its removal is an optimisation.
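A minimal sketch of what that can look like, in ca65 syntax. Value, Limit and Remainder are hypothetical zero page variables invented for the example.

```asm
; Sketch only (ca65 syntax). Variable names are illustrative assumptions.
.segment "ZEROPAGE"
Value:     .res 1
Limit:     .res 1
Remainder: .res 1

.segment "CODE"
    lda Value
    cmp Limit
    bcc TooSmall        ; taken when Value < Limit (carry clear)
    ; Not taken, so Value >= Limit and CMP left the carry set.
    ;sec                ; omitted on purpose: carry is known to be set
    sbc Limit           ; A = Value - Limit
    sta Remainder
TooSmall:

; The compensation trick, for when the carry is known to be CLEAR:
    bcs Skip
    ;sec                ; omitted: carry is clear, so SBC takes 1 extra
    sbc #5-1            ; subtracts 5 in total (4, plus 1 for the carry)
Skip:
```

Each omitted SEC saves 1 byte and 2 cycles, and the commented-out line documents the assumption for whoever edits the code next.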

Besides, there's already a whole wiki page dedicated to the subject.

Oziphantom
Posts: 928
Joined: Tue Feb 07, 2017 2:03 am

Re: Assembly optimization

Post by Oziphantom » Sun Oct 04, 2020 3:46 am

Use 64tass and let its optimiser show you simple things you could improve.

turboxray
Posts: 115
Joined: Thu Oct 31, 2019 12:56 am

Re: Assembly optimization

Post by turboxray » Tue Oct 06, 2020 12:54 pm

Bregalad wrote:
Tue Sep 29, 2020 8:47 am
First you have to decide whether you're optimizing for space or speed.
Many optimisations do both at the same time; some improve space or speed without hurting the other.
That's really only true for small/simple opcode optimizations. More advanced optimizations for 65x definitely bloat the memory footprint.

Controllerhead
Posts: 165
Joined: Tue Nov 13, 2018 4:58 am
Location: $4016

Re: Assembly optimization

Post by Controllerhead » Tue Oct 06, 2020 1:06 pm

tokumaru wrote:
Mon Sep 28, 2020 9:39 pm
- Count down instead of up: if possible, have your loop counters start from the highest value and decrement them down to 0 or -1, so you can use the CPU status flags Z or N, respectively, to detect the end condition, without the need for a CMP/CPX/CPY instruction per iteration.
You can also count up to $80 with BPL, or down to $7F with BMI; similar logic applies. Make sure your DEC/DEX/DEY or INC/INX/INY is the last instruction before the branch, so that the flags the branch tests reflect the counter.

When counting down, use BPL if you want to process the loop on 0; use BNE if you do not.
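Here is a small sketch of those loop shapes in ca65 syntax, clearing a buffer. Buffer and COUNT are made-up names, and the BPL form as written assumes the buffer is no longer than 128 bytes, so the starting index is not already negative.

```asm
; Sketch only (ca65 syntax). Buffer and COUNT are illustrative assumptions.
COUNT = 16

.segment "BSS"
Buffer: .res COUNT

.segment "CODE"
    lda #0

; Counting up: needs a compare on every iteration.
    ldx #0
UpLoop:
    sta Buffer, x
    inx
    cpx #COUNT          ; 2 extra bytes and 2 extra cycles per iteration
    bne UpLoop

; Counting down with BPL: index 0 IS processed.
    ldx #COUNT-1
DownBpl:
    sta Buffer, x
    dex                 ; DEX updates N and Z, so no compare is needed
    bpl DownBpl         ; exits once X wraps from 0 to $FF

; Counting down with BNE: index 0 is NOT processed, so X runs from
; COUNT down to 1 and the base address is offset by one to compensate.
    ldx #COUNT
DownBne:
    sta Buffer-1, x
    dex
    bne DownBne
```

The BNE form handles up to 255 entries, while the BPL form keeps the index equal to the real array offset, which is often easier to read.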
tokumaru wrote:
Mon Sep 28, 2020 9:39 pm
Unroll your loops
I find "semi-unrolled" loops to be a nice balance. If you do 2 calculations/actions per loop instead of 1, you cut your end-of-loop detections in half.

For example, when updating an entire attribute table, I do 8 updates per loop and loop 8 times. This gives a great speed boost without taking up an exorbitant amount of ROM.
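A sketch of that 8 x 8 attribute update in ca65 syntax, under a few assumptions: AttrBuffer is a hypothetical 64-byte RAM copy of the attribute table, the target address (e.g. $23C0) has already been written to $2006, and the PPU address increment is set to 1.

```asm
; Sketch only (ca65 syntax). AttrBuffer is an illustrative assumption.
PPUDATA = $2007

.segment "BSS"
AttrBuffer: .res 64

.segment "CODE"
    ldx #0
AttrLoop:
    lda AttrBuffer+0, x     ; 8 byte copies per iteration
    sta PPUDATA
    lda AttrBuffer+1, x
    sta PPUDATA
    lda AttrBuffer+2, x
    sta PPUDATA
    lda AttrBuffer+3, x
    sta PPUDATA
    lda AttrBuffer+4, x
    sta PPUDATA
    lda AttrBuffer+5, x
    sta PPUDATA
    lda AttrBuffer+6, x
    sta PPUDATA
    lda AttrBuffer+7, x
    sta PPUDATA
    txa                     ; advance the index by 8
    clc
    adc #8
    tax
    cpx #64                 ; loop overhead is paid 8 times, not 64
    bne AttrLoop
```

By my count, ignoring page crossings, a one-byte-per-iteration loop (LDA abs,X / STA / INX / CPX / BNE) costs about 15 cycles per byte, while this version costs 77 cycles per 8 bytes, a little under 10 per byte.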

Kitty_Space_Program
Posts: 27
Joined: Mon Sep 21, 2020 7:42 am

Re: Assembly optimization

Post by Kitty_Space_Program » Tue Oct 06, 2020 7:08 pm

turboxray wrote:
Tue Oct 06, 2020 12:54 pm
Bregalad wrote:
Tue Sep 29, 2020 8:47 am
First you have to decide whether you're optimizing for space or speed.
Many optimisations do both at the same time; some improve space or speed without hurting the other.
That's really only true for small/simple opcode optimizations. More advanced optimizations for 65x definitely bloat the memory footprint.
Yeah, I've started to notice that in my programming recently. Though I at least try to write code that is as clean as possible and avoid long algorithms if I can.

Kitty_Space_Program
Posts: 27
Joined: Mon Sep 21, 2020 7:42 am

Re: Assembly optimization

Post by Kitty_Space_Program » Tue Oct 06, 2020 7:12 pm

If both are an option, is it faster and/or smaller to do this:

    bne Thing
    jmp OtherThing

Thing:
    ; do stuff

Or this:

    beq OtherThing
Thing:
    ; do stuff

Controllerhead
Posts: 165
Joined: Tue Nov 13, 2018 4:58 am
Location: $4016

Re: Assembly optimization

Post by Controllerhead » Tue Oct 06, 2020 7:42 pm

Kitty_Space_Program wrote:
Tue Oct 06, 2020 7:12 pm
is it faster and or more space saving to do X or Y
Always situational. I use this instruction reference with byte and cycle counts:
https://www.masswerk.at/6502/6502_instruction_set.html

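Branch instructions are smaller than a 16-bit JMP or JSR (2 bytes instead of 3), and they cost 2 cycles when not taken or 3 when taken (4 if the branch crosses a page), against 3 cycles for an absolute JMP and 6 for a JSR. However, they only have a range of -128 to +127 bytes, measured from the instruction that follows the branch. They use one byte to point to a location relative to the program counter, as opposed to a full 16-bit address.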

Oziphantom
Posts: 928
Joined: Tue Feb 07, 2017 2:03 am

Re: Assembly optimization

Post by Oziphantom » Tue Oct 06, 2020 11:41 pm

Kitty_Space_Program wrote:
Tue Oct 06, 2020 7:12 pm
If both are an option, is it faster and/or smaller to do this:

    bne Thing
    jmp OtherThing

Thing:
    ; do stuff

Or this:

    beq OtherThing
Thing:
    ; do stuff

    beq OtherThing
Thing:
    ; do stuff

This saves a jump, so it is both faster and smaller.
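To put numbers on it, here are the two variants annotated with the standard 6502 byte and cycle counts, in ca65 syntax. The labels are placeholders, the two variants are shown back to back only for comparison, and OtherThing is assumed to be within branch range.

```asm
; Sketch only (ca65 syntax). Labels are illustrative placeholders.

; Variant 1: branch over a JMP (5 bytes in total)
    bne Thing1          ; 2 bytes; 3 cycles if taken, 2 if not
    jmp OtherThing      ; 3 bytes; 3 cycles
Thing1:
    ; do stuff
    ;   reaching Thing1:     3 cycles
    ;   reaching OtherThing: 2 + 3 = 5 cycles

; Variant 2: inverted branch (2 bytes in total)
    beq OtherThing      ; 2 bytes; 3 cycles if taken, 2 if not
Thing2:
    ; do stuff
    ;   reaching Thing2:     2 cycles
    ;   reaching OtherThing: 3 cycles

OtherThing:
    rts
```

A taken branch costs one more cycle if it crosses a page boundary. Variant 1 is still the usual fallback when OtherThing is more than about 127 bytes away and a plain branch cannot reach it.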

unregistered
Posts: 1098
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: Assembly optimization

Post by unregistered » Tue Oct 13, 2020 4:05 pm

Hi Kitty_Space_Program,

The most helpful read for me has been “6502 Hacks” by Mark S. Ackerman. It can be found in Dr Dobb’s Journal vol12 on pages 97 through 103.

The pdf is massive though... I ended up making a much smaller pdf with just those pages, but am under the impression that it would be unwise to share my edit of their publication.

There are filler pages before the numbering starts... I believe that it, 6502 Hacks, starts around pdf-page 111. That should give you a good starting place for your search. And, if that is not helpful, try looking at its table of contents for “6502 Hacks”. :)


edit: Though, I’m unsure if that pdf provides clickable links in its table of contents. If it doesn’t, using Adobe Acrobat Reader DC can make searching through a massive pdf quite easy. :)
