Bregalad wrote:

EDIT: OK, I have now read the code. I think if you have 4 or fewer sprite sequences (i.e. the total length is 256 bytes or less), you should not use indirect addressing to access the sequences, as this is a pure waste of time. This by itself will save more cycles than any amount of unrolling.

I'm definitely not a good 6502 programmer yet. I'm still learning and will keep learning forever. What would be the alternative?

This is what I have now:

**Code:**

; header. load pointer once

ldx spriteSecuenceIndex ; ZP 3

lda spriteSequencesLo,x ; Abs 4+

sta spriteSequence+0 ; ZP 3

lda spriteSequencesHi,x ; Abs 4+

sta spriteSequence+1 ; ZP 3

; total 17

...

; load value from sequence every sprite

lda (spriteSequence),y ; Ind 5+

; 5 * 64 sprites = 320+

Total 320 + 17 = 337 cycles, regardless of sequence #.

I summed the minimum possible cycles, i.e. ignoring the page-crossing penalties marked with +.

When I look at it, one alternative is branching at every iteration:

**Code:**

lda spriteSecuenceIndex ; ZP 3 (into A, because cmp tests A; X is assumed to hold the offset into the sequence)

cmp #0 ; Imm 2

bne @no0 ; branch taken 3+

lda spriteSequence0,x

jmp @end

@no0:

cmp #1 ; Imm 2

bne @no1 ; branch taken 3+

lda spriteSequence1,x

jmp @end

@no1:

cmp #2 ; Imm 2

bne @no2 ; branch taken 3+

lda spriteSequence2,x

jmp @end

@no2:

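cmp #3 ; Imm 2

bne @no3 ; branch not taken 2 (3+ if taken)

lda spriteSequence3,x ; Abs 4+

@no3:

@end:

; Total 26 when the index is 3 (worst case: three taken branches, the last one falls through)

So when `spriteSecuenceIndex` is 3, the total would be 26 * 64 sprites = 1664 cycles. Looks like indirect wins.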

Or use the RTS trick and run 4 different implementations, one per sequence:

**Code:**

; jump once at the beginning

ldx spriteSecuenceIndex ; ZP 3

jsr JumpToSequenceImplementation ; 6

JumpToSequenceImplementation:

lda SequenceImplementationHi,x ; Abs 4+

pha ; 3

lda SequenceImplementationLo,x ; Abs 4+

pha ; 3

rts ; 6

; Total: 29

; Every iteration

ldx spriteSecuenceIndex ; ZP 3

lda spriteSequence3,x ; Abs 4+

; Total: 7 * 64 sprites = 448

; Grand total = 448 + 29 = 477

; or, if we can preserve X across iterations

lda spriteSequence3,x ; Abs 4+

; Total: 4 * 64 sprites = 256

; Grand total = 256 + 29 = 285

So compared with the indirect version (320 + 17 = 337 cycles) we save 337 - 285 = 52 cycles, but we waste more ROM.
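One detail the RTS trick depends on: `rts` pulls an address off the stack and resumes at that address plus 1, so the two tables have to hold each target address minus 1. A rough sketch of the tables in ca65-style syntax follows. It is a fragment, not a complete program, and the `SequenceImpl0`..`SequenceImpl3` labels are hypothetical names for the four copies of the loop.

```asm
; Hypothetical labels: SequenceImpl0..SequenceImpl3 are the four copies of the
; sprite loop, each hard-coded to one spriteSequenceN table and ending in rts.
; rts resumes at (pulled address + 1), so every table entry is label - 1.

SequenceImplementationLo:
    .byte <(SequenceImpl0-1), <(SequenceImpl1-1)
    .byte <(SequenceImpl2-1), <(SequenceImpl3-1)

SequenceImplementationHi:
    .byte >(SequenceImpl0-1), >(SequenceImpl1-1)
    .byte >(SequenceImpl2-1), >(SequenceImpl3-1)
```

The dispatcher pushes the high byte first and the low byte second, which matches the order in which `rts` pulls them back.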

Is there a more optimized solution for that?
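One candidate, if I read Bregalad's hint correctly: keep all four 64-byte sequences back to back in a single 256-byte table and read them with plain absolute indexed addressing, so no pointer is needed at all. This is only a sketch under assumptions. The names `spriteSequences` and `sequenceBase` are made up, the 256-byte table is assumed to start on a page boundary, and Y is assumed to be used only as the running position inside the sequence.

```asm
; Assumed layout (hypothetical name spriteSequences): sequence 0 at offset 0,
; sequence 1 at 64, sequence 2 at 128, sequence 3 at 192, 256 bytes in total,
; starting on a page boundary so the indexed load never crosses a page.

sequenceBase:                    ; hypothetical 4-byte table of start offsets
    .byte 0, 64, 128, 192

; header. pick the start offset once
    ldx spriteSecuenceIndex      ; ZP  3
    ldy sequenceBase,x           ; Abs 4+  Y = offset of the sequence's first entry
; total 7

; load value from sequence every sprite
    lda spriteSequences,y        ; Abs 4   (no page cross with an aligned table)
; Y still has to advance once per sprite, the same as in the indirect version
; 4 * 64 sprites = 256
```

By the same minimum-cycle counting that gives 256 + 7 = 263 cycles, regardless of sequence #, and the only extra ROM is the 4-byte offset table. The catch is that Y no longer starts at 0, so anything else in the loop that indexes with Y would need adjusting.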