All times are UTC - 7 hours

Posted: Thu May 09, 2019 10:16 am

Joined: Tue Aug 28, 2018 8:54 am
Posts: 143
Location: Edmonton, Canada
I don't think you can unroll the clear/finish loop if you implement blinking using shuffle arrays, because your sprite data is going to be all over the place. But you can still clear only the sprites you did not use.

Code:
; Warning. Code is written in place, not tested. But I hope it explains what I intended
MAX_SEQUENCE = 2
spriteSequence1:
.byte 0, 4, 8, 12, ..., 252
spriteSequence2:
.byte 0, 12, 4, 8, ..., 252
spriteSequencesLo:
.byte <spriteSequence1, <spriteSequence2
spriteSequencesHi:
.byte >spriteSequence1, >spriteSequence2

BeginSprites:
lda #0
sta currentSpriteId
; advance to the next sequence, wrapping back to 0 after the last one
inc spriteSequenceIndex
ldx spriteSequenceIndex
cpx #MAX_SEQUENCE
bne @noSequenceReset
    ldx #0
    stx spriteSequenceIndex
@noSequenceReset:
; point spriteSequence (a zero-page pointer) at the selected shuffle array
lda spriteSequencesLo,x
sta spriteSequence+0
lda spriteSequencesHi,x
sta spriteSequence+1
rts

AddSprite:
ldy currentSpriteId
; assuming spriteSequence is a pointer to one of the shuffle arrays
; and array contains the offsets in OAM, not IDs (eg 0, 4, 8)
lda (spriteSequence),y
tax
lda spriteY
sta OAM,x
; ...
; Fill the rest
inc currentSpriteId
rts

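FinishSprites:
; hide only the slots that AddSprite did not use this frame
ldy currentSpriteId
cpy #64
bcs @done               ; all 64 slots are already in use
@loop:
lda (spriteSequence),y  ; OAM offset of the next unused slot
tax
lda #$FF                ; Y = $FF keeps the sprite off-screen
sta OAM,x
iny
cpy #64                 ; stop after the last entry of the sequence
bne @loop
@done:
rts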


Edit: fixed incorrect spriteSequence#


Last edited by yaros on Thu May 09, 2019 10:40 am, edited 1 time in total.
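To check the idea of clearing only the unused slots, here is a minimal Python model of the three routines above, under some assumptions: OAM is a 256-byte array, the two sequences are made-up example orders (forward and reversed), and the class and method names are hypothetical stand-ins for BeginSprites, AddSprite and FinishSprites. After each frame, every one of the 64 slots either holds a real sprite or is hidden with Y = $FF.

```python
SPRITE_COUNT = 64
HIDDEN_Y = 0xFF  # Y value that keeps a sprite off-screen


def make_sequences():
    """Two example orders of OAM byte offsets (hypothetical contents)."""
    forward = [4 * i for i in range(SPRITE_COUNT)]  # 0, 4, 8, ..., 252
    backward = list(reversed(forward))               # 252, 248, ..., 0
    return [forward, backward]


class SpriteBuilder:
    """Hypothetical model of BeginSprites / AddSprite / FinishSprites."""

    def __init__(self, sequences):
        self.sequences = sequences
        self.sequence_index = len(sequences) - 1  # first begin_sprites() selects 0
        self.sequence = sequences[self.sequence_index]
        self.current_sprite_id = 0
        self.oam = bytearray(256)                 # 64 sprites x (Y, tile, attr, X)

    def begin_sprites(self):
        # Reset the sprite counter and advance to the next shuffle order.
        self.current_sprite_id = 0
        self.sequence_index = (self.sequence_index + 1) % len(self.sequences)
        self.sequence = self.sequences[self.sequence_index]

    def add_sprite(self, y, tile, attr, x):
        # The sequence maps "n-th sprite this frame" to an OAM byte offset.
        offset = self.sequence[self.current_sprite_id]
        self.oam[offset:offset + 4] = bytes([y, tile, attr, x])
        self.current_sprite_id += 1

    def finish_sprites(self):
        # Hide only the slots that add_sprite did not claim this frame.
        for i in range(self.current_sprite_id, SPRITE_COUNT):
            self.oam[self.sequence[i]] = HIDDEN_Y


builder = SpriteBuilder(make_sequences())
builder.begin_sprites()          # frame 1 uses the forward order
builder.add_sprite(10, 1, 0, 20)
builder.add_sprite(30, 2, 0, 40)
builder.finish_sprites()
```

On the next frame the reversed order is used, so the first sprite lands in the last slot (offset 252); that rotation of priorities is what makes overlapping sprites take turns.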

Posted: Thu May 09, 2019 10:38 am

Joined: Tue Aug 28, 2018 8:54 am
Posts: 143
Location: Edmonton, Canada
Now that I think about it, if you have ROM to spare, you can unroll the clear loop completely and still clear only the remaining sprites, even with cycling. Something like this:

Code:
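FinishSprites:
   ; index of the first unused entry in the sequence (0..64)
   ldy currentSpriteId
   ; RTS trick: push (entry point - 1), high byte first. RTS pulls it, adds 1
   ; and "returns" into the unrolled code. Since this is a jump, not a JSR,
   ; the final RTS below returns straight to FinishSprites' caller.
   lda FinishSpritesJumpHi,y
   pha
   lda FinishSpritesJumpLo,y
   pha
   rts
   ; just to avoid copy-paste
   .repeat 64, I
      .ident(.concat("sprite", .string(I), "clear")):
      lda (spriteSequence),y
      tax
      lda #$FF
      sta OAM,x
      iny
   .endrep
sprite64clear:   ; entry point when all 64 sprites are in use: nothing to clear
   rts

; 65 entries (start index 0..64), each holding its label minus 1 for RTS
FinishSpritesJumpLo:
   .repeat 65, I
      .byte .lobyte(.ident(.concat("sprite", .string(I), "clear")) - 1)
   .endrep
FinishSpritesJumpHi:
   .repeat 65, I
      .byte .hibyte(.ident(.concat("sprite", .string(I), "clear")) - 1)
   .endrep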
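The jump tables have to store each entry point minus 1, because RTS pulls the low byte, then the high byte, and adds 1 before jumping. A small Python model of that behaviour, assuming a made-up address for sprite0clear and 9 bytes per unrolled iteration (lda (zp),y = 2, tax = 1, lda #imm = 2, sta abs,x = 3 with OAM at an absolute address such as $0200, iny = 1):

```python
UNROLLED_BASE = 0xC100    # hypothetical address of sprite0clear
BYTES_PER_ITERATION = 9   # lda (zp),y=2, tax=1, lda #imm=2, sta abs,x=3, iny=1


def entry_point(start_index):
    """Address of spriteNclear; N == 64 is the final RTS."""
    return UNROLLED_BASE + BYTES_PER_ITERATION * start_index


def push_target(stack, table_entry):
    """Dispatcher: push the high byte first, then the low byte."""
    stack.append((table_entry >> 8) & 0xFF)
    stack.append(table_entry & 0xFF)


def rts(stack):
    """RTS pulls low, then high, and continues at that address plus 1."""
    low = stack.pop()
    high = stack.pop()
    return (((high << 8) | low) + 1) & 0xFFFF


# 65 entries: one per possible number of sprites already used (0..64).
jump_table = [entry_point(i) - 1 for i in range(65)]

stack = []
push_target(stack, jump_table[10])
landed = rts(stack)   # start clearing at the 11th entry of the sequence
```

Without the "- 1" the RTS lands one byte past the label, in the middle of the first `lda (spriteSequence),y`, so the table entries and the labels must agree on that offset.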


Posted: Fri May 10, 2019 12:18 am

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7711
Location: Chexbres, VD, Switzerland
Quote:
Now that I think about it, if you have ROM to spare, you can unroll the clear loop completely and still clear only the remaining sprites, even with cycling.

I didn't read the code or anything, but generally speaking, completely unrolling loops tends to be a tremendous waste of ROM for only a very marginal speed gain. As mentioned before in this thread, unrolling around 2 or 4 iterations per loop is enough to speed things up significantly, and any further unrolling won't make a significant difference.
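To put rough numbers on this, here is a Python sketch of the clear loop's cost for 64 sprites at several unroll factors. The cycle counts are assumptions for an indexed-addressing body (ldx seq,y = 4, lda #$FF = 2, sta OAM,x = 5, iny = 2) plus cpy #imm and a taken bne (5 cycles) of loop overhead per pass; page crossings and the cycle saved on the final untaken branch are ignored, and the sprite count is assumed to be a multiple of the unroll factor.

```python
BODY_CYCLES = 4 + 2 + 5 + 2    # ldx seq,y / lda #$FF / sta OAM,x / iny
OVERHEAD_CYCLES = 2 + 3        # cpy #imm / bne (taken)
SPRITES = 64


def clear_cycles(unroll):
    """Approximate cycles to clear SPRITES slots with the body copied `unroll` times."""
    if SPRITES % unroll:
        raise ValueError("unroll factor must divide the sprite count")
    passes = SPRITES // unroll
    return SPRITES * BODY_CYCLES + passes * OVERHEAD_CYCLES


for k in (1, 2, 4, 8, 64):
    print(k, clear_cycles(k))
```

In this model, going from 1 to 2 copies saves 160 cycles, while going from 4 copies to a fully unrolled loop saves only 75 more, at the cost of 60 extra copies of the body in ROM.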

EDIT: OK, I've now read the code. I think if you have 4 or fewer sprite sequences (i.e. their total length is 256 bytes or less), you should not use indirect addressing to access the sequences, as it is a pure waste of time. This by itself will save more cycles than unrolling of any amount.
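The saving comes from the addressing mode itself. A rough per-sprite comparison in Python, assuming no page crossings: the indirect version needs lda (ptr),y (5 cycles) plus tax (2) to get the OAM offset into X, while a table that fits in 256 bytes can be read straight into X with ldx seq,y (4 cycles), and no pointer has to be set up at the start of the frame.

```python
SEQUENCE_LENGTH = 64
MAX_SEQUENCES = 256 // SEQUENCE_LENGTH   # sequences reachable with an 8-bit index

INDIRECT_CYCLES = 5 + 2   # lda (ptr),y + tax
INDEXED_CYCLES = 4        # ldx seq,y


def frame_savings(sprites=SEQUENCE_LENGTH):
    """Cycles saved per frame by indexed instead of indirect addressing."""
    return sprites * (INDIRECT_CYCLES - INDEXED_CYCLES)


print(MAX_SEQUENCES, frame_savings())
```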


Posted: Fri May 10, 2019 8:09 am

Joined: Tue Aug 28, 2018 8:54 am
Posts: 143
Location: Edmonton, Canada
Bregalad wrote:
EDIT: OK, I've now read the code. I think if you have 4 or fewer sprite sequences (i.e. their total length is 256 bytes or less), you should not use indirect addressing to access the sequences, as it is a pure waste of time. This by itself will save more cycles than unrolling of any amount.


I'm definitely not a good 6502 programmer yet. I'm still learning, and I'll keep learning forever. What would be the alternative?

This is what I have now

Code:
; header. load pointer once
ldx spriteSequenceIndex ; ZP  3
lda spriteSequencesLo,x ; Abs 4+
sta spriteSequence+0    ; ZP  3
lda spriteSequencesHi,x ; Abs 4+
sta spriteSequence+1    ; ZP  3
; total 17
...
; load value from sequence every sprite
lda (spriteSequence),y  ; Ind 5+
; 5 * 64 sprites = 320+

Total: 320 + 17 = 337 cycles, regardless of the sequence number.
I summed the minimum possible cycles.
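The same count as a small Python function, with the per-instruction cycle counts taken from the comments above (minimums, no page crossings):

```python
POINTER_SETUP = 3 + 4 + 3 + 4 + 3   # ldx zp / lda abs,x / sta zp / lda abs,x / sta zp
PER_SPRITE = 5                      # lda (spriteSequence),y


def indirect_total(sprites=64):
    """Minimum cycles spent fetching sequence entries through the pointer."""
    return POINTER_SETUP + PER_SPRITE * sprites


print(indirect_total())
```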

When I look at it, one alternative is branching at every iteration:

Code:
; Y = current sprite index
lda spriteSequenceIndex   ; ZP  3
cmp #0                    ; Imm 2
bne @no0                  ; 3 if taken, 2 if not
   lda spriteSequence0,y  ; Abs 4+
   jmp @end               ; 3
@no0:
cmp #1                    ; Imm 2
bne @no1                  ; 3 if taken, 2 if not
   lda spriteSequence1,y  ; Abs 4+
   jmp @end               ; 3
@no1:
cmp #2                    ; Imm 2
bne @no2                  ; 3 if taken, 2 if not
   lda spriteSequence2,y  ; Abs 4+
   jmp @end               ; 3
@no2:
cmp #3                    ; Imm 2
bne @no3                  ; not taken for sequence 3: 2
   lda spriteSequence3,y  ; Abs 4+
@no3:
@end:
; Total for sequence 3: 3 + 3 * (2 + 3) + 2 + 2 + 4 = 26

So when spriteSequenceIndex is 3, the total would be 26 * 64 sprites = 1664 cycles. Looks like indirect addressing wins.
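A Python sketch of the per-lookup cost of that compare chain for each sequence number, using the cycle counts from the comments (taken bne = 3, untaken = 2, no page crossings). Even the best case (14 cycles) costs more per sprite than the 5-cycle indirect load.

```python
def branch_chain_cycles(index, count=4):
    """Cycles for one lookup in the cmp/bne chain when spriteSequenceIndex == index."""
    cycles = 3                    # lda spriteSequenceIndex (zero page)
    cycles += index * (2 + 3)     # cmp #n + taken bne for every earlier test
    cycles += 2 + 2               # cmp #index + untaken bne
    cycles += 4                   # lda spriteSequenceN,y (absolute,Y)
    if index < count - 1:
        cycles += 3               # jmp @end (the last case falls through)
    return cycles


for n in range(4):
    print(n, branch_chain_cycles(n), branch_chain_cycles(n) * 64)
```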

Or use the RTS trick and run 4 different implementations:

Code:
; jump once at the beginning
ldx spriteSequenceIndex   ; ZP  3
jsr JumpToSequenceImplementation ; 6

JumpToSequenceImplementation:
; the tables hold each routine's address minus 1, since RTS adds 1
lda SequenceImplementationHi,x ; Abs 4+
pha                            ; 3
lda SequenceImplementationLo,x ; Abs 4+
pha                            ; 3
rts                            ; 6
; Total: 29

; Every iteration (inside the routine for sequence 3)
ldx currentSpriteId       ; ZP  3
lda spriteSequence3,x     ; Abs 4+
; Total: 7 * 64 sprites = 448
; Grand total = 448 + 29 = 477

; or if we can keep the sprite index in X
lda spriteSequence3,x     ; Abs 4+
; Total: 4 * 64 sprites = 256
; Grand total = 256 + 29 = 285


So compared with the indirect version (17 + 5 * 64 = 337 cycles), we can save 52 cycles, but we use more ROM.
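The three totals side by side in Python, with the dispatch cost (ldx 3 + jsr 6 + two lda abs,x at 4 + two pha at 3 + rts 6) and the per-sprite costs taken from the comments above:

```python
DISPATCH = 3 + 6 + 4 + 3 + 4 + 3 + 6   # ldx, jsr, lda, pha, lda, pha, rts
SPRITES = 64


def rts_trick_total(per_sprite):
    """One-time dispatch plus a fixed per-sprite lookup cost."""
    return DISPATCH + per_sprite * SPRITES


indirect = 17 + 5 * SPRITES            # pointer setup + lda (ptr),y per sprite
reload_x = rts_trick_total(3 + 4)      # ldx zp + lda abs,x every sprite
keep_x = rts_trick_total(4)            # sprite index already in X

print(indirect, reload_x, keep_x, indirect - keep_x)
```

Reloading X for every sprite ends up slower than the indirect version (477 vs 337), so in this count the jump-table approach only pays off when the sprite index stays in a register.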

Is there a more optimized solution?


Posted: Fri May 10, 2019 10:36 am

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7711
Location: Chexbres, VD, Switzerland
The idea of the "RTS trick" (also known as a jump table) makes perfect sense in this context: you have 4 highly optimized sprite mazing routines, one for each of the 4 orders. However, that's not what I had in mind. The idea was simply to use normal indexed addressing. The LDX $xxxx,Y or LDY $xxxx,X instructions would be more useful here than LDA ($xx),Y.
Here is your code, modified to use what I had in mind.

Code:
; Warning. Code is written in place, not tested. But I hope it explains what I intended
MAX_SEQUENCE = 2
; all sequences are stored back to back, 64 bytes each, so that a plain
; absolute,Y load reaches any of them (at most 4 sequences = 256 bytes)
spriteSequence:
spriteSequence1:
.byte 0, 4, 8, 12, ..., 252
spriteSequence2:
.byte 0, 12, 4, 8, ..., 252

BeginSprites:
lda #0
sta currentSpriteId
; spriteSequenceIndex now holds the byte offset of the current sequence (0, 64, 128 or 192)
lda spriteSequenceIndex
clc
adc #64                 ; Size of a sequence
cmp #MAX_SEQUENCE*64    ; MAX_SEQUENCE can only be 2 or 3 for this to work; if it's 4, remove the CMP and the reset (the ADC wraps to 0 by itself); if it's more than 4, it won't work at all
bne @noSequenceReset
    lda #0
@noSequenceReset:
sta spriteSequenceIndex
rts

AddSprite:
lda currentSpriteId
clc
adc spriteSequenceIndex
tay
; spriteSequence is the start of the concatenated shuffle arrays,
; which contain offsets in OAM, not IDs (eg 0, 4, 8)
ldx spriteSequence,y
lda spriteY
sta OAM,x
; ...
; Fill the rest
inc currentSpriteId
rts

FinishSprites:
lda currentSpriteId
cmp #64
bcs @done               ; all 64 slots are already in use
clc
adc spriteSequenceIndex
tay                     ; Y = sequence offset + first unused sprite
@loop:
ldx spriteSequence,y
lda #$FF                ; Y = $FF keeps the sprite off-screen
sta OAM,x
iny
inc currentSpriteId
lda currentSpriteId
cmp #64
bne @loop
@done:
rts
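To see how the ADC/CMP in BeginSprites cycles the offset, here is a Python model of the 8-bit arithmetic (next_offset and offsets are hypothetical helpers, not part of the assembly). With 4 sequences the ADC simply wraps from 192 to 0, which is why the CMP has to go; with more than 4, offset plus sprite index no longer fits in one byte.

```python
SEQUENCE_SIZE = 64


def next_offset(offset, max_sequence):
    """Model of BeginSprites: add 64 with 8-bit wraparound, then reset if needed."""
    if not 1 <= max_sequence <= 4:
        raise ValueError("only 1 to 4 sequences fit in a 256-byte table")
    a = (offset + SEQUENCE_SIZE) & 0xFF      # clc / adc #64 (carry out is dropped)
    if max_sequence < 4 and a == max_sequence * SEQUENCE_SIZE:
        a = 0                                # cmp #MAX_SEQUENCE*64 / bne / lda #0
    return a                                 # sta spriteSequenceIndex


def offsets(max_sequence, frames, start=0):
    """Offsets selected on each of the next `frames` calls to BeginSprites."""
    result, offset = [], start
    for _ in range(frames):
        offset = next_offset(offset, max_sequence)
        result.append(offset)
    return result


print(offsets(2, 4), offsets(3, 4), offsets(4, 5))
```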


Now, there is no right or wrong; it just depends on what you want to do. One thing I like to do when mazing sprites is to keep the index in either X or Y at all times, and use only the remaining 2 registers for all other operations. Storing and loading currentSpriteId in a zero-page variable instead of keeping it in a register is a source of inefficiency.
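As a rough illustration of that inefficiency, assuming the counter lives in zero page and the table lookup itself is unchanged: reloading the index (ldy zp, 3 cycles) and incrementing it in memory (inc zp, 5 cycles) costs 8 cycles per sprite, against 2 for an iny when the index never leaves Y. This ignores any extra work needed to keep Y free while each sprite is computed.

```python
ZERO_PAGE_COUNTER = 3 + 5   # ldy currentSpriteId + inc currentSpriteId
REGISTER_COUNTER = 2        # iny


def per_frame_savings(sprites=64):
    """Cycles saved by keeping the sprite index in Y for the whole frame."""
    return sprites * (ZERO_PAGE_COUNTER - REGISTER_COUNTER)


print(per_frame_savings())
```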

Also, I like to generate what you call the "sequences" on the fly: the sequence is not stored anywhere in ROM, it's just the result of a computation. This has its limits, though.
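One possible way to compute a sequence instead of storing it (an assumption; the post doesn't say which computation is used): visit slot (start + i * step) mod 64 with an odd step, which touches every slot exactly once, and change start every frame. On the 6502 this reduces to adding 4 * step to an 8-bit OAM offset for each sprite, since the wraparound at 256 does the modulo. A Python check that both forms agree:

```python
from math import gcd

SPRITES = 64


def computed_sequence(start, step):
    """OAM byte offsets in visiting order: 4 * ((start + i*step) mod 64)."""
    if gcd(step, SPRITES) != 1:
        raise ValueError("step must be odd so that every slot is visited once")
    return [4 * ((start + i * step) % SPRITES) for i in range(SPRITES)]


def by_8bit_addition(start, step):
    """The same offsets, produced by repeatedly adding 4*step with 8-bit wraparound."""
    offset, result = (4 * start) & 0xFF, []
    for _ in range(SPRITES):
        result.append(offset)
        offset = (offset + 4 * step) & 0xFF
    return result


seq = computed_sequence(3, 17)
print(seq[:4])
```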


Posted: Fri May 10, 2019 11:41 am

Joined: Tue Aug 28, 2018 8:54 am
Posts: 143
Location: Edmonton, Canada
Bregalad wrote:
The idea of the "RTS trick" (also known as a jump table) makes perfect sense in this context: you have 4 highly optimized sprite mazing routines, one for each of the 4 orders. However, that's not what I had in mind. The idea was simply to use normal indexed addressing. The LDX $xxxx,Y or LDY $xxxx,X instructions would be more useful here than LDA ($xx),Y.
Here is your code, modified to use what I had in mind.

Code:
lda spriteSequenceIndex
clc
adc #64                 ; Size of a sequence



Thank you for the explanation, it makes sense to me. Except for this part: I'm assuming you wanted to multiply spriteSequenceIndex by 64 to get an offset, rather than add 64 to [0, 1, 2, 3]. Is that correct?


Posted: Fri May 10, 2019 12:03 pm

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7711
Location: Chexbres, VD, Switzerland
Basically, the initial offset cycles through 0 and 64 if you have 2 sequences; 0, 64 and 128 if you have 3; or 0, 64, 128 and 192 if you have 4. How this is achieved doesn't matter much.

Another way to do this is to interleave the sequences: you add 2, 3 or 4 (the number of sequences) to the index for each sprite, and have the sequences themselves start one byte apart from one another. This may or may not be more efficient, depending on how it is coded.
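Both layouts hold the same entries and differ only in the index arithmetic. A Python sketch, assuming 64-entry sequences and three made-up example orders: concatenated tables are indexed with sequence * 64 + sprite, interleaved ones with sprite * count + sequence, so the interleaved loop steps its index by the number of sequences.

```python
def concatenated(sequences):
    """Sequences stored back to back (starting at offsets 0, 64, 128, ...)."""
    return [entry for seq in sequences for entry in seq]


def interleaved(sequences):
    """Sequences start one byte apart; consecutive entries are len(sequences) apart."""
    count, length = len(sequences), len(sequences[0])
    return [sequences[i % count][i // count] for i in range(count * length)]


def lookup_concatenated(table, seq, sprite, length=64):
    return table[seq * length + sprite]


def lookup_interleaved(table, seq, sprite, count):
    return table[sprite * count + seq]


forward = [4 * i for i in range(64)]
sequences = [forward, forward[::-1], forward[1::2] + forward[0::2]]
flat = concatenated(sequences)
mixed = interleaved(sequences)
```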


Posted: Fri May 10, 2019 5:00 pm

Joined: Mon Mar 13, 2017 5:21 pm
Posts: 58
Quote:
Really? You are able to claim OAM slots for actual sprites, obviously... how is that any different from claiming them for "dummy" sprites which actually just put the Y coordinate off-screen? Whatever method you're using for selecting OAM slots for use, do the exact same thing for putting the remaining Y coordinates off-screen, until all 64 slots have been claimed.
First, let me say how much I appreciate everybody's feedback. When I first read this comment, it kind of ticked me off. I mean, I had worked really hard on the code that I had already written, my carefully crafted engine, and didn't like being told that there was still a lot of room for improvement. Then I slept on it and mulled it over a little, and went back and reworked all of the code that needed to be reworked so that this type of approach would work. So ultimately I gained a lot from it. Yes, my original code was doing a lot of redundant work in several areas that I was able to trim out. So I'm grateful, Tokumaru.


Posted: Fri May 10, 2019 7:40 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11345
Location: Rio de Janeiro - Brazil
Cool, glad I could help! :D

