A simple optimization

Bregalad
Posts: 8008
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Fri May 10, 2019 12:18 am

Now thinking about it, you have ROM to spare, you can unroll clear loop completely and only clear remaining sprites even with cycling.
I didn't read the code or anything, but generally speaking, completely unrolling loops tends to be a tremendous waste of ROM for only a very marginal speed gain. As mentioned earlier in this thread, unrolling around 2 or 4 iterations per loop is enough for a significant speedup, and any further unrolling won't make a significant difference.
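To illustrate the point about partial unrolling, here is a minimal sketch of a loop that hides all 64 sprites and is unrolled 4 times. It assumes ca65 syntax and that OAM is the 256-byte shadow OAM buffer used elsewhere in this thread; HideAllSprites is a made-up name, and the code is untested.

```asm
; Sketch only, not tested on hardware.
; Assumption: OAM is a 256-byte shadow OAM buffer (for example at $0200).
HideAllSprites:
    ldx #0
@loop:
    lda #$FF            ; Y = $FF puts the sprite off-screen
    sta OAM,x           ; Y byte of sprite n
    sta OAM+4,x         ; Y byte of sprite n+1
    sta OAM+8,x         ; Y byte of sprite n+2
    sta OAM+12,x        ; Y byte of sprite n+3
    txa
    clc
    adc #16             ; advance by 4 sprites of 4 bytes each
    tax
    bne @loop           ; X wraps back to 0 after 16 passes
    rts
```

The index update and the branch are paid once per four sprites instead of once per sprite. Unrolling further only shrinks that already small share of overhead, while the ROM cost keeps growing with every added STA.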

EDIT: OK, I have now read the code. I think that if you have 4 or fewer sprite sequences (i.e. the total length is 256 bytes or less), you should not use indirect addressing to access the sequences, as this is a pure waste of time. This by itself will save more cycles than any amount of unrolling.
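A rough side-by-side sketch of the two addressing modes may make the difference concrete. The names seqPtr, sequenceNumber, spriteSequences, spriteSequencesLo and spriteSequencesHi are assumptions made for illustration, and the fragments are untested.

```asm
; Sketch only. seqPtr (2 zero-page bytes) and sequenceNumber are hypothetical names.

; Indirect indexed: the pointer has to be set up before the table can be read,
; and the offset arrives in A, so it needs a TAX before it can index OAM.
    ldx sequenceNumber
    lda spriteSequencesLo,x
    sta seqPtr
    lda spriteSequencesHi,x
    sta seqPtr+1
    ; ... later, for every sprite:
    lda (seqPtr),y      ; 5 cycles when no page boundary is crossed
    tax                 ; 2 more cycles

; Absolute indexed: works when all sequences together fit in 256 bytes.
; Y = position inside the combined table; the offset is loaded straight into X.
    ldx spriteSequences,y   ; 4 cycles when no page boundary is crossed
```

The saving is small per access, but it is paid for every sprite on every frame, and the pointer setup disappears entirely.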

Bregalad
Posts: 8008
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Fri May 10, 2019 10:36 am

The idea of the "RTS trick" (also known as a jump table) makes perfect sense in this context: you would have 4 highly optimized sprite mazing routines, one for each of the 4 orders. However, that's not what I had in mind. The idea was just to use normal absolute indexed mode. The LDX $xxxx,Y or LDY $xxxx,X instructions would be more useful here than LDA ($xx),Y.
Here is your code, modified to use what I had in mind.

Code:

; Warning: this code was written in place and is not tested, but I hope it explains what I intended.
MAX_SEQUENCE = 2
; The sequences hold OAM offsets (multiples of 4) and must be stored back to back.
spriteSequence1:
.byte 0, 4, 8, 12, ..., 252
spriteSequence2:
.byte 0, 12, 4, 8, ..., 252
; Pointer tables from the original code (not used by the routines below)
spriteSequencesLo:
.byte <spriteSequence1, <spriteSequence2
spriteSequencesHi:
.byte >spriteSequence1, >spriteSequence2

BeginSprites:
lda #0
sta currentSpriteId
lda spriteSequenceIndex
clc
adc #64                 ; Size of a sequence
cmp #MAX_SEQUENCE*64    ; MAX_SEQUENCE can only be 2 or 3 for this to work; if it's 4 the CMP instruction should be removed; if it's more than 4 it won't work at all
bne @noSequenceReset    ; Not past the last sequence yet: keep the new index
    lda #0              ; Past the last sequence: wrap back to the first one
@noSequenceReset:
sta spriteSequenceIndex
tax
rts

AddSprite:
lda currentSpriteId
clc
adc spriteSequenceIndex
tay
; spriteSequence1 is the start of the back-to-back shuffle arrays,
; and the arrays contain the offsets in OAM, not IDs (e.g. 0, 4, 8)
ldx spriteSequence1,y
lda spriteY
sta OAM,x
; ...
; Fill the rest
inc currentSpriteId
rts

FinishSprites:
lda currentSpriteId
cmp #64
bcs FinishSprites_done  ; All 64 slots are already in use
clc
adc spriteSequenceIndex
tay
FinishSprites_loop:
ldx spriteSequence1,y
lda #$FF                ; Y = $FF puts the sprite off-screen
sta OAM,x
iny
inc currentSpriteId
lda currentSpriteId
cmp #64                 ; Stop once all 64 slots have been claimed
bne FinishSprites_loop
FinishSprites_done:
rts

Now there is no right or wrong, it just depends on what you want to do. A thing I like to do when mazing sprites is to constantly keep the index in either X or Y, and only use the remaining 2 registers for all other operations. Your idea to store and load "currentSpriteId" in a zero-page variable instead of keeping it in a register is a source of inefficiency.

Also I like to generate what you call the "sequences" on the fly, that is the sequence is not stored anywhere in ROM, it's just the result of a computation. This has its limits, though.
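
As a hedged illustration of what computing the order on the fly can look like, the sketch below steps through OAM with a fixed stride instead of reading a table. This is only one possible computation and not necessarily the one meant above; oamStart, oamIndex, OAM_STEP, FRAME_STEP and NextSlot are hypothetical names, and the code is untested.

```asm
; Sketch only. oamStart and oamIndex are hypothetical zero-page variables.
OAM_STEP   = 17*4       ; odd multiple of 4: visits all 64 slots before repeating
FRAME_STEP = 4          ; move the starting slot by one sprite every frame

BeginSprites:
    lda oamStart
    clc
    adc #FRAME_STEP     ; wraps around at 256 on its own
    sta oamStart
    sta oamIndex
    rts

; Returns the OAM offset of the next free slot in X.
NextSlot:
    ldx oamIndex        ; X = slot to use now
    txa
    clc
    adc #OAM_STEP       ; precompute the slot for the next call
    sta oamIndex
    rts
```

Because 17 is odd, 64 successive steps of 68 bytes land on 64 different slots, so no table is needed. The limit is that only orders expressible by such a formula can be produced.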

Bregalad
Posts: 8008
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Fri May 10, 2019 12:03 pm

Basically the initial offset will cycle through 0 and 64 if you have 2 sequences; 0, 64 and 128 if you have 3 sequences; or 0, 64, 128 and 192 if you have 4 sequences. How this is achieved is not of much interest.

Another way to do this is to interleave the sequences: you add 2, 3 or 4 to the index within the sequences, and have the sequences themselves start one byte apart from one another. This could be more efficient or not, depending on how it is coded.
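
The interleaved layout can be sketched as follows for 2 sequences. ca65 syntax is assumed; sequenceNumber and sequencePos are hypothetical zero-page variables (sequenceNumber must start at 0 or 1), the forward and reverse orders are placeholder data chosen only so the table can be generated, and the code is untested.

```asm
; Sketch only, ca65 syntax assumed.
NUM_SEQUENCES = 2

; Entry i of sequence k is stored at spriteSequences + i*NUM_SEQUENCES + k.
spriteSequences:
.repeat 64, i
    .byte i*4           ; sequence 0: slots in forward order
    .byte (63-i)*4      ; sequence 1: slots in reverse order
.endrepeat

BeginSprites:
    lda sequenceNumber
    eor #1              ; alternate between sequence 0 and 1 every frame
    sta sequenceNumber
    sta sequencePos     ; sequence k starts k bytes into the table
    rts

AddSprite:
    ldy sequencePos
    ldx spriteSequences,y
    lda spriteY
    sta OAM,x
    ; ... fill in the tile, attribute and X bytes
    iny
    iny                 ; step by NUM_SEQUENCES
    sty sequencePos
    rts
```

With this layout the per-sprite code never adds a base offset: the starting byte selects the sequence, and the step size keeps the index inside it.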

gravelstudios
Posts: 89
Joined: Mon Mar 13, 2017 5:21 pm

Post by gravelstudios » Fri May 10, 2019 5:00 pm

Really? You are able to claim OAM slots for actual sprites, obviously... how is that any different from claiming them for "dummy" sprites which actually just put the Y coordinate off-screen? Whatever method you're using for selecting OAM slots for use, do the exact same thing for putting the remaining Y coordinates off-screen, until all 64 slots have been claimed.
First, let me say how much I appreciate everybody's feedback. When I first read this comment, it kind of ticked me off. I mean, I had worked really hard on the code that I had already written, my carefully crafted engine, and didn't like being told that there was still a lot of room for improvement. Then I slept on it and mulled it over a little, and went back and reworked all of the code that needed to be reworked so that this type of approach would work. So ultimately I gained a lot from it. Yes, my original code was doing a lot of redundant work in several areas that I was able to trim out. So I'm grateful, Tokumaru.

tokumaru
Posts: 11996
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru » Fri May 10, 2019 7:40 pm

Cool, glad I could help! :D
