I made a 65816 routine (not 100% optimized yet though) for shell sort. I got the gap sequence (1, 4, 10, 23, 57) from several places that say it's the fastest gap sequence on average, but I'm curious about it's worst case scenario because 20 is divisible by both 4 and 10, so some sequences of numbers such as (0,x,x,x,1,x,x,x,2,x,3,x,4,x,x,x,5,x,x,x,6) would slow it down quite a bit, but that probably rarely happen.

**Code:**

shell_sort:

sep #$30

ldx.b #57

jsr +

ldx.b #23

jsr +

ldx.b #10

jsr +

ldx.b #4

jsr +

ldx.b #1

+;

stx {temp}

ldy #$00

-;

lda {z},x

cmp {z},y

bcs ++

phy

phx

sta {temp2}

lda {entry},x

pha

-;

lda {entry},y

sta {entry},x

lda {z},y

sta {z},x

tyx

tya

cmp {temp}

bcc +

sbc {temp}

tay

lda {temp2}

cmp {z},y

bcc -

+;

lda {temp2}

sta {z},x

pla

sta {entry},x

plx

ply

+;

iny

inx

cpx #$80

bne --

rts

The above code looks at 8-bit numbers. If 8-bit is only 256 values, and my own homebrew game already uses 8 levels of sprite priorities using a linked list system, I can probably just expand it to 256 levels of priorities. Do you think 8-bit is enough precision for Z layering? Because if it is, I don't think I need any sorting algorithm.