Suppose you have exactly one screen of data that wraps (256x256 for ease, rather than 256x240), made of 16x16 tiles. Edit: I guess we're also assuming the data is stored column by column, so byte 0 is the top left tile and byte 1 is the tile below it rather than the one to its right, since that's the code I wrote.
Which byte is the tile at the top of the column data for any given position?
Code:
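lda posx;Horizontal scroll position in pixels
and #%11110000;Drop the low four bits (position within the tile)
tay;Y is now column*16, the offset of that column's top tile
lda tileram,y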
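The and is effectively a modulus, a divide, and a subtract of the scroll position all in one: it throws away the low four bits (the position within the tile), and what's left is the column number times 16, which is already the byte offset of that column's top tile.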
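If you want to convince yourself of that on a PC, here's a small C model of the same math. It's just a sketch: it assumes the column-major layout described above (16 bytes per column, 16 columns per screen), and the function names are made up for illustration.

```c
#include <stdint.h>

/* What the 6502 code does: keep only the high nibble of the low byte. */
static uint8_t column_top_masked(uint8_t posxlow)
{
    return posxlow & 0xF0;
}

/* The same result spelled out the long way. */
static uint8_t column_top_longhand(unsigned posx)
{
    unsigned wrapped = posx % 256;    /* modulus: wrap to one screen */
    unsigned column  = wrapped / 16;  /* divide: pixels to columns */
    return (uint8_t)(column * 16);    /* 16 bytes per column */
}
```

Both functions should agree for every position, which is the whole point: the one-instruction mask replaces the divide, the wrap, and the multiply.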
Suppose you have exactly two screens of data that wrap. Which byte is the tile at the top of the column data for any given position?
Code:
lda posxhigh
ror a;Which screen in RAM is now in the carry
lda posxlow;lda, and, and tay all leave the carry alone
and #%11110000
tay
bcs screen2
lda tileram1,y
bcc byteloaded;Carry is always clear here, so this could be a jmp
screen2:
lda tileram2,y
byteloaded:
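For anyone who'd rather read it in C, this is roughly what that routine computes. Treat it as a sketch: tileram1 and tileram2 are stand-ins for the two 256-byte screens in RAM, and I've combined posxhigh and posxlow into one 16-bit posx for convenience.

```c
#include <stdint.h>

/* Stand-ins for the two 256-byte screens in RAM. */
static uint8_t tileram1[256];
static uint8_t tileram2[256];

/* posx is the full scroll position (posxhigh:posxlow). */
static uint8_t column_top_tile(uint16_t posx)
{
    uint8_t offset = (uint8_t)(posx & 0x00F0); /* and #%11110000 on posxlow */

    if (posx & 0x0100)             /* bit 0 of posxhigh: the carry after ror */
        return tileram2[offset];
    return tileram1[offset];
}
```

Because only bit 0 of the high byte is looked at, the two screens wrap for free: positions 0x0235 and 0x0035 land on the same byte of tileram1.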
If you just add one offscreen column in each direction, you can no longer take those shortcuts because there are 18 columns instead of 16 or 32. And obviously with 32x32 metatiles (or some other power of two thing) the math changes slightly, but it'll still remain bitwise in a way that it can't with 18 columns.
This is why dougeff and I don't need to shift data once it's in the array. You should absolutely find a way to not have to do that. A lookup table might get you results somewhat similar to the above, if I'm understanding correctly and the plan is a sliding window of 18 columns.
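Here's my guess at what the lookup table version could look like for an 18-column window, modeled in C. This is an assumption about the setup, not anyone's actual code: the window is treated as a ring of 18 slots, each 16 bytes, and all the names are hypothetical. Note that 18 x 16 = 288 bytes, so the offsets no longer fit in a single index register the way they do with 16 columns.

```c
#include <stdint.h>

#define WINDOW_COLUMNS 18   /* 16 visible plus one offscreen on each side */
#define COLUMN_BYTES   16

/* Byte offset of each slot's top tile: 0, 16, 32, ... 272. */
static uint16_t slot_offset[WINDOW_COLUMNS];

static void init_slot_offsets(void)
{
    for (unsigned i = 0; i < WINDOW_COLUMNS; i++)
        slot_offset[i] = (uint16_t)(i * COLUMN_BYTES);
}

/* 18 is not a power of two, so wrapping takes a compare, not a mask. */
static uint8_t next_slot(uint8_t slot)
{
    slot++;
    if (slot == WINDOW_COLUMNS)
        slot = 0;
    return slot;
}

static uint8_t prev_slot(uint8_t slot)
{
    if (slot == 0)
        return WINDOW_COLUMNS - 1;
    return (uint8_t)(slot - 1);
}
```

The idea is that you keep a running slot number as you scroll and only overwrite the slot that just went offscreen, so nothing already in the array has to move.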
As far as whether to split the update across frames, that's up to you. If you don't have to move the array, you'll only ever need to write 16 bytes every frame, and that's assuming you scroll 16 pixels every frame. (I guess really 15 bytes, assuming columns are 240 pixels tall as is traditional.)
Edit2: I guess the thought to take with you if nothing else is that power of two data has a way of unlocking random access, because of how simple the math is. If your data is screens, you can still load just the next column. It's easy to find which screen (horizontally) because that's a power of two, and then it's easy to find a column within that screen. So you never need to read and write all 16 columns, just 15 reads and 15 writes for the one new column. (Assuming... you only scroll in one direction. Also assuming no compression beyond screens/metatiles.)
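As a rough C sketch of that last point: with a power of two number of screens, finding the screen is a shift and a mask, finding the column is a mask, and then it's 15 reads and 15 writes. SCREEN_COUNT, level, and fetch_column are names I made up for the example, and 4 screens is an arbitrary choice.

```c
#include <stdint.h>

#define SCREEN_COUNT 4     /* assumption: any power of two works */
#define SCREEN_BYTES 256   /* 16 columns of 16 bytes, column-major */
#define COLUMN_TILES 15    /* 240 pixels / 16 pixels per tile */

/* Hypothetical level data, one 256-byte block per screen. */
static uint8_t level[SCREEN_COUNT][SCREEN_BYTES];

/* Copy the one column at scroll position posx into out. */
static void fetch_column(uint16_t posx, uint8_t out[COLUMN_TILES])
{
    unsigned screen = (posx >> 8) & (SCREEN_COUNT - 1); /* which screen */
    unsigned top    = posx & 0xF0;                      /* column * 16 */

    for (unsigned row = 0; row < COLUMN_TILES; row++)
        out[row] = level[screen][top + row];            /* 15 reads, 15 writes */
}
```

Nothing here depends on which column was loaded last, which is what I mean by random access.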