Fixed point math for scrolling games
Moderator: Moderators
Fixed point math for scrolling games
I have some experience writing physics engines (Chipmunk2D), and want to try and see how far I can go with that on the NES. So far I've gotten circle-circle and circle-line collisions working with reasonable performance, and am working on reducing it further with some small lookup tables. My end goal is a game with a two wheeled vehicle. So far I've been using 8.8 fixed point because it's easy to prototype with from C, but I'm hoping to get at least 2 screen scrolling. So I was thinking about my options for using fixed point.
If I wanted to stick with 16 bit types I could do something like 9.7 or 12.4 depending on how much accuracy feels right in the fractional bits for acceleration and such. All the extra shifting gets messy real fast in C, and I can only imagine moreso in assembly. Having a lot of multi-byte shifts seems like it might really bloat and slow the code too.
16.8 is sort of appealing for the final assembly version. If I wanted to access the values from C I'd need to split the integer and fractional parts into a struct. I've already kind of ruled out padding it out to 32 bits. Even with a lot of casting and type annotations for 16/8 bit ops, I wasn't able to get cc65 to generate reasonable code using 32 bit ints.
What have other people had good success with? Is 16.8 the way to go, or is something like 9.7 less troublesome than I think it is?
If I wanted to stick with 16 bit types I could do something like 9.7 or 12.4 depending on how much accuracy feels right in the fractional bits for acceleration and such. All the extra shifting gets messy real fast in C, and I can only imagine moreso in assembly. Having a lot of multi-byte shifts seems like it might really bloat and slow the code too.
16.8 is sort of appealing for the final assembly version. If I wanted to access the values from C I'd need to split the integer and fractional parts into a struct. I've already kind of ruled out padding it out to 32 bits. Even with a lot of casting and type annotations for 16/8 bit ops, I wasn't able to get cc65 to generate reasonable code using 32 bit ints.
What have other people had good success with? Is 16.8 the way to go, or is something like 9.7 less troublesome than I think it is?
Re: Fixed point math for scrolling games
I think this is a good approach.16.8 is sort of appealing for the final assembly version. If I wanted to access the values from C I'd need to split the integer and fractional parts into a struct.
1 byte for which screen. Then 8.8 for within that screen.
na_th_an does some kind of bit shifting for different parts of his X,Y code, to keep the storage space 16 bits per direction.
I thought it was 12.4 (allowing for 16 screen wide), but it might be 10.6 for the flick screen games (allowing for 4 screen wide movements). (not sure)
Edited.
I think that corresponds to what you said "12.4"
nesdoug.com -- blog/tutorial on programming for the NES
Re: Fixed point math for scrolling games
Both Haunted games use 16.8 displacements and 8.8 velocities. One drawback with 12.4 is that you may need to keep 16-bit displacements longer than you would with 16.8. In 16.8, you can ignore subpixel positions, subtract 16-bit pixel positions, reject displacements more than 128 pixels apart early, and save 8-bit pixel displacements for later processing. In 12.4, by contrast, a 20-pixel displacement between two objects is 320 units, which is solidly a 16-bit number.
Re: Fixed point math for scrolling games
This is my preferred configuration too. Much easier to manage, no need for bit shifting when doing physics. You'll probably still need bit shifting for collisions against the background, though.tepples wrote:Both Haunted games use 16.8 displacements and 8.8 velocities.
How many bytes are you really saving when using more tightly packed schemes? Most people don't even have 16 slots for active objects, the savings are minimal. There's hardly any advantage in processing time too, since the time spent shifting bits is probably larger than that of directly manipulating a whole extra byte.
Re: Fixed point math for scrolling games
Sounds like 16.8 is the way to go then. I found some random stuff on the internet that talked about using 12.4 fixed point, but it seems like a bad idea in this case. I was just wondering if there was something I was missing.
More or less what I was thinking. It would waste a few dozen bytes of RAM at most, and I'm not expecting to make a very big game.tokumaru wrote:tepples wrote:How many bytes are you really saving when using more tightly packed schemes? Most people don't even have 16 slots for active objects, the savings are minimal. There's hardly any advantage in processing time too, since the time spent shifting bits is probably larger than that of directly manipulating a whole extra byte.
- rainwarrior
- Posts: 8731
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Fixed point math for scrolling games
It's difficult to do 24 bit numbers in C, especially with how cumbersome 32 bit numbers are in CC65. All I could suggest for this is to be sparing with the promotion to 32 bits, and do some of the relevant work in assembly instead.
At the same time, it's difficult to do efficient arrays of anything bigger than byte-size in C as well. If you have an array of objects, there's a big advantage in assembly to storing each byte of this 24-bit number in a separate array. ("Structure of arrays" idiom.) Unfortunately it's cumbersome to use this kind of data in C, but like above I guess the advice is to try and minimize it, and be prepared to write some assembly. I don't think there's really any good way to do this in CC65 C, but you can plan to do some assembly optimization as needed.
The above code is relatively efficient on CC65, because it does at least tend to handle byte-aligned operations like << 8 fairly well, and if you can use static variables where possible a lot of its internal stack use can be minimized. I think overall with careful use of 8 bit stripes, and a few macros to help you pack and unpack them, you'll end up with faster and smaller code than using 16 bit arrays. (Even with 24 bits its probably still a win over using arrays of 32-bit numbers... though the necessary casting up to 32-bit is going to hurt any way you do it except assembly.)
Note above that i being an unsigned char + both a and b being char arrays means that they can be fetched and stored with LDA abs, X/Y / STA abs, X/Y, which you don't get if either of those conditions isn't true. (For arrays of short int, it always has to promote the index to 16 bits.)
9.7 is OK if you need compactness and are only 2 screens wide, but the extra shift is very cumbersome for the code (in both assembly and C). Splitting in other sizes (e.g. 12.4) is not very good on the 6502. That works much better on processors that can shift more than 1 bit at a time.
At the same time, it's difficult to do efficient arrays of anything bigger than byte-size in C as well. If you have an array of objects, there's a big advantage in assembly to storing each byte of this 24-bit number in a separate array. ("Structure of arrays" idiom.) Unfortunately it's cumbersome to use this kind of data in C, but like above I guess the advice is to try and minimize it, and be prepared to write some assembly. I don't think there's really any good way to do this in CC65 C, but you can plan to do some assembly optimization as needed.
Code: Select all
ab = a[i] | (b[i] << 8);
a[i] = (ab >> 0) & 0xFF;
b[i] = (ab >> 8) & 0xFF;
// a/b are both static unsigned char arrays
// ab is a static short int
// i is an unsigned char
Note above that i being an unsigned char + both a and b being char arrays means that they can be fetched and stored with LDA abs, X/Y / STA abs, X/Y, which you don't get if either of those conditions isn't true. (For arrays of short int, it always has to promote the index to 16 bits.)
9.7 is OK if you need compactness and are only 2 screens wide, but the extra shift is very cumbersome for the code (in both assembly and C). Splitting in other sizes (e.g. 12.4) is not very good on the 6502. That works much better on processors that can shift more than 1 bit at a time.
Last edited by rainwarrior on Thu Aug 16, 2018 4:16 pm, edited 1 time in total.
Re: Fixed point math for scrolling games
I mixed and matched in my platformer and used 16.8 for X and 12.4 for Y.
Re: Fixed point math for scrolling games
I've been toying with macros of casting and dereferencing. I'm away from my computer, but it's something like...I don't think there's really any good way to do this in CC65 C, but you can plan to do some assembly optimization as needed.
#define low_byte(a) *((unsigned char*)a)
#define high_byte(a) *((unsigned char*)a+1)
if "a" and "array1" are a known values, (and global) at compile time, then...
low_byte(x_pos) = array1[object_number];
compiles to...
ldy object_number
lda array1, y
sta x_pos
likewise for the high byte from another array.
(these are 8 bit char arrays).
high_byte(x_pos) = array2[object_number];
edit... x_pos in this example would be an int (16 bit)
nesdoug.com -- blog/tutorial on programming for the NES
Re: Fixed point math for scrolling games
All good tricks. I think in my case, only the physics will need the fractional bits. Rendering and hit detection don't really need them.
On a separate note, you can get cc65 to do nice loads for interleaved arrays too like this:
On a separate note, you can get cc65 to do nice loads for interleaved arrays too like this:
Code: Select all
// Bad, assembly is exactly as written, no constant folding/propagation
for(i = 0; i < 2*count; i += 2){
a = GLOBAL_ARRAY[i + 0];
b = GLOBAL_ARRAY[i + 2];
}
// Better -> tax, lda, lda
for(i = 0; i < 2*count; i += 2){
a = (GLOBAL_ARRAY + 0)[i];
b = (GLOBAL_ARRAY + 1)[i];
}
-
- Posts: 1565
- Joined: Tue Feb 07, 2017 2:03 am
Re: Fixed point math for scrolling games
For completeness. When dealing with odd fp. You can use tables to speed up and extract values
Say 6.2 for example
You make a table that is
0,0,0,0,1,1,1,1,2,2,2,2..... then to extract the '6' its upperSixLUT[value]
for 12.4 you have a tables
XXXX 0000 upper 4 bits -> lower 4 bits
00000 YYYY lower 4 bits -> upper 4 bits
Then to extract the '12'
lo = upperToLower4BitsLUT[value.lo] | lowerToUpper4BitsLUT[value.hi]
hi = upperToLower4BitsLUT[value.hi]
which in ASM becomes
Say 6.2 for example
You make a table that is
0,0,0,0,1,1,1,1,2,2,2,2..... then to extract the '6' its upperSixLUT[value]
for 12.4 you have a tables
XXXX 0000 upper 4 bits -> lower 4 bits
00000 YYYY lower 4 bits -> upper 4 bits
Then to extract the '12'
lo = upperToLower4BitsLUT[value.lo] | lowerToUpper4BitsLUT[value.hi]
hi = upperToLower4BitsLUT[value.hi]
which in ASM becomes
Code: Select all
ldx value.lo
lda upperToLower4BitsLUT,x
ldx value.hi
ora lowerToUpper4BitsLUT,x
sta lo
lda upperToLower4BitsLUT,x
sta hi