It is currently Wed Dec 12, 2018 8:46 pm

All times are UTC - 7 hours





Post new topic Reply to topic  [ 10 posts ] 
Author Message
PostPosted: Thu Aug 16, 2018 11:48 am 
Offline
User avatar

Joined: Fri Nov 24, 2017 2:40 pm
Posts: 95
I have some experience writing physics engines (Chipmunk2D), and want to try and see how far I can go with that on the NES. So far I've gotten circle-circle and circle-line collisions working with reasonable performance, and am working on reducing it further with some small lookup tables. My end goal is a game with a two wheeled vehicle. So far I've been using 8.8 fixed point because it's easy to prototype with from C, but I'm hoping to get at least 2 screen scrolling. So I was thinking about my options for using fixed point.

If I wanted to stick with 16 bit types I could do something like 9.7 or 12.4 depending on how much accuracy feels right in the fractional bits for acceleration and such. All the extra shifting gets messy real fast in C, and I can only imagine moreso in assembly. Having a lot of multi-byte shifts seems like it might really bloat and slow the code too.

16.8 is sort of appealing for the final assembly version. If I wanted to access the values from C I'd need to split the integer and fractional parts into a struct. I've already kind of ruled out padding it out to 32 bits. Even with a lot of casting and type annotations for 16/8 bit ops, I wasn't able to get cc65 to generate reasonable code using 32 bit ints.

What have other people had good success with? Is 16.8 the way to go, or is something like 9.7 less troublesome than I think it is?


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 1:41 pm 
Offline
User avatar

Joined: Fri May 08, 2015 7:17 pm
Posts: 2354
Location: DIGDUG
Quote:
16.8 is sort of appealing for the final assembly version. If I wanted to access the values from C I'd need to split the integer and fractional parts into a struct.


I think this is a good approach.

1 byte for which screen. Then 8.8 for within that screen.


na_th_an does some kind of bit shifting for different parts of his X,Y code, to keep the storage space 16 bits per direction.

I thought it was 12.4 (allowing for 16 screen wide), but it might be 10.6 for the flick screen games (allowing for 4 screen wide movements). (not sure)

Edited.
I think that corresponds to what you said "12.4"

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 2:21 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20867
Location: NE Indiana, USA (NTSC)
Both Haunted games use 16.8 displacements and 8.8 velocities. One drawback with 12.4 is that you may need to keep 16-bit displacements longer than you would with 16.8. In 16.8, you can ignore subpixel positions, subtract 16-bit pixel positions, reject displacements more than 128 pixels apart early, and save 8-bit pixel displacements for later processing. In 12.4, by contrast, a 20-pixel displacement between two objects is 320 units, which is solidly a 16-bit number.


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 2:45 pm 
Offline
User avatar

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
tepples wrote:
Both Haunted games use 16.8 displacements and 8.8 velocities.

This is my preferred configuration too. Much easier to manage, no need for bit shifting when doing physics. You'll probably still need bit shifting for collisions against the background, though.

How many bytes are you really saving when using more tightly packed schemes? Most people don't even have 16 slots for active objects, the savings are minimal. There's hardly any advantage in processing time too, since the time spent shifting bits is probably larger than that of directly manipulating a whole extra byte.


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 3:44 pm 
Offline
User avatar

Joined: Fri Nov 24, 2017 2:40 pm
Posts: 95
Sounds like 16.8 is the way to go then. I found some random stuff on the internet that talked about using 12.4 fixed point, but it seems like a bad idea in this case. I was just wondering if there was something I was missing.

tokumaru wrote:
tepples wrote:
How many bytes are you really saving when using more tightly packed schemes? Most people don't even have 16 slots for active objects, the savings are minimal. There's hardly any advantage in processing time too, since the time spent shifting bits is probably larger than that of directly manipulating a whole extra byte.


More or less what I was thinking. It would waste a few dozen bytes of RAM at most, and I'm not expecting to make a very big game.


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 3:58 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7014
Location: Canada
It's difficult to do 24 bit numbers in C, especially with how cumbersome 32 bit numbers are in CC65. All I could suggest for this is to be sparing with the promotion to 32 bits, and do some of the relevant work in assembly instead.

At the same time, it's difficult to do efficient arrays of anything bigger than byte-size in C as well. If you have an array of objects, there's a big advantage in assembly to storing each byte of this 24-bit number in a separate array. ("Structure of arrays" idiom.) Unfortunately it's cumbersome to use this kind of data in C, but like above I guess the advice is to try and minimize it, and be prepared to write some assembly. I don't think there's really any good way to do this in CC65 C, but you can plan to do some assembly optimization as needed.

Code:
ab = a[i] | (b[i] << 8);
a[i] = (ab >> 0) & 0xFF;
b[i] = (ab >> 8) & 0xFF;
// a/b are both static unsigned char arrays
// ab is a static short int
// i is an unsigned char

The above code is relatively efficient on CC65, because it does at least tend to handle byte-aligned operations like << 8 fairly well, and if you can use static variables where possible a lot of its internal stack use can be minimized. I think overall with careful use of 8 bit stripes, and a few macros to help you pack and unpack them, you'll end up with faster and smaller code than using 16 bit arrays. (Even with 24 bits its probably still a win over using arrays of 32-bit numbers... though the necessary casting up to 32-bit is going to hurt any way you do it except assembly.)

Note above that i being an unsigned char + both a and b being char arrays means that they can be fetched and stored with LDA abs, X/Y / STA abs, X/Y, which you don't get if either of those conditions isn't true. (For arrays of short int, it always has to promote the index to 16 bits.)

9.7 is OK if you need compactness and are only 2 screens wide, but the extra shift is very cumbersome for the code (in both assembly and C). Splitting in other sizes (e.g. 12.4) is not very good on the 6502. That works much better on processors that can shift more than 1 bit at a time.


Last edited by rainwarrior on Thu Aug 16, 2018 4:16 pm, edited 1 time in total.

Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 4:02 pm 
Offline
User avatar

Joined: Thu Mar 31, 2016 11:15 am
Posts: 441
I mixed and matched in my platformer and used 16.8 for X and 12.4 for Y.


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 4:13 pm 
Offline
User avatar

Joined: Fri May 08, 2015 7:17 pm
Posts: 2354
Location: DIGDUG
Quote:
I don't think there's really any good way to do this in CC65 C, but you can plan to do some assembly optimization as needed.


I've been toying with macros of casting and dereferencing. I'm away from my computer, but it's something like...

#define low_byte(a) *((unsigned char*)a)
#define high_byte(a) *((unsigned char*)a+1)

if "a" and "array1" are a known values, (and global) at compile time, then...

low_byte(x_pos) = array1[object_number];

compiles to...
ldy object_number
lda array1, y
sta x_pos

likewise for the high byte from another array.
(these are 8 bit char arrays).

high_byte(x_pos) = array2[object_number];

edit... x_pos in this example would be an int (16 bit)

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 8:08 pm 
Offline
User avatar

Joined: Fri Nov 24, 2017 2:40 pm
Posts: 95
All good tricks. I think in my case, only the physics will need the fractional bits. Rendering and hit detection don't really need them.

On a separate note, you can get cc65 to do nice loads for interleaved arrays too like this:

Code:
// Bad, assembly is exactly as written, no constant folding/propagation
for(i = 0; i < 2*count; i += 2){
  a = GLOBAL_ARRAY[i + 0];
  b = GLOBAL_ARRAY[i + 2];
}

// Better -> tax, lda, lda
for(i = 0; i < 2*count; i += 2){
  a = (GLOBAL_ARRAY + 0)[i];
  b = (GLOBAL_ARRAY + 1)[i];
}


Top
 Profile  
 
PostPosted: Thu Aug 16, 2018 10:44 pm 
Offline

Joined: Tue Feb 07, 2017 2:03 am
Posts: 629
For completeness. When dealing with odd fp. You can use tables to speed up and extract values
Say 6.2 for example
You make a table that is
0,0,0,0,1,1,1,1,2,2,2,2..... then to extract the '6' its upperSixLUT[value]

for 12.4 you have a tables
XXXX 0000 upper 4 bits -> lower 4 bits
00000 YYYY lower 4 bits -> upper 4 bits
Then to extract the '12'
lo = upperToLower4BitsLUT[value.lo] | lowerToUpper4BitsLUT[value.hi]
hi = upperToLower4BitsLUT[value.hi]

which in ASM becomes
Code:
ldx value.lo
lda upperToLower4BitsLUT,x
ldx value.hi
ora lowerToUpper4BitsLUT,x
sta lo
lda upperToLower4BitsLUT,x
sta hi


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 10 posts ] 

All times are UTC - 7 hours


Who is online

Users browsing this forum: Memblers and 4 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group