All times are UTC - 7 hours



PostPosted: Sun Aug 28, 2016 9:09 am 

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3152
Location: Nacogdoches, Texas
Sure. However, I actually already deleted the flipping part of the code. It wasn't very optimized, but at the same time, I think the speed increase would be marginal.

Here's the metasprite code though. I got rid of the single metasprite code because there's no need now. However, I might bring it back, because it'll be marginally faster in that I shouldn't need to add the x or y position from the metasprite, I'd just use the object's x and y positions, so that'll save me two sets of clc and adc! :lol: I guess the other thing would be that I don't need to check if we're at the end of the table again, but that's even less impressive in terms of saving cpu time. I don't care though, when you're going through a routine (at most) 128 times, any cycle saved helps.

Oh yeah, "big_metasprite" is the exact same thing except that the values for checking if it's out of bounds are different. I figured I'd waste a few cycles (and go against what I said earlier :lol: ) to combat overdraw. The comments are pretty much useless, but I think they're accurate. The comments on the metasprite are, I know.

Code:
.proc metasprite_handler
  rep #$30   ;A=16, X/Y=16
  lda #ObjectTable
  tcd
  ldy #$0000
  bra continue_metasprite_finder

metasprite_finder:
  tdc
  clc
  adc #ObjectSlotSize
  tcd
  cmp #ObjectTable+ObjectTableSize
  bne continue_metasprite_finder
  sty a:SpriteCount
  rts

continue_metasprite_finder:
  ldx ObjectSlot::MetaspriteOffset
  beq metasprite_finder
  lda a:$0000,x
  sta a:MetaspriteCount

metasprite_loop:
  lda a:$0006,x
  and #$0001
  sta a:SpriteBuf3+2,y      ;sprite size
  bne big_sprite
  lda a:$0002,x
  clc
  adc ObjectSlot::OnscreenXPosition
  cmp #256
  bcc sprite_x_not_out_of_bounds
  cmp #65528
  bcs sprite_x_not_out_of_bounds
  txa
  clc
  adc #$0008
  tax
  dec a:MetaspriteCount      ;decrement MetaspriteCount by 1
  beq metasprite_finder_branch   ;last sprite of this metasprite was culled: go to the next object
  brl metasprite_loop      ;back to the loop...

metasprite_finder_branch:
  bra metasprite_finder

sprite_x_not_out_of_bounds:
  and #$01FF
  sta a:SpriteBuf1,y      ;Store sprite X position in SpriteBuf1+y
  sta a:SpriteBuf3,y      ;Store the same X position in SpriteBuf3+y
  lda a:$0004,x         ;2nd word of the entry = sprite Y position
  clc
  adc ObjectSlot::OnscreenYPosition
  cmp #224
  bcc sprite_y_not_out_of_bounds
  cmp #65528
  bcs sprite_y_not_out_of_bounds
  txa
  clc
  adc #$0008
  tax
  dec a:MetaspriteCount      ;decrement MetaspriteCount by 1
  beq metasprite_finder_branch   ;last sprite of this metasprite was culled: go to the next object
  brl metasprite_loop      ;back to the loop...

sprite_y_not_out_of_bounds:
  sta a:SpriteBuf1+1,y
  lda a:$0006,x
  sta a:SpriteBuf3+2,y      ;sprite size
  lda a:$0008,x
  ora ObjectSlot::Attributes
  sta a:SpriteBuf1+2,y      ;extra/character
  iny
  iny
  iny
  iny
  cpy #$0200         ;sees if all 128 sprites are used up
  bne continue_sprite_y_not_out_of_bounds
  sty a:SpriteCount
  rts

continue_sprite_y_not_out_of_bounds:
  dec a:MetaspriteCount      ;decrement MetaspriteCount by 1
  beq metasprite_finder_branch
  txa
  clc
  adc #$0008
  tax
  brl metasprite_loop      ;back to the loop...
Code:
TestMetasprite:
  .word $0002   ; Number of metasprite table entries below
        ;XPos YPos NextTile/Size Extra/Character
  .word $0000,$0000,$0000,$0001
  .word $0000,$0008,$0101,$4000
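
For anyone following the bounds checks in the routine above, here is a rough Python model of just that test. It is only a sketch: the function name `on_screen` and the sample values are made up for illustration, while the constants 256, 224 and 65528 are taken from the posted code.

```python
def on_screen(offset, object_pos, limit):
    """Model of the 16-bit check: add, then accept 0..limit-1 or -8..-1."""
    pos = (offset + object_pos) & 0xFFFF   # 16-bit wraparound, like clc/adc with A=16
    # cmp #limit / bcc  -> accept when pos < limit
    # cmp #65528 / bcs  -> accept when pos >= $FFF8, i.e. -8..-1 (partly visible)
    return pos < limit or pos >= 65528

# X uses limit 256, Y uses limit 224
print(on_screen(0, 100, 256))      # True
print(on_screen(8, 0xFFF4, 256))   # 8 + (-12) = -4, still partly visible: True
print(on_screen(0, 300, 256))      # False
print(on_screen(8, 220, 224))      # 228 is past the bottom: False
```

The second case is the one the `cmp #65528` is there for: a sprite whose left edge is up to 8 pixels off the screen still shows part of itself.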

Yeah though, I actually got it to cover "only" half the screen now. Much better though. The rest of my stuff takes up about a fourth of that, and I know that can be optimized more than this can. Additionally, like I said, I'm not using FastROM either. I think I've learned though that anything that doesn't have to be done at runtime (like flipping metasprites), I'm not doing at runtime.

I know I keep going on and on, but I'm confused: if you're doing a "rep" or "sep" and the accumulator is 16 bit, will that add 2 extra cycles? I've been trying to be a bit smarter in terms of the size of the accumulator, and x and y. Unfortunately, I really can't do anything with x and y in the above routine. It's a pain in the ass that x and y can't be different sizes, because you could easily make two (really four now :roll:) different routines that deal with the different 256 byte halves of OAM. There's no feasible way to make x 8 bit here. (I suppose you could have a different routine for every 256 bytes... Yeah, put the metasprite data at the beginning of every bank... :lol:) It's also a pain in the ass that direct page can't escape bank 0, because it would be perfect for indexing metasprites as each slot is only a handful of bytes. With the object table, anything you use is just as effective. The reason I'm using direct page on the object table is because it's the fastest, and most of my object routines are probably going to deal with data outside of the first 8KB of RAM or data outside of bank $00.



Oh yeah, one final thing, an obvious optimization I saw with your hioam filling code is that you can use direct page instead of x or y to save a cycle for every "ora". X and y can then just be 8 bit, because you're only indexing 32 bytes instead of 512.


PostPosted: Sun Aug 28, 2016 10:19 am 

Joined: Wed May 19, 2010 6:12 pm
Posts: 2421
You can use a separate routine for big and small sprites, and forget about storing the size bits separately from the X coordinates.

Code:
big_sprite_x_not_out_of_bounds:
  and #$01FF
  sta a:SpriteBuf1,y      ;Store sprite X position SpriteBuf1+y
  ora #$0200               ;Set size bit
  sta a:SpriteBuf3,y      ;Store X position with the size bit set in SpriteBuf3+y
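
To see why folding the size bit into the X word works, here is a small Python sketch. It assumes (this is not shown in the thread) that the high byte of each SpriteBuf3 word is later packed into the 32-byte high OAM table, where each sprite gets two bits: bit 0 is X bit 8 and bit 1 is the size. The names `hioam_bits` and `pack_hioam` are hypothetical.

```python
def hioam_bits(x, big):
    """Return the 2-bit high-table entry: bit 0 = X bit 8, bit 1 = size."""
    word = x & 0x01FF          # and #$01FF
    if big:
        word |= 0x0200         # ora #$0200
    return word >> 8           # high byte of the stored word

def pack_hioam(entries):
    """Pack 2-bit entries, four sprites per byte, lowest sprite in the low bits."""
    table = bytearray((len(entries) + 3) // 4)
    for i, bits in enumerate(entries):
        table[i // 4] |= (bits & 3) << ((i % 4) * 2)
    return bytes(table)

entries = [
    hioam_bits(10, False),     # 0: on screen, small
    hioam_bits(0x1F8, True),   # 3: X = -8, big
    hioam_bits(0x1FC, False),  # 1: X = -4, small
    hioam_bits(5, True),       # 2: on screen, big
]
print(entries)                     # [0, 3, 1, 2]
print(pack_hioam(entries).hex())   # 9c
```

With the size already sitting in bit 9 of the word, the packing step only has to look at one value per sprite instead of two.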


PostPosted: Sun Aug 28, 2016 12:13 pm 

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3152
Location: Nacogdoches, Texas
I actually do have a separate routine for small and large sprites, but I didn't think about that. Good call! :)

By doing that and some other stuff I did, I got it down by about 8 scanlines I think. I can think of a couple more fallback things, but they increase the code size tenfold for a measly amount of cycles.

Do you have any clue how to increase the ROM speed to FastROM? I don't. I'm just saying, if Rendering Ranger R2 is SlowROM, (and probably even if it weren't) it must be hardcoded to hell. I really don't know how much CPU time games generally use toward creating metasprites. I know that no game on the SNES (maybe aside from Super R-Type on level 2 when it's freaking out) even uses that many sprites though.


PostPosted: Mon Aug 29, 2016 9:37 pm 

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 818
Using FastROM in the general case involves understanding memory mapping, because you have to be accessing ROM in banks $80-$FF for it to work. FastROM is activated by writing a 1 to $420D, but this only affects the upper half of the memory map.

The method I use - which is fairly standard - is to insert a long jump near the beginning of my code, to jump to $800000 plus the absolute address of the instruction following the jump. That is, label the next instruction, use an assembler expression to add $800000 (or $7F0000 in the glitchy old version of WLA DX I've been using) to the value of the label, and JML to that. This works because in both the HiROM and LoROM standard mappings, bank $00 is the same as bank $80, so you end up executing the same code as if you hadn't jumped. Then just write 1 to $420D, if you haven't already, and you're off to the races.

Oh, and change the data bank to somewhere in $80-$FF too (unless you're using WRAM, or a special feature of the cartridge that isn't the same between halves of the memory map). That way data accesses in ROM will be fast.

Code:
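jml $7F0000+high_speed   ; $800000 doesn't work for some reason, but this does - assembler bug? (ancient version of WLA DX)
high_speed:
   lda #$01    ; (8-bit accumulator assumed here)
   sta $420D   ; enable FastROM for banks $80-$FF
   phk         ; push the program bank, which is now $80...
   plb         ; ...and pull it into the data bank register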


PostPosted: Mon Aug 29, 2016 10:03 pm 

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 336
Location: FL
This should also work for WLA:

Code:
.base $80
    jml +
+   lda #$01
    ...


PostPosted: Mon Aug 29, 2016 11:32 pm 

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 818
Yeah, should. The version from 2003 that I got with Neviksti's SNES Starter Kit seems to have a buggy base directive... I seem to recall it being case-specific, causing me to have to experiment with the no$sns debugger to get it to work (or was that the method I showed above? In any case Neviksti's .inc template had a comment to the effect that base wasn't working properly)...

I'm sure the latest version is much better, but last time I tried to use it, it not only failed to work at all but somehow managed to make the old version not work either, so I'm not in a hurry to try again... my SNES development environment needs an overhaul if I'm going to do more than noodle around, but it will have to wait because I'm busy with my actual job...

Thanks for mentioning it, though. It's probably the way you're supposed to do it... but now that I think of it, IIRC Espozo's using ca65, and I haven't gotten very far with that yet; there's probably some voodoo that you need to do...


PostPosted: Tue Aug 30, 2016 5:35 pm 

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3152
Location: Nacogdoches, Texas
93143 wrote:
IIRC Espozo's using ca65

Yes.

I think I can actually solve the problem right here though, maybe...

Code:
# ca65 linker config for 256 KiB (2 Mbit) sfc file

# Physical areas of memory
MEMORY {
  ZEROPAGE:   start =  $000000, size =  $0100;   # $0000-00ff -- zero page
                                                 # $0100-01ff -- stack
  BSS:        start =  $000200, size =  $1e00;   # $0200-1fff -- RAM
  BSS7E:      start =  $7e2000, size =  $e000;   # SNES work RAM, $7e2000-7effff
  BSS7F:      start =  $7f0000, size = $10000;   # SNES work RAM, $7f0000-$7fffff
  ROM0:       start =  $008000, size =  $8000, fill = yes;
  ROM1:       start =  $018000, size =  $8000, fill = yes;
  ROM2:       start =  $028000, size =  $8000, fill = yes;
  ROM3:       start =  $038000, size =  $8000, fill = yes;
  ROM4:       start =  $048000, size =  $8000, fill = yes;
  ROM5:       start =  $058000, size =  $8000, fill = yes;
  ROM6:       start =  $068000, size =  $8000, fill = yes;
  ROM7:       start =  $078000, size =  $8000, fill = yes;
}

# Logical areas code/data can be put into.
SEGMENTS {
  CODE:       load = ROM0, align =  $100;
  RODATA:     load = ROM0, align =  $100;
  SNESHEADER: load = ROM0, start = $ffc0;
  CODE1:      load = ROM1, align =  $100, optional = yes;
  RODATA1:    load = ROM1, align =  $100, optional = yes;
  CODE2:      load = ROM2, align =  $100, optional = yes;
  RODATA2:    load = ROM2, align =  $100, optional = yes;
  CODE3:      load = ROM3, align =  $100, optional = yes;
  RODATA3:    load = ROM3, align =  $100, optional = yes;
  CODE4:      load = ROM4, align =  $100, optional = yes;
  RODATA4:    load = ROM4, align =  $100, optional = yes;
  CODE5:      load = ROM5, align =  $100, optional = yes;
  RODATA5:    load = ROM5, align =  $100, optional = yes;
  CODE6:      load = ROM6, align =  $100, optional = yes;
  RODATA6:    load = ROM6, align =  $100, optional = yes;
  CODE7:      load = ROM7, align =  $100, optional = yes;
  RODATA7:    load = ROM7, align =  $100, optional = yes;

  ZEROPAGE:   load = ZEROPAGE, type = zp;
  BSS:        load = BSS,   type = bss, align = $100, optional = yes;
  BSS7E:      load = BSS7E, type = bss, align = $100, optional = yes;
  BSS7F:      load = BSS7F, type = bss, align = $100, optional = yes;
}

I'm guessing all I need to do is change "ROM0-7" to bank $80. I'll try it soon. I swear though, if I can't get this down to 1/3 of the screen, I'm (seriously) hardcoding metasprites into each object's code, although I know I'll end up regretting it. The problem is that collision detection probably takes up way more time than creating metasprites, because even if each collision check would take about half the time metasprites take me, if we're doing something like 32x32, that's a whopping 1024 checks. I really hope I don't have to hardcode everything. :lol:

The one thing I really like about having each object have its metasprite hardcoded in is that for something like a bullet or explosion, you're really not doing much of anything to create the metasprite. I also thought about how if you have a multi sprite object, you could just fill out the different x and y positions of each sprite back to back. However, with heavily animated objects that change size often, this setup completely falls apart and pretty much becomes impossible. I honestly have no clue what I'll do.

Edit: Wait a minute, where does the SNES start to read code on power up? $000000? Actually, wait, it couldn't be because that's in RAM.


PostPosted: Tue Aug 30, 2016 9:51 pm 

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 818
It starts at the address defined by the RESET vector, which is given at $FFFC in bank $00. The RESET vector is 16-bit, so the SNES can't start anywhere other than bank $00.
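
As a quick way to check this on a ROM image, here is a minimal Python sketch. It assumes a LoROM image with no 512-byte copier header, so that $00:FFFC lands at file offset $7FFC; the function name `reset_vector` and the dummy ROM built in memory are only for illustration.

```python
import struct

def reset_vector(rom):
    """Read the 16-bit little-endian RESET vector of a headerless LoROM image."""
    # In LoROM, bank $00 maps $8000-$FFFF to the first 32 KiB of the file,
    # so $00:FFFC is file offset $7FFC.
    (addr,) = struct.unpack_from("<H", rom, 0x7FFC)
    return addr

# Dummy 32 KiB image whose RESET vector points at $8000
rom = bytearray(0x8000)
rom[0x7FFC:0x7FFE] = (0x8000).to_bytes(2, "little")
print(hex(reset_vector(rom)))   # 0x8000
```

Because the vector is only two bytes, the bank is implied to be $00, which is why the jump into bank $80 has to be done by the code itself.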


PostPosted: Tue Aug 30, 2016 11:23 pm 

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
Amusingly, I went over all this months ago, including on a live twitch stream. RIP.


PostPosted: Wed Aug 31, 2016 5:25 am 

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3152
Location: Nacogdoches, Texas
Amusingly, I forget things. :P

Anyway, I'll try this when I get home.


PostPosted: Wed Aug 31, 2016 10:32 am 

Joined: Wed May 19, 2010 6:12 pm
Posts: 2421
Quote:
if we're doing something like 32x32, that's a whopping 1024 checks


Do you mean 32 objects colliding with 32 objects, or 32x32 sprites colliding with 32x32 sprites?


PostPosted: Wed Aug 31, 2016 11:33 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19329
Location: NE Indiana, USA (NTSC)
Standard ways to make collision among 32 objects manageable:
  • Partition the objects into sets by type, where only certain type pairs will produce a collision. For example, player bullets won't collide with each other, nor will enemy bullets. And in many games, enemies don't collide with each other.
  • Sort all objects by their X or Y center point, and reject those outside a certain radius.
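
The two ideas above can be combined. Here is a minimal Python sketch, assuming objects are plain dicts with an "x" center and that only one type pair (here player bullets against enemies) needs testing; the names `candidate_pairs`, `bullets` and `enemies` are hypothetical, and a real SNES version would of course work on the object table directly.

```python
def candidate_pairs(group_a, group_b, radius):
    """Yield (a, b) pairs whose X centers are within `radius`, via sort and sweep."""
    a_sorted = sorted(group_a, key=lambda o: o["x"])
    b_sorted = sorted(group_b, key=lambda o: o["x"])
    start = 0
    for a in a_sorted:
        # Objects in b that are too far left of this a are also too far left
        # of every later a, so the window start never moves backwards.
        while start < len(b_sorted) and b_sorted[start]["x"] < a["x"] - radius:
            start += 1
        i = start
        while i < len(b_sorted) and b_sorted[i]["x"] <= a["x"] + radius:
            yield a, b_sorted[i]
            i += 1

bullets = [{"name": "b0", "x": 10}, {"name": "b1", "x": 100}]
enemies = [{"name": "e0", "x": 12}, {"name": "e1", "x": 50},
           {"name": "e2", "x": 104}, {"name": "e3", "x": 200}]

pairs = [(a["name"], b["name"]) for a, b in candidate_pairs(bullets, enemies, 8)]
print(pairs)   # [('b0', 'e0'), ('b1', 'e2')]
```

Only the surviving pairs would go on to the full bounding-box test, so here 2 checks replace the 8 that a brute-force loop over both groups would make.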


PostPosted: Wed Aug 31, 2016 9:04 pm 

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3152
Location: Nacogdoches, Texas
Wait a minute...

93143 wrote:
It starts at the address defined by the RESET vector, which is given at $FFFC in bank $00. The RESET vector is 16-bit, so the SNES can't start anywhere other than bank $00.

How do you have the cartridge not be very large while simultaneously having data in both bank $00 and bank $80? I had thought of this upon editing that code I posted.

psycopathicteen wrote:
Do you mean 32 objects colliding with 32 objects?

This.

Actually, let me break down collision. If the theoretical game is a two player shooter with a bunch of crap going on, you could probably break down total number of objects into 40 player bullets, 48 enemy bullets, 16 enemies, and the rest miscellaneous (explosions, shrapnel, actually the player ships). Because both enemies and bullets are harmful to the player, add them both up for 64, x 2 = 128, then add 40 x 16 for 768 total checks. Better, but still not at all good. If the SNES took that big of a hit from just creating metasprites, I'm screwed. However, I always did plan to hardcode collision detection, because it's easy to do so and collision can be more flexible than a standard routine could offer, like changing the velocity of an object after being hit from a certain direction.
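
The arithmetic above, written out as a short Python sketch so the numbers are easy to tweak. The variable names are just for illustration; the counts are the ones from the breakdown.

```python
player_bullets = 40
enemy_bullets = 48
enemies = 16
players = 2

naive = 32 * 32                                   # every object against every object
harmful_to_player = enemies + enemy_bullets       # 64 things that can hurt a player
player_checks = harmful_to_player * players       # 64 x 2 = 128
bullet_checks = player_bullets * enemies          # 40 x 16 = 640
total = player_checks + bullet_checks

print(naive, player_checks, bullet_checks, total)   # 1024 128 640 768
```

Most of the 768 comes from the player-bullet against enemy term, so that is the pairing where any early rejection pays off the most.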


PostPosted: Wed Aug 31, 2016 9:15 pm 

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 336
Location: FL
Espozo wrote:
How do you have the cartridge not be very large while simultaneously having data in both bank $00 and bank $80? I had thought of this upon editing that code I posted.


Banks $00 and $80 are physically identical on (nearly) all cartridges. That is, you can put your vectors at $80FFE4+ and they will be mirrored to $00FFE4. The only difference is that bank $80 is accessed faster when FastROM is enabled.
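
A rough Python model of the mirroring, assuming the standard LoROM layout (32 KiB of ROM per bank at $8000-$FFFF) that the linker config earlier in the thread suggests. The function name `lorom_offset` is hypothetical, and special cartridges may not follow this mapping.

```python
def lorom_offset(address):
    """Map a 24-bit SNES address to a file offset in a headerless LoROM image."""
    bank = address >> 16
    offset = address & 0xFFFF
    if bank in (0x7E, 0x7F):
        raise ValueError("banks $7E/$7F are work RAM, not ROM")
    if offset < 0x8000:
        raise ValueError("not a ROM address in the standard LoROM layout")
    # Bit 7 of the bank is ignored, so bank $80 lands on the same data as bank $00.
    return ((bank & 0x7F) << 15) | (offset & 0x7FFF)

print(hex(lorom_offset(0x00FFFC)))   # 0x7ffc
print(hex(lorom_offset(0x80FFFC)))   # 0x7ffc
print(hex(lorom_offset(0x808000)))   # 0x0
print(hex(lorom_offset(0x018000)))   # 0x8000
```

The first two lines are the point: the RESET vector the CPU reads at $00:FFFC is the very same bytes you see at $80:FFFC, so nothing has to be stored twice.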


PostPosted: Thu Sep 01, 2016 1:56 pm 

Joined: Wed May 19, 2010 6:12 pm
Posts: 2421
Quote:
Actually, let me break down collision. If the theoretical game is a two player shooter with a bunch of crap going on, you could probably break down total number of objects into 40 player bullets, 48 enemy bullets, 16 enemies, and the rest miscellaneous (explosions, shrapnel, actually the player ships). Because both enemies and bullets are harmful to the player, add them both up for 64, x 2 = 128, then add 40 x 16 for 768 total checks. Better, but still not at all good. If the SNES took that big of a hit from just creating metasprites, I'm screwed. However, I always did plan to hardcode collision detection, because it's easy to do so and collision can be more flexible than a standard routine could offer, like changing the velocity of an object after being hit from a certain direction.


Are you talking about shmups or run'n'guns? I think run'n'guns typically have fewer bullets per player, while shmups typically have more bullets per player but don't usually support multiplayer. I think 20 bullets for a one player shmup is what most shmups do.

