Why did Super Mario RPG and Kirby Super Star use an SA-1?

Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by Oziphantom »

psycopathicteen wrote:
Nicole wrote:If you're smart about optimizing the stuff that really has to be, it can certainly save development time.
That's what I've always figured, it just bugs me when people think you can't write code that is both optimized and maintainable under time constraints. I've seen programmers who thought this:

Code: Select all

sep #$20
ror $01
ror $00
ror $01
ror $00
rep #$20
lda $00
was more maintainable than this:

Code: Select all

rep #$20
lda $00
ror
ror
sta $00
simply because optimizations are "risky".
I feel your pain. That is not an optimisation, that is just how you do it.
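
As a rough check on the two snippets quoted above, here is a minimal Python sketch that models both rotate sequences and compares their results for every 16-bit input. The helper names are made up for illustration, and the cycle lists are a count from the standard 65816 instruction timings, assuming the low byte of the direct page register is zero; they are not measurements from real hardware.

```python
def ror8(value, carry):
    """8-bit rotate right through carry; returns (result, carry_out)."""
    return ((carry << 7) | (value >> 1)) & 0xFF, value & 1


def ror16(value, carry):
    """16-bit rotate right through carry; returns (result, carry_out)."""
    return ((carry << 15) | (value >> 1)) & 0xFFFF, value & 1


def rotate_bytewise(word, carry):
    """Models 'ror $01 / ror $00' done twice with an 8-bit accumulator."""
    hi, lo = word >> 8, word & 0xFF
    for _ in range(2):
        hi, carry = ror8(hi, carry)   # high byte first; its bit 0 falls into carry
        lo, carry = ror8(lo, carry)   # carry enters bit 7 of the low byte
    return (hi << 8) | lo, carry


def rotate_wordwise(word, carry):
    """Models 'lda $00 / ror / ror / sta $00' with a 16-bit accumulator."""
    for _ in range(2):
        word, carry = ror16(word, carry)
    return word, carry


# CPU cycles per instruction (assumed: direct page register low byte is 0).
BYTEWISE_CYCLES = [3, 5, 5, 5, 5, 3, 4]   # sep, ror dp x4, rep, lda dp (16-bit)
WORDWISE_CYCLES = [3, 4, 2, 2, 4]         # rep, lda dp, ror a x2, sta dp

same = all(
    rotate_bytewise(w, c) == rotate_wordwise(w, c)
    for w in range(0x10000)
    for c in (0, 1)
)
print("identical results:", same)
print("cycles:", sum(BYTEWISE_CYCLES), "vs", sum(WORDWISE_CYCLES))
```

Under those assumptions the two sequences leave the same value and carry, and the 16-bit version takes about half the cycles.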
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by psycopathicteen »

Another thing that annoys me is when someone dismisses an optimization because "it will only cost 2% of a frame" but they do the same thing 50 times a frame, resulting in slowdown.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by rainwarrior »

psycopathicteen wrote:That's what I've always figured, it just bugs me when people think you can't write code that is both optimized and maintainable under time constraints. I've seen programmers who thought this:
{contrived example}
was more maintainable than this:
{contrived example}
simply because optimizations are "risky".
This is a big fat straw-man. I can believe you found the former example in your disassembly of some shipped code. I don't believe you have any insight as to why the programmer wrote it that way.

There are certainly reasons to write slower code in service to maintainability, but this example isn't it. You need context to make such a justification. Five lines of assembly code is not a context; a hundred thousand line program that needs to ship by Tuesday might be.

You can't just act like someone deliberately considered those two alternative pieces of code and chose the former. There are a million ways code gets edited, mangled, etc. during production where everything is constantly changing. I think it's preposterous that you propose this was the result of an argument for maintainability.
psycopathicteen wrote:Another thing that annoys me is when someone dismisses an optimization because "it will only cost 2% of a frame" but they do the same thing 50 times a frame, resulting in slowdown.
Except the example you gave is 0.02% of a frame, not 2%, and you'd have to do the same thing 5000 times, not 50, and that's a real difference.
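
The order of magnitude here can be sanity-checked with a small Python sketch. It assumes an NTSC frame of roughly 262 scanlines of 1364 master clocks each, CPU cycles of 6 or 8 master clocks, and that the two quoted snippets cost about 30 and 15 CPU cycles respectively (a count from the standard 65816 timings, not a measurement). DRAM refresh and DMA time are ignored, so treat the output as a ballpark only.

```python
MASTER_CLOCKS_PER_SCANLINE = 1364   # approximate, NTSC
SCANLINES_PER_FRAME = 262           # NTSC, non-interlaced
FRAME_MASTER_CLOCKS = MASTER_CLOCKS_PER_SCANLINE * SCANLINES_PER_FRAME


def calls_to_fill_frame(saved_cpu_cycles, master_clocks_per_cycle):
    """How many times the saving must recur to add up to one whole frame."""
    return FRAME_MASTER_CLOCKS // (saved_cpu_cycles * master_clocks_per_cycle)


SAVED_CYCLES = 30 - 15   # assumed cycle counts of the two quoted snippets

for speed in (6, 8):     # master clocks per CPU cycle: fast vs. slow access
    share = SAVED_CYCLES * speed / FRAME_MASTER_CLOCKS
    print(f"{speed} clocks/cycle: {share:.4%} of a frame, "
          f"{calls_to_fill_frame(SAVED_CYCLES, speed)} repeats to fill a frame")
```

Either way the saving is a few hundredths of a percent of a frame, and it takes thousands of repetitions, not dozens, to add up to a full frame.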

You're not representing the argument fairly here. I don't dismiss something because "it will only cost 2% of a frame", I was suggesting that optimization should be approached by profiling and working from the top down (example prior discussion), and that finding and fixing a thousand tiny pin pricks might not make the change you've hoped for.

What I really object to is you calling programmers or other labourers stupid or lazy or bad at their job for having written some inefficient code in one place or another. They succeeded at making a game that you liked so much that you're disassembling it 25 years later with tools that are like a microscope compared to their magnifying glass.

There is such a thing as doing a bad job, but making examples of extreme minutiae out of context isn't a good argument for it.
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by psycopathicteen »

It's not like I'm saying "why don't they unroll every loop in the game?". If I was trying to fix a thousand nitpicks I would never have gotten this far with my homebrew.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by tokumaru »

But you're making assumptions about how the code came to be the way it is. You can't say that the cause for a specific instance of weird/slow code is ignorance, laziness or whatever, because you weren't there. We're seeing this code from a completely different point of view than the people who wrote it.

I, for example, am not proud of every single line of code I've written when coding professionally, because getting things done on time was more important than writing the best possible code for every little aspect of a program. The exact same thing happens with commercial games: there's always someone on your back expecting results, so there's hardly any time to look back on stuff that's already working to make improvements.

Now, you appear to be dead set on creating the most optimal SNES game ever, and from the forum topics I remember reading, you've rewritten your sprite system a few hundred times already. Has that gotten you any closer to actually finishing a game? Things are different when coding is a hobby. One could even argue that your hobby is actually optimizing code, not making games. The same thing happens to me, I've rewritten the same systems so many times in search of the best possible solutions it isn't even funny, and while that has been a good exercise in its own right, because it's a hobby of mine, it hasn't gotten me any closer to shipping a finished product. This is a completely different context from that in which the games you're debugging were created, and it's unfair of you to do some of the comparisons you make.

By all means, debug the hell out of them and point out all the weirdness and slowness you can find, that actually helps other coders make better choices in their own projects, but try not to make assumptions about why other programmers did what they did, it just makes you sound pretentious.
d4s
Posts: 96
Joined: Mon Jul 14, 2008 4:02 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by d4s »

Unrelated, but I rejoice a little every time I see posts by rainwarrior.
Not only because of his avatar picture, but also because they contain
so much truth, sharp analysis and wisdom without ever being
insulting, arrogant or condescending.
My hat's off to you!
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by psycopathicteen »

My code example didn't really get my point across, but if I waste 60 cycles doing something, and that routine gets called 100 times, then it takes up 10% of the CPU time, and the 60 cycles wasted in the routine might include the above code.
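
The 10% figure can be checked with a short Python sketch. It assumes an NTSC frame of about 1364 x 262 master clocks and treats every CPU cycle as either 6 master clocks (3.58 MHz) or 8 master clocks (2.68 MHz); real code mixes both speeds and loses some time to DRAM refresh, so this is only an estimate.

```python
FRAME_MASTER_CLOCKS = 1364 * 262   # approximate NTSC frame length


def frame_percent(wasted_cycles, calls, master_clocks_per_cycle):
    """Percentage of one frame spent on wasted_cycles repeated `calls` times."""
    total_clocks = wasted_cycles * calls * master_clocks_per_cycle
    return 100.0 * total_clocks / FRAME_MASTER_CLOCKS


fast = frame_percent(60, 100, 6)   # every cycle at 3.58 MHz
slow = frame_percent(60, 100, 8)   # every cycle at 2.68 MHz
print(f"60 cycles x 100 calls: {fast:.1f}% (fast) to {slow:.1f}% (slow) of a frame")
```

With these assumptions, 6000 wasted cycles land at roughly 10% of a frame at the fast clock and somewhat more at the slow one.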

tokumaru wrote:you've rewritten your sprite system a few hundred times already. Has that gotten you any closer to actually finishing a game?
Well, actually it did, because I have fewer limitations to work with than I did back in 2010. I no longer have to manually squeeze sprites into VRAM.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by Drew Sebastino »

tokumaru wrote:you've rewritten your sprite system a few hundred times already
Wow. That's even more than I have! :lol: (I'm actually in the process of rebuilding it again; I had broken routines into a million specialized subroutines that were faster, but I just couldn't keep up with it all and then I started to have to chain beq/bne to bra a bunch, which often mitigated whatever speed increase there was. I've also redone my metasprite routine to accept metasprite data outside of bank 0. I had direct page acting as an index register, as at that point I didn't know it was a cycle slower if it wasn't set to multiples of 256.)
psycopathicteen wrote:I no longer have to manually squeeze sprites into VRAM.
It's because of me, right? :wink:
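
The direct page penalty mentioned above can be put into numbers with a small Python sketch. The function names and the example D values are made up for illustration; the timing rule (one extra cycle per direct page access when the low byte of D is non-zero) is the standard 65816 behaviour.

```python
def lda_dp_cycles(d_register, accumulator_16bit):
    """Cycles for 'lda dp' under the standard 65816 timing rules."""
    cycles = 3
    if accumulator_16bit:
        cycles += 1          # extra cycle to read the second byte
    if d_register & 0xFF:
        cycles += 1          # penalty when D is not a multiple of 256
    return cycles


def extra_cycles(d_register, accesses):
    """Total penalty for `accesses` direct page accesses with this D value."""
    return accesses if d_register & 0xFF else 0


print(lda_dp_cycles(0x4300, True))   # page-aligned D
print(lda_dp_cycles(0x0A20, True))   # hypothetical object base used as an index
print(extra_cycles(0x0A20, 500))     # penalty over 500 accesses
```

Using D as a free-running index still works, but every direct page access pays one cycle unless the objects are laid out on 256-byte boundaries.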
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by psycopathicteen »

Espozo wrote: I'm actually in the process of rebuilding it again; I had broken routines into a million specialized subroutines that were faster, but I just couldn't keep up with it all and then I started to have to chain beq/bne to bra a bunch, which often mitigated whatever speed increase there was. I've also redone my metasprite routine to accept metasprite data outside of bank 0. I had direct page acting as an index register, as at that point I didn't know it was a cycle slower if it wasn't set to multiples of 256.
So is this what's going on:

Code: Select all

beq +
jmp over_routine
+;
(routine)
over_routine:
(another_routine)
rts
Then you're going to do this?

Code: Select all

bne +
jsr routine
+;
(another_routine)
rts

(routine)
rts
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by Drew Sebastino »

Well, what I was referring to specifically was the problem I was having with the VRAM engine. In order to create the linked list, it would need to know what slot was used previously. To do that, it would look at the entry in the tile request table right before it. I had it find all the 32x32s for a metasprite, then all the 16x16s, as you wouldn't need to do anything to the number you were indexing by. The thing is, I had a 32x32 tile request table and a 16x16 tile request table, which means you need a different subroutine for every situation; I had one for starting 32x32, one for starting 16x16 (if the metasprite had no 32x32 sprites), one for 32x32 to 16x16, one for continuing 32x32, and one for continuing 16x16. That's 5 different "groups" of code. The alternative is to also store the results for the previous slot as regular variables in RAM as well as in the appropriate table. I started feeling like an idiot when I thought about how the speed of the code really wouldn't change much, as one absolute,X indexed load is 4 cycles, while a direct page store and load is 6 cycles combined. However, with all the extra branches I had to do, the speed probably evened out, while the code was probably over twice as large. I really wasn't thinking... :lol:
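
The break-even described above is easy to tabulate. This Python sketch takes the 4-cycle and 6-cycle figures straight from the post and assumes 3 cycles for each extra taken branch (bra, or a taken beq/bne, in native mode); the function name is made up for illustration.

```python
TABLE_LOAD = 4        # 'lda abs,x', figure from the post
VARIABLE_COPY = 6     # 'sta dp' plus 'lda dp', figure from the post
BRANCH_TAKEN = 3      # assumed cost of one extra taken branch


def net_cycles_saved(extra_taken_branches):
    """Cycles gained by the table-only version after paying for extra branches."""
    return (VARIABLE_COPY - TABLE_LOAD) - extra_taken_branches * BRANCH_TAKEN


for branches in range(3):
    print(branches, "extra branches ->", net_cycles_saved(branches), "cycles saved")
```

A single extra taken branch already turns the 2-cycle gain into a loss, which fits the impression that the specialised subroutines evened out in speed.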
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by psycopathicteen »

How I did it was something like this:

Code: Select all

jsr find_vram_slot
store initial slot number

loop:
store x/y/attributes on the linked list
branch to end if done

store slot number
jsr find_vram_slot
branch to loop

end:
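
For readers who want to step through that loop, here is a minimal Python model of it. Everything concrete in it is an assumption for illustration: the free-slot list, the node layout, and the way `find_vram_slot` picks a slot are hypothetical, and the real routine also prepares DMA, which is left out here.

```python
def find_vram_slot(free_slots):
    """Hypothetical allocator: take the next entry from a list of free slots."""
    return free_slots.pop(0)


def build_sprite_list(sprites, free_slots):
    """Link one VRAM slot per sprite; returns (first_slot, nodes)."""
    if not sprites:
        raise ValueError("a metasprite needs at least one sprite")
    nodes = {}
    slot = find_vram_slot(free_slots)
    first_slot = slot                          # store initial slot number
    for index, (x, y, attributes) in enumerate(sprites):
        # store x/y/attributes on the linked list
        nodes[slot] = {"x": x, "y": y, "attr": attributes, "next": None}
        if index == len(sprites) - 1:          # branch to end if done
            break
        previous = slot                        # store slot number
        slot = find_vram_slot(free_slots)
        nodes[previous]["next"] = slot         # assumed: link previous node forward
    return first_slot, nodes


first, nodes = build_sprite_list([(0, 0, 1), (16, 0, 2), (0, 16, 3)], [5, 2, 9, 7])
slot = first
while slot is not None:                        # walk the list from its head
    print(slot, nodes[slot])
    slot = nodes[slot]["next"]
```

Because the previous slot is kept in a plain variable, the same loop body serves every sprite size and no separate "starting" and "continuing" routines are needed.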
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by Drew Sebastino »

Does something as short as "find_vram_slot" really need to be its own routine?
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by psycopathicteen »

My routine also sets up DMA.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by Drew Sebastino »

What do you mean by "sets up DMA"?

This is my sprite tile uploading routine. I don't know how you could do any less of this in VBLANK and still be able to do the DMA transfer; all this is really doing is writing to the different DMA registers.

Code: Select all

.proc tile_uploader
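  sep #$10			;8-bit index registers (X/Y)
  rep #$20			;16-bit accumulator
  lda #$4300
  tcd				;Direct page now points at the DMA registers ($43xx)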
  lda #$1801			;Set DMA mode (word, normal increment) and destination register (VRAM write register)
  sta $00
  sta $10
  sta $20
  sta $30
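  ldy #$80
  sty a:$2115			;VRAM address increments by one word after each high-byte write
  lda a:TileRequestCounter16x16
  beq tile_uploader_32x32	;No 16x16 requests, go straight to the 32x32 ones
  ldx #$00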

tile_uploader_16x16_loop:
  lda #$0040			;64 bytes per transfer: one row of two 4bpp tiles
  sta $05
  sta $15

;16x16 Top Half
  lda a:TileRequest16x16LoWordTable,x
  sta $02
  clc
  adc #$0040
  sta $12

  lda a:TileRequest16x16BankByteTable,x
  tay
  sty $04
  sty $14

  lda a:TileRequest16x16VramAddressTable,x
  sta a:$2116

  ldy #$01		;Initiate DMA transfer (channel 0)
  sty a:$420B

;16x16 Bottom Half  
  clc
  adc #$0100
  sta a:$2116

  ldy #$02		;Initiate DMA transfer (channel 1)
  sty a:$420B

  inx
  inx
  beq tile_uploader_done	;8-bit X wrapped around to 0: table exhausted, exit
  cpx a:TileRequestCounter16x16
  bne tile_uploader_16x16_loop



tile_uploader_32x32:
  lda a:TileRequestCounter32x32
  beq tile_uploader_done
  ldx #$00

tile_uploader_32x32_loop:
  lda #$0080			;128 bytes per transfer: one row of four 4bpp tiles
  sta $05
  sta $15
  sta $25
  sta $35

;32x32 Top Part
  lda a:TileRequest32x32LoWordTable,x
  sta $02
  clc
  adc #$0080
  sta $12
  adc #$0080
  sta $22
  adc #$0080
  sta $32

  lda a:TileRequest32x32BankByteTable,x
  tay
  sty $04
  sty $14
  sty $24
  sty $34

  lda a:TileRequest32x32VramAddressTable,x
  sta a:$2116

  ldy #$01		;Initiate DMA transfer (channel 0)
  sty a:$420B

;32x32 Upper Middle Part 
  clc
  adc #$0100
  sta a:$2116

  ldy #$02		;Initiate DMA transfer (channel 1)
  sty a:$420B

;32x32 Lower Middle Part
  adc #$0100
  sta a:$2116

  ldy #$04		;Initiate DMA transfer (channel 2)
  sty a:$420B

;32x32 Bottom Part
  adc #$0100
  sta a:$2116

  ldy #$08		;Initiate DMA transfer (channel 3)
  sty a:$420B

  inx
  inx
  cpx #$40
  beq tile_uploader_done
  cpx a:TileRequestCounter32x32
  bne tile_uploader_32x32_loop

tile_uploader_done:
  lda #$0000
  tcd				;Restore direct page to $0000
  stz TileRequestCounter16x16	;Clear both request counters
  stz TileRequestCounter32x32
  rts
.endproc
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Why did Super Mario RPG and Kirby Super Star use an SA-1

Post by psycopathicteen »

By setting up DMA, I mean building the DMA table.
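
A minimal Python sketch of what building such a table could look like follows. The table format is not shown in the thread, so the tuple layout and function names here are hypothetical; the row sizes ($40 and $80 bytes) and the $100 VRAM step between rows are taken from the tile uploader routine posted above.

```python
TILE_BYTES = 32         # one 8x8 sprite tile at 4bpp
VRAM_ROW_STEP = 0x100   # VRAM word-address step between tile rows, as in the routine above


def queue_sprite(dma_table, source, bank, vram_addr, size_px):
    """Append one (source, bank, vram_addr, byte_count) entry per tile row."""
    tiles_per_side = size_px // 8
    row_bytes = tiles_per_side * TILE_BYTES      # $40 for 16x16, $80 for 32x32
    for row in range(tiles_per_side):
        dma_table.append((
            (source + row * row_bytes) & 0xFFFF,            # stays inside the bank
            bank,
            (vram_addr + row * VRAM_ROW_STEP) & 0xFFFF,
            row_bytes,
        ))


def queued_bytes(dma_table):
    """Total bytes the VBlank routine would have to transfer."""
    return sum(entry[3] for entry in dma_table)


table = []
queue_sprite(table, 0x8000, 0xC1, 0x6000, 16)   # example addresses, not from the thread
queue_sprite(table, 0x9000, 0xC1, 0x6020, 32)
for entry in table:
    print([hex(value) for value in entry])
print("bytes queued:", queued_bytes(table))
```

The point of such a design is that the address arithmetic runs during active display, so the VBlank code only has to copy ready-made entries into the DMA registers.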

You know, I must say, the reason I dislike licensed developers is not because they write inefficient code, but because a lot of them acted like know-it-alls in interviews. Treasure's lead programmers didn't say they couldn't get the SNES to run Gunstar Heroes because of time constraints; they said "the SNES's CPU can't handle the action, period", which I know is complete bullshit. Plus there's the fact that Konami jumped on the bandwagon and made "Contra Hard Corps" on the Genesis, which they designed to be as CPU efficient as possible (even under time constraints) from the get-go, unlike their SNES counterparts, where they just threw together code from NES games, or converted ASM code line by line from 68000-based arcade games, and just did a couple of nitpick optimizations at the end.