nesdev.com https://forums.nesdev.com/
Why no SNES homebrew scene? https://forums.nesdev.com/viewtopic.php?f=12&t=10957
Page 9 of 30

Author: Drew Sebastino [ Sun Jan 03, 2016 12:41 am ] Post subject: Re: Why no SNES homebrew scene?

Stef wrote:
    The SNES only has 8x8 multiplication (the other unit being used by the mode 7, that is not safe for a regular alternative)

Wait, are you saying that you can do 16x8 or 16x16 multiplication with mode 7's multiplication and division registers? Is there any realistic way to "chain" two 8 bit multiplications together like you can do with addition or subtraction? Probably not...

Stef wrote:
    (you have to count at least 15/16 cycles for load / read result time)

That's why I was thinking it wouldn't be that fast. 70 to 140 cycles sounds astronomically high to me, but on the SNES, it's more like half which is still ridiculously large, but it's at least somewhat reasonable.

Author: thefox [ Sun Jan 03, 2016 1:30 am ] Post subject: Re: Why no SNES homebrew scene?

Espozo wrote:
    Stef wrote:
        The SNES only has 8x8 multiplication (the other unit being used by the mode 7, that is not safe for a regular alternative)

    Wait, are you saying that you can do 16x8 or 16x16 multiplication with mode 7's multiplication and division registers? Is there any realistic way to "chain" two 8 bit multiplications together like you can do with addition or subtraction? Probably not...

Not sure if this was what you were asking, but off the top of my head: (a and b are 16-bit numbers)

Code:
a*b = (a_hi*256 + a_lo) * (b_hi*256 + b_lo)
    = a_hi*b_hi*256*256 + a_hi*256*b_lo + a_lo*b_hi*256 + a_lo*b_lo
    = ((a_hi*b_hi)<<16) + ((a_hi*b_lo + a_lo*b_hi)<<8) + a_lo*b_lo
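The expansion above can be checked with a small model. This is a minimal Python sketch rather than 65816 code: `mul8` and `mul16` are hypothetical names, and `mul8` stands in for any 8x8->16 unsigned multiplier. It builds a 16x16->32 product from four 8x8 partial products, exactly as in the last line of the expansion.

```python
def mul8(a, b):
    """Stand-in for an 8x8 -> 16-bit unsigned multiplier (hypothetical helper)."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def mul16(a, b):
    """16x16 -> 32-bit unsigned product built from four 8x8 partial products."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((mul8(a_hi, b_hi) << 16)
            + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8)
            + mul8(a_lo, b_lo))
```

Note that the sum of the two cross terms can need 17 bits, so on real hardware the middle addition has to carry into the upper bytes of the result.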

Author: Stef [ Sun Jan 03, 2016 8:00 am ] Post subject: Re: Why no SNES homebrew scene?

tomaitheous wrote:
    This could be repurposed for '816. The routine has a large overhead of T1 loading because it's set up for sequential calls where T1 doesn't change, but you could easily switch that to indirect, removing the self-modifying code, and seriously cut that loading overhead down by a lot. Also, the tables are split 8-bit as well as the math, so that could be optimized into single tables with single sbc's; word-wide tables and 16-bit operations. A re-write/re-structure for '816 could probably get it close to the 70 cycle range.
    http://codebase64.org/doku.php?id=base: ... iplication 16x16->32bit mul

Interesting. Is there the same for signed multiplication (which is more useful)? I guess it only requires a different LUT.

Edit: I found the signed version, which adds some cycles at the end of the multiplication operation. Signed operands are processed a bit more slowly because of that, but it is still a fast implementation for the 6502 CPU.

Author: tomaitheous [ Sun Jan 03, 2016 2:16 pm ] Post subject: Re: Why no SNES homebrew scene?

Stef wrote:
    Interesting. Is there the same for signed multiplication (which is more useful)? I guess it only requires a different LUT.
    Edit: I found the signed version, which adds some cycles at the end of the multiplication operation. Signed operands are processed a bit more slowly because of that, but it is still a fast implementation for the 6502 CPU.

Code:
; Description: Signed 16-bit multiplication with signed 32-bit result.
;
; Input: 16-bit signed value in T1
;        16-bit signed value in T2
;        Carry=0: Re-use T1 from previous multiplication (faster)
;        Carry=1: Set T1 (slower)
;
; Output: 32-bit signed value in PRODUCT
;
; Clobbered: PRODUCT, X, A, C
.proc multiply_16bit_signed
                jsr multiply_16bit_unsigned

                ; Apply sign (See C=Hacking16 for details).
                lda T1+1
                bpl :+
                    sec
                    lda PRODUCT+2
                    sbc T2+0
                    sta PRODUCT+2
                    lda PRODUCT+3
                    sbc T2+1
                    sta PRODUCT+3
                :
                lda T2+1
                bpl :+
                    sec
                    lda PRODUCT+2
                    sbc T1+0
                    sta PRODUCT+2
                    lda PRODUCT+3
                    sbc T1+1
                    sta PRODUCT+3
                :
                rts
.endproc

Yeah, those two groups of 8-bit SBCs could be optimized down to one each for '816. So the worst case is two 16-bit subtractions for signed overhead. Though I think a slightly more optimal routine could be written for signed input/output.
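The sign fix-up in the routine above can be modelled in a few lines. This is a hedged Python sketch of the arithmetic only, with the hypothetical name `mul16_signed`; it is not a port of the assembly. It treats both operands as unsigned, multiplies, then subtracts the other operand from the high 16-bit word of the product for each negative input, which is what the two SBC groups on PRODUCT+2 and PRODUCT+3 do.

```python
def mul16_signed(t1, t2):
    """Signed 16x16 -> 32 multiply via an unsigned multiply plus sign correction."""
    u1, u2 = t1 & 0xFFFF, t2 & 0xFFFF   # two's-complement bit patterns
    product = u1 * u2                   # unsigned 32-bit product
    if t1 < 0:                          # T1 negative: subtract T2 from the high word
        product -= u2 << 16
    if t2 < 0:                          # T2 negative: subtract T1 from the high word
        product -= u1 << 16
    product &= 0xFFFFFFFF               # keep 32 bits, as PRODUCT does
    # Reinterpret the 32-bit pattern as a signed value.
    return product - (1 << 32) if product & 0x80000000 else product
```

The correction works because a negative 16-bit value v has the unsigned pattern v + 65536, so the unsigned product is too large by exactly 65536 times the other operand (modulo 2^32).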

Author: Stef [ Sun Jan 03, 2016 3:14 pm ] Post subject: Re: Why no SNES homebrew scene?

Indeed, someone should write an optimized version for the 65816; it would be pretty useful.

Author: psycopathicteen [ Sun Jan 03, 2016 8:47 pm ] Post subject: Re: Why no SNES homebrew scene?

I think I figured out how Tomaitheous's code works. This is basically the math equation:

LUT1(a+b) - LUT2(255-a+b)

LUT1(x) = (x^2)/4
LUT2(x) = ((x-255)^2)/4

((a+b)^2)/4 - ((255-a+b-255)^2)/4
= ((a+b)^2)/4 - ((-a+b)^2)/4
= ((a+b)^2 - (-a+b)^2)/4
= (a^2 + 2ab + b^2 - (a^2 - 2ab + b^2))/4
= (a^2 + 2ab + b^2 - a^2 + 2ab - b^2)/4
= 4ab/4
= ab

I don't know how the rounding error somehow gets cancelled out.
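On the rounding question: a+b and b-a always have the same parity, so their squares leave the same remainder modulo 4 (0 when even, 1 when odd), and the truncated fractions cancel in the subtraction. The Python sketch below checks this for 8-bit operands. The table names follow the post, but the table sizes and the function name `qs_mul8` are assumptions made for illustration, not taken from the codebase64 routine.

```python
# floor(x^2 / 4) for every possible sum a + b of two 8-bit values (0..510)
LUT1 = [(x * x) // 4 for x in range(511)]

# floor((x - 255)^2 / 4), offset by 255 so the index 255 - a + b is never negative
LUT2 = [((x - 255) ** 2) // 4 for x in range(511)]


def qs_mul8(a, b):
    """Quarter-square multiply of two unsigned 8-bit values using table lookups."""
    return LUT1[a + b] - LUT2[255 - a + b]
```

Because both lookups lose the same fraction, the result is exact for every pair of 8-bit inputs even though each table entry is rounded down.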

Author: tepples [ Sun Jan 03, 2016 9:10 pm ] Post subject: Re: Why no SNES homebrew scene?

See quarter square multiplication on Wikipedia.

Author: psycopathicteen [ Mon Jan 04, 2016 12:51 pm ] Post subject: Re: Why no SNES homebrew scene?

I use multiplication for calculating the rotation of the bosses' joints. The sines and cosines are 16-bit signed values from -256 to 256, and the radius is a signed 8-bit value. The result is a signed 16-bit value.

I thought of a fast routine that uses the non-mode-7 multiplication registers.

Code:
sine_multiplication:
lda #$0000
sep #$20
cpy #$80
bcc +
sbc sine_lo,x //clears carry for upcoming adc
+;
sty $4202
ldy sine_hi,x
sty $4203
ldy sine_lo,x
adc $004216 //wastes one cycle
sty $4203
xba
rep #$21
adc $4216
rts
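The arithmetic this routine appears to rely on can be sketched in Python. This models only the math (two unsigned 8x8 multiplies plus a correction for a negative radius), not the cycle timing or the carry flag, and the name `sine_mul_model` is hypothetical. $4202, $4203 and $4216 are the SNES WRMPYA, WRMPYB and RDMPYL registers.

```python
def sine_mul_model(radius, sine):
    """Low 16 bits of radius * sine from two unsigned 8x8 multiplies.

    radius: signed 8-bit value (-128..127)
    sine:   signed 16-bit value (-256..256 in the post)
    """
    radius_u = radius & 0xFF                 # byte written to $4202
    sine_u = sine & 0xFFFF
    sine_hi, sine_lo = sine_u >> 8, sine_u & 0xFF

    high = radius_u * sine_hi                # first multiply; only its low byte matters
    if radius < 0:
        # Reading a negative radius as unsigned adds 256 * sine to the product,
        # which is 256 * sine_lo modulo 65536, so subtract sine_lo from the high byte.
        high -= sine_lo
    high &= 0xFF

    low = radius_u * sine_lo                 # second multiply, full 16-bit result
    return ((high << 8) + low) & 0xFFFF
```

The returned value is the two's-complement bit pattern of the signed 16-bit result, so a caller would reinterpret values of 0x8000 and above as negative.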

Author: psycopathicteen [ Mon Jan 25, 2016 5:29 pm ] Post subject: Re: Why no SNES homebrew scene?

What really boggles my mind about slowdown is when games lag with 4 sprites, while other games can have more than 40. You mean to tell me they programmed every routine 10 times slower than necessary?

What's even more odd is that the games with heavy slowdown seem to have a higher threshold on the second lag frame as if running the game itself takes a lot of overhead. Like, it takes 4 sprites to make it run at 30fps, but with 10 sprites onscreen, it STILL runs at 30fps? WTF?

Author: tepples [ Mon Jan 25, 2016 5:50 pm ] Post subject: Re: Why no SNES homebrew scene?

The data

Game A
1-3 objects: 60 fps
4-10 objects: 30 fps

Game B
40 objects: 60 fps

Speculation as to cause

For purpose of argument, I'll assume good faith, imputing no malice where incompetence is sufficient (Hanlon's razor), nor incompetence when bona fide intractability is sufficient.

As for the difference between the games: Some games have more complicated collision detection and path finding algorithms and may thus slow down with fewer active objects. I doubt that individual spread-gun bullet sprites in Contra or even ships in Recca have very complicated movement.

As for why 10 is no slower than 4: Perhaps there is a constant overhead equivalent to seven objects, such as decoding the map into background update packets. Or perhaps the player character is as complex as two or three objects. I know the walking characters in Haunted: Halloween '85 (the player and the zombies) are the most algorithmically complex because they have to read the collision map four times (bottom center, left, and right, and head-height at leading edge), compared to everything else that reads it once (bottom center) or not at all. Incidentally, I had to do a shload of optimization to the code that parses collision slabs to get it to work well with six walkers (the player and five zombies) on the most platform-filled levels (such as the barn) without slowing down.
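The constant-overhead explanation can be made concrete with a toy model. In this Python sketch the per-frame budget of 10 "object units" is an assumption chosen so that the numbers match Game A, the overhead of 7 units is the figure hypothesized in the post, and `frame_rate` is a hypothetical name. A game that misses the frame deadline is assumed to wait for the next one.

```python
import math

FRAME_BUDGET = 10     # assumed: work that fits in one 60 Hz frame, in object units
FIXED_OVERHEAD = 7    # hypothesized constant cost per update, in the same units


def frame_rate(objects):
    """Frames per second when each update must finish on a frame boundary."""
    work = FIXED_OVERHEAD + objects
    frames_needed = math.ceil(work / FRAME_BUDGET)
    return 60 / frames_needed
```

Under these assumptions, 1 to 3 objects fit in one frame (60 fps), while anything from 4 to 13 objects needs two frames (30 fps), which is why 10 objects would run no slower than 4.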

Author: Drew Sebastino [ Mon Jan 25, 2016 6:35 pm ] Post subject: Re: Why no SNES homebrew scene?

What's funny is that I just got 1943 from a local video game store, Game X Change (don't know why I felt like sharing that), and I played it, and it's a ton better about not slowing down than Gradius III, which I also got, even though Gradius III is the one on the SNES... That's what really boggles my mind.

Isn't the SNES's CPU supposed to be something like 4x faster than the one in the NES?

Author: psycopathicteen [ Mon Jan 25, 2016 7:11 pm ] Post subject: Re: Why no SNES homebrew scene?

Now that I think about it, I think most people mistakenly identify object collision as speed-critical code, when it could actually be the BG collision that bogs the CPU down. I wonder if slopes have anything to do with it, because a lot of NES games didn't use slopes.

Edit: I changed the wording, because I'm not exactly sure if this is totally accurate.

Author: Drew Sebastino [ Mon Jan 25, 2016 7:22 pm ] Post subject: Re: Why no SNES homebrew scene?

psycopathicteen wrote:
    BG collision that bogs the cpu down.

Oh yeah... There's no BG to bump into in that game.

All times are UTC - 7 hours