GSU revision comparison test

qwertymodo
Posts: 775
Joined: Mon Jul 02, 2012 7:46 am

GSU revision comparison test

Post by qwertymodo » Wed Jun 24, 2015 11:53 am

A few months back, somebody sent me this test ROM to run on a real GSU board. I don't remember who sent it to me. I've gone through my post history and can't find it. If you recognize it, please chime in.

I tested it on every revision I have on hand. The MARIO-CHIP board (Star Fox) just gave me a blank screen, but that could be an issue with the board itself, because I've been messing around trying to transplant the SRAM on it. All the other revisions (GSU-1, GSU-1A, and GSU-2) gave the same result.

Image

However, the test does not match any emulator I have tested.

higan-accuracy 0.94
Image

Snes9x 1.53
Image

I'm not even going to bother downloading ZSNES to try it on there. But there you go. No idea what the numbers mean; hopefully whoever sent me the ROM sees this and can give some more information.

Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: GSU revision comparison test

Post by Sik » Wed Jun 24, 2015 6:05 pm

Probably just counting around how many iterations of a loop happen within a given time. Note that this is tricky to get right, since it involves proper emulation of the CPU at the cycle (or even subcycle) level, memory timings (this can throw off things really badly if you don't take into account everything) as well as proper emulation of whatever is being used to do the timing in the first place.

Saying this from experience since I did something similar to benchmark 68000 cycle count accuracy in Mega Drive emulators and none gets it right. Turns out that besides VDP emulation issues, there are memory access timings that were still not being taken into account (RAM refresh is emulated by nothing, and I'm not sure if ROM-side refresh was known at the time either).
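
To make the point about loop-count benchmarks concrete, here is a minimal sketch of why they are so sensitive to timing errors. Everything in it (the function name, the window length, and the per-iteration costs) is a made-up illustration, not a measurement from the test ROM or from any emulator.

```cpp
#include <cstdint>

// Hypothetical helper: how many complete loop iterations fit in a fixed
// timing window, given the cost of one iteration in clock ticks.
// ticks_per_iteration must be non-zero.
constexpr std::uint32_t iterations_in_window(std::uint32_t window_ticks,
                                             std::uint32_t ticks_per_iteration) {
  // Integer division drops the partial iteration at the end of the window.
  return window_ticks / ticks_per_iteration;
}

// Illustrative numbers only: a 100000-tick window and a loop that
// "really" costs 10 ticks per iteration.
constexpr std::uint32_t kWindowTicks = 100000;
constexpr std::uint32_t kHardwareCount = iterations_in_window(kWindowTicks, 10);

// An emulator that charges a single extra tick per iteration reports a
// visibly different count over the same window.
constexpr std::uint32_t kEmulatorCount = iterations_in_window(kWindowTicks, 11);
```

With these invented numbers the two counts come out as 10000 and 9090, so a one-tick error per iteration already shows up as a difference of roughly 9% on screen.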

93143
Posts: 1361
Joined: Fri Jul 04, 2014 9:31 pm

Re: GSU revision comparison test

Post by 93143 » Thu Jun 25, 2015 1:48 am

I don't think the MARIO chip supports either 21 MHz mode or fast multiply, though I don't know why it would outright refuse to run...

Also, I was under the impression that fast multiply was disallowed in 21 MHz mode, and it seems byuu thought the same. Of course, just because it executes at the higher speed doesn't mean the answers are right...

Either way, those are some substantial differences. I hope I don't have to wreck a game just to find out if my port of [redacted] is even feasible...

ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Re: GSU revision comparison test

Post by ARM9 » Thu Jun 25, 2015 5:45 am

Ah yes, I recognize that screen, thanks for taking the time to run it on hardware, very much appreciated.
I made that ROM to test whether the fast multiplication setting worked in 21 MHz mode. Book says no, so does bsnes, yet some Super FX games I've looked at seem to use it at 21 MHz, so I wanted to see what's up.
Apparently it works on hardware so this change to higan/sfc/chip/superfx/timing/timing.cpp would reflect that: http://vpaste.net/VYOR0
There might be more to it than that, but it should be a start.

Here's the source for the original test:
https://github.com/ARM9/snesdev/tree/ma ... u/profiler

The reason it crashes on the MARIO chip could be that I don't check the chip version before setting the clock to 21 MHz, since I don't know what VCR returns on the different revisions. fullsnes only has this information:
Known versions: 1=MC1/Blob, ?=MC1/SMD, ?=GSU1, ?=GSU1A, 4=GSU2, ?=GSU2-SP1.
I would guess that 2=GSU1, 3=GSU1A but I don't know for sure.
Last edited by ARM9 on Thu Jun 25, 2015 9:20 am, edited 1 time in total.
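
A version check of the kind ARM9 describes might look roughly like the sketch below. It is only an illustration under stated assumptions: the values 1 (MC1) and 4 (GSU2) come from the fullsnes line quoted above, 2 and 3 are ARM9's guesses and are unconfirmed, and the idea that the MARIO chip cannot run at 21 MHz is 93143's impression rather than a tested fact. All names (`GsuRevision`, `revision_from_vcr`, `may_request_21mhz`) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical revision labels. The "Guess" entries are NOT confirmed values.
enum class GsuRevision { MarioChip1, Gsu1Guess, Gsu1AGuess, Gsu2, Unknown };

// Map a VCR (version code register) value to a revision label.
// 1 and 4 are taken from the fullsnes note; 2 and 3 are guesses.
constexpr GsuRevision revision_from_vcr(std::uint8_t vcr) {
  return vcr == 1 ? GsuRevision::MarioChip1
       : vcr == 2 ? GsuRevision::Gsu1Guess   // assumption
       : vcr == 3 ? GsuRevision::Gsu1AGuess  // assumption
       : vcr == 4 ? GsuRevision::Gsu2
                  : GsuRevision::Unknown;
}

// Conservative policy: only request the 21 MHz clock on revisions that are
// believed to support it; stay at the low speed for the MARIO chip and for
// any value we do not recognise.
constexpr bool may_request_21mhz(std::uint8_t vcr) {
  return revision_from_vcr(vcr) != GsuRevision::MarioChip1 &&
         revision_from_vcr(vcr) != GsuRevision::Unknown;
}
```

A test ROM would read VCR once at startup and skip the clock-select write whenever `may_request_21mhz` returns false, which would at least rule out the clock setting as the cause of the blank screen.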

qwertymodo
Posts: 775
Joined: Mon Jul 02, 2012 7:46 am

Re: GSU revision comparison test

Post by qwertymodo » Thu Jun 25, 2015 8:48 am

93143 wrote:I hope I don't have to wreck a game just to find out if my port of [redacted] is even feasible...
[redacted]

MottZilla
Posts: 2835
Joined: Wed Dec 06, 2006 8:18 pm

Re: GSU revision comparison test

Post by MottZilla » Thu Jun 25, 2015 3:03 pm

Sik wrote:Probably just counting around how many iterations of a loop happen within a given time. Note that this is tricky to get right, since it involves proper emulation of the CPU at the cycle (or even subcycle) level, memory timings (this can throw off things really badly if you don't take into account everything) as well as proper emulation of whatever is being used to do the timing in the first place.

Saying this from experience since I did something similar to benchmark 68000 cycle count accuracy in Mega Drive emulators and none gets it right. Turns out that besides VDP emulation issues, there are memory access timings that were still not being taken into account (RAM refresh is emulated by nothing, and I'm not sure if ROM-side refresh was known at the time either).
Perhaps the cache functionality of the Super FX emulation in Higan/BSNES is not entirely accurate? Maybe that's why the numbers are off. I do recall hearing emulators like ZSNES and SNES9X do not simulate the cache at all.

AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ » Thu Jun 25, 2015 5:44 pm

(Crossposted from byuu's forum)

The following patch makes bsnes match hardware exactly:

Code:

diff --git a/bsnes/snes/chip/superfx/core/opcodes.cpp b/bsnes/snes/chip/superfx/core/opcodes.cpp
index 7d2f13a..3b14d81 100644
--- a/bsnes/snes/chip/superfx/core/opcodes.cpp
+++ b/bsnes/snes/chip/superfx/core/opcodes.cpp
@@ -366,7 +366,7 @@ template<int n> void SuperFX::op_mult_r() {
   regs.sfr.s = (regs.dr() & 0x8000);
   regs.sfr.z = (regs.dr() == 0);
   regs.reset();
-  if(!regs.cfgr.ms0) add_clocks(2);
+  if(!regs.cfgr.ms0) add_clocks(cache_access_speed);
 }
 
 //$80-8f(alt1): umult rN
@@ -375,7 +375,7 @@ template<int n> void SuperFX::op_umult_r() {
   regs.sfr.s = (regs.dr() & 0x8000);
   regs.sfr.z = (regs.dr() == 0);
   regs.reset();
-  if(!regs.cfgr.ms0) add_clocks(2);
+  if(!regs.cfgr.ms0) add_clocks(cache_access_speed);
 }
 
 //$80-8f(alt2): mult #N
@@ -384,7 +384,7 @@ template<int n> void SuperFX::op_mult_i() {
   regs.sfr.s = (regs.dr() & 0x8000);
   regs.sfr.z = (regs.dr() == 0);
   regs.reset();
-  if(!regs.cfgr.ms0) add_clocks(2);
+  if(!regs.cfgr.ms0) add_clocks(cache_access_speed);
 }
 
 //$80-8f(alt3): umult #N
@@ -393,7 +393,7 @@ template<int n> void SuperFX::op_umult_i() {
   regs.sfr.s = (regs.dr() & 0x8000);
   regs.sfr.z = (regs.dr() == 0);
   regs.reset();
-  if(!regs.cfgr.ms0) add_clocks(2);
+  if(!regs.cfgr.ms0) add_clocks(cache_access_speed);
 }
 
 //$90: sbk
diff --git a/bsnes/snes/chip/superfx/timing/timing.cpp b/bsnes/snes/chip/superfx/timing/timing.cpp
index aae7820..3f493d0 100644
--- a/bsnes/snes/chip/superfx/timing/timing.cpp
+++ b/bsnes/snes/chip/superfx/timing/timing.cpp
@@ -72,14 +72,17 @@ void SuperFX::update_speed() {
   if(clockmode == 2) {
     cache_access_speed  = 1;
     memory_access_speed = 5;
-    regs.cfgr.ms0 = 0;  //cannot use high-speed multiplication in 21MHz mode
     return;
   }
 
   //default: allow S-CPU to select mode
   cache_access_speed  = (regs.clsr ? 1 : 2);
   memory_access_speed = (regs.clsr ? 5 : 6);
-  if(regs.clsr) regs.cfgr.ms0 = 0;  //cannot use high-speed multiplication in 21MHz mode
+  //According to docs, CLSR and MS0 should not both be set to 1.
+  //Previously it was believed that setting CLSR forced MS0 to 0, but
+  //hardware tests show that this is not the case. It is possible that
+  //multiplication may not work reliably when CLSR and MS0 are both set.
+  //if(regs.clsr) regs.cfgr.ms0 = 0;
 }
 
 void SuperFX::timing_reset() {

93143
Posts: 1361
Joined: Fri Jul 04, 2014 9:31 pm

Re: GSU revision comparison test

Post by 93143 » Thu Jun 25, 2015 6:20 pm

AWJ wrote:The following patch makes bsnes match hardware exactly:
Really? Awesome.

Why was there a difference with high-speed multiply turned off?
qwertymodo wrote:[redacted]
Uh... I'm not trying to re-port Doom, if that's what you were trying to imply... I'm referring to that shmup I keep dragging into the conversation and then refusing to name.

AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ » Thu Jun 25, 2015 6:29 pm

93143 wrote:Why was there a difference with high-speed multiply turned off?
Because low-speed multiply was always taking an additional 10MHz cycle, rather than either a 10MHz or 20MHz cycle depending on the clock multiplier.
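
The difference can be sketched as a small standalone model. It mirrors the names used in the patch (`cache_access_speed`, `clsr`, `ms0`) but is not bsnes code: `GsuClockConfig` and the two `mult_penalty_*` functions are invented for illustration, and the unit is assumed to be one tick of the 21 MHz clock, as in the patch.

```cpp
// Hypothetical configuration snapshot; field names follow the patch.
struct GsuClockConfig {
  bool clsr;  // clock select: false = ~10 MHz, true = ~21 MHz
  bool ms0;   // multiplier speed: true = high-speed multiply
};

// Length of one GSU cycle in 21 MHz ticks, as in update_speed():
// 2 ticks at the low clock, 1 tick at the high clock.
constexpr unsigned cache_access_speed(GsuClockConfig c) {
  return c.clsr ? 1u : 2u;
}

// Behaviour before the patch: low-speed multiply always costs 2 extra
// ticks, i.e. one 10 MHz cycle, regardless of the selected clock.
constexpr unsigned mult_penalty_old(GsuClockConfig c) {
  return c.ms0 ? 0u : 2u;
}

// Behaviour after the patch: low-speed multiply costs one extra GSU cycle
// at whatever clock is currently selected.
constexpr unsigned mult_penalty_new(GsuClockConfig c) {
  return c.ms0 ? 0u : cache_access_speed(c);
}
```

The two versions agree at the low clock (2 extra ticks each) and only diverge at 21 MHz with high-speed multiply off, where the old code charged 2 ticks and the patched code charges 1.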

ARM9, can you write another program that tests these two instructions?

Code:

//$9f(alt0): fmult
void GSU::op_fmult() {
  uint32_t result = (int16_t)regs.sr() * (int16_t)regs.r[6];
  regs.dr() = result >> 16;
  regs.sfr.s  = (regs.dr() & 0x8000);
  regs.sfr.cy = (result & 0x8000);
  regs.sfr.z  = (regs.dr() == 0);
  regs.reset();
  step(4 + (regs.cfgr.ms0 << 2));
}

//$9f(alt1): lmult
void GSU::op_lmult() {
  uint32_t result = (int16_t)regs.sr() * (int16_t)regs.r[6];
  regs.r[4] = result;
  regs.dr() = result >> 16;
  regs.sfr.s  = (regs.dr() & 0x8000);
  regs.sfr.cy = (result & 0x8000);
  regs.sfr.z  = (regs.dr() == 0);
  regs.reset();
  step(4 + (regs.cfgr.ms0 << 2));
}
Because I'm fairly sure they're also wrong in bsnes (for one thing, they take more cycles in high-speed mode than in low-speed mode)
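
The cycle-count oddity is easy to see by evaluating the `step()` argument on its own. The first function below copies the expression from the listing above; the second is purely a guess at what an inverted version would look like and has not been checked against hardware anywhere in this thread. Both function names are made up for this sketch.

```cpp
// The step() argument exactly as written in the listing:
// 4 + (ms0 << 2), with ms0 being 0 (low-speed) or 1 (high-speed).
constexpr unsigned fmult_step_as_written(bool ms0) {
  return 4u + (static_cast<unsigned>(ms0) << 2);
}

// Hypothetical alternative with the condition inverted, so that the
// high-speed setting is the cheaper one. This is an assumption for
// illustration only; the real timing needs a hardware test.
constexpr unsigned fmult_step_inverted(bool ms0) {
  return 4u + (static_cast<unsigned>(!ms0) << 2);
}
```

As written, the expression gives 4 with `ms0` clear and 8 with `ms0` set, which is the "more cycles in high-speed mode" behaviour AWJ is pointing at.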

93143
Posts: 1361
Joined: Fri Jul 04, 2014 9:31 pm

Re: GSU revision comparison test

Post by 93143 » Thu Jun 25, 2015 6:36 pm

AWJ wrote:Because low-speed multiply was always taking an additional 10MHz cycle, rather than either a 10MHz or 20MHz cycle depending on the clock multiplier.
Thanks; I see that now. Apparently I'm fuzzy-headed today...

Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: GSU revision comparison test

Post by Near » Thu Jun 25, 2015 9:47 pm

This seems like a really convoluted way to improve GSU timing ...

AWJ looks at instructions, ARM9 writes tests, qwertymodo runs tests and reports numbers, and I apply submitted patches.

... but hey, if it improves the emulation, then I'm all for it.

> I do recall hearing emulators like ZSNES and SNES9X do not simulate the cache at all.

That is correct. It's a lot more than one cache, too.

There's a ROM buffer cache, RAM buffer cache, primary pixel buffer cache, secondary pixel buffer cache, and 16x16 instruction cache. Nobody else emulates any of that (well, nobody can inspect what nocash is doing, so I guess we'd have to ask him ...), but I emulate it all. Just that, as you're seeing, it's never been compared to real hardware timings, so it certainly has its issues.

The one thing I don't know how to emulate is what happens when the secondary pixel buffer is full and you are executing a tight loop out of RAM? Does it stall the pixel cache (least likely), stall the CPU loop, or interleave the two operations?

qwertymodo
Posts: 775
Joined: Mon Jul 02, 2012 7:46 am

Re: GSU revision comparison test

Post by qwertymodo » Thu Jun 25, 2015 10:27 pm

93143 wrote:
qwertymodo wrote:[redacted]
Uh... I'm not trying to re-port Doom, if that's what you were trying to imply... I'm referring to that shmup I keep dragging into the conversation and then refusing to name.
Sorry, the first thing to pop into my mind was that thread Espozo started about Doom a while back.

93143
Posts: 1361
Joined: Fri Jul 04, 2014 9:31 pm

Re: GSU revision comparison test

Post by 93143 » Fri Jun 26, 2015 12:36 am

Yeah, I remember. I seem to recall getting really into the question of how much picture one could stuff through VBlank with different levels of compromise. That was not meant to imply that I was going to do something about it myself - I'm already so busy and/or burnt-out that I can scarcely find time to work on the shmup.

Sorry if I disappointed you...

ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Re: GSU revision comparison test

Post by ARM9 » Fri Jun 26, 2015 9:36 am

AWJ wrote:ARM9, can you write another program that tests these two instructions?
Here you go: https://dl.dropboxusercontent.com/u/134 ... f/mult.sfc
Source: https://github.com/ARM9/snesdev/tree/master/gsu/mult
There might be some cache issues involved in this one: writing 0 to the Go flag in the GSU status/flags register doesn't invalidate the cache in bsnes. Supposedly, when doing so, all cache flags are cleared and the CBR is set to 0x0000. I'm writing some cache tests to help verify.
byuu wrote:This seems like a really convoluted way to improve GSU timing ...
haha yeah, I need to make a dev cart, but I'm not yet sure what to get in terms of rom and programmer.
byuu wrote:The one thing I don't know how to emulate is what happens when the secondary pixel buffer is full and you are executing a tight loop out of RAM? Does it stall the pixel cache (least likely), stall the CPU loop, or interleave the two operations?
I don't know how to go about testing that yet, any ideas?
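
The Go-flag behaviour ARM9 describes earlier in this post can be written down as a tiny state model. This is a sketch of the claimed ("supposedly") behaviour only, not something verified on hardware here; `CacheState`, `valid_mask`, and `write_go_flag` are invented names, and the width of the mask says nothing about the real number of cache lines.

```cpp
#include <cstdint>

// Hypothetical snapshot of the instruction-cache bookkeeping.
struct CacheState {
  std::uint16_t cbr;         // cache base register
  std::uint32_t valid_mask;  // one bit per cache line (width is arbitrary here)
};

// Model of a write to the Go flag. Writing 1 leaves the cache alone;
// writing 0 is assumed to clear every valid flag and reset CBR to 0x0000,
// as described in the post above.
constexpr CacheState write_go_flag(CacheState state, bool go) {
  return go ? state : CacheState{0x0000, 0u};
}
```

An emulator that skips this reset would keep executing stale cached lines after the S-CPU stops and restarts the GSU, which is one way a test could come to depend on cache behaviour.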

AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: GSU revision comparison test

Post by AWJ » Fri Jun 26, 2015 9:59 am

ARM9 wrote:
AWJ wrote:ARM9, can you write another program that tests these two instructions?
Here you go: https://dl.dropboxusercontent.com/u/134 ... f/mult.sfc
Source: https://github.com/ARM9/snesdev/tree/master/gsu/mult
There might be some cache issues involved in this one, writing 0 to the Go flag in the GSU status/flags register doesn't invalidate the cache in bsnes. Supposedly when doing so all cache flags are cleared and the CBR is set to 0x0000, I'm writing some cache tests to help verify.
byuu wrote:This seems like a really convoluted way to improve GSU timing ...
haha yeah, I need to make a dev cart, but I'm not yet sure what to get in terms of rom and programmer.
byuu wrote:The one thing I don't know how to emulate is what happens when the secondary pixel buffer is full and you are executing a tight loop out of RAM? Does it stall the pixel cache (least likely), stall the CPU loop, or interleave the two operations?
I don't know how to go about testing that yet, any ideas?
Why does this ROM depend on cache behaviour when the previous one didn't?
