SPC700 instruction cycle breakdown

Revenant
Posts: 462
Joined: Sat Apr 25, 2015 1:47 pm
Location: FL

Re: SPC700 instruction cycle breakdown

Post by Revenant »

Okay, looking again, I indeed had NMI enabled, which did affect the results. I uploaded a new version of the ROM.

Also, the reason $DA and higher were failing was that I wasn't waiting for the SMP program to write zero to $F4. The CPU would write to the port, then the SMP would write to $F1 to clear the input ports, and the CPU would time out waiting for a response to something that the SMP never actually saw.

I fixed that and ended up seeing something strange on the hardware: after writing $CA, $DA, $EA, or $FA, the CPU<->SMP communication seems to break down and zero is never read from $2140 again. (Instead, I get a constant $CC, which tells me that attempting to restart the SMP program by forcing a jump to $200 may or may not have actually succeeded.) See this screenshot.

The actual values returned when I test each of those four individually (averaged over several runs each):
$CA -> $1A40
$DA -> $1AE2
$EA -> $1CCE
$FA -> $2000

Hopefully that's still useful. I'm not really sure if the weird port issue that happens after those four specific values is somehow my fault or not.

Here's the somewhat messy source to the CPU-side program. Hopefully it's clear what it's actually attempting to do. The SMP code is identical to what you already posted.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

Thanks once again.

It looks like on your particular SNES, when TEST is >= $CA the IPL ROM's RAM-clearing loop works (allowing one test to succeed) but some later step, either the IPL ROM comms loop or executing out of RAM, is hanging the SMP.

At any rate, it looks like nocash is approximately correct: TEST d6-d7 seems to affect access speed for both I/O ports and IPL ROM, and TEST d4-d5 affects access speed for RAM. TEST = $5A takes almost exactly twice as long as TEST = $0A, and TEST = $FA (when it works without crashing) takes almost exactly ten times as long as TEST = $0A.
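
For readers following the bit fields, here is a minimal Python sketch of how the TEST values used in this thread split into the two speed fields. The wait-state counts (1, 2, 5, 10) are the ones quoted later in the thread; `decode_test` and `uniform_slowdown` are hypothetical helper names for illustration only, not anything from bsnes or nocash's documentation.

```python
# Wait states per 2-bit field value, as quoted later in the thread.
WAIT_STATES = (1, 2, 5, 10)

def decode_test(value):
    """Split a TEST ($F0) byte into (io_rom_field, ram_field)."""
    io_rom = (value >> 6) & 3   # d6-d7: I/O port and IPL ROM speed
    ram = (value >> 4) & 3      # d4-d5: RAM speed
    return io_rom, ram

def uniform_slowdown(value):
    """Slowdown relative to TEST = $0A, defined only when both fields match."""
    io_rom, ram = decode_test(value)
    if io_rom != ram:
        # With mixed settings the slowdown depends on the mix of cycle types.
        raise ValueError("fields differ; slowdown depends on the cycle mix")
    return WAIT_STATES[ram]
```

Under these assumptions $5A decodes to (1, 1) and $FA to (3, 3), which lines up with the roughly 2x and 10x slowdowns reported above.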

Now to see if I can implement that in bsnes and get the SMP timer tests (which are apparently more precise?) to match...
Revenant
Posts: 462
Joined: Sat Apr 25, 2015 1:47 pm
Location: FL

Re: SPC700 instruction cycle breakdown

Post by Revenant »

AWJ wrote:It looks like on your particular SNES, when TEST is >= $CA the IPL ROM's RAM-clearing loop works (allowing one test to succeed) but some later step, either the IPL ROM comms loop or executing out of RAM, is hanging the SMP.
Considering my SNES is an old SHVC-CPU-01, I wonder if running the ROM on consoles with a separate vs. integrated APU would make a difference here. Or maybe my specific unit is just broken in a really unimportant way, who knows.

Either way, nice to see that the numbers look good. Hopefully matching it up with the other test ROMs isn't too much of a challenge.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

Image

Well how about that. It looks like nocash bested myself, anomie, and blargg in understanding the TEST register's top four bits. His model also passes all of blargg's test_*speed ROMs. Oh well, blargg at least figured out RAM disable which nocash didn't, heehee.

Great catch bringing this stuff up, AWJ. I'd have missed both Overload's PDF and nocash's TEST documentation.

This gives us a wonderful test opportunity as well. You know how (x)+ writes don't read the timer? Well if we execute enough (x)+ instructions with RAM at 2 (5 waits, 20% speed) and I/O at 3 (10 waits, 10% speed), then we can find out more information on what that cycle is actually doing. It might even end up being a 100% speed cycle. We can alternate between X fetching $fd and $00 for comparison as well.

The hard part will be trying to model emulation of when real hardware will lock up. It definitely varies per SNES, which makes this way harder. I'm guessing our best bet is simply to print a warning to a debugger/terminal saying TEST.d4-d7!=0. But, it'll always be one way to detect an emulator ... just, not so practical once you confirm it's not :P

Note: if anyone else is reading this thread and doesn't know ... the reason the numbers don't match perfectly is because the real SNES has separate oscillators for the CPU and APU. These values drift on real hardware. Adjusting the values in emulation subtly changes the output values as you would expect. Trying to match the current oscillator rates of 20-year-old hardware that's undoubtedly drifted somewhat out of spec and has a natural variance between parts is pretty silly. All we need to do is get really close, which we have.

> Agree that that table is probably nonsense. Nocash claims NOP's idle cycle has "RAM timing", but blargg's test_speed.smc does NOPs and isn't affected by d4-d5 at all.

I think we can say for sure it's junk now.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

Great work! Are you using the address on the bus (based on Overload) to adjudicate which speed to use for "idle" cycles, or are you treating some of them as always-IO/ROM? And are you applying my 4-for-5 and 8-for-10 hypothesis for the timers?

Sorry Revenant, I've got one more test for you to run. This one can use exactly the same CPU-side program as before, only the SMP-side is different:

Code: Select all

    mov $f1,#$b0        ; enable IPL ROM; clear input ports
    mov x,#$7f          ; set up x for loop
    mov $f4,#$00        ; clear output port $F4
:     mov a,$f5
    beq :-
    mov $f0,$f4         ; set TEST
:     mul ya            ; (fetch, ?dummy operand?, 7 ?idle?)
      inc x             ; (fetch, ?dummy operand?)
    bmi :-              ; or bne :- in the second variant (fetch, operand, 2 ?idle? when branch taken)
    mov $f0,#$0a        ; reset TEST to default
    jmp $ffc9           ; jump directly to IPL ROM handshake (skip RAM clear)
I'd like you to test both with BMI and with BNE on line 9, because nocash claims there's a difference between BPL/BVC/BCC/BNE and BMI/BVS/BCS/BEQ (which seems highly unlikely to me, but whatever). Use every value of TEST you can that doesn't lock up your SNES's SMP (including $8A/$9A/$AA/$BA, just in case they work on your SNES with this opcode sequence)

The goal is to see if there are any "idle cycles" that are always port/ROM-speed regardless of the address (we're already pretty sure that there aren't any that are always RAM-speed)
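
As a sanity check on the loop above, here is a small Python sketch that steps X through the loop for both branch variants. It assumes the nominal cycle counts from the annotations in the listing (which are themselves marked with question marks), plus 2 cycles for a branch that is not taken; `run_loop` is a hypothetical name. It does not model wait states at all, only the iteration count.

```python
# Nominal SPC700 cycle counts, following the annotations in the listing above.
MUL_YA = 9            # fetch, dummy operand, 7 idle
INC_X = 2             # fetch, dummy operand
BRANCH_TAKEN = 4      # fetch, operand, 2 idle
BRANCH_NOT_TAKEN = 2  # assumption: fetch, operand only

def run_loop(branch):
    """Simulate the mul/inc/branch loop; return (iterations, nominal cycles)."""
    if branch not in ("bmi", "bne"):
        raise ValueError("branch must be 'bmi' or 'bne'")
    x = 0x7F                          # mov x,#$7f
    iterations = 0
    cycles = 0
    while True:
        iterations += 1
        x = (x + 1) & 0xFF            # inc x (8-bit wraparound)
        if branch == "bmi":
            taken = (x & 0x80) != 0   # N flag set
        else:
            taken = x != 0            # Z flag clear
        cycles += MUL_YA + INC_X
        cycles += BRANCH_TAKEN if taken else BRANCH_NOT_TAKEN
        if not taken:
            return iterations, cycles
```

Both variants leave the loop when X wraps to $00, so they run the same number of iterations. Any measured difference between the two ROMs would therefore have to come from the branch opcode itself.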
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

> Are you using the address on the bus (based on Overload) to adjudicate which speed to use for "idle" cycles, or are you treating some of them as always-IO/ROM?

Code: Select all

uint waitStates[] = {1, 2, 5, 10};
if((addr & 0xffc0) == 0xffc0 && iplromEnable) return waitStates[TEST.bits(6,7)];  //ROM
if((addr & 0xfff0) == 0x00f0) return waitStates[TEST.bits(6,7)];  //IO
return waitStates[TEST.bits(4,5)];  //RAM
As per Overload's PDF, I'm acting like there's no such thing as pure idle cycles, except of course for the weird (x)+ case ... not sure what to do with that one.
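
A rough Python rendering of the snippet above, for anyone who wants to poke at it outside of higan. This is a sketch under the same assumptions as the original fragment (IPL ROM at $FFC0-$FFFF when enabled, I/O at $00F0-$00FF, everything else RAM); the function name and the `iplrom_enable` parameter are illustrative, not higan's actual API.

```python
WAIT_STATES = (1, 2, 5, 10)

def wait_states(addr, test, iplrom_enable=True):
    """Return the wait states for one bus cycle at addr for a given TEST value."""
    io_rom_field = (test >> 6) & 3   # TEST d6-d7
    ram_field = (test >> 4) & 3      # TEST d4-d5
    if (addr & 0xFFC0) == 0xFFC0 and iplrom_enable:
        return WAIT_STATES[io_rom_field]   # IPL ROM
    if (addr & 0xFFF0) == 0x00F0:
        return WAIT_STATES[io_rom_field]   # I/O registers
    return WAIT_STATES[ram_field]          # RAM
```

For example, with TEST = $4A a fetch from $FFC9 costs 2 wait states while the IPL ROM is enabled, and 1 once it is disabled and the same address reads RAM.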

> And are you applying my 4-for-5 and 8-for-10 hypothesis for the timers?

I left the timer step as (1 << TEST.bits(6,7)) + (2 << TEST.bits(4,5)); because anything else broke test_timer_speed.

> Sorry Revenant, I've got one more test for you to run.

No interest in that (x)+ test with separate ROM+IO/RAM speeds? :/
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

byuu wrote:I left the timer step as (1 << TEST.bits(6,7)) + (2 << TEST.bits(4,5)); because anything else broke test_timer_speed.
That means we aren't done yet. Bragging about still passing blargg's timer speed tests when you haven't actually changed the timer behaviour is cheating :mrgreen:

Also, that lookup table should really be static const.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

It is static const, don't worry.

I wasn't meaning to cheat on timers. I just meant that emulating the wait states passed Revenant's tests without breaking blargg's.

The timer thing is trickier. The number of cycles it takes to advance the stage 1 counters is based off both settings, and doesn't seem to care how long RAM vs IO accesses take. In other words, it only cares how many cycles are executed -- not how many wait states each cycle takes.

Code: Select all

(1 << clockSpeed[ROM/IO speed]) + (2 << timerSpeed[RAM speed])

00 =  1 wait state
01 =  2 wait states
10 =  5 wait states
11 = 10 wait states

(.d7,d6)   (.d5,d4)

(1 << 0) + (2 << 0) =  3
(1 << 0) + (2 << 1) =  5
(1 << 0) + (2 << 2) =  9
(1 << 0) + (2 << 3) = 17

(1 << 1) + (2 << 0) =  4
(1 << 1) + (2 << 1) =  6
(1 << 1) + (2 << 2) = 10
(1 << 1) + (2 << 3) = 18

(1 << 2) + (2 << 0) =  6
(1 << 2) + (2 << 1) =  8
(1 << 2) + (2 << 2) = 12
(1 << 2) + (2 << 3) = 20

(1 << 3) + (2 << 0) = 10
(1 << 3) + (2 << 1) = 12
(1 << 3) + (2 << 2) = 16
(1 << 3) + (2 << 3) = 24
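
The table above can be regenerated with a few lines of Python. This is only a sketch of the formula as posted; `timer_step` and `timer_step_table` are hypothetical names, and the low nibble is assumed to stay at $A as in the TEST values used elsewhere in the thread.

```python
def timer_step(test):
    """Timer step per the posted formula: (1 << d7-d6) + (2 << d5-d4)."""
    io_rom = (test >> 6) & 3
    ram = (test >> 4) & 3
    return (1 << io_rom) + (2 << ram)

def timer_step_table():
    """Rows are indexed by the d7-d6 field, columns by the d5-d4 field."""
    return [
        [timer_step((io_rom << 6) | (ram << 4) | 0x0A) for ram in range(4)]
        for io_rom in range(4)
    ]
```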
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

Again, you're confusing your emulator implementation with the hardware. My hypothesis is that RAM cycles and port/ROM cycles each clock the timers according to the length of time they individually take, and that formula of yours is simply an artifact of the fact that blargg's test program happens to execute exactly twice as many RAM cycles as it does port/ROM cycles. If you modified the test program so that the ratio of RAM cycles to port/ROM cycles it executed was different, that formula would no longer work (i.e. higan wouldn't produce the same results as hardware in the modified test).
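
To make the argument concrete, here is a Python sketch of that hypothesis. It assumes one particular reading of the "4-for-5 and 8-for-10" remark, namely that each cycle advances the timers by 1, 2, 4 or 8 ticks depending on its speed field; that reading, and the helper names, are assumptions for illustration and are not confirmed anywhere in the thread.

```python
# Assumed timer ticks per cycle for speed field values 0..3
# (one reading of the "4-for-5 and 8-for-10" remark).
TIMER_TICKS = (1, 2, 4, 8)

def ticks_per_group(test, io_cycles, ram_cycles):
    """Timer ticks for a group of cycles if each cycle clocks the timers itself."""
    io_rom = (test >> 6) & 3
    ram = (test >> 4) & 3
    return io_cycles * TIMER_TICKS[io_rom] + ram_cycles * TIMER_TICKS[ram]

def combined_formula(test):
    """The combined formula quoted above: (1 << d7-d6) + (2 << d5-d4)."""
    io_rom = (test >> 6) & 3
    ram = (test >> 4) & 3
    return (1 << io_rom) + (2 << ram)
```

With one port/ROM cycle for every two RAM cycles, the two functions agree for all sixteen settings. With any other ratio, such as one of each, they diverge (3 versus 5 at TEST = $1A), which is the kind of modified test that would tell the two models apart.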
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

Well, my attempts at decoupling it have failed. If I try to tick the timers per wait state, it doesn't matter how many ticks are required per iteration for stage0->stage1 tick, the test always fails. And it won't tell me the value of the failed test because that would be much too convenient, wouldn't it?

Hopefully you'll have better luck than I in getting the timers to work as a side effect of improving TEST.d4-d7 as nocash stated.
Revenant
Posts: 462
Joined: Sat Apr 25, 2015 1:47 pm
Location: FL

Re: SPC700 instruction cycle breakdown

Post by Revenant »

I really should have given the last test ROM a less clever (or at least better-punctuated) name, because thanks to my day job, I keep reading it as "SMPTEsttest".

Anyway,
AWJ wrote:Sorry Revenant, I've got one more test for you to run. This one can use exactly the same CPU-side program as before, only the SMP-side is different:
http://revenant1.net/smpidletest_bmi.sfc
http://revenant1.net/smpidletest_bne.sfc

http://imgur.com/a/E2Q8O

(spoiler alert: they're the same)

$CA and up actually work this time (with considerably higher values than I was expecting), but I assume that's due to the test register getting restored before going back to IPL land. $8A, etc. are still problematic.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

Yeah, something very strange is happening with $CA and up on your SNES--probably no coincidence that the previous ROM and blargg's test ROMs lock up on your machine.

Anyway, let's just look at $0A, $1A, $4A and $5A. Translating the cycle counts into decimal:

Code: Select all

0A (IO 0, RAM 0): 666 cycles
1A (IO 0, RAM 1): 930 cycles
4A (IO 1, RAM 0): 1061 cycles
5A (IO 1, RAM 1): 1324 cycles
As expected, 5A takes almost exactly twice as many cycles as 0A. 1A and 4A take intermediate amounts of cycles, which means that some of the cycles in the loop are IO/ROM cycles even though the loop is running entirely out of RAM. We can use the ratios of these four cycle counts to determine how many of the cycles are IO/ROM and how many are RAM.

Let x be the number of cycles out of the 15 cycles in the loop that are IO/ROM cycles. Then, solve either of these linear equations for x:

Code: Select all

666 * x / 15 + 1324 * (15 - x) / 15 = 930 (solution: x = 8.98)
666 * (15 - x) / 15 + 1324 * x / 15 = 1061 (solution: x = 9.00)
How about that, a nice round number. It looks like 9 cycles are IO/ROM cycles and the remaining 6 cycles are RAM cycles. My educated guess would be that the first [PC+1] cycle of each one-byte instruction is a "real" read cycle (with speed depending on what memory it's accessing), and the additional idle cycles of the mul ya (and presumably other one-byte instructions that take 3 or more cycles) and the two idle cycles of a taken branch are IO/ROM cycles.
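
The ratio math can be written out as a short Python sketch. Both functions are just the two linear equations above rearranged for x; the function names are made up for illustration, and the 15-cycle loop length is the figure assumed in the post.

```python
def io_cycles_when_ram_slowed(base, both_slow, measured, loop_cycles=15):
    """Solve base*x/n + both_slow*(n - x)/n = measured for x (e.g. TEST = $1A)."""
    return loop_cycles * (both_slow - measured) / (both_slow - base)

def io_cycles_when_io_slowed(base, both_slow, measured, loop_cycles=15):
    """Solve base*(n - x)/n + both_slow*x/n = measured for x (e.g. TEST = $4A)."""
    return loop_cycles * (measured - base) / (both_slow - base)

# Counts from the table above: $0A = 666, $1A = 930, $4A = 1061, $5A = 1324.
x_from_1a = io_cycles_when_ram_slowed(666, 1324, 930)
x_from_4a = io_cycles_when_io_slowed(666, 1324, 1061)
```

Both estimates land within a couple of hundredths of 9, so 9 IO/ROM cycles and 6 RAM cycles is the natural integer reading.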

I think that's what nocash is getting at with his chart--it's meant to be the number of idle cycles of each instruction that are always IO/ROM cycles. But some of the numbers in his chart don't make a lot of sense (mainly the conditional branches and some of the stack instructions) and will have to be re-checked.

Anyway, now we have a straightforward method to test the idle cycles of any SPC700 instruction we want :mrgreen: Design a loop that runs only that instruction and ones with known timing, run it with TEST = 0A, 1A, 4A and 5A, and do the ratio math.

ETA: Another educated guess I would make is that "always IO/ROM" idle cycles are "real" idle cycles that don't affect internal SMP registers (i.e. don't reset the timers) regardless of what address appears on the external bus.

Just for byuu, here's a sequence to test the mov (x)+ and mov (x) instructions:

Code: Select all

    mov $f1,#$b0        ; enable IPL ROM; clear input ports
    mov $00,#$00        ; set up counter for loop
    mov $f4,#$00        ; clear output port $F4
:     mov a,$f5
    beq :-
    mov $f0,$f4         ; set TEST
:     mov x,#$01        ; (fetch, operand)
      mov a,(x)+        ; or mov (x)+,a / mov a,(x) / mov (x),a in the other variants (fetch, ?, ?, ?)
      inc $00           ; (fetch, operand, read, write)
    bne :-              ; (fetch, operand, idle, idle)
    mov $f0,#$0a        ; reset TEST to default
    jmp $ffc9           ; jump directly to IPL ROM handshake (skip RAM clear)
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

As-is (with RAM vs ROM/IO timing supported):
Image

With the two idle cycles in branch instructions, plus all but one in the MUL instruction, changed to always-ROM/IO:
Image
(note that this is not concrete proof that branches are 2, multiplies are 7.)

Given that emulation doesn't have the lockup conditions, these test ROMs can go ahead and test 8a,9a,aa,ba. Or if we just want to home in on these cycle counts, we only really need to test 0a-7a, which seems much more reliable. Clearly ca-fa is behaving pathologically on Revenant's SNES console.

But ... this is just getting insane. Rabbit holes are fun, but this is so far beyond anything that will ever be useful to any degree. It was bad enough that I/O cycles became read cycles with -very- strange addresses, but now there's some new mystery where some are secretly 'true' I/O cycles (even though the addresses show up on the bus) and some aren't, meaning Overload's logic analyzer can't even reveal this?

And apparently nocash's table won't work for us, as there are obvious errors like with BNE/BMI, and no test ROMs to verify any of it. So we pretty much have to scrap the whole thing.

Are we really going to try and emulate this exactly, and write test ROMs to time every single SPC700 instruction (well, you can break them into addressing mode groups, so probably 60 or so tests) ... even the ones that are insanely hard to test like TCALL, RTI, etc (and not test SLEEP/STOP since that's impossible)? Is Revenant going to be up for writing all of those tests? Ideally, we need it to be automated, too. Just dumping numbers on the screen is too laborious. Yet due to CPU/APU oscillator differences, we can't do exact matches. So he'll have to use ranged values. If the value is within +/- ~5%, it's a pass. Otherwise it's a failure.
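
A ranged check like that is only a couple of lines. Here is a minimal Python sketch, assuming the roughly 5% window suggested above; the function name and the idea of comparing against a single reference count are illustrative assumptions, not an agreed test format.

```python
def within_tolerance(measured, expected, tolerance=0.05):
    """True if measured lies within +/- tolerance (a fraction) of expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return abs(measured - expected) <= tolerance * expected
```

For instance, a count of 690 would pass against a reference of 666, while 930 would not.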

And even if we do go this far ... we're still not going to know -which- cycles in each instruction are forced to use the ROM/IO timing, and which can use RAM timing. Yes, we can probably guess correctly, but they're gonna be just that ... guesses. But then I suppose it really doesn't matter if side effects are impossible to observe anyway.

One thing is ... I'd like to propose we stop calling TEST.d4,d5 "RAM timing" and TEST.d6,d7 "ROM/IO timing." It doesn't fit with this new extra behavior. Now it's more like "ROM/IO/IDLE timing," which is too clunky. I think we should change it to d4,d5 = external wait-states/timing, d6,d7 = internal wait-states/timing. The APU RAM is attached to the DSP (or really, all that matters here is that it's not inside the SMP itself), and the SMP goes through the DSP to get to it, whereas IPLROM, I/O, and true idle are all internal to the SMP. Are you guys in agreement with that? If not, please suggest something better :)
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

byuu wrote:Are we really going to try and emulate this exactly, and write test ROMs to time every single SPC700 instruction (well, you can break them into addressing mode groups, so probably 60 or so tests) ... even the ones that are insanely hard to test like TCALL, RTI, etc
We only need to test the instructions that have at least one "idle" cycle, and if we start seeing obvious patterns we can skip/assume a lot of them. TCALL is not hard to test at all; just disable IPL ROM and put a suitable vector into high RAM. RTI/RTS are easy to test once we know the timing for PUSH.

Frankly I think blargg-style pass/fail test ROMs are dumb and harmful, and I'd be surprised if you didn't agree after all the trouble you've gone through trying to make your Game Boy both pass various blargg tests and run difficult commercial games at the same time. Also, look how bsnes passed all of blargg's SMP $F0 test ROMs for ten years despite having a completely incorrect conception of what the register is actually doing. Test ROMs are all well and good, but doing an opaque series of operations, CRC32ing the results (adding a further layer of opaqueness) and finally printing either "pass" or "fail" doesn't contribute to hardware understanding, it just encourages emulator authors to game the tests.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

> We only need to test the instructions that have at least one "idle" cycle, and if we start seeing obvious patterns we can skip/assume a lot of them.

Okay, so then the question is ... are we really going to do this? We need to determine:

OR1,EOR1 addr:bit
MOV1 addr:bit (may not be the same as the former)
Absolute indexed
Branch
Branch on bit
CBNE, DBNZ
DBNZ Y-- (quite different from the others; though we should really test all four separately)
BRK
CALL absolute
PCALL
TCALL
CMC
DAA,DAS
MOVW
CMPW
CMP immediate to direct page
Direct
Direct indexed
DIV
XCN
CLR<flag>,SET<flag>
Implied instructions (NOP, CMC, TAX [mov x,a], etc)
Indexed indirect
Indirect indexed
(x) (should do reads and writes separately)
(x)+ (should do reads and writes separately)
(x),(y) (should do CMP (x),(y) separately but could go off other cases from before)
JMP indirect,x
MUL
PUSH
POP
RTI
RTS
TSB,TRB

STOP (untestable)
SLEEP (untestable)

The current test from Revenant is no good as there's more than one instruction with idle cycles, and right now, we need to start from nothing if we're gonna do this right. The first test should only test one instruction and be unambiguous.

And as you said, certain ones need to be tested before we can test others.

> Also, look how bsnes passed all of blargg's SMP $F0 test ROMs for ten years despite having a completely incorrect conception of what the register is actually doing.

Well that's emulation in a nutshell. I could be working on higan until I keel over at the age of 97 (as if, have you seen my diet?), and there would still surely be plenty of things completely wrong in the SNES core.

Let's not lose sight of the fact we're exhausting weeks of effort here to emulate a TEST register that not one single game, licensed or not, ever actually uses. The only things that use TEST are, unsurprisingly, test ROMs.

I'm not saying it's not worth the effort. But let's not pretend that this is a serious flaw in emulation either.

> Test ROMs are all well and good, but doing an opaque series of operations, CRC32ing the results (adding a further layer of opaqueness)

I've ranted plenty about CRC32s. blargg's DMG APU ones in particular that print ten pages of numbers, then a PASS/FAIL CRC32. The source code doesn't have a single comment on what the test is doing, what it's showing, or how to pass it. I get the impression blargg himself didn't actually know.

I actually offered a $50 bounty if anyone could help pass his tests in my DMG core, and had no takers. I gave up after a week of trying, but I suspect the true answer is finer grained cycle timing for latching registers and such.

> it just encourages emulator authors to game the tests.

You can find dozens of rants by me about emudevs being encouraged to pass tests blindly at the expense of working on actually important things; and this situation being hurt by sites like tasvideos ranking emulators based on a raw percentage of how many test ROMs they pass. (And I say this as someone who gets a 100% on the SNES tests list, so you know it's not just me being bitter about scoring badly.)

Never mind that one test is "basic ADD/SUB flags are correct" and the other is "correct number of dummy read cycles versus internal I/O cycles for ADD when using SMP TEST register." If those were the only two tests, each would be worth 50% of your total score. That's insane.

However ... I still want these test ROMs for emudev use. I don't want to keep a big text document to familiarize myself with each test every time I run it, and what I should generally be expecting, and I want an easy regression tester. I may clean up the SMP core again in another five years, and I want a set of ROMs I can run to hopefully catch any mistakes I've made.