SPC700 instruction cycle breakdown

AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

SPC700 instruction cycle breakdown

Post by AWJ »

I just noticed this document on kindred's homepage:

http://www.crazysmart.net.au/kindred/fi ... nst_op.pdf

Is this table based on some official Sony document or on logic analyzer traces? The total cycles taken by each instruction agree with bsnes/higan, but the order of operations for many instructions (particularly complex ones like the calls) is completely different.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: SPC700 instruction cycle breakdown

Post by lidnariq »

Seems likely, given the "Kris Bleakley 2014" in the corner, that he'd be the person to chase down to ask...
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

Is that Overload? Just want to make sure before I send a PM...
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: SPC700 instruction cycle breakdown

Post by lidnariq »

Whois data for the domain implies yes.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

Yes, Kris Bleakley is Overload. He is the author of Kindred (formerly Super Sleuth.)

The cycle timings in higan are from blargg. He wrote test ROMs that would test every possible instruction (obviously ones like SLEEP/STOP were not possible), and would report errors if your cycle orderings were wrong.

I can't recall if he said he tested CALL instructions and the like, or just the ALU instructions. I do remember that INCW/DECW were the most surprising ones of all (to me, anyway. I get why it's nicer for the CPU design, but boy is DECW weird.)

Unfortunately, I lost those test ROMs, and nobody else seems to have them.

So, in the absence of a test ROM to the contrary, I'd say we should stick with blargg's results that are in higan.

(Aside: I'm still really disappointed there's no external pin for the S-SMP to trigger interrupts. I would really like to emulate SLEEP properly.)
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

Okay, "many instructions" appears to have been an exaggeration. As far as I can see after a bit more comparison, there are two differences between bsnes behaviour and Overload's document:

1) According to Overload, there's no such thing as an "internal operation" on the SPC700. Like the 6502, every cycle is either a read or a write; the ones which bsnes treats as "internal" are mostly re-reads from the previously read address. However, there's apparently something special about the MOV (X)+,A opcode--I've PMed Overload to ask for clarification, but it looks like he's saying it does do a dummy read from (X) (just like all other store instructions do) but that read somehow bypasses the internal registers, thus it can't be detected using the reset-on-read timer registers.

2) Call and return instructions; the timing of the stack accesses is totally different between bsnes and Overload's document. Verifying these via pure software testing would be stupendously difficult, because the stack is fixed to page $01 and there aren't any MMIO registers there. You'd have to point one of the DSP voices at stack memory, play a sample timed such that the DSP reads the stack just as the SMP is writing to it, and examine the echo buffer. Frankly the timing for these instructions in bsnes looks like guesswork to me (the "internal" operations in RET come after the stack pops? Highly unlikely) and Overload's looks a lot more realistic.

On a tangent, I notice one of the fork maintainers that you haven't yet banned from your forum :lol: called you out for something that I think I've also pointed out (and I've changed in bsnes-classic). higan treats the SMP "output" communication ports (the ones the S-CPU can read via $2140-2143) as if the S-CPU can directly read four bytes of APU RAM, which seems utterly impossible hardware-wise. If I understand the APU correctly (I've never done any SNES sound programming, or sound programming period) here's a fairly easy way to test whether higan is correct:

- On the SMP, write something distinctive to $F4-$F7, e.g. DE AD BE EF
- Set the DSP echo buffer so that it wraps around and spills into zero page (e.g. start at $F900 with a size of $800)
- Play a nice complex looping sample with echo enabled
- Once the sample is playing, on the S-CPU side, read $2140-2143 in a loop and display the read values onscreen

If higan's implementation is correct, the values read by the S-CPU will constantly change as the DSP writes into the echo buffer. If I'm correct, the S-CPU will only see the values written by the SMP in step 1.
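
To sanity-check the addresses in step 2, here is a minimal sketch of the wraparound arithmetic. It assumes the echo buffer is one contiguous region that wraps modulo the 64 KiB of APU RAM; `echoCovers` is a made-up helper, not something from higan or bsnes-classic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if a wrapping echo buffer that starts at `start`
// and spans `size` bytes includes `addr`.
// Assumption: the buffer wraps modulo 0x10000 (64 KiB of APU RAM).
inline bool echoCovers(uint16_t start, uint32_t size, uint16_t addr) {
  assert(size <= 0x10000);
  // Offset of addr from the buffer start, wrapped to 16 bits.
  uint16_t offset = static_cast<uint16_t>(addr - start);
  return offset < size;
}
```

With a start of $F900 and a size of $800 the buffer spans $F900-$FFFF and then $0000-$00FF, so `echoCovers(0xF900, 0x800, 0x00F4)` holds and $F4-$F7 fall inside it.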

I can write a program to do this, but someone who's already got a bit of experience with APU programming (Revenant?) can probably do it a lot more easily and quickly.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

> According to Overload, there's no such thing as an "internal operation" on the SPC700. Like the 6502, every cycle is either a read or a write; the ones which bsnes treats as "internal" are mostly re-reads from the previously read address.

That's ... I really have trouble believing something that significant snuck by everyone for that long, especially with blargg's tests to determine individual cycle ordering for SPC700 opcodes. To be clear, I'm not saying Overload is wrong. I would just be stunned that blargg missed that detail when he tested this.

> and Overload's look a lot more realistic.

I can't say whether mine are right or wrong, but I know better than to design things based on how they "should" work. That proves to actually be correct approximately 15% of the time.

If Overload confirms he used a logic analyzer to trace these out, or if he made some test ROMs he can provide, then that's evidence enough to go with his design. If not, then both approaches are guessing, and we should add this to a list of things that need hardware verification.

> higan treats the SMP "output" communication ports (the ones the S-CPU can read via $2140-2143) as if the S-CPU can directly read four bytes of APU RAM, which seems utterly impossible hardware-wise.

You may be right. Prove to me it's wrong and I'll make the change. Again, the guiding philosophy with the SNES core was, "don't make changes because they feel right, make them after verifying on hardware." I'm not changing that methodology after hitting 100% compatibility with this approach.

You're welcome to make changes you haven't verified are correct with your fork of v073 from 2010 if you like.

> I notice one of the fork maintainers that you haven't yet banned from your forum

If you're gonna bring that up here, you should mention why I did that:

https://board.byuu.org/viewtopic.php?p=12684#p12684
https://board.byuu.org/viewtopic.php?p=8333#p8333

If I let you talk to me like that, then I'd have to let everyone on my forum talk to me like that, or admit that helpful people get special privileges.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

byuu wrote:That's ... I really have trouble believing something that significant snuck by everyone for that long, especially with blargg's tests to determine individual cycle ordering for SPC700 opcodes. To be clear, I'm not saying Overload is wrong. I would just be stunned that blargg missed that detail when he tested this.
Almost all the "internal operations" indicated by Overload either read from PC or from the stack. The stack never has side effects on reads, and the PC only does if you're executing code out of the timer registers, which is fairly impossible to do in a controllable way.

I spotted another group of instructions where higan differs from Overload's doc: the (indirect),y addressing mode.

higan does:

Code: Select all

read direct page address
idle
read indirect address LSB
read indirect address MSB
read data
(write data)
Overload has:

Code: Select all

read direct page address
read indirect address LSB
read indirect address MSB
idle (re-read indirect address MSB)
read data
(write data)
For what it's worth, Overload's version matches the 6502 and 65816 (though the idle cycle is conditional on those CPUs). It also makes more sense: Y is added to the indirect address, not to the direct page address.
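
As a rough illustration of Overload's ordering, here is a sketch that logs the address seen on each cycle of an (indirect),y read. `ToyBus` and `readIndirectY` are invented names, the opcode fetch is left out as in the lists above, and the code assumes the direct page is at $00xx and that the pointer's MSB address wraps within that page. It is not taken from higan.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy bus (illustrative only): 64 KiB of RAM plus a log of every address read.
struct ToyBus {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000, 0);
  std::vector<uint16_t> log;
  uint8_t read(uint16_t addr) {
    log.push_back(addr);
    return ram[addr];
  }
};

// (indirect),y read, cycle order per Overload's document.
// Assumptions: direct page at $00xx; the MSB address wraps within the page.
inline uint8_t readIndirectY(ToyBus& bus, uint16_t& pc, uint8_t y) {
  uint8_t dp = bus.read(pc++);                       // read direct page address
  uint8_t lo = bus.read(dp);                         // read indirect address LSB
  uint16_t msbAddr = static_cast<uint8_t>(dp + 1);
  uint8_t hi = bus.read(msbAddr);                    // read indirect address MSB
  bus.read(msbAddr);                                 // idle (re-read MSB, value discarded)
  uint16_t target = static_cast<uint16_t>(((hi << 8) | lo) + y);
  return bus.read(target);                           // read data
}
```

The only difference from the higan ordering is where the discarded cycle sits: here it comes after both pointer bytes have been read, which is when Y can actually be added to the 16-bit address.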

This document dates to 2007-04-21. That date corresponds fairly closely with the release of bsnes 0.020, which has this in the changelog:
- Corrected all S-SMP cycle timings to be hardware accurate. Thanks to blargg for creating an amazing test ROM that tested every possible opcode
All of the cycle timings in higan except one match this document exactly, including the ones which are indicated in the document as "guesses" (i.e., you guessed it, the various stack instructions). The one place higan doesn't match the 2007 document is (indirect),y (the document matches Overload's findings). I wonder if it's possible that you used the (indirect,x) timings for (indirect),y by mistake?
byuu wrote:If Overload confirms he used a logic analyzer to trace these out, or if he made some test ROMs he can provide, then that's evidence enough to go with his design. If not, then both approaches are guessing, and we should add this to a list of things that need hardware verification.
I asked Overload via PM and he says he used a logic analyzer.
byuu wrote:You may be right. Prove to me it's wrong and I'll make the change. Again, the guiding philosophy with the SNES core was, "don't make changes because they feel right, make them after verifying on hardware." I'm not changing that methodology after hitting 100% compatibility with this approach.
That was meant as a plea to anyone else reading this thread to write that test ROM for me, not a request for you to change higan.
Revenant
Posts: 462
Joined: Sat Apr 25, 2015 1:47 pm
Location: FL

Re: SPC700 instruction cycle breakdown

Post by Revenant »

I can write a test ROM for the I/O behavior tomorrow night if nobody else does.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

Revenant wrote:I can write a test ROM for the I/O behavior tomorrow night if nobody else does.
Thanks, that would be great. For testing purposes, if you revert this commit in bsnes-classic then it should behave the same way as higan (i.e. CPU reads from $2140-2143 will access the actual APU RAM)
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

Note that I reply in the order I read things. Later comments are after having read more.

> Almost all the "internal operations" indicated by Overload either read from PC or from the stack.

I noticed that in his PDF. But until he chimes in here to tell us he's confirmed this on real hardware (and hopefully how), it's conjecture.

> The stack never has side effects on reads, and the PC only does if you're executing code out of the timer registers, which is fairly impossible to do in a controllable way.

If it's completely impossible to observe then, loath as I am to say this (I'll suppress my gag reflex for it) ... it's not really relevant to emulation ... so we'll need a logic analyzer to prove things before I'll change them.

If it is possible, then we can make a test ROM to prove the behavior.

> I wonder if it's possible that you used the (indirect,x) timings for (indirect),y by mistake?

Probably. It's also possible that the test ROM from blargg didn't test (indirect),y; or that, incredibly, the SPC700 doesn't match the 65xx (since, you know, it's not a 65xx. Just a cleanroom clone of one with disguised opcode mnemonics.) I sure wish someone had that test ROM from him still.

> I asked Overload via PM and he says he used a logic analyzer.

Any chance he can chime in here himself?

I believe you, but I'd like the official record to be more than "a guy said another guy said he did that in a private message. Go find a random thread on an NES development forum to see for yourself."

I'm sorry to be a pain about this. You need to understand that I tried the opposite approach of just adding things that felt right to my Game Boy core, because it's not a system I'm set up to run my own hardware tests on and well ... it's been a disaster. Compatibility is worse today than it was when I started.

But ... again, I'll trust you and start working on this change. The way I usually do this is to change idle() so that it does a dummy read from PC without incrementing. For the weird ones where it reads from some other place instead of idling, we can replace idle() with the reads they are doing instead.

> That was meant to be a beg to anyone else reading this thread to write that test ROM for me, not to you to change higan.

That's fair! It's good to raise issues about corner cases instead of just assuming we are right already.

And just to reiterate the above again, I'm not saying I am right. If I am, I'm not going to gloat. If I'm not, I'll fix it and thank you and the tester for finding the issue. I just want proof before making changes.

I know every bsnes/higan forker loves to put passive aggressive comments into their Git changelogs, but one of these days ... even if you guys are right most of the time ... eventually, you're gonna be wrong. And now you're gonna have a comment about how it's literally impossible my behavior was correct, and lo and behold.

Notice how I never say in absolute terms that I'm right. That's because I don't know until I confirm it for myself. It's really better to be cautious about these things. No matter how unreasonable you think I am, everyone can see for themselves how SNES emulation used to be prior to my unorthodox approaches.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

(sorry for double post, but the former was too long already.)

Went through the document, tried to identify all discrepancies, noted concerns with Overload's document.

Overall, I don't think idle() is going to cut it. Even in the cases where it's a direct idle, the address bus indicates it's re-reading the last fetched byte. So it sounds like having a memory address register to keep track of the last read-from address would do the job ... but then there are cases like CALL that seem to have an idle cycle that reads from the stack *before* actually reading from the stack.
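
Here's roughly what I mean by a memory address register, as a sketch only. `SmpBus`, `mar` and `addressLog` are placeholder names, and whether writes should also update the register is left open because I don't know.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bus wrapper: remembers the last address read so an idle
// cycle can show up as a re-read of it.
struct SmpBus {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000, 0);
  uint16_t mar = 0;                  // last read-from address (assumed name)
  std::vector<uint16_t> addressLog;  // address visible on each cycle

  uint8_t read(uint16_t addr) {
    mar = addr;
    addressLog.push_back(addr);
    return ram[addr];
  }

  // Plain idle: the bus repeats the last read address.
  void idle() { addressLog.push_back(mar); }

  // Idle for cases like CALL, where the address shown (e.g. the stack)
  // has not been read yet and must be supplied by the instruction.
  void idle(uint16_t addr) {
    mar = addr;
    addressLog.push_back(addr);
  }
};
```

The two-overload split is the part that bothers me: the plain `idle()` could live in the bus handler, but the explicit-address form still has to be called from the instruction itself.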

So right now, I'm not really sure the best way to handle these idle cycles correctly.

My biggest technical concern right now is ... when there are multiple reads from the same address, how do we know -definitively- which one is the value actually used? Like say you fetch the direct page address from PC+1, then there's another read from PC+1 ... what if the value differs between the two reads? Which read matters? I know, not likely, but still ... I'm guessing the second case.

Discrepancies found:

Code: Select all

AbsoluteBitModify:
  idle() => read(absolute)

AbsoluteIndexedWrite:
  consistency cleanup: perform absolute+index in-place

DirectWriteDirect:
  idle() => load(target)

DirectWriteImmediate:
  idle() => load(direct)

DirectReadWord:
  idle() => load(direct+0)

IndirectIndexed(Read,Write):
  idle() => load(direct+1)

IndirectXIncrement(Read,Write):
  second idle() => ... load(X) again?
  something about not affecting registers on the cycle before ...

IndirectXWriteIndirectY:
  second idle() => load(X)

Pull,PLP:
  second idle() => reads from SP without incrementing SP ...

Push:
  second idle() => unknown purpose

BBC,BBS,BNEDirect,BNEDirectDecrement,BNEDirectX:
  second fetch(), idle() => wrong ordering

BRK:
  move first idle() to top of function
  push PC, P before reading VA

JSPDirect:
  second idle() => goes after push; reads from SP

JSRAbsolute:
  second and third idle() => goes after push; reads from SP

JST:
  two idle()s go to top, one after read(absolute)

RTI,RTS:
  two idle()s go to top; second is a stack read

STW:
  consistency cleanup: turn into StoreWord(A)
Document concerns:

Code: Select all

  when there is an operand fetch followed by an idle cycle (eg DO fetch) ...
  Which fetch actually counts for DO? The first or the second?
  Same question for 19b

  17. Implied i: it's not made clear which instructions take how many cycles.
  I know the answers, but they should be documented, yes?

  19b. Relative rel: what does it mean for the data bus to have Offset on cycle 4?

  20b. Stack s
  Where is cycle 3 reading from?

  CLR, SET addr:bit do not appear to be documented?

  what the christ is with condition (5)? Madness >_>
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

byuu wrote:what the christ is with condition (5)? Madness >_>
I wonder if all the redundant reads are actually artifacts of the memory controller that Sony bolted on (remember that most SPC700 chips are MCUs with completely internal RAM and ROM and no external bus at all) and don't affect the internal registers. I'm waiting for a PM from Overload confirming my speculation, but based on the number of pins connecting the SMP to the DSP, it looks like it's a 6502-style bus, which has no way to distinguish a "read" from an "internal operation".

i.e. the SPC700 core itself is doing an idle cycle, and its internal registers don't see it as a read, but the external bus has no way to indicate that, so from outside the chip it looks like a re-read of the last address that was read from (the address pins are effectively "open bus").

In that case it would be safe to emulate them as true idle cycles, because no addresses that aren't SMP-internal have any side effects on reads.
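
A minimal sketch of that theory, under the assumption that $FD-$FF are the reset-on-read timer outputs and that an idle cycle never reaches them. `ToySmp` and its members are made up for illustration and this is speculation until Overload confirms it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (illustrative only) of the "idle bypasses internal registers" idea.
struct ToySmp {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000, 0);
  uint8_t timerOut[3] = {0, 0, 0};  // assumed: timer outputs at $FD-$FF
  uint16_t lastAddr = 0;            // what the external address pins show

  // Real read: timer outputs are cleared as a side effect.
  uint8_t read(uint16_t addr) {
    lastAddr = addr;
    if (addr >= 0x00FD && addr <= 0x00FF) {
      uint8_t value = timerOut[addr - 0x00FD];
      timerOut[addr - 0x00FD] = 0;
      return value;
    }
    return ram[addr];
  }

  // Idle: the pins still show lastAddr, but no internal register is
  // accessed, so nothing is cleared. Returns the address on the pins.
  uint16_t idle() const { return lastAddr; }
};
```

If this is right, a logic analyzer sees `lastAddr` repeated on the bus while software on the SMP sees no extra timer reset.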

Semi-offtopic: for the 6502, 65816 and SPC700, the opcode switch tables look a lot nicer and more logical if instead of putting the opcodes in numerical order you arrange them like this:

Code: Select all

case 0x00:
case 0x20:
case 0x40:
case 0x60:
case 0x80:
case 0xa0:
case 0xc0:
case 0xe0:
case 0x01:
case 0x21:
[....]
case 0x1f:
case 0x3f:
case 0x5f:
case 0x7f:
case 0x9f:
case 0xbf:
case 0xdf:
case 0xff:
It shouldn't make any difference to code generation/performance, because the compiler should see that there are 256 contiguous cases (even if they're in jumbled order) and compile it as a single indexed indirect jump.
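
The ordering above is just "low five bits vary slowest, top three bits vary fastest". A small sketch that generates it (the function name is made up):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the case order shown above: 0x00, 0x20, ..., 0xe0, 0x01, 0x21, ...
inline std::array<uint8_t, 256> groupedOpcodeOrder() {
  std::array<uint8_t, 256> order{};
  for (int i = 0; i < 256; ++i) {
    int low = i / 8;   // low five bits of the opcode, 0..31
    int high = i % 8;  // top three bits of the opcode, 0..7
    order[i] = static_cast<uint8_t>((high << 5) | low);
  }
  return order;
}
```

Each run of eight consecutive entries shares the same low five bits, which is what makes the groups line up in the switch.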
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: SPC700 instruction cycle breakdown

Post by Near »

> i.e. the SPC700 core itself is doing an idle cycle, and its internal registers don't see it as a read, but the external bus has no way to indicate that, so from outside the chip it looks like a re-read of the last address that was read from (the address pins are effectively "open bus").

That's a good working theory. It would also answer my question: the first read of an address would supply the value that's used, not the second.

What I'm wondering is if we can find a way to emulate this so that we can just say idle(); without having to say -where- it's reading every time. That would be really nice. Even better, if we can get the logic out of the SPC700 CPU core and into the bus handler of the SMP class. The main limitation there is instructions like RETI, where it's reading from the stack pointer -before- the first increment on its idle cycle ... a value that wouldn't be on the bus from a previous read.

> Semi-offtopic: for the 6502, 65816 and SPC700, the opcode switch tables look a lot nicer and more logical if instead of putting the opcodes in numerical order you arrange them like this:

I agree completely. The reason I went numerically was to fill in the table initially, where it would be very easy to spot any missing instructions. I never really bothered to redesign the tables to a grouped ordering after finishing. Partly because there'd be bikeshedding about which groups should go before which other groups.

For similar bikeshed-avoidance reasons, I've been ordering my actual instruction implementations alphabetically.

Well in any case if you haven't seen it and were interested, I've been cleaning up my SPC700 core since the start of v102 or so. Still a work in progress, but you can see it here:

https://gitlab.com/higan/higan/tree/mas ... sor/spc700

You probably won't be happy to find that I've reverted the BitField stuff. It didn't actually affect performance, and I didn't like that I couldn't pass BitField parameters to functions, because typeof BitField<0> != BitField<3>.

I did get rid of the "global state" aa, rd, dp, sp, bit, ya nonsense that was a hangover from when each cycle was a step inside of a state machine back in ... gods ... v020 or so.

The worst of it is still the disassemblers. The WDC65816 core is actually using snprintf still >_>
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: SPC700 instruction cycle breakdown

Post by AWJ »

I'm wondering whether it'd be better just to stick most of the instruction implementations directly inside the switch table, and only use functions for things that are actually reused across multiple instructions (the algorithms for adc/sbc/etc., and addressing modes that have the same timing across multiple instructions)

You can still parameterize SetFlag etc. with bitfields, you just pass the mask or the bit index (which are public static members of the bitfield classes) instead of a reference to the bitfield object. I.e. you can't do this:

Code: Select all

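// Does not work: each flag is a different BitField type, so no single
// reference parameter type accepts all of them.
void SetFlag(Flag &flag, bool value)
{
  flag = value;
}

SetFlag(C, true);
SetFlag(V, false);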
but you can do this:

Code: Select all

// Works: the flag is identified by its mask, a plain integer.
void SetFlag(uint8 mask, bool value)
{
  psw = (value ? psw | mask : psw & ~mask);
}

SetFlag(C.mask, true);
SetFlag(V.mask, false);
If you think the ".mask" is ugly you can hide it in a macro or something :mrgreen:
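
For completeness, a self-contained sketch of the same idea that compiles on its own. `FlagBit`, `Registers` and `SET_FLAG` are stand-ins I made up, not higan's actual classes, and the flags are modelled as types rather than objects; the bit positions (C in bit 0, V in bit 6) are the SPC700 PSW layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bitfield class: the bit index and mask are
// public static members, so they can be passed around as plain integers.
template <int Bit>
struct FlagBit {
  static constexpr int bit = Bit;
  static constexpr uint8_t mask = static_cast<uint8_t>(1u << Bit);
};

using FlagC = FlagBit<0>;  // carry
using FlagV = FlagBit<6>;  // overflow

struct Registers {
  uint8_t psw = 0;
  // Set or clear the flag selected by mask.
  void setFlag(uint8_t mask, bool value) {
    psw = static_cast<uint8_t>(value ? (psw | mask) : (psw & ~mask));
  }
};

// Optional sugar to hide "::mask" at call sites.
#define SET_FLAG(regs, Flag, value) (regs).setFlag(Flag::mask, (value))
```

Usage would look like `SET_FLAG(regs, FlagC, true);` or, without the macro, `regs.setFlag(FlagV::mask, false);`.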