It is currently Tue Oct 24, 2017 12:58 am

All times are UTC - 7 hours



Post new topic Reply to topic  [ 86 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6  Next
Posted: Tue Jun 27, 2017 3:18 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> I'm wondering whether it'd be better just to stick most of the instruction implementations directly inside the switch table

Especially if we used fallthrough and grouped ordering as you suggested previously, then one might as well. Doesn't really make much of a difference as I would sure hope that a single function call used only in one place would be inlined.

Of course, if you really want to compress the code, there's always the microcoding approach. It would make the code nearly impossible to understand, but e.g. Bisqwit managed to get a 6502 core (with illegal ops) into fewer than 100 lines of code that way (and no, none of the lines were over 80 characters).
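To make "microcoding" concrete, here is a toy sketch of the general idea: each opcode maps to a string of one-letter micro-ops, and one loop interprets them. This is not Bisqwit's code or layout; the micro-op letters, the `MicroCore` name, and the four-opcode table are all hypothetical, and flags and cycle timing are left out. Only the opcode byte values are real SPC700 encodings.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy microcoded core (hypothetical; not Bisqwit's design).
// Micro-ops:  'f' fetch next byte into T   'a' A = T   'x' X = T
//             'A' T = A                    'X' T = X
// Flags and cycle timing are deliberately omitted.
struct MicroCore {
  uint8_t A = 0, X = 0, T = 0;
  uint16_t PC = 0;
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000);
  std::map<uint8_t, std::string> microcode = {
    {0xe8, "fa"},  // MOV A,#imm
    {0xcd, "fx"},  // MOV X,#imm
    {0x7d, "Xa"},  // MOV A,X
    {0x5d, "Ax"},  // MOV X,A
  };

  // Fetch one opcode and run its micro-op string.
  void step() {
    uint8_t opcode = ram[PC++];
    for (char op : microcode.at(opcode)) {
      switch (op) {
      case 'f': T = ram[PC++]; break;
      case 'a': A = T; break;
      case 'x': X = T; break;
      case 'A': T = A; break;
      case 'X': T = X; break;
      }
    }
  }
};
```

The trade-off shows even at this size: adding an opcode is one table entry, but reading what an instruction does means decoding its string by hand.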

> only use functions for things that are actually reused across multiple instructions

I continue to dither on the instructions only used once.

Like, you have sta direct,x; stx direct,y; and sty direct,x, so I have storeDirect(uint8& reg, uint8& index). But there's only sta (direct+x), so is it better to write storeIndexedIndirect() (or just outright call it STA_IDPX or something), or storeIndexedIndirect(uint8& reg, uint8& index)?
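A minimal sketch of the two options, assuming a stripped-down core: the three-way store is parameterized on register and index, while the single-use (direct+x) store hard-codes A and X. The `Core` struct, `fetch()`, and the helper names are hypothetical, and the dummy bus cycles a cycle-accurate core would perform are omitted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical minimal core: only what the two store helpers need.
struct Core {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000);
  uint8_t A = 0, X = 0, Y = 0;
  uint16_t PC = 0;
  bool P = false;  // direct page select: $00xx or $01xx

  uint8_t fetch() { return ram[PC++]; }
  uint16_t page(uint8_t address) const {
    return uint16_t((P ? 0x0100 : 0x0000) | address);
  }

  // Shared by three instructions, so parameterizing pays off:
  //   storeDirectIndexed(A, X), storeDirectIndexed(X, Y), storeDirectIndexed(Y, X)
  void storeDirectIndexed(uint8_t& reg, uint8_t& index) {
    uint8_t address = fetch();
    ram[page(uint8_t(address + index))] = reg;
  }

  // Only one instruction has this shape, so A and X are hard-coded.
  void storeIndexedIndirectA() {
    uint8_t base = uint8_t(fetch() + X);
    uint8_t lo = ram[page(base)];
    uint8_t hi = ram[page(uint8_t(base + 1))];
    ram[uint16_t(lo | hi << 8)] = A;
  }
};
```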

> You can still parameterize SetFlag etc. with bitfields

Yeah, that's how I was doing it before.

> If you think the ".mask" is ugly you can hide it in a macro or something

I used to have a deathly allergy to any and all macros (save for the L lastCycle(); hack), but lately I think it's a whole lot nicer for the innards of a CPU core to be able to say "A, X, Y" instead of "regs.a, regs.x, regs.y" everywhere.
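One way to keep such shorthands from leaking is to define them right before the core's body and `#undef` them right after. This is a sketch under that assumption; `Registers`, `SPC700Sketch`, and the two instruction names are hypothetical, not higan's actual code.

```cpp
#include <cstdint>

struct Registers { uint8_t a = 0, x = 0, y = 0; };

struct SPC700Sketch {
  Registers regs;

  // Shorthands visible only between #define and #undef.
  #define A regs.a
  #define X regs.x
  #define Y regs.y

  void transferXA() { A = X; }              // expands to regs.a = regs.x
  void incrementY() { Y = uint8_t(Y + 1); }  // expands to regs.y = ...

  #undef A
  #undef X
  #undef Y
};
```

The `#undef` lines matter: single-letter macros would otherwise clobber any later identifier or template parameter named `A`, `X`, or `Y`.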


Posted: Wed Jun 28, 2017 7:04 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
Okay, I tried to update the SPC700 code to follow Overload's cycle timings. Hopefully I got everything right. He's missing some instructions and doesn't list cycle counts for some others, and Direct, X Relative (CBNE) has one cycle listed as RWB 0/1 that can only ever be 1 (it's only one instruction, and it never writes). And of course, I'm not perfect, sadly ._.

I am using idle(uint16 address) to indicate that the cycle is "internal", but it has the same functionality as "read".

I also had to lose the very handy fetch/load/store/push/pull helper functions, because those all fell back on regular reads, and three of them affected registers. Instead I moved to only using read/write with two functions, page(uint8) and stack(uint8) that just transform into a full 16-bit address, and use those inside read/write.

I'm not crazy about it, but this way is much easier than having no-increment/no-decrement versions of fetch/push/pull, plus some way to signal load/store as being idle cycles.
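A sketch of the shape described above, assuming only that `page()` and `stack()` turn an 8-bit offset into a 16-bit address and that `idle()` behaves like a discarded `read()`. The `Bus` struct, the cycle counter, and the initial stack pointer are illustrative; a push becomes `write(stack(S--), data)` and a pull becomes `read(stack(++S))`, with no dedicated helpers.

```cpp
#include <cstdint>
#include <vector>

struct Bus {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000);
  uint8_t S = 0xef;     // stack pointer (arbitrary initial value)
  bool P = false;       // direct page flag
  unsigned cycles = 0;  // one per bus access

  // Turn an 8-bit offset into a full 16-bit address.
  uint16_t page(uint8_t address) const {
    return uint16_t((P ? 0x0100 : 0x0000) | address);
  }
  uint16_t stack(uint8_t address) const { return uint16_t(0x0100 | address); }

  uint8_t read(uint16_t address) { cycles++; return ram[address]; }
  void write(uint16_t address, uint8_t data) { cycles++; ram[address] = data; }
  // "Internal" cycle: same bus behavior as read, result discarded.
  void idle(uint16_t address) { read(address); }
};
```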

Also dropped the address++ semantics and instead went with address+0, address+1, since sometimes you'd read from one of the two addresses twice. For both PC and S, I always inc/dec when doing the actual read anyway, and then use +1,-1 as appropriate to match the address bus locations in the PDF. In this way, the weirdness is isolated to the idle() cycle addresses.
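A small illustration of why "address+0, address+1" is easier than "address++" when an address is touched twice. The access pattern in `loadWordDirect()` is made up for the example and is not a real SPC700 timing; `WordCore` and the bus log are likewise hypothetical. Because `page()` takes a `uint8_t`, `address + 1` is truncated back to 8 bits, so $FF+1 lands on $00 in the same page (whether hardware does that for a given instruction is a separate question).

```cpp
#include <cstdint>
#include <vector>

struct WordCore {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000);
  std::vector<uint16_t> bus;  // address touched on each cycle
  bool P = false;

  uint16_t page(uint8_t address) const {
    return uint16_t((P ? 0x0100 : 0x0000) | address);
  }
  uint8_t read(uint16_t address) { bus.push_back(address); return ram[address]; }
  void idle(uint16_t address) { read(address); }

  // Illustrative pattern only: "address" is never mutated, so
  // re-reading address+0 after touching it once needs no bookkeeping.
  uint16_t loadWordDirect(uint8_t address) {
    uint8_t lo = read(page(address + 0));
    idle(page(address + 0));
    uint8_t hi = read(page(address + 1));
    return uint16_t(lo | hi << 8);
  }
};
```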

I merged more functions to reduce repetition, and expanded the three-letter acronym functions (XCN => ExchangeNibble).
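For reference, XCN swaps the high and low nibbles of A, with N and Z reflecting the result. A minimal sketch; the `Flags` struct and the free-function signature are assumptions, and bus cycles are not modeled.

```cpp
#include <cstdint>

struct Flags { bool n = false, z = false; };

// XCN A (ExchangeNibble): swap nibbles, then set N and Z from the result.
inline uint8_t exchangeNibble(uint8_t data, Flags& f) {
  data = uint8_t(data >> 4 | data << 4);
  f.z = data == 0;
  f.n = data & 0x80;
  return data;
}
```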

Probably the last redesign change I still want to make is to consider merging the read/modify/write variants with some sort of flag inside of them like "Compare/Write"; and then also use that to drop the op function comparisons I use to specialize some of the functions now.
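A hypothetical sketch of that merge: one read/modify/write helper takes a `writeBack` flag instead of comparing the `op` pointer against the compare routine. `RmwCore`, `opOr`, `opCompare`, and the single Z flag are illustrative only; which bus cycle replaces the write when `writeBack` is false is left open here.

```cpp
#include <cstdint>
#include <vector>

struct RmwCore {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x100);
  bool ZF = false;
  using Op = uint8_t (*)(uint8_t, uint8_t);

  // "op" computes the result; "writeBack" decides whether it is stored.
  void modifyDirect(uint8_t address, uint8_t operand, Op op, bool writeBack) {
    uint8_t result = op(ram[address], operand);
    ZF = result == 0;
    if (writeBack) ram[address] = result;  // a compare only updates flags
  }
};

inline uint8_t opOr(uint8_t a, uint8_t b) { return uint8_t(a | b); }
inline uint8_t opCompare(uint8_t a, uint8_t b) { return uint8_t(a - b); }
```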

As we know, the logic analyzer can't show us which read actually matters when the same address is read twice in a row, the first or the second one. This could end up wrong for any instructions like that, and I'm not sure there's a way we can ever find out which is correct.

I put source comments on the weird (x)+ cycle 3 case. We really need a test ROM for this one. spc700cyc.txt indicates the read version happens at cycle 3, the write version happens at cycle 4. I'd have expected both to happen on cycle 4, but then the read being at cycle 3 is probably how Overload observed this oddity: it was probably reading the underlying RAM value instead of the register value for mov (x+),a. If it were the discarded idle read, then he couldn't observe it was pulling underlying RAM.

https://gitlab.com/higan/higan/blob/mas ... ctions.cpp


Posted: Wed Jun 28, 2017 7:58 am

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
Which instructions are missing from Overload's list?

byuu wrote:
but then the read being at cycle 3 is probably how Overload observed this oddity: it was probably reading the underlying RAM value instead of the register value for mov (x+),a. If it were the discarded idle read, then he couldn't observe it was pulling underlying RAM.


No, the write is the one where the "bypassing registers" effect is noticeable. If you MOV (X)+,A and X is pointing at one of the timer registers, which are reset when read ($FD through $FF), that timer won't get reset at all. Anomie's 2007 document and nocash's fullsnes document both point out this anomaly (it's an anomaly because almost all writes on the SPC700 do a discarded read from the same address first and thus do reset the timers).

You've just made it so that MOV (X)+,A does reset timer registers, which is well known to be wrong!

There are two possibilities for how MOV A, (X)+ works which are indistinguishable with a logic analyzer and tricky but possible to distinguish with software testing:

Possibility 1:
Cycle 3 reads from underlying RAM and is discarded
Cycle 4 reads from RAM or an internal register and is kept

Possibility 2:
Cycle 3 reads from RAM or an internal register and is kept
Cycle 4 reads from wherever and is discarded (standard duplicate read)

To distinguish them, you'd have to start the instruction exactly 3 cycles before the timer being read is due to tick over, so that reading the timer on Cycle 3 and Cycle 4 will get different values. I assume that that's exactly what blargg did back in 2007, because in Anomie's document every instruction where the final read cycle can be made to read from a timer register is "confirmed by blargg"; it's only instructions that only read from the stack which are marked as guesses.
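The test idea can be modeled in a few lines: if the 4-bit counter ticks between cycle 3 and cycle 4, the two possibilities put different values in A. This is a hypothetical model; `TimerModel` and the two functions are illustrative, and real test code would have to achieve the cycle alignment on hardware.

```cpp
#include <cstdint>

// The instruction is started so the 4-bit timer output ticks between
// cycle 3 and cycle 4 of MOV A,(X)+.
struct TimerModel {
  uint8_t before;  // counter value up to and including cycle 3
  uint8_t valueAt(int cycle) const {
    return cycle <= 3 ? before : uint8_t((before + 1) & 0x0f);
  }
};

// Possibility 1: cycle 3 is a discarded RAM read; cycle 4's read is kept.
inline uint8_t possibilityOne(const TimerModel& t) { return t.valueAt(4); }
// Possibility 2: cycle 3's read is kept; cycle 4 is a discarded duplicate.
inline uint8_t possibilityTwo(const TimerModel& t) { return t.valueAt(3); }
```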

In short, I'm almost certain that possibility 2 is the one that's correct. It's consistent with Overload's findings to the extent that can be distinguished with a logic analyzer, and consistent with blargg's findings to the extent that can be distinguished via software testing.


Posted: Wed Jun 28, 2017 10:33 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> Which instructions are missing from Overload's list?

SLEEP, STOP, and I thought CLRn/SETn was missing, but it looks like it's there but being called CLR1/SET1.

I know, if you call SLEEP or STOP, you're never recovering anyway. But I'm wondering if it waits one idle cycle or two between each execution (plus the opcode fetch.) I copied the 65816 WAI's two idle cycles for now.

> No, the write is the one where the "bypassing registers" effect is noticeable.

Ah, I see. You're gonna have to forgive me, I have eight different emulated systems all rattling around inside my head, so some details are fuzzy these days. It's not like back in the bsnes days I'm afraid.

So then ... it's gotta be one of these two designs, right?

Code:
//mov (x),a
auto SPC700::instructionIndirectXRead(fpb op) -> void {
  idle(PC);
  uint8 data = read(page(X));  //this *WILL* reset the timers, and A will be the old timer value
  A = alu(A, data);
}

//mov a,(x)
auto SPC700::instructionIndirectXWrite(uint8& data) -> void {
  idle(PC);
  idle(page(X));  //this *WILL* reset the timers
  write(page(X), data);
}

//POSSIBILITY ONE

//mov (x+),a
//I have to swap cycles 3 and 4 from the Git repo
auto SPC700::instructionIndirectXIncrementRead(uint8& data) -> void {
  idle(PC);
  idle();  //this will not reset the timers (or even if it does, we won't notice due to next cycle)
  data = read(page(X));  //this *WILL* reset the timers
  ZF = data == 0;
  NF = data & 0x80;
}

//mov a,(x+)
auto SPC700::instructionIndirectXIncrementWrite(uint8& data) -> void {
  idle(PC);
  idle();  //this will not reset the timers, period
  write(page(X++), data);
}

//POSSIBILITY TWO

//mov (x+),a
auto SPC700::instructionIndirectXIncrementRead(uint8& data) -> void {
  idle(PC);
  data = readButNotFromIORegistersBecauseFuckYouThatsWhy(page(X));  //this will return internal RAM
  read(page(X++));  //this *WILL* reset timers, but we won't get the timer value into A
  ZF = data == 0;
  NF = data & 0x80;
}

//mov a,(x+)
auto SPC700::instructionIndirectXIncrementWrite(uint8& data) -> void {
  idle(PC);
  readButNotFromIORegistersBecauseFuckYouThatsWhy(page(X));  //this will *not* affect the timers
  write(page(X++), data);
}


> To distinguish them, you'd have to start the instruction exactly 3 cycles before the timer being read is due to tick over, so that reading the timer on Cycle 3 and Cycle 4 will get different values.

That sounds like it would work. Revenant, I don't suppose you'd be up for writing this test too? ^-^

> In short, I'm almost certain that possibility 2 is the one that's correct. It's consistent with Overload's findings to the extent that can be distinguished with a logic analyzer

It doesn't sound like the logic analyzer revealed any evidence supporting either possibility. It cannot determine which cycle's read is actually used internally.

> and consistent with blargg's findings to the extent that can be distinguished via software testing.

Has blargg really confirmed this exact scenario? Are we 100% positive of that? The million-dollar problem here is that blargg's test ROMs are gone.

If we had his ROM, I could fire it up on possibilities 1 and 2, and one of them should fail if he wrote a test as you described.

So here's the thing ... possibility one is way more sane and logical to me. But I also have 13 years of experience with the SNES shitting all over my notions of sane and logical, so I concur with you. Possibility two is the most likely because "fuck you (emudevs), that's why."

Still ... I'd really like to have a test ROM to prove this.


Last edited by byuu on Wed Jun 28, 2017 11:04 am, edited 1 time in total.

Posted: Wed Jun 28, 2017 11:00 am

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
Your possibility 2 for reading is wrong (it doesn't even increment X). It should be:

Code:
auto SPC700::instructionIndirectXIncrementRead(uint8& data) -> void {
  idle(PC);
  data = read(page(X)); //this *WILL* reset the timers, and A will be the old timer value
  idle(page(X++)); // this might or might not reset the timers a second time; it's almost impossible to test
  ZF = data == 0;
  NF = data & 0x80;
}


According to blargg, the read whose value matters is on cycle 3, not cycle 4, and like I said I'm assuming that he used the timers to test all these opcode timings.

If MOV A,(X)+ couldn't read the timers at all (it returned underlying RAM, or always returned 0 due to resetting the timer and then reading it again) I'm sure blargg would have noticed and would have made a note of such surprising behaviour.

ETA: also you've got the canonical mnemonics backwards. SPC700 is Intel style (destination first), not Motorola style (source first). MOV A,(X)+ is a load, MOV (X)+,A is a store.


Last edited by AWJ on Wed Jun 28, 2017 11:08 am, edited 1 time in total.

Posted: Wed Jun 28, 2017 11:25 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
... I can't handle this. Every single post changes small details and it keeps throwing me off.

> Your possibility 2 for reading is wrong

My possibility 2 does have X++. Possibility 1 doesn't, due to a copy paste typo.

> ETA: also you've got the canonical mnemonics backwards. SPC700 is Intel style (destination first), not Motorola style (source first).

Sigh. Okay, I'm just going to use target=source syntax, that way it's obvious what I mean and people don't have to keep track of which ordering every CPU uses.

...

Let's try this one more time. If this fails, then I'm giving up.

readIO() = this read DOES affect timers; returns RAM outside $f0-ff
readRAM() = this read does not affect timers; returns underlying RAM value

Note: when readRAM()'s read is discarded, it may be a true idle cycle that doesn't actually read.
However, this won't have any possible effect on emulation, so let's not worry about that.
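The two read kinds can be sketched as follows, assuming the only I/O side effect modeled is that reading $FD-$FF returns a timer counter and clears it. `Apu` and `counter` are hypothetical names, and the other $F0-$FC registers are deliberately left out, so `readIO()` falls back to RAM for them here, which real hardware does not do.

```cpp
#include <cstdint>
#include <vector>

struct Apu {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000);
  uint8_t counter[3] = {};  // 4-bit timer outputs seen at $FD-$FF

  // Touches I/O: $FD-$FF return the counter and clear it.
  uint8_t readIO(uint16_t address) {
    if (address >= 0xfd && address <= 0xff) {
      uint8_t value = counter[address - 0xfd];
      counter[address - 0xfd] = 0;
      return value;
    }
    return ram[address];  // other registers omitted from this sketch
  }

  // Never touches I/O: always the underlying RAM byte.
  uint8_t readRAM(uint16_t address) { return ram[address]; }
};
```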

For cycles 3 and 4, the possibilities are:

a=(x++) -- we are reading the memory at (X), storing it in A, and incrementing X
[Alpha] readRAM(x), a=readRAM(x++)
[Beta] readRAM(x), a=readIO(x++)
[Gamma] readIO(x), a=readRAM(x++)
[Delta] readIO(x), a=readIO(x++)
[Epsilon] a=readRAM(x), readRAM(x++)
[Zeta] a=readRAM(x), readIO(x++)
[Eta] a=readIO(x), readRAM(x++)
[Theta] a=readIO(x), readIO(x++)

(x++)=a -- we are storing the value of A into (X), and incrementing X
[Iota] readRAM(x), writeRAM(x++,a)
[Kappa] readRAM(x), writeIO(x++,a)
[Lambda] readIO(x), writeRAM(x++,a)
[Mu] readIO(x), writeIO(x++,a)

Many of these possibilities are nonsensical. I'm just elaborating every possible one so we can get this right.

Most likely, Kappa is correct for writes. Reads are more confusing: logically I'd say Beta, but the notes from blargg and Overload seem to indicate that case is most definitely wrong. It sounds like you favor either Eta or Theta. Of the two, Eta is more reasonable to me.


Posted: Wed Jun 28, 2017 12:00 pm

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
Most likely, Kappa is correct for writes.


Agree.

Quote:
Reads are more confusing: logically I'd say Beta, but the notes from blargg and Overload seem to indicate that case is most definitely wrong. It sounds like you favor either Eta or Theta. Of the two, Eta is more reasonable to me.


Agree with Eta or Theta.


Posted: Wed Jun 28, 2017 1:41 pm

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> According to blargg, the read whose value matters is on cycle 3, not cycle 4

I mean, this directly contradicts Overload's document.

[Image: excerpt from Overload's SPC700 cycle-timing document]

If blargg is correct and cycle 3 is the read that matters (e.g. it's the one that resets the timers), then Overload is wrong in saying cycle 3 can't access internal registers. If Overload is correct, then cycle 4 is the one that matters. From a logical design perspective, I like cycle 4 being the one that reads from or writes to IO, and cycle 3 being a read from internal RAM, which is Beta.

It's not clear from either of their documentation which of the two cycle reads actually set A, either.

And to throw out more insane theories ... is there a possibility that simply reading from $fd-ff takes another cycle to wrap up after the fact, and the read -is- actually working with IO, just ignored because those registers are busy? Because I really don't believe there's specialization of logic inside the SMP core for just this one instruction. Even if we figure out how to emulate it, it would be nice to understand the why as well. I feel like if the timer register reads are the ONLY thing we're going on, that it's not enough information to definitively say that internal IO is ignored for all of $f0-ff.


Posted: Wed Jun 28, 2017 2:36 pm

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
I mean, this directly contradicts Overload's document.

[Image: excerpt from Overload's SPC700 cycle-timing document]

If blargg is correct and cycle 3 is the read that matters (e.g. it's the one that resets the timers), then Overload is wrong in saying cycle 3 can't access internal registers. If Overload is correct, then cycle 4 is the one that matters. From a logical design perspective, I like cycle 4 being the one that reads from or writes to IO, and cycle 3 being a read from internal RAM, which is Beta.


Reads have different timing from writes in every other SPC700 addressing mode, so why do you expect them to be perfectly symmetrical in this one?

:mrgreen: Unfortunately, your OCD obsession with beautiful, logical symmetry has nothing to do with how electronic hardware works. After more than a decade as an emulator developer across multiple systems by multiple manufacturers, surely you should stop being surprised by now when your intuitions about what the most "logical" way for something to work are wrong so much more often than they are right? :mrgreen:

Quote:
It's not clear from either of their documentation which of the two cycle reads actually set A, either.


The read that can hit I/O is definitely the read that sets A, because blargg would have noticed if reading from $FD-$FF via that addressing mode always set A to 0, or set A to the underlying APU RAM instead of the timer value.

(snip utterly unverifiable speculation)

Here's what we know: writing to $FD-$FF by MOV (X)+,A doesn't reset the timer, unlike literally every other addressing mode. Reading from $FD-$FF by MOV A,(X)+ does reset the timer and set A to its previous value, exactly like every other addressing mode. As "unlikely" and "impossible" as you find it, as loudly as it makes your OCD scream, those are the objective facts.


Posted: Wed Jun 28, 2017 2:57 pm

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> Unfortunately, your OCD obsession with beautiful, logical symmetry has nothing to do with how electronic hardware works. After more than a decade as an emulator developer across multiple systems by multiple manufacturers, surely you should stop being surprised by now when your intuitions about what the most "logical" way for something to work are wrong so much more often than they are right?

I mean I did say like four times in this thread alone that hardware, especially SNES hardware, never acts in the logical way you would expect.

That's not going to stop me from thinking logically and wishing things were that way. When you and I star in that next Star Trek reboot, you can be Kirk and I'll be Spock :P

> The read that can hit I/O is definitely the read that sets A, because blargg would have noticed if reading from $FD-$FF via that addressing mode always set A to 0, or set A to the underlying APU RAM instead of the timer value.

I sure hope so. In the absence of confirmation, that's still the more likely outcome, at least. So that narrows us to: Beta, Delta, Eta, Theta.

In my view, I find it very unlikely that both reads are going to hit IO. Especially not if A is set on the fourth cycle. So that narrows us further to: Beta, Eta.

So if Overload is right that cycle 3 is the ignored one, then the answer is Beta.
If blargg is right that cycle 4 is the ignored one, then the answer is Eta.

> Reads have different timing from writes in every other SPC700 addressing mode, so why do you expect them to be perfectly symmetrical in this one?

Writes *gain* an extra cycle in all those other modes. That is not the case here.

> writing to $FD-$FF by MOV (X)+,A doesn't reset the timer, unlike literally every other addressing mode.

Right, Kappa will result in that effect.

> Reading from $FD-$FF by MOV A,(X)+ does reset the timer and set A to its previous value, exactly like every other addressing mode.

Right, so that rules out Alpha, Gamma, Epsilon, Zeta.

...

So, logically I think it's Beta. Based on my experience with the SNES, I would bet money on it being Eta, although there is a slim chance it is Delta or Theta.
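The elimination step can be checked mechanically. This sketch encodes the eight read possibilities from the earlier list as data (which cycle hits RAM or IO, and which cycle's value lands in A) and keeps those where the value-setting read goes through IO, which is what "resets the timer and returns its previous value" requires. The encoding and names are illustrative.

```cpp
#include <string>
#include <vector>

enum class Kind { RAM, IO };

struct ReadCase {
  std::string name;
  Kind cycle3, cycle4;
  int keptCycle;  // which cycle's value ends up in A
};

inline std::vector<ReadCase> readCases() {
  return {
    {"Alpha",   Kind::RAM, Kind::RAM, 4}, {"Beta",  Kind::RAM, Kind::IO, 4},
    {"Gamma",   Kind::IO,  Kind::RAM, 4}, {"Delta", Kind::IO,  Kind::IO, 4},
    {"Epsilon", Kind::RAM, Kind::RAM, 3}, {"Zeta",  Kind::RAM, Kind::IO, 3},
    {"Eta",     Kind::IO,  Kind::RAM, 3}, {"Theta", Kind::IO,  Kind::IO, 3},
  };
}

// Keep the cases where the read that sets A goes through IO.
inline std::vector<std::string> consistentWithFacts() {
  std::vector<std::string> out;
  for (const auto& c : readCases()) {
    Kind kept = c.keptCycle == 3 ? c.cycle3 : c.cycle4;
    if (kept == Kind::IO) out.push_back(c.name);
  }
  return out;
}
```

The filter leaves Beta, Delta, Eta, and Theta; narrowing further to Beta or Eta rests on the judgment about double IO reads, which this sketch does not encode.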


Posted: Wed Jun 28, 2017 10:14 pm

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 329
Location: FL
AWJ wrote:
I can write a program to do this, but someone who's already got a bit of experience with APU programming (Revenant?) can probably do it a lot more easily and quickly.


Alright, finally got around to this.

http://revenant1.net/smpechotest.sfc

The result:

[Image: screenshot of the test ROM's result on real hardware]

(that is, $214x is reading back the initial SMP->CPU port writes, not echo buffer data)

I can try to test the other thing later, I suppose.


Last edited by Revenant on Wed Jun 28, 2017 11:52 pm, edited 2 times in total.

Posted: Wed Jun 28, 2017 10:44 pm

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
Thanks a ton, Revenant. As I expected, that was much faster and much nicer than I could have done it. I'm impressed that you managed to play actual music essentially without access to zero page.


Posted: Wed Jun 28, 2017 11:59 pm

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 329
Location: FL
Okay, I had to take it down and re-upload it because I was dumb and didn't delay long enough before enabling echo to reliably avoid clobbering my one unnecessarily large sample on real hardware. Don't write test ROMs at 2:30 AM, kids.

Anyway, you get the idea. The real console works pretty much the way I figured it did (i.e. with a bunch of simple D-latches or whatever between the CPU and SMP).


Posted: Thu Jun 29, 2017 3:52 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
Shoot, I was going to post that hex_usr just made the same demo last night, but was too tired. Still, thank you for the extra test to confirm!

This has been fixed in upstream, by giving SPC700::port(Read,Write) its own 4x8-bit buffer.

So obviously this confirms that the DSP can modify the internal $f4-f7 RAM, and that the CPU does not read back the internal RAM, but separate 4x8-bit latches. But one question I still have is, if the CPU writes to $2140-2143, do those bytes show up when the DSP then reads from underlying $f4-f7 RAM?


Posted: Thu Jun 29, 2017 7:55 am

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
Shoot, I was going to post that hex_usr just made the same demo last night, but was too tired. Still, thank you for the extra test to confirm!

This has been fixed in upstream, by giving SPC700::port(Read,Write) its own 4x8-bit buffer.

So obviously this confirms that the DSP can modify the internal $f4-f7 RAM, and that the CPU does not read back the internal RAM, but separate 4x8-bit latches. But one question I still have is, if the CPU writes to $2140-2143, do those bytes show up when the DSP then reads from underlying $f4-f7 RAM?


Definitely not. There are 8 latches: four CPU-to-SMP (though the SMP can clear them to 0) and four SMP-to-CPU, all totally separate from RAM. The CPU having direct access to APU RAM (either reading or writing) makes no sense and is simply impossible when the SMP and DSP are both already constantly using it. Look how difficult it is for the GSU or SA-1 to share ROM/RAM with the CPU.

I imagine the reason you got confused and originally implemented it the way you did is because whenever the SMP writes to any of $F0-$FF the data also falls through to APU RAM as well as to the internal register.

Code:
SMP writes to $F1     -> value written goes to underlying APU RAM, CPU-to-SMP latches set to 0 depending on set bits
SMP writes to $F4-$F7 -> value written goes to SMP-to-CPU latch (where the CPU can see it) and also to underlying APU RAM (where the DSP can see it)
SMP reads $F4-$F7     -> reads CPU-to-SMP latch

DSP reads from or writes to $F0-FF -> reads/writes APU RAM, never SMP registers

CPU writes to $2140-$2143  -> value written goes to CPU-to-SMP latch, does not affect APU RAM
CPU reads from $2140-$2143 -> reads SMP-to-CPU latch, cannot see APU RAM


Every one of these cases was already correct in higan except "CPU reads from $2140-$2143".
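The table above can be sketched as a small model. Assumptions: only the port latches and the $F1 port-clear bits are modeled (timers, IPL ROM, and the other $F0-$FF registers are omitted), and the bit assignments (bit 4 clears ports 0-1, bit 5 clears ports 2-3) come from the fullsnes-style documentation rather than from this thread. `ApuPorts` and its method names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

struct ApuPorts {
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000);  // APU RAM; the DSP sees only this
  uint8_t cpuToSmp[4] = {};  // written by the CPU at $2140-$2143
  uint8_t smpToCpu[4] = {};  // written by the SMP at $F4-$F7

  void smpWrite(uint16_t address, uint8_t data) {
    ram[address] = data;  // SMP writes to $F0-$FF also fall through to RAM
    if (address >= 0xf4 && address <= 0xf7) smpToCpu[address - 0xf4] = data;
    if (address == 0xf1) {
      if (data & 0x10) cpuToSmp[0] = cpuToSmp[1] = 0;  // clear ports 0-1
      if (data & 0x20) cpuToSmp[2] = cpuToSmp[3] = 0;  // clear ports 2-3
    }
  }
  uint8_t smpRead(uint16_t address) const {
    if (address >= 0xf4 && address <= 0xf7) return cpuToSmp[address - 0xf4];
    return ram[address];  // other registers omitted
  }
  uint8_t dspRead(uint16_t address) const { return ram[address]; }
  void cpuWrite(uint8_t port, uint8_t data) { cpuToSmp[port & 3] = data; }
  uint8_t cpuRead(uint8_t port) const { return smpToCpu[port & 3]; }
};
```

This also reproduces Revenant's result: overwriting RAM at $F4 (as the echo buffer does) leaves what the CPU reads at $2140 unchanged.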


Last edited by AWJ on Fri Jun 30, 2017 10:04 am, edited 1 time in total.
