All times are UTC - 7 hours

PostPosted: Thu Jun 29, 2017 8:31 am 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
Perfect, thank you. Sorry to be so pedantic, but thanks for your patience.

Here's the updated code, which passes hex_usr's and Revenant's test ROMs:

Code:
alwaysinline auto SMP::readRAM(uint16 addr) -> uint8 {
  if(addr >= 0xffc0 && io.iplromEnable) return iplrom[addr & 0x3f];
  if(io.ramDisable) return 0x5a;  //0xff on mini-SNES
  return apuram[addr];
}

alwaysinline auto SMP::writeRAM(uint16 addr, uint8 data) -> void {
  //writes to $ffc0-$ffff always go to apuram, even if the iplrom is enabled
  if(io.ramWritable && !io.ramDisable) apuram[addr] = data;
}

auto SMP::readPort(uint2 port) const -> uint8 {
  return io.port[port];
}

auto SMP::writePort(uint2 port, uint8 data) -> void {
  io.port[port] = data;
}

auto SMP::readBus(uint16 addr) -> uint8 {
  uint result;

  switch(addr) {
  case 0xf0:  //TEST -- write-only register
    return 0x00;

  case 0xf1:  //CONTROL -- write-only register
    return 0x00;

  case 0xf2:  //DSPADDR
    return io.dspAddr;

  case 0xf3:  //DSPDATA
    //0x80-0xff are read-only mirrors of 0x00-0x7f
    return dsp.read(io.dspAddr & 0x7f);

  case 0xf4:  //CPUIO0
  case 0xf5:  //CPUIO1
  case 0xf6:  //CPUIO2
  case 0xf7:  //CPUIO3
    synchronize(cpu);
    return cpu.readPort(addr);

  case 0xf8:  //RAM0
    return io.ram00f8;

  case 0xf9:  //RAM1
    return io.ram00f9;

  case 0xfa:  //T0TARGET
  case 0xfb:  //T1TARGET
  case 0xfc:  //T2TARGET -- write-only registers
    return 0x00;

  case 0xfd:  //T0OUT -- 4-bit counter value
    result = timer0.stage3;
    timer0.stage3 = 0;
    return result;

  case 0xfe:  //T1OUT -- 4-bit counter value
    result = timer1.stage3;
    timer1.stage3 = 0;
    return result;

  case 0xff:  //T2OUT -- 4-bit counter value
    result = timer2.stage3;
    timer2.stage3 = 0;
    return result;
  }

  return readRAM(addr);
}

auto SMP::writeBus(uint16 addr, uint8 data) -> void {
  switch(addr) {
  case 0xf0:  //TEST
    if(r.p.p) break;  //writes only valid when P flag is clear

    io.clockSpeed    = (data >> 6) & 3;
    io.timerSpeed    = (data >> 4) & 3;
    io.timersEnable  = data & 0x08;
    io.ramDisable    = data & 0x04;
    io.ramWritable   = data & 0x02;
    io.timersDisable = data & 0x01;

    io.timerStep = (1 << io.clockSpeed) + (2 << io.timerSpeed);

    timer0.synchronizeStage1();
    timer1.synchronizeStage1();
    timer2.synchronizeStage1();
    break;

  case 0xf1:  //CONTROL
    io.iplromEnable = data & 0x80;

    if(data & 0x30) {
      //one-time clearing of APU port read registers,
      //emulated by simulating CPU writes of 0x00
      synchronize(cpu);
      if(data & 0x20) {
        cpu.writePort(2, 0x00);
        cpu.writePort(3, 0x00);
      }
      if(data & 0x10) {
        cpu.writePort(0, 0x00);
        cpu.writePort(1, 0x00);
      }
    }

    //0->1 transition resets timers
    if(!timer2.enable && (data & 0x04)) {
      timer2.stage2 = 0;
      timer2.stage3 = 0;
    }
    timer2.enable = data & 0x04;

    if(!timer1.enable && (data & 0x02)) {
      timer1.stage2 = 0;
      timer1.stage3 = 0;
    }
    timer1.enable = data & 0x02;

    if(!timer0.enable && (data & 0x01)) {
      timer0.stage2 = 0;
      timer0.stage3 = 0;
    }
    timer0.enable = data & 0x01;
    break;

  case 0xf2:  //DSPADDR
    io.dspAddr = data;
    break;

  case 0xf3:  //DSPDATA
    if(io.dspAddr & 0x80) break;  //0x80-0xff are read-only mirrors of 0x00-0x7f
    dsp.write(io.dspAddr & 0x7f, data);
    break;

  case 0xf4:  //CPUIO0
  case 0xf5:  //CPUIO1
  case 0xf6:  //CPUIO2
  case 0xf7:  //CPUIO3
    synchronize(cpu);
    writePort(addr, data);
    break;

  case 0xf8:  //RAM0
    io.ram00f8 = data;
    break;

  case 0xf9:  //RAM1
    io.ram00f9 = data;
    break;

  case 0xfa:  //T0TARGET
    timer0.target = data;
    break;

  case 0xfb:  //T1TARGET
    timer1.target = data;
    break;

  case 0xfc:  //T2TARGET
    timer2.target = data;
    break;

  case 0xfd:  //T0OUT
  case 0xfe:  //T1OUT
  case 0xff:  //T2OUT -- read-only registers
    break;
  }

  writeRAM(addr, data);  //all writes, even to MMIO registers, appear on bus
}


Please ignore that all eight registers are most likely inside the SMP core. I'm aware of that.

CPU writes to $214x go to a CPU-side 4x8-bit array, CPU reads come from the SMP-side 4x8-bit array.
SMP writes to $f4-f7 go to an SMP-side 4x8-bit array, SMP reads come from the CPU-side 4x8-bit array.
SMP writes to $f1 can write to the CPU-side 4x8-bit array (which is why all eight bytes are physically in the SMP.)
SMP writes to $f4-f7 will also modify the internal APU RAM that the DSP can then read back.
DSP reads from $f4-f7 are from underlying APU RAM, DSP writes to $f4-f7 are to underlying APU RAM.
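
To make the routing above concrete, here is a minimal stand-alone sketch of the two 4x8-bit arrays. It is not higan code: the `ApuPorts` struct and its method names are made up for illustration, and it deliberately leaves out the write-through to the underlying APU RAM.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of the eight port latches; all names are illustrative only.
struct ApuPorts {
  std::array<uint8_t, 4> cpuSide{};  // written by the CPU ($2140-$2143), read by the SMP ($f4-$f7)
  std::array<uint8_t, 4> smpSide{};  // written by the SMP ($f4-$f7), read by the CPU ($2140-$2143)

  void cpuWrite(unsigned port, uint8_t data) { cpuSide[port & 3] = data; }
  uint8_t cpuRead(unsigned port) const { return smpSide[port & 3]; }

  void smpWrite(unsigned port, uint8_t data) { smpSide[port & 3] = data; }
  uint8_t smpRead(unsigned port) const { return cpuSide[port & 3]; }

  // SMP write to $f1 (CONTROL): bits 4 and 5 clear pairs of CPU-side latches,
  // mirroring the simulated cpu.writePort(n, 0x00) calls in the code above.
  void smpWriteControl(uint8_t data) {
    if(data & 0x10) cpuSide[0] = cpuSide[1] = 0x00;
    if(data & 0x20) cpuSide[2] = cpuSide[3] = 0x00;
  }
};
```

The point of keeping two separate arrays is that a CONTROL write only ever touches what the SMP reads; whatever the SMP last wrote for the CPU to read is left alone.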

...

Also, I added emulation of the glitch with mov (x)+ modes. I'm pretty confident writes are correct, but we still need to create test ROMs to narrow down the true behavior of reads.


Last edited by byuu on Thu Jun 29, 2017 10:55 am, edited 1 time in total.

PostPosted: Thu Jun 29, 2017 8:53 am 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
That all looks right to me.


PostPosted: Thu Jun 29, 2017 10:36 am 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 328
Location: FL
byuu wrote:
Shoot, I was going to post that hex_usr just made the same demo last night, but was too tired. Still, thank you for the extra test to confirm!


Damn, that's what I get for not checking your forum as often as this one. Hopefully mine was useful anyway.

Re: the test for mov a,(x)+: if my understanding of the SMP timing is correct, would this theoretically work?

Code:
mov x, #$ff
mov $fc, #1 ; timer 2 ticks every 16 CPU cycles
mov $f1, #0 ; reload timer
mov $f1, #$04 ; enable timer 2 (bit 2 of CONTROL; #1 would enable timer 0)

; wait 12 (16-4) cycles
xcn
xcn
nop

; read timer 2
mov a, (x)+


If I have this right, then cycle 3 of the read will occur 15 cycles after the timer starts, and cycle 4 occurs 16 cycles after, with the timer ticking up in between the two. I have a feeling I might be missing some crucial timing detail, though, so please feel free to correct me before I try making another test ROM.


PostPosted: Thu Jun 29, 2017 12:06 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
Revenant wrote:
byuu wrote:
Shoot, I was going to post that hex_usr just made the same demo last night, but was too tired. Still, thank you for the extra test to confirm!


Damn, that's what I get for not checking your forum as often as this one. Hopefully mine was useful anyway.

Re: the test for mov a,(x)+: if my understanding of the SMP timing is correct, would this theoretically work?

Code:
mov x, #$ff
mov $fc, #1 ; timer 2 ticks every 16 CPU cycles
mov $f1, #0 ; reload timer
mov $f1, #$04 ; enable timer 2 (bit 2 of CONTROL; #1 would enable timer 0)

; wait 12 (16-4) cycles
xcn
xcn
nop

; read timer 2
mov a, (x)+


If I have this right, then cycle 3 of the read will occur 15 cycles after the timer starts, and cycle 4 occurs 16 cycles after, with the timer ticking up in between the two. I have a feeling I might be missing some crucial timing detail, though, so please feel free to correct me before I try making another test ROM.


I don't think that will work. According to bsnes, it looks like the non-programmable stage of the timers (the one that gives T2 a different base frequency from T0 and T1) ticks constantly whether or not each timer is enabled via $F1. You'd have to enable the timer and then do some blargg voodoo to synchronize your instruction execution with it.
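
A rough sketch of why that matters, assuming a deliberately simplified timer (one free-running prescaler feeding an enable-gated counter). `TimerModel` and `cyclesUntilFirstTick` are hypothetical names, and this is not how bsnes structures its timer stages; it only shows that the delay to the first tick depends on the prescaler phase at the moment of the enable.

```cpp
#include <cstdint>

// Simplified, hypothetical timer: free-running prescaler + enable-gated counter.
struct TimerModel {
  unsigned divider;       // SMP cycles per prescaler tick (16 for timer 2, per the post above)
  unsigned prescale = 0;  // free-running; NOT reset when the timer is enabled
  bool enable = false;
  uint8_t target = 0;     // TnTARGET; 0 acts as 256 because stage2 wraps
  uint8_t stage2 = 0;
  uint8_t stage3 = 0;     // 4-bit TnOUT

  explicit TimerModel(unsigned d) : divider(d) {}

  // 0->1 transition of the enable bit resets stage2 and stage3 only.
  void setEnable(bool on) {
    if(!enable && on) { stage2 = 0; stage3 = 0; }
    enable = on;
  }

  // Advance by one SMP cycle.
  void clock() {
    if(++prescale < divider) return;
    prescale = 0;
    if(!enable) return;
    if(++stage2 != target) return;
    stage2 = 0;
    stage3 = (stage3 + 1) & 15;
  }
};

// Cycles from enabling the timer (target = 1) to the first TnOUT increment,
// given how many cycles the prescaler had already run before the enable.
unsigned cyclesUntilFirstTick(unsigned phase) {
  TimerModel t(16);
  t.target = 1;
  for(unsigned n = 0; n < phase; n++) t.clock();
  t.setEnable(true);
  unsigned cycles = 0;
  while(t.stage3 == 0) { t.clock(); cycles++; }
  return cycles;
}
```

In this model `cyclesUntilFirstTick(0)` is 16 but `cyclesUntilFirstTick(5)` is 11, so a fixed 12-cycle wait after the enable cannot be relied on to land on the tick.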


PostPosted: Thu Jun 29, 2017 1:52 pm 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 328
Location: FL
Argh, you're right. Good catch.

I wonder if there's a way you could reliably manipulate that stage by (temporarily) changing the timer speed bits in the test register, assuming bsnes/higan is correct about how those actually work. It'd probably be a long shot though.


PostPosted: Thu Jun 29, 2017 2:30 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
Revenant (or anyone with a flashcart handy) can you run "test_speed.smc" from http://snescentral.com/article.php?id=1115 on real hardware and snap a screenshot? I just want to check that it really prints "done" and not "passed" on hardware, and the numbers it prints in current higan still match hardware.


PostPosted: Thu Jun 29, 2017 2:35 pm 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 328
Location: FL
I can check later tonight when I get home unless someone beats me to it again.

(Should I run the test_timer_* ones too or are those not as important?)


PostPosted: Thu Jun 29, 2017 2:36 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
Revenant wrote:
I can check later tonight when I get home unless someone beats me to it again.

(Should I run the test_timer_* ones too or are those not as important?)


You might as well run all of them if it's no inconvenience, but the test_speed one is the one I'm currently reverse engineering.

ETA: by "all of them" I only mean test_timer_speed*, not the freezing ones or whatever.


PostPosted: Thu Jun 29, 2017 4:50 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> You might as well run all of them if it's no inconvenience, but the test_speed one is the one I'm currently reverse engineering.

I think blargg's convention was to print "done" on tests where he didn't check the values against known good values. The rest will be "passed" or "failed", sometimes with a CRC32.

test_speed is testing the S-SMP TEST register's speed control. d7,d6 represent the SMP speed:
00 = 100% speed
01 = 50% speed
10 = deadlock
11 = 10% speed

So when you look at blargg's test numbers, you're seeing this:

0a, 1a, 2a, 3a = 100% speed = ~256
4a, 5a, 6a, 7a = 50% speed = ~128
8a, 9a, aa, ba = deadlock (cannot test)
ca, da, ea, fa = 10% speed = ~25.6

The actual reality of these speed bits is even more nuanced, however. Different SNES consoles, even when they're the exact same motherboard revision, will lock up with different values. Some will lock with d6,d7!=0. Others will run two speed modes. We've never found an SNES that could run d6,d7=2 without freezing.

d5,d4 represent the timer speed. That algorithm is more complicated, but it's:
Code:
io.timerStep = (1 << io.clockSpeed) + (2 << io.timerSpeed);

So when you see his results, the values repeat after every four increments of the upper nibble, because only the lower two affect the timer speed.

d3,d2,d1,d0 are timersEnable, ramDisable, ramWritable, timersDisable. Yes, it's very weird to have both an enable and a disable bit on the timers. On power-up the low nibble is set to 0xa (timers enabled, RAM enabled and writable, timers not disabled), for obvious reasons.
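
For reference, the whole bit layout can be pulled apart in one place. This is only a sketch that reuses the shifts and masks from the writeBus code earlier in the thread; the `TestRegister` struct and `decodeTest` function are names invented for this example.

```cpp
#include <cstdint>

// Hypothetical container for the decoded TEST ($00f0) fields.
struct TestRegister {
  unsigned clockSpeed;   // d7,d6
  unsigned timerSpeed;   // d5,d4
  bool timersEnable;     // d3
  bool ramDisable;       // d2
  bool ramWritable;      // d1
  bool timersDisable;    // d0
  unsigned timerStep;    // derived value, same formula as higan's writeBus
};

TestRegister decodeTest(uint8_t data) {
  TestRegister t{};
  t.clockSpeed    = (data >> 6) & 3;
  t.timerSpeed    = (data >> 4) & 3;
  t.timersEnable  = (data & 0x08) != 0;
  t.ramDisable    = (data & 0x04) != 0;
  t.ramWritable   = (data & 0x02) != 0;
  t.timersDisable = (data & 0x01) != 0;
  t.timerStep     = (1u << t.clockSpeed) + (2u << t.timerSpeed);
  return t;
}
```

With the power-up value, `decodeTest(0x0a)` gives clockSpeed 0, timerSpeed 0 and a timerStep of 3; `decodeTest(0xfa)` gives clockSpeed 3, timerSpeed 3 and a timerStep of 24.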

EDIT: forgot that we actually did figure out the d5,d4. Added details after refreshing my memory on them.


Last edited by byuu on Thu Jun 29, 2017 4:58 pm, edited 3 times in total.

PostPosted: Thu Jun 29, 2017 4:52 pm 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 328
Location: FL
http://imgur.com/a/cmdk3

Here are four runs of test_speed. The values (whatever they are) average slightly lower than what higan shows (i.e. I never see 252 on the real hardware).

All of the test_timer_speed ROMs hang after displaying only two rows of values, but appear to be consistent with higan otherwise.


PostPosted: Thu Jun 29, 2017 6:16 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
Okay, I've pretty thoroughly disassembled and reverse engineered test_speed.smc. Using some slightly tricky code, it stores a BRA -2 ($2F $FE) into the last two SMP input ports (W:$2142-2143 on the CPU side, R:$00F6-00F7 on the SMP side) and then tells the IPL ROM to jump there, so that the SMP ends up executing a tight infinite loop right out of its input ports. The CPU then controls the SMP by writing two bytes' worth of opcodes (one two-byte opcode or two one-byte ones) at a time to $2140-2141, writing $FC to $2143 (changing the BRA -2 to a BRA -4), giving the SMP just enough time to take the changed loop and execute the stored opcode once, and then changing $2143 back to $FE. The sequence of opcodes it executes by this method is as follows:

Code:
CD 00 MOV X=#$00
E8 xx MOV A=#$xx (xx is the TEST register value)
C4 F0 MOV $F0=A
00 3D NOP; INC X (for this one there's a delay loop before the CPU resets the BRA target)
E8 0A MOV A=#$0A (this one has a delay loop too)
C4 F0 MOV $F0=A (this one has a delay loop too)
D8 F4 MOV $F4=X
(at this point the CPU reads $2140 to get the value of X written by the SMP)
E8 00 MOV A=#$00
C4 F4 MOV $F4=A


The results the test program prints to the screen for each value of TEST are the values read from $2140 at line 8: basically, the number of times the SMP was able to execute NOP; INC X; BRA -4 while the CPU was in its delay loop.
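
The branch arithmetic behind the $FE/$FC trick can be checked with a tiny helper. `branchTarget` is a hypothetical function written for this post, and it assumes the usual rule that a relative branch is taken from the address of the following instruction.

```cpp
#include <cstdint>

// Target of a two-byte relative branch whose opcode byte sits at opcodeAddr.
// The signed 8-bit offset is applied to the address of the next instruction.
uint16_t branchTarget(uint16_t opcodeAddr, uint8_t offset) {
  uint16_t next = static_cast<uint16_t>(opcodeAddr + 2);
  return static_cast<uint16_t>(next + static_cast<int8_t>(offset));
}
```

With the BRA opcode at $00F6, an offset of $FE branches back to $00F6 (the idle loop), while $FC branches to $00F4, so the two bytes the CPU placed in $F4-$F5 get executed before the BRA is reached again.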

The important thing is that during this test the SMP doesn't access RAM even once; it's executing entirely out of its I/O ports. So the test program results are fully consistent with nocash's theory that TEST d6-d7 controls the number of cycles an I/O port access takes and d4-d5 controls the number of cycles a RAM access takes. If that theory is right, the complicated formula in higan relating TEST d4-d7 to the timers is simply an artifact of the proportion of RAM accesses to I/O port accesses that blargg's timer test program does.

I also suspect that a speed of 2 (for either RAM or I/O) takes 5 cycles but only clocks the timers 4 times, and a speed of 3 takes 10 cycles but only clocks the timers 8 times. That would explain why the SMP runs at 1/10th speed while executing out of I/O ports while TEST.bits(6, 7) == 3, but the fastest the timers can apparently go relative to SMP instruction execution is only 8 times normal (when TEST.bits(4,7) == 0xF).
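
Written out as a table, the guess looks like this. Everything here is speculative: the 5/4 and 10/8 entries are the suspicion above, and the 1/1 and 2/2 entries for speeds 0 and 1 are an added assumption based on the 100% and 50% results. `AccessCost` and `accessCost` are names made up for the sketch.

```cpp
// Hypothetical cost of one bus access at a given 2-bit speed setting.
struct AccessCost {
  unsigned cycles;       // SMP cycles the access is guessed to take
  unsigned timerClocks;  // how many times the timers are guessed to be clocked meanwhile
};

AccessCost accessCost(unsigned speed) {
  // speeds 0 and 1: assumed; speeds 2 and 3: the unverified guess from this post
  static const AccessCost table[4] = {{1, 1}, {2, 2}, {5, 4}, {10, 8}};
  return table[speed & 3];
}
```

Under this guess, a speed of 3 costs 10 cycles per access but advances the timers only 8 times, which is where the 1/10th execution speed and the 8x timer ceiling would both come from.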

Another possibility for the 8/10 inconsistency is that blargg's timer test program makes the SMP do some ROM accesses in addition to RAM and I/O, and that ROM accesses always take only 1 cycle. I guess I'll have to disassemble it as well to be sure.


PostPosted: Thu Jun 29, 2017 7:07 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
nocash's model is certainly more elegant, but let's note a few things.

First, he missed the purpose of $00f0.d2 as being RAM disable. You can still run code out of the I/O ports with RAM disabled. Particularly troublesome as he analyzed my uPD96050 emulation and used my new instruction mnemonics, but I guess he didn't look at my SMP implementation.

Second, he mentions $00f0.d6,d7 as controlling not just I/O but ROM access timing ... I can't imagine he's talking about the IPLROM. So ... what ROM?

This is kind of a theme with nocash. He's often right, but when he hits undocumented things, sometimes his theories are just off the wall, and he doesn't note them as theories, and he provides no proof of his claims. I suppose that's where we come in (well, Revenant mostly) :P

The $00f8,$00f9 note I am much more inclined to believe because he is pointing to actual CPU pins. But I do wonder where he got the P4/P5/P5RD labels from. Similarly, documentation indicates $00fa-00fc is TnTARGET, not TnDIV. He's just renaming registers as he chooses to.

Again, I'm not foolish enough to say he's wrong. But I'd love to see some tests to indicate if he's right. Especially one to prove the 20% speed case. Looking at blargg's test_timer_speed:

0 = 2731 (100%)
1 = 1639 (60%)
2 = 911 (33%)
3 = 482 (17.6%)

Not quite 1/2/5/10. But you seem to have a better grasp on how executing code out of I/O registers can affect timers, so ... I'm willing to make the change he talks about into a higan fork branch, and see if blargg's tests still align properly.

Next, he's implying that NOP's idle cycle is a "RAM timing" and TCALL's three idle cycles are "I/O timing." Yet Overload's logic analyzer traces show that the first cycle of both has the program counter on the bus with RWB=1. In other words, this sounds like memory reads.

If both Overload and blargg were able to detect the mov (x)+ anomaly, then how would they have missed that other opcodes had weird effects like that?

Again, it's nocash. What even is the "SPC700 waitstates on internal cycles" table? How did he make that? Where is it from? How did he verify that? No answers. Just a table and "just go with it."

And finally, I/O writes fall through and update the underlying APU RAM. So what happens if your I/O speed is 1 but your RAM speed is 10? Does the RAM write just silently fail? Or silently work anyway? Maybe it'd fail if your -actual- APU RAM were timed ten times slower, whereas since it's perfectly capable of 1-waitstate operation, it'll just always work in this case?


PostPosted: Thu Jun 29, 2017 7:34 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
Revenant, I feel pretty bad for working you like a mule but I've got another test program I'd like you to write. This one should be simpler than the echo buffer test.

Upload this tiny program to the SMP (somewhere that won't get clobbered by the IPL ROM--$0200 would be good) and jump to it:

Code:
    mov $f1,#$b0    ; enable IPL ROM; clear input ports
    mov $f4,#$00    ; clear output port $F4
:   mov a,$f5
    beq :-          ; loop until the CPU writes nonzero to $2141
    mov $f0,$f4     ; set TEST to whatever the CPU wrote to $2140
    jmp $ffc0       ; jump to IPL ROM entry point


Then, have the CPU write different TEST values to $2140-2141 and time how long it takes for the IPL ROM to clear APU zero page and put out the $AA handshake. Something like this on the CPU side:

Code:
(A contains value to write to TEST; $0A, $1A, $4A and $5A should be safe)
($CA and $DA are probably safe too, at least on Revenant's SNES)
    rep #$30
    ora #$FF00  ; set high bits which will be written to $2141
    tay
    lda #$2100
    tcd         ; use direct page to make our loop tighter
    sep #$20
    lda #$AA
    ldx #$0000
    sty $40     ; = $2140
:   inx
    beq hung
    cmp $40     ; = $2140
    bne :-      ; loop until $2140 = #$AA (meaning IPL finished clearing zero page)
    (now print TEST value and X to the screen)

hung:
    (whoops, looks like we hung the SMP with a bad TEST value...)


The idea is to see which of $F0 d4-d7, if any, affect execution speed when the SMP is accessing ROM and RAM (the IPL ROM startup code runs out of ROM and writes to RAM).

byuu wrote:
What even is the "SPC700 waitstates on internal cycles" table? How did he make that? Where is it from? How did he verify that? No answers. Just a table and "just go with it."


Agree that that table is probably nonsense. Nocash claims NOP's idle cycle has "RAM timing", but blargg's test_speed.smc does NOPs and isn't affected by d4-d5 at all.

Quote:
So what happens if your I/O speed is 1 but your RAM speed is 10? Does the RAM write just silently fail? Or silently work anyway? Maybe it'd fail if your -actual- APU RAM were timed ten times slower, whereas since it's perfectly capable of 1-waitstate operation, it'll just always work in this case?


The TEST register is probably meant to slow down the SMP while it's running in an ICE or something like that, and isn't meant to be touched on a production system. It's called TEST, after all.

Quote:
But you seem to have a better grasp on how executing code out of I/O registers can affect timers


The test program I disassembled (test_speed.smc) doesn't use or touch the SMP timers at all.


PostPosted: Thu Jun 29, 2017 9:47 pm 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 328
Location: FL
http://revenant1.net/smptesttest.sfc

Results: http://imgur.com/a/rsHPN


Last edited by Revenant on Thu Jun 29, 2017 11:08 pm, edited 1 time in total.

PostPosted: Thu Jun 29, 2017 10:20 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
Sorry, can you provide source code for that ROM? Even in bsnes-classic, TEST d4-d5 seem to be affecting the result by more than rounding error and $DA is locking up the SMP, and I'm not sure what's going on... You don't have NMI enabled while you're doing the timing loops, do you?

