Some tidbits about the Cx4 (attn: byuu, nocash)

jbo_85
Posts: 13
Joined: Wed Apr 01, 2009 12:03 pm
Location: Langara

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by jbo_85 »

Cool! That sped up the spinning wireframe head in the intro which was quite demanding before.
Revenant
Posts: 462
Joined: Sat Apr 25, 2015 1:47 pm
Location: FL

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Revenant »

Almost everything in this thread is now emulated in the bsnes-plus master branch, excluding some of the details about $6000-7fff memory access while the Cx4 is running. I might add that stuff later, but I don't think either MMX2 or MMX3 relies on it to work correctly.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Near »

A year late, but not forgotten ...

Questions in the event anyone knows.

Is it confirmed in LoROM mode that the Cx4 bus can't see ROM at $c0-ff:0000-ffff?
DMA source and destination CAN reference the same bus but not the same chip
I mean, what other bus could it access? It's not like the Cx4 can transfer data to VRAM. The only valid transfers seem to be cart ROM->cart RAM or Cx4 DRAM, cart RAM->DRAM, or Cx4 DRAM->cart RAM. Unknown if DMA to/from IO registers work, but hey, maybe. Then you get the fun possibility of a DMA transfer firing off a DMA transfer.

Are the DMA transfers staggered like SNES DMA? Eg:

cycle 1: read source[0]
cycle 2: write target[0], read source[1]
cycle n+1: write target[n-1], read source[n]
cycle n+2: write target[n]

Or are they more like this?

cycle 1: read source[0]
cycle 2: write target[0]
cycle 2n+1: read source[n]
cycle 2n+2: write target[n]
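For comparison, a rough sketch of the total cycle count each ordering implies for an N-byte transfer, counting one abstract cycle per bus slot and ignoring wait states. The function names are made up for illustration, and returning zero cycles for a zero length is only a placeholder, since what the Cx4 does with a zero length is still an open question.

```c
#include <stdint.h>

/* Staggered ordering: the write of byte n-1 overlaps the read of byte n,
   so an N-byte transfer needs N read slots plus one final write slot. */
static uint32_t dma_cycles_staggered(uint32_t length)
{
    return length ? length + 1 : 0;  /* zero-length behaviour is a placeholder */
}

/* Sequential ordering: each byte costs one read cycle, then one write cycle. */
static uint32_t dma_cycles_sequential(uint32_t length)
{
    return length * 2;
}
```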

What exactly is the purpose of $7f49-7f4b? You can address every possible location with $7f4d-7f4f already. Does $7f49-7f4b get taken into account with program RAM page tagging?

What happens if I write $7f48, but the requested page is already cached? Will it forcefully reload it again? Seems like we'd kind of want it to, right? Eg if you were loading from cart RAM, the data may be modified.

Is $7f4d-7f4f cached to an internal program counter register, or is $7f4d-7f4f the actual program counter itself? Eg if I write to it while the Cx4 is running, do horrible things happen?

Does writing to $7f53 get the Cx4 out of a lockup from a bad DMA transfer as well?
$7f55: Any write access indefinitely suspends the Cx4 (registers can be
read and written but the CPU shows no reaction: no buffering occurs,
no code is run and $7f53 is not updated).
Cx4 status bit 0 is set.
Apparently $7f53 is updated if status bit 0 is set.

What happens if I write register $20? Can I actually simulate a (short) PC jump that way? It's very bizarre that it's not the full 24-bit PC, but given the way $7f4d-7f4f are laid out (bits 8-15, bits 16-23, THEN bits 0-7), I suspect the PC is only 8-bit, and the extra 16 bits are just the "active" P register.
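A small sketch of that byte order, plus the speculative 8-bit PC / 16-bit P split. The function names are hypothetical, and the split only encodes the guess above, not confirmed hardware behaviour.

```c
#include <stdint.h>

/* Rebuild the 24-bit value from $7f4d-7f4f:
   $7f4d = bits 8-15, $7f4e = bits 16-23, $7f4f = bits 0-7. */
static uint32_t cx4_pc_register(uint8_t r7f4d, uint8_t r7f4e, uint8_t r7f4f)
{
    return ((uint32_t)r7f4e << 16) | ((uint32_t)r7f4d << 8) | r7f4f;
}

/* Speculative split (unconfirmed): low 8 bits are the PC,
   upper 16 bits are the active P (page) register. */
static uint8_t cx4_pc_part(uint32_t value)
{
    return (uint8_t)(value & 0xFF);
}

static uint16_t cx4_page_part(uint32_t value)
{
    return (uint16_t)((value >> 8) & 0xFFFF);
}
```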

With $2e and $2f, what happens if I change the bus address during the fetch period? Will it screw up the read/write, or is the address cached? I suspect it must be cached, because if $4000 (increment bus address) occurs before the wait, then well ... that's not good, is it?

nocash doesn't seem to have the ability to test on real hardware, so with:
http://problemkaputt.de/fullsnes.htm#sn ... cx4opcodes
6000h+nnoooooooo ?? ??? mov A/ext_dta/?/prg_page,<op>

Where does the prg_page part come from? It's always so hard to tell in nocash's docs what's speculation and what's confirmed behavior. MMX2/3 doesn't use $63xx instructions, so I don't see how he'd know that. It's not obvious from other instructions that it's some sort of pattern.

Just to be clear, are the ROM/RAM waitstates the exact # of cycles required, or the # of ADDITIONAL cycles required, above 1? Eg WS1=3, is that "the data's ready in 3 cycles", or "the data's ready in 4 cycles"?

What happens if I start a DMA or cache page operation while the Cx4 is active? Will it take priority over the Cx4 instruction processing, stay pending until the Cx4 instructions are halted, or just do nothing?

What happens if the bus address used for $2e isn't ROM, or for $2f isn't RAM? Will it just transform the address and access said bus ANYWAY, ignore the operation, or lock the chip up?

What happens if a DMA crosses from RAM into ROM during the transfer?

Are the DMA source/length/target values updated during a transfer?

What does a DMA transfer length of zero do?

What's the actual overhead of DMA transfers per byte? Eg you say "EXTRA waitstates" on cart ROM/RAM accesses, so does that imply that accesses to internal RAM take one cycle? Or that the wait states are really waitstates+1?
Overload
Posts: 47
Joined: Mon May 30, 2011 4:38 pm
Location: Australia

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Overload »

Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011. Information regarding register mapping and the entire instruction set was published. The complete instruction set has been on the snes9x dsp website for 7 years.

http://users.tpg.com.au/advlink/dsp/cx4.html

The only thing I couldn't test on my mashmods flash programmer was RDBUS which most likely needs the snes clock signal to function.

I didn't find anything at $2e or $2f. Could it be that these addresses also require a clocking signal from the SNES?
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by nocash »

Overload wrote:Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011. Information regarding register mapping and the entire instruction set was published. The complete instruction set has been on the snes9x dsp website for 7 years.
http://users.tpg.com.au/advlink/dsp/cx4.html
The only thing I couldn't test on my mashmods flash programmer was RDBUS which most likely needs the snes clock signal to function.
I didn't find anything at $2e or $2f. Could it be that these addresses also require a clocking signal from the SNES?
Very nice document. That was available in 2011??? I implemented the CX4 stuff in no$sns in December 2011, and back then I only found two txt files (attached below); maybe those were much earlier versions of your doc? The cx4.html file appears in the Internet Archive starting in 2013, though I missed it back then, too, and never heard about it until today : /

Going by ikari's findings, opcode 4000h does merely do an address increment (it isn't RDBUS).

The CLEAR opcode and XNOR opcodes are new (to me), good to know about them.
Some opcodes can use only "Rx" (r0-r15) instead of "reg" (all registers)? Nasty : (

Having info about affected flags is useful (and knowing that the overflow flag exists).

What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it? Or wait, is it some sort of "signed carry"? Ie. the Sign and Overflow flags being merged into a single flag?
Attachments
Cx4info.txt (1.17 KiB)
cx4opcodes.txt (2.93 KiB)
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Near »

Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011.
"Most" feels like a strong overstatement.

Your page states the meaning of register $28, and has a few more opcodes (super, super awesome, by the way, thank you so much!) ... but there's nothing about program RAM caching, the extra MMIO registers, wait state on ROM/RAM accesses, or answers to any? of my questions.

On your instruction set ... you're missing $1c00 (wait for external bus reads), and it seems there's permanent confusion about the sequence of $612e,$4000,$1c00's purpose. So ... it's definitely not 100% complete. Again, this is awesome stuff and I'm not trying to be negative.

101000SS .xxxxxxx	A0	XNOR	A, reg	Inverse Exclusive OR	A = A ^ !reg	N Z
101001SS xxxxxxxx	A4	XNOR	A, imm	Inverse Exclusive OR	A = A ^ !imm	N Z
For XNOR, when imm is 8-bit, is it a = a ^ ~(uint8_t)imm, or is it a & ~imm? Eg what happens to bits 8-23 of the 8-bit immediate value?

V	Overflow Flag	0
Holy heck I was missing an entire flag o_O

Can you please share the algorithm for computing this flag? Is it just the standard?

ADD: V = ~(A ^ data) & (A ^ result) & 0x800000;
SUB: V = ~(A ^ data) & (A ^ result) & 0x800000;

Or is it the slightly modified variant?

ADD: V = ~(A ^ data) & (A ^ result) & 0x800000;
SUB: V = (A ^ data) & (A ^ result) & 0x800000;

I guess you're probably not too interested in updating the document at this point with more details, heh.

03	MBR	Memory Buffer Register	8*	RW	00
08	ROMB	Immediate ROM	24	R	DATA_ROM[0]
0C	RAMB	RAM Buffer	24	RW	000000
13	MAR	Memory Address Register	24	RW	FFFFFF
1C	DPR	RAM Address	12*	RW	000
It really feels like 18 would be a data ROM address, but obviously that isn't necessary ... wow, the $74xx a=dataROM[abs] opcode was quite the find!

01100001 .xxxxxxx	61	MOV	MBR, reg**	Data Transfer Instruction	MBR = reg**
01100010 ....xxxx	62	MOV	MAR, Rx	Data Transfer Instruction	MAR = Rx

11100000 .xxxxxxx	E0	MOV	reg, A	Data Transfer Instruction	reg = A
11100001 .xxxxxxx	E1	MOV	reg**, MBR	Data Transfer Instruction	reg** = MBR
... is it really the case that $e0xx can write to all registers, but not $e1xx? That makes zero sense ... I mean, really ... absolutely zero sense. Why on earth would they impose such a limitation? Same for 62xx/61xx and reads. In the read case, does $61xx from registers 00-5f just always return 0, or just not set the MBR at all, or mirror the GPRs throughout the whole space?
What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it?
Yeah, it seems to be. Your document also says:
Carry Flag is CLEARED on borrow (ie. opposite as on 80x86 CPUs).
I have it as:

int r = ri() - sa();
cf = r >= 0;  // carry set when there is no borrow

I really don't think MMX2/3 would be playable if we had the carry flag implemented incorrectly ... right?
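The same convention expressed on 24-bit operands: carry set when no borrow occurs. A minimal sketch; the helper name is made up, and it assumes both operands are held as unsigned 24-bit values, which is what makes it equivalent to the signed r >= 0 test above.

```c
#include <stdbool.h>
#include <stdint.h>

#define CX4_MASK 0xFFFFFFu  /* 24-bit datapath */

/* Carry after x - y: set when there is no borrow, cleared on borrow
   (the opposite of the 80x86 convention). */
static bool cx4_sub_carry(uint32_t x, uint32_t y)
{
    return (x & CX4_MASK) >= (y & CX4_MASK);
}
```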

Also from your docs:
Call Stack is reportedly 16 levels deep, at least 16bits per level.
Pretty sure it's 8x23-bits, not 16x16+-bits.
Stack Circular, 8 x 23-Bits
Internal RAM 4 x 384 x 16-Bits
Also, what the heck with the data RAM? Isn't it just 3072x8-bits? The way the RDRAM and WRRAM instructions work doesn't jibe with it being 4 x ... x 16-bits at all. If it were 1 x ... 24-bits, then we wouldn't need L,M,H (L,H,B) variant instructions, presumably. And if it were 16-bits, we would only have two instead of three of them ...
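If the data RAM really is a flat 3072x8-bit array, the three variants fall out naturally as byte-lane selects on a 24-bit buffer. A sketch under that assumption; using the 24-bit RAMB register from the table above as the buffer, and leaving address formation (A, or DPR plus an immediate) to the caller, are both assumptions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CX4_DATA_RAM_SIZE 3072

/* RDRAM n: copy one data-RAM byte into byte lane n (0-2) of the buffer. */
static uint32_t cx4_rdram(uint32_t buffer, const uint8_t ram[CX4_DATA_RAM_SIZE],
                          unsigned lane, unsigned address)
{
    assert(lane < 3 && address < CX4_DATA_RAM_SIZE);
    unsigned shift = lane * 8;
    buffer &= ~(0xFFu << shift);               /* clear the target lane */
    buffer |= (uint32_t)ram[address] << shift;  /* insert the RAM byte */
    return buffer & 0xFFFFFF;
}

/* WRRAM n: store byte lane n (0-2) of the buffer into data RAM. */
static void cx4_wrram(uint32_t buffer, uint8_t ram[CX4_DATA_RAM_SIZE],
                      unsigned lane, unsigned address)
{
    assert(lane < 3 && address < CX4_DATA_RAM_SIZE);
    ram[address] = (uint8_t)(buffer >> (lane * 8));
}
```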
28 P Page Select 15* RW 00FF
Are you sure this is P and not the upper 15 bits of IP/PC? You say P, so I'm going with that, but it's good to confirm, I guess.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Near »

Also ...

000011F. xxxxxxxx	0C	BEQ	imm	Branch on Equal (Z=1)
000100F. xxxxxxxx	10	BGE	imm	Branch on Greater or Equal (T=1)
000101F. xxxxxxxx	14	BMI	imm	Branch on Minus (N=1)
000110F. xxxxxxxx	18	BVS	imm	Branch on Overflow (V=1)
We have the unused bit at d8 ... why the heck isn't this used to implement BNE, BLT, BPL, BVC? Augh ... so ridiculous.

I'm very torn between implementing the unused bits as opcode mirrors versus as no operations. Overload, what is your confidence in the .s being mirrors? Eg I notice for NOP, it could really have 11 dots instead of 10 to fill in a missing gap there.

01011001 ........	59	EXTS	A	Sign Extension (8 bits)		N Z
01011010 ........	5A	EXTS	A	Sign Extension (16 bits)		N Z
Wow, this one's a lot more limited than I thought. I figured it would take a register-or-immediate. I guess it would have been awkward since there were no shift bits. Also, I wasn't setting flags on these, fun.
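A sketch of the two sign-extension forms on a 24-bit A, setting N and Z from the result. The flag struct and function name are hypothetical; it assumes N is bit 23 of the result and Z means the whole 24-bit result is zero.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z; } cx4_nz;  /* hypothetical flag holder */

/* Sign-extend the low 8 or 16 bits of A to 24 bits and update N/Z. */
static uint32_t cx4_sign_extend(uint32_t a, unsigned bits, cx4_nz *flags)
{
    uint32_t sign = 1u << (bits - 1);
    uint32_t low = a & ((1u << bits) - 1);
    uint32_t result = ((low ^ sign) - sign) & 0xFFFFFF;  /* classic xor/sub trick */
    flags->n = (result & 0x800000) != 0;
    flags->z = result == 0;
    return result;
}
```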
Overload
Posts: 47
Joined: Mon May 30, 2011 4:38 pm
Location: Australia

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Overload »

nocash wrote:Very nice document. That was available in 2011??? I've implemented the CX4 stuff in no$sns in December 2011, back then I've only found two txt files (attached below), maybe that were much earlier versions of your doc? The cx4.html file appears in the internet archive starting at 2013, though I've missed it back then, too, and never heard about it until today : /
Yes it was. It's been on the internet since the 17th of November 2011; that's the date listed on the main page. Until yesterday the site hadn't been updated since 2011. I also dumped the CX4 data ROM on the 8th of June 2011, in case you are interested.
nocash wrote:Going by ikari's findings, opcode 4000h does merely do an address increment (it isn't RDBUS).
This was segher's interpretation of the opcode. As it says in the notes, I couldn't get it to work, and reading from $2e only moved zero into MBR, as was the case for all addresses from $0 to $5f.
nocash wrote:What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it? Or wait, is it some sort of "signed carry"? Ie. the Sign and Overflow flags being merged into a single flag?
Mnemonics are based on Hitachi namings. "Greater Than Flag" is a carry Flag.

Cx4info.txt is segher's doc and cx4opcodes.txt is from the "CX4 Program ROM thread" on byuu's forum. Some of those are my comments, including the exploit I used to dump the data ROM.
Last edited by Overload on Thu Sep 06, 2018 10:26 pm, edited 2 times in total.
Overload
Posts: 47
Joined: Mon May 30, 2011 4:38 pm
Location: Australia

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Overload »

byuu wrote:"Most" feels like a strong overstatement.
Maybe not most but a lot of it was already known. I have information that is unpublished as well. I'm pretty sure some of that information was already in the "CX4 Program ROM" thread on your forums.
byuu wrote:On your instruction set ... you're missing $1c00 (wait for external bus reads), and it seems there's permanent confusion about the sequence of $612e,$4000,$1c00's purpose. So ... it's definitely not 100% complete. Again, this is awesome stuff and I'm not trying to be negative.
Is it confirmed, is there definite proof that it is a wait instruction and not simply a nop or double nop?
byuu wrote: For XNOR, when imm is 8-bit, is it a = a ^ ~(uint8_t)imm, or is it a & ~imm? Eg what happens to bits 8-23 of the 8-bit immediate value?
The immediate value is zero extended to 24 bits and the whole 24bits inverted.
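That description written out as a helper (the name is hypothetical). Because ~(a ^ b) == a ^ ~b, inverting the zero-extended immediate and XORing gives the same result as a full 24-bit XNOR.

```c
#include <stdint.h>

/* XNOR A,imm: zero-extend the 8-bit immediate to 24 bits,
   invert all 24 bits, then XOR into A. */
static uint32_t cx4_xnor_imm(uint32_t a, uint8_t imm)
{
    uint32_t operand = (uint32_t)imm;      /* zero-extend */
    return (a ^ ~operand) & 0xFFFFFF;      /* keep the 24-bit datapath */
}
```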
byuu wrote:Holy heck I was missing an entire flag o_O

Can you please share the algorithm for computing this flag? Is it just the standard?
standard overflow
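Assuming "standard overflow" means the usual two's-complement rule applied at bit 23 (the sign bit of a 24-bit value), a sketch of both cases. The helper names are made up; operands are assumed to be masked to 24 bits already, and this ignores the A<<s shift and the SUB/SUBR operand order.

```c
#include <stdbool.h>
#include <stdint.h>

#define CX4_SIGN 0x800000u  /* bit 23: sign bit of a 24-bit value */

/* a + b overflows when both inputs share a sign and the result's sign differs. */
static bool cx4_add_overflow(uint32_t a, uint32_t b)
{
    uint32_t result = (a + b) & 0xFFFFFF;
    return (~(a ^ b) & (a ^ result) & CX4_SIGN) != 0;
}

/* a - b overflows when the inputs differ in sign and the result's sign differs from a. */
static bool cx4_sub_overflow(uint32_t a, uint32_t b)
{
    uint32_t result = (a - b) & 0xFFFFFF;
    return ((a ^ b) & (a ^ result) & CX4_SIGN) != 0;
}
```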
byuu wrote:I guess you're probably not too interested in updating the document at this point with more details, heh.
I just updated it with some more info. I will update it if any of the information turns out to be incorrect or is proven, or if I feel like adding more info. I am still involved in emulation; kindred is still an active project.
byuu wrote:It really feels like 18 would be a data ROM address, but obviously that isn't necessary ...
I think they threw darts at a dart board when they were deciding the address numbers.
byuu wrote:... is it really the case that $e0xx can write to all registers, but not $e1xx? That makes zero sense ... I mean, really ... absolutely zero sense. Why on earth would they impose such a limitation? Same for 62xx/61xx and reads. In the read case, does $61xx from registers 00-5f just always return 0, or just not set the MBR at all, or mirror the GPRs throughout the whole space?
It does in the sense that MBR, the target is a Special Purpose Register and logically you can't be reading and writing to an SPR at the same time as they would exist in the same array. The General Purpose Registers must exist in a separate array with a datapath between them and the SPR array. You can't transfer from SPR to SPR. I hope that makes sense?
byuu wrote: Pretty sure it's 8x23-bits, not 16x16+-bits.
Definitely 8x23-bits, I tested that. More than 8 pushes will overwrite previous stack pushes.
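A sketch of an 8-level circular stack with 23-bit entries, where a ninth push overwrites the oldest entry. Which way the pointer moves, and where it starts after reset, are assumptions; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t entry[8];
    unsigned sp;  /* index of the next free slot, wraps modulo 8 (assumed) */
} cx4_stack;

static void cx4_push(cx4_stack *s, uint32_t value)
{
    s->entry[s->sp] = value & 0x7FFFFF;  /* keep 23 bits */
    s->sp = (s->sp + 1) & 7;             /* wrap: push 9 overwrites push 1 */
}

static uint32_t cx4_pop(cx4_stack *s)
{
    s->sp = (s->sp - 1) & 7;             /* unsigned wrap from 0 to 7 */
    return s->entry[s->sp];
}
```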
byuu wrote:Also, what the heck with the data RAM? Isn't it just 3072x8-bits? The way the RDRAM and WRRAM instructions work doesn't jive with it being 4 x ... x 16-bits at all. If it were 1 x ... 24-bits, then we wouldn't need L,M,H (L,H,B) variant instructions, presumably. And if it were 16-bits, we would only have two instead of three of them ...
I can't remember whether it was I or somebody else who came up with that. Maybe it has something to do with how it is laid out on the die. I'll have to think about it some.
byuu wrote:Are sure this is P and not the upper 15-bits of IP/PC? You say P, so I'm going with that, but good to confirm I guess.
Correct. Not the upper bits of IP/PC.
Overload
Posts: 47
Joined: Mon May 30, 2011 4:38 pm
Location: Australia

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Overload »

byuu wrote: We have the unused bit at d8 ... why the heck isn't this used to implement BNE, BLT, BPL, BVC? Augh ... so ridiculous.

I'm very torn between implementing the unused bits as opcode mirrors versus as no operations. Overload, what is your confidence in the .s being mirrors? Eg I notice for NOP, it could really have 11 dots instead of 10 to fill in a missing gap there.
I was looking at these last night. My test program and kindred both have the following, which has got me scratching my head. It's been so long since I worked on this.

000010F0 xxxxxxxx	08	BRA	imm	Branch (Always)
000011F0 xxxxxxxx	0C	BEQ	imm	Branch on Equal (Z=1)
000100F0 xxxxxxxx	10	BGE	imm	Branch on Greater or Equal (T=1)
000101F0 xxxxxxxx	14	BMI	imm	Branch on Minus (N=1)
000110F0 xxxxxxxx	18	BVS	imm	Branch on Overflow (V=1)
001010F0 xxxxxxxx	28	BSR	imm	Branch Subroutine
001011F0 xxxxxxxx	2C	BSREQ	imm	Branch Subroutine on Equal (Z=1)
001100F0 xxxxxxxx	30	BSRGE	imm	Branch Subroutine on Greater or Equal (T=1)
001101F0 xxxxxxxx	34	BSRMI	imm	Branch Subroutine on Minus (N=1)
001110F0 xxxxxxxx	38	BSRVS	imm	Branch Subroutine on Overflow (V=1)
I would think that these are probably correct as the test program ran in parallel with the hardware. So $09xx, $0Bxx, $0Dxx, etc.. I have as nops.
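Encoding that reading as a decoder sketch: bits 15-10 pick the branch, bit 9 is the F bit, and anything with bit 8 set is treated as a NOP. The enum and function names are hypothetical, and this doesn't try to say what F actually does.

```c
#include <stdint.h>

typedef enum {
    BR_NONE,  /* NOP (bit 8 set) or not a branch opcode */
    BR_BRA, BR_BEQ, BR_BGE, BR_BMI, BR_BVS,
    BR_BSR, BR_BSREQ, BR_BSRGE, BR_BSRMI, BR_BSRVS
} cx4_branch;

static cx4_branch cx4_decode_branch(uint16_t op, unsigned *far)
{
    if (op & 0x0100) return BR_NONE;  /* $09xx, $0Bxx, ... treated as NOPs */
    *far = (op >> 9) & 1;             /* the F bit */
    switch (op >> 10) {
    case 0x02: return BR_BRA;
    case 0x03: return BR_BEQ;
    case 0x04: return BR_BGE;
    case 0x05: return BR_BMI;
    case 0x06: return BR_BVS;
    case 0x0A: return BR_BSR;
    case 0x0B: return BR_BSREQ;
    case 0x0C: return BR_BSRGE;
    case 0x0D: return BR_BSRMI;
    case 0x0E: return BR_BSRVS;
    default:   return BR_NONE;        /* outside the branch group */
    }
}
```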
How can we be certain that $04xx is a nop, it could be a mirror of $1cxx :wink:
There are gaps everywhere that all seem to be nops, so many nops.
byuu wrote: Wow, this one's a lot more limited than I thought. I figured it would take a register-or-immediate. I guess it would have been awkward since there were no shift bits. Also, I wasn't setting flags on these, fun.
I hope it doesn't break anything :D
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by nocash »

I am slowly working through the new findings. Phew, there are quite a lot of details that need to be changed... and I'll need to change most things four times: for assembler + disassembler + emulator + specifications : (
Overload wrote:Mnemonics are based on Hitachi namings. "Greater Than Flag" is a carry Flag.
The Greater naming is really weird. First off, it should be called GreaterOrEqual; second, Less/Greater conventionally implies signed comparisons (as opposed to Above/Below or Higher/Lower for unsigned comparisons), but as far as I understand, the CX4 "Greater" is meant to be unsigned. So please rename it to Carry (or add a caution saying that Greater doesn't actually mean Greater).
Oh, and, just to be sure: Hitachi didn't release any actual CX4 opcode specs, or did they?

I am almost 100% sure that opcode 4000h just does "inc ext_ptr". Ikari said so, and I have guessed/used it that way since no$sns v1.0, too. And the CX4 disassembly doesn't make much sense otherwise (here and there it uses that opcode just for incrementing ext_ptr, without doing any memory access in those places).
I am quite sure that you can test opcode 4000h with your hardware setup and won't need a clock signal for it.
The actual memory access should consist of opcodes 612Eh+1C00h, that might actually hang on your hardware (assuming that they need a clock source for the waitstate counter).
Ikari seems to have encountered crashes when using 612Eh without a trailing 1C00h. Maybe one could replace 1C00h with four NOPs (equivalent to the usual 4 waitstates), or maybe the hardware still screws up somehow despite the NOPs.
Overload wrote:Is it confirmed, is there definite proof that it is a wait instruction and not simply a nop or double nop?
Yes, I would say so. Ikari seems to have tested the timings with different waitstate settings (affecting the cycles for 1C00h), and also with/without 4000h (1C00h taking one cycle less if opcode 4000h had already taken up one cycle). For details, search for "1C00" and "4000" in ikari's specs.
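One way to model what those timings suggest, as a hypothetical helper: 1C00h only stalls for the wait states still outstanding, so each cycle spent in between (for example on 4000h) shortens the stall by one. This is a guess consistent with the "one cycle less" observation, not a verified model.

```c
/* Hypothetical: cycles 1C00h still has to wait after a 612Eh bus read,
   given the waitstate setting and the cycles already spent in between. */
static unsigned cx4_wait_cycles(unsigned waitstates, unsigned cycles_in_between)
{
    return cycles_in_between < waitstates ? waitstates - cycles_in_between : 0;
}
```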

---

The older specs (in the cx4opcodes.txt file posted above) did include "reg=00h" for using "A" as operand.
The newer specs (cx4.html file) leaves reg=00h undefined.
Which one is correct?

If "A" can be used then one could do stuff like "ADD A,A*2,A" (aka multiply by 3) or "CMP A,A" (aka clear N,Z flags).
If "A" cannot be used then CPU emulation would be much easier/faster (as I have "A" stored in a 80x86 register).

Alongside that, I've noticed that the older specs did permit accessing internal ROM/RAM via [reg], and that existing code actually uses "[A]" in that place, but the newer specs say that those opcodes can use only [A] (i.e. not [reg]).
I guess that might have been the reason for originally believing that "reg=00h" would mean "A".
Last edited by nocash on Fri Sep 07, 2018 7:50 pm, edited 2 times in total.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Near »

I'm pretty sure some of that information was already in the "CX4 Program ROM" thread on your forums.
That's indeed quite possible. It was a hectic time. We tore through all of the DSPs in short order.
The immediate value is zero extended to 24 bits and the whole 24bits inverted.
Ah, good, so it should follow traditional XNOR equality, then: ~(A^B) == ~A^B == A^~B
I hope that makes sense?
Mostly, yes. I'll trust your expertise and add the limitation, then.
It's been so long since I worked on this.
Indeed ... the time goes by so fast anymore.

Nonetheless, I greatly appreciate you jumping back into this and rehashing it for nocash and me.
I regret it took me so long to get back to the Cx4, but your help is truly invaluable here, thank you.
I hope it doesn't break anything
Genius me, I decided to rewrite the entire CPU core so it isn't a mess of else if((addr&mask)==pattern) blocks. So I'm sure I'll end up breaking all kinds of things :3
Phew, there are quite a lot of details that need to be changed... and I'll need to change most things four times: for assembler + disassembler + emulator + specifications : (
Right? >_<

01100001 .xxxxxxx	61	MOV	MBR, reg**	Data Transfer Instruction	MBR = reg**
01100010 ....xxxx	62	MOV	MAR, Rx	Data Transfer Instruction	MAR = Rx
So bizarre that both aren't just ....xxxx Rx, or both 011.xxxx reg**
What exactly happens on MOV MBR,reg[00-5f]? Does it get loaded with zero?

11100000 .xxxxxxx	E0	MOV	reg, A	Data Transfer Instruction	reg = A
11100001 .xxxxxxx	E1	MOV	reg**, MBR	Data Transfer Instruction	reg** = MBR
And then no E2,E3 ... this is truly a bizarre architecture.

...

I wonder what happens if we read past the end of data RAM ... whether it mirrors (000-3ff,400-7ff,800-bff,800-bff) or just returns zeroes.
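The two candidate behaviours, sketched for a 12-bit data RAM address (the DPR width in the register table earlier in the thread). Neither is confirmed on hardware, and the function names are hypothetical.

```c
#include <stdint.h>

/* Candidate 1, mirroring: $c00-$fff reads back $800-$bff,
   i.e. the 000-3ff, 400-7ff, 800-bff, 800-bff mapping. */
static unsigned cx4_dram_mirror(unsigned address)
{
    address &= 0xFFF;  /* 12-bit address */
    return address >= 0xC00 ? address - 0x400 : address;
}

/* Candidate 2, open area: reads past the 3072 bytes return zero. */
static uint8_t cx4_dram_read_or_zero(const uint8_t ram[3072], unsigned address)
{
    address &= 0xFFF;
    return address < 0xC00 ? ram[address] : 0;
}
```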
Last edited by Near on Fri Sep 07, 2018 8:03 pm, edited 1 time in total.
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by nocash »

byuu wrote:I mean, what other bus could it access? It's not like the Cx4 can transfer data to VRAM. The only valid transfers seem to be cart ROM->cart RAM or Cx4 DRAM, cart RAM->DRAM, or Cx4 DRAM->cart RAM. Unknown if DMA to/from IO registers work, but hey, maybe. Then you get the fun possibility of a DMA transfer firing off a DMA transfer.

Are the DMA transfers staggered like SNES DMA? Eg:

cycle 1: read source[0]
cycle 2: write target[0], read source[1]
cycle n+1: write target[n-1], read source[n]
cycle n+2: write target[n]
SNES DMA doesn't work like that. The two basic DMA schemes are this:

 Transfer with src+dst on different address busses:
  1st cycle: read source[0], write dest[0]   ;aka [dst+0]=[src+0]
  2nd cycle: read source[1], write dest[1]   ;aka [dst+1]=[src+1]
 Transfer with src+dst on same address busses:
  1st cycle: read source[0], write temp      ;aka temp=[src+0]
  2nd cycle: read temp, write dest[0]        ;aka [dst+0]=temp
  3rd cycle: read source[1], write temp      ;aka temp=[src+1]
  4th cycle: read temp, write dest[1]        ;aka [dst+1]=temp
The first case is faster, and SNES DMA works like that. And I would assume that CX4 DMA to other address bus does also work as so (so the DMA would probably take only 1+WS cycles per byte) (though ikari wasn't perfectly clear about DMA timings, it might be also 0+WS, or 2+WS or whatever).
The second case is slower, and it's used for CX4 DMA on same bus (same bus means CartROM to CartRAM) (as opposed to internal CX4RAM which probably doesn't use the Cart bus). I guess transfer time would be 1+WS1+1+WS2 per byte (again, ikari wasn't too clear, it might be also 0+WS1+0+WS2 or whatever).
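Those per-byte estimates turned into a total, as a sketch. Both formulas are the guesses above (the real figure might be 0+WS, 2+WS, etc.), and the helper names are hypothetical.

```c
/* Source and destination on different buses: 1+WS cycles per byte,
   where WS is the wait state of the cart-side access. */
static unsigned long cx4_dma_cycles_diff_bus(unsigned long length, unsigned ws_cart)
{
    return length * (1ul + ws_cart);
}

/* CartROM -> CartRAM on the same bus: 1+WS1+1+WS2 cycles per byte. */
static unsigned long cx4_dma_cycles_same_bus(unsigned long length,
                                             unsigned ws_rom, unsigned ws_ram)
{
    return length * ((1ul + ws_rom) + (1ul + ws_ram));
}
```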
byuu wrote:What exactly is the purpose of $7f49-7f4b? You can address every possible location with $7f4d-7f4f already. Does $7f49-7f4b get taken into account with program RAM page tagging?
Main purpose is probably just using it as a linker base address (so the CX4 code can be linked more easily with SNES code). It affects only opcode fetches, not data fetches, i.e. it doesn't work out for cases where the CX4 code reads data from 24-bit CartROM addresses; hence the changed bank numbers in the CX4 code of MegamanX2 vs MegamanX3.
Aside from linking, the base address might also help on fitting 16bit program bank numbers into 8bit immediates (even if the CX4 code is located at higher memory addresses).
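To illustrate the immediate-size point, a sketch of how an opcode fetch address might be formed relative to the base. The formula (base + page * 512 + pc * 2, i.e. 256 16-bit opcodes per page) is an assumption, not something confirmed in this thread, and the function name is hypothetical. Because the page is relative to the base, an 8-bit page immediate reaches the first 256 pages after the base wherever the code sits.

```c
#include <stdint.h>

/* Assumed: fetch address = program base ($7f49-7f4b) + page * 512 + pc * 2. */
static uint32_t cx4_fetch_address(uint32_t base, uint16_t page, uint8_t pc)
{
    return (base + ((uint32_t)page << 9) + ((uint32_t)pc << 1)) & 0xFFFFFF;
}
```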

But what do you mean by Program RAM? I don't know of a way to execute CX4 in CartRAM, nor in internal CX4RAM, do you?
If there's a way to do such a thing then it would likewise require changing some special flag. Just changing the "ROM address" into a "RAM address" probably won't do it (since CX4 also needs different opcodes when reading data from CartROM vs CartRAM).
Hmmmm, or well, ikari seems to be saying that DMA works for CartROM+CartRAM (as far as I can see without needing to change any flags for CartROM vs CartRAM vs InternalCX4RAM). And also says that $7f48 could cache CartROM and CartRAM (though the part about "CPU misc. (caching)" does mention ROM only).
Don't know if there's a way to execute code in CartRAM (but existing retail carts don't have any CartRAM installed anyways).
byuu wrote:What happens if I write register $20? Can I actually simulate a (short) PC jump that way?
Might be so. If it works, then it would probably end up with a 1-2 cycle branch delay (like branch delays on MIPS processors).
Not that it would be too useful (or good practice) to do such things.
byuu wrote:With $2e and $2f, what happens if I change the bus address during the fetch period?
The whole idea about the "inc ext_ptr" opcode is that you do change the bus address during the fetch period. You could probably also issue more than one "inc ext_ptr" during fetch.
There might be some restrictions on using opcodes other than "inc ext_ptr" during the fetch period (for example, maybe things could screw up when accessing r0-r15 during the fetch).
byuu wrote:Just to be clear, are the ROM/RAM waitstates the exact # of cycles required, or the # of ADDITIONAL cycles required, above 1? Eg WS1=3, is that "the data's ready in 3 cycles", or "the data's ready in 4 cycles"?
Concerning opcodes it's "1+WS" cycles, ikari is listing a bunch of test cases (at the end of his doc), and also mentions 250ns for WS=4.
Concerning DMA it's probably also 1+WS (or 1+WS1+1+WS2 for same bus dma).
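A quick sanity check of the 1+WS reading against the 250ns figure, assuming a 20 MHz Cx4 clock (50ns per cycle). The helper is hypothetical.

```c
/* External access time under the "1+WS cycles" reading. */
static unsigned cx4_access_ns(unsigned ws)
{
    const unsigned ns_per_cycle = 50;  /* assumption: 20 MHz clock */
    return (1 + ws) * ns_per_cycle;
}
```

With WS=4 this gives 5 cycles, or 250ns.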
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Near »

The first case is faster, and SNES DMA works like that.
It can't work like that. To write a value, the data bus has to be valid for the entire duration of the cycle. And to read a value, the data bus won't be populated for some time, potentially up to the entire length of the cycle, but usually, in DMA's case, halfway through the cycle (4 of 8 clocks.)

SNES DMA + HDMA very very conveniently have an extra cycle (8 clocks) that suddenly makes sense if you consider the second, staggered approach. Otherwise under your model, there's an extra 8 clocks of setup time that doesn't really seem necessary.

I'm not too sure how we could prove this either way. It's possible to read $2137 to latch the counters during a DMA B bus read, but not to latch the counters from a DMA B bus write, so we can't simply use that to determine when the relevant reads and write occur within the DMA itself.

I guess we'd need a logic analyzer to prove whose theory is correct here.
Aside from linking, the base address might also help on fitting 16bit program bank numbers into 8bit immediates (even if the CX4 code is located at higher memory addresses).
Oooh, very clever! This sounds like the most plausible reason to me. I hadn't thought of that.
But what do you mean by Program RAM?
Oh, that's just what I'm calling the 2x256x16-bit instruction cache.
The whole idea about the "inc ext_ptr" opcode is that you do change the bus address during the fetch period.
Yeah, so we pretty much *need* to cache the current bus address while we wait for the value to be populated.

............

01100001 .xxxxxxx	61	MOV	MBR, reg**	Data Transfer Instruction	MBR = reg**
** Registers 60-7F only.
Implementing this breaks sprites in Rockman X2's opening sequence, the 2 on the X2 title screen, and I stopped looking after that.

Allowing it to access 00-7f makes the game work again.

The instruction being executed is $612e, which is of course the first of the three-opcode sequence for reading from the bus.
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: Some tidbits about the Cx4 (attn: byuu, nocash)

Post by Near »

For whatever this is worth ...

0000 00.. .... ....  NOP
0000 01.. .... ....  ???
0000 10f. dddd dddd  JMP imm
0000 11f. dddd dddd  JMP EQ,imm
0001 00f. dddd dddd  JMP GE,imm
0001 01f. dddd dddd  JMP MI,imm
0001 10f. dddd dddd  JMP VS,imm
0001 11.. .... ....  WAIT
0010 00.. .... ....  ???
0010 0100 .... ...t  SKIP V
0010 0101 .... ...t  SKIP C
0010 0110 .... ...t  SKIP Z
0010 0111 .... ...t  SKIP N
0010 10f. dddd dddd  JSR imm
0010 11f. dddd dddd  JSR EQ,imm
0011 00f. dddd dddd  JSR GE,imm
0011 01f. dddd dddd  JSR MI,imm
0011 10f. dddd dddd  JSR VS,imm
0011 11.. .... ....  RTS
0100 00.. .... ....  INC MAR
0100 01.. .... ....  ???
0100 10ss .rrr rrrr  CMPR A<<s,reg
0100 11ss iiii iiii  CMPR A<<s,imm
0101 00ss .rrr rrrr  CMP A<<s,reg
0101 01ss iiii iiii  CMP A<<s,imm
0101 1000 .... ....  ???
0101 1001 .... ....  SXB
0101 1010 .... ....  SXW
0101 1011 .... ....  ???
0101 11.. .... ....  ???
0110 0000 .rrr rrrr  LD A,reg
0110 0001 .rrr rrrr  LD MDR,reg
0110 0010 .rrr rrrr  LD MAR,reg
0110 0011 .... rrrr  LD P,Rn
0110 0100 iiii iiii  LD A,imm
0110 0101 iiii iiii  LD MDR,imm
0110 0110 iiii iiii  LD MAR,imm
0110 0111 iiii iiii  LD P,imm
0110 1000 .... ....  RDRAM 0,A
0110 1001 .... ....  RDRAM 1,A
0110 1010 .... ....  RDRAM 2,A
0110 1011 .... ....  ???
0110 1100 iiii iiii  RDRAM 0,imm
0110 1101 iiii iiii  RDRAM 1,imm
0110 1110 iiii iiii  RDRAM 2,imm
0110 1111 .... ....  ???
0111 00.. .... ....  RDROM A
0111 01ii iiii iiii  RDROM imm
0111 10.. .... ....  ???
0111 1100 iiii iiii  LD PL,imm
0111 1101 .iii iiii  LD PH,imm
0111 111. .... ....  ???
1000 00ss .rrr rrrr  ADD A<<s,reg
1000 01ss iiii iiii  ADD A<<s,imm
1000 10ss .rrr rrrr  SUBR A<<s,reg
1000 11ss iiii iiii  SUBR A<<s,imm
1001 00ss .rrr rrrr  SUB A<<s,reg
1001 01ss iiii iiii  SUB A<<s,imm
1001 10.. .rrr rrrr  MUL reg
1001 11.. iiii iiii  MUL imm
1010 00ss .rrr rrrr  XNOR A<<s,reg
1010 01ss iiii iiii  XNOR A<<s,imm
1010 10ss .rrr rrrr  XOR A<<s,reg
1010 11ss iiii iiii  XOR A<<s,imm
1011 00ss .rrr rrrr  AND A<<s,reg
1011 01ss iiii iiii  AND A<<s,imm
1011 10ss .rrr rrrr  OR A<<s,reg
1011 11ss iiii iiii  OR A<<s,imm
1100 00.. .rrr rrrr  SHR A,reg
1100 01.. ...i iiii  SHR A,imm
1100 10.. .rrr rrrr  ASR A,reg
1100 11.. ...i iiii  ASR A,imm
1101 00.. .rrr rrrr  ROR A,reg
1101 01.. ...i iiii  ROR A,imm
1101 10.. .rrr rrrr  SHL A,reg
1101 11.. ...i iiii  SHL A,imm
1110 0000 .rrr rrrr  ST reg,A
1110 0001 .rrr rrrr  ST reg,MDR
1110 001. .... ....  ???
1110 01.. .... ....  ???
1110 1000 .... ....  WRRAM 0,A
1110 1001 .... ....  WRRAM 1,A
1110 1010 .... ....  WRRAM 2,A
1110 1011 .... ....  ???
1110 1100 iiii iiii  WRRAM 0,imm
1110 1101 iiii iiii  WRRAM 1,imm
1110 1110 iiii iiii  WRRAM 2,imm
1110 1111 .... ....  ???
1111 00.. .... rrrr  SWAP A,Rn
1111 01.. .... ....  ???
1111 10.. .... ....  CLEAR
1111 11.. .... ....  HALT
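One way to keep a listing like this and a decoder in sync is to derive each mask/match pair from the pattern strings themselves, instead of hand-writing mask tests. A sketch with only a handful of rows copied from the table; the struct and function names are hypothetical, and since rows are checked in order, more specific patterns would need to come first once the full table is added.

```c
#include <stddef.h>
#include <stdint.h>

/* '0'/'1' are fixed bits; any other character (., r, i, s, f, d, t) is a
   don't-care or operand bit; spaces are ignored. */
typedef struct { const char *pattern; const char *name; uint16_t mask, match; } cx4_op;

static cx4_op cx4_ops[] = {
    {"0000 00.. .... ....", "NOP"},
    {"0000 10f. dddd dddd", "JMP imm"},
    {"0001 11.. .... ....", "WAIT"},
    {"0011 11.. .... ....", "RTS"},
    {"0100 00.. .... ....", "INC MAR"},
    {"0110 0001 .rrr rrrr", "LD MDR,reg"},
    {"1000 00ss .rrr rrrr", "ADD A<<s,reg"},
    {"1111 10.. .... ....", "CLEAR"},
    {"1111 11.. .... ....", "HALT"},
};

/* Fill in mask/match for every row from its pattern string. */
static void cx4_build_table(void)
{
    for (size_t i = 0; i < sizeof cx4_ops / sizeof cx4_ops[0]; i++) {
        uint16_t mask = 0, match = 0;
        for (const char *p = cx4_ops[i].pattern; *p; p++) {
            if (*p == ' ') continue;
            mask <<= 1;
            match <<= 1;
            if (*p == '0' || *p == '1') {
                mask |= 1;
                match |= (uint16_t)(*p == '1');
            }
        }
        cx4_ops[i].mask = mask;
        cx4_ops[i].match = match;
    }
}

/* Return the first matching row, or NULL for an opcode not in the table. */
static const cx4_op *cx4_decode(uint16_t op)
{
    for (size_t i = 0; i < sizeof cx4_ops / sizeof cx4_ops[0]; i++)
        if ((op & cx4_ops[i].mask) == cx4_ops[i].match) return &cx4_ops[i];
    return NULL;
}
```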
To analyze the unknown instructions ...

0000 01.. .... ....  probably a valid instruction; but possibly NOP is 0000 0... .... ....
0010 00.. .... ....  very likely to be a valid instruction
0100 01.. .... ....  perhaps INC MDR, INC P, or INC DPR?
0101 1000 .... ....  would be sign-extend 0-bits ... may set A to zero?
0101 1011 .... ....  would be sign-extend 24-bits (no change) ... may set N/Z flags?
0101 11.. .... ....  very likely to be a valid instruction
0110 1011 .... ....  would be RDRAM 3,A ... most likely a no-op
0110 1111 .... ....  would be RDRAM 3,imm ... most likely a no-op
0111 10.. .... ....  could be a RDROM variant?
0111 111. .... ....  could be wasted on LD P[16-23],[24-31] which doesn't exist ...
1110 001. .... ....  likely to be ST reg,MAR and ST reg,P
1110 01.. .... ....  very likely to be a valid instruction
1110 1011 .... ....  would be WRRAM 3,A ... most likely a no-op
1110 1111 .... ....  would be WRRAM 3,imm ... most likely a no-op
1111 01.. .... ....  almost guaranteed to be a valid instruction; sitting between SWAP and CLEAR