Some tidbits about the Cx4 (attn: byuu, nocash)
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Cool! That sped up the spinning wireframe head in the intro which was quite demanding before.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Almost everything in this thread is now emulated in the bsnes-plus master branch, excluding some of the details about $6000-7fff memory access while the Cx4 is running. I might add that stuff later, but I don't think it's anything that either MMX2 or MMX3 relies on to work correctly.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
A year late, but not forgotten ...
Questions in the event anyone knows.
Is it confirmed in LoROM mode that the Cx4 bus can't see ROM at $c0-ff:0000-ffff?
Quote: DMA source and destination CAN reference the same bus but not the same chip
I mean, what other bus could it access? It's not like the Cx4 can transfer data to VRAM. The only valid transfers seem to be cart ROM->cart RAM or Cx4 DRAM, cart RAM->DRAM, or Cx4 DRAM->cart RAM. Unknown if DMA to/from IO registers work, but hey, maybe. Then you get the fun possibility of a DMA transfer firing off a DMA transfer.
Are the DMA transfers staggered like SNES DMA? Eg:
cycle 1: read source[0]
cycle 2: write target[0], read source[1]
cycle n: write target[n-1], read source[n]
cycle n+1: write target[n]
Or are they more like this?
cycle 1: read source[0]
cycle 2: write target[0]
cycle n*2: read source[n]
cycle n*2+1: write target[n]
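To make the difference between the two candidate schemes concrete, here is a minimal sketch that only counts cycles for an n-byte transfer under each model. It ignores waitstates entirely, the function names are made up for illustration, and the zero-length case is a placeholder since the real behavior for length zero is one of the open questions.

```c
#include <stdint.h>

/* Staggered model: the write of byte i overlaps the read of byte i+1,
 * so n bytes take n + 1 cycles (one extra cycle drains the last write).
 * Returning 0 for n == 0 is a placeholder, not known hardware behavior. */
static uint32_t dma_cycles_staggered(uint32_t n) {
    return n == 0 ? 0 : n + 1;
}

/* Sequential model: every byte needs a read cycle followed by a
 * separate write cycle, so n bytes take 2 * n cycles. */
static uint32_t dma_cycles_sequential(uint32_t n) {
    return 2 * n;
}
```

For a single byte both models cost two cycles; the gap only opens up on longer transfers, which is why a timing test would need a reasonably large length to tell them apart.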
What exactly is the purpose of $7f49-7f4b? You can address every possible location with $7f4d-7f4f already. Does $7f49-7f4b get taken into account with program RAM page tagging?
What happens if I write $7f48, but the requested page is already cached? Will it forcefully reload it again? Seems like we'd kind of want it to, right? Eg if you were loading from cart RAM, the data may be modified.
Is $7f4d-7f4f cached to an internal program counter register, or is $7f4d-7f4f the actual program counter itself? Eg if I write to it while the Cx4 is running, do horrible things happen?
Does writing to $7f53 get the Cx4 out of a lockup from a bad DMA transfer as well?
Quote: $7f55: Any write access indefinitely suspends the Cx4 (registers can be read and written but the CPU shows no reaction: no buffering occurs, no code is run and $7f53 is not updated). Cx4 status bit 0 is set.
Apparently $7f53 is updated if status bit 0 is set.
What happens if I write register $20? Can I actually simulate a (short) PC jump that way? Very bizarre it's not the full 24-bit PC, but given the way $7f4d-7f4f are laid out (bytes 8-15, bytes 16-23, THEN bytes 0-7), I suspect the PC is only 8-bit, and the extra 16-bits are just the "active" P register.
With $2e and $2f, what happens if I change the bus address during the fetch period? Will it screw up the read/write, or is the address cached? I suspect it must be cached, because if $4000 (increment bus address) occurs before the wait, then well ... that's not good, is it?
nocash doesn't seem to have the ability to test on real hardware, so with:
http://problemkaputt.de/fullsnes.htm#sn ... cx4opcodes
6000h+nnoooooooo ?? ??? mov A/ext_dta/?/prg_page,<op>
Where does the prg_page part come from? It's always so hard to tell in nocash's docs what's speculation and what's confirmed behavior. MMX2/3 doesn't use $63xx instructions, so I don't see how he'd know that. It's not obvious from other instructions that it's some sort of pattern.
Just to be clear, are the ROM/RAM waitstates the exact # of cycles required, or the # of ADDITIONAL cycles required, above 1? Eg WS1=3, is that "the data's ready in 3 cycles", or "the data's ready in 4 cycles"?
What happens if I start a DMA or cache page operation while the Cx4 is active? Will it take priority over the Cx4 instruction processing, stay pending until the Cx4 instructions are halted, or just do nothing?
What happens if the bus address used for $2e isn't ROM, or for $2f isn't RAM? Will it just transform the address and access said bus ANYWAY, ignore the operation, or lock the chip up?
What happens if a DMA crosses from RAM into ROM during the transfer?
Are the DMA source/length/target values updated during a transfer?
What does a DMA transfer length of zero do?
What's the actual overhead of DMA transfers per byte? Eg you say "EXTRA waitstates" on cart ROM/RAM accesses, so does that imply that accesses to internal RAM take one cycle? Or that the wait states are really waitstates+1?
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011. Information regarding register mapping and the entire instruction set was published. The complete instruction set has been on the snes9x dsp website for 7 years.
http://users.tpg.com.au/advlink/dsp/cx4.html
The only thing I couldn't test on my mashmods flash programmer was RDBUS which most likely needs the snes clock signal to function.
I didn't find anything at $2e or $2f it could be that these addresses also require a clocking signal from the snes?
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Overload wrote: Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011. Information regarding register mapping and the entire instruction set was published. The complete instruction set has been on the snes9x dsp website for 7 years.
http://users.tpg.com.au/advlink/dsp/cx4.html
The only thing I couldn't test on my mashmods flash programmer was RDBUS which most likely needs the snes clock signal to function.
I didn't find anything at $2e or $2f it could be that these addresses also require a clocking signal from the snes?
Very nice document. That was available in 2011??? I implemented the CX4 stuff in no$sns in December 2011; back then I only found two txt files (attached below), maybe those were much earlier versions of your doc? The cx4.html file appears in the internet archive starting in 2013, though I missed it back then, too, and never heard about it until today : /
Going by ikari's findings, opcode 4000h does merely do an address increment (it isn't RDBUS).
The CLEAR opcode and XNOR opcodes are new (to me), good to know about them.
Some opcodes can use only "Rx" (r0-r15) instead of "reg" (all registers)? Nasty : (
Having info about affected flags is useful (and knowing about the overflow flag to exist).
What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it? Or wait, is it some sort of "signed carry"? Ie. the Sign and Overflow flags being merged into a single flag?
Attachments: Cx4info.txt (1.17 KiB), cx4opcodes.txt (2.93 KiB)
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Quote: Most of this information (except timing info) was made available when segher and I reverse engineered the CX4 chip back in 2011.
"Most" feels like a strong overstatement.
Your page states the meaning of register $28, and has a few more opcodes (super, super awesome, by the way, thank you so much!) ... but there's nothing about program RAM caching, the extra MMIO registers, wait state on ROM/RAM accesses, or answers to any? of my questions.
On your instruction set ... you're missing $1c00 (wait for external bus reads), and it seems there's permanent confusion about the sequence of $612e,$4000,$1c00's purpose. So ... it's definitely not 100% complete. Again, this is awesome stuff and I'm not trying to be negative.
Code: Select all
101000SS .xxxxxxx A0 XNOR A, reg Inverse Exclusive OR A = A ^ !reg N Z
101001SS xxxxxxxx A4 XNOR A, imm Inverse Exclusive OR A = A ^ !imm N Z
Code: Select all
V Overflow Flag 0
Can you please share the algorithm for computing this flag? Is it just the standard?
ADD: V = ~(A ^ data) & (A ^ result) & 0x8000;
SUB: V = ~(A ^ data) & (A ^ result) & 0x8000;
Or is it the slightly modified variant?
ADD: V = ~(A ^ data) & (A ^ result) & 0x8000;
SUB: V = (A ^ data) & (A ^ result) & 0x8000;
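For comparison, here is a small sketch of the textbook two's-complement overflow rule, where the subtract case drops the inversion on the first term. It assumes 24-bit registers with bit 23 as the sign bit (so the mask is 0x800000 rather than the 0x8000 written above); that width, the macro names, and the function names are assumptions for illustration, not confirmed Cx4 behavior.

```c
#include <stdbool.h>
#include <stdint.h>

#define CX4_MASK 0xFFFFFFu  /* assumed 24-bit register width */
#define CX4_SIGN 0x800000u  /* assumed sign bit (bit 23) */

/* Add overflows when both operands share a sign and the result's sign differs. */
static bool cx4_overflow_add(uint32_t a, uint32_t data) {
    uint32_t result = (a + data) & CX4_MASK;
    return (~(a ^ data) & (a ^ result) & CX4_SIGN) != 0;
}

/* Subtract overflows when the operands differ in sign and the
 * result's sign differs from the minuend's sign. */
static bool cx4_overflow_sub(uint32_t a, uint32_t data) {
    uint32_t result = (a - data) & CX4_MASK;
    return ((a ^ data) & (a ^ result) & CX4_SIGN) != 0;
}
```

With these definitions 0x7FFFFF + 1 and 0x800000 - 1 both report overflow, while 0xFFFFFF + 1 (that is, -1 + 1) does not.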
I guess you're probably not too interested in updating the document at this point with more details, heh.
Code: Select all
03 MBR Memory Buffer Register 8* RW 00
08 ROMB Immediate ROM 24 R DATA_ROM[0]
0C RAMB RAM Buffer 24 RW 000000
13 MAR Memory Address Register 24 RW FFFFFF
1C DPR RAM Address 12* RW 000
Code: Select all
01100001 .xxxxxxx 61 MOV MBR, reg** Data Transfer Instruction MBR = reg**
01100010 ....xxxx 62 MOV MAR, Rx Data Transfer Instruction MAR = Rx
Code: Select all
11100000 .xxxxxxx E0 MOV reg, A Data Transfer Instruction reg = A
11100001 .xxxxxxx E1 MOV reg**, MBR Data Transfer Instruction reg** = MBR
nocash wrote: What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it?
Yeah, it seems to be. Your document also says:
Quote: Carry Flag is CLEARED on borrow (ie. opposite as on 80x86 CPUs).
I have it as:
int r = ri() - sa();
cf = r >= 0;
I really don't think MMX2/3 would be playable if we had the carry flag implemented incorrectly ... right?
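As a standalone illustration of the "carry is cleared on borrow" convention in the snippet above, here is a minimal sketch. It assumes unsigned 24-bit operands, and the function name is hypothetical; it is not taken from any emulator.

```c
#include <stdbool.h>
#include <stdint.h>

#define CX4_MASK 0xFFFFFFu  /* assumed 24-bit register width */

/* Returns the carry flag after computing a - data:
 * set when no borrow occurs (a >= data, unsigned), cleared on borrow.
 * This is the opposite of the x86 carry-as-borrow convention. */
static bool cx4_carry_after_sub(uint32_t a, uint32_t data) {
    int32_t r = (int32_t)(a & CX4_MASK) - (int32_t)(data & CX4_MASK);
    return r >= 0;
}
```

Under this convention a compare of equal values leaves carry set, which is what makes a branch on T=1 behave as an unsigned greater-or-equal test.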
Also from your docs:
Quote: Call Stack is reportedly 16 levels deep, at least 16bits per level.
Pretty sure it's 8x23-bits, not 16x16+-bits.
Quote: Stack Circular, 8 x 23-Bits
Internal RAM 4 x 384 x 16-Bits
Also, what the heck with the data RAM? Isn't it just 3072x8-bits? The way the RDRAM and WRRAM instructions work doesn't jibe with it being 4 x ... x 16-bits at all. If it were 1 x ... 24-bits, then we wouldn't need L,M,H (L,H,B) variant instructions, presumably. And if it were 16-bits, we would only have two instead of three of them ...
Quote: 28 P Page Select 15* RW 00FF
Are you sure this is P and not the upper 15-bits of IP/PC? You say P, so I'm going with that, but good to confirm I guess.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Also ...
Code: Select all
000011F. xxxxxxxx 0C BEQ imm Branch on Equal (Z=1)
000100F. xxxxxxxx 10 BGE imm Branch on Greater or Equal (T=1)
000101F. xxxxxxxx 14 BMI imm Branch on Minus (N=1)
000110F. xxxxxxxx 18 BVS imm Branch on Overflow (V=1)
We have the unused bit at d8 ... why the heck isn't this used to implement BNE, BLT, BPL, BVC? Augh ... so ridiculous.
I'm very torn between implementing the unused bits as opcode mirrors versus as no operations. Overload, what is your confidence in the .s being mirrors? Eg I notice for NOP, it could really have 11 dots instead of 10 to fill in a missing gap there.
Code: Select all
01011001 ........ 59 EXTS A Sign Extension (8 bits) N Z
01011010 ........ 5A EXTS A Sign Extension (16 bits) N Z
Wow, this one's a lot more limited than I thought. I figured it would take a register-or-immediate. I guess it would have been awkward since there were no shift bits. Also, I wasn't setting flags on these, fun.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
nocash wrote: Very nice document. That was available in 2011??? I've implemented the CX4 stuff in no$sns in December 2011, back then I've only found two txt files (attached below), maybe that were much earlier versions of your doc? The cx4.html file appears in the internet archive starting at 2013, though I've missed it back then, too, and never heard about it until today : /
Yes it was. It's been on the internet since the 17th of November 2011, that's the date listed on the main page. Until yesterday the site hadn't been updated since 2011. I also dumped the CX4 data ROM on the 8th of June 2011 in case you are interested.
nocash wrote: Going by ikari's findings, opcode 4000h does merely do an address increment (it isn't RDBUS).
This was segher's interpretation of the opcode. As it says in the notes, I couldn't get it to work, and reading from $2e only moved zero into MBR, as was the case for all addresses from $0 to $5f.
nocash wrote: What is a "Greater Than Flag"??? Going by the opcode descriptions it's "Greater or Equal (T=1)". If it's greater or equal... then it's just a regular "Carry Flag" isn't it? Or wait, is it some sort of "signed carry"? Ie. the Sign and Overflow flags being merged into a single flag?
Mnemonics are based on Hitachi namings. "Greater Than Flag" is a carry Flag.
Cx4info.txt is segher's doc and cx4opcodes.txt is from the "CX4 Program ROM thread" on byuu's forum. Some of those are my comments, including the exploit I used to dump the data ROM.
Last edited by Overload on Thu Sep 06, 2018 10:26 pm, edited 2 times in total.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
byuu wrote: "Most" feels like a strong overstatement.
Maybe not most but a lot of it was already known. I have information that is unpublished as well. I'm pretty sure some of that information was already in the "CX4 Program ROM" thread on your forums.
byuu wrote: On your instruction set ... you're missing $1c00 (wait for external bus reads), and it seems there's permanent confusion about the sequence of $612e,$4000,$1c00's purpose. So ... it's definitely not 100% complete. Again, this is awesome stuff and I'm not trying to be negative.
Is it confirmed, is there definite proof that it is a wait instruction and not simply a nop or double nop?
byuu wrote: For XNOR, when imm is 8-bit, is it a = a ^ ~(uint8_t)imm, or is it a & ~imm? Eg what happens to bits 8-23 of the 8-bit immediate value?
The immediate value is zero extended to 24 bits and the whole 24 bits inverted.
byuu wrote: Holy heck I was missing an entire flag o_O
Can you please share the algorithm for computing this flag? Is it just the standard?
Standard overflow.
byuu wrote: I guess you're probably not too interested in updating the document at this point with more details, heh.
I just updated it with some more info. I will update it if any of the information is incorrect or proven, or if I feel like adding more info. I am still involved in emulation, kindred is still an active project.
byuu wrote: It really feels like 18 would be a data ROM address, but obviously that isn't necessary ...
I think they threw darts at a dart board when they were deciding the address numbers.
byuu wrote: ... is it really the case that $e0xx can write to all registers, but not $e1xx? That makes zero sense ... I mean, really ... absolutely zero sense. Why on earth would they impose such a limitation? Same for 62xx/61xx and reads. In the read case, does $61xx from registers 00-5f just always return 0, or just not set the MBR at all, or mirror the GPRs throughout the whole space?
It does in the sense that MBR, the target, is a Special Purpose Register, and logically you can't be reading and writing to an SPR at the same time as they would exist in the same array. The General Purpose Registers must exist in a separate array with a datapath between them and the SPR array. You can't transfer from SPR to SPR. I hope that makes sense?
byuu wrote: Pretty sure it's 8x23-bits, not 16x16+-bits.
Definitely 8x23-bits, I tested that. More than 8 pushes will overwrite previous stack pushes.
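One way to picture the "more than 8 pushes overwrite previous pushes" behavior is a ring buffer of eight 23-bit slots, sketched below. The type and function names are hypothetical, and the pop behavior after a wrap is an assumption of this model rather than something tested on hardware.

```c
#include <stdint.h>

#define CX4_STACK_DEPTH 8u
#define CX4_STACK_MASK  0x7FFFFFu  /* 23 bits per entry */

typedef struct {
    uint32_t slot[CX4_STACK_DEPTH];
    unsigned top;  /* index of the next free slot, wraps modulo 8 */
} Cx4Stack;

/* Push stores into the next slot and wraps; a ninth push lands on
 * the oldest entry and silently replaces it. */
static void cx4_stack_push(Cx4Stack *s, uint32_t value) {
    s->slot[s->top] = value & CX4_STACK_MASK;
    s->top = (s->top + 1) % CX4_STACK_DEPTH;
}

/* Pop steps back one slot (also wrapping) and returns its contents. */
static uint32_t cx4_stack_pop(Cx4Stack *s) {
    s->top = (s->top + CX4_STACK_DEPTH - 1) % CX4_STACK_DEPTH;
    return s->slot[s->top];
}
```

In this model, pushing the values 1 through 9 and then popping nine times yields 9, 8, 7, 6, 5, 4, 3, 2, 9: the first return address is gone for good.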
byuu wrote: Also, what the heck with the data RAM? Isn't it just 3072x8-bits? The way the RDRAM and WRRAM instructions work doesn't jive with it being 4 x ... x 16-bits at all. If it were 1 x ... 24-bits, then we wouldn't need L,M,H (L,H,B) variant instructions, presumably. And if it were 16-bits, we would only have two instead of three of them ...
I can't remember whether it was I or somebody else who came up with that. Maybe it has something to do with how it is laid out on the die. I'll have to think about it some.
byuu wrote: Are sure this is P and not the upper 15-bits of IP/PC? You say P, so I'm going with that, but good to confirm I guess.
Correct. Not the upper bits of IP/PC.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
byuu wrote: We have the unused bit at d8 ... why the heck isn't this used to implement BNE, BLT, BPL, BVC? Augh ... so ridiculous.
I'm very torn between implementing the unused bits as opcode mirrors versus as no operations. Overload, what is your confidence in the .s being mirrors? Eg I notice for NOP, it could really have 11 dots instead of 10 to fill in a missing gap there.
I was looking at these last night. My test program and kindred both have the following, which has got me scratching my head. It's been so long since I worked on this.
Code: Select all
000010F0 xxxxxxxx 08 BRA imm Branch (Always)
000011F0 xxxxxxxx 0C BEQ imm Branch on Equal (Z=1)
000100F0 xxxxxxxx 10 BGE imm Branch on Greater or Equal (T=1)
000101F0 xxxxxxxx 14 BMI imm Branch on Minus (N=1)
000110F0 xxxxxxxx 18 BVS imm Branch on Overflow (V=1)
001010F0 xxxxxxxx 28 BSR imm Branch Subroutine
001011F0 xxxxxxxx 2C BSREQ imm Branch Subroutine on Equal (Z=1)
001100F0 xxxxxxxx 30 BSRGE imm Branch Subroutine on Greater or Equal (T=1)
001101F0 xxxxxxxx 34 BSRMI imm Branch Subroutine on Minus (N=1)
001110F0 xxxxxxxx 38 BSRVS imm Branch Subroutine on Overflow (V=1)
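Reading the bit patterns in the listing above, the ten branch opcodes appear to share one layout: bits 15-14 zero, bit 13 selecting the subroutine variants, bits 12-10 selecting the condition, bit 9 the F flag, bit 8 the disputed bit, and bits 7-0 the immediate. The sketch below splits an opcode into those fields. The struct and function names are made up, the meaning of F is left opaque, and whether a set bit 8 is a mirror or a no-op is deliberately not decided here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    is_call;  /* bit 13: set for the BSR family, clear for BRA family */
    uint8_t cond;     /* bits 12-10: 2=always, 3=Z, 4=T, 5=N, 6=V */
    bool    f;        /* bit 9: the "F" flag from the listing */
    bool    bit8;     /* bit 8: listed as 0 here, as '.' in the older table */
    uint8_t imm;      /* bits 7-0: branch target immediate */
} Cx4Branch;

/* Returns true and fills *out if op matches one of the ten branch
 * patterns in the listing; returns false for anything else. */
static bool cx4_decode_branch(uint16_t op, Cx4Branch *out) {
    unsigned cond = (op >> 10) & 7u;
    if ((op & 0xC000u) != 0 || cond < 2 || cond > 6) {
        return false;
    }
    out->is_call = (op & 0x2000u) != 0;
    out->cond    = (uint8_t)cond;
    out->f       = (op & 0x0200u) != 0;
    out->bit8    = (op & 0x0100u) != 0;
    out->imm     = (uint8_t)(op & 0xFFu);
    return true;
}
```

A caller that wants the strict reading of this listing can reject opcodes where bit8 is set; one that treats the dots as mirrors can ignore the field.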
How can we be certain that $04xx is a nop? It could be a mirror of $1cxx.
There are gaps everywhere that all seem to be nops, so many nops.
byuu wrote: Wow, this one's a lot more limited than I thought. I figured it would take a register-or-immediate. I guess it would have been awkward since there were no shift bits. Also, I wasn't setting flags on these, fun.
I hope it doesn't break anything
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
I am slowly working through the new findings. Phew, there are quite a lot of details that need to be changed... and I'll need to change most things four times: for assembler + disassembler + emulator + specifications : (
Overload wrote: Mnemonics are based on Hitachi namings. "Greater Than Flag" is a carry Flag.
The Greater naming is really weird: first off, it should be called GreaterOrEqual, and second, Less/Greater conventionally implies signed comparisons (as opposed to Above/Below or Higher/Lower for unsigned comparisons), but as far as I understand the CX4 "Greater" is meant to be unsigned. So, please rename it to Carry (or add some caution saying that Greater doesn't actually mean Greater).
Oh, and, just to be sure: Hitachi didn't release any actual CX4 opcode specs, or did they?
I am almost 100% sure that opcode 4000h just does "inc ext_ptr". Ikari said so, and I have guessed/used it that way since no$sns v1.0, too. And the CX4 disassembly doesn't make too much sense otherwise (here and there it uses that opcode just for incrementing the ext_ptr, without actually doing any memory access in those places).
I am quite sure that you can test opcode 4000h with your hardware setup and won't need a clock signal for it.
The actual memory access should consist of opcodes 612Eh+1C00h; that might actually hang on your hardware (assuming that they need a clock source for the waitstate counter).
Ikari seems to have encountered crashes when using 612Eh without trailing 1C00h. Though maybe one could replace 1C00h by four NOPs (equivalent to the usual 4 waitstates), or maybe the hardware still screws up somehow despite the NOPs.
Overload wrote: Is it confirmed, is there definite proof that it is a wait instruction and not simply a nop or double nop?
Yes, I would say so. Ikari seems to have tested the timings with different waitstate settings (affecting the cycles for 1C00h), and also with/without 4000h (taking one cycle less on 1C00h if opcode 4000h was already taking up one cycle). For details, search for "1C00" and "4000" in ikari's specs.
---
The older specs (in the cx4opcodes.txt file posted above) did include "reg=00h" for using "A" as operand.
The newer specs (cx4.html file) leaves reg=00h undefined.
Which one is correct?
If "A" can be used then one could do stuff like "ADD A,A*2,A" (aka multiply by 3) or "CMP A,A" (aka clear N,Z flags).
If "A" cannot be used then CPU emulation would be much easier/faster (as I have "A" stored in a 80x86 register).
Alongside that, I've noticed that the older specs did permit accessing internal rom/ram via [reg], and that existing code is actually using "[A]" in that place - but the newer specs say that those opcodes can use only [A] (ie. not [reg]).
I guess that might have been the reason for originally believing that "reg=00h" would mean "A".
Last edited by nocash on Fri Sep 07, 2018 7:50 pm, edited 2 times in total.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Quote: I'm pretty sure some of that information was already in the "CX4 Program ROM" thread on your forums.
That's indeed quite possible. It was a hectic time. We tore through all of the DSPs in short order.
Quote: The immediate value is zero extended to 24 bits and the whole 24bits inverted.
Ah, good, so it should follow traditional XNOR equality, then: ~(A^B) == ~A^B == A^~B
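The equality is easy to check in code. The sketch below applies the rule as described (zero-extend the 8-bit immediate to 24 bits, invert all 24 bits, XOR with A) and also provides the ~(A^B) form for comparison. The 24-bit mask and the function names are assumptions for illustration.

```c
#include <stdint.h>

#define CX4_MASK 0xFFFFFFu  /* assumed 24-bit register width */

/* XNOR A, imm as A ^ ~imm: the immediate is zero-extended first,
 * so bits 8-23 of the inverted operand are all ones. */
static uint32_t cx4_xnor_imm(uint32_t a, uint8_t imm) {
    uint32_t operand = (uint32_t)imm;  /* zero extension */
    return (a ^ ~operand) & CX4_MASK;
}

/* The same operation written as ~(A ^ B), for comparison. */
static uint32_t cx4_xnor_alt(uint32_t a, uint8_t imm) {
    return ~(a ^ (uint32_t)imm) & CX4_MASK;
}
```

A visible consequence of the zero extension is that XNOR with an 8-bit immediate always flips bits 8-23 of A, so XNOR A,#$00 turns 0x000000 into 0xFFFFFF.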
Quote: I hope that makes sense?
Mostly, yes. I'll trust your expertise and add the limitation, then.
Quote: It's been so long since I worked on this.
Indeed ... the time goes by so fast anymore.
Nonetheless, I greatly appreciate you jumping back into this and rehashing this for nocash and me.
I regret it took me so long to get back to the Cx4, but your help is truly invaluable here, thank you.
Quote: I hope it doesn't break anything
Genius me, I decided to rewrite the entire CPU core to not be a mess of else if((addr&mask)==pattern) blocks. So I'm sure I'll end up breaking all kinds of things :3
Quote: Phew, there are a quite a lot of details that need to be changed... and I'll need to change most things four times: for assembler + disassembler + emulator + specifications : (
Right? >_<
Code: Select all
01100001 .xxxxxxx 61 MOV MBR, reg** Data Transfer Instruction MBR = reg**
01100010 ....xxxx 62 MOV MAR, Rx Data Transfer Instruction MAR = Rx
What exactly happens on MOV MBR,reg[00-5f]? Does it get loaded with zero?
Code: Select all
11100000 .xxxxxxx E0 MOV reg, A Data Transfer Instruction reg = A
11100001 .xxxxxxx E1 MOV reg**, MBR Data Transfer Instruction reg** = MBR
...
I wonder what happens if we read past the end of data RAM ... whether it mirrors (000-3ff,400-7ff,800-bff,800-bff) or just returns zeroes.
Last edited by Near on Fri Sep 07, 2018 8:03 pm, edited 1 time in total.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
byuu wrote: I mean, what other bus could it access? It's not like the Cx4 can transfer data to VRAM. The only valid transfers seem to be cart ROM->cart RAM or Cx4 DRAM, cart RAM->DRAM, or Cx4 DRAM->cart RAM. Unknown if DMA to/from IO registers work, but hey, maybe. Then you get the fun possibility of a DMA transfer firing off a DMA transfer.
Are the DMA transfers staggered like SNES DMA? Eg:
cycle 1: read source[0]
cycle 2: write target[0], read source[1]
cycle n: write target[n-1], read source[n]
cycle n+1: write target[n]
SNES DMA doesn't work like that. The two basic DMA schemes are these:
Code: Select all
Transfer with src+dst on different address busses:
1st cycle: read source[0], write dest[0] ;aka [dst+0]=[src+0]
2nd cycle: read source[1], write dest[1] ;aka [dst+1]=[src+1]
Transfer with src+dst on same address busses:
1st cycle: read source[0], write temp ;aka temp=[src+0]
2nd cycle: read temp, write dest[0] ;aka [dst+0]=temp
3rd cycle: read source[1], write temp ;aka temp=[src+1]
4th cycle: read temp, write dest[1] ;aka [dst+1]=temp
The second case is slower, and it's used for CX4 DMA on same bus (same bus means CartROM to CartRAM) (as opposed to internal CX4RAM which probably doesn't use the Cart bus). I guess transfer time would be 1+WS1+1+WS2 per byte (again, ikari wasn't too clear, it might be also 0+WS1+0+WS2 or whatever).
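To see how far apart the two guesses for same-bus DMA are, here is a minimal sketch that computes total cycles under both the 1+WS1+1+WS2 reading and the 0+WS1+0+WS2 reading. Both formulas are the guesses from the paragraph above, not measured values, and the function names are hypothetical.

```c
#include <stdint.h>

/* Same-bus DMA, upper guess: each byte costs (1 + WS1) cycles for the
 * read plus (1 + WS2) cycles for the write. */
static uint32_t cx4_dma_same_bus_cycles(uint32_t length, uint32_t ws1, uint32_t ws2) {
    return length * (1 + ws1 + 1 + ws2);
}

/* Same-bus DMA, lower guess: the waitstate values are the whole cost. */
static uint32_t cx4_dma_same_bus_cycles_min(uint32_t length, uint32_t ws1, uint32_t ws2) {
    return length * (ws1 + ws2);
}
```

With both waitstate settings at 4, a 16-byte transfer comes out at 160 cycles under the first reading and 128 under the second, a difference large enough to show up in a timing test.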
byuu wrote: What exactly is the purpose of $7f49-7f4b? You can address every possible location with $7f4d-7f4f already. Does $7f49-7f4b get taken into account with program RAM page tagging?
Main purpose is probably just using it as linker base address (so the CX4 code can be linked more easily with SNES code; it affects only opcode fetches, not data fetches, ie. it doesn't work out for cases where the CX4 code reads data from 24bit CartROM addresses; hence the changed bank numbers in CX4 code in MegamanX2 vs MegamanX3).
Aside from linking, the base address might also help on fitting 16bit program bank numbers into 8bit immediates (even if the CX4 code is located at higher memory addresses).
But what do you mean by Program RAM? I don't know of a way to execute CX4 in CartRAM, nor in internal CX4RAM, do you?
If there's a way to do such a thing then it would likewise require changing some special flag. Just changing the "ROM address" into a "RAM address" probably won't do it (since CX4 also needs different opcodes when reading data from CartROM vs CartRAM).
Hmmmm, or well, ikari seems to be saying that DMA works for CartROM+CartRAM (as far as I can see without needing to change any flags for CartROM vs CartRAM vs InternalCX4RAM). And also says that $7f48 could cache CartROM and CartRAM (though the part about "CPU misc. (caching)" does mention ROM only).
Don't know if there's a way to execute code in CartRAM (but existing retail carts don't have any CartRAM installed anyways).
byuu wrote: What happens if I write register $20? Can I actually simulate a (short) PC jump that way?
Might be so. If it works then it would probably end up with a 1-2 cycle branch delay (alike branch delays on MIPS processors).
Not that it would be too useful (or good practice) to do such things.
byuu wrote: With $2e and $2f, what happens if I change the bus address during the fetch period?
The whole idea about the "inc ext_ptr" opcode is that you do change the bus address during the fetch period. You could probably also issue more than one "inc ext_ptr" during fetch.
There might be some restrictions about using other opcodes than "inc ext_ptr" during the fetch period (for example, maybe things could screw up when accessing r0-r15 during fetch).
byuu wrote: Just to be clear, are the ROM/RAM waitstates the exact # of cycles requires, or the # of ADDITIONAL cycles required, above 1? Eg WS1=3, is that "the data's ready in 3 cycles", or "the data's ready in 4 cycles"?
Concerning opcodes it's "1+WS" cycles, ikari is listing a bunch of test cases (at the end of his doc), and also mentions 250ns for WS=4.
Concerning DMA it's probably also 1+WS (or 1+WS1+1+WS2 for same bus dma).
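The two numbers quoted above fit together: if WS=4 means 1+4 = 5 cycles and that takes 250ns, one cycle is 50ns. The sketch below turns that into a small helper. The 50ns cycle time is derived from those two figures rather than taken from a datasheet, and the names are hypothetical.

```c
#include <stdint.h>

/* Derived from the figures in the text: 250 ns / (1 + 4) cycles. */
#define CX4_CYCLE_NS 50u

/* Cycles until data is ready for an external access with the given
 * waitstate setting, under the "1 + WS" interpretation. */
static uint32_t cx4_access_cycles(uint32_t ws) {
    return 1 + ws;
}

/* The same access expressed in nanoseconds. */
static uint32_t cx4_access_ns(uint32_t ws) {
    return cx4_access_cycles(ws) * CX4_CYCLE_NS;
}
```

Under this reading, WS1=3 means the data is ready after 4 cycles, or 200ns.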
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
Quote: The first case is faster, and SNES DMA works like that.
It can't work like that. To write a value, the data bus has to be valid for the entire duration of the cycle. And to read a value, the data bus won't be populated for some time, potentially up to the entire length of the cycle, but usually, in DMA's case, halfway through the cycle (4 of 8 clocks.)
SNES DMA + HDMA very very conveniently have an extra cycle (8 clocks) that suddenly makes sense if you consider the second, staggered approach. Otherwise under your model, there's an extra 8 clocks of setup time that doesn't really seem necessary.
I'm not too sure how we could prove this either way. It's possible to read $2137 to latch the counters during a DMA B bus read, but not to latch the counters from a DMA B bus write, so we can't simply use that to determine when the relevant reads and writes occur within the DMA itself.
I guess we'd need a logic analyzer to prove whose theory is correct here.
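To make the two candidate timings from the original question concrete, here is a small sketch that lists what each cycle would do under either model. It doesn't claim which one the hardware uses; it only shows the cycle counts (n+1 versus 2n for n transferred units). The function names are hypothetical.

```python
def staggered_schedule(n):
    """Each write overlaps the next read: n + 1 cycles for n units."""
    cycles = []
    for c in range(n + 1):
        ops = []
        if c > 0:
            ops.append(f"write target[{c - 1}]")
        if c < n:
            ops.append(f"read source[{c}]")
        cycles.append(ops)
    return cycles


def sequential_schedule(n):
    """Read and write never overlap: 2 * n cycles for n units."""
    cycles = []
    for i in range(n):
        cycles.append([f"read source[{i}]"])
        cycles.append([f"write target[{i}]"])
    return cycles


if __name__ == "__main__":
    for name, fn in (("staggered", staggered_schedule),
                     ("sequential", sequential_schedule)):
        print(name)
        for number, ops in enumerate(fn(3), start=1):
            print(f"  cycle {number}: " + ", ".join(ops))
```

For a 3-unit transfer that is 4 cycles against 6, so the difference between the models is one extra cycle in the first and n extra cycles in the second.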
Quote: Aside from linking, the base address might also help on fitting 16bit program bank numbers into 8bit immediates (even if the CX4 code is located at higher memory addresses).
Oooh, very clever! This sounds like the most plausible reason to me. I hadn't thought of that.
Quote: But what do you mean by Program RAM?
Oh, that's just what I'm calling the 2x256x16-bit instruction cache.
Quote: The whole idea about the "inc ext_ptr" opcode is that you do change the bus address during the fetch period.
Yeah, so we pretty much *need* to cache the current bus address while we wait for the value to be populated.
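A rough sketch of what that caching could look like in an emulator, purely as an assumption about how one might model it: the address is latched when the read starts, so incrementing the pointer during the fetch period doesn't disturb the read in flight. The class and all of its names are hypothetical, and the 1+WS delay is the figure discussed earlier in the thread.

```python
class BusRead:
    """Hypothetical model: the bus address is latched when a read starts."""

    def __init__(self, memory, waitstates):
        self.memory = memory
        self.waitstates = waitstates
        self.pointer = 0      # live bus address (what "inc ext_ptr" changes)
        self.latched = None   # address captured when the read began
        self.remaining = 0    # cycles left until the data is valid
        self.data = None

    def start_read(self):
        # Capture the address now; later pointer changes don't affect it.
        self.latched = self.pointer
        self.remaining = 1 + self.waitstates

    def inc_ptr(self):
        self.pointer += 1

    def tick(self):
        # Advance one cycle; deliver the data when the wait runs out.
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                self.data = self.memory[self.latched]


if __name__ == "__main__":
    bus = BusRead(memory=[0x11, 0x22, 0x33], waitstates=2)
    bus.start_read()
    bus.inc_ptr()             # pointer moves during the fetch period
    for _ in range(3):
        bus.tick()
    print(hex(bus.data), bus.pointer)
```

Here the read still returns the byte at the original address, while the pointer is already set up for the next read.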
............
Code: Select all
01100001 .xxxxxxx 61 MOV MBR, reg** Data Transfer Instruction MBR = reg**
** Registers 60-7F only.
Allowing it to access 00-7f makes the game work again.
The instruction being executed is $612e, which is of course the first of the three-opcode sequence for reading from the bus.
Re: Some tidbits about the Cx4 (attn: byuu, nocash)
For whatever this is worth ...
To analyze the unknown instructions ...
Code: Select all
0000 00.. .... .... NOP
0000 01.. .... .... ???
0000 10f. dddd dddd JMP imm
0000 11f. dddd dddd JMP EQ,imm
0001 00f. dddd dddd JMP GE,imm
0001 01f. dddd dddd JMP MI,imm
0001 10f. dddd dddd JMP VS,imm
0001 11.. .... .... WAIT
0010 00.. .... .... ???
0010 0100 .... ...t SKIP V
0010 0101 .... ...t SKIP C
0010 0110 .... ...t SKIP Z
0010 0111 .... ...t SKIP N
0010 10f. dddd dddd JSR imm
0010 11f. dddd dddd JSR EQ,imm
0011 00f. dddd dddd JSR GE,imm
0011 01f. dddd dddd JSR MI,imm
0011 10f. dddd dddd JSR VS,imm
0011 11.. .... .... RTS
0100 00.. .... .... INC MAR
0100 01.. .... .... ???
0100 10ss .rrr rrrr CMPR A<<s,reg
0100 11ss iiii iiii CMPR A<<s,imm
0101 00ss .rrr rrrr CMP A<<s,reg
0101 01ss iiii iiii CMP A<<s,imm
0101 1000 .... .... ???
0101 1001 .... .... SXB
0101 1010 .... .... SXW
0101 1011 .... .... ???
0101 11.. .... .... ???
0110 0000 .rrr rrrr LD A,reg
0110 0001 .rrr rrrr LD MDR,reg
0110 0010 .rrr rrrr LD MAR,reg
0110 0011 .... rrrr LD P,Rn
0110 0100 iiii iiii LD A,imm
0110 0101 iiii iiii LD MDR,imm
0110 0110 iiii iiii LD MAR,imm
0110 0111 iiii iiii LD P,imm
0110 1000 .... .... RDRAM 0,A
0110 1001 .... .... RDRAM 1,A
0110 1010 .... .... RDRAM 2,A
0110 1011 .... .... ???
0110 1100 iiii iiii RDRAM 0,imm
0110 1101 iiii iiii RDRAM 1,imm
0110 1110 iiii iiii RDRAM 2,imm
0110 1111 .... .... ???
0111 00.. .... .... RDROM A
0111 01ii iiii iiii RDROM imm
0111 10.. .... .... ???
0111 1100 iiii iiii LD PL,imm
0111 1101 .iii iiii LD PH,imm
0111 111. .... .... ???
1000 00ss .rrr rrrr ADD A<<s,reg
1000 01ss iiii iiii ADD A<<s,imm
1000 10ss .rrr rrrr SUBR A<<s,reg
1000 11ss iiii iiii SUBR A<<s,imm
1001 00ss .rrr rrrr SUB A<<s,reg
1001 01ss iiii iiii SUB A<<s,imm
1001 10.. .rrr rrrr MUL reg
1001 11.. iiii iiii MUL imm
1010 00ss .rrr rrrr XNOR A<<s,reg
1010 01ss iiii iiii XNOR A<<s,imm
1010 10ss .rrr rrrr XOR A<<s,reg
1010 11ss iiii iiii XOR A<<s,imm
1011 00ss .rrr rrrr AND A<<s,reg
1011 01ss iiii iiii AND A<<s,imm
1011 10ss .rrr rrrr OR A<<s,reg
1011 11ss iiii iiii OR A<<s,imm
1100 00.. .rrr rrrr SHR A,reg
1100 01.. ...i iiii SHR A,imm
1100 10.. .rrr rrrr ASR A,reg
1100 11.. ...i iiii ASR A,imm
1101 00.. .rrr rrrr ROR A,reg
1101 01.. ...i iiii ROR A,imm
1101 10.. .rrr rrrr SHL A,reg
1101 11.. ...i iiii SHL A,imm
1110 0000 .rrr rrrr ST reg,A
1110 0001 .rrr rrrr ST reg,MDR
1110 001. .... .... ???
1110 01.. .... .... ???
1110 1000 .... .... WRRAM 0,A
1110 1001 .... .... WRRAM 1,A
1110 1010 .... .... WRRAM 2,A
1110 1011 .... .... ???
1110 1100 iiii iiii WRRAM 0,imm
1110 1101 iiii iiii WRRAM 1,imm
1110 1110 iiii iiii WRRAM 2,imm
1110 1111 .... .... ???
1111 00.. .... rrrr SWAP A,Rn
1111 01.. .... .... ???
1111 10.. .... .... CLEAR
1111 11.. .... .... HALT
Code: Select all
0000 01.. .... .... probably a valid instruction; but possibly NOP is 0000 0... .... ....
0010 00.. .... .... very likely to be a valid instruction
0100 01.. .... .... perhaps INC MDR, INC P, or INC DPR?
0101 1000 .... .... would be sign-extend 0-bits ... may set A to zero?
0101 1011 .... .... would be sign-extend 24-bits (no change) ... may set N/Z flags?
0101 11.. .... .... very likely to be a valid instruction
0110 1011 .... .... would be RDRAM 3,A ... most likely a no-op
0110 1111 .... .... would be RDRAM 3,imm ... most likely a no-op
0111 10.. .... .... could be a RDROM variant?
0111 111. .... .... could be wasted on LD P[16-23],[24-31] which doesn't exist ...
1110 001. .... .... likely to be ST reg,MAR and ST reg,P
1110 01.. .... .... very likely to be a valid instruction
1110 1011 .... .... would be WRRAM 3,A ... most likely a no-op
1110 1111 .... .... would be WRRAM 3,imm ... most likely a no-op
1111 01.. .... .... almost guaranteed to be a valid instruction; sitting between SWAP and CLEAR
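If anyone wants to poke at the encodings, the bit patterns in the first table can be turned into mask/value pairs mechanically. The sketch below does that for a handful of rows copied from the table (not the full list). Anything that isn't a literal 0 or 1 is treated as a don't-care bit, and opcodes that match nothing come back as "???". The helper names are made up, and the excerpt is only there to keep the example short.

```python
# A few rows copied from the opcode table above (an excerpt, not the full list).
TABLE = """
0000 00.. .... .... NOP
0001 11.. .... .... WAIT
0100 00.. .... .... INC MAR
0110 0000 .rrr rrrr LD A,reg
0110 0001 .rrr rrrr LD MDR,reg
0110 0010 .rrr rrrr LD MAR,reg
0111 1100 iiii iiii LD PL,imm
1111 11.. .... .... HALT
"""


def compile_table(text):
    """Turn each 'bbbb bbbb bbbb bbbb NAME' row into (mask, value, name)."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split(None, 4)   # four bit groups, then the mnemonic
        bits = "".join(parts[:4])
        name = parts[4]
        mask = int("".join("1" if b in "01" else "0" for b in bits), 2)
        value = int("".join(b if b in "01" else "0" for b in bits), 2)
        rows.append((mask, value, name))
    return rows


ROWS = compile_table(TABLE)


def decode(opcode):
    """Return the mnemonic for a 16-bit opcode, or '???' if nothing matches."""
    for mask, value, name in ROWS:
        if (opcode & mask) == value:
            return name
    return "???"


if __name__ == "__main__":
    for op in (0x612E, 0x0000, 0x0400, 0xFC00):
        print(f"${op:04x} {decode(op)}")
```

With these rows, $612e (the bus-read instruction mentioned earlier) matches the LD MDR,reg pattern with the low seven bits giving register $2e, and $0400 falls into the unknown 0000 01.. group.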