All times are UTC - 7 hours



PostPosted: Mon Jun 13, 2016 1:30 am 
Online

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
lidnariq wrote:
93143 wrote:
...how would such a system handle 8-bit writes? Would it be possible to assert a write and then just not put a signal on half the data lines, or would it have to read the word, modify it, and then store it back?
Could do the same as the 68k, and have separate "upper byte" and "lower byte" strobes.

You mean writing each byte separately, in parallel but staggered by half a memory cycle? With memory constructed as a pair of 8-bit units instead of one 16-bit unit?

Yeah, I guess that'd work. Not perfect, but 8-bit writes wouldn't be horrible any more. Solves the PPU bus problem too. That's what I get for posting a stream of consciousness instead of taking the time to think it through, or at least waiting for an answer...

But the question remains - how hard would it have been to do this? Did Nintendo pass up an easy method of supercharging the console, or is there something about this that would have been prohibitive in 1990, or is there another theoretical issue I haven't thought of?


PostPosted: Mon Jun 13, 2016 2:41 am 
Offline

Joined: Mon Mar 30, 2015 10:14 am
Posts: 177
With one cycle per memory access you need a 180 ns chip (at 5.36 MHz). That is easily doable, but you would have to cut the 128 KB of WRAM to 64 or 32 KB to reduce costs.
As for ROMs, since Sega used 150 ns chips in its cartridges, I do not see why Nintendo could not have done the same.

I really think the SNES was scheduled to come out in 88/89 but was delayed by the PPU and/or SPC development, because by the 90s the 65816 could run much faster than 5 or 6 MHz, close to 14 if I remember correctly.


PostPosted: Mon Jun 13, 2016 2:59 am 
Online

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
The 8-bit WRAM is already good for 2.68 MHz. Therefore a 2x8-bit dual WRAM should be good for 5.37 MHz (with wait states for random access) if the hypothetical 16-bit memory controller can overlap low and high byte accesses, giving each one a full memory cycle. (This may assume a 7.16 MHz internal CPU speed, analogous to the current 3.58 MHz internal speed, which I suspect allows 5 master cycles for the WRAM to respond...? That would explain the ratio between the SlowROM and FastROM specifications... Okay, so everyone probably already knew this. I'm not a hardware guy, all right?)

Based on testing with the B bus WRAM gate, the S-WRAM can stably respond to at least a couple of sequential accesses at FastROM speeds, which implies at least ~150 ns performance or so.
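For anyone who wants to check the numbers being thrown around here, this is a rough sketch of the cycle-time arithmetic. It assumes the NTSC master clock of about 21.477 MHz and that the 2.68, 3.58 and 5.37 MHz figures correspond to 8, 6 and 4 master cycles per access; the function name is made up for illustration, and it says nothing about how much of each cycle the memory actually gets to respond.

```python
# Assumption: NTSC SNES master clock, roughly 21.477 MHz.
MASTER_CLOCK_HZ = 21_477_272


def cycle_ns(master_cycles):
    """Length of one bus cycle in nanoseconds for a given master-clock divisor."""
    return master_cycles / MASTER_CLOCK_HZ * 1e9


if __name__ == "__main__":
    # 8 -> 2.68 MHz (SlowROM/WRAM), 6 -> 3.58 MHz (FastROM), 4 -> hypothetical 5.37 MHz
    for divisor in (8, 6, 4):
        mhz = MASTER_CLOCK_HZ / divisor / 1e6
        print(f"{divisor} master cycles: {mhz:.2f} MHz, {cycle_ns(divisor):.1f} ns per cycle")
```

The 4-cycle case comes out a little over 186 ns, which is where a "180 ns chip at 5.36 MHz" estimate would come from if the whole cycle were available to the memory.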


Wait... it seems to me that if you were to demux the bank byte and use the full cycle for data, you could read at twice the speed (writes would be delayed half a cycle because the data's not there yet). Could the ROM in SA-1 games actually be ordinary SlowROM? Could the SNES have been designed so that the 120 ns FastROM spec was sufficient for 14.3 MHz? Or am I missing something, and it's already reading as fast as it can?


PostPosted: Mon Jun 13, 2016 5:51 am 
Offline

Joined: Thu Aug 12, 2010 3:43 am
Posts: 1589
93143 wrote:
You mean writing each byte separately, in parallel but staggered by half a memory cycle? With memory constructed as a pair of 8-bit units instead of one 16-bit unit?

The 68000 has a 16-bit bus but can do byte accesses. What it does is have a lower strobe and an upper strobe indicating which bytes it wants to access (lower strobe for low byte, upper strobe for high byte, both strobes for word). Those are two more signals (much like e.g. the address lines)

So the suggestion was to use strobes to let hardware know whether a byte or a word access is wanted =P
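A minimal sketch of the idea, modelling 16-bit memory as two 8-bit banks, each enabled by its own strobe. The class and method names are hypothetical; only the upper/lower strobe behaviour is taken from how the 68000 bus works.

```python
class DualBankMemory:
    """16-bit-wide memory built from two 8-bit banks (illustrative model only)."""

    def __init__(self, words):
        self.upper = bytearray(words)  # data lines D15-D8
        self.lower = bytearray(words)  # data lines D7-D0

    def write(self, word_index, data, upper_strobe, lower_strobe):
        # Each bank only latches its half of the bus when its strobe is asserted,
        # so a byte write never disturbs the other half of the word.
        if upper_strobe:
            self.upper[word_index] = (data >> 8) & 0xFF
        if lower_strobe:
            self.lower[word_index] = data & 0xFF

    def read_word(self, word_index):
        return (self.upper[word_index] << 8) | self.lower[word_index]


if __name__ == "__main__":
    mem = DualBankMemory(4)
    mem.write(1, 0x1234, upper_strobe=True, lower_strobe=True)   # word write
    mem.write(1, 0xAB00, upper_strobe=True, lower_strobe=False)  # high byte only
    print(hex(mem.read_word(1)))
```

No read-modify-write is needed: the byte write in the example leaves the low byte from the earlier word write untouched.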


PostPosted: Mon Jun 13, 2016 10:12 am 
Offline

Joined: Mon Mar 30, 2015 10:14 am
Posts: 177
But it cannot access an odd address... :(


PostPosted: Mon Jun 13, 2016 11:16 am 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6297
Location: Seattle
It cannot access a 16-bit word or 32-bit long at an odd address, but it does allow byte accesses at odd addresses.

As near as I can tell, basically every >8-bit cpu except x86 does the same (MIPS, ARM, POWER), and you can opt in to "fault instead of be slow" on x86: https://en.wikipedia.org/wiki/Bus_error#Example


PostPosted: Mon Jun 13, 2016 12:24 pm 
Offline

Joined: Wed May 19, 2010 6:12 pm
Posts: 2295
93143 wrote:

Wait... it seems to me that if you were to demux the bank byte and use the full cycle for data, you could read at twice the speed (writes would be delayed half a cycle because the data's not there yet). Could the ROM in SA-1 games actually be ordinary SlowROM? Could the SNES have been designed so that the 120 ns FastROM spec was sufficient for 14.3 MHz? Or am I missing something, and it's already reading as fast as it can?

I also wonder whether, if they had gotten rid of the multiplexing to begin with, they could've got the write data earlier in the cycle.


PostPosted: Mon Jun 13, 2016 12:28 pm 
Online

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 430
93143 wrote:
byuu wrote:
The SNES could have been a beast had they included a NEC uPD7725 with program and data RAM (for per-game upload of firmware) instead of ROM.

I've looked up the datasheet for the μPD77C/P25, and now I'm wondering why the DSP-1 took so long to do stuff. According to the datasheet a 16x16 signed multiply is just one of several things that can all happen in one cycle, but the SNES manual lists that same multiply as taking 26 cycles. The datasheet says 2.58 μs for a sin/cos, but the SNES manual says 7.8 μs at about the same clock speed. Is there really that much overhead involved in getting this chip to do something on demand?


Yes, yes there is. The uPD7725 has no way of implementing anything like a jump table. A significant portion of the program ROM in the DSP-1 is dedicated to command decoding, which has to be done with a tree of test-and-branches on each bit of the command in turn. That's why the DSP-1 has so many mirrored commands, and the more important commands have more mirrors than the less important ones.

ETA: Here's what command decoding in the DSP-1B looks like (comments added, obviously). You can count for yourself how many cycles it takes to decode each command.

Code:
000: 97c000 jrqm   000
001: c10007 ld     0400,sr
002: c02006 ld     0080,dr
003: c03002 ld     00c0,b
004: 97c010 jrqm   004
005: 128081 mov    dr,a
            and    dr,b
006: 91800c jnzb   003
007: c00007 ld     0000,sr
008: 0b0000 shr1   a    ; xxxxxxx*
009: 9040a0 jca    028  ; xxxxxxx1
00a: 0b0000 shr1   a    ; xxxxxx*0
00b: 904080 jca    020  ; xxxxxx10
00c: 0b0000 shr1   a    ; xxxxx*00
00d: 904060 jca    018  ; xxxxx100
00e: 0b0000 shr1   a    ; xxxx*000
00f: 90404c jca    013  ; xxxx1000
010: 0b0000 shr1   a    ; xxx*0000
011: 9006b0 jnca   1ac  ; xxx00000 0x00, 0x20
012: 9046cc jca    1b3  ; xxx10000 0x10, 0x30
013: 0b0000 shr1   a    ; xxx*1000
014: 904784 jca    1e1  ; xxx11000 0x18, 0x38
015: 0b0000 shr1   a    ; xx*01000
016: 900740 jnca   1d0  ; xx001000 0x08
017: 9046f4 jca    1bd  ; xx101000 0x28
018: 0b0000 shr1   a    ; xxxx*100
019: 904074 jca    01d  ; xxxx1100
01a: 0b0000 shr1   a    ; xxx*0100
01b: 9007d0 jnca   1f4  ; xxx00100 0x04, 0x24
01c: 904800 jca    200  ; xxx10100 0x14, 0x34
01d: 0b0000 shr1   a    ; xxx*1100
01e: 9008fc jnca   23f  ; xxx01100 0x0c, 0x2c
01f: 904940 jca    250  ; xxx11100 0x1c, 0x3c
020: 0b0000 shr1   a    ; xxxxx*10
021: 904094 jca    025  ; xxxxx110
022: 0b0000 shr1   a    ; xxxx*010
023: 9009ec jnca   27b  ; xxxx0010 0x02, 0x12, 0x22, 0x32
024: 904d80 jca    360  ; xxxx1010 0x0a, 0x1a, 0x2a, 0x3a
025: 0b0000 shr1   a    ; xxxx*110
026: 900e8c jnca   3a3  ; xxxx0110 0x06, 0x16, 0x26, 0x36
027: 905068 jca    41a  ; xxxx1110 0x0e, 0x1e, 0x2e, 0x3e
028: 0b0000 shr1   a    ; xxxxxx*1
029: 9040b8 jca    02e  ; xxxxxx11
02a: 0b0000 shr1   a    ; xxxxx*01
02b: 0b0000 shr1   a    ; xxxx*x01
02c: 901120 jnca   448  ; xxxx0x01
02d: 905224 jca    489  ; xxxx1x01
02e: 0b0000 shr1   a    ; xxxxx*11
02f: 9040cc jca    033  ; xxxxx111
030: 0b0000 shr1   a    ; xxxx*011
031: 90128c jnca   4a3  ; xxxx0011
032: 9052f4 jca    4bd  ; xxxx1011
033: 0b0000 shr1   a    ; xxxx*111
034: 0b0000 shr1   a    ; xxx*x111
035: 9053a8 jca    4ea  ; xxx1x111 0x17, 0x1f, 0x37, 0x3f
036: 0b0000 shr1   a    ; xx*0x111
037: 90133c jnca   4cf  ; xx00x111 0x07, 0x0f
038: 9053c0 jca    4f0  ; xx10x111 0x27, 0x2f
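
If you'd rather not count by hand, here is a rough Python sketch that walks the same tree. It transcribes only addresses 008 to 038 of the listing above and assumes that "shr1 a" moves the low bit of A into carry, that "jca"/"jnca" branch on carry set/clear, and that every instruction costs the same; the helper names are invented for the sketch, so treat the counts as instruction counts rather than verified cycle timings.

```python
SHR = None  # stands for "shr1 a"


def jca(target):
    return ("jca", target)   # jump if carry set


def jnca(target):
    return ("jnca", target)  # jump if carry clear


BASE = 0x008
PROGRAM = [
    SHR, jca(0x028), SHR, jca(0x020), SHR, jca(0x018), SHR, jca(0x013),  # 008-00f
    SHR, jnca(0x1AC), jca(0x1B3),                                        # 010-012
    SHR, jca(0x1E1), SHR, jnca(0x1D0), jca(0x1BD),                       # 013-017
    SHR, jca(0x01D), SHR, jnca(0x1F4), jca(0x200),                       # 018-01c
    SHR, jnca(0x23F), jca(0x250),                                        # 01d-01f
    SHR, jca(0x025), SHR, jnca(0x27B), jca(0x360),                       # 020-024
    SHR, jnca(0x3A3), jca(0x41A),                                        # 025-027
    SHR, jca(0x02E), SHR, SHR, jnca(0x448), jca(0x489),                  # 028-02d
    SHR, jca(0x033), SHR, jnca(0x4A3), jca(0x4BD),                       # 02e-032
    SHR, SHR, jca(0x4EA), SHR, jnca(0x4CF), jca(0x4F0),                  # 033-038
]


def decode(command):
    """Return (handler address, instructions executed) for a command byte."""
    a, carry, pc, steps = command & 0xFF, 0, BASE, 0
    while BASE <= pc < BASE + len(PROGRAM):
        op = PROGRAM[pc - BASE]
        steps += 1
        if op is SHR:
            carry, a = a & 1, a >> 1
            pc += 1
        else:
            kind, target = op
            taken = (carry == 1) if kind == "jca" else (carry == 0)
            pc = target if taken else pc + 1
    return pc, steps


if __name__ == "__main__":
    for command in (0x00, 0x20, 0x10, 0x01, 0x0E, 0x28):
        handler, steps = decode(command)
        print(f"command {command:#04x}: handler {handler:03x} after {steps} instructions")
```

Under those assumptions, 0x00 and its mirror 0x20 both land on 1ac after 10 instructions, while 0x28 needs 13 to reach 1bd, which shows how the mirroring falls out of bits the tree never tests.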


PostPosted: Mon Jun 13, 2016 11:42 pm 
Online

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
Sik wrote:
93143 wrote:
You mean writing each byte separately, in parallel but staggered by half a memory cycle? With memory constructed as a pair of 8-bit units instead of one 16-bit unit?

The 68000 has a 16-bit bus but can do byte accesses. What it does is have a lower strobe and an upper strobe indicating which bytes it wants to access (lower strobe for low byte, upper strobe for high byte, both strobes for word). Those are two more signals (much like e.g. the address lines)

So the suggestion was to use strobes to let hardware know whether a byte or a word access is wanted =P

Yeah, I got that. I googled it, and that's where I found the bit about dual 8-bit memories - apparently it's a popular setup for a 68000, and obviously has no trouble taking a half-word write without clobbering the other half. The SNES uses this setup for VRAM.

But it seems to me that there are additional considerations with the 65816. It uses an 8-bit bus on the CPU side, so technically all memory accesses are 8-bit. So if you want the doubled speed from using 16-bit memory, it would be faster for the memory controller to stagger writes by one CPU cycle, or half a memory cycle, so as to use each byte as soon as it comes through rather than waiting until the whole word is ready. Getting any smarter than that seems like it would require CPU emulation in the glue logic.

AWJ wrote:
The uPD7725 has no way of implementing anything like a jump table.

Why not map an input register to the program space? Check for a new command and branch to the instruction register, which contains a jump to the desired program.

I shouldn't spend any time studying this chip right now. Busy busy...


PostPosted: Tue Jun 14, 2016 12:11 am 
Offline

Joined: Thu Aug 12, 2010 3:43 am
Posts: 1589
I think the point was that you couldn't do indirect jumps at all (i.e. address has to be hardcoded in the opcode), otherwise you could just copy the address from a table into a register and jump there. I guess self-modifying code wasn't an option either, right?


PostPosted: Tue Jun 14, 2016 12:19 am 
Online

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 430
That's what I meant. The uPD7725 has no indirect or computed jumps, only absolute. And it can't execute out of RAM; it's a pure Harvard architecture with program ROM, data ROM and data RAM as completely separate address spaces.


PostPosted: Tue Jun 14, 2016 1:07 am 
Online

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
No, I meant map an input port into the DSP's program ROM space, to allow the S-CPU to write in an absolute jump instruction (or the relevant part of one, the rest being ordinary ROM). The thing can at least do conditional jumps, so a normal input register could be checked for a 'start' command and the result used to branch to the mapped address. Or, better yet, you could map the input port directly to the address section of the conditional jump and eliminate the extra instruction...

Even a single externally writable byte in the program memory would be enough to implement a jump table, and two bytes would make a jump table unnecessary by allowing a direct jump to anywhere in the ROM (though that would require the S-CPU to write a 16-bit value, and S-CPU cycles might be more precious than DSP cycles).

I think there's an SPC loading program that does something similar (but far more timing-sensitive), having the S-SMP jump to the I/O ports so the S-CPU can operate it like a puppet to load the last few bytes of the RAM image...

I'm not sure exactly how tightly integrated the DSP and its program memory are in a hardware sense, or how easy it would be to insert this level of customization... though it strikes me that byuu's program RAM idea would trivially allow this sort of stunt if you used dual-ported memory...
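
To make the idea concrete, here is a purely hypothetical sketch of program memory with one externally writable jump target. Nothing here reflects how the real chip is wired; the class, the tuple instruction format and the addresses are all invented for illustration.

```python
class PatchableProgramMemory:
    """Program ROM where one address reads back a jump whose target the host sets."""

    def __init__(self, rom, patch_address):
        self.rom = dict(rom)                # address -> instruction (fixed ROM contents)
        self.patch_address = patch_address  # where the input port is mapped
        self.port = 0                       # target last written by the host CPU

    def host_write(self, target):
        # The host (the S-CPU in this scenario) writes the handler address it wants.
        self.port = target

    def fetch(self, address):
        # The opcode part of the jump is fixed; only the target comes from the port.
        if address == self.patch_address:
            return ("jmp", self.port)
        return self.rom[address]


if __name__ == "__main__":
    mem = PatchableProgramMemory({0x000: ("wait_for_start", None)}, patch_address=0x001)
    mem.host_write(0x1AC)
    print(mem.fetch(0x001))
```

With something like this, dispatch would cost one fetched jump instead of a walk through a test-and-branch tree, at the price of the host having to know the handler addresses.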

