It is currently Thu Oct 19, 2017 4:10 am

All times are UTC - 7 hours






Post new topic Reply to topic  [ 147 posts ]  Go to page Previous  1 ... 6, 7, 8, 9, 10  Next
Author Message
PostPosted: Thu May 26, 2016 6:58 pm 
Offline

Joined: Sat Jul 12, 2014 3:04 pm
Posts: 936
ehaliewicz wrote:
tepples wrote:
The fastest you could scroll on a GS was about 10 to 15 fps using the obscure "PEA field" technique that overlays the stack on the hardware frame buffer, and the back buffer is stored backwards as a block of self-modifying code.


Another guy used the same kind of technique to get what looks like 30+fps scrolling.
http://iigs.dreamhosters.com/gte/gte.html
https://www.youtube.com/watch?v=IsXPn6OCMF8
8-)

Both very fascinating! ...I wonder what happened to the last bits.


PostPosted: Sun Jun 05, 2016 4:36 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 784
byuu wrote:
The SNES could have been a beast had they included a NEC uPD7725 with program and data RAM (for per-game upload of firmware) instead of ROM.

I've looked up the datasheet for the μPD77C/P25, and now I'm wondering why the DSP-1 took so long to do stuff. According to the datasheet a 16x16 signed multiply is just one of several things that can all happen in one cycle, but the SNES manual lists that same multiply as taking 26 cycles. The datasheet says 2.58 μs for a sin/cos, but the SNES manual says 7.8 μs at about the same clock speed. Is there really that much overhead involved in getting this chip to do something on demand?

Also, at 50 mA peak, this thing draws as much current as a SNES mouse. I suppose that rules out its use with the Super FX or SA-1...

...

Regarding the tvtropes page, the CPU entry reads like this (I deleted the extra bit about how the slow CPU was an attempt to save money by shoving the load off onto special chips in cartridges):
Quote:
*Like the NES, the Super NES has a Central Processing Unit for main data processing, and a Picture Processing Unit for the graphics. Also like the NES, the Super NES CPU and PPU have a master clock speed of 21.477 MHz, but the CPU divides it down to between 1.79, 2.68 and 3.58 MHz due to slow (cheap) cartridge ROM, and it was cheaper to make the system with said clock speed. This led to the belief that the SNES is a slow system, and that too much on screen action would slow it down.

I was thinking about putting an expandable note at the end of that section, as follows:
Quote:
The reality seems to have been a bit more complicated. The 65C816 was more cycle-efficient than the 68000, especially with typical game logic of the era, meaning the difference in clock speed with the Sega Genesis was less important than it looked. However, the 65C816 wasn't nearly as popular/widespread or easy to use, and hackers have reported some fairly boneheaded programming in commercial games, particularly in early releases. In addition, while the graphics processor in the SNES was more powerful than its competition and loaded with features, it was complex and fiddly to work with. It was certainly possible to put a lot of action on screen in a SNES game without slowdown, as demonstrated by later games like ''Rendering Ranger R2''.

I should probably also finesse the part that implies that 1.79 MHz ROM access was a thing... Is it true that a game that relies exclusively on autopoll need never run at 1.79 MHz? I mean, the autopoll doesn't interrupt the CPU, right?
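For reference, the three CPU speeds in the quoted entry are just the 21.477 MHz master clock divided down. Here is a quick sketch of that arithmetic. The divisors of 12, 8 and 6 master clocks per CPU cycle are my assumption; the entry itself doesn't state them.

```python
# Master clock as quoted in the entry, in MHz.
MASTER_CLOCK_MHZ = 21.477

# Assumption (not stated in the quoted text): a CPU cycle lasts
# 12, 8 or 6 master clocks depending on what is being accessed.
DIVISORS = (12, 8, 6)

def cpu_speeds(master_mhz=MASTER_CLOCK_MHZ, divisors=DIVISORS):
    """Return {divisor: effective CPU clock in MHz}."""
    return {div: master_mhz / div for div in divisors}

if __name__ == "__main__":
    for div, mhz in cpu_speeds().items():
        # 1000 / MHz gives the length of one CPU cycle in nanoseconds.
        print(f"master / {div}: {mhz:.2f} MHz, {1000.0 / mhz:.0f} ns per cycle")
```

Which divisor applies at any moment depends on the address being accessed, so the CPU isn't locked to a single speed.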

The sub-entry after the main CPU entry used to read:
Quote:
** The processor itself was a 65C816, a 16-bit successor to the 6502 used in the NES, Apple II, Commodore 64, and Atari consoles and computers. Nintendo actually used Apple IIGS computers as development systems, since they also used the 65C816.

The recent edits added the following to the sub-entry:
Quote:
Since the 6502 family has only one accumulator register, every operation that uses a second operand must reference the RAM. Accessing the RAM is limited by the 8-bit data bus. Therefore, 16-bit operations were slower than 8 bit operations, but the 16-bit operations were still faster than emulating them with 8-bit instructions.

I was thinking of replacing that with an expandable note that reads:
Quote:
This explains the lower clock speeds vs. the Sega Genesis. The 6502 was designed as a budget processor when RAM was faster than CPUs, and thus it used an accumulator-based architecture with very simple instructions that required it to access the bus almost every cycle. The 65C816 inherited this operational paradigm, along with the 8-bit data bus that forced it to access code and data one byte at a time (this meant that 16-bit operations were somewhat slower than 8-bit operations, though still far faster than the sequence of 8-bit operations that would be required to do the same task). The 68000, by contrast, used more complex instructions that took more cycles to execute, and accessed its 16-bit data bus only once every four CPU cycles, relying on its array of internal general-purpose registers to keep processing speed up. This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU while using slower, cheaper memory. Note however that the fast turnaround of the 65C816 makes it more powerful at a given clock speed than the 68000, so the advantage isn't as big as it looks (bus throughput is nearly identical on both systems; ironically the more sophisticated 68000 is better at moving bulk data and the more primitive 65C816 is better at navigating complicated logic).

The advantage of an expandable note is that it allows long explanations without cluttering the default view with walls of text, and is thus less likely to get flagged as "natter" and cut. Still, that's an awfully long explanation...

Comments?


PostPosted: Tue Jun 07, 2016 12:35 am 
Offline

Joined: Mon Mar 30, 2015 10:14 am
Posts: 175
Quote:
This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU while using slower, cheaper memory.

I think this is correct for a CPU-only system (like the Atari ST or the Apple IIGS), but when things like DMA are involved, you need fast RAM/ROM too; the CPU is no longer the only thing that counts.
The MD's WRAM is 150 ns, and the ROM also needs to be 150 ns (if you use DMA). The PCE needs 140 ns RAM/ROM with its 7.16 MHz CPU.

The 68k was more expensive than the 816, even in the 16-bit era. Really, the SNES's problem is more its architecture than its CPU speed. Of course I'm not saying a 65816 at 2.6 MHz is enough, but its low frequency is affected by the nonsense of the SNES's architecture.

I think the 68k would have been better suited to the SNES's architecture: you could have a faster CPU clock with the SNES's slow memory.


PostPosted: Tue Jun 07, 2016 2:34 am 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 784
...hmm. I guess that's true; the VDP's DMA unit can use the whole 16-bit bus at two pixels per word, which is an equivalent bus speed to the SNES DMA unit even in H32 mode. In H40 mode it's nearly as fast as FastROM... Thanks for pointing that out.

I thought I remembered something about certain MD games using super slow ROM (~500 ns in one case) and still running well. Maybe that was nonsense, or maybe I misunderstood or remembered wrong...

How about this:
Quote:
This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU without needing correspondingly faster, more expensive memory.

No need to get into DMA speed comparisons at this point in the article...

EDIT: Found the thread: http://www.sega-16.com/forum/showthread ... -ROM-speed
From what I can tell, some games might have used chips that slow, but they'd have had to avoid using DMA to pull directly from ROM. Well, whatever; the new version is less misleading...


Further comments?


PostPosted: Tue Jun 07, 2016 3:38 am 
Offline

Joined: Mon Mar 30, 2015 10:14 am
Posts: 175
Quote:
I thought I remembered something about certain MD games using super slow ROM (~500 ns in one case) and still running well. Maybe that was nonsense, or maybe I misunderstood or remembered wrong...

It's only possible if no DMA from ROM is used, and I don't think that, even in early games, they avoided DMA from ROM entirely.
If the MD could transfer during active display (more than 4 words) it would be a different story, but with unlimited access only in vblank, it's not conceivable to me.

http://gendev.spritesmind.net/forum/vie ... 8&start=30

Quote:
Quote:
This is why the Genesis was able to use a CPU clocked more than twice as fast as the SNES CPU without needing correspondingly faster, more expensive memory.


No need to get into DMA speed comparisons at this point in the article...

Of course we need to, because it's related to memory speed. The statement is true, like I said, for systems where the CPU is the fastest chip accessing data, and we aren't talking about a simple 68k/816 cost comparison but about the use of those CPUs in two 2D game systems that are not only CPU-dependent and that involve other chips.

Usually the 65xx needs 1 cycle to access memory, like DMA in general; this is why the PCE's CPU needs 140 ns memory at 7.16 MHz, and the MD needs 150 ns for its 6.67 MHz DMA.
I think Nintendo focused on the SNES's PPU and the audio chip, which were (very?) expensive, and cut costs on the other parts (CPU / RAM / ROM).
I really think the MD's pair of CPUs (68k + Z80) was way more expensive than the 816 plus its DMA controller.
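Taking the one-access-per-cycle rule of thumb from this post at face value, the link between memory speed and clock rate is just a reciprocal. The sketch below is a rough illustration that ignores setup/hold time and address decoding; the function name is made up for the example.

```python
def max_clock_mhz(access_ns):
    """Fastest clock (MHz) at which memory with the given access time
    can still answer once per cycle, ignoring setup/hold margins."""
    return 1000.0 / access_ns

if __name__ == "__main__":
    # 150 ns and 140 ns are the figures quoted in the thread;
    # 500 ns is the "super slow ROM" case mentioned earlier.
    for ns in (150, 140, 500):
        print(f"{ns} ns memory -> about {max_clock_mhz(ns):.2f} MHz")
```

By this crude measure a ~500 ns ROM could only answer about two million times a second, which is why it would have to stay away from full-speed DMA.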


PostPosted: Tue Jun 07, 2016 2:31 pm 
Offline

Joined: Thu Aug 12, 2010 3:43 am
Posts: 1589
The big thing about the 68000 was that there were a lot more people experienced with it, from its use both in computers and in arcades.

That said, I was under the impression ROM speed had to be 120 ns? I mean, the access still has to happen in a single cycle after all; the reason the 68000 spends four cycles is its microcoded nature. But I could be wrong; I should probably look up the 68000's bus cycle timings.

EDIT: issues the access in 2nd cycle, data appears in 3rd cycle, stops accessing in 4th cycle... OK I guess that reacting within 2 cycles (or maybe 1.5) works.


PostPosted: Tue Jun 07, 2016 2:44 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 784
TOUKO wrote:
Quote:
No need to get into DMA speed comparisons at this point in the article...

Of course we need to, because it's related to memory speed

This note is in the CPU section. Memory is further down the page. (And it looks like it needs work too; there's been a rant added about how horrible the 8-bit bus was, as if it were independent of the choice of CPU.)


PostPosted: Tue Jun 07, 2016 3:08 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19099
Location: NE Indiana, USA (NTSC)
So as far as I can tell:

Super NES: 3.58 MHz, reads 8 bits in I think half a cycle (140 ns), overall peak throughput 3.58 MB/s
Genesis: 7.67 MHz, reads 16 bits in 2 cycles (261 ns), overall peak throughput 3.84 MB/s

On the one hand, an 8-bit read or write will finish faster than it would on a half-speed 16-bit bus where 8-bit accesses are the same speed as 16-bit accesses. On the other hand, a 16-bit bus can use slower memory for a given throughput.
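The figures above can be reproduced with a few lines of arithmetic. This is only a sketch under the assumptions already floated in the thread: the Super NES moves one byte per CPU cycle with a half-cycle access window, and the Genesis moves one 16-bit word every four CPU cycles with a two-cycle window. The function names are made up for the example.

```python
def peak_throughput_mb_s(clock_mhz, cycles_per_access, bytes_per_access):
    """Peak bus throughput in MB/s (accesses per microsecond times width)."""
    return clock_mhz / cycles_per_access * bytes_per_access

def access_window_ns(clock_mhz, cycles):
    """Time the memory has to respond, given a window of `cycles` CPU cycles."""
    return cycles * 1000.0 / clock_mhz

if __name__ == "__main__":
    # Super NES: 8 bits per 3.58 MHz cycle, data needed within half a cycle.
    print("SNES   ", peak_throughput_mb_s(3.58, 1, 1), access_window_ns(3.58, 0.5))
    # Genesis: 16 bits per four 7.67 MHz cycles, data needed within two cycles.
    print("Genesis", peak_throughput_mb_s(7.67, 4, 2), access_window_ns(7.67, 2))
```

The throughput comes out nearly equal, while the Genesis memory gets almost twice as long to respond to each access.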


PostPosted: Tue Jun 07, 2016 3:54 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 784
I'm not sure it makes sense to use a 16-bit bus with a 65816, even with glue logic to split and merge the bytes. The 65816 needs byte-aligned access; a simple operation like inc $0C27 in 16-bit mode would require six accesses for just seven bytes. I suppose it could be handled with wait states... You'd have to increase the CPU speed a fair bit just to come out ahead.
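To show where "six accesses for seven bytes" comes from, here is a small sketch that counts how many aligned 16-bit words each phase of the instruction touches. It assumes hypothetical glue logic that issues one word access per aligned word, does no buffering between phases, and can write a single byte without a read-modify-write. The program counter value is made up.

```python
def words_touched(start, length):
    """Number of aligned 16-bit words covered by `length` bytes at `start`."""
    first = start // 2
    last = (start + length - 1) // 2
    return last - first + 1

def inc_abs_16bit(pc, operand_addr):
    """Return (word accesses, bytes moved) for a 16-bit `inc abs`."""
    fetch = words_touched(pc, 3)            # opcode + two address bytes
    read = words_touched(operand_addr, 2)   # read the 16-bit operand
    write = words_touched(operand_addr, 2)  # write the result back
    return fetch + read + write, 3 + 2 + 2

if __name__ == "__main__":
    print(inc_abs_16bit(0x8000, 0x0C27))  # odd operand address
    print(inc_abs_16bit(0x8000, 0x0C26))  # even operand address
```

With the operand at an odd address every phase straddles two words. An even operand address cuts it to four accesses, but the CPU can't count on that.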


PostPosted: Tue Jun 07, 2016 4:12 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19099
Location: NE Indiana, USA (NTSC)
Wait states like those used in the SA-1, a coprocessor with an embedded 10.7 MHz 65816? I seem to remember SA-1 games also using 16-bit ROM.


PostPosted: Tue Jun 07, 2016 4:45 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 784
But what was the ROM speed? With the ubiquity of random byte accesses in 65816 code, there's no way it could actually sustain 10.74 MHz on a ROM that couldn't respond at that speed.

The only wait states I'm aware of were to pause the SA-1 on a cycle-pair basis to allow the S-CPU unfettered access to shared memory. Because apparently the S-CPU is a juggernaut that doesn't understand the concept of a wait state...

EDIT: Hang on, there's something interesting in the manual... yep; apparently there can be wait cycles introduced on jumps and returns, on branches to odd addresses, and on data reads from ROM. Sounds like a 16-bit chip with glue logic to me...

This also explains something I was wondering about - how Nintendo managed to afford 50 ns ROM for the SA-1 (which was used by a ton of games that totally didn't need it, possibly as copy protection) despite even the later versions of the Super FX being limited to 5 master cycles per byte outside the instruction cache. The answer is apparently that they didn't...


PostPosted: Sun Jun 12, 2016 2:50 pm 
Offline

Joined: Wed May 19, 2010 6:12 pm
Posts: 2286
Did they also fix the 65816's half-cycle-long RAM accesses? If that's the case, could they have released the SNES with a SA-1? Can you imagine an SNES, that can upload 24 kB in one frame!


PostPosted: Sun Jun 12, 2016 6:58 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19099
Location: NE Indiana, USA (NTSC)
psycopathicteen wrote:
could they have released the SNES with a SA-1?

Probably not at 200 USD in fourth quarter 1991.

Quote:
Can you imagine an SNES, that can upload 24 kB in one frame!

DMA speed would have depended on how fast the S-PPU and other B bus devices can accept writes.

And on the hardware we did get, it depends on how far you're willing to letterbox. If your game is framed for a modern widescreen TV, you can get away with showing only 168 lines of active picture and the rest with forced blanking. (This also means the GBA port can use the same framing.)
Then you have 262 - (168+1) = 93 lines of blanking, because you need 1 to prime the sprite renderer. Then 93 * 1324 / 8 = 15391 bytes, or nearly all of sprite VRAM.
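The budget above is easy to play with in code. This sketch assumes DMA moves one byte per 8 master clocks, and that the 1324 figure is a 1364-clock scanline minus 40 clocks lost to DRAM refresh; that breakdown is my reading, not something stated in the post. It also ignores per-transfer DMA setup overhead.

```python
LINES_PER_FRAME = 262        # NTSC scanlines per frame
MASTER_CLOCKS_PER_BYTE = 8   # assumed DMA rate: one byte per 8 master clocks

def blanking_lines(active_lines, prime_lines=1):
    """Lines left for DMA after the active picture and sprite priming."""
    return LINES_PER_FRAME - (active_lines + prime_lines)

def dma_bytes(lines, clocks_per_line):
    """Whole bytes that fit in `lines` scanlines of forced blanking."""
    return lines * clocks_per_line // MASTER_CLOCKS_PER_BYTE

if __name__ == "__main__":
    lines = blanking_lines(168)
    print(lines, "lines of blanking")
    print(dma_bytes(lines, 1324), "bytes at 1324 usable clocks per line")
    print(dma_bytes(lines, 1364), "bytes at a full 1364 clocks per line")
```

Plugging the numbers in, 1324 clocks per line gives roughly 15391 bytes, while a full 1364-clock line with nothing subtracted gives 15856. Either way it's close to 16 KB.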


PostPosted: Sun Jun 12, 2016 9:25 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 784
Seems like it could have at least accepted one byte per dot, since it reads faster than that (mind you, that's two 8-bit memories in parallel)... How expensive would it have been to add a bus terminal to the CPU die and use 16-bit busing everywhere else on the board? The CPU wouldn't have been able to sustain 5.37 MHz (or 7.16 MHz in FastROM), but it would have been noticeably faster than what we got, and DMA would have run at double speed except for a slight penalty on odd start/end addresses... and when updating only the low or high byte, like with Mode 7...

...how would such a system handle 8-bit writes? Would it be possible to assert a write and then just not put a signal on half the data lines, or would it have to read the word, modify it, and then store it back? I'm guessing the SA-1 didn't need to worry about this - from the description of the wait behaviour, it looks like only the ROM was 16-bit. You could redesign the SNES to work the same way, with the bus terminal in front of the ROM instead of on the CPU, and if the PPU bus could accept 8-bit writes at 5.37 MHz you could still double the DMA speed, at least when updating both low and high bytes in VRAM... uh, do DMA and wait states mix?

Running in WRAM would be an issue. Perhaps the bottom 8 KB could have been fast SRAM, to allow full-speed operation like with the SA-1's I-RAM...

Anybody have an idea of how feasible this sort of thing would have been in 1990?

EDIT: What I said about 8-bit writes on a 16-bit bus applies to any write, since they come from the CPU one byte at a time. You could buffer low bytes for one cycle to make sure there isn't a high byte coming, but that doesn't help much unless the bus terminal has access to the program counter or something so it can predict what the CPU will want next and intelligently interleave accesses. Even then, if making an 8-bit write to a 16-bit chip isn't possible (and from the way the secondary pixel cache on the Super FX works I suspect it's not), the only way to make 8-bit writes happen any faster than two memory cycles would be to use separate buses for each system component, so a fresh ROM fetch could happen in parallel with RAM access... Furthermore, even putting the bus terminal on cartridge access only doesn't fully solve the problem, because you still have to deal with SRAM and special chip registers. The only solution there seems to be to put the bus terminal in the cartridge itself, and then you've irretrievably lost the design philosophy of the S-CPU in which timing is determined on-die and wait states don't exist.

Hang on - a lot of PPU registers are write-only and 8 bits wide. I'm guessing you'd have to change that...


PostPosted: Mon Jun 13, 2016 12:06 am 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6280
Location: Seattle
93143 wrote:
...how would such a system handle 8-bit writes? Would it be possible to assert a write and then just not put a signal on half the data lines, or would it have to read the word, modify it, and then store it back?
Could do the same as the 68k, and have separate "upper byte" and "lower byte" strobes.
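As a rough software model of that idea (a sketch, not a description of any actual chip): a 16-bit-wide memory with two write strobes only latches the byte lanes whose strobe is asserted, so an 8-bit write never needs a read-modify-write. The class and function names are made up, and the lane assignment follows the 68000's convention that even addresses sit on the upper byte.

```python
class WordMemory:
    """16-bit-wide memory with separate upper/lower byte write strobes."""

    def __init__(self, n_words):
        self.words = [0] * n_words

    def write(self, word_addr, data, upper, lower):
        # Only lanes with an asserted strobe take new data. Keeping `old`
        # here models the other lane being left alone, not a bus read.
        old = self.words[word_addr]
        hi = (data & 0xFF00) if upper else (old & 0xFF00)
        lo = (data & 0x00FF) if lower else (old & 0x00FF)
        self.words[word_addr] = hi | lo

def write_byte(mem, byte_addr, value):
    """8-bit write: put the byte on its lane and assert only that strobe."""
    value &= 0xFF
    if byte_addr & 1:
        mem.write(byte_addr >> 1, value, upper=False, lower=True)
    else:
        mem.write(byte_addr >> 1, value << 8, upper=True, lower=False)

if __name__ == "__main__":
    mem = WordMemory(4)
    mem.write(1, 0xABCD, upper=True, lower=True)  # full 16-bit write
    write_byte(mem, 2, 0x12)                      # even address: upper lane
    print(hex(mem.words[1]))
```

A little-endian design like the 65816 would probably want the lanes swapped, but the strobe logic is the same.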

