All times are UTC - 7 hours





PostPosted: Thu Jan 19, 2017 6:47 pm 
Joined: Wed Feb 13, 2008 9:10 am
Posts: 571
Location: Estonia, Rapla city (50 and 60Hz compatible :P)
93143 wrote:
Baseline 65816 has an 8-bit data bus. Opcodes are 8-bit, and operands and data are read or written at one cycle per byte. As I understand it, the only real difference with the 65802 was that it could only address 64 KB due to 6502 pin compatibility requirements, which is definitely not true of the 5A22.

Going to a 16-bit system bus in the Super Famicom would have required some reasonably sophisticated glue logic, plus either external wait states or greatly complicated on-die timing.

A 6502 successor with a 16-bit data bus, no half-cycle strobe, and no concern for backward compatibility could have been a monster, approaching double the performance of the 65816 at the same clock speed, or quadruple the performance at the same memory speed. (I'm assuming here that processing would remain more or less I/O-bound...)


I learned something new, then. I always thought a vanilla 65816 had a 16-bit bus and would have been something like what you described, and that the 5A22 was severely bottlenecked by its 8-bit bus.

_________________
http://www.tmeeco.eu


PostPosted: Thu Jan 19, 2017 6:59 pm 
Joined: Wed May 19, 2010 6:12 pm
Posts: 2265
Here's a slightly optimized version of the Sieve benchmark posted in this thread, compared to a 68000 version of the same code. How can the 68000 be faster at this? :?

Code:
65816:

top:
tax      //2
sep #$20   //3 5
stz flags,x   //5 10
rep #$21   //3 13
adc prime   //4 17
cmp #size+1   //3 20
bcc top      //3 23 cycles


68000:

top:
move.b   (d0,a0),d1      //14
add.w   d0,d2         //4  18
cmp.w   d0,d3         //4  22
bcc top            //10 32 cycles
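
To double-check the running totals in the comments, here is a minimal Python sketch that just adds up the per-instruction counts as written in the two listings. The function names and the break-even calculation are my own illustration and not from any reference; the cycle counts themselves are copied from the listings, not independently verified.

```python
# Per-instruction cycle counts copied from the two listings above.
LOOP_65816 = [
    ("tax", 2),
    ("sep #$20", 3),
    ("stz flags,x", 5),
    ("rep #$21", 3),
    ("adc prime", 4),
    ("cmp #size+1", 3),
    ("bcc top", 3),
]

LOOP_68000 = [
    ("move.b (d0,a0),d1", 14),
    ("add.w d0,d2", 4),
    ("cmp.w d0,d3", 4),
    ("bcc top", 10),
]


def total_cycles(loop):
    """Sum the cycle counts for one pass through the inner loop."""
    return sum(cycles for _, cycles in loop)


def break_even_clock_ratio(loop_a, loop_b):
    """How many times faster loop_b's CPU must be clocked to match loop_a
    per pass (hypothetical helper; ignores wait states and bus contention)."""
    return total_cycles(loop_b) / total_cycles(loop_a)


if __name__ == "__main__":
    print("65816:", total_cycles(LOOP_65816), "cycles per pass")
    print("68000:", total_cycles(LOOP_68000), "cycles per pass")
    print("68000 break-even clock ratio: %.2f"
          % break_even_clock_ratio(LOOP_65816, LOOP_68000))
```

On these counts the 68000 would need a clock roughly 1.4 times faster than the 65816 just to tie on the inner loop.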


PostPosted: Thu Jan 19, 2017 10:30 pm 
Joined: Wed Nov 30, 2016 4:45 pm
Posts: 84
Location: Southern California
93143 wrote:
Baseline 65816 has an 8-bit data bus. Opcodes are 8-bit, and operands and data are read or written at one cycle per byte. As I understand it, the only real difference with the 65802 was that it could only address 64 KB due to 6502 pin compatibility requirements, which is definitely not true of the 5A22.

Going to a 16-bit system bus in the Super Famicom would have required some reasonably sophisticated glue logic, plus either external wait states or greatly complicated on-die timing.

A 6502 successor with a 16-bit data bus, no half-cycle strobe, and no concern for backward compatibility could have been a monster, approaching double the performance of the 65816 at the same clock speed, or quadruple the performance at the same memory speed. (I'm assuming here that processing would remain more or less I/O-bound...)

I don't know if this is getting too off-topic. Let us know if so, OP and moderator. I'm pretty new on this forum.

The '816, in spite of its 8-bit data bus, handles 16-bit quantities far more efficiently than the '02 in ease of program writing, code compactness, and execution speed. (See my post comparing 6502 to 65816 efficiency at http://forum.6502.org/viewtopic.php?p=9705#p9705 .) But rather than just extending the 6502 to greater widths, a successor processor needs to address the needs that will become more and more glaring as the door is opened to multitasking, relocatable code, and other things that the '02 was poorly suited to. The '816 does some of this, but I would like to see it taken further.

There's the proposed 65Org32 processor, with the discussion at http://forum.6502.org/viewtopic.php?f=1&t=1419 . Basically it's a 65816 extended to 32 bits in almost every way except maybe the status register: 32-bit data bus, address bus, A, X, Y, "direct page" register (although it becomes just an offset, able to address the entire 4-gigaword address space, and not required to align with any particular boundaries), "data bank" and "address bank" registers (although again they just become offsets, able to address the entire address space, with no actual banks), stack pointer, program counter, ALU, and anything else I might have forgotten. It would not execute legacy 6502 code, but would nevertheless be very much a 65-family processor, not having lots of registers, nor deep pipelining, nor branch prediction, nor merged instructions (i.e., operands will not be merged with op codes). It would add a barrel shifter, hardware multiply, and a few other things. The HDL enthusiasts have not started tackling this one yet. I've been tempted to emulate it with a microcontroller. The performance doing it this way would be terrible, but it would let me experiment with the instruction set and write applications.

There is also a 65Org16, but that's basically just a double-wide NMOS 6502, with no extra capabilities. Sam Gaskil (ElEctric_EyE on the 6502.org forum) was doing this one, and had made a lot of progress, but I have not heard any updates in quite a while.

Then there's Michael Barry's 65m32 which might have the best chance of becoming reality at this point. See http://anycpu.org/forum/viewtopic.php?f=23&t=300 . It has some things in common with the 6502, but a lot of divergence too, having more registers and merging most operands with the op code so you can fetch an op code and an operand up to 23 bits all in one cycle. Code density is excellent.

I remember earlier efforts, particularly the 65832 design, which was finished by WDC but never put into production since they never got a big order for it (such as one from Apple), and Gideon Zweijtzer and friends' 65GZ032, which was actually running but still needed various things worked out when life changed for the lead engineer and progress came to a stop. These had the design goal of still being able to run already-assembled 6502 code, a requirement I hope we can drop because of the complexities and limitations that come with it. The 65GZ032 in native mode was hardly a 65-family processor, but rather a RISC.

The attraction for any of these, over other processors available today, is the easier assembly language (being just a grown-up 6502) and being able to take advantage of one's extensive experience in 6502 and one's 6502 way of mentally forming solutions.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


PostPosted: Fri Jan 20, 2017 6:19 am 
Joined: Sun Mar 27, 2011 10:49 am
Posts: 186
psycopathicteen wrote:
Here's a slightly optimized version of the Sieve benchmark posted in this thread, compared to a 68000 version of the same code. How can the 68000 be faster at this? :?

Code:
65816:

top:
tax      //2
sep #$20   //3 5
stz flags,x   //5 10
rep #$21   //3 13
adc prime   //4 17
cmp #size+1   //3 20
bcc top      //3 23 cycles


68000:

top:
move.b   (d0,a0),d1      //14
add.w   d0,d2         //4  18
cmp.w   d0,d3         //4  22
bcc top            //10 32 cycles

Not that it's material, but isn't ADC PRIME 5 cycles rather than 4?

But anyway, you're right. I tracked down the original BYTE magazine article (it's on archive.org here) and unfortunately there's no actual code given for the 68k assembly version (readers submitted dozens of programs in different languages/platforms, so only the runtimes are shown; in fact, the times were reader-submitted too and weren't verified).

I tried a bunch of stuff, thinking maybe the outer loop - which is executed almost as many times as the inner one, it turns out - could be implemented fast enough on the 68k to make up the difference, but as far as I can tell, you just can't; in the end the 65816 comes out a little bit faster every time. So unless someone else can come up with a speedier 68k implementation than mine, I'm convinced that the 65816 does indeed beat the 68k at this algorithm, and that the time given for the 68k in BYTE back in '83 and reprinted in Programming the 65816 is a myth.

FWIW, here's my code. It's untested, so I make no claims about correctness:

Code:
; FLAGS     = a0
; PRIME     = d0
; REMAINING = d1
; I         = d2
; ONE       = d3
; COUNT     = d4

  moveq #1,PRIME
  moveq #1,ONE
  move.w #SIZE-1,REMAINING
  moveq #0,COUNT
main:
  addq   #2,PRIME        ; next candidate         (4)
  tst.b  (FLAGS)+        ; is this a prime?       (8)
  dbeq   REMAINING,main  ;                        (10/12/14)
  cmp.w  #-1,REMAINING   ; are we done?           (8)
  beq.s  .end            ;                        (8/10)
  move.w PRIME,I         ;                        (4)
  bra.s  .test           ;                        (10)
.top:
  move.b ONE,-1(FLAGS,I) ; mark the non-prime     (14)
  add.w  PRIME,I         ; move forward           (4)
.test:
  cmp.w  REMAINING,I     ; are we done?           (4)
  bcc    .top            ;                        (8/10)
  addq   #1,COUNT        ; we found a prime       (4)
  subq   #1,REMAINING    ; do what dbeq didn't    (4)
  bra.s  main            ;                        (10)
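
For anyone who wants to check loop counts without an assembler, here is a rough Python rendering of the benchmark algorithm as I understand it from the BYTE listing: one flag per odd number starting at 3, with index i standing for 2*i+3. It is not a transliteration of the 68k code above (which uses a different flag polarity and register layout). The default size of 8190 is the figure usually quoted for this benchmark and is an assumption here, as are all the names.

```python
def byte_sieve(size=8190):
    """BYTE-style sieve over odd numbers only (a sketch, not the original).

    flags[i] stands for the odd number 2*i + 3.
    Returns (primes_found, outer_passes, inner_marks).
    """
    flags = [True] * (size + 1)
    count = 0
    outer_passes = 0
    inner_marks = 0
    for i in range(size + 1):
        outer_passes += 1
        if flags[i]:
            prime = i + i + 3
            k = i + prime          # index of 3 * prime
            while k <= size:
                flags[k] = False   # strike out an odd multiple of prime
                inner_marks += 1
                k += prime
            count += 1
    return count, outer_passes, inner_marks


if __name__ == "__main__":
    count, outer, inner = byte_sieve()
    print(count, "primes;", outer, "outer passes;", inner, "inner marks")
```

With size 8190 this should report 1899 primes, which is the figure the benchmark is known for, so it makes a handy sanity check for any assembly version.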


PostPosted: Fri Jan 20, 2017 10:29 am 
Joined: Wed May 19, 2010 6:12 pm
Posts: 2265
I found this:

http://www.keil.com/benchmarks/sieve.asp

Quote:
Continue until the next remaining number is greater than the square root of the largest number in the original series. In this case, the next number, 7, is greater than the square root of 25, so the process stops. The remaining numbers are all prime.


Yeah, that is NOT what the 65816 code does.
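
To make the difference concrete, here is a small Python sketch of the variant the quote describes, where the striking-out stops once the next candidate exceeds the square root of the limit. The function name is made up for illustration, and this is the textbook form of the sieve over 2..limit, not the odd-numbers-only layout the benchmark code uses.

```python
def sieve_with_sqrt_cutoff(limit):
    """Sieve of Eratosthenes that stops striking out once p*p > limit."""
    # Index n stands for the number n itself; 0 and 1 are not prime.
    is_prime = [False, False] + [True] * max(0, limit - 1)
    p = 2
    while p * p <= limit:          # the square-root cutoff from the quote
        if is_prime[p]:
            # Smaller multiples of p were already struck by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]


if __name__ == "__main__":
    print(sieve_with_sqrt_cutoff(25))
```

For a limit of 25, as in the quoted example, the last prime that does any striking is 5; everything still flagged after that is prime.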


PostPosted: Fri Jan 20, 2017 10:42 am 
Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 860
Location: Gothenburg, Sweden
Reviving something from pages ago:
bregalad wrote:
For me it sounds much simpler to use a digital palette index as input rather than using an analogic input video signal.

I was thinking of something like what bregalad described earlier in this thread, but jacking it into the NES PPU's EXT pins. It would require severing the EXT-to-GND connection, if I'm not mistaken?

_________________
http://www.frankengraphics.com - personal NES blog


PostPosted: Fri Jan 20, 2017 10:51 am 
Joined: Sun Mar 27, 2011 10:49 am
Posts: 186
psycopathicteen wrote:
I found this:

http://www.keil.com/benchmarks/sieve.asp

Quote:
Continue until the next remaining number is greater than the square root of the largest number in the original series. In this case, the next number, 7, is greater than the square root of 25, so the process stops. The remaining numbers are all prime.


Yeah, that is NOT what the 65816 code does.

It's not what any of the sample code in BYTE does, either; it's possible that the contributor implemented an optimization like that on his own (which would've been disingenuous, I think), but unfortunately we'll never know.


PostPosted: Fri Jan 20, 2017 1:24 pm 
Joined: Wed May 19, 2010 6:12 pm
Posts: 2265
The 8086 cannot possibly be 4 times slower than the 68000 either.


PostPosted: Fri Jan 20, 2017 4:16 pm 
Joined: Fri Jul 04, 2014 9:31 pm
Posts: 760
adam_smasher wrote:
isn't ADC PRIME 5 cycles rather than 4?

It's a direct-page instruction in the original (6586h, where PRIME is $86), and I don't see any instructions modifying DP (which starts at zero), so no. It's 4 cycles.
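
The rule behind that count, as I remember it from the 65816 timing tables, can be sketched in a few lines of Python. The function and parameter names are made up for illustration, and it only covers the two addressing modes being discussed here.

```python
def adc_cycles(mode, m_flag, dp_low_byte=0):
    """Cycle count for ADC on the 65816 (sketch for two addressing modes).

    mode:        "dp" (direct page) or "abs" (absolute)
    m_flag:      1 = 8-bit accumulator, 0 = 16-bit accumulator
    dp_low_byte: low byte of the D register (only matters for "dp")
    """
    base = {"dp": 3, "abs": 4}[mode]
    cycles = base
    if m_flag == 0:
        cycles += 1        # one more cycle to fetch the high data byte
    if mode == "dp" and dp_low_byte != 0:
        cycles += 1        # direct page register not page-aligned
    return cycles


if __name__ == "__main__":
    # After REP #$21 the accumulator is 16-bit (m = 0).
    print("ADC dp,  D = $0000:", adc_cycles("dp", 0, 0x00))
    print("ADC abs           :", adc_cycles("abs", 0))
```

So the 5-cycle figure would apply if PRIME were assembled as an absolute address, or if the low byte of the direct page register were nonzero.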


PostPosted: Thu Mar 02, 2017 6:24 am 
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7208
Location: Chexbres, VD, Switzerland
So I just thought about this thread and the idea again. It's funny, because back when it was created I thought the idea wasn't interesting (i.e. there are so many existing retro systems, why create your own?), and then I had more and more thoughts, and finally I think it's an interesting idea. The problem is that you have to know exactly what you want to do, and there are several ways to make a system "retro":

  • A) By using actual chips that existed in the 1980s and restricting yourself to those only
  • B) By using modern chips (microcontroller, CPLD) to simulate chips that existed in the 1980s or chips that could have been designed in the 80s, while still having an old-style system architecture
  • C) By using an FPGA development kit and ending up with a system-on-a-chip, simulating an entire system architecture that could have been built in the 80s.

A) means you'll have to desolder chips from existing hardware. Not a huge deal, but you'll need to find old computers, arcade boards or whatever as a source for chips, as I do not think you can buy modern replacements for video and sound chips of the 80s. So the availability of old, mass-manufactured hardware that is not shameful to destroy will be an important part of the design choices. Even if you restrict yourself to the most common chips, you'll probably have to stack up several of them in order to avoid too many graphical and sound limitations.

GRAPHICS: The TMS9918 chip seems to be the only "general purpose" graphical chip ever made, all other chips being very specifically made for one particular system. To get something comparable to the NES in quality of graphics, you'd need at least 2 TMS9918 for backgrounds and 3 for sprites, but then you'll probably want your sprites to always be above the backgrounds. So you'll have to use at least 4 TMS9918 chips, the first one doing only background, the 2nd one doing background + sprites and the last 2 doing only sprites. Also, even with that configuration, you're still limited to 8-pixel granularity scrolling and a 16-colour palette, so graphics will still be slightly inferior to the NES. The only other realistic option is to reuse graphical chips from a specific system that happens to be widely available and not shameful to destroy.

SOUND: For sound you have a large array of YM sound chips available, which is very nice. You could get them from old sound cards, arcade boards, video game consoles or keyboards, so they're widely available. You can also team up smaller and widely available sound chips such as the SN76489; alone it is quite shitty, but 2 or 3 of them can create plenty of sound channels and you can combine them to generate more interesting sounds and do effects such as chorus and echo.

CPU: While you could use a 6502 or Z80, I think the most reasonable choice would be the 68000, because it is supported by GCC, and hence there'd be no need to code anything in assembly. The only reasons I see to use a 6502 or a Z80 are if you are especially fond of those CPUs or if you're porting a game you already wrote in assembly for those processors. For example, if you're porting a NES game to your custom system, then it makes sense to keep the 6502 CPU and change only the graphics/sound/input code, in order to avoid a complete rewrite. But a complete rewrite in your favourite high-level language is probably less of a hassle than a complete rewrite in assembly, for example when translating from 6502 to Z80.

RAM/ROM: I don't have a strong opinion, but if you can, it's probably better to do what the NES didn't: keep VRAM and RAM on the same chip. It worsens system performance as a whole, but you have only one bus and one address space, and it makes the overall design simpler and less sensitive to tight timing issues. You need a bus that interleaves accesses between the CPU and the graphics chip.

B) Basically the same as above, but you can replace the RAM and CPU with a modern microcontroller, reducing part count. Or keep a genuine vintage CPU if you prefer. You can use a real vintage sound chip, and pair it up with either a CPLD or a second microcontroller to generate graphics. The ability to mix modern and old technology at will makes building such a system less of a hassle, especially when it comes to generating graphics, since dedicated graphics chips aren't widely available. I guess even if you use a modern chip just for graphics and keep everything else retro, it'll make everything a lot simpler and less limited already.

C) Is pretty much open, and doing a game console based on an FPGA devkit wouldn't be very hard. However, I do not think it's terribly interesting by itself; basically you could do the same with a lot less effort just by coding a game that looks and sounds retro on the Raspberry Pi. It'd be the exact same result but a lot less hassle.

It's something else since the parts aren't vintage, but I'll mention Uzebox, which is pretty much a video game console made out of a single microcontroller - it seems impressive! However, the graphics are lower resolution than the NES's. I'll try to get one someday.


PostPosted: Thu Mar 02, 2017 12:03 pm 
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6178
Location: Seattle
Bregalad wrote:
The TMS9918 chip seems to be the only "general purpose" graphical chip ever made, all other chips being very specifically made for one particular system.
I think there might possibly be some argument to be made about the AY-3-8900 video IC used by the Intellivision. But I might be being misled by the widespread subsequent success of the -8910/2/4 sound ICs.

Quote:
You can also team up smaller and widely available sound chips such as the SN76489; alone it is quite shitty, but 2 or 3 of them can create plenty of sound channels and you can combine them to generate more interesting sounds and do effects such as chorus and echo.
I thought I already posted this somewhere here, but there's a series of arcade machines (Mr. Do's Castle, Mr. Do's Wild Ride, Do! Run Run) that uses four.

Listening to the in-game music on YouTube is underwhelming, unfortunately.


PostPosted: Thu Mar 02, 2017 9:24 pm 
Joined: Wed May 19, 2010 6:12 pm
Posts: 2265
I wonder how fast an ARM and a frame buffer can be used to draw graphics, if you make circuitry to ignore byte writes if the pixel is color 0. With an ARM you can write 4 pixels at once.
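
As a rough model of what that circuitry would do, here is a Python sketch of a 32-bit write in which any byte lane whose new pixel value is 0 leaves the frame buffer untouched. It assumes 8 bits per pixel (hence 4 pixels per word); the function name is made up, and real hardware would do this with per-byte write enables rather than a loop.

```python
def masked_word_write(dest_word, src_word):
    """Model a 32-bit frame buffer write that skips bytes equal to 0.

    dest_word: the 4 pixels currently in the frame buffer
    src_word:  the 4 pixels the CPU is writing (0 = transparent)
    Returns the word that ends up in the frame buffer.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        src_px = (src_word >> shift) & 0xFF
        dst_px = (dest_word >> shift) & 0xFF
        # A zero source byte means "write disabled" for this lane.
        kept = src_px if src_px != 0 else dst_px
        result |= kept << shift
    return result


if __name__ == "__main__":
    print(hex(masked_word_write(0x11223344, 0xAA00BB00)))
```

In the example, the two zero bytes of the source leave 0x22 and 0x44 in place, while 0xAA and 0xBB overwrite the other two pixels.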


PostPosted: Fri Mar 03, 2017 12:21 am 
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7208
Location: Chexbres, VD, Switzerland
lidnariq wrote:
I think there might possibly be some argument to be made about the AY-3-8900 video IC used by the Intellivision.

Its graphical capabilities are extremely shitty, though.

Quote:
I wonder how fast an ARM and a frame buffer can be used to draw graphics, if you make circuitry to ignore byte writes if the pixel is color 0. With an ARM you can write 4 pixels at once.

Sure, but it's not retro at all. If you use an ARM chip to simulate a tilemap and sprites or something of the like, then it's like solution B).


PostPosted: Fri Mar 03, 2017 11:54 am 
Joined: Mon Feb 07, 2011 12:46 pm
Posts: 918
The 6845 CRTC can be used to do the timing for video signals; you can then add other logic to do the rest. It needs an input clock in units of tiles, since the 6845 does all of its timing in tiles, not in pixels. The 6845's registers are then used to program the start address, the height of the tiles, how many tiles per row, how many rows, the timing offsets, and the address of the cursor. You can then add other logic that takes the address, the row number within the tile, whether or not it is the cursor, and whether or not it is in the visible part of the picture (these are all outputs of the 6845) to make the picture.
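
A very simplified Python model of the address sequence the 6845 puts out during the visible area might look like this. The function and parameter names are my own, blanking and sync timing are left out entirely, and the cursor output is reduced to a plain address match (the real chip also gates it by cursor start/end scanline).

```python
def crtc_scan(start_address, chars_per_row, rows, scanlines_per_char,
              cursor_address):
    """Yield (MA, RA, is_cursor) for each displayed character clock.

    MA is the memory address output and RA the row-within-tile output.
    Only the visible area is modelled; the parameters stand in for the
    start address, horizontal/vertical displayed, max scan line, and
    cursor address registers.
    """
    for row in range(rows):
        row_base = start_address + row * chars_per_row
        for ra in range(scanlines_per_char):
            # The same tile addresses repeat for every scanline of the
            # row; only RA changes, selecting the line within each tile.
            for col in range(chars_per_row):
                ma = (row_base + col) & 0x3FFF   # MA is 14 bits wide
                yield ma, ra, ma == cursor_address


if __name__ == "__main__":
    for ma, ra, cur in crtc_scan(0, 2, 1, 2, cursor_address=1):
        print(ma, ra, cur)
```

The external logic would then fetch the tile number at MA, use RA to pick the row of that tile's pattern, and shift the pixels out during one character clock.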

_________________
.


PostPosted: Fri Mar 03, 2017 5:14 pm 
Joined: Wed May 19, 2010 6:12 pm
Posts: 2265
That's pretty dang interesting. Yeah I think that's a good idea. The only limitation is that lines would have to be an even number of pixels so you can't have 341 pixels/cycles per line.

