
All times are UTC - 7 hours





PostPosted: Tue May 08, 2018 1:37 am 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 512
It comes down to how they take their clocks and how they are marketed.

So a 6502 takes a two-phase, non-overlapping clock, which effectively gives it a 2 MHz event rate from a 1 MHz clock. This means its clock drivers are more complicated. The Z80, for example, takes a single-phase 4 MHz clock. The 6510 fixed the clock issue by generating the phases internally from a single input clock. So from the 1 MHz clock the 6502 gets 4 events, and from the 4 MHz clock the Z80 gets 4 events.

You can't just have something happen from nothing: you need an "event" to trigger it and to gate the logic steps.
Say LDA #4:
Phi2 Hi: put the PC on the address bus
Phi2 Lo: read the opcode from the data bus
Phi1 Hi: decode the opcode
Phi1 Lo: increment the PC
Phi2 Hi: put the PC on the address bus
Phi2 Lo: read the operand from the data bus
Phi1 Hi: set A to the value
Phi1 Lo: increment the PC
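The eight events above can be sketched as a toy table-driven sequencer. This is only an illustrative model, not cycle-accurate; the memory layout and helper names are made up:

```python
# Toy model: each clock edge ("event") gates exactly one micro-step of
# LDA #$04 on a 6502-style two-phase clock. Illustrative only.

memory = {0x0200: 0xA9, 0x0201: 0x04}  # LDA #$04 at a made-up address

cpu = {"pc": 0x0200, "a": 0x00, "addr": 0, "data": 0}

def phi2_hi(c): c["addr"] = c["pc"]            # drive PC onto the address bus
def phi2_lo(c): c["data"] = memory[c["addr"]]  # latch the data bus
def phi1_hi_decode(c): pass                    # decode the opcode (0xA9 = LDA #imm)
def phi1_hi_set_a(c): c["a"] = c["data"]       # move the operand into A
def phi1_lo(c): c["pc"] += 1                   # increment the PC

# Eight events = two full clock cycles for LDA #imm:
for event in (phi2_hi, phi2_lo, phi1_hi_decode, phi1_lo,
              phi2_hi, phi2_lo, phi1_hi_set_a, phi1_lo):
    event(cpu)

# cpu["a"] is now 0x04 and cpu["pc"] is 0x0202
```

Nothing advances unless an event fires; removing any entry from the tuple leaves the instruction half-done, which is the point about gating.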

Wait cycles aren't needed just for slow RAM; you can simply slow the clock down if you want to use slower RAM.

Take the Z80:
Clock 0: put PC on the address bus
Clock 1: increment PC
Clock 2: read value from the data bus
Clock 3: do RAM refresh, decode instruction
Clock 4: put PC on the address bus
Clock 5: increment PC
Clock 6: read value from the data bus into A

If you want to do more things at once, you need more adders and the like on the die. If you do it step by step, you can use the same ALU to increment the PC as well as to do an ADD instruction. Each bit of parallelization requires more die area, which drops the number of CPUs per wafer, and more gates means a higher chance of a chip failing and hence fewer working chips, so that pushes the price up further.
Your max clock speed is determined by whichever step takes the longest. So you might break a slow step down into smaller steps, which makes the instruction take more cycles but allows a higher clock rate overall.
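The critical-path point can be shown with made-up stage delays (the numbers below are purely illustrative, not real 6502 or Z80 timings):

```python
# The slowest micro-step sets the ceiling on clock frequency.
step_delays_ns = {"fetch": 90, "decode": 60, "alu": 120, "writeback": 70}
f_max_mhz = 1000 / max(step_delays_ns.values())  # 1000/120 -> ~8.3 MHz

# Splitting the slow ALU step into two shorter sub-steps raises the ceiling,
# even though each instruction now takes more steps overall:
split_delays_ns = {"fetch": 90, "decode": 60,
                   "alu1": 70, "alu2": 70, "writeback": 70}
f_max_split_mhz = 1000 / max(split_delays_ns.values())  # 1000/90 -> ~11.1 MHz
```

More steps per instruction, but a faster clock for everything: the basic pipelining trade-off.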


PostPosted: Tue May 08, 2018 11:50 pm

Joined: Tue Feb 07, 2017 2:03 am
Posts: 512
Sophie Wilson seems to have just given a talk about the design of the ARM: https://hackaday.com/2018/05/08/sophie- ... efficient/


PostPosted: Thu May 24, 2018 11:24 am

Joined: Wed May 19, 2010 6:12 pm
Posts: 2731
I saw that they talked about the Dhrystone benchmark again. Was that another rigged "see how well a CPU can emulate 68000 instructions 1:1" benchmark? Is it doing math with a bunch of random absolute long addressing?


Last edited by psycopathicteen on Thu May 24, 2018 11:33 am, edited 1 time in total.

PostPosted: Thu May 24, 2018 11:29 am

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7392
Location: Seattle
Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker intended to be representative of system (integer) programming. The Dhrystone grew to become representative of general processor (CPU) performance. The name "Dhrystone" is a pun on a different benchmark algorithm called Whetstone.
[...]
The Dhrystone benchmark contains no floating point operations, thus the name is a pun on the then-popular Whetstone benchmark for floating point operations. The output from the benchmark is the number of Dhrystones per second (the number of iterations of the main code loop per second).
[...]
Dhrystone remains remarkably resilient as a simple benchmark, but its continuing value in establishing true performance is questionable. It is easy to use, well documented, fully self-contained, well understood, and can be made to work on almost any system. In particular, it has remained in broad use in the embedded computing world, though the recently developed EEMBC benchmark suite, HINT, Stream, and even Bytemark are widely quoted and used, as well as more specific benchmarks for the memory subsystem (Cachebench), TCP/IP (TTCP), and many others.
[...]
Dhrystone may represent a result more meaningfully than MIPS (million instructions per second) because instruction count comparisons between different instruction sets (e.g. RISC vs. CISC) can confound simple comparisons. For example, the same high-level task may require many more instructions on a RISC machine, but might execute faster than a single CISC instruction.
Dhrystones actually measure integer computational ability. MIPS don't.
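For context on the "Dhrystones per second" figure: scores are commonly normalized against the VAX 11/780, which scored 1757 Dhrystones/second and was nominally a 1 MIPS machine, giving the "DMIPS" unit. A minimal sketch:

```python
# Normalize a raw Dhrystone score to DMIPS ("Dhrystone MIPS").
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # reference score of a nominal 1 MIPS machine

def dmips(dhrystones_per_sec: float) -> float:
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC

# Example: a CPU scoring 5271 Dhrystones/second rates 3.0 DMIPS.
```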


PostPosted: Thu May 24, 2018 11:58 am

Joined: Wed May 19, 2010 6:12 pm
Posts: 2731
I know RISC CPUs need more instructions to load and store memory, but once the data is loaded into registers, the number of instructions needed evens out with CISC CPUs.


PostPosted: Thu May 24, 2018 12:50 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7392
Location: Seattle
That is simply not true.

(edit) There are just too many variables for such a flat statement to possibly be true. 8-bit PIC is more-or-less RISC (albeit accumulator-based, so...) but it does a lot less per instruction than any real 32-bit ISA.

VLIW, approximately the CISCiest of CISC things, does tremendously more per instruction during every instruction. By definition. Many modern ISAs include SIMD instructions, which are CISCy, and their corresponding truly-RISCy things are horrifically more verbose than natively supporting SIMD.


Last edited by lidnariq on Thu May 24, 2018 1:01 pm, edited 2 times in total.

PostPosted: Thu May 24, 2018 12:54 pm

Joined: Sun Mar 27, 2011 10:49 am
Posts: 259
Location: Seattle
Depending on your CISC architecture, your task, whether it's you or a compiler (/how good the compiler is) writing the machine code...

CISC architectures don't just have more powerful addressing modes. They also often have instructions to automate loops, string operations, block memory transfers...even polynomial evaluation.

And while the number of instructions might in some cases be comparable, the number of bytes often isn't: variable-length encodings sometimes let CISC ISAs use a single byte for a common instruction that on the RISC side would require four.

Here's a neat paper I found when googling, inspired by this topic. In their measurements they found that CISC generally did tend towards rather denser code. Depending on the problem and the architectures in question, you're often looking at half as much.
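One concrete way to see the density gap: copying a single byte through post-incremented pointers. The encoding sizes below are the real ones for the 68000 and early ARM, but the comparison is a hand-picked illustration, not a benchmark:

```python
# One iteration of a byte-copy loop body: *dst++ = *src++
# The 68000 does it in one 2-byte instruction; early ARM needs two 4-byte ones.
m68k = [("MOVE.B (A0)+,(A1)+", 2)]          # post-increments both pointers
arm = [("LDRB R2,[R0],#1", 4),              # load byte, then R0 += 1
       ("STRB R2,[R1],#1", 4)]              # store byte, then R1 += 1

m68k_bytes = sum(size for _, size in m68k)  # 2 bytes
arm_bytes = sum(size for _, size in arm)    # 8 bytes
```

Same instruction count would still be a 2:1 byte ratio here because of the fixed 4-byte ARM encoding; the addressing mode makes it 4:1.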


PostPosted: Thu May 24, 2018 1:46 pm

Joined: Wed May 19, 2010 6:12 pm
Posts: 2731
Comparing ARM to the 68000, I just don't see that many instructions the 68000 has that the ARM can't match. I never saw anything as complex as polynomial evaluation in a single instruction on the 68000.


PostPosted: Thu May 24, 2018 1:54 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7392
Location: Seattle
If you actually want to see how the 68k is competitive with ARMv1 despite being 8 years older (using half as many instructions per Dhrystone), you'll need to look at the actual resulting machine code for both machines.

Otherwise... you're just letting your preconceptions blind you.


PostPosted: Thu May 24, 2018 2:00 pm

Joined: Wed May 19, 2010 6:12 pm
Posts: 2731
What do you mean by "preconceptions"? You think I'm not looking up the actual instruction sets?


PostPosted: Thu May 24, 2018 2:04 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7392
Location: Seattle
Fact: ARMv1 and 68k, both clocked at 8 MHz, calculate a comparable number of Dhrystones per second, despite ARMv1 executing ≈250% as many instructions. Therefore the average ARMv1 instruction must be doing about 40% of the work.

If you want to see how this is true, you need to actually look at the sequence of instructions that are used in the dhrystone benchmark on these two machines, not just look at the available instructions.


PostPosted: Thu May 24, 2018 2:26 pm

Joined: Wed May 19, 2010 6:12 pm
Posts: 2731
Quote:
If you want to see how this is true, you need to actually look at the sequence of instructions that are used in the dhrystone benchmark on these two machines.


I don't know if that can be found on the internet.


PostPosted: Thu May 24, 2018 3:33 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20413
Location: NE Indiana, USA (NTSC)
If a cross-compiler targeting each architecture is available on the Internet, and the source code of Dhrystone is available on the Internet, then the assembly code resulting from compiling Dhrystone can be calculated from these.
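For example, assuming GCC cross-toolchains for both targets are installed (package names and target triplets vary by distribution; these are illustrative), something like the following would emit the assembly for inspection:

```shell
# Emit optimized assembly for the Dhrystone core on each target.
# dhry_1.c is the main source file in the standard Dhrystone 2.1 distribution.
m68k-linux-gnu-gcc -O2 -S dhry_1.c -o dhry_m68k.s
arm-linux-gnueabi-gcc -O2 -S dhry_1.c -o dhry_arm.s
```

Comparing the two `.s` files side by side shows the actual instruction sequences each compiler chose.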


PostPosted: Fri May 25, 2018 8:19 am

Joined: Mon Mar 30, 2015 10:14 am
Posts: 275
adam_smasher wrote:
Here's a neat paper I found when googling, inspired by this topic. In their measurements they found that CISC generally did tend towards rather denser code. Depending on the problem and the architectures in question, you're often looking at half as much.

The problem is that this code density is measured from compiled C code. To me that's unreliable, because it depends on how good the compiler is for each CPU, and a poor compiler can drastically reduce the final code density.


PostPosted: Fri May 25, 2018 8:30 am

Joined: Wed May 19, 2010 6:12 pm
Posts: 2731
The 68000 would've had a head start with compilers.

