The Difficulty of ARM Assembly

You can talk about almost anything that you want to on this board.


93143
Posts: 1112
Joined: Fri Jul 04, 2014 9:31 pm

Re: The Difficulty of ARM Assembly

Post by 93143 » Sun Feb 17, 2019 5:51 am

65xx at 32-bit (or 16-bit for that matter) would work so much better if they just widened the data bus. You'd lose drop-in compatibility, but the alternative - which WDC seems to have chosen - is to let 65xx die chained to the archaic bus interface of the 6502. Imagine if x64 was still using the same pinout from the 8088 because they'd taken "PC compatibility" too literally...

Garth
Posts: 152
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California

Re: The Difficulty of ARM Assembly

Post by Garth » Mon Feb 18, 2019 1:34 am

93143 wrote:65xx at 32-bit (or 16-bit for that matter) would work so much better if they just widened the data bus. You'd lose drop-in compatibility, but the alternative - which WDC seems to have chosen - is to let 65xx die chained to the archaic bus interface of the 6502. Imagine if x64 was still using the same pinout from the 8088 because they'd taken "PC compatibility" too literally...
I have always lamented that the 65816 (and '832 also) were bound by requirements to be able to emulate the 6502 and run pre-written 6502 code. Think how much better they could have performed without that requirement. As for the 40-pin requirement, I don't know why that was there, since the '816 was not quite pin-compatible anyway. (The 65802 was an '816 that could be dropped into a 6502 socket and give many of the benefits of the '816, just staying in the first 64K of memory map.) A 48-pin DIP, which was also a standard size, would have at least removed the requirement to multiplex the high byte of the address bus. The 68K used a 64-pin DIP. Unfortunately Apple made 6502 emulation capability a requirement for buying the '816 for their IIGS. It was also a shame that Apple management limited the IIGS to 2.8MHz because they didn't want it to make the Macintoshes look bad. I don't know if the '832 came with the same requirement from a potential customer, but although it was designed, it never got made.

I propose such a 32-bit 6502, starting with the third post of the 6502.org topic, "Improving the 6502, some ideas." It really just takes the 65816 and expands the data bus, non-multiplexed address bus, and all registers (except maybe the status register) to 32 bits, getting rid of page and bank boundaries and requirements. That way the bank registers, direct-page register, and stack-pointer register become merely offsets and you can still address the entire 4 gigaword space from anywhere. Absolute address modes become the same thing as ZP (or DP), except with the data or program bank offset rather than the direct-page offset. Long addressing is the same except with no offset applied. Operands are always picked up in a single 32-bit memory read cycle. The 6502 flavor is strongly preserved. It has not been built so far; I have not gotten into programmable logic and FPGAs, but I might emulate it with a PIC microcontroller. The performance that way would be extremely poor, but it would let me experiment with the instruction set.
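The addressing modes described above can be made concrete with a small Python model. This is only a sketch under my own assumptions: the function names are hypothetical, and I assume addresses simply wrap around at 2^32. Each mode is the operand plus one offset register (or no offset, for long addressing), which is the point of the proposal: there are no page or bank boundaries to worry about.

```python
# Hypothetical model of effective-address formation in the proposed
# 32-bit 6502, where all registers are 32 bits wide and the bank and
# direct-page registers are plain offsets. Wraparound at 2**32 is an
# assumption of this sketch.

MASK32 = 0xFFFFFFFF


def ea_direct(operand: int, dp: int) -> int:
    """ZP/DP mode: operand plus the direct-page offset."""
    return (operand + dp) & MASK32


def ea_absolute(operand: int, bank: int) -> int:
    """Absolute mode: like DP, but offset by the data (or program) bank register."""
    return (operand + bank) & MASK32


def ea_absolute_indexed(operand: int, bank: int, index: int) -> int:
    """Absolute,X or absolute,Y: add the 32-bit index register as well."""
    return (operand + bank + index) & MASK32


def ea_long(operand: int) -> int:
    """Long mode: the operand is the address, with no offset applied."""
    return operand & MASK32


if __name__ == "__main__":
    print(hex(ea_direct(0x12, 0x00010000)))        # 0x10012
    print(hex(ea_absolute(0x1234, 0x00020000)))    # 0x21234
    print(hex(ea_long(0x12345678)))                # 0x12345678
```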

The 32-bit pseudo-6502 that seems to be most likely to someday reach reality is Michael Barry's 65m32. The link goes to a topic in the AnyCPU forum though because more of the 6502 flavor is lost in this processor. He does make it more efficient, merging the operand with the instruction so they can be fetched in a single cycle in cases where the operand is 24 bits or less. So for example LDA $123456 is all fetched in a single memory cycle. The operand for LDA $12345678 would have to be separate from the instruction.
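To illustrate why a 24-bit operand can ride along with the instruction, here is a Python sketch of packing an instruction into 32-bit words. The layout is hypothetical and is not the actual 65m32 encoding: I simply assume a 7-bit opcode in the top byte, with the top bit of that byte used as a flag meaning "the operand is in the next word".

```python
# Hypothetical 32-bit instruction packing, loosely modeled on the idea
# described above: an operand of 24 bits or less shares the instruction
# word, while a full 32-bit operand needs a second word (a second fetch).
# This is NOT the real 65m32 format.

EXT_FLAG = 0x80          # hypothetical "operand follows" bit in the opcode byte
OPERAND_BITS = 24


def encode(opcode: int, operand: int) -> list:
    """Return the 32-bit words for one instruction."""
    if not 0 <= opcode < EXT_FLAG:
        raise ValueError("opcode must fit in 7 bits in this sketch")
    if not 0 <= operand < (1 << 32):
        raise ValueError("operand must be an unsigned 32-bit value")
    if operand < (1 << OPERAND_BITS):
        # Opcode and operand are fetched together in one memory cycle.
        return [(opcode << OPERAND_BITS) | operand]
    # Operand too wide: set the flag and put the operand in the next word.
    return [(opcode | EXT_FLAG) << OPERAND_BITS, operand]


if __name__ == "__main__":
    lda = 0x10                                         # arbitrary opcode value
    print([hex(w) for w in encode(lda, 0x123456)])     # ['0x10123456']
    print([hex(w) for w in encode(lda, 0x12345678)])   # ['0x90000000', '0x12345678']
```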
http://WilsonMinesCo.com/ lots of 6502 resources

Oziphantom
Posts: 785
Joined: Tue Feb 07, 2017 2:03 am

Re: The Difficulty of ARM Assembly

Post by Oziphantom » Mon Feb 18, 2019 7:05 am

I think the M65 4510 is the closest to being "made". It has a full 6502 emulation mode, but also a turbo 6502 mode that uses the wider bus to pull in opcode and parameter data together for faster execution. The designer was able to push it up to 192MHz, but that broke the 8-bit feel and the way "8-bit processors worked", so it's locked to 48MHz. It also has 32-bit extensions as well as a Z register.

psycopathicteen
Posts: 2911
Joined: Wed May 19, 2010 6:12 pm

Re: The Difficulty of ARM Assembly

Post by psycopathicteen » Mon Feb 18, 2019 12:21 pm

I just thought of a way to make an improved 6502-like 8-bit CPU. The instruction format will be like this:

bits 0-3: instruction field
bits 4-7: operand field

The operand field contains hardwired combinations of register/register pairs or memory/register pairs.

Code:

0000: a, #imm
0001: a, dp
0010: a, abs
0011: a, (dp)
0100: a, abs,x
0101: a, abs,y
0110: a, (dp),x
0111: a, (dp),y
1000: a, x
1001: a, y
1010: x, #imm
1011: y, #imm
1100: x, abs
1101: y, abs
1110: x, abs,y
1111: y, abs,x
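A quick way to see how the proposed format splits each opcode byte is a small Python decoder. This is only a sketch: the operand table is copied from the post, but the proposal doesn't say which operations go in the instruction field, so that part is returned as a bare number.

```python
# Decoder for the proposed 8-bit opcode format:
#   bits 0-3: instruction field (operations not specified in the proposal)
#   bits 4-7: operand field, indexing the table of register/memory pairs

OPERAND_FIELD = [
    "a, #imm", "a, dp", "a, abs", "a, (dp)",
    "a, abs,x", "a, abs,y", "a, (dp),x", "a, (dp),y",
    "a, x", "a, y", "x, #imm", "y, #imm",
    "x, abs", "y, abs", "x, abs,y", "y, abs,x",
]


def decode(opcode: int):
    """Split an opcode byte into (instruction number, operand pair)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be a byte")
    instruction = opcode & 0x0F                      # bits 0-3
    operand = OPERAND_FIELD[(opcode >> 4) & 0x0F]    # bits 4-7
    return instruction, operand


if __name__ == "__main__":
    print(decode(0x23))   # (3, 'a, abs')
    print(decode(0xF0))   # (0, 'y, abs,x')
```

Note that 16 instruction values times 16 operand pairs already accounts for all 256 opcode values.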

koitsu
Posts: 4216
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: The Difficulty of ARM Assembly

Post by koitsu » Mon Feb 18, 2019 8:29 pm

I think everyone here should take the time to look at the 6809 before reinventing the wheel. You might be surprised what was available in 1978, and used all the way into the 90s, particularly in arcade games. Programmer's manual (PDF with a really crappy web front end)

psycopathicteen
Posts: 2911
Joined: Wed May 19, 2010 6:12 pm

Re: The Difficulty of ARM Assembly

Post by psycopathicteen » Tue Feb 19, 2019 12:37 am

It's too bad Motorola cut the cord on the 6809 so early on.

Garth
Posts: 152
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California

Re: The Difficulty of ARM Assembly

Post by Garth » Tue Feb 19, 2019 1:56 am

The 6809 had a really nice instruction set, but it didn't really perform any better than the 6502. The 65816 is a much better upgrade IMO, and the SuperCPU ran it at 20MHz over 20 years ago.

Oziphantom
Posts: 785
Joined: Tue Feb 07, 2017 2:03 am

Re: The Difficulty of ARM Assembly

Post by Oziphantom » Tue Feb 19, 2019 3:27 am

psycopathicteen wrote:I just thought of a way to make an improved 6502-like 8-bit CPU. The instruction format will be like this:

bits 0-3: instruction field
bits 4-7: operand field

The operand field contains hardwired combinations of register/register pairs or memory/register pairs.

Code:

0000: a, #imm
0001: a, dp
0010: a, abs
0011: a, (dp)
0100: a, abs,x
0101: a, abs,y
0110: a, (dp),x
0111: a, (dp),y
1000: a, x
1001: a, y
1010: x, #imm
1011: y, #imm
1100: x, abs
1101: y, abs
1110: x, abs,y
1111: y, abs,x
You do realize that is basically how it already works, right?

tepples
Posts: 21840
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: The Difficulty of ARM Assembly

Post by tepples » Tue Feb 19, 2019 7:21 am

A 4-bit operand/mode field is indeed similar to how bits 4-2 from an actual 6502 instruction work in group 1 instructions (bits 1-0 = 01):

00001 (dd,X)
00101 dd
01001 #ii
01101 aaaa
10001 (dd),Y
10101 dd,X
11001 aaaa,Y
11101 aaaa,X

Similarly for group 2 RMW instructions (bits 1-0 = 10):

00110 dd
01010 A
01110 aaaa
10110 dd,X
11110 aaaa,X

But psycopathicteen's proposal is slightly closer to orthogonal, as it allows use of X or Y instead of A as the target in more cases. I'm under the impression that you can't get true orthogonality with indexed addressing modes in an 8-bit opcode; you need prefixes or 16-bit opcodes for that.
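The aaabbbcc layout described above can be sketched as a Python function that pulls out bits 4-2 (bbb) and bits 1-0 (cc) and looks up the addressing mode. It only covers the group 1 instructions (ORA, AND, EOR, ADC, STA, LDA, CMP, SBC) and the group 2 read-modify-write instructions (ASL, ROL, LSR, ROR, DEC, INC). It deliberately returns None for the irregular cases that share those columns, such as STX/LDX (which use zp,Y and abs,Y) and implied instructions like DEX and NOP.

```python
# Addressing-mode lookup for NMOS 6502 opcodes laid out as aaabbbcc:
#   aaa (bits 7-5) = operation, bbb (bits 4-2) = mode, cc (bits 1-0) = group.
# Only group 1 and the group 2 RMW instructions are handled; anything
# irregular returns None.

GROUP1_MODES = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]

# Group 2 RMW modes by bbb; bbb values missing here are not RMW modes.
GROUP2_RMW_MODES = {0b001: "zp", 0b010: "A", 0b011: "abs", 0b101: "zp,X", 0b111: "abs,X"}


def addressing_mode(opcode: int):
    aaa = (opcode >> 5) & 0b111
    bbb = (opcode >> 2) & 0b111
    cc = opcode & 0b11

    if cc == 0b01:                               # group 1
        if aaa == 0b100 and bbb == 0b010:
            return None                          # "STA #imm" ($89) is not a documented opcode
        return GROUP1_MODES[bbb]
    if cc == 0b10:                               # group 2
        if aaa in (0b100, 0b101):
            return None                          # STX/LDX and transfers follow other rules
        if aaa in (0b110, 0b111) and bbb == 0b010:
            return None                          # DEX ($CA) and NOP ($EA) are implied
        return GROUP2_RMW_MODES.get(bbb)
    return None                                  # cc = 00 or 11: not covered here


if __name__ == "__main__":
    for op in (0xA9, 0xB1, 0x16, 0x0A, 0xCA):
        print(f"${op:02X}: {addressing_mode(op)}")
```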

Dwedit
Posts: 4251
Joined: Fri Nov 19, 2004 7:35 pm

Re: The Difficulty of ARM Assembly

Post by Dwedit » Tue Feb 19, 2019 11:24 am

koitsu wrote:I think everyone here should take the time to look at the 6809 before reinventing the wheel. You might be surprised what was available in 1978, and used all the way into the 90s, particularly in arcade games. Programmer's manual (PDF with a really crappy web front end)
If you go to Download Options on that page, you can get the actual PDF file.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!

psycopathicteen
Posts: 2911
Joined: Wed May 19, 2010 6:12 pm

Re: The Difficulty of ARM Assembly

Post by psycopathicteen » Tue Feb 19, 2019 12:07 pm

Garth wrote:The 6809 had a really nice instruction set, but it didn't really perform any better than the 6502. The 65816 is a much better upgrade IMO, and the SuperCPU ran it at 20MHz over 20 years ago.
Why does the 65816 perform better? Is it because the 6809 takes an extra instruction fetch on indexing instructions? Or is it just because Motorola didn't sell it at faster speeds?

psycopathicteen
Posts: 2911
Joined: Wed May 19, 2010 6:12 pm

Re: The Difficulty of ARM Assembly

Post by psycopathicteen » Tue Feb 19, 2019 12:11 pm

tepples wrote:A 4-bit operand/mode field is indeed similar to how bits 4-2 from an actual 6502 instruction work in group 1 instructions (bits 1-0 = 01):

00001 (dd,X)
00101 dd
01001 #ii
01101 aaaa
10001 (dd),Y
10101 dd,X
11001 aaaa,Y
11101 aaaa,X

Similarly for group 2 RMW instructions (bits 1-0 = 10):

00110 dd
01010 A
01110 aaaa
10110 dd,X
11110 aaaa,X

But psycopathicteen's proposal is slightly closer to orthogonal, as it allows use of X or Y instead of A as the target in more cases. I'm under the impression that you can't get true orthogonality with indexed addressing modes in an 8-bit opcode; you need prefixes or 16-bit opcodes for that.
Also 2 of the "addressing modes" are register to register modes.

Also what would be nice would be:
-adds without carry
-shifting instructions on X and Y
-barrel shifting on A
-register swaps

Of course I don't know if all these would fit in 256 instructions, but I do know that the 6502 has a lot of unused opcodes, and the 65816 spends a lot of opcodes with long addressing modes and stuff that makes sense for a computer, but not so much for a video game system.
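To spell out two of the wish-list items: the 6502 only has ADC, which always adds the carry flag, so a plain add needs a CLC first; and shifting A by several bits takes one ASL per bit. Below is a minimal Python model of the difference, where add() and barrel_shl() stand in for the hypothetical new instructions (they are not real 6502 opcodes).

```python
# 8-bit models: adc() behaves like the 6502's binary-mode ADC (result and
# carry out only; other flags ignored). add() and barrel_shl() model the
# hypothetical instructions from the wish list.


def adc(a: int, operand: int, carry: int):
    """6502 ADC: A + operand + carry, returning (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def add(a: int, operand: int):
    """Add without carry: same result as CLC followed by ADC."""
    return adc(a, operand, 0)


def asl(a: int):
    """One ASL A: shift left one bit; bit 7 goes to carry."""
    return (a << 1) & 0xFF, a >> 7


def shl_with_asl(a: int, n: int) -> int:
    """Shift left by n the 6502 way: n separate ASL A instructions."""
    for _ in range(n):
        a, _carry = asl(a)
    return a


def barrel_shl(a: int, n: int) -> int:
    """Hypothetical barrel shift: shift left by n in one instruction."""
    return (a << n) & 0xFF


if __name__ == "__main__":
    print(adc(0x10, 0x20, 1))   # (49, 0): a leftover carry sneaks into ADC
    print(add(0x10, 0x20))      # (48, 0)
    print(shl_with_asl(0x05, 3), barrel_shl(0x05, 3))   # 40 40
```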

Garth
Posts: 152
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California

Re: The Difficulty of ARM Assembly

Post by Garth » Tue Feb 19, 2019 1:33 pm

psycopathicteen wrote:
Garth wrote:The 6809 had a really nice instruction set, but it didn't really perform any better than the 6502. The 65816 is a much better upgrade IMO, and the SuperCPU ran it at 20MHz over 20 years ago.
Why does the 65816 perform better? Is it because the 6809 takes an extra instruction fetch on indexing instructions? Or is it just because Motorola didn't sell it at faster speeds?
My information about the 6809 performance comparison is from a very knowledgeable friend who really likes the 6809. He writes, "6809 is hobbled by a wearisome prevalence of dead cycles. Even a simple operation such as an 8-bit load using Absolute address mode takes 5 cycles on 6809, as compared to 4 cycles on 6502. Using Direct-Page/Zero-Page mode the numbers are 4 cycles as compared to 3 cycles."

As for the 65816, my 65816 Forth runs two to three times as fast as my 6502 Forth, at a given clock speed. It's primarily that the '816 is so much more efficient at handling the 16-bit cells than the '02 which has to take 8 bits at a time and increment addresses or indexes in between and such. Here's the simple example of @ (pronounced "fetch"), which takes a 16-bit address placed on the top of the data stack and replaces it with the 16-bit contents of that address.

First for 6502:

Code:

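       LDA  (0,X)       ; low byte of the cell at the address on top of stack
       PHA              ; save it
       INC  0,X         ; bump the address, low byte first...
       BNE  fet1
       INC  1,X         ; ...carrying into the high byte if it wrapped
fet1:  LDA  (0,X)       ; high byte of the cell
       JMP  PUT
; and elsewhere, PUT which is used in so many places is:
PUT:   STA  1,X         ; high byte replaces top of stack
       PLA              ; recover the low byte
       STA  0,X         ; low byte replaces top of stack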
For the '816, the whole thing is only:

Code:

       LDA  (0,X)
       STA  0,X         ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.
@ was given such a short name because it's one of the things used most. You can see the difference in the code length, 2 instructions for the 65816 versus 10 for the 6502.
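For anyone who doesn't read 6502 assembly, here is a Python model of what both versions of @ do. It is a sketch under assumptions inferred from the 0,X and 1,X operands: the data stack lives in zero page, X points at the low byte of the top cell, and cells are stored low byte first. The function names are made up for illustration.

```python
# Model of the Forth @ ("fetch") primitive on a 64 KB memory image.
# Assumptions: the data stack is in zero page, X indexes the low byte of
# the top-of-stack cell (high byte at X+1), and cells are little-endian.
# Bank boundaries and zero-page wraparound of X+1 are ignored.


def fetch_6502(mem: bytearray, x: int) -> None:
    """Byte at a time, following the 6502 routine step by step."""
    ptr = mem[x] | (mem[x + 1] << 8)   # address on top of the data stack
    lo = mem[ptr]                      # LDA (0,X); PHA
    ptr = (ptr + 1) & 0xFFFF           # INC 0,X / BNE / INC 1,X
    hi = mem[ptr]                      # LDA (0,X)
    mem[x + 1] = hi                    # PUT: STA 1,X
    mem[x] = lo                        # PLA; STA 0,X


def fetch_65816(mem: bytearray, x: int) -> None:
    """16-bit accumulator: one load and one store move the whole cell."""
    ptr = mem[x] | (mem[x + 1] << 8)
    value = mem[ptr] | (mem[(ptr + 1) & 0xFFFF] << 8)   # LDA (0,X)
    mem[x] = value & 0xFF                               # STA 0,X
    mem[x + 1] = value >> 8


if __name__ == "__main__":
    for fetch in (fetch_6502, fetch_65816):
        mem = bytearray(0x10000)
        mem[0x40], mem[0x41] = 0xFF, 0x12      # TOS holds the address $12FF
        mem[0x12FF], mem[0x1300] = 0xCD, 0xAB  # the cell crosses a page boundary
        fetch(mem, 0x40)
        print(fetch.__name__, hex(mem[0x40] | (mem[0x41] << 8)))   # 0xabcd
```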

Then there are the 816's extra instructions and addressing modes that improve efficiency, like MVN and MVP (the memory-move instructions), and the stack-relative addressing which helps in looping for incrementing the index and comparing to the limit and getting the indexes for nested loops. (Think of the typical I and J indexes in nested BASIC FOR-NEXT loops, except that you can do it in Forth without taking variable space.) These are just a couple off the top of my head.
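The block-move instructions are easy to describe in code. As a hedged sketch, assuming the standard register usage (A = byte count minus one, X = source, Y = destination) and ignoring the bank bytes that the real instructions also take: MVN walks upward and MVP walks downward, which is why MVP is the one to use when moving a block to an overlapping higher address. The function names simply mirror the mnemonics.

```python
# Register-level model of the 65816 block-move instructions.
# A holds (byte count - 1), X the source address, Y the destination.
# Each step moves one byte and decrements A; the move ends when A wraps
# from $0000 to $FFFF. The two bank operands and cycle costs are ignored.


def mvn(mem: bytearray, a: int, x: int, y: int):
    """MVN: copy upward, incrementing X and Y after each byte."""
    while True:
        mem[y] = mem[x]
        x, y = (x + 1) & 0xFFFF, (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:
            return a, x, y


def mvp(mem: bytearray, a: int, x: int, y: int):
    """MVP: X and Y start at the last byte and are decremented."""
    while True:
        mem[y] = mem[x]
        x, y = (x - 1) & 0xFFFF, (y - 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:
            return a, x, y


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x300:0x304] = b"WXYZ"
    # Shift the block up one byte in place; MVP is safe here because it
    # copies from the top down, so no source byte is overwritten before it is read.
    mvp(mem, 3, 0x303, 0x304)
    print(mem[0x300:0x305])   # bytearray(b'WWXYZ')
```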

Note of course that I'm not talking about running '02 Forth on an '816, something which in itself would not result in any performance gain. The '816 Forth was re-written to take advantage of the 816's extra capabilities.

Then because of the shorter assembly code, it became practical to re-write a lot of secondaries as primitives.

ccovell
Posts: 1007
Joined: Sun Mar 19, 2006 9:44 pm
Location: Japan

Re: The Difficulty of ARM Assembly

Post by ccovell » Wed Feb 20, 2019 5:25 pm

One CPU I would love to get into if I could start all over again and if it had become a popular juggernaut in home computers is the Hitachi 6309. It's compatible with the Motorola 6809 but can run at a higher clock speed, and in enhanced mode, it executes in fewer cycles, has more instructions, and adds extra 8-bit accumulators so that 16- and 32-bit math can be done.
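As a sketch of how those extra accumulators combine: on the 6309, E and F pair up as the 16-bit W register just as A and B form D on the 6809, and D and W together form the 32-bit Q register. The Python below only models that register concatenation (the helper names are mine), not any actual 6309 instruction.

```python
# How the 6309's 8-bit accumulators concatenate into wider registers:
#   D = A:B (16 bits, also on the 6809)
#   W = E:F (16 bits, new on the 6309)
#   Q = D:W (32 bits), so A is the most significant byte and F the least.


def reg_d(a: int, b: int) -> int:
    return (a << 8) | b


def reg_w(e: int, f: int) -> int:
    return (e << 8) | f


def reg_q(a: int, b: int, e: int, f: int) -> int:
    return (reg_d(a, b) << 16) | reg_w(e, f)


def split_q(q: int):
    """Break a 32-bit Q value back into A, B, E, F."""
    return (q >> 24) & 0xFF, (q >> 16) & 0xFF, (q >> 8) & 0xFF, q & 0xFF


if __name__ == "__main__":
    q = reg_q(0x12, 0x34, 0x56, 0x78)
    print(hex(q))         # 0x12345678
    print(split_q(q))     # (18, 52, 86, 120)
```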

Drew Sebastino
Formerly Espozo
Posts: 3503
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: The Difficulty of ARM Assembly

Post by Drew Sebastino » Thu Feb 21, 2019 12:55 am

I just looked up the 6309; looks like a fun processor to work with, but it's unfortunate nothing very popular ever used it.

I have a processor in that same situation, and it's the 65ce02. From what I've read of it, it's mostly less capable than the 65816, being closer to the 65c02, except for one big difference: a third "z" index register, which I'm very envious of... The minimum cycles per instruction is also 1 instead of 2, so performance might actually be better in certain areas. It looks like it was only used for the Amiga serial port card, though...
