PostPosted: Sat Feb 02, 2019 11:46 pm 
Formerly Espozo

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3430
Location: Richmond, Virginia
I recently started programming an STM32 microcontroller (ARM Cortex-M0 processor) for college and was naïve enough to try doing it in assembly. There's little support for working with immediate values, and the limitations seem to change heavily from instruction to instruction (sometimes it's an 8-bit value that can be shifted, sometimes it's a regular 16-bit number, and sometimes it's even a 12-bit number). There's never any absolute addressing either, since an instruction can be at most 32 bits wide, and the limitations on relative addressing appear just as arbitrary as those on immediate values. It's confusing enough for a human (or for me, at least) that I wouldn't be surprised if a compiler generated substantially faster code.


PostPosted: Sun Feb 03, 2019 12:09 am 

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8136
Location: Seattle
ARM assemblers usually deliberately allocate a region in the CODE segment near any given routine called a "constant pool" for exactly this reason.
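For example, with the GNU assembler you can place the pool explicitly; a minimal sketch (GNU as unified syntax assumed, label name made up):

Code:
    .syntax unified
    .thumb
get_constant:
    ldr   r0, =0x12345678   @ pseudo-instruction: assembled as a PC-relative
                            @ load from the constant pool emitted below
    bx    lr
    .ltorg                  @ dump the constant pool here, within reach of the
                            @ short-range Thumb LDR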


PostPosted: Sun Feb 03, 2019 4:25 am 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 675
Yeah, ARM, and RISC in general, is designed for compilers. The idea is that a smaller instruction set means there are no complex instruction sequences to search for, which reduces the search space for a compiler, so the compiler produces code about as good as a human would. That being said, when I was doing M0 work, the code gcc was making was horrendous.

Basically the instructions are a fixed bit length, so you get some number of bits to encode the instruction, and the parameters get whatever is left. As the ARM instruction set has been moving more and more towards CISC, they have had to get creative with instruction packing.

Typically when you hand-write asm for a RISC you use a "higher-level asm", although I don't know of one for ARM; on MIPS you have MAL, which is a helper layer on top of TAL.


PostPosted: Sun Feb 03, 2019 9:31 am 
Formerly ~J-@D!~

Joined: Sun Mar 12, 2006 12:36 am
Posts: 484
Location: Rive nord de Montréal
ARMv6-M (the core of the Cortex-M0) is a bastardization; it's a stripped-down, not very orthogonal copy of ARMv7-M (Cortex-M3). For instance, on ARMv6-M most of the instructions can only act on r0-r7, and the destination is forced to be the same as one of the operands; very few instructions can use all registers. And indeed, you can only move an 8-bit literal, or load a 32-bit constant from a constant pool. If you want something more pleasant to work with, consider a Cortex-M3: far fewer restrictions, and more fun to work with.
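A rough illustration of those restrictions, assuming GNU as unified syntax (the register and constant choices are just made up for the example):

Code:
    movs  r0, #200          @ immediates for MOV are limited to 8 bits (0-255)
    adds  r1, r0, r2        @ a three-operand form exists, but only for r0-r7
    ands  r1, r3            @ AND has no separate destination: r1 = r1 AND r3
    ldr   r4, =0xDEADBEEF   @ anything wider than 8 bits comes from the constant pool
    add   r8, r0            @ high registers are only reachable by a few ops (ADD/MOV/CMP/BX)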

_________________
((λ (x) (x x)) (λ (x) (x x)))


PostPosted: Sun Feb 03, 2019 9:56 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21097
Location: NE Indiana, USA (NTSC)
But how does ARMv6-M compare to ARMv4 Thumb, as used in Game Boy Advance? Wikipedia says it supports "most" Thumb instructions and "some" Thumb-2 instructions. Does Thumb cause more of a problem than it did on GBA?


PostPosted: Sun Feb 03, 2019 10:04 am 
Formerly Espozo

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3430
Location: Richmond, Virginia
This is probably a stupid question, but for what reason would a processor be forced to use a certain instruction width? (Well, THUMB is either 16 or 32 bit, but cannot be larger than that.) ...I was just about to say that I'm confused as to why there would be a 16-bit instruction that contains the address of a 32-bit number that still needs to be loaded, but I suppose with a 16-bit data bus, the waste of having that address included in the instruction doesn't really matter...

And I was running into problems before I wrote "processor cpu32_6m". :|

And what did Thumb do on the GBA?


PostPosted: Sun Feb 03, 2019 11:50 am 

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8136
Location: Seattle
Drew Sebastino wrote:
This is probably a stupid question, but for what reason would a processor be forced to use a certain instruction width?
Theoretically, it makes things a lot simpler, because you don't need special lookup tables or handling to deal with variable-length instructions.

In practice ... it turns out that a fixed length of 32-bit instructions is actually kinda lousy. Cache pressure and memory bandwidth are often the biggest hindrances to any modern CPU, and the simplest way to address that is to make your instructions shorter. (edit: and having to refer to a constant pool that's not in the literal flow of instructions means you need to special-case cache prefetch anyway, so there's less benefit.) And the cost of working out where the program counter needs to be isn't large.

Hence SuperH, THUMB, and MIPS16le. (btw, THUMB's always 16-bit)

Quote:
And what did Thumb do on the GBA?
Basically exactly what I said above: more instructions can fit into the GBA's internal 32KB of 32-bit RAM and 256KB of 16-bit RAM, take less time to execute from the 256KB internal RAM, and take less time to fetch or execute from the cart.


PostPosted: Sun Feb 03, 2019 10:10 pm 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 675
Basically, in the old days RAM was small and slow, so things like the Z80 and the 68K would spend large amounts of die area on fancy FSMs and variable-length instructions to get the most out of small RAM, and they had clocks to spare, when they weren't touching the bus, to work out what to do next, etc.

Then RAM got cheaper and faster, and did so faster than transistors got smaller. So RISC ditched the fancy FSMs and microcode and just hit RAM hard and fast, which pushed more data through the CPU; since the CPU knows it's going to get data each clock, the pipeline could be simplified and more of the chip could be used for executing instructions.

Then cooling got better, and processes shrank faster than external bus speeds could be sped up, so cache became the way to work around the slow "FSB". At that point we're back to memory being precious and packing more in being a lot better, since the CPU can hit cache at 100 MHz but RAM only at 33 MHz. Thus CISC started to pull back, and now even ARM is fairly CISC, ditching RISC purity in favor of power.

The really dumb aspect is making a 32-bit CPU with a 32-bit instruction size; a 16-bit CPU on a 32-bit bus would make a lot more sense. This is what Thumb is: it drops you to a 16-bit CPU but still has a 32-bit bus. Thus it can get instruction + data every clock, which really boosts your speed. It's just that you can't go over 64K any more. Personally I think going to a 24-bit CPU would be the sweet spot; honestly, when I'm coding I very rarely need more than 65,536 values (normally I'm dealing with <1000), but I can see that for things like spreadsheets 65,000 is not enough. However, 16,777,216 is probably plenty 98% of the time. The issue then becomes that it limits you to a measly 16 MB; doubling up to 48-bit pointers, however, gets you 281,474,976,710,656 addresses (256 TB), which goes well past what counts as "normal" today, and I still think it's overkill.


PostPosted: Mon Feb 04, 2019 12:21 am 

Joined: Fri Feb 24, 2012 12:09 pm
Posts: 812
I like coding ARM in ASM. The instructions, addressing modes, and register set are much more powerful than the 6502 or the like. Needing the literal pool for 32-bit immediates might be a bit unfamiliar at first. But once you get familiar with it, you have 32-bit maths, memory accesses with auto-incrementing addresses, ALU opcodes that can do obscure things like "IF equal THEN r0=r2 xor (r3*8)" in a single opcode & single clock cycle, and enough registers to keep operands & pointers & loop counters in registers instead of RAM.
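For reference, that "IF equal THEN r0=r2 xor (r3*8)" case would look something like this in 32-bit ARM code (GNU as syntax assumed, and the preceding compare is just a made-up example):

Code:
    cmp   r1, r2               @ some earlier comparison sets the flags
    eoreq r0, r2, r3, lsl #3   @ executed only if equal: r0 = r2 XOR (r3 << 3),
                               @ i.e. r3*8, folded into one conditional opcode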

At least ARM can do that. THUMB should be able to do most of it too, but it comes with some confusing restrictions, and its syntax has confusing rules about whether/which opcodes update the flags. I don't know if THUMB-2 has fixed some of those restrictions and syntax issues.
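As an example of the flag confusion (GNU as unified syntax assumed): the 16-bit encodings for most low-register ALU operations always update the flags, so you have to write the S suffix whether you wanted flags or not, while the high-register forms never set them:

Code:
    adds  r0, r0, #1    @ 16-bit THUMB encoding: flags are always updated
    add   r0, r8        @ high-register form: never sets flags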

Using compiler code: what I have seen in commercial games on GBA and NDS consoles isn't optimized at all. You would need to be really confused to produce anything equally bad in hand-written ASM.

Oziphantom wrote:
This is what Thumb is: it drops you to a 16-bit CPU but still has a 32-bit bus. Thus it can get instruction + data every clock

Uh, that's the wrong way around, and still not quite right.
The CPU is 32-bit no matter whether it's using THUMB or ARM (it can do 32-bit maths and has a 32-bit address space).

THUMB 16-bit opcodes can be faster than 32-bit opcodes if your memory is "uncached memory with a 16-bit databus" (if your memory doesn't have that restriction, then THUMB is just smaller, but not actually faster).

If you think that a 16-bit opcode and 16-bit data can be transferred through a 32-bit databus within a single clock cycle: no, they can't. What you might mean is memory systems with separate data and code caches; those might manage it in a single clock cycle, but that's unrelated to using 32-bit ARM opcodes or 16-bit THUMB opcodes.


PostPosted: Mon Feb 04, 2019 5:39 am 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 675
The GBA has a 16-bit bus and no cache, right?

I also thought that it made it more practical to do 16-bit operations, in that you ignore the upper half and just focus on the lower half of the registers. But it has been a long time, and a lot of ARM variants, since then :D Maybe it was 16 registers, not 16 bits...


PostPosted: Mon Feb 04, 2019 6:23 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21097
Location: NE Indiana, USA (NTSC)
Game Boy Advance has a 32-bit bus to BIOS, IWRAM, and MMIO, and a 16-bit bus to most other memory (ROM, EWRAM, VRAM, CGRAM, and OAM). IWRAM is also fairly small (32768 bytes) yet with fewer wait states than EWRAM or ROM, so if ARM in IWRAM is too big, Thumb in IWRAM may make sense.


PostPosted: Thu Feb 07, 2019 4:00 pm 

Joined: Wed May 19, 2010 6:12 pm
Posts: 2810
If I were programming the GBA in assembly, I'd probably dedicate a register as an index into a table of constants.


PostPosted: Thu Feb 07, 2019 5:06 pm 

Joined: Fri Nov 19, 2004 7:35 pm
Posts: 4136
You don't need an indexed table of constants; you just use the program counter for that.
There's even a pseudo-instruction for that: `ldr r0,=0x12345678`, which gets turned into a PC-relative load from a local literal pool.

Now an indexed table of global variables, that's far more useful.
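A minimal sketch of the indexed-globals idea, assuming GNU as syntax, a made-up variable block, and r7 arbitrarily reserved as the base pointer:

Code:
    ldr   r7, =globals      @ once, at startup: point a spare register at the block
                            @ ... then, anywhere in the code:
    ldr   r0, [r7, #4]      @ read score (offset 4)
    adds  r0, r0, #1
    str   r0, [r7, #4]      @ write score back

    .bss
globals:
lives:  .space 4            @ offset 0
score:  .space 4            @ offset 4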

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


PostPosted: Thu Feb 07, 2019 5:10 pm 
Formerly ~J-@D!~

Joined: Sun Mar 12, 2006 12:36 am
Posts: 484
Location: Rive nord de Montréal
... and that's essentially what a Global Offset Table (GOT) is.

_________________
((λ (x) (x x)) (λ (x) (x x)))


PostPosted: Mon Feb 11, 2019 9:05 pm 

Joined: Wed May 19, 2010 6:12 pm
Posts: 2810
Dwedit wrote:
You don't need an indexed table of constants; you just use the program counter for that.
There's even a pseudo-instruction for that: `ldr r0,=0x12345678`, which gets turned into a PC-relative load from a local literal pool.

Now an indexed table of global variables, that's far more useful.


How does the assembler know where to put the table?

