
All times are UTC - 7 hours





 Post subject: ARM Assembler Question
PostPosted: Mon May 16, 2016 4:24 pm 
Joined: Sat Jan 03, 2015 5:58 pm
Posts: 368
Location: ...
(Assuming the Nintendo DS is "obsolete" so this thread can be in Other Retro Dev, move if need be).

I wanted to check out ARM assembly language, but don't have any money for a Raspberry Pi (my preferred method), so I decided to go with the Nintendo DS using libnds.

ARM makes sense to some extent, but due to problems with GAS not letting me grab just #defines from a file and ignore C code, if I continue with the NDS then C will be the primary language rather than the assembler.

That's not exactly what I'm wondering about, though. What I want to know is why GAS chooses to have registers "point to pointers" instead of being pointers themselves.

As an example, for iprintf("HI!");, GAS would generate:
Code:
ldr r0, message_pointer
bl iprintf

message_pointer:
.word message

message:
.ascii "HI!\0"


The extra pointer doesn't seem necessary, though, since afaik a register can hold the address directly:
Code:
ldr r0, =message
bl iprintf

message:
.ascii "HI!\0"


This is kind of a difficult question now that I think about it (and may or may not be kind of an excuse to have people to talk about ARM with) but does anyone know why GAS chooses to manually make pointers rather than just skip that step of the assembly process entirely? Is there a good reason, or is that just a random consequence of having a computer make code for you?


PostPosted: Mon May 16, 2016 4:28 pm 
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6437
Location: UK (temporarily)
ARM has an explicit instruction for "pull out the 32-bit pointer from this 12-bit PC-relative offset in memory", but no "just load a 32-bit number directly into a register". (The latter is a pseudoinstruction that the assembler turns into a real instruction, or into a load from a literal pool, depending on the specific numeric value of the pointer.)


PostPosted: Mon May 16, 2016 4:34 pm 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19221
Location: NE Indiana, USA (NTSC)
Probably because of the literal pool thing that ARM does. An immediate operand is limited to 8 bits but can be shifted, but pointers are 32 bits. There are two workarounds:
  1. The MIPS approach of constructing a constant in two steps with a load immediate shifted followed by an OR immediate. This works well on MIPS because constants can be 16 bits, with an optional 16-bit left shift only for load immediate (lui). On ARM, this can be effective for Game Boy Advance and Nintendo DS MMIO ports in 0x04000000-0x040003FC or 0x04800000-0x048003FC because they can be expressed as (0x40 << 20) | (regaddr << 2). But generic addresses aren't necessarily easy to build this way.
  2. Store the pointer in a table between one subroutine and the next and load it using PC-relative addressing. The assembler offers a ldr rxx, =whatever syntax to add the address to a nearby pool and generate a load instruction.
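To make the "8 bits but can be shifted" limit concrete, here is a minimal Python sketch of the ARM-mode rule, where an immediate is an 8-bit value rotated right by an even number of bits. The helper names (`rol32`, `is_arm_immediate`) are made up for illustration, and 0x04000208 is only an example MMIO address in the range mentioned above.

```python
MASK32 = 0xFFFFFFFF

def rol32(value, amount):
    """Rotate a 32-bit value left by `amount` bits (0..31)."""
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def is_arm_immediate(value):
    """True if `value` is an 8-bit number rotated right by an even amount.

    Rotating left by the same even amount undoes the rotation, so we try
    all 16 even rotations and check whether the result fits in 8 bits.
    """
    return any(rol32(value, rot) <= 0xFF for rot in range(0, 32, 2))

# The two halves of an MMIO address each fit in one instruction...
print(is_arm_immediate(0x04000000))  # upper part, usable with mov
print(is_arm_immediate(0x00000208))  # register offset, usable with orr
# ...but the combined address does not, and neither does a full 32-bit mask.
print(is_arm_immediate(0x04000208))
print(is_arm_immediate(0xFFFFFFFF))
```

Under this rule, workaround 1 builds the example address as mov r0, #0x04000000 followed by orr r0, r0, #0x208, while an arbitrary pointer generally needs workaround 2.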


PostPosted: Mon May 16, 2016 4:42 pm 
Joined: Sat Jan 03, 2015 5:58 pm
Posts: 368
Location: ...
Confusion ensues.

Quote:
An immediate operand is limited to 8 bits but can be shifted, but pointers are 32 bits.

Would this be an immediate operand? Because it loads a 32-bit number and assembles just fine: ldr r3, =0xffffffff


PostPosted: Mon May 16, 2016 4:53 pm 
Joined: Mon Jan 03, 2005 10:36 am
Posts: 2981
Location: Tampere, Finland
Assemble the code, then disassemble (you can use e.g. objdump to disassemble an object file). You should see that your instruction has changed into a different form.



PostPosted: Mon May 16, 2016 5:13 pm 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19221
Location: NE Indiana, USA (NTSC)
nicklausw wrote:
Quote:
An immediate operand is limited to 8 bits but can be shifted, but pointers are 32 bits.

Would this be an immediate operand? Because it loads a 32-bit number and assembles just fine: ldr r3, =0xffffffff

The mvn instruction takes an immediate and EORs it with 0xFFFFFFFF (that is, inverts every bit) before storing it in the register. For example, mvn r3, #0x000B puts 0xFFFFFFF4 into r3.

Because ldr rxx,=value is a pseudo-instruction, it can assemble to any of several things. GAS turns it into a single mov or mvn when the constant fits that instruction's immediate field, and otherwise into an ldr from a constant pool. The last is most likely in Thumb.
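As a rough model of that choice, here is a Python sketch assuming ARM mode on a core without movw (such as the ARM7/ARM9 in the GBA and DS) and a plain numeric constant. The function names are hypothetical and the real assembler has more cases; this only mirrors the mov / mvn / literal pool decision.

```python
MASK32 = 0xFFFFFFFF

def fits_rotated_imm8(value):
    """True if `value` is an 8-bit number rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated <= 0xFF:
            return True
    return False

def expand_ldr_equals(reg, value):
    """Describe what `ldr reg, =value` could turn into for a numeric constant."""
    value &= MASK32
    if fits_rotated_imm8(value):
        return f"mov {reg}, #0x{value:X}"
    inverted = ~value & MASK32
    if fits_rotated_imm8(inverted):
        # mvn stores the bitwise NOT of its operand
        return f"mvn {reg}, #0x{inverted:X}"
    # Nothing fits in one instruction: fall back to a literal pool entry.
    return f"ldr {reg}, <pc-relative pool entry holding 0x{value:08X}>"

for constant in (0xFF00, 0xFFFFFFFF, 0xFFFFFFF4, 0x12345678):
    print(expand_ldr_equals("r3", constant))
```

This is why ldr r3, =0xffffffff assembles fine: in this model it becomes an mvn with an operand of 0 rather than a 32-bit immediate.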


PostPosted: Mon May 16, 2016 5:59 pm 
Joined: Sat Jan 03, 2015 5:58 pm
Posts: 368
Location: ...
Oh! So the "double-pointers" take up less space and are faster then? Okay, I had it all backwards.

Is using immediate addressing that much of a slow-down, though, if at all? Because having to have a section of code just for pointers is not ideal.


PostPosted: Mon May 16, 2016 6:03 pm 
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6437
Location: UK (temporarily)
The literal pool takes two 32-bit memory cycles. (or one 16-bit plus one 32-bit in thumb mode)
Two in-line instructions take two 32-bit memory cycles (or ... some number of 16-bit fetches in thumb mode)

More to the point, if you try to prevent the compiler and/or assembler from using a literal pool, you're going to be fighting it the whole way.


PostPosted: Mon May 16, 2016 6:11 pm 
Joined: Sat Jan 03, 2015 5:58 pm
Posts: 368
Location: ...
lidnariq wrote:
More to the point, if you try to prevent the compiler and/or assembler from using a literal pool, you're going to be fighting it the whole way.

Yeah, I'm not really trying to get the compiler to be evasive with that type of thing. If libnds functions will run like that, then I never have to look at that machine code mess, so it doesn't really concern me. It's just that I don't want to be redundant with assembler code written on my own.


PostPosted: Mon May 16, 2016 6:19 pm 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19221
Location: NE Indiana, USA (NTSC)
Premature optimization is a root of all kinds of evil. Get it working first, and then get it fast once it's working.


PostPosted: Tue May 17, 2016 1:04 am 
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7265
Location: Chexbres, VD, Switzerland
This is a problem typical of RISC processors. The entire design is based on a word size, which is also the instruction size and register size. Additionally, ALL instructions are that word size, and no instruction (directly) uses multiple words. As such, it is impossible to have a "load a full-word immediate into a register" instruction, because the immediate alone already takes the whole word (in this case, 32 bits).

The only solution would be a two-word instruction, but this is also impossible because RISC philosophy says all instructions should be 1 word long; supporting variable-length instructions would make it much harder to pipeline the processor.

The solution used by ARM is to have the "second word of the instruction" (in this case, your pointer) stored not in the code, but right after the code. (This was an arbitrary decision; it could go before and work just as well.) This turned out to be very practical for romhacking, as the parameters used in a function sit in a pool right next to each other, which often lets you change things without even disassembling the routine :)

Note that RISC was never made to save memory (on the contrary, it wastes a lot of ROM as programs are stored very inefficiently, except for THUMB mode in ARM where it gets decent). It was made to simplify instruction decoding within the processor in order to get instructions to run faster.

Also note that there is NO pointer to your pointer. It is just a load relative to the program counter (r15). ARM assembly uses many pseudo-instructions, which are actually assembled to different instructions. Google "arm pseudo instruction" to get more details.

In this case
Code:
ldr r0, =something


is probably equivalent to something like
Code:
here:
    ldr r0, [r15], #something-here-8

The PC is always 2 words (8 bytes) ahead because of the pipeline.


Last edited by Bregalad on Tue May 17, 2016 8:58 am, edited 1 time in total.

PostPosted: Tue May 17, 2016 5:37 am 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
AFAIK Hitachi SuperH (used by the Saturn and Dreamcast) also uses literal pools.


PostPosted: Tue May 17, 2016 9:41 pm 
Formerly ~J-@D!~
Joined: Sun Mar 12, 2006 12:36 am
Posts: 445
Location: Rive nord de Montréal
Bregalad wrote:
In this case
Code:
ldr r0, =something


is probably equivalent to something like
Code:
here:
    ldr r0, [r15], #something-here-8

The PC is always 2 words (8 bytes) ahead because of the pipeline.

Hell no: not only do you fetch the wrong word, you'll also corrupt the PC, or it will fault. This is post-indexed addressing rather than regular offset addressing, which is the only form accepted when the base register is PC.
So it's more like:
Code:
    ldr r0, [pc, #off-8]

The -8 thing is true: PC reads as "ahead" because of the pipeline. This is important to consider when handling imprecise faults (if I remember correctly!), where the saved PC points after the faulting instruction.
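For a worked example of that -8, here is a small Python sketch that computes the offset for an ARM-mode load from a literal pool. The addresses and the function name are hypothetical; the sketch only assumes that PC reads as the instruction's address plus 8 and that the offset field covers -4095 to +4095 bytes.

```python
def pc_relative_ldr(reg, instr_addr, literal_addr):
    """Return the ARM-mode ldr that loads the word stored at literal_addr."""
    pc_seen_by_instruction = instr_addr + 8  # two words ahead of the ldr itself
    offset = literal_addr - pc_seen_by_instruction
    if not -4095 <= offset <= 4095:
        raise ValueError("literal pool entry is out of range for one ldr")
    return f"ldr {reg}, [pc, #{offset}]"

# ldr at 0x02000000, pool entry 16 bytes further on: the encoded offset is 8.
print(pc_relative_ldr("r0", 0x02000000, 0x02000010))
# A pool entry directly after the next instruction needs an offset of 0.
print(pc_relative_ldr("r0", 0x02000000, 0x02000008))
```

The range check is also the reason a literal pool has to sit within about 4 KB of the code that uses it.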


PostPosted: Thu May 19, 2016 5:46 pm 
Joined: Sat Jan 03, 2015 5:58 pm
Posts: 368
Location: ...
Um...does anyone have any idea why lr might magically turn into pc in a subroutine? Because I have a problem where my subroutines will randomly turn into a bx lr loop sometimes, and I can't figure things out at all. Not sure what other information to provide.


PostPosted: Thu May 19, 2016 6:05 pm 
Joined: Sat Jan 03, 2015 5:58 pm
Posts: 368
Location: ...
nicklausw wrote:
Um...does anyone have any idea why lr might magically turn into pc in a subroutine? Because I have a problem where my subroutines will randomly turn into a bx lr loop sometimes, and I can't figure things out at all. Not sure what other information to provide.

Update, I figured this one out on my own.

Putting:
Code:
stmfd  sp!, {lr}

at the beginning of subroutines, and:
Code:
ldmfd  sp!, {pc}

at the end keeps nested bl calls from clobbering lr. Now to figure out what the crap "stmfd" and "ldmfd" mean.
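For reference, stmfd and ldmfd are "store multiple" and "load multiple" on a full descending stack, so with sp! they act as push and pop. The toy Python model below is a sketch of why the fix works, not a real ARM emulator: its instruction names (`bl`, `bx_lr`, `push_lr`, `pop_pc`, `halt`) and the index-based program layout are invented for illustration. It shows a nested bl overwriting lr and producing exactly the bx lr loop described above.

```python
def run(program, max_steps=100):
    """Run a toy program; return True if it reaches halt within max_steps."""
    pc, lr, stack = 0, None, []
    for _ in range(max_steps):
        op, *args = program[pc]
        if op == "halt":
            return True
        if op == "bl":            # call: remember the return address in lr
            lr = pc + 1
            pc = args[0]
        elif op == "bx_lr":       # return to whatever lr holds right now
            pc = lr
        elif op == "push_lr":     # like stmfd sp!, {lr}
            stack.append(lr)
            pc += 1
        elif op == "pop_pc":      # like ldmfd sp!, {pc}
            pc = stack.pop()
        else:
            raise ValueError(f"unknown op {op!r}")
    return False                  # step limit hit: stuck in a loop

# main calls outer (index 2), which calls inner (index 4) without saving lr.
broken = [("bl", 2), ("halt",), ("bl", 4), ("bx_lr",), ("bx_lr",)]
# Same shape, but outer saves lr on entry and returns by popping it into pc.
fixed = [("bl", 2), ("halt",), ("push_lr",), ("bl", 5), ("pop_pc",), ("bx_lr",)]

print(run(broken))  # outer's bx lr jumps to itself forever
print(run(fixed))
```

In the broken program the inner bl sets lr to the address of outer's own bx lr, so outer keeps returning to itself. A leaf subroutine that never executes bl can still return with a plain bx lr.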

