Hopefully this thread will shed some light for anyone trying to debug "obscure GCC behaviour". I have tested and coded a ton of things on the Nintendo DS over the past 6 years or so, and the environment I use is entirely GNU GCC (I have written linkers, makefiles, filesystem drivers, video code, etc.). Some things are still missing, but the NintendoDS is well documented.
"The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain and the standard compiler for most Unix-like operating systems. The Free Software Foundation distributes GCC under the GNU General Public License."
I am speaking strictly of a closed-source, somewhat limited embedded environment, and of the GCC tools required to build and run ARM C/C++ code on such a platform.
Having said that, most open source NintendoDS homebrew relies on such tools.
On an embedded development platform like the NintendoDS, the GCC tools (compilers, linkers) are used a lot, the libraries (libc, libstdc++, libgcc, libsupc++) are used a lot, and some libraries are used less often (libiberty, whose code was absorbed by the POSIX standards and thus went into the libc libraries).
The GCC linker:
The GCC linker (GNU ld, which ships with binutils and is driven by the gcc frontend) works like any other linker: you compile object files with an ABI-compliant compiler, and the linker must support that exact ABI so it understands the object format. The linker then builds a binary by placing these object files according to your "physical addressed map file" (the linker script). From there you can adjust the output binary in pretty much every way possible by customizing which objects are added and where they go.
The GCC linker builds a binary that is segmented into "sections":
Common Sections:
data, rodata, text, bss (required by both C and C++):
.gnu.linkonce.armexidx.*
.gnu.linkonce.b.*
.gnu.linkonce.r.*
.gnu.linkonce.t.*
.gnu.linkonce.d.*
ARM Specific:
.ARM.exidx
There are also arguments you can feed to the linker (or, as the GCC devs put it, pass from GCC through to the linker):
-Wl,--gc-sections,-Map,$(MAPFILE) : discards any input sections the linker finds unreferenced, and writes a readable map file describing exactly the binary you built
-Wl,-z,defs : reports unresolved symbols at link time, which catches the problematic underlinking cases (where code built from external sources is left out by the linker, causing exceptions at runtime because there is no code to execute in that section!)
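To make the effect of --gc-sections a bit more concrete, here is a minimal sketch in C. It assumes the sources are also compiled with -ffunction-sections (not mentioned above, but the `.text.main` section name in the listings further down suggests it), so each function lands in its own input section. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With -ffunction-sections, GCC and Clang place each function in its own
   input section named .text.<function> instead of one shared .text.
   --gc-sections then works on those sections one by one. */

/* Lands in .text.checksum8; referenced by verify_block() below. */
uint8_t checksum8(const uint8_t *buf, uint32_t len)
{
    assert(buf != NULL || len == 0);
    uint8_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}

/* Lands in .text.never_called; if nothing in the final link references
   it, expect it under "Discarded input sections" in the map file. */
uint8_t never_called(uint8_t x)
{
    return (uint8_t)(x ^ 0x5Au);
}

/* Lands in .text.verify_block; kept only if something reachable from
   the entry point references it. */
int verify_block(const uint8_t *buf, uint32_t len, uint8_t expected)
{
    return checksum8(buf, len) == expected;
}
```

Comparing the map file with and without the flag is the quickest way to see which of these sections survived.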
And my findings:
The toolchain I have built, "ToolchainGenericDS" (TGDS), combines my own code with a small subset of the already available NintendoDS open source code, and generates binaries through ndstool. An NDS binary is an ARM7 binary plus an ARM9 binary; a header is then added that tells where the ARM7 binary is located and where it should be loaded, and the same for the ARM9, and everything is packaged together. I also build newlib for the NDS, and the output is a library in .a librarian format. The ToolchainGenericDS layer ends up in the same .a librarian format. The TGDS environment (linkers, makefiles, project template) then builds C/C++ code from sources (.S, .s, .c, .cpp) into objects. Finally the linker resolves the TGDS calls (the NDS specific API) and the standard POSIX + newlib calls against these libraries, so whichever of these are called end up built into the NDS executable.
Thing is, in the interest of saving binary space (the NDS has a scarce 4MB of RAM), there's a GCC linker flag, --gc-sections, that helps. In my experience, though, it will sometimes discard sections incorrectly, removing segments of code from the codepath. The "codepath" detection in the GCC linker is somewhat bugged. Let me explain:
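As a rough illustration of the header that ndstool writes, here is a sketch in C that reads the ARM9 and ARM7 entries out of the first bytes of a .nds file. The field offsets (0x20 for ARM9, 0x30 for ARM7, four little-endian words each) are taken from the publicly documented cartridge header layout and should be treated as an assumption to verify; the type and function names are hypothetical and not part of TGDS or ndstool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Where one CPU's binary sits in the .nds file and where it gets loaded. */
typedef struct {
    uint32_t rom_offset;  /* byte offset of the binary inside the file */
    uint32_t entry_addr;  /* address where execution starts */
    uint32_t ram_addr;    /* address the binary is copied to */
    uint32_t size;        /* size of the binary in bytes */
} NdsCpuEntry;

enum {
    NDS_HDR_ARM9 = 0x20,  /* assumed offset of the ARM9 entry */
    NDS_HDR_ARM7 = 0x30,  /* assumed offset of the ARM7 entry */
    NDS_HDR_MIN  = 0x40   /* bytes needed to read both entries */
};

/* Read a little-endian 32-bit word regardless of host endianness. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short for the entry. */
int nds_read_cpu_entry(const uint8_t *hdr, size_t hdr_len,
                       size_t field_off, NdsCpuEntry *out)
{
    assert(hdr != NULL && out != NULL);
    if (hdr_len < NDS_HDR_MIN || field_off + 16 > hdr_len)
        return -1;
    out->rom_offset = read_le32(hdr + field_off);
    out->entry_addr = read_le32(hdr + field_off + 4);
    out->ram_addr   = read_le32(hdr + field_off + 8);
    out->size       = read_le32(hdr + field_off + 12);
    return 0;
}
```

Calling it once with NDS_HDR_ARM9 and once with NDS_HDR_ARM7 gives the two load descriptions mentioned above.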
(https://elinux.org/images/2/2d/ELC2010- ... asenko.pdf refer to page 6 to see a "graphical" description of what I mean)
Once you tell the linker through the above flag to discard dead code, you will find a "Discarded input sections" section inside the readable map file. If you keep the binary small there's about an 80% chance the NDS binary will work correctly, assuming little was discarded because the codepath was simpler to deal with. But as you add more source code, the relocatable sections (which the codepath/Program Counter will traverse later on) must be inspected carefully, because critical sections will be stubbed out! Watch out for that.
The linker will happily generate binaries, but if you have no debugger at hand, your program may show undefined behaviour: it may throw segfaults randomly, or simply not run and get stuck in some weird place (such as the reset vectors).
a)
So, to save space (meaning --gc-sections is passed to the linker) and at the same time prevent the linker from tossing invaluable section data, you must manually wrap the desired input sections in KEEP() in your linker script.
b)
There's a "magic section" approach in the PDF I linked earlier that helps to get a hold of what's going on. It addresses the issue somewhat, but not entirely.
The trick to help the linker decide "better" which code stays and which is removed is to turn "non-relocatable code" into "relocatable code" before it goes into the librarian format. The linker will then treat the object as relocatable and can work around it if some code was subsequently optimized, shortened or altered.
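For approach a), the C side can be sketched as below. The section name `.tgds_keep` and the handler are hypothetical; the linker script would need a matching `KEEP(*(.tgds_keep))` line inside one of its output sections. Note that `used` only stops the compiler from dropping the function, while KEEP() is what stops --gc-sections from dropping the section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical section name. The section attribute is only applied on
   ELF targets, which is what the ARM toolchains discussed here emit. */
#if defined(__ELF__)
#define TGDS_KEEP __attribute__((used, section(".tgds_keep")))
#else
#define TGDS_KEEP __attribute__((used))
#endif

static volatile uint32_t lid_events;

/* A handler that is normally reached only through an interrupt or a
   function pointer, so no direct call to it appears in the C sources.
   This is the kind of code that is easy to lose to --gc-sections. */
TGDS_KEEP void lid_closed_handler(void)
{
    assert(lid_events < UINT32_MAX);
    lid_events++;
}

/* Ordinary function, left in its default section. */
uint32_t lid_event_count(void)
{
    return lid_events;
}
```

After linking, the handler should show up in the map file under the output section that contains the KEEP() line rather than under "Discarded input sections".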
So:
I had this kind of "bug" for ages, and sometimes it was resolved by hand-picking the sections the linker must NOT discard (approach a)). But then I figured out why, depending on code density, some stuff could suddenly end up broken/missing, and why it would magically work again if the size changed. This is a bug in the way GCC handles files. The linker should really print a warning like:
"the librarian objects are NON relocatable, you will face weird bugs if you decide to link against them; if you also want optimized code, your code will break".
So always, always, if you decide to link against .a libraries, make sure the code in there is relocatable/Position Independent (-fpic flag). That way, if the linker has no idea what to do with an object that may have been hampered/optimized/etc., at least it can relocate that code, which has been built for it.
GCC related things
Re: GCC related things
I can confirm, after rebuilding ToolchainGenericDS (https://coto88.bitbucket.io/) with the exact same setup (TGDS is built so that I can swap between GCC toolchain versions with ease), a neat result someone else already mentioned:
TGDS: (source code: https://bitbucket.org/Coto88/newlib-nds ... rc/master/)
- ARM none EABI: GCC 6.2.x , C++ version: 7.2.1
vs
TGDS: (source code: https://bitbucket.org/Coto88/newlib-nds/src/master/)
- ARM EABI: GCC 4.9.2 , C++ version: 4.9.2
ARM none EABI is about 30% slower in everything (SnemulDS included) as opposed to ARM EABI.
There's absolutely no reason to move from ARM EABI to ARM none EABI. Perhaps newer processors may benefit, but older processors such as ARM7/ARM9 absolutely do not.
I can post SnemulDS binaries being compiled for each GCC platform if someone wants to.
Re: GCC related things
ToolchainGenericDS now uses Clang (v8.0.1). The disassembled code is more or less what you'd see from a paid development environment, such as ARM RVCT.
Basically I have newlib-nds (newlib recompiled for NintendoDS) and ToolchainGenericDS exported in the ARM ABI librarian format (.a), containing armv4t and armv5te code emitted as relocatable (it may be optimized, but each instruction that needs an address is assumed to carry a blank relative offset, a placeholder defined by the ARM ABI), and then the TGDS project itself, which links against them and builds the final binary. With GCC I used to have a lot of issues when linking relocatable code, because the emitted code would sometimes not account for what I described earlier.
This means C++ libraries linked externally with GCC 4.9.3 sometimes showed erratic behaviour (GCC had better software quality control back in 2008). Some screenshots:
Clang emitted code:
Decompiled 1:

Disassembly of section .init:
00000000 <_start>:
@Thus, BIOS exceptions must be handled from such DS BIOS RAM stacks.
@Also, MPU exceptions are triggered in USR mode (0x10&psr), SYS mode exceptions are ignored. (0x1f&psr)
@disable exceptions when MPU setup. Enter SYS mode.
ldr r0,=0x04000204
0: e59f0080 ldr r0, [pc, #128] ; 88 <copy+0x14>
mov r1,#0
4: e3a01000 mov r1, #0
str r1,[r0]
8: e5801000 str r1, [r0]
ldr r3, =MPUSet
blx r3
pop {r0-r3,lr}
#endif
mov r0, #0x12 @ irq
c: e3a00012 mov r0, #18
msr cpsr, r0
10: e129f000 msr CPSR_fc, r0
ldr sp, =sp_IRQ
14: e59fd070 ldr sp, [pc, #112] ; 8c <copy+0x18>
mov r0, #0x13 @ svc dtcm stacks, irq enable
18: e3a00013 mov r0, #19
msr cpsr, r0
1c: e129f000 msr CPSR_fc, r0
ldr sp, =sp_SVC
20: e59fd068 ldr sp, [pc, #104] ; 90 <copy+0x1c>
@on undefined instruction exceptions,
@on data/prefetch aborts (caused by the protection unit),
@on FIQ (possibly caused by hardware debuggers).
@It is also called by accidental software-jumps to the reset vector, and by unused SWI numbers within range 0..1Fh.
mov r3 , #0x1b @ undef dtcm stacks, irq enable
24: e3a0301b mov r3, #27
msr CPSR, r3
28: e129f003 msr CPSR_fc, r3
ldr sp, =sp_UND
2c: e59fd060 ldr sp, [pc, #96] ; 94 <copy+0x20>
mov r3 , #0x17 @ dataabt dtcm stacks, irq enable
30: e3a03017 mov r3, #23
msr CPSR,r3
34: e129f003 msr CPSR_fc, r3
ldr sp, =sp_ABT
38: e59fd058 ldr sp, [pc, #88] ; 98 <copy+0x24>
mov r3 , #0x11 @FIRQ mode, irq enable
3c: e3a03011 mov r3, #17
msr CPSR, r3
40: e129f003 msr CPSR_fc, r3
ldr sp, =sp_FIQ
44: e59fd050 ldr sp, [pc, #80] ; 9c <copy+0x28>
mov r0, #0x1F @ usr/sys dtcm stacks, irq enable (ignore all exceptions usermode must be excluded here)
48: e3a0001f mov r0, #31
msr cpsr, r0
4c: e129f000 msr CPSR_fc, r0
ldr sp, =sp_SYS
50: e59fd048 ldr sp, [pc, #72] ; a0 <copy+0x2c>
cmp r0, #0x63 @iQueDS Lite
beq FirmwareARM7OK
b waitForFirmwareARM7Setup
FirmwareARM7OK:
#endif
bl initHardware
54: ebfffffe bl 0 <initHardware>
ldr lr, =exception_sysexit
58: e59fe044 ldr lr, [pc, #68] ; a4 <copy+0x30>
#ifdef ARM7
b main @entrypoint
5c: eafffffe b 0 <main>
00000060 <clear>:
#ifdef ARM9
b mainARGV @entrypoint
#endif
@format: r0 = src_vma, r1 = dest_vma
clear:
mov r2, #0
60: e3a02000 mov r2, #0
00000064 <clearlop>:
clearlop: cmp r0, r1
64: e1500001 cmp r0, r1
strcc r2, [r0],#4
68: 34802004 strcc r2, [r0], #4
bcc clearlop
6c: 3afffffc bcc 64 <clearlop>
bx lr
70: e12fff1e bx lr
00000074 <copy>:
@format: r0 = src_vma_section, r1 = dest_lma_start, r2 = dest_lma_end (where both lma are a whole physical memory region from start to end range address)
copy:
cmp r2,r3 /* check if we've reached the end */
74: e1520003 cmp r2, r3
ldrlo r0,[r1],#4 /* if end not reached, get word and advance source pointer */
78: 34910004 ldrcc r0, [r1], #4
strlo r0,[r2],#4 /* if end not reached, store word and advance destination pointer */
7c: 34820004 strcc r0, [r2], #4
blo copy /* if end not reached, branch back to loop */
80: 3afffffb bcc 74 <copy>
bx lr /* return to caller */
84: e12fff1e bx lr
88: 04000204 .word 0x04000204
...
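The `bl 0 <initHardware>` and `b 0 <main>` lines in the listing above (encoded as ebfffffe and eafffffe) are branches that have not been relocated yet: the low 24 bits hold a placeholder of -2 words, i.e. an addend of -8 bytes. A minimal sketch of how a linker could patch such a branch, in the style of the ARM ELF branch relocations (symbol + addend - place), is shown here. It covers ARM-state targets only, does no range checking, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed byte offset stored in the low 24 bits of an ARM B/BL opcode. */
int32_t arm_branch_addend(uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    /* Sign-extend the 24-bit word offset, then convert words to bytes. */
    int32_t words = (int32_t)(imm24 ^ 0x00800000u) - 0x00800000;
    return words * 4;
}

/* Address that a resolved branch located at `place` jumps to.
   The CPU reads PC as place + 8 while executing the branch. */
uint32_t arm_branch_target(uint32_t insn, uint32_t place)
{
    return place + 8u + (uint32_t)arm_branch_addend(insn);
}

/* Patch an unrelocated B/BL so that it reaches sym_addr from place.
   The addend already stored in the opcode (-8 for the ebfffffe
   placeholder) cancels out the PC + 8 pipeline offset. */
uint32_t arm_relocate_branch(uint32_t insn, uint32_t sym_addr, uint32_t place)
{
    assert((sym_addr & 3u) == 0 && (place & 3u) == 0);
    uint32_t offset = sym_addr + (uint32_t)arm_branch_addend(insn) - place;
    return (insn & 0xFF000000u) | ((offset >> 2) & 0x00FFFFFFu);
}
```

With a made-up symbol address of 0x02000100 and the `bl` placed at 0x02000054, `arm_relocate_branch(0xebfffffe, 0x02000100, 0x02000054)` yields 0xeb000029. The already resolved `bcc` at offset 6c (3afffffc) decodes back to 0x64 through `arm_branch_target`, matching the `clearlop` label in the listing.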

Decompiled 2:
main.o: file format elf32-littlearm
Disassembly of section .text.main:
00000000 <main>:
#include "biosTGDS.h"
#include "CPUARMTGDS.h"
//---------------------------------------------------------------------------------
int main(int _argc, sint8 **_argv) {
0: e92d4800 push {fp, lr}
4: e1a0b00d mov fp, sp
8: e24dd050 sub sp, sp, #80 ; 0x50
c: e3a00d09 mov r0, #576 ; 0x240
10: e3800301 orr r0, r0, #67108864 ; 0x4000000
//---------------------------------------------------------------------------------
/* TGDS 1.6 Standard ARM7 Init code start */
//wait for VRAM D to be assigned from ARM9->ARM7 (ARM7 has load/store on byte/half/words on VRAM)
while (!(*((vuint8*)0x04000240) & 0x2));
14: e5d01000 ldrb r1, [r0]
18: e3110002 tst r1, #2
1c: 0afffffc beq 14 <main+0x14>
installWifiFIFO();
20: ebfffffe bl 0 <installWifiFIFO>
24: e1a0400d mov r4, sp
int argBuffer[MAXPRINT7ARGVCOUNT];
memset((unsigned char *)&argBuffer[0], 0, sizeof(argBuffer));
28: e3a0104c mov r1, #76 ; 0x4c
2c: e2840004 add r0, r4, #4
30: ebfffffe bl 0 <__aeabi_memclr4>
34: e59f0070 ldr r0, [pc, #112] ; ac <main+0xac>
argBuffer[0] = 0xc070ffff;
writeDebugBuffer7("TGDS ARM7.bin Boot OK!", 1, (int*)&argBuffer[0]);
38: e3a01001 mov r1, #1
3c: e1a02004 mov r2, r4
40: e3a05001 mov r5, #1
installWifiFIFO();
int argBuffer[MAXPRINT7ARGVCOUNT];
memset((unsigned char *)&argBuffer[0], 0, sizeof(argBuffer));
argBuffer[0] = 0xc070ffff;
44: e58d0000 str r0, [sp]
writeDebugBuffer7("TGDS ARM7.bin Boot OK!", 1, (int*)&argBuffer[0]);
48: e59f0060 ldr r0, [pc, #96] ; b0 <main+0xb0>
4c: ebfffffe bl 0 <writeDebugBuffer7>
50: e59f705c ldr r7, [pc, #92] ; b4 <main+0xb4>
54: e59f405c ldr r4, [pc, #92] ; b8 <main+0xb8>
58: e3a06f61 mov r6, #388 ; 0x184
5c: e3866301 orr r6, r6, #67108864 ; 0x4000000
#ifdef ARM7
extern bool isArm7ClosedLid;
static inline void handleARM7SVC(){
//Lid Closing + backlight events (ARM7)
if(isArm7ClosedLid == false){
60: e5d70000 ldrb r0, [r7]
64: e3500000 cmp r0, #0
68: 1a000007 bne 8c <main+0x8c>
if((REG_KEYXY & KEY_HINGE) == KEY_HINGE){
6c: e15604be ldrh r0, [r6, #-78] ; 0xffffffb2
70: e3100080 tst r0, #128 ; 0x80
74: 0a000004 beq 8c <main+0x8c>
SendFIFOWords(FIFO_IRQ_LIDHASCLOSED_SIGNAL, 0);
78: e1a00004 mov r0, r4
7c: e3a01000 mov r1, #0
80: ebfffffe bl 0 <SendFIFOWords>
screenLidHasClosedhandlerUser();
84: ebfffffe bl 0 <screenLidHasClosedhandlerUser>
isArm7ClosedLid = true;
88: e5c75000 strb r5, [r7]
}
}
//Handles Sender FIFO overflows
if(REG_IPC_FIFO_CR & IPC_FIFO_ERROR){
8c: e1d600b0 ldrh r0, [r6]
90: e3100901 tst r0, #16384 ; 0x4000
REG_IPC_FIFO_CR = (REG_IPC_FIFO_CR | IPC_FIFO_SEND_CLEAR); //bit14 FIFO ERROR ACK + Flush Send FIFO
94: 11d600b0 ldrhne r0, [r6]
98: 13800008 orrne r0, r0, #8
9c: 11c600b0 strhne r0, [r6]
/* TGDS 1.6 Standard ARM7 Init code end */
while (1) {
handleARM7SVC(); /* Do not remove, handles TGDS services */
IRQWait(IRQ_HBLANK);
a0: e3a00002 mov r0, #2
a4: ebfffffe bl 0 <IRQWait>
a8: eaffffec b 60 <main+0x60>
ac: c070ffff .word 0xc070ffff
...
b8: ffff020f .word 0xffff020f
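One detail worth noting in the listing above: the address 0x04000240 is built with a `mov r0, #576` followed by `orr r0, r0, #67108864` instead of a literal pool load. That is because an ARM data-processing immediate is an 8-bit value rotated right by an even amount, and 0x04000240 does not fit that form. The sketch below, with hypothetical helper names, checks whether a constant is encodable and decodes the 12-bit operand field found in the opcodes.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits; n is reduced modulo 32 so n == 0 is safe. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return (v << n) | (v >> ((32u - n) & 31u));
}

/* Returns 1 if v can be written as an 8-bit value rotated right by an
   even amount, i.e. if it fits a single mov/orr/add immediate. */
int arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(v, rot) <= 0xFFu)
            return 1;
    return 0;
}

/* Decode the low 12 bits of a data-processing opcode:
   bits 11..8 = rotate amount / 2, bits 7..0 = 8-bit value. */
uint32_t arm_imm_decode(uint32_t imm12)
{
    assert(imm12 <= 0xFFFu);
    uint32_t imm8 = imm12 & 0xFFu;
    unsigned rot = ((imm12 >> 8) & 0xFu) * 2u;
    return rotl32(imm8, 32u - rot);  /* rotate right by rot */
}
```

Decoding the operand fields of e3a00d09 (0xd09) and e3800301 (0x301) gives 576 and 0x04000000, and OR-ing them gives the VRAM register address used by the `while` loop in the source.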
(decompiled 2) source code:
/*
Copyright (C) 2017 Coto
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
USA
*/
#include <string.h>
#include "main.h"
#include "InterruptsARMCores_h.h"
#include "interrupts.h"
#include "ipcfifoTGDSUser.h"
#include "wifi_arm7.h"
#include "usrsettingsTGDS.h"
#include "timerTGDS.h"
#include "biosTGDS.h"
#include "CPUARMTGDS.h"
//---------------------------------------------------------------------------------
int main(int _argc, sint8 **_argv) {
//---------------------------------------------------------------------------------
/* TGDS 1.6 Standard ARM7 Init code start */
//wait for VRAM D to be assigned from ARM9->ARM7 (ARM7 has load/store on byte/half/words on VRAM)
while (!(*((vuint8*)0x04000240) & 0x2));
installWifiFIFO();
int argBuffer[MAXPRINT7ARGVCOUNT];
memset((unsigned char *)&argBuffer[0], 0, sizeof(argBuffer));
argBuffer[0] = 0xc070ffff;
writeDebugBuffer7("TGDS ARM7.bin Boot OK!", 1, (int*)&argBuffer[0]);
/* TGDS 1.6 Standard ARM7 Init code end */
while (1) {
handleARM7SVC(); /* Do not remove, handles TGDS services */
IRQWait(IRQ_HBLANK);
}
return 0;
}
With that said, if you really want to do ARM embedded development and program serious code such as a 3D game or a video player, either stick to a toolchain that comes with enough unit tests to ensure your code will work, or switch to Clang.
It took me a good 8 years of work, building a dev environment for NintendoDS and a set of tools in C/C++, to realize that. GCC may be used as an alternative, but I'd not recommend it, for sanity and safety reasons.
Needless to say, everything I build with Clang for NDS works exactly the way I want it to.
https://bitbucket.org/Coto88/
Edit:
Unit Tests : https://bitbucket.org/Coto88/toolchaing ... s-unittest