GCC related things

Discussion of development of software for any "obsolete" computer or video game system.
Posts: 55
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Post by coto » Mon Apr 01, 2019 8:57 am

Hopefully this thread will shed some light for those trying to debug "obscure GCC behaviour". I have tested and coded a ton of things on the Nintendo DS over the past 6 years or so, and the environment I use is entirely GNU GCC (I have written linker scripts, makefiles, filesystem drivers, video code, etc.). Some things are still missing, but the Nintendo DS is well documented.

"The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain and the standard compiler for most Unix-like operating systems. The Free Software Foundation distributes GCC under the GNU General Public License."

I am speaking strictly of a closed-source, somewhat limited embedded environment, and of the GCC tools required to build ARM assembly, C and C++ code that runs on such a platform.

Having said that, most open source Nintendo DS homebrew relies on such tools.

On an embedded development platform like the Nintendo DS, the GCC tools (compilers, linkers) and the core libraries (libc, libstdc++, libgcc, libsupc++) are used heavily, while some other libraries are used less often (libiberty, much of whose code was absorbed into POSIX standards and thus ended up in libc implementations).

The GCC linker:
The GCC linker works like any other linker: you compile object files with an ABI-compliant compiler, and the linker must support that exact ABI so that it understands the object format. The linker then builds a binary by placing these object files together according to your "physical addressed map file" (the linker script). From there you can adjust the output binary in pretty much every way possible, for example by customizing which objects are included.

The GCC linker builds a binary that is segmented into "sections":

Common Sections:
.data, .rodata, .text, .bss (required by both C and C++)
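As a rough illustration of how a compiler typically distributes C objects over those common sections, here is a minimal sketch. It assumes a default GCC or Clang ELF build with no custom section attributes, and every name in it is made up for the example.

```c
#include <stdint.h>

/* Initialized, writable data: typically placed in .data
   (the startup code copies its load-time image into RAM). */
uint32_t boot_count = 3;

/* Zero-initialized statics: typically placed in .bss
   (no image in the binary; the startup code clears the region). */
uint32_t frame_buffer[16];

/* Read-only constants: typically placed in .rodata. */
const char banner[] = "TGDS";

/* Machine code: placed in .text (or .text.<name> when the compiler
   is asked to give each function its own section). */
uint32_t bump_boot_count(void)
{
    boot_count += 1;
    frame_buffer[0] = boot_count;
    return boot_count;
}
```

Dumping the section headers of the resulting object (for instance with objdump -h) should show where each symbol actually landed for a given compiler and set of flags.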

ARM Specific:

There are also arguments you can feed into the linker (or, as the GCC devs put it, pass from GCC through to the linker):

-Wl,--gc-sections,-Map,$(MAPFILE) : discards any sections the linker finds unreferenced and outputs a readable map file describing exactly the binary you built

-Wl,-z,defs : catches the problematic cases involving underlinking by reporting unresolved symbols at link time (underlinking has been a problem where code compiled from external sources is left out by the linker, causing exceptions at run time because there is no code to execute in that section!)

And my findings:
The toolchain I have built, "ToolchainGenericDS" (TGDS), combines my own code with a small subset of the already available Nintendo DS open source code, and generates binaries through ndstool. An NDS binary is an ARM7 binary plus an ARM9 binary, packaged together with a header that tells where each binary is located and where it should be loaded. I also build newlib for NDS, and the output is a library in .a (librarian) format. The ToolchainGenericDS layer ends up in the same .a format. The TGDS environment (linker scripts, makefiles, project template) then builds source code (.S, .s, .c, .cpp) into objects. Finally, the linker resolves the TGDS calls (NDS-specific API) against these libraries, followed by the standard POSIX and newlib calls, so whatever is called ends up built into the NDS executable.
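To picture the "where it is located and where it should go" part of the header, here is a minimal sketch that reads one CPU's load information from a header buffer. The field offsets (ARM9 block at 0x20, ARM7 block at 0x30, four little-endian words each) follow the publicly documented cartridge header layout as far as I know; treat them, and all the names below, as assumptions to verify rather than as ndstool's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Load information for one CPU, as stored in the header. */
typedef struct {
    uint32_t rom_offset;  /* where the binary sits inside the .nds file */
    uint32_t entry_addr;  /* address at which execution starts */
    uint32_t ram_addr;    /* where the binary is copied to in RAM */
    uint32_t size;        /* size of the binary in bytes */
} nds_cpu_block;

/* Assumed header offsets of the two blocks. */
enum { NDS_ARM9_BLOCK = 0x20, NDS_ARM7_BLOCK = 0x30 };

/* Read a little-endian 32-bit word regardless of host endianness. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short for the block. */
int nds_read_cpu_block(const uint8_t *hdr, size_t len,
                       size_t block_off, nds_cpu_block *out)
{
    if (block_off > len || len - block_off < 16)
        return -1;
    out->rom_offset = read_le32(hdr + block_off + 0);
    out->entry_addr = read_le32(hdr + block_off + 4);
    out->ram_addr   = read_le32(hdr + block_off + 8);
    out->size       = read_le32(hdr + block_off + 12);
    return 0;
}
```

A loader would call this once with NDS_ARM9_BLOCK and once with NDS_ARM7_BLOCK, then copy size bytes from rom_offset to ram_addr for each CPU.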

Thing is, in the interest of saving binary space, because the NDS has a scarce 4MB of RAM, there's a GCC linker flag, --gc-sections, that helps. But it will sometimes discard sections incorrectly, thus removing segments of code from the codepath. The "codepath" detection in the GCC linker is somewhat bugged. Let me explain:

(https://elinux.org/images/2/2d/ELC2010- ... asenko.pdf refer to page 6 to see a "graphical" description of what I mean)

Once you have told the linker, through the above flag, to optimize and discard dead code, you will find a "Discarded input sections" section inside the readable map file. If you keep the binary small, there's about an 80% chance the NDS binary will work correctly, presumably because little was discarded and the codepath was simpler to deal with. But if you add a lot more source code, the relocatable sections (which the codepath/Program Counter will traverse later on) must be inspected carefully, because critical sections will be stubbed out! Watch out for that.
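Here is a small sketch of what ends up under "Discarded input sections". It assumes the objects are compiled with one section per function (the `.text.main` section name in the listings further down suggests that is the case); the file name and function names are hypothetical.

```c
/* Build sketch (file name is hypothetical):
 *   arm-none-eabi-gcc -ffunction-sections -fdata-sections -c demo.c
 *   ... link with -Wl,--gc-sections,--print-gc-sections,-Map,demo.map
 */

/* Called through an ordinary call from reachable code: the linker sees
   a relocation pointing at .text.checksum and keeps the section. */
unsigned checksum(const unsigned char *buf, unsigned len)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Not referenced by any relocation: it lands in .text.unused_helper
   and is a candidate for the discarded list. Code that is only reached
   through an address the linker cannot see (a hard-coded vector address,
   a value computed in the linker script) looks exactly the same to it. */
unsigned unused_helper(unsigned x)
{
    return x * 2u;
}
```

Comparing the discarded list in the map file against the functions you know are reached indirectly is a quick way to spot sections that need protecting.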

The linker will happily generate binaries, but if you have no debugger at hand, your program may exhibit undefined behaviour: throwing segfaults randomly, or simply not running and getting stuck in some weird place (such as the reset vectors).

So, to save space (meaning --gc-sections is passed to the linker) while preventing the linker from tossing invaluable section data, you must manually wrap the desired input sections in KEEP() in your linker script.
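A minimal sketch of the C side of that arrangement, with the matching linker script fragment shown as a comment. The section name `.keep_vectors`, the macro and the function are hypothetical; only the `section` and `used` attributes and the KEEP() linker script command are real features of GCC/Clang and GNU ld.

```c
#include <stdint.h>

/* Hypothetical section name, used only for this sketch. */
#if defined(__ELF__)
#define KEEP_SECTION __attribute__((used, section(".keep_vectors")))
#else
#define KEEP_SECTION __attribute__((used))
#endif

/* Matching linker script fragment (GNU ld syntax), to be placed inside
 * an output section of the real script:
 *
 *     .text : {
 *         KEEP(*(.keep_vectors))
 *         *(.text .text.*)
 *     }
 *
 * 'used' stops the compiler from dropping the function;
 * KEEP() stops --gc-sections from dropping its input section.
 */
KEEP_SECTION uint32_t exception_stub(uint32_t cause)
{
    return cause | 0x80000000u;
}
```

Giving everything that must survive its own named section keeps the KEEP() list in the linker script short and easy to audit against the map file.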

There's a magic section approach in the PDF I listed earlier to get a hold of what's going on. It addresses issue b) somewhat, but not entirely.
The trick to help the linker decide "better" which code stays and which is removed is to turn "non-relocatable code" into "relocatable code" before it goes into the librarian format. The linker will then treat the object as relocatable and can work around it if there IS code that was subsequently optimized, shortened or altered.

I had this kind of "bug" for ages, and sometimes it was resolved by hand-picking the sections the linker must NOT discard (approach a)). But then I figured out why, depending on code density, some stuff could suddenly end up broken or missing, and why it would magically work again if the size changed. This is a bug in the way GCC handles files. The linker should really emit a warning like:

"The librarian objects are NOT relocatable. You will face weird bugs if you link against them, and if you want optimized code, your code will break."

So always, always, if you decide to link against .a libraries, make sure the code in there is relocatable/position independent (-fpic flag). That way, if the linker has no idea what to do with an object that may have been altered or optimized, it can at least relocate that code accordingly.


Re: GCC related things

Post by coto » Fri Apr 19, 2019 10:38 pm

I can confirm the following after rebuilding ToolchainGenericDS (https://coto88.bitbucket.io/) with the exact same setup (TGDS is built so that I can swap between GCC toolchain versions with ease). Here's a neat result someone else already mentioned:

TGDS: (source code: https://bitbucket.org/Coto88/newlib-nds ... rc/master/)
- ARM none EABI: GCC 6.2.x , C++ version: 7.2.1


TGDS: (source code: https://bitbucket.org/Coto88/newlib-nds/src/master/)
- ARM EABI: GCC 4.9.2 , C++ version: 4.9.2

ARM none EABI is about 30% slower in everything (SnemulDS included) compared to ARM EABI.

There's absolutely no reason to move from ARM EABI to ARM none EABI. Newer processors may benefit, but older processors such as ARM7/ARM9 absolutely do not.

I can post SnemulDS binaries compiled with each GCC toolchain if someone wants them.


Re: GCC related things

Post by coto » Wed Jan 20, 2021 5:04 pm

ToolchainGenericDS now uses Clang (v8.0.1). The disassembled code is more or less what you'd see from a paid development environment such as ARM RVCT.

Basically I have newlib-nds (newlib recompiled for the Nintendo DS) and ToolchainGenericDS exported in the ARM ABI librarian format (.a). These contain armv4t and armv5te code emitted as relocatable (it may be optimized, but each instruction that refers to another symbol is assumed to carry a blank offset, a placeholder defined by the ARM ABI). The TGDS project itself then links against them and builds the final binary. With GCC I used to have a lot of issues when linking relocatable code, because the emitted code would sometimes not account for what I described earlier.
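The "blank offset" is visible in the listings further down in this post: branches inside one object, such as `3afffffc` at address 0x6c, decode to a real target, while every call to an external symbol is encoded as `ebfffffe`. The sketch below decodes the target of an ARM-state B/BL instruction (a signed 24-bit word offset relative to the instruction address plus 8); the function name is made up for the example.

```c
#include <stdint.h>

/* Target address of an ARM-state B/BL instruction located at 'addr'.
   Bits 23..0 hold a signed word offset relative to addr + 8. */
uint32_t arm_branch_target(uint32_t addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    if (imm24 & 0x00800000u)
        imm24 |= 0xFF000000u;        /* sign-extend to 32 bits */
    /* Unsigned arithmetic wraps, which handles negative offsets. */
    return addr + 8u + (imm24 << 2);
}
```

Feeding it `3afffffc` at 0x6c gives 0x64 (the `clearlop` label), whereas `ebfffffe` at 0x54 gives 0x54 again: a branch to itself, which is the placeholder that the linker later patches through the relocation entry for `initHardware`.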

This means C++ libraries linked externally with GCC 4.9.3 sometimes showed erratic behaviour (GCC had better software quality control back in 2008). Some screenshots:

Clang emitted code:

Decompiled 1:


Disassembly of section .init:
 00000000 <_start>:
    @Thus, BIOS exceptions must be handled from such DS BIOS RAM stacks.
    @Also, MPU exceptions are triggered in USR mode (0x10&psr), SYS mode exceptions are ignored. (0x1f&psr)
    @disable exceptions when MPU setup. Enter SYS mode.
    ldr r0,=0x04000204
   0:   e59f0080    ldr r0, [pc, #128]  ; 88 <copy+0x14>
    mov r1,#0
   4:   e3a01000    mov r1, #0
    str r1,[r0]
   8:   e5801000    str r1, [r0]
    ldr r3, =MPUSet
    blx r3
    pop {r0-r3,lr}
    mov r0, #0x12       @ irq
   c:   e3a00012    mov r0, #18
    msr cpsr, r0
  10:   e129f000    msr CPSR_fc, r0
    ldr sp, =sp_IRQ
  14:   e59fd070    ldr sp, [pc, #112]  ; 8c <copy+0x18>
    mov r0, #0x13       @ svc dtcm stacks, irq enable
  18:   e3a00013    mov r0, #19
    msr cpsr, r0
  1c:   e129f000    msr CPSR_fc, r0
    ldr sp, =sp_SVC     
  20:   e59fd068    ldr sp, [pc, #104]  ; 90 <copy+0x1c>
    @on undefined instruction exceptions, 
    @on data/prefetch aborts (caused by the protection unit), 
    @on FIQ (possibly caused by hardware debuggers). 
    @It is also called by accidental software-jumps to the reset vector, and by unused SWI numbers within range 0..1Fh.
    mov     r3 , #0x1b  @ undef dtcm stacks, irq enable
  24:   e3a0301b    mov r3, #27
    msr     CPSR, r3
  28:   e129f003    msr CPSR_fc, r3
    ldr     sp, =sp_UND
  2c:   e59fd060    ldr sp, [pc, #96]   ; 94 <copy+0x20>
    mov     r3 , #0x17  @ dataabt dtcm stacks, irq enable
  30:   e3a03017    mov r3, #23
    msr     CPSR,r3 
  34:   e129f003    msr CPSR_fc, r3
    ldr     sp, =sp_ABT
  38:   e59fd058    ldr sp, [pc, #88]   ; 98 <copy+0x24>
    mov     r3 , #0x11  @FIRQ mode, irq enable
  3c:   e3a03011    mov r3, #17
    msr     CPSR, r3
  40:   e129f003    msr CPSR_fc, r3
    ldr     sp, =sp_FIQ
  44:   e59fd050    ldr sp, [pc, #80]   ; 9c <copy+0x28>
    mov r0, #0x1F       @ usr/sys dtcm stacks, irq enable (ignore all exceptions usermode must be excluded here)
  48:   e3a0001f    mov r0, #31
    msr cpsr, r0
  4c:   e129f000    msr CPSR_fc, r0
    ldr sp, =sp_SYS
  50:   e59fd048    ldr sp, [pc, #72]   ; a0 <copy+0x2c>
        cmp r0, #0x63       @iQueDS Lite
        beq FirmwareARM7OK
        b waitForFirmwareARM7Setup
    bl initHardware
  54:   ebfffffe    bl  0 <initHardware>
    ldr lr, =exception_sysexit
  58:   e59fe044    ldr lr, [pc, #68]   ; a4 <copy+0x30>
    #ifdef ARM7
    b   main            @entrypoint
  5c:   eafffffe    b   0 <main>
00000060 <clear>:
    #ifdef ARM9
    b   mainARGV            @entrypoint
@format:    r0 = src_vma, r1 = dest_vma 
    mov r2, #0
  60:   e3a02000    mov r2, #0
00000064 <clearlop>:
clearlop:   cmp r0, r1
  64:   e1500001    cmp r0, r1
    strcc   r2, [r0],#4
  68:   34802004    strcc   r2, [r0], #4
    bcc clearlop
  6c:   3afffffc    bcc 64 <clearlop>
    bx lr
  70:   e12fff1e    bx  lr
00000074 <copy>:
@format:    r0 = src_vma_section, r1 = dest_lma_start, r2 = dest_lma_end    (where both lma are a whole physical memory region from start to end range address)
    cmp                 r2,r3                       /* check if we've reached the end */
  74:   e1520003    cmp r2, r3
    ldrlo               r0,[r1],#4                  /* if end not reached, get word and advance source pointer */
  78:   34910004    ldrcc   r0, [r1], #4
    strlo               r0,[r2],#4                  /* if end not reached, store word and advance destination pointer */
  7c:   34820004    strcc   r0, [r2], #4
    blo                 copy                        /* if end not reached, branch back to loop */
  80:   3afffffb    bcc 74 <copy>
    bx                  lr                          /* return to caller */
  84:   e12fff1e    bx  lr
  88:   04000204    .word   0x04000204
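For readers who do not read ARM assembly, the `clear` and `copy` loops at the end of this listing amount to the following C, going by the instructions themselves (registers r0/r1 for `clear`, r1/r2/r3 for `copy`). This is a sketch with made-up function names, not TGDS code; the real startup code stays in assembly because it runs before the C environment is set up.

```c
#include <stdint.h>

/* Equivalent of 'clear': zero every word in [start, end). */
void clear_words(uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}

/* Equivalent of 'copy': copy words from src into [dst, dst_end). */
void copy_words(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end)
        *dst++ = *src++;
}
```

Startup code typically uses the first loop to zero .bss and the second to copy the .data image from its load address to its run address, with the bounds supplied by symbols defined in the linker script.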
Decompiled 2:


main.o:     file format elf32-littlearm
Disassembly of section .text.main:

00000000 <main>:
#include "biosTGDS.h"
#include "CPUARMTGDS.h"

int main(int _argc, sint8 **_argv) {
   0:	e92d4800 	push	{fp, lr}
   4:	e1a0b00d 	mov	fp, sp
   8:	e24dd050 	sub	sp, sp, #80	; 0x50
   c:	e3a00d09 	mov	r0, #576	; 0x240
  10:	e3800301 	orr	r0, r0, #67108864	; 0x4000000
	/*			TGDS 1.6 Standard ARM7 Init code start	*/
	//wait for VRAM D to be assigned from ARM9->ARM7 (ARM7 has load/store on byte/half/words on VRAM)
	while (!(*((vuint8*)0x04000240) & 0x2));
  14:	e5d01000 	ldrb	r1, [r0]
  18:	e3110002 	tst	r1, #2
  1c:	0afffffc 	beq	14 <main+0x14>
  20:	ebfffffe 	bl	0 <installWifiFIFO>
  24:	e1a0400d 	mov	r4, sp
	memset((unsigned char *)&argBuffer[0], 0, sizeof(argBuffer));
  28:	e3a0104c 	mov	r1, #76	; 0x4c
  2c:	e2840004 	add	r0, r4, #4
  30:	ebfffffe 	bl	0 <__aeabi_memclr4>
  34:	e59f0070 	ldr	r0, [pc, #112]	; ac <main+0xac>
	argBuffer[0] = 0xc070ffff;
	writeDebugBuffer7("TGDS ARM7.bin Boot OK!", 1, (int*)&argBuffer[0]);
  38:	e3a01001 	mov	r1, #1
  3c:	e1a02004 	mov	r2, r4
  40:	e3a05001 	mov	r5, #1
	memset((unsigned char *)&argBuffer[0], 0, sizeof(argBuffer));
	argBuffer[0] = 0xc070ffff;
  44:	e58d0000 	str	r0, [sp]
	writeDebugBuffer7("TGDS ARM7.bin Boot OK!", 1, (int*)&argBuffer[0]);
  48:	e59f0060 	ldr	r0, [pc, #96]	; b0 <main+0xb0>
  4c:	ebfffffe 	bl	0 <writeDebugBuffer7>
  50:	e59f705c 	ldr	r7, [pc, #92]	; b4 <main+0xb4>
  54:	e59f405c 	ldr	r4, [pc, #92]	; b8 <main+0xb8>
  58:	e3a06f61 	mov	r6, #388	; 0x184
  5c:	e3866301 	orr	r6, r6, #67108864	; 0x4000000

#ifdef ARM7
extern bool isArm7ClosedLid;
static inline void handleARM7SVC(){
	//Lid Closing + backlight events (ARM7)
	if(isArm7ClosedLid == false){
  60:	e5d70000 	ldrb	r0, [r7]
  64:	e3500000 	cmp	r0, #0
  68:	1a000007 	bne	8c <main+0x8c>
  6c:	e15604be 	ldrh	r0, [r6, #-78]	; 0xffffffb2
  70:	e3100080 	tst	r0, #128	; 0x80
  74:	0a000004 	beq	8c <main+0x8c>
  78:	e1a00004 	mov	r0, r4
  7c:	e3a01000 	mov	r1, #0
  80:	ebfffffe 	bl	0 <SendFIFOWords>
  84:	ebfffffe 	bl	0 <screenLidHasClosedhandlerUser>
			isArm7ClosedLid = true;
  88:	e5c75000 	strb	r5, [r7]
	//Handles Sender FIFO overflows
  8c:	e1d600b0 	ldrh	r0, [r6]
  90:	e3100901 	tst	r0, #16384	; 0x4000
  94:	11d600b0 	ldrhne	r0, [r6]
  98:	13800008 	orrne	r0, r0, #8
  9c:	11c600b0 	strhne	r0, [r6]
	/*			TGDS 1.6 Standard ARM7 Init code end	*/
    while (1) {
		handleARM7SVC();	/* Do not remove, handles TGDS services */
  a0:	e3a00002 	mov	r0, #2
  a4:	ebfffffe 	bl	0 <IRQWait>
  a8:	eaffffec 	b	60 <main+0x60>
  ac:	c070ffff 	.word	0xc070ffff
  b8:	ffff020f 	.word	0xffff020f
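The `ldrb` / `tst` / `beq` loop at offsets 0x14 to 0x1c is the compiled form of the `while (!(*((vuint8*)0x04000240) & 0x2));` line. A minimal sketch of the same polling pattern as a reusable function follows; the typedef is assumed to match what TGDS calls `vuint8`, and the function name is made up.

```c
#include <stdint.h>

typedef volatile uint8_t vuint8;  /* assumed equivalent of TGDS's vuint8 */

/* Spin until at least one bit of 'mask' reads as set in *reg and return
   the value that ended the loop. 'volatile' forces a fresh load on every
   iteration, so the compiler cannot hoist the read out of the loop. */
uint8_t wait_any_bit_set(vuint8 *reg, uint8_t mask)
{
    uint8_t value;
    do {
        value = *reg;
    } while (!(value & mask));
    return value;
}
```

On the DS the call would look like `wait_any_bit_set((vuint8 *)0x04000240, 0x2);`. Without the volatile qualifier an optimizing compiler is free to read the register once and spin forever on the stale value.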

(decompiled 2) source code:



/*
			Copyright (C) 2017  Coto
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
USA
*/

#include <string.h>
#include "main.h"
#include "InterruptsARMCores_h.h"
#include "interrupts.h"

#include "ipcfifoTGDSUser.h"
#include "wifi_arm7.h"
#include "usrsettingsTGDS.h"
#include "timerTGDS.h"
#include "biosTGDS.h"
#include "CPUARMTGDS.h"

int main(int _argc, sint8 **_argv) {
	/*			TGDS 1.6 Standard ARM7 Init code start	*/
	//wait for VRAM D to be assigned from ARM9->ARM7 (ARM7 has load/store on byte/half/words on VRAM)
	while (!(*((vuint8*)0x04000240) & 0x2));
	memset((unsigned char *)&argBuffer[0], 0, sizeof(argBuffer));
	argBuffer[0] = 0xc070ffff;
	writeDebugBuffer7("TGDS ARM7.bin Boot OK!", 1, (int*)&argBuffer[0]);
	/*			TGDS 1.6 Standard ARM7 Init code end	*/
    while (1) {
		handleARM7SVC();	/* Do not remove, handles TGDS services */
	}
	return 0;
}

With that said, if you really want to do ARM embedded development and write serious code such as a 3D game or a video player, either stick to a toolchain that ships with enough unit tests to ensure your code will work, or switch to Clang.

It took me a good 8 years of work, building a dev environment for the Nintendo DS and a set of tools in C/C++, to realize that. GCC may be used as an alternative, but I'd not recommend it, for sanity and safety reasons.

Needless to say, everything I build with Clang for NDS works exactly the way I want it to.



Unit Tests : https://bitbucket.org/Coto88/toolchaing ... s-unittest
