Reverse-engineering DLDI specs for NDS


Re: Reverse-engineering DLDI specs for NDS

Post by coto » Sun Aug 18, 2019 10:45 pm

nocash wrote:
coto wrote:and otherwise, I just copy -> paste the DLDI in there, I get segfaults just by executing it from that address. Even when updating the DLDI headers.
Yes, do that, simply memcopy the block from 0200xxxxh to 0680xxxxh, without any address adjustments.
There should be no segfaults if you have correctly specified 0680xxxxh as dldi target address in the "empty dldi area" at time when creating the .nds file.
"empty dldi area"? wtf?

-

Anyway, since I can demonstrate what I am saying (I'm not lying, and I can prove it):

Sources: https://bitbucket.org/Coto88/gbarunner2/src/master/

I enabled DLDI @ ARM9, recompiled GBARunner2, and just DMA-copied the DLDI from EWRAM -> VRAM (I cannot use memcpy, because it uses 8-bit writes, and 8-bit writes to VRAM fail on the Nintendo DS). And I get undefined aborts.
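To illustrate the 8-bit write restriction, here is a minimal sketch of a CPU copy loop that only issues 32-bit stores. `copy_words32` is a hypothetical helper (it is not part of GBARunner2 or libnds), and it assumes word-aligned pointers and a byte count that is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `bytes` bytes using 32-bit accesses only.
   Assumes both pointers are word-aligned and `bytes` is a multiple of 4.
   The destination is volatile so each store is emitted as a word store
   rather than being rewritten into a byte-wise copy. */
static void copy_words32(volatile uint32_t *dst, const uint32_t *src, size_t bytes)
{
    assert((bytes & 3u) == 0);
    for (size_t i = 0; i < bytes / 4; i++) {
        dst[i] = src[i];
    }
}
```

On hardware the destination would be the VRAM address of the driver; a DMA word copy should achieve the same effect without CPU stores.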

Steps to reproduce that:

dldigba.c -> replace this function:

(this one does the relocation perfectly)
#ifdef ARM9
PUT_IN_VRAM
#endif
bool dldiPatchLoader(bool clearBSS, u32 DldiRelocatedAddress, u32 dldiSourceInRam)
{
    addr_t memOffset; // Offset of DLDI after the file is loaded into memory
    addr_t patchOffset; // Position of patch destination in the file
    addr_t relocationOffset; // Value added to all offsets within the patch to fix it properly
    addr_t ddmemOffset; // Original offset used in the DLDI file
    addr_t ddmemStart; // Start of range that offsets can be in the DLDI file
    addr_t ddmemEnd; // End of range that offsets can be in the DLDI file
    addr_t ddmemSize; // Size of range that offsets can be in the DLDI file

    addr_t addrIter;

    data_t *pDH;
    data_t *pAH;

    size_t dldiFileSize = 0;

    // Target the DLDI we want to use as stub copy and then relocate it to a DldiRelocatedAddress address
    DLDI_INTERFACE* dldiInterface = (DLDI_INTERFACE*)DldiRelocatedAddress;
    pDH = (data_t*)dldiInterface;
    pAH = (data_t *)dldiSourceInRam;

    dldiFileSize = 1 << pAH[DO_driverSize];

    // Copy the DLDI patch into the application
    dmaCopyWords(0, (void*)pAH, (void*)pDH, dldiFileSize);

    if (*((u32*)(pDH + DO_ioType)) == DEVICE_TYPE_DLDI) {
        // No DLDI patch
        return false;
    }

    if (pDH[DO_driverSize] > pAH[DO_allocatedSpace]) {
        // Not enough space for patch
        return false;
    }

    memOffset = DldiRelocatedAddress; //readAddr (pAH, DO_text_start);
    if (memOffset == 0) {
        memOffset = readAddr (pAH, DO_startup) - DO_code;
    }
    ddmemOffset = readAddr (pDH, DO_text_start);
    relocationOffset = memOffset - ddmemOffset;

    ddmemStart = readAddr (pDH, DO_text_start);
    ddmemSize = (1 << pDH[DO_driverSize]);
    ddmemEnd = ddmemStart + ddmemSize;

    // Remember how much space is actually reserved
    pDH[DO_allocatedSpace] = pAH[DO_allocatedSpace];

    // Fix the section pointers in the DLDI @ VRAM header
    writeAddr (pDH, DO_text_start, readAddr (pAH, DO_text_start) + relocationOffset);
    writeAddr (pDH, DO_data_end, readAddr (pAH, DO_data_end) + relocationOffset);
    writeAddr (pDH, DO_glue_start, readAddr (pAH, DO_glue_start) + relocationOffset);
    writeAddr (pDH, DO_glue_end, readAddr (pAH, DO_glue_end) + relocationOffset);
    writeAddr (pDH, DO_got_start, readAddr (pAH, DO_got_start) + relocationOffset);
    writeAddr (pDH, DO_got_end, readAddr (pAH, DO_got_end) + relocationOffset);
    writeAddr (pDH, DO_bss_start, readAddr (pAH, DO_bss_start) + relocationOffset);
    writeAddr (pDH, DO_bss_end, readAddr (pAH, DO_bss_end) + relocationOffset);

    // Fix the function pointers in the header
    writeAddr (pDH, DO_startup, readAddr (pAH, DO_startup) + relocationOffset);
    writeAddr (pDH, DO_isInserted, readAddr (pAH, DO_isInserted) + relocationOffset);
    writeAddr (pDH, DO_readSectors, readAddr (pAH, DO_readSectors) + relocationOffset);
    writeAddr (pDH, DO_writeSectors, readAddr (pAH, DO_writeSectors) + relocationOffset);
    writeAddr (pDH, DO_clearStatus, readAddr (pAH, DO_clearStatus) + relocationOffset);
    writeAddr (pDH, DO_shutdown, readAddr (pAH, DO_shutdown) + relocationOffset);

    if (pDH[DO_fixSections] & FIX_ALL) {
        // Search through and fix pointers within the data section of the file
        for (addrIter = (readAddr(pDH, DO_text_start) - ddmemStart); addrIter < (readAddr(pDH, DO_data_end) - ddmemStart); addrIter++) {
            if ((ddmemStart <= readAddr(pAH, addrIter)) && (readAddr(pAH, addrIter) < ddmemEnd)) {
                writeAddr (pAH, addrIter, readAddr(pAH, addrIter) + relocationOffset);
            }
        }
    }

    if (pDH[DO_fixSections] & FIX_GLUE) {
        // Search through and fix pointers within the glue section of the file
        for (addrIter = (readAddr(pDH, DO_glue_start) - ddmemStart); addrIter < (readAddr(pDH, DO_glue_end) - ddmemStart); addrIter++) {
            if ((ddmemStart <= readAddr(pAH, addrIter)) && (readAddr(pAH, addrIter) < ddmemEnd)) {
                writeAddr (pAH, addrIter, readAddr(pAH, addrIter) + relocationOffset);
            }
        }
    }

    if (pDH[DO_fixSections] & FIX_GOT) {
        // Search through and fix pointers within the Global Offset Table section of the file
        for (addrIter = (readAddr(pDH, DO_got_start) - ddmemStart); addrIter < (readAddr(pDH, DO_got_end) - ddmemStart); addrIter++) {
            if ((ddmemStart <= readAddr(pAH, addrIter)) && (readAddr(pAH, addrIter) < ddmemEnd)) {
                writeAddr (pAH, addrIter, readAddr(pAH, addrIter) + relocationOffset);
            }
        }
    }

    /*
    if (clearBSS && (pDH[DO_fixSections] & FIX_BSS)) {
        // Initialise the BSS to 0, only if the disc is being re-inited
        for(int i = 0; i < (readAddr(pDH, DO_bss_end) - readAddr(pDH, DO_bss_start)) / 4; i++)
        {
            ((uint32_t*)&pAH[readAddr(pDH, DO_bss_start) - ddmemStart]) = 0;
        }

    }
    */
    return true;
}


And instead replace it with:


(just DMA copy DLDI from EWRAM -> VRAM)
#ifdef ARM9
PUT_IN_VRAM
#endif
bool dldiPatchLoader(bool clearBSS, u32 DldiRelocatedAddress, u32 dldiSourceInRam)
{
    addr_t memOffset; // Offset of DLDI after the file is loaded into memory
    addr_t patchOffset; // Position of patch destination in the file
    addr_t relocationOffset; // Value added to all offsets within the patch to fix it properly
    addr_t ddmemOffset; // Original offset used in the DLDI file
    addr_t ddmemStart; // Start of range that offsets can be in the DLDI file
    addr_t ddmemEnd; // End of range that offsets can be in the DLDI file
    addr_t ddmemSize; // Size of range that offsets can be in the DLDI file

    addr_t addrIter;

    data_t *pDH;
    data_t *pAH;

    size_t dldiFileSize = 0;

    // Target the DLDI we want to use as stub copy and then relocate it to a DldiRelocatedAddress address
    DLDI_INTERFACE* dldiInterface = (DLDI_INTERFACE*)DldiRelocatedAddress;
    pDH = (data_t*)dldiInterface;
    pAH = (data_t *)dldiSourceInRam;

    dldiFileSize = 1 << pAH[DO_driverSize];

    // Copy the DLDI patch into the application
    dmaCopyWords(0, (void*)pAH, (void*)pDH, dldiFileSize);

    /*
    if (*((u32*)(pDH + DO_ioType)) == DEVICE_TYPE_DLDI) {
        // No DLDI patch
        return false;
    }

    if (pDH[DO_driverSize] > pAH[DO_allocatedSpace]) {
        // Not enough space for patch
        return false;
    }

    memOffset = DldiRelocatedAddress; //readAddr (pAH, DO_text_start);
    if (memOffset == 0) {
        memOffset = readAddr (pAH, DO_startup) - DO_code;
    }
    ddmemOffset = readAddr (pDH, DO_text_start);
    relocationOffset = memOffset - ddmemOffset;

    ddmemStart = readAddr (pDH, DO_text_start);
    ddmemSize = (1 << pDH[DO_driverSize]);
    ddmemEnd = ddmemStart + ddmemSize;

    // Remember how much space is actually reserved
    pDH[DO_allocatedSpace] = pAH[DO_allocatedSpace];

    // Fix the section pointers in the DLDI @ VRAM header
    writeAddr (pDH, DO_text_start, readAddr (pAH, DO_text_start) + relocationOffset);
    writeAddr (pDH, DO_data_end, readAddr (pAH, DO_data_end) + relocationOffset);
    writeAddr (pDH, DO_glue_start, readAddr (pAH, DO_glue_start) + relocationOffset);
    writeAddr (pDH, DO_glue_end, readAddr (pAH, DO_glue_end) + relocationOffset);
    writeAddr (pDH, DO_got_start, readAddr (pAH, DO_got_start) + relocationOffset);
    writeAddr (pDH, DO_got_end, readAddr (pAH, DO_got_end) + relocationOffset);
    writeAddr (pDH, DO_bss_start, readAddr (pAH, DO_bss_start) + relocationOffset);
    writeAddr (pDH, DO_bss_end, readAddr (pAH, DO_bss_end) + relocationOffset);

    // Fix the function pointers in the header
    writeAddr (pDH, DO_startup, readAddr (pAH, DO_startup) + relocationOffset);
    writeAddr (pDH, DO_isInserted, readAddr (pAH, DO_isInserted) + relocationOffset);
    writeAddr (pDH, DO_readSectors, readAddr (pAH, DO_readSectors) + relocationOffset);
    writeAddr (pDH, DO_writeSectors, readAddr (pAH, DO_writeSectors) + relocationOffset);
    writeAddr (pDH, DO_clearStatus, readAddr (pAH, DO_clearStatus) + relocationOffset);
    writeAddr (pDH, DO_shutdown, readAddr (pAH, DO_shutdown) + relocationOffset);

    if (pDH[DO_fixSections] & FIX_ALL) {
        // Search through and fix pointers within the data section of the file
        for (addrIter = (readAddr(pDH, DO_text_start) - ddmemStart); addrIter < (readAddr(pDH, DO_data_end) - ddmemStart); addrIter++) {
            if ((ddmemStart <= readAddr(pAH, addrIter)) && (readAddr(pAH, addrIter) < ddmemEnd)) {
                writeAddr (pAH, addrIter, readAddr(pAH, addrIter) + relocationOffset);
            }
        }
    }

    if (pDH[DO_fixSections] & FIX_GLUE) {
        // Search through and fix pointers within the glue section of the file
        for (addrIter = (readAddr(pDH, DO_glue_start) - ddmemStart); addrIter < (readAddr(pDH, DO_glue_end) - ddmemStart); addrIter++) {
            if ((ddmemStart <= readAddr(pAH, addrIter)) && (readAddr(pAH, addrIter) < ddmemEnd)) {
                writeAddr (pAH, addrIter, readAddr(pAH, addrIter) + relocationOffset);
            }
        }
    }

    if (pDH[DO_fixSections] & FIX_GOT) {
        // Search through and fix pointers within the Global Offset Table section of the file
        for (addrIter = (readAddr(pDH, DO_got_start) - ddmemStart); addrIter < (readAddr(pDH, DO_got_end) - ddmemStart); addrIter++) {
            if ((ddmemStart <= readAddr(pAH, addrIter)) && (readAddr(pAH, addrIter) < ddmemEnd)) {
                writeAddr (pAH, addrIter, readAddr(pAH, addrIter) + relocationOffset);
            }
        }
    }
    */

    /*
    if (clearBSS && (pDH[DO_fixSections] & FIX_BSS)) {
        // Initialise the BSS to 0, only if the disc is being re-inited
        for(int i = 0; i < (readAddr(pDH, DO_bss_end) - readAddr(pDH, DO_bss_start)) / 4; i++)
        {
            ((uint32_t*)&pAH[readAddr(pDH, DO_bss_start) - ddmemStart]) = 0;
        }

    }
    */
    return true;
}


shared.h -> disable DLDI @ ARM7 (makes DLDI @ ARM9):

#ifndef __SHARED_H__
#define __SHARED_H__

//DLDI ARM7 Support (requires to recompile the project)
//#define ARM7_DLDI

#define SOUND_BUFFER_SIZE (8192)
#define SAVE_DATA_SIZE (0x20000) //128K SRAM/EEprom/Flash 8bit/16bit/32bit write compatible RAM memory.
#define ROM_DATA_LENGTH (0x3A0000 - (32*1024) ) //0x400000 - 0x40000 (hypervisor) - 0x20000 (128K) = 0x3A0000
#define ROM_ADDRESS_MAX (0x08000000 + ROM_DATA_LENGTH)
#define MAIN_MEMORY_ADDRESS_SAVE_DATA (0x02400000 - SAVE_DATA_SIZE) //-> (0x023E0000 ~ 0x023FFFFF) -> mirror: 0x01FE0000
#define MAIN_MEMORY_ADDRESS_SDCACHE (MAIN_MEMORY_ADDRESS_SAVE_DATA - (32*1024)) //-> (0x23D8000): -- used for shared DLDI sector memory between ARM7/ARM9

#define SOUND_EMU_QUEUE_LEN 64
#define address_dtcm (0x02C00000) //@0x04F00000 @0x01800000
#define MAIN_MEMORY_ADDRESS_ROM_DATA (0x02040000)

//VRAM Layout
#define sd_cluster_cache_addr (0x06840000)
#define sd_access_driver (0x06860000)

#define FIFO_CNT_EMPTY (1 << 8)
#define REG_FIFO_CNT (*((vu32*)0x04000184))
#define REG_SEND_FIFO (*((vu32*)0x04000188))
#define REG_RECV_FIFO (*((vu32*)0x04100000))

#ifdef __ASSEMBLER__

sd_cluster_cache = sd_cluster_cache_addr
sd_data_base = sd_access_driver
sd_is_cluster_cached_table = (sd_data_base + 32768 + (64*1024) ) @(sd_data_base + (224 * 1024))
sd_cluster_cache_info = (sd_is_cluster_cached_table + (16 * 1024))
sd_sd_info = (sd_cluster_cache_info + (256 * 8 + 4)) @0x0685C404
pu_data_permissions = 0x33600603 @0x33600003 @0x33660003

#endif

#ifndef __ASSEMBLER__

#ifdef ARM9
#define PUT_IN_VRAM __attribute__((section(".vram")))
#define ITCM_CODE __attribute__((section(".itcm"), long_call))
#endif

#define PACKED __attribute__ ((packed))

#endif //__ASSEMBLER__

#endif



Now rebuild GBARunner2.

You'll get a build that causes undefined exceptions (the registers still hold addresses within the original DLDI address range, while the Program Counter is in VRAM).

Thus, I cannot simply copy (or dmacopy) the DLDI from EWRAM to VRAM, which is why I am replying here and calling it "a trick".
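As a rough sketch of why a verbatim copy can leave stale addresses behind: any absolute pointer stored inside the driver still refers to the range the driver was linked for, so it has to be shifted by the distance between the old and new base. `relocate_words` below is a hypothetical, simplified stand-in for the range check in the relocation loops above; it works on a plain word array instead of using `readAddr`/`writeAddr`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shift every word that looks like a pointer into
   [old_start, old_end) so that it points into the block's new location.
   Returns the number of words patched. Like any range-based fixup, it
   would also patch a data word that merely happens to fall in the range. */
static size_t relocate_words(uint32_t *words, size_t count,
                             uint32_t old_start, uint32_t old_end,
                             uint32_t new_start)
{
    assert(old_start <= old_end);
    uint32_t delta = new_start - old_start; /* wraps modulo 2^32 by design */
    size_t patched = 0;
    for (size_t i = 0; i < count; i++) {
        if (words[i] >= old_start && words[i] < old_end) {
            words[i] += delta;
            patched++;
        }
    }
    return patched;
}
```

If the driver is already linked for its final address, the delta is zero and nothing needs patching.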


Re: Reverse-engineering DLDI specs for NDS

Post by nocash » Mon Aug 19, 2019 8:32 am

The "empty dldi area" is the memory area in your .nds file that is reserved for the flashcart driver. If the driver hasn't been installed yet, then it is initially "empty", or rather "almost empty", because there are a few non-empty bytes in there: the dldi ID bytes/string EDh,A5h,8Dh,BFh,20h,"Chishm",00h at offset 00h, the size byte at offset 0Fh, and the load address at offset 40h.
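A minimal sketch of checking those bytes in a buffer that holds the reserved area. The function names are made up for illustration, and only the offsets and ID bytes listed above are taken as given; the size byte is returned raw, without interpreting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ID at offset 00h: EDh,A5h,8Dh,BFh,20h,"Chishm",00h (12 bytes). */
static const uint8_t dldi_id[12] = {
    0xED, 0xA5, 0x8D, 0xBF, 0x20, 'C', 'h', 'i', 's', 'h', 'm', 0x00
};

/* Hypothetical helper: true if the buffer starts with the dldi ID. */
static bool dldi_has_id(const uint8_t *area, size_t len)
{
    return len >= sizeof dldi_id && memcmp(area, dldi_id, sizeof dldi_id) == 0;
}

/* Hypothetical helper: raw size byte at offset 0Fh. */
static uint8_t dldi_size_byte(const uint8_t *area, size_t len)
{
    assert(len > 0x0F);
    return area[0x0F];
}
```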

The important value here is the load address at offset 40h. I assume you (or your devkit) have configured that value to 02xxxxxxh, and that is wrong because your code will actually load the driver to VRAM, so the correct value would be 068xx000h (or wherever you have it in VRAM).
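A small sketch of that check, assuming the load address is stored as a little-endian 32-bit word (as on the NDS ARM CPUs). Both function names are hypothetical, and `run_address` stands for wherever the driver will actually execute from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DLDI_OFF_LOAD_ADDR 0x40u

/* Hypothetical helper: read the 32-bit load address at offset 40h,
   byte by byte so it works regardless of buffer alignment. */
static uint32_t dldi_load_address(const uint8_t *area, size_t len)
{
    assert(len >= DLDI_OFF_LOAD_ADDR + 4u);
    const uint8_t *p = area + DLDI_OFF_LOAD_ADDR;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: true if the configured load address already
   matches the address the driver will run from, so a plain copy with
   no address adjustments should be enough. */
static bool dldi_can_copy_verbatim(const uint8_t *area, size_t len,
                                   uint32_t run_address)
{
    return dldi_load_address(area, len) == run_address;
}
```

If the check fails, the value in the reserved area still names the old location and a plain copy to VRAM would leave the driver's absolute addresses pointing there.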

If you figure out how to change that value then you won't need to manually adjust the FIX_GLUE and FIX_GOT stuff. That will make things a good deal more reliable, and it will probably fix mysterious issues like the problems with SD vs SDHC... unless that issue was caused by the missing 8bit write support in VRAM.
