Re: Reverse-engineering DLDI specs for NDS
tepples wrote: In order to help you further, I might need to understand what went wrong last time you asked on forum.gbadev.org.
No, you'd rather not know - it just didn't work out well. And hardly anybody has started new threads in the DS development section in the past 5 years. Maybe there'll be a bigger NDS/DSi retro scene in some years, but at the moment there isn't one, at least not at gbadev.org.
Or maybe it'll be in nesdev's other-retro section - the NDS is really kind of retro by now. And the DSi was widely seen as a cost-down retro revision of the NDS even when it was new. I only hope nobody feels distracted when I post that kind of stuff here - it's just because I like this forum (and mind that the GBA/NDS video engines aren't so distant from the SNES).
tepples wrote: This interleaving might be part of why you don't notice a wait when loading an application using DLDI.
Only if the flashcart bootloaders support multithreading for that purpose (I would guess they don't).
Finding the DLDI header won't take up too much time anyway (if it's within the first few Kbytes of the ARM9 area).
The bigger problem would be loading NDS programs without DLDI support - then it would need to scan the whole ARM9 area.
I haven't done the maths yet on how slow it could be - maybe it doesn't take too many milliseconds even on large ARM9 bootcode blocks. But, two ideas for speeding things up:
If the RAM=ROM+1FFFE00h formula does apply to all cart headers, then one could use that to detect files with DLDI (and skip patching normal software without DLDI).
And, there are those sixteen FFh-bytes at 1230h in the above ROM dump, which look as if the DLDI header at 1240h had been padded to 32-byte alignment? If that's the case in all .nds files, then one would need to check only every 8th word. The memory system would still have the same amount of traffic (for loading 32-byte cache lines), but "LDR r0" should be around 7 clks faster than "LDMIA r0-r7", and needing only one CMP instead of eight CMP/CMPNE's should also be 7 clks faster.
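To make the two ideas a bit more concrete, here's a rough C sketch. It rests on assumptions: the cart header has the ARM9 ROM offset at 20h and the ARM9 RAM address at 28h, the DLDI signature is the word BF8DA5EDh followed by " Chishm" and a zero byte, and the ARM9 image itself starts on a 32-byte boundary. The function names are made up for illustration, and whether the formula and the alignment really hold for all .nds files is exactly what's still unverified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed DLDI signature: word BF8DA5EDh (little-endian), then " Chishm"
   and a terminating zero, 12 bytes in total. */
static const uint8_t DLDI_SIG[12] = {
    0xED, 0xA5, 0x8D, 0xBF, ' ', 'C', 'h', 'i', 's', 'h', 'm', 0x00
};

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Idea 1: test RAM = ROM + 1FFFE00h on the cart header.
   Assumes ARM9 ROM offset at 20h and ARM9 RAM address at 28h. */
static int header_matches_dldi_formula(const uint8_t *header)
{
    assert(header != NULL);
    uint32_t rom = read_le32(header + 0x20);
    uint32_t ram = read_le32(header + 0x28);
    return ram == rom + 0x1FFFE00u;
}

/* Idea 2: check only every 8th word (32-byte steps).
   Returns the offset inside the ARM9 image, or -1 if nothing was found. */
static long find_dldi_aligned(const uint8_t *arm9, size_t size)
{
    assert(arm9 != NULL);
    for (size_t off = 0; off + sizeof DLDI_SIG <= size; off += 32) {
        /* Cheap single-word test first, full compare only on a hit. */
        if (read_le32(arm9 + off) == 0xBF8DA5EDu &&
            memcmp(arm9 + off, DLDI_SIG, sizeof DLDI_SIG) == 0)
            return (long)off;
    }
    return -1;
}
```

A loader could call header_matches_dldi_formula() first and skip the scan entirely when it fails, but only if the formula turns out to be reliable; otherwise it would wrongly skip patching a DLDI-enabled file.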
Oh, and ARM7 support. My current theory would be that the code is loaded to ARM9 memory space (using the RAM=ROM+1FFFE00h addressing), and ARM7 might simply jump to the same Main RAM locations when using the DLDI functions.
The things that could cause ARM7 issues:
The driver maker might have used ARM9 opcodes (eg. BLX) in DLDI.
And, if the .nds file does NOT align the driver area to 32-byte cache lines, then things could go wrong if the driver stores variables at the very end of the allocated DLDI memory space (but it's unlikely that it has squeezed memory that tightly).
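The alignment concern could be checked with something as simple as the sketch below. It only tests whether a memory region begins and ends on 32-byte boundaries, so that no cache line is shared with neighbouring ARM9 data. The function name is hypothetical, and the start address and size would have to come from wherever the loader finds the driver area.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* cache line size in bytes, as discussed above */

/* Returns 1 if [start, start+size) begins and ends on cache-line
   boundaries, i.e. the region shares no cache line with other data. */
static int region_owns_its_cache_lines(uint32_t start, uint32_t size)
{
    assert(size != 0);
    return (start % CACHE_LINE) == 0 &&
           ((start + size) % CACHE_LINE) == 0;
}
```

If this returns 0 for the driver area, the first or last cache line is shared with something else, which is the case where variables stored at the very end of the DLDI space might cause trouble.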