Best method of doing a full disassembly of a SNES ROM?

Posted: Sat Nov 26, 2016 11:13 pm
by Sverker
Pretty much what the title says: I'm looking for input on the best way to create a disassembly of a SNES ROM that can be modified and reassembled into a full ROM. I'm reaching the limits of what spot-patching ASM in a ROM hack can accomplish. I am relatively well versed in 65816 assembly and have done extensive reverse-engineering and mapping of the ROM I'm looking to disassemble (Final Fight). I've already done extensive disassembly of this ROM in IDA Pro, but the program is limited in some ways, especially in how poorly it handles the M and X flags. I searched this forum for topic titles relating to disassembly but nothing came up, so my apologies if this is a done-to-death subject. Even so, certain methods come to be held as better than others over time, so I'm interested in what you guys have to say.

Thanks for reading.

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 12:06 am
by AWJ
I've written my own disassembler for NES, SNES and Game Boy that's meant to be able to generate output that can be reassembled. It can apply labels to code lines and RAM addresses and convert almost any data structure into .byte/.word directives. Being something I created for my own use, it's frankly not user friendly at all (you have to write Python code to use it; it's more of a "disassembly framework" than a disassembler per se) but if you're interested, send me a PM with contact info and I'll send it to you.

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 12:13 am
by koitsu
There is no program that can "magically" give you a complete, flawless disassembly that will meet your needs. There are two reasons, one of which is 65816-specific and the other universal:

1) The 65816, as you mention, has runtime dynamic register sizing (m/x, 8 vs. 16-bit), which affects the disassembly output. The problem is that it's done at runtime, not at assemble-time, so the only way a disassembler could know the proper values to use at any given point would be for the disassembler to *also* be a full-on SNES emulator.

2) There is no way to know purely from a disassembly what is data and what is code. It is up to you to figure that out through reverse-engineering efforts.
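
To make point 1 concrete, here is a minimal Python sketch (the function name is made up for illustration) showing how the same three bytes decode differently depending on the M flag. With an 8-bit accumulator, A9 00 is a complete LDA and 8D starts the next instruction; with a 16-bit accumulator, all three bytes belong to the LDA.

```python
def decode_lda_immediate(data, m_flag_set):
    """Decode an LDA #imm (opcode $A9) at the start of `data`.

    Returns (text, instruction_length). `m_flag_set` is True when the
    accumulator is 8-bit, False when it is 16-bit.
    """
    if data[0] != 0xA9:
        raise ValueError("not an LDA immediate")
    if m_flag_set:
        # 8-bit accumulator: one operand byte
        return f"LDA #${data[1]:02X}", 2
    # 16-bit accumulator: two operand bytes, little-endian
    value = data[1] | (data[2] << 8)
    return f"LDA #${value:04X}", 3


code = bytes([0xA9, 0x00, 0x8D])
print(decode_lda_immediate(code, True))
print(decode_lda_immediate(code, False))
```

A wrong guess about the flag shifts every following instruction boundary, which is why the sizing has to be fixed up by hand or tracked somehow.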

The method I've used for decades (for NES/Famicom and SNES/SFC -- the latter is just more complex, and I'll focus on that) is to learn about the memory mode of the ROM in question (mode 20, mode 21, what banks it's assembled to use), then split the ROM up into relevant segments (those may be 32KBytes or 64KBytes) and make those separate .bin files (ex. bank-c0.bin). Then disassemble each .bin file separately with whatever flags (for bank value) are necessary. There are a few 65816 disassemblers that do all of that (I wrote one long ago: TRaCER, but it's for MS-DOS); dispel is one I've commonly used off and on. There are flags that the disassembler needs to support in order to output code that can be reassembled later with an assembler (important: the disassembler and assembler therefore need to be compatible/use the same syntax). I then start reverse-engineering the code using an emulator that has a 65816 debugger (ex. SNES9x, bsnes-plus, etc.) to see what the behaviour is, following along in the relevant .asm files I've made, and manually fixing up anything that has wrong rep/sep sizing results (knowing machine language is helpful).
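
The bank-splitting step could be sketched in Python roughly as follows. The function names, the output naming scheme, and the copier-header check (512 extra bytes when the file size is not a multiple of 1024) are assumptions for illustration; pick the bank size and first bank number to match the ROM's actual memory mode.

```python
from pathlib import Path


def split_rom(rom, bank_size=0x8000, first_bank=0x00):
    """Return {filename: bytes} for each bank-sized chunk of `rom`."""
    # Assumption: a 512-byte copier header is present if size % 1024 == 512.
    if len(rom) % 1024 == 512:
        rom = rom[512:]
    banks = {}
    for offset in range(0, len(rom), bank_size):
        bank = first_bank + offset // bank_size
        banks[f"bank-{bank:02x}.bin"] = rom[offset:offset + bank_size]
    return banks


def write_banks(rom_path, out_dir, **kwargs):
    """Split the ROM at `rom_path` and write one .bin file per bank."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = split_rom(Path(rom_path).read_bytes(), **kwargs)
    for name, chunk in chunks.items():
        (out / name).write_bytes(chunk)
```

For example, `write_banks("game.sfc", "banks", bank_size=0x10000, first_bank=0xC0)` would produce bank-c0.bin, bank-c1.bin and so on for a ROM treated as 64KByte banks starting at $C0 (the file name here is a placeholder).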

The process is tedious, time consuming, and just how it's done.

Reassembling is pretty easy: make an .asm file that sets up the assembler/banking directives/etc., then include all the other .asm files (per bank) in there in the appropriate order (using .incsrc or whatever include directive the assembler offers -- read its documentation!), and then compare the resulting assembled file to the original (in Windows/DOS that's fc /b orig new, or use whatever hex editor or file compare utility you want). If you do end up modifying code/data, be aware that for things to "look right" you will need to regenerate the 16-bit checksum (and its complement) stored in the SNES header -- but 99% of emulators I've used will ignore that and run things anyway (it's something you may want to fix before a "final release" in an IPS patch or similar).
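
A rough Python equivalent of the compare step, plus a checksum helper, might look like this. Both function names are made up. The checksum sketch assumes the header checksum is a plain 16-bit sum of all ROM bytes (not a true CRC), a power-of-two ROM size, no copier header, and the header at $7FC0 (LoROM; $FFC0 would be the HiROM location). ROMs with other sizes need extra mirroring logic that is not shown.

```python
def first_difference(a, b):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other.
        return min(len(a), len(b))
    return None


def snes_checksum(rom, header_base=0x7FC0):
    """16-bit sum of all ROM bytes (assumption: power-of-two size, no copier header)."""
    data = bytearray(rom)
    # Complement lives at +0x1C..+0x1D, checksum at +0x1E..+0x1F.
    # A checksum and its complement always sum to $FF + $FF byte-wise,
    # so neutral placeholder values can stand in while summing.
    data[header_base + 0x1C:header_base + 0x20] = b"\xFF\xFF\x00\x00"
    return sum(data) & 0xFFFF
```

The complement to store alongside the result would be `checksum ^ 0xFFFF`, both little-endian.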

If there's a specific part of a game or something you're trying to RE, then you may be able to use an emulator with a debugger to reverse-engineer that (again: it's tedious and lots of time required), and then find the correlating code/data in the ROM and then only have to, say, disassemble one bank or some particular section.

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 12:42 am
by Sverker
Yes, I am aware of this limitation and I'm not looking for nor expecting a magic bullet. I've already done a lot of groundwork with this ROM in terms of mapping out what is code and what is data and have many subroutines already analyzed, just looking for ways to turn this into something that can be reassembled because spot-patching is only taking me so far. I will look into this bank splitting method and dispel. Thanks for the reply.

Also thanks AWJ but I'm not a fan of Python so I'll pass for now.

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 12:52 am
by rainwarrior
A code/data log is a good place to start. I think most debugging emulators tend to have this feature. You play the game for a while (ideally you want to play completely through the game, and try to do at least a little of everything you can do on the way), and it keeps a log of what parts of the ROM were run as code, and what parts were used as data.

After doing that, feed this to a disassembler for your first pass.
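
As a sketch of what "feeding" a log to a disassembler involves: assuming a hypothetical log format with one flag byte per ROM byte (the flag values below are invented, real emulators use their own formats), the per-byte flags can be collapsed into contiguous code/data/unknown ranges like this:

```python
from itertools import groupby

# Hypothetical flag bits; check your emulator's actual log format.
FLAG_CODE = 0x01
FLAG_DATA = 0x02


def classify(flag):
    """Map one log byte to 'code', 'data' or 'unknown'."""
    if flag & FLAG_CODE:
        return "code"
    if flag & FLAG_DATA:
        return "data"
    return "unknown"


def log_to_ranges(log):
    """Collapse per-byte flags into (start, end_inclusive, kind) runs."""
    ranges = []
    offset = 0
    for kind, run in groupby(log, key=classify):
        length = sum(1 for _ in run)
        ranges.append((offset, offset + length - 1, kind))
        offset += length
    return ranges
```

The "unknown" runs are the parts never touched during play, which are the ones worth inspecting by hand later.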

Once you have this first pass of disassembly, start annotating it, mostly naming RAM locations and parts of code, adding comments where you need it. Find a disassembler that lets you feed this annotation back to the disassembler and generate another pass. Each pass the annotations will propagate through the disassembled code (names for variables go a long way here), and the disassembly gets easier to read with every pass. (You might also be able to feed those annotations to a debugger to make it easier to inspect the code at runtime too.)

ca65 has a disassembler called da65 that lets you do annotations through an "info" file. In that case you start by turning a code/data log made by your emulator into an info file, and then for each pass you add more annotations to the info file and run the disassembler again. I'm not sure how good da65's support for the SNES CPU is (I have only used it for NES), but this kind of process should apply to other disassemblers too.
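
For illustration, a small Python helper that writes RANGE and LABEL entries in the style of a da65 info file. The function name and the input shapes (a list of (start, end, kind) tuples and an address-to-name dict) are assumptions, and the exact attribute syntax should be checked against the da65 documentation before relying on it.

```python
def make_info(ranges, labels):
    """Build info-file text from code/data ranges and address labels.

    ranges: iterable of (start, end_inclusive, kind), kind in
            {"code", "data", "unknown"}
    labels: dict mapping address -> label name
    """
    type_names = {"code": "Code", "data": "ByteTable"}
    lines = []
    for start, end, kind in ranges:
        da65_type = type_names.get(kind)
        if da65_type is None:
            continue  # leave unknown regions to the disassembler's default
        lines.append(
            f"RANGE {{ START ${start:04X}; END ${end:04X}; TYPE {da65_type}; }};"
        )
    for addr, name in sorted(labels.items()):
        lines.append(f'LABEL {{ NAME "{name}"; ADDR ${addr:04X}; }};')
    return "\n".join(lines) + "\n"


print(make_info([(0x8000, 0x80FF, "code")], {0x8000: "reset"}))
```

Each pass then amounts to editing the ranges and labels, regenerating the file, and running the disassembler again.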

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 1:18 am
by Sverker
That sounds ideal, but unfortunately it looks like da65 does not support 65816 and I'm not sure if any SNES debuggers create that kind of log. Thank you for the suggestion though, I didn't know that was even possible.

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 1:41 am
by rainwarrior
I'd be really surprised if there isn't an SNES debugger out there that can generate a code/data log, but if one doesn't exist, modifying an emulator to log code regions at least should be a pretty minimal effort. (Just look at the PC every time an instruction is executed and flag that byte of ROM in your log as being code.) Even if you just have a code log to start with, you should be able to find other code regions you missed as you start annotating.
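
The logging hook itself is tiny. Here is a Python sketch of the idea (a real emulator would do this in its own language inside the CPU core; the class and method names are hypothetical, and the address translation assumes a plain LoROM layout with 32KBytes of ROM per bank at $8000-$FFFF):

```python
class CodeLog:
    """Marks which ROM bytes have been executed as code."""

    def __init__(self, rom_size):
        self.flags = bytearray(rom_size)

    @staticmethod
    def lorom_offset(bank, addr):
        # LoROM assumption: bank bit 7 is a mirror, low 15 address bits
        # select the byte within the 32 KB bank.
        return ((bank & 0x7F) << 15) | (addr & 0x7FFF)

    def on_execute(self, bank, pc, length=1):
        """Call once per executed instruction with its address and size."""
        start = self.lorom_offset(bank, pc)
        for i in range(start, min(start + length, len(self.flags))):
            self.flags[i] = 1


log = CodeLog(0x10000)
log.on_execute(0x01, 0x8000, 3)  # a 3-byte instruction at $01:8000
```

Dumping `log.flags` to a file at exit gives a code log with one byte per ROM byte.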

If there isn't a suitable disassembler, maybe even just write one? Writing a disassembler is a lot simpler than writing an assembler. (A lot of people do seem to write their own, too. I've written a very minimal one for NES before.)

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 1:47 am
by qwertymodo
bsnes-plus definitely creates a usage log, you just have to go into the settings to be sure it actually gets saved to a file.

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 6:22 am
by dougeff
In theory, you should be able to trace a SNES ROM from its starting position through all possible branches, setting the appropriate register sizes every time you see a REP or SEP... however, some games push the processor status to the stack and restore it later, and if that happens during a timed event (IRQ, NMI) it might be tricky to trace... and I suppose all bytes not traced at the end can be treated as data... which may include old code that wasn't used... so you should at least attempt a trace of it...

Certain bytes are easy to spot as opcodes: A9, A5, AD for LDA; 60 (RTS), 6B (RTL), 40 (RTI). They could help you guess whether something is code.
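
A very crude Python version of that guess, counting how often those byte values show up in a region (the function name is made up, and operand bytes that happen to match are counted too, so treat the result only as a hint):

```python
# Byte values mentioned above: LDA #imm / dp / abs, RTS, RTL, RTI.
COMMON_OPCODES = {0xA9, 0xA5, 0xAD, 0x60, 0x6B, 0x40}


def opcode_density(data):
    """Fraction of bytes in `data` that match a common opcode value."""
    if not data:
        return 0.0
    return sum(b in COMMON_OPCODES for b in data) / len(data)


print(opcode_density(bytes([0xA9, 0x00, 0x60, 0x00])))
```

Comparing the density of an untraced region against regions already known to be code or data gives a rough idea of which it resembles.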

And I'm not sure how you would handle indirect jumps.

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 9:09 am
by tepples
At some point the combinatorial explosion of possible paths in response to input helps you understand why the halting problem is undecidable for Turing machines and intractable for linear bounded automata (that is, real computers).

Re: Best method of doing a full disassembly of a SNES ROM?

Posted: Sun Nov 27, 2016 12:48 pm
by rainwarrior
tepples wrote: At some point the combinatorial explosion of possible paths in response to input helps you understand why the halting problem is undecidable for Turing machines and intractable for linear bounded automata (that is, real computers).
There may be effectively "infinite" variations in input, but there's still a finite amount of code in the ROM.

There's probably no perfect algorithmic solution for finding all the reachable code in a ROM, but an imperfect solution can still be quite useful. A trace that doesn't actually execute instructions or keep track of memory states or inputs, and just recursively takes both sides of every branch, could probably generate a great starting point for a disassembly.

There's lots of things you can't really catch without tracking states (self modifying code, jump tables, branches that will always be taken, etc.) but they don't necessarily dominate the problem, especially if you can manually intervene and restart the process.
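
A minimal Python sketch of such a trace, limited to a single bank image and a small hand-picked subset of opcodes (a real tool needs the full 256-entry table). It follows both sides of each branch and tracks the M/X flags through REP/SEP. Assumptions: unknown opcodes end a path, a subroutine is assumed to return with the flags unchanged, each address is decoded only with the flags it was first reached with, and indirect jumps, PHP/PLP and jump tables are not handled at all.

```python
# opcode -> (mnemonic, length, kind); length "m"/"x" means 2 bytes when
# that flag is set (8-bit register) and 3 bytes when clear (16-bit).
OPCODES = {
    0xC2: ("REP", 2, "rep"),       0xE2: ("SEP", 2, "sep"),
    0xA9: ("LDA #", "m", None),    0xA2: ("LDX #", "x", None),
    0xA0: ("LDY #", "x", None),    0x8D: ("STA abs", 3, None),
    0xEA: ("NOP", 1, None),        0xD0: ("BNE", 2, "branch"),
    0xF0: ("BEQ", 2, "branch"),    0x80: ("BRA", 2, "jump_rel"),
    0x20: ("JSR abs", 3, "call"),  0x4C: ("JMP abs", 3, "jump_abs"),
    0x60: ("RTS", 1, "stop"),      0x6B: ("RTL", 1, "stop"),
    0x40: ("RTI", 1, "stop"),
}


def trace(bank, base, entry, m=True, x=True):
    """Return {address: (mnemonic, length)} for every instruction reached."""
    found = {}
    work = [(entry, m, x)]
    while work:
        pc, m, x = work.pop()
        while base <= pc < base + len(bank) and pc not in found:
            op = bank[pc - base]
            if op not in OPCODES:
                break  # unknown opcode: give up on this path
            name, length, kind = OPCODES[op]
            if length == "m":
                length = 2 if m else 3
            elif length == "x":
                length = 2 if x else 3
            operand = bank[pc - base + 1:pc - base + length]
            if len(operand) != length - 1:
                break  # instruction runs off the end of the bank
            found[pc] = (name, length)
            nxt = pc + length
            if kind == "rep":    # REP clears flag bits (16-bit registers)
                m = m and not operand[0] & 0x20
                x = x and not operand[0] & 0x10
            elif kind == "sep":  # SEP sets flag bits (8-bit registers)
                m = m or bool(operand[0] & 0x20)
                x = x or bool(operand[0] & 0x10)
            elif kind in ("branch", "jump_rel"):
                offset = operand[0] - 256 if operand[0] >= 128 else operand[0]
                work.append((nxt + offset, m, x))
                if kind == "jump_rel":
                    break  # unconditional: no fall-through
            elif kind in ("call", "jump_abs"):
                work.append((operand[0] | operand[1] << 8, m, x))
                if kind == "jump_abs":
                    break
            elif kind == "stop":
                break
            pc = nxt
    return found


# REP #$20 / LDA #$1234 / BNE +2 / SEP #$20 / RTS
program = bytes([0xC2, 0x20, 0xA9, 0x34, 0x12, 0xD0, 0x02, 0xE2, 0x20, 0x60])
for addr, (name, length) in sorted(trace(program, 0x8000, 0x8000).items()):
    print(f"${addr:04X}  {name}  ({length} bytes)")
```

Entry points for the reset, NMI and IRQ vectors can each be pushed as separate starting points, and anything the trace never reaches is left for manual inspection or a code/data log.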