All times are UTC - 7 hours

[ 14 posts ]
PostPosted: Wed Feb 13, 2019 4:29 pm 

Joined: Wed Feb 13, 2019 2:10 pm
Posts: 6
Hi all- apologies if these have already been answered, I've searched and also read through what I think are the relevant sections of Mesen's manual and played with the menus but haven't found answers yet.

I'm taking a stab at reverse engineering Crystalis. My goal is to get some kind of rebuildable disassembly although I know this ROM is a bear. My strategy so far is to use Mesen's debugger to help with code identification and then attempt a literal disassembly to get a rebuildable code base that I can work from going forward. I've finished a fairly thorough play-thru trying to cover as many edge cases as possible, so I think the CDL is fairly complete in terms of regular gameplay. Now I have questions about how to interpret Mesen's CDL files and its disassembly output.

* The main frustration I'm having is completing the CDL. I tried scanning it for unidentified data and overwriting those parts of the ROM with zeroes. This does affect the game-- why was this data "unknown" in the CDL when it's clearly doing something? I get that some wouldn't be encountered during a regular playthrough, but I think there's something else going on.
* As far as disassembling the ROM, Mesen adds labels, which is great, but I also want a literal, rebuildable disassembly. I read Mesen's manual and looked through all the menus but I can't figure out a way to turn off the auto labels and get the output I want. Is there a way to do this or will I need to address this programmatically? (I've also tried a few disassemblers that are supposed to work with FCEUX's CDL files but no luck, although I know Mesen uses a slightly different CDL format.)

Thank you!
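
For readers following along, the "scan the CDL for unidentified data" step from the first bullet could look roughly like the sketch below. It assumes the FCEUX-style layout (one flag byte per PRG byte, PRG first, with $01 meaning code and $02 meaning data); the function name `unlogged_runs` is made up for illustration.

```python
CODE = 0x01  # assumed: byte was executed as code
DATA = 0x02  # assumed: byte was read as data


def unlogged_runs(cdl, prg_size):
    """Return (start, length) runs of PRG offsets with neither code nor data bit set."""
    limit = min(prg_size, len(cdl))
    runs = []
    start = None
    for offset in range(limit):
        logged = cdl[offset] & (CODE | DATA)
        if not logged and start is None:
            start = offset                       # a new unlogged run begins here
        elif logged and start is not None:
            runs.append((start, offset - start))  # the run ended just before this byte
            start = None
    if start is not None:
        runs.append((start, limit - start))       # run reaches the end of PRG
    return runs
```

Zeroing out only the reported runs, one at a time, is one way to narrow down which "unknown" range the game actually depends on.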


PostPosted: Wed Feb 13, 2019 8:09 pm

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4208
Location: A world gone mad
What version of Mesen are you using? (Help -> About) It matters greatly here.

Sour can explain the changes to the CDL format he used. (I asked about some of these in a PM with him and he was happy to explain them.)

Getting a usable disassembly (for an entire game, vs. small snippets) isn't something Mesen has right now. You'll need something like disasm6, which can use CDL files (but I'm not sure if it can use Mesen's CDLs, as they aren't 100% identical to FCEUX). Except disasm6 doesn't have MMC3 support, so you'll likely have to manually disassemble PRG banks one by one, or make hand-made CDL files for each of them, or just keep a gigantic list of addresses you care about along with each bank. You're welcome.
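
A rough sketch of the "one PRG bank at a time" approach, for anyone who wants to script it: the code below cuts the PRG portion of an iNES image and the matching part of the CDL into equal-sized banks. The 8 KiB default matches the MMC3 PRG bank size; whether a given disassembler wants 8 KiB or 16 KiB pieces is an assumption to check, and `split_prg_banks` is a made-up name. It also assumes the CDL starts with one flag byte per PRG byte.

```python
def split_prg_banks(rom, cdl, bank_size=0x2000):
    """Yield (bank_index, prg_bytes, cdl_bytes) for each PRG bank of an iNES image."""
    if rom[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    prg_size = rom[4] * 0x4000                       # header byte 4: PRG size in 16 KiB units
    prg_start = 16 + (512 if rom[6] & 0x04 else 0)   # skip header and optional 512-byte trainer
    prg = rom[prg_start:prg_start + prg_size]
    for index in range(prg_size // bank_size):
        lo = index * bank_size
        hi = lo + bank_size
        yield index, prg[lo:hi], cdl[lo:hi]
```

Each yielded pair can then be written out as its own small binary plus CDL file and disassembled separately, keeping a note of which CPU address range the bank is mapped to.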

clever-disasm might be another alternative, except... well... I'll keep my opinions to myself (hint: good luck with that).

I think you'll find that "the disassembler you want" for this task doesn't really exist. In fact, as the author of a disassembler in the mid-to-late 90s, I would say the situation today is really not all that great for something "easy". You really do have to put in the long hours splitting stuff up. It's a *lot* of work, on top of the work you have to do just to understand the code itself.

P.S. -- You might try asking the Stardust Crusaders folks or on the romhacking.net forum to see if someone else has already done the initial work/pain for you, to get something reassemble-able.


PostPosted: Wed Feb 13, 2019 8:33 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8564
Location: Seattle
koitsu wrote:
clever-disasm might be another alternative, except... well... I'll keep my opinions to myself (hint: good luck with that).
From prior experience, it's really not up for MMC3 disassemblies. Regardless, it requires a lot of guidance—I just find that it requires a lot less than generating a sufficiently complete CDL. (Specifically, I mean that I've generated disassemblies using an FCEUX CDL file and disasm6, and I've generated disassemblies using clever-disasm, and I think it took less effort to get something comparable with clever-disasm. It's just that that effort is "repeatedly run it and edit the descriptor file until the ambiguities are cleared up" instead of "play through the game until sufficiently close to all the code in the ROM is marked")

Either way, 90%+ of the work is still converting all the automatic labels and automatic names into human-comprehensible ones.


PostPosted: Wed Feb 13, 2019 9:42 pm

Joined: Wed Feb 13, 2019 2:10 pm
Posts: 6
Quote:
Except disasm6 doesn't have MMC3 support, so you'll likely have to manually disassemble PRG banks one by one, or make hand-made CDL files for each of them, or just keep a gigantic list of addresses you care about along with each bank. You're welcome.


That fixed it. And works with the split up CDL, too. Didn't occur to me that this was the problem since without the CDL it would output a disassembly, just with code/data mixed. Thanks :facepalm:

FWIW I'm using Mesen 0.9.7. The CDLs are the same as FCEUX's except that Mesen sets bit #7 to mark entry points.


PostPosted: Wed Feb 13, 2019 9:55 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 725
taters wrote:
FWIW I'm using Mesen 0.9.7. The CDLs are the same as FCEUX's except that Mesen sets bit #7 to mark entry points.
As a heads up, the current dev builds also change the meaning of the $10 bit to mark bytes that are the destination of a jump/branch instruction. AFAIK, that PHP disassembler only uses the $01/$02 bits in the CDL file.
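
To make the bit meanings mentioned so far concrete, here is a small sketch that names the flags in a single CDL byte. Only the bits discussed in this thread are listed; which of $01/$02 is code and which is data follows the FCEUX convention and should be treated as an assumption, and `describe_cdl_byte` is a made-up helper.

```python
CDL_FLAGS = {
    0x01: "code",                            # assumed FCEUX convention
    0x02: "data",                            # assumed FCEUX convention
    0x10: "jump target (Mesen dev builds)",  # per the post above
    0x80: "entry point (Mesen)",             # bit 7, per the earlier post
}


def describe_cdl_byte(value):
    """Return the names of the flags set in one CDL byte (bits not listed are ignored)."""
    names = [name for bit, name in CDL_FLAGS.items() if value & bit]
    return names or ["unknown"]  # none of the flags this sketch knows about are set
```

A tool that only looks at $01/$02 would see a byte such as $91 simply as code.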

You can remove the default labels by selecting them and pressing delete, or remove them permanently by going to File->Workspace->Disable default labels.

It's possible that the unknown data is a bug (although I'm not too sure how it could happen). If you have a way to reproduce the problem, I'm happy to take a look.


PostPosted: Wed Feb 13, 2019 11:28 pm

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4208
Location: A world gone mad
Sour wrote:
AFAIK, that PHP disassembler only uses the $01/$02 bits in the CDL file.

As of disasm6 v1.5, I believe that's correct (from looking at the PHP code) -- it only cares about bits 0 and 1 in the CDL file.

As for the OP encountering issues in Mesen where some bytes are considered unknown despite being accessed/used: I may have seen this. I definitely had some scenarios where a single byte (it was ALWAYS a single byte too) in a block of data I'd previously marked as code or data was unexpectedly marked "unknown". Going back and re-assigning it to code/data would alleviate the problem. I spent some time trying to figure out how this could happen, but couldn't reproduce it reliably (read: it could be reproduced, but I wasn't able to pin down the triggering scenario, which makes reporting it very hard). It was on an older version of Mesen, however, so I'd be better off starting over with the latest Mesen debug build and fresh RE project files, then seeing if I could reproduce it there.

If this isn't the problem the OP saw, then I guess there could be two bugs, haha. :-)


PostPosted: Thu Feb 14, 2019 11:47 am

Joined: Wed Feb 13, 2019 2:10 pm
Posts: 6
Sour wrote:

You can remove the default labels by selecting them and pressing delete, or remove them permanently by going to File->Workspace->Disable default labels.



Do you mean the list on the right side of the debugger? I saw that in the instructions but it doesn't work for me (nor the option under the Workspace menu).

koitsu wrote:
As for the OP encountering issues in Mesen where some bytes are considered unknown despite being accessed/used: I may have seen this. I definitely had some scenarios where a single byte (it was ALWAYS a single byte too) in a block of data I'd previously marked as code or data was unexpectedly marked "unknown". Going back and re-assigning it to code/data would alleviate the problem. I spent some time trying to figure out how this could happen, but couldn't reproduce it reliably (read: it could be reproduced, but I wasn't able to pin down the triggering scenario, which makes reporting it very hard). It was on an older version of Mesen, however, so I'd be better off starting over with the latest Mesen debug build and fresh RE project files, then seeing if I could reproduce it there.

Sounds like it may be a similar (or the same) issue. I have not noticed it today but will keep trying to reproduce it.


PostPosted: Thu Feb 14, 2019 3:20 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 725
Are you talking about the automatic jump labels, e.g. L8040, L88A9, etc.?
If so, you need to disable that option first (in Options->Auto-create jump labels), and then File->Workspace->Reset labels should clear all the labels.


PostPosted: Thu Feb 14, 2019 4:53 pm

Joined: Wed Feb 13, 2019 2:10 pm
Posts: 6
Yes and that worked!

Thank you!


PostPosted: Sat Feb 16, 2019 8:50 pm

Joined: Wed Feb 13, 2019 2:10 pm
Posts: 6
I have another question about configuring Mesen's disassembly output. I've noticed that following a branch instruction, even if the CDL has logged subsequent bytes as data, Mesen disassembles it, I guess to show what the instructions are in case of a non-branching test.

For example:

LDA #$00
STA $09
BEQ L3C900
... [some data bytes that get disassembled into nonsense instructions (and worse, throw off the reading frame)]

The affected data is highlighted in green, so at least I can visually see that it's not actually code. But I don't know how to stop Mesen from disassembling it. I've tried a few different settings but no luck so far. Any way to stop the auto disassembly?

P.S. I apologize for asking what are obviously simple questions. There may well be an answer in Mesen's documentation or via search, but I either didn't find it or it wasn't clear to me.


PostPosted: Sat Feb 16, 2019 9:29 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 725
There isn't a way to do this. Mesen will always assume something after a conditional branch is potential code, because this will be the case 99.9% of the time. The green color means that it's been marked as code, but hasn't been executed yet.

Changing this would make the debugger somewhat less user-friendly, since the alternative would require all branches to be taken before they're actually marked as code (which means they wouldn't be disassembled at all depending on your settings).

I'm not sure what you mean by "throwing off the reading frame", though?


PostPosted: Sat Feb 16, 2019 9:52 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21595
Location: NE Indiana, USA (NTSC)
Unlike 65C02, 65816, Z80, and SM83, the MOS 6502 has no BRA instruction, and an unconditional conditional branch saves 1 byte relative to JMP. So I imagine one might approximate detecting the most common unconditional conditional branch setups by adding a flag to the debugger.

  • Set an internal flag when NZ are set by loading a known value:
    LDA/X/Y #immediate, or LDA/X/Y from an absolute ROM address in the same or a fixed bank
  • Clear it when NZ are set by any other instruction
  • Clear it when PC is changed (JMP, JSR, B??, RTS, RTI)
  • If the flag is set, then taken BMI, BPL, BEQ, BNE should not mark the untaken side as code
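
The rules above could be approximated statically along these lines. This is only a sketch under simplifying assumptions, not how Mesen works: it handles immediate loads only (the absolute-ROM case is left out), treats a short hard-coded list of store opcodes as NZ-preserving, and conservatively forgets the known value on any other instruction. The name `untaken_side_is_code` is made up.

```python
LOAD_IMMEDIATE = {0xA9, 0xA2, 0xA0}        # LDA/LDX/LDY #imm
NZ_PRESERVING = {0x85, 0x86, 0x84,         # STA/STX/STY zero page
                 0x8D, 0x8E, 0x8C}         # STA/STX/STY absolute
BRANCH_TAKEN_IF = {
    0xF0: lambda v: v == 0,                # BEQ: taken when Z is set
    0xD0: lambda v: v != 0,                # BNE: taken when Z is clear
    0x30: lambda v: (v & 0x80) != 0,       # BMI: taken when N is set
    0x10: lambda v: (v & 0x80) == 0,       # BPL: taken when N is clear
}


def untaken_side_is_code(instructions):
    """instructions: list of (opcode, operand) pairs in straight-line order.

    Returns False only when the first NZ branch found is provably always taken.
    """
    known = None                           # value that last set NZ, if known
    for opcode, operand in instructions:
        if opcode in LOAD_IMMEDIATE:
            known = operand & 0xFF
        elif opcode in BRANCH_TAKEN_IF:
            always_taken = known is not None and BRANCH_TAKEN_IF[opcode](known)
            return not always_taken
        elif opcode not in NZ_PRESERVING:
            known = None                   # anything else may change NZ or PC
    return True                            # no branch seen: nothing to conclude
```

For the LDA #$00 / STA $09 / BEQ sequence from earlier in the thread, `untaken_side_is_code([(0xA9, 0x00), (0x85, 0x09), (0xF0, 0x10)])` returns `False`, i.e. the bytes after the BEQ would not be treated as code.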

PostPosted: Sun Feb 17, 2019 12:27 am

Joined: Wed Feb 13, 2019 2:10 pm
Posts: 6
Sour wrote:
There isn't a way to do this. Mesen will always assume something after a conditional branch is potential code, because this will be the case 99.9% of the time. The green color means that it's been marked as code, but hasn't been executed yet.

Changing this would make the debugger somewhat less user-friendly, since the alternative would require all branches to be taken before they're actually marked as code (which means they wouldn't be disassembled at all depending on your settings).


I appreciate the heuristics, just wondering if there was a way to switch it off since that branch is never going to execute.

Sour wrote:
I'm not sure what you mean by "throwing off the reading frame", though?


In the example I pasted, the last byte of the inappropriately disassembled data matches the opcode for CMP. The next two bytes are 0xA9 0x09-- which should be disassembled as LDA #$09 but 0xA9 ends up being treated as the operand for the CMP opcode.
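
To illustrate the reading-frame problem, here is a toy linear-sweep decoder that knows just two opcodes. The $C9 byte is an assumption (the post only says the stray byte matched "the opcode for CMP"; $C9 is the immediate form), and `linear_sweep` is a made-up name.

```python
OPCODES = {0xA9: "LDA", 0xC9: "CMP"}  # both immediate mode: opcode + 1 operand byte


def linear_sweep(data, start):
    """Decode sequentially from `start`; bytes that don't decode are emitted as .byte."""
    out = []
    pc = start
    while pc < len(data):
        op = data[pc]
        if op in OPCODES and pc + 1 < len(data):
            out.append(f"{OPCODES[op]} #${data[pc + 1]:02X}")
            pc += 2                    # consume the opcode and its operand
        else:
            out.append(f".byte ${op:02X}")
            pc += 1
    return out


data = bytes([0xC9, 0xA9, 0x09])
print(linear_sweep(data, 0))  # starts on the stray data byte
print(linear_sweep(data, 1))  # starts on the real instruction
```

Starting at offset 0, the $A9 is swallowed as the CMP operand and the LDA never appears; starting at offset 1, the same bytes decode as LDA #$09.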


PostPosted: Sun Feb 17, 2019 12:32 am

Joined: Tue Feb 07, 2017 2:03 am
Posts: 757
For this kind of work I use Regenerator (https://csdb.dk/release/?id=149429). It won't understand the NES's banking, so you would need to manually split the files up into banks. While Mesen is handy for working things out, I find Regenerator is better at making source code that will reassemble. For example, you could mark the CMP as a data byte, so that it is emitted as a byte and the LDA is then disassembled. It has zero smarts, so you need to tell it what is where. You can also set up Lo/Hi tables and have labels made for them, etc.

