Trying ROM Disassembly & getting lots of Invalid Opcodes

Dugongue
Posts: 5
Joined: Wed Apr 18, 2018 4:53 pm

Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by Dugongue »

Hi,
I'm trying to teach myself to read NES assembly code & I hit a snag.
I used disasm6 v1.5, both with and without use of a CDL file from FCEUX and the resultant ASM file is chock full of invalid Opcode errors.
Here is a sample section:

Code: Select all

            bvc __83aa         ; $83e8: 50 c0     
            cld                ; $83ea: d8        
            .hex ef a3 40      ; $83eb: ef a3 40  Invalid Opcode - ISC $40a3
            .hex cf 0c 72      ; $83ee: cf 0c 72  Invalid Opcode - DCP $720c
            bcs __83a6         ; $83f1: b0 b3     
            sbc #$9c           ; $83f3: e9 9c     
            .hex a3 02         ; $83f5: a3 02     Invalid Opcode - LAX ($02,x)
            .hex cf 20 09      ; $83f7: cf 20 09  Invalid Opcode - DCP $0920
            sbc $00            ; $83fa: e5 00     
            brk                ; $83fc: 00        
            .hex 6f 0c 7e      ; $83fd: 6f 0c 7e  Invalid Opcode - RRA $7e0c
            .hex d3 b3         ; $8400: d3 b3     Invalid Opcode - DCP ($b3),y
            sbc #$60           ; $8402: e9 60     
            .hex d4 04         ; $8404: d4 04     Invalid Opcode - NOP $04,x
            .hex 57 c0         ; $8406: 57 c0     Invalid Opcode - SRE $c0,x
            cld                ; $8408: d8        
            .hex 1f a4 0c      ; $8409: 1f a4 0c  Invalid Opcode - SLO $0ca4,x
            .hex 72            ; $840c: 72        Invalid Opcode - KIL 
            .hex d3 8b         ; $840d: d3 8b     Invalid Opcode - DCP ($8b),y
            .hex 12            ; $840f: 12        Invalid Opcode - KIL 
            lda $8c,x          ; $8410: b5 8c     
            pla                ; $8412: 68        
            bvs __83d0         ; $8413: 70 bb     
            .hex b3 e9         ; $8415: b3 e9     
I'm assuming that this isn't correct and that something went wrong?
I know that it's possible for this to happen if I have a bad ROM dump & I saw bootgod's site mentioned but I don't know how to check the CRC32 values of the file I have against the database.
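On the CRC32 question, here is a minimal Python sketch (not from the original post) that prints checksums you can compare by hand against a database entry. It assumes the dump is an iNES file with the usual 16-byte header, and it assumes the database lists the checksum of the ROM contents without that header, so it reports both values; the function names are made up for illustration, and a 512-byte trainer, if present, is not handled.

```python
import zlib

INES_HEADER_SIZE = 16  # iNES files begin with "NES\x1a" plus 12 more header bytes


def crc32_hex(data: bytes) -> str:
    """CRC32 of data as eight uppercase hex digits."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


def rom_crc32s(rom: bytes) -> dict:
    """Checksum the whole file and, if an iNES header is present, the bytes after it."""
    result = {"file": crc32_hex(rom)}
    if rom[:4] == b"NES\x1a":
        # Assumption: the database checksum excludes the emulator header.
        result["without_header"] = crc32_hex(rom[INES_HEADER_SIZE:])
    return result
```

Typical use would be `rom_crc32s(open("game.nes", "rb").read())`, where `game.nes` is a placeholder file name.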

Thanks
dougeff
Posts: 3078
Joined: Fri May 08, 2015 7:17 pm

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by dougeff »

Out of context, it's impossible to know what this is. Not code. Possibly data.

Probably better to step through the running game in a debugger, if you want to learn. Or better yet, look at some open-source properly labelled source code.

I've been reading 6502 asm for 30 years, and I can't just look at a random bit of compiled code and have any idea what I'm seeing. Sometimes I do, like when there's a STA to a sound register, I know it's part of the music code. But usually I don't know.
Last edited by dougeff on Thu May 24, 2018 4:33 pm, edited 1 time in total.
nesrocks
Posts: 563
Joined: Thu Aug 13, 2015 4:40 pm
Location: Rio de Janeiro - Brazil

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by nesrocks »

Do like dougeff said. With the emulator you can even hex edit the code/data 1 byte at a time in real time and see how the game changes (use savestates and breakpoints for this). Emulators recommended: Mesen and FCEUX.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by tokumaru »

I'm of the opinion that disassembling code is not a particularly efficient way to get started with learning assembly. A full commercial game can be pretty daunting, especially without labels and comments. Also, if you're not already acquainted with common game programming patterns, it'll be really hard to make any sense of the raw assembly code.

I can understand the sentiment of wanting to see what makes your favorite games tick, as opposed to dealing with boring introductory programs, but even experienced coders have a hard time making sense of disassembled code, so that's hardly a good path for people who are just getting started.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by koitsu »

That's data, not code. 100% certain. I can tell by the pattern. You get a kind of "gut feeling" for this stuff the more familiar you get with it; pretty sure everyone here can attest to that. :-)

You cannot simply take a ROM and disassemble it and expect "obvious" results. It doesn't work like that. It's up to you, the individual doing the reverse-engineering, to figure out what is code vs. data. Disassemblers tend to not cater well to this kind of thing**. You get to put in long, long, long hours. PRG doesn't always mean "program/code" either: many games use CHR-RAM, so graphical data is stored in PRG (and is sometimes compressed too; it varies per game). You get to RE all of it. :-)

You're better off if you can pinpoint something you want to reverse-engineer -- try to use breakpoints in FCEUX/Mesen/etc. to find out "where" the code might be, disassemble things (hopefully with the correct PRG bank and origin address), then look at that specific code. You will be working in the emulator in real-time *as well* as referring to the disassembly. Large games that involve mappers (i.e. switch PRG banks) make this a bit more complicated.

I could show you what I did with the Neo Demiforce FF2e/FF2j intro replacement, where I had to disassemble the last PRG bank (which represented $C000-FFFF) + inject my intro code and graphics + reassemble the entire thing (meaning I had to use a disassembler that generated code that could be reassembled by a compatible assembler), but I haven't gotten around to putting together something official/clear/concise that explains it. It's a mess -- RE'ing always is. Here's the directory listing and bat file, which won't make much sense to you, but gives you some idea of the mess. The disassembler I used was TRaCER for MS-DOS (I'm the author) and the assembler I used was x816 for MS-DOS. Note that neither of these programs works on present-day Windows:

Code: Select all

1998-08-23  13:38           116,397 C000.ASM
1998-08-23  13:48            16,384 C000.BIN
1998-08-23  13:48           553,527 C000.LST
1998-03-19  00:28           245,776 FF2-MOD.NES
1998-08-23  13:48           262,160 ff2-test.nes
1998-08-24  03:59             8,192 ff2-test.sav
1998-06-13  15:53           169,479 FF2-TEST.ZIP
1998-08-23  13:48             8,638 INTRO.ASM
1998-03-16  09:52               111 make.bat
1998-03-26  14:09             1,736 pal.txt
make.bat:

Code: Select all

@echo off
x816 -l c000.asm
del ff2-test.nes
copy /b ff2-mod.nes+c000.bin ff2-test.nes > nul
dir ff2-*.*
Other random brain dump stuff:

Part of the above process involved using fc /b to compare the newly-assembled ROM to that of the old, to ensure only the bytes that differed were the stuff I had changed. Back then hex editors like HxD didn't exist, but there were other tools that did it similarly; I just happen to be used to fc which came with DOS/Windows.

I recommend doing exactly that with a straight disassembly (i.e. no changes) as well: disassemble something, then reassemble it using a compatible assembler, and see if the results are identical. If they aren't, then you get to figure out why -- it's often the assembler picking some wrong addressing mode, an incompatibility between disassembler and assembler, or a downright bug in the assembler.
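The byte-for-byte comparison described above can be sketched in Python for anyone without `fc /b` at hand. This is an illustrative sketch, not the tooling used in the post; `diff_bytes` and `compare_files` are hypothetical names, and the output format is only loosely modeled on `fc /b` (offset, old byte, new byte).

```python
from pathlib import Path


def diff_bytes(old: bytes, new: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, old_byte, new_byte) for each position where the inputs differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(old, new)) if x != y]


def compare_files(old_path: str, new_path: str) -> None:
    """Print every differing byte between two files, plus a note if the sizes differ."""
    old = Path(old_path).read_bytes()
    new = Path(new_path).read_bytes()
    if len(old) != len(new):
        print(f"sizes differ: {len(old)} vs {len(new)} bytes")
    for offset, x, y in diff_bytes(old, new):
        print(f"{offset:08X}: {x:02X} {y:02X}")
```

For an unmodified disassemble-then-reassemble round trip, the expectation is that `diff_bytes` returns an empty list.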

If things match, great. After that, you get to try and start "piecing together" bits. That often means going and editing the disassembly by hand to deal with places where code and data are intermixed, turning the data into .db statements, and then manually fixing the code so it's legible. This often happens on a "data/code boundary", where a piece of data happens to look like a legitimate opcode (but isn't actually code) and so gets disassembled as an instruction with an operand, while the *actual* code is all of, or part of, those operand bytes. This is hard to describe in words, so here's an example. Take these addresses and their raw bytes:

Code: Select all

f9fb: d4
f9fc: 15
f9fd: e2
f9fe: ea
f9ff: 45
fa00: 4c
fa01: 9e
fa02: fa
These might get disassembled like so:

Code: Select all

f9fb: d4        .db $d4      ; invalid opcode
f9fc: 15 e2     ora $e2,x
f9fe: ea        nop
f9ff: 45 4c     eor $4c
fa01: 9e        .db $9e      ; invalid opcode
fa02: fa        .db $fa      ; invalid opcode
But after appropriate analysis/reverse-engineering, it's determined that $FA00 is actually code, while preceding bytes are data. So once cleaned up manually, you get this:

Code: Select all

f9fb: d4        .db $d4
f9fc: 15        .db $15
f9fd: e2        .db $e2
f9fe: ea        .db $ea
f9ff: 45        .db $45
fa00: 4c 9e fa  jmp $fa9e
How you determine this is through a combination of using an emulator to "walk through" the code in real-time to figure out what's actual code vs. data, and staring at the disassembly + following it carefully. In this example, I learned that there was code doing jmp $fa00 so I therefore knew $fa00 itself had to contain code -- then later found some code that read $f970 to $f9ff as data. Hence, I knew what was what.
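To make the boundary problem concrete, here is a small Python sketch of a naive linear sweep over the eight example bytes. It is not how disasm6 or TRaCER work internally; the opcode table holds only the four official opcodes that occur in this example, and anything else is emitted as `.db`, which is an assumption made to keep the sketch short.

```python
# Only the opcodes needed for the example; a real table covers every opcode.
OPCODES = {
    0x15: ("ora ${:02x},x", 2),
    0x45: ("eor ${:02x}", 2),
    0x4C: ("jmp ${:04x}", 3),
    0xEA: ("nop", 1),
}

BASE = 0xF9FB
EXAMPLE = bytes([0xD4, 0x15, 0xE2, 0xEA, 0x45, 0x4C, 0x9E, 0xFA])


def sweep(data: bytes, base: int, start: int, end: int) -> list[str]:
    """Decode instructions back to back from start up to (not including) end."""
    lines = []
    pc = start
    while pc < end:
        opcode = data[pc - base]
        fmt, size = OPCODES.get(opcode, (None, 1))
        if fmt is None or pc + size > end:
            # Unknown opcode, or the operand would run past the end: emit a raw byte.
            lines.append(f"{pc:04x}: .db ${opcode:02x}")
            pc += 1
            continue
        operand = int.from_bytes(data[pc - base + 1 : pc - base + size], "little")
        lines.append(f"{pc:04x}: " + fmt.format(operand))
        pc += size
    return lines


def as_data(data: bytes, base: int, start: int, end: int) -> list[str]:
    """Emit the range as raw .db bytes, for regions known to be data."""
    return [f"{a:04x}: .db ${data[a - base]:02x}" for a in range(start, end)]


# Sweeping from $f9fb swallows the jmp opcode as the operand of "eor".
naive = sweep(EXAMPLE, BASE, 0xF9FB, 0xFA03)
# Knowing that $f9fb-$f9ff is data and $fa00 is code gives the cleaned-up listing.
fixed = as_data(EXAMPLE, BASE, 0xF9FB, 0xFA00) + sweep(EXAMPLE, BASE, 0xFA00, 0xFA03)
```

The only difference between the two results is where decoding starts, which is exactly the information the emulator session and careful reading have to supply.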

I'll add that this was in the days *before* there were emulators with CDL capability... but it doesn't change a thing -- you still have to know what it is you're looking at, you can't rely on a machine to "magically figure it out".

If your reaction is "oh my god, doesn't that take a lot of time to do?!?!" the answer is a big fat yes. Welcome to reverse-engineering.

As for games which use mappers / multiple banks of PRG, quite often you have to split the ROM up into its individual PRG banks and disassemble each as needed. Some never-completed work I did on Rampart (Japanese release) required that (I've since deleted all of it, since someone made an English release last year or the year before, which made my work pointless), as did some work I was doing on RoboWarrior / Bomber King:

Code: Select all

2016-09-19  00:12           557,626 prg00.asm
2016-09-19  00:07            16,384 prg00.bin
2016-09-19  00:58           553,584 prg01.asm
2016-09-19  00:07            16,384 prg01.bin
2016-09-19  00:07            16,384 prg02.bin
2016-09-19  00:07            16,384 prg03.bin
2016-09-19  00:07            16,384 prg04.bin
2016-09-19  00:07            16,384 prg05.bin
2016-09-19  01:35           524,844 prg06.asm
2016-09-19  00:07            16,384 prg06.bin
2016-09-19  01:00           470,263 prg07.asm
2016-09-19  00:07            16,384 prg07.bin
2016-06-13  22:45           131,088 Robo Warrior (U) [!].nes
2016-09-19  00:09           131,072 rom.bin
RoboWarrior is a mapper 2 / UNROM game, so the last 16KB bank is hard-wired to $C000-FFFF. You'll probably notice how I disassembled certain banks: prg00, prg01, and prg06 all had scattered code I was needing to examine/RE, and of course, prg07 (the last bank, thus a good starting point since it contains all 6502 vectors; for mappers like this it tends to contain "the guts" with critical routines that get used everywhere, etc.). Started with the last bank, then gradually worked through the code (emulator + disassembler combined) until I figured out what other PRG banks I needed to look at, disassembled them, blah blah. The disassembler I used for this was disasm6 v1.5. There's no .bat file because I was doing everything manually by hand.
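Splitting a ROM into 16KB PRG banks like the listing above can be sketched in Python as follows. This is not the procedure used in the post (which was done by hand); it assumes a plain iNES file, reads the PRG bank count from header byte 4, skips a 512-byte trainer if header byte 6 flags one, and the function names and `prgNN.bin` naming are chosen only to mirror the listing.

```python
from pathlib import Path

INES_HEADER_SIZE = 16
TRAINER_SIZE = 512
PRG_BANK_SIZE = 16 * 1024


def split_prg_banks(rom: bytes) -> list[bytes]:
    """Return the PRG ROM of an iNES image as a list of 16KB banks."""
    if rom[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    bank_count = rom[4]                # header byte 4: PRG size in 16KB units
    has_trainer = bool(rom[6] & 0x04)  # header byte 6, bit 2: trainer present
    start = INES_HEADER_SIZE + (TRAINER_SIZE if has_trainer else 0)
    prg = rom[start : start + bank_count * PRG_BANK_SIZE]
    if len(prg) != bank_count * PRG_BANK_SIZE:
        raise ValueError("file is shorter than the header claims")
    return [prg[i : i + PRG_BANK_SIZE] for i in range(0, len(prg), PRG_BANK_SIZE)]


def write_banks(rom_path: str, out_dir: str = ".") -> None:
    """Write prg00.bin, prg01.bin, ... next to each other in out_dir."""
    banks = split_prg_banks(Path(rom_path).read_bytes())
    for number, bank in enumerate(banks):
        Path(out_dir, f"prg{number:02d}.bin").write_bytes(bank)
```

A 131,088-byte file with no CHR ROM, like the one in the listing, would come out as eight banks (16 header bytes plus 8 x 16,384). When disassembling a mapper 2 game, the last bank would use an origin of $C000 and the switchable banks an origin of $8000.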

Starting to get the picture?

** Warning to pedants: yes I'm aware of a PHP-based disassembler that can read/use CDL files. You cannot tout this as a one-line "solution", and shouldn't. Because then you have to explain how to generate a CDL file, and how an emulator generates that file, and how the emulator determines code vs. data, and how the person playing the game while generating the CDL has to do every single thing possible in the game, down to minuscule things like loading a menu from point A vs. point B, pressing up vs. down vs. left vs. right in every place in the game including pause screens, etc. -- when I say LITERALLY DO EVERY SINGLE THING, I do mean LITERALLY **EVERY SINGLE THING**.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by Oziphantom »

There must be some "manual" for 6502 in Japanese that shows how to do the "read data past the JSR with stack manipulations" as for some reason a lot of Japanese games I look at do it. so you get
JSR somePlace
.word Address, Address, Address, Address
code

Just to make things more confusing, this trend seems to carry over to the SNES, where there are multiple ways to skin the cat...

But basically, as said above, reversing a 6502 game is a "grand-master" task: you really need to know how code works and flows, understand all the archaic tricks the programmers might have used, be able to make judgment calls on what is code and what is data, find tables, etc. However, you need to understand the data, and then you will understand the code. Also, using something interactive like Regenerator makes it a fair bit faster to edit and add variables.
AWJ
Posts: 433
Joined: Mon Nov 10, 2008 3:09 pm

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by AWJ »

Oziphantom wrote:There must be some "manual" for 6502 in Japanese that shows how to do the "read data past the JSR with stack manipulations" as for some reason a lot of Japanese games I look at do it. so you get
JSR somePlace
.word Address, Address, Address, Address
code
On the 6502, this technique goes all the way back to the Apple II firmware written by Woz. It's not unusual on the Z80 either.
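The address arithmetic behind this trick is easy to get off by one, so here is a small Python model of it. It relies on two facts about the 6502: JSR pushes the address of the last byte of the JSR instruction, and RTS pulls that address and adds one. Everything else (the function name, the sample bytes, a table of two words) is made up for illustration and does not come from any particular game.

```python
def read_inline_words(memory: bytes, base: int, pushed_return: int, count: int):
    """Model a callee that reads `count` 16-bit words placed right after its JSR.

    memory[0] lives at address `base`; pushed_return is the address JSR pushed.
    Returns the words and the return address the callee must push back so that
    RTS resumes execution just past the inline table.
    """
    first = pushed_return + 1  # the table starts right after the JSR instruction
    words = []
    for i in range(count):
        offset = first - base + 2 * i
        words.append(int.from_bytes(memory[offset : offset + 2], "little"))
    new_return = pushed_return + 2 * count  # RTS will add 1 to this
    return words, new_return


# Hypothetical bytes at $8000: jsr $9000 / .word $8123, $8456 / nop
SAMPLE = bytes([0x20, 0x00, 0x90, 0x23, 0x81, 0x56, 0x84, 0xEA])
words, new_return = read_inline_words(SAMPLE, 0x8000, 0x8002, 2)
```

A disassembler that does not know about the convention will try to decode the `.word` table as instructions, which is one more way to end up with runs of invalid opcodes.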

Dragon Warrior 3 and 4 actually do this with BRK! Effectively they use BRK as a prefix for a set of software-defined instructions. The most frequently used one, as you might expect, is the "far JSR" (call a subroutine in a different ROM bank, storing the return bank on the stack) but some of the software-defined instructions are more high-level things like playing sound effects, printing messages, and getting/setting character status (HP, experience, etc.)

If you use BRK this way on the NES, your NMI handler has to detect the NMI hijack condition, and if necessary chain into your BRK handler rather than RTI into invalid code. The way the Dragon Warrior games do this is by encoding the software-defined instructions such that the first byte after the BRK always has bits 0 and 1 set--which no legal 6502 instruction does. The NMI handler epilogue peeks at the byte at the return address, and if it is not a 6502 instruction then it jumps to the BRK handler instead of executing an RTI.
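The check described above comes down to a two-bit mask. The Python sketch below only illustrates the test itself; the function names are hypothetical, and the real handler is of course 6502 code in the game, which is not reproduced here.

```python
def is_software_op(byte_at_return_address: int) -> bool:
    """True if bits 0 and 1 are both set, which no official 6502 opcode has."""
    return (byte_at_return_address & 0x03) == 0x03


def nmi_epilogue_action(byte_at_return_address: int) -> str:
    """Decide what the NMI epilogue does, following the description above."""
    if is_software_op(byte_at_return_address):
        return "jump to BRK handler"
    return "rti"
```

For example, `nmi_epilogue_action(0xA9)` picks `rti` because $A9 (LDA immediate) is an ordinary instruction, while a byte such as $03 or $EF is routed to the BRK handler.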
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: 🇫🇮

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by thefox »

If you want to learn to read NES (6502) assembly by using this method, take one of the commented disassemblies, like http://www.romhacking.net/documents/367/ (Metroid). You can find disassemblies of many other commercial games on that site, with varying levels of completeness.
CaH4e3
Posts: 71
Joined: Thu Oct 13, 2005 10:39 am

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by CaH4e3 »

A little bit late, but Dragon Warrior 3 has been fully disassembled and somewhat RE'ed: https://github.com/zeromus/DragonWarrior3

The special BRK opcodes have been turned into macros and are much easier to read than the original commands.
frantik
Posts: 377
Joined: Tue Mar 03, 2009 3:56 pm

Re: Trying ROM Disassembly & getting lots of Invalid Opcodes

Post by frantik »

That's data that disasm6 doesn't recognize as data for whatever reason, so it treats it as code. disasm6 tries to guess what is data based on whether the memory area is read using lda or similar. This chunk must not be read using those methods, and the emulator must not have encountered it while you recorded the CDL file. It could just be dummy bytes.
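One way to see how much of the ROM the CDL file actually covers is to count its flags. The Python sketch below assumes the FCEUX CDL layout as I understand it from the emulator's documentation: one flag byte per PRG byte at the start of the file, with bit 0 meaning "executed as code" and bit 1 meaning "read as data". Treat that layout as an assumption and check it against your FCEUX version; the function names are made up.

```python
CDL_CODE = 0x01  # assumed: bit 0 set when the byte was executed as code
CDL_DATA = 0x02  # assumed: bit 1 set when the byte was read as data


def cdl_summary(cdl: bytes, prg_size: int) -> dict:
    """Count PRG bytes logged as code, as data only, or never touched."""
    prg = cdl[:prg_size]
    code = sum(1 for flags in prg if flags & CDL_CODE)
    data_only = sum(1 for flags in prg if (flags & CDL_DATA) and not (flags & CDL_CODE))
    unlogged = sum(1 for flags in prg if not (flags & (CDL_CODE | CDL_DATA)))
    return {"code": code, "data_only": data_only, "unlogged": unlogged}


def unlogged_ranges(cdl: bytes, prg_size: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) PRG offsets that were never logged."""
    prg = cdl[:prg_size]
    ranges = []
    start = None
    for offset, flags in enumerate(prg):
        logged = flags & (CDL_CODE | CDL_DATA)
        if not logged and start is None:
            start = offset
        elif logged and start is not None:
            ranges.append((start, offset - 1))
            start = None
    if start is not None:
        ranges.append((start, len(prg) - 1))
    return ranges
```

Any offset that falls inside an unlogged range is one the disassembler has to guess about, so a chunk like the one in the first post landing in such a range would fit the explanation above.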