Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:02 pm
by PolarBITS
I've been mapping out hex opcodes to instructions for the last couple of days. When I check my ROM, though, the last opcode seems to be cut off: it's the last byte of the ROM, but it takes a 2-byte address as an argument. Did I make an error somewhere in mapping out the modes for each opcode?

Here's what I have: https://pastebin.com/4Ga9AuJu

Also as far as I'm aware, these are the byte lengths of the instructions (opcodes and arguments included):

Implied: 1,
Accumulator: 1,
Immediate: 2,
Zero Page: 2,
Zero Page, X: 2,
Zero Page, Y: 2,
Absolute: 3,
Absolute, X: 3,
Absolute, Y: 3,
(Indirect): 3,
(Indirect, X): 2,
(Indirect), Y: 2

Where did I make my mistake?
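
For reference, the lengths listed above can be written as a small lookup table. This is only a sketch: the names `MODE_LENGTH` and `instruction_length` are made up for illustration, and the `relative` entry is an addition that is not in the list above (6502 branch instructions take a one-byte signed offset, so they are 2 bytes long).

```python
# Instruction length in bytes (opcode + operand) for each 6502 addressing mode.
# Key names are hypothetical; "relative" is added for branch instructions.
MODE_LENGTH = {
    "implied": 1,
    "accumulator": 1,
    "immediate": 2,
    "zp": 2,
    "zpX": 2,
    "zpY": 2,
    "relative": 2,   # branches: opcode + signed 8-bit offset
    "abs": 3,
    "absX": 3,
    "absY": 3,
    "indirect": 3,
    "indirectX": 2,
    "indirectY": 2,
}


def instruction_length(mode: str) -> int:
    """Return the total byte length of an instruction using this mode."""
    return MODE_LENGTH[mode]
```

The lengths in the post match this table, so the length table by itself is probably not the cause of a cut-off final instruction.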

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:19 pm
by lidnariq
Your mistake is in thinking that every byte in the ROM is machine code and not other data or metadata.

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:27 pm
by PolarBITS
Where do I set the cutoff for where the last opcode is then? I know where the first one is based on the reset vector at offset $FFFC, but is there another one for the end of the machine code?
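
A minimal sketch of reading the vectors, assuming `prg` is the raw PRG data (without the 16-byte iNES header) and that its end is mapped at CPU address $FFFF, as on a simple NROM board. Note that $FFFC is a CPU address, not a file offset. The function name `read_vectors` is hypothetical.

```python
def read_vectors(prg: bytes) -> dict:
    """Read the three 6502 vectors from the last six bytes of PRG data.

    Assumes the end of `prg` is mapped at CPU address $FFFF.
    """
    if len(prg) < 6:
        raise ValueError("PRG data is too short to contain the vectors")

    def word(i: int) -> int:
        # 6502 addresses are stored little-endian: low byte first.
        return prg[i] | (prg[i + 1] << 8)

    n = len(prg)
    return {
        "nmi": word(n - 6),    # $FFFA-$FFFB
        "reset": word(n - 4),  # $FFFC-$FFFD
        "irq": word(n - 2),    # $FFFE-$FFFF
    }
```

The vectors only say where execution starts; nothing in them marks where code ends.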

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:30 pm
by Myask
You're missing "relative" mode…no, you just call branch instructions' mode "immediate" for some reason.
The following lines of your opcode table appear messed up:
'0d':['ORA', 'abs', '4+'], //should be 4
'1d':['ORA', 'absX'], //should be 4+, has nothing
'3d':['AND', 'absX'], //should be 4+, has nothing
'b6':['LDX', 'zpX', '4'], //X-load with z,x? should be zpY
'be':['LDX', 'absX', '4+'], //x-load with a,x? should be absY
'f6':['INC', 'absX', '6'], //should be zpX
PPE: any byte can be executed. Not all will be.
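
As a sketch, here are the six entries with the corrections above applied, written as a Python dict in the same `[mnemonic, mode, cycles]` layout, plus a small check that would have caught the two entries with a missing cycle count. The names `ORIGINAL`, `CORRECTED` and `find_incomplete` are hypothetical.

```python
# The six entries as quoted above, before correction.
ORIGINAL = {
    "0d": ["ORA", "abs", "4+"],
    "1d": ["ORA", "absX"],
    "3d": ["AND", "absX"],
    "b6": ["LDX", "zpX", "4"],
    "be": ["LDX", "absX", "4+"],
    "f6": ["INC", "absX", "6"],
}

# The same entries with the corrections applied.
CORRECTED = {
    "0d": ["ORA", "abs", "4"],
    "1d": ["ORA", "absX", "4+"],
    "3d": ["AND", "absX", "4+"],
    "b6": ["LDX", "zpY", "4"],    # LDX indexes with Y, not X
    "be": ["LDX", "absY", "4+"],
    "f6": ["INC", "zpX", "6"],
}


def find_incomplete(table: dict) -> list:
    """Return the opcodes whose entry lacks one of the three fields."""
    return [op for op, entry in table.items() if len(entry) != 3]


print(find_incomplete(ORIGINAL))   # ['1d', '3d']
print(find_incomplete(CORRECTED))  # []
```

A check like this only catches missing fields; wrong modes such as `zpX` on an LDX still have to be compared against an opcode reference.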

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:32 pm
by lidnariq
Nope.

Unlike modern operating systems and program containers, which bother to distinguish between "stuff that the CPU will execute" and "read only data that the CPU will refer to", the NES shoves it all into a single chunk of data. The only ways to handle this are to either make a tracing disassembler or give up.

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:38 pm
by PolarBITS
Myask wrote:You're missing "relative" mode…no, you just call branch instructions' mode "immediate" for some reason.
The following lines of your opcode table appear messed up:
'0d':['ORA', 'abs', '4+'], //should be 4
'1d':['ORA', 'absX'], //should be 4+, has nothing
'3d':['AND', 'absX'], //should be 4+, has nothing
'b6':['LDX', 'zpX', '4'], //X-load with z,x? should be zpY
'be':['LDX', 'absX', '4+'], //x-load with a,x? should be absY
'f6':['INC', 'absX', '6'], //should be zpX
PPE: any byte can be executed. Not all will be.

Thanks, it didn't fix my problem but it's good I got corrected.

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:38 pm
by PolarBITS
lidnariq wrote:Nope.

Unlike modern operating systems and program containers, which bother to distinguish between "stuff that the CPU will execute" and "read only data that the CPU will refer to", the NES shoves it all into a single chunk of data. The only ways to handle this are to either make a tracing disassembler or give up.
How would I go about doing this? Is there a way to distinguish between data and machine code?

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:43 pm
by rainwarrior
PolarBITS wrote:How would I go about doing this? Is there a way to distinguish between data and machine code?
Run the program. Anything that gets executed is code. That's basically it. (Self modifying code and/or code copied to RAM is an exception, though.)

FCEUX has a "code data log" feature that does this as you play. The main problem is trying to get all code branches executed: you generally have to play the entire game and try to make everything that can possibly happen happen.

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:46 pm
by PolarBITS
rainwarrior wrote:
PolarBITS wrote:How would I go about doing this? Is there a way to distinguish between data and machine code?
Run the program. Anything that gets executed is code. That's basically it. (Self modifying code and/or code copied to RAM is an exception, though.)

FCEUX has a "code data log" feature that does this as you play. The main problem is trying to get all code branches executed: you generally have to play the entire game and try to make everything that can possibly happen happen.
What do you mean by running the program?

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 3:51 pm
by Myask
The NES is a von Neumann/Princeton architecture (one shared memory for code and data), not a Harvard one (separate data and instruction memories).

So the main way to tell is, as lidnariq said, to write a tracing disassembler and run the code so that, when an instruction executes, it can say "this byte was an instruction". Granted, you don't actually have to run it to do so; you can just step through by instruction widths. (I mean, you're inferring from the entry point which bytes are instructions and which are arguments.)

The other main way to guess "this is data" (not infallible, since a few games do use them, but most don't) is to presume all unofficial opcodes are data.

(ed: but yes, the last six bytes are probably never going to be opcodes; they hold the NMI, reset and IRQ vectors at $FFFA-$FFFF)
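
A rough sketch of the "step through by instruction widths" idea, without running anything: start at an entry point, mark each opcode byte, and follow branches, jumps and calls. The `OPCODES` table here is a deliberately tiny, hypothetical subset, and `trace` is an illustrative name. A real tool needs the full opcode table and still cannot follow indirect jumps or jump tables this way.

```python
# Tiny hypothetical subset: opcode -> (mnemonic, length in bytes, kind).
OPCODES = {
    0xA9: ("LDA #imm", 2, "normal"),
    0x8D: ("STA abs", 3, "normal"),
    0xEA: ("NOP", 1, "normal"),
    0xD0: ("BNE rel", 2, "branch"),
    0x4C: ("JMP abs", 3, "jump"),
    0x20: ("JSR abs", 3, "call"),
    0x60: ("RTS", 1, "stop"),
}


def trace(rom: bytes, base: int, entry: int) -> set:
    """Return the CPU addresses reached as opcodes from `entry`.

    `base` is the CPU address of rom[0].
    """
    end = base + len(rom)
    code = set()
    work = [entry]
    while work:
        pc = work.pop()
        while base <= pc < end and pc not in code:
            op = rom[pc - base]
            if op not in OPCODES:
                break  # not in our table: stop following this path
            _, length, kind = OPCODES[op]
            if pc + length > end:
                break  # operand would run past the end: not a real instruction
            code.add(pc)
            off = pc - base
            if kind == "branch":
                rel = rom[off + 1]
                if rel >= 0x80:
                    rel -= 0x100  # signed 8-bit offset
                work.append(pc + 2 + rel)  # taken path; fall through below
            elif kind in ("jump", "call"):
                work.append(rom[off + 1] | (rom[off + 2] << 8))
                if kind == "jump":
                    break  # JMP never falls through
            elif kind == "stop":
                break
            pc += length
    return code


# LDA #1 / BNE $8007 / JMP $8000 / RTS, followed by two data bytes.
rom = bytes([0xA9, 0x01, 0xD0, 0x03, 0x4C, 0x00, 0x80, 0x60, 0x12, 0x34])
print([hex(a) for a in sorted(trace(rom, 0x8000, 0x8000))])
# ['0x8000', '0x8002', '0x8004', '0x8007']
```

The two trailing bytes are never reached, so they are never misread as instructions. The same applies to a final byte that looks like a 3-byte opcode.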

Re: Incorrect opcode byte lengths for some reason

Posted: Fri May 05, 2017 4:07 pm
by tokumaru
PolarBITS wrote:Where do I set the cutoff for where the last opcode is then?
There's no cutoff, code and data are intermixed throughout the ROM. There isn't even anything preventing the same bytes from being used as both code AND data.

The only way to know for sure is to execute the code, emulating the 6502. But even then there are things that are hard to catch.

Re: Incorrect opcode byte lengths for some reason

Posted: Mon May 08, 2017 2:27 am
by Oziphantom
And then you get the fun case like

BNE +
LDA #1
.byte $2C
+
LDA #2
STA ADDR
So the same bytes are both data and code at the same time: when the branch isn't taken, the $2C (BIT absolute) swallows the LDA #2 as its operand.
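
To make the overlap concrete, here is a small sketch that decodes the assembled bytes of the snippet above from two starting points. ADDR is assumed to be $0200 purely for illustration, and the tables cover only the four opcodes involved.

```python
# Lengths and names for just the opcodes used in this example.
LENGTHS = {0xD0: 2, 0xA9: 2, 0x2C: 3, 0x8D: 3}
NAMES = {0xD0: "BNE", 0xA9: "LDA #", 0x2C: "BIT abs", 0x8D: "STA abs"}

# BNE +3 / LDA #1 / .byte $2C / LDA #2 / STA $0200 (ADDR assumed to be $0200)
CODE = bytes([0xD0, 0x03, 0xA9, 0x01, 0x2C, 0xA9, 0x02, 0x8D, 0x00, 0x02])


def decode_from(data: bytes, start: int) -> list:
    """Decode linearly from `start`, returning (offset, mnemonic) pairs."""
    out = []
    pc = start
    while pc < len(data):
        op = data[pc]
        out.append((pc, NAMES[op]))
        pc += LENGTHS[op]
    return out


# Branch not taken: the $2C turns the next two bytes into a BIT operand.
print(decode_from(CODE, 0))
# [(0, 'BNE'), (2, 'LDA #'), (4, 'BIT abs'), (7, 'STA abs')]

# Branch taken: execution lands on offset 5 and runs LDA #2.
print(decode_from(CODE, 5))
# [(5, 'LDA #'), (7, 'STA abs')]
```

Bytes 5 and 6 are an operand on one path and an instruction on the other. BIT only changes flags, so the fall-through path stores 1 and the taken path stores 2.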