Branches aren't commonly used to jump across very distinct memory regions like that (PRG to RAM, or between PRG banks.) Not impossible that someone would want to, but extremely unlikely to do it with a branch. JMP or JSR is probably what you'd see doing that.
However, to get back to your question: branches and jumps may target RAM, or other places. Regardless of that, your disassembly should show what the code does whether or not it "makes sense". Often the point of doing a disassembly is to find a bug that was caused by just such a lapse of sensibility. The disassembler's function should be plain, not subject to complex interpretation of a code's intention.
In most disassembly environments, it's important that the user can either mark bytes as code (to be disassembled into opcodes) or as data (to be left alone), or, failing this, reposition their current view of the disassembly to align on something they know is an instruction. Something like FCEUX just does a naive disassembly from the top of what's in view, and if there's something "weird" there, it can usually be fixed by moving the view up or down a byte or two until the code comes into alignment. Because of all the 1-byte instructions, naively disassembled 6502 code tends to self-align after a few lines anyway. If this is an offline tool, just give the user ways to specify what part of the file is code or data.
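To make the "naive disassembly plus user-marked data" idea concrete, here is a rough Python sketch. The tiny `OPCODES` table, the `linear_sweep` name and the `data_ranges` parameter are all made up for illustration; a real tool would need the full 6502 opcode table and a nicer way of marking ranges.

```python
# Hypothetical, deliberately tiny opcode table: opcode -> (text, total length in bytes).
# Only a handful of the official 6502 opcodes are listed, purely for illustration.
OPCODES = {
    0xA9: ("lda #imm", 2),
    0xAD: ("lda abs", 3),
    0xC9: ("cmp #imm", 2),
    0xD0: ("bne rel", 2),
    0xEA: ("nop", 1),
    0x60: ("rts", 1),
}

def linear_sweep(data, base, data_ranges=()):
    """Naively disassemble `data` from top to bottom.

    data_ranges is an iterable of (start, end) CPU addresses, end exclusive,
    that the user has marked as data. Returns a list of (address, text) rows.
    """
    rows = []
    i = 0
    while i < len(data):
        addr = base + i
        byte = data[i]
        marked_data = any(start <= addr < end for start, end in data_ranges)
        entry = None if marked_data else OPCODES.get(byte)
        if entry is None or i + entry[1] > len(data):
            # User-marked data, unknown opcode, or an instruction cut off by
            # the end of the buffer: emit the raw byte and move on by one.
            rows.append((addr, ".db $%02x" % byte))
            i += 1
        else:
            text, length = entry
            raw = " ".join("%02x" % b for b in data[i:i + length])
            rows.append((addr, "%s ; %s" % (text, raw)))
            i += length
    return rows
```

Feeding it the bytes `ad 34 12 c9 10 d0 f9 ea` at $8000 with the first byte marked as data gives three `.db` rows and then falls back into alignment at $8003, which is the self-aligning behaviour described above. The sketch only checks the first byte of an instruction against the marked ranges, so an instruction could still run into a data range; that is one of the things a real tool would have to decide how to handle.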
I think conceptually it's valid to think of a branch operand as analogous to an immediate one, sort of in the way that ADC can take an immediate and add it to A, a branch can take this "immediate" and add it to PC? If you want correct terminology though, this is not the name to use, and # is not the notation to use either.
For a disassembly, the standard thing is to use either a label or just the address as the operand, i.e. show where the branch goes, not the actual value of its operand. "BEQ label" or "BEQ $7F05"
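As a small illustration of "label or address", something like the following would do. `format_branch` and the `labels` dictionary are hypothetical names, and the target address is assumed to have been worked out already.

```python
def format_branch(mnemonic, target, labels=None):
    """Render a branch with its destination as the operand.

    labels is an optional {address: name} mapping supplied by the user or
    built by the tool; if the target has no label, the raw address is shown.
    """
    labels = labels or {}
    operand = labels.get(target, "$%04X" % target)
    return "%s %s" % (mnemonic, operand)
```

So `format_branch("BEQ", 0x7F05)` gives "BEQ $7F05", and with `{0x7F05: "label"}` passed in it gives "BEQ label".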
CrowleyBluegrass wrote: Should I just interpret all BRK instructions two bytes, with the second as a .db into the disassembly? I assume "being off" means that every BRK instruction is followed by what my assembler thinks is another "opcode" (UNDEFINED above).
The meaning of the byte after a $00 is entirely dependent on how the interrupt is handled. Different programs require different things here. Some will skip the byte, some will adjust the return address, some won't return at all from the interrupt.
There is no solution here except: let the user specify what to do in some way (like I suggested above manually marking bytes as an opcode, manually aligning the current auto-disassembly view, etc.). This is not necessarily a program-wide behaviour either. Case by case resolution will be needed sometimes.
The suggestion to disassemble BRK as a data byte is mostly a concession for if you want to disassemble and then re-assemble. The treatment of BRK varies between assemblers, sometimes as 1-byte, sometimes as 2, but a data byte is unambiguous. If your goal isn't to make output that can be re-assembled, then you're allowed to notate BRK however you want, I suppose, but aside from allowing some user interaction to specify, all I can suggest is: don't do something that hides the following byte from the user.
koitsu wrote: In all my years I've never seen someone use branch instructions and while using them "have to worry about if the ROM has enough room left for the branch". This doesn't really make any sense. Can you explain what exactly you're talking about here?
Sorry if I'm causing confusion, I'm obviously missing something. I made an error regarding how the disassembly would be displayed. I put #$ to mean the offset amount to be applied which, although it would be the byte in the ROM, is not what the disassembly would display. It would work out the address and print the address, obviously. Apologies.
Having this quote in my mind:

koitsu wrote:
Code:
8000: lda $1234 ; 8000: ad 34 12
8003: cmp #$10 ; 8003: c9 10
8005: bne $8000 ; 8005: d0 f9
8007: nop ; 8007: ea
As stated, f9 is a signed byte, which is -7. If you count 7 bytes backwards from $8007, you'll get $8000...

I meant that whenever the disassembler comes upon a branch instruction, the address itself is calculated by taking the address of the next instruction after the branch, and applying the offset (which is the operand to the branch) to that address. What I was trying to get across (and failing, I'm afraid) is the event in which a branch instruction is encountered, but either:
a) there isn't even a "next instruction" to apply the offset to
b) the next instruction address, with the offset applied to it, would end up branching either before the PRG segment, after it, or some other "dubious" place (the definition of "dubious" being what I was unsure about also).
I think rainwarrior answered my overall question above. Just work out the address, and put it in the disassembly. Whether it's "valid" or not is not the disassembler's concern. Still, I'd appreciate it if you could point out any errors in my reasoning; I fear there are quite a few lurking still. However, I suppose this project is having the intended effect: (slowly and) steadily causing me to become more aware of how everything actually works.
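For what it's worth, the calculation described above (address of the next instruction plus the signed operand) might be sketched like this in Python. The function names and the `base`/`size` parameters are just assumptions for the example. Case a) is handled by returning `None` when the operand byte isn't there, and case b) is left to the caller: the target is always computed, and `in_region` only tells you whether it lands inside the region being disassembled.

```python
def branch_target(branch_addr, data, base):
    """Target of the 2-byte relative branch located at CPU address branch_addr.

    data holds the bytes being disassembled and base is the CPU address of
    data[0]. Returns None if the operand byte is missing (truncated input).
    """
    i = branch_addr - base
    if i < 0 or i + 1 >= len(data):
        return None
    operand = data[i + 1]
    # The operand is a signed 8-bit value: $80-$FF mean -128 to -1.
    offset = operand - 0x100 if operand >= 0x80 else operand
    # The offset is applied to the address of the next instruction, and the
    # 6502's program counter is 16 bits wide, so the result wraps at $FFFF.
    return (branch_addr + 2 + offset) & 0xFFFF

def in_region(addr, base, size):
    """True if addr falls inside the region [base, base + size)."""
    return base <= addr < base + size
```

With koitsu's bytes loaded at $8000, `branch_target(0x8005, data, 0x8000)` comes out as $8000, matching the "count 7 bytes backwards from $8007" explanation. A branch whose target falls outside the region would still be printed; `in_region` could just be used to add a comment or warning next to it.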
As for brk: I've always advocated strongly that it be interpreted as 2 bytes for one good reason: the CPU actually increases PC by 2 when handling brk. Quoting Lichty/Eyes (and now WDC since they own the book):

"Although BRK is a one-byte instruction, the program counter (which is pushed onto the stack by the instruction) is incremented by two; this lets you follow the break instruction with a one-byte signature byte indicating which break caused the interrupt. Even if a signature byte is not needed, either the byte following the BRK instruction must be padded with some value or the break-handling routine must decrement the return address on the stack to let an RTI (return from interrupt) instruction executed correctly."

However, if you read other 6502 books/documents, or even look at old (circa 80s) disassemblers, you'll see that many of them insist it's a 1-byte instruction as a whole and that the PC+2 aspect is just a runtime nicety. As such, it is actually somewhat common -- at least on the Apple II series, which is not the platform you're focused on -- to find code where the programmer simply did brk without a signature byte, followed by code that was used normally (not just in the BRK handler!) except the first byte would be effectively skipped. This is a crappy example, but you'll see what I mean:
Code:
L_8000: lda $ee
        cmp #$16
        bne Somewhere
        brk
L_8007: lda #$ea
        sta $20fe
        ; ...
        ; Other legit code from this point on, blah blah blah
        ; ...
        jmp L_8007
BRK usage is pretty rare, even on the Apple II (don't know about other home computers). Most programmers would use it as a kind of "welp, everything is screwed up" situation and thus really don't intend to recover well from the situation, if they bothered implementing a BRK handler at all. In my experience, most didn't/don't -- buggy programs would literally crash the system and the results would vary depending on infinite criteria.
If you **really** want to do something unique, offer a flag/switch (e.g. --brk1byte) to treat brk as 1-byte and just emit brk when $00 is encountered. There may be cases where people might want that (probably more common if your tool was used on a non-NES platform).
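A rough sketch of how such a switch might be wired up in Python follows. The `--brk1byte` name comes from the suggestion above; `decode_brk`, `parse_args` and the exact output text are made up for the example, and the output is meant for display rather than guaranteed re-assembly (assemblers differ on whether `brk` emits 1 or 2 bytes).

```python
import argparse

def decode_brk(data, i, one_byte=False):
    """Decode the $00 at data[i]; returns (lines, bytes_consumed).

    By default brk is treated as 2 bytes and the following byte is shown as
    data so it stays visible to the user. With one_byte=True, or when there
    is no following byte, only brk itself is consumed.
    """
    if data[i] != 0x00:
        raise ValueError("not a BRK opcode")
    if one_byte or i + 1 >= len(data):
        return ["brk"], 1
    return ["brk", ".db $%02x ; signature byte" % data[i + 1]], 2

def parse_args(argv):
    parser = argparse.ArgumentParser(description="toy 6502 disassembler options")
    parser.add_argument("--brk1byte", action="store_true",
                        help="treat brk as a 1-byte instruction")
    return parser.parse_args(argv)
```

For the L_8000 example above, the default mode would swallow the $a9 after brk as a signature byte, while `--brk1byte` would let the disassembly continue with `lda #$ea` at L_8007. Neither is right for every program, which is why it probably has to be the user's call.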
I'll end on this note: it is very common for romhackers/etc. to disassemble a game, end up with a 1-byte-misalignment (due to code vs. data), split that part of the ROM file known to be code into its own BIN file, then re-disassemble that so that they get correct assembly output. What rainwarrior said is spot on: your tool should be "smart" but should not operate under the pretense of "trying to do everything" -- disassemblers CAN'T do everything because they're disassemblers, not emulators. But offer a good set of features and you'll find people will use your software with joy; that's what I found with TRaCER, anyway. I can't tell you how many times I've had to hand-modify disassembled output to fix that alignment. I'm not complaining, but it's *very VERY* common.