Multi-pass assembler questions

tokumaru
Posts: 11692
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Multi-pass assembler questions

Post by tokumaru » Thu Jan 02, 2020 10:17 pm

Sometimes I still toy with the idea of writing my own assembler, but some things about how multi-pass assemblers work still confuse me.

1- When a not-yet-defined symbol is used in an expression, what do you do? The expression needs to evaluate to something, or the line of code it comes from can't be assembled, but you can't just return any arbitrary number because depending on how that value is used you could end up with an invalid operation, like a division by 0, an .org to a lower address, or something like that. How would one deal with that?

2- Also related to not-yet-defined symbols, how do you pick an addressing mode when such a symbol is used as part of the operand? The calculation of future addresses depends on the size of the current instruction being emitted, and while it's expected that earlier passes don't always generate the correct instructions, the assembler has to emit something that doesn't cause future addresses to be completely off. Should the smallest possible form of each instruction be chosen?

These are the questions that are bothering me at the moment, but the whole concept of how things can change drastically from one pass to the next worries me a bit. I mean, if you use lots of future references in assembler directives like .org, .repeat or .if, it could take a while for everything to stabilize in their final locations, couldn't it?

creaothceann
Posts: 224
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Multi-pass assembler questions

Post by creaothceann » Fri Jan 03, 2020 6:55 am

tokumaru wrote:
Thu Jan 02, 2020 10:17 pm
When a not-yet-defined symbol is used in an expression, what do you do? The expression needs to evaluate to something, or the line of code it comes from can't be assembled
Just repeatedly do a pass that collects all the symbol names and values it can get, until the number of names without value reaches zero or stops changing. Then do the actual assembling.
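
A minimal Python sketch of the loop described above, not taken from any particular assembler. It assumes each symbol definition is modeled as a hypothetical callable that looks up the values it needs in the table of known symbols and raises KeyError when one is missing. The names collect_symbols, definitions and max_passes are made up for illustration.

```python
def collect_symbols(definitions, max_passes=50):
    """Repeat symbol-collecting passes until nothing is unresolved,
    or until the unresolved count stops changing.

    definitions: dict mapping a symbol name to a callable that takes the
    table of known values and returns an int (hypothetical model).
    """
    known = {}
    unresolved = len(definitions)
    unresolved_before = None
    for pass_number in range(1, max_passes + 1):
        for name, expr in definitions.items():
            try:
                known[name] = expr(known)
            except KeyError:
                pass  # depends on a symbol that has no value yet
        unresolved = len(definitions) - len(known)
        if unresolved == 0 or unresolved == unresolved_before:
            return known, unresolved, pass_number
        unresolved_before = unresolved
    return known, unresolved, max_passes


# "end" is defined before the symbols it depends on (forward references).
example = {
    "end": lambda s: s["start"] + s["size"],
    "start": lambda s: 0x8000,
    "size": lambda s: s["count"] * 2,
    "count": lambda s: 16,
}
values, missing, passes = collect_symbols(example)
print(hex(values["end"]), missing, passes)
```

A circular pair of definitions never gains a value, so the unresolved count stops changing and the loop ends with those names still missing. The sketch leaves out the point raised later in the thread: in a real assembler, a label's value depends on the sizes of the instructions before it.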

tepples
Posts: 21973
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Multi-pass assembler questions

Post by tepples » Fri Jan 03, 2020 7:06 am

I'm most familiar with ca65, which decides the size of each line of code in its first pass. I've tried to build a mental model of ASM6 and its multiple passes, but every time I've tried, I've run into what turned out to be some ASM6 bug that I've had to report to the maintainers of one or more forks.
tokumaru wrote:
Thu Jan 02, 2020 10:17 pm
1- When a not-yet-defined symbol is used in an expression, what do you do?
Like any other assembler that produces relocatable code, ca65 stores a placeholder for "this expression needs to be evaluated later." Then after its one assembly pass, it tries to evaluate all such expressions that contain no .import or .global symbols, and the rest it defers to the linker (ld65).
tokumaru wrote:
Thu Jan 02, 2020 10:17 pm
depending on how that value is used you could end up with an invalid operation, like a division by 0
Those are picked up later once the expression is evaluated. Evaluations after ca65's main pass can resolve expression values but cannot change the size of anything.
tokumaru wrote:
Thu Jan 02, 2020 10:17 pm
an .org to a lower address
That depends on what you mean by .org. Different assemblers define it differently. For a piece of code or data stored in ROM but intended to be copied to RAM, do labels within a .org block refer to the address in ROM or the address in RAM?
tokumaru wrote:
Thu Jan 02, 2020 10:17 pm
2- Also related to not-yet-defined symbols, how do you pick an addressing mode when such a symbol is used as part of the operand?
ca65 guesses the address size of a not-yet-defined symbol based on the memory model, which for the 6502 target means "absolute if possible." Within a single translation unit (a source code file and all the other files it .includes), RAM definitions in a zero page segment usually precede ROM definitions in the code. The .importzp keyword tells ca65 that a symbol defined in another translation unit can be represented with one byte.
tokumaru wrote:
Thu Jan 02, 2020 10:17 pm
I mean, if you use lots of future references in assembler directives like .org, .repeat or .if, it could take a while for everything to stabilize in their final locations, couldn't it?
ca65 raises an error if an expression used in the argument of .repeat or .if does not evaluate to a constant.
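
The "size now, value later" model described in this post can be sketched in Python as follows. This is an illustration under stated assumptions, not ca65's actual implementation: the tuple-based line format, the assemble function and the fixups list are hypothetical, and every operand is assumed to be a 16-bit little-endian word.

```python
def assemble(lines, base=0x8000):
    """Single sizing pass followed by deferred expression evaluation.

    lines: list of ("label", name) or ("word", expr), where expr is a
    callable taking the symbol table (hypothetical input format).
    """
    output = bytearray()
    symbols = {}
    fixups = []  # (offset into output, deferred expression)
    for kind, arg in lines:
        if kind == "label":
            symbols[arg] = base + len(output)
        elif kind == "word":
            fixups.append((len(output), arg))
            output += b"\x00\x00"  # size is decided here and never changes
    # After the pass, every label is known, so the placeholders can be
    # filled in. Errors such as a division by zero would surface here.
    for offset, expr in fixups:
        value = expr(symbols) & 0xFFFF
        output[offset:offset + 2] = value.to_bytes(2, "little")
    return bytes(output), symbols


program = [
    ("word", lambda s: s["target"]),      # forward reference
    ("label", "target"),
    ("word", lambda s: s["target"] + 1),  # backward reference
]
code, table = assemble(program)
print(code.hex(), hex(table["target"]))
```

Because the placeholder already has its final size, resolving the expression later can change bytes but not addresses, which is why this model needs no further passes.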

tokumaru
Posts: 11692
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Multi-pass assembler questions

Post by tokumaru » Fri Jan 03, 2020 7:44 am

creaothceann wrote:
Fri Jan 03, 2020 6:55 am
Just repeatedly do a pass that collects all the symbol names and values it can get, until the number of names without value reaches zero or stops changing. Then do the actual assembling.
But you still have to "pretend" to assemble in the preliminary passes so that symbols get assigned their values. It's this "pretend" phase that I'm asking about.

tokumaru
Posts: 11692
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Multi-pass assembler questions

Post by tokumaru » Fri Jan 03, 2020 8:39 am

tepples wrote:
Fri Jan 03, 2020 7:06 am
I'm most familiar with ca65, which decides the size of each line of code in its first pass.
I'm familiar with that model too, and I don't see how the solutions used in a single-pass assembler could help in this case.

From the little information I could find about this online, it looks like one approach is to generate the most optimistic version of the program, picking the shortest version of each instruction, and expanding them as necessary in subsequent passes as symbols get defined. It is possible for this approach to cause "oscillations", where certain conditions conflict with one another and keep going back and forth as the passes happen and don't ever stabilize. It is possible to create such conditions in ASM6, and the assembler just gives up after a number of passes. It's up to the user to avoid such conditions (not that they're very common).

I couldn't find anything about expressions using undefined or unstable symbols. I'm thinking about doing the following:

-Using placeholder values for expressions involving undefined symbols, and indicating this with flags or something;

-Not doing any conditional assembly that depends on undefined symbols;

-Attempting to do conditional assembly that depends on unstable symbols, but ignoring possible errors.
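
The optimistic approach described above can be sketched in Python. The assumptions are loud ones: a toy instruction set with only labels and a "load" instruction that takes 2 bytes when its operand fits in zero page and 3 bytes otherwise, symbol values taken from the previous pass, and an undefined symbol flagged by the value None. The names layout, program and max_passes are hypothetical, and this is not how ASM6 is implemented.

```python
def layout(program, origin, max_passes=10):
    """Assign label addresses, starting with the shortest encodings and
    expanding them in later passes as symbol values become known.

    program: list of ("label", name) or ("load", name) tuples
    (hypothetical toy format).
    """
    symbols = {}  # values from the previous pass
    for pass_number in range(1, max_passes + 1):
        address = origin
        new_symbols = {}
        for kind, name in program:
            if kind == "label":
                new_symbols[name] = address
            else:
                value = symbols.get(name)  # None flags "not defined yet"
                # Optimistic guess: an unknown symbol is assumed to fit
                # in zero page, so the short form is chosen.
                fits_zero_page = value is None or value < 0x100
                address += 2 if fits_zero_page else 3
        if new_symbols == symbols:
            return symbols, pass_number  # nothing moved: layout is stable
        symbols = new_symbols
    # Give up, as the post says ASM6 does after a number of passes.
    raise RuntimeError("layout did not stabilize")


program = [("load", "data"), ("load", "data"), ("label", "data")]
print(layout(program, origin=0x8000))
```

In this example the first pass places "data" at 0x8004 using short forms, the second pass expands both instructions and moves it to 0x8006, and the third pass confirms that nothing changed. With only expansion in play the addresses can only grow; the oscillations mentioned above would require conditional assembly, which this sketch does not model.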

Dwedit
Posts: 4306
Joined: Fri Nov 19, 2004 7:35 pm

Re: Multi-pass assembler questions

Post by Dwedit » Fri Jan 03, 2020 10:32 am

You can keep a symbol dependency tree if you need to. Declare that A depends on B and C, and B depends on D, etc. Much easier to do this in a language with easy hashtables.
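
A small Python sketch of the dependency idea, using the A/B/C/D example from this post. It assumes Python 3.9 or later for the standard graphlib module. The resolve function and the (dependencies, compute) tuple layout are hypothetical.

```python
from graphlib import TopologicalSorter


def resolve(symbols):
    """Evaluate symbols in dependency order.

    symbols: dict mapping a name to (list of dependency names, callable
    taking the dict of already-resolved values). Hypothetical layout.
    """
    # TopologicalSorter expects each node mapped to its predecessors,
    # which here are the symbols that must be resolved first.
    graph = {name: deps for name, (deps, _) in symbols.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        _, compute = symbols[name]
        values[name] = compute(values)
    return values


# A depends on B and C, and B depends on D.
table = {
    "A": (["B", "C"], lambda v: v["B"] + v["C"]),
    "B": (["D"], lambda v: v["D"] * 2),
    "C": ([], lambda v: 3),
    "D": ([], lambda v: 5),
}
print(resolve(table)["A"])
```

If two symbols depend on each other, static_order() raises graphlib.CycleError, which gives the assembler a natural place to report a circular definition.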

Garth
Posts: 174
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California

Re: Multi-pass assembler questions

Post by Garth » Fri Jan 03, 2020 10:58 pm

I describe a couple of conditional-assembly situations where no number of passes will ever end the phase errors, or where dozens of passes may be required (which may be ok), at http://forum.6502.org/viewtopic.php?p=45897#p45897 . Those are unusual though. A solution for the first one might be to have an assembler variable telling what pass number it's on, and if it's more than #3 (or whatever number the programmer determines is adequate), his/her conditional assembly takes a different action. The pass number idea could be implemented by the programmer without it even being built into the assembler. A rich set of conditional assembly and macro capabilities goes a long way. I have a list of requests for anyone who writes an assembler, at http://wilsonminesco.com/AssyDefense/as ... uests.html .

turboxray
Posts: 75
Joined: Thu Oct 31, 2019 12:56 am

Re: Multi-pass assembler questions

Post by turboxray » Sat Jan 04, 2020 2:42 pm

Having already done a simple C compiler with Lex and Yacc, I'd probably just tokenize everything and then build a syntax tree. You can then iterate over the tree until all macros and symbols have been resolved.