Writing my own assembler
As I recently mentioned in another thread, after years trying to adapt to many of the existing 6502 assemblers and feeling constantly frustrated due to quirks and lack of specific features, as well as to the time I have spent trying to customize them to suit my needs, I've decided to write my own assembler. It's not supposed to be the ultimate assembler to dethrone them all (far from it!), but it'll pack everything I need out of the box so I don't have to overcomplicate things with intricate macros and jerry-rigs. The goal is to write something simple (so it doesn't take forever to get done), easy to use (no need for complex configurations) and generic enough to produce binaries for any 6502 machine (no need for NES header directives, for example, that can be done with macros). If I can make it flexible enough so it's easy to add support for other CPUs, even better!
I'm modeling this mostly after ASM6, which is the assembler that most closely resembles the ideal tool for me, but I'm picking up ideas from other assemblers as well. You might be asking: "If this is so close to ASM6, then why not just create a fork of it?" Well, besides not knowing HOW exactly to do that (I'm not particularly skilled in C and I can hardly understand the design of ASM6 from just looking at the source code), I'd need to modify a few core features of the program, and that would probably be too hard or even impossible for me to do, so I might as well write the whole thing from scratch. Plus I'm a control freak and don't want to be bound by other people's design choices, if I decide to include even more stuff later on. For now I'll be using a higher-level language than C, most likely JavaScript (one of the languages I'm most proficient in) running on Node.js, at least for prototyping, in order to get something working ASAP. If everything goes well, I may or may not port it to something more efficient at a later time.
The reason I'm making this thread is not to advertise my assembler, create hype (as if!), take requests, or anything like that. I'm actually a little insecure about my design ideas and ways to implement them, seeing as I've never written a program like this before, so I'd like to discuss some of these ideas with you guys first, to make sure I'm not missing anything important and making bad decisions left and right. If the end result is something people will be interested in using, I'll be more than happy to share the program, but like I said, my ultimate goal is to be able to write 6502 programs in a way that *I* am comfortable with.
First, I'd like to talk about the things I plan to carry over from ASM6, which are the following:
- LINEAR ASSEMBLY: I never cared much for outputting to different segments and linking a bunch of separate modules to put a ROM together; to me it makes more sense to simply fill the ROM linearly. Even when using ca65 I never felt the need to use segments or link individually assembled modules, I just used what was necessary to write my code as linearly as possible.
- MULTIPLE PASSES: I value this a lot because it allows for complex symbol resolving, which in turn allows for more dynamic memory arrangements, such as overlays, right-aligned sections and relocatable code, without specific features to handle those cases. All in all, the ability to freely use symbols/labels in expressions and directives greatly improves the versatility of the assembler.
- SIMPLE MACRO SYSTEM: Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler. Labels inside macros are all local. Recursion is not allowed.
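To make the multi-pass idea a bit more concrete, here is a rough JavaScript sketch. It is not how ASM6 works internally; the line format ({ label, op, operand }) and the name assemble are made up for illustration, and the only sizing rule modeled is "2 bytes if the operand is known to be below $100, otherwise 3". The loop simply repeats until no label changes value between two passes.

```javascript
// Toy multi-pass loop: repeat until no symbol value changes between passes.
// Each line is { label?, op?, operand? }, where operand is a symbol name.
// (Hypothetical format, only for illustration.)
function assemble(lines, origin = 0x0000, maxPasses = 10) {
  let symbols = new Map(); // values from the previous pass
  for (let pass = 1; pass <= maxPasses; pass++) {
    const next = new Map(); // values computed during this pass
    let pc = origin;
    for (const line of lines) {
      if (line.label) next.set(line.label, pc);
      if (!line.op) continue;
      if (line.operand === undefined) {
        pc += 1; // implied addressing: opcode only
        continue;
      }
      // Use last pass's value; an unknown symbol is assumed to need 16 bits.
      const value = symbols.get(line.operand);
      const zeroPage = value !== undefined && value < 0x100;
      pc += zeroPage ? 2 : 3;
    }
    const stable =
      next.size === symbols.size &&
      [...next].every(([name, value]) => symbols.get(name) === value);
    symbols = next;
    if (stable) return { symbols, passes: pass };
  }
  throw new Error("symbols did not settle after " + maxPasses + " passes");
}
```

A forward reference starts out as a 3-byte instruction and can shrink on a later pass, which moves every label after it, so at least one extra pass is needed to confirm nothing moved again. The maxPasses guard is there because nothing in this sketch proves the sizes always settle.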
Now here are a few things I'm also basing off of ASM6 but I plan on changing, taking other assemblers and my own use cases into account:
- OVERLAYS: ENUM is still the primary way to declare variables, but in order to facilitate the creation of overlays, ENUMs can optionally be named. This allows for an ENUM to pick up from where another ENUM left off. If two or more ENUMs pick up from the end of the same ENUM, they're effectively overlays. ASM6's ENUM can already kinda be used like that, you just have to define a label at the very end of the ENUM, and use that label as the starting address of another ENUM to keep going from where the first one stopped. The problem with that is that all these extra labels will get exported to label files along with the ones that are actually relevant for debugging. Maybe the solution is not to change ENUM, but to take a cue from ca65 and implement two forms of assignment for symbols, one that marks the symbol as a label (:=) and one that doesn't (=).
- ANONYMOUS LABELS: Anonymous labels are unidirectional in ASM6, and I have had to awkwardly put a + label and a - label on consecutive lines because I needed to access that point both from before and after it. For this reason, I feel like ca65's way is superior: the label is just a colon, and the direction is specified in the reference instead. Matching the number of colons to the number of + and - symbols is a bit error-prone though, especially if you need to add or remove an anonymous label between others that were already there, requiring you to double check and adjust all the nearby references, but I can live with that, especially considering that you're not supposed to abuse anonymous labels in the first place.
- LOCAL LABELS: Local labels start with "@", but having their scope delimited by non-local labels is too restrictive IMO. I constantly write subroutines with multiple entry points, which are obviously defined via global labels, and having those global labels break the scope of the local labels that should be visible in the entire subroutine is a major annoyance. To fix this, scopes now must be explicitly started with the SCOPE directive. Unlike in ca65, scopes are not blocks, you can only end a scope by starting a new one, meaning it's not possible to have nested scopes. Scopes can be named, which allows their local labels to be accessed from the outside (e.g. SomeScope.@LocalVariable).
- REPEATED LABELS: Several NES mappers require reset stubs and common subroutines to be repeated across multiple banks, and this is a problem when assembling a program because labels can't be repeated. One way to work around that in ASM6 is to define labels by assigning the PC to a symbol (e.g. MyLabel = $), since symbols can be reassigned without errors. However, I have created a problem when I introduced the SCOPE directive, because now I have to get around repeated scope names too. The only solution I could think of was to ignore repeated scope names, and create a nameless scope whenever a repeated name is supplied. This will cause any duplicates to be essentially invisible to the rest of the program.
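For the ca65-style anonymous labels described in the list above, resolving a reference could look roughly like this in JavaScript. This is only a sketch under assumptions: resolveAnonymous is a made-up name, the labels are identified by their addresses, and the list of addresses is assumed to be sorted and already known from a previous pass.

```javascript
// positions: sorted addresses of every ":" label in the source (assumed known).
// ref: ":-", ":--", ":+", ":++", ... as written in an operand.
// pc: address of the instruction that contains the reference.
function resolveAnonymous(positions, ref, pc) {
  const match = /^:(\++|-+)$/.exec(ref);
  if (!match) throw new Error("not an anonymous label reference: " + ref);
  const count = match[1].length; // number of + or - symbols
  if (match[1][0] === "-") {
    // Backward: labels at or before pc, counted from the nearest one.
    const before = positions.filter((p) => p <= pc);
    const target = before[before.length - count];
    if (target === undefined) throw new Error("no such label: " + ref);
    return target;
  }
  // Forward: labels strictly after pc, counted from the nearest one.
  const after = positions.filter((p) => p > pc);
  const target = after[count - 1];
  if (target === undefined) throw new Error("no such label: " + ref);
  return target;
}
```

Since the same label list serves both directions, one ":" can be reached from above and from below, which is the case that needs two labels in ASM6.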
And finally, a few things that ASM6 doesn't have at all:
- ZP ADDRESSING OVERRIDING: ASM6 uses ZP addressing whenever possible, but when you're writing timed code, you may want to access ZP locations using absolute addressing. Maybe an address size modifier like in ca65 (a:address) is the answer.
- MEMORY BANKS: Keeping track of what's where when dealing with bank switching is a big annoyance, and doing that manually is just too error-prone. My solution is to simply create a BANK directive that you can use to set the current bank number, causing any subsequent labels to be assigned that bank number. This information can then be extracted from the labels whenever necessary. The same numbers can be used over and over, since you may need to index PRG-ROM banks, CHR-ROM banks, RAM banks, and so on.
- FUNCTIONS: As far as I can tell, ASM6 doesn't have any built-in functions, and doesn't offer any means for users to create their own. User defined functions are probably outside of the scope of my simple assembler, but a few built-in functions could be really useful. There will certainly be a function to extract bank numbers from labels, and maybe a function to test whether macro parameters are blank or not.
- TEXT OUTPUT: ASM6 can only output error messages, but other kinds of messages are also very important. The OUT directive can output text without aborting the assembly process. Since this is a multi-pass assembler, text output must be buffered during each pass, and only text generated during the last pass must be displayed.
- CHARACTER MAPPING: Mapping characters to specific indices is important because we usually have few tiles to dedicate to text so we can't afford to be slaves of the ASCII encoding. The idea here is to use a directive to define an index, and then the character to put at that index. The reason to supply the parameters in this order is that after the index, you can supply multiple characters and strings, and the index will auto-increment to accommodate as many characters as necessary (e.g. CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D).
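The auto-incrementing CHARMAP from the list above could be prototyped along these lines in JavaScript. The function names charmap and encode are hypothetical, and treating a bare number such as $0D as a character code is an assumption about what the example is meant to do.

```javascript
// table: Map from character to tile index, filled in by charmap().
// index: first index to assign; items: strings and/or numeric character codes.
function charmap(table, index, ...items) {
  for (const item of items) {
    if (typeof item === "number") {
      // Assumption: a bare number is a character code (e.g. $0D).
      table.set(String.fromCharCode(item), index++);
    } else {
      for (const ch of item) table.set(ch, index++);
    }
  }
  return index; // next free index
}

// Convert a string to mapped indices, failing on unmapped characters.
function encode(table, text) {
  return [...text].map((ch) => {
    if (!table.has(ch)) throw new Error("unmapped character: " + ch);
    return table.get(ch);
  });
}
```

With the example from the post, charmap(table, 0x00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", 0x0D) assigns 0 to 25 to the letters, 26 to 29 to the punctuation, and 30 to the character with code $0D.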
These are all my ideas so far. If anyone was brave enough to read through all of this, please share your opinions on what I have so far. Am I missing something important? Am I doing something in a dumb way? Please comment.
Re: Writing my own assembler
I don't know much about JS, but isn't Node about servers and remote communication etc.? I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing node/docker or some other overly convoluted web thing.
Having bidirectional anonymous labels seems overkill, and more work than it is worth. They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label; that is what they are for. It will be much more readable, and far less error prone than trying to balance the forwards and backwards. It's just pointlessly cryptic.
For what you want, N passes will be needed. In Tass I generally hit about 7 passes. You will need an internal intermediate form so that you can work out how big something needs to be by assembling it, working out what is a 16-bit address, what is an 8-bit address, etc., then doing the maths to pull it back from the end of the bank.
ZP ADDRESSING OVERRIDING: typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs
CHARACTER MAPPING: having multichar literals is also really handy, for example {copyright} or {heart}.
It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway.
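As a very rough illustration of the "doesn't trash" check, here is a JavaScript sketch. It makes several assumptions: the routine is already reduced to { op, address } records with labels resolved, only direct writes are tracked (no indexed or indirect stores, no following of jsr), and the names WRITERS, writtenAddresses and checkDoesNotTrash are invented for this example.

```javascript
// Mnemonics that write to memory when they have an address operand.
const WRITERS = new Set(["sta", "stx", "sty", "inc", "dec", "asl", "lsr", "rol", "ror"]);

// routine: array of { op, address? } with labels already resolved to numbers.
function writtenAddresses(routine) {
  const written = new Set();
  for (const { op, address } of routine) {
    if (address !== undefined && WRITERS.has(op)) written.add(address);
  }
  return written;
}

// Returns the protected addresses that the routine writes to (empty = OK).
function checkDoesNotTrash(routine, protectedAddresses) {
  const written = writtenAddresses(routine);
  return protectedAddresses.filter((a) => written.has(a));
}
```

A real analyzer would also have to merge in the write sets of every subroutine called, which is where the call tree comes in.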
Re: Writing my own assembler
Oziphantom wrote: "I don't know much about JS, but isn't Node about servers and remote communication etc. I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing node/docker or some other overly convoluted web thing."
I'm only using Node to run the .js file locally as a command line application (e.g. node.exe assembler.js source.asm game.nes), it's not much different from using Python, PHP, or any other scripting language. Also, Node has a file system library that really helps with reading/writing files, one of JavaScript's weaknesses.
Oziphantom wrote: "Having bidirectional anonymous labels seems overkill, and more work that it is worth."
I think that the anonymous labels in ca65 are in fact easier to implement than those in ASM6, not that there's a huge difference though. Since the complexity is equivalent, I'd rather go with the method I find more useful.
Oziphantom wrote: "They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label that is what they are for."
It may not be very common, but even in tiny tight loops there are cases when I need to jump to the middle from both the top and the bottom, and I don't feel like naming a label in such a little piece of logic.
Oziphantom wrote: "For what you want, N pass will be needed. In Tass I generally hit about 7 passes. As you will need to be able to make an internal intermediate form, that you can work out how big something needs to be, by assembling it, working out what is a 16bit address, 8 bit address etc then doing the maths to pull it back from the end of the bank etc."
Exactly. Changing address sizes is the most complicated part IMO.
Oziphantom wrote: "ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs"
Really? I've never seen that...
Oziphantom wrote: "CHARACTER MAPPING having multichar literals is also really handy {copyright} for example {heart}"
Hum... that's interesting.
Oziphantom wrote: "It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway."
I like this kind of automation. Have you seen this implemented anywhere?
Thanks for the tips.
Re: Writing my own assembler
@Oziphantom
Node.js is often the backend for Electron-based apps, so using it doesn't mean that you run a server per se.
@Tokumaru
I took the time to read but didn't answer right away (I'm half asleep at the wheel today ^^;;) but for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.
As for Bank, I guess it's just a concept with no actual size? With cc65 segments you know how big they are and how much is used thanks to the map file, which is useful when looking at what is used inside that segment/bank etc.
As for local labels, I don't think you should have to write the @ after the name of the scope too. In ca65 you just write scope::localLabelName and it works fine. The @ seems superfluous and should only be used when defining the label, so the parser knows that it's local.
I may have more comments later, when I'm less sleepy
Re: Writing my own assembler
When it comes to local labels, I'm personally partial to the style used in NASM and FASM for x86. Local labels look like .foo, but can also be referred to as bar.foo globally, where bar is the first global label before .foo. So for instance, a subroutine with two entry points could look something like this:
Code:
entry1:
    ...
    ...
    bne entry2.loop
    ...
entry2:
    ...
.loop:
    ...
    ...
    beq .loop
    rts
Of course, whether it's too annoying to have to write entry2.loop or not is up to you.
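The rule described in this post (a .foo label belongs to the nearest global label before it) is simple to express. Here is a minimal JavaScript sketch; qualifyLabels is a made-up name, and this is not taken from NASM or FASM, it only mimics the naming rule.

```javascript
// names: label names in source order, e.g. ["entry1", "entry2", ".loop"].
// Returns the fully qualified name of each label.
function qualifyLabels(names) {
  let lastGlobal = null;
  return names.map((name) => {
    if (!name.startsWith(".")) {
      lastGlobal = name; // a global label opens a new local namespace
      return name;
    }
    if (lastGlobal === null) {
      throw new Error("local label before any global label: " + name);
    }
    return lastGlobal + name; // ".loop" under "entry2" becomes "entry2.loop"
  });
}
```

References would go through the same rule: a name starting with a dot is prefixed with the current global label, and anything else is looked up as written.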
Re: Writing my own assembler
tokumaru wrote: "I'm only using Node to run the .js file locally as a command line application (e.g. node.exe assembler.js source.asm game.nes), it's not much different from using Python, PHP, or any other scripting language. Also, Node has a file system library that really helps with reading/writing files, one of JavaScript's weaknesses."
cscript.exe will run JS as well, without most people having to install something, is all.
tokumaru wrote: "I think that the anonymous labels in ca65 are in fact easier to implement than those in ASM6, not that there's a huge difference though. Since the complexity is equivalent, I'd rather go with the method I find more useful. It may not be very common, but even in tiny tight loops there are cases when I need to jump to the middle from both the top and the bottom, and I don't feel like naming a label in such a little piece of logic."
Yeah, but _l or in your case @l is just as much to type, basically.
tokumaru wrote: "Exactly. Changing addresses sizes is the most complicated part IMO. Really? I've never seen that..."
NES assemblers seem to have been written in vacuums by people who haven't used a traditional assembler. .w or @w are the other common methods, .w being a "They programmed for the 68K" identifier.
tokumaru wrote: "Hum... that's interesting. I like this kind of automation. Have you seen this implemented anywhere? Thanks for the tips."
I've implemented it as a post-process static analyser. The hardest part was me reversing all of the TASS64 output to get back to the original code and building a call tree; once armed with this info, it was trivial to check. Then I started to get fancy with my coding and it just broke my parser. I've just spent a good week or so getting my new debugging format working, so I have source code that shows the whole function as I step and it shows me the 'local' variables. I'm going to add this "analyzer" code to it again, it was kind of handy.
Re: Writing my own assembler
Banshaku wrote: "for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine for you may make no sense to another person and in 6 months you won't remember it too. But, this is your assembler and you add what you like so that part is up to you."
I use anonymous labels very responsibly... I don't want to litter my code with labels for obvious things like zero checks, loop points and the like, so that's what I use anonymous labels for. I have little blocks like this:
Code:
    ;clear page RAM $03
    ldx #$00
    txa
:   sta $0300, x
    inx
    bne :-
Banshaku wrote: "As for Bank, I guess it's just a concept with no actual size?"
Yeah, it's just a number that gets attached to labels so I can easily know what bank to map in to access something. For example, I can do this for CHR banks:
Code:
    .bank $20
    .org $0000
PlayerRunning:
    .incbin "player-running.chr"
    .bank $21
    .org $0000
PlayerJumping:
    .incbin "player-jumping.chr"
Banshaku wrote: "With cc65 segments you know how big they are and knows how much is used with the map file, which is useful when looking in what used inside that segment/bank etc."
I did consider making the BANK directive more complex, where you could define the size of the bank in addition to its number, but in the end I figured that would kill some of the flexibility that I like so much. You don't need to set every size and every address in advance, if you just do your .ORGs and .BASEs right, everything will work just fine. With a multi-pass assembler you can use symbol math for almost anything, even calculating the amount of free space in each bank.
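For what it's worth, the "bank is just a number attached to labels" idea needs very little machinery. A possible JavaScript sketch follows; SymbolTable, setBank, define and bankOf are hypothetical names, not part of any existing assembler.

```javascript
// Symbol table where every label remembers the bank that was current
// when it was defined. Bank numbers carry no size and may repeat.
class SymbolTable {
  constructor() {
    this.bank = 0; // current bank, changed by the .bank directive
    this.symbols = new Map();
  }
  setBank(number) {
    this.bank = number;
  }
  define(name, address) {
    this.symbols.set(name, { address, bank: this.bank });
  }
  // What a built-in "bank of label" function would return.
  bankOf(name) {
    const symbol = this.symbols.get(name);
    if (!symbol) throw new Error("undefined symbol: " + name);
    return symbol.bank;
  }
}

// Mirrors the CHR example above.
const table = new SymbolTable();
table.setBank(0x20);
table.define("PlayerRunning", 0x0000);
table.setBank(0x21);
table.define("PlayerJumping", 0x0000);
```

Both labels have the same address here and are told apart only by the bank number, which is why the same numbers can be reused for PRG, CHR and RAM banks.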
Banshaku wrote: "As for local label, I don't think you should have to write the @ when writing the name of the scope too. In Ca65 you just write scope::localLabelName and it works fine. The @ seems superfluous and should only be used when defined the label so the parser knows that it's local."
I don't know, I kinda consider the "@" as part of the name. It's true that in ca65 you access local labels like scope::localLabel, but in ca65, ALL labels are local inside a scope, not only those beginning with "@". I don't know if you can even access a cheap local label from the outside in ca65, but if you can, I bet the "@" is needed. But anyway, scopes in my assembler will be much simpler than those in ca65.
Banshaku wrote: "I may have more comments later, when I'm less sleepy"
Great!
Re: Writing my own assembler
Before I say anything else, I'm going to quickly plug flat assembler. I'm pretty sure there's more than one assembler by that name, so please check the link! It's a self-hosted x86 assembler, but the second version, fasmg, is a generic macro assembler - you implement your own instruction sets with macros. There's already at least one 6502 implementation on the forums.
I'm mentioning it because it's extremely well-designed. The code is a wall of commentless x86 assembly code, but the author has posted a description of its internal workings, which is well worth the read if you're writing your own assembler. I'd link it directly but I'm kind of pressed for time right now.
A few highlights relevant to tokumaru's post:
MULTIPLE PASSES
Fasm performs multiple passes of everything but macro expansion. Conditionals, for loops, while loops, error messages and user-specified debug output all "just work" as though the values were right to begin with. This is useful for much more than symbol resolving.
MACROS
tokumaru wrote: Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler.
I strongly disagree with this. Macros can automate a lot of tedium, not just repetitive code. Implementing a good macro system can save you having to implement a whole slew of other features at all.
Fasm's macros are fairly simple: they take a comma-separated list of arguments. The last one may optionally be variable-length. The directives "common", "forward" and "reverse" cause the section afterwards to be emitted once, once for each variable argument, and once for each variable argument in reverse, respectively. So you can do this:
Code: Select all
macro setall value, [dest]
{
common
lda value
forward
sta dest
}
setall #0, a, b, c, d
There is one more directive, "local", which makes all instances of the listed names local to each expansion of the macro. Fasm lets you redefine constants (at the cost of disabling forward references to that constant), and macros are often used to accumulate values. The results can then be assigned to a single-use label, allowing forward references. This can do the same thing as an "enum", without requiring any extra code in the assembler.
OVERLAYS
Fasm generalizes "enum" functionality into "virtual" blocks (which is a much better name IMHO). These take a starting address and turn off the output for the duration, without affecting any labels or constants generated. The "named enums" you describe can be implemented with ordinary labels inside virtual blocks. Multiple passes take care of the rest.
LOCAL LABELS
Fasm starts local labels with a dot. Local labels are automatically visible outside their scope, as a suffix of the last global label. Clearly not what you want, but I find it convenient, so I mention the idea here for completeness.
REPEATED LABELS
Fasm allows you to load data from any address at assembly time with the "load" operator. You can use this with a loop to copy code into as many places as you like, wrap it in a macro to automate it, put the first copy inside a virtual section so it's never emitted, XOR encode the copies, etc., all without any explicit support from the assembler itself.
ZP ADDRESSING OVERRIDING
Fasm lets you override the type of a variable, the length of a jump and so on by simply prefixing it with the new type - it would look something like lda zeropage $03 or lda absolute $03 for 6502 code. When the size is not specified fasm defaults to the smallest instruction length.
FUNCTIONS
Macros and redefinable labels/variables can do anything functions can do, except recursion.
TEXT OUTPUT
Fasm has a "display" directive to output arbitrary text. It runs at assembly time and thus prints (or doesn't print) according to the result of the final pass.
I have more to say (and I've been ninja'd) but I'm about to lose power again. Sorry in advance if I messed up.
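For anyone following along, the multi-pass idea described under MULTIPLE PASSES can be sketched roughly like this in JavaScript. This is a generic "repeat until the symbol values stop changing" loop, not a description of how fasm is actually implemented. The statement format (`label` plus a `size` callback), the `assemble` and `lda` helpers, and the pass limit are all assumptions made up for the example.

```javascript
// Each statement is { label?: string, size: (lookup) => number }.
// The loop re-runs the whole program until no label moves between passes.
function assemble(statements, origin = 0, maxPasses = 10) {
  let symbols = new Map();
  for (let pass = 1; pass <= maxPasses; pass++) {
    const next = new Map();
    let pc = origin;
    // Symbols not seen yet (forward references on pass 1) read as undefined.
    const lookup = (name) => symbols.get(name);
    for (const statement of statements) {
      if (statement.label) next.set(statement.label, pc);
      pc += statement.size(lookup);
    }
    const stable = next.size === symbols.size &&
      [...next].every(([name, value]) => symbols.get(name) === value);
    symbols = next;
    if (stable) return { symbols, passes: pass };
  }
  throw new Error('symbol values did not stabilize');
}

// "lda <symbol>": 2 bytes if the symbol is known to be in zero page,
// otherwise 3 bytes (absolute, or not known yet).
const lda = (name) => ({
  size: (lookup) => {
    const value = lookup(name);
    return value !== undefined && value < 0x100 ? 2 : 3;
  },
});

const program = [
  lda('Var'),                        // forward reference
  { label: 'Var', size: () => 1 },   // one reserved byte
];
const result = assemble(program, 0x00F0);
// result.symbols.get('Var') is 0xF2, reached after 3 passes
```

In this example the first pass guesses a 3-byte instruction, the second shrinks it to 2 bytes once `Var` is known to be in zero page, and the third confirms nothing moved.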
Re: Writing my own assembler
Banshaku wrote: for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.
tokumaru wrote: I use anonymous labels very responsibly... I don't want to litter my code with labels for obvious things like zero checks, loop points and the like, so that's what I use anonymous labels for. I have little blocks like this:
Code: Select all
;clear page RAM $03
ldx #$00
txa
: sta $0300,x
inx
bne :-
tokumaru wrote (continued): This is such a small piece of logic that the intent is beyond obvious, so there's no point in littering the place with dumb labels like "@Loop", "@Skip" and the like. I'll hardly jump more than 5 lines to an anonymous label, it's all very compact and with a clear comment at the top explaining what the whole block of code below is for.
We are not talking about that case; yes, in that case you use - and there are no issues. We are talking about the
Code: Select all
ldx #$00
- txa
and #40
bne +
sta $0300,x
+
-
inx
sta $0300,x
dex
bpl -
bmi --
To which my point was do this
Code: Select all
ldx #$00
- txa
and #40
bne @l
sta $0300,x
@l
inx
sta $0300,x
dex
bpl @l
bmi -
Re: Writing my own assembler
Nicole wrote: When it comes to local labels, I'm personally partial to the style used in NASM and FASM for x86. Local labels look like .foo, but can also be referred to as bar.foo globally, where bar is the first global label before .foo.
That's actually where I got the idea from, but since I absolutely need to be able to define global labels from a local scope, I went with the explicit creation of scopes via a dedicated directive, rather than implicitly creating scopes with each global label.
I don't see anything wrong with the "." notation per se, but since I'm used to having leading dots in assembler directives, that might make the code look confusing and harder to parse... I'm really used to dots meaning directives and @s meaning local labels.
Quote: So for instance, a subroutine with two entry points could look something like this:
Code: Select all
entry1:
...
...
bne entry2.loop
...
entry2:
...
.loop:
...
...
beq .loop
rts
I guess it's doable, but it's weird. It's not a matter of being annoying to type, it's just that this is supposed to be a single block. Semantically, it doesn't make much sense.
Re: Writing my own assembler
Oziphantom wrote: cscript.exe will run JS as well without most people having to install something is all.
It's also about 128 times slower, it seems. I've used cscript.exe in the past, but was less than impressed with its file system library (I remember having to use hacks in order to work with binary files!) and its performance. I also think it's badly outdated. Plus it's Windows only. Node.js on the other hand has the newest JavaScript features, tons of libraries, and I can even run it on my phone and develop anywhere. On Windows it's just a 15 MB download, and you don't even have to install anything, just decompress the .zip and use it. In today's world, where everything has to be installed and configured after GBs of downloads, and every piece of software thinks it owns your entire PC, I consider that a win!
Quote: To which my point was do this
Code: Select all
ldx #$00
- txa
and #40
bne @l
sta $0300,x
@l
inx
sta $0300,x
dex
bpl @l
bmi -
Yeah, but that break is ugly, it makes a single task look like 2 tasks, and it affects readability for me. I'd much rather do this:
Code: Select all
;do something
ldx #$00
: txa
and #40
bne :+
sta $0300,x
: inx
sta $0300,x
dex
bpl :-
bmi :--
Re: Writing my own assembler
Rahsennor wrote: Before I say anything else, I'm going to quickly plug flat assembler.
Thanks for the link, I'll check it out. Seeing what other assemblers do really helps.
tokumaru wrote: Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler.
Quote: I strongly disagree with this. Macros can automate a lot of tedium, not just repetitive code. Implementing a good macro system can save you having to implement a whole slew of other features at all.
I agree with you in general; what I said was just describing how macros will be in my assembler: very bare-bones. While I agree that macros can be incredibly useful for a number of purposes, I don't have the time to make a complex assembler. I experimented a lot with ca65 macros, and have done a lot with them, but nearly all of that work was to implement the features I'm describing here, and since they'll be built-in this time around, I don't need a complex macro system right now.
Quote: Fasm's macros are fairly simple: they take a comma-separated list of arguments. The last one may optionally be variable-length. The directives "common", "forward" and "reverse" cause the section afterwards to be emitted once, once for each variable argument, and once for each variable argument in reverse, respectively.
Sounds interesting, but not very intuitive to read!
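To make the quoted common/forward/reverse description a bit more concrete, here is a rough JavaScript model of how such a macro could be expanded. It is a simplified sketch based only on the description above, not fasm's real implementation. The macro object layout (`params`, `variadicParam`, `sections`) and the `expandMacro` function are invented for the example.

```javascript
// Simplified model: a macro is a list of sections, each tagged
// 'common', 'forward' or 'reverse'. Not fasm's actual data structures.
function expandMacro(macro, args) {
  const fixed = args.slice(0, macro.params.length);
  const variadic = args.slice(macro.params.length);

  // Replace whole identifiers that match a parameter name, in a single pass.
  const substitute = (line, bindings) =>
    line.replace(/[A-Za-z_]\w*/g, (word) =>
      Object.prototype.hasOwnProperty.call(bindings, word) ? bindings[word] : word);

  const base = {};
  macro.params.forEach((param, i) => { base[param] = fixed[i]; });

  const output = [];
  for (const section of macro.sections) {
    if (section.mode === 'common') {
      // Emitted exactly once.
      for (const line of section.lines) output.push(substitute(line, base));
    } else {
      // Emitted once per variable argument, in order or reversed.
      const list = section.mode === 'reverse' ? [...variadic].reverse() : variadic;
      for (const arg of list) {
        const bindings = { ...base, [macro.variadicParam]: arg };
        for (const line of section.lines) output.push(substitute(line, bindings));
      }
    }
  }
  return output;
}

// The "setall" macro from earlier in the thread, in this model's format.
const setall = {
  params: ['value'],
  variadicParam: 'dest',
  sections: [
    { mode: 'common', lines: ['lda value'] },
    { mode: 'forward', lines: ['sta dest'] },
  ],
};
const expanded = expandMacro(setall, ['#0', 'a', 'b', 'c', 'd']);
// expanded: ['lda #0', 'sta a', 'sta b', 'sta c', 'sta d']
```

Seen this way, `setall #0, a, b, c, d` is one load followed by one store per destination.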
Quote: OVERLAYS: Fasm generalizes "enum" functionality into "virtual" blocks (which is a much better name IMHO). These take a starting address and turn off the output for the duration, without affecting any labels or constants generated. The "named enums" you describe can be implemented with ordinary labels inside virtual blocks. Multiple passes take care of the rest.
Sounds just like ENUM! I was not a big fan of the name "enum", but I guess it makes sense, since it's effectively just incrementing a counter after each symbol.
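A small JavaScript sketch of the shared idea behind ENUM and "virtual" blocks: labels keep being defined and the program counter keeps advancing, but no bytes reach the output. The `Emitter` class and its method names are hypothetical, and nested blocks are deliberately left out to keep it short.

```javascript
// Hypothetical emitter: inside a virtual block the program counter moves
// and labels are recorded, but nothing is written to the output.
class Emitter {
  constructor(origin = 0) {
    this.pc = origin;
    this.bytes = [];
    this.labels = new Map();
    this.savedPc = null; // non-null only while inside a virtual block
  }
  label(name) { this.labels.set(name, this.pc); }
  emit(data) {
    if (this.savedPc === null) this.bytes.push(...data);
    this.pc += data.length;
  }
  beginVirtual(address) {
    if (this.savedPc !== null) throw new Error('nested virtual blocks are not modeled');
    this.savedPc = this.pc;
    this.pc = address;
  }
  endVirtual() {
    if (this.savedPc === null) throw new Error('endVirtual without beginVirtual');
    this.pc = this.savedPc; // resume where real output left off
    this.savedPc = null;
  }
}

const out = new Emitter(0x8000);
out.beginVirtual(0x0000);              // lay out zero page variables
out.label('PlayerX'); out.emit([0]);
out.label('PlayerY'); out.emit([0]);
out.label('Score');   out.emit([0, 0]);
out.label('NextFree');
out.endVirtual();
out.label('Reset'); out.emit([0x78]);  // sei, the only byte actually output
```

After this runs, `PlayerX`, `PlayerY`, `Score` and `NextFree` are 0, 1, 2 and 4, while `Reset` is still at 0x8000 and the output holds a single byte.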
Quote: LOCAL LABELS: Fasm starts local labels with a dot. Local labels are automatically visible outside their scope, as a suffix of the last global label. Clearly not what you want, but I find it convenient, so I mention the idea here for completion.
I kinda like this system, but for multiple entry points, I prefer having explicit control over when scopes start and end, rather than let the global labels define that.
Quote: REPEATED LABELS: Fasm allows you to load data from any address at assembly time with the "load" operator. You can use this with a loop to copy code into as many places as you like, wrap it in a macro to automate it, put the first copy inside a virtual section so it's never emitted, XOR encode the copies etcetera, all without any explicit support from the assembler itself.
This is really cool. Being able to replicate the binary data is a pretty interesting idea.
Quote: ZP ADDRESSING OVERRIDING: Fasm lets you override the type of a variable, the length of a jump and so on by simply prefixing it with the new type - it would look something like lda zeropage $03 or lda absolute $03 for 6502 code. When the size is not specified fasm defaults to the smallest instruction length.
That's a good approach, similar to how ca65 does it.
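As a sketch of "default to the smallest encoding unless overridden", here is how picking between zero page and absolute LDA might look in JavaScript. The opcode values $A5 and $AD are the standard 6502 encodings for LDA zero page and LDA absolute. The `encodeLda` function and the `'zeropage'`/`'absolute'` override strings are made up for the example.

```javascript
// Standard 6502 opcodes for LDA in the two direct addressing modes.
const LDA = { zeropage: 0xA5, absolute: 0xAD };

// override is null (pick the smallest), 'zeropage' or 'absolute'.
function encodeLda(address, override = null) {
  const mode = override ?? (address <= 0xFF ? 'zeropage' : 'absolute');
  if (mode === 'zeropage') {
    if (address > 0xFF) throw new RangeError('address does not fit in zero page');
    return [LDA.zeropage, address];
  }
  if (mode !== 'absolute') throw new Error(`unknown addressing override: ${mode}`);
  // Absolute operands are stored low byte first.
  return [LDA.absolute, address & 0xFF, (address >> 8) & 0xFF];
}

const short = encodeLda(0x03);               // [0xA5, 0x03]
const forced = encodeLda(0x03, 'absolute');  // [0xAD, 0x03, 0x00]
const far = encodeLda(0x0300);               // [0xAD, 0x00, 0x03]
```

Whatever syntax ends up spelling the override, the encoder only needs to know which of the two forms was requested.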
Quote: FUNCTIONS: Macros and redefinable labels/variables can do anything functions can do, except recursion.
But certain things are way more convenient when done inline, like lda #BANK(SomeLabel).
Quote: Sorry in advance if I messed up.
That's fine. Thanks for bringing in new ideas.
Re: Writing my own assembler
Got a few more minutes.
tokumaru wrote: I don't have the time to make a complex assembler.
That's exactly my motivation for wanting good (!= complex) macros in my assembler. Along with the multipass stage, they can cover many of the features I'd otherwise have to hardcode.
tokumaru wrote: Sounds interesting, but not very intuitive to read!
You get used to it, and it looks better with proper formatting. But yes, even Tomasz has expressed regret at the syntax and changed it in fasmg. I don't remember how the new version works though, and I didn't have time to write a proper example or copy-paste tabs.
tokumaru wrote: But certain things are way more convenient when done inline, like lda #BANK(SomeLabel).
Oh, I see what you mean now. I'd just build banks into the assembler and not worry about general-purpose functions. Fasm has no functions, only operators, and a few platform-specific features, like allowing label addresses to be relative to a register (very useful for the stack). Banks would fall into that category, I would think.
Re: Writing my own assembler
Quote: CHARACTER MAPPING having multichar literals is also really handy {copyright} for example {heart}
tokumaru wrote: Hum... that's interesting.
That's possible in rgbasm from RGBDS, an assembler targeting the Game Boy CPU.
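A rough JavaScript sketch of character mapping with multi-character literals like {copyright} and {heart}. This is not rgbasm's syntax or behavior, just one way the idea could work. The `encodeText` function, the brace convention and every tile number in the map are assumptions for illustration.

```javascript
// Encode a string through a character map. A key is either one character
// or a whole "{name}" token. All tile numbers below are invented.
function encodeText(text, charmap) {
  const bytes = [];
  let i = 0;
  while (i < text.length) {
    let key = text[i];
    if (key === '{') {
      const end = text.indexOf('}', i);
      if (end === -1) throw new SyntaxError('unterminated { in string');
      key = text.slice(i, end + 1); // take the whole {name} token
    }
    if (!charmap.has(key)) throw new Error(`no mapping for ${key}`);
    bytes.push(charmap.get(key));
    i += key.length;
  }
  return bytes;
}

const charmap = new Map([
  ['A', 0x0A],
  ['B', 0x0B],
  ['{copyright}', 0xF0],
  ['{heart}', 0xF1],
]);
const encoded = encodeText('{copyright}AB{heart}', charmap);
// encoded: [0xF0, 0x0A, 0x0B, 0xF1]
```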
tokumaru wrote: I'm only using Node to run the .js file locally as a command line application
Oziphantom wrote: cscript.exe will run JS as well without most people having to install something is all.
Last I checked, cscript.exe was exclusive to Microsoft Windows. I don't run Windows on my primary dev machine; nor does calima. Are there tips for writing a script to make it work on both cscript.exe (for users of Windows) and Node.js (for users of GNU/Linux and macOS)?
Quote: ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs
tokumaru wrote: Really? I've never seen that...
Oziphantom wrote: NES assemblers seem to have been written in vacuums by people who haven't used a traditional assembler. .w or @w are the other common methods, .w being a "They programmed for the 68K" identifier
ca65 has lda $00 for zero page, lda a:$00 for absolute, and lda f:$00 for 65816-exclusive absolute long. In my opinion, 68000's .w is completely different because it specifies data size, whereas $00 vs. a:$00 vs. f:$00 is about address size. And I'd recommend against ~$00 notation because ~ is already in use to mean one's complement.
Re: Writing my own assembler
You're never going to finish a NES game if you spend all your time making tools
With that said, I'd like it if all labels used the anonymous label +/- syntax. For example, if you have multiple labels with the same name, you can use +/- to distinguish them:
Code: Select all
jmp foo:+
foo:
jmp foo:++
foo:
foo:
jmp foo:---
In other words, have labels and anonymous labels behave the same. Don't special case either.
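One way the "labels and anonymous labels behave the same" idea could be resolved, sketched in JavaScript: an anonymous label is treated as a label whose name is the empty string, so `:+` and `foo:+` go through the same code path. The `parseRef` and `resolve` functions, the use of source line numbers as positions, and the rule that a label on the reference's own line counts as "behind" are all assumptions made for the example.

```javascript
// Parse "foo:++", "foo:---" or ":-" into a name and a signed offset.
function parseRef(text) {
  const match = /^([A-Za-z_@]\w*)?:(\++|-+)$/.exec(text);
  if (!match) throw new SyntaxError(`bad relative reference: ${text}`);
  const count = match[2].length;
  return { name: match[1] ?? '', offset: match[2][0] === '+' ? count : -count };
}

// labels: [{ name, line }] sorted by line; refLine: line of the reference.
// Returns the line of the label definition that the reference points at.
function resolve(labels, refText, refLine) {
  const { name, offset } = parseRef(refText);
  const sameName = labels.filter((label) => label.name === name);
  const ahead = sameName.filter((label) => label.line > refLine);
  const behind = sameName.filter((label) => label.line <= refLine).reverse();
  const hit = offset > 0 ? ahead[offset - 1] : behind[-offset - 1];
  if (!hit) throw new Error(`${refText} has no target`);
  return hit.line;
}

// The example above, with one statement per line starting at line 1.
const labels = [
  { name: 'foo', line: 2 },
  { name: 'foo', line: 4 },
  { name: 'foo', line: 5 },
];
const first = resolve(labels, 'foo:+', 1);    // 2: the next foo
const second = resolve(labels, 'foo:++', 3);  // 5: two foos ahead
const third = resolve(labels, 'foo:---', 6);  // 2: three foos back
```

With this shape there is no separate anonymous-label table; the empty name is just another key.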