 Post subject: Writing my own assembler
PostPosted: Tue Oct 09, 2018 9:53 pm 
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
As I recently mentioned in another thread, after years trying to adapt to many of the existing 6502 assemblers and feeling constantly frustrated due to quirks and lack of specific features, as well as to the time I have spent trying to customize them to suit my needs, I've decided to write my own assembler. It's not supposed to be the ultimate assembler to dethrone them all (far from it!), but it'll pack everything I need out of the box so I don't have to overcomplicate things with intricate macros and jerry-rigs. The goal is to write something simple (so it doesn't take forever to get done), easy to use (no need for complex configurations) and generic enough to produce binaries for any 6502 machine (no need for NES header directives, for example, that can be done with macros). If I can make it flexible enough so it's easy to add support for other CPUs, even better!

I'm modeling this mostly after ASM6, which is the assembler that most closely resembles the ideal tool for me, but I'm picking up ideas from other assemblers as well. You might be asking: "If this is so close to ASM6, then why not just create a fork of it?" Well, besides not knowing HOW exactly to do that (I'm not particularly skilled in C and I can hardly understand the design of ASM6 from just looking at the source code), I'd need to modify a few core features of the program, and that would probably be too hard or even impossible for me to do, so I might as well write the whole thing from scratch. Plus I'm a control freak and don't want to be bound by other people's design choices, if I decide to include even more stuff later on. For now I'll be using a higher-level language than C, most likely JavaScript (one of the languages I'm most proficient in) running on Node.js, at least for prototyping, in order to get something working ASAP. If everything goes well, I may or may not port it to something more efficient at a later time.

The reason I'm making this thread is not to advertise my assembler, create hype (as if!), take requests, or anything like that. I'm actually a little insecure about my design ideas and ways to implement them, seeing as I've never written a program like this before, so I'd like to discuss some of these ideas with you guys first, to make sure I'm not missing anything important and making bad decisions left and right. If the end result is something people will be interested in using, I'll be more than happy to share the program, but like I said, my ultimate goal is to be able to write 6502 programs in a way that *I* am comfortable with.

First, I'd like to talk about the things I plan to carry over from ASM6, which are the following:

- LINEAR ASSEMBLY: I never cared much for outputting to different segments and linking a bunch of separate modules to put a ROM together; to me it makes more sense to simply fill the ROM linearly. Even when using ca65 I never felt the need to use segments or link individually assembled modules, I just used what was necessary to write my code as linearly as possible.

- MULTIPLE PASSES: I value this a lot because it allows for complex symbol resolving, which in turn allows for more dynamic memory arrangements, such as overlays, right-aligned sections and relocatable code, without specific features to handle those cases. All in all, the ability to freely use symbols/labels in expressions and directives greatly improves the versatility of the assembler.

- SIMPLE MACRO SYSTEM: Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler. Labels inside macros are all local. Recursion is not allowed.
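
To make the multi-pass idea concrete, here is a minimal JavaScript sketch of resolving symbols by repeating passes until nothing changes. The line format (`label`, `define`, `expr`, `size`) and the name `resolveSymbols` are hypothetical, made up for illustration; this is not how ASM6 does it internally.

```javascript
// Hypothetical line format: { label }, { define, expr(lookup) }, { size(lookup) }.
function resolveSymbols(lines, maxPasses = 8) {
  let previous = {};
  for (let pass = 1; pass <= maxPasses; pass++) {
    const current = {};
    const used = new Set();
    // Prefer a value defined earlier in this pass, then last pass, then guess 0.
    const lookup = (name) => {
      used.add(name);
      if (name in current) return current[name];
      return name in previous ? previous[name] : 0;
    };
    let pc = 0;
    for (const line of lines) {
      if (line.label) current[line.label] = pc;
      if (line.define) current[line.define] = line.expr(lookup);
      if (line.size) pc += line.size(lookup);
    }
    for (const name of used) {
      if (!(name in current)) throw new Error(`undefined symbol: ${name}`);
    }
    // Done once every symbol has the same value it had on the previous pass.
    const names = Object.keys(current);
    if (names.every((name) => current[name] === previous[name])) {
      return { symbols: current, passes: pass };
    }
    previous = current;
  }
  throw new Error(`symbols still changing after ${maxPasses} passes`);
}

// "Free" is computed from a label that is only defined further down.
const program = [
  { define: "Free", expr: (sym) => 0x10 - sym("End") },
  { label: "Start" },
  { size: () => 5 },
  { label: "End" },
];
const result = resolveSymbols(program);
```

In this example the first pass guesses `End` as 0, the second pass corrects `Free`, and the third pass confirms that nothing changed. A real assembler would also need to cap or detect oscillating values, which this sketch only handles through `maxPasses`.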

Now here are a few things I'm also basing off of ASM6 but I plan on changing, taking other assemblers and my own use cases into account:

- OVERLAYS: ENUM is still the primary way to declare variables, but in order to facilitate the creation of overlays, ENUMs can optionally be named. This allows for an ENUM to pick up from where another ENUM left off. If two or more ENUMs pick up from the end of the same ENUM, they're effectively overlays. ASM6's ENUM can already kinda be used like that, you just have to define a label at the very end of the ENUM, and use that label as the starting address of another ENUM to keep going from where the first one stopped. The problem with that is that all these extra labels will get exported to label files along with the ones that are actually relevant for debugging. Maybe the solution is not to change ENUM, but to take a cue from ca65 and implement two forms of assignment for symbols, one that marks the symbol as a label (:=) and one that doesn't (=).

- ANONYMOUS LABELS: Anonymous labels are unidirectional in ASM6, and I have had to awkwardly put a + label and a - label on consecutive lines because I needed to access that point both from before and after it. For this reason, I feel like ca65's way is superior - the label is just a colon, and the direction is specified in the reference instead. Matching the number of colons to the number of + and - symbols is a bit error-prone though, especially if you need to add or remove an anonymous label between others that were already there, requiring you to double check and adjust all the nearby references, but I can live with that, especially considering that you're not supposed to abuse anonymous labels in the first place.

- LOCAL LABELS: Local labels start with "@", but having their scope delimited by non-local labels is too restrictive IMO. I constantly write subroutines with multiple entry points, which are obviously defined via global labels, and having those global labels break the scope of the local labels that should be visible in the entire subroutine is a major annoyance. To fix this, scopes now must be explicitly started with the SCOPE directive. Unlike in ca65, scopes are not blocks, you can only end a scope by starting a new one, meaning it's not possible to have nested scopes. Scopes can be named, which allows their local labels to be accessed from the outside (e.g. SomeScope.@LocalVariable).

- REPEATED LABELS: Several NES mappers require reset stubs and common subroutines to be repeated across multiple banks, and this is a problem when assembling a program because labels can't be repeated. One way to work around that in ASM6 is to define labels by assigning the PC to a symbol (e.g. MyLabel = $), since symbols can be reassigned without errors. However, I created a problem when I introduced the SCOPE directive, because now I have to get around repeated scope names too. The only solution I could think of was to ignore repeated scope names, and create a nameless scope whenever a repeated name is supplied. This will cause any duplicates to be essentially invisible to the rest of the program.
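
As a rough illustration of the SCOPE rules described above (scopes only end when a new one starts, global labels don't break them, named scopes expose their locals, repeated names become nameless), here is a small JavaScript sketch. The class `ScopeTable` and its method names are hypothetical, and local references are resolved against whichever scope is current at the time of the lookup, which is an assumption about how the assembler would process lines.

```javascript
class ScopeTable {
  constructor() {
    this.globals = new Map();
    this.named = new Map();   // scope name -> Map of its local labels
    this.current = new Map(); // locals of the scope being assembled
  }

  // SCOPE directive: the only way to end a scope is to start another one.
  startScope(name) {
    this.current = new Map();
    // A repeated name yields a nameless scope, so the first one stays visible.
    if (name && !this.named.has(name)) this.named.set(name, this.current);
  }

  define(label, address) {
    const table = label.startsWith("@") ? this.current : this.globals;
    if (table.has(label)) throw new Error(`duplicate label: ${label}`);
    table.set(label, address);
  }

  // Accepts "Global", "@Local" or "ScopeName.@Local".
  resolve(reference) {
    const dot = reference.indexOf(".@");
    if (dot > 0) {
      const scope = this.named.get(reference.slice(0, dot));
      return scope?.get(reference.slice(dot + 1));
    }
    const table = reference.startsWith("@") ? this.current : this.globals;
    return table.get(reference);
  }
}

const table = new ScopeTable();
table.startScope("UpdatePlayer");
table.define("UpdatePlayer", 0x8000);
table.define("@Loop", 0x8004);
table.define("UpdatePlayerNoInput", 0x8010); // second entry point, same scope
const stillVisible = table.resolve("@Loop");

table.startScope("UpdatePlayer");            // repeated name: nameless scope
table.define("@Loop", 0x9004);
```

After the second `startScope`, `UpdatePlayer.@Loop` still refers to the first copy, while a plain `@Loop` refers to the copy in the nameless scope.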

And finally, a few things that ASM6 doesn't have at all:

- ZP ADDRESSING OVERRIDING: ASM6 uses ZP addressing whenever possible, but when you're writing timed code, you may want to access ZP locations using absolute addressing. Maybe an address size modifier like in ca65 (a:address) is the answer.

- MEMORY BANKS: Keeping track of what's where when dealing with bank switching is a big annoyance, and doing that manually is just too error-prone. My solution is to simply create a BANK directive that you can use to set the current bank number, causing any subsequent labels to be assigned that bank number. This information can then be extracted from the labels whenever necessary. The same numbers can be used over and over, since you may need to index PRG-ROM banks, CHR-ROM banks, RAM banks, and so on.

- FUNCTIONS: As far as I can tell, ASM6 doesn't have any built-in functions, and doesn't offer any means for users to create their own. User defined functions are probably outside of the scope of my simple assembler, but a few built-in functions could be really useful. There will certainly be a function to extract bank numbers from labels, and maybe a function to test whether macro parameters are blank or not.

- TEXT OUTPUT: ASM6 can only output error messages, but other kinds of messages are also very important. The OUT directive can output text without aborting the assembly process. Since this is a multi-pass assembler, text output must be buffered during each pass, and only text generated during the last pass must be displayed.

- CHARACTER MAPPING: Mapping characters to specific indices is important because we usually have few tiles to dedicate to text so we can't afford to be slaves of the ASCII encoding. The idea here is to use a directive to define an index, and then the character to put at that index. The reason to supply the parameters in this order is that after the index, you can supply multiple characters and strings, and the index will auto-increment to accommodate as many characters as necessary (e.g. CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D).
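
The auto-incrementing CHARMAP idea could look roughly like this in JavaScript. The function names `applyCharmap` and `encodeText` are hypothetical, and treating a bare number such as $0D as a character code to be mapped is an assumption about what the directive would mean.

```javascript
// Maps each supplied character to consecutive indices, starting at startIndex.
function applyCharmap(map, startIndex, ...items) {
  let index = startIndex;
  for (const item of items) {
    // Assumption: a number is a raw character code, a string is a run of characters.
    const chars = typeof item === "number" ? [String.fromCharCode(item)] : [...item];
    for (const ch of chars) map.set(ch, index++);
  }
  return index; // next free index
}

// Converts text to tile indices using the current mapping.
function encodeText(map, text) {
  return [...text].map((ch) => {
    if (!map.has(ch)) throw new Error(`unmapped character: ${JSON.stringify(ch)}`);
    return map.get(ch);
  });
}

// Equivalent of: CHARMAP $00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", $0D
const charmap = new Map();
const nextIndex = applyCharmap(charmap, 0x00, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", " .!?", 0x0d);
const encoded = encodeText(charmap, "HI!");
```

With this mapping, "HI!" encodes to the indices 7, 8 and 28, and the carriage return ends up at index 30.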

These are all my ideas so far. If anyone was brave enough to read through all of this, please share your opinions on what I have so far. Am I missing something important? Am I doing something in a dumb way? Please comment.


PostPosted: Tue Oct 09, 2018 11:32 pm
Joined: Tue Feb 07, 2017 2:03 am
Posts: 629
I don't know much about JS, but isn't Node about servers and remote communication etc.? I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing Node/Docker or some other overly convoluted web thing.

Having bidirectional anonymous labels seems overkill, and more work than it is worth. They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label; that is what they are for. It will be much more readable, and far less error-prone than trying to balance the forwards and backwards. It's just pointlessly cryptic.

For what you want, N passes will be needed. In Tass I generally hit about 7 passes. You will need to be able to make an internal intermediate form, so that you can work out how big something needs to be by assembling it, working out what is a 16-bit address, an 8-bit address etc., and then doing the maths to pull it back from the end of the bank etc.

ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs
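
A minimal JavaScript sketch of how an assembler might pick between the two encodings, for LDA only. The function names are hypothetical, and accepting both the `~` prefix shown above and ca65's `a:` prefix is an assumption for illustration. The opcodes are the standard 6502 ones ($A5 for LDA zero page, $AD for LDA absolute).

```javascript
// Returns the operand size in bytes: 1 for zero page, 2 for absolute.
function operandSize(operandText, value) {
  // Assumed override prefixes: "~" or "a:" force absolute addressing.
  if (/^(~|a:)/.test(operandText)) return 2;
  // Unknown value (e.g. a forward reference): assume absolute until resolved.
  if (value === undefined) return 2;
  return value >= 0 && value <= 0xff ? 1 : 2;
}

function encodeLda(operandText, value) {
  if (operandSize(operandText, value) === 1) return [0xa5, value];
  return [0xad, value & 0xff, (value >> 8) & 0xff]; // little-endian address
}

const zeroPage = encodeLda("$00", 0x00);  // 2 bytes, 3 cycles on a 6502
const forced = encodeLda("~$00", 0x00);   // 3 bytes, 4 cycles on a 6502
const absolute = encodeLda("$0300", 0x0300);
```

Since the forced form changes the instruction length, the override has to be known before addresses are assigned in each pass.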

CHARACTER MAPPING: having multichar literals is also really handy, {copyright} for example, or {heart}.

It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway.
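
The "doesn't trash" check can be sketched in a few lines of JavaScript, assuming the assembler has already collected, per routine, the variables it writes and the routines it calls. The data layout (`writes`, `calls`, `preserves`) and the function names are hypothetical.

```javascript
// Collects everything a routine writes, directly or through the routines it calls.
function transitiveWrites(routines, name, seen = new Set()) {
  if (seen.has(name)) return new Set(); // already visited (also guards recursion)
  seen.add(name);
  const routine = routines[name];
  if (!routine) return new Set();       // unknown callee: nothing recorded
  const writes = new Set(routine.writes);
  for (const callee of routine.calls) {
    for (const variable of transitiveWrites(routines, callee, seen)) {
      writes.add(variable);
    }
  }
  return writes;
}

// Reports every routine that modifies a variable it promised to preserve.
function checkPreserved(routines) {
  const violations = [];
  for (const [name, routine] of Object.entries(routines)) {
    const writes = transitiveWrites(routines, name);
    for (const variable of routine.preserves ?? []) {
      if (writes.has(variable)) violations.push(`${name} trashes ${variable}`);
    }
  }
  return violations;
}

const routines = {
  DrawSprite: { writes: ["Temp0"], calls: ["Multiply"], preserves: ["Temp1"] },
  Multiply: { writes: ["Temp1", "Product"], calls: [] },
};
const violations = checkPreserved(routines);
```

Here `DrawSprite` never writes `Temp1` itself, but the check still flags it because `Multiply` does.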


PostPosted: Wed Oct 10, 2018 12:05 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
Oziphantom wrote:
I don't know much about JS, but isn't Node about servers and remote communication etc. I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing node/docker or some other overly convoluted web thing.

I'm only using Node to run the .js file locally as a command line application (e.g. node.exe assembler.js source.asm game.nes), it's not much different from using Python, PHP, or any other scripting language. Also, Node has a file system library that really helps with reading/writing files, one of JavaScript's weaknesses.

Quote:
Having bidirectional anonymous labels seems overkill, and more work than it is worth.

I think that the anonymous labels in ca65 are in fact easier to implement than those in ASM6, not that there's a huge difference though. Since the complexity is equivalent, I'd rather go with the method I find more useful.

Quote:
They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label that is what they are for.

It may not be very common, but even in tiny tight loops there are cases when I need to jump to the middle from both the top and the bottom, and I don't feel like naming a label in such a little piece of logic.

Quote:
For what you want, N pass will be needed. In Tass I generally hit about 7 passes. As you will need to be able to make an internal intermediate form, that you can work out how big something needs to be, by assembling it, working out what is a 16bit address, 8 bit address etc then doing the maths to pull it back from the end of the bank etc.

Exactly. Changing address sizes is the most complicated part IMO.

Quote:
ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs

Really? I've never seen that...

Quote:
CHARACTER MAPPING having multichar literals is also really handy {copyright} for example {heart}

Hum... that's interesting.

Quote:
It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway.

I like this kind of automation. Have you seen this implemented anywhere?

Thanks for the tips.


PostPosted: Wed Oct 10, 2018 12:13 am
Joined: Tue Jun 24, 2008 8:38 pm
Posts: 2124
Location: Fukuoka, Japan
@Oziphantom

Node.js is often the backend for Electron-based apps, so it doesn't mean that you use a server per se.

@Tokumaru

I took the time to read but didn't answer right away (I'm half asleep at the wheel today ^^;;) but for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.

As for Bank, I guess it's just a concept with no actual size? With cc65 segments you know how big they are, and the map file tells you how much is used, which is useful when looking at what is used inside that segment/bank etc.

As for local labels, I don't think you should have to write the @ when writing the name of the scope too. In ca65 you just write scope::localLabelName and it works fine. The @ seems superfluous and should only be used when defining the label so the parser knows that it's local.

I may have more comments later, when I'm less sleepy :D


PostPosted: Wed Oct 10, 2018 12:46 am
Joined: Sun Mar 27, 2016 7:56 pm
Posts: 176
When it comes to local labels, I'm personally partial to the style used in NASM and FASM for x86. Local labels look like .foo, but can also be referred to as bar.foo globally, where bar is the first global label before .foo. So for instance, a subroutine with two entry points could look something like this:
Code:
entry1:
    ...
    ...
    bne entry2.loop
    ...
entry2:
    ...
.loop:
    ...
    ...
    beq .loop
    rts

Of course, whether it's too annoying to have to write entry2.loop or not is up to you.
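
For reference, the bookkeeping behind this style is small. Below is a JavaScript sketch, with hypothetical function names and made-up addresses, that qualifies each dot-prefixed label with the last global label seen before it.

```javascript
// definitions: label definitions in source order, each { name, address }.
function collectLabels(definitions) {
  const symbols = new Map();
  let lastGlobal = "";
  for (const { name, address } of definitions) {
    if (name.startsWith(".")) {
      symbols.set(lastGlobal + name, address); // ".loop" becomes "entry2.loop"
    } else {
      lastGlobal = name;
      symbols.set(name, address);
    }
  }
  return symbols;
}

// currentGlobal is the last global label before the line doing the lookup.
function lookupLabel(symbols, reference, currentGlobal) {
  const full = reference.startsWith(".") ? currentGlobal + reference : reference;
  return symbols.get(full);
}

const symbols = collectLabels([
  { name: "entry1", address: 0x8000 },
  { name: "entry2", address: 0x8010 },
  { name: ".loop", address: 0x8014 },
]);
const fromEntry1 = lookupLabel(symbols, "entry2.loop", "entry1");
const fromEntry2 = lookupLabel(symbols, ".loop", "entry2");
```

A plain `.loop` written under `entry1` would not resolve here, which is why the first branch in the example has to spell out `entry2.loop`.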


PostPosted: Wed Oct 10, 2018 12:49 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 629
tokumaru wrote:
Oziphantom wrote:
I don't know much about JS, but isn't Node about servers and remote communication etc. I would think you would want to keep this neat JS such that you can just run it in a command line or a browser page locally and not have to install a bunch of other stuff to get it to work. Maybe Node adds something really useful for processing data on a local machine, but it would be best to avoid needing node/docker or some other overly convoluted web thing.

I'm only using Node to run the .js file locally as a command line application (e.g. node.exe assembler.js source.asm game.nes), it's not much different from using Python, PHP, or any other scripting language. Also, Node has a file system library that really helps with reading/writing files, one of JavaScript's weaknesses.

cscript.exe will run JS as well without most people having to install something is all.

tokumaru wrote:
Quote:
Having bidirectional anonymous labels seems overkill, and more work than it is worth.

I think that the anonymous labels in ca65 are in fact easier to implement than those in ASM6, not that there's a huge difference though. Since the complexity is equivalent, I'd rather go with the method I find more useful.

Quote:
They are for small loops and simple things. If you have a case where you want to go forward and backwards to the same location, use a local label that is what they are for.

It may not be very common, but even in tiny tight loops there are cases when I need to jump to the middle from both the top and the bottom, and I don't feel like naming a label in such a little piece of logic.

yeah but _l or in your case @l is just as much to type basically ;)

tokumaru wrote:
Quote:
For what you want, N pass will be needed. In Tass I generally hit about 7 passes. As you will need to be able to make an internal intermediate form, that you can work out how big something needs to be, by assembling it, working out what is a 16bit address, 8 bit address etc then doing the maths to pull it back from the end of the bank etc.

Exactly. Changing address sizes is the most complicated part IMO.

Quote:
ZP ADDRESSING OVERRIDING typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs

Really? I've never seen that...

NES assemblers seem to have been written in vacuums by people who haven't used a traditional assembler. .w or @w are the other common methods, .w being a "They programmed for the 68K" identifier ;)

Quote:
Quote:
CHARACTER MAPPING having multichar literals is also really handy {copyright} for example {heart}

Hum... that's interesting.

Quote:
It seems what people want is a static analyzer, this way you can add directives for 'doesn't trash' allowing you to make sure that the code you jsr doesn't trash some shared variable. Building a list of things code modifies is pretty easy for an assembler as you have to resolve the labels anyway.

I like this kind of automation. Have you seen this implemented anywhere?

Thanks for the tips.

I've implemented it as a post-process static analyser. The hardest part was reversing all of the TASS64 output to get back to the original code and building a call tree; once armed with this info, it was trivial to check. Then I started to get fancy with my coding and it just broke my parser. I've just spent a good week or so getting my new debugging format working, so I have source code that shows the whole function as I step, and it shows me the 'local' variables. I'm going to add this "analyzer" code to it again; it was kind of handy.


PostPosted: Wed Oct 10, 2018 12:53 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
Banshaku wrote:
for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.

I use anonymous labels very responsibly... I don't want to litter my code with labels for obvious things like zero checks, loop points and the like, so that's what I use anonymous labels for. I have little blocks like this:

Code:
  ;clear page RAM $03
  ldx #$00
  txa
: sta $0300,x
  inx
  bne :-

This is such a small piece of logic that the intent is beyond obvious, so there's no point in littering the place with dumb labels like "@Loop", "@Skip" and the like. I'll hardly jump more than 5 lines to an anonymous label, it's all very compact and with a clear comment at the top explaining what the whole block of code below is for.

Quote:
As for Bank, I guess it's just a concept with no actual size?

Yeah, it's just a number that gets attached to labels so I can easily know what bank to map in to access something. For example, I can do this for CHR banks:

Code:
  .bank $20
  .org $0000
PlayerRunning:
  .incbin "player-running.chr"

  .bank $21
  .org $0000
PlayerJumping:
  .incbin "player-jumping.chr"

And then I can just use .bank(PlayerRunning) whenever I need to reference the bank that contains the player's running graphics (and PlayerRunning >> 4 to get the offset to the first tile), and I'm free to rearrange the tiles around and move them to different banks without the fear of breaking any references.
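
Internally, this amounts to storing one extra number next to each label. A minimal JavaScript sketch, with a hypothetical class name and a hypothetical $0400 offset for the second label so the tile arithmetic is visible:

```javascript
class BankedLabels {
  constructor() {
    this.currentBank = 0;
    this.labels = new Map();
  }

  // BANK directive: affects every label defined from here on.
  bank(number) {
    this.currentBank = number;
  }

  define(name, address) {
    this.labels.set(name, { address, bank: this.currentBank });
  }

  get(name) {
    const entry = this.labels.get(name);
    if (!entry) throw new Error(`undefined label: ${name}`);
    return entry;
  }

  // Backs a built-in function such as .bank(Label).
  bankOf(name) {
    return this.get(name).bank;
  }

  addressOf(name) {
    return this.get(name).address;
  }
}

const chr = new BankedLabels();
chr.bank(0x20);
chr.define("PlayerRunning", 0x0000);
chr.bank(0x21);
chr.define("PlayerJumping", 0x0400); // hypothetical offset inside the bank

const jumpBank = chr.bankOf("PlayerJumping");
const jumpTile = chr.addressOf("PlayerJumping") >> 4; // 16 bytes per tile
```

Because the bank is just a tag on the label, the same numbers can be reused for PRG-ROM, CHR-ROM and RAM banks without the table needing to know which kind it is.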

Quote:
With cc65 segments you know how big they are, and the map file tells you how much is used, which is useful when looking at what is used inside that segment/bank etc.

I did consider making the BANK directive more complex, where you could define the size of the bank in addition to its number, but in the end I figured that would kill some of the flexibility that I like so much. You don't need to set every size and every address in advance, if you just do your .ORGs and .BASEs right, everything will work just fine. With a multi-pass assembler you can use symbol math for almost anything, even calculating the amount of free space in each bank.

Quote:
As for local labels, I don't think you should have to write the @ when writing the name of the scope too. In ca65 you just write scope::localLabelName and it works fine. The @ seems superfluous and should only be used when defining the label so the parser knows that it's local.

I don't know, I kinda consider the "@" as part of the name. It's true that in ca65 you access local labels like scope::localLabel, but in ca65, ALL labels are local inside a scope, not only those beginning with "@". I don't know if you can even access a cheap local label from the outside in ca65, but if you can, I bet the "@" is needed. But anyway, scopes in my assembler will be much simpler than those in ca65.

Quote:
I may have more comments later, when I'm less sleepy :D

Great!


PostPosted: Wed Oct 10, 2018 12:55 am
Joined: Thu Aug 20, 2015 3:09 am
Posts: 424
Before I say anything else, I'm going to quickly plug flat assembler. I'm pretty sure there's more than one assembler by that name, so please check the link! It's a self-hosted x86 assembler, but the second version, fasmg, is a generic macro assembler - you implement your own instruction sets with macros. There's already at least one 6502 implementation on the forums.

I'm mentioning it because it's extremely well-designed. The code is a wall of commentless x86 assembly code, but the author has posted a description of its internal workings, which is well-worth the read if you're writing your own assembler. I'd link it directly but I'm kind of pressed for time right now.

A few highlights relevant to tokumaru's post:

MULTIPLE PASSES

Fasm performs multiple passes of everything but macro expansion. Conditionals, for loops, while loops, error messages and user-specified debug output all "just work" as though the values were right to begin with. This is useful for much more than symbol resolving.

MACROS

tokumaru wrote:
Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler.

I strongly disagree with this. Macros can automate a lot of tedium, not just repetitive code. Implementing a good macro system can save you having to implement a whole slew of other features at all.

Fasm's macros are fairly simple: they take a comma-separated list of arguments. The last one may optionally be variable-length. The directives "common", "forward" and "reverse" cause the section afterwards to be emitted once, once for each variable argument, and once for each variable argument in reverse, respectively. So you can do this:
Code:
macro setall value, [dest]
{
common
lda value
forward
sta dest
}

setall #0, a, b, c, d

There is one more directive, "local", which makes all instances of the listed names local to each expansion of the macro. Fasm lets you redefine constants (at the cost of disabling forward references to that constant), and macros are often used to accumulate values. The results can then be assigned to a single-use label, allowing forward references. This can do the same thing as an "enum", without requiring any extra code in the assembler.

OVERLAYS

Fasm generalizes "enum" functionality into "virtual" blocks (which is a much better name IMHO). These take a starting address and turn off the output for the duration, without affecting any labels or constants generated. The "named enums" you describe can be implemented with ordinary labels inside virtual blocks. Multiple passes take care of the rest.

LOCAL LABELS

Fasm starts local labels with a dot. Local labels are automatically visible outside their scope, as a suffix of the last global label. Clearly not what you want, but I find it convenient, so I mention the idea here for completeness.

REPEATED LABELS

Fasm allows you to load data from any address at assembly time with the "load" operator. You can use this with a loop to copy code into as many places as you like, wrap it in a macro to automate it, put the first copy inside a virtual section so it's never emitted, XOR encode the copies etcetera, all without any explicit support from the assembler itself.

ZP ADDRESSING OVERRIDING

Fasm lets you override the type of a variable, the length of a jump and so on by simply prefixing it with the new type - it would look something like lda zeropage $03 or lda absolute $03 for 6502 code. When the size is not specified fasm defaults to the smallest instruction length.

FUNCTIONS

Macros and redefinable labels/variables can do anything functions can do, except recursion.

TEXT OUTPUT

Fasm has a "display" directive to output arbitrary text. It runs at assembly time and thus prints (or doesn't print) according to the result of the final pass.
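
One way a multi-pass assembler could get this "only the final pass prints" behavior is to discard buffered messages at the start of every pass. The JavaScript below is a sketch under that assumption, with hypothetical names; it is not a description of fasm's internals.

```javascript
class MessageBuffer {
  constructor() {
    this.lines = [];
  }

  // Called at the start of every pass: earlier output is thrown away.
  beginPass() {
    this.lines = [];
  }

  // Backs a directive such as OUT or display.
  out(text) {
    this.lines.push(text);
  }

  // Called once, after the last pass.
  flush(write = console.log) {
    for (const line of this.lines) write(line);
  }
}

function runPasses(passCount, assemblePass, messages = new MessageBuffer()) {
  for (let pass = 1; pass <= passCount; pass++) {
    messages.beginPass();
    assemblePass(pass, messages);
  }
  return messages;
}

// Every pass emits a message, but only the last one survives.
const messages = runPasses(3, (pass, log) => log.out(`message from pass ${pass}`));
const shown = [];
messages.flush((line) => shown.push(line));
```

Error messages that abort assembly would bypass this buffer, since they need to be shown regardless of which pass produced them.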


I have more to say (and I've been ninja'd) but I'm about to lose power again. Sorry in advance if I messed up.


PostPosted: Wed Oct 10, 2018 1:06 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 629
tokumaru wrote:
Banshaku wrote:
for now the only comment is just a personal one regarding anonymous labels: I never use them because they are prone to errors and mask the intention of the code. Code that looks fine to you may make no sense to another person, and in 6 months you won't remember it either. But this is your assembler and you add what you like, so that part is up to you.

I use anonymous labels very responsibly... I don't want to litter my code with labels for obvious things like zero checks, loop points and the like, so that's what I use anonymous labels for. I have little blocks like this:

Code:
  ;clear page RAM $03
  ldx #$00
  txa
: sta $0300,x
  inx
  bne :-

This is such a small piece of logic that the intent is beyond obvious, so there's no point in littering the place with dumb labels like "@Loop", "@Skip" and the like. I'll hardly jump more than 5 lines to an anonymous label, it's all very compact and with a clear comment at the top explaining what the whole block of code below is for.

We are not talking about that case; yes, in that case you use - and there are no issues there. We are talking about the
Code:
  ldx #$00
- txa
  and #40
  bne +
  sta $0300,x
+
-
  inx
  sta $0300,x
  dex
  bpl -
  bmi --
case you were referring to. To which my point was do this
Code:
  ldx #$00
- txa
  and #40
  bne @l
  sta $0300,x
@l
  inx
  sta $0300,x
  dex
  bpl @l
  bmi -


PostPosted: Wed Oct 10, 2018 1:18 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
Nicole wrote:
When it comes to local labels, I'm personally partial to the style used in NASM and FASM for x86. Local labels look like .foo, but can also be referred to as bar.foo globally, where bar is the first global label before .foo.

That's actually where I got the idea from, but since I absolutely need to be able to define global labels from a local scope, I went with the explicit creation of scopes via a dedicated directive, rather than implicitly creating scopes with each global label.

I don't see anything wrong with the "." notation per se, but since I'm used to having leading dots in assembler directives, that might make the code look confusing and harder to parse... I'm really used to dots meaning directives and @s meaning local labels.

Quote:
So for instance, a subroutine with two entry points could look something like this:
Code:
entry1:
    ...
    ...
    bne entry2.loop
    ...
entry2:
    ...
.loop:
    ...
    ...
    beq .loop
    rts

I guess it's doable, but it's weird. It's not a matter of being annoying to type, it's just that this is supposed to be a single block. Semantically, it doesn't make much sense.


PostPosted: Wed Oct 10, 2018 1:52 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
Oziphantom wrote:
cscript.exe will run JS as well without most people having to install something is all.

It's also about 128 times slower, it seems. I've used cscript.exe in the past, but was less than impressed with its file system library (I remember having to use hacks in order to work with binary files!) and its performance. I also think it's badly outdated. Plus it's Windows only. Node.js on the other hand has the newest JavaScript features, tons of libraries, and I can even run it on my phone and develop anywhere. On Windows it's just a 15 MB download, and you don't even have to install anything, just decompress the .zip and use it. In today's world, where everything has to be installed and configured after GBs of downloads, and every piece of software thinks it owns your entire PC, I consider that a win!

Quote:
To which my point was do this
Code:
  ldx #$00
- txa
  and #40
  bne @l
  sta $0300,x
@l
  inx
  sta $0300,x
  dex
  bpl @l
  bmi -

Yeah, but that break is ugly, it makes a single task look like 2 tasks, it affects readability for me. I'd much rather do this:

Code:
  ;do something
  ldx #$00
: txa
  and #40
  bne :+
  sta $0300,x
: inx
  sta $0300,x
  dex
  bpl :-
  bmi :--

That's just a matter of personal preference.
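
For what it's worth, resolving ca65-style references is mostly a matter of counting. Here is a JavaScript sketch that works on source line numbers; the function name is hypothetical, and treating a label on the same line as the reference as "behind" it is an assumption.

```javascript
// labelLines: sorted line numbers of every ":" label.
// refLine: line number of the instruction containing the reference.
// reference: ":+", ":++", ":-", ":--", ...
function resolveAnonymous(labelLines, refLine, reference) {
  const match = /^:(\++|-+)$/.exec(reference);
  if (!match) throw new Error(`not an anonymous reference: ${reference}`);
  const distance = match[1].length;
  if (match[1][0] === "+") {
    const ahead = labelLines.filter((line) => line > refLine);
    return ahead[distance - 1]; // undefined if there are too few labels ahead
  }
  // Assumption: a label on the reference's own line counts as the nearest one behind.
  const behind = labelLines.filter((line) => line <= refLine);
  return behind[behind.length - distance];
}

// Line numbers of the block above, counting "ldx #$00" as line 0:
// 1 ": txa", 3 "bne :+", 5 ": inx", 8 "bpl :-", 9 "bmi :--"
const labelLines = [1, 5];
const skipTarget = resolveAnonymous(labelLines, 3, ":+");
const innerLoop = resolveAnonymous(labelLines, 8, ":-");
const outerLoop = resolveAnonymous(labelLines, 9, ":--");
```

In this example `:+` and `:-` both land on the `: inx` line, and `:--` lands on the `: txa` line, which matches the intent of the block.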


PostPosted: Wed Oct 10, 2018 2:17 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
Rahsennor wrote:
Before I say anything else, I'm going to quickly plug flat assembler.

Thanks for the link, I'll check it out. Seeing what other assemblers do really helps.

Quote:
tokumaru wrote:
Macros are meant for consolidating repetitive assembly code, not for extending the functionality of the assembler.

I strongly disagree with this. Macros can automate a lot of tedium, not just repetitive code. Implementing a good macro system can save you having to implement a whole slew of other features at all.

I agree with you in general, what I said was just describing how macros will be in my assembler, very bare-bones. While I agree that macros can be incredibly useful for a number of purposes, I don't have the time to make a complex assembler. I experimented a lot with ca65 macros, and have done a lot with them, but nearly all of that work was to implement the features I'm describing here, and since they'll be built-in this time around, I don't need a complex macro system right now.

Quote:
Fasm's macros are fairly simple: they take a comma-separated list of arguments. The last one may optionally be variable-length. The directives "common", "forward" and "reverse" cause the section afterwards to be emitted once, once for each variable argument, and once for each variable argument in reverse, respectively.

Sounds interesting, but not very intuitive to read!
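
To make the quoted description a bit more concrete, here is a rough JavaScript model of the three section kinds. It only follows the wording of the quote above and is not fasm's actual syntax or implementation; `expandSections`, the `mode`/`emit` fields and the sample arguments are all made up for illustration.

```javascript
// Hypothetical model of a macro body split into sections: "common" runs once,
// "forward" once per variable argument, "reverse" once per variable argument
// in reverse order.
function expandSections(sections, varArgs) {
  const lines = [];
  for (const { mode, emit } of sections) {
    if (mode === "common") {
      lines.push(emit(varArgs.join(",")));
    } else if (mode === "forward" || mode === "reverse") {
      const ordered = mode === "reverse" ? [...varArgs].reverse() : varArgs;
      for (const arg of ordered) lines.push(emit(arg));
    } else {
      throw new Error(`Unknown section mode: ${mode}`);
    }
  }
  return lines;
}

// Example: push three values, then pull them back in reverse order.
const expanded = expandSections(
  [
    { mode: "common", emit: (all) => `; saving ${all}` },
    { mode: "forward", emit: (arg) => `lda ${arg} / pha` },
    { mode: "reverse", emit: (arg) => `pla / sta ${arg}` },
  ],
  ["$10", "$11", "$12"]
);
```

Here `expanded` holds one comment line, three push lines in argument order, and three pull lines in the opposite order.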

Quote:
OVERLAYS

Fasm generalizes "enum" functionality into "virtual" blocks (which is a much better name IMHO). These take a starting address and turn off the output for the duration, without affecting any labels or constants generated. The "named enums" you describe can be implemented with ordinary labels inside virtual blocks. Multiple passes take care of the rest.

Sounds just like ENUM! I was not a big fan of the name "enum", but I guess it makes sense, since it's effectively just incrementing a counter after each symbol.
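
As a sketch of that "incrementing a counter" idea, an ENUM or virtual block could be modeled in JavaScript roughly as below. The function name, the `[name, size]` entry format and the variable names in the example are assumptions for illustration only.

```javascript
// Hypothetical sketch: an ENUM/virtual block assigns addresses to symbols
// without emitting any bytes to the output.
function defineEnum(start, entries) {
  const symbols = {};
  let counter = start;
  for (const [name, size] of entries) {
    symbols[name] = counter; // the symbol takes the current counter value
    counter += size;         // then the counter advances by the reserved size
  }
  return { symbols, end: counter };
}

// Example: three made-up variables laid out starting at $0300.
const ram = defineEnum(0x0300, [
  ["PlayerX", 1],
  ["PlayerY", 1],
  ["Score", 2],
]);
```

In this example `PlayerX` lands at $0300, `PlayerY` at $0301, `Score` at $0302, and the counter ends at $0304.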

Quote:
LOCAL LABELS

Fasm starts local labels with a dot. Local labels are automatically visible outside their scope, as a suffix of the last global label. Clearly not what you want, but I find it convenient, so I mention the idea here for completeness.

I kinda like this system, but for multiple entry points, I prefer having explicit control over when scopes start and end, rather than let the global labels define that.

Quote:
REPEATED LABELS

Fasm allows you to load data from any address at assembly time with the "load" operator. You can use this with a loop to copy code into as many places as you like, wrap it in a macro to automate it, put the first copy inside a virtual section so it's never emitted, XOR encode the copies etcetera, all without any explicit support from the assembler itself.

This is really cool. Being able to replicate the binary data is a pretty interesting idea.

Quote:
ZP ADDRESSING OVERRIDING

Fasm lets you override the type of a variable, the length of a jump and so on by simply prefixing it with the new type - it would look something like lda zeropage $03 or lda absolute $03 for 6502 code. When the size is not specified fasm defaults to the smallest instruction length.

That's a good approach, similar to how ca65 does it.

Quote:
FUNCTIONS

Macros and redefinable labels/variables can do anything functions can do, except recursion.

But certain things are way more convenient when done inline, like lda #BANK(SomeLabel).
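
A minimal sketch of how such an inline function could be evaluated, assuming the symbol table stores a bank number next to each label's address. The table contents, the `builtins` map and `evalImmediate` are hypothetical names for illustration.

```javascript
// Hypothetical symbol table: each label records its address and the bank it lives in.
const symbols = new Map([["SomeLabel", { address: 0x8123, bank: 3 }]]);

// Hypothetical built-in functions; only BANK comes from the discussion above.
const builtins = new Map([["BANK", (symbol) => symbol.bank]]);

// Evaluate an immediate operand of the form "#NAME(Label)".
function evalImmediate(operand) {
  const match = /^#(\w+)\((\w+)\)$/.exec(operand);
  if (!match) throw new Error(`Unsupported operand: ${operand}`);
  const [, fn, label] = match;
  if (!builtins.has(fn)) throw new Error(`Unknown function: ${fn}`);
  if (!symbols.has(label)) throw new Error(`Unknown label: ${label}`);
  return builtins.get(fn)(symbols.get(label)) & 0xff; // immediates are one byte
}

const bankByte = evalImmediate("#BANK(SomeLabel)");
```

Keeping the functions in a table would make it easy to add others later without touching the expression parser.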

Quote:
Sorry in advance if I messed up.

That's fine. Thanks for bringing in new ideas.


PostPosted: Wed Oct 10, 2018 2:55 am 

Joined: Thu Aug 20, 2015 3:09 am
Posts: 424
Got a few more minutes.

tokumaru wrote:
I don't have the time to make a complex assembler.

That's exactly my motivation for wanting good (!= complex) macros in my assembler. Along with the multipass stage, they can cover many of the features I'd otherwise have to hardcode.

tokumaru wrote:
Sounds interesting, but not very intuitive to read!

You get used to it, and it looks better with proper formatting. But yes, even Tomasz has expressed regret at the syntax and changed it in fasmg. I don't remember how the new version works though, and I didn't have time to write a proper example or copypaste tabs.

tokumaru wrote:
But certain things are way more convenient when done inline, like lda #BANK(SomeLabel).

Oh, I see what you mean now. I'd just build banks into the assembler and not worry about general-purpose functions. Fasm has no functions, only operators, and a few platform-specific features, like allowing label addresses to be relative to a register (very useful for the stack). Banks would fall into that category, I would think.


PostPosted: Wed Oct 10, 2018 5:59 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20854
Location: NE Indiana, USA (NTSC)
tokumaru wrote:
Quote:
CHARACTER MAPPING
having multichar literals is also really handy, for example {copyright} or {heart}

Hum... that's interesting.

That's possible in rgbasm from RGBDS, an assembler targeting the Game Boy CPU.
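
A rough sketch of what a character map with multichar literals could look like in a JavaScript assembler. This is not how rgbasm implements it; the map contents, the tile numbers and `encodeText` are invented for the example.

```javascript
// Hypothetical character map: single characters and multi-character {names}
// both map to tile numbers (the values here are made up).
const charmap = new Map([
  ["A", 0x0a],
  ["B", 0x0b],
  ["{copyright}", 0x24],
  ["{heart}", 0x25],
]);

function encodeText(text, map) {
  const bytes = [];
  // Match either a {name} literal or any single character.
  for (const token of text.match(/\{[^{}]*\}|[\s\S]/g) ?? []) {
    if (!map.has(token)) throw new Error(`Unmapped character: ${token}`);
    bytes.push(map.get(token));
  }
  return bytes;
}

const encoded = encodeText("A{heart}B", charmap);
```

With this map, `encoded` is the three tile numbers for `A`, `{heart}` and `B`, in that order.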

Oziphantom wrote:
tokumaru wrote:
I'm only using Node to run the .js file locally as a command line application

cscript.exe will run JS as well, without most people having to install anything, is all.

Last I checked, cscript.exe was exclusive to Microsoft Windows. I don't run Windows on my primary dev machine; nor does calima. Are there tips for writing a script to make it work on both cscript.exe (for users of Windows) and Node.js (for users of GNU/Linux and macOS)?

Oziphantom wrote:
tokumaru wrote:
Quote:
ZP ADDRESSING OVERRIDING
typically one uses the ~ character.
LDA $00 -> ZP
LDA ~$00 -> Abs

Really? I've never seen that...

NES assemblers seem to have been written in vacuums by people who haven't used a traditional assembler. .w or @w are the other common methods, .w being a "They programmed for the 68K" identifier ;)

ca65 has lda $00 for zero page, lda a:$00 for absolute, and lda f:$00 for 65816-exclusive absolute long. In my opinion, 68000's .w is completely different because it specifies data size, whereas $00 vs. a:$00 vs. f:$00 is about address size. And I'd recommend against ~$00 notation because ~ is already in use to mean one's complement.
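
To illustrate the address-size idea, here is a small JavaScript sketch that picks an operand size from an optional prefix and otherwise defaults to the smallest size that fits. It handles only hexadecimal literals, the `z:` prefix is included here as an assumption alongside `a:` and `f:`, and `addressSize` is a hypothetical helper, not ca65's implementation.

```javascript
// Hypothetical sketch: pick the address size for an operand such as "$00",
// "a:$00" or "f:$00". Sizes are in bytes: 1 = zero page, 2 = absolute, 3 = long.
function addressSize(operand) {
  const match = /^(?:([zaf]):)?\$([0-9a-fA-F]+)$/.exec(operand);
  if (!match) throw new Error(`Unsupported operand: ${operand}`);
  const [, prefix, hex] = match;
  const value = parseInt(hex, 16);
  const forced = { z: 1, a: 2, f: 3 }[prefix]; // undefined when no prefix is given
  // Without a prefix, default to the smallest size that can hold the value.
  const smallest = value <= 0xff ? 1 : value <= 0xffff ? 2 : 3;
  if (forced !== undefined && forced < smallest) {
    throw new Error(`${operand} does not fit in ${forced} byte(s)`);
  }
  return { value, bytes: forced ?? smallest };
}

const zeroPage = addressSize("$00");   // defaults to zero page
const absolute = addressSize("a:$00"); // forced to absolute
const long = addressSize("f:$00");     // forced to absolute long
```

Since the prefix is part of the operand rather than a unary operator, it does not collide with `~` meaning one's complement.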


PostPosted: Wed Oct 10, 2018 6:25 am 

Joined: Thu Mar 31, 2016 11:15 am
Posts: 441
You're never going to finish a NES game if you spend all your time making tools :P

With that said, I'd like it if all labels used the anonymous label +/- syntax. For example, if you have multiple labels with the same name, you can use +/- to distinguish them:

Code:
jmp foo:+
foo:
jmp foo:++
foo:
foo:
jmp foo:---


In other words, have labels and anonymous labels behave the same. Don't special case either.
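
One way this could be resolved, sketched in JavaScript under the assumption that each label sits alone on its own source line, as in the example above. `resolveRelative` is a hypothetical helper, and an empty name stands for an anonymous label.

```javascript
// Hypothetical sketch: every label, named or not, can be referenced as
// "name:+", "name:--", etc. Returns the line index of the targeted definition.
function resolveRelative(lines, refLine, ref) {
  const match = /^(\w*):(\++|-+)$/.exec(ref);
  if (!match) throw new Error(`Not a relative reference: ${ref}`);
  const [, name, signs] = match;
  // Collect the line indices of every definition of this label.
  const defs = [];
  lines.forEach((line, i) => {
    if (line.trim() === `${name}:`) defs.push(i);
  });
  const candidates = signs[0] === "+"
    ? defs.filter((i) => i > refLine)
    : defs.filter((i) => i < refLine).reverse();
  const target = candidates[signs.length - 1];
  if (target === undefined) throw new Error(`No target for ${ref}`);
  return target;
}

// The example above, one statement per line (indices 0 to 5).
const source = ["jmp foo:+", "foo:", "jmp foo:++", "foo:", "foo:", "jmp foo:---"];
const first = resolveRelative(source, 0, "foo:+");
const second = resolveRelative(source, 2, "foo:++");
const third = resolveRelative(source, 5, "foo:---");
```

Under these rules the first jump targets the `foo:` on line 1, the second targets the last `foo:` on line 4, and the third goes all the way back to line 1.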

