Writing my own assembler

Hangin10
Posts: 37
Joined: Thu Jun 04, 2009 9:07 am

Re: Writing my own assembler

Post by Hangin10 »

yeah, I was confusing his assembler feature with the talk of the global jmp/label ret.
zzo38
Posts: 1096
Joined: Mon Feb 07, 2011 12:46 pm

Re: Writing my own assembler

Post by zzo38 »

I use Node.js for command-line programs too (it doesn't need to be used for servers or GUI; as mentioned before, it is just another programming language and you could also use Python, Perl, or PHP). I think that Windows Script Host does not implement many ES6 features though? If you are writing an assembler in JavaScript you will likely want byte arrays.

I like the relative labels in Knuth's MIXAL and MMIXAL. If a label name is a digit and then H then you can access the nearest such label backward or forward by the number and then B or F respectively. (MMIXAL is strange and uses only a single pass though; forward references are resolved at load time instead. However, the same relative label format could be used in multi-pass assemblers too.)
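As a rough illustration of how this kind of relative label could be resolved in a multi-pass assembler, here is a minimal JavaScript sketch. The function name `resolveRelative` and the shape of the `defs` list are made up for the example, and treating a reference on the same line as a definition as "neither before nor after" is just a choice made here, not a claim about how MIXAL or MMIXAL behave.

```javascript
// Hypothetical sketch: resolve relative labels ("2H" defines, "2B"/"2F" refer).
// `defs` lists every "nH" definition in source order: { digit, index, address },
// where `index` is the source position (e.g. line number) of the definition.
function resolveRelative(defs, ref, refIndex) {
  const match = /^(\d)([BF])$/.exec(ref);
  if (!match) throw new Error(`not a relative label reference: ${ref}`);
  const digit = Number(match[1]);
  const same = defs.filter((d) => d.digit === digit);
  const found =
    match[2] === "B"
      ? same.filter((d) => d.index < refIndex).pop() // last definition before the reference
      : same.find((d) => d.index > refIndex); // first definition after the reference
  if (!found) throw new Error(`no matching ${digit}H for ${ref}`);
  return found.address;
}

const defs = [
  { digit: 1, index: 0, address: 0x8000 },
  { digit: 2, index: 3, address: 0x8006 },
  { digit: 1, index: 7, address: 0x800e },
];
console.log(resolveRelative(defs, "1B", 5).toString(16)); // nearest 1H before line 5
console.log(resolveRelative(defs, "1F", 5).toString(16)); // nearest 1H after line 5
```

The sketch assumes all definitions and their addresses are already known, which is what a second pass would give you.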

I also like the nonstandard syntax used in NESASM/MagicKit (indirect addressing uses square brackets, and zero-page addressing is explicit), but maybe you prefer the standard syntax.

I also tend to use macros to define jump tables and so on, rather than doing them manually, meaning a simple macro system might not do (although if the assembler is written in JavaScript, it would be possible to support extensions also written in JavaScript without too much difficulty).
(Free Hero Mesh - FOSS puzzle game engine)
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Writing my own assembler

Post by tokumaru »

zzo38 wrote:it is just another programming language and you could also use Python, Perl, or PHP.
Exactly. You download the interpreter and run your script through it; it's the same thing.
zzo38 wrote:I think that Windows Script Host does not implement many ES6 features though?
That's what I meant by "outdated" a few posts ago. I think it's an old version of JavaScript, with little support for binary data and file manipulation.
zzo38 wrote:If you are writing an assembler in JavaScript you will likely want byte arrays.
Not only that, but since Node.js is a popular tool that's actively maintained, there are several modules for all kinds of things you might need.
zzo38 wrote:I also like the nonstandard syntax used in NESASM/MagicKit (indirect addressing uses square brackets, and zero-page addressing is explicit), but maybe you prefer the standard syntax.
While square brackets for indirection make a lot of sense in assembly (more than parentheses, I agree), there's just too much 6502 code out there using a standard that's probably as old as the CPU itself, and a change like that causes unnecessary confusion IMO.
zzo38 wrote:I also tend to use macros to define jump tables and so on, rather than doing them manually, meaning a simple macro system might not do
I often use macros to help with this kind of thing too.
zzo38 wrote:(although if the assembler is written in JavaScript, it would be possible to support extensions also written in JavaScript without too much difficulty)
Yeah, I'm considering allowing user-defined JavaScript functions. Nothing is more versatile than a full programming language at your disposal.
gauauu
Posts: 779
Joined: Sat Jan 09, 2016 9:21 pm
Location: Central Illinois, USA

Re: Writing my own assembler

Post by gauauu »

tokumaru wrote:Yeah, I'm considering allowing user-defined JavaScript functions. Nothing is more versatile than a full programming language at your disposal.
I like that idea a lot. So many times I'm torn between trying to wrestle macros into doing something that would be easier with a full programming language, and saying "forget it" and just running my own custom pre-processor (written in perl or python or something) over my code. This could be the best of both worlds.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Writing my own assembler

Post by Oziphantom »

Drakim wrote:Darn, this thread is just making me itch to take a stab at making my assembler as well. I just might give it a go. :D

I've had some ideas floating around for a long time, so I figure I might share them here if you are interested. I'm not sure if all of these ideas are realistic, I haven't tried to implement them myself anywhere yet.

1. By far my most common label is a @Return: label in front of a nearby RTS statement. Maybe it's my style of coding, but I find that subroutines always have some branching conditions that exit early. So I realized it would be pretty nifty if I could just write a branch like this:

Code: Select all

LDA MyVar
BEQ RTS
STA MyVar2
And the assembler would just treat a nearby RTS statement as an on-the-fly label destination (or throw an error if there are none within range), so you wouldn't have to put a @Return: label there. It's not really any new kind of functionality, just a sort of "auto-label" thing to make the code less verbose.

2. Sometimes as a programmer you can do things that the assembler can actually see are stupid. Like putting code in a bank that uses a label from another bank (which is on the same page). Or if you have an absolute instruction with an Int instead of a label, which I'm guessing in 99.99% of cases is just the programmer forgetting a # symbol before the Int. It would be neat if the assembler could tell you about such mistakes.

3. Labels do a LOT of different jobs in asm code. They act as entry points for subroutines. They act as starting offsets for data tables. And they act as holders of constant gameplay values. Sometimes I wish there was a way to mark a label as to what kind of job it does, and have the assembler throw an error at me if I'm trying to use it in a different way. So you can't JSR GHOST_ID since the label holds a constant value and not an address to a subroutine, and you can't LDA GHOST_INIT since the label holds an address to a subroutine. (Obviously sometimes you need to do tricky things like an RTS trampoline, so there needs to be a way to tell the assembler to not go bananas over it on specific lines.)
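
For what it's worth, idea 1 could be sketched roughly like this in JavaScript. This is only an illustration: `findNearestRts` and the `program` list are hypothetical names, and it assumes instruction addresses are already fixed by an earlier pass. A 6502 relative branch is 2 bytes and its signed 8-bit offset is counted from the instruction that follows it, so the reachable range is -128..+127 from there.

```javascript
// Hypothetical sketch of the "BEQ RTS" idea: pick the nearest RTS in branch range.
// `program` is a list of already-placed instructions: { address, mnemonic }.
function findNearestRts(program, branchAddress) {
  const next = branchAddress + 2; // offsets are relative to the instruction after the branch
  let best = null;
  for (const ins of program) {
    if (ins.mnemonic !== "RTS") continue;
    const offset = ins.address - next;
    if (offset < -128 || offset > 127) continue; // out of reach for a relative branch
    if (best === null || Math.abs(offset) < Math.abs(best.offset)) {
      best = { address: ins.address, offset };
    }
  }
  if (best === null) throw new Error("no RTS within branch range");
  return best;
}

const program = [
  { address: 0x8000, mnemonic: "LDA" }, // LDA MyVar (absolute, 3 bytes)
  { address: 0x8003, mnemonic: "BEQ" }, // BEQ RTS
  { address: 0x8005, mnemonic: "STA" }, // STA MyVar2 (absolute, 3 bytes)
  { address: 0x8008, mnemonic: "RTS" },
];
const target = findNearestRts(program, 0x8003);
console.log(target.address.toString(16), target.offset);
```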
I've been asking Soci for 1 for a year and a half, but we didn't really come up with a nice way to do it.. that looks perfect :D

Tass64 does 2 and 3 already, see the -Wimmediate and -Wshadow warning options ;)
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Writing my own assembler

Post by Oziphantom »

tokumaru wrote:
zzo38 wrote:I also like the nonstandard syntax used in NESASM/MagicKit (indirect addressing uses square brackets, and zero-page addressing is explicit), but maybe you prefer the standard syntax.
While square brackets for indirection makes a lot of sense in assembly (more than parentheses, I agree), there's just too much 6502 code out there using a standard that's probably as old as the CPU itself, and a change like that causes unnecessary confusion IMO.
Yeah 65816 uses [] as well so you have
lda (zp),y
lda [zp],y
and those mean different things, so not wise to change the brackets, as it may cause issues and get people confused with other 65(X)XX lines.
tokumaru wrote:
zzo38 wrote:(although if the assembler is written in JavaScript, it would be possible to support extensions also written in JavaScript without too much difficulty)
Yeah, I'm considering allowing user-defined JavaScript functions. Nothing is more versatile than a full programming language at your disposal.
See KickAss Assembler: it's kind of a scripting language/assembler hybrid. Nobody really knows what it is, and it kind of became a mess, but there are people who swear by it.
Sogona
Posts: 186
Joined: Thu Jul 23, 2015 7:54 pm
Location: USA

Re: Writing my own assembler

Post by Sogona »

Are you planning on making this open source? :)
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Writing my own assembler

Post by tokumaru »

I don't know, I first have to finish writing the thing, and I still have a long way to go. But being JavaScript, it'd probably be simpler to share the source than to package the native code generated by the V8 engine. I heard that there are tools to do that, but the performance is worse than simply using Node.js. Anyway, if there's any demand for it, I'll definitely consider it.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Writing my own assembler

Post by tokumaru »

One thing that's been keeping me from moving forward with this is that I can't think of a good syntax to right-align code. In ASM6 you can do it with .ORG and label math, which is a bit cumbersome and pollutes the label table with stuff you don't need, so I really wanted to come up with a dedicated solution. One of the things that came to mind was changing the way .ORG works, so that it sets the PC not only for what comes after it, but also for what comes before it, if the PC is undefined at that point. If that was the case, you could write the following at the beginning of your source file:

Code: Select all

Label:
	jmp Label
	.org $10000
And you'd get this:

Code: Select all

$fffd: jmp $fffd
Then, to be able to right-align code anywhere, all you'd need is a directive to "forget" the PC, so it can be defined by a future .ORG statement:

Code: Select all

	.org $8000
	;code starting at $8000 goes here
	.forgetpc
	;code to right-align to $10000 goes here
	.org $10000
To me that's as clean as it gets, but I don't know what would happen if .BASE, another directive that changes the PC, is used while the PC is undefined. I guess .BASE can also set the PC for the preceding code, but without padding. But what would .PAD do when the PC is undefined? Maybe I should get rid of .PAD and only use .ORG for padding if I really need to.
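
To make the idea concrete, the placement logic could look something like the JavaScript sketch below. It's only a sketch of the proposal, not of how ASM6 or any existing assembler works: `place` and the item format are invented for the example, code sizes are assumed to be known already, and .BASE/.PAD are left out entirely.

```javascript
// Hypothetical sketch of the proposed .org/.forgetpc behaviour.
// Items: { op: "code", name, size } | { op: "org", address } | { op: "forgetpc" }
function place(items) {
  const placed = {}; // name -> start address
  let pc = null; // null means "PC undefined"
  let pending = []; // code seen while the PC is undefined
  for (const item of items) {
    if (item.op === "code") {
      if (pc === null) {
        pending.push(item);
      } else {
        placed[item.name] = pc;
        pc += item.size;
      }
    } else if (item.op === "forgetpc") {
      pc = null;
    } else if (item.op === "org") {
      // Right-align pending code so that it ends exactly at the .org address.
      let start = item.address - pending.reduce((sum, c) => sum + c.size, 0);
      for (const c of pending) {
        placed[c.name] = start;
        start += c.size;
      }
      pending = [];
      pc = item.address;
    }
  }
  if (pending.length > 0) throw new Error("code left without a defined PC");
  return placed;
}

const placed = place([
  { op: "org", address: 0x8000 },
  { op: "code", name: "main", size: 0x100 },
  { op: "forgetpc" },
  { op: "code", name: "stub", size: 3 }, // e.g. Label: jmp Label
  { op: "org", address: 0x10000 },
]);
console.log(placed.main.toString(16), placed.stub.toString(16));
```

The sketch doesn't check whether the right-aligned code overlaps what was placed before .forgetpc; a real implementation would probably want to report that as an error.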

Anyway, can anyone think of better syntax for right-aligning code?
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Writing my own assembler

Post by tokumaru »

I did think of simple solutions for other problems though:

Repeated labels: I will simply allow labels to repeat if they're defined with two colons rather than one (i.e. SomeLabel:: instead of SomeLabel:). This seemed like a good solution because regular labels will keep working the same way, and users can choose which labels can be reassigned. This is also really easy to implement. You just have to be careful when using these labels, because the assembler will not check if the multiple addresses are the same.
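
A minimal JavaScript sketch of that rule, just to show how little is needed. `defineLabel` and the table layout are hypothetical, and "the most recent definition wins" is an assumption made for the example, since the addresses are not compared.

```javascript
// Hypothetical sketch: a label table where "Name::" may repeat but "Name:" may not.
function defineLabel(table, text, address) {
  const match = /^([A-Za-z_]\w*)(::?)$/.exec(text);
  if (!match) throw new Error(`not a label definition: ${text}`);
  const name = match[1];
  const repeatable = match[2] === "::";
  const existing = table.get(name);
  // A repeat is only allowed when both the old and the new definition use "::".
  if (existing && !(existing.repeatable && repeatable)) {
    throw new Error(`label already defined: ${name}`);
  }
  // Assumption: the most recent definition wins; addresses are not compared.
  table.set(name, { address, repeatable });
}

const table = new Map();
defineLabel(table, "Reset::", 0xfff0); // stub in one bank
defineLabel(table, "Reset::", 0xfff0); // same stub repeated in another bank
defineLabel(table, "Main:", 0x8000); // regular label, must stay unique
console.log(table.get("Reset").address.toString(16));
```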

Local label scope: The only real problem I have with local scopes being delimited by global labels is that sometimes you need part of a subroutine to be above the global label that defines the entry point. To solve this in a non-intrusive way, I decided to create a directive that explicitly starts a new scope, but the name of that scope is defined by the next global label that's found. It works like this:

Code: Select all

  .scope ;starts a new scope, but we don't know what the parent label is yet

.return:
  rts

Ignore45: ;oh, so this is the parent label in this scope

  cmp #45
  beq .return
  ;rest of subroutine
If you don't use the new directive, global labels will continue to start new scopes, as usual.
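
Here's roughly how the scope naming could be resolved, as a JavaScript sketch. `qualifyLabels` is a hypothetical helper that only looks at label lines and the .scope directive, and joining names as "Global.local" is just a convention picked for the example.

```javascript
// Hypothetical sketch: qualify local labels, with ".scope" deferring the scope name
// until the next global label is seen.
function qualifyLabels(lines) {
  const result = []; // fully qualified label names, in source order
  let scope = null; // name of the current scope's global label
  let waiting = null; // local labels seen after ".scope", before the global label
  for (const raw of lines) {
    const line = raw.split(";")[0].trim(); // drop comments and indentation
    if (line === ".scope") {
      waiting = [];
      scope = null;
    } else if (/^\.\w+:$/.test(line)) {
      const local = line.slice(0, -1); // e.g. ".return"
      if (waiting !== null) waiting.push(local);
      else if (scope !== null) result.push(scope + local);
      else throw new Error(`local label outside any scope: ${local}`);
    } else if (/^\w+:$/.test(line)) {
      const name = line.slice(0, -1);
      if (waiting !== null) {
        // This global label names the scope opened by ".scope".
        for (const local of waiting) result.push(name + local);
        waiting = null;
      }
      scope = name;
      result.push(name);
    }
  }
  return result;
}

const names = qualifyLabels([
  "  .scope ;starts a new scope",
  ".return:",
  "  rts",
  "Ignore45: ;parent label of this scope",
  "  cmp #45",
  "  beq .return",
  ".skip:",
  "Other:",
  ".loop:",
]);
console.log(names);
```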
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Writing my own assembler

Post by koitsu »

I don't know what "right-aligning code" means in this context. To me that just looks like padding used for some form of alignment.

Have you looked at x816's documentation? The implementation/model there should alleviate some of your concerns/issues here, and relieve you of your blocker regarding what to do if someone specifies code before the very first .org directive:

Code: Select all

.ORG
Define origin address.
Sets the starting address of the source file.  X816 will
not assemble any code until this directive is found.
This is really the best choice. Honest. The proposal you have (to allow code specified before the first .org, but based on what that .org line says) makes no sense and will confuse everyone who uses this tool. Likewise, .forgetpc makes absolutely no sense -- there's no need for it, just let .org dictate things, and don't allow people to write actual code before the first .org statement. Problem solved.

A copy of x816's manual is here, along with several other manuals from assemblers. Just remember that x816 was intended for 65816 (which supports 24-bit addressing and native banks), but it should give you some good ideas on how to do things (like .base and how to handle some scope-related bits) -- see x816-v122f-norman-yen.txt: https://drive.google.com/open?id=1kcEKU ... Xcj9vKDRIR

Edit 2020/02/11: update link from Dropbox to pCloud
Edit 2020/06/28: update link from pcloud to Google Drive
Last edited by koitsu on Sun Jun 28, 2020 10:23 pm, edited 2 times in total.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Writing my own assembler

Post by tokumaru »

Right aligning means aligning code to an upper address, useful when you use a mapper that swaps the entire 32KB and you need to simulate a fixed bank near the CPU vectors, containing a reset stub, trampoline routines and other things.

No assembler I know of is equipped to do this easily, so people either use cumbersome hacks or define a constant size for their "fixed" banks, solutions that are far from optimal.

Also, I disagree that my proposed solutions are confusing, because I'm intentionally trying to think of solutions that don't affect the common ways of doing things. Don't like the new directives? Don't use them, and things will behave as they always did (as much as there is a standard for these things, anyway). But even if I was changing things radically, I made it very clear since the beginning that this isn't meant to please anyone, this is mostly for my own use.
gauauu
Posts: 779
Joined: Sat Jan 09, 2016 9:21 pm
Location: Central Illinois, USA

Re: Writing my own assembler

Post by gauauu »

tokumaru wrote:Right aligning means aligning code to an upper address, useful when you use a mapper that swaps the entire 32KB and you need to simulate a fixed bank near the CPU vectors, containing a reset stub, trampoline routines and other things.
This sounds pretty nice, and I could definitely see myself using it. That said, why does the simulated fixed bank need to be at the upper end near the vectors? I just always put mine first thing (i.e. left-aligned). Is there some disadvantage to how I'm doing it? (asking in good faith, not trying to pick nits and argue)
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Writing my own assembler

Post by tokumaru »

gauauu wrote:That said, why does the simulated fixed bank need to be at the upper-end near the vectors? I just always put mine first-thing (ie left-aligned). Is there some disadvantage of how I'm doing it?
To me personally, it makes sense to put the fixed stuff up there because of the CPU vectors, which are in the same category (i.e. things that must be present in all banks), but what seals the deal for me is that I use the beginning of the bank for subroutines with timed code, or data that has to be aligned to memory pages for timing reasons, because it's easier to align code/data to page boundaries there.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Writing my own assembler

Post by tepples »

tokumaru wrote:Repeated labels: I will simply allow labels to repeat if they're defined with two colons rather than one (i.e. SomeLabel:: instead of SomeLabel:).
That might be confused with RGBDS's double colon export syntax. man 5 rgbasm says these are equivalent:

Code: Select all

SomeLabel::
;is the same thing as this
SomeLabel:
  export SomeLabel