All times are UTC - 7 hours

Posted: Sun Aug 11, 2019 11:23 pm
Joined: Sun Jun 30, 2013 7:59 am
Posts: 41
Couldn't think how to phrase the title :/

I'm writing a generic parser for 6502 assembler. The assembler I'm most familiar with is asm6 so it's based on that as much as anything. Anyway, the question I'd like to ask is whether it ever makes sense to accept syntax like:
Code:
LDA $42424242

that is, with the operand length being "excessive". It seems like every time I assume something must be completely wrong, someone provides a use (or should I say, abuse) case. I guess you could maybe use this to force bytes into the binary? In that case though you'd just use a directive.

Just asking because I'm writing the operand parts of the parser and I'm unsure whether to put a hard limit on operand values for instructions -- $FFFF for most addressing modes and $FF for zero page modes. I'm more sure about the latter, because the operand length is practically the definition of the addressing mode, and without it you'd be unable to distinguish between absolute and zero page.

Thanks :)


Posted: Mon Aug 12, 2019 12:11 am
Joined: Wed Nov 30, 2016 4:45 pm
Posts: 146
Location: Southern California
I would think that whatever you're trying to do would be better done with macros.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Posted: Mon Aug 12, 2019 12:42 am
Joined: Sun Jun 30, 2013 7:59 am
Posts: 41
Garth wrote:
I would think that whatever you're trying to do would be better done with macros.
I'm just doing this as an exercise/project. It's a fairly regular syntax to parse, especially compared to the majority of languages.

Edit: I think asm6 does do this?:
Code:
if(opsize[type]==1) {               /* one-byte operand */
    if(!dependant) {
        if(val>255 || val<-128)     /* accepts -128..255 */
            errmsg=OutOfRange;
    }
}


Posted: Mon Aug 12, 2019 2:12 am
Joined: Thu Mar 31, 2016 11:15 am
Posts: 525
You shouldn't allow addresses wider than 16 bits, but you should allow numeric constants to be 32 or 64 bits.


Posted: Mon Aug 12, 2019 3:57 am
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4208
Location: A world gone mad
To be a bit more clear than my above colleagues: 6502 operands technically can only be 0 bytes (ex. nop), 1 byte (ex. lda zp / lda $23 / bcc $e7, which is PC-relative; that's sort of assembly + bytecode combined, so in source it might be bcc $e230 or something like that), or 2 bytes (ex. jmp $800c). That's literally all the CPU supports. There is no other variance. Each addressing mode (and there are only a few) has a set/defined size. These are well-documented across tons of resources, books, everything. So, for your example LDA $42424242, this wouldn't assemble / would generate an out-of-range error or parser error of some sort.
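
As a rough sketch of the size rules above (Python, for illustration only): the mode labels and the helper names `check_operand` and `branch_offset` are made up here, not taken from asm6 or any other assembler. The branch example assumes the bcc sits at $E247, an arbitrary address chosen so the numbers line up with the $e7 / $e230 example.

```python
# Operand size in bytes per 6502 addressing mode (the opcode is always 1 byte).
# The mode names are illustrative labels, not any assembler's identifiers.
OPERAND_SIZE = {
    "implied": 0,
    "accumulator": 0,
    "immediate": 1,
    "zeropage": 1,
    "zeropage_x": 1,
    "zeropage_y": 1,
    "indirect_x": 1,
    "indirect_y": 1,
    "relative": 1,
    "absolute": 2,
    "absolute_x": 2,
    "absolute_y": 2,
    "indirect": 2,
}


def check_operand(mode, value):
    """Return value if it fits the operand field of mode, else raise ValueError."""
    size = OPERAND_SIZE[mode]
    limit = (1 << (8 * size)) - 1  # 0, 0xFF or 0xFFFF
    if not 0 <= value <= limit:
        raise ValueError(f"operand ${value:X} out of range for {mode}")
    return value


def branch_offset(pc, target):
    """Encode a relative branch; pc is the address of the branch opcode."""
    delta = target - (pc + 2)  # the offset counts from the next instruction
    if not -128 <= delta <= 127:
        raise ValueError("branch target out of range")
    return delta & 0xFF  # two's-complement byte stored in the binary
```

With these, `check_operand("absolute", 0x42424242)` raises, `check_operand("absolute", 0x800C)` passes, and `branch_offset(0xE247, 0xE230)` gives $E7.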

I say this with total respect, not judgement: I think you need to spend some more time getting to understand the CPU if you're having to ask this question. Saying you are "familiar with asm6" *should* mean you are familiar with writing 6502 code, but possibly some part of it in your head is "mangled" because you haven't actually looked at assembled results before. Maybe it would be helpful if you looked at the results of a _disassembler_, for learning purposes? Not sure. Since you're familiar with asm6, you should try generating a listings file (-l (lowercase ELL) flag) and look at the raw bytecode generated on a per-instruction basis. Or, well, just read actual CPU documentation... :-)

As for a "generic 6502 parser", you are going to have one hell of a time with this, specifically if you plan on comprehending *human-written assembly* with things like names for labels, equates, macros, and so on. This will probably break your brain, because every assembler is different. Consider variances like how NESASM forces you to use [] brackets for indirect addressing (ex. lda [$40,x]), while pretty much every other assembler since the 70s has used () parentheses (ex. lda ($40,x)). Now consider assembler directives (a.k.a. pseudo-ops), etc. Is it possible? Probably after years of work, but I don't see what would really be gained by this (FYI, that is not me posing a question).


Posted: Mon Aug 12, 2019 4:13 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 750
It depends on how you want to handle cart banks.
While a 6502 cannot see beyond $FFFF, carts can be a lot bigger than that; a 24-bit limit is probably sane enough. When you assemble cart bin files you will need 24-bit addresses. If you have labels that live in a bank, and you allow the PC to be set beyond $FFFF so you can place things at specific points in the bin file (or override points in the bin file to make a patch), you will get STA $XXXXXX cases, and your code will need to convert the $XXXXXX into the correct $XXXX value before it can assemble.


Posted: Mon Aug 12, 2019 1:34 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4208
Location: A world gone mad
Oziphantom wrote:
It depends on how you want to handle cart banks.
While a 6502 cannot see beyond $FFFF, carts can be a lot bigger than that; a 24-bit limit is probably sane enough. When you assemble cart bin files you will need 24-bit addresses. If you have labels that live in a bank, and you allow the PC to be set beyond $FFFF so you can place things at specific points in the bin file (or override points in the bin file to make a patch), you will get STA $XXXXXX cases, and your code will need to convert the $XXXXXX into the correct $XXXX value before it can assemble.

No 6502 assembler works like this (read: some kind of faux-linear-addressing that abstracts out mapper PRG-ROM switching). CPU addressing space is $0000-FFFF. Addressing range in operands cannot exceed 16-bit. You will just confuse the OP with what you've said.


Posted: Mon Aug 12, 2019 2:50 pm
Joined: Sun Jun 30, 2013 7:59 am
Posts: 41
koitsu wrote:
To be a bit more clear than my above colleagues: 6502 operands technically can only be 0 bytes (ex. nop), 1 byte (ex. lda zp / lda $23 / bcc $e7, which is PC-relative; that's sort of assembly + bytecode combined, so in source it might be bcc $e230 or something like that), or 2 bytes (ex. jmp $800c). That's literally all the CPU supports. There is no other variance. Each addressing mode (and there are only a few) has a set/defined size. These are well-documented across tons of resources, books, everything. So, for your example LDA $42424242, this wouldn't assemble / would generate an out-of-range error or parser error of some sort.
That's how I've got it defined, I just wanted to be sure.

koitsu wrote:
I say this with total respect, not judgement: I think you need to spend some more time getting to understand the CPU if you're having to ask this question. Saying you are "familiar with asm6" *should* mean you are familiar with writing 6502 code, but possibly some part of it in your head is "mangled" because you haven't actually looked at assembled results before. Maybe it would be helpful if you looked at the results of a _disassembler_, for learning purposes? Not sure. Since you're familiar with asm6, you should try generating a listings file (-l (lowercase ELL) flag) and look at the raw bytecode generated on a per-instruction basis. Or, well, just read actual CPU documentation... :-)
I'll definitely start looking at listings files if I have any doubts. This wasn't so much not understanding the CPU as it was not understanding whether this kind of thing might be done for some unknown purpose even though it's "technically" incorrect.

Thanks for the advice. I'll admit (not that it was ever in doubt) I'm not the most experienced 6502 programmer, but I do understand how opcodes are placed in the binary w.r.t addressing mode, and how labels are resolved. That's enough for me personally to still enjoy this project and feel like I can make something I'll be proud of, so I'll continue on. I have no illusions of making the best assembler ever, plenty of great ones already exist. I'm having a lot of fun - and learning a lot - doing this.
koitsu wrote:
As for a "generic 6502 parser", you are going to have one hell of a time with this, specifically if you plan on comprehending *human-written assembly* with things like names for labels, equates, macros, and so on. This will probably break your brain, because every assembler is different. Consider variances like how NESASM forces you to use [] brackets for indirect addressing (ex. lda [$40,x]), while pretty much every other assembler since the 70s has used () parentheses (ex. lda ($40,x)). Now consider assembler directives (a.k.a. pseudo-ops), etc. Is it possible? Probably after years of work, but I don't see what would really be gained by this (FYI, that is not me posing a question).
Apologies, I've given completely the wrong impression with the word "generic". Generic as in, not specific. Just for parsing opcodes, operands, labels, and defines. Not my best choice of verbiage - my bad.


Posted: Mon Aug 12, 2019 3:22 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4208
Location: A world gone mad
"Parsing labels and defines" is going to cause you grief. Nothing stops an assembler from evaluating a mathematical expression that results in, say, a 32-bit value. The 6502's addressing modes, however, obviously cannot handle that because registers are only 8 bits, and absolute addressing modes only support a maximum of 16-bit values for addressing, i.e. $0000-FFFF in the CPU's address space, so the programmer must use something like < (asm6 and others) or .LOBYTE() (ca65) to get at the lowest 8-bit piece of the 32-bit value. All of this is done at assemble-time, not run-time! There is no standard for what the maximum size of an expression can be (and I would say most assembler documentation does not disclose this). So, in the end, CPU-wise, the size limits are as I stated: opcodes are always 1 byte in length, operands are either 0 (implied), 1 (immediate, ZP, relative), or 2 bytes (absolute) depending on addressing mode.

The point is: if you're parsing human-written source code, you are in for a massive world of hurt and hair-pulling. There is no "standard" for this, sorry to say; humans are very good at creating ways to solve conundrums or limitations of a tool (assembler, linker, etc.) through other means, so parsing/handling all of that may break your brain. I wish you genuine luck in this endeavour.
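
To make the "lowest 8-bit piece" point above concrete, here is a small Python sketch of what a low-byte and a high-byte operator do to a wide assemble-time value. The function names `lo_byte` and `hi_byte` are hypothetical; they only mimic the effect of `<` / `>` in asm6 or `.LOBYTE()` / `.HIBYTE()` in ca65 and are not those assemblers' code.

```python
def lo_byte(value):
    """Bits 0-7 of an expression result (what a low-byte operator selects)."""
    return value & 0xFF


def hi_byte(value):
    """Bits 8-15 of an expression result (what a high-byte operator selects)."""
    return (value >> 8) & 0xFF


# An assemble-time expression whose result is wider than 16 bits.
value = 0x1234 * 0x100 + 0x56  # = 0x123456

low = lo_byte(value)   # 0x56, fits an immediate or zero-page operand
high = hi_byte(value)  # 0x34
```

Either result fits in one byte, so it passes an 8-bit operand range check even though the expression itself does not.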


Posted: Wed Aug 14, 2019 2:27 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 750
koitsu wrote:
Oziphantom wrote:
It depends on how you want to handle cart banks.
While a 6502 cannot see beyond $FFFF, carts can be a lot bigger than that; a 24-bit limit is probably sane enough. When you assemble cart bin files you will need 24-bit addresses. If you have labels that live in a bank, and you allow the PC to be set beyond $FFFF so you can place things at specific points in the bin file (or override points in the bin file to make a patch), you will get STA $XXXXXX cases, and your code will need to convert the $XXXXXX into the correct $XXXX value before it can assemble.

No 6502 assembler works like this (read: some kind of faux-linear-addressing that abstracts out mapper PRG-ROM switching). CPU addressing space is $0000-FFFF. Addressing range in operands cannot exceed 16-bit. You will just confuse the OP with what you've said.

64Tass allows you to set a 24-bit address range for the output file, which effectively allows you to make a 16MB file; this is how I make 512K CRT files. So while all code is assembled within a 16-bit address limit, the output is positioned into a linear 24-bit file.
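
A sketch of that split between file position and CPU address, in Python. Everything about the layout is an assumption for illustration: 16 KiB banks, each appearing at $8000 when switched in, and no file header. The names `file_offset_to_cpu` and `cpu_to_file_offset` are hypothetical and this is not how 64tass itself is implemented.

```python
BANK_SIZE = 0x4000    # assumed: 16 KiB banks
WINDOW_BASE = 0x8000  # assumed: CPU address where the selected bank appears


def file_offset_to_cpu(offset):
    """Split a linear 24-bit file offset into (bank, 16-bit CPU address)."""
    if not 0 <= offset <= 0xFFFFFF:
        raise ValueError("offset outside the 24-bit file range")
    bank, within = divmod(offset, BANK_SIZE)
    return bank, WINDOW_BASE + within


def cpu_to_file_offset(bank, address):
    """Inverse mapping: where a banked CPU address lands in the output file."""
    if not WINDOW_BASE <= address < WINDOW_BASE + BANK_SIZE:
        raise ValueError("address is not inside the bank window")
    return bank * BANK_SIZE + (address - WINDOW_BASE)
```

Under these assumptions `file_offset_to_cpu(0x01C123)` is bank 7 at $8123; only the $8123 part could ever appear as an instruction operand.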

How does a NES assembler handle a cart > 64K if everything in the file is limited to 64K?


Posted: Wed Aug 14, 2019 2:53 am
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7568
Location: Canada
You only enforce the range of operands, not all expressions.

How do you find a label that's in a bank? Well, the label has that metadata attached somehow by the assembler. It might have an associated ".bank" or ".segment" or some other property like this depending on the assembler. That's not part of the operand, though, so it's not applicable. (If you need to get an associated bank number to write to a banking register, some assemblers have mechanisms for that too. CA65 lets you add a bank number attribute to a segment and retrieve it with a pseudo-function, for example.)

The address of a thing in the file probably isn't really something you'd want to use in a 6502 assembler... I'm not sure of an application for that. For any 6502 instruction operand, you need its address in memory, and that is never more than 16-bit. The platform doesn't do anything larger.


Some assemblers will truncate expressions larger than 16 bits without warning, and probably there are some people who prefer it that way, but personally I love to have range checking and will gladly accept having to manually truncate the rare cases where I need to. It's safer and more explicit. Though, one sticky point here is what to do about signed values... ca65's range checking is unsigned only and it doesn't have a mechanism to turn it off temporarily around code that needs signed stuff. On the other hand, something like allowing -128 to 255 has the converse problem of not catching unsigned values that have underflowed. I'd take either of these compromises over silent truncation in a heartbeat though.
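
The three policies compared above can be put side by side in a short Python sketch for a one-byte operand. The function names are hypothetical, and none of this is the actual code of ca65 or asm6; it only reproduces the ranges mentioned in this thread.

```python
def check_unsigned(value):
    """Unsigned-only policy: accept 0..255, reject everything else."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"{value} does not fit an unsigned byte")
    return value


def check_lenient(value):
    """Accept -128..255, storing negatives as a two's-complement byte."""
    if not -128 <= value <= 0xFF:
        raise ValueError(f"{value} does not fit a byte")
    return value & 0xFF


def truncate(value):
    """Silent truncation: keep the low 8 bits, never complain."""
    return value & 0xFF


count = 0
underflowed = count - 1  # meant to be unsigned, but wrapped below zero
```

Here `check_unsigned(underflowed)` raises, `check_lenient(underflowed)` quietly emits $FF, and `truncate(0x1234)` quietly emits $34; the last two illustrate the kinds of mistakes each compromise lets through.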


Expressions on the other hand should be some practical large type size. 32-bit seems to be common. I wouldn't expect them to be limited to 16 bits unless this was a very old assembler written for an actual 16-bit computer. The extra bits are important when you need to do assemble-time calculations (especially multiplying and dividing). The range check only belongs on instruction operands.
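
A tiny Python illustration of why the extra bits matter: the final result below fits in a byte, but the intermediate product does not fit in 16 bits. The function names and the numbers are made up for the example.

```python
def scale_wide(a, b, c):
    """Evaluate (a * b) / c with a wide intermediate value."""
    return (a * b) // c


def scale_16bit(a, b, c):
    """Same expression, but the intermediate product wraps at 16 bits."""
    return ((a * b) & 0xFFFF) // c


wide = scale_wide(300, 256, 512)      # 76800 // 512
narrow = scale_16bit(300, 256, 512)   # (76800 & 0xFFFF) // 512
```

The wide evaluation gives 150, while the 16-bit one gives 22 because 76800 wraps to 11264 before the division. Range-checking only the final operand, not each intermediate step, keeps the first answer legal.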

koitsu wrote:
There is no standard for what the maximum size of an expression can be (and I would say most assembler documentation does not disclose this).

Well, NESASM and ASM6 don't, but ca65 explicitly documents it. I think I've seen it documented in several assembler manuals, but maybe we're talking about a domain where a lot of assemblers don't have very comprehensive documentation to begin with. :P


Posted: Wed Aug 14, 2019 4:02 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 750
rainwarrior wrote:
Some assemblers will truncate expressions larger than 16 bits without warning, and probably there are some people who prefer it that way, but personally I love to have range checking and will gladly accept having to manually truncate the rare cases where I need to. It's safer and more explicit. Though, one sticky point here is what to do about signed values... ca65's range checking is unsigned only and it doesn't have a mechanism to turn it off temporarily around code that needs signed stuff. On the other hand, something like allowing -128 to 255 has the converse problem of not catching unsigned values that have underflowed. I'd take either of these compromises over silent truncation in a heartbeat though.
I would think the type that you are using should dictate this.
Code:
.byte  <some expression> ; unsigned
.char  <some expression> ; signed
.word  <some expression> ; unsigned 16bits
.sint  <some expression> ; signed 16bits
.addr  <some expression> ; 16bit unsigned, forced to PC limits auto clips bank byte
.rta   <some expression> ; rts return address, 16bits, forced to PC limits auto clips bank byte
.long  <some expression> ; unsigned 24 bits
.lint  <some expression> ; signed 24 bits
.dword <some expression> ; unsigned 32 bits
.dint  <some expression> ; signed 32 bits
For immediate values:
Code:
lda #XX    <- unsigned range 0-255
lda #XXXXX <- unsigned range 0-65535
lda #+XX   <- signed range -128-127
lda #-XX   <- signed range -128-127
lda #+XXXX <- signed range -32768-32767
lda #-XXXX <- signed range -32768-32767
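
One way to read the directive list above is as a table of width and signedness that drives the range check. A minimal Python sketch, assuming little-endian output; the table layout and the `encode` helper are hypothetical, and `.addr` / `.rta` are left out because they also need PC and bank context.

```python
# directive -> (size in bytes, signed?)
DIRECTIVES = {
    ".byte": (1, False),
    ".char": (1, True),
    ".word": (2, False),
    ".sint": (2, True),
    ".long": (3, False),
    ".lint": (3, True),
    ".dword": (4, False),
    ".dint": (4, True),
}


def encode(directive, value):
    """Range-check value for the directive and return little-endian bytes."""
    size, signed = DIRECTIVES[directive]
    try:
        # int.to_bytes raises OverflowError when the value does not fit.
        return value.to_bytes(size, "little", signed=signed)
    except OverflowError:
        raise ValueError(f"{value} out of range for {directive}") from None
```

For example, `encode(".char", -1)` yields the single byte $FF, while `encode(".byte", -1)` is rejected.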


Posted: Wed Aug 14, 2019 11:58 am
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7568
Location: Canada
Yeah, that's something I proposed on the CC65 mailing list in the past (ref), signed data types and some sort of "signed immediate" mechanism. I was thinking that we would need some alternative symbol besides # to indicate a signed immediate, but treating #- and #+ as a digraph that indicates this actually seems like a pretty good way to do it. I say digraph meaning it should only work if they are directly adjacent, so you can still use the negate operator with an unsigned value if you need it:
Code:
#(-o) ; still use an unsigned range check
#-o ; signed
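
The adjacency rule for the digraph can be sketched in a few lines of Python. This is only an illustration of the idea, not a proposal for ca65's scanner; `classify_immediate` and `IMMEDIATE_RANGE` are hypothetical names, and the expression text is returned unevaluated.

```python
IMMEDIATE_RANGE = {
    "unsigned": (0, 255),
    "signed": (-128, 127),
}


def classify_immediate(operand):
    """Return (kind, expression_text) for an immediate operand string.

    A sign counts as part of the digraph only when it directly follows '#'.
    """
    if not operand.startswith("#"):
        raise ValueError("not an immediate operand")
    rest = operand[1:]
    if rest[:1] in ("+", "-"):
        return "signed", rest  # the sign stays part of the expression
    return "unsigned", rest.strip()
```

So `#-5` is classified as signed, while `#(-5)` and `# -5` both fall back to the unsigned range check.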


Anyway, despite this being a longstanding irritation, it's only a minor one for me. (In that linked mailing list post I indicated several workarounds for the same problem.) I'd love to see this feature, but also it doesn't seem enough of a problem for me to implement it myself, so far. The overwhelming majority of my code only needs unsigned types anyway, so occasional exceptions for signed aren't too much of an issue for me.

Incidentally the char type is unsigned on CC65, so it is not an appropriate symbol for a signed byte. (This is allowed by the C spec, and probably an appropriate choice for a 6502 compiler.)

