NORTH
- rainwarrior
- Posts: 8735
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
Re: NORTH
I guess interpreted FORTH should have a form of "self modifying code", because you can redefine old dictionary entries, right? This doesn't seem to apply to this compiled FORTH implementation, though, which appears to have a static dictionary at run-time. It's an interesting language feature to me, at least, even if it's not relevant here.
I can also see how having a separate parameter stack and execution stack makes a lot of argument wrangling very simple compared to most other languages. I guess part of that is only having one "type", as well, though that seems to be as much a help as hindrance...
Writing the smallest possible code would likely involve a bytecode interpreter, again not part of this implementation, but there are other examples of this (e.g. SWEET16).
My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once. You can use a separate parameter stack as a convention if you like. You can make bytecodes if it'll make things smaller. Maybe you would argue that once you have a bytecode interpreter you're not actually writing in "assembly" anymore, but I don't really want to have an argument about the definition, I'm just wondering if there's really something special in FORTH that you were making reference to that I'm not understanding.
I mean, if it was really a statement about code text size, that's fine, I'd agree, I just misunderstood the context because I thought we were talking about binary size. If it was a statement about the "FORTH mindset" producing better code, then that's some subjective thing that I probably don't care to argue about.
At any rate, I'm looking forward to seeing what you make with it. I'd really like to know what a larger well written game in FORTH looks like.
Re: NORTH
rainwarrior, I take it you are addressing multiple people here. My own last post was just giving some background to answer the implied question about why the term "words" is used in Forth. Parts of the Forth approach can be taken even in assembly language though, and you can also cut into and out of assembly at any point in an otherwise Forth program.
One of the Forth threading methods is called "token threading," where all the most commonly used words are given a one-byte token in the compilation instead of the two-byte address of the code to run for each word. Then there must be an interpreter that looks up the execution address from a table at run time. This slows it down but reduces the binary size. Assuming you allow for more than 256 words, at least one byte value must be reserved to tell this interpreter that next you're going to give an address or at least a second token byte, instead of it being a one-byte token. If you implement this, you can still add words at any time.
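To make the escape-byte idea concrete, here's a toy token-threaded inner interpreter, in Python rather than 6502 code. Everything here (the table layout, the choice of 0xFF as the escape byte, the example words) is hypothetical, not how NORTH or any particular Forth does it:

```python
# Toy token-threaded inner interpreter (illustrative sketch only).
# Common words get a one-byte token; the reserved ESCAPE byte says
# "the next two bytes are a full 16-bit execution address".

ESCAPE = 0xFF

def make_interpreter(token_table, address_table):
    """token_table: token byte -> callable; address_table: 16-bit addr -> callable."""
    def run(code):
        ip = 0
        while ip < len(code):
            b = code[ip]
            ip += 1
            if b == ESCAPE:
                addr = code[ip] | (code[ip + 1] << 8)  # little-endian address
                ip += 2
                address_table[addr]()                   # rarely used word: 3 bytes
            else:
                token_table[b]()                        # common word: 1 byte
    return run

# Hypothetical words acting on a shared data stack:
stack = []
token_table = {0x01: lambda: stack.append(1),
               0x02: lambda: stack.append(stack.pop() + stack.pop())}
address_table = {0x1234: lambda: stack.append(stack.pop() * 2)}

run = make_interpreter(token_table, address_table)
run(bytes([0x01, 0x01, 0x02, ESCAPE, 0x34, 0x12]))  # like: 1 1 + 2*
```

The trade-off the post describes shows up directly: the table lookup on every word costs time, but the common words shrink from two (or three) bytes to one.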
You can indeed do self-modifying code in Forth, but the more common way to redefine words is to just write your new version. Anytime there's compilation or there's interpretation of the input stream, FIND searches for the word in the current context, beginning with the most recently defined word in the context, and working backwards. That way the most recently defined version will be found first, ending the search. That one could even be defined in terms of the last previous defined version. If you decide you don't like the new one and you want to go back to the old one, you can just FORGET the new one, and the dictionary pointer is put back to point where it did right before you defined that newest one.
Parenthetical note: FIND normally runs only during compilation or when you're interpreting the text input stream, not when compiled code is running; so if you redefine a word that other already-defined word is using, that word will also have to be redefined to use the new one, or you'll need to modify it. Several approaches are possible, and none of them require re-compiling very much, which is one reason Forth's interactive development goes so fast.
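The newest-first search and FORGET behaviour can be modelled in a few lines. This is a hypothetical Python model of the dictionary, not a real Forth implementation; the word names are made up:

```python
# Sketch of Forth-style dictionary behaviour: the dictionary is searched
# newest-first, so a redefinition shadows the old word, and FORGET trims
# the dictionary back to just before that definition.

dictionary = []  # list of (name, definition) pairs, newest last

def define(name, fn):
    dictionary.append((name, fn))

def find(name):
    # Search from the most recently defined word backwards;
    # the newest definition ends the search.
    for n, fn in reversed(dictionary):
        if n == name:
            return fn
    return None

def forget(name):
    # Drop the newest definition of `name` and everything defined after it,
    # like moving the dictionary pointer back.
    for i in range(len(dictionary) - 1, -1, -1):
        if dictionary[i][0] == name:
            del dictionary[i:]
            return

define("greet", lambda: "hello")
old = find("greet")
define("greet", lambda: old() + ", world")  # new version built on the old one
assert find("greet")() == "hello, world"
forget("greet")                             # back to the previous version
assert find("greet")() == "hello"
```

Note how the redefined word can call the previous version, exactly as described above, because it captured the old definition before shadowing it.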
If desired, you can have more stacks than just the return stack (which is in page 1 on the 6502) and the data/parameter stack (which is normally in page 0 on the 6502). The most common third stack is a floating-point stack (although it's also common to handle floating-point operations on the normal data stack if there's no separate floating-point stack). You could add a complex-number stack, even a string stack, or whatever you like.
rainwarrior wrote: My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once.
Absolutely. Go for it.
http://WilsonMinesCo.com/ lots of 6502 resources
- rainwarrior
- Posts: 8735
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
Re: NORTH
It was mostly a continuation of my previous post, speculating about Pubby's comment about code reuse. I knew the FORTH definition of "word" but maybe your post got me thinking about it a little more.
Why does "token threading" take the name "threading"? Does it have anything to do with threads as concurrently executed code, or does it have a different meaning in this context? What you described sounds like a normal bytecode size optimization (which I've seen in several bytecode formats) -- maybe the extreme case could be a Huffman bit encoding based on frequency of use rather than working at the byte level?
I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources, though most of them seem to talk about how small the interpreter and bytecode implementation are. I can see how it has relatively good value for size of interpreter vs. code utility, at least, as a language with a very small definition. Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore. Being simple to implement is also a big advantage on its own, though, and it's not like we have a good optimizing 6502 compiler for any language at the moment.
I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favour using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.
Re: NORTH
rainwarrior wrote: Why does "token threading" take the name "threading"?
I suppose it's like using a needle to pull a thread through a group of beads, where each "bead" is an instruction. Dr. Brad Rodriguez, a big name in the field of Forth, has an article on five different threading methods at http://www.bradrodriguez.com/papers/moving1.htm . The three most common are indirect-threaded code (ITC, which I use regularly), direct-threaded code (DTC), and subroutine-threaded code (STC). STC is mostly a list of JSRs, but since it's essentially machine language, you can mix in non-JSR code as well, and select your balance of JSRs versus straight-lining for speed. ITC and DTC are mostly lists of addresses, meaning they only take two bytes (instead of three) per instruction. Token threading takes only one byte for all the most commonly used instructions.
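To illustrate the "thread of addresses" idea behind DTC/ITC, here's a rough model in Python rather than 6502 code (purely hypothetical word names; real threaded code jumps through machine addresses, not Python callables):

```python
# Rough model of direct-threaded code: a colon definition compiles to a
# list of "addresses" (here, Python objects), and the inner interpreter
# (NEXT) is just the loop that walks that list. Illustrative sketch only.

stack = []

def dup():  stack.append(stack[-1])
def star(): stack.append(stack.pop() * stack.pop())

square = [dup, star]            # like  : SQUARE  DUP * ;  -- a thread

def execute(thread):
    for word in thread:         # the inner interpreter / NEXT loop
        if callable(word):
            word()              # primitive: run its machine code
        else:
            execute(word)       # colon definition: walk its sub-thread

fourth = [square, square]       # like  : FOURTH  SQUARE SQUARE ;

stack.append(3)
execute(fourth)                 # 3 squared twice
```

Each entry in a thread is one "bead" the interpreter's needle passes through, which is where the name comes from.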
rainwarrior wrote: I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources
It is, but it has also been used for a lot of big systems, including (but not limited to) databases, spacecraft, space-shuttle experiments, airport facilities handling, and hospital & banking management. You can get a fairly powerful system in a small memory footprint too, although the kernel takes a certain amount of memory (probably not less than a few K for a rudimentary one) before the application goes in. A tiny application might be smaller in assembly language since it doesn't need the Forth kernel; but past a certain point, Forth's program-memory savings start to pay off nicely.
rainwarrior wrote: Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore.
You can still get a huge advantage in development time once you get your system set up. I hope I don't sound like I'm trying to make anyone do things any certain way, though. I've used a few algebraic languages, and most recently as I've tried to learn C, I have been realizing that some people's brains seem destined to think one way more than the other (algebraic versus RPN), like it's inborn, not just a matter of background. I definitely do better in RPN.
rainwarrior wrote: Being simple to implement is also a big advantage on its own
True, and probably any mid-level 6502 programmer who wants to apply himself can understand the innards of Forth, including the compiler (which is only a couple hundred bytes or less!).
rainwarrior wrote: and it's not like we have a good optimizing 6502 compiler for any language at the moment.
So true (and I'm especially thinking of cc65).
rainwarrior wrote: I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favour using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.
I've done tests to see how much stack space I was using with the most intensive Forth application I could think of, with IRQs serviced in high-level Forth, plus NMIs going too and serviced in assembly, and it was not even 20% of ZP and page 1. The only truly multitasking projects I've done were cooperative and without a multitasking OS, so they weren't hard on stacks at all. It is easy, though, to do a round-robin cooperative multitasking OS even on the 6502, by dividing the stack space into sections and assigning a section to each task. Then the number of tasks is limited by the stack space. Three would be no sweat at all. Six might be done with care. Beyond that, you'll have to do a lot of analysis, or be copying out sections of dormant tasks to higher memory to make room for active tasks.
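The partitioned-stack scheme can be sketched abstractly. This is a hypothetical Python model (sizes and names invented), not 6502 code: each task owns a fixed slice of one stack area, and a task switch only swaps the saved stack pointer:

```python
# Toy model of round-robin cooperative multitasking by dividing stack
# space into per-task sections. Illustrative sketch with invented sizes.

STACK_SIZE = 256                  # e.g. one page of stack space
NUM_TASKS = 3
SLICE = STACK_SIZE // NUM_TASKS   # each task's fixed section

stack = [0] * STACK_SIZE
# Each task's stack pointer starts at the top of its own slice (grows down).
saved_sp = [t * SLICE + SLICE for t in range(NUM_TASKS)]

def push(task, value):
    saved_sp[task] -= 1
    if saved_sp[task] < task * SLICE:
        raise OverflowError(f"task {task} overran its stack section")
    stack[saved_sp[task]] = value

def pop(task):
    value = stack[saved_sp[task]]
    saved_sp[task] += 1
    return value

# Round-robin: each task does some work, then yields to the next; its
# stack pointer is simply saved in saved_sp between turns.
for task in range(NUM_TASKS):
    push(task, task * 10)
assert [pop(t) for t in range(NUM_TASKS)] == [0, 10, 20]
```

The limit described above falls out naturally: more tasks means smaller sections, so past a handful of tasks you either analyze depth carefully or swap dormant sections out to other memory.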
http://WilsonMinesCo.com/ lots of 6502 resources
Re: NORTH
rainwarrior wrote: speculating about Pubby's comment about code reuse.
Admittedly I wasn't trying to say very much. It was just a personal observation that when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With FORTH, my subroutines nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.
rainwarrior wrote: Similar but maybe less rigidly than something like Haskell?
I think the most enlightening statement I've heard is that stack languages model function composition by default. So "[foo bar qux baz]" can be read as the composition of four functions. You can do this in Haskell too, but it requires a decent amount of type plumbing and operator overloads, and it's not always flexible.
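That reading can be sketched in a few lines of Python (hypothetical word names; a real stack language gets this for free without any wrapper):

```python
# Stack programs as function composition (illustrative sketch).
# Each word is a function from stack to stack, and a program like
# "[foo bar qux baz]" is just their left-to-right composition.

foo = lambda s: s + [1]                     # push 1
bar = lambda s: s + [2]                     # push 2
qux = lambda s: s[:-2] + [s[-2] + s[-1]]    # add the top two items
baz = lambda s: s[:-1] + [s[-1] * 10]       # multiply the top item by 10

def compose(*words):
    # Left-to-right composition: the defining reading of a stack program.
    def program(stack):
        for w in words:
            stack = w(stack)
        return stack
    return program

program = compose(foo, bar, qux, baz)
assert program([]) == [30]                  # (1 + 2) * 10
```

The "type plumbing" point is visible even here: every word must agree on the single stack type, which is exactly the one-type help-and-hindrance mentioned earlier in the thread.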
Re: NORTH
pubby wrote: Admittedly I wasn't trying to say very much. It was just a personal observation that when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With FORTH, my subroutines nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.
Might it be that you're using assembly language for different and less-complex jobs? My last major assembly-language project was on a PIC16 microcontroller, which only has an 8-level return stack, and toward the end I was constantly overrunning it and had to find relevant places where I could straight-line things without running out of program memory (whose limit I was also pushing). If the PIC16 would let you store data on the return stack, I could have used it even quite a lot deeper. (The PIC16 is really mickey-mouse compared to the 6502, BTW.) I don't think my assembly nests any less deep than my Forth; but my Forth work has definitely influenced my assembly.
http://WilsonMinesCo.com/ lots of 6502 resources
-
- Posts: 53
- Joined: Sun Jan 31, 2016 9:55 pm
Re: NORTH
Regarding letting interrupts use the same stack: I don't think you need to do anything fancy. The only rule is that the stack pointer needs to be accurate at all times -- you should never read from an unreserved part of the stack.
Code: Select all
lda STACK-1, x ; not allowed (assuming stack grows down)
But it would be totally fine to read from any part of the reserved stack.
Code: Select all
lda STACK+1, x ; ok
The interrupt handler would need to have a type of ( -- ), that is, it should leave the stack undisturbed once it returns, and not read the current data on the stack. It would also need to save any temporaries the code uses and restore them once it's done.
The interrupt handler could also leave a small red zone: a small (say 2-byte) buffer below the stack data of the normal code. That would allow normal code to use STACK-1, x and STACK-2, x, but no further. This could potentially cut down on the amount of stack pointer manipulation.
Here's the interrupt handler I'm thinking of. Assume for simplicity that x is the stack pointer and STACK only holds 8-bit values. Does this make sense?
Code: Select all
irq:
; save registers
pha
tya
pha
; reserve 2 bytes for temporaries, 2 bytes of red zone
dex
dex
dex
dex
; save temporaries
lda TMP1
sta STACK, x
lda TMP2
sta STACK+1, x
; do stuff... should preserve x by the end
; restore temporaries
lda STACK+1, x
sta TMP2
lda STACK, x
sta TMP1
; free stack space
inx
inx
inx
inx
; restore registers
pla
tay
pla
rti
Re: NORTH
That's true about the stack pointer (extra overhead notwithstanding), but there's at least one more consideration (and maybe more will come to me after I post, LOL). Forth typically has a scratchpad area called N which primitives can use however they like, as long as they are completely done with it when they finish. They cannot use it to pass data to another primitive, or to hold data there for the next time the same primitive runs. On the 6502, 8 bytes seems to be enough for N. It is put in ZP for efficiency, although its being in ZP also means you can use the extra ZP addressing modes in N if necessary. If you allow interrupt service to start without letting the primitive finish, then you'll also need to push all 8 bytes of N. You could have a separate N for interrupts, but then you would be consuming more precious ZP space, and interrupts would not be nestable, meaning you can't have a more-urgent interrupt that would be quick to service cut in on a lower-priority interrupt that may take longer to service.
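The reason saving N is what makes interrupts safe (and nestable) can be modelled abstractly. This is a hypothetical Python sketch of the idea, not 6502 or Forth code:

```python
# Toy model of the shared-scratchpad problem: primitives use a common
# scratch area N freely but are done with it when they finish, so an
# interrupt that also runs primitives must save and restore all of N.

N = [0] * 8          # shared 8-byte scratchpad, as described above

def primitive(value):
    # Uses N however it likes, but passes nothing through it.
    N[0] = value
    return N[0] * 2

def interrupt(value):
    saved = N[:]     # push all 8 bytes of N on entry
    try:
        return primitive(value)   # interrupt service may use primitives too
    finally:
        N[:] = saved              # restore N before returning

# Mid-primitive, N[0] holds foreground data; an interrupt must not clobber it.
N[0] = 7
assert interrupt(21) == 42
assert N[0] == 7     # the interrupted primitive's scratch data survived
```

Because the save/restore is relative to whatever is running, a second, more urgent interrupt can cut in on the first the same way, which is exactly what a separate fixed N for interrupts would rule out.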
http://WilsonMinesCo.com/ lots of 6502 resources
-
- Posts: 53
- Joined: Sun Jan 31, 2016 9:55 pm
Re: NORTH
That makes sense. Saving and restoring 8 bytes on the stack for every interrupt wouldn't be ideal. That's a lot of overhead. My code above assumes only 2 bytes that need to be saved.
Having a separate N also forces you to have different versions of the primitives that use N depending on whether it's an interrupt or not. That would preclude code sharing between interrupt code and normal code.
-
- Posts: 53
- Joined: Sun Jan 31, 2016 9:55 pm
Re: NORTH
There are some trade-offs, but I wonder if the primitives could get by with just 2 bytes of fixed zero page, and a larger red zone? Fixed bytes have a 1 cycle speed advantage, but you can still use the useful (indirect, x) addressing mode in the red zone. If (indirect), y addressing is needed, then they can use the fixed bytes.
The hardest part would be to deal with cases where a primitive needs more than 1 pointer that uses indirect, y addressing, but you could probably work around it.
The code for an interrupt handler that leaves an arbitrary red zone is mostly the same as above. To free the space, it just uses this code instead of a string of inx (to reserve the space, use sec and sbc #redzone instead of the string of dex):
Code: Select all
txa
clc
adc #redzone
tax