I can also see how having a separate parameter stack and execution stack makes a lot of argument wrangling very simple compared to most other languages. I guess part of that is only having one "type", as well, though that seems to be as much a help as a hindrance...
Writing the smallest possible code would likely involve a bytecode interpreter, again not part of this implementation, but there are other examples of this (e.g. SWEET16).
My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once. You can use a separate parameter stack as a convention if you like. You can make bytecodes if it'll make things smaller. Maybe you would argue that once you have a bytecode interpreter you're not actually writing in "assembly" anymore, but I don't really want to have an argument about the definition, I'm just wondering if there's really something special in FORTH that you were making reference to that I'm not understanding.
I mean, if it was really a statement about code text size, that's fine, I'd agree, I just misunderstood the context because I thought we were talking about binary size. If it was a statement about the "FORTH mindset" producing better code, then that's some subjective thing that I probably don't care to argue about.
At any rate, I'm looking forward to seeing what you make with it. I'd really like to know what a larger well written game in FORTH looks like.
One of the Forth threading methods is called "token threading," where all the most commonly used words are given a one-byte token in the compilation instead of the two-byte address of the code to run for each word. Then there must be an interpreter that looks up the execution address from a table at run time. This slows it down but reduces the binary size. Assuming you allow for more than 256 words, at least one byte value must be reserved to tell this interpreter that next you're going to give an address or at least a second token byte, instead of it being a one-byte token. If you implement this, you can still add words at any time.
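To make that concrete, here is a rough sketch of what a token-threading inner interpreter (the Forth NEXT) might look like on the 6502. All the names here (IP, W, TOKENL/TOKENH, TOK_EXT) are made up for illustration, and a real kernel would differ in details:

```asm
; NEXT for a hypothetical token-threaded Forth.
; IP = zero-page pointer to the next token in the compiled word.
; W  = zero-page cell used for the final indirect jump.
; TOKENL/TOKENH = split table of handler addresses, indexed by token.
next:   ldy #0
        lda (ip),y        ; fetch the one-byte token
        inc ip            ; advance IP past it
        bne nocarry
        inc ip+1
nocarry:
        cmp #TOK_EXT      ; reserved value: a full two-byte address
        beq do_extended   ;   (or a second token byte) follows instead
        tax
        lda tokenl,x      ; look up the handler address at run time:
        sta w             ;   this costs cycles, but each compiled
        lda tokenh,x      ;   reference is only one byte
        sta w+1
        jmp (w)           ; jump to the code for this word
```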
You can indeed do self-modifying code in Forth, but the more common way to redefine words is to just write your new version. Anytime there's compilation or there's interpretation of the input stream, FIND searches for the word in the current context, beginning with the most recently defined word in the context, and working backwards. That way the most recently defined version will be found first, ending the search. That one could even be defined in terms of the last previous defined version. If you decide you don't like the new one and you want to go back to the old one, you can just FORGET the new one, and the dictionary pointer is put back to point where it did right before you defined that newest one.
Parenthetical note: FIND normally runs only during compilation or when you're interpreting the text input stream, not when compiled code is running; so if you redefine a word that another already-defined word is using, that other word will also have to be redefined to use the new one, or you'll need to modify it. Several approaches are possible, and none of them require re-compiling very much, which is one reason Forth's interactive development goes so fast.
If desired, you can have more stacks than just the return stack (which is in page 1 on the 6502) and the data/parameter stack (which is normally in page 0 on the 6502). The most common third stack is a floating-point stack (although it's also common to handle floating-point operations on the normal data stack if there's no separate floating-point stack). You could add a complex-number stack, even a string stack, or whatever you like.
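As a sketch of how a third stack might be laid out: with X already taken by the data stack and page 1 by the return stack, an extra stack can keep its own index in memory. FSP and FSTACK are made-up names, and the byte-wide cells are a simplifying assumption (a real floating-point stack would move several bytes per entry):

```asm
; Hypothetical third stack with its own index variable FSP,
; growing downward through the buffer FSTACK.
fpush:  ldy fsp           ; push A onto the extra stack
        dey
        sty fsp
        sta fstack,y
        rts

fpop:   ldy fsp           ; pop the top of the extra stack into A
        lda fstack,y
        inc fsp
        rts
```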
Quote:
My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once.

Absolutely. Go for it.
Why does "token threading" take the name "threading"? Does it have anything to do with threads as concurrently executed code, or does it have a different meaning in this context? What you described sounds like a normal bytecode size optimization (which I've seen in several bytecode formats)-- maybe the extreme case could be a huffman bit encoding based on frequency of use rather than working at the byte level?
I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources, though most of them seem to talk about how small the interpreter and bytecode implementation are. I can see how it has relatively good value for size of interpreter vs. code utility, at least, as a language with a very small definition. Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore. Being simple to implement is also a big advantage on its own, though, and it's not like we have a good optimizing 6502 compiler for any language at the moment.
I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favour using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.
rainwarrior wrote:
Why does "token threading" take the name "threading"?

I suppose it's like using a needle to pull a thread through a group of beads, where each "bead" is an instruction. Dr. Brad Rodriguez, who's a big name in the field of Forth, has an article on five different threading methods at http://www.bradrodriguez.com/papers/moving1.htm . The three most common are indirect-threaded code (or ITC, which I use regularly), direct-threaded code (DTC), and subroutine-threaded code (STC). STC is mostly a list of JSRs, but since it's essentially machine language, you can mix in non-JSR code as well, and select your balance of JSRs versus straightlining for speed. ITC and DTC are mostly lists of addresses, meaning they only take two bytes (instead of three) per instruction. Token threading takes only one byte for all the most commonly used instructions.
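As an illustration of the difference, here is roughly how the same colon definition (say, : SQUARE DUP * ;) might come out under STC versus ITC. All the labels here are made up:

```asm
; Subroutine-threaded (STC): the body is live machine code,
; three bytes per word referenced, run directly by the CPU.
square_stc:
        jsr dup_code
        jsr star_code
        rts

; Indirect-threaded (ITC): the body is a list of two-byte
; addresses, walked by the inner interpreter (NEXT).
square_itc:
        .word dup_cfa     ; DUP
        .word star_cfa    ; *
        .word exit_cfa    ; EXIT, compiled by ;
```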
Quote:
I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources

It is, but it has also been used for a lot of big systems, including (but not limited to) databases, spacecraft, space-shuttle experiments, airport facilities handling, and hospital & banking management. You can get a fairly powerful system in a small memory footprint too, although the kernel takes a certain amount of memory (probably not less than a few K for a rudimentary one) before the application goes in. A tiny application might be smaller in assembly language since it doesn't need the Forth kernel; but past a certain point, Forth's program-memory savings start to pay off nicely.
Quote:
Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore.

You can still get a huge advantage in development time once you get your system set up. I hope I don't sound like I'm trying to make anyone do things any certain way though. I've used a few algebraic languages, and most recently, as I've tried to learn C, I have been realizing that some people's brains seem to be destined to think one way more than the other (algebraic versus RPN), like it's inborn, not just a matter of background. I definitely do better in RPN.
Quote:
Being simple to implement is also a big advantage on its own

True, and probably any mid-level 6502 programmer who wants to apply himself can understand the innards of Forth, including the compiler (which is only a couple hundred bytes or less!).
Quote:
and it's not like we have a good optimizing 6502 compiler for any language at the moment.

So true (and I'm especially thinking of cc65).
Quote:
I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favour using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.

I've done tests to see how much stack space I was using with the most intensive Forth application I could think of, with IRQs serviced in high-level Forth, plus NMIs going too and serviced in assembly, and it was not even 20% of ZP and page 1. The only truly multitasking projects I've done were cooperative and without a multitasking OS, so they weren't hard on stacks at all. It is easy, though, to do a round-robin cooperative multitasking OS even on the 6502, by dividing the stack space into sections and assigning a section to each task. Then the number of tasks is limited by the stack space. Three would be no sweat at all. Six might be done with care. Beyond that, you'll have to do a lot of analysis, or be copying out sections of dormant tasks to higher memory to make room for active tasks.
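A round-robin switch of that kind can be sketched in a few instructions. Assume each task owns a fixed slice of page 1 and of the ZP data stack, and a pair of small tables holds each dormant task's stack pointers; CURTASK, NTASKS, SAVED_SP, and SAVED_X are all illustrative names:

```asm
; Cooperative task switch: each task calls PAUSE when it can
; yield; PAUSE saves this task's stack pointers and resumes the
; next task where it last called PAUSE.
pause:  ldy curtask
        stx saved_x,y     ; save this task's data-stack pointer (X)
        tsx
        txa
        sta saved_sp,y    ; save its return-stack pointer (S)
        iny               ; advance round-robin, wrapping to task 0
        cpy #NTASKS
        bcc switch
        ldy #0
switch: sty curtask
        lda saved_sp,y    ; restore the next task's pointers
        tax
        txs
        ldx saved_x,y
        rts               ; "returns" into the next task's code
```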
rainwarrior wrote:
[speculating about Pubby's comment about code reuse]

Admittedly I wasn't trying to say very much. It was just a personal observation that when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With FORTH, my subroutines nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.
Quote:
Similar but maybe less rigidly than something like Haskell?

I think the most enlightening statement I've heard is that stack languages model function composition by default. So "[foo bar qux baz]" can be read as the composition of four functions. You can do this in Haskell too, but it requires a decent amount of type plumbing and operator overloads, and it's not always flexible.
pubby wrote:
Admittedly I wasn't trying to say very much. It was just a personal observation that when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With FORTH, my subroutines nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.

Might it be that you're using assembly language for different and less-complex jobs? My last major assembly-language project was on a PIC16 microcontroller which only has an 8-level return stack, and toward the end I was constantly overrunning it and had to find relevant places I could straight-line things that wouldn't run me out of program memory (whose limit I was also pushing). If the PIC16 would let you store data on the return stack, I could have used it even quite a lot deeper. (The PIC16 is really mickey-mouse compared to the 6502, BTW.) I don't think my assembly nests any less deep than my Forth; but my Forth work has definitely influenced my assembly.
Code:
lda STACK-1, x ; not allowed (assuming stack grows down)
Code:
lda STACK+1, x ; ok
The interrupt handler could also leave a small red zone.
Basically, it could leave a small (say 2 byte) buffer below the stack data of the normal code. That would allow normal code to use STACK-1, x and STACK-2, x, but no further. This could potentially cut down on the amount of stack pointer manipulation.
Here's the interrupt handler I'm thinking of. Assume for simplicity x is the stack pointer and STACK only holds 8 bit values.
Code:
irq:
        ; save registers
        pha
        tya
        pha
        ; reserve 2 bytes for temporaries, 2 bytes of red zone
        dex
        dex
        dex
        dex
        ; save temporaries
        lda TMP1
        sta STACK, x
        lda TMP2
        sta STACK+1, x

        ; do stuff... should preserve x by the end

        ; restore temporaries
        lda STACK+1, x
        sta TMP2
        lda STACK, x
        sta TMP1
        ; free stack space
        inx
        inx
        inx
        inx
        ; restore registers
        pla
        tay
        pla
        rti
Having a separate N also forces you to have different versions of the primitives that use N depending on whether it's an interrupt or not. That would preclude code sharing between interrupt code and normal code.
The hardest part would be dealing with cases where a primitive needs more than one pointer using (indirect),y addressing, but you could probably work around it.
The code for an interrupt handler that leaves an arbitrary red zone is mostly the same as above. It just uses this code instead of a string of inx or dex:
Code:
txa
clc
adc #redzone    ; to reserve space instead of freeing it, use sec / sbc #redzone
tax