NORTH

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: NORTH

Post by rainwarrior »

I guess interpreted FORTH should have a form of "self modifying code", because you can redefine old dictionary entries, right? This doesn't seem to apply to this compiled FORTH implementation, though, which appears to have a static dictionary at run-time. This is an interesting language feature to me, though, at least, even if not relevant here.

I can also see how having a separate parameter stack and execution stack makes a lot of argument wrangling very simple compared to most other languages. I guess part of that is only having one "type", as well, though that seems to be as much a help as hindrance...

Writing the smallest possible code would likely involve a bytecode interpreter, again not part of this implementation, but there are other examples of this (e.g. SWEET16).

My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once. You can use a separate parameter stack as a convention if you like. You can make bytecodes if it'll make things smaller. Maybe you would argue that once you have a bytecode interpreter you're not actually writing in "assembly" anymore, but I don't really want to have an argument about the definition, I'm just wondering if there's really something special in FORTH that you were making reference to that I'm not understanding.

I mean, if it was really a statement about code text size, that's fine, I'd agree, I just misunderstood the context because I thought we were talking about binary size. If it was a statement about the "FORTH mindset" producing better code, then that's some subjective thing that I probably don't care to argue about.


At any rate, I'm looking forward to seeing what you make with it. I'd really like to know what a larger well written game in FORTH looks like.
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: NORTH

Post by Garth »

rainwarrior, I take it you are addressing multiple people here. My own last post was just giving some background to answer the implied question about why the term "words" is used in Forth. Parts of the Forth approach can be taken even in assembly language though, and you can also cut into and out of assembly at any point in an otherwise Forth program.

One of the Forth threading methods is called "token threading," where all the most commonly used words are given a one-byte token in the compilation instead of the two-byte address of the code to run for each word. Then there must be an interpreter that looks up the execution address from a table at run time. This slows it down but reduces the binary size. Assuming you allow for more than 256 words, at least one byte value must be reserved to tell this interpreter that next you're going to give an address or at least a second token byte, instead of it being a one-byte token. If you implement this, you can still add words at any time.

You can indeed do self-modifying code in Forth, but the more common way to redefine words is to just write your new version. Anytime there's compilation or there's interpretation of the input stream, FIND searches for the word in the current context, beginning with the most recently defined word in the context, and working backwards. That way the most recently defined version will be found first, ending the search. That one could even be defined in terms of the last previous defined version. If you decide you don't like the new one and you want to go back to the old one, you can just FORGET the new one, and the dictionary pointer is put back to point where it did right before you defined that newest one.

Parenthetical note: FIND normally runs only during compilation or when you're interpreting the text input stream, not when compiled code is running; so if you redefine a word that other already-defined word is using, that word will also have to be redefined to use the new one, or you'll need to modify it. Several approaches are possible, and none of them require re-compiling very much, which is one reason Forth's interactive development goes so fast.

If desired, you can have more stacks than just the return stack (which is in page 1 on the 6502) and the data/parameter stack (which is normally in page 0 on the 6502). The most common third stack is a floating-point stack (although it's also common to handle floating-point operations on the normal data stack if there's no separate floating-point stack). You could add a complex-number stack, even a string stack, or whatever you like.
My main thought here is just that nothing is out of bounds in assembly. You can use everybody's tricks at once.
Absolutely. Go for it.
http://WilsonMinesCo.com/ lots of 6502 resources
User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: NORTH

Post by rainwarrior »

It was mostly a continuation of my previous post, speculating about Pubby's comment about code reuse. I knew the FORTH definition of "word" but maybe your post got me thinking about it a little more.


Why does "token threading" take the name "threading"? Does it have anything to do with threads as concurrently executed code, or does it have a different meaning in this context? What you described sounds like a normal bytecode size optimization (which I've seen in several bytecode formats)-- maybe the extreme case could be a huffman bit encoding based on frequency of use rather than working at the byte level? ;)


I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources, though most of them seem to talk about how small the interpreter and bytecode implementation are. I can see how it has relatively good value for size of interpreter vs. code utility, at least, as a language with a very small definition. Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore. Being simple to implement is also a big advantage on its own, though, and it's not like we have a good optimizing 6502 compiler for any language at the moment. :P

I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favours using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: NORTH

Post by Garth »

rainwarrior wrote:Why does "token threading" take the name "threading"?
I suppose it's like using a needle to pull a thread through a group of beads, where each "bead" is an instruction. Dr. Brad Rodriguez who's a big name in the field of Forth has an article on five different threading methods at http://www.bradrodriguez.com/papers/moving1.htm . The three most common are indirect-threaded code (or ITC) which I use regularly), direct-threaded code (DTC), and subroutine-threaded code (STC). STC is mostly a list of JSR's, but since it's essentially machine language, you can mix in non-JSR code as well, and select your balance of JSRs versus straightlining for speed. ITC and DTC are mostly lists of addresses, meaning they only take two bytes (instead of three) per instruction. Token threading takes only one byte for all the most commonly used instructions.
I am seeing a lot of things around claiming that FORTH is good for "small" systems with limited resources
It is, but it has also been used for a lot of big systems, including (but not limited to) database, spacecraft, space shuttle experiments, airport facilities handling, and hospital & banking management. You can get a fairly powerful system in a small memory footprint too, although the kernel takes a certain amount of memory (probably not less than a few K for a rudimentary one) before the application goes in. A tiny application might be smaller in assembly language since it doesn't need the Forth kernel; but past a certain point, Forth's program memory savings start to pay off nicely.
Though, since this is a static compilation project, and not using a bytecode interpreter, I'm not sure it compares the same way anymore.
You can still get a huge advantage in development time once you get your system set up. I hope I don't sound like I'm trying to make anyone do things any certain way though. I've used a few algebraic languages, and most recently as I've tried to learn C, I have been realizing that some people's brains seem to be destined to think one way more than the other (algebraic versus RPN), like it's inborn, not just a matter of background. I definitely do better in RPN.
Being simple to implement is also a big advantage on its own
True, and probably any mid-level 6502 programmer who wants to apply himself can understand the innards of Forth, including the compiler (which is only a couple hundred bytes or less!).
and it's not like we have a good optimizing 6502 compiler for any language at the moment. :P
So true (and I'm especially thinking of cc65).
I can also see how the language's constraints push you toward many short "word" definitions, reusing them hierarchically with the return stack, and favours using the stacks to solve most of your problems. Similar but maybe less rigidly than something like Haskell? Of course, a deep stack has a significant memory footprint here as well when you've got less than 2k to work with. I'd imagine the rigid structure is well suited to static analysis for compiler optimization too, but again there's nothing to compare against anyway.
I've done tests to see how much stack space I was using with the most intensive Forth application I could think of, with IRQs serviced in high-level Forth, plus NMIs going too and serviced in assembly, and it was not even 20% of ZP and page 1. The only truly multitasking projects I've done were cooperative and without a multitasking OS, so they weren't hard on stacks at all. It is easy though to do a round-robin cooperative multitasking OS even on the 6502 though, by dividing the stack space into sections and assigning a section to each task. Then the number of tasks is limited by the stack space. Three would be no sweat at all. Six might be done with care. Beyond that, you'll have to do a lot of analysis, or be copying out sections of dormant tasks to higher memory to make room for active tasks.
http://WilsonMinesCo.com/ lots of 6502 resources
User avatar
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

Re: NORTH

Post by pubby »

rainwarrior wrote:speculating about Pubby's comment about code reuse.
Admittedly I wasn't trying to say very much. It was just a personal observation then when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With, FORTH, my subroutine nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.
Similar but maybe less rigidly than something like Haskell?
I think the most enlightening statement I've heard is that stack languages model function composition by default. So "[foo bar qux baz]" can be read as the composition of four functions. You can do this in Haskell too, but it requires a decent amount of type plumbing, operator overloads, and it's not always flexible.
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: NORTH

Post by Garth »

pubby wrote:Admittedly I wasn't trying to say very much. It was just a personal observation then when I write assembly, my subroutines rarely nest more than 2 or 3 layers deep. With, FORTH, my subroutine nest much deeper. So maybe that saves bytes, maybe it doesn't. Probably it falls into the "subjective" category.
Might it be that you're using assembly language for different and less-complex jobs? My last major assembly-language project was on a PIC16 microcontroller which only has an 8-level return stack, and toward the end I was constantly overrunning it and had to find relevant places I could straight-line things that wouldn't run me out of program memory (whose limit I was also pushing). If the PIC16 would let you store data on the return stack, I could have used it even quite a lot deeper. (The PIC16 is really mickey-mouse compared to the 6502, BTW.) I don't think my assembly nests any less deep than my Forth; but my Forth work has definitely influenced my assembly.
http://WilsonMinesCo.com/ lots of 6502 resources
User avatar
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

Re: NORTH

Post by pubby »

Garth wrote:Might it be that you're using assembly language for different and less-complex jobs?
That's true, too.
russellsprouts
Posts: 53
Joined: Sun Jan 31, 2016 9:55 pm

Re: NORTH

Post by russellsprouts »

Regarding letting interrupts use the same stack. I don't think you need to do anything fancy. The only rule is that the stack pointer needs to be accurate at all times -- you should never read from an unreserved part of the stack.

Code: Select all

lda STACK-1, x ; not allowed (assuming stack grows down)
But it would be totally fine to read from any part of the reserved stack

Code: Select all

lda STACK+1, x ; ok
The interrupt handler would need to have a type of ( -- ), that is, it should leave the stack undisturbed once it returns, and not read the current data on the stack. It would also need to save any temporaries the code uses and restore them once it's done.

The interrupt handler could also leave a small red zone.
Basically, it could leave a small (say 2 byte) buffer below the stack data of the normal code. That would allow normal code to use STACK-1, x and STACK-2, x, but no further. This could potentially cut down on the amount of stack pointer manipulation.

Here's the interrupt handler I'm thinking of. Assume for simplicity x is the stack pointer and STACK only holds 8 bit values.

Code: Select all

irq:
    ; save registers
    pha
    tya
    pha

    ; reserve 2 bytes for temporaries, 2 bytes of red zone
    dex
    dex
    dex
    dex

    ; save temporaries
    lda TMP1
    sta STACK, x
    lda TMP2
    sta STACK+1, x

    ; do stuff... should preserve x by the end

    ; restore temporaries
    lda STACK+1, x
    sta TMP2
    lda STACK, x
    sta TMP1
    
    ; free stack space
    inx
    inx
    inx
    inx

    ; restore registers
    pla
    tay
    pla

    rti
Does this make sense?
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: NORTH

Post by Garth »

That's true about the stack pointer (extra overhead notwithstanding), but there's at least one more consideration (and maybe more will come to me after I post, LOL). Forth typically has a scratchpad area called N which primitives can use however they like, as long as they are completely done with it when they finish. They cannot use it to pass data to another primitive, or to hold data there for the next time the same primitive runs. On the 6502, 8 bytes seems to be enough for N. It is put in ZP for efficiency, although its being in ZP also means you can use the extra ZP addressing modes in N if necessary. If you allow interrupt service to start without letting the primitive finish, then you'll also need to push all 8 bytes of N. You could have a separate N for interrupts, but then you would be consuming more precious ZP space, and interrupts would not be nestable, meaning you can't have a more-urgent interrupt that would be quick to service cut in on a lower-priority interrupt that may take longer to service.
http://WilsonMinesCo.com/ lots of 6502 resources
russellsprouts
Posts: 53
Joined: Sun Jan 31, 2016 9:55 pm

Re: NORTH

Post by russellsprouts »

That makes sense. Saving and restoring 8 bytes on the stack for every interrupt wouldn't be ideal. That's a lot of overhead. My code above assumes only 2 bytes that need to be saved.

Having a separate N also forces you to have different versions of the primitives that use N depending on whether it's an interrupt or not. That would preclude code sharing between interrupt code and normal code.
russellsprouts
Posts: 53
Joined: Sun Jan 31, 2016 9:55 pm

Re: NORTH

Post by russellsprouts »

There are some trade-offs, but I wonder if the primitives could get by with just 2 bytes of fixed zero page, and a larger red zone? Fixed bytes have a 1 cycle speed advantage, but you can still use the useful (indirect, x) addressing mode in the red zone. If (indirect), y addressing is needed, then they can use the fixed bytes.

The hardest part would be to deal with cases where a primitive needs more than 1 pointer that uses indirect, y addressing, but you could probably work around it.

The code for an interrupt handler that leaves an arbitrary red zone is mostly the same as above. It just uses this code instead of a string of inx or dex:

Code: Select all

    txa
    clc
    adc #redzone
    tax
Post Reply