NORTH

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

User avatar
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

NORTH

Post by pubby »

I've been meaning to write a FORTH for the NES for quite some time, and inspired by na_th_an's thread I finally got around to it. So here it is: NORTH; NES FORTH.

It's a compiled language. The compiler takes source code and spits out CA65-compatible assembly. The compiler does very little work though. Most of the implementation is just CA65 macros and subroutines.

Because the output is CA65 assembly, it is very easy to mix with hand-written assembly code. You can reference assembly labels in NORTH code, and also reference NORTH labels in assembly code. It's easy to write games that mix and match the two languages.

Anyway, here's a quick top-level overview:
  • All values are 16 bit.
  • Values are stored in a stack, stored in RAM. This can be up to 512 bytes, or less.
  • The X register is used to index into the stack.
  • The hardware stack at $100 is used to store return addresses.
  • 2 bytes of zeropage are needed for storing temporary variables.
And here's some example NORTH code that implements basic movement. I'll attach a ROM of this to this post.

Code: Select all

+=: [ over load + store ]
-=: [ over load minus store ]

umax: [ over over u< 'swap when drop ]
umin: [ over over u> 'swap when drop ]

smax: [ over over s< 'swap when drop ]
smin: [ over over s> 'swap when drop ]

negative?: [0 s<]
abs: [dup negative? [0 minus] when]

friction: [dup abs 2 u<= [drop 0] [dup negative? [2 +] [2 -] if] if]

movePlayer: [
    'player_x player_xspeed.load
    buttons_held.loadLo 'BUTTON_LEFT  & [4 -] when
    buttons_held.loadLo 'BUTTON_RIGHT & [4 +] when
    buttons_held.loadLo 'friction unless
    512 smin -512 smax
    player_xspeed.copy +=

    'player_y player_yspeed.load
    buttons_held.loadLo 'BUTTON_UP   & [4 -] when
    buttons_held.loadLo 'BUTTON_DOWN & [4 +] when
    buttons_held.loadLo 'friction unless
    512 smin -512 smax
    player_yspeed.copy +=
]

drawPlayerSprite: [
    player_y.load (240<<8) umin (OAM+0).storeHi 
    0                           (OAM+1).storeLo 
    0                           (OAM+2).storeLo 
    player_x.load               (OAM+3).storeHi 
]

init: [ 
    0 player_xspeed.store 
    0 player_yspeed.store
    (128<<8) player_x.store 
    (128<<8) player_y.store
]

mainLoop: [ movePlayer drawPlayerSprite ]
I'll put the code here: https://github.com/pubby/north
You'll need a Haskell compiler.

Maybe I'll write documentation if people are interested.
Attachments
example.nes
(24.02 KiB) Downloaded 219 times
User avatar
Anders_A
Posts: 88
Joined: Mon Nov 27, 2006 11:56 pm
Location: Sollentuna, Sweden

Re: NORTH

Post by Anders_A »

This is really interesting! I had never really looked at forth before, but I see that it's very well suited for the 6502.

How could you handle the fact that almost all nes games has to have two threads (main thread and nmi thread)? Should each thread have its own stack? How would the two threads communicate?
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: NORTH

Post by tepples »

Except for "all in NMI" games like Super Mario Bros., the NMI thread is usually limited to a couple things: pushing queued video memory updates and running a music engine. These would typically be in assembly language and thus not need their own Forth stack.
User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: NORTH

Post by rainwarrior »

Always glad to see a widening of the NES development toolset.
User avatar
NovaSquirrel
Posts: 483
Joined: Fri Feb 27, 2009 2:35 pm
Location: Fort Wayne, Indiana
Contact:

Re: NORTH

Post by NovaSquirrel »

In north.inc, "rot" and "rotR" seem to have a TAY where they should have a TYA. Figured I'd point that out.

About the actual language, I like that it's actually compiled instead of having the overhead that an interpreter would bring.
User avatar
Anders_A
Posts: 88
Joined: Mon Nov 27, 2006 11:56 pm
Location: Sollentuna, Sweden

Re: NORTH

Post by Anders_A »

tepples wrote:Except for "all in NMI" games like Super Mario Bros., the NMI thread is usually limited to a couple things: pushing queued video memory updates and running a music engine. These would typically be in assembly language and thus not need their own Forth stack.
Yes. Of course you could write the two threads in different languages so that they won't use each others resources. But that takes away some of the fun, doesn't it? :)

I saw this neat approach to interrupts in forth. Since it's a "real" forth environment, where words are interpreted at runtime, it can just acknowledge the interupt and insert a word to handle the interrupt after the word currently being processed. So the two can use the same stack without interfering with each other.

Since north is compiled, that approach won't be feasible though, so I'm just curious if pubby has any tricks up his sleave.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: NORTH

Post by tepples »

The Forth words @ and ! are supposed to peek and poke an integer; C@ and C! peek and poke individual bytes.

Source: Variables, Constants, and Arrays
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: NORTH

Post by Garth »

Anders_A wrote:I saw this neat approach to interrupts in forth. Since it's a "real" forth environment, where words are interpreted at runtime, it can just acknowledge the interupt and insert a word to handle the interrupt after the word currently being processed. So the two can use the same stack without interfering with each other.
Thanks for the mention. I'm the author, and that was one of my first published articles (in Forth Dimensions magazine), about 23 years ago, when I was quite green at writing. I've been using that interrupt method for at least 26 years now. I revised the article for 6502.org (where you linked to) in 2003, but making updates to the things I had there was too difficult, so I started my own 6502 site in 2012, and the article is at http://wilsonminesco.com/0-overhead_Forth_interrupts/ . It will remain on 6502.org as well, but my site will always be the one with the latest updates.

The threading method there is indirect threaded code (ITC) which is probably the slowest-running of all the major method but has a couple of advantages otherwise. Forth is always compiled (unless you're interpreting from an input text stream), but you might be thinking of subroutine-threaded code (STC). Interestingly, Bruce Clark explains how the faster-running STC Forth avoids the expected memory penalties. He gives 9 reasons, starting in the middle of his long post in the middle of the page. STC of course eliminates the need for NEXT, nest, and unnest, thus improving speed.
http://WilsonMinesCo.com/ lots of 6502 resources
User avatar
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

Re: NORTH

Post by pubby »

NovaSquirrel wrote:In north.inc, "rot" and "rotR" seem to have a TAY where they should have a TYA. Figured I'd point that out.
Thanks.
tepples wrote:Except for "all in NMI" games like Super Mario Bros., the NMI thread is usually limited to a couple things: pushing queued video memory updates and running a music engine. These would typically be in assembly language and thus not need their own Forth stack.
I agree.
Anders_A wrote:Since north is compiled, that approach won't be feasible though, so I'm just curious if pubby has any tricks up his sleave.
Sharing the same stack is possible if the stack register (X or Y) is preserved throughout the program. The current implementation doesn't do this though. You'd probably need to change register X to Y, then implement indirect loads as self-modifying code. If you wanted to store the stack register temporarily you'd use a semaphore system.
User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: NORTH

Post by rainwarrior »

My knowledge of FORTH is a bit rudimentary but I'm curious about terms like (OAM+1) and (128<<8). Why and how are they using an infix notation? Does () pass the enclosed text directly to the assembler without being processed by FORTH?

Is (xxx) equivalent to .word xxx in the generated bytecode or something?

If you're willing, it might be useful to see the assembly output (example.inc?) to understand an example of how things get translated, for people that don't have make and haskell (or other dependencies?) ready to go.
User avatar
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

Re: NORTH

Post by pubby »

rainwarrior wrote:My knowledge of FORTH is a bit rudimentary but I'm curious about terms like (OAM+1) and (128<<8).
I should probably clarify that this is not a true FORTH. Rather, it is a stack language that takes certain ideas from FORTH, certain ideas from Factor, and mixes in some convenient syntax to mesh with CA65 better. Actually, it's probably closer to Factor than to FORTH. Maybe I should have called it NACTOR.
rainwarrior wrote:Does () pass the enclosed text directly to the assembler without being processed by FORTH?
Yes, that's it. Stuff inside parenthesis is copied verboten into the output.
rainwarrior wrote:If you're willing, it might be useful to see the assembly output (example.inc?) to understand an example of how things get translated, for people that don't have make and haskell (or other dependencies?) ready to go.
https://pastebin.com/raw/f1WcCYfr

For whatever reason, FORTH people like to call subroutines "words" and so that's the terminology I'm using. Anyway, the compiler understands six types of expressions:

Integer Literals

Integer literals (e.g. 100, $FF, -5, %0101) are translated to:

Code: Select all

__push value
(where value is the integer literal)

Word Literals

Word literals, which are just identifiers prefixed by an apostrophe (e.g. 'foo, 'bar, 'qux, '+, '-) are translated to:

Code: Select all

__push wordname
(where wordname is the CA65 label of the word after name mangling)

CA65 Literals

CA65 literals, which are CA65 expressions inside parenthesis (e.g. (OAM+0), (128<<8), (.lobyte(FOO))) are translated to:

Code: Select all

__push expression
(where expression is the CA65 expression copied verboten) Note that (foo) is equivalent to 'foo except it doesn't perform name mangling.

Words

Words (e.g. foo, bar, +, -) are translated to:

Code: Select all

__call sub,wordname
(where wordname is the CA65 label of the word after name mangling) This gets translated into "jsr wordname", though the macro can inline certain calls.

Tail calls have the form:

Code: Select all

__call tail,wordname
And gets translated into "jmp wordname" instead.

Quotations (Lambdas)

Quotations, which are code expressions enclosed in [ ] brackets (e.g. [foo bar qux], [2 +]) are translated to:

Code: Select all

__push __quotN
(where __quotN is a CA65 label corresponding to the quotation subroutine, which will be defined later on in the assembly output)

Address Operations

Address operations, which are labels or CA65 expressions followed by a period and an operation name (e.g. foo.store, bar.load, (OAM+0).store) get translated to:

Code: Select all

__addrOp op,addr
(where addr is the expression to the left of the period and op is the expression to the right.)

Address operations are used to implement loads and stores inline, which would otherwise require slow indirect addressing. The operations are all defined in the macro body.
User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: NORTH

Post by rainwarrior »

pubby wrote:
rainwarrior wrote:If you're willing, it might be useful to see the assembly output (example.inc?) to understand an example of how things get translated, for people that don't have make and haskell (or other dependencies?) ready to go.
https://pastebin.com/raw/f1WcCYfr
Thanks for this. Really helps understand what's going on.



Ah, I'm realizing that these are all macros, and there is no bytecode, like I had initially assumed.

I notice that the two most common macros __push and __call are ~11 bytes each. This seems to be adding up quickly. This example.nes already appears to be using ~2k of compiled code? (Am I estimating this correctly?)

I mean, it's a speed vs size tradeoff of course, but I've usually found that size is the more precious resource on the NES, and speed can be addressed by selectively optimizing problem areas in assembly.

Anyhow, I know it's still very early, I'm not trying to criticize the design, just I was quite surprised to see that it translates directly into unrolled native code.
User avatar
pubby
Posts: 583
Joined: Thu Mar 31, 2016 11:15 am

Re: NORTH

Post by pubby »

rainwarrior wrote:I notice that the two most common macros __push and __call are ~11 bytes each. This seems to be adding up quickly. This example.nes already appears to be using ~2k of compiled code? (Am I estimating this correctly?)
Without inlining, _call is only 3 bytes (it's just a jsr or jmp). Maybe you were looking at .proc call (library code) by mistake?

Looking at the map file, the example NORTH code compiles to about ~1200 bytes. Disabling inlining and using zeropage stack drops it to ~600 bytes. Using a space efficient __push theoretically drops it to ~450. The library code adds a few hundred bytes on top of this.

Like you said, it's hard to know how these numbers would scale when writing a real program. FORTH can reuse code better than 6502 assembly, for example.

Oh, and there's also sonder's NES forth: Eight. It uses bytecode and was optimized for space, so maybe that would be good to compare to.
User avatar
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: NORTH

Post by rainwarrior »

pubby wrote:Without inlining, _call is only 3 bytes (it's just a jsr or jmp). Maybe you were looking at .proc call (library code) by mistake?
Yes, that was the mistake I made, sorry.

My ~2k estimate was that your NMI vector is at $C8BC (2236 bytes), and it seemed from the CFG/etc that everything above that address should be from "north.inc" and "example.inc". Was there something else in that block that I missed?
pubby wrote:FORTH can reuse code better than 6502 assembly, for example.
What code reuse feature is present in FORTH that is missing in assembly?

I mean, the text of assembly code is hella verbose, but the ability to reuse the binary code is extreme... you can jsr/jmp right into the middle of a subroutine or loop, for example. Self modifying code makes a ton of interesting reuse patterns possible too.

As far as "code reuse" language features, I'd think of things like lambdas, lisp macros, polymorphism, generics/templates etc. but does FORTH have stuff like that? What power of reuse are you invoking? (This is not a rhetorical question, my knowledge of FORTH is limited and I'd be happy to be educated.)

Like if we're talking about the text size of code, there's no argument here that you can do tons more with less text, but when you get down to the size of the binary code... I really don't believe anything has the power that assembly does? I'm having trouble imagining what you meant by that.
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: NORTH

Post by Garth »

pubby wrote:For whatever reason, FORTH people like to call subroutines "words"
Everything in Forth is a word except the data itself. Even . , @ ! " ( etc. are words, just with very short names since they're used constantly, and there is no real punctuation, nor syntax. For example, a variable may not seem like a "word" or routine, but it is one that puts the address of the variable's data space on the data stack. I wouldn't doubt if with some modern, very complex processors, that may take only a single assembly-language instruction.

Anyway, all words have definitions, and the whole collection of them is called the "dictionary." A word can have different meanings in different contexts, just as the word "ball" has one meaning in the context of dance, another in the context of baseball, and another in the context of football. AND has a different meaning in the assembler context (if you've included an assembler in your Forth—and BTW, macro capability is automatic in even a very simple Forth assembler), and another meaning in the Forth context. Not surprisingly, CONTEXT, VOCABULARY, and DEFINITIONS are standard Forth words.

Everything you write extends the dictionary, and your words become every bit as much a part of the language itself as the original kernel's words. ("Language"..."dictionary"..."vocabulary"... see the theme?) You can write new functions, program-flow structures, or anything else, even new operators, and they become part of the dictionary. The compiler looks up the words in the dictionary. When it finds each word, if it finds that the word is not immediate, it lays down the addresses of the code to run it. But if it is immediate, it executes it right then, and that word can optionally take temporary control of the compiler. It makes for a system with virtually no limits.
http://WilsonMinesCo.com/ lots of 6502 resources
Post Reply