Thoughts on Higher Level Language Design for 6502 Systems

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

Moderator: Moderators

User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Thoughts on Higher Level Language Design for 6502 Systems

Post by qbradq » Fri Jul 15, 2011 9:08 am

You may have read David A. Wheeler's 6502 Language Implementation Approaches article. I recently read it but was quite underwhelmed. Although David does point out some of the limitations of the 6502 that makes high-level language implementation challenging or inefficient, he takes the approach of working around those limitations to implement features which are inefficient on the platform.

I have given considerable thought recently to the design of a language for the 6502. The approach I normally take when designing a language is to start at the bottom and work upwards, making common design patterns and idioms used in the lower-level language easier to implement in the higher-level language.

To this end I have described the function of a language for the 6502 that cleanly maps to efficient machine instruction sequences that are extremely similar to those we write by hand. Below I describe the function of the language in terms of how it differs from C without regard to syntax. I look forward to your comments.

Limitations
* Function calls may not be part of an expression, only for single-assignment
* Function calls may not be recursive.
* Arrays are limited to 256 elements, however those elements may be of any size
* No support for dynamic memory allocation
* No multiplication, division or modulus operators, rather they are implemented as built-in functions
* Pointer and string data types not supported
* Comparison operators are not considered expressions
* No support for arrays as a member of a struct

Features
* Direct segment management
* 16-, 24- and 32-bit integer support
* Read and write "files", or large arrays, using built-in functions for pointer manipulation
* Support for structs, arrays, and arrays of structs
* Inline assembly support for every language feature

Thoughts? I would also be interested in hearing what language you would like the syntax based on. I have developed a minimalist line-oriented syntax I use for my languages that lacks punctuation of almost any kind, but it can be hard for someone to pick up if they didn't write the thing :D

User avatar
Memblers
Site Admin
Posts: 3898
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis
Contact:

Post by Memblers » Fri Jul 15, 2011 11:25 am

Have you seen this thread?
http://nesdev.com/bbs/viewtopic.php?t=7185
Atalan seemed kinda decent.

I've seen some 6502 people say that Forth is good.

I've been doing assembly for years, and C (and Verilog, it's much like C) are the first high-level languages I've started using fairly recently. I'm not sure yet if I'd seriously anything besides asm and C on NES.

3gengames
Formerly 65024U
Posts: 2281
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames » Fri Jul 15, 2011 11:29 am

BASIC. It runs pretty quick on the TRS-80 Color Computer, and there's a disassembly out there. I know there's a lot of stuff it can take shortcuts with on the 6809, but I think basic on 6502 would be sweet if it worked on the NES. There's also C64 and Apple basic, although they aren't anywhere near the quality of it on the Coco I don't believe myself.

tepples
Posts: 22281
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples » Fri Jul 15, 2011 11:30 am

There is Family BASIC. If only the (72-pin) NES had a keyboard, I'd probably have made a BASIC interpreter myself by now.

User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq » Fri Jul 15, 2011 11:41 am

I have seen Atalan and thought it looked pretty fuggly. It also uses the approach of forcing the 6502 to implement higher-level language constructs that it's just not that good at. I aim to make a language that exploits the strengths of the 6502.

FORTH has a lot of overhead on a 6502. It's a stack-based language and the 6502 kinda sucks when it comes to stacks.

A BASIC interpreter has even more overhead. The BASICs of the day were designed to allow home users to write meaningful and useful programs, not to squeeze every last cycle out of a CPU, which is what we tend to do in NES development. BASIC is great for writing a Tetris clone, but you'd quickly hit a limit with it.

Also the 6800 series processors had 16-bit register pairs and 16-bit math instructions, and a whole lot of other features that made implementing higher-level languages easier and more efficient.

Thanks for all the input folks! No remarks about the idea yet. I don't know if I'll actually make this compiler or not. I'm trying to decide between this and writing a game in QBasic.

3gengames
Formerly 65024U
Posts: 2281
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames » Fri Jul 15, 2011 11:49 am

6809 is more comparabile to the 68000 than the 6800, check it. And there's also hitachi's 6309.

bogax
Posts: 34
Joined: Wed Jul 30, 2008 12:03 am

Re: Thoughts on Higher Level Language Design for 6502 System

Post by bogax » Sat Jul 16, 2011 3:04 pm

qbradq wrote:You may have read David A. Wheeler's 6502 Language Implementation Approaches article. I recently read it but was quite underwhelmed. Although David does point out some of the limitations of the 6502 that makes high-level language implementation challenging or inefficient, he takes the approach of working around those limitations to implement features which are inefficient on the platform.
What features would those be?
qbradq wrote: The approach I normally take when designing a language is to start at the bottom and work upwards, making common design patterns and idioms used in the lower-level language easier to implement in the higher-level language.
What would those be in the case of the 6502?
qbradq wrote: To this end I have described the function of a language for the 6502 that cleanly maps to efficient machine instruction sequences that are extremely similar to those we write by hand.
Some (more explicit) examples?

User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq » Sun Jul 17, 2011 6:30 am

High-Level Language Features that are Inefficient on a 6502:

1. Stack-based expression parsing, which allows function calls within expressions.
2. Stack-based local variables, which allows function recursion.
3. Array indices greater than 8-bits requires pointer math.
4. True objects (classes, structures with methods, anything with a this pointer) must use pointer math.
5. Pointers in general are problematic due to the requirement to use zero-page locations.


Here are the design patterns I am trying to represent in the language:

1. Mathematical expression solving using up to 32-bit numbers.
2. Procedure calling with local variables and statically addressed parameters.
3. Static arrays.
4. Static data structures.
5. Static arrays of data structures.
6. Memory files (large arrays using pointers for indexing).
7. Jump tables.
8. Procedure pointers (for procedures with no parameters).


And here are the design patterns that I am aware of but not trying to represent:

1. Reuse of RAM locations for multiple variables.
2. Self-modifying code.


So after some thought I am going to implement this language and see how it goes. I am going to use C syntax because it is unambiguous and well-known (Java, JavaScript and C# use the same basic syntax).

I'll start a new thread when I've got something to show. Hopefully by the end of the week I'll have a compiler done. It might take longer though, this is the first language I've tried to write that supports more than one data type.

User avatar
koitsu
Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu » Mon Jul 18, 2011 1:24 am

Memblers wrote:I've seen some 6502 people say that Forth is good.
It's not. I have to deal with it occasionally on FreeBSD since the 3rd-stage bootstrap is written in Forth. What an awful atrocity; debugging is the most impossible thing I have ever witnessed. Don't click here unless you want a seizure.

Let's face it: the 6502 is a small, bare-bones microprocessor. The keyword there is micro. "User-friendly" languages, particularly scripted languages, are usually over-abstracted and do too much to hide the inner-workings from the programmer. This sounds sexy and delicious on paper, but in practise/implementation it fails every time.

This is why you'll find people using assembly a lot of the time, and if not assembly then C -- but we can't really use that on the NES 6502 given the limitations of the architecture/platform. That's just the way the ball bounces. Sometimes it's best to keep it bare-bones, you know?

Though we all know Tepples masturbates at night to the thought of python running on the NES... ;P

User avatar
thefox
Posts: 3141
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland
Contact:

Post by thefox » Mon Jul 18, 2011 1:46 am

Godspeed qbradq, I think this is a decent idea. It seems like it would address most of the problems I have with C on 6502. I have been trying to emulate much of the same stuff (local variables, parameter passing) using macros, but there's only so much you can do with them (even though I've come to find the CA65 macro system is amazingly flexible). Also 16-bit math gets annoyingly verbose (i.e. hard to read) even with macros.
qbradq wrote:2. Procedure calling with local variables and statically addressed parameters.
And here are the design patterns that I am aware of but not trying to represent:

1. Reuse of RAM locations for multiple variables.
Does this mean that each procedure will have its own area of memory for local variables/parameters? That does seem a little bit wasteful. Would it not be possible for the compiler to detect procedures dependencies on each other, and allocate accordingly? For example, the compiler knows proc1 calls proc2, so it knows not to overlap proc2 local variable/parameter area with proc1's area. Of course there are some problems, like proc1 calling proc2, then proc2 calling proc1 again. That could maybe be fixed by making two copies of proc1.. each using a different area of memory.

strat
Posts: 388
Joined: Mon Apr 07, 2008 6:08 pm
Location: Missouri

Post by strat » Mon Jul 18, 2011 2:49 am

I wanted to make a HLL specially for 6502 (Spoiler: It's easier just to write asm macros). Got as far as designing syntax and creating a parser, and maybe some operations work. These were key features:
-No types; everything is an 8 or 16-bit variable
-Any 16-bit variable can be used like a pointer
-Variables declared in loop, function and "scope" keyword have bytes reused outside scope
-ascii string and const array support
-function params don't use stack

The code basically would have looked like Python with a "*" for memory addressing. "*()" would be Y-indirect.

Wave
Posts: 110
Joined: Mon Nov 16, 2009 5:59 am

Post by Wave » Mon Jul 18, 2011 3:23 am

Have you seen NESHLA?

User avatar
qbradq
Posts: 952
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq » Mon Jul 18, 2011 5:18 am

Thank you all for the input!
koitsu wrote:This is why you'll find people using assembly a lot of the time, and if not assembly then C -- but we can't really use that on the NES 6502 given the limitations of the architecture/platform. That's just the way the ball bounces. Sometimes it's best to keep it bare-bones, you know?
I know, that's why this language is keeping it bare-bones :D Once I'm done with this puppy I think you'll find the generated code to be very close to what you'd write by hand.
koitsu wrote:Though we all know Tepples masturbates at night to the thought of python running on the NES... ;P
I kinda do too :D
thefox wrote:Does this mean that each procedure will have its own area of memory for local variables/parameters? That does seem a little bit wasteful. Would it not be possible for the compiler to detect procedures dependencies on each other, and allocate accordingly?
I was going to leave call graphs for a latter optimization in the segmenting module. There should not be much CPU improvement by doing this, and I want to do a proof-of-concept to make sure the CPU efficiency works out the way I think it will before I put too much work into this thing.
Wave wrote:Have you seen NESHLA?
Yes, and that is what inspired me to start thinking about this months ago. NESHLA has some great qualities, but it's still lower-level that what I'm trying for. Any software engineer can tell you that all things being equal a higher-level language will allow faster development time, fewer defects and quicker maintenance turn-around that a lower-level language. Well, at least this software engineer will tell you that :D

rudla.kudla
Posts: 21
Joined: Mon Nov 22, 2010 9:36 am

Post by rudla.kudla » Mon Jul 18, 2011 5:20 am

When discussing the language design, you must first decide, whether you are going to implement the language on 6502 system or as a cross-compiler.

While it is possible to implement a decent native compiler (see Action!), I believe today it does not have much practical sense, so let's talk about cross-compiler.

1. Stack-based expression parsing, which allows function calls within expressions.

This is not inefficient, it just complicates the parser a little bit.
When you do not allow function calls, you force the programmer to rewrite the expression

Code: Select all

r = func(a) + func(b)
to

Code: Select all

r1 = func(a)
r2 = func(b)
r = r1 + r2
But that is exactly what the compiler would do. Also note, that unless you want a programming language with RPN, you will need some kind of stack for parsing expressions (to implement operator priority and braces).

3. Array indices greater than 8-bits requires pointer math

You need this, or 2-d arrays. Otherwise you will not be able to support access to video ram on any architecture with memory mapped video ram.

Note, that you can have efficient small arrays with 0..255 indexes while still supporting large arrays.

4. True objects (classes, structures with methods, anything with a this pointer) must use pointer math.

That is not an absolute true. For example if you restrict the number of your objects of same class to 256 (quite reasonable restriction under 6502), you can easily represent a 'pointer' to an object as an index to array of objects. You need to design your whole language around this idea though.

5. No multiplication, division or modulus operators, rather they are implemented as built-in functions

This does not bring any extra effectivity. You can just implement the multiplication and friends as a call to built-in procedure even if the syntax is like A * B.

6. Reuse of RAM locations for multiple variables

You absolutelly need this. Otherwise you are going to ran out of variable space very quickly.


I'm very interested in how far you can go with your language (does it actually have a name?). It can give me some nice ideas for Atalan :-) Also if you need some help, let me know.
I encourage you to take a look at Atalan source code and take whatever parts you need (math routines may be useful, techniques of reusing ram locations etc.) I hope the source code is kind of readable (although it's probably hard reading anyways, as the algorithms are getting kind of complicated when all the compiler features are added).


Rudla
(author of Atalan :-)

rudla.kudla
Posts: 21
Joined: Mon Nov 22, 2010 9:36 am

Post by rudla.kudla » Mon Jul 18, 2011 5:21 am

Oh, I forgot, could you, please, provide more information on Memory files (large arrays using pointers for indexing).

?
How exactly does such structure look and what are the operations?

Thanks,

Rudla

Post Reply