It is currently Thu Dec 14, 2017 3:20 am

All times are UTC - 7 hours





Post new topic Reply to topic  [ 9 posts ] 
Author Message
PostPosted: Thu Nov 03, 2016 12:14 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19342
Location: NE Indiana, USA (NTSC)
In this post, Dwedit wrote:
The biggest problem with high level languages on the NES is the lack of proper stack support. So local variables don't have a good place to go.
Nobody has written a Stack Flattener that assigns addresses to local variables, depending on the call graph of the program.

I seem to remember this idea having been floated before. Years ago, when I was planning to expand Thwaite past a 16384 byte PRG ROM while still letting it share an UNROM multicart with other games, I wrote a tool to make a call graph so that I could find opportunities to encapsulate parts of the program into a separate PRG bank. For example, that big sink near the bottom center of this graph (109K) is a music engine, now called Pently, and I made it a sink because I had planned to allow it to reside in a separate bank.

The source code of this call graph generator could be adapted into a stack flattener for ca65 assembly programs that calculates where in zero page each subroutine's local variables begin so that inner subroutines don't overwrite the variables that outer subroutines expect to keep preserved. But this would need some sort of mark-up for each subroutine to state how many bytes of local variables a function needs. It'd also be a good idea to state how many of these are "caller-saved" bytes, which the inner subroutine is free to overwrite, as opposed to "callee-saved", which the inner subroutine must not modify. This way, the inner subroutine's variables can overlap the outer subroutine's caller-saved variables. Then the processor would make some sort of header file.

What might be a good way to express how many bytes of local variables each subroutine expects to use, as well as how many are callee-saved and how many are caller-saved? Something like Javadoc comments?


Top
 Profile  
 
PostPosted: Thu Nov 03, 2016 10:59 pm 
Offline

Joined: Sun Jan 31, 2016 9:55 pm
Posts: 38
This is a slightly different solution than what you proposed, but the lowest baseline could look something like this:

I'm picking @@ as a random signifier that is invalid in CA65 code (I think). Some other syntax would probably be better.
Code:
DrawItem:
    stx @@local1
    sty @@local2
    ; ... do stuff
    adc @@local1
    ; ... do stuff
    ; ... make a call to another function
    jsr Foobar
    adc @@local2
    rts

Foobar:
    stx @@local1
    ; ... do stuff using @@local1
    rts


The idea is this: auto-locals are named with @@. Auto-locals are scoped just like local labels. @@ locals are automatically assigned to a usable location on the zero page.

A pre-processor statically analyzes the calls in the code. The static analysis is simple: it know whether each instruction reads or writes from the local variable, and it considers all instructions in the scope as possible. Some basic control flow analysis could allow some variables to be marked as dead or alive at certain points.

The liveness analysis can deduce this from the code:

1. DrawItem uses 2 local variables, local1 and local2
2. Foobar uses 1 local variable, local1 (different from the one used by Foobar, because of scoping)
3. DrawItem::local1 is not used after the call to Foobar
4. DrawItem::local2 is used after the call to Foobar

This creates an interference graph:
DrawItem::local1 interferes with DrawItem::local2
DrawItem::local2 interferes with Foobar::local1

Rather than allocating sections of the zero page, it assigns a zero page location to each local variable so that no interfering variables are in the same place. Therefore, we should make DrawItem::local1 and Foobar::local1 share the same space, giving a total of 2 bytes necessary. The generated code would look like this:

Code:
DrawItem:
   stx $0
    sty $1
    ; ... do stuff
    adc $0
    ; ... do stuff
    ; ... make a call to another function
    jsr Foobar
    adc $1
    rts

Foobar:
    stx $0
    ; ... do stuff using $0
    rts


The next thing to solve is how to represent multi-byte locals that must be contiguous. That's mostly a syntax issue, but it would be necessary for common cases like pointers for indirect lookups.

Recursive functions are more tricky. Maybe the preprocessor would just error if it discovered direct or indirect recursion. In those cases, the programmer would need to explicitly use pha and pla to handle re-entrant subroutines.

Technically, this problem is global register allocation, which is NP-hard. However, real compilers have good heuristics, and NES programs use few locals, so it might not be an issue in practice.

The limitation of this scheme is that it can only handle local variables. It would be nice to have a scheme that optimizes parameter passing between subroutines too.


Top
 Profile  
 
PostPosted: Fri Nov 04, 2016 5:24 am 
Offline
Formerly ~J-@D!~
User avatar

Joined: Sun Mar 12, 2006 12:36 am
Posts: 445
Location: Rive nord de Montréal
russellsprouts wrote:
Recursive functions are more tricky. Maybe the preprocessor would just error if it discovered direct or indirect recursion. In those cases, the programmer would need to explicitly use pha and pla to handle re-entrant subroutines.

I know of at least a (C) compiler that offer that kind of memory optimization, and the compiler/linker would outright bail out if it discovered any direct or indirect recursion, as well as (sadly) any usage of a function pointer other than calling it (e.g. any use of the pointer as a value that is stored anywhere).

I don't remember what was that compiler (or rather, toolchain) but it was for 8-bit Microchip PICs, so it's obviously non-free/not open-source.


Top
 Profile  
 
PostPosted: Fri Nov 04, 2016 9:22 am 
Online
User avatar

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7314
Location: Chexbres, VD, Switzerland
Personally I already had much thoughts about this. The answer is that in the best case (no function pointers, no multithreading, no interrupts coded in a HLL), the problem is solvable, although not as easily as it first seems.

You could consider it "stack flattening", or just stack allocation at compile time, in the end it is the same thing. It cannot be done at compile time, but only at link time, and obviously, when linking the final part with the main function.

Something that can be difficult is to instruct the linker to include some code if stack flattening/compile time stack allocation was successful, and another code it it wasn't. In practice it'd look like that :
[code]
.if stack_flattended
lda #initial value
sta somewhere
.else
lda #initial value
pha
.endif

This conditional code needs to be resolved at link time, if the variable "somewhere" is dynamically allocated, it is on the stack, and if not, it is in a fixed location. Note that the whole toolchain (assembler, and linker) needs to support this feature, not just the compiler. This can be a problem.

Another approach would be simply wasting bytes for local variables everywhere considering they are not optimized. It wastes ram, sure but would end up still more efficient than dynamic runtime stack allocation.


Top
 Profile  
 
PostPosted: Fri Nov 04, 2016 9:56 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19342
Location: NE Indiana, USA (NTSC)
Bregalad wrote:
Personally I already had much thoughts about this. The answer is that in the best case (no function pointers, no multithreading, no interrupts coded in a HLL), the problem is solvable, although not as easily as it first seems.

And in at least the assembly programs that I've coded, polymorphism is through an index into an array of function pointers rather than actually storing a function pointer in a variable that lasts more than a dozen cycles. I plan to give my preprocessor/allocator a keyword to recognize at least RTS Trick tables.

Quote:
Something that can be difficult is to instruct the linker to include some code if stack flattening/compile time stack allocation was successful, and another code it it wasn't.

The allocation method would be considered part of the ABI, and that's provided to both the compiler and the linker.

Quote:
Another approach would be simply wasting bytes for local variables everywhere considering they are not optimized. It wastes ram, sure but would end up still more efficient than dynamic runtime stack allocation.

Do you mean the equivalent of C static local variables?

Even if the exact problem is NP-hard, I imagine that the approximation of caller-saved and callee-saved variables would make it tractable. The assembly programmer splits variables into sets A and B, and only B are expected to retain their value across a subroutine call. And it'd at least be more memory-efficient than using static all over the place, more runtime-efficient than default automatic allocation, and more programmer time-efficient than manually divvying up global variables.


Top
 Profile  
 
PostPosted: Fri Nov 04, 2016 10:57 am 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5896
Location: Canada
On the thought of manual solutions that don't require new tools to be created, it's kind of interesting to me that the original quoted statement came from a discussion about C++.

C++ has reference & types that would let you alias shared values to locals, which is a fairly clean way to manually arrange it, I think?

You could do similar with #define in C, but it's a little bit hairier (and you'd probably want an additional #undef at the end of the local scope).

You might come up with some macro system that makes it easier to shift blocks of shared local allocations around, for when you need to play tetris with it later on.
Code:
// C++
void f()
{
   char& i = shared_local[5];
   // ...
}

/* C */
void f()
{
   #define i shared_local[5]
   /* ... */
   #undef i
}

/* macros */
void f()
{
   SHARED_LOCAL(5); /* move allocator to position 5 */
   LOCAL_CHAR(i); /* defines i at current allocator position, advances position automatically */
   /* ... */
   #undef i
}


Top
 Profile  
 
PostPosted: Sat Nov 05, 2016 8:39 am 
Offline
User avatar

Joined: Fri Nov 19, 2004 7:35 pm
Posts: 3968
For the purpose of stack flattening, is there really a distinction between function parameters and local variables?

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


Top
 Profile  
 
PostPosted: Sat Nov 05, 2016 10:13 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19342
Location: NE Indiana, USA (NTSC)
The caller needs to be able to see the function parameters in order to write into them, but not other local variables. In addition, if the caller is creating function parameter values by calculating based on its local variables, it needs these parameters not to overlap its own local variables.


Top
 Profile  
 
PostPosted: Sat Nov 05, 2016 1:11 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5896
Location: Canada
Dwedit wrote:
For the purpose of stack flattening, is there really a distinction between function parameters and local variables?

In CC65 a limited number of function parameters can already bypass the stack with the "fastcall" convention.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 9 posts ] 

All times are UTC - 7 hours


Who is online

Users browsing this forum: Google Adsense [Bot], Yahoo [Bot] and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group