Compiled stack proposal

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

Moderator: Moderators

Garth
Posts: 184
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: Compiled stack proposal

Post by Garth » Wed May 13, 2020 11:33 am

Bregalad wrote:
Wed May 13, 2020 2:24 am
Garth wrote:
Wed May 13, 2020 2:10 am
It seems like some key characteristics of stacks are perhaps not being understood correctly. There's a treatise on 6502 stacks (plural, not just the page-1 hardware stack) at http://wilsonminesco.com/stacks/ . Later chapters do get into local variables and environments, and recursion. You may not need recursion; but once you have local variables and environments, there's nothing keeping you from doing recursion. It's simply not a problem, as long as you don't run out of stack space.
Of course it is a problem, why would you sacrifice a register for being your stack pointer, and use longer, slower instructions, and possibly waste a lot of ROM space, when you're not going to use recursion anyway ?!
True, absolute addressing will take one more cycle than ZP addressing, and indexing will take one cycle more than non-indexed; but if you already have local variables and environments, recursion does not increase the requirements any. That was my point. I thought the context here was an HLL though, since C has been mentioned; and in that case you're making a big performance sacrifice anyway. Actually, I don't recall ever using recursion in my own programming; but I often use re-entrancy, particularly where an interrupt-service routine for an alarm may call a routine that got interrupted.
http://WilsonMinesCo.com/ lots of 6502 resources

User avatar
aa-dav
Posts: 92
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Re: Compiled stack proposal

Post by aa-dav » Wed May 13, 2020 11:35 am

Hisoft C is C compiler with 24Kb core within target machine (ZX Spectrum 48).

And it's source code for stdlib library is really cool:

Code: Select all

/*****************************/
/*         Hisoft C          */
/* Standard Function Library */
/*          HEADER           */
/*                           */
/* Copyright (C) 1984 Hisoft */
/* Last changed  15 Aug 1985 */
/*****************************/
Local variables were slow and you don't believe this trick:

Code: Select all

#define  FAST  static
TADAMMM!

There were no type void in this age, so:

Code: Select all

#define  void  int
Simple and efficient...

First generation of C language (Kernighan and Ritchie first books!) was very different to modern realities, so functions with non void and int return types requre some declarations:

Code: Select all

/*  Forward declarations for non-int library functions  */
extern char *strcat(), *strncat(), *strcpy(), *strncpy(), *strchr(), *strrchr(),
            *strpbrk(), *calloc(), *malloc(), *sbrk(),    *fgets(),  *gets();
var_args was implemented in non standard way:

Code: Select all

/*  Two variadic arithmetic functions (see manual for details)  */
int max(param_byte_count)  auto
{
  static int argc, *argv, max;

  argc = param_byte_count/2 - 1;
  argv = &param_byte_count  + argc;
  max  = -32767;

  while (argc--)
    {
      if (*argv > max) max = *argv;
      --argv;
    }

  return max;
}
Type casts were made in non-standard way too:

Code: Select all

typedef  char * __char_ptr;
int peek(address)
{
  return  * cast(__char_ptr) address;
}
void poke(address, value)
{
  * cast(__char_ptr) address = value;
}
There was support of 32-bit:

Code: Select all

void long_add(c, a, b)
  char *a, *b, *c;
{
  static unsigned u, i;

  u = 0;
  for (i = 0; i < 4; ++i)
    {
      u   +=  *a++  +  *b++;
      *c++ =  u & 0xff;
      u  >>=  8;
    }
}
Really! In this way...

And meet this true horror of programming in 1980:

Code: Select all

#define  FALSE   0 /* for Boolean operations */
#define  TRUE    1
Thew...
But isn't #define FAST static cool?

coto
Posts: 47
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: Compiled stack proposal

Post by coto » Wed May 13, 2020 12:27 pm

aa-dav wrote:
Wed May 13, 2020 11:35 am
Hisoft C is C compiler with 24Kb core within target machine (ZX Spectrum 48).

And it's source code for stdlib library is really cool:

Code: Select all

/*****************************/
/*         Hisoft C          */
/* Standard Function Library */
/*          HEADER           */
/*                           */
/* Copyright (C) 1984 Hisoft */
/* Last changed  15 Aug 1985 */
/*****************************/
Local variables were slow and you don't believe this trick:

Code: Select all

#define  FAST  static
TADAMMM!

There were no type void in this age, so:

Code: Select all

#define  void  int
Simple and efficient...

First generation of C language (Kernighan and Ritchie first books!) was very different to modern realities, so functions with non void and int return types requre some declarations:

Code: Select all

/*  Forward declarations for non-int library functions  */
extern char *strcat(), *strncat(), *strcpy(), *strncpy(), *strchr(), *strrchr(),
            *strpbrk(), *calloc(), *malloc(), *sbrk(),    *fgets(),  *gets();
var_args was implemented in non standard way:

Code: Select all

/*  Two variadic arithmetic functions (see manual for details)  */
int max(param_byte_count)  auto
{
  static int argc, *argv, max;

  argc = param_byte_count/2 - 1;
  argv = &param_byte_count  + argc;
  max  = -32767;

  while (argc--)
    {
      if (*argv > max) max = *argv;
      --argv;
    }

  return max;
}
Type casts were made in non-standard way too:

Code: Select all

typedef  char * __char_ptr;
int peek(address)
{
  return  * cast(__char_ptr) address;
}
void poke(address, value)
{
  * cast(__char_ptr) address = value;
}
There was support of 32-bit:

Code: Select all

void long_add(c, a, b)
  char *a, *b, *c;
{
  static unsigned u, i;

  u = 0;
  for (i = 0; i < 4; ++i)
    {
      u   +=  *a++  +  *b++;
      *c++ =  u & 0xff;
      u  >>=  8;
    }
}
Really! In this way...

And meet this true horror of programming in 1980:

Code: Select all

#define  FALSE   0 /* for Boolean operations */
#define  TRUE    1
Thew...
But isn't #define FAST static cool?
Damn son.

-

I agree with tepples but I suspect it may be much more work than intended since the paging NES method must make relative calls within a desired scope (from the processor) and the global program scope (when reentrant calls sharing variables). Now, I hope you guys manage to do it and you've got my support!!

Oh, when done, please, add a lot of test cases when compiling different code. That will literally save a ton of wasted hours.

User avatar
NOOPr
Posts: 68
Joined: Tue Feb 27, 2018 10:41 am
Location: Brazil
Contact:

Re: Compiled stack proposal

Post by NOOPr » Wed May 13, 2020 12:30 pm

I made something like this once, it's called pa65.
I have not yet publicly exposed it because it lacks a good documentation.
If you want to try, here it is https://github.com/Parisoft/pa65

User avatar
Bregalad
Posts: 7892
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Re: Compiled stack proposal

Post by Bregalad » Wed May 13, 2020 2:03 pm

Bananmos wrote:
Wed May 13, 2020 7:04 am
Well, I was thinking it'd be nice if the system could just deduce it from parsing jsr instructions, without a need for the pseudo-instruction for most functions.
Exactly. Ideally you would need special key instructions hidden in the instructions only at relevant places, such as start/stop the tree system (around interupt routines for example; or special routines doing trickery beyond the normall call/return system such as my boss routines).

Maybe the program would output a list of .define statements which could in turn be included in the assembly process ?

For example when using semi-auto allocated variables it would be like:

Code: Select all

Myroutine:
   lda myroutine_temp1      ; this variable is auto-allocated
   clc
   adc player_HP                ; This one is definitely not
   sta myroutine_temp2      ; etc...
   jsr something                 ; the parser understands automatically we call something without any extra info needed
   bcc _next:
   rts                                 ; The parser understands automatically that's a possible end of myroutine
_next:
   lda myroutine_temp3    ; but figures there's still another end ahead
   ...
   jmp something_else       ; the parser should understand this is a tail-call

Oziphantom
Posts: 861
Joined: Tue Feb 07, 2017 2:03 am

Re: Compiled stack proposal

Post by Oziphantom » Thu May 14, 2020 12:00 am

Oh yeah Trampolines, didn't even think of those.. they will basically become the nexus of your project and make everything dependent on everything. A "unlink" this function would be needed.

tepples
Posts: 22020
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Compiled stack proposal

Post by tepples » Thu May 14, 2020 9:01 am

Oziphantom wrote:
Wed May 13, 2020 10:46 am
SelfMod, Dispatch via RTS, Double Interrupt stabilization, Main, IRQ, NMI and potentially BRK threads to consider, stack manipulation, rts chaining and just well I want to return a couple up so lets just pop a bunch of stuff off the stack and go for it.
Self-modifying code can be handled the same way so long as variable addresses aren't modified to point at a caller's variables. I haven't put much thought into it seeing as I need as much of the 2048 bytes of internal work RAM for game state as I can get.

RTS dispatch from a jump table is straightforward using tailcalls, which treats every entry in the jump table as a callee. I haven't considered RTS dispatch in the context of jsr coroutine_yield; I imagine it to be a lot trickier.

I'm not sure what "Double Interrupt stabilization" is, but main, IRQ, and NMI threads could each get their own static stack. Ideally, IRQ and NMI wouldn't be very deep anyway unless you're trying to (say) run the game in NMI and low-priority background tasks that can block (like art decompression) in main. And I hadn't considered syntax to add to one stack or the other because my IRQ and NMI routines have never been that deep.

rts chaining and just well I want to return a couple up so lets just pop a bunch of stuff off the stack Unless I'm misunderstanding something, multi-layer return wouldn't need to do anything, as the intermediate callee's local variable lifetimes are ending too. Could you explain what you mean by "RTS chaining"?
Oziphantom wrote:
Wed May 13, 2020 10:46 am
Having function jump into the middle of other functions
If subroutine X jumps into the middle of subroutine Y, this can be done one of two ways:
  1. Subroutine Y is actually two subroutines: Y1 and Y2, with X jumping to the start of Y2, and Y1 falling through to Y2. Fallthrough is marked as a tail call.
  2. Even simpler: Mark that X calls Y when it jumps to the middle of Y. This makes all of Y1's variables and all of Y2's variables live.
Bregalad's sample routine as marked up:

Code: Select all

.proc Myroutine
var temp1
var temp2
   lda temp1                ; this variable is auto-allocated
   clc
   adc player_HP            ; This one is definitely not
   sta temp2                ; etc...
   call something           ; tell the parser that something participates in this static stack in case something is in another source code file
   bcc _next
      rts                   ; Variable lifetime ends when a routine returns
   _next:
   lda myroutine_temp3      ; but figures there's still another end ahead
   ...
   tailcall something_else  ; even if something_else is in another file
.endproc
The primary reason for a distinction between call and jsr and between tailcall and jmp is that a preprocessor cannot tell whether or not a subroutine defined in another source code file participates in the static stack allocation. If call is used on a subroutine that is not a "callee", the other subroutine will not have defined and exported a symbol representing the end of its local variables. This will cause linking to fail with an unresolved external symbol. The other way is to run the preprocessor on everything and arrange for all subroutines to be marked as callees, even if neither var nor leaf is used in that subroutine, so that the preprocessor can emit allocation metadata for all subroutines. This fails when calling third-party libraries, which do not emit allocation metadata.

Oziphantom
Posts: 861
Joined: Tue Feb 07, 2017 2:03 am

Re: Compiled stack proposal

Post by Oziphantom » Fri May 15, 2020 12:20 am

making this NES 6502 rather than 6502 seems a bit of a waste. However you do also have 8K expansion RAM for the NES to consider.

Double Interrupt stabilization:
When you interupt you have a 0-8 clock window of jitter, so to defeat this you can do
Make Interupt fire on Line 10

Code: Select all

;setup irq to fire on next line
;save the current stack pointer
;cli
; waste clocks until near the end of the line if needed
nop
nop
nop
nop ; somewhere along here the next interrupt fires 
nop

;next interrupt
; restore stack to pop off the extra interrupts data
since you interrupted on a nop, a jitter is now 0-1
; if you need dead 0 jitter. 
; waste almost a line then set it up so the fetch of the cmp in 
; lda rasterline
; cmp rasterline 
; fall either on the last clock ( so it is the same) or first clock of the next line ( so it is different )
; beq *
; * ; now if the where equal the branch is taken 3 clock, it the differ not taken 2 clocks
; you are now on clock 3 of the line 100% 
so basically there are 2 interrupts but only 1 rti
The other trick is "executing I/O registers" so you start a timer that counts down then you set the byte before the timer, to be JMP or JSR as you need. and the set the byte after the timer to be 00.
thus as the clocks go you get
JMP $0800
JMP $0700
JMP $0600
....
in a loop, so you set the IRQ handler to the I/O registers so the IRQ fires, then it executes the JMP with the timer byte high given you a routine you enter knowing the jitter on.

rts chaining
Given

Code: Select all

#DrawDebugStringStackNumber kDebugStrings.playerx,0,0
	#DrawDebugStringStackNumber kDebugStrings.playery,0,1
	#DrawDebugStringStackNumber kDebugStrings.playerTileX,0,2
	#DrawDebugStringStackNumber kDebugStrings.playerTileY,0,3
	#DrawDebugStringStackNumber kDebugStrings.playerTileNum,0,4
	#DrawDebugStringStackNumber kDebugStrings.playerColFlags,0,5
	#DrawDebugStringStackNumber kDebugStrings.Map,35,0
	#DrawDebugStringStackNumber kDebugStrings.MapTCP,40,0
	#DrawDebugStringStackNumber kDebugStrings.MapTCP,51,1
	.rta TrampolineCallStackDrawExit 
where the macros expand as

Code: Select all

DrawDebugStringStackNumber .macro 
	.rta jDrawStringAtPrecalc
	.byte \1 
	.word $3000+(80*\3)+\2	
.endm
btw .rta inserts the return address of an object, so basically .word thing-1.
Then you set the stack to be where this data is(or copy it onto the stack), then you RTS to which the RTS then pops off the first address and thus calls it, pops its params and when done, just RTS which calls the next, and the next until it finally pops the last exit function. This way you can change the call order, don't have to put any logic in to work out what is next, no loops and its not statically linked.
multi layer return
Yes the scopes end, but you need to detect that when you do a

Code: Select all

pla
pla
pla
pla
rts
this ends the scope of it and 2 parents, and this function will not continue into any code after said two jsrs

Having function jump into the middle of other functions
I think a flaw with the current example is calling is not only done with a JSR, calling is also done with a JMP or Branch

so given the following structure

Code: Select all

entry 
set temp2 to something
some code
branch A
some code
branch B
rts

A
some code
set temp1 to something
jmp C

B some code
set temp1 t something
jmp C

D some code
set temp1 to something
set temp2 to something
branch C

C
some code
use temp1
jmp E

E
use temp2
some code rts
So the paths are
Entry -> A -> C -> E
Entry -> B -> C -> E
D -> C -> E
thus you need to treat
Entry + A + B as a " function"
C + E as a "function"
D as a "function"

but they don't all "call" each other because they are seen as one "group" i.e logically Entry + A + B + C + E make one function and the variable scope needs to be shared, i.e A sets up something to be used in E, and like wise D is part of the chain but a different branch, so it needs to set up things that are also used in C + E, so its variables must be allocated the same way as Entry + A + B

Post Reply