Function params & local vars using a stack in ASM6

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

frantik
Posts: 377
Joined: Tue Mar 03, 2009 3:56 pm

Post by frantik »

well i'm kind of hesitant to put a constant value into ZP but it might be useful especially if i were to put it after the frame pointer because then the frame pointer could be used as a true pointer and not an offset

i think i'm going to swap the values in x and y and go through the other op codes to make adcLocal and whatnot
frantik
Posts: 377
Joined: Tue Mar 03, 2009 3:56 pm

Re: Function params & local vars using a stack in ASM6

Post by frantik »

someone recently asked me for the files behind the dead links.. here are the macro/function files

http://swiftlytilting.com/files/asm6_functions.zip

function.s

Code: Select all

;=========================================================
; Virtual Stack macros
;=========================================================

; Set up stack at $0500
ifndef stack
   stack = $0500
endif

; Place stackPtr at $ff  (in zero page for faster code)
ifndef stackPtr
   stackPtr = $FF
endif

macro ldStackPtr
	ldy stackPtr
endm

macro stStackPtr
	sty stackPtr
endm

macro ldStackFramePtr
	ldx localVarPtr
endm

macro stStackFramePtr
	stx localVarPtr
endm

;---------------------------------------------------------
; pushStack @value, @restoreX
;---------------------------------------------------------
; pushes a value onto the virtual stack
;
; Expects X to be pointer to end of virtual stack
;
; @value is optional, otherwise the value in A is used
; @restoreX is optional, when defined sets X to stackPtr

macro pushStack @value

   ifdef @value
      lda @value           ; load value into A if supplied
   endif
   
   iny                     ; increase pointer location
   sta stack,y             ; push A onto virtual stack
   
endm

;---------------------------------------------------------
; pullStack @restoreX
;---------------------------------------------------------
; pulls value from stack and places into A
;
; Expects X to be pointer to end of virtual stack
;
; @restoreX is optional, when defined sets X to stackPtr

macro pullStack

   lda stack,y             ; pull A from the virtual stack
   dey                     ; decrease pointer location   

endm

;---------------------------------------------------------
; resetStack
;---------------------------------------------------------
; moves the stack pointer to the start of the stack

macro resetStack

   lda #$00
   sta stackPtr

endm

;=========================================================
; Function macros
;=========================================================

; Set up pointer to local vars

ifndef localVarPtr
   localVarPtr = $FE
endif

; Set up a few custom directives
defineLocalVars equ enum $01
endLocalVars	 equ ende
char 				 equ .byte 00
int				 equ .word 0000

;---------------------------------------------------------
; call @functionName, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8
;---------------------------------------------------------
; calls a subroutine after pushing data onto stack
;
; @functionName is an assembler address
; @p1 thru @p8 are optional parameters
;
; Notes:
; No registers are intact after the function returns.
;
; To return values:
; * Pass by reference
; * Use a global variable
; * Access old data left on stack using ldaReturn

macro call @functionName, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8

   lda stackPtr            ; get stackPtr at start of function call and
   tax                     ; save before params are pushed onto stack
   tay                     ; also place into y to ensure stack operations work correctly

   ifdef @p1               ; push parameters onto stack
      pushStack @p1
   endif
   ifdef @p2
      pushStack @p2
   endif
   ifdef @p3
      pushStack @p3
   endif
   ifdef @p4
      pushStack @p4
   endif
   ifdef @p5
      pushStack @p5
   endif
   ifdef @p6
      pushStack @p6
   endif
   ifdef @p7
      pushStack @p7
   endif
   ifdef @p8
      pushStack @p8
   endif

   txa
   pushStack               ; push old stackPtr onto stack
   sty stackPtr				; save new stackPtr
   sta localVarPtr         ; the old stack ptr location is also the new local ptr location
   tax                     ; Set x to localVarPtr

                           ; Register status when entering function proper:
                           ; Y = Ptr to end of stack (stackPtr)
                           ; X = Ptr to begining of local vars (localVarPtr)
                           ; A = X

   jsr @functionName       ; call function


   ldy stackPtr            ; pull old stackPtr from stack
   lda stack, y
   sta stackPtr            ; update stackPtr
   tay                     ; update X to contain stackPtr

   lda stack, y            ; restore old localVarPtr
   sta localVarPtr			;
   tax							; restore Y to old localVarPtr 

endm

;---------------------------------------------------------
; ldaLocal @num, @restoreY
;---------------------------------------------------------
; loads a local variable into A
;
; Expects x to be pointer to start of local variables
;
; @num is the nth byte in the local variable scope
; @restoreX is optional, when defined sets X to localVarPtr

macro ldaLocal @num, @restoreX

   ifdef @restoreX
      ldy localVarPtr
   endif

   lda #(stack + @num), x
endm


macro LocalOp @op, @num
	@op #(stack + @num), x
endm

;---------------------------------------------------------
; staLocal @num, @restoreY
;---------------------------------------------------------
; stores A into a local variable
;
; Expects x to be pointer to start of local variables
;
; @num is the nth byte in the local variable scope
; @restoreY is optional, when defined sets X to localVarPtr

macro staLocal @num, @restoreX

   ifdef @restoreX
      ldx localVarPtr      
   endif

   LocalOp sta, @num

endm



macro aslLocal @num, @restoreX

   ifdef @restoreX
      ldx localVarPtr      
   endif

   LocalOp asl, @num

endm


macro rolLocal @num, @restoreX

   ifdef @restoreX
      ldx localVarPtr      
   endif

   LocalOp rol, @num

endm


macro lsrLocal @num, @restoreX

   ifdef @restoreX
      ldx localVarPtr      
   endif

   LocalOp lsr, @num

endm


macro adcLocal @num, @restoreX

   ifdef @restoreX
      ldx localVarPtr      
   endif

   LocalOp adc, @num

endm



;---------------------------------------------------------
; alloca @size, @restoreX, @restoreY
;---------------------------------------------------------
; Allocates a block of memory in the local variable scope
;
; Expects y to be pointer to end of virtual stack
; Expects x to be pointer to start of local variables
;
; @size is the number of bytes to allocate
; @restoreX is optional, when defined sets y to stackPtr
; @restoreY is optional, when defined sets x to localVarPtr
;
; Notes:
; This memory is accessed like parameters using sta/ldaLocal
; Allocated memory is lost on function return

macro alloca @size, @restorey, @restorex

   clc
   ifdef @restorey
      ldy stackPtr
   endif
   ifdef @restorex
      ldx localVarPtr      ; read pointer to start of the local vars
   endif

   tya
   adc @size               ; add size to stackptr
   tay
   sta stackPtr            ; update stack pointer
   txa
   sta stack,y             ; store pointer to local vars at new location

endm

macro ldaReturn @num, @restorey

   ifdef @restorey 
      ldy stackPtr
   endif

   lda #(stack + @num), y

endm


macro fastcall @functionName, @pA, @pX, @pY

	ifdef @pA
		lda @pA
	endif
	ifdef @pX
		ldx @pX
	endif
	ifdef @pY
		ldy @pY
	endif
	
	jsr @functionName

endm
macros.s

Code: Select all

include functions.s

enum $ee
	temp1 					byte 00
	temp2 					byte 00
	temp3 					byte 00
	temp4 					byte 00
	Controller1 			byte 00
	OldController1			byte 00
	Controller2				byte 00
	OldController2			byte 00
	VBLANK_counter 		byte 00
	NameTableBufferPtr 	byte 00
	PPUCTRL_MIRROR 		byte 00
	PPUMASK_MIRROR 		byte 00
	PPUADDR_MIRROR 		word 0000
	PPU_HSCROLL 			byte 00
	PPU_VSCROLL 			byte 00
	localVarPtr 			byte 00
	stackPtr 				byte 00
ende


; Memory Map
;
; 0000 - 00ED - ZP Globals available to user
; 00F0 - 00F3 - 4 Temporary ZP globals available to system (and user but use caution)
; 00F4 - 00FF - ZP globals for system use
;	 F4 -   F5 - Controllers
;          F6 - vblank counter
;			  F7 - Nametable Buffer ptr
;          F8 - PPUCTRL_MIRROR
;          F9 - PPUMASK_MIRROR
;     FA - FB - PPUADDR_MIRROR
;      -   FC - Horizontal Scroll
;          FD - Vertical Scroll
;          FE - Function Stack Frame Prointer
;          FF - Function Stack End Pointer
; 
; 0100 - 01FF - System Stack
; 0200 - 02FF - Sprite Data for DMA
; 0300 - 03FF - Nametable buffer
; 0400 - 04FF - Heap (?)
; 0500 - 05FF - Function Stack
; 0600 - 06FF -  
; 0700 - 07FF -


ifndef PPUCTRL_MIRROR
	PPUCTRL_MIRROR = $f8
endif
ifndef PPUMASK_MIRROR
	PPUMASK_MIRROR = $fA
endif





macro iNES_header @prg, @chr, @mapper1, @mapper2

	ifndef @prg
		@prg = #$01
	endif
	ifndef @chr
		@chr = #$01
	endif
	ifndef @mapper1
		@mapper1 = #$00
	endif
	ifndef @mapper2
		@mapper2 = #$00
	endif

	.byte "NES",$1a
	.byte @prg ; PRG-ROM block
	.byte @chr ; CHR-ROM block
	.byte @mapper1 ; mapper info
	.byte @mapper2 ; mapper info
	.byte 0,0,0,0,0,0,0,0  ; pad header to 16 bytes
endm

macro vectors @vblank, @reset, @irq
	org $fffa
	word @vblank, @reset, @irq
endm

;---------------------------------------------------------
;---------------------------------------------------------
;       16-bit increment (macro: inc16 address)
;---------------------------------------------------------
.macro  inc16   @addr

		inc     @addr
		bne     +
		inc     @addr+1
	+
.endm

macro waitVBlank
   lda VBLANK_counter
-wait:
	cmp VBLANK_counter
	beq -wait:
endm


.macro screen_off
	; Disable all graphics.
   lda #$00
   sta PPUCTRL
   sta PPUMASK
   sta PPUCTRL_MIRROR
   sta PPUMASK_MIRROR
.endm

.macro waitVBlank_pollPPU
-	lda PPUSTATUS
	bpl -
endm

.macro waitSprite0Hit
-	lda PPUSTATUS
	cmp #%01000000
	bne -
endm

.macro clearRAM
   lda #$00						; Clear out RAM.
   ldx #$00
-  sta $000,x
	sta $100,x
	sta $200,x
	sta $300,x
	sta $400,x
	sta $500,x
	sta $600,x
	sta $700,x
	inx
	bne -
endm

.macro initSound
   lda #$01						; initialize sound hardware
   sta $4015
   lda #$00
   sta $4001
	lda #$40
	sta $4017
	sta $4011
endm


.macro initNES
	ldx #$FF						; Reset the stack pointer.
   txs

   sei
   cld
   screen_off					; Disable all graphics.
	waitVBlank_pollPPU
	waitVBlank_pollPPU
	clearRAM

	initSound
.endm

macro setPPUCTRL @value
	lda #@value
	sta PPUCTRL_MIRROR
	sta PPUCTRL
endm

macro restorePPUCTRL
	lda PPUCTRL_MIRROR
	sta PPUCTRL
endm

macro setPPUMASK @value
	lda #@value
	sta PPUMASK_MIRROR
	sta PPUMASK
endm

macro updateNameTable @value
	lda PPUCTRL_MIRROR
	and #%11111100
	and #@value
	sta PPUCTRL_MIRROR
	sta PPUCTRL
endm

macro flipNameTable
	lda PPUCTRL_MIRROR
	eor #%00000010
	sta PPUCTRL_MIRROR
	sta PPUCTRL
endm

macro ldaList @listPtr, @index
	ldx @index
	lda @listPtr, x
endm

macro staList @listPtr, @index
	ldx @index
	sta @listPtr, x
endm



; multiply two bytes in memory using russian peasant algorithm
; uses $00, $01, $02 for temporary variables
; returns 16 bit value in $00 and $01
macro mult @value1ptr, @value2ptr, @ret, @temp

	ifndef @ret
		@ret  = $00
	endif

	ifndef @temp
		@temp  = $02
	endif


	lda #$00
	sta @ret
	sta @ret+1
	sta @temp
	jmp start:

-loop:
	asl @value1ptr				; double first value
	rol @temp						; using 16bit precision
	lsr @value2ptr				; halve second vale
start:
	lda @value2ptr				;
	and #01						; is new 2nd value an odd number?
	beq -loop:					;
	clc							; if so, add new 1st value to running total
	lda @ret						;
	adc @value1ptr				;
	sta @ret						;
	lda @ret+1					;
	adc @temp					;
	sta @ret+1					;
	lda @value2ptr				;
	cmp #01						; is 2nd value 1?  if so, we're done
	bne -loop:					; otherwise, loop
endm
nes.h

Code: Select all

; Constants for NES


PPUCTRL 		= $2000
PPUMASK 		= $2001
PPUSTATUS 	= $2002
OAMADDR 		= $2003
PPUSCROLL 	= $2005
PPUADDR 		= $2006
PPUDATA 		= $2007

; PPUCTRL settings

PPUCTRL_NTABLE_0		  = %00000000 ; Select name table 0
PPUCTRL_NTABLE_1	 	  = %00000001 ; Select name table 1
PPUCTRL_NTABLE_2		  = %00000010 ; Select name table 2
PPUCTRL_NTABLE_3		  = %00000011 ; Select name table 3
PPUCTRL_VERTICAL	     = %00000100 ; Increment PPU address by 32 bytes, going down
PPUCTRL_HORIZONTAL     = %00000000 ; Increment PPU address by 1 byte, going across
PPUCTRL_SPR_PTABLE_0	  = %00000000 ; Sprite pattern table 0
PPUCTRL_SPR_PTABLE_1	  = %00001000 ; Sprite pattern table 1
PPUCTRL_BG_PTABLE_0	  = %00000000 ; Background pattern table 0
PPUCTRL_BG_PTABLE_1	  = %00010000 ; Background pattern table 1
PPUCTRL_SPR_8X8		  = %00000000 ; Sprites are 8x8
PPUCTRL_SPR_8X16		  = %00100000 ; Sprites are 8x16
PPUCTRL_VBLANK_NMI_ON  = %10000000 ; write this value to PPUCTRL to enable NMI 
PPUCTRL_VBLANK_NMI_OFF = %00000000 ; write this value to PPUCTRL to disable NMI 


; PPUMASK settings

PPUMASK_GRAYSCALE 		= %00000001
PPUMASK_COLOR		 		= %00000000
PPUMASK_BG_NOCLIP 		= %00000010
PPUMASK_BG_CLIP 			= %00000000
PPUMASK_SPR_NOCLIP 		= %00000100
PPUMASK_SPR_CLIP 			= %00000000
PPUMASK_BG_ON 				= %00001000
PPUMASK_BG_OFF 			= %00000000
PPUMASK_SPR_ON 			= %00010000
PPUMASK_SPR_OFF 			= %00000000
PPUMASK_INTENSE_RED		= %00100000
PPUMASK_INTENSE_GREEN	= %01000000
PPUMASK_INTENSE_BLUE 	= %10000000


OAM_DMA 		= $4014
SNDCHN 		= $4015
CONTROLLER1	= $4016
CONTROLLER2 = $4017

KEY_A      = %10000000
KEY_B      = %01000000
KEY_SELECT = %00100000
KEY_START  = %00010000
KEY_UP     = %00001000
KEY_DOWN   = %00000100
KEY_LEFT   = %00000010
KEY_RIGHT  = %00000001 



; setup RAM constants

RAM_PAT_TABLE_0		= $0000
RAM_PAT_TABLE_1		= $1000                         

RAM_NAME_TABLE_0 		= $2000
RAM_ATTRIB_TABLE_0 	= $23C0
RAM_NAME_TABLE_1 		= $2400
RAM_ATTRIB_TABLE_1 	= $27C0
RAM_NAME_TABLE_2 		= $2800
RAM_ATTRIB_TABLE_2 	= $2BC0
RAM_NAME_TABLE_3 		= $2C00
RAM_ATTRIB_TABLE_3 	= $2FC0

RAM_BG_PALETTE			= $3F00
RAM_SPR_PALETTE		= $3F10

RAM_SAVE					= $6000
RAM_PRG					= $8000

; INES header constants

INES_HORIZONTAL 	= %00000000
INES_VERTICAL 	 	= %00000001
INES_SRAM  		 	= %00000010
INES_TRAINER		= %00000100
INES_4SCREEN   	= %00001000

INES_NROM			= $00
INES_MMC1   		= $10
INES_UNROM			= $20
INES_CNROM			= $30
INES_MMC3			= $40
INES_MMC5			= $50
INES_FFEF4			= $60
INES_AXROM			= $70
INES_FFEF4			= $80
INES_MMC2			= $90
INES_MMC4			= $A0
INES_COLORDREAMS	= $B0
INES_FFEF4			= $C0
INES_CPROM			= $D0
;unused   			= $E0
INES_100IN1			= $F0
Garth
Posts: 246
Joined: Wed Nov 30, 2016 4:45 pm
Location: Southern California
Contact:

Re: Function params & local vars using a stack in ASM6

Post by Garth »

I'm not familiar with ASM6, but I'll be watching this topic with interest. My 6502 stacks treatise addresses local variables in section 14, at http://wilsonminesco.com/stacks/loc_vars.html, and in the middle of the page I express a wish for being able to do something in macros to create and destroy variables in the page-1 stack space, something where I've run into roadblocks with the assemblers I'm familiar with. The stack addressing builds on what's in the top third of the page in section 6, at http://wilsonminesco.com/stacks/parampassing.html, as well as the previous two sections. Since you mention multiplication, by coincidence, I do have a multiplication routine there in section 6, using only data on the page-1 stack, about 1/3 of the way down the page. Recursion takes the local-variable idea a step further, and is discussed in section 15, at http://wilsonminesco.com/stacks/recurse.html .
Also, it seems much better to leave Y free and have X be reserved for something, because the (zp),y addressing mode is the most important to have available to user code all the time, rather than (zp,x).
FWIW, I use (ZP,X) a lot, and rarely (ZP),Y. I use the (ZP,X) in a ZP data stack. This is in Forth, but you don't have to use Forth to take advantage of the method, described in sections 4, 5, and 6. The multiplication routine comes up again 2/3 of the way down the page in section 6, this time using a ZP data stack.
http://WilsonMinesCo.com/ lots of 6502 resources
keldon
Posts: 8
Joined: Wed Jun 07, 2017 7:55 am

Re: Function params & local vars using a stack in ASM6

Post by keldon »

Code: Select all

   ifdef @restoreX
      ldy localVarPtr
   endif
I know this is old (Garth already necro'd ;) ), but shouldn't that be ldx?
User avatar
dougeff
Posts: 3079
Joined: Fri May 08, 2015 7:17 pm

Re: Function params & local vars using a stack in ASM6

Post by dougeff »

If this comment is accurate...
@restoreX is optional, when defined sets X to localVarPtr
then, I would agree with you. should be LDX
nesdoug.com -- blog/tutorial on programming for the NES
keldon
Posts: 8
Joined: Wed Jun 07, 2017 7:55 am

Re: Function params & local vars using a stack in ASM6

Post by keldon »

I've been playing around with some high level macros as well and I've found a way for a macro to accept either a local parameter or an actual register with the following code:

Code: Select all

MovLocal @index, regA 
I'll also explain how you can achieve the following flexibility with the same macro:

Code: Select all

MovLocal @index, IMMEDIATE $10
This code assumes locals on the stack.

How it works is that local variables range from $0-ff, so we can use $100-105 to represent the registers.

Code: Select all

.enum $100
    regA .dsb 1
    regX .dsb 1
    regY .dsb 1
    regS .dsb 1
    regP .dsb 1
.ende

MIN_REG = regA
MAX_REG = regP

IMMEDIATE equ $800 +
AS_ZERO equ $1000 +
ZERO_TO_ABSOLUTE equ $1800 +

macro MovLocal @to, @from
    if @from = regX
        txa
    endif

    tsx

    if @from < MIN_REG
        tsx
        lda $103 + @from, x
    elseif @from <= MAX_REG
        if @from = regA
        endif
        if @from = regY
            tya
        endif
        if @from = regS
            txa
        endif
        if @from = regP
            php
            pla
        endif

    endif
    sta $103 + @to, x
endm
If you wanted you could go even further and exploit mirroring to pass addresses from zero page (pass zero page address + $1000 then subtract in the macro), allow absolute addressing (even of zero page) and immediate values (pass value + $800, then subtract in the macro).
User avatar
dustmop
Posts: 136
Joined: Wed Oct 16, 2013 7:55 am

Re: Function params & local vars using a stack in ASM6

Post by dustmop »

From my experience, using the stack to store function parameters and local / temp variables leads to a poor experience. The 6502 just wasn't designed for it. It leads to a lot of boilerplate, wastes valuable ROM space, runs slow, and isn't even that convenient due to clobbering the X register. Too many downsides to be worth it.

The solution that Bregalad posted about, of using a chunk of RAM space to pass around globals, certainly solves a lot of these problems, but unfortunately is pretty error-prone. I used it for a few previous projects and it's very easy to introduce subtle bugs without constant vigilance.

In cc65 there's a compiler option --static-locals to allocate each local variable to a static address, such that every memory location used is unique. This leads to code that is pretty fast, but unfortunately wastes a ton of RAM. It also breaks recursion, but I've only ever need recursion once in an NES project and it was a complete edge case.

I've been exploring an idea that improves upon these approaches. Have a tool analyze each function and track all the other functions that it calls. After compilation is done, build a call-graph of the entire program. Starting from the leaves, count how much RAM space each function needs for locals and parameters, and propagate that value upwards to callers. Using this, allocate memory addresses to the function based upon the needs of its locals plus the maximum of its callees. For example:

Code: Select all

void A() {
  int s;
  // ...
  B();
  C();
  D();
}

void B() {
  int t;
  int u;
  // ...
}

void C() {
  int v;
  // ...
}

void D() {
  int w;
  // ...
  B();
}
A calls B,C,D (top-level, entry point)
B and C are leaf calls (don't call anything else).
D calls B.

Starting from the leaves, start allocating memory at $10 (an arbitrary location for this example).

B needs 2 bytes for t and u, so it gets memory address $10 through $11.
C needs 1 byte for v, so it also gets memory address $10. Sharing memory for locals between C and B is okay because neither calls the other.
D needs 1 byte for w, plus 2 bytes for B, so w is assigned $12.
A needs 1 byte for s, then the maximum needed for all of B, C, D, which is 3 bytes. So s gets assigned $13.

The amount of RAM used is going to be equal to the most RAM used by the longest call stack from entry point to leaf.

I have implemented this technique in co2, a scheme-based compiler that I took over for use in an upcoming project. See the file casla.scm for the code. I can imagine an assembler doing something similar, allocating memory locations at link time, though it would need to be aware of jump tables and similar stack manipulating control structures. Possibly even cc65 could get support, assuming someone wanted to add it.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Function params & local vars using a stack in ASM6

Post by tepples »

That sort of static yet overlapping allocation of a nonrecursive thread's local variables sounds like what kevtris described to me before.
User avatar
dustmop
Posts: 136
Joined: Wed Oct 16, 2013 7:55 am

Re: Function params & local vars using a stack in ASM6

Post by dustmop »

Yup, sounds like the same concept. It can even be extended to recursive calls by detecting cycles, tagging functions in or below those cycles, then pushing their locals to the stack upon entry. Haven't implemented that yet, but it's on the TODO list.

Even without recursion, having a tool do this automatically is fantastic. Having an assembler with support would be absolutely killer (also on the TODO list but a bit further down).
User avatar
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Function params & local vars using a stack in ASM6

Post by rainwarrior »

I normally just document functions that have global side effects, and otherwise sort it out manually:

Code: Select all

extern void foo(); // clobbers c,d,e
I'm truthfully a little surprised that managing these things would be enough of a problem that you'd want to write a call-stack analysis tool to automatically solve it.

Remembering the whole point of this is only an optimization to avoid the overhead of stack variables, too... I don't think the stack really needs to be globally avoided?

For that reason I always thought --static-locals was too heavy-handed (unless RAM is plentiful for your project), though as a pragma that you turn on for specific functions that need more speed it seems nice enough.

Doing it manually also lets you mix use of stack and static variables as needed. CC65 is going to use the stack for a lot of temporary stuff even if the explicit variables are (e.g. in a lot of cases the static variable will end up being pushed to the stack to be the operand of an internal function call like multiply)...

I guess ideally the analysis could be done at link time, even, but it seems like a big ordeal for an optimization that to me is only situationally important... but maybe the goal here is to write code that transparently looks like stack-locals like you'd write for modern C platforms? A portability concern?
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Function params & local vars using a stack in ASM6

Post by tepples »

rainwarrior wrote:I normally just document functions that have global side effects, and otherwise sort it out manually:

Code: Select all

extern void foo(); // clobbers c,d,e
And then run into strange bugs in the caller when the callee is changed to clobber more things. Eventually you get to comments //clobbers a,b,c,f,g,h,i,j,l,m,p which becomes hard to maintain.

In addition, anything where you "sort it out manually" increases the likelihood of inadvertent inconsistent incorrect behavior, that is, lurking bugs. During the development of Thwaite, adding sound effects when an explosion spawned clobbered some of the variables that the collision detection routine was using. This caused other explosions to be triggered prematurely as if in a chain reaction, but it took me quite a while to narrow down exactly what was going wrong.
rainwarrior wrote:I'm truthfully a little surprised that managing these things would be enough of a problem that you'd want to write a call-stack analysis tool to automatically solve it.
"I'm truthfully surprised that bounds checking or type checking would be enough of a problem that you'd want a compiler to save your behind by pointing out obvious bugs to you."
rainwarrior wrote:CC65 is going to use the stack for a lot of temporary stuff
A compiler built around non-recursive, non-reentrant functions being the default situation wouldn't.
rainwarrior wrote:I guess ideally the analysis could be done at link time
That's exactly when the bulk of allocation would happen. Fortunately, ld65 can evaluate some fairly complex* expressions at link time. Should I try making a set of ca65 macros to do this as a proof of concept?
rainwarrior wrote:but maybe the goal here is to write code that transparently looks like stack-locals like you'd write for modern C platforms?
Yes, in the interest of maintainability.
rainwarrior wrote:A portability concern?
Portability concerns become noticeable when I have to spend time explaining to four different people why I haven't already made a full-scale game or a music engine for Super NES.


* Not in the "square root of -1" sense
User avatar
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Function params & local vars using a stack in ASM6

Post by rainwarrior »

tepples wrote:And then run into strange bugs in the caller when the callee is changed to clobber more things.
No, I haven't had such a bug yet. I agree that it "could", but it literally hasn't in my past few years of NES development.
tepples wrote:Eventually you get to comments //clobbers a,b,c,f,g,h,i,j,l,m,p which becomes hard to maintain.
I don't "eventually" reach that point. If I did find something hard to maintain I'd simplify it.

Sometimes that simplification is just using the stack, which has some pretty useful benefits at times. Like I was saying, this is an attempt at optimization, and I think it's far from ideal to use a solution that forces everything one way or the other.

My statement was my honest opinion/experience: I've been using global temporaries like this habitually for a long time and find a clobberlist really easy to maintain/read/use, so much so that I am surprised by the desire to augment a compiler to automate it.
tepples wrote:In addition, anything where you "sort it out manually" increases the likelihood of inadvertent inconsistent incorrect behavior...
This is a truism, but 500% of 0 is still 0. There are tons of things one has to do manually in programming that could possibly be automated, the question I have, and what I'm expressing surprise at is that this particular problem carries enough weight that it warrants the (in my estimation fairly large) upfront effort involved in automation.

...and I'd be interested in hearing stories about how that weight has come up for you or others in development. (I am, on the other hand, very uninterested in being told that my methods will "eventually" or hypothetically lead to problems for me that I'm not actually having.)
tepples wrote:"I'm truthfully surprised that bounds checking or type checking would be enough of a problem that you'd want a compiler to save your behind by pointing out obvious bugs to you."
Don't be rude. I made an honest statement about my own experience and I don't think this is a fair analogue at all.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Function params & local vars using a stack in ASM6

Post by Oziphantom »

Just though I would offer the path I have taken...
I mostly have things like

Code: Select all

ZPTemp1,2,3,4,5 
Pointer1,2,3,4
etc in my code, and what happens is 3 months later I go to add feature Y and end up trashing this or that. Or it magically works on PAL, but then doing an NTSC port where the code overlaps other things and boom..
So I made a python script ( warning I'm not a Python scriptor so its probably verbose for python and does things the 'C' way ) that parses a verbose code listing.
It adds 3 concepts , the first 2 are

Code: Select all

updateTickdowns
;&&trashes a,x
;&&modfies TickDowns
So trashes means it just writes and they have no meaning
Modifies means its allowed to change the contents and the caller expects it to.
There is a ;&&trashes _DONT_CARE_ form which says this is a high level function that calls everything and I don't

Code: Select all

;&&preserve x,ZPTemp3
This is basically a static assert, in that it will check the function that is called and make sure that neither x,ZPTemp3 is in its modify, or trashes list. So you use it as follows

Code: Select all

    ldx numEnts
-   lda entActive,x
    beq +
    jsr doEnt ;&&preserve x
+   dex
    bpl -
The python code has a parser in it and it will detect if things not listed are changed by the 6502 and tell you, you need to add it to one. I've been trying to come up with ways to make it less "work", but as my more modern and indepth coding style has broken it, I need to upgrade its systems as well as add the "This is missing from X, Press M for Modify, and T for Trash" code.

code is here https://github.com/oziphantom/Misc
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Function params & local vars using a stack in ASM6

Post by tepples »

rainwarrior wrote:
tepples wrote:Eventually you get to comments //clobbers a,b,c,f,g,h,i,j,l,m,p which becomes hard to maintain.
I don't "eventually" reach that point. If I did find something hard to maintain I'd simplify it.
Parts of my recent projects that use close to a dozen bytes' worth of temporaries include these:
  • The main metasprite drawing code
  • The movement subroutine for walking actors, including all terrain collision detection and response
  • The movement subroutine for the player, including player-to-enemy, enemy-to-player, and bullet-to-player collision
  • A subroutine that draws a rectangle of background tile numbers to the transfer buffer
I concede that player movement and metasprite drawing can usefully "clobber all temporaries" except the one reserved for iterating through the actor list. But player movement still has to avoid what the walker subroutine returns, and several enemies call the rectangle drawing code.
rainwarrior wrote:Sometimes that simplification is just using the stack, which has some pretty useful benefits at times.
I will push a temporary value to the stack in some cases. But I have found that stack use becomes less practical as code becomes more branchy, as I have to ensure that a pushed value is pulled in all branches. Or did you mean "using the stack" in a generic sense, such as unconditionally pushing several global temporaries at the start of a subroutine and pulling them at its end? I started doing that toward the end of Thwaite after the clobber problem, treating $0008-$000F as callee-saved temporary variables. (See the prolog and epilog of updateAllExplosions in explosion.s.)
rainwarrior wrote:I've been using global temporaries like this habitually for a long time and find a clobberlist really easy to maintain/read/use
I'm looking for text editors on Windows and GNU/Linux that can be set up to display the clobberlist of the subroutine whose name is under the insertion point. Any recommendations and setup instructions? In addition, keeping the call stack's clobber list in my head becomes doubly hard when the roommate blares the TV or radio at a volume that distracts me from concentrating on what everything clobbers.
rainwarrior wrote:so much so that I am surprised by the desire to augment a compiler to automate it.
A system that can unify allocation of temporary variables with allocation of non-temporary variables could actually reduce the load that a programmer has to keep in his head. Variables used by the only by the in-game engine and variables used only by the menu engine would thus be considered "temporary" and reusable by the other.

Likewise with variables used only in different phases of gameplay, such as in the cannon placement, battle, and rebuilding phases of Rampart. Only the rebuilding phase needs temporary flood-filling space to calculate which parts of the player's territory have completely been surrounded by walls, and only the battle phase needs the coordinates of cannonballs in flight. But all in-game phases share some information, such as where the walls are, where the cannons are, the players' scores, and possibly owned but undeployed cannons and how much money each player has earned to buy more cannons. One might have to allocate scratchpad areas used by rebuild phase flood fill and then manually allocate cannonball coordinate variables for battle phase and artillery shop for placement phase on top of that scratchpad.

This single abstraction would cover temporary variables per subroutine and RAM overlay segments per phase.
rainwarrior wrote:what I'm expressing surprise at is that this particular problem carries enough weight that it warrants the (in my estimation fairly large) upfront effort involved in automation.
I'm willing to do the up-front effort and license it suitably, as I did for Pently, Popslide, and my VWF library. Once it's done, it's done for every ca65 user.
rainwarrior wrote:
tepples wrote:"I'm truthfully surprised that bounds checking or type checking would be enough of a problem that you'd want a compiler to save your behind by pointing out obvious bugs to you."
Don't be rude. I made an honest statement about my own experience and I don't think this is a fair analogue at all.
Object lifetime is one of the big problems that C++ fans claim that C++ solves better than some other languages. Just as templates and polymorphism fix C's "everything is a void *", destructors and STL containers can be used to prevent leaks, double frees, and use after free statically through the language's semantics. And I consider clobber list management analogous to object lifetime.
User avatar
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Function params & local vars using a stack in ASM6

Post by rainwarrior »

tepples wrote:Parts of my recent projects that use close to a dozen bytes' worth of temporaries include these:
  • The main metasprite drawing code
  • The movement subroutine for walking actors, including all terrain collision detection and response
  • The movement subroutine for the player, including player-to-enemy, enemy-to-player, and bullet-to-player collision
  • A subroutine that draws a rectangle of background tile numbers to the transfer buffer
That's interesting. Well, hoping it's not too much of a digression to indulge, i'll compare my current project:

My metasprite code has only 4 global temporaries. 1 each for the base X/Y position, 1 for the tile count, and a pointer to the sprite data.

For collision, there are 4 global temporaries that I use for a bounding box, and anything in the code to do with a bounding box will use those same 4 temporaries, generally. Most of the collision checks use 3 additional temporaries on top of the bounding box (for X/Y and one for arithmetic). The player bounding box isn't in global temporaries, and moving blocking rectangles in the world are also dedicated (and handled transparently by the collision routines)... so I have a player-box collide, but there isn't actually a generic box-box one; the need surprisingly hasn't come up yet.

The one thing that uses the most temporaries is a generic "move character" function that moves a rectangle colliding it with the world. It inherits the 7 used by collision, and adds 7 more.
tepples wrote:I concede that player movement and metasprite drawing can usefully "clobber all temporaries"...
Well, in my code every character has a separate "tick" and "draw" routine, and each of them is allowed to clobber any of the shared global temporaries (named i-w, plus 2 pointers). It's kind of a little sandbox, and the concerns are separated: the tick code won't draw sprites, the draw code will never move the character. Maybe that's a big part of why I don't find organizing the temporaries much of a problem.

That generic move is kind of a temporary hog, but I find it pretty easy to accommodate.

I actually wouldn't want my metasprite routine to clobber too much-- several of my characters are drawn from multiple pieces, and do use temporaries around that.
tepples wrote:...except the one reserved for iterating through the actor list.
!? I'm very surprised that you would want to use a global temporary for that and not a dedicated variable.
tepples wrote:I have found that stack use becomes less practical as code becomes more branchy
Sure, that's a valid concern, though I haven't found it that troubling myself. In my current project not needing IRQs and using BRK as a crash-handler has actually made it quick/easy to diagnose most unbalanced-stack errors, which has been pretty cool but I realize not everyone has that luxury.
tepples wrote:...unconditionally pushing several global temporaries at the start of a subroutine and pulling them at its end?
Sure, sometimes you need temporary space but it's not appropriate to allocate more, and the stack is usually a good place for this, whether you want to use it directly or to temporarily make room on your ZP, or whatever.

Using the stack is the best way to make things truly "local", and that's a very good property to have. Storing locals statically is an performance optimization, and most code is not performance critical.

So yeah, you could write a static local manager to figure that optimization out for you, I get that... but it's not something I've ever found myself desiring in my NES work.
tepples wrote:I'm looking for text editors on Windows and GNU/Linux that can be set up to display the clobberlist of the subroutine whose name is under the insertion point.
I know there are editors that can parse various documentation comment formats and show them to you, but I'm not familiar with them. I personally just used Notepad++ or Visual Studio most of the time. Often I use a split view to see "header"/documentation for quick reference in one column and the code i'm editing in another.

Most of the time I don't actually even need to look up the clobber list, though. If the character tick routine I'm writing isn't currently holding values in temporaries, it doesn't actually matter what a called function uses. I don't need to rely on my (fallible) memory if I know the clobbers aren't relevant anyway.

Could also be a factor that I prototype most things in C++, so I don't end up having to do a lot of iteration in actual NES code. This probably lowers the frequency that I need to actually think about temporary management too.
tepples wrote:A system that can unify allocation of temporary variables with allocation of non-temporary variables...
Likewise with variables used only in different phases of gameplay, such as in the cannon placement, battle, and rebuilding phases of Rampart...
This single abstraction would cover temporary variables per subroutine and RAM overlay segments per phase.
I understand the problem of reusing memory for different game phases, but I view that as a separate issue from managing static local temporaries. Any stuff that would persist longer than a frame I don't imagine encapsulating as a function local, and that kind of RAM management should be pretty explicit and directly tied to the particulars of the game... so I'm unable to envision what kind of "system" you're thinking about that handles all of this at once. (I'm not asking/challenging you to outlay a design, though; as I've been saying all along I've found existing tools adequate for it. Though, I suppose designing that kind of thing is the point of this thread...)
tepples wrote:
rainwarrior wrote:Don't be rude.
I consider clobber list management analogous to object lifetime.
Sorry, the analogy seemed so strained to me as to be mocking. I shouldn't have interpreted it that way, as I'm sure it was not intended.

dustmop wrote:I have implemented this technique in co2, a scheme-based compiler that I took over for use in an upcoming project. See the file casla.scm for the code. I can imagine an assembler doing something similar, allocating memory locations at link time, though it would need to be aware of jump tables and similar stack manipulating control structures. Possibly even cc65 could get support, assuming someone wanted to add it.
Have had some time to look at this example, and it's interesting to see.

With the very functional style scheme code you're writing, I can see how the call stack quickly builds and local temporaries are a rapidly mounting problem. It really does demonstrate why you had a need to write a system to manage them as statics.
Post Reply