Function params & local vars using a stack in ASM6
Moderator: Moderators
well i'm kind of hesitant to put a constant value into ZP but it might be useful especially if i were to put it after the frame pointer because then the frame pointer could be used as a true pointer and not an offset
i think i'm going to swap the values in x and y and go through the other op codes to make adcLocal and whatnot
i think i'm going to swap the values in x and y and go through the other op codes to make adcLocal and whatnot
Re: Function params & local vars using a stack in ASM6
someone recently asked me for the files behind the dead links.. here are the macro/function files
http://swiftlytilting.com/files/asm6_functions.zip
function.s
macros.s
nes.h
http://swiftlytilting.com/files/asm6_functions.zip
function.s
Code: Select all
;=========================================================
; Virtual Stack macros
;=========================================================
; Set up stack at $0500
ifndef stack
stack = $0500
endif
; Place stackPtr at $ff (in zero page for faster code)
ifndef stackPtr
stackPtr = $FF
endif
macro ldStackPtr
ldy stackPtr
endm
macro stStackPtr
sty stackPtr
endm
macro ldStackFramePtr
ldx localVarPtr
endm
macro stStackFramePtr
stx localVarPtr
endm
;---------------------------------------------------------
; pushStack @value, @restoreX
;---------------------------------------------------------
; pushes a value onto the virtual stack
;
; Expects X to be pointer to end of virtual stack
;
; @value is optional, otherwise the value in A is used
; @restoreX is optional, when defined sets X to stackPtr
macro pushStack @value
ifdef @value
lda @value ; load value into A if supplied
endif
iny ; increase pointer location
sta stack,y ; push A onto virtual stack
endm
;---------------------------------------------------------
; pullStack @restoreX
;---------------------------------------------------------
; pulls value from stack and places into A
;
; Expects X to be pointer to end of virtual stack
;
; @restoreX is optional, when defined sets X to stackPtr
macro pullStack
lda stack,y ; pull A from the virtual stack
dey ; decrease pointer location
endm
;---------------------------------------------------------
; resetStack
;---------------------------------------------------------
; moves the stack pointer to the start of the stack
macro resetStack
lda #$00
sta stackPtr
endm
;=========================================================
; Function macros
;=========================================================
; Set up pointer to local vars
ifndef localVarPtr
localVarPtr = $FE
endif
; Set up a few custom directives
defineLocalVars equ enum $01
endLocalVars equ ende
char equ .byte 00
int equ .word 0000
;---------------------------------------------------------
; call @functionName, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8
;---------------------------------------------------------
; calls a subroutine after pushing data onto stack
;
; @functionName is an assembler address
; @p1 thru @p8 are optional parameters
;
; Notes:
; No registers are intact after the function returns.
;
; To return values:
; * Pass by reference
; * Use a global variable
; * Access old data left on stack using ldaReturn
macro call @functionName, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8
lda stackPtr ; get stackPtr at start of function call and
tax ; save before params are pushed onto stack
tay ; also place into y to ensure stack operations work correctly
ifdef @p1 ; push parameters onto stack
pushStack @p1
endif
ifdef @p2
pushStack @p2
endif
ifdef @p3
pushStack @p3
endif
ifdef @p4
pushStack @p4
endif
ifdef @p5
pushStack @p5
endif
ifdef @p6
pushStack @p6
endif
ifdef @p7
pushStack @p7
endif
ifdef @p8
pushStack @p8
endif
txa
pushStack ; push old stackPtr onto stack
sty stackPtr ; save new stackPtr
sta localVarPtr ; the old stack ptr location is also the new local ptr location
tax ; Set x to localVarPtr
; Register status when entering function proper:
; Y = Ptr to end of stack (stackPtr)
; X = Ptr to begining of local vars (localVarPtr)
; A = X
jsr @functionName ; call function
ldy stackPtr ; pull old stackPtr from stack
lda stack, y
sta stackPtr ; update stackPtr
tay ; update X to contain stackPtr
lda stack, y ; restore old localVarPtr
sta localVarPtr ;
tax ; restore Y to old localVarPtr
endm
;---------------------------------------------------------
; ldaLocal @num, @restoreY
;---------------------------------------------------------
; loads a local variable into A
;
; Expects x to be pointer to start of local variables
;
; @num is the nth byte in the local variable scope
; @restoreX is optional, when defined sets X to localVarPtr
macro ldaLocal @num, @restoreX
ifdef @restoreX
ldy localVarPtr
endif
lda #(stack + @num), x
endm
macro LocalOp @op, @num
@op #(stack + @num), x
endm
;---------------------------------------------------------
; staLocal @num, @restoreY
;---------------------------------------------------------
; stores A into a local variable
;
; Expects x to be pointer to start of local variables
;
; @num is the nth byte in the local variable scope
; @restoreY is optional, when defined sets X to localVarPtr
macro staLocal @num, @restoreX
ifdef @restoreX
ldx localVarPtr
endif
LocalOp sta, @num
endm
macro aslLocal @num, @restoreX
ifdef @restoreX
ldx localVarPtr
endif
LocalOp asl, @num
endm
macro rolLocal @num, @restoreX
ifdef @restoreX
ldx localVarPtr
endif
LocalOp rol, @num
endm
macro lsrLocal @num, @restoreX
ifdef @restoreX
ldx localVarPtr
endif
LocalOp lsr, @num
endm
macro adcLocal @num, @restoreX
ifdef @restoreX
ldx localVarPtr
endif
LocalOp adc, @num
endm
;---------------------------------------------------------
; alloca @size, @restoreX, @restoreY
;---------------------------------------------------------
; Allocates a block of memory in the local variable scope
;
; Expects y to be pointer to end of virtual stack
; Expects x to be pointer to start of local variables
;
; @size is the number of bytes to allocate
; @restoreX is optional, when defined sets y to stackPtr
; @restoreY is optional, when defined sets x to localVarPtr
;
; Notes:
; This memory is accessed like parameters using sta/ldaLocal
; Allocated memory is lost on function return
macro alloca @size, @restorey, @restorex
clc
ifdef @restorey
ldy stackPtr
endif
ifdef @restorex
ldx localVarPtr ; read pointer to start of the local vars
endif
tya
adc @size ; add size to stackptr
tay
sta stackPtr ; update stack pointer
txa
sta stack,y ; store pointer to local vars at new location
endm
macro ldaReturn @num, @restorey
ifdef @restorey
ldy stackPtr
endif
lda #(stack + @num), y
endm
macro fastcall @functionName, @pA, @pX, @pY
ifdef @pA
lda @pA
endif
ifdef @pX
ldx @pX
endif
ifdef @pY
ldy @pY
endif
jsr @functionName
endm
Code: Select all
include functions.s
enum $ee
temp1 byte 00
temp2 byte 00
temp3 byte 00
temp4 byte 00
Controller1 byte 00
OldController1 byte 00
Controller2 byte 00
OldController2 byte 00
VBLANK_counter byte 00
NameTableBufferPtr byte 00
PPUCTRL_MIRROR byte 00
PPUMASK_MIRROR byte 00
PPUADDR_MIRROR word 0000
PPU_HSCROLL byte 00
PPU_VSCROLL byte 00
localVarPtr byte 00
stackPtr byte 00
ende
; Memory Map
;
; 0000 - 00ED - ZP Globals available to user
; 00F0 - 00F3 - 4 Temporary ZP globals available to system (and user but use caution)
; 00F4 - 00FF - ZP globals for system use
; F4 - F5 - Controllers
; F6 - vblank counter
; F7 - Nametable Buffer ptr
; F8 - PPUCTRL_MIRROR
; F9 - PPUMASK_MIRROR
; FA - FB - PPUADDR_MIRROR
; - FC - Horizontal Scroll
; FD - Vertical Scroll
; FE - Function Stack Frame Prointer
; FF - Function Stack End Pointer
;
; 0100 - 01FF - System Stack
; 0200 - 02FF - Sprite Data for DMA
; 0300 - 03FF - Nametable buffer
; 0400 - 04FF - Heap (?)
; 0500 - 05FF - Function Stack
; 0600 - 06FF -
; 0700 - 07FF -
ifndef PPUCTRL_MIRROR
PPUCTRL_MIRROR = $f8
endif
ifndef PPUMASK_MIRROR
PPUMASK_MIRROR = $fA
endif
macro iNES_header @prg, @chr, @mapper1, @mapper2
ifndef @prg
@prg = #$01
endif
ifndef @chr
@chr = #$01
endif
ifndef @mapper1
@mapper1 = #$00
endif
ifndef @mapper2
@mapper2 = #$00
endif
.byte "NES",$1a
.byte @prg ; PRG-ROM block
.byte @chr ; CHR-ROM block
.byte @mapper1 ; mapper info
.byte @mapper2 ; mapper info
.byte 0,0,0,0,0,0,0,0 ; pad header to 16 bytes
endm
macro vectors @vblank, @reset, @irq
org $fffa
word @vblank, @reset, @irq
endm
;---------------------------------------------------------
;---------------------------------------------------------
; 16-bit increment (macro: inc16 address)
;---------------------------------------------------------
.macro inc16 @addr
inc @addr
bne +
inc @addr+1
+
.endm
macro waitVBlank
lda VBLANK_counter
-wait:
cmp VBLANK_counter
beq -wait:
endm
.macro screen_off
; Disable all graphics.
lda #$00
sta PPUCTRL
sta PPUMASK
sta PPUCTRL_MIRROR
sta PPUMASK_MIRROR
.endm
.macro waitVBlank_pollPPU
- lda PPUSTATUS
bpl -
endm
.macro waitSprite0Hit
- lda PPUSTATUS
cmp #%01000000
bne -
endm
.macro clearRAM
lda #$00 ; Clear out RAM.
ldx #$00
- sta $000,x
sta $100,x
sta $200,x
sta $300,x
sta $400,x
sta $500,x
sta $600,x
sta $700,x
inx
bne -
endm
.macro initSound
lda #$01 ; initialize sound hardware
sta $4015
lda #$00
sta $4001
lda #$40
sta $4017
sta $4011
endm
.macro initNES
ldx #$FF ; Reset the stack pointer.
txs
sei
cld
screen_off ; Disable all graphics.
waitVBlank_pollPPU
waitVBlank_pollPPU
clearRAM
initSound
.endm
macro setPPUCTRL @value
lda #@value
sta PPUCTRL_MIRROR
sta PPUCTRL
endm
macro restorePPUCTRL
lda PPUCTRL_MIRROR
sta PPUCTRL
endm
macro setPPUMASK @value
lda #@value
sta PPUMASK_MIRROR
sta PPUMASK
endm
macro updateNameTable @value
lda PPUCTRL_MIRROR
and #%11111100
and #@value
sta PPUCTRL_MIRROR
sta PPUCTRL
endm
macro flipNameTable
lda PPUCTRL_MIRROR
eor #%00000010
sta PPUCTRL_MIRROR
sta PPUCTRL
endm
macro ldaList @listPtr, @index
ldx @index
lda @listPtr, x
endm
macro staList @listPtr, @index
ldx @index
sta @listPtr, x
endm
; multiply two bytes in memory using russian peasant algorithm
; uses $00, $01, $02 for temporary variables
; returns 16 bit value in $00 and $01
macro mult @value1ptr, @value2ptr, @ret, @temp
ifndef @ret
@ret = $00
endif
ifndef @temp
@temp = $02
endif
lda #$00
sta @ret
sta @ret+1
sta @temp
jmp start:
-loop:
asl @value1ptr ; double first value
rol @temp ; using 16bit precision
lsr @value2ptr ; halve second vale
start:
lda @value2ptr ;
and #01 ; is new 2nd value an odd number?
beq -loop: ;
clc ; if so, add new 1st value to running total
lda @ret ;
adc @value1ptr ;
sta @ret ;
lda @ret+1 ;
adc @temp ;
sta @ret+1 ;
lda @value2ptr ;
cmp #01 ; is 2nd value 1? if so, we're done
bne -loop: ; otherwise, loop
endm
Code: Select all
; Constants for NES
PPUCTRL = $2000
PPUMASK = $2001
PPUSTATUS = $2002
OAMADDR = $2003
PPUSCROLL = $2005
PPUADDR = $2006
PPUDATA = $2007
; PPUCTRL settings
PPUCTRL_NTABLE_0 = %00000000 ; Select name table 0
PPUCTRL_NTABLE_1 = %00000001 ; Select name table 1
PPUCTRL_NTABLE_2 = %00000010 ; Select name table 2
PPUCTRL_NTABLE_3 = %00000011 ; Select name table 3
PPUCTRL_VERTICAL = %00000100 ; Increment PPU address by 32 bytes, going down
PPUCTRL_HORIZONTAL = %00000000 ; Increment PPU address by 1 byte, going across
PPUCTRL_SPR_PTABLE_0 = %00000000 ; Sprite pattern table 0
PPUCTRL_SPR_PTABLE_1 = %00001000 ; Sprite pattern table 1
PPUCTRL_BG_PTABLE_0 = %00000000 ; Background pattern table 0
PPUCTRL_BG_PTABLE_1 = %00010000 ; Background pattern table 1
PPUCTRL_SPR_8X8 = %00000000 ; Sprites are 8x8
PPUCTRL_SPR_8X16 = %00100000 ; Sprites are 8x16
PPUCTRL_VBLANK_NMI_ON = %10000000 ; write this value to PPUCTRL to enable NMI
PPUCTRL_VBLANK_NMI_OFF = %00000000 ; write this value to PPUCTRL to disable NMI
; PPUMASK settings
PPUMASK_GRAYSCALE = %00000001
PPUMASK_COLOR = %00000000
PPUMASK_BG_NOCLIP = %00000010
PPUMASK_BG_CLIP = %00000000
PPUMASK_SPR_NOCLIP = %00000100
PPUMASK_SPR_CLIP = %00000000
PPUMASK_BG_ON = %00001000
PPUMASK_BG_OFF = %00000000
PPUMASK_SPR_ON = %00010000
PPUMASK_SPR_OFF = %00000000
PPUMASK_INTENSE_RED = %00100000
PPUMASK_INTENSE_GREEN = %01000000
PPUMASK_INTENSE_BLUE = %10000000
OAM_DMA = $4014
SNDCHN = $4015
CONTROLLER1 = $4016
CONTROLLER2 = $4017
KEY_A = %10000000
KEY_B = %01000000
KEY_SELECT = %00100000
KEY_START = %00010000
KEY_UP = %00001000
KEY_DOWN = %00000100
KEY_LEFT = %00000010
KEY_RIGHT = %00000001
; setup RAM constants
RAM_PAT_TABLE_0 = $0000
RAM_PAT_TABLE_1 = $1000
RAM_NAME_TABLE_0 = $2000
RAM_ATTRIB_TABLE_0 = $23C0
RAM_NAME_TABLE_1 = $2400
RAM_ATTRIB_TABLE_1 = $27C0
RAM_NAME_TABLE_2 = $2800
RAM_ATTRIB_TABLE_2 = $2BC0
RAM_NAME_TABLE_3 = $2C00
RAM_ATTRIB_TABLE_3 = $2FC0
RAM_BG_PALETTE = $3F00
RAM_SPR_PALETTE = $3F10
RAM_SAVE = $6000
RAM_PRG = $8000
; INES header constants
INES_HORIZONTAL = %00000000
INES_VERTICAL = %00000001
INES_SRAM = %00000010
INES_TRAINER = %00000100
INES_4SCREEN = %00001000
INES_NROM = $00
INES_MMC1 = $10
INES_UNROM = $20
INES_CNROM = $30
INES_MMC3 = $40
INES_MMC5 = $50
INES_FFEF4 = $60
INES_AXROM = $70
INES_FFEF4 = $80
INES_MMC2 = $90
INES_MMC4 = $A0
INES_COLORDREAMS = $B0
INES_FFEF4 = $C0
INES_CPROM = $D0
;unused = $E0
INES_100IN1 = $F0
Re: Function params & local vars using a stack in ASM6
I'm not familiar with ASM6, but I'll be watching this topic with interest. My 6502 stacks treatise addresses local variables in section 14, at http://wilsonminesco.com/stacks/loc_vars.html, and in the middle of the page I express a wish for being able to do something in macros to create and destroy variables in the page-1 stack space, something where I've run into roadblocks with the assemblers I'm familiar with. The stack addressing builds on what's in the top third of the page in section 6, at http://wilsonminesco.com/stacks/parampassing.html, as well as the previous two sections. Since you mention multiplication, by coincidence, I do have a multiplication routine there in section 6, using only data on the page-1 stack, about 1/3 of the way down the page. Recursion takes the local-variable idea a step further, and is discussed in section 15, at http://wilsonminesco.com/stacks/recurse.html .
FWIW, I use (ZP,X) a lot, and rarely (ZP),Y. I use the (ZP,X) in a ZP data stack. This is in Forth, but you don't have to use Forth to take advantage of the method, described in sections 4, 5, and 6. The multiplication routine comes up again 2/3 of the way down the page in section 6, this time using a ZP data stack.Also, it seems much better to leave Y free and have X be reserved for something, because the (zp),y addressing mode is the most important to have available to user code all the time, rather than (zp,x).
http://WilsonMinesCo.com/ lots of 6502 resources
Re: Function params & local vars using a stack in ASM6
Code: Select all
ifdef @restoreX
ldy localVarPtr
endif
Re: Function params & local vars using a stack in ASM6
If this comment is accurate...
then, I would agree with you. should be LDX@restoreX is optional, when defined sets X to localVarPtr
nesdoug.com -- blog/tutorial on programming for the NES
Re: Function params & local vars using a stack in ASM6
I've been playing around with some high level macros as well and I've found a way for a macro to accept either a local parameter or an actual register with the following code:
I'll also explain how you can achieve the following flexibility with the same macro:
This code assumes locals on the stack.
How it works is that local variables range from $0-ff, so we can use $100-105 to represent the registers.
If you wanted you could go even further and exploit mirroring to pass addresses from zero page (pass zero page address + $1000 then subtract in the macro), allow absolute addressing (even of zero page) and immediate values (pass value + $800, then subtract in the macro).
Code: Select all
MovLocal @index, regA
Code: Select all
MovLocal @index, IMMEDIATE $10
How it works is that local variables range from $0-ff, so we can use $100-105 to represent the registers.
Code: Select all
.enum $100
regA .dsb 1
regX .dsb 1
regY .dsb 1
regS .dsb 1
regP .dsb 1
.ende
MIN_REG = regA
MAX_REG = regP
IMMEDIATE equ $800 +
AS_ZERO equ $1000 +
ZERO_TO_ABSOLUTE equ $1800 +
macro MovLocal @to, @from
if @from = regX
txa
endif
tsx
if @from < MIN_REG
tsx
lda $103 + @from, x
elseif @from <= MAX_REG
if @from = regA
endif
if @from = regY
tya
endif
if @from = regS
txa
endif
if @from = regP
php
pla
endif
endif
sta $103 + @to, x
endm
Re: Function params & local vars using a stack in ASM6
From my experience, using the stack to store function parameters and local / temp variables leads to a poor experience. The 6502 just wasn't designed for it. It leads to a lot of boilerplate, wastes valuable ROM space, runs slow, and isn't even that convenient due to clobbering the X register. Too many downsides to be worth it.
The solution that Bregalad posted about, of using a chunk of RAM space to pass around globals, certainly solves a lot of these problems, but unfortunately is pretty error-prone. I used it for a few previous projects and it's very easy to introduce subtle bugs without constant vigilance.
In cc65 there's a compiler option --static-locals to allocate each local variable to a static address, such that every memory location used is unique. This leads to code that is pretty fast, but unfortunately wastes a ton of RAM. It also breaks recursion, but I've only ever need recursion once in an NES project and it was a complete edge case.
I've been exploring an idea that improves upon these approaches. Have a tool analyze each function and track all the other functions that it calls. After compilation is done, build a call-graph of the entire program. Starting from the leaves, count how much RAM space each function needs for locals and parameters, and propagate that value upwards to callers. Using this, allocate memory addresses to the function based upon the needs of its locals plus the maximum of its callees. For example:
A calls B,C,D (top-level, entry point)
B and C are leaf calls (don't call anything else).
D calls B.
Starting from the leaves, start allocating memory at $10 (an arbitrary location for this example).
B needs 2 bytes for t and u, so it gets memory address $10 through $11.
C needs 1 byte for v, so it also gets memory address $10. Sharing memory for locals between C and B is okay because neither calls the other.
D needs 1 byte for w, plus 2 bytes for B, so w is assigned $12.
A needs 1 byte for s, then the maximum needed for all of B, C, D, which is 3 bytes. So s gets assigned $13.
The amount of RAM used is going to be equal to the most RAM used by the longest call stack from entry point to leaf.
I have implemented this technique in co2, a scheme-based compiler that I took over for use in an upcoming project. See the file casla.scm for the code. I can imagine an assembler doing something similar, allocating memory locations at link time, though it would need to be aware of jump tables and similar stack manipulating control structures. Possibly even cc65 could get support, assuming someone wanted to add it.
The solution that Bregalad posted about, of using a chunk of RAM space to pass around globals, certainly solves a lot of these problems, but unfortunately is pretty error-prone. I used it for a few previous projects and it's very easy to introduce subtle bugs without constant vigilance.
In cc65 there's a compiler option --static-locals to allocate each local variable to a static address, such that every memory location used is unique. This leads to code that is pretty fast, but unfortunately wastes a ton of RAM. It also breaks recursion, but I've only ever need recursion once in an NES project and it was a complete edge case.
I've been exploring an idea that improves upon these approaches. Have a tool analyze each function and track all the other functions that it calls. After compilation is done, build a call-graph of the entire program. Starting from the leaves, count how much RAM space each function needs for locals and parameters, and propagate that value upwards to callers. Using this, allocate memory addresses to the function based upon the needs of its locals plus the maximum of its callees. For example:
Code: Select all
void A() {
int s;
// ...
B();
C();
D();
}
void B() {
int t;
int u;
// ...
}
void C() {
int v;
// ...
}
void D() {
int w;
// ...
B();
}
B and C are leaf calls (don't call anything else).
D calls B.
Starting from the leaves, start allocating memory at $10 (an arbitrary location for this example).
B needs 2 bytes for t and u, so it gets memory address $10 through $11.
C needs 1 byte for v, so it also gets memory address $10. Sharing memory for locals between C and B is okay because neither calls the other.
D needs 1 byte for w, plus 2 bytes for B, so w is assigned $12.
A needs 1 byte for s, then the maximum needed for all of B, C, D, which is 3 bytes. So s gets assigned $13.
The amount of RAM used is going to be equal to the most RAM used by the longest call stack from entry point to leaf.
I have implemented this technique in co2, a scheme-based compiler that I took over for use in an upcoming project. See the file casla.scm for the code. I can imagine an assembler doing something similar, allocating memory locations at link time, though it would need to be aware of jump tables and similar stack manipulating control structures. Possibly even cc65 could get support, assuming someone wanted to add it.
Re: Function params & local vars using a stack in ASM6
That sort of static yet overlapping allocation of a nonrecursive thread's local variables sounds like what kevtris described to me before.
Re: Function params & local vars using a stack in ASM6
Yup, sounds like the same concept. It can even be extended to recursive calls by detecting cycles, tagging functions in or below those cycles, then pushing their locals to the stack upon entry. Haven't implemented that yet, but it's on the TODO list.
Even without recursion, having a tool do this automatically is fantastic. Having an assembler with support would be absolutely killer (also on the TODO list but a bit further down).
Even without recursion, having a tool do this automatically is fantastic. Having an assembler with support would be absolutely killer (also on the TODO list but a bit further down).
- rainwarrior
- Posts: 8732
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Function params & local vars using a stack in ASM6
I normally just document functions that have global side effects, and otherwise sort it out manually:
I'm truthfully a little surprised that managing these things would be enough of a problem that you'd want to write a call-stack analysis tool to automatically solve it.
Remembering the whole point of this is only an optimization to avoid the overhead of stack variables, too... I don't think the stack really needs to be globally avoided?
For that reason I always thought --static-locals was too heavy-handed (unless RAM is plentiful for your project), though as a pragma that you turn on for specific functions that need more speed it seems nice enough.
Doing it manually also lets you mix use of stack and static variables as needed. CC65 is going to use the stack for a lot of temporary stuff even if the explicit variables are (e.g. in a lot of cases the static variable will end up being pushed to the stack to be the operand of an internal function call like multiply)...
I guess ideally the analysis could be done at link time, even, but it seems like a big ordeal for an optimization that to me is only situationally important... but maybe the goal here is to write code that transparently looks like stack-locals like you'd write for modern C platforms? A portability concern?
Code: Select all
extern void foo(); // clobbers c,d,e
Remembering the whole point of this is only an optimization to avoid the overhead of stack variables, too... I don't think the stack really needs to be globally avoided?
For that reason I always thought --static-locals was too heavy-handed (unless RAM is plentiful for your project), though as a pragma that you turn on for specific functions that need more speed it seems nice enough.
Doing it manually also lets you mix use of stack and static variables as needed. CC65 is going to use the stack for a lot of temporary stuff even if the explicit variables are (e.g. in a lot of cases the static variable will end up being pushed to the stack to be the operand of an internal function call like multiply)...
I guess ideally the analysis could be done at link time, even, but it seems like a big ordeal for an optimization that to me is only situationally important... but maybe the goal here is to write code that transparently looks like stack-locals like you'd write for modern C platforms? A portability concern?
Re: Function params & local vars using a stack in ASM6
And then run into strange bugs in the caller when the callee is changed to clobber more things. Eventually you get to comments //clobbers a,b,c,f,g,h,i,j,l,m,p which becomes hard to maintain.rainwarrior wrote:I normally just document functions that have global side effects, and otherwise sort it out manually:Code: Select all
extern void foo(); // clobbers c,d,e
In addition, anything where you "sort it out manually" increases the likelihood of inadvertent inconsistent incorrect behavior, that is, lurking bugs. During the development of Thwaite, adding sound effects when an explosion spawned clobbered some of the variables that the collision detection routine was using. This caused other explosions to be triggered prematurely as if in a chain reaction, but it took me quite a while to narrow down exactly what was going wrong.
"I'm truthfully surprised that bounds checking or type checking would be enough of a problem that you'd want a compiler to save your behind by pointing out obvious bugs to you."rainwarrior wrote:I'm truthfully a little surprised that managing these things would be enough of a problem that you'd want to write a call-stack analysis tool to automatically solve it.
A compiler built around non-recursive, non-reentrant functions being the default situation wouldn't.rainwarrior wrote:CC65 is going to use the stack for a lot of temporary stuff
That's exactly when the bulk of allocation would happen. Fortunately, ld65 can evaluate some fairly complex* expressions at link time. Should I try making a set of ca65 macros to do this as a proof of concept?rainwarrior wrote:I guess ideally the analysis could be done at link time
Yes, in the interest of maintainability.rainwarrior wrote:but maybe the goal here is to write code that transparently looks like stack-locals like you'd write for modern C platforms?
Portability concerns become noticeable when I have to spend time explaining to four different people why I haven't already made a full-scale game or a music engine for Super NES.rainwarrior wrote:A portability concern?
* Not in the "square root of -1" sense
- rainwarrior
- Posts: 8732
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Function params & local vars using a stack in ASM6
No, I haven't had such a bug yet. I agree that it "could", but it literally hasn't in my past few years of NES development.tepples wrote:And then run into strange bugs in the caller when the callee is changed to clobber more things.
I don't "eventually" reach that point. If I did find something hard to maintain I'd simplify it.tepples wrote:Eventually you get to comments //clobbers a,b,c,f,g,h,i,j,l,m,p which becomes hard to maintain.
Sometimes that simplification is just using the stack, which has some pretty useful benefits at times. Like I was saying, this is an attempt at optimization, and I think it's far from ideal to use a solution that forces everything one way or the other.
My statement was my honest opinion/experience: I've been using global temporaries like this habitually for a long time and find a clobberlist really easy to maintain/read/use, so much so that I am surprised by the desire to augment a compiler to automate it.
This is a truism, but 500% of 0 is still 0. There are tons of things one has to do manually in programming that could possibly be automated, the question I have, and what I'm expressing surprise at is that this particular problem carries enough weight that it warrants the (in my estimation fairly large) upfront effort involved in automation.tepples wrote:In addition, anything where you "sort it out manually" increases the likelihood of inadvertent inconsistent incorrect behavior...
...and I'd be interested in hearing stories about how that weight has come up for you or others in development. (I am, on the other hand, very uninterested in being told that my methods will "eventually" or hypothetically lead to problems for me that I'm not actually having.)
Don't be rude. I made an honest statement about my own experience and I don't think this is a fair analogue at all.tepples wrote:"I'm truthfully surprised that bounds checking or type checking would be enough of a problem that you'd want a compiler to save your behind by pointing out obvious bugs to you."
-
- Posts: 1565
- Joined: Tue Feb 07, 2017 2:03 am
Re: Function params & local vars using a stack in ASM6
Just though I would offer the path I have taken...
I mostly have things like
etc in my code, and what happens is 3 months later I go to add feature Y and end up trashing this or that. Or it magically works on PAL, but then doing an NTSC port where the code overlaps other things and boom..
So I made a python script ( warning I'm not a Python scriptor so its probably verbose for python and does things the 'C' way ) that parses a verbose code listing.
It adds 3 concepts , the first 2 are
So trashes means it just writes and they have no meaning
Modifies means its allowed to change the contents and the caller expects it to.
There is a ;&&trashes _DONT_CARE_ form which says this is a high level function that calls everything and I don't
This is basically a static assert, in that it will check the function that is called and make sure that neither x,ZPTemp3 is in its modify, or trashes list. So you use it as follows
The python code has a parser in it and it will detect if things not listed are changed by the 6502 and tell you, you need to add it to one. I've been trying to come up with ways to make it less "work", but as my more modern and indepth coding style has broken it, I need to upgrade its systems as well as add the "This is missing from X, Press M for Modify, and T for Trash" code.
code is here https://github.com/oziphantom/Misc
I mostly have things like
Code: Select all
ZPTemp1,2,3,4,5
Pointer1,2,3,4
So I made a python script ( warning I'm not a Python scriptor so its probably verbose for python and does things the 'C' way ) that parses a verbose code listing.
It adds 3 concepts , the first 2 are
Code: Select all
updateTickdowns
;&&trashes a,x
;&&modfies TickDowns
Modifies means its allowed to change the contents and the caller expects it to.
There is a ;&&trashes _DONT_CARE_ form which says this is a high level function that calls everything and I don't
Code: Select all
;&&preserve x,ZPTemp3
Code: Select all
ldx numEnts
- lda entActive,x
beq +
jsr doEnt ;&&preserve x
+ dex
bpl -
code is here https://github.com/oziphantom/Misc
Re: Function params & local vars using a stack in ASM6
Parts of my recent projects that use close to a dozen bytes' worth of temporaries include these:rainwarrior wrote:I don't "eventually" reach that point. If I did find something hard to maintain I'd simplify it.tepples wrote:Eventually you get to comments //clobbers a,b,c,f,g,h,i,j,l,m,p which becomes hard to maintain.
- The main metasprite drawing code
- The movement subroutine for walking actors, including all terrain collision detection and response
- The movement subroutine for the player, including player-to-enemy, enemy-to-player, and bullet-to-player collision
- A subroutine that draws a rectangle of background tile numbers to the transfer buffer
I will push a temporary value to the stack in some cases. But I have found that stack use becomes less practical as code becomes more branchy, as I have to ensure that a pushed value is pulled in all branches. Or did you mean "using the stack" in a generic sense, such as unconditionally pushing several global temporaries at the start of a subroutine and pulling them at its end? I started doing that toward the end of Thwaite after the clobber problem, treating $0008-$000F as callee-saved temporary variables. (See the prolog and epilog of updateAllExplosions in explosion.s.)rainwarrior wrote:Sometimes that simplification is just using the stack, which has some pretty useful benefits at times.
I'm looking for text editors on Windows and GNU/Linux that can be set up to display the clobberlist of the subroutine whose name is under the insertion point. Any recommendations and setup instructions? In addition, keeping the call stack's clobber list in my head becomes doubly hard when the roommate blares the TV or radio at a volume that distracts me from concentrating on what everything clobbers.rainwarrior wrote:I've been using global temporaries like this habitually for a long time and find a clobberlist really easy to maintain/read/use
A system that can unify allocation of temporary variables with allocation of non-temporary variables could actually reduce the load that a programmer has to keep in his head. Variables used by the only by the in-game engine and variables used only by the menu engine would thus be considered "temporary" and reusable by the other.rainwarrior wrote:so much so that I am surprised by the desire to augment a compiler to automate it.
Likewise with variables used only in different phases of gameplay, such as in the cannon placement, battle, and rebuilding phases of Rampart. Only the rebuilding phase needs temporary flood-filling space to calculate which parts of the player's territory have completely been surrounded by walls, and only the battle phase needs the coordinates of cannonballs in flight. But all in-game phases share some information, such as where the walls are, where the cannons are, the players' scores, and possibly owned but undeployed cannons and how much money each player has earned to buy more cannons. One might have to allocate scratchpad areas used by rebuild phase flood fill and then manually allocate cannonball coordinate variables for battle phase and artillery shop for placement phase on top of that scratchpad.
This single abstraction would cover temporary variables per subroutine and RAM overlay segments per phase.
I'm willing to do the up-front effort and license it suitably, as I did for Pently, Popslide, and my VWF library. Once it's done, it's done for every ca65 user.rainwarrior wrote:what I'm expressing surprise at is that this particular problem carries enough weight that it warrants the (in my estimation fairly large) upfront effort involved in automation.
Object lifetime is one of the big problems that C++ fans claim that C++ solves better than some other languages. Just as templates and polymorphism fix C's "everything is a void *", destructors and STL containers can be used to prevent leaks, double frees, and use after free statically through the language's semantics. And I consider clobber list management analogous to object lifetime.rainwarrior wrote:Don't be rude. I made an honest statement about my own experience and I don't think this is a fair analogue at all.tepples wrote:"I'm truthfully surprised that bounds checking or type checking would be enough of a problem that you'd want a compiler to save your behind by pointing out obvious bugs to you."
- rainwarrior
- Posts: 8732
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: Function params & local vars using a stack in ASM6
That's interesting. Well, hoping it's not too much of a digression to indulge, i'll compare my current project:tepples wrote:Parts of my recent projects that use close to a dozen bytes' worth of temporaries include these:
- The main metasprite drawing code
- The movement subroutine for walking actors, including all terrain collision detection and response
- The movement subroutine for the player, including player-to-enemy, enemy-to-player, and bullet-to-player collision
- A subroutine that draws a rectangle of background tile numbers to the transfer buffer
My metasprite code has only 4 global temporaries. 1 each for the base X/Y position, 1 for the tile count, and a pointer to the sprite data.
For collision, there are 4 global temporaries that I use for a bounding box, and anything in the code to do with a bounding box will use those same 4 temporaries, generally. Most of the collision checks use 3 additional temporaries on top of the bounding box (for X/Y and one for arithmetic). The player bounding box isn't in global temporaries, and moving blocking rectangles in the world are also dedicated (and handled transparently by the collision routines)... so I have a player-box collide, but there isn't actually a generic box-box one; the need surprisingly hasn't come up yet.
The one thing that uses the most temporaries is a generic "move character" function that moves a rectangle colliding it with the world. It inherits the 7 used by collision, and adds 7 more.
Well, in my code every character has a separate "tick" and "draw" routine, and each of them is allowed to clobber any of the shared global temporaries (named i-w, plus 2 pointers). It's kind of a little sandbox, and the concerns are separated: the tick code won't draw sprites, the draw code will never move the character. Maybe that's a big part of why I don't find organizing the temporaries much of a problem.tepples wrote:I concede that player movement and metasprite drawing can usefully "clobber all temporaries"...
That generic move is kind of a temporary hog, but I find it pretty easy to accommodate.
I actually wouldn't want my metasprite routine to clobber too much-- several of my characters are drawn from multiple pieces, and do use temporaries around that.
!? I'm very surprised that you would want to use a global temporary for that and not a dedicated variable.tepples wrote:...except the one reserved for iterating through the actor list.
Sure, that's a valid concern, though I haven't found it that troubling myself. In my current project not needing IRQs and using BRK as a crash-handler has actually made it quick/easy to diagnose most unbalanced-stack errors, which has been pretty cool but I realize not everyone has that luxury.tepples wrote:I have found that stack use becomes less practical as code becomes more branchy
Sure, sometimes you need temporary space but it's not appropriate to allocate more, and the stack is usually a good place for this, whether you want to use it directly or to temporarily make room on your ZP, or whatever.tepples wrote:...unconditionally pushing several global temporaries at the start of a subroutine and pulling them at its end?
Using the stack is the best way to make things truly "local", and that's a very good property to have. Storing locals statically is an performance optimization, and most code is not performance critical.
So yeah, you could write a static local manager to figure that optimization out for you, I get that... but it's not something I've ever found myself desiring in my NES work.
I know there are editors that can parse various documentation comment formats and show them to you, but I'm not familiar with them. I personally just used Notepad++ or Visual Studio most of the time. Often I use a split view to see "header"/documentation for quick reference in one column and the code i'm editing in another.tepples wrote:I'm looking for text editors on Windows and GNU/Linux that can be set up to display the clobberlist of the subroutine whose name is under the insertion point.
Most of the time I don't actually even need to look up the clobber list, though. If the character tick routine I'm writing isn't currently holding values in temporaries, it doesn't actually matter what a called function uses. I don't need to rely on my (fallible) memory if I know the clobbers aren't relevant anyway.
Could also be a factor that I prototype most things in C++, so I don't end up having to do a lot of iteration in actual NES code. This probably lowers the frequency that I need to actually think about temporary management too.
I understand the problem of reusing memory for different game phases, but I view that as a separate issue from managing static local temporaries. Any stuff that would persist longer than a frame I don't imagine encapsulating as a function local, and that kind of RAM management should be pretty explicit and directly tied to the particulars of the game... so I'm unable to envision what kind of "system" you're thinking about that handles all of this at once. (I'm not asking/challenging you to outlay a design, though; as I've been saying all along I've found existing tools adequate for it. Though, I suppose designing that kind of thing is the point of this thread...)tepples wrote:A system that can unify allocation of temporary variables with allocation of non-temporary variables...
Likewise with variables used only in different phases of gameplay, such as in the cannon placement, battle, and rebuilding phases of Rampart...
This single abstraction would cover temporary variables per subroutine and RAM overlay segments per phase.
Sorry, the analogy seemed so strained to me as to be mocking. I shouldn't have interpreted it that way, as I'm sure it was not intended.tepples wrote:I consider clobber list management analogous to object lifetime.rainwarrior wrote:Don't be rude.
Have had some time to look at this example, and it's interesting to see.dustmop wrote:I have implemented this technique in co2, a scheme-based compiler that I took over for use in an upcoming project. See the file casla.scm for the code. I can imagine an assembler doing something similar, allocating memory locations at link time, though it would need to be aware of jump tables and similar stack manipulating control structures. Possibly even cc65 could get support, assuming someone wanted to add it.
With the very functional style scheme code you're writing, I can see how the call stack quickly builds and local temporaries are a rapidly mounting problem. It really does demonstrate why you had a need to write a system to manage them as statics.