Posted: Thu May 31, 2018 6:38 am
Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1708
Since actual local variables on the stack cost more ROM space and more CPU time, I'm trying to come up with an efficient and comfortable way of reusing some global zeropage variables as if they were local variables.

Since my game needs quite a lot of variables, there isn't enough zeropage to declare distinct global variables for each function that are only used inside that function. (Or, in the case of C, to declare local static variables.)

So, if I neither want to use local variables on the stack, nor global non-zeropage variables, I have to find a way to reuse some zeropage variables.


My currently planned attempt is this:

I declare a bunch of general-purpose variables, but in several "layers" or "levels": A, B, C etc.:

AByte1
AByte2
AByte3
AByte4
AInt1
...

BByte1
BByte2
...

CByte1
CByte2
...

If a function needs local variables and it either doesn't call any other function or it only calls other functions that don't need local variables, this function would use the A variables. Also, the name of the function would get a postfix:
ProcessHero_vA
--> uses A variables

If a function calls another function that already has the A postfix, then this new function uses the B variables and gets a B postfix:
ProcessAllCharacters_vB
--> uses B variables
--> calls ProcessHero_vA

If a function doesn't use local variables itself, but calls other functions that do, then this function of course always gets the highest postfix as well, because those temporary variables are used inside this function via the other functions:
ProcessGameLoop_vB
--> uses no variables
--> calls ProcessAllCharacters_vB
--> Therefore, ProcessGameLoop needs to get the _vB postfix because the B variables are in use while ProcessGameLoop is running (via ProcessAllCharacters), so the function that calls ProcessGameLoop couldn't use the B variables itself, but would have to use the C variables.
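
A minimal C sketch of the layer scheme described above. The variable names heroX_, heroSpeed_, index_ and sum_, the function bodies, and the #undef lines after each function are assumptions made for illustration. Plain globals stand in for zeropage variables so that the sketch compiles with any C compiler.

```c
#include <stdint.h>

/* Layer A and layer B scratch variables. On the NES these would live in
   zeropage; plain globals are used here (assumption for portability). */
static uint8_t AByte1, AByte2;
static uint8_t BByte1, BByte2;

/* Leaf function: calls nothing that needs locals, so it uses layer A. */
#define heroX_     AByte1
#define heroSpeed_ AByte2
uint8_t ProcessHero_vA(uint8_t x)
{
    heroX_ = x;
    heroSpeed_ = 2;
    heroX_ += heroSpeed_;
    return heroX_;
}
#undef heroX_
#undef heroSpeed_

/* Calls a _vA function, so its own locals must live in layer B. */
#define index_ BByte1
#define sum_   BByte2
uint8_t ProcessAllCharacters_vB(void)
{
    sum_ = 0;
    for (index_ = 0; index_ < 3; ++index_) {
        /* index_ and sum_ survive the call because ProcessHero_vA
           only touches layer A. */
        sum_ += ProcessHero_vA(index_);
    }
    return sum_;
}
#undef index_
#undef sum_
```

If ProcessAllCharacters used AByte1 for its loop index instead, the call to ProcessHero_vA would overwrite it on every iteration, which is exactly the clash the postfix is meant to make visible.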


In the actual code, I would of course rename the variables. I wouldn't work with AByte1 etc. directly:

#define currentDirection_ AByte1

(The name gets an underscore as a postfix so that there cannot be a name clash or compiler issue if you happen to have an actual local static variable named currentDirection in your code, since #define replaces every instance of its name without regard for context.
In assembly itself, the underscore wouldn't be necessary because this:
currentDirection: .res 1
currentDirection = AByte1
would be an assembler error anyway.)


The layer system exists so that functions that call each other cannot accidentally overwrite each other's values:

If you have a function DoSomething_vA and you suddenly find out that this function needs to call another function that also has the _vA postfix, then you have to rename DoSomething_vA to DoSomething_vB and use the B values instead of the A values:

Rename
#define currentDirection_ AByte1
to
#define currentDirection_ BByte1

Then you can let the compiler find all instances of the old name DoSomething_vA, so that you can rename these, as well as any other function that calls DoSomething, to reflect the new layer of variables.


So, that's my attempt to use zeropage variables as reusable local variables.

Do you know of any alternate way that might be better?

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


Posted: Thu May 31, 2018 9:38 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10906
Location: Rio de Janeiro - Brazil
I don't know about C, but in assembly I have a chunk of ZP dedicated to local variables and function parameters, and I use a macro to allocate the space needed for each function. The macro can start allocating space from the very first byte of the ZP chunk, or you can optionally specify an offset into the chunk. These offsets can be specified using what I call "memory counters", which are symbols that are incremented automatically when you allocate memory using them, so if I have a bunch of nested functions, I just have to use the same memory counter for all of them and their locals will never overlap.

This works well for most cases, but you still have to be careful if nested functions are called from multiple places. To completely avoid overlaps in this case, you'd probably have to make ALL functions that can possibly be called before the deepest one use the same memory counter, and if there are a lot of them, you might end up wasting a lot of ZP space for this.

I still haven't found the "ideal" solution for this, but what I have now allows for decent automatic reuse of RAM for locals and parameters, and also for program modules that don't run concurrently.
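
A rough C analogue of the "memory counter" idea, offered only as a sketch: the original uses assembler macros that are not shown in the post, so everything here (the scratch array, the enum layout, the functions UpdateObjects and MoveObject) is hypothetical. Each block of locals starts at the offset where the previous block in the same call chain ended, and an unrelated function simply restarts at 0.

```c
#include <stdint.h>

/* Shared scratch chunk; on the NES this would be a reserved ZP range. */
#define SCRATCH_SIZE 16
static uint8_t scratch[SCRATCH_SIZE];

/* Each constant is an offset into scratch[]. The *_END constants play the
   role of the memory counter: the next nested function starts there. */
enum {
    /* UpdateObjects: outermost function, starts at offset 0 */
    UO_index = 0,
    UO_total = UO_index + 1,
    UO_END   = UO_total + 1,

    /* MoveObject: called by UpdateObjects, continues from UO_END */
    MO_dx  = UO_END,
    MO_dy  = MO_dx + 1,
    MO_END = MO_dy + 1,

    /* Hypothetical unrelated function that never runs while the chain
       above is active: it restarts at offset 0 and reuses the same bytes */
    DS_count = 0,
    DS_END   = DS_count + 1
};

/* Compile-time check that the nested chain fits into the chunk. */
typedef char scratch_fits[(MO_END <= SCRATCH_SIZE) ? 1 : -1];

static uint8_t MoveObject(uint8_t pos)
{
    scratch[MO_dx] = 3;
    scratch[MO_dy] = 1;
    return (uint8_t)(pos + scratch[MO_dx] + scratch[MO_dy]);
}

uint8_t UpdateObjects(void)
{
    scratch[UO_total] = 0;
    for (scratch[UO_index] = 0; scratch[UO_index] < 4; ++scratch[UO_index]) {
        /* MoveObject's locals sit after UO_END, so the index and the
           running total are not disturbed by the call. */
        scratch[UO_total] += MoveObject(scratch[UO_index]);
    }
    return scratch[UO_total];
}
```

The weakness mentioned above shows up here as well: if MoveObject were also called from a function whose block ends at a different offset, both callers would have to be laid out on the same counter.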


Posted: Thu May 31, 2018 10:32 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 813
I'd probably make a program that calculates every possible execution tree, then passes that to a register allocation algorithm.
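
As a much simpler stand-in for that idea (not a real register allocator), the layer rule from the first post can be computed from a hand-written call graph. The sketch below assumes the graph has no recursion; the struct layout, the table and the function layerOf are hypothetical names.

```c
#define MAX_CALLEES 4
#define NO_CALLEE   (-1)

/* One node of a hypothetical, hand-written call graph. */
struct Function {
    const char *name;
    int usesLocals;            /* nonzero if the function needs scratch variables */
    int callees[MAX_CALLEES];  /* indices into the table, ended by NO_CALLEE */
};

/* Returns the layer a function occupies: 0 = none, 1 = A, 2 = B, ...
   A function sits one layer above its deepest callee if it has locals of
   its own, and on the same layer as its deepest callee otherwise.
   Assumes the call graph contains no cycles. */
int layerOf(const struct Function *table, int index)
{
    const struct Function *f = &table[index];
    int deepest = 0;
    int i;
    for (i = 0; i < MAX_CALLEES && f->callees[i] != NO_CALLEE; ++i) {
        int layer = layerOf(table, f->callees[i]);
        if (layer > deepest) {
            deepest = layer;
        }
    }
    return f->usesLocals ? deepest + 1 : deepest;
}

/* The three functions used as examples in the first post. */
static const struct Function exampleGraph[] = {
    { "ProcessHero",          1, { NO_CALLEE } },
    { "ProcessAllCharacters", 1, { 0, NO_CALLEE } },
    { "ProcessGameLoop",      0, { 1, NO_CALLEE } }
};
```

With this table, ProcessHero lands on layer 1 (A) and both ProcessAllCharacters and ProcessGameLoop land on layer 2 (B), matching the postfixes chosen by hand above.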


Posted: Thu May 31, 2018 12:17 pm
Joined: Wed May 19, 2010 6:12 pm
Posts: 2760
I've been programming in ASM for a long time, and I don't even know how I do it myself. I think if I write the outer routine first, the inner routine uses whatever temporary variables are left, and vice versa. Also, if you notice "register b" always gets passed as "parameter x", then just use "register b" as your "parameter x" register.

I think I'm sloppy on subroutine use. The only times I use subroutines are when I'm trying to loop over a huge chunk of code, or when I just happen to find duplicated code in my game. It's hard to know what type of stuff I need to reuse.


Posted: Thu May 31, 2018 12:59 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10906
Location: Rio de Janeiro - Brazil
calima wrote:
I'd probably make a program that calculates every possible execution tree, then passes that to a register allocation algorithm.

Sounds like a lot of work, but that should produce the optimal results.

psycopathicteen wrote:
I've been programming in ASM for a long time, and I don't even know how I do it myself.

To be fair, you're an SNES programmer, so you don't have to worry as much about RAM as us NES guys.

Quote:
I think if I write the outer routine first, the inner routine uses whatever temporary variables are left, and vice versa.

The problem is that doing this manually is very tedious and error-prone. You may need to add a variable or a parameter to a function you coded long ago and not realize you have to change all functions that that function calls, and the ones that those call, and so on. It's maintenance hell.

My solution was basically to share a memory counter (managed by macros) between all functions that can run concurrently, so that overlaps are completely avoided among them. Things can get weird when the same functions are used in multiple nesting scenarios, though: functions that don't run concurrently end up having to share a memory counter because they all use one or more common functions. Thankfully that hasn't happened a lot to me yet. If things got really ugly I'd even consider duplicating the common functions to reduce the number of functions using the same memory counter.


Posted: Thu May 31, 2018 1:43 pm
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6898
Location: Canada
Personally, I just document the used/clobbered temporaries for functions that use them and manually manage it. Occasionally I get little messes when things change or get reused in new ways, but it's not too much mess to deal with. By the end of the project there is some temporary management that maybe looks a little bit ugly but it works fine, not really something I'd want to spend engineering effort on an automated solution for.

Like, you can propose some pre-emptive layered structure like DRW has here, but I think the more generic (and maybe more applicable) idea is just that your temporary usage can have groups. Your code will naturally have some structured relationship between functions, and you can group your temporary usage correspondingly, fitting it to the existing relationships as well as you can. I.e. this group of functions will mostly use temporaries i, j, k, l and this other one will mostly use m, n, o, p...

There's back and forth here, but when considering creating new structures to group your code, also try to find ones that already exist inherently.

...and it could of course be automated, and you could create code analysis tools and call graphs etc., but the scale of the problem never seems to fit the scale of that solution, to me. YMMV.


Posted: Thu May 31, 2018 2:43 pm
Joined: Sun Mar 27, 2011 10:49 am
Posts: 266
Location: Seattle
I've asked about this sort of thing in the past.

I don't do much 6502 at the moment, so I haven't gotten a chance to try it out, but an idea I like (inspired by many of the comments on that thread) is to just use the first n bytes of zero page much like the standard MIPS calling convention uses its registers, with some tweaks for 6502 suitability. Feel free to use some other standard ISA's calling convention as well. The big thing is to reserve some number of bytes as temporaries that aren't preserved across subroutine calls, some number of bytes as saved temporaries that are, some number of bytes as kernel temporaries (for ISRs, so they don't need to save anything besides registers), designated bytes for function parameters (that don't fit in A, X, and Y) and for function return values (say, pointers that don't fit in A), and a pointer for a virtual stack to supplement the hardware one.

The approach you're outlining in your post sort of resembles register windows, so if you want to go down that road it might be worth looking into the SPARC calling convention as well.


Posted: Thu May 31, 2018 3:08 pm
Joined: Wed Sep 07, 2005 9:55 am
Posts: 328
Location: Phoenix, AZ
I set aside zp variables like MIPS registers:

Code:
a0 - a5 : argument registers
t0 - t9 : temporary registers
s0 - s5 : save registers
v0 - v1 : return value registers (or more arguments)
i0 - i1 : temporary registers (for interrupts)


The a, t, and v registers are assumed to be overwritten across function calls.
The s registers must be pushed to the stack before being used.
The i registers are only used in interrupt handlers.

Then I assign some local name one of those aliases:

Code:
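; ======================================
; Local names are aliases for the shared virtual registers.
; preserve, restore and setpointer are macros that are not shown here.
__loop_var    = s0      ; save register: survives the calls below
__src_addr    = a0      ; argument: pointer (low byte at a0, high byte at a0 + 1)
__special_val = v0      ; return value of FuncThatReturnsAValue
SomeFunction:
      preserve __loop_var           ; save the caller's s0 before using it

      ldx #(SIZE - 1)
-     stx __loop_var                ; X is not assumed to survive the calls
      lda table_hi, X
      sta __src_addr + 1
      lda table_lo, X
      sta __src_addr
      jsr SomeOtherFunc             ; reads its argument from __src_addr
      jsr BlahBlah
      jsr FuncThatReturnsAValue     ; leaves its result in v0
      lda __special_val
      jsr DoSomethingWithIt
      ldx __loop_var                ; reload the loop counter
      dex
      bpl -

      restore __loop_var            ; give the caller its s0 back
      rts

; ======================================
__dest_addr   = t0      ; temporary: may be clobbered, so no save needed
SomeOtherFunc:
      setpointer __dest_addr #SomeRamAddress
      ldy #0
-     lda (__src_addr), Y           ; copy 256 bytes from source to destination
      sta (__dest_addr), Y
      iny
      bne -
      rts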


It is then easy to change where each local is actually mapped if I need to for whatever reason.
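
The same convention can be sketched in C, purely as an illustration. The preserve and restore functions below are hypothetical stand-ins for the macros of the same name (whose definitions are not shown in the post), and the software save stack, DoubleIt and SumOfDoubles are made-up examples.

```c
#include <stdint.h>

/* Virtual registers; on the 6502 these would be zero-page bytes. */
static uint8_t a0, t0, s0, v0;

/* Small software stack standing in for the hardware stack (assumption). */
static uint8_t saveStack[16];
static uint8_t saveTop;

static void preserve(const uint8_t *reg) { saveStack[saveTop++] = *reg; }
static void restore(uint8_t *reg)        { *reg = saveStack[--saveTop]; }

/* Callee: free to clobber a0, t0 and v0; returns its result in v0. */
static void DoubleIt(void)
{
    t0 = a0;
    v0 = (uint8_t)(t0 + t0);
}

/* Uses s0 as a loop counter, so it saves and restores the caller's s0. */
uint8_t SumOfDoubles(uint8_t count)
{
    uint8_t total = 0;
    preserve(&s0);
    for (s0 = 0; s0 < count; ++s0) {
        a0 = s0;
        DoubleIt();                    /* may overwrite a0/t0/v0, never s0 */
        total = (uint8_t)(total + v0);
    }
    restore(&s0);
    return total;
}
```

After SumOfDoubles returns, s0 holds whatever the caller had stored in it, so a caller may keep its own loop counter there across the call.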


Posted: Thu May 31, 2018 3:23 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10906
Location: Rio de Janeiro - Brazil
I like the "virtual registers" approach, but I sometimes need a crapload of local variables, so that approach doesn't look like a good fit for most of my projects...


Posted: Thu May 31, 2018 4:52 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20681
Location: NE Indiana, USA (NTSC)
Sometime during the development of Thwaite, when I discovered that my audio driver (now called Pently) was breaking missile-explosion collision detection, I adopted the convention (for that project) that $0000-$0007 are caller saved, and $0008-$000F are callee saved.


Posted: Thu May 31, 2018 7:18 pm
Joined: Wed May 19, 2010 6:12 pm
Posts: 2760
The more I think about it, variables are kind of an abstract concept.


Posted: Thu May 31, 2018 7:33 pm
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6898
Location: Canada
Yes, they are somewhat abstract, especially for a compiler, where usually a "variable" does not refer to one specific memory location, but rather whatever place currently holds its value, whether that's a register, or in memory, or somewhere else.

With that in mind, this desire to manage temporaries on the ZP is merely one possible optimization. If you find you have conflicts to resolve, and for some reason it seems really painful to manage your ZP temporaries, there's generally an option to do something "easier" that's only a little less efficient. If you can't find an allocation in your existing temporaries, you might put the variable on the stack, or maybe somewhere in RAM, or even just add one more ZP temporary, presuming you haven't budgeted 100% of it yet. Once you relax the requirement that it go into one of your already-allocated ZP temporaries there are really a lot of possibilities; for example, consider that a 256-byte OAM buffer in RAM might be rebuilt every frame, which is quite a bit of temporary space available for anything that happens before you build your sprite list.

There are always many ways to do this, and the constraints aren't nearly as pressing when you don't insist on always doing it the fastest way. Spending a few more cycles using RAM instead of ZP may well be worth avoiding some organizational headache.
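
One way to picture the OAM buffer idea is a union that overlays scratch data on the sprite buffer. This is only a sketch under assumptions: the EarlyFrameScratch layout and the functions SumDistances and ClearSprites are hypothetical, and it only works if nothing reads the scratch data after sprite building has started for the frame.

```c
#include <stdint.h>

/* Scratch layout that is only valid before sprites are written this frame. */
struct EarlyFrameScratch {
    uint8_t  pathBuffer[64];
    uint16_t distances[32];
};

/* The 256-byte OAM buffer and the scratch area share the same RAM. */
static union {
    uint8_t oam[256];
    struct EarlyFrameScratch scratch;
} frameRam;

/* Compile-time check that the scratch layout fits inside the buffer. */
typedef char scratch_fits_in_oam[(sizeof(struct EarlyFrameScratch) <= 256) ? 1 : -1];

/* Early in the frame: use the buffer as temporary storage. */
uint16_t SumDistances(void)
{
    uint16_t sum = 0;
    uint8_t i;
    for (i = 0; i < 32; ++i) {
        frameRam.scratch.distances[i] = i;
        sum = (uint16_t)(sum + frameRam.scratch.distances[i]);
    }
    return sum;
}

/* Later in the frame: the scratch contents are dead, so the whole buffer
   is overwritten. Filling it with $FF moves every sprite off-screen before
   the visible ones are written. */
void ClearSprites(void)
{
    uint16_t i;
    for (i = 0; i < 256; ++i) {
        frameRam.oam[i] = 0xFF;
    }
}
```

The union costs no extra RAM: its size stays at 256 bytes, and the trade-off is that the ordering constraint (scratch use first, sprites second) has to be kept by hand.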


Posted: Thu May 31, 2018 8:21 pm
Joined: Wed Sep 07, 2005 9:55 am
Posts: 328
Location: Phoenix, AZ
rainwarrior wrote:
a 256-byte OAM buffer in RAM might be rebuilt every frame, which is quite a bit of temporary space available for anything that happens before you build your sprite list


I never considered this possibility. I'll have to keep this in mind if I find myself short on temporary storage.


Posted: Thu May 31, 2018 8:34 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10906
Location: Rio de Janeiro - Brazil
I've considered using the OAM buffer for other purposes, but the object routines start to fill it up fairly early in the game logic loop in my engines...


Posted: Thu May 31, 2018 8:54 pm
Joined: Wed May 19, 2010 6:12 pm
Posts: 2760
never-obsolete wrote:
rainwarrior wrote:
a 256-byte OAM buffer in RAM might be rebuilt every frame, which is quite a bit of temporary space available for anything that happens before you build your sprite list


I never considered this possibility. I'll have to keep this in mind if I find myself short on temporary storage.

Wow, I didn't think about that either.

