(from here on for convenience I'll just refer to the 6502/zero page, but what I say pretty much applies to the 65816 as well)
That's well and good for demonstration routines, demos, etc., where it's easy enough to keep the usage of zero page "in one's head" and 256 bytes should be enough for anyone. But what about larger programs? What kind of calling convention do people usually enforce so that they can safely take advantage of zero page across subroutines? Without an easy or fast way of saving/restoring the zero page "registers" to the stack, is this even done?
To put it bluntly, how on earth do people write real programs on the 6502?
Okay, for real, I consider the bytes used by a function to 'belong' to that function, so I just poke those bytes before calling it.
In my documentation of functions, I generally have a list of which of these are used (i.e. input, output, clobber) and for functions that need to call each other I try not to overlap them. In some moments of despair I do indeed temporarily store some of them on the stack because of overlap.
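As a rough illustration of that documentation style, here is a minimal ca65 sketch. The names (mulA, mulB, product, multiply8, multiply_keep_mulA) are made up for the example and don't come from any particular project; the routine is the usual shift-and-add multiply.

```asm
.zeropage
mulA:    .res 1         ; hypothetical zero page "arguments"
mulB:    .res 1
product: .res 2

.code
; multiply8: unsigned 8x8 -> 16 bit multiply (shift-and-add)
;   Input:    mulA, mulB
;   Output:   product (2 bytes, low byte first)
;   Clobbers: A, X, mulA
.proc multiply8
    lda #0              ; A accumulates the high byte of the result
    ldx #8              ; 8 multiplier bits to process
    lsr mulA            ; first multiplier bit -> carry
loop:
    bcc no_add
    clc
    adc mulB            ; bit was set: add the multiplicand
no_add:
    ror a               ; shift result right, carry from the add enters bit 7
    ror mulA            ; low result bits shift in, next multiplier bit -> carry
    dex
    bne loop
    sta product+1
    lda mulA
    sta product
    rts
.endproc

; The caller just pokes the bytes the function owns, then calls it:
.proc example_caller
    lda #3
    sta mulA
    lda #5
    sta mulB
    jsr multiply8       ; product now holds 3 * 5
    rts
.endproc

; The "moment of despair" case: this caller still needs mulA afterwards,
; but multiply8 clobbers it, so it is parked on the stack around the call.
.proc multiply_keep_mulA
    lda mulA
    pha
    jsr multiply8
    pla
    sta mulA
    rts
.endproc
```

The header comment is the whole calling convention here: anything listed under Clobbers is fair game for the callee, and anything not listed is expected to survive the call.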
Otherwise I think my general rule of thumb is that if something is used every frame or especially many times per frame I try to put it on the ZP; if not it can go in regular RAM. It's usually pretty easy to move variables back and forth between these locations, so I don't worry about it too much; I just move them around whenever I think it will help.
Personally I dunno how much I like the comparison to registers. I think it's the operations that work directly on memory that make them register-like, and for most of the relevant instructions that applies to all memory, not just the zero page. The zero page is just slightly faster at it.
The rest of ZP is where most of my variables go. Anything that isn't an array goes there. If there's not enough space for all variables, the least used ones go in other memory pages.
I too don't see much resemblance between ZP and actual CPU registers... Just the fact that you have to load from and store to ZP in order to do most things with these variables already makes them very different from (and much slower than) CPU registers.
Quote: But what about larger programs?
This is kind of what other people said, but to give a simplified answer...
First, when you refer to a zero page address in code, you typically don't write the numeric memory location. You CAN, but it's generally somewhat bad form, for multiple reasons that I won't get into here.
So, you don't have to remember, for example, that $0083 is the variable for your object #2 X position low byte. In fact, you usually don't even have to know that it's object #2. You create a set of variables called something like "objectXposLowByte", and when you refer to it, you'll usually have the object number stored in an index register. So, objectXposLowByte,x will be object #2 if X = 2. Does that make sense?
Really, you just end up having to memorize your own naming conventions; the same as any other language.
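To make the named-variable idea concrete, here is a small ca65 sketch. MAX_OBJECTS, objectXposHighByte and the two routines are assumptions added for illustration; only the objectXposLowByte name comes from the description above.

```asm
MAX_OBJECTS = 8                         ; hypothetical object count

.zeropage
objectXposLowByte:  .res MAX_OBJECTS    ; one byte per object
objectXposHighByte: .res MAX_OBJECTS

.code
; Move the object whose number is in X one pixel to the right.
.proc move_object_right
    inc objectXposLowByte,x     ; zero page,X addressing
    bne done                    ; no wrap, high byte is unchanged
    inc objectXposHighByte,x    ; low byte wrapped, carry into the high byte
done:
    rts
.endproc

; The caller picks the object by number, never by address.
.proc move_object_two
    ldx #2
    jsr move_object_right
    rts
.endproc
```

The assembler and linker decide where objectXposLowByte actually lives, so the source never mentions an address like $0083.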
Also, with the 256 bytes, most of the data that you use won't have to be stored from frame to frame. The object position example I used would be, so that variable won't get overwritten. Let's say for example though that your program already completed the portion of its code used for scrolling. In the process, your program used a bunch of variables in the zero page. Only a handful of those will need to be saved. The rest can be used again for other variables which don't need to be saved, in a later part of your routine. So when you get to calculating collision detection, you can reuse all of the zero page space which was used for temporary variables during scrolling.
You can assign multiple variable names to the same location, so once again, in practice you'll only need to remember your variable naming conventions. After a while, you'll know what they should be named without having to look. You just have to be careful not to overwrite any non-temporary variables.
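A minimal ca65 sketch of that aliasing, assuming a 4-byte temporary area. All of the names here are hypothetical.

```asm
.zeropage
playerLives: .res 1         ; permanent: must survive from frame to frame
temp:        .res 4         ; temporary: dead once a phase has finished

; Two sets of names for the same four bytes.
scrollColumn = temp+0       ; only meaningful while scrolling
scrollSrcPtr = temp+1       ; 2-byte pointer in temp+1..temp+2
collideTileX = temp+0       ; only meaningful during collision checks
collideTileY = temp+1

.code
.proc scroll_phase
    lda #0
    sta scrollColumn        ; writes temp+0
    rts
.endproc

.proc collision_phase
    lda #0
    sta collideTileX        ; also writes temp+0, which is fine because
    sta collideTileY        ; scroll_phase has finished with it by now
    rts
.endproc
```

Nothing stops collision_phase from being called in the middle of scroll_phase, so the rule that the two phases never overlap in time is still enforced only by the programmer.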
Does that help? It's totally possible to write programs in 6502 assembly.
For small programs this works fairly well. Leaf functions can go haywire on the scratch area while the rest uses permanent zero-page locations for storage. Occasionally the level above the leaf functions is allowed to allocate out of the scratch area from the other direction.
Trouble starts once the program grows beyond a certain point and the call stacks start to go deep while you're simultaneously running out of permanent space and need to start juggling overlays by hand to manage.
Madness ensues and you end up with a festering nest of bugs where exercising any rare code paths is a minefield and the code is hopelessly inflexible since any slight changes need to be carefully vetted against all possible contexts. Then throw non-trivial asynchronous interrupt-based logic into the mix and great fun is had by all.
For any future large-ish project I want to explore means of automating the process.
The most straightforward solution would be to treat the zero-page scratch space as a stack. Leaf functions would just grab however much they needed for temporaries and arguments/results from the top and export a new stack top symbol indicating the next available byte. Callers would then allocate from the maximum of all these callee stack tops and continue the process.
Conveniently this may be foisted off on CA65 relatively easily since it can resolve maximum values at link-time and since many of the details of declaring the call graph and allocating storage may be hidden behind macros.
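A single-file sketch of the scheme, with the bookkeeping written out by hand instead of hidden behind macros. The scratch area size, the function names and the *_base/*_top symbols are all assumptions for illustration; in a multi-module build the *_top symbols would be exported and imported so that the linker resolves the maximum.

```asm
SCRATCH_SIZE = 16

.zeropage
scratch: .res SCRATCH_SIZE          ; hypothetical shared scratch area

; Leaf functions call nobody, so they start at offset 0.
leafA_base = 0
leafA_top  = leafA_base + 2         ; leafA uses 2 bytes
leafB_base = 0
leafB_top  = leafB_base + 5         ; leafB uses 5 bytes

; mid calls both leaves, so its own 3 bytes start above the taller one.
mid_base = .max(leafA_top, leafB_top)
mid_top  = mid_base + 3

.assert mid_top <= SCRATCH_SIZE, error, "scratch area overflow"

.code
.proc leafA
    tmp = scratch + leafA_base      ; 2 bytes: scratch+0..scratch+1
    sta tmp
    stx tmp+1
    rts
.endproc

.proc leafB
    buf = scratch + leafB_base      ; 5 bytes: scratch+0..scratch+4
    sta buf+4
    rts
.endproc

.proc mid
    saved = scratch + mid_base      ; scratch+5, above both callees
    sta saved
    jsr leafA                       ; may trash scratch+0..1
    jsr leafB                       ; may trash scratch+0..4
    lda saved                       ; still intact
    rts
.endproc
```

The two leaves overlap each other on purpose, which is safe only because neither calls the other.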
Of course this still leaves any number of open issues on the table. For instance persisting results between intermediate calls would require the variables to be allocated inside a single caller.
Significantly it also leaves the work of _correctly_ declaring this meta-data up to the programmer, where any slight errors may not be immediately obvious and are likely to pop up later on in a completely different context. Something which previous experience in trying to write exception unwinding specifications leaves me worried about.
darryl.revok wrote: Does that help? It's totally possible to write programs in 6502 assembly.
Yes! Your and everyone else's input is much appreciated. And of course, developers managed to write programs for the 6502 for years; I don't at all mean to suggest it's impossible...except maybe for me.
I'm pretty surprised at how much bristling there is here at the suggestion that zero page could be thought of as an extended register set...it always seemed to me to be a pretty common refrain among 6502 enthusiasts that, at the very least, the zero page compensated for the paucity of registers. For example, the Wikipedia article on the 6502 makes the comparison, observing that "code for the 6502 uses the zero page much as code for other processors would use registers" and notes that "using the indexed modes, the zero page effectively acts as a set of up to 128 additional (though very slow) address registers". Likewise the article for zero page states that "the MOS Technology 6502 has only one general purpose register (the accumulator). As a result, it used the zero page extensively". To be clear, I basically agree with you guys - it really seems like if zero page bytes are supposed to substitute for registers, they're a pretty mediocre substitute.
Anyway, it looks like it'd be accurate to say that most people:
- Use some small amount of zero page dynamically, for temporaries and indirect address modes (i.e. the sort of "poor man's registers" suggested above)
- Statically reserve the majority of zero page for the most commonly accessed data in the program
doynax wrote: Madness ensues and you end up with a festering nest of bugs where exercising any rare code paths is a minefield and the code is hopelessly inflexible since any slight changes need to be carefully vetted against all possible contexts. Then throw non-trivial asynchronous interrupt-based logic into the mix and great fun is had by all.
This is pretty much my experience, although I may be overly paranoid about the occurrence rate of these types of bugs. Having to do all the memory juggling manually really sucks given that it's technically a solvable problem.
My current plan is to do it manually, but have some additional macros and Lua code to assert (at runtime) that no two functions attempt to own the same area of scratch memory simultaneously. Something like this:
Code:
    .proc foo
        alloc_scratch 6, 4      ; reserve 4 bytes from offset 6
        ; ... use the memory ...
        jsr bar
        dealloc_scratch 6       ; deallocate so that others can use it
        rts
    .endproc

    .proc bar
        ; this would cause an assert to trip at runtime:
        alloc_scratch 6, 1      ; reserve 1 byte from offset 6
        ; ... use the memory ...
        dealloc_scratch 6       ; deallocate so that others can use it
        rts
    .endproc
- In case of an error, the system could print where the memory was allocated, so overlaps should be fairly easy to resolve. It could also print out a suggestion about which part of memory has been free throughout all passes through the allocation call.
- Memory management/juggling is still manual.
- Errors are only noticed at runtime, and only within the execution paths that are actually executed.
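For comparison, the same check can be sketched entirely in 6502 code, without the Lua side. This is not the macro-plus-Lua system described above but a simpler assembly-only variant. The scratch_owner table, SCRATCH_SIZE and the scratch_overlap handler are assumptions made up for this sketch.

```asm
SCRATCH_SIZE = 16

.bss
; One bookkeeping byte per scratch byte; nonzero means "currently owned".
; Assumed to be cleared to zero by the startup code.
scratch_owner: .res SCRATCH_SIZE

.code
; Hypothetical failure handler: a real one would report the location.
scratch_overlap:
    jmp scratch_overlap         ; hang so the overlap is visible in a debugger

; Claim `count` bytes starting at `offset`. Clobbers A, X and flags.
.macro alloc_scratch offset, count
    .local next, free
    ldx #(count)
next:
    lda scratch_owner + (offset) - 1, x
    beq free
    jmp scratch_overlap         ; byte is already owned by someone else
free:
    lda #(count)                ; record the block size in every owned byte
    sta scratch_owner + (offset) - 1, x
    dex
    bne next
.endmacro

; Release the block that starts at `offset`. Clobbers A, X and flags.
.macro dealloc_scratch offset
    .local next, done
    ldx scratch_owner + (offset)    ; size recorded by alloc_scratch
    beq done                        ; nothing allocated here
    lda #0
next:
    sta scratch_owner + (offset) - 1, x
    dex
    bne next
done:
.endmacro
```

With these definitions the foo/bar example above would hang in scratch_overlap when bar runs, since offset 6 is still owned by foo. It costs cycles and clobbers registers on every call, so it only makes sense in a debug build, and like the Lua approach it only catches overlaps on paths that actually execute.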
adam_smasher wrote: I'm pretty surprised at how much bristling there is here at the suggestion that zero page could be thought of as an extended register set...it always seemed to me to be a pretty common refrain among 6502 enthusiasts that, at the very least, the zero page compensated for the paucity of registers.
That's my opinion too, but I suppose a lot of people don't like to consider memory outside of the CPU as being a "true" register. But with so many of the indirect opcodes being intended for zeropage, it seems to me that's pretty much what it is, just a way to add a bunch of registers while keeping the CPU itself cheap. I find it kinda funny that when I do NES stuff it seems like I tend to re-use the same zeropage variables, but when doing stuff on the PIC16 I find it really annoying that there are only 2 registers for indirect access. OK for small assembly programs, but when mixing C and asm you can pretty much forget about having a dedicated use for them. It was first made around the same time as the 6502, not that I've ever used the 'original' PIC, maybe it's even worse.
Would have been nice if Nintendo had put the write-only registers in zeropage, but I guess that would have added cost, and the extreme cost-cutting is what made the NES/FC so unique and extendable with cartridge hardware, with its external CHR memory and the fact that it had a 6502 at all, while it seems like everything else from Japan was using the more expensive Z80.
Memblers wrote: when doing stuff on the PIC16 I find it really annoying that there are only 2 registers for indirect access. OK for small assembly programs, but when mixing C and asm you can pretty much forget about having a dedicated use for them. It was first made around the same time as the 6502, not that I've ever used the 'original' PIC, maybe it's even worse.
The original PIC (we'll talk about the PIC1650) is very close to the PIC16F59: 12-bit instruction words, 32 I/O pins. Like all the 12-bit instruction PICs, no interrupt support.
Amusingly, it's even binary-compatible: the SFRs have not moved, and the instruction encoding is unchanged. Pins have moved around, though.