All times are UTC - 7 hours





Posted: Sun Jun 03, 2018 8:41 pm

Joined: Thu Aug 20, 2015 3:09 am
Posts: 396
DRW wrote:
In fact, we don't need another high level language either. For me, I can say that I wouldn't switch to another programming language just for one singular feature.

Who said anything about a new language? We're talking about a compiler flaw. If CC65 didn't generate awful code we wouldn't even be having this conversation, because
DRW wrote:
local variables on the stack require more ROM space and more CPU time
would not be true in the first place.

(Of course, C is a lousy language to be using on a 6502 anyway, since the language standard promotes all arithmetic to int, which must be at least 16-bit, and pointer arithmetic makes array striping a gigantic pain in the neck. But that's a whole new can of worms.)

DRW wrote:
In C, it should be clear that if the programmer declares a local variable as static, that this one does not get turned into a temp variable.
But how do you distinguish in Assembly whether a .res is supposed to represent a local variable or whether it's a global variable or a local static variable where the value needs to be kept between two function calls?

...the same way you choose between ZP, main RAM and WRAM? Most assemblers already have automatic allocation; they're just not smart enough to overlap variables that aren't live at the same time.


Posted: Sun Jun 03, 2018 11:38 pm

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1684
Rahsennor wrote:
Who said anything about a new language? We're talking about a compiler flaw.

Here:
Rahsennor wrote:
A HLL with the same features would be more popular

HLL = High level language.

Rahsennor wrote:
If CC65 didn't generate awful code we wouldn't even be having this conversation, because
DRW wrote:
local variables on the stack require more ROM space and more CPU time
would not be true in the first place.

Stack variables have variable addresses and would therefore always be slower to access than global variables, since you need at least LDA Stack, X, which is slower than LDA Variable. That's the whole purpose of the stack: it can grow and shrink arbitrarily.

Rahsennor wrote:
DRW wrote:
In C, it should be clear that if the programmer declares a local variable as static, that this one does not get turned into a temp variable.
But how do you distinguish in Assembly whether a .res is supposed to represent a local variable or whether it's a global variable or a local static variable where the value needs to be kept between two function calls?

...the same way you choose between ZP, main RAM and WRAM?

You mean a new marker?
Code:
.segment "ZEROPAGE"
.scope LOCAL
    Variable: .res 1


Rahsennor wrote:
Most assemblers already have automatic allocation; they're just not smart enough to overlap variables that aren't live at the same time.

Which would be a pretty hard thing to find out because you never know what's the intention of the developer. Even if two variables aren't "live" at the same time, how shall the Assembler determine whether the value has to remain the same between two calls of the same function or whether it can be overwritten between two calls?

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


Posted: Mon Jun 04, 2018 1:14 am

Joined: Tue Feb 07, 2017 2:03 am
Posts: 579
I've been thinking, and I think coding style is also a factor in this problem. However, after seeing the other posts here, I see this is a have-your-cake-and-eat-it-too problem.

The problem is that you are using C and a C-based assembler which prays at the altar of C. You want the convenience of C with the speed of asm, and those two things don't mix. At least not on a 65XX-based system ;) Ultimately a C compiler should do what you are saying: it should be capable of sorting and reusing registers and variables as it sees fit. But once you mix asm with C, all bets are off. Either the language places and decides upon all memory usage, or you do. You can't really do it half and half (unless you're Apple and completely break your language, add a bunch of "here be dragons" tags and make life a pain).

The way you solve this on the 6502 is to rearrange the problem so that you keep as much as possible in X, Y and A, and so that the data you need to refresh is stored in convenient locations. Doing
STA ZP
LDA ZP
vs
LDA XXXX or LDA XXXX,x
the second is smaller and faster. That is, you don't just optimise your code, you optimise your data to go with it. An assembler won't be able to optimise the data.

Tass64 is an optimising assembler, but it won't help you with variable allotment, as it's impossible for the assembler to fully know. In a world where you can self-modify code, it becomes impossible for the assembler to know where you are jumping to. In your case, though, it's 99.98% sure the code doesn't self-modify. However, its support for unions and "sections" would make it somewhat easy to do the "layers" idea, I would think.

A way to add extra data to variables to define their type is through comments:
MyVar .byte ? ;&&temp
SomeThing .byte ? ;&&unique
Then a tool could parse all the ;&&temp lines and change each line accordingly.
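
As a rough illustration of that parsing step, here is a minimal Python sketch. It assumes the exact `Name .byte ? ;&&tag` layout shown above; the function name `collect_tagged` and the sample lines are hypothetical, and a real tool would also have to rewrite the lines it finds.

```python
import re

# Matches lines such as "MyVar .byte ? ;&&temp" (layout assumed from the post).
TAG_RE = re.compile(r"^\s*(?P<name>\w+)\s+\.byte\s+\?\s*;&&(?P<tag>\w+)\s*$")

def collect_tagged(lines):
    """Return {tag: [variable names]} for every line carrying a ;&&tag comment."""
    found = {}
    for line in lines:
        match = TAG_RE.match(line)
        if match:
            found.setdefault(match.group("tag"), []).append(match.group("name"))
    return found

# Hypothetical input: two tagged variables and one untagged variable.
source = [
    "MyVar .byte ? ;&&temp",
    "SomeThing .byte ? ;&&unique",
    "Plain .byte ?",
]
tags = collect_tagged(source)
```

Everything listed under "temp" would then be a candidate for sharing storage, while "unique" entries keep their own address.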

Your time as a programmer is worth far more than the cost of adding a 128K banked SRAM chip to the cart. For the volumes anybody here is going to make, even adding $4 to the cart cost could easily be passed on to the consumer without affecting the purchase decision. Having all the extra RAM to make the problem go away is therefore probably the most efficient means of solving it.


Posted: Mon Jun 04, 2018 2:16 am

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1684
Oziphantom wrote:
The problem is that you are using C and a C-based assembler which prays at the altar of C. You want the convenience of C with the speed of asm, and those two things don't mix.

It's more like:

Because I'm using mostly C and not Assembly, I need all the extra performance that I can get.
One way is to use global zeropage variables instead of local variables, so each occurrence of (sp), Y (and all the stack adding and removing at the start and end of functions) is turned into Temp1 etc. (without stack moving).

In general, this would be a mundane issue, solved by simply declaring every local variable in C as static (and doing some macro tricks when it comes to function parameters). But unfortunately, I only have 256 bytes of zeropage. That is not enough for my game, which is why I have to reuse the variables.

Oziphantom wrote:
Ultimately a C compiler should do what you are saying, it should be capable of sorting and reusing registers and variables as it sees fit.

I assume it's pretty difficult, maybe even impossible, for a compiler to optimize all occurrences of local variables into zeropage variables in a way that it reuses the zeropage variables, but doesn't overlap them.
This would require a complete understanding of the whole source code (not just the by-module way C compilers work) and a deep analysis of the program flow.

So, yeah, I'm aware that a compiler couldn't do this. You would have to use a separate tool that parses your whole code and analyzes which function calls which other function to organize the use of temporary variables.
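
A minimal Python sketch of the first half of such a tool, the part that works out which function calls which. It assumes the tool reads ca65-style assembly in which every function is wrapped in `.proc`/`.endproc` and every call is a plain `jsr`; the function name `build_call_graph` and the sample procedures are hypothetical.

```python
import re

PROC_RE = re.compile(r"^\s*\.proc\s+(\w+)", re.IGNORECASE)
END_RE = re.compile(r"^\s*\.endproc\b", re.IGNORECASE)
JSR_RE = re.compile(r"^\s*jsr\s+(\w+)", re.IGNORECASE)

def build_call_graph(lines):
    """Return {procedure: set of procedures it calls with jsr}."""
    graph = {}
    current = None
    for line in lines:
        code = line.split(";", 1)[0]  # ignore trailing comments
        match = PROC_RE.match(code)
        if match:
            current = match.group(1)
            graph.setdefault(current, set())
            continue
        if END_RE.match(code):
            current = None
            continue
        match = JSR_RE.match(code)
        if match and current is not None:
            graph[current].add(match.group(1))
    return graph

# Hypothetical source, only for illustration.
example = """
.proc Main
    jsr UpdatePlayer
    jsr DrawSprites   ; once per frame
.endproc
.proc UpdatePlayer
    jsr CheckCollision
.endproc
.proc DrawSprites
    rts
.endproc
.proc CheckCollision
    rts
.endproc
""".splitlines()
graph = build_call_graph(example)
```

Tail calls made with jmp and calls through pointers are not seen by this sketch and would have to be declared some other way.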

Oziphantom wrote:
doing
STA ZP
LDA ZP
vs
LDA XXXX or LDA XXXX,x
the second is smaller and faster.

Erm, what? The non-zeropage access is smaller and faster?

Oziphantom wrote:
Your time as a programmer is worth far more than the cost of adding a 128K banked SRAM chip

This wouldn't solve my problem. It's not that I'm running out of RAM. It's that I don't have enough room specifically for giving each function its distinct local zeropage variables. Regular variables wouldn't be that much of an issue, especially since my game has battery save anyway, so I have tons of room for variables. But I want to keep the code itself as small as possible, so I try to use zeropage variables as much as I can.

Posted: Mon Jun 04, 2018 3:11 am

Joined: Thu Aug 20, 2015 3:09 am
Posts: 396
DRW wrote:
Here:
Rahsennor wrote:
A HLL with the same features would be more popular

HLL = High level language.

And what do you know, C is a high level language! :P

DRW wrote:
Stack variables have variable addresses and would therefore always be slower to access than global variables, since you need at least LDA Stack, X, which is slower than LDA Variable. That's the whole purpose of the stack: it can grow and shrink arbitrarily.

And local variables do not need a stack. Any non-recursive program can be compiled without a stack, with the same or lower RAM usage, completely transparent to the programmer. This is not new; there are compilers that do so already.

DRW wrote:
Which would be a pretty hard thing to find out because you never know what's the intention of the developer. Even if two variables aren't "live" at the same time, how shall the Assembler determine whether the value has to remain the same between two calls of the same function or whether it can be overwritten between two calls?

Because a variable that needs to stay the same between two calls, as per C's static, is actually a global variable, and should be marked as a global variable. I described the requirements for automatic overlap above - the only extra work for the programmer is to annotate the code with procedures (as per normal use of proc/endproc in some dialects) and ensure that the assembler knows about all outgoing calls (which, for the most part, is already covered by JSR procname). That, plus the absence of recursion (which really does require a stack to implement) is all the assembler needs to know to safely overlap procedure-local variables.
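
To make those requirements concrete, here is a minimal Python sketch of one way the overlap could be computed from the call graph. It is not taken from any existing assembler: the function name `allocate_overlays`, the procedure names and the sizes are all hypothetical. Each procedure's locals are placed just past the locals of its deepest chain of callers, so two procedures that can be active at the same time never share bytes, and recursion is rejected.

```python
def allocate_overlays(calls, sizes):
    """Give each procedure a base offset into one shared block of temporaries.

    calls: {procedure: list of procedures it calls}
    sizes: {procedure: bytes of local storage it needs}
    Returns (offsets, total_bytes). Raises ValueError on recursion.
    """
    offsets = {}
    active = set()  # procedures on the call chain currently being walked

    def place(proc, base):
        if proc in active:
            raise ValueError(f"recursion through {proc}: a real stack is needed")
        if offsets.get(proc, -1) >= base:
            return  # already placed at least this deep
        offsets[proc] = base
        active.add(proc)
        for callee in calls.get(proc, ()):
            place(callee, base + sizes[proc])
        active.discard(proc)

    called = {callee for callees in calls.values() for callee in callees}
    for root in (p for p in sizes if p not in called):
        place(root, 0)
    if len(offsets) != len(sizes):
        raise ValueError("some procedures are only reachable through a cycle")
    total = max((offsets[p] + sizes[p] for p in offsets), default=0)
    return offsets, total

# Hypothetical program: four procedures needing 10 bytes if nothing is shared.
calls = {
    "Main": ["UpdatePlayer", "DrawSprites"],
    "UpdatePlayer": ["CheckCollision"],
    "DrawSprites": [],
    "CheckCollision": [],
}
sizes = {"Main": 2, "UpdatePlayer": 3, "DrawSprites": 4, "CheckCollision": 1}
offsets, total = allocate_overlays(calls, sizes)
```

With these numbers UpdatePlayer and DrawSprites both start at offset 2 and the whole program fits in 6 bytes, which is also the deepest a stack would get. One caveat under this scheme: every root starts at offset 0, so an interrupt handler that can preempt the main thread would need a separate block of its own.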

DRW wrote:
I assume it's pretty difficult, maybe even impossible, for a compiler to optimize all occurrences of local variables into zeropage variables in a way that it reuses the zeropage variables, but doesn't overlap them.

Doing so optimally is hard, but doing so in at most as much RAM as a stack would use is trivial. In fact, I've already done it. Wrecking Balls was written this way.

DRW wrote:
This would require a complete understanding of the whole source code (not just the by-module way C compilers work) and a deep analysis of the program flow.

Separate compilation is obsolete anyway; unity builds and LTO already achieve what you describe.


Posted: Mon Jun 04, 2018 4:08 am

Joined: Tue Feb 07, 2017 2:03 am
Posts: 579
DRW wrote:
Oziphantom wrote:
The problem is that you are using C and a C-based assembler which prays at the altar of C. You want the convenience of C with the speed of asm, and those two things don't mix.

It's more like:

Because I'm using mostly C and not Assembly, I need all the extra performance that I can get.
One way is to use global zeropage variables instead of local variables, so each occurrence of (sp), Y (and all the stack adding and removing at the start and end of functions) is turned into Temp1 etc. (without stack moving).

As I said, you want the convenience of C but the speed of ASM. That is, you get C's speed of coding, but it comes at the cost of CPU and RAM usage. You are trying to avoid that cost and get the speed of ASM as well. You are trying to eat your cake and have it too. Your "spot optimising" puts you in an "all bets are off" category, as the compiler would need to analyse your hand-made ASM and make sure its auto-generated code doesn't conflict with it or get trashed by it. Modern C/C++ compilers actually do do this, which is why you have to specify which registers you want, what type of register you want, etc., and then some compilers will say "I see what you are thinking, but no, use these registers here and this for that one and all good".

DRW wrote:
In general, this would be a mundane issue, solved by simply declaring every local variable in C as static (and doing some macro tricks when it comes to function parameters). But unfortunately, I only have 256 bytes of zeropage. That is not enough for my game, which is why I have to reuse the variables.

Oziphantom wrote:
Ultimately a C compiler should do what you are saying, it should be capable of sorting and reusing registers and variables as it sees fit.

I assume it's pretty difficult, maybe even impossible, for a compiler to optimize all occurrences of local variables into zeropage variables in a way that it reuses the zeropage variables, but doesn't overlap them.
This would require a complete understanding of the whole source code (not just the by-module way C compilers work) and a deep analysis of the program flow.

This is literally the point of a compiler. When I was at uni, they were trying to convince us that C is not that bad, that it's 96% as good. They showed how you write asm, with an example where a header says AX = number of items, BX = pointer to items, CX = max depth, DX = result. A human will typically keep the registers locked to their intended purpose, whereas the compiler will work out when it can reuse registers, avoid having to save a register when it jumps to the routine, and keep data in registers across calls, generally a lot more often than humans will. This is something I would expect an average C compiler to do; I would expect it even of, say, Borland C 5. Going back to, say, DevPac 2, it probably wouldn't do it that well and would keep it "simple". Turbo C 128, not a chance. When you get to modern ARM this is the absolute basis of speed, and the compilers (although most are rubbish) will try to do this above anything else.
DRW wrote:
So, yeah, I'm aware that a compiler couldn't do this. You would have to use a separate tool that parses your whole code and analyzes which function calls which other function to organize the use of temporary variables.

Oziphantom wrote:
doing
STA ZP
LDA ZP
vs
LDA XXXX or LDA XXXX,x
the second is smaller and faster.

Erm, what? The non-zeropage access is smaller and faster?

To do the cache it's 4 bytes and 6 clocks; to load the data I prepared earlier is 3 bytes and 4 clocks.
So if I need, say, the width of a collision box, rather than getting the data and then stashing it for when I need it later, it's best to store the information in an indexed array based upon the entity and just do lda entity.collision.width,x where x is the "current ent".
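
For reference, the comparison can be tallied with a short Python sketch that uses the documented NMOS 6502 instruction sizes and cycle counts. The table and the helper name `cost_of` exist only for this illustration.

```python
# (bytes, cycles) per instruction, from the standard NMOS 6502 opcode tables.
COST = {
    "STA zp": (2, 3),
    "LDA zp": (2, 3),
    "LDA abs": (3, 4),
    "LDA abs,X": (3, 4),  # one extra cycle if indexing crosses a page boundary
}

def cost_of(sequence):
    """Return (total bytes, total cycles) for a list of instruction names."""
    size = sum(COST[op][0] for op in sequence)
    cycles = sum(COST[op][1] for op in sequence)
    return size, cycles

cache_round_trip = cost_of(["STA zp", "LDA zp"])  # stash a value, reload it later
direct_indexed = cost_of(["LDA abs,X"])           # read straight from the table
```

By these figures the zero-page round trip comes to 4 bytes and 6 cycles, against 3 bytes and 4 cycles for the indexed read when no page boundary is crossed.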
DRW wrote:
Oziphantom wrote:
Your time as a programmer is worth far more than the cost of adding a 128K banked SRAM chip

This wouldn't solve my problem. It's not that I'm running out of RAM. It's that I don't have enough room specifically for giving each function its distinct local zeropage variables. Regular variables wouldn't be that much of an issue, especially since my game has battery save anyway, so I have tons of room for variables. But I want to keep the code itself as small as possible, so I try to use zeropage variables as much as I can.
RAM always gives you more options. For example, you could compress your code with Exomizer, then decompress it to RAM, which reduces ROM pressure. If your code is in RAM, self-modifying code costs you a penalty of 1 byte/1 clock. I.e.
Code:
    sta ValueINeedDownThere+1   ; patch the operand of the LDA below
    ; ...
    ; other code
    ; ...
ValueINeedDownThere:
    lda #$FF                    ; this operand will get patched
Whether the extra clock/byte is worth it versus the ZP complexity is a call you can then make.
If you are calculating some values, you can throw away a couple of pages of RAM and then just index into them, without hurting ROM, and/or you can change the table as and when needed.


Posted: Mon Jun 04, 2018 5:28 am

Joined: Wed May 19, 2010 6:12 pm
Posts: 2751
How would a compiler keep track of every indirect jump?


Posted: Mon Jun 04, 2018 5:33 am

Joined: Tue Feb 07, 2017 2:03 am
Posts: 579
Because the compiler wrote every single indirect jump, and hence it knows all the places it will jump to.


Posted: Mon Jun 04, 2018 5:41 am

Joined: Wed May 19, 2010 6:12 pm
Posts: 2751
Wouldn't it still have to backtrack through a bunch of code just to figure out every possible routine it can jump to?


Posted: Mon Jun 04, 2018 5:46 am

Joined: Tue Feb 07, 2017 2:03 am
Posts: 579
Well, no. It has already walked through it to build the code that makes the jump table, and it builds the jump table from the code tree that it built as part of compilation. For example, it may optimise a switch statement into a jump table if it deems that more optimal, in which case all the targets are the case statements, which it knows about.


Posted: Mon Jun 04, 2018 6:50 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20573
Location: NE Indiana, USA (NTSC)
But then wouldn't the type of a function pointer have to encompass how many caller-saved zero page addresses it uses?


Posted: Mon Jun 04, 2018 7:08 am

Joined: Tue Feb 07, 2017 2:03 am
Posts: 579
If you have a case where the pointer is read in from a data file, and is thus something the compiler can't determine, then the compiler would have to fall back on strict ABI calling. Whether the compiler trusts you to declare all such functions with __stdcall, or uses __stdcall for everything unless you say __fastcall, would be up to the compiler. However, you couldn't know the exact address of a function to put into your data without compiling it all first and having a dummy call to the function so it doesn't get stripped, which means this basically never actually happens and no compiler would handle it.


Posted: Mon Jun 04, 2018 7:13 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20573
Location: NE Indiana, USA (NTSC)
I was referring more to things that store the low, high, and bank bytes of a function pointer, such as the pointer to an actor's move routine, in a striped array. The compiler would have to know all functions that can be referred to through that pointer.
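
One conservative way to express that in a call graph is to treat every indirect call as a call to every routine whose address is stored in the table. A minimal Python sketch follows, assuming the programmer declares which procedure dispatches through which table; the function name `add_indirect_edges` and all procedure and table names are hypothetical.

```python
def add_indirect_edges(calls, dispatchers, tables):
    """Return a copy of the call graph with indirect calls expanded.

    calls: {procedure: set of direct callees}
    dispatchers: {procedure: name of the pointer table it jumps through}
    tables: {table name: procedures whose addresses are stored in it}
    """
    expanded = {proc: set(callees) for proc, callees in calls.items()}
    for proc, table in dispatchers.items():
        expanded.setdefault(proc, set()).update(tables[table])
    return expanded

# Hypothetical actor system: RunActors jumps through a table of move routines.
calls = {
    "RunActors": set(),
    "MoveWalker": {"CheckCollision"},
    "MoveFlyer": set(),
    "CheckCollision": set(),
}
dispatchers = {"RunActors": "actor_move_routines"}
tables = {"actor_move_routines": ["MoveWalker", "MoveFlyer"]}
full_graph = add_indirect_edges(calls, dispatchers, tables)
```

The expanded graph over-approximates what actually runs, which costs some RAM but keeps any analysis built on it safe.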


Posted: Wed Jun 27, 2018 6:53 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20573
Location: NE Indiana, USA (NTSC)
Today I learned that some production C compilers actually do this. From the BL51 User's Guide, "Data Overlaying":

The 8051 hardware stack is limited to a maximum of 256 bytes. As such, using stack frames on the 8051 is very wasteful of the limited memory available.

The Keil C51 C Compiler works with the LX51 Linker to store function arguments and local variables in fixed memory locations using well-defined names (so that function arguments are easily passed and accessed). The linker analyzes the structure of the program and creates a call tree which it uses to overlay the data segments containing local variables and function arguments.


Posted: Wed Jun 27, 2018 11:31 am

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7553
Location: Seattle
The compilers made by Keil, IAR, and CCS all do that. IAR even used to sell a 6502-targeting one... now you'd probably have to special-order it from them.

