aquasnake wrote: ↑Thu Dec 10, 2020 10:45 pm
...please test the following, if it causes problems, please feedback me:
#define reg(_addr) (*((volatile unsigned char*)(_addr)))
#define reg16(_addr) (*((volatile unsigned int*)(_addr)))
Retesting my assumptions, I had slightly misremembered the effect of volatile, but I still strongly recommend against using it for this purpose in cc65.
Putting volatile on a variable does not forbid optimizations, apparently; "asm volatile" does that. I had misremembered this point. So the generated code won't perform as badly as I said, but this is also a big problem, because for the same reason it is entirely unsafe to use. From the cc65 documentation: "The volatile keyword has almost no effect."
cc65's optimizer assumes nothing is volatile. I think the keyword literally does not affect it at all? Any redundant loads/stores are permitted to be optimized away regardless of the volatile attribute.
Here's the first example of incorrect code I found. I don't want to spend all day searching for more examples, but this one didn't take long to get to and I've seen others in the past.
// C, compile with -O
char i;
#define reg(_addr) (*((volatile unsigned char*)(_addr)))
void test1()
{
    reg(0x1000) = i;
    i = reg(0x1000);
}

; generated assembly
.proc _test1: near
; reg(0x1000) = i;
    lda _i
    sta $1000
; i = reg(0x1000);
    sta _i
    rts
.endproc
You can use --debug-opt-output to see exactly what the optimizer does here. The concept of "volatile" seems simply to no longer be known once the optimizer begins its work. I can't find a mechanism in cc65's code for the optimizer to know any load or store was volatile, and consequently none of its optimizations can account for it. Many of its optimizations will violate the meaning of volatile, as you can see from that example.
I'm not going to take the time to test reg16, as it doesn't seem relevant to the NES, but it's even scarier to me, because there are a lot of 16-bit optimizations in cc65 that can affect something like that. cc65 does not generate 16-bit code, so every write of that type will be done as two separate 8-bit writes. Even if the optimizer doesn't screw it up, splitting it into two writes seems problematic just by itself.
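To illustrate why the split itself is a concern, here is a small host-side model in plain C (not cc65 output, and not something to run against real hardware). The names mem, seen_between and store16_as_two_bytes are made up for this sketch, and the low-byte-first order is an assumption, not something I have checked against what cc65 emits. It only shows that between the two byte writes the 16-bit location briefly holds a value that is neither the old one nor the new one.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 2-byte register pair (low byte, high byte). */
static uint8_t mem[2];

/* What the pair would read as between the two 8-bit writes. */
static uint16_t seen_between;

/* Model of a 16-bit store done as two 8-bit writes.
   ASSUMPTION: low byte is written first; the real order may differ. */
static void store16_as_two_bytes(uint16_t value)
{
    mem[0] = (uint8_t)(value & 0xFF);                    /* first write  */
    seen_between = (uint16_t)(mem[0] | (mem[1] << 8));   /* torn state   */
    mem[1] = (uint8_t)(value >> 8);                      /* second write */
}
```

Starting from $00FF and storing $0100, the location passes through $0000 before it reaches $0100. Hardware that reacts to each byte write as it happens would see that intermediate state.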
Here is a separate example of the other thing I was saying volatile is inadequate for. I wouldn't call this invalid code generation, but it shows something important that volatile does not address in any way:
// C
reg(0x1000) = reg(0x1000) + 1;
; assembly
    lda $1000
    clc
    adc #$01
    sta $1000

// C
reg(0x1000) += 1;
; assembly
    inc $1000
The C compiler does not know anything about dummy reads/writes within instructions. Volatility, which should definitely be affected by this, is not accounted for in any way. You will get different behaviour from the above code with a register like $2007 where dummy reads or writes matter. This doesn't affect every kind of MMIO register, but there are lots of cases where choice of instruction matters a lot, in a way that there's just no concept of in C.
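As a rough sketch of the difference, the following host-side C models the bus traffic of the two sequences above. Everything here (mock_reg, bus_read, bus_write and the two add_one functions) is a hypothetical toy for illustration, not cc65 or NES code. It assumes the NMOS 6502 behaviour where a read-modify-write instruction such as INC absolute writes the unmodified value back before writing the result.

```c
#include <stdint.h>

/* Toy model of a memory-mapped register that counts bus accesses. */
struct mock_reg {
    uint8_t  value;
    unsigned reads;
    unsigned writes;
};

static uint8_t bus_read(struct mock_reg *r)
{
    r->reads++;
    return r->value;
}

static void bus_write(struct mock_reg *r, uint8_t v)
{
    r->writes++;
    r->value = v;
}

/* Models LDA / CLC / ADC #$01 / STA: one read, one write. */
static void add_one_load_store(struct mock_reg *r)
{
    uint8_t a = bus_read(r);
    bus_write(r, (uint8_t)(a + 1));
}

/* Models INC absolute (ASSUMPTION: NMOS 6502 timing):
   read, dummy write of the old value, then write of the new value. */
static void add_one_inc(struct mock_reg *r)
{
    uint8_t old = bus_read(r);
    bus_write(r, old);                  /* dummy write */
    bus_write(r, (uint8_t)(old + 1));   /* real write  */
}
```

Both functions leave the same final value in the register, but the second one performs two writes instead of one. For a register where every write has a side effect, that is a visible difference that the C source gives you no way to choose between.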
On platforms other than the 6502 + cc65, this particular issue probably isn't a problem. Most C compilers would have an optimizer that knows what volatile means (cc65 is unusual in this way), and I'm not sure if dummy read/write cycles are really a thing in the same way on other platforms.
TLDR: I misspoke a little earlier, but the recommendation is the same: volatile is not reliable on cc65. Do your direct hardware access in assembly.
I used macros like you suggested in the past. They work a lot of the time, but over the years they've failed me too many times in too many confusing ways for me to think they're a good solution.
I've also used volatile in the past on other platforms with other compilers, where it generally works like we should expect it to, at least on MMIO. (I'd put a big caution against using volatile for multithreading, though, because it's generally very bad for that task, esp. on modern CPUs. I've seen people recommend this and end up with extremely incorrect code because of it.)