Vs System Shared Memory


aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Re: Vs System Shared Memory

Post by aquasnake » Thu Dec 10, 2020 10:45 pm

I use C to access a fixed-address register through a pointer, and I declare it volatile. It runs well, and I have not run into the "worry" that some people talk about.

I don't want to quarrel over someone's temper; rude behavior doesn't affect me. Some people say there is a one-in-a-thousand chance that using volatile in the cc65 environment will cause some "problems". Then tell me what the problem is, and if I can reproduce it, I am about 80% confident I can solve it. Different solutions are just different ways of doing things, but some people firmly defend what they think is right, and if your way might fail one time in a thousand, they say the method is no good, or even call it "bullshit".

I just want to reply constructively: please test the following, and if it causes problems, let me know:

Code: Select all

#define reg(_addr)                   (*((volatile unsigned char*)(_addr)))
#define reg16(_addr)                 (*((volatile unsigned int*)(_addr)))

rainwarrior
Posts: 8006
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Vs System Shared Memory

Post by rainwarrior » Thu Dec 10, 2020 10:54 pm

nocash wrote:
Thu Dec 10, 2020 10:39 pm
Yeah, at least in the past some years. They are now asserting everything... not only interrupts... also chip selects and whatever. To me it appears to be a relative new habit, perhaps derived from elite universities, or maybe from google translate.
Yes, you would also "assert" a chip select, or any other pin input on a chip you want to make active. I don't think this term is new at all? I'm sure I've seen it for decades.

I'm not treating "assert" as an English word with any of those connotations. It is just the usual technical jargon I am familiar with for exactly this application.
nocash wrote:
Thu Dec 10, 2020 10:39 pm
EDIT: How about "request" interrupt instead of "trigger" or "assert"?
What I wrote on the wiki was "asserts an IRQ request". I considered "asserts an interrupt request" to avoid the redundancy with IRQ, but I thought it was less clear that way. I'd rather use the word request than expect the reader to have to think out IRQ as "interrupt request" mentally, because I don't think that comes as easily... and I think inelegantly including "IRQ" also helps, because IRQ doesn't really only mean the two words that its name was derived from.

rainwarrior
Posts: 8006
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Vs System Shared Memory

Post by rainwarrior » Fri Dec 11, 2020 12:21 am

aquasnake wrote:
Thu Dec 10, 2020 10:45 pm
...please test the following, if it causes problems, please feedback me:

Code: Select all

#define reg(_addr)                   (*((volatile unsigned char*)(_addr)))
#define reg16(_addr)                 (*((volatile unsigned int*)(_addr)))
Retesting my assumptions, I had slightly misremembered the effect of volatile, but I still have to hard recommend against using it for this purpose in cc65.

Putting volatile on a variable does not forbid optimizations, apparently. "asm volatile" does that. I had misremembered this point. So... the code generated won't be as bad performance as I mentioned, but this is also a big problem, because it is entirely unsafe to use for the same reason. From the cc65 documentation: "The volatile keyword has almost no effect."

cc65's optimizer assumes nothing is volatile. I think the keyword literally does not affect it at all? Any redundant loads/stores are permitted to be optimized away regardless of the volatile attribute.

Here's the first example of incorrect code I found. I don't want to spend all day searching for more examples, but this one didn't take long to get to and I've seen others in the past.

Code: Select all

// C, compile with -O
char i;
#define reg(_addr)                   (*((volatile unsigned char*)(_addr)))
void test1()
{
	reg(0x1000) = i;
	i = reg(0x1000);
}

; generated assembly
.proc	_test1: near
; reg(0x1000) = i;
	lda     _i
	sta     $1000
; i = reg(0x1000);
	sta     _i
	rts
.endproc
You can use --debug-opt-output to see exactly what the optimizer does here. The concept of "volatile" seems simply to no longer be known once the optimizer begins its work. I can't find a mechanism in cc65's code for the optimizer to know any load or store was volatile, and consequently none of its optimizations can account for it. Many of its optimizations will violate the meaning of volatile, as you can see from that example.
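To make the failure concrete, here is a minimal host-side sketch in plain C (it is a model, not cc65 output). It assumes a hypothetical register whose read value is unrelated to the last value written, as with many status ports; `reg_write`, `reg_read`, and the 0x80 status value are made up for illustration. `test1_intended` is what the C source asks for, and `test1_as_compiled` mirrors the assembly above, where the value still in A is stored back instead of being re-read.

```c
/* Hypothetical register: writes go to a latch, reads return a status byte. */
unsigned char reg_latch  = 0x00;  /* last value written to the register */
unsigned char reg_status = 0x80;  /* what a read of the register returns */

void reg_write(unsigned char v) { reg_latch = v; }
unsigned char reg_read(void)    { return reg_status; }

/* What the C source means: write i, then read the register back. */
unsigned char test1_intended(unsigned char i)
{
	reg_write(i);
	return reg_read();
}

/* What the optimized assembly does: write i, then reuse the value in A. */
unsigned char test1_as_compiled(unsigned char i)
{
	reg_write(i);
	return i;  /* the read of the register never happens */
}
```

With this model, `test1_intended(0x12)` returns the status byte while `test1_as_compiled(0x12)` just hands back 0x12, which is the difference the missing `lda $1000` makes.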

I'm not going to take the time to test reg16, as it doesn't seem relevant to the NES, but it's even scarier to me, because there are a lot of 16-bit optimizations in cc65 that can affect something like that. cc65 does not generate 16-bit code, so every write of that type will be done with two separate 8-bit writes, so even if the optimizer doesn't screw it up, splitting it into two writes seems problematic just by itself.
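A rough host-side sketch of why the split matters, assuming a hypothetical bus recorder (`bus_write8`, `bus_write16`, and `bus_log` are invented names, and low-byte-first ordering is an assumption, not something cc65 promises):

```c
#include <stdint.h>

/* Hypothetical bus recorder: logs every 8-bit write in the order issued. */
typedef struct {
	uint16_t addr;
	uint8_t  value;
} BusWrite;

BusWrite bus_log[8];
int bus_count = 0;

void bus_write8(uint16_t addr, uint8_t value)
{
	if (bus_count < 8) {
		bus_log[bus_count].addr = addr;
		bus_log[bus_count].value = value;
		bus_count++;
	}
}

/* A 16-bit store on an 8-bit CPU: two independent writes.
   Low byte first is an assumption here, not a guarantee. */
void bus_write16(uint16_t addr, uint16_t value)
{
	bus_write8(addr, (uint8_t)(value & 0xFF));
	bus_write8((uint16_t)(addr + 1), (uint8_t)(value >> 8));
}
```

Between the two log entries the hardware sees a half-updated value, and nothing in the C source says which half arrives first.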


A separate example of the other thing I was saying it is inadequate for. I wouldn't call this invalid code generation, but it shows something important that volatile does not address in any way:

Code: Select all

// C
reg(0x1000) = reg(0x1000) + 1;
; assembly
	lda     $1000
	clc
	adc     #$01
	sta     $1000

// C
reg(0x1000) += 1;
; assembly
	inc     $1000
The C compiler does not know anything about dummy reads/writes within instructions. Volatility, which should definitely be affected by this, is not accounted for in any way. You will get different behaviour from the above code with a register like $2007 where dummy reads or writes matter. This doesn't affect every kind of MMIO register, but there are lots of cases where choice of instruction matters a lot, in a way that there's just no concept of in C.
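As a sketch of the difference, here is a small host-side model that counts bus accesses. `SimReg` and its helpers are hypothetical; the model assumes the NMOS 6502 read-modify-write pattern, where `inc` on an absolute address reads the byte, writes the old value back, then writes the new value.

```c
/* Hypothetical register model that counts every read and write it sees. */
typedef struct {
	unsigned char value;
	unsigned reads;
	unsigned writes;
} SimReg;

unsigned char sim_read(SimReg *r)
{
	r->reads++;
	return r->value;
}

void sim_write(SimReg *r, unsigned char v)
{
	r->writes++;
	r->value = v;
}

/* lda / clc / adc #$01 / sta: one read, one write. */
void add_one_load_store(SimReg *r)
{
	unsigned char a = sim_read(r);
	sim_write(r, (unsigned char)(a + 1));
}

/* inc abs (NMOS 6502 assumption): read, dummy write of old value, real write. */
void add_one_inc(SimReg *r)
{
	unsigned char a = sim_read(r);
	sim_write(r, a);                       /* dummy write */
	sim_write(r, (unsigned char)(a + 1));  /* real write */
}
```

Both functions leave the same value behind, but the second one touches the register with one extra write, which is exactly what matters on a port where each access has a side effect.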

On platforms other than the 6502 + cc65, this particular issue probably isn't a problem. Most C compilers would have an optimizer that knows what volatile means (cc65 is unusual in this way), and I'm not sure if dummy read/write cycles are really a thing in the same way on other platforms.


TLDR: I misspoke a little earlier, but the recommendation is the same: volatile is not reliable on cc65. Do your direct hardware access in assembly.

I used macros like you suggested in the past. They work a lot of the time, but over the years they've failed me too many times in too many confusing ways for me to think they're a good solution.

I've also used volatile in the past on other platforms with other compilers, where it generally works like we should expect it to, at least on MMIO. (I'd put a big caution against using volatile for multithreading, though, because it's generally very bad for that task, esp. on modern CPUs. I've seen people recommend this and end up with extremely incorrect code because of it.)

Oziphantom
Posts: 1080
Joined: Tue Feb 07, 2017 2:03 am

Re: Vs System Shared Memory

Post by Oziphantom » Fri Dec 11, 2020 12:34 am

Assert is a very common term. However, official literature uses the phrasing "Held Low", so you can then also say "Held High" or "Allowed to rise".
"Triggers an IRQ" is valid English; however, in the context of computing, a trigger is an edge-transition-based event, and the word is also misleading because the IRQ pin, unlike the NMI pin, doesn't guarantee a trigger; it "requests". So IRQs are not triggered, they are "requested while held low", in that an IRQ can be ignored while an NMI can't.

Since Nintendo use NMIs for VBlank, using SEI/CLI shouldn't be too much of a problem. In that you have a "gentleman's agreement" that you won't trigger more than 1 per frame and I would imagine both machines are on the "same clock". You can

CPU A : puts data into SRAM, Hold IRQ Low
CPU B : Gets IRQ, Reads Data, Writes Data Back, Hold IRQ Low, SEI
CPU A : Gets IRQ, Reads Data, Holds IRQ Hi, SEI

NMI
CPU A : CLI
CPU B : CLI

This is if you are running two copies of the Game on two machines and just need to sync score and input data between them.
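The exchange above can be sketched as a small host-side model in C. Everything here is hypothetical scaffolding (`Cpu`, `Link`, and the function names are invented), and one detail is an outright assumption: the listing does not say when CPU B releases its request line, so this sketch has B release it in its NMI handler before CLI.

```c
#include <stdint.h>

typedef struct {
	uint8_t irq_masked;  /* 1 after SEI, 0 after CLI */
	uint8_t received;    /* last byte this CPU read from shared SRAM */
} Cpu;

typedef struct {
	uint8_t sram;        /* one byte standing in for the shared SRAM */
	uint8_t irq_to_a;    /* 1 = IRQ line into CPU A is held low */
	uint8_t irq_to_b;    /* 1 = IRQ line into CPU B is held low */
} Link;

/* CPU A: put data into SRAM, hold B's IRQ low. */
void a_send(Link *l, uint8_t data)
{
	l->sram = data;
	l->irq_to_b = 1;
}

/* CPU B's IRQ: read, write a reply, hold A's IRQ low, SEI. Returns 1 if taken. */
int b_irq(Cpu *b, Link *l, uint8_t reply)
{
	if (b->irq_masked || !l->irq_to_b) return 0;
	b->received = l->sram;
	l->sram = reply;
	l->irq_to_a = 1;
	b->irq_masked = 1;
	return 1;
}

/* CPU A's IRQ: read the reply, release B's IRQ, SEI. Returns 1 if taken. */
int a_irq(Cpu *a, Link *l)
{
	if (a->irq_masked || !l->irq_to_a) return 0;
	a->received = l->sram;
	l->irq_to_b = 0;
	a->irq_masked = 1;
	return 1;
}

/* NMI on CPU A: CLI, as in the listing. */
void a_nmi(Cpu *a) { a->irq_masked = 0; }

/* NMI on CPU B: ASSUMPTION: release the line to A here, then CLI. */
void b_nmi(Cpu *b, Link *l)
{
	l->irq_to_a = 0;
	b->irq_masked = 0;
}
```

The SEI in each handler is what keeps the still-low line from re-entering the handler within the same frame; the NMI then re-arms both sides for the next exchange.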

If you are running one game, and using the other CPU as slave then it is different.

rainwarrior
Posts: 8006
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Vs System Shared Memory

Post by rainwarrior » Fri Dec 11, 2020 12:56 am

Oziphantom wrote:
Fri Dec 11, 2020 12:34 am
Assert is a very common term. However Official Literature uses the phrasing "Held Low", so you can then also "Held High"...
Assert means held low only if it's a pin that is active low. It means held high if it's a pin that's active high.

My point in using this term is so that the reader doesn't have to know that /IRQ on the 6502 is active low, or whether the signal from $4016:1 is being inverted, or other implementation details like that. "Assert" just cuts right through that and is saying that the IRQ request is active.
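In code terms, the idea is roughly the following. This is only an illustrative sketch with a hypothetical `Pin` type; the caller says "assert" and the polarity decides the electrical level.

```c
/* Hypothetical pin model: polarity is a property of the pin, not the caller. */
typedef struct {
	int active_low;  /* 1 if the pin is active low (like /IRQ), 0 if active high */
	int level;       /* current electrical level: 0 = low, 1 = high */
} Pin;

/* Make the signal active, whatever level that takes. */
void pin_assert(Pin *p)
{
	p->level = p->active_low ? 0 : 1;
}

/* Make the signal inactive. */
void pin_deassert(Pin *p)
{
	p->level = p->active_low ? 1 : 0;
}

/* Is the signal currently active? */
int pin_is_asserted(const Pin *p)
{
	return p->active_low ? (p->level == 0) : (p->level == 1);
}
```

Asserting an active-low pin drives it low and asserting an active-high pin drives it high, but in both cases `pin_is_asserted` reports the same thing.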

The phrase I chose of "asserts an IRQ request" is trying to make it redundant in a few ways so that if you don't know the jargon meaning of "assert", I think it might still be deduced that it means it's making a request. Similarly, if I left out the word "request", someone who doesn't already know both "assert" and the etymology of "IRQ" might have a hard time putting the meaning together.

On the other side, I definitely don't want to use some plain English synonym like "making an IRQ request" because for the purpose of reference, the common technical term serves better. Making up your own words for things like this confuses experienced readers who are just trying to look up the information, and prevents learners from getting accustomed to the proper terms.

Anyhow, that's my long winded justification for that particular change. I kinda feel silly for writing that much about 4 words... I'm not married to that edit, but if you wanted an explanation for why I wrote that in particular, that's it.

aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Re: Vs System Shared Memory

Post by aquasnake » Fri Dec 11, 2020 1:06 am

Here's my nonsense kite flying :shock:
aaa.jpg


There is no need for the slave to give the master a handshake signal.

M2 is synchronous. After the master sends out the IRQ (as a level), the logic circuit converts the level into a short trigger pulse, which ensures that the IRQ will not be triggered multiple times over the following frames.

The shared RAM write to $6000 is processed while the slave is in its interrupt handler.

The master controls the IRQ by actively executing a delay routine (or using an external hardware timer). When the delay is up, it cancels the IRQ. At this point, $6000 has been switched back to the master, and the data written by the slave can be synchronized.

There is an agreement between the two sides that the time the master waits after issuing the IRQ must be longer than the time the slave needs to process the interrupt. After the slave finishes processing and exits the IRQ, there may be a lot of time left to wait. Since the IRQ level has been converted to a short pulse, the slave will not be interrupted again after exiting the handler, even if the original IRQ sent by the master stays at a fixed level.
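The level-to-pulse idea can be sketched in C as a simple edge detector. This is a software model of what the post describes as a logic circuit, with a hypothetical `EdgeDetector` type; it assumes `level` is 1 while the master's request is active and that the detector is stepped once per clock.

```c
/* Hypothetical model of the level-to-pulse logic, stepped once per clock. */
typedef struct {
	int prev;  /* request level seen on the previous step */
} EdgeDetector;

/* Returns 1 only on the step where the level goes from inactive to active. */
int edge_step(EdgeDetector *d, int level)
{
	int pulse = (level && !d->prev) ? 1 : 0;
	d->prev = level;
	return pulse;
}
```

However long the master holds the level, the slave sees one pulse; a new pulse needs the level to drop and rise again, which matches the "writes 1 then 0" step in the post.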

Once the time is up, the master writes 1 then 0 to $4016.1 again.


The above is just my kite-flying brainstorming. If someone reverse-engineers the game to extract the real routine, I would be very grateful.
Last edited by aquasnake on Fri Dec 11, 2020 7:10 pm, edited 3 times in total.

Fiskbit
Posts: 259
Joined: Sat Nov 18, 2017 9:15 pm

Re: Vs System Shared Memory

Post by Fiskbit » Fri Dec 11, 2020 1:06 am

I regularly see assert to mean to make a thing active, whether that's active low or high. I'm in favor of using it in this case. Using the definition instead of the word will certainly be clearer to someone who hasn't encountered this word in this context before, but less clear to people who know this standard computing terminology.

If it's an issue, then assert should be added to the nesdev glossary. (And some stuff probably should be removed from that page... "BFT"??)

aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Re: Vs System Shared Memory

Post by aquasnake » Fri Dec 11, 2020 1:22 am

Code: Select all

char i;
#define reg(_addr)                   (*((volatile unsigned char*)(_addr)))
void test1()
{
	reg(0x1000) = i;
	i = reg(0x1000);
}

#define reg(_addr)                   (*((volatile unsigned char*)(_addr)))
void test1()
{
	volatile char i;
	reg(0x1000) = i;
	i = reg(0x1000);
}

rainwarrior
Posts: 8006
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Vs System Shared Memory

Post by rainwarrior » Fri Dec 11, 2020 1:30 am

aquasnake wrote:
Fri Dec 11, 2020 1:22 am

Code: Select all

#define reg(_addr)                   (*((volatile unsigned char*)(_addr)))
void test1()
{
        volatile char i;
	reg(0x1000) = i;
	i = reg(0x1000);
}
Did you test it? You get this, which is broken in exactly the same way:

Code: Select all

_test1:
	jsr     decsp1
; reg(0x1000) = i;
	ldy     #$00
	lda     (sp),y
	sta     $1000
; i = reg(0x1000);
	sta     (sp),y
	jmp     incsp1
What was the intent of adding volatile to i and putting it on the stack?

aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Re: Vs System Shared Memory

Post by aquasnake » Fri Dec 11, 2020 1:53 am

rainwarrior wrote:
Fri Dec 11, 2020 1:30 am

What was the intent of adding volatile to i and putting it on the stack?
Putting it on the stack is to save memory.

In this case, the compiler optimization affects reentrant functions. If the function itself is not called from an interrupt, it will not be re-entered, and the optimized result is exactly what you expect.

If you want to allow functions to be re-entered, self-assignments like this need caution. cc65 may not be able to handle such applications, for example porting a small RTOS. I don't know whether cc65 supports function reentrancy. If it can't, then this is not a problem specific to reading an address port; reads of other variables will be optimized in the same way.

In the non-reentrant case, on C compilers for other platforms, the workaround can also be an intermediate variable used as temporary storage.

rainwarrior
Posts: 8006
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Vs System Shared Memory

Post by rainwarrior » Fri Dec 11, 2020 2:09 am

The problem with the code was that it never reads from reg despite the volatile attribute, because volatile doesn't work on cc65.

cc65 has no problem at all with re-entrant functions.

aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Re: Vs System Shared Memory

Post by aquasnake » Fri Dec 11, 2020 3:18 am

to rainwarrior

A minor modification can solve the problem:

Code: Select all

char proof=0x55;

void test1()
{
    char tmp;
	reg(0x1000) = proof;
	proof = proof; // prevent compiler optimization
	proof = reg(0x1000);
}
bbb.png

lidnariq
Posts: 10273
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Vs System Shared Memory

Post by lidnariq » Fri Dec 11, 2020 11:51 am

... but at that point you have to read the asm in order to know that the optimizer isn't breaking things, and what's the advantage of writing something in C that looks utterly stupid instead of just linking to an external asm routine that does the right thing succinctly?
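For what it's worth, the C side of that approach might look something like the sketch below. The routine names are hypothetical; on cc65 the two routines would live in a separate assembly file exporting `_reg1000_write` and `_reg1000_read`, and the `#ifndef __CC65__` part is only a host stand-in (with a made-up read behaviour) so the sketch builds without that file.

```c
/* C-side interface; on cc65 the bodies would be written in assembly. */
void reg1000_write(unsigned char v);
unsigned char reg1000_read(void);

unsigned char g_i;

/* Each access is a call to an external routine, so the write and the
   read should both survive as separate operations. */
void test1(void)
{
	reg1000_write(g_i);
	g_i = reg1000_read();
}

#ifndef __CC65__
/* Host-only stand-ins: the fake register reads back inverted, so a
   skipped read would be visible in the result. */
static unsigned char fake_reg1000;
void reg1000_write(unsigned char v) { fake_reg1000 = v; }
unsigned char reg1000_read(void)    { return (unsigned char)(fake_reg1000 ^ 0xFF); }
#endif
```

The cost is a call and return per access, but the C stays readable and the exact instructions that touch the hardware are chosen by hand in one place.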

rainwarrior
Posts: 8006
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Vs System Shared Memory

Post by rainwarrior » Fri Dec 11, 2020 1:44 pm

Just as an addendum, the reason you can't just use inline assembly for this:

Code: Select all

// C
char m;
void asm_test()
{
	asm ("lda %v",m);
	asm ("sta $1000");
	asm ("lda $1000"); // this line will get optimized out??
	asm ("sta %v",m);
}

; assembly
_asm_test:
	lda     _m
	sta     $1000
	sta     _m
	rts
cc65's optimizer doesn't know the difference between inline assembly and its own generated code, therefore it's allowed to eliminate redundant stores like this, causing the exact same problem. (All inline assembly in cc65 is hugely flawed because of this, IMO.)

"asm volatile" avoids the problem, except that it works by disabling all optimization for the entire function it's used in. So you could do something like this, but everywhere you use it there's a performance bomb. :( That's the other case I was thinking of before when I misspoke. (At least this solution generates correct code, though, just slower.)

Code: Select all

#define poke(_addr,_v) { asm volatile ("lda #%b",_v); asm volatile ("sta %w",_addr); }
A function containing only asm volatile code would be fine, but at that point you're literally just writing the function in assembly anyway, just in an uglier syntax. Convenient if you don't already have an assembly source file elsewhere to stick the code, maybe.

aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Re: Vs System Shared Memory

Post by aquasnake » Fri Dec 11, 2020 5:11 pm

Having to keep the compiler from optimizing away accesses to volatile variables is not unique to the cc65 scenario.

Inserting a meaningless self-assignment statement to defeat compiler optimization may look stupid, but if you question it, it means you know too little about this.

In fact, I have seen this in ARM code, such as in mobile phone systems.

This is a common phenomenon of C compilers (or rather, it's the behavior of the optimizer), not a feature of cc65.
Last edited by aquasnake on Fri Dec 11, 2020 6:16 pm, edited 1 time in total.
