Vs System Shared Memory

Post by Goose2k »

I have my ROM now accessing the Shared Memory of the Vs System at $6000, and it gets saved and loaded as expected (in emulator anyway). Which is awesome!

However, I think I am going to run into trouble once this is run on a Dual System (switching my ROM to Dual System just crashes Mesen).

Code: Select all

Controller and CHR ROM bank ($4016 write)
7  bit  0
---- ----
xxxx xCRS
      |||
      ||+- 1 then 0: Request a report from the joysticks or Zapper
      |+-- In the DualSystem, does two things:
      |    #1: When low, asserts /IRQ on the other CPU
      |    #2: On the primary CPU, when high, the primary CPU can access 2 KiB of shared RAM
      |         mapped in the $6000-$7FFF region. When instead low, the secondary CPU instead
      |         can access the same physical memory.
      +--- Select 8 KiB CHR ROM bank for PPU $0000-$1FFF (mapper 99 games only)
           Note: In case of games with 40KiB PRG-ROM (as found in VS Gumshoe),
                 the above bit additionally changes 8KiB PRG-ROM at $8000-$9FFF.
I am really clueless as to what this part means:
#1: When low, asserts /IRQ on the other CPU
#2: On the primary CPU, when high, the primary CPU can access 2 KiB of shared RAM
mapped in the $6000-$7FFF region. When instead low, the secondary CPU instead
can access the same physical memory.
Are there any examples of ROMs that properly and safely access this shared RAM across 2 games running in parallel?
Last edited by Goose2k on Tue Dec 08, 2020 12:57 am, edited 1 time in total.

Post by Goose2k »

I saw the characterize ROM by lidnariq and took a look at it, but I don't think it is doing anything with $4016 to gain access to shared RAM. I might be wrong, though!

To start, at a high level, what is the sequence of events needed to READ the shared RAM, and to WRITE to the shared RAM?

Sorry it's so vague, but at this point I am really still not sure where to start. :)

Post by Oziphantom »

So basically, this is how it works:

Set $4016 to X1X (bit 1 set).
Now you have the 2K of RAM on the main CPU.

Modify the RAM, and set your control byte, message data, or whatever you come up with to instruct the other CPU what you want it to do.

Set $4016 to X0X (bit 1 clear).
Now the other CPU gets the RAM and gets an IRQ, so its IRQ handler should check the shared RAM to find out what the other CPU wanted it to do and act accordingly.

There doesn't seem to be a safe way to send data back or to acknowledge, so you just have to code the 2nd CPU so that it finishes as fast as possible, before it gets another IRQ.

Post by lidnariq »

I don't know of any examples of explicitly doing the handshake. However, it should look something like this:

1- Primary CPU drives OUT1 high.
2- Primary CPU does all operations it wants to the shared memory.
3- Primary CPU drives OUT1 low.
4- Secondary CPU receives step 3 as an interrupt.
5- Secondary CPU does all operations it wants to the shared memory.
6- Secondary CPU drives its OUT1 ... low? high? whatever?
7- Primary CPU drives OUT1 high, wait until next exchange.

Because the 6502's interrupts are level triggered, both CPUs must leave OUT1 idle high. I'm pretty certain that you can structure this as either the primary or secondary CPU starting the transaction. (i.e. secondary CPU drives OUT1 low, primary CPU says "ok, let me put all my data in place" or primary CPU says "I put all my data in place, secondary CPU please use this data")
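
The steps above can be sketched as a small host-side model. This is only an illustration of the wiring as described in this thread (OUT1 low asserts /IRQ on the other CPU; the primary CPU's OUT1 alone selects who sees the shared RAM). It is not NES code, and every name in it (`dual_system`, `run_primary_first_exchange`, and so on) is made up for the example. Step 6 is left open in the list, so the model arbitrarily has the secondary CPU signal "done" by pulling its OUT1 low.

```c
#include <stdbool.h>

/* Host-side model of the wiring described in this thread (not NES code).
   All names are hypothetical. */
typedef struct {
    bool primary_out1;    /* bit 1 of the primary CPU's last $4016 write */
    bool secondary_out1;  /* bit 1 of the secondary CPU's last $4016 write */
} dual_system;

/* OUT1 low on one CPU asserts /IRQ on the other CPU. */
bool irq_on_secondary(const dual_system *s) { return !s->primary_out1; }
bool irq_on_primary(const dual_system *s)   { return !s->secondary_out1; }

/* Only the primary CPU's OUT1 selects who sees the shared RAM. */
bool primary_has_ram(const dual_system *s)   { return s->primary_out1; }
bool secondary_has_ram(const dual_system *s) { return !s->primary_out1; }

/* Walk through steps 1-7 with the primary CPU starting the exchange.
   Returns true if RAM ownership and /IRQ levels are as expected at each step. */
bool run_primary_first_exchange(void)
{
    dual_system s = { true, true };  /* steps 1-2: both OUT1 idle high */
    bool ok = primary_has_ram(&s) && !irq_on_secondary(&s) && !irq_on_primary(&s);

    s.primary_out1 = false;          /* step 3: hand the RAM over */
    ok = ok && secondary_has_ram(&s) && irq_on_secondary(&s);  /* steps 4-5 */

    s.secondary_out1 = false;        /* step 6 (assumed choice): signal "done" */
    ok = ok && irq_on_primary(&s) && secondary_has_ram(&s);

    s.primary_out1 = true;           /* step 7: primary takes the RAM back */
    ok = ok && primary_has_ram(&s) && !irq_on_secondary(&s);

    s.secondary_out1 = true;         /* secondary returns to idle high */
    ok = ok && !irq_on_primary(&s);
    return ok;
}
```

The model makes one asymmetry easy to see: the secondary CPU's OUT1 only ever affects the primary CPU's /IRQ line and never the RAM mapping.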



My "characterize" ROM only verifies that the CPU has access to the RAM, no negotiating between CPUs.

Post by Jarhmander »

How is the port pin $4016.2 wired between the CPUs? Is the one from the primary CPU directly wired to the IRQ line of the second CPU, and the $4016.2 port pin of the second CPU somehow connected to the primary CPU's IRQ line?

Post by Goose2k »

Some follow up questions:

NOTE: CPU1 = Primary, CPU2 = Secondary

a) How does the program know if it is the primary or secondary CPU? I checked the Vs Tennis manual (which is 4 player) and I don't see any mention of dipswitches telling the software which CPU it is on, or that a particular EPROM set needs to go in a certain socket. So I assume this can be done from software somehow. (I just noticed this information is stored in $4016.7)

b) Is it safe to READ the shared memory even without first gaining "control" of the shared memory? Or is that needed for both WRITE and READ operations?

c) What does "Primary CPU drives OUT1 high" mean? I assume this means flip bit $2016.2 to 1? (I've just not heard the term "drives OUT1 high" before)

d) Would it not be valid to just constantly ping pong back and forth between the 2 cpus, rather than 1 CPU driving everything? eg. CPU 1 does what it needs to do, trigger IRQ on CPU2, CPU2 immediately does what it needs to do with shared RAM, triggers IRQ on CPU1, repeat forever. Perhaps this is wasteful of CPU time to be constantly triggering interrupts like this?

e) How does this work when you have 2 different games in a dual system (eg. Mario on one side, Ice Climbers on the other)? I see this setup a lot, and it seems like the 2 games would have conflicting Shared RAM usage (eg. Mario stores coins at $6000, and Ice Climbers stores credits at $6000). Perhaps games that use shared RAM are required to be duplicated across both machines?

f) "There doesn't seem to be a way to send back safely or ack that is safe, so you just have to code the 2nd CPU such that it does it as fast as possible before it will get another." Couldn't CPU1 be waiting for IRQ as well before attempting to use the RAM?

g) This still confuses me:

On the primary CPU, when high, the primary CPU can access 2 KiB of shared RAM mapped in the $6000-$7FFF region. When instead low, the secondary CPU instead can access the same physical memory.

What is the behaviour on the Secondary CPU when the bit is LOW/HIGH? This sentence seems to only be describing the meaning of writing the bit from the primary CPU perspective. Is this because only the primary CPU can write that bit? Can CPU2 not trigger the IRQ on CPU1?

Post by lidnariq »

Jarhmander wrote: Tue Dec 08, 2020 5:42 am How is the port pin $4016.2 wired between the CPUs? Is the one from the primary CPU directly wired to the IRQ line of the second CPU, and the $4016.2 port pin of the second CPU somehow connected to the primary CPU's IRQ line?
Assuming you mean "$4016.1", yes, exactly that, OUT1 on both CPUs is connected to /IRQ on the other CPU. The /IRQ pins are otherwise inaccessible.
Goose2k wrote: Tue Dec 08, 2020 10:21 am b) Is it safe to READ the shared memory even without first gaining "control" of the shared memory? Or is that needed for both WRITE and READ operations?
No, the RAM is only available to one CPU at a time, period.
c) What does "Primary CPU drives OUT1 high" mean? I assume this means flip bit $2016.2 to 1? (I've just not heard the term "drives OUT1 high" before)
$4016.1, in other words LDA #2 / STA $4016. Is there another term that you would have already understood to mean that?
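
For readers more used to bit masks than pin names, here is a minimal C sketch of how the three low bits of a $4016 write combine, following the register description quoted at the top of the thread. The macro names and the `vs_4016_value` helper are hypothetical, not part of any existing library.

```c
#include <stdint.h>

/* Bits of a $4016 write, per the register description quoted earlier.
   Macro and function names are made up for this example. */
#define VS_4016_STROBE   0x01u  /* bit 0: 1 then 0 requests a controller report */
#define VS_4016_OUT1     0x02u  /* bit 1: /IRQ and shared RAM control */
#define VS_4016_CHR_BANK 0x04u  /* bit 2: CHR ROM bank select */

/* Compose the byte to store to $4016 from the three bit values. */
uint8_t vs_4016_value(int strobe, int out1_high, int chr_bank)
{
    return (uint8_t)((strobe ? VS_4016_STROBE : 0u) |
                     (out1_high ? VS_4016_OUT1 : 0u) |
                     (chr_bank ? VS_4016_CHR_BANK : 0u));
}
```

With this helper, "drive OUT1 high" with the other two bits clear is `vs_4016_value(0, 1, 0)`, which is the same 2 that `LDA #2 / STA $4016` stores. Since one store writes all three bits together, a game that uses the second CHR bank would presumably want to keep bit 2 set whenever it toggles OUT1.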
d) Would it not be valid to just constantly ping pong back and forth between the 2 cpus, rather than 1 CPU driving everything? eg. CPU 1 does what it needs to do, trigger IRQ on CPU2, CPU2 immediately does what it needs to do with shared RAM, triggers IRQ on CPU1, repeat forever. Perhaps this is wasteful of CPU time to be constantly triggering interrupts like this?
When will the CPUs do anything else? Keep in mind they still have to upload data to the PPU during vblanking, and the two PPUs will forever remain perfectly synchronized so both vblanks happen during the exact same moments: you'll really need to keep your OAM shadow and PPU update queue in non-shared memory.
e) How does this work when you have 2 different games in a dual system (eg. Mario on one side, Ice Climbers on the other)? I see this setup a lot, and it seems like the 2 games would have conflicting Shared RAM usage (eg. Mario stores coins at $6000, and Ice Climbers stores credits at $6000). Perhaps games that use shared RAM are required to be duplicated across both machines?
2-screen (2- or 4- player) games require the same game on both sides. 1-screen (1- or 2- player) games ignore /IRQ and – with the exception of Vs. SMB – don't use the shared memory.

You could set up your game to automatically detect if there's one or two copies of your game installed, and automatically switch between 1-screen and 2-screen versions. But the existing games don't.
f) "There doesn't seem to be a way to send back safely or ack that is safe, so you just have to code the 2nd CPU such that it does it as fast as possible before it will get another." Couldn't CPU1 be waiting for IRQ as well before attempting to use the RAM?
Yes. One side sends an IRQ, other side marks completion with another IRQ. The secondary CPU can detect when it's lost access to shared memory as well: when it doesn't have access the corresponding region will be open bus.

(For example, you could read from $6000 and $7800 and if the first byte is $60 and the second byte is $78 even though they would point to the same address of physical memory, you know that the secondary CPU does not currently have access to the shared memory)
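
A minimal sketch of that probe in C, written against a bus-read callback so it can be exercised off-target. The callback type, the function name, and the two fake buses are assumptions made for the example. On real hardware the two reads would most likely need to be absolute-addressed loads (such as `LDA $6000`) for the open-bus value to equal the high byte of the address, so compiled C that reads through a pointer may not behave the same way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bus-read callback so the check can run on a host machine. */
typedef uint8_t (*bus_read_fn)(uint16_t addr);

/* $6000 and $7800 mirror the same byte of the 2 KiB RAM, so real RAM returns
   the same value for both reads. Seeing $60 and then $78 suggests open bus. */
bool shared_ram_looks_absent(bus_read_fn read)
{
    uint8_t a = read(0x6000);
    uint8_t b = read(0x7800);
    return a == 0x60 && b == 0x78;
}

/* Test doubles: simplified stand-ins, not a full model of the hardware. */
uint8_t fake_ram[0x800];
uint8_t read_with_ram(uint16_t addr) { return fake_ram[addr & 0x07FF]; }
uint8_t read_open_bus(uint16_t addr) { return (uint8_t)(addr >> 8); }
```

The check cannot give a false "absent" result while RAM is mapped: if the RAM byte happens to hold $60, the mirror at $7800 reads $60 as well, never $78.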
What is the behaviour on the Secondary CPU when the bit is LOW/HIGH? This sentence seems to only be describing the meaning of writing the bit from the primary CPU perspective.
Correct, only the primary CPU can control which side gets the RAM.
Is this because only the primary CPU can write that bit? Can CPU2 not trigger the IRQ on CPU1?
No, that works fine: the secondary CPU can trigger the IRQ on the primary CPU.

Please help me rephrase this to be clearer, I really don't know how I can make this clearer:
In the DualSystem, does two things:
#1: When low, asserts /IRQ on the other CPU
#2: On the primary CPU, when high, the primary CPU can access 2 KiB of shared RAM mapped in the $6000-$7FFF region. When instead low, the secondary CPU instead can access the same physical memory.

Post by rainwarrior »

Trying to work out how these things are supposed to acknowledge each other... it seems a little bit weird, just because it seems like the IRQ is the only signal you can use, so knowing when that signal has been received seems tricky. I think I'd want to do it something like this:


During most of the frame, $4016:1 is high for both CPUs. Primary is free to update shared RAM as much as it wants. When it's done it can enter a wait loop, waiting for an IRQ from secondary to know it's also ready.

1. Secondary finishes its update. Sets $4016.1 low to signal ready, enters loop awaiting IRQ from Primary.

2. Primary will finish its update and end in a wait loop, but it might receive the IRQ from Secondary before that time. When the IRQ is received, it should set a flag to indicate Secondary is ready, then set the interrupt-disable bit in the status byte saved on the stack, so that the RTI returns to the rest of its update with IRQs masked.

3. Once primary finishes its update and enters its wait loop, it will now check the ready flag. When it knows Secondary is ready, it will set $4016:1 low to signal secondary that it's time for it to use shared RAM. Finally, it will CLI to put itself in an "IRQ loop" that secondary will break by eventually returning $4016:1 high on its side.

4. Secondary receives the IRQ and processes shared RAM. When finished, it sets $4016:1 high, releasing Primary from its IRQ loop. Enters its own "IRQ loop" waiting for Primary to acknowledge by releasing it.

5. On Primary, immediately following the CLI, it will set $4016:1 high to release secondary.

6. Both sides now know the exchange is complete. Can resume normal operation.


Expressed in code:

Code: Select all

;
; Primary CPU
;

.zeropage
irq_loop: .res 1
secondary_ready: .res 1

.code
game_loop:
	; ... wait for NMI, do NMI stuff in handler
	; ... do game update
	; ... update shared RAM
	; now we're ready to hand it to Secondary, wait for an IRQ asserted from it to begin
:
	lda secondary_ready
	beq :-
	; tell Secondary to use shared RAM now
	lda #0
	sta $4016
	; enter IRQ loop to wait for Secondary to finish
	lda #1
	sta irq_loop
	cli ; waits here until Secondary turns off IRQ signal
	; Secondary is finished, release its IRQ and go to next frame
	lda #0
	sta irq_loop
	sta secondary_ready
	lda #2
	sta $4016
	jmp game_loop

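irq:
	pha
	lda irq_loop ; BIT would AND with whatever A held when interrupted, so load the flag instead
	beq :+
	pla
	rti ; in the IRQ loop: return at once and let the IRQ retrigger
: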
	txa
	pha
	lda #1
	sta secondary_ready
	tsx
	lda $103, X
	ora #4 ; disable IRQ
	sta $103, X
	pla
	tax
	pla
	rti

;
; Secondary CPU
;

.zeropage
irq_loop: .res 1
primary_ready: .res 1

.code
game_loop:
	; ... wait for NMI, do NMI stuff in handler
	; ... do game update
	; tell Primary we're ready to use shared RAM by asserting its IRQ
	lda #0
	sta $4016
:
	lda primary_ready
	beq :-
	; ... read/write shared RAM now
	; tell Primary we're finished with shared RAM by releasing IRQ
	lda #2
	sta $4016
	; enter IRQ loop to wait for Primary to acknowledge
	lda #1
	sta irq_loop
	cli ; waits here until Primary turns off IRQ signal
	; Primary has acknowledged that we're finished, go to next frame
	lda #0
	sta irq_loop
	sta primary_ready
	jmp game_loop

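irq:
	pha
	lda irq_loop ; BIT would AND with whatever A held when interrupted, so load the flag instead
	beq :+
	pla
	rti ; in the IRQ loop: return at once and let the IRQ retrigger
: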
	txa
	pha
	lda #1
	sta primary_ready
	tsx
	lda $103, X
	ora #4 ; disable IRQ
	sta $103, X
	pla
	tax
	pla
	rti
Summary:

Code: Select all

P: .........****************....
S: ....****************.........
       1    2    3    4    5
* = asserting IRQ for other CPU.
  • 1. Secondary ready for exchange, signals with IRQ.
  • 2. Primary ready for exchange, signals with IRQ and hands off.
  • 3. Secondary takes its turn with the shared RAM.
  • 4. Secondary signals finish by releasing IRQ.
  • 5. Primary acknowledges finish by releasing IRQ.
I guess this would work just as well the other way (i.e. Primary signalling first) but this was the order as I thought of it.
Last edited by rainwarrior on Tue Dec 08, 2020 4:12 pm, edited 2 times in total.

Post by lidnariq »

Yeah, basically the ingredients available for the handshake are:

1- If interrupts are enabled, secondary CPU knows that primary CPU asserts /IRQ because: secondary CPU jumps to IRQ vector
2- If interrupts are enabled, primary CPU knows that secondary CPU asserts /IRQ because: primary CPU jumps to IRQ vector
3- If interrupts are disabled, secondary CPU knows that primary CPU asserts /IRQ because: RAM is found in the $6000-$7FFF region, instead of open bus.
4- If interrupts are disabled, primary CPU has no way to detect status of its /IRQ pin

Because of #4, I'd probably personally choose a design that had the primary CPU signal first.

Post by Goose2k »

lidnariq wrote: Tue Dec 08, 2020 12:31 pm Please help me rephrase this to be clearer, I really don't know how I can make this clearer:

I'm still very new to all this, so there are a lot of gaps in my knowledge that I don't think could be filled easily in this one article. :D

I'm not sure if it's implied elsewhere, but this one line is very informative for me: "Only the primary CPU can control which side gets the RAM. Both CPU1 and CPU2 can trigger IRQ on the other CPU."

Post by Fiskbit »

Maybe something like this? "On the primary CPU, controls which CPU can access 2 KiB of shared RAM mapped in the $6000-$7FFF region. When high, only the primary CPU can access the shared RAM. When low, only the secondary CPU can. The CPU that cannot access the shared RAM sees open bus." Until I read your earlier post, it was not entirely clear to me that accessing the RAM while it is not assigned to the current CPU results in open bus.
Re: Vs System Shared Memory

Post by nocash »

The wiki page would be clearer if it said "0" and "1" instead of "low" and "high". Especially as NES/SNES controller port signals are usually inverted, i.e. 1=Low, 0=High - so everyone would automatically assume that "low" does probably (not) mean the opposite of what you (didn't) think.

The schematic on the Vs. System wiki page doesn't show inverters between output and /IRQ input pins. So I guess it is "0=Low=/IRQ"?
Unless the output itself is being inverted inside of the CPU.

Post by rainwarrior »

That makes me wonder whether I interpreted it correctly... "low" meant writing a 0 to $4016:1, right?

Post by lidnariq »

Yes. The 2A03 is a normal one (its output is not inverted), so connecting OUT1 to the other CPU's /IRQ means that low=0=assert IRQ.

Post by Goose2k »

Edit: I didn't see lidnariq had already replied. Sorry for the redundant info.

Writing a value of "0000 0010" to $4016 gives access to the shared RAM. I have now run this code on a physical cab, and it works. Without writing that value, the writes to SRAM fail.

Code: Select all

// Take control of SRAM.
POKE(0x4016, 2);
		
if (xram_test[1] != 6)
{
	// These are stored in SRAM.
	xram_test[1] = 6;
	memfill(high_scores_vs_initials, '-', sizeof(high_scores_vs_initials));
	memfill(high_scores_vs_value, 0xff, sizeof(high_scores_vs_value) * 4);
}

// Relinquish control of SRAM.
POKE(0x4016, 0);		
Unrelated Side Note:

I ran into an interesting bug with this. Initially I found that I could write to SRAM and retrieve the values right after, even without setting the bit in $4016. Something like:

Code: Select all

test_val = 1; // fails
print(test_val); // still prints 1!
Turns out, the compiler was optimizing things for me, not actually getting test_val on the second line, but instead using the value of 1 that was still in the accumulator. :shock:

Code: Select all

	lda     #$01
	sta    test_val ; fails.
	jsr     pusha
...
Maybe obvious to most people but it took me a while to figure out what was going on.
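
One way to make that kind of read-back check meaningful, as a minimal sketch: do the store and the load through a `volatile` pointer, which in standard C obliges the compiler to emit a real load rather than reuse the value it just stored. The helper name `write_and_verify` is made up for this example. How strictly cc65 honours `volatile` here is an assumption worth confirming in the generated assembly.

```c
#include <stdint.h>

/* Write a byte, then read it back through a volatile pointer so the compiler
   has to perform an actual load instead of reusing the stored value.
   Returns 1 if the value read back matches, 0 otherwise.
   (Hypothetical helper, not from the original post.) */
int write_and_verify(volatile uint8_t *cell, uint8_t value)
{
    *cell = value;
    return *cell == value;
}
```

On the cabinet, `cell` would point at a byte in the $6000-$7FFF region, and a result of 0 would suggest the CPU did not have the shared RAM when the write happened.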