nesdev.com
http://forums.nesdev.com/

Putting second PPU inside cartridge
http://forums.nesdev.com/viewtopic.php?f=9&t=19033

Author:  krzysiobal [ Wed Jun 26, 2019 11:33 am ]
Post subject:  Putting second PPU inside cartridge

I have a crazy idea: a Game Genie-style adapter with a second (external) PPU. Why? It would make it possible to generate RGB video without any modification inside the console: you plug the cartridge into the adapter and the adapter into the console, and voila. The adapter would have a video output jack and maybe even joypad ports for connecting external joypads.
Code:
 +-------------------+
 | Regular cartridge |
 |                   |
 |||||||||||||||||||||
 
          |
         \ /
 _____________________
 \___________________/
 |  PPU + RAM + FPGA |
 |                   |
 |||||||||||||||||||||


There are some issues that I know about:
* The internal PPU cannot be disconnected from the bus at $2000-$2007,
* The internal PPU cannot be disconnected from the /NMI line, and the external PPU cannot be connected to it (because /NMI is not present on the cartridge connector).

But here is my solution:
* Let's first concentrate on the Famicom, so the external PPU will be an RP2C02 (or UA6528), clocked by an external 21 MHz crystal.
* There will also be a 74373, external CIRAM and an FPGA chip in the adapter.
* All PPU signal lines from the console connector will be ignored.
* The FPGA inside the adapter constantly snoops the CPU bus, remembering the last three data values that appeared on it.
* All CPU writes to $2000-$3fff can safely be executed by both the internal and external PPUs.
* If the CPU reads from $2000-$3fff, it first gets the byte from the internal PPU. The bytes that appeared on the bus before that must have looked like this:
Code:
   R/W   Address      Data       Info
   ---   -----------  ---------  ----
    1    x            d1
    1    x + 1        d2         = $00-$07
    1    x + 2        d3         = $20
    1    $2000-$2007  *          Data read from the internal PPU
But then the FPGA injects these cycles:
    1    x + 3        d1         driven by FPGA
    1    x + 4        d2         driven by FPGA
    1    x + 5        $80        driven by FPGA
    1    $8000-$8007             This forces a read cycle from $8000-$8007;
                                 the FPGA holds cartridge /ROMSEL=1 and enables the external PPU for that cycle,
                                 so the value read from the internal PPU is replaced with the one from the external PPU
                                 (no matter whether the opcode was LDA, LDX, LDY or BIT)
    1    x + 6        $F0/$D0    the FPGA injects a BEQ or BNE opcode, depending on whether the
                                 value just read from the data bus was zero or non-zero
    1    x + 7        -5 ($FB)   to branch back to x + 3, so that execution of the regular code continues
 


This adds a 4-cycle delay and, unfortunately, will not work if the original PPU read was executed from $0000-$7fff. Some games execute code from RAM, but I have not spotted any that execute PPU-read opcodes from RAM.
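The injected sequence in the table above can be modeled as a small function. This is a hypothetical Python sketch, not FPGA logic; the names `injected_cycles` and `ABS_READS` are illustrative. It computes the branch offset from the 6502 rule that relative branches count from the address after the 2-byte branch instruction, which is why returning to x + 3 from a branch at x + 6 needs -5 ($FB). One assumption goes beyond the post: because BIT sets Z from A AND M (and the FPGA does not know A), the sketch branches on the N flag for BIT, which BIT copies directly from bit 7 of the operand.

```python
# Hypothetical Python model of the FPGA's PPU-read replacement (not HDL).
ABS_READS = {0xAD: "LDA", 0xAE: "LDX", 0xAC: "LDY", 0x2C: "BIT"}


def injected_cycles(x: int, d1: int, d2: int, d3: int, ext_value: int) -> list:
    """Return the (address, data) pairs seen after an absolute read whose
    bytes d1 d2 d3 were fetched from x..x+2. ext_value is the byte the
    external PPU drives during the replacement $80xx read."""
    if d1 not in ABS_READS or not 0x20 <= d3 <= 0x3F:
        return []  # not a read of $2000-$3FFF: nothing to inject
    branch_addr = x + 6
    # Branch offsets are relative to the address after the 2-byte branch,
    # so getting back to x + 3 from a branch at x + 6 needs -5 ($FB).
    offset = ((x + 3) - (branch_addr + 2)) & 0xFF
    if ABS_READS[d1] == "BIT":
        # BIT sets Z from A AND M (A is unknown to the FPGA) but copies bit 7
        # of M into N, so this sketch branches on N: BMI ($30) / BPL ($10).
        branch_op = 0x30 if ext_value & 0x80 else 0x10
    else:
        # LDA/LDX/LDY set Z from the loaded byte: BEQ ($F0) / BNE ($D0).
        branch_op = 0xF0 if ext_value == 0 else 0xD0
    return [
        (x + 3, d1),                    # same opcode again, driven by the FPGA
        (x + 4, d2),                    # same low address byte
        (x + 5, 0x80),                  # high byte $80 instead of $20
        ((0x80 << 8) | d2, ext_value),  # /ROMSEL held high; external PPU answers
        (branch_addr, branch_op),       # branch that is always taken
        (branch_addr + 1, offset),      # offset back to x + 3
    ]
```

For example, `injected_cycles(0xC000, 0xAD, 0x02, 0x20, 0x80)` models `LDA $2002` at $C000 with the external PPU returning $80, and ends with the bytes $D0 $FB (BNE -5).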

* If the internal PPU pulls /NMI low (which the FPGA can infer when a read of $FFFA occurs), the FPGA starts constantly injecting the $4C opcode (JMP $4C4C).
The FPGA also monitors the external PPU's /NMI, and when it sees it go low, it stops injecting $4C and injects JMP ($FFFA) instead.

* If the external PPU pulls /NMI low (but the internal one doesn't), I think the only possible approach is to ignore it and wait for the internal one to pull /NMI too.
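The /NMI handling described in the last two points can be sketched as a small state machine. This is a hypothetical Python model (`NmiSync` and its methods are illustrative names): `cpu_read` returns the byte the FPGA would drive, or `None` to let the cartridge answer. Two assumptions are added: a read of $FFFA is treated as the internal PPU's NMI vector fetch, as in the post, and an external /NMI that arrives first is latched so it is honored once the internal one fires.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()
    HOLDING = auto()    # CPU parked in the JMP $4C4C loop
    RELEASING = auto()  # JMP ($FFFA) injected; vector reads pass through


class NmiSync:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.ext_nmi_latched = False

    def external_nmi(self) -> None:
        """Called on the external PPU's /NMI falling edge."""
        self.ext_nmi_latched = True

    def cpu_read(self, addr: int) -> Optional[int]:
        """Byte the FPGA drives for this CPU read, or None for no override."""
        if self.state is State.IDLE:
            if addr == 0xFFFA:  # internal PPU's NMI: vector fetch begins
                if self.ext_nmi_latched:
                    self.ext_nmi_latched = False
                    return None  # external NMI already seen: let it through
                self.state = State.HOLDING
                return 0x4C      # vector becomes $4C4C
            return None
        if self.state is State.HOLDING:
            # Swap in JMP ($FFFA) only at the loop's opcode fetch.
            if addr == 0x4C4C and self.ext_nmi_latched:
                self.ext_nmi_latched = False
                self.state = State.RELEASING
                return 0x6C      # JMP (abs) opcode
            return 0x4C          # keep the CPU in JMP $4C4C
        # RELEASING: operand bytes of JMP ($FFFA), then the ROM vector itself.
        if addr == 0x4C4D:
            return 0xFA
        if addr == 0x4C4E:
            return 0xFF
        if addr == 0xFFFB:
            self.state = State.IDLE  # vector fully read; handler runs normally
        return None
```

Because the CPU has already pushed its return state during the real NMI, the game's handler ends with RTI back to the interrupted code as usual.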

Because the internal and external PPUs work (almost) identically, the timing difference between them is probably only a few cycles per frame (caused by differences between the 21 MHz clocks), and it will be re-equalized at every NMI.

What do you think?

I used a similar approach to replace CPU reads from $4016/$4017 so that an external PS/2 keyboard could be used instead of joypads, and it worked:
https://www.youtube.com/watch?v=Z0_mR2494M0

Author:  supercat [ Wed Jun 26, 2019 12:52 pm ]
Post subject:  Re: Putting second PPU inside cartridge

krzysiobal wrote:
I have a crazy idea: a Game Genie-style adapter with a second (external) PPU. Why? It would make it possible to generate RGB video without any modification inside the console: you plug the cartridge into the adapter and the adapter into the console, and voila. The adapter would have a video output jack and maybe even joypad ports for connecting external joypads.


If you do an FPGA emulation of the PPU, I think you could make things even simpler, at least for games that never display sprites from corrupt OAM storage and that access palette entries straightforwardly. Have the cart start by running code to force both PPUs into the same known state, and then simply have the new PPU track the state of the PPU registers, palette, and OAM. If it's initialized the same way, and all operations on either are visible to both, it should need each byte of memory at the same moment the main PPU fetches it. Note that the FPGA-emulated PPU would only need to keep track of the main configuration and horizontal fine-scroll registers, the OAM, an "is this a palette access or not" flag, and the bottom five bits of OAMADDR. It wouldn't need to worry about computing its own addresses, because the main PPU would do that.
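As an illustration of tracking PPU state purely by snooping, here is a hypothetical Python sketch (`PpuShadow` is an illustrative name). It follows the standard register behavior: registers mirror every 8 bytes, $2005/$2006 share a write toggle that a $2002 read clears, $2007 advances the VRAM address by 1 or 32 depending on bit 2 of $2000, and palette entries $3F10/$3F14/$3F18/$3F1C mirror $3F00/$3F04/$3F08/$3F0C. Assumptions: $2007 is only used outside rendering, and the effect of $2000/$2005 writes on the VRAM address is ignored because a full $2006 write pair overwrites it.

```python
class PpuShadow:
    """Hypothetical PPU state rebuilt only from snooped CPU accesses."""

    def __init__(self) -> None:
        self.ctrl = 0                  # last $2000 write
        self.mask = 0                  # last $2001 write
        self.fine_x = 0                # low 3 bits of the first $2005 write
        self.oam_addr = 0              # OAMADDR ($2003)
        self.oam = bytearray(256)
        self.palette = bytearray(32)
        self.v = 0                     # VRAM address, kept only to spot palette accesses
        self.hi = 0                    # high byte latched by the first $2006 write
        self.w = False                 # shared $2005/$2006 write toggle

    def _step_v(self) -> None:
        self.v = (self.v + (32 if self.ctrl & 0x04 else 1)) & 0x7FFF

    def is_palette_access(self) -> bool:
        return (self.v & 0x3FFF) >= 0x3F00

    def cpu_read(self, addr: int) -> None:
        reg = addr & 7
        if reg == 2:
            self.w = False             # reading $2002 clears the write toggle
        elif reg == 7:
            self._step_v()

    def cpu_write(self, addr: int, value: int) -> None:
        reg = addr & 7
        if reg == 0:
            self.ctrl = value
        elif reg == 1:
            self.mask = value
        elif reg == 3:
            self.oam_addr = value
        elif reg == 4:                 # OAM DMA also reaches the PPU as $2004 writes
            self.oam[self.oam_addr] = value
            self.oam_addr = (self.oam_addr + 1) & 0xFF
        elif reg == 5:
            if not self.w:
                self.fine_x = value & 7
            self.w = not self.w
        elif reg == 6:
            if not self.w:
                self.hi = value & 0x3F
            else:
                self.v = (self.hi << 8) | value
            self.w = not self.w
        elif reg == 7:
            if self.is_palette_access():
                idx = self.v & 0x1F
                if (idx & 0x13) == 0x10:   # $3F10/14/18/1C mirror $3F00/04/08/0C
                    idx &= 0x0F
                self.palette[idx] = value
            self._step_v()
```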

The only difficulty I can see is that if the OAM becomes corrupt, there would be no way for a slave PPU to know what the internal PPU thinks the sprites' Y positions are. If the external PPU thinks that the only sprites on line 57 are sprites 8-15, it will expect the data bus to contain the data for those sprites during sprite fetches. If, however, the internal PPU thinks sprite 2 is there too, the data bus would contain data for sprites 2 and 8-14 during those cycles, and the slave PPU would have no way of knowing that such a thing had occurred. For most games, however, that really shouldn't be a problem.
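The mismatch in that example can be reproduced with a simplified model of sprite evaluation. This hypothetical Python sketch (`sprites_on_line` is an illustrative name) keeps the first 8 sprites in OAM order whose Y byte is in range of the evaluated line and ignores the hardware's sprite-overflow quirks; the sprite then appears on the following scanline.

```python
def sprites_on_line(oam: bytes, line: int, height: int = 8) -> list:
    """Simplified 2C02 sprite evaluation: indices of the first 8 sprites
    (in OAM order) with 0 <= line - Y < height. Overflow quirks ignored."""
    found = []
    for i in range(64):
        if 0 <= line - oam[4 * i] < height:
            found.append(i)
            if len(found) == 8:
                break
    return found


# External PPU's OAM: sprites 8-15 at Y = 50, everything else hidden at Y = $F0.
external = bytearray([0xF0] * 256)
for i in range(8, 16):
    external[4 * i] = 50

# Internal PPU's OAM: identical except that sprite 2's Y byte got corrupted to 50.
internal = bytearray(external)
internal[4 * 2] = 50

print(sprites_on_line(external, 57))  # sprites 8..15
print(sprites_on_line(internal, 57))  # sprite 2, then 8..14
```

The internal PPU would fetch pattern data for sprite 2 in the slot where the external one expects sprite 8, and every following slot shifts by one.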

Author:  Ben Boldt [ Wed Jun 26, 2019 3:17 pm ]
Post subject:  Re: Putting second PPU inside cartridge

I have an idea that I have really been tossing around lately. Since the built-in PPU does separate fetches for the pixels and for the palette selection of every 1x8-pixel sliver, theoretically you could build a cartridge that holds a fully rasterized buffer of the screen and feeds the PPU the right 1x8 chunks of data so that it draws that buffer on the actual screen, no longer limited by the 8x8 tiles available in the pattern table or by the 16x16 palette areas of the attribute table. Tom Murphy has actually already done this with a Raspberry Pi: he even ran an SNES emulator on the Raspberry Pi and displayed its output on the NES's screen this way, and, if I'm not mistaken, also read the controller data from the NES. Unfortunately, the Raspberry Pi's OS interferes with the responsiveness of the I/O pins, so the picture is somewhat glitchy, but it does work and completely proves the concept. There are videos on YouTube in which he explains how he did it and shows it running. He plays Super Mario World, the real one, on his NES, with everything fit into a cartridge. Check it out if you haven't seen it.
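Serving a rasterized buffer comes down to converting each 1x8 sliver of 2-bit pixel values into the two pattern-table bytes the PPU fetches for it: the low bitplane (bit 0 of each pixel) and the high bitplane (bit 1), with the leftmost pixel in the most significant bit. This is a minimal Python sketch of that conversion; `sliver_to_planes` is a hypothetical name, and choosing the palette for the sliver (the attribute fetch) is left out.

```python
def sliver_to_planes(pixels: list) -> tuple:
    """Encode 8 two-bit pixel values (0-3, leftmost first) as the NES
    pattern-table low and high bitplane bytes for one 1x8 sliver."""
    if len(pixels) != 8 or any(not 0 <= p <= 3 for p in pixels):
        raise ValueError("need exactly 8 pixel values in 0..3")
    lo = hi = 0
    for p in pixels:
        lo = (lo << 1) | (p & 1)   # bit 0 of each pixel -> low plane
        hi = (hi << 1) | (p >> 1)  # bit 1 of each pixel -> high plane
    return lo, hi


print([hex(b) for b in sliver_to_planes([0, 1, 2, 3, 0, 1, 2, 3])])
```

A cartridge would answer the low-plane fetch with the first byte and the high-plane fetch (8 bytes later in the tile) with the second.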

I think it is possible to do this glitch-free with a dual-core microcontroller and some supporting logic: one core dedicated to feeding the PPU, the other drawing whatever is on the screen. The video buffer would be memory shared between the cores, basically dual-port RAM. The supporting logic would trigger the microcontroller's input-change-notification interrupts and handle memory mapping.

Author:  lidnariq [ Wed Jun 26, 2019 4:52 pm ]
Post subject:  Re: Putting second PPU inside cartridge

Ben Boldt wrote:
Check it out if you haven't seen it.
Rasteri has the "right" way to do that, using an explicit USB FIFO instead of the RasPi's terrible GPIO. viewtopic.php?t=16807

Author:  Ben Boldt [ Wed Jun 26, 2019 7:47 pm ]
Post subject:  Re: Putting second PPU inside cartridge

Nice! So a Raspberry Pi is a legit way to do this, using the USB/parallel adapter. Cool.

I have more ideas about this, but I do not mean to steal this thread.

Author:  supercat [ Wed Jun 26, 2019 11:12 pm ]
Post subject:  Re: Putting second PPU inside cartridge

Ben Boldt wrote:
I think it is possible to do this glitchless with a dual-core microcontroller and some supporting logic. One core being dedicated to feeding the PPU, the other core draws whatever is on the screen. The video buffer being shared memory between the cores, basically dual-port RAM. The supporting logic would trigger input change notification interrupts of the microcontroller and handle memory mapping


I think a relatively modest single-core micro (e.g. an ARM Cortex-M0) could accomplish this without even needing a whole lot of I/O. Drive the CPU data bus through some 1K resistors, and use a 74HC373 or equivalent to drive the PPU data bus, enabling the CPU and PPU sides in response to (phi2 NAND R/W) and /PPURD, respectively. Additional inputs would be needed for CPU A0, /PPURD, and PPU A12. Data pins could be shared between the CPU and PPU by having the 6502 run a loop within WRAM whenever rendering is in progress.
On startup, the micro would begin by outputting something like $A9, except when it sees two odd addresses in a row, in which case it would output $4C. Once the CPU starts fetching the reset vector, the sequence of accesses would be:

FFFC - A9
FFFD - A9
A9A9 - 4C -- Second odd address in a row
A9AA - A9
A9AB - A9
A9A9 - 4C -- Second odd address in a row

The sequence even-odd-odd-even-odd-odd etc. would repeat indefinitely once the micro has established control of the 6502, but would be unlikely to repeat many times before that. Once control is established, the micro could feed a sequence of LDA/STA instructions to build a routine in RAM which would wait until rendering is done, poll the data bus for a "ready" signal, and then start interacting with the micro.
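The access sequence listed above can be checked with a tiny simulation. This hypothetical Python sketch (`capture_trace` is an illustrative name) models a micro that only sees CPU A0 and drives $4C when the current and previous addresses are both odd, otherwise $A9. Since only those two bytes ever appear on the bus, the CPU model needs just LDA #imm ($A9, 2 bytes) and JMP abs ($4C, 3 bytes). Like the listing, it starts at the reset-vector fetch.

```python
def capture_trace(n_reads: int) -> list:
    """(address, data) pairs seen from the reset-vector fetch onward."""
    trace = []
    prev_odd = False

    def bus_read(addr: int) -> int:
        nonlocal prev_odd
        odd = bool(addr & 1)
        data = 0x4C if (odd and prev_odd) else 0xA9  # the micro's only rule
        prev_odd = odd
        trace.append((addr, data))
        return data

    pc = bus_read(0xFFFC) | (bus_read(0xFFFD) << 8)  # reset vector
    while len(trace) < n_reads:
        opcode = bus_read(pc)
        if opcode == 0xA9:          # LDA #imm: one operand byte
            bus_read(pc + 1)
            pc += 2
        else:                       # JMP abs: two operand bytes
            pc = bus_read(pc + 1) | (bus_read(pc + 2) << 8)
    return trace


for addr, data in capture_trace(8):
    print(f"{addr:04X} - {data:02X}")
```

The first six entries match the listing (FFFC, FFFD, then A9A9/A9AA/A9AB repeating), and from there the trace loops forever.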

Returning to the subject of this thread: designing an on-cartridge PPU system for use with a particular game would be simple and straightforward, but even after further consideration, I still don't see any way of keeping an internal and an external PPU in sync in case of OAM corruption. What I don't know is how many games, if any, would be affected by such issues.

Author:  krzysiobal [ Thu Jun 27, 2019 2:31 am ]
Post subject:  Re: Putting second PPU inside cartridge

Why does it matter whether the external and internal PPUs are perfectly synced, even in what they are rendering?
The internal PPU will not have access to the cartridge's pattern tables, so it will output garbage, and, for example, the sprite 0 hit bit might be invalid.

All I need is for the internal and external PPUs to be synced at the moment they finish rendering a frame, so that both NMIs occur at (almost) the same time. Does corrupted OAM affect frame timing?

A few games (for example, those from Codemasters) count, in a loop, the number of CPU cycles between two frames (by polling bit 7 of $2002), and the high byte of that 16-bit count decides whether the game treats the console as PAL or NTSC.
With that approach, adding 4 cycles to every $2000-$2007 read might change the result significantly. So I think it is better to connect the external PPU's pins 2-9 (D7-D0) to the FPGA and let every read of $2000-$3fff trigger both PPUs. If the FPGA sees that the bytes returned by the two differ, it will inject an additional read cycle to replace the data byte with the correct one.
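The effect on such a detection loop is simple arithmetic. This Python sketch assumes a hypothetical 11-cycle polling loop with one $2002 read per pass (real games' loops differ) and uses frame lengths with rendering off: 341 x 262 dots at 3 dots per CPU cycle for NTSC, and 341 x 312 dots at 3.2 dots per CPU cycle for PAL. The extra 4 cycles per read are the figure from this thread.

```python
from fractions import Fraction

# CPU cycles per frame with rendering off (no skipped dot).
NTSC_CYCLES = Fraction(341 * 262, 3)      # 2C02: 3 dots per CPU cycle
PAL_CYCLES = Fraction(341 * 312 * 5, 16)  # 2C07: 3.2 dots per CPU cycle


def loop_count(frame_cycles: Fraction, loop_cycles: int) -> int:
    """Iterations of a fixed-length polling loop that fit in one frame."""
    return int(frame_cycles / loop_cycles)


for extra in (0, 4):
    loop = 11 + extra  # hypothetical 11-cycle loop, one $2002 read per pass
    for name, cycles in (("NTSC", NTSC_CYCLES), ("PAL", PAL_CYCLES)):
        n = loop_count(cycles, loop)
        print(f"{name} +{extra}: count=${n:04X}, high byte=${n >> 8:02X}")
```

With this loop the high byte is $0A on NTSC and $0B on PAL, but $07 and $08 with the 4 extra cycles, so a PAL console would land below the unmodified NTSC value and a threshold between $0A and $0B would misdetect it.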

Author:  supercat [ Thu Jun 27, 2019 9:00 am ]
Post subject:  Re: Putting second PPU inside cartridge

krzysiobal wrote:
Why it matters if external and internal PPU's are perfectly synced even in what's they rendering?
Internal PPU will not have access to the cartridge's pattern tables so it will output garbage, and for example - sprite 0 hit bit might be invalid.


The external PPU wouldn't need to care about the sprite 0 hit, since the CPU would get that information from the main PPU.

Quote:
What I just need is to make the internal and external PPUs to be synced in the time when they stop rendering frame, so that the NMI of both occurs at (almost) the same time. Does corrupted OAM affects frame timing?


The external PPU wouldn't need to do anything with the NMI, since the main CPU would receive an NMI from the main PPU.

Quote:
A few games (for example those from CodeMasters) calculate in loop number of CPU cycles between two frames (by polling $2002.7) and high byte of that 16 bit number decides whether game should treat console like PAL or NTSC.
In that approach, adding 4 cycles to every $2000-2007 read might significantly change the result. So I think that better is to connect external PPU pins 2..9 to FPGA and every read of $2000-$3fff should trigger both PPUs. And if FPGA sees that the data returned by both differs, it will "inject" additional read cycle to replace the data byte with correct one.


I'm not sure where the four-cycle notion comes from. I don't see any reason that anything the program does should be affected by the existence of the second PPU.

The reason the OAM matters is that at any moment in time, the only byte of PPU memory available to the second PPU will be the exact byte (if any) that is being accessed by the main PPU at that moment, and the main PPU will fetch data for those sprites that its OAM says should be displayed on that line.

Author:  Ben Boldt [ Fri Jun 28, 2019 6:08 am ]
Post subject:  Re: Putting second PPU inside cartridge

I would think there should be a somewhat simple way to recreate the PPU clock from the CPU clock so that you can keep a synchronized clock. Basically we normally have something like this:

Code:
   _____    _____    _____
__|     |__|     |__|     |__| CPU
   __    __    __    __    __
__|  |__|  |__|  |__|  |__|  | PPU

I think you could have this logic to recreate the PPU clock from CPU clock:
- Any time CPU clock has any edge, toggle the new PPU clock
- Any time CPU clock has wide pulse, add an extra toggle to the new PPU clock.

You could end up with something like this, which remains fully synchronized:
Code:
   _____    _____    _____
__|     |__|     |__|     |__| CPU
   ___   __     _    ___   __
__|   |_|  |___| |__|   |_|  | New PPU

Or maybe a better approach:
- Any time CPU clock has any edge, toggle the new PPU clock (same)
- Any time CPU clock has a rising edge (my assumption here, didn't check), trigger a delay which causes an extra toggle to the new PPU clock.

Then your new PPU clock isn't jittery anymore:
Code:
   _____    _____    _____
__|     |__|     |__|     |__| CPU
   __    __    __    __    __
__|  |__|  |__|  |__|  |__|  | New PPU

Also, if you recreate the PPU, you could increase or remove the sprite limit, and even offer it as an internal replacement PPU for use inside the Famicom, NES, PlayChoice-10, etc. We do have the Hi-Def NES, which is great, but it still requires the original PPU, is too expensive, too blow-up-y, and not fixable or supported thereafter.

Author:  krzysiobal [ Fri Jun 28, 2019 7:12 am ]
Post subject:  Re: Putting second PPU inside cartridge

I made a special PCB for testing purposes.

My first (lazy) approach, as supercat suggested (or did I misunderstand him?), was simply to send all CPU accesses to $2000-$3fff to both PPUs (and discard the value read from the external PPU).
This is the video I got from the external PPU.
As you can see, the vertical "artifact" area (which corresponds to the VBLANK moment) slowly marches up, due to the difference between the two crystal clocks.

https://www.youtube.com/watch?v=Ms2qYMMWiIs
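The marching speed follows directly from the crystal mismatch. This Python sketch assumes a hypothetical 50 ppm difference between the two crystals and an NTSC 2C02 frame of 341 x 262 dots with rendering off; the function names are illustrative.

```python
DOTS_PER_FRAME = 341 * 262  # NTSC 2C02, rendering off
DOTS_PER_LINE = 341


def drift_dots_per_frame(ppm: float) -> float:
    """Dots the external PPU gains or loses per frame for a given mismatch."""
    return DOTS_PER_FRAME * ppm / 1_000_000


def frames_per_scanline(ppm: float) -> float:
    """Frames until the VBLANK band moves by one scanline."""
    return DOTS_PER_LINE / drift_dots_per_frame(ppm)


print(round(drift_dots_per_frame(50), 2), round(frames_per_scanline(50), 1))
```

At 50 ppm the band moves about 4.5 dots per frame, i.e. one scanline roughly every 76 frames (a bit over a second), and it takes 1 / 50 ppm = 20000 frames to walk through a whole frame.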

Author:  tepples [ Fri Jun 28, 2019 7:28 am ]
Post subject:  Re: Putting second PPU inside cartridge

Would it work to multiply M2 by 12 using a PLL (or by 11 or 13 while waiting for genlock)?
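The numbers behind this suggestion, as a Python sketch: on NTSC the 2A03 divides the 21.477 MHz master clock by 12 to produce M2, and the 2C02 divides it by 4 per dot, so M2 x 12 recreates the master clock and gives 3 dots per CPU cycle. Running the external PPU at M2 x 11 or x 13 would make it lose or gain 1/12 of a frame per frame, so any phase error would close within 12 frames. `slip_per_frame` is a hypothetical helper, and the frame length assumes rendering is off.

```python
from fractions import Fraction

MASTER_HZ = Fraction(236_250_000, 11)  # NTSC master clock, about 21.477 MHz
M2_HZ = MASTER_HZ / 12                 # 2A03 CPU clock
DOT_HZ = MASTER_HZ / 4                 # 2C02 dot clock
DOTS_PER_FRAME = 341 * 262             # rendering off


def slip_per_frame(multiplier: int) -> Fraction:
    """Dots the external PPU gains (+) or loses (-) per internal frame when
    its clock is M2 x multiplier instead of M2 x 12."""
    return Fraction(DOTS_PER_FRAME * (multiplier - 12), 12)


assert M2_HZ * 12 == MASTER_HZ and DOT_HZ / M2_HZ == 3
print(float(M2_HZ), float(slip_per_frame(13)), float(slip_per_frame(11)))
```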

Author:  krzysiobal [ Fri Jun 28, 2019 10:27 am ]
Post subject:  Re: Putting second PPU inside cartridge

A PLL should help keep both PPUs running at the same speed, but the clock must be exactly the same to make it work indefinitely.

As shown in the video, after restarting the console (soft reset) or powering it down and up for short periods (<10 s), the artifact is still in the center; it disappears only after a long power-down (>10 s).

This makes me think that, from the very first clock, some counter inside the PPU drives the frame timing, and the moment when it generates VBLANK is deterministic, not dependent on disabling and enabling rendering.

I must implement the method described in the first post to sync the two PPUs ("when the internal one generates VBLANK but the external one does not, program code must wait").

Author:  lidnariq [ Fri Jun 28, 2019 10:39 am ]
Post subject:  Re: Putting second PPU inside cartridge

krzysiobal wrote:
It makes me think that since the very first clock, there is some counter inside PPU that clocks the frame timing and moment when it generates VBLANK is deterministic, not dependent on disabling and enabling rendering
On the 2C03,4,5,7 that's correct; frame timing is a function of when the PPU is released from reset only. There's no missing dot to slip against.

On the 2C02, however, we know that the missing dot depends on rendering being enabled at the start of the first visible scanline, which is after a scanline's worth of fetches.
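The "missing dot" can be illustrated with a simplified Python model: a 2C02 frame is 341 x 262 dots, and odd frames are one dot shorter when rendering is enabled. The helper names are hypothetical, and the exact point in the frame where the PPU samples the rendering bit is not modeled.

```python
DOTS_PER_FRAME = 341 * 262  # NTSC 2C02 frame without the skipped dot


def frame_dots(frame_number: int, rendering: bool) -> int:
    """Simplified 2C02: odd frames are one dot shorter with rendering on."""
    if rendering and frame_number % 2 == 1:
        return DOTS_PER_FRAME - 1
    return DOTS_PER_FRAME


def dots_after(frames: int, rendering: bool) -> int:
    return sum(frame_dots(n, rendering) for n in range(frames))


# A 2C02 with rendering on slips one dot every two frames relative to one
# with rendering off; a PPU without the skipped dot cannot slip this way.
print(dots_after(10, False) - dots_after(10, True))
```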

Author:  supercat [ Fri Jun 28, 2019 3:46 pm ]
Post subject:  Re: Putting second PPU inside cartridge

lidnariq wrote:
krzysiobal wrote:
It makes me think that since the very first clock, there is some counter inside PPU that clocks the frame timing and moment when it generates VBLANK is deterministic, not dependent on disabling and enabling rendering
On the 2C03,4,5,7 that's correct; frame timing is a function of when the PPU is released from reset only. There's no missing dot to slip against.

On the 2C02, however, we know that the missing dot depends on rendering being enabled at the start of the first visible scanline, which is after a scanline's worth of fetches.


The clock fed to the external PPU will need to run exactly one cycle for every cycle of the internal one, except that on startup, if the two PPUs don't start in sync with each other (a normal scenario), it may be necessary to have the slave PPU's clock input omit pulses until both PPUs are in sync. Have the cart start by running its own code until the two PPUs are in sync, at which point it can enable the external ROM and start execution from there.
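Omitting pulses works because the slave stands still while the master advances. This Python sketch assumes the phase offset between the two PPUs is already known (measuring it is out of scope) and that both run a fixed-length frame, e.g. 4 x 341 x 262 master-clock pulses with rendering off; `pulses_to_omit` and `run` are hypothetical names.

```python
FRAME_PULSES = 4 * 341 * 262  # master clocks per 2C02 frame, rendering off


def pulses_to_omit(master_pos: int, slave_pos: int, period: int) -> int:
    """Pulses to withhold from the slave so it falls back into phase."""
    return (slave_pos - master_pos) % period


def run(master_pos: int, slave_pos: int, period: int, pulses: int) -> tuple:
    """Clock both PPUs, swallowing pulses to the slave until they align."""
    omit = pulses_to_omit(master_pos, slave_pos, period)
    for _ in range(pulses):
        master_pos = (master_pos + 1) % period
        if omit:
            omit -= 1  # pulse withheld: the slave does not advance
        else:
            slave_pos = (slave_pos + 1) % period
    return master_pos, slave_pos


print(run(100, 150, FRAME_PULSES, 60))
```

In the worst case the slave is held for almost a whole frame, which fits the idea of running startup code until the two PPUs are in sync.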

All times are UTC - 7 hours