PostPosted: Mon May 04, 2015 12:50 am

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
This is just an idea I had.

Each PPU fetch takes 2 PPU cycles (the actual memory access occurring on the second cycle), and each CPU cycle spans 3 PPU cycles.

This means that within each CPU cycle there is always at least 1 PPU cycle where the PPU is not accessing the RAM, and during that time the access can go to the CPU instead.

During the phase where the PPU is not fetching, the RAM's address and data lines can be routed to the CPU instead. If the CPU wants to write, the data can pass straight through. If the CPU wants to read, however, the byte fetched from the RAM would need to go into a latch, which would then feed the CPU's data lines. This is because the byte has to be held on the CPU's data lines until the CPU finishes its read, and a PPU fetch in the meantime would disrupt that.

This would hypothetically allow one RAM chip to serve both the PPU and the CPU. The obvious advantage is that the memory is accessible from both busses simultaneously: you can prepare some graphics data in the CPU's address space and then bank it into the PPU's address space, freeing you from vblank bandwidth limits and letting you prepare offscreen buffers.

The requirement would be RAM with a fast enough access time to keep up with the PPU's clock rate, since the address would change every PPU cycle, alternating between the CPU's and the PPU's busses.
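
To make the timing concrete, here's a rough software model of the interleave (plain C, just to illustrate; the slot numbering and the assumption that the PPU's actual access always lands on the second cycle of each fetch pair are simplifications):

Code:
#include <stdio.h>

/* Toy model: 1 CPU cycle = 3 PPU cycles, and each 2-cycle PPU fetch
 * only touches the RAM on its second cycle, so the other cycle is
 * free to serve the CPU.                                            */
int main(void)
{
    int ppu_cycle;
    for (ppu_cycle = 0; ppu_cycle < 12; ppu_cycle++) {
        int cpu_cycle = ppu_cycle / 3;   /* which CPU cycle we're in */
        int ppu_slot  = ppu_cycle % 2;   /* 1 = PPU's actual access  */
        printf("PPU cycle %2d (CPU cycle %d): RAM serves the %s\n",
               ppu_cycle, cpu_cycle, ppu_slot ? "PPU" : "CPU");
    }
    return 0;
}

Running it shows that every group of 3 PPU cycles contains at least one slot that can serve the CPU, which is the property I'm relying on.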

Are there any reasons why this wouldn't work? If not, I was going to draft it into my mapper design. 32 KB of memory usable as pattern tables, nametables, extended attributes, and general-purpose WRAM, together with vblank-agnostic video updates, would be an incredibly powerful feature.


PostPosted: Mon May 04, 2015 1:57 am
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3488
Location: Indianapolis
I remember playing around with that idea a while back; the trouble for me was that I didn't know how to tell which cycle the PPU is on. There's the PPU's ALE signal, but that doesn't come out to the cartridge. I guess some kind of delay could be derived from the PPU /RD or /WR signal, but I'm not sure of the best way to do that. Or do you have another way in mind?


PostPosted: Mon May 04, 2015 5:53 am

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10165
Location: Rio de Janeiro - Brazil
This is a great idea! I, as a programmer, would love to have a big chunk of VRAM that could also be used as WRAM in any way I saw fit. I can't contribute to the hardware design at all though, since that's not my area of expertise, so sorry! =)


PostPosted: Mon May 04, 2015 8:14 am

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5898
Location: Canada
Isn't this what dual port RAM is for? Why do you need to get funky with the PPU timing?


PostPosted: Mon May 04, 2015 8:37 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19348
Location: NE Indiana, USA (NTSC)
I think this is an attempt to provide a dual-port-like front-end to "normal" 6264/62256 SRAM, giving the CPU access while PPU /RD is high (inactive). It sort of reminds me of the multiplexing of memory between video and CPU in the Commodore 64 and Apple II. The only dual-port RAM that I've seen used in NES Game Paks is the ExRAM in the MMC5. What other dual-port RAM is affordable?


PostPosted: Mon May 04, 2015 9:55 am

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
Yeah, it's an attempt to multiplex "normal" RAM which, as tepples said, is more easily available than dual-port RAM. Also, on other systems where the CPU and the video hardware share the same RAM, accessing the video memory during rendering can cause "snow" artifacts on screen; my method shouldn't have those glitches, since only one device accesses the RAM at a time.

From this information on the wiki, specifically "During this cycle, the value is read from or written to the lower eight address pins", I'm led to believe that there's a possibility that during the first cycle, neither the /RD nor /WR pins are asserted, since the PPU is outputting a garbage address. Performing a write at that time would be destructive, and performing a read would result in bus conflicts, as the PPU is trying to latch the lower 8 bits of the address from pins that are multiplexed as both address and data lines. I have no idea if this is true, though.


PostPosted: Mon May 04, 2015 10:19 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19348
Location: NE Indiana, USA (NTSC)
In any case, you'll have to isolate the PPU address bus from the memory while the CPU is accessing the memory and vice versa. This would probably require a buffer IC as big as the MMC5.


PostPosted: Mon May 04, 2015 11:09 am

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
tepples wrote:
In any case, you'll have to isolate the PPU address bus from the memory while the CPU is accessing the memory and vice versa. This would probably require a buffer IC as big as the MMC5.

A data selector wouldn't work? Both busses go into the selector, and a select pin determines which bus gets connected to the RAM. When either PPU /RD or PPU /WR is asserted, the PPU's bus is selected, with the CPU's bus selected otherwise.

Granted, the mapper's banking pins would have to be routed to the selector as well, so it'd be a lot of pins, but it's not a hugely complicated circuit.
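
In code terms, the select condition I have in mind is just this (signal names are for illustration only, with the strobes active-low):

Code:
/* 1 = connect the PPU's bus to the RAM, 0 = connect the CPU's bus. */
int select_ppu_bus(int ppu_rd_n, int ppu_wr_n)
{
    return !ppu_rd_n || !ppu_wr_n;   /* PPU reading or writing */
}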


PostPosted: Mon May 04, 2015 11:58 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19348
Location: NE Indiana, USA (NTSC)
What you mean by "data selector" is what I meant by "buffer IC". The address lines alone would need 39 pins (13 from the CPU, 13 from the PPU, 13 out to the RAM), plus a bunch more for select lines. It would also have to latch the data bus in order to satisfy the CPU's setup and hold timing when it services a CPU read close to a PPU read. So with the majority of the cart bus as well as all of the RAM signals, you're looking at a minimum of close to 80 pins.
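
For what it's worth, one way a count like that could break down (a rough tally for illustration, not an exact spec):

Code:
/* Rough pin tally for the mux/latch IC described above: */
#define ADDR_PINS  (13 + 13 + 13)   /* CPU A0-A12, PPU A0-A12, RAM A0-A12 */
#define DATA_PINS  (8 + 8 + 8)      /* CPU D0-D7, PPU D0-D7, RAM D0-D7    */
#define CTRL_PINS  10               /* M2, R/W, /ROMSEL, PPU strobes, RAM /CE /OE /WE, banking */
#define POWER_PINS 4                /* VCC/GND */
/* ADDR_PINS + DATA_PINS + CTRL_PINS + POWER_PINS = 39 + 24 + 10 + 4 = 77 */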


PostPosted: Mon May 04, 2015 3:27 pm
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3488
Location: Indianapolis
Multiplexing the I/O is the easy part. To expand on what I posted earlier, it seems like the real trick is in how the control signals for the memory will be generated. PPU /RD and /WR are easy; the PPU side is faster and will have to take priority, but the /RD, /WR, and /CE pulses for the CPU accesses will have to come from somewhere. I can see how to do it with an FPGA or a CPLD with a counter in hardware: you'd have a fast clock input and could arbitrate the memory access and generate the control signals based on periods of that clock. I've done a little planning on a mapper that would work like this. Before I started using programmable logic, I used to try to figure out how to make this work with 74HC parts, but I was never able to without bringing in some kind of expensive hardware to help.
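
Roughly what I have in mind, as a software model (the slot count and the CPU window position are placeholders, not measured timing):

Code:
/* CPLD-style arbiter sketch: a counter running off a fast clock divides
 * each M2 (CPU) cycle into slots, and CPU-side RAM strobes are only
 * generated inside a window where the PPU isn't using the RAM.         */
#define SLOTS_PER_CPU_CYCLE 12   /* e.g. master-clock periods per M2 */
#define CPU_WINDOW_START     8   /* placeholder */
#define CPU_WINDOW_END      11   /* placeholder */

struct strobes { int ram_ce_n, ram_oe_n, ram_we_n; };   /* active low */

struct strobes cpu_side_strobes(int slot, int cpu_rw, int cpu_in_wram)
{
    struct strobes s = { 1, 1, 1 };            /* all deasserted */
    int in_window = (slot >= CPU_WINDOW_START && slot <= CPU_WINDOW_END);
    if (in_window && cpu_in_wram) {
        s.ram_ce_n = 0;                        /* chip enable      */
        if (cpu_rw) s.ram_oe_n = 0;            /* CPU read  -> /OE */
        else        s.ram_we_n = 0;            /* CPU write -> /WE */
    }
    return s;
}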

The NES CPU memory can be really slow; a Game Genie I modified is running my code from a 450 ns EPROM, and it seems to work OK at that speed.

On a related note, it's worth considering that the CPU writing to this memory directly is slower than writing through the $2007 port: now you need addressing modes, you lose the auto-increment, etc. But OTOH, writes to VRAM are generally being pre-buffered in RAM by the CPU anyway.


PostPosted: Mon May 04, 2015 3:49 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19348
Location: NE Indiana, USA (NTSC)
Pre-buffering works if you don't need random access. But for anything resembling LZ77 compression, you need random access if you're going to be using back-references longer than 256 bytes. You also need random access if you're planning to store parts of the game state in unused parts of VRAM.


PostPosted: Mon May 04, 2015 7:24 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10165
Location: Rio de Janeiro - Brazil
tepples wrote:
But for anything resembling LZ77 compression, you need random access if you're going to be using back-references longer than 256 bytes.

Why are you mentioning this arbitrary amount of memory as if it were the absolute maximum anyone would dedicate to this purpose?

I don't see why someone with less game state to keep track of couldn't decide to use 512 or more bytes for their LZ buffer. On the other end, another programmer might have so little free RAM that he can't even spare 64 bytes.

And there's also the actual decoding process to consider... if speed isn't a concern, one can very well read and write through $2006/$2007, setting the address for every byte if necessary (I did this once, I think), but using a small buffer to copy strings isn't out of the question.
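
For illustration, here's a minimal sketch of that "set the address for every byte" approach as cc65-style C (the helper and its name are made up; $2002/$2006/$2007 are the standard PPU ports, and this has to run during vblank or forced blank):

Code:
/* Copy len bytes from VRAM address src to VRAM address dst, one byte
 * at a time, entirely through the PPU ports.  Slow, but it needs no
 * CPU-side buffer at all.                                            */
#define PPUSTATUS (*(volatile unsigned char *)0x2002)
#define PPUADDR   (*(volatile unsigned char *)0x2006)
#define PPUDATA   (*(volatile unsigned char *)0x2007)

void vram_copy_backref(unsigned int dst, unsigned int src, unsigned char len)
{
    unsigned char b;
    b = PPUSTATUS;                          /* reading $2002 resets the $2006 write latch */
    while (len--) {
        PPUADDR = (unsigned char)(src >> 8);     /* point at the source byte */
        PPUADDR = (unsigned char)(src & 0xFF);
        b = PPUDATA;                        /* dummy read primes the read buffer */
        b = PPUDATA;                        /* actual byte at src                */
        PPUADDR = (unsigned char)(dst >> 8);     /* point at the destination */
        PPUADDR = (unsigned char)(dst & 0xFF);
        PPUDATA = b;
        ++src;
        ++dst;
    }
}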


PostPosted: Tue May 05, 2015 12:42 am

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
Here's the logic in English; I'm not in the right frame of mind right now to figure out the physical connections for these, though:

Select the PPU's address bus only when the PPU is reading or writing (i.e. when either PPU /RD or PPU /WR is asserted low), or when the CPU is accessing something outside of $6000-$7FFF. Select the CPU's address bus otherwise. The currently selected address bus determines whose turn it is.

Enable the CPU's outbound buffer only when the CPU is writing during its turn. Tri-state it otherwise.
Enable the CPU's latch only when the CPU is reading from WRAM, regardless of turn. Tri-state it otherwise.
Clock the CPU's latch only when the CPU is reading during its turn.

Enable the PPU's outbound buffer only when the PPU is writing. Tri-state it otherwise.
There's no situation in which you'd need to block inbound data coming to the PPU, but in case you wanted to be safe:
Enable the PPU's inbound buffer only when the PPU is reading. Tri-state it otherwise.

Edit: Sorry, I forgot to mention: in order to control chip communications on bidirectional busses, you'd need two buffers per chip: one for incoming data and one for outgoing data. Having just one buffer would mean data could go one way but not the other. So, for example, the CPU would have a buffer with RAM_D0-D7 connected to its inputs and CPU_D0-D7 connected to its outputs, and also a buffer with CPU_D0-D7 at its inputs and RAM_D0-D7 at its outputs. One buffer regulates the "outbound" data and the other regulates the "inbound" data (and in my case, the CPU's inbound buffer is actually a latch).
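
And here's the same thing restated as boolean expressions, just as a sanity check (the signal names are mine; the _n inputs are active-low, and cpu_rw is the 6502's R/W line, high for a read):

Code:
/* ppu_rd_n, ppu_wr_n : PPU /RD and /WR
 * cpu_rw             : 6502 R/W (1 = read, 0 = write)
 * cpu_in_wram        : 1 if the CPU address is in $6000-$7FFF */

int sel_ppu(int ppu_rd_n, int ppu_wr_n, int cpu_in_wram)
{
    /* PPU bus gets the RAM when the PPU is reading or writing, or
       when the CPU isn't talking to this RAM at all.            */
    return !ppu_rd_n || !ppu_wr_n || !cpu_in_wram;
}

int cpu_out_buf_en(int ppu_turn, int cpu_rw)        /* CPU -> RAM buffer   */
{
    return !ppu_turn && !cpu_rw;                    /* CPU write, its turn */
}

int cpu_latch_out_en(int cpu_in_wram, int cpu_rw)   /* latch -> CPU bus    */
{
    return cpu_in_wram && cpu_rw;                   /* CPU read from WRAM  */
}

int cpu_latch_clk(int ppu_turn, int cpu_rw)         /* capture RAM data    */
{
    return !ppu_turn && cpu_rw;                     /* CPU read, its turn  */
}

int ppu_out_buf_en(int ppu_wr_n)                    /* PPU -> RAM buffer   */
{
    return !ppu_wr_n;                               /* PPU write           */
}

(Here ppu_turn is just the output of sel_ppu for the current slot.)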


Last edited by Drag on Thu May 07, 2015 1:19 am, edited 1 time in total.

PostPosted: Wed May 06, 2015 3:24 pm

Joined: Sat Jul 12, 2014 3:04 pm
Posts: 950
Question is, can the 2A03 correctly receive data at these alternating access slots? The timing tolerances of the cart RAM can certainly be changed by selecting a different part, but the CPU's, not so much.


PostPosted: Wed May 06, 2015 5:16 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19348
Location: NE Indiana, USA (NTSC)
The buffer chip could latch the 8-bit data read back from the memory, much like registered memory, and ensure that it's stable on the CPU data bus by the time M2 is about to fall. Show me a logic analyzer trace of M2 and PPU /RD, and I'll try to clarify how it might work.

