minimum CPU memory memory load

Minimum randomly-accessible memory

32kB: 5 votes (33%)
64kB: 1 vote (7%)
128kB: 5 votes (33%)
256kB: 2 votes (13%)
512kB: 1 vote (7%)
>512kB: 1 vote (7%)

Total votes: 15

Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Post by Memblers »

Yeah, it'd be a shame to let the CHR features of the mapper be held back by a small memory. Today I was considering that since the mapper is 3.3V and is controlling so much of the CHR addresses already, it might not be much of a stretch to just level translate the remaining bits to 3.3V, and that might widen my options a bit. This would eat up a bit more of the CPLD too, so I'll see what can be done to strike a good balance.

tepples: In this case I'll be attempting to actually boot directly from the mapper, I've created sort of a highly optimized IPL ROM. When I was working on my NES to Playchoice adapter, I found that CPLDs can be pretty decent at being a ROM. It doesn't use up macrocells, but rather the "product terms" which are relatively plentiful. It becomes a lot of AND gates and some inverters.
Hamtaro126
Posts: 818
Joined: Thu Jan 19, 2006 5:08 pm

Post by Hamtaro126 »

128k is enough, RAM is lacking on most current homebrews and hacks as is...

This and an increase in CHR-RAM (compared to the default 8k of FDS) will suffice!
AKA SmilyMZX/AtariHacker.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

I think most of what I could come up with could be squeezed into 64K PRG RAM, 32K CHR RAM, and a large flash memory with sequential byte access. You said it'd take 5 or 6 writes to start reading bytes from the SPI flash; that's roughly comparable to MMC1's switching time. But is this a byte address or a 512-byte address?

And how many writes would it take to start writing bytes, such as if I'm devoting the last few kilobytes to saved games?
Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Post by Memblers »

It's a NOR flash, I'm looking at Winbond W25Q series in particular. It does address individual bytes, for reading and writing.

Not tested, maybe one more step than I thought before, but selecting the read address will look like this:
1: clear chip enable
2: set chip enable
3: send read command
4: send address
5: send address
6: send address
7: read data, discard
8: read data
8+n: read data+n

That's something I didn't mention, the buffer is shared between sending and receiving, so much like reading the PPU $2007 register, you have to read it once to fill the pipeline.

Writing is pretty much the same process, except you clear the chip enable to begin the write process after you've sent the data. It has a 256 byte programming buffer, so you can write 1~256 bytes in one go.
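
The read sequence above can be sketched as a small simulation. Everything named here is hypothetical: `SpiFlashModel`, `MapperSpiPort`, and their methods are invented for illustration and are not the mapper's actual register interface. The only details taken from the posts are the order of the steps, the three address bytes, and the need to discard the first data read; 0x03 is the standard W25Q "Read Data" opcode.

```python
class SpiFlashModel:
    """Tiny behavioural model of a byte-addressable SPI NOR flash (read path only)."""
    READ_CMD = 0x03  # standard "Read Data" opcode on the W25Q series

    def __init__(self, contents: bytes):
        self.contents = contents
        self.selected = False
        self.received = []  # command + address bytes seen since chip select
        self.addr = 0

    def select(self, active: bool) -> None:
        self.selected = active
        if active:
            self.received = []

    def transfer(self, out_byte: int) -> int:
        """Exchange one byte and return what the flash shifts out."""
        if not self.selected:
            return 0xFF
        if len(self.received) < 4:
            self.received.append(out_byte)
            if len(self.received) == 4:
                a2, a1, a0 = self.received[1:]
                self.addr = (a2 << 16) | (a1 << 8) | a0
            return 0xFF  # nothing meaningful comes back during command/address
        if self.received[0] != self.READ_CMD:
            return 0xFF
        value = self.contents[self.addr % len(self.contents)]
        self.addr += 1
        return value


class MapperSpiPort:
    """Hypothetical data port whose single buffer is shared by send and receive."""

    def __init__(self, flash: SpiFlashModel):
        self.flash = flash
        self.buffer = 0xFF

    def set_chip_enable(self, active: bool) -> None:
        self.flash.select(active)

    def write(self, value: int) -> None:
        # Sending a byte overwrites the buffer with whatever came back.
        self.buffer = self.flash.transfer(value)

    def read(self) -> int:
        # Return the buffered byte, then fetch the next one (like PPU $2007).
        value = self.buffer
        self.buffer = self.flash.transfer(0x00)
        return value


def read_bytes(port: MapperSpiPort, address: int, count: int) -> bytes:
    port.set_chip_enable(False)          # 1: clear chip enable
    port.set_chip_enable(True)           # 2: set chip enable
    port.write(SpiFlashModel.READ_CMD)   # 3: send read command
    port.write((address >> 16) & 0xFF)   # 4: send address (high byte)
    port.write((address >> 8) & 0xFF)    # 5: send address (middle byte)
    port.write(address & 0xFF)           # 6: send address (low byte)
    port.read()                          # 7: read data, discard (fills the pipeline)
    return bytes(port.read() for _ in range(count))  # 8..8+n: read data
```

In this model the discarded read in step 7 returns whatever was left in the buffer by the last address byte, which is why it has to be thrown away.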
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Memblers wrote:It does address individual bytes, for reading and writing. [...] the buffer is shared between sending and receiving, so much like reading the PPU $2007 register, you have to read it once to fill the pipeline.
Will it have the same problem as $2007 where I have to disable DMC to read from it or risk byte deletion? (Case in point: LAN Master restore bug)
Memblers wrote: Writing is pretty much the same process, except you clear the chip enable to begin the write process after you've sent the data. It has a 256 byte programming buffer, so you can write 1~256 bytes in one go.
As far as I can tell from Googling Winbond W25Q, erases on that series are in units of 4096 bytes. Is that right? 4K erases and 1-256 byte writes sound reasonable for a log-structured file system.
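
To make the 4K-erase / 256-byte-write granularity concrete, here is a minimal sketch of how a save routine might split one logical write into page-sized program operations. The function names are hypothetical, and the sketch assumes that a single program operation must not cross a 256-byte page boundary, which is how I understand the W25Q page program command to behave; check the datasheet before relying on it.

```python
PAGE_SIZE = 256     # programming buffer size mentioned in the thread
SECTOR_SIZE = 4096  # smallest erase unit mentioned in the thread


def split_into_page_writes(address: int, data: bytes):
    """Split one logical write into (address, chunk) pairs, each inside one page."""
    chunks = []
    offset = 0
    while offset < len(data):
        # Bytes left before the next 256-byte page boundary.
        room = PAGE_SIZE - ((address + offset) % PAGE_SIZE)
        chunk = data[offset:offset + room]
        chunks.append((address + offset, chunk))
        offset += len(chunk)
    return chunks


def sectors_touched(address: int, length: int):
    """Return the indices of the 4 KiB sectors that a write of `length` bytes lands in."""
    if length <= 0:
        return []
    first = address // SECTOR_SIZE
    last = (address + length - 1) // SECTOR_SIZE
    return list(range(first, last + 1))
```

A log-structured layout would only ever append with `split_into_page_writes` and erase a whole sector once all of its records are obsolete, so the erase size matters far less often than the page size.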

So we have PCB, CPLD, CIC, PRG RAM, CHR RAM, flash, and assembly. How much are we looking at again? And will it be programmed through the USB to controller 2 cable, or some other way?
Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Post by Memblers »

tepples wrote:
Memblers wrote:It does address individual bytes, for reading and writing. [...] the buffer is shared between sending and receiving, so much like reading the PPU $2007 register, you have to read it once to fill the pipeline.
Will it have the same problem as $2007 where I have to disable DMC to read from it or risk byte deletion? (Case in point: LAN Master restore bug)
Ooh, that's a good question. It's probably safe to say it will have that same problem. With this design I'm going to try using (M2 && delayed M2) for read/write, hoping to avoid the bus conflict problems seen on the first rev PowerPak and my old EPROM emulator. That will affect the timing, but I don't know if it will affect that bug or not.
tepples wrote: As far as I can tell from Googling Winbond W25Q, erases on that series are in units of 4096 bytes. Is that right? 4K erases and 1-256 byte writes sound reasonable for a log-structured file system.

So we have PCB, CPLD, CIC, PRG RAM, CHR RAM, flash, and assembly. How much are we looking at again? And will it be programmed through the USB to controller 2 cable, or some other way?
I think some of the W25Q's have multiple sector sizes, but 4kB sounds about right for the smallest sector.

For cost estimate, I think this can be made for about $4 more than GTROM. $14~$15 range is what I'd aim for in production. The first builds of anything will always cost a little more, though. With this being an experimental mapper, I'm going to assemble 100 by hand (toaster oven reflow), and hopefully soon find out if a revision and/or factory order is needed.

It will be programmed over the controller port adapter. The required bootloader would be written to it the same way GTROM carts are done, with a modified Game Genie. I'm also experimenting with making an insanely fast stand-alone programmer for the carts, mostly for fun but it could become useful.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Memblers wrote:
tepples wrote:
Memblers wrote:It does address individual bytes, for reading and writing. [...] the buffer is shared between sending and receiving, so much like reading the PPU $2007 register, you have to read it once to fill the pipeline.
Will it have the same problem as $2007 where I have to disable DMC to read from it or risk byte deletion? (Case in point: LAN Master restore bug)
Ooh, that's a good question. It's probably safe to say it will have that same problem.
Which I guess shoots down the dream of using it to stream vocals, or to stream anything in a game that uses sampled drums or sampled sound effects. Would it take a lot of CPLD resources to ignore the double accesses that cause this artifact, in the same way that the MMC3 ignores nearby clocks on PA12? You're already counting to eight cycles to access the SPI a byte at a time; perhaps you could just ignore multiple reads or writes of the data port while that circuit is busy. Because once this is available, I plan to update the rhythm game section of "Limitations" on the wiki to reflect it. "There's a tradeoff: you can have megabytes of flash memory, but available mappers won't let you access it if you're playing samples."
Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Post by Memblers »

You're way ahead of me on the problem and the solution. It should be simple to check for that, so the SPI shifter won't be retriggered while it's busy, and that would prevent the read address from going out of alignment. Assuming the 'real' memory read comes before the fake one. If it only reads memory we're OK, but if the extra reads affect the accumulator, then I'd have to buffer the last received byte, and that would cost 8 macrocells.
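
A rough behavioural sketch of that busy check, written as a simulation rather than CPLD logic. All names are hypothetical, and several things are assumed rather than known: one transfer takes 8 CPU cycles, the 'real' read comes first, the value seen by the spurious read does not matter, and an ungated shifter simply restarts its transfer (skipping a byte) when read again while busy.

```python
class SpiDataPort:
    TRANSFER_CYCLES = 8  # assumption: one SPI bit per CPU cycle

    def __init__(self, stream, gated: bool):
        self.stream = iter(stream)
        self.gated = gated
        self.buffer = next(self.stream)  # pipeline already primed
        self.pending = None
        self.busy = 0                    # cycles left in the current transfer

    def tick(self, cycles: int = 1) -> None:
        if self.busy:
            self.busy = max(0, self.busy - cycles)
            if self.busy == 0:
                self.buffer = self.pending  # transfer finished

    def read(self):
        if self.busy:
            if not self.gated:
                # Ungated: the read retriggers the shifter and a byte is skipped.
                self.pending = next(self.stream)
                self.busy = self.TRANSFER_CYCLES
            # Gated: the read is ignored. Either way the value returned
            # mid-transfer is not meaningful in this model.
            return None
        value = self.buffer
        self.pending = next(self.stream)
        self.busy = self.TRANSFER_CYCLES
        return value


def stream_with_dmc_glitch(gated: bool, glitch_on_read: int = 2, count: int = 5):
    """Read `count` bytes, with one spurious duplicate read one cycle after a real one."""
    port = SpiDataPort(range(100), gated)
    out = []
    for i in range(count):
        out.append(port.read())       # the 'real' read, assumed to come first
        if i == glitch_on_read:
            port.tick(1)
            port.read()               # duplicate read caused by the DMC fetch
        port.tick(SpiDataPort.TRANSFER_CYCLES)
    return out
```

With gating the stream stays aligned; without it the byte after the glitch goes missing. If it turns out the CPU keeps the second read instead, this gate alone is not enough and the last byte has to be buffered, as noted above.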
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

Memblers wrote:Assuming the 'real' memory read comes before the fake one.
Anyone have their head wrapped around the DMC read conflict enough to answer this?

Having a hard time extracting the info from the subject on the wiki:
Conflict with controller and PPU read

On the NTSC NES and Famicom, if a new sample byte is fetched from memory at the same time the program is reading the controller through $4016/4017, a conflict occurs corrupting the data read from the controller. Programs which use DPCM sample playback will normally use a redundant controller read routine to work around this defect.

A similar problem occurs when reading data from the PPU through $2007, or polling $2002 for vblank.
Likely internal implementation of the read

The following is speculation, and thus not necessarily 100% accurate. It does accurately predict observed behavior.

The 6502 cannot be pulled off of the bus normally. The 2A03 DMC gets around this by pulling RDY low internally. This causes the CPU to pause during the next read cycle, until RDY goes high again. The DMC unit holds RDY low for 4 cycles. The first three cycles it idles, as the CPU could have just started an interrupt cycle, and thus be writing for 3 consecutive cycles (and thus ignoring RDY). On the fourth cycle, the DMC unit drives the next sample address onto the address lines, and reads that byte from memory. It then drives RDY high again, and the CPU picks up where it left off.

This matters because on NTSC NES and Famicom, it can interfere with the expected operation of any register where reads have a side effect: the controller registers ($4016 and $4017), reads of the PPU status register ($2002), and reads of VRAM/VROM data ($2007) if they happen to occur in the same cycle that the DMC unit pulls RDY low.

For the controller registers, this can cause an extra rising clock edge to occur, and thus shift an extra bit out. For the others, the PPU will see multiple reads, which will cause extra increments of the address latches, or clear the vblank flag.

This problem has been fixed on the 2A07 and PAL NES is exempt of this bug.
So I'm pretty sure I understand all this, but the thing that still seems ambiguous is which of the two reads is actually received by the CPU. My understanding is that the DMC is waiting for the CPU to make a read so it can hijack the address bus from the CPU. My guess is the CPU actually starts the read operation from the CPU's intended address, which gets registered by the controllers, PPU, SPI, etc. as the first read. However, the CPU doesn't actually get the chance to complete the read, as the DMC takes over on that cycle. The DMC hijacks the address bus and completes its single-byte read from somewhere in $C000-$FFFF. Then it takes RDY high, allowing the CPU to "pick up where it left off", meaning that the CPU finally gets the chance to complete the read operation it was trying to make when it was so rudely interrupted by the DMC. In effect the CPU initiates another read operation, which is sensed by the controllers, PPU, and SPI as a second read, causing us problems. If all of this understanding is accurate, I'm left to assume that the first sensed read is not actually caught by the CPU, and it's the second sensed read that actually gets retained by the CPU. But I don't like assumptions, so I'm curious what you all's understanding is...

My initial idea for a byte-wide SPI interface was set up so that you explicitly instructed the mapper to fetch the next byte from SPI flash. That implementation wouldn't be affected by the DMC duplicate read, since reads aren't instructing the mapper to do anything. But it sure would be nice and fast to have a read from the SPI buffer trigger the next byte to be fetched, especially if you wanted to do something like map the SPI register to $C000 and allow direct streaming from SPI flash to the DMC, thus reducing the need for PRG-RAM if graphics and audio were your primary motives for serial ROM. A potentially cheap fix would be a mode bit in the SPI control register that selects whether the mapper auto-fetches after each read (fast, but subject to byte loss if DMC is in use) or requires a mapper write for each fetch (slower, but free of DMC conflict byte loss).

Let's assume I'm correct and the first sensed read is false, and it's the second hardware sensed read that sticks to the CPU. Based on a SPI register setup that a read instructs the mapper to fetch the next byte, we can't simply filter out the second read which occurs 1-4 cycles after the first one. Because this implementation would also have the goal of speed where bytes can be fetched in 8-10 CPU/M2 cycles. (LDA/STA loops, or perhaps LDA/STA/NOP). With this the mapper must start fetching immediately and can't wait around for 4 cycles before it replaces the current byte with the next one.
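
As a rough check on the 8-10 cycle figure: LDA absolute and STA absolute take 4 cycles each, and NOP takes 2. The sketch below turns that into throughput, assuming an NTSC CPU clock of about 1.789773 MHz and about 29780.5 CPU cycles per frame, and ignoring loop overhead, DMA, and rendering time, so the results are upper bounds. The function names are just for illustration.

```python
NTSC_CPU_HZ = 1_789_773        # approximate NTSC CPU clock in Hz
CYCLES_PER_FRAME = 29_780.5    # approximate NTSC CPU cycles per video frame

LDA_ABS = 4  # cycles for LDA absolute
STA_ABS = 4  # cycles for STA absolute
NOP = 2      # cycles for NOP


def bytes_per_second(cycles_per_byte: int) -> float:
    """Upper bound on streaming rate for an unrolled copy loop."""
    return NTSC_CPU_HZ / cycles_per_byte


def bytes_per_frame(cycles_per_byte: int) -> float:
    """Upper bound on bytes copied if the whole frame were spent copying."""
    return CYCLES_PER_FRAME / cycles_per_byte


fast = LDA_ABS + STA_ABS        # 8 cycles per byte
slow = LDA_ABS + STA_ABS + NOP  # 10 cycles per byte
for cycles in (fast, slow):
    print(cycles, round(bytes_per_second(cycles)), round(bytes_per_frame(cycles)))
```

So an unrolled LDA/STA copy tops out somewhere around 220 KB/s, which is why there is no slack to wait 4 cycles before starting the next fetch.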

In this situation you'd have to solve it as Memblers proposed, by effectively retaining the last read byte. If I'm correct that the actual read occurs 1-4 CPU cycles after the first sensed DMC hijack read cycle, then you could get by with a 12-bit shift register. If it's been longer than 4 cycles since the last read, the mapper would provide the most recently fetched 8 bits (i.e. bits 7-0 of the shift register). That initial read would instruct the mapper to fetch the next byte from SPI. And if another read were performed within 4 CPU cycles, the mapper would provide the same data that it did 1-4 cycles ago, which has now shifted higher up the shift register (i.e. bits 11-4 if it's been 4 cycles). The exact location would depend on how many cycles ago the initial read occurred.
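
Here is a minimal simulation of that 12-bit shift register idea. It is a sketch under stated assumptions, not tested hardware logic: `ShiftRegisterPort` and `bit_source` are hypothetical names, one SPI bit is assumed to arrive per CPU cycle (MSB first), and the premise that the second read is the one the CPU keeps is still unverified.

```python
def bit_source(data: bytes):
    """Yield the bits of each byte, most significant bit first."""
    for byte in data:
        for i in range(7, -1, -1):
            yield (byte >> i) & 1


class ShiftRegisterPort:
    MASK = 0xFFF  # 12 bits: 8 data bits plus up to 4 cycles of delay

    def __init__(self, data: bytes):
        self.bits = bit_source(data)
        self.reg = 0
        self.remaining = 8  # bits still to shift in for the current fetch
        self.age = 0        # CPU cycles since the current fetch was triggered
        self.tick(8)        # prime the pipeline with the first byte

    def tick(self, cycles: int = 1) -> None:
        for _ in range(cycles):
            if self.remaining:
                # Shift the next bit in at the bottom; older bits move up.
                # Idle-high (1) is assumed once the data runs out.
                self.reg = ((self.reg << 1) | next(self.bits, 1)) & self.MASK
                self.remaining -= 1
            self.age += 1

    def read(self):
        if self.remaining == 0:
            # Normal read: hand out bits 7-0 and trigger the next fetch.
            value = self.reg & 0xFF
            self.remaining = 8
            self.age = 0
            return value
        if self.age <= 4:
            # Repeat read within 4 cycles: the previous byte has moved up
            # by `age` bits, so take the window starting at that bit.
            return (self.reg >> self.age) & 0xFF
        return None  # 5-7 cycles after a trigger is not handled in this sketch
```

A read 3 cycles after the triggering read sees the same byte through the window at bits 10-3, and a read 4 cycles after sees it at bits 11-4, so neither the stream position nor the returned value is disturbed.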

Something like that might be the most reasonable implementation, especially if there is logic to spare in the mapper. The simple, cheap way out, a mode bit in the mapper that lets the user choose between auto fetch (subject to DMC byte loss) and safe but slow instructed fetching (immune to DMC byte loss), still has its merits, especially in implementations where mapper logic resources are limited.
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

There's one good way to test this: a test ROM that reads back from $2007.
NovaSquirrel
Posts: 483
Joined: Fri Feb 27, 2009 2:35 pm
Location: Fort Wayne, Indiana

Post by NovaSquirrel »

tpw_rules in #nesdev wrote:i put the clock on an interrupt pin and had the ISR be long enough so that clearing the interrupt pending flag at the end would clear any spurious interrupt from dpcm. fortunately AVR hardware supports clocking the SPI engine manually so i just did that
I think this implies that the first read is the one the CPU gets and you can get away with just having a delay before accepting another read.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

I can't make sense of any of what tpw did, not nearly enough info. Doing something with an AVR?
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

At one time, tpw_rules was working on a microcontroller to translate between PS/2 keyboard and mouse protocol and NES controller protocol.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

Okay, makes sense now. Yeah, so perhaps there is hope that my line of thinking is wrong after all and it's as simple as filtering out the second read. Probably best to test with tepples' idea of a test ROM with $2007 when I get around to it; I can't really be sure if tpw had fully tested things or if he happened to be assuming the first read was the real one and never fully tested it...