Was there an NES expansion chip like SA-1

Oziphantom
Posts: 1080
Joined: Tue Feb 07, 2017 2:03 am

Re: Was there an NES expansion chip like SA-1

Post by Oziphantom » Mon Feb 08, 2021 10:50 pm

domgetter wrote:
Mon Feb 08, 2021 1:55 pm
turboxray wrote:
Mon Feb 08, 2021 11:38 am

But why do you call it a microbankswitcher? That's basically just a word size address register - not really bank switching anything. Unless I'm missing something.
Perhaps it wasn't clear, but it would do the same as bankswitching an 8k bank but for a bank window that's only 2 bytes in size. Then a lookup into that bank would faithfully route to ROM or RAM (wherever the data is stored) just like when you LDA $8000 in MMC3 or whatever.

The mapper doesn't store the data, it looks it up every time, just like all other bankswitching.

So for example you would do this

LDA #$12       ; bank where 128k sine lookup table is stored
STA $5050      ; you won't need to do this every time
LDA ball_pos_x ; top byte of fixed point ball pos
STA $5051
LDA ball_pos_x + 1   ; low byte of fixed point ball pos
STA $5052
LDA $5053            ; top byte of fixed point result of sin(x)
STA ball_pos_x_sin
LDA $5054            ; low byte of fixed point result of sin(x)
STA ball_pos_x_sin + 1
Of course, this doesn't have to be a sine lookup table, nor be for 8.8 fixed point. It's programmer defined what data goes in your ROM, obviously, this is just one use case. Reading $5053 and $5054 again would just re-read through the bankswitching logic.

I hope that clears it up :beer:
If you write docs on this, just call it a PORT; it will save so much headache ;)

Oziphantom
Posts: 1080
Joined: Tue Feb 07, 2017 2:03 am

Re: Was there an NES expansion chip like SA-1

Post by Oziphantom » Mon Feb 08, 2021 10:52 pm

stan423321 wrote:
Mon Feb 08, 2021 3:24 pm
It's been said multiple times that the behaviour of bus conflicts in existing hardware is unreliable and harmful to it. However, a key element here is that those cartridges were designed with budget in mind first and foremost, which raises the following terrifying question. Could a cartridge specifically designed to cause bus conflicts cause them reliably and safely?
No. Bus contention will always cause damage and wear, unless all devices connected to the pins are open collector, which I doubt the NES pins are.

stan423321
Posts: 35
Joined: Wed Sep 09, 2020 3:08 am

Re: Was there an NES expansion chip like SA-1

Post by stan423321 » Tue Feb 09, 2021 2:38 am

lidnariq wrote:
Mon Feb 08, 2021 4:29 pm
There was an aborted commercial product to augment an Atari 2600 that relied on this; the 6507 thought it was executing a long stream of STA $nn, where $nn was fed by the expansion hardware and the data cycle was overridden with bus conflicts.
Could a cartridge specifically designed to cause bus conflicts cause them reliably and safely?
... maaaaybe.
This is ultimately a thermal problem. Can we override the 2A03 in a way that won't cause too much heat? I really don't know.

Part of the problem is that the PPU's exact timing is cranky. Some parts only care about the value that propagated inside by the time the 74LS139 stops selecting the PPU. Other parts are asynchronous. And other parts are synchronous to the PPU's pixel clock. The only reliable thing may well be driving the data bus for the entire cycle.

(/forced/) Ones are basically right out.
Oziphantom wrote:
Mon Feb 08, 2021 10:52 pm
No. Bus contention will always cause damage and wear, unless all devices connected to the pins are open collector, which I doubt the NES pins are.
I can't say I didn't see that coming, but I had to ask about it. Still, six cycle copies would probably do just fine.
lidnariq wrote:
Mon Feb 08, 2021 4:29 pm
Multiplexing buses is I/O intensive, but should be doable as soon as you have enough I/O in your programmable logic. For every 2 CPU fetches, there are 3 PPU fetches. However, the NES's CPU-PPU can differ in timing by any one of 4 different skews (or 5 in the case of the PAL famiclone; or for the PAL NES there are 5 CPU fetches for every 8 PPU fetches), and making the multiplexer robust to all of them would be rather obnoxious.

(...)

Parallel asynchronous ROMs basically maxed out at 45ns, or roughly 21MHz. This sounds like a lot, except that those models have been discontinued, and the short list of models that are still manufactured are at best 55ns and often 70-110ns grades. (Also, the larger the memory, usually the slower it is.) Furthermore, some significant overhead is going to be consumed by the CPU-PPU multiplexer, and I don't know how much.

The largest asynchronous ROMs available right now max out at 256MB, and those are only available in BGA, and they're really expensive. Any release that wants more than 64MB of capacity per bus is going to find it much cheaper to put one or two DRAM(s) on the board and load data into them from commodity NAND flash (i.e. SD cards).

(Right now, ignoring Digi-Key's close-out prices - by which I mean, when they charge the same amount for 1 of a thing or 10k of a thing, which indicates they just want to get rid of it -
256MB NOR flash is around $14/@10k to $20/@1
128MB NOR flash is around $8/@10k to $10/@1.
64MB NOR flash is around $6/@10k to $8/@1, with some significant cost savings and availability for recently discontinued parts ($3.5/@100) )

Large modern DRAMs can sustain incredibly high bandwidths, but latency has basically not improved over the past 20 years: it still takes somewhere around 100ns from when the command to fetch data is issued to when one can start getting it out. Non-parallel nonvolatile memories, such as SPI NOR flash, have similar problems, but there it's just due to the time it takes to get the address into the memory. (...)
This, on the other hand? I knew RAM wasn't getting lower latencies for a while now, but I guess I don't understand the numbers properly. How fast a memory does the PPU need?

Oziphantom
Posts: 1080
Joined: Tue Feb 07, 2017 2:03 am

Re: Was there an NES expansion chip like SA-1

Post by Oziphantom » Tue Feb 09, 2021 3:01 am

stan423321 wrote:
Tue Feb 09, 2021 2:38 am
lidnariq wrote:
Mon Feb 08, 2021 4:29 pm
There was an aborted commercial product to augment an Atari 2600 that relied on this; the 6507 thought it was executing a long stream of STA $nn, where $nn was fed by the expansion hardware and the data cycle was overridden with bus conflicts.
Could a cartridge specifically designed to cause bus conflicts cause them reliably and safely?
... maaaaybe.
This is ultimately a thermal problem. Can we override the 2A03 in a way that won't cause too much heat? I really don't know.

Part of the problem is that the PPU's exact timing is cranky. Some parts only care about the value that propagated inside by the time the 74LS139 stops selecting the PPU. Other parts are asynchronous. And other parts are synchronous to the PPU's pixel clock. The only reliable thing may well be driving the data bus for the entire cycle.

(/forced/) Ones are basically right out.
Oziphantom wrote:
Mon Feb 08, 2021 10:52 pm
No. Bus contention will always cause damage and wear, unless all devices connected to the pins are open collector, which I doubt the NES pins are.
I can't say I didn't see that coming, but I had to ask about it. Still, six cycle copies would probably do just fine.
As a one-off, sure, fine. But if it happens every frame, at 60 frames per second, and you play for 2 hours, then you just did it 432,000 times. And each time you play, it does it more and more and wears out the drivers more and more, until they break, the machine is broken, and you need to replace one or more of the chips.

Each pin will have a driver transistor, and each driver has a maximum rated drive current. So if it wants to push the voltage to +5V and you are fighting it to go to 0V, you are draining the line down. Then the chip is going to try to drive more and more current to get the level up to where it wants it to be. This then starts to overload the transistors as they go over current and try to push more through the small, thin traces inside the chip. And the chips don't heal; every time you do it, it adds to the damage. Sure, it's only a teeny tiny amount of damage, but it will happen rapidly and often, which means it compounds. And this is on vintage hardware that is already suffering from heat expansion issues and that can't just be replaced or repaired with new parts.
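
For reference, the 432,000 figure is simply frames per second times seconds played. A minimal sketch of that arithmetic, assuming a rounded 60 fps and exactly one conflict per frame (the function and parameter names are made up for illustration):

```python
FRAMES_PER_SECOND = 60  # rounded NTSC frame rate, as used in the post


def conflicts_per_session(hours: float, conflicts_per_frame: int = 1) -> int:
    """Count bus conflicts in one play session.

    Assumes the cartridge forces the same number of conflicts on every frame.
    """
    seconds = hours * 3600
    return int(seconds * FRAMES_PER_SECOND * conflicts_per_frame)


if __name__ == "__main__":
    # One conflict per frame over a 2-hour session.
    print(conflicts_per_session(2))
```

A scheme that conflicts on several bytes per frame multiplies the count accordingly via `conflicts_per_frame`.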

stan423321
Posts: 35
Joined: Wed Sep 09, 2020 3:08 am

Re: Was there an NES expansion chip like SA-1

Post by stan423321 » Tue Feb 09, 2021 3:48 am

Okay, okay, I get it, conflicts aren't fine. Six cycle byte copy can be done without those, and their speed seems fine for the purpose of DMA avoidance, "just" with a mix of constant bytes and port reads (LDA (inject value); STA ad-dress (/store/)).

tepples
Posts: 22288
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Was there an NES expansion chip like SA-1

Post by tepples » Tue Feb 09, 2021 6:57 am

The money to generate intentional bus conflicts on the PPU bus could be better spent just making pseudo-dual-port memory. The PPU reads video memory once every 372 ns. If you interpose all address and data lines to a 120 ns or faster CHR RAM, you can stick writes between the reads.
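
To put rough numbers on that, here is a small sketch of how many back-to-back RAM accesses fit into one PPU fetch period. It assumes the NTSC master clock of 21.477272 MHz, a PPU clock of one quarter of that, and two PPU clocks per memory fetch. It is idealized: multiplexer delay and setup/hold margins are ignored, so real hardware would get fewer usable slots.

```python
NTSC_MASTER_CLOCK_HZ = 21_477_272        # NTSC NES master clock
PPU_CLOCK_HZ = NTSC_MASTER_CLOCK_HZ / 4  # PPU dot clock, about 5.37 MHz
PPU_CLOCKS_PER_FETCH = 2                 # each PPU memory fetch spans two dots


def ppu_fetch_period_ns() -> float:
    """Time between consecutive PPU memory fetches, in nanoseconds."""
    return PPU_CLOCKS_PER_FETCH / PPU_CLOCK_HZ * 1e9


def accesses_per_fetch(ram_access_ns: float) -> int:
    """Whole RAM accesses that fit in one fetch period (idealized)."""
    return int(ppu_fetch_period_ns() // ram_access_ns)


def spare_slots(ram_access_ns: float) -> int:
    """Accesses left over after the PPU's own read has been served."""
    return max(accesses_per_fetch(ram_access_ns) - 1, 0)
```

Under these assumptions a 120 ns part leaves two spare accesses per PPU read, which is where the room for interleaved writes comes from.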

stan423321
Posts: 35
Joined: Wed Sep 09, 2020 3:08 am

Re: Was there an NES expansion chip like SA-1

Post by stan423321 » Tue Feb 09, 2021 8:19 am

Okay, look, I never said bus conflicts were a sane idea. And they're meant to go on CPU bus to write OAM and palettes while allowing breaks, but... I got it. No bus conflicts.

If the PPU reads every 372 ns, and 120 ns or faster RAM is not unacquirable, that would give us a slot for CPU access and another slot for other purposes per PPU read, without bus widening lunacy (and I'll get to bus widening lunacy, but one thing at a time). The thing is, lidnariq says the multiplexer would be impractical. I guess what you're saying, tepples, is that supporting only writes would be easier to implement?

tepples
Posts: 22288
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Was there an NES expansion chip like SA-1

Post by tepples » Tue Feb 09, 2021 9:29 am

The Apple II computer ran its main DRAM at 2.045 MHz. The CPU got half of these cycles and the video circuit the other half.

With 120 ns RAM, timing considerations make writes easier than reads, as a write can be stored in the registers of the bus arbiter chip to be executed at its next convenience. One workaround is to require the CPU to read each byte twice: once to request that the front end chip read from the RAM at next opportunity and once to have it return the value to the CPU.

Another workaround is to take advantage of the fact that modern SRAMs are 70 ns or faster. This means the bus arbiter could service both a PPU read and a CPU read within one M2 cycle.
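
A toy model of the posted-write and read-twice ideas may make the difference clearer. This is only a behavioural sketch, not a hardware design: the class, its method names, and the single-entry latches are all assumptions for illustration, and `idle_slot()` stands in for "a gap between PPU reads".

```python
class BusArbiter:
    """Toy front-end chip sitting between the CPU and a shared CHR RAM."""

    def __init__(self, size: int = 0x2000):
        self.ram = bytearray(size)
        self.pending_write = None  # (address, value) latched from the CPU
        self.pending_read = None   # address the CPU has asked to read
        self.read_latch = 0        # last value fetched on the CPU's behalf

    def cpu_write(self, address: int, value: int) -> None:
        # The CPU's write completes immediately from its point of view;
        # the RAM is updated later, when a slot is free.
        self.pending_write = (address, value & 0xFF)

    def cpu_request_read(self, address: int) -> None:
        # First of the two CPU reads: only registers the request.
        self.pending_read = address

    def cpu_collect_read(self) -> int:
        # Second of the two CPU reads: returns whatever was fetched.
        return self.read_latch

    def ppu_read(self, address: int) -> int:
        # PPU reads always go straight to RAM and are never delayed.
        return self.ram[address]

    def idle_slot(self) -> None:
        """Service one pending CPU request in a gap between PPU reads."""
        if self.pending_write is not None:
            address, value = self.pending_write
            self.ram[address] = value
            self.pending_write = None
        elif self.pending_read is not None:
            self.read_latch = self.ram[self.pending_read]
            self.pending_read = None
```

A write needs nothing back from the RAM, so it can wait in a latch. A read has to return data within the same CPU cycle, which is why it either takes two CPU accesses or needs RAM fast enough to answer both sides within one cycle.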

turboxray
Posts: 147
Joined: Thu Oct 31, 2019 12:56 am

Re: Was there an NES expansion chip like SA-1

Post by turboxray » Tue Feb 09, 2021 9:35 am

domgetter wrote:
Mon Feb 08, 2021 1:55 pm
turboxray wrote:
Mon Feb 08, 2021 11:38 am

But why do you call it a microbankswitcher? That's basically just a word size address register - not really bank switching anything. Unless I'm missing something.
Perhaps it wasn't clear, but it would do the same as bankswitching an 8k bank but for a bank window that's only 2 bytes in size. Then a lookup into that bank would faithfully route to ROM or RAM (wherever the data is stored) just like when you LDA $8000 in MMC3 or whatever.

The mapper doesn't store the data, it looks it up every time, just like all other bankswitching.

So for example you would do this

LDA #$12       ; bank where 128k sine lookup table is stored
STA $5050      ; you won't need to do this every time
LDA ball_pos_x ; top byte of fixed point ball pos
STA $5051
LDA ball_pos_x + 1   ; low byte of fixed point ball pos
STA $5052
LDA $5053            ; top byte of fixed point result of sin(x)
STA ball_pos_x_sin
LDA $5054            ; low byte of fixed point result of sin(x)
STA ball_pos_x_sin + 1
Of course, this doesn't have to be a sine lookup table, nor be for 8.8 fixed point. It's programmer defined what data goes in your ROM, obviously, this is just one use case. Reading $5053 and $5054 again would just re-read through the bankswitching logic.

I hope that clears it up :beer:
No, I understand the concept. I'm saying that's literally just a non-CPU memory-mapped address vector (a memory-mapped port used for indirect memory access). $5050 is your page offset, $5051/52 is your 64kWord offset, $5053/54 is the word-size read port. That's not bank switching - there's no bank switching going on. That's indirect addressing. This is basically what the Arcade Card does on the PC-Engine (though with more advanced features to pair with the port, and more ports).
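
Whatever name it goes by, the behaviour described can be modelled in a few lines. The sketch below follows the register layout from the quoted example ($5050 page, $5051/$5052 word index, $5053/$5054 read port). Several things are assumptions the thread does not state: that each page is 128 KiB (64K words of 2 bytes), that words are stored top byte first, and the class name itself.

```python
PAGE_SIZE_BYTES = 0x20000  # assumed: 64K words * 2 bytes per page


class LookupPort:
    """Toy model of an indirect read port backed by cartridge ROM."""

    def __init__(self, rom: bytes):
        self.rom = rom
        self.page = 0      # written through $5050
        self.index_hi = 0  # written through $5051
        self.index_lo = 0  # written through $5052

    def write(self, address: int, value: int) -> None:
        value &= 0xFF
        if address == 0x5050:
            self.page = value
        elif address == 0x5051:
            self.index_hi = value
        elif address == 0x5052:
            self.index_lo = value
        else:
            raise ValueError(f"not a port register: ${address:04X}")

    def read(self, address: int) -> int:
        # The ROM is looked up on every read; nothing is cached, and
        # the index does not change as a side effect of reading.
        word_index = (self.index_hi << 8) | self.index_lo
        base = self.page * PAGE_SIZE_BYTES + word_index * 2
        if address == 0x5053:
            return self.rom[base]      # top byte of the word
        if address == 0x5054:
            return self.rom[base + 1]  # low byte of the word
        raise ValueError(f"not a read port: ${address:04X}")
```

The model keeps only three address registers and computes the ROM address on each read, which is the indirect-addressing behaviour under discussion.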

lidnariq
Posts: 10273
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Was there an NES expansion chip like SA-1

Post by lidnariq » Tue Feb 09, 2021 11:42 am

stan423321 wrote:
Tue Feb 09, 2021 2:38 am
This, on the other hand? I knew RAM wasn't getting lower latencies for a while now, but I guess I don't understand the numbers properly. How fast a memory does the PPU need?
As far as I can tell, the PPU "should" only need 1/2.7MHz (350ns) ROM or RAM, but in practice people seem to need 200ns or faster. I assume the difference is due to analog effects.

For a sophisticated bus multiplexer, you should have a fully valid address within 0.5 pixels, so your deadline ought to be 1.5/5.4MHz = 270ns.

The CPU deadline is similar; the address bus is only known after /ROMSEL would have fallen (roughly 30ns, 320ns slack), and for writes the data bus arrives even later (roughly 80ns, 270ns slack).
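
The budgets above can be reproduced with a few lines of arithmetic. This sketch uses the rounded 5.4 MHz pixel clock from the post. The 350 ns CPU window is an assumption inferred from the quoted pairs (30 + 320 and 80 + 270), not a figure stated outright.

```python
PPU_PIXEL_CLOCK_HZ = 5.4e6  # rounded, as in the post
CPU_WINDOW_NS = 350         # assumed: inferred from 30+320 and 80+270


def pixels_to_ns(pixels: float) -> float:
    """Convert a duration in PPU pixels to nanoseconds."""
    return pixels / PPU_PIXEL_CLOCK_HZ * 1e9


def cpu_slack_ns(arrival_ns: float) -> float:
    """Time left in the CPU window once a signal has become valid."""
    return CPU_WINDOW_NS - arrival_ns


naive_ppu_budget_ns = pixels_to_ns(2.0)  # one fetch every two pixels
mux_ppu_budget_ns = pixels_to_ns(1.5)    # address valid after 0.5 pixel
```

With the rounded clock, the two PPU budgets come out near 370 ns and 278 ns, so the 350 ns and 270 ns quoted above read as slightly conservative roundings.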

stan423321
Posts: 35
Joined: Wed Sep 09, 2020 3:08 am

Re: Was there an NES expansion chip like SA-1

Post by stan423321 » Tue Feb 09, 2021 11:54 am

tepples wrote:
Tue Feb 09, 2021 9:29 am
The Apple II computer ran its main DRAM at 2.045 MHz. The CPU got half of these cycles and the video circuit the other half.

With 120 ns RAM, timing considerations make writes easier than reads, as a write can be stored in the registers of the bus arbiter chip to be executed at its next convenience. One workaround is to require the CPU to read each byte twice: once to request that the front end chip read from the RAM at next opportunity and once to have it return the value to the CPU.

Another workaround is to take advantage of the fact that modern SRAMs are 70 ns or faster. This means the bus arbiter could service both a PPU read and a CPU read within one M2 cycle.
Oh, now I get it. If we get hit at once by both sides, we can't stall for too long. Gotcha.
turboxray wrote:
Tue Feb 09, 2021 9:35 am
No, I understand the concept. I'm saying that's literally just a non-CPU memory-mapped address vector (a memory-mapped port used for indirect memory access). $5050 is your page offset, $5051/52 is your 64kWord offset, $5053/54 is the word-size read port. That's not bank switching - there's no bank switching going on. That's indirect addressing. This is basically what the Arcade Card does on the PC-Engine (though with more advanced features to pair with the port, and more ports).
I'm surprised you consider it that big of a deal that someone reinvented a useful idea and didn't stumble upon the exact same name that is the industry standard. Besides, why isn't it bank switching? Did someone define that banks must be at least N bytes long?

turboxray
Posts: 147
Joined: Thu Oct 31, 2019 12:56 am

Re: Was there an NES expansion chip like SA-1

Post by turboxray » Tue Feb 09, 2021 12:07 pm

stan423321 wrote:
Tue Feb 09, 2021 11:54 am
tepples wrote:
Tue Feb 09, 2021 9:29 am
The Apple II computer ran its main DRAM at 2.045 MHz. The CPU got half of these cycles and the video circuit the other half.

With 120 ns RAM, timing considerations make writes easier than reads, as a write can be stored in the registers of the bus arbiter chip to be executed at its next convenience. One workaround is to require the CPU to read each byte twice: once to request that the front end chip read from the RAM at next opportunity and once to have it return the value to the CPU.

Another workaround is to take advantage of the fact that modern SRAMs are 70 ns or faster. This means the bus arbiter could service both a PPU read and a CPU read within one M2 cycle.
Oh, now I get it. If we get hit at once by both sides, we can't stall for too long. Gotcha.
turboxray wrote:
Tue Feb 09, 2021 9:35 am
No, I understand the concept. I'm saying that's literally just a non-CPU memory-mapped address vector (a memory-mapped port used for indirect memory access). $5050 is your page offset, $5051/52 is your 64kWord offset, $5053/54 is the word-size read port. That's not bank switching - there's no bank switching going on. That's indirect addressing. This is basically what the Arcade Card does on the PC-Engine (though with more advanced features to pair with the port, and more ports).
I'm surprised you consider it that big of a deal that someone reinvented a useful idea and didn't stumble upon the exact same name that is the industry standard. Besides, why isn't it bank switching? Did someone define that banks must be at least N bytes long?
Wow.. okay haha. Let me put it this way: how can you be experienced in assembly and in assisted hardware like this, and not understand the concept of a port-mapped address vector? When you access the PPU read port after setting the VRAM address, do you consider that bank switching too? I would hope not. If you honestly don't understand how that's not bank switching, then there's not much more to say on it. I dunno. I guess I'm just a fan of using accurate names for things to avoid confusion. But that's just me.. "surprise!".

stan423321
Posts: 35
Joined: Wed Sep 09, 2020 3:08 am

Re: Was there an NES expansion chip like SA-1

Post by stan423321 » Tue Feb 09, 2021 12:34 pm

turboxray wrote:
Tue Feb 09, 2021 12:07 pm
Wow.. okay haha. Let me put it this way: how can you be experienced in assembly and in assisted hardware like this, and not understand the concept of a port-mapped address vector? When you access the PPU read port after setting the VRAM address, do you consider that bank switching too? I would hope not. If you honestly don't understand how that's not bank switching, then there's not much more to say on it. I dunno. I guess I'm just a fan of using accurate names for things to avoid confusion. But that's just me.. "surprise!".
PPUDATA changes the target address on writes and reads, so it's not like a regular bank at all. I'm assuming this gizmo doesn't, which puts it into a pathological category where it's not what I'd name it, but I can't really find a logical argument against the name, and with "micro-" prefix I didn't find the description misleading.

Out of curiosity, where do you think the port ends and the bank starts? 16 bytes? 64 bytes?

turboxray
Posts: 147
Joined: Thu Oct 31, 2019 12:56 am

Re: Was there an NES expansion chip like SA-1

Post by turboxray » Tue Feb 09, 2021 1:45 pm

stan423321 wrote:
Tue Feb 09, 2021 12:34 pm

PPUDATA changes the target address on writes and reads, so it's not like a regular bank at all. I'm assuming this gizmo doesn't..
If an address vector has 'auto incrementing' (or decrementing), that's an attribute of the address vector and irrelevant to whether it's defined as an address vector or not. The same goes for whether it's a read or write port (or both), or whether it adheres to a page boundary (i.e. is paired with another port-mapped register to form a final address). That doesn't change the definition.
Out of curiosity, where do you think the port ends and the bank starts? 16 bytes? 64 bytes?
It's defined by the single 'element' or 'type' you need to access. The intended behavior of an address vector is to produce a window to a single type in memory. As soon as that 'window' is more than the largest defined type, then it becomes something else. Traditionally, for an old system an address vector is probably not more than 4 bytes (a dword). I would define it by the widest type for the system/software.

Likewise, if you have a 32bit cpu address register, but you're accessing only bytes from the fetched type - you wouldn't call the cpu address register a micro 'bank'. The same would apply to any address vector. In the given log example, the type is a 16bit word. That's the intended type. If someone were to choose to interpret that as two 8bit chars, it doesn't change the original functionality - just their interpretation of the fetched data type. We wouldn't call that a banking mechanism, any more than we'd specifically call every 32bit integer a mini 4byte array.

So yeah, if you wanted to put a number on it.. anything wider than 4 bytes would probably be a grey area.

stan423321
Posts: 35
Joined: Wed Sep 09, 2020 3:08 am

Re: Was there an NES expansion chip like SA-1

Post by stan423321 » Tue Feb 09, 2021 4:57 pm

Oh well. I can't say I follow your logic on "element" sizing; a tile pattern is 16 bytes long and feels like it qualifies as an element just as much. Meanwhile, the address change thing was something I mentioned to say PPUDATA is no bank, while your argument feels more like support for the mapper gizmo being a port as well, which I don't deny. But so be it.

At any rate, back to overpowered mapper hypotheticals.

The memory speed facts are making me doubt whether it is a thing that would actually happen, but... suppose we have 16-bit ROMs and RAMs instead of 8-bit ones. What can we do with those other than trivial odd/even page selection?

Well, I think reducing PPU bandwidth usage is plausible. Consider: For used pattern fetches, the PPU will predictably fetch both bitplanes. So if we rearranged them GameBoy/SNES style, we can feed two out of four PPU reads with one 16-bit memory read. That doesn't help us much with multiplexing CPU accesses, but it frees some time for Advanced Tricks.
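
As a sketch of that rearrangement: a standard NES tile stores eight bytes of bitplane 0 followed by eight bytes of bitplane 1, while the Game Boy style layout puts the two plane bytes of each row next to each other. The helper names below are made up, and treating each adjacent pair as one 16-bit memory word is the assumption being illustrated.

```python
def interleave_planes(nes_tile: bytes) -> bytes:
    """Reorder a 16-byte NES tile so each row's two planes are adjacent.

    Input layout:  plane 0 rows 0..7, then plane 1 rows 0..7.
    Output layout: row 0 plane 0, row 0 plane 1, row 1 plane 0, ...
    """
    if len(nes_tile) != 16:
        raise ValueError("an NES tile is exactly 16 bytes")
    out = bytearray(16)
    for row in range(8):
        out[2 * row] = nes_tile[row]          # bitplane 0 of this row
        out[2 * row + 1] = nes_tile[row + 8]  # bitplane 1 of this row
    return bytes(out)


def fetch_row(interleaved_tile: bytes, row: int) -> tuple:
    """Return (plane0, plane1) for a row, as one 16-bit read would."""
    return interleaved_tile[2 * row], interleaved_tile[2 * row + 1]
```

The mapper would still have to answer the PPU's two separate 8-bit pattern reads, but it could serve the second from a latch instead of touching memory again.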

A similar thing can be done with nametable fetches as well if we keep the scope big. Yes, big. Just store ordinary nametable data in even bytes and EXRAM equivalent in odd ones, assuming you don't defer to on-board PPU RAM. Ironically, the one thing that would be a problem would be extra RAM nametables with classic attributes.

That doesn't really do much new unless you have ideas what to do with the other reads, which I can see some people not having for their games. This wouldn't be enough to allow using really slow memories, that would require further nonsense. But! The cartridge could scramble the color indices of the pattern freely now. E.g. map color 0 to 3 and keep others as is, or keep 0 as is and change all others to 3. Would this be useful? I have no idea.
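
The colour scrambling is easy to sketch once both bitplane bytes of a row are in hand at the same time. The function and the two example tables below are illustrative only; the tables encode the two remappings mentioned above.

```python
ZERO_TO_THREE = (3, 1, 2, 3)     # map colour 0 to 3, keep the others
NONZERO_TO_THREE = (0, 3, 3, 3)  # keep colour 0, force the others to 3


def remap_row(plane0: int, plane1: int, table) -> tuple:
    """Remap the 2-bit colour index of each of the 8 pixels in a row.

    plane0 holds the low bit of every pixel, plane1 the high bit.
    Returns the new (plane0, plane1) byte pair.
    """
    out0 = 0
    out1 = 0
    for bit in range(8):
        color = ((plane0 >> bit) & 1) | (((plane1 >> bit) & 1) << 1)
        new_color = table[color]
        out0 |= (new_color & 1) << bit
        out1 |= ((new_color >> 1) & 1) << bit
    return out0, out1
```

This only works because the first plane byte handed to the PPU can already depend on the second one, which the plain 8-bit fetch order does not allow.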

The CPU bus is comparatively unpredictable. There's DPCM fetches, interrupt vector fetches, and conditional branches. So the one read for two approach would be significantly harder to set up. It's just as well that I have a completely different deranged idea here. With 16 bits instead of 8, we can encode all sorts of crazy stuff to happen when a byte is read. Examples:
  • normal byte (duh)
  • special mapper config opcode: give processor a NOP while writing to an internal register
  • regular opcode fetch, which can be overridden for a mini-interrupt, either a subroutine or in-place with backward branch (probably way more convoluted than useful, but the possibility exists)
  • fetch an (on-cart) RAM value here; this would allow many addressing optimisation tricks but would also require double lookups, which don't seem viable
  • fetch an (on-cart) RAM value on access to the next byte; this allows slower memory than variant above but is also more convoluted
Note that console memory writes, including venerable zero page, could be shadowed on the cart.
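
To make the list above a bit more concrete, here is a toy decoder for such 16-bit ROM words. Everything about the encoding is invented for illustration: the upper byte as a tag, the tag values, and the class name. Only three of the listed behaviours are modelled (normal byte, mapper config, immediate on-cart RAM fetch), and the double-lookup timing problem is ignored entirely.

```python
from enum import IntEnum

NOP_OPCODE = 0xEA  # 6502 NOP, fed to the CPU in place of a config word


class Tag(IntEnum):
    NORMAL = 0         # payload is the byte the CPU sees
    MAPPER_CONFIG = 1  # payload goes to a mapper register, CPU sees NOP
    RAM_VALUE = 2      # payload is an on-cart RAM address to substitute


class TaggedRom:
    """Toy model: each CPU-visible byte is backed by a 16-bit ROM word."""

    def __init__(self, words, ram: bytearray):
        self.words = words  # sequence of 16-bit ints, tag in the high byte
        self.ram = ram      # on-cart RAM used by RAM_VALUE words
        self.config = 0     # hypothetical internal mapper register

    def cpu_read(self, address: int) -> int:
        word = self.words[address]
        tag, payload = word >> 8, word & 0xFF
        if tag == Tag.NORMAL:
            return payload
        if tag == Tag.MAPPER_CONFIG:
            self.config = payload
            return NOP_OPCODE
        if tag == Tag.RAM_VALUE:
            return self.ram[payload]
        raise ValueError(f"unknown tag {tag}")
```

In this model a config word costs the CPU one NOP's worth of time and no explicit store instruction, which is the appeal of the idea.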

This is all fun to think about for me, but definitely not realistic at all. That said, maybe a dialed down yet customizable version would be possible? 16 bits would probably be a waste.

That's probably the closest my ramblings will come to discussing CPU acceleration though. The remaining stuff is a bit closer to the realm of possibility:
  • IRQ vector lookup override. If the cartridge knows it caused an IRQ, it can override it. If the console did, it won't. This saves the IRQ source discovery time and could extend to multiple cartridge controlled vectors. The important part is to not switch the address mid-read.
  • Data read ports. I guess people think alike? I'd like an autoincrement option though.
  • "Math" functions: Binding an identity table to $41xx and some longer bit rotations to $42xx etc. is an advanced trick, isn't it.
  • Tile blowup. If you use 16x16 metatiles, maybe let mapper handle the meta part? Admittedly I didn't think much about details here.
  • Fixing background palettes to pattern numbers. Probably ranges of pattern numbers, specifically.
  • Column offsets. This would look really cool, but would also involve recreating most of PPU's addressing logic on the cartridge. Note that simple column offsets would fit fine in attribute space, if you're not using it, because of EXRAM or something. Also note that with addressing logic recreated, a split assist function would be easy to implement, fixing the two prefetched tiles and reducing h-blank write needs to horizontal scroll.
  • Sprite insanity. A mapper could allow designing sprite table in its own format and transform it during DMA, feeding PPU incrementing pattern numbers and saving real ones for itself, then using these numbers to put sprites partially to the left or top of the visible screen. Where's my applause?
Okay, so maybe this one wasn't more realistic. That leaves me with secondary background, which I imagine isn't particularly interesting. It's just more bandwidth and PPU duplication, injected into background patterns when they're empty or something.

And that would be it for my fantasies, I guess. Time to return to reality.
