It is not just graphics related. It allows for vast ROM sizes, graphics enhancements, and audio enhancements. At a glance: 768MiB of directly addressable ROM, 8x1 attributes, 4 bankswitchable nametables, and an extension of the stock 4081-byte DPCM sample limit to 16MiB.
Here's a slightly longer overview:
MXM-0 - Memory eXpansion Module
A new mapper for the NES
Max PRG - 512MByte ROM + 512MByte RAM
Max CHR - 256MByte ROM (with 4k window) + 256MByte RAM (with 4k window)
PRG Bankswitching - 16-bit bank numbers w/ 8k window RAM/ROM
  4 banks - 6000, 8000, A000, C000
  1 fixed bank - E000 (fixed to last bank in ROM)
CHR Bankswitching - 16-bit bank numbers w/ 4k/2k/1k/0.5k windows RAM/ROM
  Number of banks depends on mode
    e.g. 4k => 2 banks, 1k => 8 banks
Nametable Bankswitching - 16-bit bank numbers w/ 1k bank size
  4 banks - 1 for each nametable
Extended Attribute Table Bankswitching - 16-bit bank numbers
  Window depends on mode (8x8, or 8x1)
  14-bit auto-bankswitching per tile
    Similar to MMC5 ExAttr table, but using 14 bits instead of 6 bits
    This allows each rendered tile to load independently from anywhere in 64MByte of CHR
    (Thus *per frame*, all 960 tiles are unique, from up to 16,384 different tilesets)
  4 banks - 1 for each nametable
MicroBankswitcher - 2 byte bank in PRG.
Serial Memory Loader - 1 byte serial bank @ C000
  Reading from C000 will make it so the next byte in ROM is yielded on the next read
  Useful for DPCM loading.
Post-Interrupt cycle counter to help with CPU/PPU jitter
12k CHR mode
  bankswitches to sprite graphics when in hblank
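The capacity figures in the overview follow from bank-number width times window size. A minimal Python sketch of that arithmetic is below. The helper name is made up for illustration, and treating the 14-bit per-tile value as selecting a 4k tileset is an inference from the 64MByte figure, not something the spec list states outright.

```python
KIB = 1024
MIB = 1024 * KIB

def addressable_bytes(bank_bits: int, window_bytes: int) -> int:
    """Bytes reachable when a bank number of bank_bits selects windows of window_bytes."""
    return (1 << bank_bits) * window_bytes

prg = addressable_bytes(16, 8 * KIB)         # 16-bit bank number, 8k window
chr_ = addressable_bytes(16, 4 * KIB)        # 16-bit bank number, 4k window
exattr = addressable_bytes(14, 4 * KIB)      # 14-bit per-tile bank, assumed 4k tileset
nametables = addressable_bytes(16, 1 * KIB)  # 16-bit bank number, 1k bank

# Stock DPCM: the 8-bit length register ($4013) gives length = value * 16 + 1.
stock_dpcm_max = 0xFF * 16 + 1

print(prg // MIB, chr_ // MIB, exattr // MIB, nametables // MIB, stock_dpcm_max)
```

PRG ROM plus CHR ROM gives the 768MiB quoted at the top, and the last value is the stock 4081-byte DPCM limit.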
The only current use in our project is for using programmer-defined lookup tables. Say, for example, you've generated a table of log for all fixed point values from 00000000.00000000 to 11111111.11111111. The 2-byte microbankswitcher would allow you to supply the input 2 byte fixed point value, then immediately read the 2 byte result. It will save having to figure out which 8k bank to switch to and what offset to use to grab the bytes.
I'm sure people will figure out other uses for it, though.
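To make the lookup table idea concrete, here is a rough Python sketch of how such a table could be generated offline. Everything specific in it is an assumption for illustration: base-2 logarithm, signed 8.8 output, high byte first, and an arbitrary sentinel for log(0). The post does not specify any of these.

```python
import math

def build_log2_table() -> bytes:
    """128 KiB table: entry i holds log2(i / 256) as signed 8.8, high byte first."""
    table = bytearray()
    for raw in range(0x10000):
        if raw == 0:
            value = -0x8000  # log2(0) is undefined; this sentinel is an arbitrary choice
        else:
            value = round(math.log2(raw / 256) * 256)
        table += (value & 0xFFFF).to_bytes(2, "big")
    return bytes(table)

def lookup(table: bytes, raw: int) -> int:
    """Return the signed 8.8 result stored for the 16-bit input raw."""
    return int.from_bytes(table[raw * 2 : raw * 2 + 2], "big", signed=True)

table = build_log2_table()
print(len(table), lookup(table, 0x0200))
```

With 65,536 two-byte entries the table is 128 KiB, which is why indexing it through ordinary 8k banks would take extra bank and offset bookkeeping.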
As an illustration, say you have a loop that copies a range of memory from a source address to a destination address (actual usage may not be such a simple task). The usual practice is to use indirect addressing: put the two addresses in zero page and then run the loop. However, if your code is in RAM, you can use direct addressing instead by patching the address bytes in the instructions beforehand (in BASIC on old 8-bit computers, the usual practice was to POKE the address values into RAM and then execute the loop), which saves a lot of cycles when the loop runs many times.
So, the ability to bankswitch 2 bytes may make this possible in ROM, as long as you have a table of the possible addresses you want to use (though it may involve aligning your code so that the address bytes fall within one such micro bank, and I don't know whether that is feasible).
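For a rough sense of the saving, here is a small Python tally using the documented 6502 cycle counts for the two addressing modes. The loop shape (load, store, INY, BNE) is an assumed example, and page-crossing penalties are ignored.

```python
# Documented 6502 cycle counts (no page crossing on the loads).
CYCLES = {
    "LDA (zp),Y": 5,
    "STA (zp),Y": 6,
    "LDA abs,Y": 4,
    "STA abs,Y": 5,
    "INY": 2,
    "BNE taken": 3,
}

def loop_cycles(load: str, store: str, iterations: int) -> int:
    """Cycles for a load/store/INY/BNE copy loop that runs `iterations` times."""
    per_iteration = CYCLES[load] + CYCLES[store] + CYCLES["INY"] + CYCLES["BNE taken"]
    # The final BNE falls through and costs 2 cycles instead of 3.
    return per_iteration * iterations - 1

indirect = loop_cycles("LDA (zp),Y", "STA (zp),Y", 256)
direct = loop_cycles("LDA abs,Y", "STA abs,Y", 256)
print(indirect, direct, indirect - direct)
```

That is 2 cycles saved per byte copied, before counting whatever it costs to set up the address bytes.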
The thing is, SNES's bus architecture makes its expansion stuff significantly more... boring, I suppose. You accelerate the CPU one way or another, or stream the data onto the PPU memory that pretty much obeys its regular limitations.
If you want to have some thought experiments involving cartridge circuitry, there isn't really anything quite like the NES. Neo Geo has multiple buses, but they're not much of an upgrade specifically in tweakability, since most of tilemap definitions got merged with sprite definitions. Saturn has video overlay bus, which takes its native video and runs a lawnmower over it. Neither are anywhere near as popular.
I'm not sure if I should continue my ramblings there or not. Are the regulars here fine with this becoming a general overpowered mapper thread? Would you prefer me to make a new one?
domgetter wrote (Sun Feb 07, 2021 6:04 pm):
"The only current use in our project is for using programmer-defined lookup tables. Say, for example, you've generated a table of log for all fixed point values from 00000000.00000000 to 11111111.11111111. The 2-byte microbankswitcher would allow you to supply the input 2 byte fixed point value, then immediately read the 2 byte result. It will save having to figure out which 8k bank to switch to and what offset to use to grab the bytes. I'm sure people will figure out other uses for it, though."
But why do you call it a microbankswitcher? That's basically just a word size address register - not really bank switching anything. Unless I'm missing something.
I'd say it's more because all the interesting attributes aren't override-able. You can feed all the 4bpp tiles you want, but you're still only getting one palette selected per sprite/tile/column.
In many ways, it'd be like if the NES had never released a form where you could disable the nametable (and especially attribute table) RAM, and were forever stuck to just the two attribute tables in the console, like so many of the famiclones and the ... Famicom Titler? Maybe not that one. I believe I remember reading of one first-party console that also ignores the nametable /CE pin on the card edge.
"Saturn has video overlay bus, which takes its native video and runs a lawnmower over it. Neither are anywhere near as popular."
Also 32x, also MSX. Doesn't negate your point at all.
"I'm not sure if I should continue my ramblings there or not. Are the regulars here fine with this becoming a general overpowered mapper thread? Would you prefer me to make a new one?"
We've already transitioned to PPU augmentation in this thread, and we're collectively bad at keeping threads on one topic :p
Perhaps it wasn't clear, but it would do the same as bankswitching an 8k bank but for a bank window that's only 2 bytes in size. Then a lookup into that bank would faithfully route to ROM or RAM (wherever the data is stored) just like when you LDA $8000 in MMC3 or whatever.
The mapper doesn't store the data; it looks it up every time, just like all other bankswitching.
So for example you would do this:
Code:
    LDA #$12            ; bank where 128k sine lookup table is stored
    STA $5050           ; you won't need to do this every time
    LDA ball_pos_x      ; top byte of fixed point ball pos
    STA $5051
    LDA ball_pos_x + 1  ; low byte of fixed point ball pos
    STA $5052
    LDA $5053           ; top byte of fixed point result of sin(x)
    STA ball_pos_x_sin
    LDA $5054           ; low byte of fixed point result of sin(x)
    STA ball_pos_x_sin + 1
I hope that clears it up
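For anyone who prefers to see the routing spelled out, here is a toy Python model of the register sequence in the example above. The register addresses ($5050 to $5054) come from that example. The class name, the 128 KiB bank granularity, and the high-byte-first layout are assumptions made for this sketch and are not a description of the real hardware.

```python
class MicroBankModel:
    """Toy model of the lookup registers used in the example above."""

    TABLE_BYTES = 0x20000  # 65536 entries * 2 bytes = 128 KiB (assumed bank granularity)

    def __init__(self, rom: bytes):
        self.rom = rom
        self.bank = 0
        self.index_hi = 0
        self.index_lo = 0

    def write(self, addr: int, value: int) -> None:
        value &= 0xFF
        if addr == 0x5050:
            self.bank = value
        elif addr == 0x5051:
            self.index_hi = value
        elif addr == 0x5052:
            self.index_lo = value

    def read(self, addr: int) -> int:
        # Nothing is cached: every read is routed to the backing memory.
        index = (self.index_hi << 8) | self.index_lo
        base = self.bank * self.TABLE_BYTES + index * 2
        if addr == 0x5053:
            return self.rom[base]
        if addr == 0x5054:
            return self.rom[base + 1]
        raise ValueError(f"unmapped address ${addr:04X}")

# Mirror the assembly: select bank $12, supply a 16-bit input, read the 16-bit result.
rom = bytearray(0x13 * MicroBankModel.TABLE_BYTES)
entry = 0x12 * MicroBankModel.TABLE_BYTES + 0x1234 * 2
rom[entry], rom[entry + 1] = 0xAB, 0xCD

mapper = MicroBankModel(bytes(rom))
mapper.write(0x5050, 0x12)
mapper.write(0x5051, 0x12)
mapper.write(0x5052, 0x34)
result = (mapper.read(0x5053) << 8) | mapper.read(0x5054)
print(hex(result))
```

The point of the model is that the two writes only change which 2-byte slice of memory is visible at $5053/$5054, the same way an 8k bank write changes what is visible at $8000.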
Definitely! It's my favorite console, in fact.
But I've also heard of the modern gaming PC, which is 10,000 times more powerful than the NES and SNES combined. A little over two years ago, I decided that my company's first project was going to target the NES primarily, and then port it to other systems like the PC and the Nintendo Switch in order to get more sales. The new mapper is something I conceived of as a way to make the game be as good as possible within the NES's hardware constraints. We're also planning on releasing the specs and the implementation(s) to the world at some point in the future, free of charge. I want to lift Famicom/NES game development out of the low-ROM-size pit that it's been in since its inception.
The sequel, however, is planned to be on the SNES. I also want to shower some love on that system. I'm not an expert on the SNES, yet, but from what I understand it simply isn't as extensible as the NES is, so it's not clear how intense that love can be.
It switches a 2-byte slot all over the 2-byte banks of the cartridge. Not the industry standard name, but of all the quirks of that design, that's just a label.
Alright. I have a few ideas left, with various degrees of perceived insanity. Let's start with the most insane and the least insane. The least insane would be: surely memory chips have gotten at least slightly better since the eighties?
If we're already throwing in some advanced circuitry into our hypothetical cart, I would start with multiplexing buses, if that's an option at all. The benefits range from direct to foundational:
- You need fewer memory chips (exact count depends on the use case). Probably won't completely offset the cost of Advanced Circuitry, but it's something.
- You get the simpler ROM size management of CHR RAM without any of the downsides of CHR RAM. You also get simpler RAM size management.
- Handing off CHR and nametable data to PPU is trivial.
- If the memory chips (of usable architecture) are actually significantly faster now, you can reserve some bandwidth for Advanced Tricks. If they're not, well, you can use a wider data bus for a smaller set of those.
What's the most insane part? Well, bus conflicts.
It's been said multiple times that the behaviour of bus conflicts in existing hardware is unreliable and harmful to it. However, a key element here is that those cartridges were designed with budget in mind first and foremost, which raises the following terrifying question. Could a cartridge specifically designed to cause bus conflicts cause them reliably and safely?
You are probably asking why the heck I would want to do that. Well, it's about avoiding DMA during my PCM abuse scenario. Without messing with the bus, but with a somewhat sophisticated cartridge, one could copy one byte of data in six cycles. But if we could just force zeroes on the data bus, four would be manageable. If we could force ones on both buses, we could get to three. Even if that would only cover PPU registers, PCM could update more often.
Multiplexing buses is I/O intensive, but should be doable as soon as you have enough I/O in your programmable logic. For every 2 CPU fetches, there are 3 PPU fetches. However, the NES's CPU-PPU can differ in timing by any one of 4 different skews (or 5 in the case of the PAL famiclone; or for the PAL NES there are 5 CPU fetches for every 8 PPU fetches), and making the multiplexer robust to all of them would be rather obnoxious.
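The fetch ratios above fall out of the master-clock dividers. A small Python sketch follows. The divider values are the commonly documented ones for each system rather than anything stated in this thread, the system labels are mine, and the two-dots-per-PPU-fetch figure is the usual assumption.

```python
from fractions import Fraction

# Master-clock dividers: (CPU divider, PPU dot divider)
DIVIDERS = {
    "NTSC NES": (12, 4),
    "PAL NES": (16, 5),
    "PAL famiclone (Dendy)": (15, 5),
}
PPU_DOTS_PER_FETCH = 2  # each PPU memory access takes two dots

def ppu_fetches_per_cpu_fetch(system: str) -> Fraction:
    """How many PPU memory fetches happen per CPU memory fetch."""
    cpu_div, ppu_div = DIVIDERS[system]
    return Fraction(cpu_div, ppu_div * PPU_DOTS_PER_FETCH)

for name in DIVIDERS:
    print(name, ppu_fetches_per_cpu_fetch(name))
```

NTSC and the PAL famiclone both come out at 3/2, and the PAL NES at 8/5, matching the counts above. The ratios alone say nothing about which alignment a given power-on lands in, which is the part that makes a robust multiplexer obnoxious.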
"Handing off CHR and nametable data to PPU is trivial."
On the one hand yes, on the other hand now you have to care about tearing. (On the other hand, hopefully you have enough PPU RAM that you can double buffer easily.)
"If the memory chips (of usable architecture) are actually significantly faster now, you can reserve some bandwidth for Advanced Tricks."
Parallel asynchronous ROMs basically maxed out at 45ns, or roughly 21MHz. This sounds like a lot, except that those models have been discontinued, and the short list of models that are still manufactured are at best 55ns and often 70-110ns grades. (Also, the larger the memory, usually the slower it is.) Furthermore, some significant overhead is going to be consumed by the CPU-PPU multiplexer, and I don't know how much.
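As a back-of-envelope check on how much bandwidth those grades leave, here is a Python sketch that counts how many back-to-back accesses of a given grade fit in one NTSC CPU cycle and in one PPU fetch (two dots). It is an upper bound only: it assumes the NTSC master clock of about 21.477272MHz and ignores multiplexer overhead, setup and hold times, and bus turnaround.

```python
NTSC_MASTER_HZ = 21_477_272                # NTSC NES master clock (approximate)
CPU_CYCLE_NS = 12 / NTSC_MASTER_HZ * 1e9   # CPU = master / 12, about 559 ns
PPU_FETCH_NS = 8 / NTSC_MASTER_HZ * 1e9    # two PPU dots = master / 8, about 372 ns

def accesses_per_window(window_ns: float, access_ns: float) -> int:
    """Whole memory accesses of access_ns that fit in window_ns, with zero overhead."""
    return int(window_ns // access_ns)

for grade in (45, 55, 70, 110):
    print(grade,
          accesses_per_window(CPU_CYCLE_NS, grade),
          accesses_per_window(PPU_FETCH_NS, grade))
```

Even on these optimistic numbers, a 110ns part leaves only a handful of slots per CPU cycle to share between the CPU side, the PPU side, and any extra tricks.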
The largest asynchronous ROMs available right now max out at 256MB, and those are only available in BGA, and they're really expensive. Any release that wants more than 64MB of capacity per bus is going to find it much cheaper to put one or two DRAM(s) on the board and load data into them from commodity NAND flash (i.e. SD cards).
(Right now, ignoring Digi-Key's close-out prices - by which I mean, when they charge the same amount for 1 of a thing or 10k of a thing, which indicates they just want to get rid of it -
256MB NOR flash is around $14/@10k to $20/@1
128MB NOR flash is around $8/@10k to $10/@1.
64MB NOR flash is around $6/@10k to $8/@1, with some significant cost savings and availability for recently discontinued parts ($3.5/@100) )
Large modern DRAMs can sustain incredibly high bandwidths, but latency has basically not improved over the past 20 years: it still takes somewhere around 100ns from when the command to fetch data is issued to when one can start getting it out. Non-parallel nonvolatile memories, such as SPI NOR flash, have similar problems, but there it's just due to the time it takes to get the address into the memory. (We've seen a bunch of OneBus famiclones sold this past year, and a user on the forum here wrote a program to build NOR flash images for them: viewtopic.php?t=19581 . These have to be running those SPI NOR flash at 80+MHz and in 4 data bits mode.)
Unmanaged NAND flash isn't directly suitable: it's sold with a specific fault rate expected, so you can't put the same data in the same order on every IC for distribution. You can buy "known good die" grades, but that significantly increases the cost.
"What's the most insane part? Well, bus conflicts."
There was an aborted commercial product to augment an Atari 2600 that relied on this; the 6507 thought it was executing a long stream of STA $nn, where $nn was fed by the expansion hardware and the data cycle was overridden with bus conflicts.
"Could a cartridge specifically designed to cause bus conflicts cause them reliably and safely?"
... maaaaybe.
This is ultimately a thermal problem. Can we override the 2A03 in a way that won't cause too much heat? I really don't know.
Part of the problem is that the PPU's exact timing is cranky. Some parts only care about the value that propagated inside by the time the 74LS139 stops selecting the PPU. Other parts are asynchronous. And other parts are synchronous to the PPU's pixel clock. The only reliable thing may well be driving the data bus for the entire cycle.
"If we could force ones on both buses, we could get to three. Even if that would only cover PPU registers, PCM could update more often."
Ones are basically right out.