Why not mapper 11 or 66 ?

Bregalad
Posts: 8055
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Post by Bregalad »

If I remember correctly, mapper #11 is basically a nybble-swapped mapper #66 with lockout defeat, so it's not much more interesting to use than #66.

As for #66, I believe there are two fundamental reasons it's less often used:

1) Simply put, it's much deeper in the iNES mapper numbering than other common mappers. For that reason, it is more likely to be ignored or not considered as an option, even though it's an official Nintendo mapper.
2) 8 KB CHR-ROM banking is often impractical, as it means that whole sprite sheets have to be swapped together with BG sheets. For smaller, simpler games which fit in 32 KB PRG (CNROM) it's not much of an issue, as it's possible to have simple gameplay and either 3 or 4 level graphics layouts. But for larger games which use more PRG, it becomes a handicap to have only 4 pages of CHR-ROM which have to be switched as a whole, as opposed to finer-grained CHR-ROM switching.
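
The "nybble-swapped" relationship mentioned above can be sketched in a few lines of Python. The register layouts used here (mapper 66 as `xxPP xxCC`, mapper 11 as `CCCC LLPP`) are not given in this thread; they are the layouts as commonly documented for GxROM and Color Dreams boards and should be treated as assumptions, and the function names are made up for illustration.

```python
# Assumed register layouts (not stated in the thread):
#   mapper 66 (GxROM):        xxPP xxCC -> PRG bank in bits 4-5, CHR bank in bits 0-1
#   mapper 11 (Color Dreams): CCCC LLPP -> CHR bank in bits 4-7, lockout in bits 2-3,
#                                          PRG bank in bits 0-1

def decode_mapper66(value):
    """Split a mapper 66 register write into its bank fields."""
    return {"prg": (value >> 4) & 0x03, "chr": value & 0x03}


def decode_mapper11(value):
    """Split a mapper 11 register write into its bank and lockout fields."""
    return {
        "prg": value & 0x03,
        "lockout": (value >> 2) & 0x03,
        "chr": (value >> 4) & 0x0F,
    }


def swap_nybbles(value):
    """Exchange the high and low 4 bits of a byte."""
    return ((value << 4) | (value >> 4)) & 0xFF


if __name__ == "__main__":
    gx_write = 0x21  # mapper 66: PRG bank 2, CHR bank 1
    print(decode_mapper66(gx_write))
    # The same banks come out of mapper 11 once the nybbles are swapped.
    print(decode_mapper11(swap_nybbles(gx_write)))
```

Under these assumptions the only extras mapper 11 offers are two more CHR bits and the lockout-defeat bits, which fits the remark that it is not much more interesting than #66.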
na_th_an
Posts: 558
Joined: Mon May 27, 2013 9:40 am

Post by na_th_an »

dougeff wrote:
but seems like most don't agree or fully understand what to do in order to take advantage of 32KB banking
1. Copy some bank switch code into RAM.
2. Jump there to switch banks, then jsr to an entry point in the new bank, which reads the game state and uses a jump table to get to the appropriate subroutine.
3. Have reset code in every PRG bank which can return you to the main bank if the user presses reset.

The advantage of 32k PRG banks would be to have a contiguous code block or data block that doesn't need to be split awkwardly.
Music code and data might benefit from having its own 32k block.

The disadvantage would be, DMC samples might have to be copied in more than 1 bank, wasting space.
And, CHR banks are less efficient, if you have the same graphics partially copied in multiple 8k CHR banks.

I suppose, you lose mirroring changes, without the MMC. And no scanline counter.

EDIT - every bank needs an NMI handler too, which does not bode well for "all in the NMI" style programs
You don't need to be that fiddly - I mean having to copy the routine to RAM. I usually reserve some bytes right next to the vectors in each ROM and put the exact same bankswitching code there. When you change banks, the new bank has the exact same contents at those addresses, so there are no problems.

I usually work with a set of NROMs I paste together using a custom tool which writes the correct iNES header as well.
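
The custom tool itself is not shown, so the following is only a minimal sketch of what such a paste-together step might look like. It assumes every input is an NROM image with a 16-byte iNES header, 32K of PRG and 8K of CHR, and that the output should list all PRG banks first and all CHR banks after; `paste_nroms` and `build_ines_header` are hypothetical names.

```python
HEADER_SIZE = 16
PRG_SIZE = 32 * 1024  # one 32K PRG page per input ROM (assumption)
CHR_SIZE = 8 * 1024   # one 8K CHR page per input ROM (assumption)


def build_ines_header(prg_banks_16k, chr_banks_8k, mapper, vertical_mirroring=False):
    """Return a 16-byte iNES header; only mapper numbers 0-255 are handled."""
    flags6 = ((mapper & 0x0F) << 4) | (1 if vertical_mirroring else 0)
    flags7 = mapper & 0xF0
    return b"NES\x1a" + bytes([prg_banks_16k, chr_banks_8k, flags6, flags7]) + bytes(8)


def paste_nroms(images, mapper=11):
    """Join several NROM images into one ROM: header, then all PRG, then all CHR."""
    prg = bytearray()
    chr_data = bytearray()
    for image in images:
        if len(image) != HEADER_SIZE + PRG_SIZE + CHR_SIZE:
            raise ValueError("expected a header + 32K PRG + 8K CHR image")
        prg += image[HEADER_SIZE:HEADER_SIZE + PRG_SIZE]
        chr_data += image[HEADER_SIZE + PRG_SIZE:]
    # The iNES header counts PRG in 16K units, so each 32K page counts twice.
    header = build_ines_header(2 * len(images), len(images), mapper)
    return header + bytes(prg) + bytes(chr_data)
```

Usage would be something like `open("out.nes", "wb").write(paste_nroms([open(p, "rb").read() for p in paths]))`, with the file names being placeholders.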

INL's extended mapper 11 gives you up to 16 32K PRG-ROM pages and 16 8K CHR-ROM pages with bus conflicts, but you don't usually need every combination of PRG and CHR accessible from every bank, so I use an indexed table of the actual values. Such a table can live anywhere in ROM. So my setup is something like this:

cc65 cfg:

MEMORY {
    [...]
    RJM: start = $ffc0, size = $3a, file = %O, fill = yes;
    [...]
}

SEGMENTS {
    [...]
    ROMCHGR:  load = RJM, type = rw;
    [...]
}

crt0.s:

.segment "RODATA"
[...]
	; This can be big so place here
	.include "bus_conflict_tbl.s"

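.segment "ROMCHGR"
; The same code sits at the same address in every PRG bank,
; so execution carries on after the bank changes.
_change_rom:
	lda #0
	sta PPU_MASK			; rendering off
	sta PPU_CTRL			; NMI off

	ldx $0300			; index of the wanted PRG:CHR combination
	lda bus_conflict_tbl, x
	sta bus_conflict_tbl, x		; write onto the identical ROM byte (bus conflict safe)

	jmp start

_change_reg:
	ldx $0300
	lda bus_conflict_tbl, x
	sta bus_conflict_tbl, x
	rts
"bus_conflict_tbl.s" has the table with the PRG/CHR combinations I need in every PRG bank. _change_rom does the bank switch, then jumps to "start", which contains the initialization code. _change_reg just bankswitches and returns, which is mostly used when you are only changing the paged CHR-ROM.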

This is an example of "bus_conflict_tbl.s"

	; This ROM pages in PRG0:CHR0, PRGD:CHRD or PRGB:CHRD

	bus_conflict_tbl:
		.byte $00, $DD, $DB
		
My initialization clears all RAM except a small section I use so the different ROMs can communicate. I perform a simple CRC-like check on these values to validate them. If an invalid combination is found, all banks page in PRG0:CHR0, which works great as the mapper doesn't guarantee an initial state.
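
Why the `lda tbl,x` / `sta tbl,x` pair is safe on a board with bus conflicts can be shown with a small Python model. It assumes the usual simplification that a conflict leaves the bitwise AND of the ROM byte and the CPU byte on the bus, and it splits each table entry the way the example comment does ($DB meaning PRGB:CHRD, so PRG in the low nybble and CHR in the high nybble). The function names are made up for illustration.

```python
BUS_CONFLICT_TBL = [0x00, 0xDD, 0xDB]  # same bytes as the example table


def bus_conflict_write(rom_byte, cpu_byte):
    """Value the mapper latch sees when ROM and CPU both drive the bus (modelled as AND)."""
    return rom_byte & cpu_byte


def split_combo(value):
    """Return (PRG page, CHR page), following the $DB = PRGB:CHRD convention."""
    return value & 0x0F, (value >> 4) & 0x0F


def change_reg(index, table=BUS_CONFLICT_TBL):
    """Mimic `lda tbl,x` / `sta tbl,x`: the written byte equals the ROM byte it lands on."""
    value = table[index]
    latched = bus_conflict_write(rom_byte=table[index], cpu_byte=value)
    return split_combo(latched)


if __name__ == "__main__":
    print(change_reg(2))  # PRG page 0xB, CHR page 0xD
    # Writing the same value onto an unrelated ROM byte can corrupt it:
    print(hex(bus_conflict_write(rom_byte=0x0F, cpu_byte=0xDB)))
```

Because the write always targets a ROM byte that already holds the value being written, the AND leaves it unchanged, which is the whole point of indexing into a table instead of writing to an arbitrary address.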
supercat
Posts: 161
Joined: Thu Apr 18, 2019 9:13 am

Post by supercat »

rainwarrior wrote:There's a whole bunch of pretty sensible discrete logic mappers. If you're asking why more people aren't using them, I'd actually say that the reason might just be that there's more discrete mappers than homebrew that needs them. :P
Were there any discrete-logic based mappers that provided an IRQ or other means to assist vertical screen splits? It would seem like one could get a mapper that combined CHR-ROM banking with IRQ generation using only two common 74-series chips. Use a 74HC688 (8-bit identity comparator) to detect a PPU address of the form 01 111x xxx1 x00x and use that to gate an enable-controlled flip flop like the 74HC377 (a smaller one might also suffice) that captures bits 0 and 5-8 of the address. Feed the LSB of that flip flop to /IRQ, and the remaining bits to CHR-ROM address lines.

If tiles use the lower address range, or if code avoids even-numbered background tiles $E0-$FE, background tile fetches would never enable the flip flop. The flip flop would be enabled, however, when accessing the first two lines of an 8x16 sprite using tile $F0-$FF. The first such fetch would turn on /IRQ and the second would automatically turn it off. The bottom four bits of the tile number would be loaded into four bits of the CHR-ROM address, thus making it possible to cleanly split the screen between zones using different tile sets merely by placing a sprite at each zone boundary. If one of the sixteen tile banks was blank, that could easily be used to clean up the top and bottom edges of vertically-scrolling games that use vertical mirroring, even without the main CPU having to handle the IRQ. If a game used eight-way scrolling along with a "score" zone, it could start the frame with a blank CHR-ROM bank selected, use a sprite to trigger a split just above where the score is supposed to appear, use a second sprite to trigger a split at the bottom of the score which would load scrolling registers, a third if needed at the point where the name table would need to wrap to avoid the score zone, and a fourth to switch back to the empty CHR-ROM bank.

Alternatively, if a game doesn't use scrolling and avoids even-numbered background tiles $E0-$FE, one could use those tile numbers as the cue to switch banks, again with the CPU either using or ignoring the IRQ as appropriate for the application.

While this approach would require that one forego using part of the tile set, the restriction would seem less severe than that imposed by some other mappers that require that all sprites be placed in the opposite side of the address space from background tiles.
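
To see which pattern fetches the proposed comparator would catch, the address pattern can be enumerated in Python. This sketch reads "01 111x xxx1 x00x" as PPU A13 down to A0 and assumes the standard pattern-table address layout (A12 = table half, A11-A4 = tile number, A3 = bit plane, A2-A0 = row). The latch model (A0 to /IRQ, A5-A8 to the CHR bank) follows the description above, and all names are made up for illustration.

```python
PATTERN = "01111xxxx1x00x"  # "01 111x xxx1 x00x" with spaces removed, A13 first


def matches(addr):
    """True if a 14-bit PPU address fits the comparator pattern."""
    bits = format(addr & 0x3FFF, "014b")
    return all(p == "x" or p == b for p, b in zip(PATTERN, bits))


def decode(addr):
    """Split a pattern-table address into its fields (assumed standard layout)."""
    return {
        "table": (addr >> 12) & 1,
        "tile": (addr >> 4) & 0xFF,
        "plane": (addr >> 3) & 1,
        "row": addr & 0x07,
    }


def latch(addr):
    """What the flip flop would capture: A0 drives /IRQ, A5-A8 select the CHR bank."""
    return {"irq_pin": addr & 1, "chr_bank": (addr >> 5) & 0x0F}


hits = [a for a in range(0x4000) if matches(a)]
tiles = sorted({decode(a)["tile"] for a in hits})
rows = sorted({decode(a)["row"] for a in hits})

if __name__ == "__main__":
    print(len(hits), [hex(t) for t in tiles], rows)
```

Read this way, the pattern as written selects rows 0 and 1 of the odd-numbered tiles $E1-$FF in the $1000 table, with /IRQ pulled low on row 0 and released on row 1. That is not quite the set of tile numbers named in the post, so the exact tile range is best treated as still open.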
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

supercat wrote:Were there any discrete-logic based mappers that provided an IRQ or other means to assist vertical screen splits?
Plenty of pirate mapper hacks used fixed, some-power-of-two CPU-cycle IRQs with discrete logic – usually a CD4020. Extremely few were clocked off the PPU (example), and no historical mapper ever used a "catch exactly one address" scheme to trigger an IRQ. I think it was just too expensive in parts (for a discrete logic implementation) or in package pins (for an ASIC) to be justifiable in comparison to other implementations.
supercat wrote:Use a 74HC688 (8-bit identity comparator)
Those aren't particularly affordable... much like the 74'670 that showed up in a bunch of historical unlicensed discrete logic mappers, programmable logic is just so much cheaper now that many discrete logic parts don't make sense.
supercat wrote:a PPU address of the form 01 111x xxx1 x00x
Note that any sprite slot not used on any given scanline out of the limit of 8 instead fetches from tile $FF. I don't know which row of that tile.
supercat wrote:less severe than that imposed by some other mappers that require that all sprites be placed in the opposite side of the address space from background tiles.
That's really not as big of a benefit as you might think; the PPU makes it less useful to use the same side for both because you don't get to pick and choose from tile to tile. Plus the most conspicuous example of this constraint - MMC3 - already has more sophisticated banking available underneath, largely obviating this utility.
NewRisingSun
Posts: 1510
Joined: Thu May 19, 2005 11:30 am

Post by NewRisingSun »

Mapper 163 can be made to auto-switch between two 4 KiB CHR banks by looking at PA9 during nametable reads. Mapper 518 can be made to auto-switch between two 4 KiB CHR banks by looking at PA10 or PA11 during nametable reads. But both are globtop ASICs, not discrete mappers.
supercat
Posts: 161
Joined: Thu Apr 18, 2019 9:13 am

Post by supercat »

lidnariq wrote:
supercat wrote:Were there any discrete-logic based mappers that provided an IRQ or other means to assist vertical screen splits?
Plenty of pirate mapper hacks used fixed, some-power-of-two CPU-cycle IRQs with discrete logic – usually a CD4020. Extremely few were clocked off the PPU (example), and no historical mapper ever used a "catch exactly one address" scheme to trigger an IRQ.
How did such mappers control when the counters ran so as to allow them to generate interrupts at the proper places?
Use a 74HC688 (8-bit identity comparator)
Those aren't particularly affordable... much like the 74'670 that showed up in a bunch of historical unlicensed discrete logic mappers, programmable logic is just so much cheaper now that many discrete logic parts don't make sense.

Digi-key shows $0.75 or so in onesies, or $0.30 in quantity 2000. One could replace the 74HC688 with a 13-input NAND and a few inverters, if desired, and I agree programmable logic is also often a good way of doing things. I was more interested in whether anyone had used any sort of address-triggered approach, than in the exact chips they might have used.
a PPU address of the form 01 111x xxx1 x00x
Note that any sprite slot not used on any given scanline out of the limit of 8 instead fetches from tile $FF. I don't know which row of that tile.

I hadn't thought of that, but that might be a problem with using that particular tile range. Using a different tile range would alleviate the issue in any case. What do you think of the concept from an ease-of-programming standpoint? If nobody's implemented such a thing yet, would you see any particular problem? Incidentally, I'm new to the NES, but I've done some rather interesting 6502 mappers for the Atari 2600. One of my favorites simplifies the act of plotting a pixel at coordinate (x,y) of a 96x192 high-res screen to--quite literally:

lda $7F00,x ; Fetch bitmask and set bank of $7E00 as appropriate for this vertical stripe
ora $7E00,y
sta $7E00,y

That mapping scheme required a Xilinx XC9536XL, but ended up making many things more convenient on the Atari's 13-bit address space than they would have been on a straight linear-mapped 16-bit address space. One of my ambitions is to port (and finish) Ruby Runner, a game shown below for the 2600 which used the above mapper (not for high-res graphics, though, but to expedite the loop that had to fetch every tile in the level and decide what to do with it). I'd like to do smooth vertical and horizontal scrolling, but that would pose some interesting challenges given the need to redraw all of the tiles every "gametick" (four animation frames). While I might use a sprite for the player, everything else would be background tiles.

[Image: animated GIF of Ruby Runner for the Atari 2600]
less severe than that imposed by some other mappers that require that all sprites be placed in the opposite side of the address space from background tiles.
That's really not as big of a benefit as you might think; the PPU makes it improbable you'd want to use the same side for both because you don't get to pick and choose from tile to tile. Plus the most conspicuous example of this constraint - MMC3 - already has more sophisticated banking available underneath, largely obviating this utility.

When using 8x16 sprite mode, 128 tiles will be in one bank and 128 in the other. So games needing ready access to more than 128 sprites might need to take them from both halves of the address space.

In any case, my question was whether a simple banking approach could minimize the amount of circuitry required for raster effects.
nesrocks
Posts: 563
Joined: Thu Aug 13, 2015 4:40 pm
Location: Rio de Janeiro - Brazil

Post by nesrocks »

My Ghostbusters ROM hack became mapper 66 when the ROM was expanded by NewRisingSun for the new DPCM.

I'm going to say that I don't know how to choose a mapper. I wish there was some program where we could check or choose mapper features and then it would tell you which mappers matched those features. It's pretty tough to read up on 500 mappers to find out which ones are similar to what you're using or to even know what's possible with current mappers.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

supercat wrote:How did such mappers control when the counters ran so as to allow them to generate interrupts at the proper places?
They're not free-running. They usually are something like "latch output connected to 4020 asynchronous clear; some high bit of counter connected via inverter to /IRQ".

Since 1024 cycles is almost exactly 9 scanlines (plus 3 pixels) it's not too bad of a constraint.

Licensed PAL (2A07) is a lot less convenient, but I don't know how many unlicensed games showed up in those regions for play on the 2A07 instead of a Famiclone.
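
The "9 scanlines plus 3 pixels" figure, and why PAL is less convenient, can be checked with a short calculation. The constants are assumptions not stated in the thread: 341 PPU dots per scanline, 3 dots per CPU cycle on NTSC (2A03) and 16/5 dots per CPU cycle on PAL (2A07).

```python
from fractions import Fraction

DOTS_PER_SCANLINE = 341
DOTS_PER_CPU_CYCLE = {
    "NTSC": Fraction(3),     # 2A03: 3 PPU dots per CPU cycle
    "PAL": Fraction(16, 5),  # 2A07: 3.2 PPU dots per CPU cycle
}


def irq_period(cpu_cycles, region):
    """Return (whole scanlines, leftover dots) covered by a CPU-cycle counter period."""
    dots = cpu_cycles * DOTS_PER_CPU_CYCLE[region]
    lines, leftover = divmod(dots, DOTS_PER_SCANLINE)
    return int(lines), leftover


if __name__ == "__main__":
    for region in ("NTSC", "PAL"):
        lines, leftover = irq_period(1024, region)
        print(region, lines, "scanlines +", float(leftover), "dots")
```

On NTSC the leftover is only 3 dots, so a 1024-cycle period drifts very little against the raster; under the PAL assumption the leftover is more than half a scanline, which matches the remark that the 2A07 is a lot less convenient.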
supercat wrote:I was more interested in whether anyone had used any sort of address-triggered approach, than in the exact chips they might have used.
Not beyond the MMC2/MMC4 tile-based bankswitching.

The logic you suggest fits nicely inside a GreenPAK ... could just check for the top two scanlines of one tile. Say $FE, to be like MMC2.

That said, I don't usually look as high as qty 2k when I'm eyeballing BOM prices for NES things. I don't think very many games sell more than a couple hundred.
supercat wrote:What do you think of the concept from an ease-of-programming standpoint? If nobody's implemented such a thing yet, would you see any particular problem?
It's "easier" than most IRQs in that you just have to make sure to put a sprite in the right place. But...

Each IRQ per retrace uses up one of the not-very-many 64 entries
You have to make sure there's no chance of it getting bumped out of the 8-per-scanline limit, and it consumes some of the very limited overdraw if you do.

I personally think those compromises make it kinda tough to swallow. I think I'd prefer MMC3, even with the flaw you've identified.
supercat wrote:Incidentally, I'm new to the NES, but I've done some rather interesting 6502 mappers for the Atari 2600.
I see your name in the credits for the Harmony Cart. I'm confident you have interesting thoughts to share :)
supercat wrote:I'd like to do smooth vertical and horizontal scrolling, but that would pose some interesting challenges given the need to redraw all of the tiles every "gametick" (four animation frames).
The NES answer is "just use CHR bankswitching for that".
supercat wrote:When using 8x16 sprite mode, 128 tiles will be in one bank and 128 in the other. So games needing ready access to more than 128 sprites might need to take them from both halves of the address space.
But you can only display 128 total tiles of sprites in 8x16 at a time anyway. And your overdraw is very limited - a lot of games used 8x8 sprites even though that consumed more OAM entries, because it both reduced the scanlines of overdraw constraint and because it means that there aren't tons of empty spots in the CHR table.

There aren't a ton of circumstances where MMC3 CHR bankswitching is available and you need many different possible sprites and you don't want to set CHR banks to overlap where they refer to in CHR. The only one that comes to mind is deliberately only using three 1 KiB banks for backgrounds, and using the last 1 KiB plus the other two 2KiB ... but that's kinda contrived. It'd be better just to use a mapper that actually gave you eight 1 KiB banks instead, like VRC4 or RAMBO-1.

Any mapper that supports CHR bankswitching comes extremely close to making two bits in $2000 irrelevant.
supercat wrote:In any case, my question was whether a simple banking approach could minimize the amount of circuitry required for raster effects.
Maybe? I mean, it lets you avoid having the counter on the cart, because it's hidden in the OAM evaluation. But it's not usually the counter that's the hard part.
nesrocks wrote:I'm going to say that I don't know how to choose a mapper. I wish there was some program where we could check or choose mapper features and then it would tell you which mappers matched those features.
I have a table, but it might be too full of jargon.

Tepples had written a selector, but it was written before various modern homebrew designs hit mass production, so I don't know if he'd revise that now.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

I wrote Mapper wizard back when discretes (0, 2, 3, 7, 34, 180) and MMC1 were the only mappers you could get as all new parts, and before retroUSB discontinued the ReproPak.
za909
Posts: 248
Joined: Fri Jan 24, 2014 9:05 am
Location: Mijn hart woont al in Nederland

Post by za909 »

I can agree that UxROM is very attractive to jump to after messing around with NROM. If you look at it, some of the early "great games" of the system used it, like Castlevania or Mega Man. It's much easier to get into at first, since you don't have to worry about any kind of volatile trampoline routine to avoid "pulling the rug" from under your program. You can always stash another "I don't wanna do math" lookup table in that 16k, or samples, or whatever else you might need at any time.

But then again, I'm saying that while my current project uses BxROM (1-bit register for 2x32k PRG, 8k CHR-ROM that is not switched).
supercat
Posts: 161
Joined: Thu Apr 18, 2019 9:13 am

Post by supercat »

lidnariq wrote:
I'd like to do smooth vertical and horizontal scrolling, but that would pose some interesting challenges given the need to redraw all of the tiles every "gametick" (four animation frames).
The NES answer is "just use CHR bankswitching for that".
CHR bank switching will suffice for three out of four animation frames. If the fourth animation frame of one gametick had three tiles stacked in the pattern (top down) "rock leaving", "rock entering", and "empty", then the top tile will need to switch from the last frame of "rock leaving" to "empty", the middle tile will need to switch from the last frame of "rock entering" to the first frame of "rock leaving", and the bottom tile will need to switch from "empty" to the first frame of "rock entering". Even if only half of the name-table entries would need to change, I would think that updating 256 entries per frame using something like:

lda $C0
sta $2007
eor #$01
sta $2007
lda $C1
sta $2007
eor #$01
sta $2007
...
lda $C0
eor #$02
sta $2007
eor #$01
sta $2007
lda $C1
eor #$02
sta $2007
eor #$01
sta $2007
....

would take less time spread over the course of three frames to update a 32x24 bunch of tiles (in an off-screen nametable) than would be needed to selectively update half the tiles in such a table, especially since the latter approach would require that every tile actually get updated twice (if table 0 is showing and 1 is offscreen, it would need to be updated in table 1 which could then be brought to the front, but would then need to be written in table 0).
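
A rough cycle budget for the unrolled writes above can be worked out in Python. The instruction timings are the standard 6502 ones; the vblank length (20 NTSC scanlines of 341 dots at 3 dots per CPU cycle) is an assumption added here and not a figure from the thread, and the tally ignores $2006 address setup, OAM DMA and everything else that has to share vblank.

```python
from fractions import Fraction

CYCLES = {"lda_zp": 3, "eor_imm": 2, "sta_abs": 4}

# lda $C0 / sta $2007 / eor #$01 / sta $2007
PAIR_PLAIN = ["lda_zp", "sta_abs", "eor_imm", "sta_abs"]
# lda $C0 / eor #$02 / sta $2007 / eor #$01 / sta $2007
PAIR_FLIPPED = ["lda_zp", "eor_imm", "sta_abs", "eor_imm", "sta_abs"]


def cost(sequence):
    """Total CPU cycles for one unrolled group of instructions."""
    return sum(CYCLES[op] for op in sequence)


TILES_TOTAL = 32 * 24                      # the 32x24 area from the post
FRAMES = 3                                 # spread over three frames
tiles_per_frame = TILES_TOTAL // FRAMES
pairs_per_frame = tiles_per_frame // 2     # each group writes two tiles

best_case = pairs_per_frame * cost(PAIR_PLAIN)
worst_case = pairs_per_frame * cost(PAIR_FLIPPED)
VBLANK_CYCLES = Fraction(20 * 341, 3)      # assumed NTSC vblank length

if __name__ == "__main__":
    print(tiles_per_frame, best_case, worst_case, float(VBLANK_CYCLES))
```

Under these assumptions 256 tiles per frame cost somewhere between 1664 and 1920 cycles against roughly 2273 cycles of vblank, so the scheme looks feasible on paper but leaves little room for anything else during vblank.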

A mapper which could include some dual-port storage (interleaving PPU and CPU cycles) could make things much more convenient, but might be seen as cheating even though some FPGAs which include 7K of RAM cost less than $2. Unfortunately, those parts all have evil packages, and have inputs that are 3.3V tolerant but not 5V tolerant.

Is there any way to use burst DMA to update anything other than OAM entries? If I were using a mapper with RAM at $6000, burst DMA would seem like it could ease display updates, though I think putting 64 meta-tiles in zero page each frame and then using code to copy them to 256 nametable entries would probably be adequate.
Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Post by Memblers »

supercat wrote: Digi-key shows $0.75 or so in onesies, or $0.30 in quantity 2000. One could replace the 74HC688 with a 13-input NAND and a few inverters, if desired, and I agree programmable logic is also often a good way of doing things. I was more interested in whether anyone had used any sort of address-triggered approach, than in the exact chips they might have used.
There were the MMC2 and MMC4. In 2011 I also prototyped a mapper called 8T-ROM which used the 13-input NAND plus an XC9536XL. I was trying to set it up so certain tile numbers could trigger bankswitching and IRQ at the same time. I don't clearly remember there being any technical reason that stopped development; it was mostly lack of time for the hobby around that time (had a baby). The mapper is in a kind of development hell where I still think it's neat, but I think I have better ways of doing stuff, between GreenPAKs and FPGAs.

I think it watched tiles $FC through $FF and I was wanting it to do multiple CHR switches per line. With the special tiles being the borders, you could make a window.

For IRQ use, one could make sprite #0 use tile $FE. It beats polling for sprite #0 hit, but it's not an ideal IRQ source. I think it will trigger during hblank, so you may have to wait until the next hblank. Easy to use, but I don't think it's much to get excited about, since most of the interest in IRQs is for multiple splits per screen. You could put multiple sprites with that tile, but it will be triggered 8 times per sprite. If that could be pre-scaled by 8, so it would be triggered only once per sprite, that would be an improvement.
supercat
Posts: 161
Joined: Thu Apr 18, 2019 9:13 am

Post by supercat »

lidnariq wrote:That said, I don't usually look as high as qty 2k when I'm eyeballing BOM prices for NES things. I don't think very many games sell more than a couple hundred.
The main screen showed 1pc and 2kpc prices. Intermediate quantities sell for intermediate prices. It looks like 100pc price is $0.42; not the cheapest part in the universe, but not outrageous. My thought of using a 13-input NAND and some inverters seems bizarrely worse, since the cheapest 13-input NAND is over $2 at quantity 1000(!?). In any case, I'll look at MMC2 and MMC4 to see what they do.
What do you think of the concept from an ease-of-programming standpoint? If nobody's implemented such a thing yet, would you see any particular problem?
It's "easier" than most IRQs in that you just have to make sure to put a sprite in the right place. But...

Each IRQ per retrace uses up one of the not-very-many 64 entries
You have to make sure there's no chance of it getting bumped out of the 8-per-scanline limit, and it consumes some of the very limited overdraw if you do.
If one puts the necessary sprites at the start of the list I think that would protect them against being bumped. In most situations where I would think one might want to switch many times per frame, background tiles could be used for that (e.g. games that are laid out in 16x16 metatiles could switch back and forth between two tile sets at the start of each name table row, thus allowing code to store the same data into even and odd rows of the nametable, while almost doubling the number of usable tiles).
I personally think those compromises make it kinda tough to swallow. I think I'd prefer MMC3, even with the flaw you've identified.
No mapper can be expected to be optimal for every game. I was pondering what kind of discrete-logic mapper would best suit the needs of Ruby Runner. If I extended things out to use a CPLD and an on-cart nametable RAM, I could make things work even better by having one of the latchable bits gate address bit 5 of the nametable RAM, thus eliminating the need to write half the rows.

BTW, one thing I've almost never seen 6502 mappers other than mine do on any platform, even though it makes things very convenient and efficient from a programming standpoint, is have 256-byte regions that can be mapped on 256-byte boundaries. Having a piece of code treat a data structure as occupying 128 banks of 256 bytes each is simpler than trying to have it treat data as two banks of 64 pages of 256 bytes each. My 2600 banking scheme only had one fine-control 256-byte region, but IIRC worked around that by making any CPU cycle in the address range $00FC to $00FF configure the fine-control region to access the page of RAM indicated by the byte that was read, and $00F8-$00FB do likewise selecting flash. Thus, one way to copy 256 bytes from one arbitrary page of RAM to another would be:

    ldy #127
loop:
    bit $FC   ; Assumes source page is stored in $FC
    lda $7E00,y
    ldx $7E80,y
    bit $FD   ; Assume destination page is stored in $FD
    sta $7E00,y
    txa
    sta $7E80,y
    dey
    bpl loop
A total cost of 3+4+4+3+5+2+5+2+3 = 11+10+10 = 31 cycles per 2 bytes. The use of absolute indexed addressing saves a cycle compared with indirect indexed, partially recouping the cost of bank switching. If I had enough logic to hold two page selections in my mapper, things could have been a bit better:

    ldy #63
loop:
    lda $7E00,y
    sta $7F00,y
    lda $7E40,y
    sta $7F40,y
    lda $7E80,y
    sta $7F80,y
    lda $7EC0,y
    sta $7FC0,y
    dey
    bpl loop
Achieving the same performance in a linear-address system would require using self-modifying code, which would add the overhead of setting up the function in RAM. When using fine-grained bank selection, such issues go away and the above code could execute out of ROM with no difficulty. Have you seen any Nintendo mappers based on page-level banking?
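
The two copy loops above can be compared by tallying instruction timings. This sketch uses standard 6502 cycle counts and assumes that no indexed access crosses a page boundary (true for the addresses and index ranges shown) and that the branch is taken; the list names are made up for illustration.

```python
# Standard 6502 timings, no page crossings, branch taken.
CYCLES = {
    "bit zp": 3,
    "lda abs,y": 4,
    "ldx abs,y": 4,
    "sta abs,y": 5,
    "txa": 2,
    "dey": 2,
    "bpl taken": 3,
}

# First loop: one fine-control window, re-pointed with BIT twice per iteration.
ONE_WINDOW_LOOP = [
    "bit zp", "lda abs,y", "ldx abs,y",
    "bit zp", "sta abs,y", "txa", "sta abs,y",
    "dey", "bpl taken",
]

# Second loop: two independently mapped windows, four bytes per iteration.
TWO_WINDOW_LOOP = ["lda abs,y", "sta abs,y"] * 4 + ["dey", "bpl taken"]


def cycles_per_iteration(loop):
    """Sum the cycle cost of one pass through the loop body."""
    return sum(CYCLES[op] for op in loop)


def cycles_per_byte(loop, bytes_per_iteration):
    """Average cost of copying a single byte."""
    return cycles_per_iteration(loop) / bytes_per_iteration


if __name__ == "__main__":
    print(cycles_per_iteration(ONE_WINDOW_LOOP), cycles_per_byte(ONE_WINDOW_LOOP, 2))
    print(cycles_per_iteration(TWO_WINDOW_LOOP), cycles_per_byte(TWO_WINDOW_LOOP, 4))
```

The first loop comes to the 31 cycles per 2 bytes quoted above (15.5 per byte), while the two-window version works out to 41 cycles per 4 bytes (10.25 per byte).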
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

supercat wrote:[a fully unrolled LDA #immed / STA $2007 copy] would take less time spread over the course of three frames to update a 32x24 bunch of tiles (in an off-screen nametable) than would be needed to selectively update half the tiles in such a table, especially since the latter approach would require that every tile actually get updated twice (if table 0 is showing and 1 is offscreen, it would need to be updated in table 1 which could then be brought to the front, but would then need to be written in table 0).
I mean, sure, if you insist on designing your game to rely on being able to brute-force update the entire visible game state on every 4th frame, you can design a cart that has enough RAM to support it. But there was a licensed port of Boulder Dash published on the NES and it doesn't use anything resembling such heroics.

Most of the time, a design that uses the two nametables as a means of double-buffering is trying to make the NES act like some other console rather than work within the limited bandwidth.

And to be fair, when I optimized Driar from its original SGROM release down to NROM, I did something similar, using 1K of the CPU's RAM to hold fully-unrolled copying code to do updates to nametables, to work around no longer having meaningful CHR bankswitching.
supercat wrote:A mapper which could include some dual-port storage (interleaving PPU and CPU cycles) could make things much more convenient, but might be seen as cheating even though some FPGAs which include 7K of RAM cost less than $2. Unfortunately, those parts all have evil packages, and have inputs that are 3.3V tolerant but not 5V tolerant.
Unfortunately those cheaper ones might not have enough I/O pins. If you're willing to limit it to just licensed NESes (no famiclones) and are willing to guess where the ALE cycles are to demultiplex the PPU's address bus and are willing to make the CPU side interface a PITA, you need at least 10(PPU A9, A8, PPU AD7..0)+2(PPU /RD, PPU /WR)+8(CPU D7..0)+2(CPU A0,1)+2(M2,R/W)=24 IO pins. While there are iCE40UL parts in that range, one'd probably prefer to have all the CPU/PPU address/data pins to make the programmer's life less miserable. Which gets us back to the iCE40xx1K parts. At least some come in a TQFP...

Also, personally, I kinda think using an FPGA as only a dual ported RAM is a waste of an FPGA.
supercat wrote:Is there any way to use burst DMA to update anything other than OAM entries?
Nope.

You can map your own cart device to additionally listen to writes to $2004, but that's it.
supercat wrote:It looks like 100pc price is $0.42; not the cheapest part in the universe, but not outrageous. My thought of using a 13-input NAND and some inverters seems bizarrely worse, since the cheapest 13-input NAND is over $2 at quantity 1000(!?).
Waaacky. Looks like everyone's trying to clear out their inventory; the only outfit with a low price is Rochester Electronics, for the LS version of the part, and only as a "you buy our remaining inventory".
supercat wrote:Having a piece of code treat a data structure as occupying 128 banks of 256 bytes each is simpler than trying to have it treat data as two banks of 64 pages of 256 bytes each.
I must be missing something... how does being able to bankswitch on A8 and up help with this particular transformation?
supercat wrote:Have you seen any Nintendo mappers based on page-level banking?
No licensed mappers used anything finer than 8 KiB. And to the best of my knowledge, the finest banking seen in any pirate mapper hack is 1KiB.
supercat
Posts: 161
Joined: Thu Apr 18, 2019 9:13 am

Post by supercat »

lidnariq wrote:I mean, sure, if you insist on designing your game to rely on being able to brute-force update the entire visible game state on every 4th frame, you can design a cart that has enough RAM to support it. But there was a licensed port of Boulder Dash published on the NES and it doesn't use anything resembling such heroics.
I've not played the NES Boulder Dash nor seen any frame-accurate recordings of it. Does it ensure that all displayed tiles get updated on the same frame? Because the tile cycling in Boulder Dash doesn't involve motion between tiles, it probably wouldn't matter visually if some tiles were updated on one frame and some were updated on the next, so I'd guess that's probably what happens. Achieving the smoother motion shown in the Ruby Runner .gif I posted would require, however, that all tile updates occur synchronously with the switch from the last tile set to the first. I think that would probably be achievable even using a basic CNROM cart, but if other mappers could make it easier that would be nice to know.
Most of the time, a design that uses the two nametables as a means of double-buffering is trying to make the NES act like some other console rather than work within the limited bandwidth.
I'd say that would depend on whether the desired play mechanic could be achieved better some other way on the NES. I think the NES hardware would seem like an excellent fit for Ruby Runner save for the difficulties updating name-table RAM, and even with those difficulties I would think it would be workable.
And to be fair, when I optimized Driar from its original SGROM release down to NROM, I did something similar, using 1K of the CPU's RAM to hold fully-unrolled copying code to do updates to nametables, to work around no longer having meaningful CHR bankswitching.
Not familiar with that game.
A mapper which could include some dual-port storage (interleaving PPU and CPU cycles) could make things much more convenient, but might be seen as cheating even though some FPGAs which include 7K of RAM cost less than $2. Unfortunately, those parts all have evil packages, and have inputs that are 3.3V tolerant but not 5V tolerant.
Unfortunately those cheaper ones might not have enough I/O pins. If you're willing to limit it to just licensed NESes (no famiclones) and are willing to guess where the ALE cycles are to demultiplex the PPU's address bus and are willing to make the CPU side interface a PITA, you need at least 10(PPU A9, A8, PPU AD7..0)+2(PPU /RD, PPU /WR)+8(CPU D7..0)+2(CPU A0,1)+2(M2,R/W)=24 IO pins. While there are iCE40UL parts in that range, one'd probably prefer to have all the CPU/PPU address/data pins to make the programmer's life less miserable. Which gets us back to the iCE40xx1K parts. At least some come in a TQFP...
The I/O requirements for main CPU interfacing could be reduced by 5 if one adds a 74HC299 universal shift register (reads and writes will be separated by at least 3 main-CPU clocks that don't read or write the register, giving the FPGA enough time to get data to/from the shift register).

As I think about it, though, I wonder if the best way to make a cheap but versatile Nintendo cart might be to adapt the same approach used by the Atari 2600 melody cart, using one 70MHz ARM7TDMI or similar device on each bus, and maybe running an SPI port between them.
Is there any way to use burst DMA to update anything other than OAM entries?
Nope.

You can map your own cart device to additionally listen to writes to $2004, but that's it.
That seems like a missed opportunity in the NES design. If the same 6502 address had been used for OAM and PPU data, with the set-address write selecting which kind of data would be written, that would have freed up a 6502 address while also enhancing the usefulness of DMA. Oh well.
Having a piece of code treat a data structure as occupying 128 banks of 256 bytes each is simpler than trying to have it treat data as two banks of 64 pages of 256 bytes each.
I must be missing something... how does being able to bankswitch on A8 and up help with this particular transformation?
Because the upper byte of the 6502 address will be constant.

If one has a 64KiB data structure on a cart starting at address $010000 which is using an 8K banked region from $8000-$9FFF, and wants to fetch a byte given at offset X:Y, the required code would be something like:

Code: Select all

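    txa             ; X = bits 8-15 of the offset
    lsr
    lsr
    lsr
    lsr
    lsr             ; A = bits 13-15: which 8K bank within the structure
    ora #8          ; structure starts at $010000, i.e. at 8K bank 8
    sta $8000       ; select the bank
    txa
    and #$1F        ; bits 8-12: page within the 8K bank
    ora #$80        ; banked region starts at $8000
    sta temp+1
    lda #0
    sta temp
    lda (temp),y    ; Y = bits 0-7 of the offset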
as compared with something like:

Code: Select all

    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    lda $7C00,y ; uses LSB of address, plus last value accessed at $FC, plus $010000.
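As a rough cross-check of the two address splits, here is a small Python model (not NES code). It assumes the 8K window sits at $8000-$9FFF and that the page-mapped window at $7C00-$7CFF is relative to $010000; the function names are made up for illustration.

```python
# Toy model of the two address splits; not NES code.
# Assumptions (hypothetical): the 64 KiB structure starts at ROM offset
# 0x010000, the 8 KiB window is CPU $8000-$9FFF, and the page-mapped
# window is CPU $7C00-$7CFF.

STRUCT_BASE = 0x010000


def split_8k(x, y):
    """Return (bank, cpu_addr) for offset X:Y with 8 KiB banks."""
    bank = (x >> 5) | 8                        # five LSRs, then ORA #8
    cpu_addr = 0x8000 | ((x & 0x1F) << 8) | y  # 5-bit page, 8-bit byte
    return bank, cpu_addr


def rom_offset_8k(bank, cpu_addr):
    """ROM offset selected by an 8 KiB bank number and a CPU address."""
    return bank * 0x2000 + (cpu_addr & 0x1FFF)


def split_page(x, y):
    """Return (page, cpu_addr) for offset X:Y with 256-byte pages."""
    return x, 0x7C00 | y                       # X goes straight to the page register


def rom_offset_page(page, cpu_addr):
    """ROM offset selected by a page number and a CPU address."""
    return STRUCT_BASE + (page << 8) + (cpu_addr & 0xFF)


def both_agree():
    """Check every X:Y offset against the flat ROM offset under both schemes."""
    for x in range(256):
        for y in range(256):
            want = STRUCT_BASE + (x << 8) + y
            if rom_offset_8k(*split_8k(x, y)) != want:
                return False
            if rom_offset_page(*split_page(x, y)) != want:
                return False
    return True
```

In this model both schemes reach the same byte; the difference is only in how much work the CPU does to split the offset.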
One could replace the shifts in the first example with a table lookup, but I think the second would still seem a lot easier. The "normal" banking approach would require that the offset be split into an 8-bit part, a 5-bit part, and a 3-bit part, rather than simply being kept as two eight-bit parts. The page-level granularity could be especially useful if one had multiple adjacent banking regions. If one wanted to load x, y, and a with three consecutive bytes at an offset specified by x:y, the code could be something like:

Code: Select all

    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    inx
    stx $FD ; Set bits 8-15 of address for $7D00-$7DFF region
    ldx $7C00,y
    lda $7C01,y
    sta temp
    lda $7C02,y
    ldy temp
Note that this code will work even if the object crosses a page boundary. Compare that to what would be needed to fetch three consecutive bytes using normal banking if one had to allow for the possibility of crossing a block boundary.
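The page-boundary claim can be sanity-checked with a toy Python model of the two adjacent windows. This is only a sketch: the register behaviour, the $010000 base, and all names here are assumptions for illustration, not a description of any real mapper.

```python
# Toy model of the three-byte fetch through two adjacent page-mapped windows.
# Hypothetical setup: registers at $FC/$FD pick the ROM page seen at
# $7C00-$7CFF and $7D00-$7DFF, relative to a structure at ROM 0x010000.

STRUCT_BASE = 0x010000


def make_rom():
    """Build a ROM image with an arbitrary test pattern in the structure."""
    rom = bytearray(0x020000)
    for i in range(0x10000):
        rom[STRUCT_BASE + i] = (i * 7 + (i >> 8)) & 0xFF
    return rom


def cpu_read(rom, pages, addr):
    """Read CPU address addr in $7C00-$7DFF through the page registers."""
    window = (addr >> 8) - 0x7C          # 0 for $7Cxx, 1 for $7Dxx
    return rom[STRUCT_BASE + (pages[window] << 8) + (addr & 0xFF)]


def fetch3(rom, x, y):
    """Mirror the assembly: STX $FC / INX / STX $FD, then three indexed loads."""
    pages = [x, (x + 1) & 0xFF]          # INX wraps at $FF, as on the 6502
    return tuple(cpu_read(rom, pages, 0x7C00 + k + y) for k in range(3))
```

With y = $FE or $FF the indexed loads spill into $7D00-$7D01, which the second register has already pointed at the next page, so the three bytes still come out in order.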
Have you seen any Nintendo mappers based on page-level banking?
No licensed mappers used anything finer than 8 KiB. And to the best of my knowledge, the finest banking seen in any pirate mapper hack is 1KiB.
Bummer. Page-mapped regions are really nice to work with.