Introducing myself. Starting off with mapper selection.

8bitMicroGuy
Posts: 314
Joined: Sun Mar 08, 2015 12:23 pm
Location: Croatia

Post by 8bitMicroGuy »

Hello. I'm an 8-bit microcontroller programmer. I've been programming Atmel AVR microcontrollers. I've been working in ASM and C. I have 5 years of programming experience and also make C++ programs in Windows.

When I was a teen, I played SMB on an emulator and wished to make an NES game.
Soon, I made my own hacks. They were level edits, music edits, text edits and graphics edits, but none of them were ASM edits.
A few years later, I started working in ASM to make some demos and such.
Now I decided to make my own NES game and I've been brainstorming about how things might work.
What I have on my mind are NES games that push the NES resources up to the maximum.

What I need to know is which mapper and compiler suit my needs for these projects.
Please help me find a mapper and a compiler. I need these features:
  • Dynamic mapping of tiles into the PPU (like in Battletoads how Rash's current metasprite tiles are on the same address all the time! Just changing the tiles)
  • Need to somehow get at least (but I prefer more) 16KB of RAM and 16KB of save for a Minecraft-like project (I need to store the blocks somewhere)
  • Mapper with a multiplier while the C compiler must use the mapper's registers for multiplying and not the manual ASM multiplying
  • Extra channels for sound (like shown in FamiTracker)
  • Having the 0xE000-0xFFFF PRG-ROM bank static while other(s) is/are remapped by writing to a certain register
  • Synchronized scroll changing for each scanline during rendering (Does mapper do this or can I do this myself?)
  • Remapping background tiles during rendering between scanlines (Same question)
  • Must be compatible with FamiTracker's bankswitching for executing FamiTracker code and reading FamiTracker music data
  • 4 PPU screens
I can implement the game loop and the game object iteration myself, but I want to know which mapper to use before I start.
Projects on my mind are (sorted from least feasible to most feasible):
  • Team Fortress 2 for NES for 2 players
  • Minecraft for NES with 2 players
  • A Mario kind of game, but you change weapons and suits during the action scene and fight like in Battletoads
Thank you for your time to look at this.
Bregalad
Posts: 8056
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Introducing myself. Starting off with mapper selection.

Post by Bregalad »

Sounds like you want to code for the SNES :mrgreen:
Or write your own mapper and your own compiler, but beware, this is several years of work alone before you are able to write the first line of code for your hypothetical game.

The only currently ready C compiler for the 6502 generates extremely inefficient code and certainly does not take advantage of the MMC5's built-in multiplier.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Introducing myself. Starting off with mapper selection.

Post by lidnariq »

8bitMicroGuy wrote:Dynamic mapping of tiles into the PPU (like in Battletoads how Rash's current metasprite tiles are on the same address all the time! Just changing the tiles)
8bitMicroGuy wrote:Having the 0xE000-0xFFFF PRG-ROM bank static while other(s) is/are remapped by writing to a certain register
8bitMicroGuy wrote:4 PPU screens
Almost any non-trivial mapper supports these. The only question is how many tiles change at the same time. (The best you can get is "upload things to RAM yourself" or "64 tiles at a time")
8bitMicroGuy wrote:Need to somehow get at least (but I prefer more) 16KB of RAM and 16KB of save for a Minecraft-like project (I need to store the blocks somewhere)
That basically limits you to MMC5. (If you're willing to give on "16 KiB of RAM available always", Sunsoft 5B, mapper 30, and mapper 168 will provide larger save blocks)
8bitMicroGuy wrote:Mapper with a multiplier while the C compiler must use the mapper's registers for multiplying and not the manual ASM multiplying
MMC5 and JY-Company, but as Bregalad pointed out, you'll have to use your own (inlineable) function, not the * operator.
8bitMicroGuy wrote:Extra channels for sound (like shown in FamiTracker)
MMC5, VRC6, VRC7, Namco 163, Sunsoft 5B. I'll leave out the FDS's single channel because you said plural.
8bitMicroGuy wrote:Synchronized scroll changing for each scanline during rendering (Does mapper do this or can I do this myself?)
8bitMicroGuy wrote:Remapping background tiles during rendering between scanlines (Same question)
The former has to be done by the CPU. The latter can be done by the MMC2 and MMC4.
8bitMicroGuy wrote:Must be compatible with FamiTracker's bankswitching for executing FamiTracker code and reading FamiTracker music data
You should instead look into the FamiTone replayer.
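
To make the multiplier point concrete, here is a minimal Python model of the idea (a sketch, not MMC5 code). It assumes the commonly documented MMC5 layout, where the two 8-bit factors are written to $5205 and $5206 and the 16-bit product is read back from the same two addresses. `Mmc5MultiplierModel` and `mul8` are hypothetical names made up for this illustration.

```python
class Mmc5MultiplierModel:
    """Toy model of an 8x8 -> 16 bit hardware multiplier (assumed layout)."""

    def __init__(self):
        # Initial factor values are an assumption of this model.
        self.a = 0
        self.b = 0

    def write(self, addr, value):
        if addr == 0x5205:
            self.a = value & 0xFF
        elif addr == 0x5206:
            self.b = value & 0xFF

    def read(self, addr):
        product = self.a * self.b
        if addr == 0x5205:
            return product & 0xFF   # low byte of the product
        if addr == 0x5206:
            return product >> 8     # high byte of the product
        raise ValueError("not a multiplier register")


def mul8(hw, x, y):
    """Explicit helper used in place of the language's * operator."""
    hw.write(0x5205, x)
    hw.write(0x5206, y)
    return hw.read(0x5205) | (hw.read(0x5206) << 8)


hw = Mmc5MultiplierModel()
print(mul8(hw, 200, 123))  # 24600
```

On real hardware the helper would be a small inlineable C function or macro doing the same two writes and two reads, and the game code would have to call it explicitly, since the compiler will not pick it for `*` on its own.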
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Introducing myself. Starting off with mapper selection.

Post by tokumaru »

The only option you have is the MMC5 (a moderately rare mapper that hasn't been fully replicated yet), but even it won't do 100% of the things you're asking for. Keep in mind that extra sound channels don't work on the NES without hardware mods, only on the Famicom.

The SNES will indeed fulfill your list of demands much better.
8bitMicroGuy
Posts: 314
Joined: Sun Mar 08, 2015 12:23 pm
Location: Croatia

Re: Introducing myself. Starting off with mapper selection.

Post by 8bitMicroGuy »

Please excuse me. I made a miscalculation. I thought that one page/screen of blocks has 32x32 blocks, while it actually has 16x16 blocks. This resulted in saying that I need 16K RAM. How embarrassing. When I looked into the SMB3 RAM map on Data Crystal, I saw that the level data is held at $6000-$794F. I divided that by 256 (16x16 blocks) and got 25. This means that I can have 25 pages. So my Minecraft world would be 6x4 pages, which is 96x64 blocks, which is actually very good. The rest of the MMC5 save RAM ($7950-$7FFF) would be used for entity data (Zombies, Chests, Player inventory) and some other game save data.
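
A quick way to double-check this budget is to let a script do the hex arithmetic. This is only a sketch: the addresses come from the paragraph above, and storing one byte per block is an assumption.

```python
# Budget check for the save-RAM layout described above.
# Addresses come from the post; one byte per block is an assumption.
LEVEL_START = 0x6000
LEVEL_END = 0x794F          # inclusive
WINDOW_END = 0x7FFF         # inclusive end of the $6000-$7FFF window
PAGE_W = PAGE_H = 16
PAGE_BYTES = PAGE_W * PAGE_H

level_bytes = LEVEL_END - LEVEL_START + 1
whole_pages = level_bytes // PAGE_BYTES

WORLD_PAGES_X, WORLD_PAGES_Y = 6, 4
world_bytes = WORLD_PAGES_X * WORLD_PAGES_Y * PAGE_BYTES
world_size = (WORLD_PAGES_X * PAGE_W, WORLD_PAGES_Y * PAGE_H)

# Bytes left after the level area, i.e. $7950-$7FFF.
entity_bytes = WINDOW_END - LEVEL_END

print(level_bytes, whole_pages)   # 6480 25
print(world_size, world_bytes)    # (96, 64) 6144
print(entity_bytes)               # 1712
```

So a 6x4-page world uses 6144 of the 6480 level bytes, and 1712 bytes remain after $794F for entities and other save data.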
lidnariq wrote:
8bitMicroGuy wrote:Dynamic mapping of tiles into the PPU (like in Battletoads how Rash's current metasprite tiles are on the same address all the time! Just changing the tiles)
Having the 0xE000-0xFFFF PRG-ROM bank static while other(s) is/are remapped by writing to a certain register
4 PPU screens
Almost any non-trivial mapper supports these. The only question is how many tiles change at the same time. (The best you can get is "upload things to RAM yourself" or "64 tiles at a time")
Let's say we have a Tanooki Mario metasprite. If all sprites are 8x16 (which they must be because of some requirements of the MMC5 mapper for what I want), then the metasprite has 6 sprites: 2 for the head, 2 for the body and 2 for the tail. If it has 6 sprites, then, according to my rule, it can only have 12 sprite tiles allocated in the VRAM where each is 8x8 (of course). No more, no less! Having an addressed bank of 64 sprites for only one game object instance is a big waste of memory! That's why I want to do it Battletoads-style: One instance, one metasprite, only used sprite tiles are kept in VRAM. This will be the best solution for having SMB3 enemies from different worlds without having conflicts where the enemy A uses bank #Foo, the enemy B uses bank #Bar, and the enemy C wants bank #Baz, but can't because banks can only be allocated 64 by 64.

My configuration is that every game object instance allocates a maximum of 12 or 16 sprite tiles in VRAM. Those tile addresses belong to that instance until it changes to a metasprite with a different number of sprites or until the instance is destroyed. It works like malloc/free, new/delete, or a garbage collector. (Yay malloc for the NES lol XD)
The first bank address space will be used for the sprites that are always there (Players, HUD, Cursor, etc.) (like .data segment), the second and third will be the heap (Enemies, Items, etc.) (like the .heap segment) and the fourth will be the one from which I copypaste the sprites into the other bank address spaces.
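
The malloc/free idea for tiles could look roughly like the sketch below. It is written in Python only to show the bookkeeping, not as NES code. `TileAllocator` is a hypothetical name, and the heap range (tiles 64 to 191, the second and third 64-tile banks) is an assumption based on the layout above.

```python
class TileAllocator:
    """First-fit allocator over a run of pattern-table tile indices."""

    def __init__(self, first_tile, tile_count):
        # Free list of (start, length) runs, kept sorted by start.
        self.free = [(first_tile, tile_count)]

    def alloc(self, count):
        """Return the first tile index of a free run, or None if full."""
        for i, (start, length) in enumerate(self.free):
            if length >= count:
                if length == count:
                    del self.free[i]
                else:
                    self.free[i] = (start + count, length - count)
                return start
        return None

    def release(self, start, count):
        """Give a run back and merge it with adjacent free runs."""
        self.free.append((start, count))
        self.free.sort()
        merged = []
        for s, n in self.free:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + n)
            else:
                merged.append((s, n))
        self.free = merged


heap = TileAllocator(first_tile=64, tile_count=128)
mario = heap.alloc(12)        # 64
hammer_bro = heap.alloc(16)   # 76
heap.release(mario, 12)       # the 12 tiles at 64 become free again
```

Like `free` with an explicit size, the caller has to remember how many tiles each instance holds. Note that this only tracks which tile slots are in use; the tile data still has to be uploaded to VRAM by the CPU.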

However, I was wondering whether there is a mapper where the CPU doesn't have to do the copypasting, and the mapper instead instantly puts a tile from a bank directly into VRAM at the needed position.

The number of tiles that need to change at the same time would be... Let's say, Raccoon Mario, 2 fireballs, 2 Hammer Brothers, 4 Hammers. 12+2*2+2*4+4*2. That's 32 tiles per frame.
tokumaru wrote:The only option you have is the MMC5 (a moderately rare mapper that hasn't been fully replicated yet), but even it won't do 100% of the things you're asking for. Keep in mind that extra sound channels don't work on the NES without hardware mods, only on the famicom.

The SNES will indeed fulfill your list of demands much better.
Well, as long as emulators can do it, I'm happy. Can FCEUX emulate MMC5 including, but not limited to the extra sound channels?
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Introducing myself. Starting off with mapper selection.

Post by tepples »

8bitMicroGuy wrote:Well, as long as emulators can do it, I'm happy.
If you're not interested in running your game on a cartridge, you could always make a game for the Java or .NET platform. JVM is an emulator, and CLR is an emulator.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Introducing myself. Starting off with mapper selection.

Post by tokumaru »

8bitMicroGuy wrote:This resulted in saying that I need 16K RAM.
This is the least of your worries, seeing as the MMC5 can address up to 64KB of RAM.
8bitMicroGuy wrote:That's why I want to do it Battletoads-style: One instance, one metasprite, only used sprite tiles are kept in VRAM.
Then you definitely have to use CHR-RAM, because no existing mapper can switch chunks smaller than 1KB (64 tiles), even though it's technically possible.

Don't forget that VRAM bandwidth is severely limited (you can only write to VRAM during VBlank or forced blanking), and changing 12 tiles means writing 192 bytes to the PPU. With the fastest code possible copying that data from ROM (i.e. a string of LDA $XXXX; STA $2007), that would take 1536 CPU cycles, or 68% of the total VBlank time (about 2273 cycles), which leaves little time for other updates (OAM, name + attribute tables, palettes).

Common techniques to increase the amount of data you can send to VRAM are:

1- Sacrificing part of the picture with forced blanking to get more time;
2- Copying the data to ZP beforehand so you can transfer each byte in 7 cycles instead of 8;
3- Using insane amounts of RAM to generate strings of LDA #$XX; STA $2007 so each byte is transferred in 6 cycles;
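
For reference, the per-byte costs above can be tabulated with a small script. This is a rough sketch: the load timings are the standard 6502 cycle counts (LDA absolute 4, LDA zero page 3, LDA immediate 2, STA absolute 4), the 2273-cycle VBlank figure is the approximate NTSC number quoted above, and `upload_cost` is a hypothetical helper name.

```python
VBLANK_CYCLES = 2273      # approximate NTSC vblank budget quoted above
BYTES_PER_TILE = 16

# Cycles per byte = load cost + 4 cycles for STA $2007 (absolute store).
CYCLES_PER_BYTE = {
    "LDA abs from ROM": 4 + 4,
    "LDA zp from zero page": 3 + 4,
    "LDA #imm from code in RAM": 2 + 4,
}


def upload_cost(tiles, cycles_per_byte):
    """Return (cycles, share of vblank) for uploading `tiles` tiles."""
    cycles = tiles * BYTES_PER_TILE * cycles_per_byte
    return cycles, cycles / VBLANK_CYCLES


for name, cpb in CYCLES_PER_BYTE.items():
    cycles, share = upload_cost(12, cpb)
    print(f"{name}: {cycles} cycles, {share:.0%} of vblank")
```

For 12 tiles this gives 1536, 1344 and 1152 cycles, so even the fastest variant eats roughly half of an NTSC VBlank before OAM, nametable and palette updates are counted.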
8bitMicroGuy wrote:My configuration is that every game object instance allocates a maximum of 12 or 16 sprite tiles in VRAM. Those tile addresses belong to that instance until it changes to a metasprite with a different number of sprites or until the instance is destroyed.
No matter the technique you use, it's safe to say you won't be updating several characters every frame. If you're lucky, you'll be able to update one without the need to double-buffer it. Enemies aren't typically animated like this for this reason. Actually, few games on the NES use this method of animation for anything.
8bitMicroGuy wrote:However, I was wondering whether there is a mapper where the CPU doesn't have to do the copypasting, and the mapper instead instantly puts a tile from a bank directly into VRAM at the needed position.
AFAIK, there are no mappers that will help you with this.
8bitMicroGuy wrote:The number of tiles that need to change at the same time would be... Let's say, Raccoon Mario, 2 fireballs, 2 Hammer Brothers, 4 Hammers. 12+2*2+2*4+4*2. That's 32 tiles per frame.
32 tiles * 16 bytes * 8 cycles = 4096 cycles, with the fastest possible code copying directly from ROM to VRAM. You'd have to blank a little over 16 scanlines in order to be able to transfer this much data, but that's not counting the other updates (OAM, NT + AT, palettes).

If you have 2560 bytes of RAM to spare, you can update the 32 tiles in only 3072 cycles, in which case you'd only need 8 or so blanked scanlines (again, not considering the other VRAM updates).
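
The scanline estimates can be reproduced with a few lines of arithmetic. This is a sketch under these assumptions: NTSC timing (341 dots per scanline, 3 dots per CPU cycle), the 2273-cycle VBlank figure from above, and 5 bytes of code per data byte for the unrolled LDA #imm / STA $2007 pairs. Both function names are hypothetical.

```python
VBLANK_CYCLES = 2273
DOTS_PER_SCANLINE = 341
DOTS_PER_CYCLE = 3
BYTES_PER_TILE = 16


def extra_blanked_scanlines(tiles, cycles_per_byte):
    """Return (total cycles, whole scanlines to blank beyond vblank)."""
    cycles = tiles * BYTES_PER_TILE * cycles_per_byte
    overflow = max(0, cycles - VBLANK_CYCLES)
    # Ceiling of overflow * 3 / 341, done in integer math.
    lines = -(-overflow * DOTS_PER_CYCLE // DOTS_PER_SCANLINE)
    return cycles, lines


def unrolled_code_bytes(tiles):
    """RAM used by unrolled LDA #imm (2 bytes) + STA abs (3 bytes) pairs."""
    return tiles * BYTES_PER_TILE * 5


print(extra_blanked_scanlines(32, 8))  # (4096, 17)
print(extra_blanked_scanlines(32, 6))  # (3072, 8)
print(unrolled_code_bytes(32))         # 2560
```

The 8-cycle case overflows VBlank by about 16.04 scanlines, which is why it rounds up to 17 whole lines here.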
8bitMicroGuy wrote:Can FCEUX emulate MMC5 including, but not limited to the extra sound channels?
I think so, but FCEUX isn't particularly accurate. Don't rely solely on it for development.
rainwarrior
Posts: 8735
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Introducing myself. Starting off with mapper selection.

Post by rainwarrior »

I must admit, I don't really understand your aim, wanting to target the NES, but not wanting to be bound by its usual constraints... but good luck on your game! (Are you making a game?)
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Introducing myself. Starting off with mapper selection.

Post by tepples »

If you're willing to budge on the use of MMC5 expansion audio in an English-language game, here's how to make CHR RAM work for sprite animation.

The vast majority of CHR RAM boards have 8 KiB. The sprite half of the pattern table ($1000-$1FFF) holds 256 different tiles, which is enough for 128 distinct 8x16 pixel sprites, twice as many as the NES can show. Thus you can double buffer the sprite cels. For example, while displaying a frame of animation with 6 sprites stored at $1000-$10BF, you can load the next frame of animation into $1800-$18BF and then switch to those tiles once the upload is complete. If you don't extend blanking, a moderately unrolled loop can copy 10 tiles to VRAM per vblank, enough for five 8x16 pixel sprites, and still have time to copy the display list to OAM.
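
A small sketch of the double-buffering bookkeeping described above, in Python for illustration only. The buffer bases $1000 and $1800 and the 10 tiles per vblank figure come from the paragraph; `plan_next_frame` is a hypothetical helper name.

```python
import math

BUFFER_BASES = (0x1000, 0x1800)   # two halves of the sprite pattern table
TILE_BYTES = 16
TILES_PER_VBLANK = 10             # estimate quoted above


def plan_next_frame(shown_buffer, sprite_count):
    """Plan the upload of the next animation frame into the hidden buffer.

    Returns (hidden buffer index, first VRAM address, last VRAM address,
    number of vblanks the upload is spread over).
    """
    hidden = 1 - shown_buffer
    tiles = sprite_count * 2      # each 8x16 sprite uses two tiles
    start = BUFFER_BASES[hidden]
    end = start + tiles * TILE_BYTES - 1
    vblanks = math.ceil(tiles / TILES_PER_VBLANK)
    return hidden, start, end, vblanks


hidden, start, end, vblanks = plan_next_frame(shown_buffer=0, sprite_count=6)
print(hidden, hex(start), hex(end), vblanks)  # 1 0x1800 0x18bf 2
```

Once the upload has finished, the OAM entries are pointed at the tiles in the newly filled buffer and the two buffers swap roles.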
8bitMicroGuy wrote:Raccoon Mario, 2 fireballs, 2 Hammer Brothers, 4 Hammers. 12+2*2+2*4+4*2. That's 32 tiles per frame.
Provided that all of them are animating at 60 fps. If they're animating at 15 fps, you can spread the 32 tiles of updates across several vblanks. Or you can statically allocate space for often-reused frames such as hammers and fireballs.

But given how intrusive the MMC5 is on other aspects of rendering, I'm not entirely sure MMC5 even allows CHR RAM.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Introducing myself. Starting off with mapper selection.

Post by lidnariq »

tepples wrote:But given how intrusive the MMC5 is on other aspects of rendering, I'm not entirely sure MMC5 even allows CHR RAM.
People have made reproductions of the Megaman−∞ ROM hack, which uses CHR RAM, and had it work, so... at least that isn't a problem.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Introducing myself. Starting off with mapper selection.

Post by tokumaru »

rainwarrior wrote:I must admit, I don't really understand your aim, wanting to target the NES, but not wanting to be bound by its usual constraints... but good luck on your game!
With such high technical requirements, this better be the ultimate NES game... if it ends up looking like a normal game it will be a huge letdown. I mean, I hope you have something really cool planned for the MMC5 multiplications, because most games work just fine without any multiplications at all, and when they're absolutely necessary, look-up tables or plain multiplication algorithms are perfectly acceptable.

IMO, most of the fun in coding for old consoles is about working around the limitations and designing creative solutions that will allow you to do things that aren't straightforward on those platforms, so beefing up the hardware in an attempt to get around the limitations kills all the fun. You might as well code the game for a much less restricted environment, especially considering you're not particularly interested in running it on real hardware.
8bitMicroGuy
Posts: 314
Joined: Sun Mar 08, 2015 12:23 pm
Location: Croatia

Re: Introducing myself. Starting off with mapper selection.

Post by 8bitMicroGuy »

tepples wrote:
8bitMicroGuy wrote:Well, as long as emulators can do it, I'm happy.
If you're not interested in running your game on a cartridge, you could always make a game for the Java or .NET platform. JVM is an emulator, and CLR is an emulator.
Java has a garbage collector, but I like it when I can free/delete objects using free/delete function/statement when I want and not when those tangly references are released. If Java has no pointers, but only references, then it's not for me.

Also, woah... I didn't actually realize that this copypasting would take away so many cycles! I guess I'll have to stick with bankswitching. I've been expecting so much out of this poor, little, primitive and tired 6502 machine :(
While studying the Battletoads ROM, I found out that the only tiles that are being copypasted are the noise generator (used for the bird pilot guy's screen), waterfall tiles and player tiles. All enemies use their own tiles which are static.
When I saw Tom and Jerry ROM, Jerry moves very smoothly, but there's no copypasting, just bankswitching for every 5th frame. Jerry's metasprite is 3x4 tiles if I'm not mistaken. That's 12 tiles. One bank has 64 tiles. 64 divided by 12 is 5+1/3.
This means that Jerry's walking animation uses two banks.

For the Minecraft project, I took 4 hours to write up this big document:

Code:

Minecraft for NES - Sprite analysis

All sprites have 2 tiles because they're 8x16.
There are 4 bank address spaces.
All bank address spaces are switching constantly to make a walking effect for walking animations and breathing effect for steady creatures.
There are some tiles that are "copypasted" using this mechanism:
1. Bank address space 0 gets switched to the bank from which to copy
2. Needed data gets loaded from PPU address space into a buffer in RAM
3. Data from buffer gets written into PPU address space from 228th tile to 256th
4. Bank address space 0 gets switched back to its bankswitch animation default

Tiles from 36th to 64th in the banks being mapped to the bank address space 3 are filled with color 2 with an X in color 0 for detection of failure in copypasting in time.

These are the tile allocations:
Players = 4(head)*2(for each player)*2(normal, hurt)+4(body)*12(steady/breathing, walking, running, jumping, falling, mining, hurt, crouching, crawling, crouching and mining, holding on ladder/vines, climbing on ladder/vines) = 64
Sheep = 4(head)*2(normal, hurt)+6(body)*3(steady/breathing, walk, hurt) = 26
Pig = 4(head)*2(normal, hurt)+6(body)*3(steady/breathing, walk, hurt) = 26
Cow = 4(head)*2(normal, hurt)+6(body)*3(steady/breathing, walk, hurt) = 26
Zombie = 4(head)*2(normal, hurt)+4(body)*2(steady/shaking, walking) = 16
Skeleton = 4(head)+4(body)*2(steady, walking) = 12
Creeper = 4(head)+4(body)*2(steady, walking) = 12
Enderman = 4(head)+8(body)*2(steady, walking) = 20
Villager = 4(head)+4(body)*3(stop, walkA, walkB) = 16
Explosion = 4 (will be mirrored, is animated by bankswitch) = 4
Flames = 4 (will be mirrored, is animated by bankswitch) = 4
Shattered block piece = 2(that's the minimum of one sprite and metasprite, will be animated as a polar rotating piece of block by bankswitch) = 2
Cracking block animation = 4*2(for each player) = 8 (will be copypasted because it cannot be synchronous with the bankswitch animation)
Dropped items = 2(that's the minimum of one sprite and metasprite, they are copypasted on need)*8 = 16
Held items = 2(same)*2(for each player) = 4

Copypasted tiles = Cracking block animation + Dropped items + Held items = 8 + 16 + 4 = 20

Sprite Palette 0 = Orange, Red, Brown (Explosion, Fire, Wood, Wooden planks, Wheat, Bucket with Lava)
Sprite Palette 1 = White, Pink, Dark brown (Skeleton, Pig, Enderman, Pink dyed sheep, Player torso, Raw beef, Flowers)
Sprite Palette 2 = White, Green, Dark blue (Creeper, Cow, Green dyed sheep, Zombie, Player torso, Seeds, Bucket, Bucket with Water)
Sprite Palette 3 = White, Skin color, Dark gray (Sheep, Player head, Villager, Cracking block animation, Shattered block)
tokumaru wrote:
The number of the tiles in the need to change at the same time would be... Let's say, Raccoon Mario, 2 fireballs, 2 Hammer Brothers, 4 Hammers. 12+2*2+2*4+4*2. That's 32 tiles per frame.
32 tiles * 16 bytes * 8 cycles = 4096 cycles, with the fastest possible code copying directly from ROM to VRAM. You'd have to blank a little over 16 scanlines in order to be able to transfer this much data, but that's not counting the other updates (OAM, NT + AT, palettes).

If you have 2560 bytes of RAM to spare, you can update the 32 tiles in only 3072 cycles, in which case you'd only need 8 or so blanked scanlines (again, not considering the other VRAM updates).
So how much would I need to sacrifice from the scanlines if I have 20 tiles? How many cycles? The game will run in PAL mode which means 50fps and it will not blank full frames. Maybe just a bit from the scanlines. Game code must run in each frame, not every second or third, etc.
I'll have source address selection code as explained in "copypasting", then I'll copy the tiles from VROM into a buffer by using

Code:

LDA $2006 ; is it $2006 for loading from VROM?
STA $8000 ; PRG-RAM from MMC5
LDA $2006
STA $8001
; etc...
I'd just find the tile and then do this without loops to save cycles.
Then I'll just do

Code:

LDA $8000
STA $2007
LDA $8001
STA $2007
; etc...
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Introducing myself. Starting off with mapper selection.

Post by lidnariq »

8bitMicroGuy wrote:So how much would I need to sacrifice from the scanlines if I have 20 tiles? How many cycles? The game will run in PAL mode which means 50fps and it will not blank full frames. Maybe just a bit from the scanlines. Game code must run in each frame, not every second or third, etc.
2C07 (PAL) mode is kinda cheating, even if it's somewhat of a tradition.
To do the math:
70 scanlines of vblank × 341 = 23870 pixels ÷ 3.2 = 7459 cycles
-514 for OAM DMA = 6945 cycles
Fastest possible bulk upload, fully unrolled, has to be in RAM: 6 cycles per byte: 1157 bytes; consumes almost 6kB of RAM.
Unrolled, but can be in ROM: 8 cycles per byte: 868 bytes; consumes 5kB of ROM.
Not unrolled at all, but uploads have to be page-aligned (i.e. $XX00): 13 cycles per byte: 534 bytes; consumes 9 bytes of ROM.

If you're going to target the 2C02 and Dendy instead, it's
20 scanlines × 341 = 6820 pixels ÷ 3 = 2273 cycles - 514 for OAM DMA = 1759 cycles
The same numbers then become 293, 219, and 135.

Note that each time you change the address you're writing to, that consumes the time that two bytes would have taken.
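
As a cross-check, the budget above can be computed with a short script. This is a sketch: the scanline counts, dot rates and the 514-cycle OAM DMA cost are the figures from this post, `bytes_per_vblank` is a hypothetical helper name, and the address-change penalty is modelled simply as two bytes lost per change.

```python
from fractions import Fraction

DOTS_PER_SCANLINE = 341
OAM_DMA_CYCLES = 514
METHODS = {"unrolled in RAM": 6, "unrolled in ROM": 8, "looped": 13}


def bytes_per_vblank(scanlines, dots_per_cycle, address_changes=0):
    """Bytes uploadable per vblank for each method, after OAM DMA."""
    cycles = int(scanlines * DOTS_PER_SCANLINE / dots_per_cycle)
    cycles -= OAM_DMA_CYCLES
    return {
        name: cycles // cost - 2 * address_changes
        for name, cost in METHODS.items()
    }


pal = bytes_per_vblank(70, Fraction("3.2"))
ntsc = bytes_per_vblank(20, Fraction(3))
print(pal)   # {'unrolled in RAM': 1157, 'unrolled in ROM': 868, 'looped': 534}
print(ntsc)  # {'unrolled in RAM': 293, 'unrolled in ROM': 219, 'looped': 135}

# 20 tiles are 320 bytes: which methods fit in one vblank?
needed = 20 * 16
print([name for name, n in pal.items() if n >= needed])   # all three
print([name for name, n in ntsc.items() if n >= needed])  # none
```

So 20 tiles per frame fit within a PAL vblank even with the plain loop, while on NTSC they do not fit with any of the three methods unless blanking is extended.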
8bitMicroGuy wrote:This means that Jerry's walking animation uses two banks.
Remember that sprites can be re-selected from anywhere in their 4 KiB bank on every vblank for ~free (you're probably already uploading new OAM data every frame). So it's not just the four 1 KiB banks but whatever clever management you can do within that bank.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Introducing myself. Starting off with mapper selection.

Post by tokumaru »

8bitMicroGuy wrote:The game will run in PAL mode
PAL has the advantage of a much longer VBlank (70 scanlines vs. 20 of NTSC), allowing way more VRAM updates. That means complete incompatibility with NTSC consoles though, which is not very nice.

Code:

LDA $2006 ; is it $2006 for loading from VROM?
STA $8000 ; PRG-RAM from MMC5
LDA $2006
STA $8001
; etc...
If you're transferring bytes you don't have VROM, only VRAM. Tiles are stored in regular PRG-ROM and you copy them to CHR-RAM.

Also, you can't read $2006; this register is only used to set the PPU address. The data at that address is then read or written through register $2007.

Code:

LDA $8000
STA $2007
LDA $8001
STA $2007
; etc...
I'd recommend using some sort of indexing (along with interleaving the tile data), so you can use the same code to read different tiles each time, instead of having code that can only read from one location.
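
One possible reading of "indexing along with interleaving" is sketched below in Python, just to show the data layout. The assumption is that byte 0 of every tile is stored first, then byte 1 of every tile, and so on, so that a single index register can select the tile while sixteen fixed base addresses select the byte. `interleave` and `read_tile` are hypothetical names.

```python
TILE_BYTES = 16


def interleave(tiles):
    """Store byte 0 of every tile, then byte 1 of every tile, and so on."""
    return bytes(tile[k] for k in range(TILE_BYTES) for tile in tiles)


def read_tile(data, tile_count, index):
    """Fetch one tile back: byte k lives at offset k * tile_count + index."""
    return bytes(data[k * tile_count + index] for k in range(TILE_BYTES))


# Four dummy tiles whose bytes encode (tile number, byte number).
tiles = [bytes(t * 16 + k for k in range(TILE_BYTES)) for t in range(4)]
data = interleave(tiles)
assert read_tile(data, len(tiles), 2) == tiles[2]
```

With a layout like this, the same unrolled upload code can serve any tile in the table, because only the index changes between calls.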
Bregalad
Posts: 8056
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Introducing myself. Starting off with mapper selection.

Post by Bregalad »

8bitMicroGuy wrote: Java has a garbage collector, but I like it when I can free/delete objects using free/delete function/statement when I want and not when those tangly references are released. If Java has no pointers, but only references, then it's not for me.
I am getting completely off-topic, but I agree so much with you here! Java as a language encourages people to write messy and disorganized programs. There are a couple of other, better languages that target the JVM. However, I think tepples's comment about "if you want to write code for an emulator, write for the JVM, because the JVM is an emulator" is ironic and shouldn't be taken literally (your reaction makes it sound like you took it literally).

Tepples' point is that if you don't care about hardware and just want to write a NES-like game that can run on a PC, then just do so (Mega Man 9 comes to mind, for instance). Making it possible on hardware is a very complicated extra step.

Besides, the JVM is not really an emulator: when you start the program, it compiles the bytecode to machine (binary) code and then executes it, which is why starting up a Java program can be extremely slow. Even back when it was interpreting the bytecode, that didn't make it an emulator, because there is no "hardware" equivalent of it, and it just emulates a CPU, not an entire machine.

Also, you keep mentioning Minecraft, but apart from being blocky, its graphics are not possible to render on the NES (just so that you know) because they're 3D.