A note to all NES-emulator authors: ROM header wishlist


proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

Occasionally I think about the state of the .NES 2.0 header. And while it seems like it would work, it also feels like a kluge.

I know that there is probably no chance of this idea being adopted, and that people (including myself) wouldn't be thrilled with "yet another NES header."

Regardless, I feel like it's better to at least get the idea out there. Feel free to shoot it down (but please provide reasons so it can be a learning discussion).

Given the current state of NES format, and the strong desire for backwards compatibility, here's what I would consider ideal.

A structure which looks like this:


struct {
    uint32_t magic;
    uint32_t checksum;
    uint32_t version;
    // ... followed by all the data deemed useful by the community.
};
and can be located anywhere within the file, but probably at the end of the file for maximum compatibility. Here's the concept:

When loading a ROM, you search the file for a magic value. If I were implementing this standard, I would enforce that it must be on a 4-byte boundary to make the search efficient. If found, you read the version number (so you know the size of the structure) and the checksum. Then you CRC what you found. If that matches the checksum, this is your header, and you can ignore the iNES 1.0 header entirely from that point forward.
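The loading steps just described could be sketched roughly as follows. This is only an illustration under several assumptions that the post does not settle: the magic value `EXT_MAGIC`, the 16-byte size of a version-1 header, little-endian fields, and a CRC-32 that covers everything after the checksum field are all placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: the post fixes neither a magic number nor a layout. */
#define EXT_MAGIC   0x5845534Eu  /* arbitrary placeholder */
#define EXT_V1_SIZE 16u          /* assumed total size of a version-1 header */

/* Read a little-endian uint32_t (the byte order is an assumption). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Plain bitwise CRC-32 (reflected form, polynomial 0xEDB88320). */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Map a version number to a header size; 0 means "unknown version". */
static size_t ext_size_for_version(uint32_t version)
{
    return version == 1 ? EXT_V1_SIZE : 0;
}

/* Return the offset of the first header whose checksum verifies, or -1. */
static long find_ext_header(const uint8_t *file, size_t len)
{
    assert(file != NULL);
    for (size_t off = 0; off + 12 <= len; off++) {
        if (read_le32(file + off) != EXT_MAGIC)
            continue;
        size_t size = ext_size_for_version(read_le32(file + off + 8));
        if (size == 0 || size > len - off)
            continue;
        /* Assumed coverage: everything after the checksum field. */
        uint32_t crc = crc32_bytes(file + off + 8, size - 8);
        if (crc == read_le32(file + off + 4))
            return (long)off;
    }
    return -1;
}
```

The search here steps one byte at a time; stepping by 4 instead would implement the alignment rule suggested above.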

There are several benefits to this approach:

* 100% backwards compatible, should not break old emulators
* easy to apply to existing ROMs, you just append it to the ROM file.
* versioned, so we can add new fields as the community strives for more and more accurate emulation.
* because it can be located anywhere, its size is not limited. So if a new version wants to add some new bytes to the structure, it's trivial to do so.
* provides an opportunity to recreate the structure in a more sane way (no more munging together bit fields to create values). For example, I would just use a uint16_t for mapper numbers, and I would use a full uint32_t for PRG and CHR sizes (but with the standard spelling out how to handle non-power of 2 sizes).

The only counter argument I can foresee at the moment is that the loading is relatively complicated in comparison to just "read the first 16 bytes and parse." My response to that is that iNES 1.0 is already fairly complex to parse correctly, as it has been tweaked over time and requires mashing together bit fields for things like mapper numbers. This becomes even more complex as iNES 2.0 comes into play. When comparing the two schemes, a simple linear search followed by a checksum is pretty simple compared to the bit fiddling currently needed to handle things correctly.
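For reference, the bit fiddling in question looks roughly like this for the mapper number alone. The sketch follows the commonly documented iNES / NES 2.0 byte layout; the function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* NES 2.0 is flagged by bits 2-3 of byte 7 being binary 10. */
static int is_nes20(const uint8_t h[16])
{
    return (h[7] & 0x0C) == 0x08;
}

/* Assemble the mapper number from its scattered nibbles. */
static unsigned mapper_number(const uint8_t h[16])
{
    /* Low nibble lives in the top of byte 6, next nibble in the top of byte 7. */
    unsigned mapper = (unsigned)(h[6] >> 4) | (h[7] & 0xF0u);
    if (is_nes20(h))
        mapper |= (unsigned)(h[8] & 0x0F) << 8;  /* bits 8-11 */
    return mapper;
}
```

PRG and CHR sizes need similar stitching in NES 2.0, which is not shown here.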

Thoughts?
proxy
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Re: A note to all NES-emulator authors..

Post by Near »

Only because you asked for feedback ...

> and can be located anywhere within the file

What if two tags appear in the same file? Some emulators will invariably go with the first, others with the last, regardless of what you say in the specification.

> I would enforce that it must be on a 4-byte boundary so make the search efficient.

Other emulators would happily ignore this, and then homebrew makers would ignore it, and then you'd look like the broken implementation for enforcing this rule.

> 100% backwards compatible, should not break old emulators

You have to insert the data somewhere into existing ROMs. If you put it at the end, you will most certainly break some emulators that rely on file sizes instead of header info. Eg one might say "read iNES PRG ROM size, and assume CHR ROM size == remaining file size."

Remember that there are more NES emulators than there are cars. If it can be done, it's been done :P

> easy to apply to existing ROMs, you just append it to the ROM file.

Breaks checksums of files, breaks existing emulator CRC32 lookups.

> versioned, so we can add new fields as the community strives for more and more accurate emulation.

Versioning creates as many problems as it solves. But for something like this, I don't see much choice. Especially when everyone keeps insisting on non-extensible data formats.

> because it can be located anywhere, it's size is not limited. So if a new version wants to add some new bytes to the structure, it's trivial to do so.

The header should just specify the packet size.

...

Anyway, please don't be disheartened by the critiques. Always fun to hear new ideas!

My favorite part of your idea is that it could be embedded in new homebrew, while still keeping the PRG/CHR size exactly right. That would truly be 100% seamless with existing emulators.
proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

byuu wrote:
> Only because you asked for feedback ...

I did, and it is always appreciated. You thought of some things which I hadn't, which is a good thing :-). I do have some responses though.

byuu wrote:
> > and can be located anywhere within the file
> What if two tags appear in the same file? Some emulators will invariably go with the first, others with the last, regardless of what you say in the specification.

That would have to be addressed on a spec level. First matched tag seems reasonable to me as it would be the easiest for an author to implement. Also, it's what other similar specifications have done (Multiboot, ACPI tables, etc.) with much success.

With any spec, there is a reasonable expectation and requirement for people to follow it for it to be useful. It is true that some people will of course not follow it correctly, but I could say the same thing about any spec, even iNES 1.0 and 2.0. People could easily choose to interpret some bits the wrong way and not play the game correctly.

There is a silver lining though, most (if not all) NES emulator authors are members of this forum, so it is at least possible to get the majority of emulators on board with doing it right, which is enough for it to work.
byuu wrote:
> > I would enforce that it must be on a 4-byte boundary so make the search efficient.
> Other emulators would happily ignore this, and then homebrew makers would ignore it, and then you'd look like the broken implementation for enforcing this rule.

Good point, I would have hoped that anyone trying this spec out would do it right, but definitely some people would not. Fortunately, that is nothing more than an optimization which could be dropped. Any alignment would work; doing it by uint32_t's just makes it 4 times faster to locate.

byuu wrote:
> > 100% backwards compatible, should not break old emulators
> You have to insert the data somewhere into existing ROMs. If you put it at the end, you will most certainly break some emulators that rely on file sizes instead of header info. Eg one might say "read iNES PRG ROM size, and assume CHR ROM size == remaining file size."

Interesting point. I know that GoodNES hashes everything after the first 16 bytes, so it would be affected (that's how it determines overdumps). It's a problem, but I don't think it's a major problem. Here's why:

#1 as far as these emulators are concerned, it is no different from an overdump; there is just some extra data at the end.
#2 emulators which ignore the iNES headers and use hashes instead are fairly new and likely still in development to some degree, so we could work with the authors to get it right.
#3 truly legacy emulators (I mean the ones from the 90's) definitely rely on the iNES 1.0 header, so they won't be affected.
#4 every emulator which makes decisions based on hashes would be insane not to fall back on iNES headers if there isn't a match found. That's what would happen here. It's not perfect, but most games are playable under those conditions.
#5 taking into consideration point #4, those same emulators would have to continually update their database of checksums to keep up with new releases or they risk not being able to play new dumps/homebrews. If they are keeping up, they can either add these new "overdumps", or update their emulator to handle it more gracefully.
#6 regarding "read iNES PRG ROM size, and assume CHR ROM size == remaining file size.", that's interesting, and certainly someone has done it. But the resultant CHR ROM size wouldn't be a valid size (not a multiple of 8192). So they would pretty much have to truncate it since they would not have a whole page at the end.
#7 I have an (admittedly stalled) "Open GoodNES" project which aims to be an open source / open DB alternative to GoodNES. This may be a good reason to resume work on it!
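Point #6 can be illustrated with a sketch of the kind of loader byuu describes, one that ignores the CHR count in byte 5 and infers it from the file length. The function name is hypothetical, and trainers are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INES_HEADER_SIZE 16u
#define PRG_BANK_SIZE    16384u  /* iNES byte 4 counts 16 KiB units */
#define CHR_BANK_SIZE    8192u   /* CHR ROM comes in 8 KiB units */

/* Infer the CHR ROM size from the file length instead of header byte 5. */
static size_t chr_size_from_file_length(const uint8_t header[16], size_t file_len)
{
    assert(header != NULL);
    size_t prg = (size_t)header[4] * PRG_BANK_SIZE;
    if (file_len < INES_HEADER_SIZE + prg)
        return 0;
    size_t rest = file_len - INES_HEADER_SIZE - prg;
    return rest - rest % CHR_BANK_SIZE;  /* drop any partial trailing bank */
}
```

With a small appended structure, the remainder is not a whole 8 KiB bank and gets dropped. Note that appended data amounting to a full 8192 bytes or more would still be counted as extra CHR by such a loader.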
byuu wrote:
> > easy to apply to existing ROMs, you just append it to the ROM file.
> Breaks checksums of files, breaks existing emulator CRC32 lookups.

This is pretty much the same as the previous point.

byuu wrote:
> > versioned, so we can add new fields as the community strives for more and more accurate emulation.
> Versioning creates as many problems as it solves. But for something like this, I don't see much choice. Especially when everyone keeps insisting on non-extensible data formats.

Agreed, I think it's necessary, but can be done carefully to make it easy to get right.

byuu wrote:
> > because it can be located anywhere, it's size is not limited. So if a new version wants to add some new bytes to the structure, it's trivial to do so.
> The header should just specify the packet size.

Agreed! Now that I think of it, it should be done similarly to Microsoft APIs: size is the version. The structure should have a size field instead of a version field, and you simply never remove fields, only append to the end (potentially deprecating previous ones). This has worked for almost 30 years, and if there is one thing they succeed at, it's maintaining compatibility with older versions of their API.
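A rough sketch of the "size is the version" idea: the reader copies only as many bytes as both sides know about and zero-fills the rest. Every field name here is hypothetical, `prg_ram_size` stands in for "a field appended in a later revision", and host byte order is assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; only the size-instead-of-version idea is from the post. */
struct ext_header {
    uint32_t magic;
    uint32_t checksum;
    uint32_t size;          /* total bytes written by the producer */
    uint16_t mapper;
    uint16_t reserved;
    uint32_t prg_size;
    uint32_t chr_size;
    uint32_t prg_ram_size;  /* pretend this was appended in a later revision */
};

/* Smallest header a reader accepts: everything before the appended field. */
#define EXT_MIN_SIZE offsetof(struct ext_header, prg_ram_size)

/* Load a header of any revision; fields the file lacks read as zero. */
static int ext_load(struct ext_header *out, const uint8_t *raw, size_t avail)
{
    assert(out != NULL && raw != NULL);
    if (avail < 12)
        return -1;
    uint32_t size;
    memcpy(&size, raw + 8, sizeof size);  /* host byte order assumed */
    if (size < EXT_MIN_SIZE || size > avail)
        return -1;
    memset(out, 0, sizeof *out);
    /* Older files are shorter than our struct, newer ones longer. */
    memcpy(out, raw, size < sizeof *out ? size : sizeof *out);
    return 0;
}
```

An old reader meeting a newer, larger header just ignores the trailing bytes it does not understand, which is why fields can only ever be appended.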

All in all, you make some good critiques, but I don't think any of them would be deal breakers. The need is not to get every emulator to do it right, but only the most popular ones. The rest will follow suit if that happens.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: A note to all NES-emulator authors..

Post by Sik »

I'm surprised byuu didn't ask what would happen if the game itself happens to have something in it that matches the tag starting the metadata. That's pretty much guaranteed to break everything. Heck, it could even be used to ensure emulators can't run the ROMs properly.
zzo38
Posts: 1096
Joined: Mon Feb 07, 2011 12:46 pm

Re: A note to all NES-emulator authors..

Post by zzo38 »

Sik wrote:
> I'm surprised byuu didn't ask what would happen if the game itself happens to have something in it that matches the tag starting the metadata. That's pretty much guaranteed to break everything. Heck, it could even be used to ensure emulators can't run the ROMs properly.

That is why I hate this way of doing it. If more header fields need to be added to NES 2.0 (such as to make NES 2.3 or NES 3.0), then add a header extension table address field in the header. However, I also think many things don't need to be included in the ROM image file, and probably everything that does need to be included is already a part of NES 2.0, so this won't be necessary at all.

I recommend that you never use the database of checksums to identify the file if the header is in NES 2.0 format (as indicated by the version bits). It is probably best not to use it at all unless the user enables and loads the database anyway. NES 2.0 headers should always be assumed correct.
(Free Hero Mesh - FOSS puzzle game engine)
etabeta
Posts: 109
Joined: Wed Nov 29, 2006 10:11 am
Location: Trieste, Italy

Re: A note to all NES-emulator authors..

Post by etabeta »

Sik wrote:
> I'm surprised byuu didn't ask what would happen if the game itself happens to have something in it that matches the tag starting the metadata. That's pretty much guaranteed to break everything. Heck, it could even be used to ensure emulators can't run the ROMs properly.

I agree that this is the biggest issue with proxy's suggestion (much more than a dev stupidly assuming that he can ignore the CHR size in the header, when the format explicitly requires it for a reason ;) and for the record any Vimm's Lair overdump would fail in such an emu as well, since that website was appending a 0x10 watermark to its roms, based on the fact that the format only specifies the PRG/CHR chunks and not the whole rom size, so you can load larger files without most emus even noticing it)
etabeta
Posts: 109
Joined: Wed Nov 29, 2006 10:11 am
Location: Trieste, Italy

Re: A note to all NES-emulator authors..

Post by etabeta »

zzo38 wrote:
> I recommend that you never use the database of checksums to determine the file if the header is NES 2.0 format (as indicated by the version bits). Probably best is don't use it at all unless the user enables and loads the database anyways. NES 2.0 headers should always be assumed correct.

MESS took a more radical approach, actually. For non-homebrew entries we prefer users to adopt separate PRG and CHR files, so that we can throw in the wc and flush away headers (both correct and incorrect), and rely only on the database. Given that years of attempts to spread correct headers have proven useless because most users don't even want to know what headers are and why they need a correct one for the games to work, why not remove the problem entirely and just have to handle good and bad dumps? ;)

iNES is of course fine for homebrew, because one hopes the creator has set up the parameters correctly, but for the rest...

I assure you that we still get reports of people asking why half of their nointro set does not load in MESS, just because they downloaded it 6 years ago when the set was first built and distributed by websites *without* any headers
and people ask why these can't just load like SNES, GB and MD roms (but at least to this I can reply to check their Atari Lynx and Atari 7800 romsets which have the same problem with headers ;) )
zzo38
Posts: 1096
Joined: Mon Feb 07, 2011 12:46 pm

Re: A note to all NES-emulator authors..

Post by zzo38 »

etabeta wrote:
> MESS took a more radical approach, actually. We prefer users to adopt for non-homebrew entries separate PRG and CHR files, so that we can throw in the wc and flush away headers (both correct and incorrect), and rely only on the database. Given that years of attempts to spread correct headers have proven useless because most users don't even want to know what headers are and why they need a correct one for the games to work, why not to remove entirely the problem and just have to handle good and bad dumps?

Well, I suppose you could have an emulator do this if you want to, but if you take this approach, you should probably do the following:
  • Make the database external to the executable but in the same directory and provided with the emulator.
  • Provide information about the database format in order to convert between iNES format and separate PRG/CHR format; you could also provide an extra program to do this.
  • If the database includes the relevant information, also allow .NES.INI file to be created from an entry in the database; this could be done by the same program as the one to convert to/from iNES.
Although doing it your way is OK if you want to, I don't like that way; I think that the user ought to understand these things when downloading a bad dump, or if they are dumping a cartridge they own themselves, to know what the correct settings are (there is the bootgod database to help with this).

Note that GameBoy ROMs already include headers; if you copy it from a cartridge the header will already be there, because Nintendo put them there!
(Free Hero Mesh - FOSS puzzle game engine)
etabeta
Posts: 109
Joined: Wed Nov 29, 2006 10:11 am
Location: Trieste, Italy

Re: A note to all NES-emulator authors..

Post by etabeta »

zzo38 wrote:
>   • Make the database external to the executable but in the same directory and provided with the emulator.
>   • Provide information about the database format in order to convert between iNES format and separate PRG/CHR format; you could also provide an extra program to do this.
>   • If the database includes the relevant information, also allow .NES.INI file to be created from an entry in the database; this could be done by the same program as the one to convert to/from iNES.

about the first 2 points, did you ever attempt to make a search about MESS before posting? ;)
the database is already external, and in xml format (so one can rely on existing parsers to read info from it)
it also was announced to the community long ago: viewtopic.php?f=3&t=6558 and everyone is free to use any info from it whenever they like (even if I'd appreciate people to acknowledge the source of info and share back any fix)
the updated address for the db is: http://git.redump.net/mame/tree/hash/nes.xml

same goes for the info in all the other databases we include with MESS: http://git.redump.net/mame/tree/hash/ (non-NES lists still lack proper documentation of controllers that can be used with a given game, but it will be added in due time)


about the 3rd point: I've been toying for a couple of years with the idea of adding support for a per-game xml file but we are back to the header point... despite trying for a few years, we never succeeded in getting people to fix headers in their roms, so why should they now add an xml (or bml or ini or whatever additional file) to their roms? how would you force them to do that?


zzo38 wrote:
> Note that GameBoy ROMs already include headers; if you copy it from a cartridge the header will already be there, because Nintendo put them there!

you might want to re-read my previous post. all the cases I mentioned (NES, Lynx and A7800) use a format which is not a straight dump from the carts, but rely on bits added by the dumper (or the emu author) in order to set up mappings of the ROM inside the emu memory. As such, even if you get a perfect dump from a cart you own, you cannot use it in emus which don't use hash databases because the binary dump does not contain the mapping data needed for emulation (and typically you blame emulators, not without a point)
OTOH, of course GB roms have a header too, but it's part of the original binary you extract from the cart so it does not have to be added (artificially) by whoever spreads the dumps and does not confuse users...
zzo38
Posts: 1096
Joined: Mon Feb 07, 2011 12:46 pm

Re: A note to all NES-emulator authors..

Post by zzo38 »

etabeta wrote:
> about the first 2 points, did you ever attempt to make a search about MESS before posting? ;)
> the database is already external, and in xml format (so one can rely on existing parsers to read info from it)
> it also was announced to the community long ago: viewtopic.php?f=3&t=6558 and everyone is free to use any info from it whenever they like (even if I'd appreciate people to acknowledge the source of info and share back any fix)

That is a good idea; thanks.

> about the 3rd point: I've been toying for a couple of years with the idea of adding support for a per-game xml file but we are back to the header point... despite trying for a few years, we never succeeded having people to fix headers in they roms, why should they now add an xml (or bml or ini or whatever additional file) to their roms? how would you force them to do that?

In my case the .NES.INI contains only optional information; the game should work without it. The information in the INI might be used to display a title, make a simpler cartridge (or at least give you more options) than what is specified in the iNES header, automatically select the correct input device, dynamically optimize the emulator, add enhancements for the game, use for querying in a database, etc. It is not a requirement. (My own released .NES files include a .NES.INI too, and no programs currently read them; that is perfectly fine. My format is meant to be completely optional on both ends (I am not trying to force anyone to do anything).)

I also suppose that you could use XSLT and XQuery to convert the database entries into an individual XML file for each game if you want to, and if you find such things useful; I also suppose that it would be possible to convert some of the data in such a file to/from .NES format and .NES.INI format, since much of it is the information in the NES 2.0 header while some (such as the <feature name="peripheral"> command) corresponds to things in the .NES.INI format.

Note that although I like to have the PRG and CHR ROM and header and stuff in one iNES file, I would rather have the disk images in a separate file for each side (I have decided to give these files the .QDI extension so you don't confuse them with .FDS files that have all disk sides in one file), so that you can write-protect them individually, make copies of the disks, use FDS software that can write files on other disks (maybe no commercial FDS games do this, but homebrew software might), etc.
etabeta wrote:
> zzo38 wrote:
> > Note that GameBoy ROMs already include headers; if you copy it from a cartridge the header will already be there, because Nintendo put them there!
> you might want to re-read my previous post. all the cases I mentioned (NES, Lynx and A7800) use a format which is not a straight dump from the carts, but rely on bits added by the dumper (or the emu author) in order to setup mappings of the ROM inside the emu memory. As such, even if you get a perfect dump from a cart you own, you cannot use it in emus which don't use hash databases because the binary dump does not contain the mapping data needed for emulation (and typically you blame emulators, not without a point)
> OTOH, of course GB roms have an header too, but it's part of the original binary you extract from the cart so it has not to be added (artificially) by whoever spreads the dumps and does not confuse users...

I don't know about Lynx and A7800, but OK I understand you.
(Free Hero Mesh - FOSS puzzle game engine)
proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

Sik wrote:
> I'm surprised byuu didn't ask what would happen if the game itself happens to have something in it that matches the tag starting the metadata. That's pretty much guaranteed to break everything. Heck, it could even be used to ensure emulators can't run the ROMs properly.

If it were just the magic which had to be correct you'd have a point. But that's why there are 3 things which need to match: the uint32_t magic value, the uint32_t size value AND the uint32_t checksum. That's 96 bits in total, so 2^96 possible values, which gives odds of roughly 1 in 79,228,162,514,300,000,000,000,000,000 of it happening by random chance.

Many standards have used this approach before with great success. And these aren't minor standards, they are things like:

* Multiboot - What GRUB uses to load Linux and other OSes
* ACPI - What the OS uses to identify the power management features of your machine

There are actually tons of systems which use this "tagged packet with a checksum" approach. While it is not literally impossible for there to be a collision, it is quite close to it. Your odds of getting a CRC32 collision on the ROM's PRG/CHR themselves in a database lookup are FAR greater.
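For scale, the figures in this post can be worked out as follows. This is only a back-of-the-envelope sketch that assumes uniformly random bytes (which real ROM data is not); the 512 KiB file size and the 3000-entry database are made-up numbers for illustration.

```latex
% Chance that 96 specific bits match at one given offset
2^{-96} = \frac{1}{79{,}228{,}162{,}514{,}264{,}337{,}593{,}543{,}950{,}336}
        \approx 1.3 \times 10^{-29}

% Union bound over every offset of a 512 KiB file (2^{19} offsets)
2^{19} \cdot 2^{-96} = 2^{-77} \approx 6.6 \times 10^{-24}

% Birthday estimate: some pair among n = 3000 ROMs shares a CRC32
p \approx \frac{n(n-1)}{2} \cdot 2^{-32}
  = \frac{4{,}498{,}500}{4{,}294{,}967{,}296}
  \approx 1.0 \times 10^{-3}
```

Under these assumptions a CRC32 clash somewhere in such a database is around twenty orders of magnitude more likely than an accidental tag match in a single file.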
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: A note to all NES-emulator authors..

Post by blargg »

byuu wrote:
> > and can be located anywhere within the file
> What if two tags appear in the same file? Some emulators will invariably go with the first, others with the last, regardless of what you say in the specification.

Two tags each with valid checksums? That'd be pretty tricky to make work, might even be like trying to create a file that gives a certain hash.

> > I would enforce that it must be on a 4-byte boundary so make the search efficient.
> Other emulators would happily ignore this, and then homebrew makers would ignore it, and then you'd look like the broken implementation for enforcing this rule.

Plus we're talking hundred-K files. Premature optimization here.

> My favorite part of your idea is that it could be embedded in new homebrew, while still keeping the PRG/CHR size exactly right. That would truly be 100% seamless with existing emulators.

Excellent point. Existing ROM sets have the problem more or less solved, through databases and headers. New ones can't rely on databases, and iNES handling isn't fully consistent or thorough. New ones can have this in their ROM data.

proxy wrote:
> If it were just the magic which had to be correct you'd have a point. But that's why there are 3 things which need to match. The uint32_t magic value, the uint32_t size value AND the uint32_t checksum. That's a total number of values = 2^96, which makes it have 1/79,228,162,514,300,000,000,000,000,000 odds of happening by random chance.

The size value will mostly be a small value with its upper and lower bits clear. The values being matched are also not distributed evenly, with things like 0x00 and 0xFF being more common than others. A while back I did an analysis of NES ROMs and found certain three- and four-byte sequences that were drastically less likely than others. I used this in choosing a three-byte magic value for the bootloader format.
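The analysis itself isn't shown in this thread, but the basic building block for vetting a candidate magic value might look like the sketch below, which counts how often a byte string occurs in one ROM image. The function name is hypothetical; a real survey would run it over a whole ROM set and compare candidates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count (possibly overlapping) occurrences of a candidate magic in a ROM image. */
static size_t count_occurrences(const uint8_t *rom, size_t rom_len,
                                const uint8_t *magic, size_t magic_len)
{
    assert(rom != NULL && magic != NULL && magic_len > 0);
    size_t hits = 0;
    for (size_t i = 0; i + magic_len <= rom_len; i++)
        if (memcmp(rom + i, magic, magic_len) == 0)
            hits++;
    return hits;
}
```

A run of 0xFF bytes, for example, matches an all-0xFF candidate at every offset, which is why such values make poor magic numbers.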
proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

blargg wrote:
> byuu wrote:
> > > and can be located anywhere within the file
> > What if two tags appear in the same file? Some emulators will invariably go with the first, others with the last, regardless of what you say in the specification.
> Two tags each with valid checksums? That'd be pretty tricky to make work, might even be like trying to create a file that gives a certain hash.
> > > I would enforce that it must be on a 4-byte boundary so make the search efficient.
> > Other emulators would happily ignore this, and then homebrew makers would ignore it, and then you'd look like the broken implementation for enforcing this rule.
> Plus we're talking hundred-K files. Premature optimization here.

Yea, the two tags thing, the odds of it happening at random are astronomical. The only way I could see it happening is if tools fail to check for an existing one before applying a new one. And like I said, the spec could just say "Use the first". The second one would do no harm.

I agree about the alignment requirement. While nice in concept when thinking in terms of efficiency, it is simply not needed and only really serves as something which can be done wrong. I would be happy to ditch that part :-).
blargg wrote:
> > My favorite part of your idea is that it could be embedded in new homebrew, while still keeping the PRG/CHR size exactly right. That would truly be 100% seamless with existing emulators.
> Excellent point. Existing ROM sets have the problem more or less solved, through databases and headers. New ones can't rely on databases, and iNES handling isn't fully consistent or thorough. New ones can have this in their ROM data.

I hadn't thought about that, and I like that perspective. Having it as part of all new ROMs would be completely harmless since they won't be in any DBs anyway. It only serves as extra information for any emulators which choose to implement this kind of standard. After that, addition to old ROMs could happen at any pace depending on adoption.

Though I do feel that the database lookup issue is not as bad as it sounds. Every emulator I've ever seen will at least fall back on built-in headers if the ROM isn't in the DB. And that will work for almost all games.
blargg wrote:
> proxy wrote:
> > If it were just the magic which had to be correct you'd have a point. But that's why there are 3 things which need to match. The uint32_t magic value, the uint32_t size value AND the uint32_t checksum. That's a total number of values = 2^96, which makes it have 1/79,228,162,514,300,000,000,000,000,000 odds of happening by random chance.
> The size value will mostly be a small value with its upper and lower bits clear. The values being matched are also not distributed evenly, with things like 0x00 and 0xFF being more common than others. A while back I did an analysis of NES ROMs and found certain three- and four-byte sequences that were drastically less likely than others. I used this in choosing a three-byte magic value for the bootloader format.

Fair enough, the odds aren't quite 1/(2^96) due to non-uniform distributions. Picking a good magic value is certainly important. Additionally, to improve uniqueness, there are several equally viable options.

* make the magic value bigger.
* have a magic footer value too at the end of the structure.

Either one of these would be very effective in ensuring no collisions. And like I mentioned earlier, the odds of a CRC32 collision on PRG/CHR hashes is already FAR more likely and that doesn't seem to be an issue.
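The footer option could be checked along these lines. The marker bytes and the function name are placeholders, since nothing in the thread fixes actual values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder byte strings; the thread does not choose actual values. */
static const uint8_t HEAD_MAGIC[4] = { 'N', 'X', 'H', '<' };
static const uint8_t FOOT_MAGIC[4] = { '>', 'H', 'X', 'N' };

/* True when a candidate of `size` bytes at `p` starts and ends with the markers. */
static bool has_both_markers(const uint8_t *p, size_t avail, size_t size)
{
    assert(p != NULL);
    if (size < sizeof HEAD_MAGIC + sizeof FOOT_MAGIC || size > avail)
        return false;
    return memcmp(p, HEAD_MAGIC, sizeof HEAD_MAGIC) == 0 &&
           memcmp(p + size - sizeof FOOT_MAGIC, FOOT_MAGIC, sizeof FOOT_MAGIC) == 0;
}
```

This check would run before the checksum is computed, so an accidental match has to get both markers and the CRC right.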
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: A note to all NES-emulator authors..

Post by blargg »

proxy wrote:
> Yea, the two tags thing, the odds of it happening by random are astronomical. The only way I could see it happening is if tools fail to check for an existing one before applying a new one. Which like i said, the spec could just say "Use the first". The second one would do no harm.

I think the only concern would be utilities, but a utility adding a second one without checking for the first would be fine since the checksum for the first would almost surely be wrong afterwards. Thus there would only be one valid tag in the file. Even a malicious utility would have to somehow have both checksums correct, which like I mentioned is very tricky, since modifying one changes the correct value for the other.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: A note to all NES-emulator authors..

Post by Sik »

proxy wrote:
> If it were just the magic which had to be correct you'd have a point. But that's why there are 3 things which need to match. The uint32_t magic value, the uint32_t size value AND the uint32_t checksum. That's a total number of values = 2^96, which makes it have 1/79,228,162,514,300,000,000,000,000,000 odds of happening by random chance.

That still doesn't prevent the "fake header on purpose" issue.