A note to all NES-emulator authors: ROM header wishlist

Discuss emulation of the Nintendo Entertainment System and Famicom.


rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: A note to all NES-emulator authors..

Post by rainwarrior »

Why have a checksum in the file itself?

I understand having checksums in a database so you know whether you've found the "good" ROM, but you can't do this with a hash that is stored in the file itself. It's just as easy to build the hash for a bad ROM dump as for a good one.

The only result I can think of is that homebrewers and romhackers would be annoyed by having to regenerate the hash every time they change their file. It's just redundant information. What's the point?
proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

rainwarrior wrote:Why have a checksum in the file itself?
blargg wrote:I think the only concern would be utilities, but a utility adding a second one without checking for the first would be fine since the checksum for the first would almost surely be wrong afterwards. Thus there would only be one valid tag in the file. Even a malicious utility would have to somehow have both checksums correct, which like I mentioned is very tricky, since modifying one changes the correct value for the other.
To address both of these, I think there has been a misunderstanding about what the checksum in my proposal would be for. The checksum is not for the file in its entirety, or to verify ROM integrity. It is a checksum of the tag structure itself, to avoid random data being mistaken for a valid tag. The actual content of the tag would be pretty much equivalent to everything we would want in iNES 2.0, but better organized.

Whether or not that includes PRG/CHR checksums is a valid debate to be had, but not what I was proposing.
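A minimal sketch of that search-and-verify step, assuming a layout the proposal doesn't actually pin down: a little-endian uint32 magic (the bytes "NTAG", made up for illustration), a uint32 total tag size, and a uint32 checksum chosen so that magic + size + checksum + every payload byte sums to zero mod 2^32 (the same trick the Multiboot header uses for its magic/flags/checksum fields). `find_tag` and the constants are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic: the bytes "NTAG" read as a little-endian uint32. */
#define TAG_MAGIC 0x4741544Eu
#define TAG_HEADER_SIZE 12u /* magic + size + checksum */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Linearly scans buf for a tag whose magic, size and checksum all agree.
   Returns the byte offset of the first valid tag, or -1 if there is none. */
long find_tag(const uint8_t *buf, size_t len) {
    for (size_t off = 0; off + TAG_HEADER_SIZE <= len; off++) {
        const uint8_t *p = buf + off;
        if (read_le32(p) != TAG_MAGIC)
            continue;
        uint32_t size = read_le32(p + 4);
        if (size < TAG_HEADER_SIZE || size > len - off)
            continue; /* implausible size field: not a real tag */
        /* Unsigned arithmetic wraps mod 2^32, so a valid tag sums to 0. */
        uint32_t sum = TAG_MAGIC + size + read_le32(p + 8);
        for (uint32_t i = TAG_HEADER_SIZE; i < size; i++)
            sum += p[i];
        if (sum == 0)
            return (long)off;
    }
    return -1;
}
```

In the worst case this touches every byte of the file, which is the price paid for letting the tag live "anywhere".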
Sik wrote:
proxy wrote:If it were just the magic which had to be correct you'd have a point. But that's why there are 3 things which need to match: the uint32_t magic value, the uint32_t size value AND the uint32_t checksum. That's 96 bits, or 2^96 possible combinations, which makes the odds of it happening by random chance about 1 in 7.9 × 10^28.
That still doesn't prevent the "fake header on purpose" issue.
I don't think fake headers are a valid concern. What prevents someone from putting invalid info in the iNES header to make the game unplayable except in emulators that use a database lookup? Should we disregard the iNES format because it could be abused like that as well?
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: A note to all NES-emulator authors..

Post by rainwarrior »

Oh, I understand now. You want to place the "header" at some random location in the middle of the file and linearly search for it? Thus the checksum ensures that you didn't find the wrong thing by mistake. This is really weird to me; it just seems like a solution to a problem that you created for yourself.

Why not just use a chunky structure like RIFF where unknown chunk types can be safely skipped because the size is part of the chunk header? (Or a table of contents, or XML, etc.) There are a lot of clean solutions to the skippable-data problem. Trying to append new data onto the end of existing stuff inherently creates its own set of problems, why not just start fresh with something properly extensible?
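For comparison, a rough sketch of the skippable-chunk idea: a flat list of chunks, each a 4-byte ID followed by a little-endian uint32 payload length, with odd-length payloads padded to an even size as RIFF does. This is not a full RIFF parser (no outer `RIFF` form header, no nested `LIST` chunks), and `count_chunks` is a hypothetical name. The point is that a reader skips any ID it doesn't recognize using only the length field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walks the chunk list and counts chunks whose ID matches `id`.
   Every other chunk is skipped without being understood. */
size_t count_chunks(const uint8_t *buf, size_t len, const char id[4]) {
    size_t off = 0, count = 0;
    while (off + 8 <= len) {
        uint32_t size = read_le32(buf + off + 4);
        if (size > len - off - 8)
            break; /* truncated chunk: stop rather than read past the end */
        if (memcmp(buf + off, id, 4) == 0)
            count++;
        off += 8 + (size_t)size + (size & 1u); /* payload plus pad byte */
    }
    return count;
}
```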

Or perhaps propose extensions to NES 2.0 to allow arbitrary metadata to be added?
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: A note to all NES-emulator authors..

Post by tepples »

rainwarrior wrote:Why not just use a chunky structure like RIFF where unknown chunk types can be safely skipped because the size is part of the chunk header?
Because UNIF had plenty of problems.
proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

rainwarrior wrote:Oh, I understand now. You want to place the "header" at some random location in the middle of the file and linearly search for it? Thus the checksum ensures that you didn't find the wrong thing by mistake. This is really weird to me; it just seems like a solution to a problem that you created for yourself.
The reason is that it is a particularly clean way to add new metadata to iNES files in a way which is both extensible and backwards compatible. While it may seem weird to you, it is actually a fairly common technique. As mentioned in my previous posts, the multiboot standard (for OS kernels), ACPI for hardware power management, I think ID3v2 (though it is typically at the file start), and many others use it.

Since the goal is to have minimal impact on emulators that use iNES headers, I could have just said "it must be at the very end of the file", which is also a viable option and would probably be the case for most implementations. But imposing that limitation now means I have to choose one of two options:

1. mandate the size of the structure so people know to read the last N bytes of the file. This makes it less extensible. By not doing this, version == size, and we can just add new fields to the structure as the need arises.

2. have implementations try to read a block of size N1, then N2, then N3 from the end of the file if there are 3 revisions.

By allowing it to be "anywhere" in the file, with the standard specifying a trivial search-and-verify algorithm, the structure can be versioned safely without breaking support for previous versions.
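One way to read "version == size" is that newer revisions only ever append fields, so a reader checks the tag's size before touching a field and falls back (for example, to the iNES header value) when the tag is too old to contain it. A minimal sketch; the field offsets and the `tag_field` helper are hypothetical, not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field offsets within a tag, following a 12-byte
   magic/size/checksum header. Newer revisions only append fields. */
#define FIELD_PRG_SIZE 12u /* present since the first revision */
#define FIELD_CHR_SIZE 16u /* present since the first revision */
#define FIELD_MAPPER   20u /* added by a later, larger revision */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the 32-bit field at byte offset `field` if a tag of tag_size
   bytes is large enough to contain it; otherwise returns `fallback`,
   e.g. the value taken from the legacy iNES header. */
uint32_t tag_field(const uint8_t *tag, uint32_t tag_size, size_t field,
                   uint32_t fallback) {
    if (field > tag_size || tag_size - field < 4)
        return fallback;
    return read_le32(tag + field);
}
```

An old 20-byte tag simply reports the fallback for `FIELD_MAPPER`, so older files keep working without a separate version number.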
rainwarrior wrote: Why not just use a chunky structure like RIFF where unknown chunk types can be safely skipped because the size is part of the chunk header? (Or a table of contents, or XML, etc.) There are a lot of clean solutions to the skippable-data problem. Trying to append new data onto the end of existing stuff inherently creates its own set of problems, why not just start fresh with something properly extensible?
This has been tried, it's called UNIF and I'm actually the maintainer of that standard. Unfortunately, while I think it would have been pretty fantastic if it became widely adopted, it has enough problems that it is not considered worth it to convert people's collections to a new wildly different format. My current proposal is trivial to implement, backwards compatible, and can be applied to existing ROMs without having to worry very much about emulator support. You can always fall back on the legacy iNES tags if you don't support it.
rainwarrior wrote:Or perhaps propose extensions to NES 2.0 to allow arbitrary metadata to be added?
iNES 2.0 (and even the current state of iNES 1.0) is very much a messy standard. It is kludgy, requires bit twiddling to get correct values, and is limited in size (16 bytes total, that's all you get, and that's not enough IMO). You may have noticed that almost 100% of byu's well-thought-out critique concerns extra non-header data placed in the file and the compatibility problems it raises with emulators that use databases of hashes and ignore iNES headers anyway.

So if we are going to add arbitrary metadata to the file itself, we may as well do it right: have a proper versioned structure that has enough space to represent the data in an obvious, easy-to-use format.
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: A note to all NES-emulator authors..

Post by rainwarrior »

proxy wrote:While it may seem weird to you, it is actually a fairly common technique. As mentioned in my previous posts, the multiboot standard (for OS kernels), ACPI for hardware power management, I think ID3v2 (though it is typically at the file start), and many others use it.

Since the goal is to have minimal impact on emulators that use iNES headers, I could have just said "it must be at the very end of the file", which is also a viable option and would probably be the case for most implementations. But imposing that limitation now means I have to choose one of two options:

1. mandate the size of the structure so people know to read the last N bytes of the file. This makes it less extensible. By not doing this, version == size, and we can just add new fields to the structure as the need arises.

2. have implementations try to read a block of size N1, then N2, then N3 from the end of the file if there are 3 revisions.

By allowing it to be "anywhere" in the file, with the standard specifying a trivial search-and-verify algorithm, the structure can be versioned safely without breaking support for previous versions.
After looking it up, the Multiboot Standard appears to use it because it does not have an alternative. I'm not sure what to look up for ACPI, I'm not familiar with it. ID3 tags (v1 and 2) as far as I know only appear at specific locations in the file and do not require a linear search or a checksum?

I'm very familiar with techniques of placing extra data at the end of a file, but these don't require a linear search, or a checksum. In theory, data appended to the end of an iNES file should not cause a problem, though I'm sure there are emulators/cases that will do stupid things (as has been mentioned). A 4-byte number telling you how much appended data there is and X bytes of "magic" at the end of a file should be sufficient to identify the extra data's presence and location in the file. In this case, since the location is already known, would there be any reason for a checksum? In my view it would only make it harder to build files, but provide no additional security.
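A minimal sketch of the trailer approach, assuming a hypothetical layout of `[appended data][uint32 little-endian length][4-byte magic "XTRA"]` at the very end of the file. It does two seeks and one 8-byte read, with no scan and no checksum. `find_appended_data` and the "XTRA" magic are made up; also note that ISO C does not guarantee `SEEK_END` is meaningful for binary streams, though it works on POSIX systems and Windows.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns the file offset where the appended data starts and stores its
   length in *out_len, or returns -1 if no valid trailer is present. */
long find_appended_data(FILE *f, uint32_t *out_len) {
    uint8_t t[8];
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long end = ftell(f); /* file size; -1 on error */
    if (end < 8 || fseek(f, end - 8, SEEK_SET) != 0)
        return -1;
    if (fread(t, 1, 8, f) != 8 || memcmp(t + 4, "XTRA", 4) != 0)
        return -1;
    uint32_t len = (uint32_t)t[0] | ((uint32_t)t[1] << 8) |
                   ((uint32_t)t[2] << 16) | ((uint32_t)t[3] << 24);
    if (len > (unsigned long)(end - 8))
        return -1; /* length claims more data than the file holds */
    *out_len = len;
    return end - 8 - (long)len;
}
```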

I also think the linear search may not lend itself to an efficient implementation on certain systems (e.g. flash carts). It's very easy when you have enough temporary space to hold the entire file, but if you don't, this might require another pass through the file, drastically increasing load times. (Depending on whether random access is available too, even a tag indicator at the end of the file could be a problem.)
proxy wrote:My current proposal is trivial to implement, backwards compatible, and can be applied to existing ROMs without having to worry very much about emulator support. You can always fall back on the legacy iNES tags if you don't support it.

iNES 2.0 (and even the current state of iNES 1.0) is very much a messy standard. It is kludgy, requires bit twiddling to get correct values, and is limited in size (16 bytes total, that's all you get, and that's not enough IMO). You may have noticed that almost 100% of byu's well-thought-out critique concerns extra non-header data placed in the file and the compatibility problems it raises with emulators that use databases of hashes and ignore iNES headers anyway.
It seems weird to reject iNES 2 for the problems it inherits from iNES 1 while at the same time wanting to reuse iNES 1 for your format, though. Given how strong a candidate iNES 2 seems right now... do you not think there are any worthwhile ways it could be improved without throwing it out?

A lot of the unfortunate iNES 1 legacy is easily washed away by simply not using the legacy features and getting the header correct. We've been making a lot of good progress on mapper definitions on the wiki as well, so we're in better shape than ever toward having a coherent central database of mappers. With iNES 2.0 there is still unallocated space in the 16-byte header; there could easily be a bit for "extra data after end of ROM data", or whatever we need here.

What are its practical deficiencies, though? I think there are things like the Jaleco baseball games, where there is a sample ROM that should probably be included in the .NES file (though so far only WAV rips exist, so we're not even to the point where we have the data to include, yet). What other kinds of extra data can you think of? (I know we can't think of everything, obviously...)

Also, what kinds of extra data would you propose to include that can be ignored (i.e. fall back to iNES 1.0)? There are things like Game Genie cheats (I think this was mentioned earlier in this thread) that could easily be included as chunky metadata, but aren't essential to emulation. I dunno how opposed people are to that sort of thing; possibly they would be seen as just another way to make a mess of the files...
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: A note to all NES-emulator authors..

Post by rainwarrior »

Though, another thing that might actually be a fun use of a linear-search method for tagging in a homebrew is to actually embed it in the PRG or CHR data directly. That would keep it out of any place it could do harm to an old emulator, but still host all the fancy new metadata. ;)
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: A note to all NES-emulator authors..

Post by tepples »

proxy: True, NES 2.0 requires a few bit operations to parse. But they're no more complicated than the bit manipulations needed to emulate an NES in the first place, and they're only done once at load time. And encouraging emulator authors to allow metadata embedded in the ROM to override the header would just make it easier for certain developers to include a false header as a speed bump to emulating dumped carts. Given what boards exist as of 2013, I'd be inclined to prefer sticking with NES 2.0.

When GBA homebrew was starting out, the GNU assembler had no ".incbin" directive, plus I wanted a way to tweak art and audio assets without having to relink the whole program. So I used linear search for my GBFS library. A game using GBFS would append one or more archives to a binary and linearly search cart address space for the archive signature at 256-byte intervals.
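The GBFS-style lookup can be sketched as a signature comparison at a fixed stride rather than at every byte, which cuts the number of comparisons by the stride factor. This is not GBFS's actual code: the signature is a caller-supplied placeholder, `find_aligned` is a hypothetical name, and the stride is a parameter (256 in the description above).

```c
#include <stddef.h>
#include <string.h>

/* Compares sig against buf only at offsets that are multiples of stride.
   Returns the first matching offset, or -1 if there is no match. */
long find_aligned(const unsigned char *buf, size_t len,
                  const void *sig, size_t sig_len, size_t stride) {
    if (stride == 0)
        return -1; /* avoid an infinite loop */
    for (size_t off = 0; off + sig_len <= len; off += stride)
        if (memcmp(buf + off, sig, sig_len) == 0)
            return (long)off;
    return -1;
}
```

The catch is that the archive must actually be placed on a stride boundary; a signature at any other offset is never seen.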
proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

rainwarrior wrote: I'm very familiar with techniques of placing extra data at the end of a file, but these don't require a linear search, or a checksum. In theory, data appended to the end of an iNES file should not cause a problem, though I'm sure there are emulators/cases that will do stupid things (as has been mentioned). A 4-byte number telling you how much appended data there is and X bytes of "magic" at the end of a file should be sufficient to identify the extra data's presence and location in the file. In this case, since the location is already known, would there be any reason for a checksum? In my view it would only make it harder to build files, but provide no additional security.
Are you suggesting having the magic value as the very last value in the file, and, if it is present, the preceding value is the size? I suppose that would work, but it doesn't strike me as particularly simpler.
rainwarrior wrote: I also think the linear search may not lend itself to an efficient implementation on certain systems (e.g. flash carts). It's very easy when you have enough temporary space to hold the entire file, but if you don't, this might require another pass through the file, drastically increasing load times. (Depending on whether random access is available too, even a tag indicator at the end of the file could be a problem.)
This is a valid point. But I think the idea of having a trailing magic value also would tend to be implemented in multiple passes of the file since I would imagine some systems don't have a trivial API for getting a file's size. At the very least it would involve a seek to the end - sizeof(uint32_t), then a seek to the end - (sizeof(uint32_t) * 2), then a seek to where the meta data starts. Doesn't strike me as much better :-(.
rainwarrior wrote: It seems weird to reject iNES 2 for the problems it inherits from iNES 1 while at the same time wanting to reuse iNES 1 for your format, though. Given how strong a candidate iNES 2 seems right now... do you not think there are any worthwhile ways it could be improved without throwing it out?
I think you misunderstand. My proposed system is designed to replace iNES, but remain backwards compatible. The iNES header (version 1.x or 2.x) would be present and as complete as possible. But if the emulator supports the tag system I propose, it would ignore the iNES header and get all of its information from the tag instead. There would be some redundancy of information of course. But it would also maintain a high degree of compatibility.
rainwarrior wrote: A lot of the unfortunate iNES 1 legacy is easily washed away by simply not using the legacy features and getting the header correct. We've been making a lot of good progress on mapper definitions on the wiki as well, so we're in better shape than ever toward having a coherent central database of mappers. With iNES 2.0 there is still unallocated space in the 16-byte header; there could easily be a bit for "extra data after end of ROM data", or whatever we need here.
I am precisely suggesting not using the legacy features of iNES. The tag system I suggest would replace it entirely with a more extensible and more forward thinking design.
rainwarrior wrote: What are its practical deficiencies, though? I think there are things like the Jaleco baseball games, where there is a sample ROM that should probably be included in the .NES file (though so far only WAV rips exist, so we're not even to the point where we have the data to include, yet). What other kinds of extra data can you think of? (I know we can't think of everything, obviously...)

Also, what kinds of extra data would you propose to include that can be ignored (i.e. fall back to iNES 1.0)? There are things like Game Genie cheats (I think this was mentioned earlier in this thread) that could easily be included as chunky metadata, but aren't essential to emulation. I dunno how opposed people are to that sort of thing; possibly they would be seen as just another way to make a mess of the files...
The iNES header is mostly very functional, but it can be a pain in the ass:

* Having to do complex operations to get trivial values like "PRG size" or "Mapper Number".

* Some ROMs have PRG/CHR sizes that aren't a multiple of 16384 bytes (PRG) or 8192 bytes (CHR). They are rare, but we could easily dispense with such restrictions.

* Just look at Byte 10 (RAM size) of iNES 2.0. It's sufficient for now, but it's complex and involves a lookup table because a lot of information was stuffed into 4 bits. What if some oddball cart is discovered that doesn't fit the mold?

* Even with spare bits, space is very limited. While not necessary (and I'm sure some would disagree), it would be very nice to have a UTF-8 field containing the original name of the ROM. Or perhaps region/country codes. Or perhaps the manufacturer.

More than anything, the idea is to provide a way to ask, "if the NES ROM file format were being developed today, based on what we know now, what would it look like?"

Like I said, I have no expectation that my idea would be adopted by everyone, but it would be an opportunity to "remake" the ROM metadata in a clearer and simpler way. So it's worth putting it out there and fueling a discussion.
proxy
Posts: 68
Joined: Tue Mar 08, 2011 9:45 am

Re: A note to all NES-emulator authors..

Post by proxy »

rainwarrior wrote:Though, another thing that might actually be a fun use of a linear-search method for tagging in a homebrew is to actually embed it in the PRG or CHR data directly. That would keep it out of any place it could do harm to an old emulator, but still host all the fancy new metadata. ;)
That is interesting and would certainly be considered fair game :-). I like it!
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: A note to all NES-emulator authors..

Post by rainwarrior »

proxy wrote:
rainwarrior wrote: I also think the linear search may not lend itself to an efficient implementation on certain systems (e.g. flash carts). It's very easy when you have enough temporary space to hold the entire file, but if you don't, this might require another pass through the file, drastically increasing load times. (Depending on whether random access is available too, even a tag indicator at the end of the file could be a problem.)
This is a valid point. But I think the idea of having a trailing magic value also would tend to be implemented in multiple passes of the file since I would imagine some systems don't have a trivial API for getting a file's size. At the very least it would involve a seek to the end - sizeof(uint32_t), then a seek to the end - (sizeof(uint32_t) * 2), then a seek to where the meta data starts. Doesn't strike me as much better :-(.
It depends on the particular system, but I've never seen the unknown-file-size limitation except when dealing with strict C code. I can't think of any file system I've worked with (embedded included) that didn't have an API for getting a file's size. The fseek+ftell method is usually only resorted to when trying to write cross-platform C code within the limitations of the stdio.h API. At least, that's been my experience with that method.

Going to the end of the file to read the magic/etc. would indeed involve seeking though, and this is where the random access requirement is pretty important. If you don't have enough RAM to hold the whole file (probably not terribly convenient on the NES), hopefully at least you can jump around in the file without undue delay. This is probably true of CF cards, but I don't know for sure. I suspect in cases like the PowerPak a couple of seeks would still be a lot faster than having to read the whole file to find a tag.


As for stuffing metadata into PRG/CHR, I actually often stick text messages near the end of my ROMs for anybody who wants to poke around in there with a hex editor, so this kinda thing might actually give a regular user a chance to find it with their emulator.
rainwarrior
Posts: 8734
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: A note to all NES-emulator authors..

Post by rainwarrior »

I do agree that iNES 1.0 parsing is somewhat onerous, especially in the cases where iNES 1.0 underspecifies and has to be resolved due to known things about the existing ROMs. I think iNES 2.0 is trying to alleviate those problems of underspecification, the new fields for larger PRG/CHR, for example. It still has all the bit-packing (and more), though, which you seem to dislike.

I don't find the bit packing that much of a pain, personally. Decoding each value more or less becomes one line of code with a shift and an and. Also bit-packing is something you can easily do by hand or by an assembler if you're writing homebrew. This is not true of CRCs, which generally can't be done directly by an assembler or by hand, which is the immediate objection in the back of my mind when I see it proposed as part of the file format. Putting a CRC into a file requires a specialized tool for generating it.
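To illustrate the "one line with a shift and an and" point, a sketch of the NES 2.0 fields that came up in this thread (mapper number, submapper, PRG/CHR ROM sizes), using the NES 2.0 byte layout as I understand it. The function names are mine, and it deliberately ignores the exponent-multiplier size notation that later revisions of the spec assign to an MSB nibble of $F.

```c
#include <stdbool.h>
#include <stdint.h>

/* NES 2.0 is identified by bits 2-3 of byte 7 being binary 10. */
bool is_nes20(const uint8_t h[16]) { return (h[7] & 0x0C) == 0x08; }

/* 12-bit mapper: byte 6 high nibble, byte 7 high nibble, byte 8 low nibble. */
unsigned mapper_number(const uint8_t h[16]) {
    return (h[6] >> 4) | (h[7] & 0xF0) | ((unsigned)(h[8] & 0x0F) << 8);
}

unsigned submapper(const uint8_t h[16]) { return h[8] >> 4; }

/* PRG ROM: 16 KiB units, LSB in byte 4, MSB nibble in the low half of byte 9. */
unsigned long prg_rom_bytes(const uint8_t h[16]) {
    return (unsigned long)(h[4] | ((h[9] & 0x0F) << 8)) * 16384UL;
}

/* CHR ROM: 8 KiB units, LSB in byte 5, MSB nibble in the high half of byte 9. */
unsigned long chr_rom_bytes(const uint8_t h[16]) {
    return (unsigned long)(h[5] | ((h[9] & 0xF0) << 4)) * 8192UL;
}
```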

I agree that the organization of all the bits and stuff in an iNES 1 or 2 header is more or less arbitrary/random, and if iNES 2.0 weren't trying to maintain some backward compatibility it would certainly be packed a lot differently, but I feel this legacy packing organization is reasonable enough to accept in the name of backward compatibility.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: A note to all NES-emulator authors..

Post by Sik »

proxy wrote:
Sik wrote:
proxy wrote:If it were just the magic which had to be correct you'd have a point. But that's why there are 3 things which need to match: the uint32_t magic value, the uint32_t size value AND the uint32_t checksum. That's 96 bits, or 2^96 possible combinations, which makes the odds of it happening by random chance about 1 in 7.9 × 10^28.
That still doesn't prevent the "fake header on purpose" issue.
I don't think fake headers are a valid concern. What prevents someone from putting invalid info in the iNES header to make the game unplayable except in emulators that use a database lookup? Should we disregard the iNES format because it could be abused like that as well?
There's a difference between messing with a header that was added just for emulators and messing with data that was part of the original ROM itself (rather than metadata). Yes, you can modify the ROM, but that doesn't sound good for preservation and such (it wouldn't be a good dump anymore).
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: A note to all NES-emulator authors..

Post by tepples »

proxy wrote:I would imagine some systems don't have a trivial API for getting a file's size.
Which systems might those be? POSIX systems have stat(), and Windows has _wstat(). Besides, NES ROMs are small enough that you could just read the entire ROM file into RAM. Are you designing a format for emulators, a format for the successor to the PowerPak, or a single format for both?
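For reference, a minimal POSIX sketch of getting an open file's size with `fstat()`; on Windows the CRT counterparts would be `_fstat()`/`_wstat()`. `stream_size` is a made-up name, and the stream should be flushed first if it has buffered writes, since `fstat` only sees what has reached the OS.

```c
#define _POSIX_C_SOURCE 200809L /* expose fileno() under strict C modes */
#include <stdio.h>
#include <sys/stat.h>

/* Size in bytes of the file behind an open stream, or -1 on failure. */
long long stream_size(FILE *f) {
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```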
The iNES header is mostly very functional, but it can be a pain in the ass:

* Having to do complex operations to get trivial values like "PRG size" or "Mapper Number".
The emulator has to do complex operations anyway to get trivial values like "current VRAM address given values written to $2005". I agree with rainwarrior that the bit packing is no worse than what you already have to deal with when emulating an NES.
* Just look at Byte 10 (RAM size) of iNES 2.0. It's sufficient for now, but it's complex and involves a lookup table because a lot of information was stuffed into 4 bits.
RAM sizes are a bit shift expression, not a lookup table. Try this:


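#include <stddef.h>

/* NES 2.0 RAM size: nibble 0 means none; otherwise the size is 64 << nibble bytes. */
static inline size_t NES2_0_RAM_size(unsigned int nibble) {
  return nibble ? 64L << nibble : 0;
}

/* ... */
  /* header[10]: PRG-RAM (low nibble), battery-backed PRG-RAM (high nibble);
     header[11]: the same pair for CHR-RAM. */
  size_t prg_ram_size = NES2_0_RAM_size(header[10] & 0x0F);
  size_t prg_ram_size_battery = NES2_0_RAM_size((header[10] >> 4) & 0x0F);
  size_t chr_ram_size = NES2_0_RAM_size(header[11] & 0x0F);
  size_t chr_ram_size_battery = NES2_0_RAM_size((header[11] >> 4) & 0x0F);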
What if some oddball cart is discovered that doesn't fit the mold?
From the spec as amended: "Sizes that are not a power of two, such as the 5120 byte battery-backed RAM of Taito's X1-017 (mapper 82), are rounded up."
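Going the other way, a tool writing an NES 2.0 header needs the smallest shift count whose size covers the actual RAM, which is exactly the rounding-up rule quoted above. A minimal sketch; `NES2_0_RAM_nibble` is a hypothetical name, and -1 marks sizes too large to fit in a nibble.

```c
#include <stddef.h>

/* Smallest n in 1..15 with 64 << n >= bytes; 0 means no RAM at all.
   Returns -1 if even 64 << 15 bytes is too small. */
int NES2_0_RAM_nibble(size_t bytes) {
    if (bytes == 0)
        return 0;
    for (int n = 1; n <= 15; n++)
        if (((size_t)64 << n) >= bytes)
            return n;
    return -1;
}
```

With this, the X1-017's 5120 bytes encode as 7, i.e. 64 << 7 = 8192 bytes.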
if the NES ROM file format were being developed today, based on what we know now, what would it look like?
It'd look like a PKZIP file containing a PRG ROM file, a CHR ROM file, and XML metadata. At least that's what the designer of WSZ (Winamp skin package), SMZIP (StepMania package), ODT (OpenOffice Writer document), DOCX (Microsoft Office 2007 Word document), JAR (Java archive), or APK (Android installer package) would have come up with. I'm aware that a zipped format wouldn't be practical on the successor to the PowerPak.
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: A note to all NES-emulator authors..

Post by blargg »

Thread author: please retitle this thread to something mentioning NES ROM headers, since that's what it's about. The current title is very vague, only one step away from "Thread".