.fds format: Can checksums be heuristically detected?

hex_usr
Posts: 92
Joined: Sat May 09, 2015 7:21 pm

.fds format: Can checksums be heuristically detected?

Post by hex_usr »

The .fds format, as we all know, omits the checksums that are physically present on the disks. This oversight makes it impossible to emulate a feature that was believed not to be important.

I would argue that that feature really is important, but this is based on speculation, not hard facts.

Games can lie to the BIOS about how many files are on the disk as an anti-piracy measure. The ones that do this have their own copies of the loading routine that hard-codes the actual number of files, making the file amount block unreliable for its intended purpose. I don't know this for certain, but I suspect that those very same loading routines can also ignore checksum errors reported by the RAM adapter, meaning that any number of files can deliberately include invalid checksums to trick pirates. It may even be possible that a disk loading routine will err out if the checksum really is correct.

And the .fds format took those checksums out, so that emulators would never know.

Nintendo's .qd format is better because it includes checksums, but not only does it omit the 00 padding between blocks, it also has one flaw not in the .fds format: it can take 65,536 bytes per side instead of the 65,500 that actual disks could hold!

So I'm wondering if it is possible to expand the .fds format to make checksums optional instead of forbidden. I'd rather not have the header modified to indicate the presence of checksums; most emulators support leaving out the header entirely because it's practically useless right now, just like Super Famicom and PC Engine copier headers. So I'll argue for heuristic detection.

Block types 1, 2, and 3, which are the disk info block, the file amount block, and the file header block respectively, always have fixed sizes of 56, 2, and 16 bytes when counting the block ID and omitting the checksum. This means that, no matter what, offset $0000 of the disk data (after any 16-byte header) will always hold the byte 0x01, $0038 will always hold 0x02, $003A will always hold 0x03, and $004A will always hold 0x04.

If checksums are included, the block sizes above change to 58, 4, and 18 respectively, changing the locations of bytes 0x01, 0x02, 0x03, and 0x04 to $0000, $003A, $003E, and $0050 respectively.
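The offsets in the two paragraphs above follow from adding up the block sizes. A minimal C sketch of that arithmetic; the function name `fds_expected_id_offsets` is hypothetical, and offsets are relative to the start of a side's data, after any 16-byte header:

```c
#include <stddef.h>

/* Sizes of blocks 1-3 (disk info, file amount, file header) as given in the
 * post: block ID byte included, CRC excluded. */
static const size_t kFdsBlockSize[3] = {56, 2, 16};

/* Store in ids[0..3] the offsets of the IDs of blocks 1, 2, 3 and the first
 * block 4, relative to the start of a side's data (no 16-byte header).
 * with_crc != 0 models the proposed layout with a 2-byte CRC after each block. */
void fds_expected_id_offsets(int with_crc, size_t ids[4]) {
  size_t offset = 0;
  ids[0] = offset;
  for (int i = 0; i < 3; i++) {
    offset += kFdsBlockSize[i] + (with_crc ? 2u : 0u);
    ids[i + 1] = offset;
  }
}
```

Called with `with_crc` set to 0 it yields 0x00, 0x38, 0x3A, 0x4A; with 1 it yields 0x00, 0x3A, 0x3E, 0x50, matching the figures above.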

Thus, in order to detect whether a .fds file includes checksums, the emulator needs to verify the following:

No checksums (all current .fds files)
  • The addresses $0038, $003A, and $004A hold the byte values 0x02, 0x03, and 0x04.
Checksums present (new .fds files created to have them)
  • The addresses $003A, $003E, and $0050 hold the byte values 0x02, 0x03, and 0x04.
  • The address $0038 holds the 16-bit checksum of the disk info block ($0000-$0037), and said checksum is correct.
  • The address $003C holds the 16-bit checksum of the file amount block ($003A-$003B), and said checksum is correct.
  • The address $004E holds the 16-bit checksum of the first file's header block ($003E-$004D), and said checksum is correct.
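The checklist above could be sketched in C roughly as follows. This is an illustration under assumptions, not an agreed-upon spec: the names `fds_detect_layout` and `read_le16` are hypothetical, `side` is assumed to point at the first side's data with any header already skipped, and the CRC helper is a compact copy of the `fds_crc()` routine given later in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC as in fds_crc() from this thread: reflected polynomial 0x8408, initial
 * value 0x8000 (gap terminator excluded), bits fed LSB first, and two
 * trailing 0x00 bytes processed after the data. */
static uint16_t fds_crc(const uint8_t *data, size_t size) {
  uint16_t sum = 0x8000;
  for (size_t i = 0; i < size + 2; i++) {
    uint8_t byte = i < size ? data[i] : 0x00;
    for (unsigned bit = 0; bit < 8; bit++) {
      unsigned carry = sum & 1u;
      sum = (uint16_t)((sum >> 1) | (((byte >> bit) & 1u) << 15));
      if (carry) sum ^= 0x8408;
    }
  }
  return sum;
}

/* CRCs are stored little-endian on disk. */
static uint16_t read_le16(const uint8_t *p) {
  return (uint16_t)(p[0] | (p[1] << 8));
}

typedef enum { FDS_LAYOUT_UNKNOWN, FDS_LAYOUT_NO_CRC, FDS_LAYOUT_CRC } fds_layout;

/* side: start of the first side's disk data (skip any 16-byte header first). */
fds_layout fds_detect_layout(const uint8_t *side, size_t len) {
  if (len < 0x51 || side[0x00] != 0x01) return FDS_LAYOUT_UNKNOWN;

  if (side[0x38] == 0x02 && side[0x3A] == 0x03 && side[0x4A] == 0x04)
    return FDS_LAYOUT_NO_CRC;

  if (side[0x3A] == 0x02 && side[0x3E] == 0x03 && side[0x50] == 0x04 &&
      read_le16(side + 0x38) == fds_crc(side + 0x00, 0x38) && /* disk info */
      read_le16(side + 0x3C) == fds_crc(side + 0x3A, 0x02) && /* file amount */
      read_le16(side + 0x4E) == fds_crc(side + 0x3E, 0x10))   /* 1st file header */
    return FDS_LAYOUT_CRC;

  return FDS_LAYOUT_UNKNOWN;
}
```

Because the no-checksum test requires 0x03 at $003A and the checksum test requires 0x02 there, the two checks can never both pass on the same image; the remaining failure mode is an image that matches neither.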
So, is it at all practical to make such a drastic change, considering the age of the .fds format and most older emulators that supported it? Keep in mind, these heuristics make checksums optional, not required, so all the existing .fds dumps in existence would not become invalid as a result of incorporating these heuristics.

================================================
By the way, the NESdev wiki does not include the formula for calculating checksums. The Famicom Disk System technical reference by Brad Taylor includes a formula, but it's in x86 assembly. By comparing the checksums I am getting with those from Nintendo's .qd images, I have found that this C code, converted from that x86 formula, produces the correct checksum:

Code:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

uint16_t fds_crc(const uint8_t* data, size_t size) {
  //Do not include any existing checksum, not even the blank checksums 00 00 or FF FF.
  //The loop processes 2 extra 0x00 bytes itself; do not append them manually.
  //Also, do not include the gap terminator (0x80) in the data; the initial value
  //0x8000 already accounts for it. If you do include the 0x80, start sum at 0x0000.
  uint16_t sum = 0x8000;

  for(size_t byte_index = 0; byte_index < size + 2; byte_index++) {
    uint8_t byte = byte_index < size ? data[byte_index] : 0x00;
    for(unsigned bit_index = 0; bit_index < 8; bit_index++) {
      bool bit = (byte >> bit_index) & 1;  //bits are fed LSB first
      bool carry = sum & 1;
      sum = (uint16_t)((sum >> 1) | (bit << 15));
      if(carry) sum ^= 0x8408;             //reflected CRC-16 polynomial
    }
  }
  return sum;
}
The byte sequence (80) 02 07 (00 00) should have the checksum 0x4BE3. Checksums are stored in little-endian order on the disk, so the data will become (80) 02 07 E3 4B.
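A small worked check of that example, assuming the same algorithm as `fds_crc()` above but with the starting value passed in so both gap-terminator conventions can be compared. `fds_crc_init` and `fds_append_crc` are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Same algorithm as fds_crc() above, with the initial value exposed:
 * 0x8000 when the gap terminator 0x80 is left out of data, 0x0000 when it is
 * included as the first byte. */
uint16_t fds_crc_init(const uint8_t *data, size_t size, uint16_t init) {
  uint16_t sum = init;
  for (size_t i = 0; i < size + 2; i++) { /* +2: the implicit 00 00 */
    uint8_t byte = i < size ? data[i] : 0x00;
    for (unsigned bit = 0; bit < 8; bit++) {
      unsigned carry = sum & 1u;
      sum = (uint16_t)((sum >> 1) | (((byte >> bit) & 1u) << 15));
      if (carry) sum ^= 0x8408;
    }
  }
  return sum;
}

/* Write the CRC of block[0..size-1] into block[size] and block[size+1],
 * low byte first, as stored on disk. block must have room for size + 2. */
void fds_append_crc(uint8_t *block, size_t size) {
  uint16_t crc = fds_crc_init(block, size, 0x8000);
  block[size] = (uint8_t)(crc & 0xFF);
  block[size + 1] = (uint8_t)(crc >> 8);
}
```

Appending the CRC to a buffer holding 02 07 leaves 02 07 E3 4B, and running the 0x80-prefixed sequence from an initial value of 0x0000 gives the same 0x4BE3, since processing the 0x80 byte from zero leaves exactly 0x8000.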
bsnes-mcfly: the bsnes v073 and bsnes-classic killer (GitLab repository)
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: .fds format: Can checksums be heuristically detected?

Post by tepples »

hex_usr wrote:Nintendo's .qd format is better because it includes checksums, but not only does it omit the 00 padding between blocks, it also has one flaw not in the .fds format: it can take 65,536 bytes per side instead of the 65,500 that actual disks could hold!
Is 65,500 just an assumption, or is that from Mitsumi's actual spec? Otherwise I'm of half a mind to just adopt .qd.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: .fds format: Can checksums be heuristically detected?

Post by rainwarrior »

Thanks for the C checksum implementation! That is good reference.

As for the format change, I don't really see why .QD isn't already doing exactly what you need. Why try to change .FDS?

Whether it might "allow" 36 more bytes (65,536 versus 65,500) seems completely unimportant to me. You can enforce this limit yourself when you make the file, or in the emulator, if you want.


If you really want to change the .FDS format, though, I wouldn't really suggest messing with the disk data to store the "use checksums" metadata. That's real information that shouldn't be tampered with, and your whole point was accuracy.

Just alter the header (only its first 5 bytes are currently meaningful). Instead of "FDS",$1A use "FDS",$1B maybe? It would also prevent emulators that don't support checksums from trying to run these files (which they would fail on, because this is not a backwards-compatible option).
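A sketch of the header-based switch being suggested here, assuming the usual 16-byte .fds header ("FDS", $1A, side count at offset 4, then padding). The $1B signature is only the hypothetical variant from this post, not an existing standard, and the enum and function names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
  FDS_HDR_NONE,    /* no "FDS" header: treat as headerless, no CRCs */
  FDS_HDR_PLAIN,   /* "FDS" $1A: today's format, no CRCs */
  FDS_HDR_CRC,     /* "FDS" $1B: hypothetical CRC-carrying variant */
  FDS_HDR_UNKNOWN  /* "FDS" followed by some other byte */
} fds_header_kind;

/* Decide the layout from the header alone, before any disk data is read. */
fds_header_kind fds_classify_header(const uint8_t *file, size_t len) {
  if (len < 16 || memcmp(file, "FDS", 3) != 0) return FDS_HDR_NONE;
  switch (file[3]) {
    case 0x1A: return FDS_HDR_PLAIN;
    case 0x1B: return FDS_HDR_CRC;
    default:   return FDS_HDR_UNKNOWN;
  }
}
```

The point of this design is that the CRC decision is made before any disk data is interpreted, so a malformed disk info block cannot flip it.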
hex_usr
Posts: 92
Joined: Sat May 09, 2015 7:21 pm

Re: .fds format: Can checksums be heuristically detected?

Post by hex_usr »

rainwarrior, I didn't mean to imply that we should change the existing dumps - just to use the format for new dumps in the future.

I... actually don't know if 65,500 is accurate. The 65,500 figure comes from the NESdev wiki article about the Famicom Disk System, which cites this Japanese-language source (where it's broken into "6550" and "0" on adjacent lines).

In theory, .qd could be used, but it stands for "QuickDisk" and does not indicate what system the disk image is intended for, so I'm not sure it's a good idea to use the extension directly.

On my computer, I name my PlayStation 2 ISOs with a compound extension ".ps2.iso" to indicate which console they are for. At least Dolphin standardized ".gcm" for GameCube disk images.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: .fds format: Can checksums be heuristically detected?

Post by rainwarrior »

hex_usr wrote:rainwarrior, I didn't mean to imply that we should change the existing dumps - just to use the format for new dumps in the future.
That's not what I meant either. What I was saying is that "CRCs or not" is metadata information, so it has to go outside of the disk image information. Put it in the header, not in the disk info block. (Using a different FOURCC in the header for this purpose also prevents compatibility issues.)
hex_usr
Posts: 92
Joined: Sat May 09, 2015 7:21 pm

Re: .fds format: Can checksums be heuristically detected?

Post by hex_usr »

But the CRCs are physically present on the disk. They are just as much a part of the disk data as the Super Famicom ROM's internal header at $7FB0-$7FDF/$FFB0-$FFDF is part of that format.

A few years ago, I heard that a ROM site ruined its entire Nintendo 64 ROM collection by removing the first 64 (?) bytes because they confused internal ROM headers with copier headers. I want to make it very clear that I am making that distinction.

Er, don't games that have save data need to write the CRC as well when saving data? Or does the RAM adapter handle that for them?

EDIT: In case it's not clear, the CRCs are for each individual data block, not for the entire disk image as a whole.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: .fds format: Can checksums be heuristically detected?

Post by rainwarrior »

hex_usr wrote:But the CRCs are physically present on the disk.
Sorry, there is a miscommunication here.

Yes, I understand you want to include the CRCs in the .FDS file, as 2 extra bytes on the relevant blocks. That part is not what I'm talking about.

What you are proposing is to change the emulator behaviour to either include CRCs or exclude CRCs. What I think is a bad idea is having the emulator make this decision based on data in the disk image. This metadata decision should be made in the file header, outside the data on the disk image. It should not try to read any of the disk image before this decision has been made.

My offhand suggestion was to do this by simply changing the FOURCC to "FDS",$1B (CRCs) instead of "FDS",$1A (no CRCs).
hex_usr
Posts: 92
Joined: Sat May 09, 2015 7:21 pm

Re: .fds format: Can checksums be heuristically detected?

Post by hex_usr »

I won't deny that there is a chance that the heuristic could fail, for example if the checksum of the first data block happened to have 0x02 as its least significant byte, among other things.

But many emulators support .fds images with no header at all, and without a header, how can they tell which is which? The only usable information in the header right now is a byte at $04 that tells how many disk sides are present, which is already possible to determine by checking the file size.
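For reference, the side-count-from-size check mentioned above might look like this sketch. It assumes every side occupies exactly 65,500 bytes in the file (the figure questioned elsewhere in this thread) and that a header, when present, starts with "FDS", $1A and is 16 bytes long; `fds_sides_from_size` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDS_HEADER_SIZE 16u
#define FDS_SIDE_SIZE   65500u /* assumed per-side size in .fds files */

/* Number of disk sides implied by the file size, or 0 if the size is not an
 * exact multiple of FDS_SIDE_SIZE (after skipping an "FDS" $1A header). */
unsigned fds_sides_from_size(const uint8_t *file, size_t len) {
  size_t body = len;
  if (len >= FDS_HEADER_SIZE && memcmp(file, "FDS\x1A", 4) == 0)
    body -= FDS_HEADER_SIZE;
  if (body == 0 || body % FDS_SIDE_SIZE != 0) return 0;
  return (unsigned)(body / FDS_SIDE_SIZE);
}
```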

Should SNES emulators mandate that .sfc ROMs include headers in order to tell them that they are LoROM or HiROM? No, they already figure that out by reading the headers at each of the possible locations for the header and comparing them with the mode byte (Mode 20 at $7FB0-$7FDF, Mode 21 at $FFB0-$FFDF, Mode 25 at $40FFB0-$40FFDF).

Some Mega Drive emulators, including Kega Fusion, don't rely on the extension of the ROM to tell whether they are interleaved. They look for "SEGA" or " SEGA" at a certain address or addresses (it gets split when interleaved), so it is entirely possible to have a Mega Drive ROM with the extension ".smd" that is NOT interleaved and still load it in those emulators.

I don't even know how some PC Engine emulators figure out ROM dumps where the bits in each byte are reversed (erroneously called "encryption" even though it comes from naïve misdumpings with a dumper intended for the wrong region). I have a Bonk's Adventure ROM where, for each byte, bit 0 is bit 7, bit 1 is bit 6, and so on, and some PC Engine emulators can figure it out already. Notably, Mednafen is NOT among them.
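The bit-order fix itself is simple; how emulators decide that a dump needs it is not described here and is left out. A sketch of the per-byte reversal described above (function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of one byte: bit 0 <-> bit 7, bit 1 <-> bit 6, ... */
static uint8_t reverse_bits(uint8_t b) {
  b = (uint8_t)((b & 0xF0) >> 4 | (b & 0x0F) << 4); /* swap nibbles */
  b = (uint8_t)((b & 0xCC) >> 2 | (b & 0x33) << 2); /* swap bit pairs */
  b = (uint8_t)((b & 0xAA) >> 1 | (b & 0x55) << 1); /* swap single bits */
  return b;
}

/* Undo the reversal in place for a whole ROM image. */
void pce_unreverse_rom(uint8_t *rom, size_t size) {
  for (size_t i = 0; i < size; i++) rom[i] = reverse_bits(rom[i]);
}
```

Because the transform is its own inverse, applying it twice restores the original image.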

That's what I'm referring to with this proposal.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: .fds format: Can checksums be heuristically detected?

Post by rainwarrior »

hex_usr wrote:But many emulators support .fds images with no header at all, and without a header, how can they tell which is which? The only usable information in the header right now is a byte at $04 that tells how many disk sides are present, which is already possible to determine by checking the file size.
Yes, there are two versions of the format, one with a header, one without, both very widely supported.

If you use one with a header, there is a place for metadata like changes to the format. If you use one without, there is not. I suppose a different FOURCC will make emulators think it is a "headerless" file though, unless they go by file size... kind of a catch-22.

But... very simply only one of these two versions of the format can be extended with metadata: the headered version. The headerless version will forever be the "no CRC" version of the file.

If you want accurate emulation the decision needs to be made before it starts interpreting the disk data. Like, you should be able to feed it random data, bad disk info blocks, etc. and it should behave "accurately", and you can't do this if your metadata about behaviour is stored in the disk image itself.


It doesn't really matter what SNES/MD formats do. This is about what you should do, not whether it does something that some older formats did. If accuracy is really your goal, the behaviour decision needs to be outside the raw image. This is true of SFC files too-- auto detection based on the image data means you can't emulate "bad" image info without being able to force the alternate behaviour externally.

At least, I thought the whole point here is that you want to emulate how "improper" image data works (i.e. bad CRCs), but unless you put that behaviour switch outside the disk image you're preventing the technique from working accurately with a bad disk info block as well. That's why I think you should do it in the header (if you're going to do this at all).
hex_usr
Posts: 92
Joined: Sat May 09, 2015 7:21 pm

Re: .fds format: Can checksums be heuristically detected?

Post by hex_usr »

Well, I guess we should invalidate every single Super Famicom and Mega Drive emulator ever made.

Let the word be known: external headers are absolutely mandatory to emulate the Super Famicom! If your ROM is headerless, add a header to it, or lose the ability to play it forever! This header is the only way to distinguish LoROMs from HiROMs! Also, all those patches on ROMhacking.net that are intended for headerless ROMs? No good, throw them all out!


</sarcasm>

Maybe you think this would make it impossible to play disk images with CRCs in old, deprecated emulators that are not being updated anymore. And you would be right. Those emulators would simply use the same CRC-less images that already exist, just like they treat NES 2.0 headers as if they were the older iNES headers. But newer emulators exist, and they improve on what came before.

Truthfully, I don't think this proposal is fully ideal, either. I am a fan of higan's cartridge folders, which use manifests to describe memory maps and, for the Famicom, split ROMs into separate PRG-ROM and CHR-ROM images. The problem is that higan does not yet emulate any disc-based add-ons such as the Mega-CD, so it has no way yet to store disk images or their contents.

If I really got to have my way, I would scrap the CRC-less version entirely, just as higan scrapped copier headers from Super Famicom ROMs (an included companion app is needed to strip them out instead). And maybe include the gaps between data blocks as well, but I don't know how well that would work considering Mitsumi's own .qd format does not include them, and I'm not sure if multiple production runs of a single game keep consistency on how many 0 bits are in each gap, which is necessary in order for patches to work.

But the world is not so simple. The current format as it is now is defective. Just as so many Super Famicom ROMs have copier headers, so too does every .fds image lack CRCs, and so far, higan is the only emulator ever to even bother taking a stand against the former. That's why I bothered with this heuristics proposal in the first place. The devs behind ZSNES, Snes9x, and SD2SNES wouldn't do so, so why should I expect any Famicom emulator devs to do the same with CRC-less .fds images? This proposal is a compromise.
loopy
Posts: 405
Joined: Sun Sep 19, 2004 10:52 pm
Location: UT

Re: .fds format: Can checksums be heuristically detected?

Post by loopy »

No.
No No No.
I would argue that that feature really is important, but this is based on speculation
I need more than your speculation for this to be really important.
I don't know this for certain, but I suspect that those very same loading routines can also ignore checksum errors reported by the RAM adapter, meaning that any number of files can deliberately include invalid checksums to trick pirates. It may even be possible that a disk loading routine will err out if the checksum really is correct.
Show me one game that does this (Hint: there aren't any). I've seen plenty of arguments like this. Foggy memories, claims that some game might/could have done this. Never seen one instance where it was actually an issue.

For emulators: CRC is not useful. The BIOS doesn't look at it, emulators just need to set the "CRC GOOD" status flag. Maybe insert dummy bytes for padding. Emulators will always need to do a little "magic" anyway to deal with GAP data etc.
For games: CRC could -theoretically- be used for copy protection or something. But it isn't. And won't be, ever.
For homebrew: Actually an annoyance, requiring an extra step in the compilation process.
For "preservation" (the No-Intro crowd): CRC is redundant. Games would fail to load without a correct CRC so nothing is being lost by not including it.
Nintendo's .qd format is better because it includes checksums, but not only does it omit the 00 padding between blocks, it also has one flaw not in the .fds format: it can take 65,536 bytes per side instead of the 65,500 that actual disks could hold!
Both numbers are arbitrary. "Game Doctor" disks are about 65 kB. There is no absolute maximum size because it depends on physical properties of the drive (switch positions, head alignment, etc.). 00 padding affects the true game size: a disk with lots of files will need more space because of padding. The point is, that number is meaningless. A good disk format should have no fixed length, IMO.
So I'm wondering if it is possible to expand the .fds format to make checksums optional Blah Blah Blah
Your whole proposal for this Really Important change is because there might, theoretically, be a game out there that needs CRC data. I'm not arguing that it's a bad idea to have it. Just that it's not a big deal, never has been, never will be, and not worth wringing our hands over.

If you want a new file format, Make. A. New. File. Format. Morphing the .fds standard (that needs heuristics to parse, wtf) in a way that current emulators won't be able to read is a shit idea. No offence. Not trying to insult you, I'm just strongly opinionated about this :)

----

My proposal for a new format (that will never get implemented, but one can dream...):

A header contains the number of disks, and sizes of each. No fixed disk size, each disk is as long as it needs to be. Each disk is a raw byte-for-byte copy of what goes on the disk including the CRC and GAP data. I've seen proposals going down to low level MFM encoding but that's a PITA and unnecessary IMO. Pre-gap (before block 1) is left out. I would also standardize the GAP size so it's a canonical rom format that game collectors can maintain a sensible database of (more on that here)
Last edited by loopy on Tue May 02, 2017 2:44 pm, edited 2 times in total.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: .fds format: Can checksums be heuristically detected?

Post by rainwarrior »

hex_usr wrote:Well, I guess we should invalidate every single Super Famicom and Mega Drive emulator ever made.

</sarcasm>
There's no need for the sarcasm. Maybe my words sound aggressive, but I'm merely making a counter proposal, because it seemed like you wanted to be able to emulate "bad disk image" behaviour that's outside of the current .FDS format.

My whole point was, why not be able to emulate all possible bad disk image data, rather than just the CRCs? If you're adding only CRCs it feels like a half measure. Why don't you want the rest?


I wasn't calling SFC a bad format. I was just saying that it's incapable of emulating a bad image header (e.g. putting a LoROM ROM on a HiROM board) without some sort of external metadata. It doesn't matter because there's no use cases. Similarly, the lack of CRC currently in .FDS also doesn't really matter because there's no known use cases.


Edit: didn't see Loopy's post before my reply.
loopy
Posts: 405
Joined: Sun Sep 19, 2004 10:52 pm
Location: UT

Re: .fds format: Can checksums be heuristically detected?

Post by loopy »

loopy wrote:My proposal for a new format (that will never get implemented, but one can dream...):

A header contains the number of disks, and sizes of each. No fixed disk size, each disk is as long as it needs to be. Each disk is a raw byte-for-byte copy of what goes on the disk including the CRC and GAP data. I've seen proposals going down to low level MFM encoding but that's a PITA and unnecessary IMO. Pre-gap (before block 1) is left out. I would also standardize the GAP size so it's a canonical rom format that game collectors can maintain a sensible database of (more on that here)
Thinking about it a bit, this has downsides as well. Particularly for games that save. Enforcing a standard GAP size gets annoying and whole file contents get shifted around with variable length disks. Maybe just stick to .FDS. And bump the size up a bit. And put the CRC in too, what the hell. But call it something else, I don't want to see weird mutant .fds files floating around with marginal support.
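As a sketch of what "stick to .FDS and put the CRC in" could mean for a converter: copy each block of a CRC-less side and append its checksum, low byte first. Assumptions: the same CRC as `fds_crc()` earlier in the thread, file size taken from bytes 13-14 of each file header block (as laid out on the NESdev wiki), no header on the input, and no gap bytes or trailing padding written. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC as in fds_crc() from this thread (poly 0x8408, init 0x8000, LSB first). */
static uint16_t fds_crc(const uint8_t *data, size_t size) {
  uint16_t sum = 0x8000;
  for (size_t i = 0; i < size + 2; i++) {
    uint8_t byte = i < size ? data[i] : 0x00;
    for (unsigned bit = 0; bit < 8; bit++) {
      unsigned carry = sum & 1u;
      sum = (uint16_t)((sum >> 1) | (((byte >> bit) & 1u) << 15));
      if (carry) sum ^= 0x8408;
    }
  }
  return sum;
}

/* Copy an n-byte block to out[*w] and append its CRC; 0 if out is too small. */
static int emit_block(const uint8_t *blk, size_t n, uint8_t *out, size_t cap,
                      size_t *w) {
  if (n + 2 > cap - *w) return 0; /* *w never exceeds cap */
  memcpy(out + *w, blk, n);
  uint16_t crc = fds_crc(blk, n);
  out[*w + n] = (uint8_t)(crc & 0xFF);
  out[*w + n + 1] = (uint8_t)(crc >> 8);
  *w += n + 2;
  return 1;
}

/* Convert one CRC-less side into the CRC-carrying layout. Returns bytes
 * written, or 0 on malformed input or insufficient output space. */
size_t fds_side_add_crcs(const uint8_t *in, size_t len, uint8_t *out,
                         size_t cap) {
  size_t w = 0;
  if (len < 58 || in[0] != 0x01 || in[56] != 0x02) return 0;
  if (!emit_block(in, 56, out, cap, &w)) return 0;     /* disk info */
  if (!emit_block(in + 56, 2, out, cap, &w)) return 0; /* file amount */
  size_t pos = 58;
  while (pos < len && in[pos] == 0x03) {
    if (len - pos < 17) return 0; /* header plus block 4 ID must fit */
    size_t file_size = (size_t)in[pos + 13] | ((size_t)in[pos + 14] << 8);
    size_t data = pos + 16;
    if (in[data] != 0x04 || len - data < 1 + file_size) return 0;
    if (!emit_block(in + pos, 16, out, cap, &w)) return 0;             /* header */
    if (!emit_block(in + data, 1 + file_size, out, cap, &w)) return 0; /* data */
    pos = data + 1 + file_size;
  }
  return w;
}
```

For a side laid out as in the first post, the output places block IDs at $0000, $003A, $003E, and $0050. Since every block grows by two bytes, a fully used side could exceed a 65,500-byte slot, which fits the suggestion above to bump the size up a bit.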
Last edited by loopy on Tue May 02, 2017 2:47 pm, edited 1 time in total.
hex_usr
Posts: 92
Joined: Sat May 09, 2015 7:21 pm

Re: .fds format: Can checksums be heuristically detected?

Post by hex_usr »

loopy wrote:Show me one game that does this (Hint: there aren't any).
Do you actually know this for certain? Have you actually, carefully examined every single official FDS disk, licensed and otherwise, and verified without a doubt that no games ever do so?

Absence of evidence is not evidence of absence.

You are right about there being no observed use cases, though. But that means that one of us should attempt to create a test ROM (used loosely because these are not ROMs) that engineers this situation to create a use case. And if it proves to be impossible, then I'll accept your point.
loopy wrote:For emulators: CRC is not useful. The BIOS doesn't look at it, emulators just need to set the "CRC GOOD" status flag. ... Games would fail to load without a correct CRC so nothing is being lost by not including it.
Some disk images already lie to the BIOS about how many files (pairs of 0x03 and 0x04 blocks) are on the disk, and they include their own custom loading routines that differ from the BIOS's by hardcoding the true number of files. If the BIOS were the be-all and end-all of disk access, this crude form of copy protection would never have been known about. That's why I think it's possible for disk images with deliberately wrong CRCs to exist.
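To make the file-count trick concrete: a sketch that walks a CRC-less side and compares the count stored in block 2 with the number of 0x03/0x04 pairs actually present. It assumes the file header layout described on the NESdev wiki (file size, little-endian, in bytes 13-14 of block 3) and a side with no header; `fds_count_files` and its struct are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  unsigned declared; /* count stored in the file amount block (block 2) */
  unsigned found;    /* 0x03/0x04 pairs actually present on the side */
} fds_file_count;

/* Walk one CRC-less side (no header). Returns 0 on success, -1 if the side
 * does not start with blocks 1 and 2. Stops at the first byte that is not a
 * 0x03 file header, or at a truncated file. */
int fds_count_files(const uint8_t *side, size_t len, fds_file_count *out) {
  if (len < 58 || side[0] != 0x01 || side[56] != 0x02) return -1;
  out->declared = side[57];
  out->found = 0;
  size_t pos = 58;
  while (len - pos >= 17 && side[pos] == 0x03) {
    /* File size: bytes 13-14 of the file header block, little-endian. */
    size_t file_size = (size_t)side[pos + 13] | ((size_t)side[pos + 14] << 8);
    size_t data = pos + 16;
    if (side[data] != 0x04 || len - data < 1 + file_size) break;
    out->found++;
    pos = data + 1 + file_size;
  }
  return 0;
}
```

A `found` value larger than `declared` would suggest files that a loader trusting block 2 never reaches.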

In order to fully lock out the possibility of including a fake CRC, the Famicom Disk System would need to disable all access to its registers while executing game code and ensure that a program must jump back into the BIOS in order to enable them. But that didn't happen, did it?
loopy
Posts: 405
Joined: Sun Sep 19, 2004 10:52 pm
Location: UT

Re: .fds format: Can checksums be heuristically detected?

Post by loopy »

hex_usr wrote:Do you actually know this for certain? Have you actually, carefully examined every single official FDS disk, licensed and otherwise, and verified without a doubt that no games ever do so?
Of course not. Can't prove a negative and all that. But you're asking for a fix to a problem that (as of yet) doesn't exist and throwing emulators into turmoil, creating a lot of extra work. For a game that doesn't exist. Yet. Let me know when you find one.

Sure, you could make a test rom to prove your point. That we need to update the file format .. for your test roms.