Here's my suggestion for standardizing header correction:
A file has two ROM sizes.
- "Header ROM size" is the total of PRG ROM size and CHR ROM size from the header. It is valid only if the file is at least 16 bytes larger than this size.
- "File ROM size" is the file size minus 16 bytes, rounded down to a multiple of 8 KiB, and then rounded further down to the largest value that is the sum of two powers of two. This may produce a wrong result for a PlayChoice ROM whose CHR ROM is 8 KiB or nonexistent, as it will assume the instruction ROM is part of CHR ROM. But it will produce the correct result for the most notable game with a non-power-of-two PRG ROM size, as the total size of Action 52 (1.5 MiB PRG ROM plus 512 KiB CHR ROM) is still 2 MiB.
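A rough sketch of the two size calculations, assuming the ROM data directly follows the 16-byte header (trainers are not handled) and reading only the plain iNES size fields in bytes 4 and 5 (NES 2.0's byte 9 extensions are ignored here). The function names are hypothetical.

```python
from typing import Optional

HEADER_LEN = 16
PRG_UNIT = 16 * 1024  # iNES byte 4 counts 16 KiB units
CHR_UNIT = 8 * 1024   # iNES byte 5 counts 8 KiB units


def header_rom_size(data: bytes) -> Optional[int]:
    """PRG ROM + CHR ROM size from the header, or None if it is not valid.

    Assumption: plain iNES header; NES 2.0 byte 9 is not consulted.
    """
    if len(data) < HEADER_LEN or data[:4] != b"NES\x1a":
        return None
    size = data[4] * PRG_UNIT + data[5] * CHR_UNIT
    # Valid only if the file is at least 16 bytes larger than this size.
    if len(data) < HEADER_LEN + size:
        return None
    return size


def round_down_to_two_powers(n: int) -> int:
    """Largest value <= n that is a sum of two powers of two (or a power of two)."""
    if n <= 0:
        return 0
    top = 1 << (n.bit_length() - 1)   # highest set bit
    rest = n - top
    if rest == 0:
        return top
    return top + (1 << (rest.bit_length() - 1))  # plus the next set bit


def file_rom_size(file_len: int) -> int:
    """File size minus the header, rounded down to 8 KiB, then to two powers of two."""
    n = max(file_len - HEADER_LEN, 0)
    n -= n % (8 * 1024)
    return round_down_to_two_powers(n)


# Action 52: 1.5 MiB PRG + 512 KiB CHR is still 2 MiB in total.
print(file_rom_size(16 + 2 * 1024 * 1024))          # 2097152
# PlayChoice-style image: 32 KiB PRG + 8 KiB CHR + 8 KiB INST ROM + 32-byte PROM.
print(file_rom_size(16 + 32768 + 8192 + 8192 + 32))  # 49152
```

The second example shows the PlayChoice caveat: the 8 KiB instruction ROM is counted, giving 48 KiB instead of 40 KiB.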
A ROM's hash is its ROM size followed by the SHA-1 hash value of the concatenated PRG ROM and CHR ROM data. (SHA-1 is used because NesCartDB offers it. I concede it is insecure against constructed collisions since SHAttered, but it is still secure against preimages.) The header correction tool contains a database of hashes extracted from an XML dump of NesCartDB. This database maps each hash to the correct NES 2.0 header. Each entry will thus need 32 bytes: 20 for the hash, and 12 for the header excluding the initial "NES\x1A". The size does not need to be stored separately, because it can be calculated from the header.
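One possible encoding of a 32-byte database entry, sketched under a few assumptions: the hash is represented as a hypothetical `(size, sha1_digest)` tuple, and the size is recovered from the stored header using the NES 2.0 PRG/CHR fields (bytes 4, 5 and 9, including the exponent-multiplier form). All function names are illustrative, not part of any existing tool.

```python
import hashlib
from typing import Tuple

ENTRY_LEN = 32  # 20-byte SHA-1 digest + header bytes 4..15
MAGIC = b"NES\x1a"


def rom_hash(size: int, rom_data: bytes) -> Tuple[int, bytes]:
    """A ROM's hash: its size followed by the SHA-1 of PRG ROM + CHR ROM."""
    return (size, hashlib.sha1(rom_data[:size]).digest())


def nes2_rom_size(h12: bytes) -> int:
    """PRG + CHR ROM size from header bytes 4..15 (NES 2.0 layout)."""
    def field(lsb: int, msb: int, unit: int) -> int:
        if msb == 0xF:  # exponent-multiplier notation: 2^E * (MM*2 + 1)
            return (1 << (lsb >> 2)) * ((lsb & 3) * 2 + 1)
        return ((msb << 8) | lsb) * unit

    prg = field(h12[0], h12[5] & 0x0F, 16 * 1024)  # byte 4, low nibble of byte 9
    chr_ = field(h12[1], h12[5] >> 4, 8 * 1024)    # byte 5, high nibble of byte 9
    return prg + chr_


def pack_entry(digest: bytes, header: bytes) -> bytes:
    """Store the digest and the header minus its "NES\\x1A" magic."""
    if len(digest) != 20 or len(header) != 16 or header[:4] != MAGIC:
        raise ValueError("expected a 20-byte digest and a 16-byte NES header")
    return digest + header[4:]


def unpack_entry(entry: bytes) -> Tuple[Tuple[int, bytes], bytes]:
    """Return ((size, digest), full 16-byte header) for one 32-byte entry."""
    if len(entry) != ENTRY_LEN:
        raise ValueError("entry must be 32 bytes")
    digest, h12 = entry[:20], entry[20:]
    return (nes2_rom_size(h12), digest), MAGIC + h12
```

Because the size is derived from the header on load, the in-memory lookup table can still be keyed on the full (size, digest) pair while the file stays at 32 bytes per entry.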
The latest dump has 3179 <cartridge> elements; even if I pessimistically assume they're all unique (which they aren't), that's still only 3179 × 32 = 101,728 bytes, or about 100 KiB.
When attempting to correct a ROM image, the tool calculates its header ROM size and file ROM size. For each distinct valid size, it calculates the hash and compares it to the hashes in the database. If a match is found, it returns the header associated with that hash.
If header correction fails to find a hash match, the emulator falls back to the existing iNES or NES 2.0 header. This means homebrew, hacks, obscure games, and the like will still run so long as they already have a valid header. So if you're releasing your own stuff, don't release crap.
If you want, I can make a Python prototype of this algorithm.