Some updates on my experiments. I'm mostly in a statistics-gathering stage, with "all-nrom.chr" as my test data.
Initial tests found that matching from 64 bytes back performs marginally better than 128 bytes back. I guess that's because 64 bytes is the size of a metatile (4 tiles). From that I decided that the block and window size will be 64, and as you said, 64 bytes is also the size of an attribute table.
Organizing the data so that all the first planes get decoded before any of the second planes do could help reduce the number of bytes taken up by "duplicate plane" commands, but I'm still trying to find the best way to code that.
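One possible way to picture that reordering, as a rough sketch only: it assumes the standard 16-byte tile layout (8 bytes of plane 0 followed by 8 bytes of plane 1) and a 64-byte block of 4 tiles. The function names `planes_first` and `planes_interleaved` are made up for illustration and are not from any existing tool.

```python
def planes_first(block: bytes) -> bytes:
    """Reorder 16-byte tiles so every first plane precedes every second plane."""
    assert len(block) % 16 == 0, "block must hold whole 16-byte tiles"
    first = b"".join(block[i:i + 8] for i in range(0, len(block), 16))
    second = b"".join(block[i + 8:i + 16] for i in range(0, len(block), 16))
    return first + second


def planes_interleaved(data: bytes) -> bytes:
    """Inverse of planes_first: restore the normal plane0/plane1 tile order."""
    assert len(data) % 16 == 0, "data must hold whole 16-byte tiles"
    half = len(data) // 2
    out = bytearray()
    for i in range(0, half, 8):
        out += data[i:i + 8]                # plane 0 of this tile
        out += data[half + i:half + i + 8]  # plane 1 of this tile
    return bytes(out)
```

With a 64-byte block, the four first planes end up in bytes 0-31 and the four second planes in bytes 32-63, so a "duplicate plane" match would only ever look within the same half.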
Out of the various match distances I tried, 2 bytes back did not perform as well as I hoped (I wanted it for the dithering cases). However, matching to an implied plane of all 0x00 far exceeded my expectations. I guess the reason so many 0x00 bytes are scattered everywhere is monochrome tiles and sprite transparency.
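For anyone who wants to try a similar measurement, here is a minimal sketch of this kind of count. It is not the exact test I ran: it works on single bytes rather than whole planes, and the name `match_rates` and the default distances are just assumptions for illustration.

```python
def match_rates(data: bytes, distances=(2, 64, 128)):
    """Return (zero_rate, {distance: rate}) as fractions of all bytes in data.

    zero_rate is the share of bytes equal to 0x00; each distance rate is the
    share of bytes equal to the byte that many positions earlier.
    """
    total = len(data)
    if total == 0:
        return 0.0, {d: 0.0 for d in distances}
    zero_rate = sum(b == 0 for b in data) / total
    rates = {}
    for d in distances:
        hits = sum(data[i] == data[i - d] for i in range(d, total))
        rates[d] = hits / total
    return zero_rate, rates
```

Running it would look something like `match_rates(open("all-nrom.chr", "rb").read())`, and comparing the zero rate against the per-distance rates gives a rough idea of which kind of match is worth spending bits on.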
I made up this simple scheme as a demonstration of that last finding:
Each 8-byte plane starts with a control byte.
If bit 7 of the control byte is clear: ASL the control byte to get the bit pattern for the next 8 bytes.
This will force the last byte to be zero, but statistically that's OK for fonts and such.
0 = 0x00, 1 = read new byte
Otherwise (bit 7 set): for each bit in the control byte
0 = use previous byte, 1 = read new byte
Example input:
00 00 00 00 00 30 30 00 00 00 00 00 00 00 00 00
ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 ff
38 6c 6c 6c 6c 6c 38 00 38 00 00 00 00 00 38 00
Example output (with added indenting):
03 30 30
81 00 ff
c3 38 6c 38 00
41 38 38
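To make the scheme above concrete, here is a rough sketch of a per-plane encoder and decoder. Treat it as one reading of the scheme rather than the actual code: it assumes the mode is selected by bit 7 of the control byte (which is consistent with the four example output lines), and that the encoder simply picks whichever mode is shorter, preferring the 0x00 mode on a tie. The names `decode_plane` and `encode_plane` are made up for illustration.

```python
def decode_plane(data: bytes, pos: int = 0):
    """Decode one 8-byte plane starting at data[pos]; return (plane, next_pos)."""
    control = data[pos]
    pos += 1
    out = bytearray()
    if control & 0x80:
        # Bit 7 set: 0 = use previous byte, 1 = read new byte.
        prev = 0  # placeholder; bit 7 guarantees the first byte is read
        for i in range(8):
            if control & (0x80 >> i):
                prev = data[pos]
                pos += 1
            out.append(prev)
    else:
        # Bit 7 clear: ASL the control byte, then 0 = 0x00, 1 = read new byte.
        bits = (control << 1) & 0xFF
        for i in range(8):
            if bits & (0x80 >> i):
                out.append(data[pos])
                pos += 1
            else:
                out.append(0)
    return bytes(out), pos


def encode_plane(plane: bytes) -> bytes:
    """Encode one 8-byte plane, choosing the shorter of the two modes."""
    assert len(plane) == 8
    # "Previous byte" mode: the first byte is always stored, so bit 7 is set.
    ctrl, body = 0, bytearray()
    for i, b in enumerate(plane):
        if i == 0 or b != plane[i - 1]:
            ctrl |= 0x80 >> i
            body.append(b)
    best = bytes([ctrl]) + bytes(body)
    # "Implied 0x00" mode: only usable when the last byte is zero,
    # because the ASL shifts a 0 into the lowest bit.
    if plane[7] == 0:
        zctrl, zbody = 0, bytearray()
        for i, b in enumerate(plane):
            if b != 0:
                zctrl |= 0x80 >> i
                zbody.append(b)
        zero = bytes([zctrl >> 1]) + bytes(zbody)
        if len(zero) <= len(best):
            best = zero
    return best
```

For example, `encode_plane(bytes.fromhex("386c6c6c6c6c3800"))` gives `c3 38 6c 38 00` and `encode_plane(bytes.fromhex("3800000000003800"))` gives `41 38 38`, matching the last two output lines. Under the same reading, an all-0x00 plane would come out as the single byte `00` and an all-0xff plane as `80 ff`.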
Obviously this simple scheme is generally worse than pb53, *but* it still happens to compress SMB1 and DABG better (to 5968 and 4746 bytes respectively). So I think this is a good start.
Note to admins: Is it ok to get the last three posts of this thread split into a new thread titled "Another attempt at CHR compression"?