Okay, we're back on track.
I'm tired and haven't even looked at the data yet, besides opening it to make sure the program spit out reasonable-looking data, but the point is we now have prob data. At a glance it looks like it uses the same probability evolution table ... so you guys may have this figured out by the time I wake up.
Thinking about it some more, I couldn't figure out how each symbol appeared to be a bit in the beginning and failed miserably afterwards. The manner in which it failed was interesting, though. Sometimes there would be a bit you could choose which was completely "segregated" in the current TOP - BOT range of inputs. Other times there literally was none.
These two seemed impossible together unless:
1] the first few data bits were treated differently (like "startup" data)
2] the only "layering" was into the bit depth of each pixel... not in between pixels (like the document with the q-coder suggested)
I didn't have enough data to distinguish these, and the first one seemed less likely. So I changed the SNES program to search for prob assuming #2... using input_7030_an1.bin, it made it all the way through!
This is practically statistically "impossible" unless my guess was correct. So there you go, we know enough about the symbols to extract the real info ... the prob values and (sort of) the mps values.
The way to get around this bizarre depth issue was kind of kludgy in the code, so what I have saved instead is the two bit values at the top of each span. Unfortunately this isn't enough to let you figure out how all four sections are mapped to the 2 bits. I can change this if necessary, but for now figuring out the contexts and how they interleave seems more important.
Here's the data:
(The ".other." in the name marks the 'other (unintended) path' described below in my edit.)
(EDIT: Warning! I just realized there was an error in my code. For the first bit of every pixel, it always follows the opposite path from the one it should take to match the input data. The data is still good though, as it still represents a valid path (and all the data in the file should be correct for that path), but it means the second bit will always end up taking the lps path. I hope this doesn't restrict your ability to learn from the data too much. I'll do a new run after some sleep.)
The format is different from the .bin output files.
For each symbol I now have two bytes:
first byte is prob value
second byte is bit flags:
bit 7 - first bit of 2bit color at top of current input range
bit 6 - second bit of 2bit color at top of current input range
bit 4 - 1 if lps path was taken
bit 3-0 - number of renormalizations before moving on to the next symbol
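If it helps anyone poke at the files, here is a rough Python sketch of a reader for the two-bytes-per-symbol layout above. It assumes the bit fields listed belong to the second byte of each pair (the first byte being the prob value), and it leaves bit 5 alone since I haven't said what it holds. The names `SymbolRecord` and `parse_records` are made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SymbolRecord:
    prob: int             # first byte: prob value, kept as the raw byte
    top_first_bit: int    # bit 7: first bit of 2-bit color at top of range
    top_second_bit: int   # bit 6: second bit of 2-bit color at top of range
    lps_taken: bool       # bit 4: True if the lps path was taken
    renorm_count: int     # bits 3-0: renormalizations before the next symbol


def parse_records(data: bytes) -> List[SymbolRecord]:
    """Split a dump into one record per symbol (two bytes each).

    Assumption: the flag bits live in the second byte of each pair.
    """
    if len(data) % 2:
        raise ValueError("expected two bytes per symbol")
    records = []
    for i in range(0, len(data), 2):
        prob, flags = data[i], data[i + 1]
        records.append(SymbolRecord(
            prob=prob,
            top_first_bit=(flags >> 7) & 1,
            top_second_bit=(flags >> 6) & 1,
            lps_taken=bool(flags & 0x10),
            renorm_count=flags & 0x0F,
        ))
    return records
```

Calling `parse_records(open(path, "rb").read())` on one of the dumps (with `path` pointing at the file) gives a list you can filter, for example to see which prob values follow an lps path.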
While that long data run went on, I looked through the "first byte" mode 2 data.
The SNES format stores the first pixel of 4-bit color like this:
bit 0 = first bit of output byte 0
bit 1 = first bit of output byte 1
bit 2 = first bit of output byte 16
bit 3 = first bit of output byte 17
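As a sanity check on that layout, here is a small Python sketch that pulls a 4-bit pixel back out of a tile. It assumes "first bit" means the most significant bit (bit 7) of each byte, and it extends the first-pixel mapping to other rows and columns using the usual SNES 4bpp tile layout (32 bytes per 8x8 tile, two bytes per row in each half). That extension is my assumption here, not something read off the data. The function name `pixel_4bpp` is made up.

```python
def pixel_4bpp(tile: bytes, x: int = 0, y: int = 0) -> int:
    """Return the 4-bit value of pixel (x, y) in a 32-byte 4bpp tile.

    For x = 0, y = 0 this reads the first bit of bytes 0, 1, 16 and 17,
    matching the mapping listed above.
    """
    if len(tile) < 32:
        raise ValueError("a 4bpp 8x8 tile is 32 bytes")
    if not (0 <= x < 8 and 0 <= y < 8):
        raise ValueError("x and y must be in 0..7")
    # Byte offsets holding bitplanes 0..3 for row y.
    offsets = (2 * y, 2 * y + 1, 16 + 2 * y, 17 + 2 * y)
    shift = 7 - x  # assumption: leftmost pixel is the MSB
    value = 0
    for plane, offset in enumerate(offsets):
        value |= ((tile[offset] >> shift) & 1) << plane
    return value
```

With this, stepping the input byte and watching `pixel_4bpp(output)` change should show the same bit-by-bit pattern as reading bytes 0, 1, 16 and 17 by hand.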
Sure enough, looking at the data (listing only the changes):
input byte, [output bits]
Looking at the 00-13 range, I was even able to pick out where the next pixel steps through in the same way.
So we now know what mode 0, mode 1, mode 2 mean, and what they were intended to compress. I bet the probability evolution table is the same. All we need is to figure out precisely how the symbols are used to construct the pixel value, and how the predictions for each symbol are made.
EDIT: Oh who am I kidding, I can't sleep.
Here's the data from the fixed program:
I have no idea how the contexts are chosen, for they are shared within a pixel and even between pixels. Anyone have any ideas?