Bee52: Crashes depend on CHRRAM data

Are you new to 6502, NES, or even programming in general? Post any of your questions here. Remember - the only dumb question is the question that remains unasked.


griever
Posts: 39
Joined: Thu Apr 05, 2007 1:34 pm

Post by griever »

Is everything here?
That's the whole Decompress subroutine, but some small subroutines (like 'load_New_gfx_byte') are not shown.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Bump! For a good reason: I figured the compression out. It was a lot of work; I've been working on it for like 4 days (with breaks for real life things, of course). It isn't so difficult once you get past all the bit-reading and spaghetti branching and try to describe it in plain text.

Here are my notes on the compression format used for tiles in the game "Bee 52" by Codemasters. This game was used during the whole reverse engineering process, but apparently the format is used in some of their other games as well. I checked the game "Big Nose Freaks Out" and it appears to contain the exact same decompression logic; it was just assembled to a different location, and the variables are at different locations as well.

[Overview]

I don't really know what to call this compression scheme. From one angle it looks dictionary-based, because it defines common pixel sequences that can be reused, but it also offers the possibility of deviating from the pre-defined sequences by using variable-length codes (longer codes for values that deviate more), which is typical of entropy coding. So all I can say for sure is that it's a hybrid of some sort, unlike anything I've seen before.

The compression scheme is based on pixels (2-bit values) and how likely it is for a color to be followed by another. The compressed stream contains information on how to configure a set of tables that will be used during decompression, until a new set of tables is defined, which can happen before the decompression of any tile. The tables have to be built before any tiles are decompressed, and after that they are reconfigured only when necessary (in theory, it's possible to not set up the tables, but I doubt anyone would decompress several blocks of data using the same statistics).

The first byte in the compressed stream is the tile count, which means that a maximum of 256 tiles are allowed per compressed block. The rest of the stream contains only 1-bit and 2-bit values. Following the tile count is a bit that selects between updating the tables (0) or decoding a tile (1).
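
To make the bit handling concrete, here's a minimal bit-reader sketch in Python (the names are my own, not from the game). I'm assuming bits are consumed most-significant-bit first, which matches the sample bytes at the end of this post:

class BitReader:
    """Reads a compressed block bit by bit, MSB first (assumed order,
    consistent with the sample bytes shown later in this post)."""

    def __init__(self, data):
        self.data = data
        self.tile_count = data[0]   # first byte of the stream
        self.pos = 1                # bit data starts right after it
        self.bit = 7

    def read_bit(self):
        value = (self.data[self.pos] >> self.bit) & 1
        self.bit -= 1
        if self.bit < 0:
            self.bit = 7
            self.pos += 1
        return value

    def read_bits(self, count):
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value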

[Updating Tables]

There are 4 tables, which I'll call A, B, C and D, each one 4 bytes long (because there are 4 possible pixel values). Table A defines what happens after each of the 4 possible pixel values is used (index 0 tells what happens when color 0 is used, index 1 tells what happens when color 1 is used, and so on). Each pixel, after being drawn, is used as an index into this table. The possible values for this table are:

%00: The color will always be followed by itself. The same pixel will be used until the end of the row.
%01: The color is always followed by a single specific color, which will be stored in table B.
%10: The color can be followed by 2 other colors, in tables B or C (depends on bit from the stream).
%11: The color can be followed by all others, and they may be in tables B, C or D (depends on bit(s) from the stream).

Tables B, C and D hold up to 3 colors (depending on the mode selected) that may follow the original one. These follower colors are represented with codes whose meaning depends on the original color. The following is a list of the codes for each color, depending on the original one (a small decoding sketch follows the lists):

original %00:

%01 = %1
%10 = %01
%11 = %00

original %01:

%00 = %1
%10 = %01
%11 = %00

original %10:

%00 = %1
%01 = %01
%11 = %00

original %11:

%00 = %1
%01 = %01
%10 = %00
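
In other words, for a given original color, the three remaining colors are taken in ascending order and get the codes %1, %01 and %00, respectively. A small sketch of a decoder for these codes (reusing the hypothetical BitReader from above):

def read_follower_color(bits, original):
    """Decode a follower-color code relative to `original` (0-3).
    The three colors other than `original`, in ascending order,
    are coded as %1, %01 and %00, per the lists above."""
    others = [c for c in range(4) if c != original]
    if bits.read_bit() == 1:
        return others[0]    # code %1
    if bits.read_bit() == 1:
        return others[1]    # code %01
    return others[2]        # code %00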

The tables are updated quite frequently, it seems. The initial tileset (for the Codemasters logo) contains 157 tiles, and the tables are modified 31 times during their decoding. So, on average, each set of tables was used for about 5 tiles. I wonder what criteria an encoder might use to pick the best times to reload the tables.

[Decoding Tiles]

The tiles are processed one row at a time. A bit from the stream selects between drawing a new row (0) or using the previous row again (1). Reusing rows is great for vertical redundancy. Rows are drawn from left to right, and 2 bits are read from the stream in order to draw the first pixel. After this pixel is drawn, it's used as an index into table A. If the code read from the table is %00 (meaning that the color is always followed by itself), the same pixel will be repeated for the whole row. If it's not %00, a bit from the stream selects between repeating the previous pixel (1) or processing a new one (0). Repeating the previous pixel using a single bit allows runs of the same color to be represented at about half their size (a single bit per pixel is used instead of 2).

Code %01 means that the color in question is always followed by a single specific color (unless the previously mentioned bit causes it to repeat, which is always an option no matter the color), to be fetched from table B (using the old pixel as an index). Code %10 means that the color in question can be followed by 2 other colors. A bit from the stream selects between fetching the new pixel from table B (0) or C (1). Code %11 means that the color in question can be followed by any of the other 3 colors. A 1 from the stream means that table B should be used. A 0 followed by another 0 selects table C, while a 0 followed by a 1 selects table D. The conclusion is that a color that is followed by fewer colors will be encoded with fewer bits.

The decompressor always checks for the end of the row and for the end of the tile, so that it can decide between continuing to process the current one or starting to process the next one.
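
Putting the pieces together, here's my reading of the per-tile decoding loop as a Python sketch (a hedged interpretation, not a verified decoder). It reuses the BitReader sketched earlier, takes tables A-D as already built, leaves the table-update path out, and assumes the "previous row" carries over from one tile to the next:

def decode_tile(bits, A, B, C, D, prev_row):
    """Decode one 8x8 tile as 8 rows of 8 pixel values (0-3)."""
    rows = []
    for _ in range(8):
        if bits.read_bit() == 1:          # 1 = reuse the previous row
            row = list(prev_row)
        else:                             # 0 = draw a new row
            row = [bits.read_bits(2)]     # first pixel: 2 raw bits
            while len(row) < 8:
                pixel = row[-1]
                mode = A[pixel]
                if mode == 0:             # %00: always followed by itself
                    row += [pixel] * (8 - len(row))
                    break
                if bits.read_bit() == 1:  # 1 = repeat the previous pixel
                    row.append(pixel)
                elif mode == 1:           # %01: the single possible follower
                    row.append(B[pixel])
                elif mode == 2:           # %10: 0 = table B, 1 = table C
                    row.append(C[pixel] if bits.read_bit() else B[pixel])
                else:                     # %11: 1 = B, 00 = C, 01 = D
                    if bits.read_bit() == 1:
                        row.append(B[pixel])
                    else:
                        row.append(D[pixel] if bits.read_bit() else C[pixel])
        rows.append(row)
        prev_row = row
    return rows

The outer loop (not shown) would read the tile count byte and, before each tile, the selector bit described earlier (0 = update tables first, 1 = just decode). With the sample block below, table A is all %00, so each new row is filled as soon as its first pixel is read, and the remaining rows just reuse the previous one.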

[Sample Data]

The following is a very simple example of a valid compressed block of tiles. It doesn't use most of the format's features, but it serves as a way to illustrate a bit better how it works. The following is valid data for drawing 4 plain tiles, one of each color:

$04: block contains 4 tiles;

0: update decompression tables;
00: pixels of color 3 should be repeated for the whole row;
00: pixels of color 2 should be repeated for the whole row;
00: pixels of color 1 should be repeated for the whole row;
00: pixels of color 0 should be repeated for the whole row;

0: new row;
00: first pixel is color 0;
1111111: repeat row 7 times;

1: decode tile;
0: new row;
01: first pixel is color 1;
1111111: repeat row 7 times;

1: decode tile;
0: new row;
10: first pixel is color 2;
1111111: repeat row 7 times;

1: decode tile;
0: new row;
11: first pixel is color 3;
1111111: repeat row 7 times;

Here are the bytes that are formed by the bits above, in case anyone wants to try it out for themselves (you can patch the game, replacing the original data with this, or simply put it somewhere in RAM and have the game decompress it from there):

$04
%00000000 ;$00
%00001111 ;$0F
%11110011 ;$F3
%11111110 ;$FE
%10111111 ;$BF
%11011111 ;$DF
%11110000 ;$F0
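
If anyone wants to double-check the packing, this throwaway Python snippet concatenates the bits listed above (MSB first), pads the last byte with zeros, and prints the same bytes:

fields = [
    "0", "00", "00", "00", "00",   # update tables: all 4 colors use mode %00
    "0", "00", "1111111",          # tile 1: new row, color 0, repeat row 7x
    "1", "0", "01", "1111111",     # tile 2
    "1", "0", "10", "1111111",     # tile 3
    "1", "0", "11", "1111111",     # tile 4
]
bits = "".join(fields)
bits += "0" * (-len(bits) % 8)     # pad the last byte with zeros
data = bytes([0x04]) + bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
print(" ".join(f"${b:02X}" for b in data))   # $04 $00 $0F $F3 $FE $BF $DF $F0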

[Conclusion]

This compression scheme yields a respectable amount of compression. I can't say what part of it is most effective (in fact, to me it seems that in many cases representing complex rows of pixels with it might cause a great deal of expansion!), but the truth is that I couldn't achieve comparable compression ratios with any other scheme I tried.

I'm not sure if the decompressor is as well coded as it could be. I found it very hard to follow, and by reading my description of the format it seems like it didn't need to be so messy. On the other hand, the messy code might even have been an attempt to protect the format.

Anyway, I decided to study this because I really wanted to know what kind of tile compression would always beat my best one by about 8%. I'm not sure if I'll work on this any further (like coding an encoder), as I wouldn't feel right using something I didn't invent (or at least figure out myself, even if it was invented before). I will probably consider applying some of the ideas to my own format, but that's all.
griever
Posts: 39
Joined: Thu Apr 05, 2007 1:34 pm

Post by griever »

Ah! This compression scheme had already driven me to despair! What a great job, Tokumaru. Thanks a lot. I have no idea how you unscrambled this mess, but that's impressive - I need to comment out my graph, I guess.
Thanks again!
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

griever wrote:I have no idea how you unscrambled this mess
I'm not used to reverse engineering code but I was so curious that I decided to go through with it.

I basically rewrote the program several times, each time increasing the level of abstraction. The first pass was basically adding comments to the raw assembly program. From there I replaced longer sequences of code with short sentences describing what was done. Then I wrote some indented pseudo-code in order to follow it better. Lastly, I made a schematic of what was done for each bit read, and it finally became clear what sequence of bits would do what. During this process, the function of every variable became clear as well.

It was a lot of work, and I can safely say I'm not looking forward to doing anything like this any time soon! =)

I'm really curious about how an optimum encoder for this scheme would work though... but I shouldn't think about this too much, or I'll end up trying when I don't really have the time! O_o
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

As I understand your description, it's more of a Markov-chain algorithm than anything else.

Each color is associated with a mode number in table A, which equals the number of distinct colors that can follow this color on a scanline in this block of tiles. When you compress tiles, you count the number of distinct colors that can appear to the right of this color, disregarding duplicate rows and pixels that are the same. The most common color goes in table B.
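
As a sketch of that statistics gathering (a hypothetical helper of my own; tiles are assumed to be lists of 8 rows of 8 pixel values 0-3):

from collections import Counter

def follower_stats(tiles):
    """For each color 0-3, count how often each *different* color appears
    immediately to its right, ignoring rows identical to the previous row
    (those would be encoded with the reuse-row bit).  len(stats[c]) is then
    a candidate mode for A[c], and the most common follower a candidate
    for B[c]."""
    stats = [Counter() for _ in range(4)]
    prev_row = None
    for tile in tiles:
        for row in tile:
            if row == prev_row:
                continue                     # reused row: no follower pairs
            for left, right in zip(row, row[1:]):
                if left != right:            # same-color runs use the repeat bit
                    stats[left][right] += 1
            prev_row = row
    return stats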

For each non-repeated row:
mode = A[color]
  • mode == 0: Repeat this color to end of line.
  • mode == 1: Read a bit for each pixel to determine repeat or B:
    1: color
    0: B[color]
  • mode == 2: Read a bit to determine repeat or B/C, then possibly another bit to determine B or C.
    1: color
    00: B[color]
    01: C[color]
  • mode == 3: Read a bit to determine repeat or B/C/D, then one or two bits to determine which table.
    1: color
    01: B[color]
    000: C[color]
    001: D[color]
But in fact, for mode 3, tables C and D could have been computed from B, where C is the lower-numbered color that isn't i or B, and D is the remaining color. That would save 8 bits per block of tiles (no need to store D) but take slightly longer to decode.

To determine where to place the block boundaries, you could just punt and use a Pucrunch-style algorithm to represent encoding as a search over the graph of all possible bitstreams for the most efficient path from 0 to the decoded string.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Your understanding of the compression is correct.

I can understand how to find the optimum blocks in LZ compression, because the matches have a well defined length (meaning that if you select the longest ones you'll end up with fewer blocks and better compression), but in Bee 52's compression the blocks can be as long as the whole stream.

To me it's a kind of paradox: in order to see how well a group of tiles compresses, you must have the tables ready. However, in order to build the tables you must gather statistics from the group of tiles that will be compressed with them. See? Since you don't know how long the block is you can't build the optimum tables for said block, and without the tables you can't verify how many tiles will compress well with them.

Also, defining a new set of tables has a cost itself, which should be considered when setting a new boundary.

I'm sure there is an answer, because the data is successfully compressed with this scheme, I'm just not seeing it.

EDIT: I guess it would make sense to do something like this:

1. Try to start a new block at every single tile, with all blocks ending at the last tile, storing the cost of encoding each of these blocks;
2. Looking from the last block backwards, select the one with the best compression ratio to be the last block;
3. Repeat the process, with the new last tile being the one before the previously selected block;
4. After all the best blocks have been selected, output them in the correct order.
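
That reads like a greedy back-to-front selection; here's a rough sketch of it (the cost callback is hypothetical, and would have to build the tables for the candidate block and count every bit needed to encode it):

def split_into_blocks(tiles, block_cost_bits):
    """Greedy back-to-front block selection, as described above.
    `block_cost_bits(tiles_slice)` is a hypothetical callback returning the
    bits needed to encode that slice as one block (tables + tile data)."""
    blocks = []
    end = len(tiles)
    while end > 0:
        # Cost of every candidate block ending at the current last tile.
        best_start, best_ratio = None, None
        for start in range(end):
            cost = block_cost_bits(tiles[start:end])
            ratio = cost / (128 * (end - start))   # 128 bits per uncompressed tile
            if best_ratio is None or ratio < best_ratio:
                best_start, best_ratio = start, ratio
        blocks.append((best_start, end))
        end = best_start                           # tile before this block becomes the new last tile
    blocks.reverse()                               # output the blocks in the correct order
    return blocks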

Did you have something like this in mind, tepples?
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

A*

Post by tepples »

I had in mind something more like the A* algorithm. After each tile, there are two paths: start a new block here or don't start a new block here. Then g(x) is the number of bits from the tiles encoded so far, and h(x) is an underestimate of the remaining data size, such as 1 bit per remaining pixel.
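
One way to phrase that as a search (a sketch under my own assumptions, with the same hypothetical cost callback as before): treat "the first i tiles are encoded" as a node, an edge i -> j as encoding tiles i..j-1 as one block, and run A* with g = bits spent so far and h = 64 bits per remaining tile (1 bit per pixel):

import heapq

def best_blocks_astar(tiles, block_cost_bits):
    """A*-style search over block boundaries (a sketch of the idea above)."""
    n = len(tiles)
    h = lambda i: 64 * (n - i)          # 1 bit per remaining pixel (underestimate)
    best_g = {0: 0}
    parent = {}
    open_set = [(h(0), 0, 0)]           # (g + h, g, node)
    while open_set:
        _f, g, i = heapq.heappop(open_set)
        if i == n:
            break
        if g > best_g.get(i, float("inf")):
            continue                     # stale queue entry
        for j in range(i + 1, n + 1):
            g2 = g + block_cost_bits(tiles[i:j])
            if g2 < best_g.get(j, float("inf")):
                best_g[j] = g2
                parent[j] = i
                heapq.heappush(open_set, (g2 + h(j), g2, j))
    # Reconstruct the chosen block boundaries.
    blocks, j = [], n
    while j > 0:
        blocks.append((parent[j], j))
        j = parent[j]
    return list(reversed(blocks))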

EDIT: I just thought up a new way to encode tables B and C for colors that use mode 2: have the two bits represent the one color that isn't i, B, or C. Now we're down to two bytes for a new block: table A, and the information needed to generate tables B, C, and D. The more tightly the codec packs the tables, the less costly starting a new block can be.

Mode 0 can only be used for a color that appears exclusively at the right side of a tile. Most often, this would be the background color for a large area. But without large areas of repeated rows or mode 0 pixels, I can't see how this codec could pack a CHR smaller than about 60 percent. Breaking the 50 percent barrier would take some work.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: A*

Post by tokumaru »

tepples wrote:Now we're down to two bytes for a new block: table A, and the information needed to generate tables B, C, and D.
Why 2 bytes? I think it will often be less. 1 byte is always needed, because we need to define how many colors can follow each color, so 2 bits x 4 colors = 1 byte. After that, it varies depending on whether each color is followed by any others at all, and on whether the specified follower is represented with 1 or 2 bits.

Like you cleverly observed, in all modes only one color must be defined and the others (if any) can be discovered based on it.

- 1 color follows the original one: the color in question is specified;
- 2 colors follow the original one: the color that never follows the original one is specified, so the remaining 2 are used (the order doesn't matter, as both will be encoded using the same number of bits, but the encoder and the decoder must agree on an order);
- 3 colors follow the original one: the color that will be encoded with fewer bits is specified, and the remaining 2 are used in whatever order the encoder and decoder agreed on (it also doesn't matter, because both will be encoded using the same number of bits).

It's really great that it's possible to encode the tables in 2 bytes or less. I believe it will very often be less, because there are always only 3 colors to pick from (the original can't be repeated), and one of them can be specified with a single bit.
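
A quick sketch of the table-definition cost under this packing (my own argument conventions, and I'm assuming the specified color uses the same 1- or 2-bit follower codes described earlier): 2 bits of table A per color, plus one 1- or 2-bit follower code for each color that has followers, for a maximum of 16 bits.

def table_definition_bits(modes, specified):
    """Bits needed to define a new set of tables in the packed scheme
    discussed here (hypothetical encoding, not the original format).
    modes[c] is A[c] (0-3); specified[c] is the single color written to
    the stream for color c (ignored when modes[c] == 0)."""
    total = 8                                  # table A: 2 bits x 4 colors
    for color in range(4):
        if modes[color] == 0:
            continue
        others = [c for c in range(4) if c != color]
        # The lowest non-original color costs 1 bit, the other two cost 2.
        total += 1 if specified[color] == others[0] else 2
    return total

# Example: every color uses mode 0 (like the sample block above) -> 8 bits.
print(table_definition_bits([0, 0, 0, 0], [None] * 4))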

I'm feeling awfully tempted to make an encoder for this. Seeing how compact the tables can be makes me want to see how much better than the original compression this can be.
griever
Posts: 39
Joined: Thu Apr 05, 2007 1:34 pm

Re: Bee52: Crashes depend on CHRRAM data

Post by griever »

Hello, guys. I've been away from the scene for some time, but my programming skills have significantly increased.
Could you please let me know whether an unpacker/packer for this compression scheme has been written yet? Have any more efficient compression algorithms been found for tile compression?
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Bee52: Crashes depend on CHRRAM data

Post by tepples »

griever wrote:Hello, guys. I've been away from the scene for some time, but my programming skills have significantly increased.
Could you please let me know whether an unpacker/packer for this compression scheme has been written yet? Have any more efficient compression algorithms been found for tile compression?
In general, see "Tile compression" on the wiki. Tokumaru made an (incompatible) modification to this codec that makes the frequency tables smaller, allowing a compressor to better adapt to changing statistics, and Bisqwit made an encoder for this revised format.
griever
Posts: 39
Joined: Thu Apr 05, 2007 1:34 pm

Re: Bee52: Crashes depend on CHRRAM data

Post by griever »

Thank you, Tepples!
I guess I still have a small chance to go down in history and write my codec, as no tool compatible with the original Codemasters format exists :)
This could help me do a couple of small translations into my mother tongue.
griever
Posts: 39
Joined: Thu Apr 05, 2007 1:34 pm

Re: Bee52: Crashes depend on CHRRAM data

Post by griever »

OK, here is my tool for the original compression scheme. I have also updated the list of Codemasters games that use this algorithm.
My tool generates optimal chunks and always does better than the original compressor. But, of course, tokumaru's and Bisqwit's tools achieve higher compression ratios, as they are not tied to the original packing scheme.