
Bagel NES CHR compression.

Posted: Fri Aug 12, 2016 9:42 am
by JRoatch
Bagel is a new NES CHR compression codec. It decompresses in 64-byte blocks, saves roughly 43.3% in data size, and is only 3x as slow as a raw memory copy. The block decoder compiles to 137 bytes of ROM. Attached is a zip file containing the encoder, the decoder, and a demonstration of the various setups the main block decoder can be used in. The demo also includes a game. :)


Prerelease discussion follows:
---
[url=http://forums.nesdev.com/viewtopic.php?p=161620#p161620]Last time on PBJ[/url], tepples wrote:Can your decoder break the RLEINC2 runs into fixed-size output packets? Say I wanted to decode 128 bytes, send that to VRAM, decode the next 128 bytes, send that to VRAM, etc. That was my original motivation behind PB8: to be able to decode while rendering is on.
So lately I've had a few compression ideas that I might try out (and possibly silently fail at). If I were to make another compression scheme with a fixed decode window of 128 bytes and an optional dual-segment setup like pb53, would that fulfill 99% of the use cases pb53 currently serves?

Re: PBJ compression format

Posted: Fri Aug 12, 2016 9:54 am
by tepples
JRoatch wrote:So lately I've had a few compression ideas that I might try out (and possibly silently fail at). If I were to make another compression scheme with a fixed decode window of 128 bytes and an optional dual-segment setup like pb53, would that fulfill 99% of the use cases pb53 currently serves?
Probably. But the Stock and Shop sections in RHDE used it for a bunch of 64-byte furniture images. And it might also end up used for a 960-byte nametable (or other multiples of 64 bytes, that is, two nametable rows) followed by rows of attribute data.

Re: Another attempt at CHR compression

Posted: Sun Aug 21, 2016 10:58 pm
by JRoatch
Some updates on my experiments. I'm mostly in a statistical gathering stage where my test data is "all-nrom.chr".

Initial tests found that matching from 64 bytes back performs marginally better than matching from 128 bytes back. I guess that's because 64 bytes is the size of a metatile (4 tiles). From that I decided that the block and window size will be 64; as you said, 64 bytes is also the size of an attribute table.

Organizing the data so that all the first planes get decoded before any of the second planes could help reduce the number of bytes taken up by "duplicate plane" commands, but I'm still trying to find the best way to code that.

Out of the various match distances I tried, 2 bytes back did not perform as well as I hoped (I wanted it for the dithering cases). However, matching to an implied plane of all 0x00 far exceeded my expectations. I guess the reason for so many 0x00 bytes interjected everywhere is monochrome tiles and sprite transparency.
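For anyone who wants to repeat this kind of measurement, here's a rough Python sketch of how the match-distance statistics could be gathered. The function name, the toy data, and the exact methodology are mine, not the actual tooling used here:

```python
# Sketch: for each byte of a CHR image, count how often it matches the
# byte at a fixed distance back, or the implied constant 0x00.

def match_rates(data, distances=(2, 64, 128)):
    """Return {label: fraction of bytes predicted correctly}."""
    n = len(data)
    rates = {}
    for d in distances:
        hits = sum(1 for i in range(d, n) if data[i] == data[i - d])
        rates['back-%d' % d] = hits / max(n - d, 1)
    # "Matching" against an implied plane of all 0x00:
    rates['zero'] = data.count(0) / max(n, 1)
    return rates

# Toy example: three blank tiles plus one mostly-transparent sprite tile.
chr_data = bytes([0x00] * 48
                 + [0x38, 0x6c, 0x6c, 0x6c, 0x6c, 0x6c, 0x38, 0x00]
                 + [0x00] * 8)
print(match_rates(chr_data))
```

Running it over a real file like all-nrom.chr (read as raw bytes) would give the kind of numbers being compared above.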

I made up this simple scheme as a demonstration of that last finding:

Code: Select all

0xxxxxxx:
    ASL the control byte to get the bit pattern for the next 8 bytes.
    This forces the last byte to be zero, but statistically that's OK for fonts and such.
    0 = 0x00, 1 = read new byte
1xxxxxxx:
    For each bit in the control byte:
    0 = use previous byte, 1 = read new byte

Example input:
00 00 00 00 00 30 30 00  00 00 00 00 00 00 00 00
ff ff ff ff ff ff ff ff  00 00 00 00 00 00 00 ff
38 6c 6c 6c 6c 6c 38 00  38 00 00 00 00 00 38 00

Example output (with added indenting):
03 30 30
00
80 ff
81 00 ff
c3 38 6c 38 00
41 38 38
Obviously this simple scheme is generally worse than pb53, *but* it still happens to compress SMB1 and DABG better (to 5968 and 4746 bytes respectively). So I think this is a good start.
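For reference, here is a minimal Python model of that scheme, reconstructed from the worked example. Two details are inferred rather than stated outright: bits are consumed MSB first, and in mode 1 the control byte's own high bit doubles as the first "read new byte" flag, so a mode-1 packet always starts with a literal:

```python
# Decode the simple two-mode scheme described above.

def decode(packed):
    it = iter(packed)
    out = bytearray()
    for ctrl in it:
        if ctrl < 0x80:
            # 0xxxxxxx: ASL, then 1 = literal, 0 = 0x00 (last bit forced to 0)
            bits = (ctrl << 1) & 0xff
            for i in range(8):
                out.append(next(it) if bits & (0x80 >> i) else 0x00)
        else:
            # 1xxxxxxx: 1 = literal, 0 = repeat the previous byte
            prev = 0x00
            for i in range(8):
                if ctrl & (0x80 >> i):
                    prev = next(it)
                out.append(prev)
    return bytes(out)

# The worked example from the post round-trips exactly:
packed = bytes.fromhex('033030' '00' '80ff' '8100ff' 'c3386c3800' '413838')
expected = bytes.fromhex(
    '00000000003030000000000000000000'
    'ffffffffffffffff00000000000000ff'
    '386c6c6c6c6c38003800000000003800')
assert decode(packed) == expected
```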

Note to admins: Is it ok to get the last three posts of this thread split into a new thread titled "Another attempt at CHR compression"?

Re: Another attempt at CHR compression

Posted: Mon Aug 22, 2016 7:34 am
by tepples
JRoatch wrote:However, matching to an implied plane of all 0x00 far exceeded my expectations. I guess the reason for so many 0x00 bytes interjected everywhere is monochrome tiles and sprite transparency.

I made up this simple scheme as a demonstration of that last finding:

Code: Select all

0xxxxxxx:
    ASL the control byte to get the bit pattern for the next 8 bytes.
    This forces the last byte to be zero, but statistically that's OK for fonts and such.
    0 = 0x00, 1 = read new byte
1xxxxxxx:
    For each bit in the control byte
    0 = use previous byte, 1 = read new byte

Example input:
00 00 00 00 00 30 30 00  00 00 00 00 00 00 00 00
ff ff ff ff ff ff ff ff  00 00 00 00 00 00 00 ff
38 6c 6c 6c 6c 6c 38 00  38 00 00 00 00 00 38 00

Example output (with added indenting):
03 30 30
00
80 ff
81 00 ff
c3 38 6c 38 00
41 38 38
Obviously this simple scheme is generally worse than pb53, *but* it still happens to compress SMB1 and DABG better (to 5968 and 4746 bytes respectively). So I think this is a good start.
It's similar to common byte encoding used in The Legend of Zelda: Oracle games for Game Boy Color, where each 16-byte packet starts with a common byte along with a 16-bit bitfield of which bytes in that packet match the common byte (and thus don't need to be coded). As for your particular choice of $00 as the fixed common byte, I tried this before in another demo that I didn't manage to get to a full release before I got the contract to make Haunted: Halloween '85 back in March of last year.

How well would the RLE mode do if the definition of a "run" is changed from "use the same byte" to "use the byte that most commonly follows the previous byte"? That'd need a 256-byte table of predictions.
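A rough sketch of what that suggestion might look like, in Python. The function names and the hit-counting harness are mine, purely to estimate how often such prediction-based runs would apply:

```python
# Build a 256-entry table: pred[b] = the byte that most commonly follows
# byte value b in a training corpus. A "run" bit then means "emit the
# predicted successor" instead of "repeat the same byte".
from collections import Counter

def build_prediction_table(corpus):
    followers = [Counter() for _ in range(256)]
    for a, b in zip(corpus, corpus[1:]):
        followers[a][b] += 1
    # For byte values never seen in the corpus, fall back to predicting
    # the byte itself, which degenerates to an ordinary RLE run.
    return bytes(c.most_common(1)[0][0] if c else v
                 for v, c in enumerate(followers))

def predicted_run_hits(data, pred, prev=0x00):
    """Count how many bytes a prediction-based run bit could cover."""
    hits = 0
    for b in data:
        hits += (b == pred[prev])
        prev = b
    return hits
```

Comparing `predicted_run_hits` against a plain same-byte run count over a CHR corpus would answer the "how well would it do" question directly.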
Note to admins: Is it ok to get the last three posts of this thread split into a new thread titled "Another attempt at CHR compression"?
I'm still trying to figure out a workable policy for when to grant a split request going forward. Here, where the OP, the poster of the split point, and the requester are the same user, with few other users having posted in the topic (and thus few likely to be subscribed to replies through email), it looked like a strong case.

Re: PBJ revisited: Another attempt at CHR compression

Posted: Tue Aug 23, 2016 11:30 pm
by JRoatch
tepples wrote:How well would the RLE mode do if the definition of a "run" is changed from "use the same byte" to "use the byte that most commonly follows the previous byte"? That'd need a 256-byte table of predictions.
I'm not sure if I understood what you meant by this. So I plotted out some bitmaps that mapped block position, byte value at match distance, and byte value read.

I couldn't make sense of the noise in those bitmaps, even though it seemed to exhibit some patterns.
I did see that the first byte of each 8-byte plane likely had a zero byte preceding it, and that the first plane of each tile tended toward zero while the second plane tended to match the first plane.

So I tried this scheme out, and while it has obvious shortcomings (namely all-0xff planes, inverted planes, CHR bank duplication, and expansion of incompressible data), it slightly beats pb53 for 'all-nrom.chr':

Code: Select all

For each 64 byte block:
    block header: Each bit in the control byte specifies the type of the following 8 planes.
    8 planes:
        type 0 control byte: 0 = use previous byte (first previous byte is 0x00), 1 = read new byte
        type 1 control byte:
            if odd plane: 0 = 0x00, 1 = read new byte
            else if even plane: 0 = byte from previous plane, 1 = read new byte

Example input:
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
ff ff ff ff ff ff ff ff  ff ff ff ff ff 30 70 ff
00 81 00 ff ff 00 81 00  00 00 00 00 00 00 00 00
38 6c 6c 6c 6c 6c 38 00  38 6c 6c 6c 6c 6c 38 00

Example output (with indenting):
19
  00    00
  80 ff    06 30 70
  5a 81 ff ff 81    00
  c3 38 6c 38 00    00
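Here's a Python model of that block scheme, reconstructed from the example above. Three details are my inferences from the example rather than the text: header and control bits are consumed MSB first, planes are counted 1-based (so the "odd" planes are the first plane of each tile), and the type-0 "previous byte" starts over at 0x00 for every plane:

```python
# Decode one 64-byte block of the header + 8-plane scheme.

def decode_block(packed):
    it = iter(packed)
    header = next(it)
    out = bytearray()
    for p in range(8):                       # 0-based plane index
        ctrl = next(it)
        plane_type = (header >> (7 - p)) & 1
        start = len(out)
        prev = 0x00
        for i in range(8):
            if (ctrl >> (7 - i)) & 1:
                b = next(it)                 # 1 = read new byte
            elif plane_type == 0:
                b = prev                     # type 0: use previous byte
            elif p % 2 == 0:
                b = 0x00                     # type 1, odd plane: implied 0x00
            else:
                b = out[start - 8 + i]       # type 1, even plane: copy prior plane
            out.append(b)
            prev = b
    return bytes(out)

# The worked example from the post round-trips exactly:
packed = bytes.fromhex('19' '00' '00' '80ff' '063070'
                       '5a81ffff81' '00' 'c3386c3800' '00')
expected = bytes.fromhex(
    '00000000000000000000000000000000'
    'ffffffffffffffffffffffffff3070ff'
    '008100ffff0081000000000000000000'
    '386c6c6c6c6c3800386c6c6c6c6c3800')
assert decode_block(packed) == expected
```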

Re: PBJ revisited: Another attempt at CHR compression

Posted: Wed Aug 24, 2016 10:02 pm
by JRoatch
OK, I have a thing that's a nice compromise between high complexity and small file size. It's just like the previous post, but with 2 special codes: one to signal an inverted/all-0xff plane, and one for duplication of the previous 64-byte block.

Code: Select all

First, 1 byte for the block count, ranging from 1-128 (or more if you want nametables as well).
    "0 blocks" is a special mode where 64 blocks are interleaved with 64 more blocks.
For each 64 byte block:
    block header: Each bit in the control byte specifies the type of the following 8 planes.
        if block header is 0x00 and first plane control byte is 0x00: duplicate previous 64 byte block.
    8 planes:
        type 0 control byte:
            if odd plane: 0 = 0x00, 1 = read new byte
            else if even plane: 0 = byte from previous plane, 1 = read new byte
            if the whole control byte is 0xff:
                on odd planes, 0xff 8 times
                on even planes, inverted previous plane
        type 1 control byte: 0 = use previous byte (first previous byte is 0x00), 1 = read new byte
Out of the 193 CHR files I tested, this scheme only lost to pb53 on 6 of them (including "Nuts & Milk" and "Othello"). On average the scheme saves 41.7% on file size. For comparison, tokumaru's compression saves 48.1% and pb53 saves 38.3%.
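A Python sketch of a decoder for this version follows. The post gives no worked example here, so several details are my assumptions: header bits are consumed MSB first (bit value = plane type), planes are counted 1-based for the odd/even rule, the type-1 "previous byte" restarts at 0x00 each plane, and the interleaved "0 blocks" mode is left out:

```python
# Decode a stream of 64-byte blocks with the duplicate-block and
# 0xff-plane special codes.

def decode_blocks(stream):
    it = iter(stream)
    count = next(it)
    if count == 0:
        raise NotImplementedError('interleaved dual-segment mode not sketched')
    prev_block = bytes(64)
    out = bytearray()
    for _ in range(count):
        prev_block = decode_one(it, prev_block)
        out += prev_block
    return bytes(out)

def decode_one(it, prev_block):
    header = next(it)
    first_ctrl = None
    if header == 0x00:
        first_ctrl = next(it)
        if first_ctrl == 0x00:
            return prev_block                # duplicate previous 64-byte block
    out = bytearray()
    for p in range(8):
        ctrl = first_ctrl if p == 0 and first_ctrl is not None else next(it)
        plane_type = (header >> (7 - p)) & 1
        start = len(out)
        if plane_type == 0 and ctrl == 0xff:
            if p % 2 == 0:                   # odd plane (1-based): solid 0xff
                out += b'\xff' * 8
            else:                            # even plane: inverted previous plane
                out += bytes(b ^ 0xff for b in out[start - 8:start])
            continue
        prev = 0x00
        for i in range(8):
            if (ctrl >> (7 - i)) & 1:
                b = next(it)                 # 1 = read new byte
            elif plane_type == 1:
                b = prev                     # type 1: use previous byte
            elif p % 2 == 0:
                b = 0x00                     # type 0, odd plane: implied 0x00
            else:
                b = out[start - 8 + i]       # type 0, even plane: copy prior plane
            out.append(b)
            prev = b
    return bytes(out)
```

As a hand-built sanity check under those assumptions, the 12-byte stream `02 00 ff ff 00 00 00 00 00 00 00 00` decodes to 128 bytes: a solid-0xff first plane followed by zeros, then a duplicate of that whole block.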

The next step is for me to implement the NES decoder and python encoder/decoder. For the NES I plan to use only 0x0140~0x017f for the decode buffer. The duplicate block command will be implemented by simply not touching the decode buffer.

Questions:
What should I name this thing?
What do I need to consider if I'm implementing this as a drop-in replacement for pb53?
How should I handle input data that's not a multiple of 64 bytes?

Re: PBJ revisited: Another attempt at CHR compression

Posted: Thu Aug 25, 2016 5:09 am
by tepples
JRoatch wrote:For the NES I plan to use only 0x0140~0x017f for the decode buffer. The duplicate block command will be implemented by simply not touching the decode buffer.
[...]
What do I need to consider if I'm implementing this as a drop-in replacement for pb53?
An issue related to real-time CHR loading: If your decode buffer is 64 bytes, and you can produce only one output per frame (as implied by the fixed decode buffer address), you won't be able to fill the entire vblank. Something that expects 128 bytes per frame, such as loading screenshots in the Action 53 menu, would become slower.

Re: PBJ revisited: Another attempt at CHR compression

Posted: Thu Aug 25, 2016 7:59 am
by JRoatch
tepples wrote:
JRoatch wrote:For the NES I plan to use only 0x0140~0x017f for the decode buffer. The duplicate block command will be implemented by simply not touching the decode buffer.
[...]
What do I need to consider if I'm implementing this as a drop-in replacement for pb53?
An issue related to real-time CHR loading: If your decode buffer is 64 bytes, and you can produce only one output per frame (as implied by the fixed decode buffer address), you won't be able to fill the entire vblank. Something that expects 128 bytes per frame, such as loading screenshots in the Action 53 menu, would become slower.
Precisely this reason is why I'm choosing 0x0140~0x017f and not 0x0100~0x013f. To fill up 128 bytes at a time, decode a 64-byte block, copy it to the lower 64 bytes, then decode another block. While implementing this, maybe I'll find that requiring a buffer pointer to be set is better. Alternatively, if it's technically feasible, there could be parallel text rendering. Besides that, for blank-screen bulk uploads I think there shouldn't be a need to use the whole 128-byte range.

Re: PBJ revisited: Another attempt at CHR compression

Posted: Tue Aug 30, 2016 10:24 am
by JRoatch
After doing some back-of-the-envelope calculations to determine the decoder complexity of the previous scheme, I decided to improve the codec some more. The details are a bit cumbersome to explain in English. With this new codec I now get 43.3% savings in file size (almost exactly in the middle of pb53 and tokumaru's compression), but the decompression speed is theoretically the same as pb53's. For each 64-byte block the minimum size is 1 byte (100% black tiles) and the maximum is 65 bytes.

I still don't know what to name this thing. Any good short names based around the themes of squishing and food dehydration?

Re: PBJ revisited: Another attempt at CHR compression

Posted: Fri Sep 02, 2016 7:08 pm
by JRoatch
I am so close to releasing this thing. The tentative name is PBH. I've implemented the block decoder. The code size is 133 bytes, and it decompresses at around 36 cycles per output byte (+14 for uploading to the PPU). On DABG (discounting the time for PPU uploading), pb53 takes 26 cycles per byte vs. 31 for this. The slightly large code size is due to making sure it'll behave sanely with random input. As currently implemented, it takes only 2 parameters: a stream pointer in zero page, and X for the offset from $0100, which is assumed to hold the contents of the previously decoded block.

I still have a few interoperability things to figure out, such as:
- Normal bulk mode of X 64-byte blocks
- Parallel segment decoding bulk mode for Action 53
- Streaming from a 128-byte buffer
- Allowing blocks to be encoded independently of each other, and maybe printing a list of block offsets in the file
- Reworking PBJ by gutting the pb8 mode and instead having a signaled exit, so the stream pointer can be reused by other things like this PBH
- Encoder options for all of the above
- Documentation
- (secretly replace pb53 in the 240p test suite)

Anything else I might need to consider?

Re: PBJ revisited: Another attempt at CHR compression

Posted: Fri Sep 02, 2016 11:22 pm
by Myask
JRoatch wrote:I still don't know what to name this thing. Any good short names based around the themes of squishing and food dehydration?
PBJR continues the naming theme, while being disgusting and squishy.

Re: PBJ revisited: Another attempt at CHR compression

Posted: Sat Sep 03, 2016 10:07 pm
by JRoatch
Myask wrote:
JRoatch wrote:I still don't know what to name this thing. Any good short names based around the themes of squishing and food dehydration?
PBJR continues the naming theme, while being disgusting and squishy.
I don't understand what that means, or how it relates to peanut butter and jelly.

In any case, I'm thinking it might be better to use some name that's distinct from PBJ altogether. As it is, the PBH ('H' is for honey) file extension visually clashes with PBJ a bit too much (at least PB53 is four characters and contains numbers). Maybe something that starts with the letter 'C'.

Re: PBJ revisited: Another attempt at CHR compression

Posted: Sun Sep 04, 2016 10:14 am
by Myask
PBJR: Pino Batch, J Roatch.
Which leads to the secondary thought of Peanut Butter, Jelly, and Roach.

Re: PBJ revisited: Another attempt at CHR compression

Posted: Sun Sep 04, 2016 10:21 am
by tepples
PB53's name came from its use as a packet-oriented alternative to Apple PackBits, which I had used in previous NES games. But that works too.

Re: Bagel NES CHR compression.

Posted: Mon Sep 19, 2016 8:21 am
by JRoatch
Sorry for the wait. The first post has been updated with the new finished codec. For naming I decided to follow a theme of bread foods.

The zip file also contains a more optimized form of my nametable compression, but I think I'll see if I can come up with a slightly better one that decompresses to some form of PPU strings before I call the nametable compression good enough.