Compression benchmarks


tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Compression benchmarks

Post by tepples »

Sik wrote:
tepples wrote:I like to use a unary code, in which each bit of a control word can be 0 for a single literal symbol or 1 for part of the previous run; it's fairly efficient on real data and easy to decode to a multiple of 8 bytes. LZSS in the Allegro 4 library and in the Game Boy Advance and Nintendo DS BIOS uses a variant of this unary code, where each bit of the control word is one value for a literal and the other for a reference to previous data.
I don't think that qualifies under the definition of unary code you linked, though... (the tokens are always 0 or 1, not a string of bits)
That's because unary coding and run-length encoding are inverses of each other. From an RLE point of view, PB8's repeat coding is a unary code.

0: Run of 1 byte, then change byte
10: Run of 2 bytes, then change byte
110: Run of 3 bytes, then change byte
1110: Run of 4 bytes, then change byte
etc.
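Here's a quick Python sketch of that repeat coding (a toy illustration of the idea, not PB8's actual code):

Code:

def encode_run_unary(run_length):
    # Unary code for a run: (run_length - 1) one bits, then a zero.
    # Matches the list above: 1 -> "0", 2 -> "10", 3 -> "110", ...
    assert run_length >= 1
    return "1" * (run_length - 1) + "0"

for n in range(1, 5):
    print(n, encode_run_unary(n))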
Also, what you've described applies to pretty much every LZ77-based scheme out there
It models the probability of a back-reference as 50%, with the other 50% for a single literal byte, with no dependence on the previous runs. This is equivalent to runs of back-references and runs of literals each having a geometric distribution to their lengths. Unary codes and other Golomb codes are ideal for geometric distributions, with the Golomb code's M parameter depending on the distribution's p parameter. Optimally, M should be close to -log(2)/log(1 - p), with the unary code corresponding to M = 1 or p = .5. Most schemes branded as "RLE", on the other hand, treat all run lengths up to 32 or 128 or whatever as equally likely.
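For concreteness, a small Python sketch (the names are my own, and the encoder is the power-of-two Rice special case rather than a general Golomb coder):

Code:

import math

def optimal_golomb_m(p):
    # Ideal Golomb parameter for a geometric distribution with
    # parameter p, per the relation above: M ~ -log(2)/log(1 - p).
    return -math.log(2) / math.log(1 - p)

def rice_encode(n, m):
    # Rice code (Golomb with power-of-two M): unary quotient, then
    # a fixed-width binary remainder.  M = 1 degenerates to unary.
    q, r = divmod(n, m)
    bits = m.bit_length() - 1           # log2(m) for power-of-two m
    return "1" * q + "0" + (format(r, "b").zfill(bits) if bits else "")

print(optimal_golomb_m(0.5))            # 1.0 -> plain unary is ideal
print(optimal_golomb_m(0.1))            # ~6.6 -> pick M near 6 or 7
print(rice_encode(5, 1))                # "111110", same as unary
print(rice_encode(5, 4))                # "1001"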

Another option for coding run lengths is the exp-Golomb code. It replaces the unary code for the quotient with the Elias gamma code, resulting in longer codes for values 1 and 3 (runs of length 2 or 4) in exchange for shorter codes for longer runs. Exp-Golomb is ideal for power-law (Zipf) distributions.
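A sketch of the tradeoff in Python (exp-Golomb with k = 0, coding a value v >= 0 as the Elias gamma code of v + 1):

Code:

def elias_gamma(n):
    # Elias gamma code for n >= 1: the bit length of n, minus one,
    # in leading zeros, then n itself in binary.
    assert n >= 1
    b = format(n, "b")
    return "0" * (len(b) - 1) + b

for v in range(6):
    code = elias_gamma(v + 1)           # exp-Golomb, k = 0
    unary = "1" * v + "0"
    print(v, code, len(code), "vs unary", len(unary))

Running it shows values 1 and 3 costing one bit more than unary, and values 5 and up costing less, which is the tradeoff described above.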
(LZSS's distinguishing feature is the use of a ring buffer instead of the already decoded data, allowing more room to reuse strings, as repeated strings don't go into the buffer).
Repeated strings do go into the ring buffer, at least in Allegro 4 LZSS (which is equivalent to GBA LZSS with different bit ordering). The ring buffer exists to allow encoding or decoding without keeping the entire uncompressed stream in RAM at once. The difference between LZSS and the format described in Ziv and Lempel's 1977 paper is that the original LZ77 assumed every back-reference would be followed by exactly one literal and could have zero length, while LZSS allows back-references to follow one another directly.
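To make the token structure concrete, here's a toy Python decoder for a GBA-like layout (4-bit length stored as length - 3, 12-bit distance stored as distance - 1); the real Allegro 4 and GBA streams differ in bit ordering and headers, so treat this as a sketch of the scheme, not either exact format:

Code:

def lzss_decode(data, out_size):
    # Each control byte supplies 8 flag bits (MSB first here): a 0
    # bit means one literal byte follows, a 1 bit means a 2-byte
    # back-reference into the already decoded output follows.
    out = bytearray()
    pos = 0
    while len(out) < out_size:
        control = data[pos]; pos += 1
        for bit in range(8):
            if len(out) >= out_size:
                break
            if control & (0x80 >> bit):             # back-reference
                hi, lo = data[pos], data[pos + 1]; pos += 2
                length = (hi >> 4) + 3
                dist = ((hi & 0x0F) << 8 | lo) + 1
                for _ in range(length):             # may overlap itself
                    out.append(out[-dist])
            else:                                   # literal byte
                out.append(data[pos]); pos += 1
    return bytes(out)

Note how nothing forces a literal after a reference: two 1 bits in a row decode two back-references directly, which is exactly the LZSS relaxation of original LZ77.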
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Compression benchmarks

Post by calima »

Guys, please keep compression theory in a separate topic. This topic's intention was to gather data on existing open-source compression algorithms' performance on the NES, not to argue about what is theoretically best.
TOUKO
Posts: 306
Joined: Mon Mar 30, 2015 10:14 am
Location: FRANCE

Re: Compression benchmarks

Post by TOUKO »

If you are interested in an LZ4 decompressor, you can find a 6502 implementation here:
http://pferrie.host22.com/misc/appleii.htm
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Compression benchmarks

Post by calima »

Added lz5hc results. It's a new variant; it performs worse than LZO in both metrics, but it's liberally licensed.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Compression benchmarks

Post by calima »

I've made an asm version of lz4 that will probably soon be part of cc65. Its speed has been posted on the first page; it's ~7x faster than the C version.

I also made a version that unpacks to VRAM. Rough speeds for a full 1024-byte nametable:
- rle takes 0.5 frames
- uncompressed takes 1.3 frames
- lz4 takes 2.8 frames

It's slower, but far from being an issue, as unpacking a nametable happens during a black screen anyway. The compression tends to be better than Shiru's version of RLE.
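For scale, assuming roughly 29780 CPU cycles per NTSC frame, those figures work out as below (derived numbers, not separate measurements):

Code:

CYCLES_PER_FRAME = 29780    # NTSC NES, approximate
NAMETABLE = 1024            # nametable + attributes, in bytes

for name, frames in [("rle", 0.5), ("uncompressed", 1.3), ("lz4", 2.8)]:
    cycles = frames * CYCLES_PER_FRAME
    print(f"{name}: ~{cycles:.0f} cycles, ~{cycles / NAMETABLE:.0f}/byte")

So even the lz4 case is only on the order of 80 cycles per byte written to VRAM.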
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Compression benchmarks

Post by Oziphantom »

Can you post your "samples" (the things you compress to make the tests)? I have some other methods I would like to compare.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Compression benchmarks

Post by calima »

As mentioned, the base tests were done with the first 768 bytes of the GPL (https://github.com/clbr/nes/blob/master ... /srcdata.h). I can't post the nametable I tested, as it's part of an upcoming MegaCat game.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Compression benchmarks

Post by Oziphantom »

Code:

Algo         | Ratio  | Speed | License, comments
------------------------------------------------
exomizer mem | 0.5807 | 9.4   | Free to use/modify; needs about 162 bytes of temp RAM, 4 of which need to be ZP
Speed is based on 143492 cycles for 768 bytes, extrapolated to estimated NES speeds. This assumes a self-modifying get-next-byte routine located outside ZP, with all variables stored in ZP. Using an indirect vector would save RAM but cost cycles; a ZP-based get-next-byte routine would be faster still. Data was packed in reverse, with literals.
exomizer 'mem' mode is 446 bytes, the 'sfx' self-extracting program is 696 bytes, and the raw decoder used for 'mem' is 189 bytes, with literal support.
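For rough scale, converting that cycle count with the usual ~29780 cycles per NTSC frame (my back-of-the-envelope conversion, not part of the measurement above):

Code:

cycles = 143492             # measured for the 768-byte test data
print(cycles / 768)         # ~187 cycles per byte
print(cycles / 29780)       # ~4.8 NTSC frames for the whole test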
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Compression benchmarks

Post by calima »

Added, thanks.