PostPosted: Wed Jul 16, 2008 5:02 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1524
EDIT: Got it working, see post below instead.

Okay, I'm trying to understand arithmetic coding so that I can start learning QM-coding and all that. Having almost no luck.

I was able to write an encoder, but it seems to crash whenever I try scaling the probability table up.

For instance, the code below lets me compress the Chrono Trigger ROM from 4096KB to 3978KB, and then decompress it successfully.

However, when I set the scaling factor > 192, it starts failing. The higher the value, the faster it fails. At 16384, the compression ratio is great, getting me down to 3500KB (well, relatively great for pure entropy coding on a mostly already-compressed ROM). But I can only decompress the first four bytes correctly before errors occur.

Can anyone shed any light on what I'm doing wrong? Many thanks in advance ...

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>

typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long uint32_t;
typedef unsigned long long uint64_t;
FILE *fp;
uint8_t *data;

void aricode(const char *infn, const char *outfn) {
  fp = fopen(infn, "rb");
  if(!fp) return;

  fseek(fp, 0, SEEK_END);
  unsigned size = ftell(fp);
  rewind(fp);

  data = new uint8_t[size];
  fread(data, 1, size, fp);
  fclose(fp);

  fp = fopen(outfn, "wb");

  //========================================
  //generate probability tables
  //========================================

  unsigned otable[256];
  memset(&otable, 0, sizeof otable);
  for(unsigned i = 0; i < size; i++) otable[data[i]]++;

  uint16_t ptable_lo[256], ptable_hi[256];
  memset(&ptable_lo, 0, sizeof ptable_lo);
  memset(&ptable_hi, 0, sizeof ptable_hi);

  unsigned scale = 0;
  for(unsigned i = 0; i < 256; i++) {
    //the higher the scalar, the better the compression.
    //optimal value would be 16384, yet this crashes with > 196 ...
    unsigned probability = double(otable[i]) / size * 196.0;
    if(probability == 0) probability = 1;

    unsigned prev = !i ? 0 : i - 1;
    ptable_lo[i] = ptable_hi[prev];
    ptable_hi[i] = ptable_lo[i] + probability;
    scale += probability;
  }
  for(unsigned i = 0; i < 256; i++) {
    ptable_hi[i]--;
    //printf("%3d range = %3d-%3d\n", i, ptable_lo[i], ptable_hi[i]);
  }
  printf("scale = %d\n", scale);

  fputc(size, fp);
  fputc(size >> 8, fp);
  fputc(size >> 16, fp);
  fputc(size >> 24, fp);
  fwrite(ptable_lo, 1, sizeof ptable_lo, fp);
  fwrite(ptable_hi, 1, sizeof ptable_hi, fp);

  //========================================
  //encode
  //========================================

  uint16_t lo = 0x0000;
  uint16_t hi = 0xffff;

  unsigned bitpos = 0, bitval = 0;

  for(unsigned i = 0; i < size; i++) {
    unsigned range   = (hi - lo) + 1;
    unsigned symbol = data[i];
    unsigned problo = ptable_lo[symbol];
    unsigned probhi = ptable_hi[symbol];
    //printf("range = %0.5x, lo = %0.4x, hi = %0.4x, sym = %0.2x, pl = %3d, ph = %3d\n", range, lo, hi, symbol, problo, probhi);

    hi = lo + range * (double(probhi) / scale);
    lo = lo + range * (double(problo) / scale);

    //printf("adjust: range = %0.5x, lo = %0.4x, hi = %0.4x\n", (hi - lo) + 1, lo, hi);

    while(((hi - lo) + 1) < scale) {
      bitval = (bitval << 1) | (lo >> 15);
      if(++bitpos == 8) { fputc(bitval, fp); bitval = bitpos = 0; }
      lo = (lo << 1);
      hi = (hi << 1) + 1;
      //printf("renormalize: range = %0.5x, lo = %0.4x, hi = %0.4x\n", (hi - lo) + 1, lo, hi);
    }
  }

  if(bitpos > 0) fputc(bitval, fp);
  fputc(lo >> 8, fp);
  fputc(lo, fp);

  //========================================
  //finish
  //========================================

  delete[] data;
  fclose(fp);
}

void aridecode(const char *infn, const char *outfn) {
  fp = fopen(infn, "rb");
  if(!fp) return;

  fseek(fp, 0, SEEK_END);
  unsigned size = ftell(fp);
  rewind(fp);

  unsigned decompsize = fgetc(fp);
  decompsize |= fgetc(fp) << 8;
  decompsize |= fgetc(fp) << 16;
  decompsize |= fgetc(fp) << 24;

  //========================================
  //read probability tables
  //========================================

  uint16_t ptable_lo[256], ptable_hi[256];
  memset(&ptable_lo, 0, sizeof ptable_lo);
  memset(&ptable_hi, 0, sizeof ptable_hi);

  fread(ptable_lo, 1, sizeof ptable_lo, fp);
  fread(ptable_hi, 1, sizeof ptable_hi, fp);

  unsigned scale = 0;
  for(unsigned i = 0; i < 256; i++) {
    scale += ptable_hi[i] - ptable_lo[i] + 1;
  }
  printf("scale = %3d\n", scale);

  size -= 4;
  size -= sizeof ptable_lo;
  size -= sizeof ptable_hi;
  data = new uint8_t[size + 64];
  memset(data, 0, size + 64);
  fread(data, 1, size, fp);
  fclose(fp);

  unsigned p = 0;

  fp = fopen(outfn, "wb");

  //========================================
  //decode
  //========================================

  uint16_t lo = 0x0000;
  uint16_t hi = 0xffff;
  uint16_t code = 0;

  code  = data[p++] << 8;
  code |= data[p++];

  unsigned bitval = data[p++], bitpos = 0;

  unsigned outsize = 0;

  while(outsize < decompsize) {
    unsigned range = (hi   - lo) + 1;
    uint16_t pos   = (code - lo) + 1;
    pos = pos * scale / range;

    unsigned symbol;
    for(unsigned i = 0; i < 256; i++) {
      if(pos >= ptable_lo[i] && pos <= ptable_hi[i]) {
        //found symbol
        symbol = i;
        fputc(symbol, fp);
        outsize++;
        break;
      }
    }

    unsigned problo = ptable_lo[symbol];
    unsigned probhi = ptable_hi[symbol];

    hi = lo + range * double(probhi) / scale;
    lo = lo + range * double(problo) / scale;

    while(((hi - lo) + 1) < scale) {
      code = (code << 1) + ((bitval >> 7) & 1);
      bitval <<= 1;
      if(++bitpos >= 8) { bitval = data[p++]; bitpos = 0; }
      lo = (lo << 1);
      hi = (hi << 1) + 1;
    }
  }

  //========================================
  //finish
  //========================================

  delete[] data;
  fclose(fp);
}

int main() {
  aricode("test.bin", "test.ari");
  aridecode("test.ari", "output.bin");

  printf("done\n");
  getch();
  return 0;
}


PostPosted: Wed Jul 16, 2008 9:44 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1524
Ah, found the problem. My lo variable was sometimes ending up higher than hi after renormalization. My fix was to just detect this and keep renormalizing until hi was greater than or equal to lo, but this seems very suboptimal to me ...

I tried to understand the "underflow" stuff discussed here, but it made absolutely no sense to me ...

In the end, I essentially came up with my own coder from scratch. I really don't even know if it's still technically an arithmetic coder or not. Or whether it's order-0, order-1, or whatever ... how strange.

Anyway, the only thing that bothers me now is that this PDF's implementation is beating mine in compression ratio.

For shell32.dll, 8188KB, mine gets it down to 6195KB, whereas the PDF's version gets 6105KB. For Chrono Trigger (U), 4096KB, mine gets 3687KB, whereas the PDF's version gets 3616KB.

But on the bright side, my version is more than twice as fast, even with the linear symbol search. It should be 4x as fast with a binary symbol search. I have a feeling that if I were to handle this "underflow" case correctly, I could probably equal the PDF version in compression ratio, too.

So then, anyone have any idea what kind of coder I've implemented here, and any suggestions for compression ratio improvement? I'm not interested in speeding it up, as this is just for learning.

New version:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <conio.h>

typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long uint32_t;
typedef unsigned long long uint64_t;

static const unsigned scalelimit = 511;

void aricode(const char *infn, const char *outfn) {
  FILE *fp = fopen(infn, "rb");
  if(!fp) return;

  fseek(fp, 0, SEEK_END);
  unsigned size = ftell(fp);
  rewind(fp);

  uint8_t *data = new uint8_t[size];
  fread(data, 1, size, fp);
  fclose(fp);

  fp = fopen(outfn, "wb");

  //========================================
  //generate probability tables
  //========================================

  unsigned prob[256];
  memset(&prob, 0, sizeof prob);
  for(unsigned i = 0; i < size; i++) prob[data[i]]++;
  for(unsigned i = 0; i < 256; i++) if(!prob[i]) prob[i]++;

  unsigned scale = 0, scalar = 1;

  //search for the scalar that gets as close to scalelimit as possible,
  //without exceeding it ...
  while(true) {
    unsigned scalesum = 0;
    for(unsigned i = 0; i < 256; i++) {
      unsigned p = double(prob[i]) / size * scalar;
      if(p == 0) p = 1;
      scalesum += p;
    }
    if(scalesum > scalelimit) { scalar--; break; }
    scalar++;
  }

  //now scale probabilities by said scalar ...
  for(unsigned i = 0; i < 256; i++) {
    unsigned p = double(prob[i]) / size * scalar;
    prob[i] = (p == 0) ? 1 : p;
    scale += prob[i];
  }

//this search is much faster, but less optimal
//while(true) {
//  scale = 0;
//  for(unsigned i = 0; i < 256; i++) if(!prob[i]) prob[i] = 1;
//  for(unsigned i = 0; i < 256; i++) scale += prob[i];
//  if(scale < scalelimit) break;
//  for(unsigned i = 0; i < 256; i++) prob[i] >>= 1;
//}

  uint16_t problo[256], probhi[256];
  for(unsigned i = 0; i < 256; i++) {
    problo[i] = i == 0 ? 0 : probhi[i - 1];
    probhi[i] = problo[i] + prob[i];
  }
  for(unsigned i = 0; i < 256; i++) probhi[i]--;

  fwrite(&size, 1, 4, fp);
  fwrite(probhi, 1, sizeof probhi, fp);

  //========================================
  //encode
  //========================================

  uint16_t lo = 0x0000;
  uint16_t hi = 0xffff;

  unsigned bitpos = 0, bitval = 0;

  for(unsigned i = 0; i < size; i++) {
    unsigned range = (hi - lo) + 1;
    uint8_t symbol = data[i];

    hi = lo + range * probhi[symbol] / scale;
    lo = lo + range * problo[symbol] / scale;

    while(lo > hi || (hi - lo) + 1 <= scale) {
      bitval = (bitval << 1) | (lo >> 15);
      if(++bitpos == 8) { fputc(bitval, fp); bitval = bitpos = 0; }
      lo = (lo << 1);
      hi = (hi << 1) + 1;
    }
  }

  for(unsigned i = 0; i < 16; i++) {
    bitval = (bitval << 1) | (lo >> 15);
    if(++bitpos == 8) { fputc(bitval, fp); bitval = bitpos = 0; }
    lo <<= 1;
  }
  if(bitpos > 0) fputc(bitval << (8 - bitpos), fp);  //left-align the final partial byte for the decoder's MSB-first reads

  //========================================
  //finish
  //========================================

  delete[] data;
  fclose(fp);
}

void aridecode(const char *infn, const char *outfn) {
  FILE *fp = fopen(infn, "rb");
  if(!fp) return;

  fseek(fp, 0, SEEK_END);
  unsigned size = ftell(fp);
  rewind(fp);

  unsigned decompsize;
  fread(&decompsize, 1, 4, fp);
  size -= 4;

  //========================================
  //read probability tables
  //========================================

  uint16_t problo[256], probhi[256];
  fread(probhi, 1, sizeof probhi, fp);
  size -= sizeof probhi;
  for(int i = 255; i >= 0; i--) problo[i] = (i == 0) ? 0 : probhi[i - 1] + 1;

  unsigned scale = 0;
  for(unsigned i = 0; i < 256; i++) scale += (probhi[i] - problo[i]) + 1;

  uint8_t *data = new uint8_t[size];
  fread(data, 1, size, fp);
  fclose(fp);

  fp = fopen(outfn, "wb");

  //========================================
  //decode
  //========================================

  uint16_t lo = 0x0000;
  uint16_t hi = 0xffff;
  uint16_t code = (data[0] << 8) + data[1];
  unsigned bitval = data[2], bitpos = 0;
  unsigned p = 3;

  while(decompsize) {
    unsigned range = (hi - lo) + 1;
    uint16_t pos = ((code - lo) + 1) * scale / range;

    uint8_t symbol = 0;  //initialized in case the search finds no match
    //O(log n) binary search would be ideal here.
    //use O(n) linear search for simplicity ...
    for(unsigned i = 0; i < 256; i++) {
      if(pos <= probhi[i]) {
        fputc(symbol = i, fp);
        decompsize--;
        break;
      }
    }

    hi = lo + range * probhi[symbol] / scale;
    lo = lo + range * problo[symbol] / scale;

    while(lo > hi || (hi - lo) + 1 <= scale) {
      code = (code << 1) + ((bitval >> (7 - bitpos)) & 1);
      if(++bitpos >= 8) { bitval = data[p++]; bitpos = 0; }
      lo = (lo << 1);
      hi = (hi << 1) + 1;
    }
  }

  //========================================
  //finish
  //========================================

  delete[] data;
  fclose(fp);
}

int main() {
  time_t start, end;

  printf("encoding ... ");
  start = clock();
  aricode("test.bin", "test.ari");
  end = clock();
  printf("%d clocks\n", int(end - start));

  printf("decoding ... ");
  start = clock();
  aridecode("test.ari", "output.bin");
  end = clock();
  printf("%d clocks\n", int(end - start));

  printf("done\n");
  getch();
  return 0;
}


PostPosted: Wed Jul 16, 2008 9:45 pm 
Joined: Thu Jun 22, 2006 11:39 pm
Posts: 205
I didn't read through all of it yet, but based on your description the errors are probably due to disagreements in how rounding occurs between your encoder and decoder, or in your renormalization step.

I can see there is at least an error in the renormalization step.
Here's from the encoder:
Code:
    while(((hi - lo) + 1) < scale) {
      bitval = (bitval << 1) | (lo >> 15);
      if(++bitpos == 8) { fputc(bitval, fp); bitval = bitpos = 0; }
      lo = (lo << 1);
      hi = (hi << 1) + 1;
      //printf("renormalize: range = %0.5x, lo = %0.4x, hi = %0.4x\n", (hi - lo) + 1, lo, hi);
    }


Imagine if (in binary) hi and lo were:
hi = 10000000 00000001
lo = 01111111 11111111

This is an extreme case, but it makes the point well. The current range is much lower than scale, but we can't determine the output bits yet! You are shifting out bits assuming that at this point hi and lo agree on the highest bit. This need not be the case.

So your code will need to be changed to have some method to give 'arbitrary precision'. You won't run into this problem with the decoder though, it is just the encoder that has to do this extra lifting.

There are many ways to do this, and I'll describe just one (which may be 'non-standard'; Andreas can probably comment on this better than me, but this will work). Instead of your state variables being hi and lo, have your state variables be lo (together with an array of the output) and range.

So you'd have pseudo code something like:
Code:
uint16 lo=0 (I assume this can hold values > scale*256 below)
unsigned range = 0x10000  (I assume this is > scale*256 below)
byte output[] (maybe just initially start with an array the size of the uncompressed data + a few bytes to be sure it will fit)
out_index = 0


for each symbol {
    get your probhi and problo values for this symbol

    lo += range * (double(problo) / scale)
    for(i=0;if carry set;i++)   
        output_array[out_index-i]++;

    range = range * (double(probhi - problo + 1) / scale);

    if(range < scale) {
        output_array[out_index++]=(lo>>8);
        lo <<= 8;
        range <<= 8;
    }
}

Then handle the remaining bits in lo (you could just output them all).  Although sometimes you can save a byte or so by outputting only exactly what is necessary to land you in the span of values (lo) <= x < (lo + range).


I think that should work.
Regardless of how you decide to adjust your code, you of course need to make the decoder's renormalization work the same way, to keep rounding identical between the two.


PostPosted: Wed Jul 16, 2008 10:07 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1524
Ah, lame. I'm getting beaten by a static Huffman compressor ... ugh.

I see what you mean about hi and lo having a different MSB yet still being within the range of scale. I wonder how my updated version above is working, then ... weird.

I don't follow this part:
for(i=0;if carry set;i++)
output_array[out_index-i]++;

What do you mean by if carry set?

I don't follow how you're working with eight bits at a time like that, either ... I suppose I'll have to find more time and write a reference implementation to see what yours is doing, step by step.

Many thanks for the info!


PostPosted: Wed Jul 16, 2008 10:24 pm 
Joined: Thu Jun 22, 2006 11:39 pm
Posts: 205
Oh. It looks like we posted at the same time. I hope that is still useful to you.

byuu wrote:
I see what you mean about hi and lo having a different MSB yet still being within the range of scale. I wonder how my updated version above is working, then ... weird.

Your fix works because both the encoder and decoder have exactly the same method of handling that problem. What you essentially did was output all bits until you regain a position that your encoder expects ... so that is where you are losing compression: you are outputting bits you don't need to (but it saves you from needing any extra precision).

byuu wrote:
I don't follow this part:
for(i=0;if carry set;i++)
output_array[out_index-i]++;

What do you mean by if carry set?

Like in assembly language. I mean "if carry is set from the lo or output[.] addition". It looks messier in C-like code, so I just wrote it that way so the mess wouldn't cover up the simplicity of what is going on.

byuu wrote:
I don't follow how you're working with eight-bits at a time like that, either ... I suppose I'll have to get more time, and write a reference implementation to see what yours is doing, step by step.

It's not that I'm working with eight bits at a time; it's just that I want to shift out 8 bits at a time. So the state needs to hold at least "scale * 256", so that I can always shift out 8 bits and still have the "range" bits not overflow.

Actually, in case there is a circumstance where two renormalizations would be needed, I guess this:
"if(range < scale)"
should really be something like
"while(range < scale)"
but with your current value for scale, I don't think that would ever occur.


This is fun stuff to play with, but you don't need to worry about any of this if you want to help Andreas and me figure out the remaining parts of the SPC7110 compression, since the arithmetic portion has been reproduced exactly for quite a while. Actually, even for mode 1 we now understand how the probabilities are calculated as well. All that we have left is trying to understand how it keeps rearranging its list of pixel values from "least likely" to "most likely".

We're so close, but no one has found this last little pattern yet. The data files I've been posting already abstract out the arithmetic encoding part, so you can jump in immediately if you want (although I completely understand the appeal of working your way up to it first if you wish).


Powered by phpBB® Forum Software © phpBB Group