It is currently Wed Dec 12, 2018 12:57 pm

All times are UTC - 7 hours





Post new topic Reply to topic  [ 35 posts ]  Go to page 1, 2, 3  Next
Author Message
PostPosted: Thu Sep 07, 2006 5:20 am 
Offline
User avatar

Joined: Sat Jul 09, 2005 6:03 am
Posts: 85
Sorry if this has been brought up before, but the search function on this board is broken (atleast from here).

Me and friend of mine are waist deep in a small RPG project, and well, I don't see many of our goals getting accomplished without some sort of text compression.

I'm not familiar with any methods of text compression. I imagine maybe some sort of substitution method. However, I understand the limitations of an NES game. So....

What (if any) method would be viable for compressing raw text on the NES?


Top
 Profile  
 
PostPosted: Thu Sep 07, 2006 6:06 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20865
Location: NE Indiana, USA (NTSC)
mbrenaman wrote:
Me and friend of mine are waist deep in a small RPG project, and well, I don't see many of our goals getting accomplished without some sort of text compression.

I'm not familiar with any methods of text compression.

Go look up LZ77 and LZ78 on Wikipedia.

Quote:
I imagine maybe some sort of substitution method. However, I understand the limitations of an NES game. So....

What (if any) method would be viable for compressing raw text on the NES?

A lot of games use "dual tile encoding", where common pairs of characters are represented as a single byte. This is the special case of LZ78 where the dictionary is precomputed and limited to strings of length 2. Some games use word-wise Huffman encoding, where each word is given a number and the compressed text consists of a stream of numbers.


Top
 Profile  
 
 Post subject:
PostPosted: Thu Sep 07, 2006 10:27 am 
Offline
User avatar

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7604
Location: Chexbres, VD, Switzerland
The only compression I've done until now is run-lenght compression (wich will never work on text, unless you have a lot of "heeeeeeeey", "haaaaaaaaaa", etc..., but that would be stupid), and DTE, dual tile encoding wich is especially efficitent for text, because the same group of two or more letters come QUITE regulary. Also, all words begining and finishing with the same letter can also be DTE encoded with the space that go before/after that letter.
Finally, DTE is VERY easy to implement, unlike other compression shemes. In the programmer's viewpoint, it is almost as if no compression was used.
The only difference is that when you read the input buffer, if the value is on a certain range instead of copy it to the output buffer just read the value of two separate tables and send them to the output buffer.
The only con is that no assembler support DTE encoding with their ASC tables.
Good luck for your project.

_________________
Life is complex: it has both real and imaginary components.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Sep 08, 2006 5:33 am 
Offline
User avatar

Joined: Sat Jul 09, 2005 6:03 am
Posts: 85
Thank you for the replies.

DTE seems simple enough but; I'd have to setup a directory of text snippets saved in individual text files, write a program (C probably) to go through all of them, and .incbin the results, plus the 2 char table used to decode the compressed text.

Quite a pain but maybe well worth the results.

Once again, thanks for the info. I have something to work from now.

Thanks.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Sep 08, 2006 8:53 am 
Offline
User avatar

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7604
Location: Chexbres, VD, Switzerland
mbrenaman wrote:
I'd have to setup a directory of text snippets saved in individual text files, write a program (C probably) to go through all of them, and .incbin the results, plus the 2 char table used to decode the compressed text.

This is the real downside, and the same apply to all other compression shemes. I have no idea how to code a C programm to convert/compress data right now (but I probably will someday), and that is somewhat limitating in developement methods.

_________________
Life is complex: it has both real and imaginary components.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Sep 08, 2006 11:44 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20865
Location: NE Indiana, USA (NTSC)
Bregalad wrote:
I have no idea how to code a C programm to convert/compress data right now (but I probably will someday)

You could make an NES program that compresses ROM to SRAM ;-)


Top
 Profile  
 
 Post subject:
PostPosted: Fri Sep 08, 2006 11:52 am 
Offline
Site Admin
User avatar

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3596
Location: Indianapolis
tepples wrote:
Bregalad wrote:
I have no idea how to code a C programm to convert/compress data right now (but I probably will someday)

You could make an NES program that compresses ROM to SRAM ;-)


Ha, yeah that's what I do. That's how my RLE compressor works. Upside is then you've got compression code you could also put in your program for.. something.


Top
 Profile  
 
 Post subject:
PostPosted: Sun Sep 10, 2006 7:19 am 
Offline
User avatar

Joined: Sun Mar 19, 2006 12:37 am
Posts: 223
Location: San ANto, TX
I was bored and i found this site
Code:
http://www.cs.tut.fi/~albert/Dev/pucrunch/


There some 6502 coding going on there having to compression on thus im guessing. Not sure but thought that would help.

_________________
" If I have seen further it is by standing on the shoulders of giants" - Issac Newton, in a letter to Robert Hooke.


Top
 Profile  
 
 Post subject:
PostPosted: Sun Sep 10, 2006 7:55 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20865
Location: NE Indiana, USA (NTSC)
Pucrunch and other LZ77-class compressors are intended for machines with more RAM than ROM, such as the Commodore 64, the Apple II, or the Famicom Disk System.


Top
 Profile  
 
 Post subject:
PostPosted: Tue Sep 12, 2006 1:50 pm 
Offline
User avatar

Joined: Mon Sep 11, 2006 6:48 pm
Posts: 194
Location: Moose Lake, Minnesota
DTE is a good idea. Here's how I would do it:

Assuming that the game text in the ROM is in ASCII format (which I always use, as it allows me to simply .db "text goes here",) none of the top 128 characters are used for normal text, so we can use bit 7 of the text character as a DTE flag (I'm sure 128 DTE table entries should be all you need.)

Since bit 7 is the 6502's Negative flag, any time a byte with bit 7 set is loaded into the accumulator, the Negative flag will be set. This allows us to use a BMI to jump to a DTE decompression routine, saving us a compare instruction, and just output the character normally otherwise.

In the DTE decompression code, assuming that the DTE table is always at the same place in memory (which it should be,) we can use indexed addressing to get the appropriate values, so we will do an ASL A (more on this in a minute,) and a TAX. We do an ASL A to get rid of the now-superfluous bit 7 and multiply the DTE entry number by 2 (the size of a DTE table entry.) Thus we have our index into the DTE table.

So we can now LDA table,X to get our first character, and then output that character (whether directly or by JSRing to a special character-out routine, it doesn't matter,) then INX and LDA table,X again to get our second character, output that, and return to the normal string-printing loop.

So, if I were programming this, here's how it would look:

Code:
textout:
                    ; This routine assumes that a pointer to the string to print is loaded into $00, and destroys the contents of A, X, and Y.
                    ; Also, it assumes that the string is zero-terminated (allowing us to check for the end of the string with a BEQ.)
        LDA ($00),Y ; In some assemblers, the parentheses should be replaced with brackets []
        BMI doDTE
        BEQ done
        JSR charout
increment:
        INY
        BEQ fixup
        JMP textout ; I don't have my 6502 manual handy, so I don't remember whether it would
                    ; be faster to BNE textout (I don't think so.)
done:
        RTS

doDTE:
        ASL A
        TAX
        LDA DTEtable,X
        JSR charout
        INX
        LDA DTEtable,X
        JSR charout
        JMP increment

fixup:
        INC $01
        JMP textout


Of course, I'm not the best 6502 programmer out there, and I'm sure there's something I missed that would make it even faster, but that should give you a general idea of how to do it. The bit 7 thing is of course dependent on your text characters all being in the lower half of the background tile table, so you may have to go with a different method for detecting DTE characters.


Top
 Profile  
 
 Post subject:
PostPosted: Tue Sep 12, 2006 2:28 pm 
Offline
User avatar

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
commodorejohn wrote:
We do an ASL A to get rid of the now-superfluous bit 7 and multiply the DTE entry number by 2 (the size of a DTE table entry.) Thus we have our index into the DTE table.

Speed-up tips to save yourself a few cycles:
1. Don't get rid of bit 7, displace the base address of the table 128 bytes back ( LDA DTEtable-128, X ) instead.
2. Make 2 tables, one with the first character and one with the second, so that you don't have to multiply by 2, nor INX for the second char. Just load one character from a table and the other from another table.

Not much, but this kind of speed-up can make a difference when things get bigger and more complicated.


Top
 Profile  
 
 Post subject:
PostPosted: Tue Sep 12, 2006 2:37 pm 
Offline
User avatar

Joined: Mon Sep 11, 2006 6:48 pm
Posts: 194
Location: Moose Lake, Minnesota
Ah, good point.


Top
 Profile  
 
 Post subject:
PostPosted: Tue Sep 12, 2006 3:14 pm 
Offline
User avatar

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11011
Location: Rio de Janeiro - Brazil
This DTE encoding seems interesting. Does anyone know how efficient it really is? Well, I know it's compression ratio must be less than 50%, since it encodes at most 2 characters. I don't know... the idea is interesting, but doesn't seem very efficient.


Top
 Profile  
 
 Post subject:
PostPosted: Tue Sep 12, 2006 5:21 pm 
Offline
User avatar

Joined: Mon Sep 11, 2006 6:48 pm
Posts: 194
Location: Moose Lake, Minnesota
I think the main point is that it's fast and easy to implement, even if it's not terribly space-reducing. And there are a number of commonly-used letter pairs in English.


Top
 Profile  
 
 Post subject:
PostPosted: Tue Sep 12, 2006 7:20 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20865
Location: NE Indiana, USA (NTSC)
There are also a number of commonly-used letter triples and quadruples, etc. They're called "words." Has anyone investigated separating the text into words, assigning a code number to each word, and then on decompression looking up the code number in a table to get characters?


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 35 posts ]  Go to page 1, 2, 3  Next

All times are UTC - 7 hours


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group