I timed it using NintendulatorDX (checking my options for tile compression in Streemerz). The "jsr Decompress" takes 1,194,448 cycles to decompress that 4 KB worth of SMB tileset; that's approximately 40 frames (668 ms) on an NTSC NES.

tokumaru wrote: I didn't time it. Just run the ROM yourself with FCEUXD's tile viewer open and you can have an idea of how long it takes to decode the first half of SMB's tileset (it's about as long as it takes for them to be displayed, although some of the time is spent setting up the name and attribute tables and other kinds of initialization). FCEUXD seems to lag a bit when it starts though, so I can't properly time it unless I use the NMI to count how many frames are needed until all the tiles are done.
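The cycle-to-time conversion above can be checked with the standard NTSC NES timing constants (the constants are assumed here, not stated in the post):

```python
# Standard NTSC NES timing constants (assumed, not taken from the post).
CPU_HZ = 1789773            # 2A03 CPU clock, Hz
CYCLES_PER_FRAME = 29780.5  # average CPU cycles per NTSC frame

cycles = 1194448            # measured cost of "jsr Decompress"
frames = cycles / CYCLES_PER_FRAME
millis = cycles / CPU_HZ * 1000
print(f"{frames:.1f} frames, {millis:.0f} ms")
```

This works out to about 40.1 frames and 667 ms, in line with the figures quoted; the one-millisecond difference comes from rounding of the clock-rate constant.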
Tile encoder/decoder
Re:
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
Re: Tile encoder/decoder
Hey tokumaru, thanks for providing this. Those are some impressive ratios.
sonder
Re:
I think that the codec should support arbitrary numbers of tiles (if it doesn't already; correct me if I'm wrong). In a CHR-RAM project, stuff is going to be loaded in pieces, not entire banks.

tokumaru wrote: Well, I don't know how other people usually handle their graphics, but I usually don't have a single CHR file. I have separate files for each character, item, font, and so on. Depending on which tiles will be used at the same time, I merge the files to create blocks of tiles that I can compress.
I don't know if implementing a feature like this is worth the trouble. If you have a large CHR file but want to encode it to separate blocks, doesn't that mean that there is a reason you think they shouldn't be packed together, which means it makes sense to have them in separate CHR files to begin with?
About the pointers, I wouldn't make that automatic, because I like to have options. You can incbin a compressed block anywhere in your source under a label and you have its address. However, you might want to handle the blocks differently: you may want a bank index along with each pointer (in case you have graphics scattered across different program banks), or you might not need a table at all, if you have just a couple of graphic blocks and the code to read from the small table would be larger than using immediate values.
I don't like to work with tools that are too specific and limit what you can do with them. As it is now, I don't think it's hard at all to include the files by hand and build your own table of pointers. Like I said before (and you agreed!), I don't care too much about user-friendliness, and this qualifies as that IMO. When I'm programming I just want the tools to be usable; the main project is the game, and that's where I'll put my effort in order to make it a polished product.
sonder
Re: Tile encoder/decoder
Then make "separate files for each character, item, font, and so on" like tokumaru recommended, and load multiple files into CHR RAM before the level starts.
Re: Tile encoder/decoder
Yeah, that's how I would do it. I guess I was just confused; for some reason I got the idea that the compressor expected exactly 256 tiles.

tepples wrote: Then make "separate files for each character, item, font, and so on" like tokumaru recommended, and load multiple files into CHR RAM before the level starts.
sonder
Re: Tile encoder/decoder
My test of tokumaru compression on tile archives from Battletoads:
table for unpacked, original packed, and tokumaru packed.
- Attachments
-
- tokumaru_packer_test.xls
- (9 KiB) Downloaded 496 times
Re: Tile encoder/decoder
And the same thing in tab-separated values for people who happen not to have Microsoft Office installed:
- Attachments
-
- tokumaru_packer_test.txt
- (1.85 KiB) Downloaded 466 times
Re: Tile encoder/decoder
I've used spread32; it's 1 MB in size.

tepples wrote: And the same thing in tab-separated values for people who happen not to have Microsoft Office installed:
Re: Tile encoder/decoder
I rewrote a compressor / decompressor for this scheme from scratch.
Incidentally, it also compresses marginally better than Tokumaru's tool (due to better algorithm in finding block splitting points). The compression format is the same though.
It can be downloaded (C++ source) at: http://iki.fi/bisqwit/src/tokumaru-source.zip
Precompiled version for x86_64 Windows: http://iki.fi/bisqwit/src/tokumaru-w64.exe
Precompiled version for i386 Windows: http://iki.fi/bisqwit/src/tokumaru-w32.exe
It can be downloaded at: http://bisqwit.iki.fi/source/tokumaru.html
Last edited by Bisqwit on Sun May 01, 2016 4:24 pm, edited 2 times in total.
Re: Tile encoder/decoder
The C++ source link is 404 Not Found, and it is not listed on the page one level up. But by accessing the directory listing directly, I found it: a file named tokumaru, with no extension.
What sort of speedup does use of threads typically give? Because in a large build, it might be wiser to default to one thread and use job-level parallelism, where make -j3 calls multiple tokumaru instances at once (along with multiple converters for other things).
And things like "%u tiles (%u bytes) decompressed into %u bytes\n" typically go to stderr, not stdout, so that they don't mix with the output even if quiet mode is disabled.
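That convention can be illustrated with a tiny hypothetical helper (a sketch, not code from the actual tool) that routes diagnostic messages to stderr so the data stream on stdout stays clean:

```python
import sys

def report(fmt, *args):
    # Progress/diagnostic messages go to stderr so that stdout stays
    # clean for the tool's actual output (important when piping).
    msg = fmt % args
    print(msg, file=sys.stderr)
    return msg

report("%u tiles (%u bytes) decompressed into %u bytes", 256, 1024, 4096)
```

With this split, `tool input.chr > output.bin` keeps the binary output uncontaminated even with verbose messages enabled.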
zlib license wrote: The origin of this software must not be misrepresented
Re: Tile encoder/decoder
The source link has been fixed. I forgot to actually click submit on my edit to the post. Sorry about that.
Threads give a significant speedup when running at the highest optimization levels; the process is highly parallelizable. Threads can be explicitly disabled with a command-line option.
As for messages, I typically follow a guideline where errors go into stderr and everything else goes into stdout. The exception is when the actual compressed output goes into stdout; then everything else goes into stderr. This version does not currently support compressing to stdout.
Yes, zlib license. Unless there is some other point in your quote? I so dislike contextless quotations and implications. If you mean to accuse me of misrepresenting the author of the original software: my program is written entirely from scratch, based on the algorithm documentation I collected yesterday, and even then I am providing a credited link to the code that I read. It is neither a modification nor a port of other software. This is original software, even though what it seeks to accomplish is the same as Tokumaru's program.
Re: Tile encoder/decoder
How do you arrive at an algorithm that achieves the optimal split point or just optimal ways to apply compression in general? Is it a greedy algorithm? Are multiple arbitrary permutations performed and then compared?
Re: Tile encoder/decoder
IIRC, testing all possible length combinations of 2 consecutive blocks worked really well for me.
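One possible reading of that heuristic is sketched below; the cost function and the cost-per-tile ranking are my assumptions for illustration, not tokumaru's actual code. At each position it tries every length combination for the next two blocks, ranks pairs by encoded cost per tile covered, and commits only the first block of the best pair:

```python
def block_cost(tiles, i, j):
    # Hypothetical stand-in for the real encoded size of tiles[i:j]:
    # a small header plus a penalty that grows with tile diversity.
    distinct = len(set(tiles[i:j]))
    return 4 + 16 * distinct * distinct + (j - i)

def greedy_pairwise(tiles, max_len):
    """Greedy splitter with two-block lookahead: test all length
    combinations of the next two blocks, commit the first block of
    the cheapest pair (measured in cost per tile covered)."""
    splits, i, n = [0], 0, len(tiles)
    while i < n:
        best = None  # (cost per tile covered, length of first block)
        for len1 in range(1, min(max_len, n - i) + 1):
            c1 = block_cost(tiles, i, i + len1)
            rem = n - i - len1
            if rem == 0:
                cand = (c1 / len1, len1)
                best = cand if best is None or cand < best else best
                continue
            for len2 in range(1, min(max_len, rem) + 1):
                c2 = block_cost(tiles, i + len1, i + len1 + len2)
                cand = ((c1 + c2) / (len1 + len2), len1)
                best = cand if best is None or cand < best else best
        i += best[1]
        splits.append(i)
    return splits
```

On a toy input of two homogeneous runs, the lookahead finds the split at the run boundary, since a block spanning both runs is penalized for its diversity.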
Re: Tile encoder/decoder
Bisqwit wrote: Yes, zlib license. Unless there is some other point in your quote? I so dislike contextless quotations.

Just summarizing for the benefit of readers before they click through. I've seen a lot of projects on GitHub, for example, that make it hard to find the license info, with no top-level LICENSE or COPYING file and no license notice at the bottom of README.
Re: Tile encoder/decoder
Drag wrote: How do you arrive at an algorithm that achieves the optimal split point or just optimal ways to apply compression in general? Is it a greedy algorithm? Are multiple arbitrary permutations performed and then compared?

The way I do it here is that I make a directed acyclic graph of all possible encode-lengths at all possible offsets, and then use Dijkstra's algorithm with a priority queue to find the shortest path through that graph. The length of each arc is the number of bits generated for that particular block, so the shortest path minimizes the total compressed size.
This works very well since the splits are independent: When a block is split, no data from the previous block is carried over to the next block. It is also efficient because of the high granularity of block splits (every 16 bytes), and high likelihood of short splits being most efficient.
The compression level is just a cap on the encode-lengths to test. Without a cap, the performance would be O(n^2). With a cap, it becomes O(n*cap). In practice even short caps work really well.
By contrast, this approach would not work well with e.g. deflate, because the number of plausible block lengths is several orders of magnitude higher.
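The shortest-path formulation described above can be sketched as follows. This is illustrative only: `block_cost` is a made-up stand-in for the real per-block bit count, offsets here index tiles directly rather than 16-byte boundaries, and `cap` plays the role of the compression-level limit on encode-lengths:

```python
import heapq

def block_cost(tiles, i, j):
    # Hypothetical stand-in for the real encoded size of tiles[i:j]:
    # a small header plus a penalty that grows with tile diversity.
    distinct = len(set(tiles[i:j]))
    return 4 + 16 * distinct * distinct + (j - i)

def optimal_splits(tiles, cap):
    """Dijkstra over the DAG whose nodes are offsets 0..n and whose
    arcs (i -> j) are candidate blocks tiles[i:j] with j - i <= cap,
    weighted by block_cost. The shortest path 0 -> n gives the
    globally optimal set of split points."""
    n = len(tiles)
    best = {0: 0}       # cheapest known cost to reach each offset
    prev = {}           # predecessor offset, for path reconstruction
    heap = [(0, 0)]     # priority queue of (cost, offset)
    while heap:
        cost, i = heapq.heappop(heap)
        if i == n:
            break
        if cost > best.get(i, float("inf")):
            continue    # stale queue entry, already improved
        for j in range(i + 1, min(i + cap, n) + 1):
            c = cost + block_cost(tiles, i, j)
            if c < best.get(j, float("inf")):
                best[j] = c
                prev[j] = i
                heapq.heappush(heap, (c, j))
    # Walk predecessors back from n to recover the split offsets.
    path = [n]
    while path[-1] != 0:
        path.append(prev[path[-1]])
    return path[::-1], best[n]
```

Since every arc leaving offset `i` is examined at most once and there are at most `cap` arcs per node, the work is O(n * cap), matching the complexity argument above; with no cap it degrades to O(n^2).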