Page 1 of 2

Working through compression

Posted: Thu Mar 06, 2008 9:18 pm
by Celius
So I have two projects at the moment. I'd like to talk about both of them, because I'm dealing with map compression in both of them.

My first game involves a large RPG map like that of Final Fantasy 1. This map is 256x256 metatiles(2x2) large, so I can't have it without compression. The compression method is very similar to Final Fantasy 1's, where the map has a header that is a list of 256 pointers. They point to the location of the start of each 256 tile row. The reason they have this is because the game uses RLE, and the rows vary in byte size.

My other game involves a very large map (64x60 screens) which uses 8x8 metatiles. The tiles are compressed like this:

2x2 metatile - composed of 4 1x1 tiles
4x4 metatile - composed of 4 2x2 metatiles
8x8 metatile - composed of 4 4x4 metatiles

So each 8x8 metatile has 4 predefined 4x4 metatiles. The map points to screens which are composed of these metatiles.

Almost every game uses compression for their levels, and I don't think it usually takes them 1000 years to work through it. If I need to find a specific metatile ID, for collision/updating and whatnot, I'll have to be able to do a decompression on the fly.

What I don't know is how to do this in an efficient manner. My current decompression method for the RPG map would take an extremely long time to do what I need to have done. I need to look through 15 rows to get 15 metatile IDs for each column that's updated. Do games usually struggle with decompression? How do most games handle it? Both of my games are 4-way scrolling. My RPG map engine can decompress a lot more slowly than the other engine, which is for a platformer. My platformer can decompress based on math, since every byte represents the same number of tiles. But my other engine will require working with every byte and doing lots of counting/looping, which worries me. Any comments/suggestions?

Posted: Thu Mar 06, 2008 9:24 pm
by Dwedit
For RLE, you should instantly look up the start of a row, then quickly use the run totals to skip to the correct run byte representing a column. You can also do sequential columns past that point very easily.

Dragon Warrior 4 actually split the rows in half, to avoid needing to seek to the second half of a row. (funny how Dragon Warrior 2 and 3 didn't need to do that)

Posted: Fri Mar 07, 2008 1:32 am
by Celius
Dwedit wrote:For RLE, you should instantly look up the start of a row, then quickly use the run totals to skip to the correct run byte representing a column.
The problem is that a lot of the data is uncompressed, so it seems I'd have to check every byte and count until I reach where the left edge of the screen is. In my current plan for getting which tiles in a row will be on screen, I start out at the beginning of a row, and I check whether or not the byte indicates compression, or if the byte indicates the end of a row (If the byte is $FF, it indicates the end of a row). If it's just a single tile, I decrease the counter by one and move on to the next byte. If the byte does indicate compression, I subtract the run length from the counter. If the run length is greater than the distance left to the left edge of the screen, I'll know that the run will extend onto the screen. Is this like what you were describing?

And the Dragon Warrior method seems like it could be good, however, you would be having a lot more space used up.

Posted: Fri Mar 07, 2008 2:00 am
by Dwedit
Dragon Warrior 2-4 use this method of RLE compressing the overworld map:

Code: Select all

xxx..... = tile number
...xxxxx = length of the run + 1
If the "tile number" is zero, you're specifying a literal instead of a run. It uses the Length bits to specify a tile number from 32 tiles, instead of the 7 tiles available for runs.
The 32 tiles used for literals do not have to be the same 7 tiles for runs. Dragon Warrior 2 made the first 7 literal tiles (excluding tile #0) the same as the RLE tiles, for a total maximum of 32 different tiles. Dragon Warrior 3 & 4 wised up, and made all the literal tiles separate, for a total of 39 different tiles.
I'm sure that this method of compression only works well if you have a Dragon Warrior 2 style map, with long stretches of up to 7 different terrain types, and the occasional town or bridge. Additionally, DW2-4 use row pointers, so seeking to a specific tile is fast.

You said you're trying for a Final Fantasy-style overworld map... Sounds like you just need a good auto-smoother for the overworld in order to use DW3's compression system.

Posted: Fri Mar 07, 2008 2:39 am
by Celius
Oh, I suppose that would be much easier to work through. If I can find a way to work through mine easily, I can have a total of 64 different tiles, and the runs can be 256 tiles long if I want. I use the two lower bits of a byte to select the palette I assign to the metatile. So this:

Code: Select all

.db $59
Indicates that I want a 2x2 metatile made out of tiles $58-$5B with pallet 1 assigned to it. I'll sacrifice two values. One will indicate a compression, and the other will indicate the end of a row. So this:

Code: Select all

Compression indicator = $35 (Probably a water metatile with a brown palette assigned to it.)

.db $35,$17,$50
Means that tile $50 is repeated $17 times. Final Fantasy 1 only had 32 metatiles on the map. Their compression method compressed 3 or more tiles, while mine compresses 4 or more tiles. I seem to remember FF1's compression algorithm only allowing for 32 different metatiles. I was greatly opposed to the idea of only 32 metatiles being used, so I changed the algorithm a little to fit my liking.

My map will be scrolled 16 pixels at a time, so I have multiple frames that I can do stuff in. My main concern is my platformer. There won't be lots of activity going on in an RPG map, but a platformer engine has to be fast. And decompression of a map seems like it could take way too long. I'll write a routine to find a 2x2 metatile in the 8x8 metatile, and I'll post it up probably tomorrow (Well, today, but after I get up. It's 3:36 AM here), and I'll see if anyone thinks it can be optimized.

Posted: Fri Mar 07, 2008 2:44 am
by Dwedit
You know what people did for platformers? Add 8K of RAM into the cartridge to hold a decompressed map. Worked for M.C.Kids and Super Mario 3.

Posted: Fri Mar 07, 2008 4:01 pm
by Celius
I'm currently making use of the 8K. I have just a little under 5k of RAM left. But that's a lot of RAM. I could really make use of that. I suppose I could decompress whole rows/columns of 8x8 metatiles at once. But still, that seems like it would take a long time. I'll write a sample routine and paste it tomorrow. I should go to bed now.


So I have made a sample routine to find the value of one 2x2 metatile in an 8x8 metatile. The routine is given X and Y coordinates for which the value can be 0-3. Keep in mind the the meta tiles are defined like this:

Code: Select all

.db Tile1, Tile2, Tile3, Tile4
And it will look like this on screen:

Code: Select all

Tile1 Tile2
Tile3 Tile4
So here's my routine. I think I know how it could be optimized slightly:

Code: Select all

;In this routine we fetch a single 2x2 metatile out of a large 8x8 metatile.
;We come in with the 8x8 metatile ID that we want to fetch the value from.
;We also come in with X and Y coordinates for the desired 2x2 metatile.
;These values can range from 0-3. Since there are only 2 bits used for
;X and Y definitions, we will have both X and Y coords in the same byte.
;Like this: 0000YYXX. Of course, YY represents the two bits for the Y coord,
;And likewise for the X coord.

	lda Desired8x8ID		;3 ;Here we get the desired pointer value. Every 4 bytes is a definition
	asl a			;5 ;So if I want to see the definitions for metatile number 9, I need to point
	rol PointerHigh		;10 ;to 8x8Metatiles + 36, which is 9 * 4.
	rol a			;12
	rol PointerHigh		;17
	sta PointerLow		;20
	clc			;22
	adc #<8x8Metatiles		;26 ;<- Cycle count error. The assembler figures that out, so it's immediate, which brings
	sta PointerLow		;29 ;The total down by 2.
	lda PointerHigh		;32
	adc #>8x8Metatiles		;36 ;Same for here. Total is now down by 4
	sta PointerHigh		;39

	lda XYCoords		;42 ;In this part, we need to take these bits: 0000Y0X0
	and #$08			;44 ;And turn them into: 000000YX
	lsr a			;46 ;Here we take the most significant bit of the Y coord
	lsr a			;48 ;Shift it over to bit one
	sta TempVar		;51 ;Save it
	lda XYCoords		;54 ;Take the original value
	and #$02			;56 ;AND out everything but the most significant bit of the X coord
	lsr a			;58 ;Shift it over
	ora TempVar		;63 ;OR it with the Y value
	tay			;65 ;Put it in the Y register (This gives us the 4x4 metatile we want to check)

	lda (PoiniterLow),y		;70 ;We go through the same process we did up on top for the 4x4 metatile
	asl a			;72
	rol PointerHigh2		;77
	rol a			;79
	rol PointerHigh2		;84
	sta PointerLow2		;87
	clc			;89
	adc #<4x4Metatiles		;91
	sta PointerLow2		;94
	lda PointerHigh2		;97
	adc #>4x4Metatiles		;99
	sta PointerHigh2		;102

	lda XYCoords		;105 ;Here we take the LSB of XYCoords (00000Y0X)
	and #$04			;107 ;And turn it into (000000YX)
	lsr a			;109 ;This gives us our desired 2x2 metatile position
	sta TempVar		;112
	lda XYCoords		;115
	and #$01			;117
	ora TempVar		;122
	tay			;124

	lda (PointerLow2),y		;129
	sta Desired2x2		;132	;With cycle count error fixed, total is 128 cycles

 .db $90,$15,$34,$92
 .db $B3,$10,$2A,$2A
 ....			;For 256 more definitions

 .db $25,$46,$A3,$44
 .db $10,$AD,$3D,$73
 ....			;For 256 more definitions
So it takes 128 cycles to find a single tile ID. This is not good. I could reduce the overall speed for updating rows and columns by fetching multiple values at the same time. But I don't want it to take more than a scanline to find one of these tiles. And even having it take a scanline may be bad.

My idea for optimization would be to not have the X and Y values in one byte. I could use four bytes, and I could have each bit by itself in a byte. I COULD really reduce the speed this way. Any other ideas/comments?

Posted: Fri Mar 07, 2008 11:26 pm
by tokumaru
Celius, this is how I plan on reading a single metatile out of my level map, which is compressed somewhat similarly to yours:

Code: Select all

;-- SUBROUTINE --------------------------------------------------
; Reads a single metatile from the level map.
; A: index of the metatile inside the screen;
; X: Y coordinate of the screen;
; Y: X coordinate of the screen;
; A: index of the metatile;

	lda #(>ScreenLarge0) >> 2
	sta LargeBlock0+1
	lda #(>LargeMedium0) >> 2
	sta MediumBlock0+1
	lda #(>MediumSmall0) >> 2
	sta SmallBlock0+1
	lda #(>SmallMeta0) >> 2
	sta Metatile0+1

	;Set addresses of the data to read based on the coordinates of the metatile
	rol LargeBlock0+1
	rol MediumBlock0+1
	rol SmallBlock0+1
	rol Metatile0+1
	rol LargeBlock0+1
	rol MediumBlock0+1
	rol SmallBlock0+1
	rol Metatile0+1

	;Get the index of the screen map
	jsr ReadScreenMap

	;Y must be zero because it's not used to index data
	ldy #$00

	;Read from structure to structure until reaching the metatile
	sta LargeBlock0+0
	lda (LargeBlock0), y
	sta MediumBlock0+0
	lda (MediumBlock0), y
	sta SmallBlock0+0
	lda (SmallBlock0), y
	sta Metatile0+0
	lda (Metatile0), y

Obviously, in order to find the ID (I usually call this "index") of a specific metatile inside the level I must have it's coordinates. They conveniently fit into 24 bits, so I use the three 6502 registers to hold this information:

A: yyyyxxxx

This makes it possible to point to any metatile in the level. Registers X and Y hold the highest bits of each coordinate, so they are directly used to fetch the screen ID. This would be the call to "ReadScreenMap" which just uses a pre-calculated table (which has pointers to each row of the level map) to read a screen from the level map. This happens pretty quick.

After I got the screen ID, I must read from structure to structure until I reach the metatile. To do that as quickly as possible, I set up various pointers to the data I need, and have these pointers modified by the coordinates of the metatile, because they are what in fact dictate what I want to read. These are the structures I have, all the way to the metatile:

Screen = 2x2 large blocks (256x256 pixels)
Large block = 2x2 medium blocks (128x128 pixels)
Medium block = 2x2 small blocks (64x64 pixels)
Small block = 2x2 metatiles (32x32 pixels)
Metatile = 2x2 tiles (16x16 pixels)

Since each of these structures is composed by 4 smaller structures there are 4 possibilities for each pointer. I first initialize the pointers with a base value, without the bits that decide which of the 4 possible areas are going to be read. Then I just shift the coordinate bits out of the accumulator into the appropriate pointers. The lower byte of the pointers is always the ID of the previous structure.

Now that I think of it, it would probably be faster to transfer that index to Y and always keep the lower byte of the pointers as 0. But the idea is the same. I haven't tested this specific piece of code yet, because I didn't get to the point of reading individual metatiles, so I don't even know how long it takes to execute.

I'm also rewriting much of my code now, after the switch to MMC3, and I do see some room for improvement here, so keep in mind that this isn't *exactly* what I'll be using in the final game.

For reading rows and columns I use a completely different piece of code (which does work) that is optimized to read a series of metatiles at once. It needs twice as many pointers, because 2 structures are read from each structure this time, but the main idea of the pointers being modified by the coordinates is still there. It is a big unrolled loop, so it's pretty quick.

I think it is absolutely possible to have your levels decoded in real time if you have it encoded like this. I can say for sure it works well when scrolling. I may not have tested the single metatile code yet, but you don't usually need to read that many single metatiles. You often just need to read about 5 of them for the main character collision, 1 or 2 for each enemy and that's probably it. So this part may not need to be extra fast, I guess.

EDIT: Of course you need to have your data interleaved for this to work. Some of you may have noticed that I reeeaaally like interleaved data, because it's usually easier (and faster) to read. It's not as easy to define though, but that should not be a concern. I believe that data should be stored in the best possible way for the program, not for the programmer. If something is too difficult for humans to understand, you can always write simple convertion apps to do the work for you.

Anyway, by "interleaved" (I'm not sure if this is the best word for this, so I feel like I should explain) I mean that the top left tile of all metatiles are stored sequentially, then come all the top right tiles, then the bottom left tiles, and so on. Grouping similar entities together makes it simpler to index them (for example, you can read all the tiles of a metatile by using the index of the metatile, without having to increment it after each read tile). It surely is possible to manipulate pointers if you have you data arranged in other ways, but it would probably not be as fast, as the bits of interest might then not be placed so conveniently.

Posted: Sat Mar 08, 2008 3:27 am
by Bregalad
If I were to do a large 4-way map handler (either RPG or platformer), I'd definitely decompress the data of 4 "screens" simuntaneously in RAM, so that you can randomly acess each metatile easily. Assuming a "screen" is 32x32 metatiles large, the RAM area needed would be 64x64 bytes, that is 4k. Oh I didn't exept it to be so large before writing this ;-)
Anyway, using that method, whenever the player is scolling to a screen, the game should decompress the 2 nexts (either 2 horizontal or 2 vertical) and store it to the unused RAM area on the fly. For example if the player walks down, then as soon as the metatile 31 is out of the top border screen, the game loads the 2 next horizontal screens. This would allow better compression than RLE tough, as the data is only decoded once, and in a sequential order.

For random-acessing RLE maps, effectively the best way would be to have a screen-based format (where each screen's row is separately stored to ROM) and it shouldn't be hard to make a routine that gets them in a random order by counting literals and runs. This would allow large areas of water to use the same pointer again and again. Unfortunately, the amount of pointers needed is incredibly large.

EDIT : I got my math all wrong above. 1 screen would be 32x30 tiles, not mtattiles, so it would only be 16x16 metatiles (rounded up for simplicity) and then 4 maps in RAM would be 1kb, not 4kb, which is much more reasonnable.

Posted: Sat Mar 08, 2008 11:15 am
by tokumaru
Bregalad, I first considered decompressing 4 screens to RAM, But I figured it could be a slow process (decoding 2 of them, I mean... maybe even 3 when moving diagonally sometimes) and could cause some lag every few seconds when moving really fast.

Plus, when you have a limited number of screens loaded, it's not possible to keep track of anything else in the level besides an area slightly larger than the camera. Not that you'll want to have many active objects far away from the camera, because that'd sure be slow, but at least you have the option, in case a particular object needs it. Celius once told me that he needed a boss to keep moving inside it's (quite large) arena even when away from the camera. If not decompressing on the fly, he'd have to buffer the whole arena.

Random access to any part of the level is a pretty good advantage overall. You can even have the player moving faster than the camera without worrying about him colliding with the wrong metatiles.

And what I'm saying is, althought at first it may look a bit more complicated to read a compressed map directly, once you have the optimized code working it can be done at decent speeds.

Posted: Sat Mar 08, 2008 11:55 am
by Bregalad
Well, in the case of a RPG overworld, you don't bother as much with speed as in a sonic game ;-)

Anyway I don't know if it's better to do slow random acess on each row, or slow decompression on each 16 metatiles. In any case, the game would have to be designed so it doesn't lag (or if it does that's just for a frame). Honnestly, a one-frame lag on each 31 metatiles would hardly be noticeable in a RPG when scrolling at the standard 1 pixel per frame speed. Since the player crosses 16-pixel metatiles at a time, the game engine could easily separate the decompressing process in 16 equal parts, so that it does a little of processing in each frame when the player moves beyond screen boundaries.

For platformers, more speed is certainly needed, and Celius seems to work with really big metatiles (8x8 tiles), there is only 4x4 of them on the screen at once, that's quite big. I use 4x4 tiles metatiles in the game I'm currently making, and I'm fine with them. A decompressed map is 8x6 metatiles (that leaves room for the status bar) and take 64 bytes of RAM, easily workable in a cartridge with no extra RAM. While I only have one of them loaded at the time currently, I could easily made 2 or 4 of them fit in memory. For platformers I guess the best option is then to still buffer the maps in RAM, because random acess to them rocks (especially in the case where you'd want to re-update the metatiles for other resons than scrolling, which is the case in my game). In any RPG, you're SURE that this will be needed, because of the menus, and additionally some details like doors and chests that are open/closed, or some other background alteration will be needed for most RPGs.

Posted: Sat Mar 08, 2008 5:01 pm
by tokumaru
I understand what you are saying, and the options you described are the ones usually used by NES games. I bet almost all games that used WRAM had their active map decompressed there. It sure is great to have quick random access to any metatile, but even WRAM is not large enough for incredibly large maps.

I believe that Celius works with the concept of "rooms", which fit very well to the idea of fully decompressing maps to WRAM, and I really think he should do it like this.

However, for sonic-sized levels, WRAM isn't enough to decompress all the way to the metatile level. Even the MD titles didn't do it. Levels in Sonic 1 are composed by 16x16-metatile blocks (256x256 pixels), and from Sonic 2 on they used 8x8-metatile blocks (128x128 pixels). With blocks that large you can really build enormous levels. The SMS Sonic games had decompressed levels in RAM, but the basic blocks are 32x32 pixels large. Levels weren't really big, because only 4KB of RAM was dedicated to them I think.

The point is that I'm (I'm not sure if Celius is serious about using this approach too) defending the idea that it's possible to do things differently than most games used to. When was the last time you saw a NES game with a level as large as 32768x2048 pixels? I can't even think of a Sonic game that had a level this large. In my engine it is possible, because of this unusual technique.

I agree with you that reading individual metatiles is not the fastest task in the game, but i don't think it will cause problems either, as the process isn't THAT slow. There are a few other purposes to reading individual metatiles other than testing for collisions... such as redrawing the background where there was a background object. Such objects aren't very large though.

Posted: Sat Mar 08, 2008 7:01 pm
by Celius
Yeah, in my current RPG map scrolling engine, it took longer than a frame to decode a column of metatiles. This is really really bad. It took me about 16 scanlines to decode a row with the X position as $FF. Now keep in mind that speed really varies depending on the byte size of the row. I can't really think of too many shortcuts that will allow me to not have to look at every single byte in a row. I COULD split the rows in half like Dwedit suggested. That could greatly increase the size of the map though.

I think I'll try to rewrite my routine, and try and make it faster, but I don't know if I really can make it that much faster. There is always the option of doing row/column buffers when the game is scrolling. The fastest traveling vehicle on the world map will be able to go 4 pixels per frame. Final Fantasy games have always worked with 2x2 cells. So when you press up, you move to the 2x2 metatile above the character. So if you're in the fastest vehicle on the world map, it will take 4 frames to scroll from one spot to the next. In that time, I could find the values for the rows/columns surrounding the edges of the screen for where the character is moving to. I could even save lots of time by being able to fetch the values of one of those rows from the screen copy in RAM. For example, if you move one space to the right, the left-most column on screen will be destroyed. And it will also be put back on screen if you choose to move to the left after you've moved to the right. So I could take the column that's being destroyed and hold onto it for one of my updating options. It's an idea I thought of a while ago. So I do have 4 frames where I can do stuff between spaces.

As for my platformer, I think the decompression will be a lot faster, since everything is compressed in blocks. The placement of everything can be calculated. However, the RPG map is different, since you have to know everything about a row to determine where exactly a metatile is.

And it's an interesting idea, Tokumaru, your idea of "interleaved" data. I suppose things could be a lot faster that way. I think I will use the concept of rooms to make my maps. They're a little too big for complete decompression. Look at this map: ... _200_6.jpg

Each small blue square represents a screen. If you can't find one, the red ones are save rooms, which also represent a screen. This is the map of Castlevania Symphony of the Night. My map will be very similar. See how some rooms are made up of about 64 screens, which means that I won't have nearly enough room in RAM to deal with that. Oh, and Tokumaru, you were talking earlier about your blank screens issue. Look how many are blank on that map (Everything that's a white square)!

I don't think there's much buffering I can do for my platformer. My speeds will definitely increase for updating, due to the fact that I can fetch multiple values at once. But yeah, on the fly decompression would be ideal, since I wouldn't have as many limits. I'm trying to make my game as unlimited as I can, too. I'll do some reworking and get back to you guys. Thanks for the replies!

Posted: Sat Mar 08, 2008 7:16 pm
by tepples
tokumaru wrote:The point is that I'm (I'm not sure if Celius is serious about using this approach too) defending the idea that it's possible to do things differently than most games used to. When was the last time you saw a NES game with a level as large as 32768x2048 pixels?
For comparison, Zelda 1's overworld is only 4096x1408px. It used 16x176 pixel metatiles.

Another question that few people seem to stop to consider: How big is a pixel in SI units? For example, if an object is 16 pixels tall, how much distance does that represent?

Posted: Sat Mar 08, 2008 8:01 pm
by Celius
Well, some sort of good news. I've shortened my routine I posted above to fetch a tile in 98 cycles instead of the original 128. That's still not that impressive. Instead of doing all sorts of bit shifts for the X and Y coordinates, I just used a lookup table to determine the result of the mathematical procedures. I'm going to look into applying more lookup tables to the method.