I'm writing a ROM-editing tool in C and need to extract some 8x8 tiles from CHR-ROM. First, I want to convert 2bpp NES tiles to indexed 8bpp images. Second, I want to render them to an OpenGL canvas so the user can edit them. I haven't learned a lick of OpenGL yet but I just managed to wrap my head around the conversion part. This is my code:
for (y=0; y<8; ++y)
{
    a = tile[y+0];
    b = tile[y+8];
    for (x=0; x<8; ++x)
    {
        output[y][x] = (((b >> (7-x)) & 1) << 1) | ((a >> (7-x)) & 1);
    }
}
This is probably even easier for a compiler to optimize.
Edit: I was curious so I stuck both examples into the godbolt compiler explorer. I'm kinda surprised, but GCC unrolls the shift-register approach better? Maybe? The practical difference might be marginal, though.
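For anyone following along, here is a rough sketch of what a shift-register version of the loop above might look like. It is one plausible form, not necessarily the exact code that was compared on godbolt. The function name decode_tile_shift and the uint8_t types are assumptions made for illustration; the byte layout (low plane in bytes 0-7, high plane in bytes 8-15) matches the loop above.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 16-byte 2bpp NES tile into an 8x8
   array of palette indices (0-3). Bytes 0-7 are the low bit plane,
   bytes 8-15 are the high bit plane. */
void decode_tile_shift(const uint8_t tile[16], uint8_t output[8][8])
{
    for (int y = 0; y < 8; ++y)
    {
        uint8_t a = tile[y + 0]; /* low-plane bits for this row */
        uint8_t b = tile[y + 8]; /* high-plane bits for this row */
        for (int x = 0; x < 8; ++x)
        {
            /* The leftmost pixel lives in bit 7, so read the top bit of
               each plane and then shift the next pixel into place. */
            output[y][x] = (uint8_t)(((b >> 7) << 1) | (a >> 7));
            a <<= 1; /* stored back into uint8_t, so the old top bit is dropped */
            b <<= 1;
        }
    }
}
```

Instead of computing a variable shift amount (7-x) for every pixel, each plane byte is shifted left by a constant 1 per step and only the top bit is read, which is the kind of pattern that tends to unroll cleanly.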
@rainwarrior
Great answer. It took me a minute to digest it (and I had to rotate your output 90 degrees) but it's a very elegant solution. And your hunch was correct. The shift-register approach is approximately 4% faster. Thanks for answering my first question.
@lidnariq
Yikes. I actually came across that Stanford page while looking for a solution. There's some good stuff in there but it's an overwhelming amount of information! This chr2pgm is a good reference, though.