HQ2X Algorithm Ported to Verilog

Discuss emulation of the Nintendo Entertainment System and Famicom.

Moderator: Moderators

User avatar
jwdonal
Posts: 719
Joined: Sat Jun 27, 2009 11:05 pm
Location: New Mexico, USA
Contact:

HQ2X Algorithm Ported to Verilog

Post by jwdonal »

Hello all!

I have some very exciting news! Well, it's exciting me to at least. :) I have successfully ported the HQ2X pixel scaling algorithm to Verilog HDL. Here are some of my test image results directly out of my emulator (captured from my FPGA emulator using my VGA capture card). I also provide links to zoomed-in/magnified versions of the test images because the actual images are so small.

Test 2 Image (Input - Unmagnified)
Test 2 Image (Input - Magnified)
Test 2 Image (Output - Unmagnified)
Test 2 Image (Output - Magnified)

Test 3 Image (Input - Unmagnified)
Test 3 Image (Input - Magnified)
Test 3 Image (Output - Unmagnified)
Test 3 Image (Output - Magnified)

Test 4 Image (Input - Unmagnified)
Test 4 Image (Input - Magnified)
Test 4 Image (Output - Unmagnified)
Test 4 Image (Output - Magnified)

Test 5 Image (Input - Unmagnified)
Test 5 Image (Input - Magnified)
Test 5 Image (Output - Unmagnified)
Test 5 Image (Output - Magnified)

Test 6 Image (Input - Unmagnified)
Test 6 Image (Input - Magnified)
Test 6 Image (Output - Unmagnified)
Test 6 Image (Output - Magnified)

I will be posting a gameplay video of the scalar operating live on my VeriNES very soon!

UPDATE: Videos have been added in a later post. Enjoy! :)

Pz!

Jonathon
Last edited by jwdonal on Mon Sep 05, 2011 2:03 pm, edited 4 times in total.
Zelex
Posts: 268
Joined: Fri Apr 29, 2011 9:44 pm

Post by Zelex »

Awesome :)
3gengames
Formerly 65024U
Posts: 2284
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames »

I barely know what this is, but it's sweet! Good job.
User avatar
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA
Contact:

Post by infiniteneslives »

This is some pretty awesome stuff you've got here. I can only hope to have verilog skills like yourself someday.

So do you have an application of this planned for the NES or your NOAC? Or just playing around with the stuff?
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

It's probably too late, especially considering you've already made it.

But I have a 6KB C++ implementation of HQ2x that relies on a 256-byte condensed pattern table and rotation of the pattern tree for the four corners. blargg helped with much of this, too. Compiles down to maybe 1KB of binary code. Might be more suitable for saving transistors.

Source here if you're interested, consider it public domain.

http://pastebin.com/YXpmqvW5

You'd want to do the YUV conversion manually instead of via table, of course.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

3gengames wrote:I barely know what this is, but it's sweet! Good job.
hq2x is one of the well-known pixel art interpolation algorithms.

Oh and thanks byuu.

jwdonal: That's the oldest trick in the book ;-) You made the "output" images slightly brighter than the input images. And studies show louder is better.
User avatar
jwdonal
Posts: 719
Joined: Sat Jun 27, 2009 11:05 pm
Location: New Mexico, USA
Contact:

Post by jwdonal »

Thank you all for the kind words. :)
byuu wrote:It's probably too late, especially considering you've already made it.

But I have a 6KB C++ implementation of HQ2x that relies on a 256-byte condensed pattern table and rotation of the pattern tree for the four corners. blargg helped with much of this, too. Compiles down to maybe 1KB of binary code. Might be more suitable for saving transistors.
It absolutely is more resource efficient for both software and hardware implementations. To be totally honest....I'm slightly bummed by your post because that was going to be the "upgrade" that I implemented today and was going to post about it! (Less the rotation of the pattern tree for the corners - I don't know what that means yet but I'll definitely check it out.) But you and blargg beat me to it! ;)

I will post here again once I complete the upgrade. Thanks byuu!
infiniteneslives wrote:So do you have an application of this planned for the NES or your NOAC? Or just playing around with the stuff?
Yes, I will be integrating it into my VeriNES and will post some videos on Youtube. But I want to make some optimizations to the code first (e.g. what was stated above).


tepples: ;)
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

To be totally honest....I'm slightly bummed by your post because that was going to be the "upgrade" that I implemented today and was going to post about it!
Oops, sorry about that :(
(Less the rotation of the pattern tree for the corners - I don't know what that means yet but I'll definitely check it out.)
Okay, the original HQ2x code is here:

http://code.google.com/p/hqx/source/bro ... src/hq2x.c

Basic pattern explanations:

Code: Select all

#define PIXEL00_0     *dp = w[5];
#define PIXEL00_10    Interp1(dp, w[5], w[1]);
#define PIXEL00_11    Interp1(dp, w[5], w[4]);
#define PIXEL00_20    Interp2(dp, w[5], w[4], w[2]);
#define PIXEL01_0     *(dp+1) = w[5];
#define PIXEL01_10    Interp1(dp+1, w[5], w[3]);
#define PIXEL01_11    Interp1(dp+1, w[5], w[2]);
#define PIXEL01_20    Interp2(dp+1, w[5], w[2], w[6]);
PIXEL00 = top-left, 01=top-right, 10=bottom-left, 11=bottom-right

Then there is a 256-entry switch on the 8-diff pattern for each pixel of input that produces four unique outputs:

Code: Select all

                case 177:
                    {
                        PIXEL00_20
                        PIXEL01_22
                        PIXEL10_20
                        PIXEL11_12
                        break;
                    }
Case 177 = rule 4: blend2(E, D, B);

In HQ2x official, the pixel grid around the current pixel is used:
123
456
789

So for rule 1, PIXEL00_20 = 542. Blend what is above and to the left only.

(side-note: blend rules only ever work with five adjacent pixels, for the top-left, that's 12468, IIRC.)

Yet PIXEL01_20 = 526. Blend with what is above and to the right only. This leads way to a symmetry. We don't even need to look at PIXEL10_20 to know that it will be 584.

So rather than encode four rule sets per output pixel, we only really need to store the first rule set, that is, PIXEL00.

So we ignore 5, since we don't "diff" (compare to see if two pixels are similar or not) 5 to 5. So our pattern is actually "12346789" in one byte.
123
4X6
789

So after rendering the top-left, we can rotate the pattern matches by two steps: each step moves everything counter-clockwise two times. So let's do this:

369
2X8
147

Notice how 01_20 = 526 in the same spot that 00_20 had 542?

The 256-byte rotate table stores this calculation for us, since there's no easy bit-twiddling way to do it. So our new pattern byte after rotation is 74182963. We can do this again for the bottom right, and one last time for the bottom left pixel.

The PIXEL01/10/11 rules are only different because they're all considering 1-9 patterns to be the same for each corner.

So in our case 177, PIXEL01_22 is used: Interp2(dp+1, w[5], w[3], w[2]);

177=B1=pattern match:
101
1X0
001

Rotated:

101
0X0
110

=10100110=$a5=rule 165. And surprise, there we have: PIXEL01_20

I built my symmetry table off of PIXEL00 only. And as it turns out, there are actually six? cases where the symmetry does not hold up in MaxSt's original tables.

I believe them to be unintentional bugs as a result of MaxSt presumably building this table by hand. I did a blind test on my forum of my rotation HQ2x versus MaxSt's HQ2x table, and everyone favored mine, as it eliminated a couple extra jaggy points. But I admit that is subjective.

The difference was something like 6 pixels per video frame, because the differences were in very uncommon pattern matches.

I do wish I knew how MaxSt generated his initial switch table, or if he really just did it painstakingly by hand. I'd like to think there is some sort of computer algorithm that can build that comparison table, and that we can improve upon that algorithm to consider more than five surrounding pixels, to make a super-HQNx algorithm. But ... I haven't been able to figure it out.

> But you and blargg beat me to it!

Let's see if I remember ...

I came up with the pattern table + rotation, and used a script to build the pattern[] array by parsing the original hq2x.c file.

The 16->32->16 bit-packing for simple blending rules without having to split things out per channel came from Allegro's alpha-blending code, that I remembered from the '90s.

blargg took the original pixel differencing code, and implemented a much faster version using a simplified YUV table and some magic constants. He also came up with a distinction between same/diff that is used in there at some point. This differencing is also not a 100% match to the original algorithm, but is 99.9% the same, and way faster.

I came up with a tiny optimization where the pattern match for 6 can be moved to 4 for the next pixel, but it didn't seem to help performance and required extra code.

blargg had some amazing non-grow/pack variants for the most common blend cases using his clamped-RGB add/sub code, but it turned out to be slower on more modern x86 processors for some crazy reason (I guess multiplies have gotten a lot faster lately.)

blargg also has a lot of optimizations that help on the PowerPC more than on the x86. I kept them out to keep the code simpler.

Hopefully that's everything. I'd be really appreciative if anyone can find a way to reduce my code even more. The algorithm becomes somewhat beautiful with the symmetry, and the smaller it is, the more impressive it is.
User avatar
jwdonal
Posts: 719
Joined: Sat Jun 27, 2009 11:05 pm
Location: New Mexico, USA
Contact:

Post by jwdonal »

byuu, that - is - freakin - awesome ! I'm still deciphering your notes and matching them up to your C code but just from skimming it it's really cool! It's amazing that you guys figured all this out. I will post a more in-depth reply once I implement these upgrades myself but I can't thank you enough for sharing this. :-D

If I figure out any more optimizations I will def be happy to tell you what they were (if they relate to something that could be done in software and not just hardware).

UPDATE:
byuu wrote:So our pattern is actually "12346789" in one byte.
Very minor fix, but that should say "98764321". ;)
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> It's amazing that you guys figured all this out.

I really have no idea how I came up with the pattern rotation. I think I just got really lucky.

I still don't understand blargg's super-optimized YUV diff/same functions, but they're definitely a lot faster than the LGPL-C version from MaxSt.

> Very minor fix, but that should say "98764321".

Sorry, it's all pretty much from memory. May be other small mistakes as well, as I did this quite a long time ago.
User avatar
jwdonal
Posts: 719
Joined: Sat Jun 27, 2009 11:05 pm
Location: New Mexico, USA
Contact:

Post by jwdonal »

Okay, so I've finally figured out how (or moreso why) this works. And just for everyone's benefit I'm going to go ahead and re-post what byuu posted with some corrections because the errors threw me off for a while (no offense _whatsoever_ intended toward byuu - I am very thankful for his help. The errors, like he said, are just because he was doing everything from memory.).

Anyway, the errors result from the incorrect assumed pattern of "12346789". There were actually just a couple of errors as a result of that first error, but I figured it would make more sense to re-post without snipping just so people don't have to look back and forth between the two posts. Here goes...

-------------------------------------------------------
So we ignore 5, since we don't "diff" (compare to see if two pixels are similar or not) 5 to 5. So our pattern is actually "98764321" in one byte.
123
4X6
789

So after rendering the top-left, we can rotate the pattern matches by two steps: each step moves everything counter-clockwise two times. So let's do this:

369
2X8
147

Notice how 01_20 = 526 in the same spot that 00_20 had 542?

The 256-byte rotate table stores this calculation for us, since there's no easy bit-twiddling way to do it. So our new pattern byte after rotation is 74182963. We can do this again for the bottom right, and one last time for the bottom left pixel.

The PIXEL01/10/11 rules are only different because they're all considering 1-9 patterns to be the same for each corner.

So in our case 177, PIXEL01_22 is used: Interp2(dp+1, w[5], w[3], w[2]);

177=B1=10110001=[98764321]=pattern match:
100
0X1
101

Rotated counter-clockwise by 2 positions:

011
0X0
101

New pattern=[74182963]=10100110=$A6=rule 166

And surprise, there we have: PIXEL01_12

The significance of this (i.e. the "surprise") is that PIXEL01_12 blends the (now rotated) pixels at locations 5 & 6. Which are the exact same pixels (i.e. pixels 5 & 8) that PIXEL11_12 would have blended had we stuck with rule 177 using the non-rotated pattern.
-------------------------------------------------------

The optimization that byuu created is extremely elegant which consequently makes it difficult to explain really well with just words. I plan on adding some hand-drawn graphs to this post to visually describe this process. Fixing the errors, and then drawing all this out is the only thing that allowed me to finally understand it. :-P I just need to clean up what I've already created and scan it into my machine. I'll put the files on dropbox and post a link.

UPDATE:
Here ya go. Hopefully this doesn't just confuse people more. :( lol
http://dl.dropbox.com/u/36237540/hq2x/h ... xample.pdf

Pz!

Jonathon W. Donaldson
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> The significance of this (i.e. the "surprise") is that PIXEL01_12 blends the (now rotated) pixels at locations 5 & 6. Which are the exact same pixels (i.e. pixels 5 & 8) that PIXEL11_12 would have blended had we stuck with rule 177 using the non-rotated pattern.

Typo in bold should be PIXEL11_12.

Yeah, sounds like you have it now, you're reminding me how it works again :D
I unfortunately had my forum in purge mode to conserve space, and lost the old post where I explained this when I first figured it out.

Thanks for the corrections.

Always felt some sort of animation showing the rotation in real-time would be kinda neat to explain this.

You can technically forgo the need to rotate if you store a 1024-byte table, but I like the smaller version. Would be really cool if someone could come up with a better 'rotate' bit-twiddling algorithm so that we don't need the second 256-byte table.

Code: Select all

n = ((n >> 2) & 0x11) | ((n << 2) & 0x88)
  | ((n & 0x01) << 5) | ((n & 0x08) << 3)
  | ((n & 0x10) >> 3) | ((n & 0x80) >> 5);
Oh, I also don't think I mentioned it, but I load in all eight surrounding pixels, but since there are only five "points" that HQ2x ever blends against, I rearrange their order when calling the blend functions, which is based around the same rotation, but I saw no reason to shuffle eight 16-bit values.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

To make rotation as easy as rotation, order the bits as

Code: Select all

0 1 2
7 x 3
6 5 4
User avatar
jwdonal
Posts: 719
Joined: Sat Jun 27, 2009 11:05 pm
Location: New Mexico, USA
Contact:

Post by jwdonal »

byuu wrote:Typo in bold should be PIXEL11_12.
Not as I understand the algorithm. 01_12 is correct as I have it. Maybe it's just that I worded it funny or something.

01_12 is blending 5 & 6. Pixel 6 _was_ the pixel that was in location 8 prior to the matrix rotation. And prior to the rotation you would have had to use 11_12 to blend 5 & 8. So, 01_12 (5,6) in rule 166 has taken the place of 11_12 (5,8) in rule 177.

Maybe the new wording will help. I drew this all out on paper and it does work. Maybe I should scan that one too. We may just be thinking about it differently. I'm guessing there is more than one way to look at this optimization - maybe that's what happening.

Pz!

Jonathon
Near
Founder of higan project
Posts: 1553
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near »

> To make rotation as easy as rotation, order the bits as

0 1 2
7 x 3
6 5 4

Reordering won't help.

01273654 ROL 2 = 27365401
273
6x5
401

That is not rotated in any way. We would want:

234
1x5
076

> Not as I understand the algorithm

Okay, I'll just defer to your judgment then, you've studied a lot more than I have lately now =)
Post Reply