All times are UTC - 7 hours





[ 26 posts ]
PostPosted: Fri Aug 19, 2011 1:38 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 18670
Location: NE Indiana, USA (NTSC)
byuu wrote:
> To make rotating the pattern as easy as a bit rotation, order the bits as

0 1 2
7 x 3
6 5 4

Reordering won't help.

01273654

Where did you get the number 01273654? Is there a reason that you're still scanning left to right, top to bottom, instead of scanning in a circle?
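
For illustration, here is a minimal sketch of the circular ordering suggested above. It assumes bit i of the pattern holds the neighbor labeled i in the 0..7 ring (0 = top-left, proceeding clockwise). Under that assumption a quarter turn of the 3x3 neighborhood moves every neighbor two steps around the ring, so it becomes an 8-bit rotate by 2. The function names are hypothetical and are not taken from any posted source.

```cpp
#include <cassert>
#include <cstdint>

// Assumed ring layout (bit index per neighbor, x = center pixel):
//   0 1 2
//   7 x 3
//   6 5 4

// Quarter turn clockwise: neighbor i moves to position (i + 2) % 8.
inline uint8_t rotateCW(uint8_t pattern) {
  return static_cast<uint8_t>((pattern << 2) | (pattern >> 6));
}

// Quarter turn counter-clockwise: neighbor i moves to position (i + 6) % 8.
inline uint8_t rotateCCW(uint8_t pattern) {
  return static_cast<uint8_t>((pattern << 6) | (pattern >> 2));
}

// Sanity check: the two rotations undo each other for every pattern.
inline void checkRotations() {
  for (int p = 0; p < 256; ++p) {
    assert(rotateCCW(rotateCW(static_cast<uint8_t>(p))) == p);
  }
}
```

This only helps if the lookup table is indexed with the same ring ordering; with a row-by-row ordering the rotate does not correspond to a quarter turn.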


PostPosted: Fri Aug 19, 2011 5:38 pm
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1318
Because real binary streams are not circular? :/

Okay, the raw pattern from diffed pixels:

IHG
FeD
CBA

You suggest re-arranging it like so:

FGH
DeI
CBA

I can re-order the hqTable like this, no problem.

But what have we accomplished? We need a counter-clockwise rotation to give us:

HIA
GeB
FDC

And how do we get this value? pattern << 2 | pattern >> 6 won't do it. Neither will pattern << 6 | pattern >> 2.

Ultimately, we always have to look into an 8-bit array, so we MUST pack the result down into a bitstream to do it:

FGHDICBA
->
HIAGBFDC

With your method, we'd have to transform the values one at a time:

0->5, 1->3, 2->0, 4->1, 7->2, 6->4, 5->7, 3->6
01247653 needs to become 53012476

Which looks like our nice (n<<6)|(n>>2), but we have to move individual bits. So we really need:
((n&0x01)<<5) | ((n&0x02)<<2) | ((n&0x04)>>2) | ((n&0x10)>>3) | ((n&0x80)>>5) | ((n&0x40)>>2) | ((n&0x20)<<2) | ((n&0x08)<<3)

Which is ... no better than what we are doing now.
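
As a sketch of the per-bit shuffle written out above: the destination table below is copied from the 0->5, 1->3, ... mapping in this post, and the helper names are hypothetical. Building a 256-entry table from it once is one plausible way to arrive at a rotation table like the one being discussed, though that is an assumption about how such a table would be generated.

```cpp
#include <cassert>
#include <cstdint>

// Destination bit for each source bit, from the mapping in the post:
// 0->5, 1->3, 2->0, 3->6, 4->1, 5->7, 6->4, 7->2
static const int kDest[8] = {5, 3, 0, 6, 1, 7, 4, 2};

// Move each set bit of n to its destination, one bit at a time.
inline uint8_t rotateBits(uint8_t n) {
  uint8_t out = 0;
  for (int bit = 0; bit < 8; ++bit) {
    if (n & (1u << bit)) out |= static_cast<uint8_t>(1u << kDest[bit]);
  }
  return out;
}

// Hypothetical helper: precompute the shuffle for all 256 patterns so the
// per-pixel cost is a single array lookup.
inline void buildRotateTable(uint8_t table[256]) {
  for (int n = 0; n < 256; ++n) {
    table[n] = rotateBits(static_cast<uint8_t>(n));
  }
}
```

The mapping consists of two 4-cycles (0->5->7->2->0 and 1->3->6->4->1), so applying it four times returns the original pattern, as expected for a quarter turn.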

---

But anyway, the code's posted. If you use the latest bsnes, you can compile the snesfilter HQ2x file separately using pure C++98 code. If you can get it to work and eliminate the rotation table, I'll pay you $20 =)


PostPosted: Sat Aug 20, 2011 12:36 am
Joined: Sat Jun 27, 2009 11:05 pm
Posts: 701
Location: New Mexico, USA
Dang, this is awesome. I just finished implementing a new version of my HQ2X Verilog filter. This version includes just the optimized rotation symmetry enhancement (i.e. it does not include any of the other optimizations that are shown in byuu's bsnes hq2x filter source code). The rotation symmetry upgrade all by itself resulted in a 34% overall reduction in FPGA resources. Rockin!! Thanks byuu!

More updates to come when I implement more of the bsnes optimizations.

Pz!

Jonathon :)


PostPosted: Sat Aug 20, 2011 1:56 am
Joined: Sat Jun 27, 2009 11:05 pm
Posts: 701
Location: New Mexico, USA
Hey byuu,

Would you mind explaining your grow/pack functions, how and why they work, and how they're better than MaxSt's?

I could just go ahead and implement them blindly in Verilog and they would work fine, but I always want to understand what I'm implementing; otherwise I don't learn anything.

Thanks!

Jonathon


PostPosted: Sat Aug 20, 2011 8:48 am
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1318
Code:
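// bsnes version: blend 15-bit pixels as (2*A + B + C) / 4, all channels at once.
uint16_t blend2(uint32_t A, uint32_t B, uint32_t C) {
  grow(A); grow(B); grow(C);          // spread the channels apart so sums cannot overlap
  return pack((A * 2 + B + C) >> 2);  // divide by 4, then repack into 15 bits
}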


Code:
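// MaxSt's hq2x macro, for comparison: the same (2*c1 + c2 + c3) / 4 blend,
// computed once per mask (presumably Mask_2 = 00FF00 and Mask13 = FF00FF).
#define Interp02(c1, c2, c3) \
(((((c1 & Mask_2) * 2 + (c2 & Mask_2) + (c3 & Mask_2)) >> 2) & Mask_2) + \
 ((((c1 & Mask13) * 2 + (c2 & Mask13) + (c3 & Mask13)) >> 2) & Mask13))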


Unsure if the equality test on some of those functions will help or not. Certainly will for solid-color screens, but how common/rare is that? Extra test could make it slower in some cases.

Ignoring that ... Lots of masking and repeated multiplications there.

It's masking FF00FF, performing math on that, then masking 00FF00 and doing the same again, and combining the results. Looks to be working on 24-bit input.

Mine splits the channels apart and does the multiplication only once, works on SNES 15-bit input (can do 16-bit too.)

The idea is that n*4 in the worst case can spill over by two extra bits:
%11111*4=%(11)11100. The part in parentheses has spilled over, which would alias into the next color channel. But if we have some zero bits between the channels, we can shift around and mask. So mine turns:
0rrrrrgggggbbbb into:
000000ggggg00000 0rrrrr00000bbbbb
Then does the math on them, shifts back, and then packs it back together.
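
A sketch of grow/pack consistent with the layout described above, not necessarily the exact bsnes source. The mask constant is derived from the 000000ggggg00000 0rrrrr00000bbbbb layout in this post, and the 15-bit assert is an addition for the sake of the example.

```cpp
#include <cassert>
#include <cstdint>

// Channel layout after grow(): 000000ggggg00000 0rrrrr00000bbbbb
static const uint32_t kSpreadMask = 0x03e07c1f;

// Spread a 15-bit 0rrrrrgggggbbbbb pixel so each channel has spare zero bits above it.
inline void grow(uint32_t& n) {
  assert(n <= 0x7fff);  // this sketch assumes 15-bit input
  n |= n << 16;         // copy the pixel into the high half
  n &= kSpreadMask;     // keep green in the high half, red and blue in the low half
}

// Undo grow(): drop the spill bits and fold green back between red and blue.
inline uint16_t pack(uint32_t n) {
  n &= kSpreadMask;
  return static_cast<uint16_t>(n | (n >> 16));
}

// (2*A + B + C) / 4 per channel, with one multiply-add covering all three channels.
inline uint16_t blend2(uint32_t A, uint32_t B, uint32_t C) {
  grow(A); grow(B); grow(C);
  return pack((A * 2 + B + C) >> 2);
}
```

Each channel's sum can be at most 31*4 = 124, which needs two extra bits; the gaps left by the mask are wider than that, so nothing aliases into a neighboring channel before pack() masks the remainder bits away.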

I couldn't say which was faster (would guess mine); you'd have to benchmark it. I just like mine more for readability.


PostPosted: Sun Aug 21, 2011 9:48 pm
Joined: Sat Jun 27, 2009 11:05 pm
Posts: 701
Location: New Mexico, USA
Okay, thanks a lot.

Did you notice that in your blend() function the case 0 will never fire (since hqTable[] contains no 0 values)? Same goes for cases 7, 8, 9, 10, and 11.
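
One way to check a claim like this mechanically is to scan the table for case numbers that never occur. The sketch below assumes the table is an array of small unsigned values; the function name, the element type, and the parameters are hypothetical, since the actual declaration of hqTable[] is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return every case number in [0, caseCount) that never appears in the table.
// Entries >= caseCount are ignored.
inline std::vector<int> unusedCases(const uint8_t* table, std::size_t size, int caseCount) {
  assert(caseCount > 0);
  std::vector<bool> seen(caseCount, false);
  for (std::size_t i = 0; i < size; ++i) {
    if (table[i] < caseCount) seen[table[i]] = true;
  }
  std::vector<int> unused;
  for (int c = 0; c < caseCount; ++c) {
    if (!seen[c]) unused.push_back(c);
  }
  return unused;
}
```

Calling it with the real table and the number of switch cases in blend() would list the cases that can be dropped.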

Also, can you go into a little more detail on why you have both diff() and same() functions, instead of just one or the other?

And one more thing...
byuu wrote:
0rrrrrgggggbbbb into:
000000ggggg00000 0rrrrr00000bbbbb
Maybe this is some weird SNES thing that I don't know about, but how does shifting and masking get you 5 'b's when you only have 4 'b's to start with?

Thanks byuu!

Jonathon


PostPosted: Tue Aug 23, 2011 8:03 pm
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1318
There should have been five b's.

I think I kept the "holes" to match HQ2x rules, but yeah, if our goal is code size, I should get rid of the duplicates, good idea. The table was generated by writing a parser for hq2x.cpp from MaxSt.

diff vs. same is because one caches part of the decode process when comparing the center pixel against other pixels. Slight speedup.
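
A rough sketch of that caching idea, under stated assumptions: the Yuv struct, decode(), differs(), and the thresholds below are all illustrative stand-ins, not the bsnes code or the tuned HQ2x values. The point is only the shape of the two functions: same() decodes both pixels, while diff() takes an already-decoded center so it can be reused for all eight neighbors.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Illustrative decoded form of a pixel; the real filter's decode step may differ.
struct Yuv { int y, u, v; };

// Hypothetical decode: 15-bit 0rrrrrgggggbbbbb -> rough luma/chroma values.
inline Yuv decode(uint16_t p) {
  const int r = (p >> 10) & 31, g = (p >> 5) & 31, b = p & 31;
  Yuv c = { r + g + b, r - b, 2 * g - r - b };
  return c;
}

// Arbitrary illustrative thresholds, not the tuned HQ2x values.
static const int kMaxY = 12, kMaxU = 4, kMaxV = 6;

inline bool differs(const Yuv& a, const Yuv& b) {
  return std::abs(a.y - b.y) > kMaxY ||
         std::abs(a.u - b.u) > kMaxU ||
         std::abs(a.v - b.v) > kMaxV;
}

// General comparison: decodes both pixels on every call.
inline bool same(uint16_t x, uint16_t y) {
  return !differs(decode(x), decode(y));
}

// Center-pixel comparison: the caller decodes the center once and reuses it,
// so only the neighbor is decoded here.
inline bool diff(const Yuv& center, uint16_t neighbor) {
  return differs(center, decode(neighbor));
}
```

In use, the center would be decoded once per output pixel and then passed to diff() for each of the eight neighbors, which saves seven of the eight center decodes compared with calling same() each time.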


PostPosted: Mon Sep 05, 2011 1:57 pm
Joined: Sat Jun 27, 2009 11:05 pm
Posts: 701
Location: New Mexico, USA
Hello all!

I finally integrated my Verilog HQ2X pixel scaler into my VeriNES emulator. Now I can demo real games running with the scaler enabled rather than single static images (as in my first post). Unfortunately, the codec that I used to record these videos performs some of its own blending and such, but you can certainly still tell the difference between when the scaler is enabled and when it's not. The HQ2X implementation that I finally integrated into my emulator is ~75% smaller than my original unoptimized implementation. The biggest optimizations were byuu's (author of bsnes) symmetry optimization, a huge BRAM reduction, and a couple of major parallelization/pipelining optimizations.

Here are some videos (Xvid codec). I think Solstice is the best demonstration of the scaler. There is really nothing to see here that can't be seen in bsnes, Nestopia, or whatever. This is really just to prove that I accomplished what I originally set out to do.
Super Mario Bros. (HQ2X Demo) (31MB)
Legend of Zelda (HQ2X Demo) (56MB)
Solstice (HQ2X Demo) (38MB)

Major thanks to byuu for telling me about his symmetry optimization.

Pz!

Jonathon :)


PostPosted: Mon Sep 05, 2011 4:43 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 18670
Location: NE Indiana, USA (NTSC)
Good job with hq2x.

I noticed a couple unrelated problems in the SMB1 video, in both scaled and unscaled mode. You seem to skip a single column of pixels near the left side: move forward while watching the hills and floor tiles closely. And you appear not to be doing the 33rd fetch and have blank pixels at the far right.


PostPosted: Mon Sep 05, 2011 5:06 pm
Joined: Mon Apr 04, 2011 11:49 am
Posts: 1841
Location: WhereverIparkIt, USA
This is awesome to see. I'm even more excited to get my hands on an accurate NOAC that implements this!

So do you plan to make it so that the user could just flick a switch (change an input) and turn it on and off seamlessly like you were in the video then?


PostPosted: Mon Sep 05, 2011 5:53 pm
Joined: Sat Jun 27, 2009 11:05 pm
Posts: 701
Location: New Mexico, USA
tepples wrote:
Good job with hq2x.

Thanks!

tepples wrote:
I noticed a couple unrelated problems in the SMB1 video, in both scaled and unscaled mode. You seem to skip a single column of pixels near the left side: move forward while watching the hills and floor tiles closely. And you appear not to be doing the 33rd fetch and have blank pixels at the far right.

Yeah, I've had those bugs for almost 2 years. LOL. I have literally just been working on everything else and implementing new features (fixing CPU bugs, APU, FIR filters, porting to Altera, etc.). Once I got the PPU to a point where I could play pretty much every game without any major trouble, I moved on to other things. But I really need to get back to fixin my PPU....some day. :)

infiniteneslives wrote:
This is awesome to see.

Thanks!

infiniteneslives wrote:
So do you plan to make it so that the user could just flick a switch (change an input) and turn it on and off seamlessly like you were in the video then?

Yep. It will also be controllable via my Qt GUI interface.

