PostPosted: Thu May 01, 2014 6:17 am 
Formerly Fx3
Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3076
Location: Brazil
The target is my emulator. I want to scale a 256x240 bitmap into a bigger one, like 3x, 4x... but my method seems slow and... well... too basic! :P
At best, a 2x scaler that simply duplicates the current pixel at X+1 and at Y+1 is kinda slow. It gets even slower if I expand the idea to 3x and 4x scalers.

Any ideas, guys?

The following example is illustrative only, with no optimizations.

Code:
 Bitmap->line[ y ][ x ] = putpixel( value ); //current pixel
 Bitmap->line[ y ][ x+1 ] = putpixel( value ); //duplicate it

 Bitmap->line[ y+1 ][ x ] = putpixel( value ); //for next line
 Bitmap->line[ y+1 ][ x+1 ] = putpixel( value ); //duplicate it


PostPosted: Thu May 01, 2014 6:49 am 
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5891
Location: Canada
Normally that shouldn't really be a "slow" operation. If the code you provided runs slow, I would primarily suspect the "putpixel" function call. Does it need to be a function call at all? Aren't you just copying data here? Make sure it isn't a virtual function at least, and maybe inline it.

Here's a different expression of the same idea that a compiler should be able to generate fairly efficient code from:
Code:
for (int y=0; y<240; ++y)
{
    uint32* restrict line0 = bitmap->line[(y*2)+0];
    uint32* restrict line1 = bitmap->line[(y*2)+1];
    uint32* restrict src = source->line[y];
    for (int x=256; x; --x)
    {
        const uint32 p = *src;
        line0[0] = p;
        line1[0] = p;
        line0[1] = p;
        line1[1] = p;
        ++src;
        line0 += 2;
        line1 += 2;
    }   
}

You could also unroll the loop a little, but the compiler may already do that for you. Check the assembly produced if you want to be sure. The restrict keyword comes from C99, so it may be named slightly differently in your compiler (or require C99 support to be enabled manually), but it tells the compiler that the memory regions behind these pointers do not overlap, which can assist optimization in this case.
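If you need it to build on more than one compiler, a common trick (just a sketch for MSVC and GCC/Clang; adjust for your toolchain) is to hide the keyword behind a macro:
Code:
/* Pick whichever spelling of "restrict" the compiler understands.
   Sketch for MSVC and GCC/Clang; other compilers may need their own case. */
#if defined(_MSC_VER)
  #define RESTRICT __restrict
#elif defined(__GNUC__)
  #define RESTRICT __restrict__
#else
  #define RESTRICT restrict /* plain C99 */
#endif

/* ...then declare the pointers as e.g. uint32* RESTRICT line0; */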

For simple upscaling, though, a lot of programs nowadays would offload this work to a graphics card instead of doing it in software. Maybe you should investigate that route instead?


PostPosted: Thu May 01, 2014 6:04 pm 
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
All this assumes linear memory here, where 1 byte represents the palette index of 1 pixel (you remember DOS days, so I'm referring to something like 320x200 VGA, segment 0xA000). You'll get the general idea though. :-)

I don't know what putpixel() does behind the scenes, but if you wrote it, the easy solution is to make a routine that's identical to putpixel() but instead of writing 1 byte (8-bits) to draw a pixel, try writing 2 bytes (16-bits) to draw 2 pixels, or 4 bytes (32-bits) to draw 4 pixels, etc...

The premise is to minimise the number of repeated calls to putpixel(). You'll be surprised how much of a speed-up this provides.

I always think of things like this in assembly, so what I'm about to say doesn't necessarily apply to things like pure C, but: you'd be limited to 4x scaling on 32-bit archs, 8x scaling on 64-bit archs, etc... though extensions like MMX and SSE add 64-bit and 128-bit registers on top of that.

Anyway you get the gist of it I'm sure: instead of calling a subroutine 2 times to draw 2 pixels, instead call a subroutine that natively draws 2 pixels, or 4 pixels, or 8 pixels, etc...
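For example, a rough sketch of the idea in C (the helper names are made up, and it assumes an 8-bit palettized buffer like the VGA mode above):
Code:
#include <stdint.h>
#include <string.h>

/* Write the same 8-bit palette index to two adjacent pixels with one
   16-bit store instead of two putpixel() calls. memcpy with a constant
   size compiles down to a single store on any sane compiler, and it
   sidesteps alignment/aliasing worries. */
static inline void put2pixels(uint8_t *row, int x, uint8_t index)
{
    uint16_t pair = (uint16_t)(index | (index << 8));
    memcpy(row + x, &pair, sizeof pair);
}

/* Same idea, four pixels per 32-bit store. */
static inline void put4pixels(uint8_t *row, int x, uint8_t index)
{
    uint32_t quad = 0x01010101u * index;
    memcpy(row + x, &quad, sizeof quad);
}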


PostPosted: Thu May 01, 2014 6:12 pm 
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
rainwarrior wrote:
For simple upscaling, though, a lot of programs nowadays would offload this work to a graphics card instead of doing it in software. Maybe you should investigate that route instead?

The problem with this method, at least with DirectX and friends, is that the "memory" used (in DX terms I think it's the memory associated with a surface) is often natively on the graphics card, which results in stupid things like interpolation or anti-aliasing or whatever -- the visual results look blurry as fuck. Crazy blind fools think "it looks smoother", but it looks like shit, period. I cannot STAND emulators which do this. But I'm also a serious hater of things like 2xSaI and all that other garbage; I want the "pixellated look" to remain that way (linear scaling) if I scale 2x, 4x, 8x, etc...

As I understand it, there are ways around this with DirectX, where you (somehow) tell the thing to use system-based surface memory instead, and for linear scaling you get non-blurry results. I imagine there are probably ways to do this in Direct3D while still using GPU-native memory. In fact, I know there are...

One such example was versions of Nestopia, where to get that "linear pixellation" look, you had to go into Options / Video and change Memory Pool from Video to System. However, the version of Nestopia I use today (the unofficial version that some dudes maintain somewhere, where EmuCR posts SVN/GIT builds of it whenever something changes), allows me to use Options / Video / Memory Pool: Video and still get linear scaling that looks crisp/sharp. I just go to View / Screen Size / 2x and it looks good. It didn't used to be this way though, so someone somewhere improved something.

Attached is an example; it speaks for itself. (Sorry, I had to use VirtuaNES to get the blurry look; normally I check Option / Graphics / SystemMemory Surface to ensure this doesn't happen.)

The downside, as I understand it, is that using a system memory surface is slower than using native video memory (though the level of impact varies with the system's hardware and operating system; someone using a PCI video card on a circa-2004 motherboard will probably see a much bigger hit than, say, someone with a PCIe video card on a circa-2009 motherboard). But as indicated (in Nestopia), there is a way to get linear scaling looking correct when using native video memory. I just don't speak DirectX or Direct3D well enough to know how to do it.

And no I don't test full-screen mode of anything.

And please don't forget about us XP users, by the way. Don't go the "screw it, I'll use DirectX 11 exclusively" route, unless you also plan on implementing a version using DX9, or even GDI.


Attachments:
example.png
PostPosted: Thu May 01, 2014 6:39 pm 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19334
Location: NE Indiana, USA (NTSC)
Ultimately, the most efficient way to achieve what koitsu asks for involves using a pixel shader to run FBI. This strikes the best balance among preserving the correct 1.143:1 (NTSC) pixel aspect ratio, preserving sharp edges of pixels, and keeping diagonal edges even.

Forgetting about users of Windows 2000 and Windows XP has become far more acceptable over the past month. Even Microsoft has forgotten about Windows XP. Windows Vista, on the other hand, is still under extended support for the next few years, so don't leave DX10 out.


PostPosted: Thu May 01, 2014 9:20 pm 
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5891
Location: Canada
At least Visual Studio Express 2013 still comes with a compiler that can target XP, even if it isn't the default compiler.


PostPosted: Thu May 01, 2014 9:35 pm 
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5891
Location: Canada
koitsu wrote:
The problem with this method, at least with DirectX and friends, is that the "memory" used (in DX terms I think it's the memory associated with a surface) is often natively on the graphics card, which results in stupid things like interpolation or anti-aliasing or whatever -- the visual results look blurry as fuck. Crazy blind fools think "it looks smoother", but it looks like shit, period. I cannot STAND emulators which do this. But I'm also a serious hater of things like 2xSaI and all that other garbage; I want the "pixellated look" to remain that way (linear scaling) if I scale 2x, 4x, 8x, etc...

As I understand it, there are ways around this with DirectX, where you (somehow) tell the thing to use system-based surface memory instead, and for linear scaling you get non-blurry results. I imagine there are probably ways to do this in Direct3D while still using GPU-native memory. In fact, I know there are...


This has nothing to do with the memory usage. It has everything to do with the texture sampling settings. Zepper is asking for nearest-neighbour interpolation, which you will get if you tell your graphics API to do so. If you set up your scaling routine correctly, the type of interpolation should be explicit.

FCEUX, for example, does not currently appear to have an option for the interpolation method used for hardware scaling, so you get bilinear interpolation unless you select no hardware acceleration (i.e. software scaling). This isn't an issue with the memory location; the program just lacks code to tell the hardware what scaling method to use.
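For example (just a sketch, not from any particular emulator's source), in Direct3D 9 the filter is a sampler state and in OpenGL it's a texture parameter, so requesting nearest-neighbour is a couple of lines in either API:
Code:
// Assumes a d3d9.h setup with an IDirect3DDevice9* named "device":
// point (nearest-neighbour) sampling for texture stage 0.
device->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_POINT);
device->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_POINT);

// The OpenGL equivalent, for whatever texture is currently bound:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);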


PostPosted: Fri May 02, 2014 7:12 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19334
Location: NE Indiana, USA (NTSC)
rainwarrior wrote:
This isn't an issue with the memory location; the program just lacks code to tell the hardware what scaling method to use.

Just telling the hardware "nearest neighbor" will produce uneven diagonal lines when you scale 256x240 to 584x480. Look at the hill and ground. Telling the hardware to use area sampling/FBI involves a pixel shader. How these pixel shaders are written differs between old and new Direct3D, and lack of support for new Direct3D in older versions of Windows is the other thing koitsu was complaining about.
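To make the math concrete, here's a rough CPU sketch of one row of area sampling (box filtering) -- single channel, illustrative only, not what a shader would literally look like:
Code:
/* Each output pixel is the average of the source interval it covers,
   weighted by how much of each source texel falls inside that interval. */
void area_resample_row(const float *src, int src_w, float *dst, int dst_w)
{
    double step = (double)src_w / dst_w;       /* source texels per output pixel */
    for (int x = 0; x < dst_w; ++x) {
        double lo = x * step;                  /* source interval [lo, hi) */
        double hi = lo + step;
        double sum = 0.0;
        for (int s = (int)lo; s < src_w && s < hi; ++s) {
            double a = (lo > s) ? lo : s;      /* overlap of [s, s+1) with [lo, hi) */
            double b = (hi < s + 1) ? hi : (s + 1);
            sum += src[s] * (b - a);
        }
        dst[x] = (float)(sum / step);          /* normalize by interval width */
    }
}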


PostPosted: Fri May 02, 2014 7:38 am 
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5891
Location: Canada
That's not a relevant problem. The OP asked about 2x or 3x scaling. 2x/3x nearest-neighbour scaling can very easily be accomplished with hardware acceleration, and without having to write pixel shaders. There's no reason for koitsu to advise against using hardware scaling for this; it's very easy to do correctly.

Anyhow, what I'm getting at is that there are multiple very easy and rather robust solutions to the problem being asked about. The issues being brought up here are about solving different (but perhaps related) problems. I do agree with the advice not to exclusively target D3D11, though.


PostPosted: Fri May 02, 2014 9:53 am 
Formerly Fx3
Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3076
Location: Brazil
Code:
 *(unsigned int *)&buffer[y][x] = pixel | (pixel << 16);


Does it mean something?
In fact, I use pointers. That code is only illustrative. ;)


PostPosted: Fri May 02, 2014 10:08 am 
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5891
Location: Canada
If you're using a 16-bit colour buffer, that would horizontally double pixels.

Are you using a 16-bit colour buffer? That's kind of an unusual format these days... I don't think I've used such a thing since Windows 95.
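If it were a 32-bit colour buffer instead, the same trick still works with a wider store; a quick sketch (the helper name is made up):
Code:
#include <stdint.h>
#include <string.h>

/* Duplicate one 32-bit pixel into two adjacent destination pixels with a
   single 64-bit write. dst is the destination row; x is the column of the
   first copy. memcpy keeps it portable and typically compiles to one store. */
static inline void dup_pixel32(uint32_t *dst, int x, uint32_t pixel)
{
    uint64_t two = (uint64_t)pixel | ((uint64_t)pixel << 32);
    memcpy(&dst[x], &two, sizeof two);
}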


PostPosted: Sun May 04, 2014 11:55 am 

Joined: Wed Apr 16, 2014 2:01 pm
Posts: 5
koitsu wrote:
The problem with this method, at least with DirectX and friends, is that the "memory" used (in DX terms I think it's the memory associated with a surface) is often natively on the graphics card, which results in stupid things like interpolation or anti-aliasing or whatever -- the visual results look blurry as fuck. Crazy blind fools think "it looks smoother", but it looks like shit, period.


Obviously, that's a matter of opinion. But these games were made to be played on old crappy CRTs, which by nature blurred adjacent pixels. Some people might feel the pixelated rendering is too digital or artificial. In your comparison shots, the right image looks more like the real thing.


PostPosted: Sun May 04, 2014 12:31 pm 

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6509
Location: Seattle
Bilinear filtering is NOT an accurate representation of the effects of NTSC (or PAL) demodulation and an electron beam going through an aperture grill.

Horizontally, for NTSC sets, a sinc rescaler is most nearly accurate, because NTSC demodulation serves as a "brick wall" lowpass filter at less than 1.5× the pixel frequency. PAL's bandwidth is higher and close to 1.5× the pixel frequency, so for PAL the 3rd harmonic is mostly present and correspondingly nearest neighbor is most accurate. Using bilinear instead adds too many negatively correlated high-spatial frequency components producing less sharp edges of pixels than exist in reality; it's about as far from NTSC's reality as nearest neighbor is.

Vertically, for any set where interlaced and progressive content looks different (which is not all! some smaller sets have too large of an electron beam as a percentage of the height of the screen), failing to emulate scanlines is also incorrect: using bilinear vertically in this case incorrectly blurs things when there should be a distinct and visible separation.

Finally, on both axes, the electron beam itself is approximately a Gaussian blur. This is then quantized to the phosphors. This also has to be done with the input correctly gamma-de-corrected, filtered, and re-corrected.
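As a tiny sketch of that last step (using a plain 2.2 power law rather than the exact sRGB curve), blending two channel values "gamma-correctly" looks something like this:
Code:
#include <math.h>

static double to_linear(double v) { return pow(v, 2.2); }       /* de-gamma */
static double to_gamma(double v)  { return pow(v, 1.0 / 2.2); } /* re-gamma */

/* Blend two gamma-encoded channel values (0..1) with weight w, doing the
   arithmetic in linear light as described above. */
double blend_gamma_correct(double a, double b, double w)
{
    return to_gamma(to_linear(a) * (1.0 - w) + to_linear(b) * w);
}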

