Windowed VSync

WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: Windowed VSync

Post by WedNESday »

*Bump*

I have finally got windowed VSYNC working in Windows 7 64-bit, Aero theme.

0% CPU usage. 1% GPU usage. 60 FPS.

I basically use Direct3DCreate9 and use a point list with custom colours to render pixels followed by a call to Present().

If anybody would like the code I'll PM it to them.
FHorse
Posts: 232
Joined: Sat May 08, 2010 9:31 am

Re: Windowed VSync

Post by FHorse »

I am interested. :)
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: Windowed VSync

Post by WedNESday »

Due to demand for some code I've decided to post some.

Code:

#include <d3d9.h>
#include <string.h>	// memcpy

#pragma comment (lib, "d3d9.lib")

extern HWND hWnd;	// window to render into; assumed to be created elsewhere

D3DPRESENT_PARAMETERS D3DPP;

LPDIRECT3D9 LPD3D9;
LPDIRECT3DDEVICE9 LPD3DD9;
LPDIRECT3DVERTEXBUFFER9 LPD3DVB9;

void * ppbData;

// One pre-transformed vertex per NES pixel. The field order must match the FVF used below.
struct SBitmap
{
	float x;
	float y;
	float z;
	float rhw;

	COLORREF Colour;	// read by D3D as a D3DCOLOR, i.e. 0xAARRGGBB
} Bitmap[256 * 240];

void Blt()
{
	LPD3DD9->BeginScene();
	LPD3DVB9->Lock(0, 0, &ppbData, 0);

	// Copy the whole array, not just a single vertex
	memcpy(ppbData, Bitmap, sizeof(Bitmap));

	LPD3DVB9->Unlock();

	LPD3DD9->SetFVF(D3DFVF_DIFFUSE | D3DFVF_XYZRHW);
	LPD3DD9->SetStreamSource(0, LPD3DVB9, 0, sizeof(SBitmap));
	LPD3DD9->DrawPrimitive(D3DPT_POINTLIST, 0, 256 * 240);
	LPD3DD9->EndScene();

	// D3DPRESENT_INTERVAL_ONE asks Present() to wait for the vertical blank
	LPD3DD9->Present(NULL, NULL, NULL, NULL);
}

void CreateBitmap()
{
	LPD3D9 = Direct3DCreate9(D3D_SDK_VERSION);

	ZeroMemory(&D3DPP, sizeof(D3DPRESENT_PARAMETERS));
	D3DPP.hDeviceWindow = hWnd;
	D3DPP.PresentationInterval = D3DPRESENT_INTERVAL_ONE;	// request vsync
	D3DPP.SwapEffect = D3DSWAPEFFECT_DISCARD;
	D3DPP.Windowed = TRUE;

	LPD3D9->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd, D3DCREATE_SOFTWARE_VERTEXPROCESSING, &D3DPP, &LPD3DD9);

	// The buffer has to hold every vertex, so size it from the array
	LPD3DD9->CreateVertexBuffer(sizeof(Bitmap), 0, D3DFVF_DIFFUSE | D3DFVF_XYZRHW, D3DPOOL_MANAGED, &LPD3DVB9, NULL);

	// Place one point on each pixel of the 256x240 picture
	for (int temp2 = 0; temp2 < 240; temp2++)
	{
		for (int temp = 0; temp < 256; temp++)
		{
			Bitmap[(temp2 * 256) + temp].Colour = 0x00000000;
			Bitmap[(temp2 * 256) + temp].rhw = 1.0f;	// conventional value for pre-transformed vertices
			Bitmap[(temp2 * 256) + temp].x = (float)temp;
			Bitmap[(temp2 * 256) + temp].y = (float)temp2;
			Bitmap[(temp2 * 256) + temp].z = 1.0f;
		}
	}
}
The NULLs in Present can be changed to suit your needs. The order of the variables in SBitmap cannot be changed, because it has to match the vertex layout that D3DFVF_XYZRHW | D3DFVF_DIFFUSE describes.

Bitmap[0].Colour = 0 means edit the first pixel and Bitmap[61439].Colour = 0 means edit the last. That's all you need to do to update the bitmap.
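
For anyone wiring this up to an emulator's video output, the indexing described above can be sketched in portable C++ as follows. This is only an illustration, not the poster's actual code: `Vertex`, `packARGB`, `makeFrame` and `setPixel` are hypothetical names, and the colour is assumed to be packed as 0xAARRGGBB, which is the layout D3DFVF_DIFFUSE reads.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kWidth  = 256;   // NES picture width in pixels
constexpr int kHeight = 240;   // NES picture height in pixels

// Hypothetical stand-in for SBitmap: same field order, 20 bytes per vertex.
struct Vertex {
    float x, y, z, rhw;
    std::uint32_t colour;      // assumed 0xAARRGGBB
};

// Pack four 8-bit channels into a single 32-bit ARGB value.
constexpr std::uint32_t packARGB(std::uint8_t a, std::uint8_t r,
                                 std::uint8_t g, std::uint8_t b) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16) |
           (std::uint32_t(g) << 8)  |  std::uint32_t(b);
}

// Build one vertex per pixel, row by row, so that index = y * kWidth + x.
std::vector<Vertex> makeFrame() {
    std::vector<Vertex> frame(kWidth * kHeight);
    for (int y = 0; y < kHeight; ++y)
        for (int x = 0; x < kWidth; ++x)
            frame[y * kWidth + x] = {float(x), float(y), 1.0f, 1.0f, 0u};
    return frame;
}

// Change the colour of one pixel; the positions never need to be rewritten.
void setPixel(std::vector<Vertex>& frame, int x, int y, std::uint32_t argb) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    frame[y * kWidth + x].colour = argb;
}
```

With this layout, `setPixel(frame, 255, 239, ...)` touches element 61439, the last pixel mentioned above.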

Edit: Improved code.
Last edited by WedNESday on Thu Apr 11, 2013 9:51 am, edited 1 time in total.
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: 🇫🇮

Re: Windowed VSync

Post by thefox »

I really don't understand why this would get any better results than drawing two textured triangles.
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: Windowed VSync

Post by WedNESday »

thefox wrote:I really don't understand why this would get any better results than drawing two textured triangles.
All the texture-based code I read up on was about 10x bigger than what is above. Someone even posted their D3D10 texture code for me and it was far longer still.

I have done no speed comparison of the two but the above is so damn fast you really wouldn't need any better.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Windowed VSync

Post by koitsu »

Two questions:

Code:

   LPD3DD9->CreateVertexBuffer(sizeof(SBitmap) * 256 * 240, NULL, D3DFVF_DIFFUSE | D3DFVF_XYZRHW, D3DPOOL_MANAGED, &LPD3DVB9, NULL);
Why not LPD3DD9->CreateVertexBuffer(sizeof(Bitmap), NULL, D3DFVF_DIFFUSE | D3DFVF_XYZRHW, D3DPOOL_MANAGED, &LPD3DVB9, NULL); ?

Code:

         Bitmap[(temp2 * 256) + temp].x = (float)256;
         Bitmap[(temp2 * 256) + temp].y = (float)240;
Why force-cast when members x and y are already float? The compiler will not throw warnings if you remove the (float) casts.

Politely: you have an awful, awful tendency to force-cast things excessively, dude. Break this habit ASAP, as it WILL bite you.

I would also get rid of that statically-allocated Bitmap[256*240] nonsense and use malloc() instead. That's multiple megabytes of crap allocated on the heap which you can't guarantee will actually be available (your program will crash mysteriously as a result). You can handle error conditions with malloc().
Zelex
Posts: 268
Joined: Fri Apr 29, 2011 9:44 pm

Re: Windowed VSync

Post by Zelex »

It's a global variable, so it's in the data or zero section of the executable. That won't crash a program any more than malloc could fail. You're also talking about 256*240*20 bytes = 1.17 MB. If you can't malloc 1.17 MB you may have bigger problems.

Also, accessing the global variable may be faster than a pointer dereference. (marginally so in most cases, but still faster.)
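
The size figure is easy to check at compile time. A small sketch, assuming COLORREF is a 32-bit value (it is a DWORD on Windows) and using a hypothetical `Vertex` struct that mirrors SBitmap:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of SBitmap: four floats plus one 32-bit colour.
struct Vertex {
    float x, y, z, rhw;
    std::uint32_t colour;
};

constexpr std::size_t kPixels     = 256 * 240;                 // 61,440 vertices
constexpr std::size_t kFrameBytes = kPixels * sizeof(Vertex);  // whole array

static_assert(sizeof(Vertex) == 20, "4 floats + 1 DWORD, no padding expected");
static_assert(kFrameBytes == 1228800, "256 * 240 * 20 bytes");

// Size of the array in MiB, for comparison with the 1.17 figure above.
constexpr double frameMiB() {
    return static_cast<double>(kFrameBytes) / (1024.0 * 1024.0);
}
```

So the array is 1,228,800 bytes, a little over 1.17 MiB, whether it lives in a global or on the heap.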
Last edited by Zelex on Thu Apr 11, 2013 9:54 am, edited 3 times in total.
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: Windowed VSync

Post by WedNESday »

Actually, koitsu:

1. I use the C++ new keyword instead of Bitmap[256 * 240].

Code:

} *Bitmap;

Bitmap = new SBitmap[BitmapWidth * BitmapHeight];
2. If I don't use (float) then VS gives me a warning about using integers with floats etc.

In fact any force-casting that you see in my code is just there temporarily. What you see above is just an example that wasn't supposed to be too serious.
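
One thing to watch when switching from the array to new: once Bitmap is a pointer, sizeof(Bitmap) gives the size of the pointer rather than the size of the buffer, so any byte count passed to memcpy or CreateVertexBuffer has to be computed from the element count. A minimal sketch of the difference, where `Vertex`, `frameBytes` and `makeHeapFrame` are illustrative names and not taken from the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

struct Vertex {                 // hypothetical mirror of SBitmap
    float x, y, z, rhw;
    std::uint32_t colour;
};

constexpr std::size_t kWidth  = 256;
constexpr std::size_t kHeight = 240;

Vertex staticFrame[kWidth * kHeight];   // array: sizeof gives the full size

// Byte count that stays correct for arrays and heap allocations alike.
constexpr std::size_t frameBytes(std::size_t width, std::size_t height) {
    return width * height * sizeof(Vertex);
}

// Heap version; the elements are zero-initialised and unique_ptr<Vertex[]>
// calls delete[] automatically.
std::unique_ptr<Vertex[]> makeHeapFrame() {
    return std::unique_ptr<Vertex[]>(new Vertex[kWidth * kHeight]());
}
```

Passing `frameBytes(BitmapWidth, BitmapHeight)` instead of a sizeof on the pointer keeps the copy and the vertex buffer the same size.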
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: Windowed VSync

Post by WedNESday »

Just removed the unnecessary 256 * 240 code above. Thanks koitsu.
Zelex
Posts: 268
Joined: Fri Apr 29, 2011 9:44 pm

Re: Windowed VSync

Post by Zelex »

Also, I don't see anything in that code that specifically guarantees windowed vsync. It may just happen to vsync on your machine by chance. Unfortunately, asking D3D to vsync in windowed mode doesn't always work reliably =/
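
One way to check rather than assume is to time the presentation loop: if vsync is active on a 60 Hz display, each Present() call should wait for the next refresh, so the average interval should sit near 16.7 ms. The sketch below is portable and takes the present call as a callback. `averageFrameMs` and `looksVsynced` are hypothetical helpers, and the 10% tolerance is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>

// Call present() `frames` times and return the mean time per call in ms.
double averageFrameMs(const std::function<void()>& present, int frames) {
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < frames; ++i)
        present();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frames;
}

// True if the measured interval is within `tolerance` of the refresh period.
bool looksVsynced(double avgMs, double refreshHz, double tolerance = 0.10) {
    const double expectedMs = 1000.0 / refreshHz;
    return std::fabs(avgMs - expectedMs) <= expectedMs * tolerance;
}
```

With the posted code this could be called as `averageFrameMs([] { Blt(); }, 600)`. An average far below 16.7 ms would suggest Present() is not waiting for the vertical blank. Note that this only measures pacing; it says nothing about tearing.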
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: Windowed VSync

Post by WedNESday »

Zelex wrote:Also, I don't see anything in that code that specifically guarantees windowed vsync. It may just happen to vsync on your machine by chance only. Asking D3D to vsync windowed doesn't always work reliably unfortunately =/
Well, in that case, could someone give it a try please? I know that FHorse will at some point.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Windowed VSync

Post by koitsu »

Zelex wrote:It's a global variable. So it's in the data or zero section of the executable. That won't crash a program any more than malloc could fail.
1) Scope has no bearing on this.
2) Correct: it's in one of a series of segments (I don't care to remember which -- .dynsym, .text, .rodata, .data, .bss, or god knows what else).
3) Incorrect. There are two reasons for this:

i) If the size of the buffer exceeds stack size, the program's behaviour is unknown (it varies per OS, and on some OSes the result is undefined). There is really no justification for allocating such large amounts of memory on the stack.

ii) With the "static allocation" method, once the data inside of Bitmap[] is used, it can never be released back to the OS -- there is no way to free() it. Ultimately whether or not this matters depends on what the application is doing, naturally, but it's a very bad habit to get into -- instead, just use malloc() and friends and you'll always be safe, with no negative trade-offs.

I've talked about item #3 at length on the FreeBSD lists as well:

http://lists.freebsd.org/pipermail/free ... 72530.html
http://lists.freebsd.org/pipermail/free ... 72534.html
http://lists.freebsd.org/pipermail/free ... 72545.html

There are some ways on OSes to "advise" the VM on what to do with such memory, but they're often not portable.
Zelex wrote:You're also talking about 256*240*20 bytes = 1.17 MB. If you can't malloc 1.17 MB you may have bigger problems.
Except that you can handle the situation gracefully if you do the allocation yourself, rather than the user getting a bizarre error from the OS that may not make any sense.
Zelex wrote:Also, accessing the global variable may be faster than a pointer dereference. (marginally so in most cases, but still faster.)
Again: scope has no bearing here. As for the dereferencing hit -- it's negligible in 98% of the cases out there.

Footnote: for C++ you're supposed to use new and delete and all that crap -- sure, whatever works -- I'm used to how shit works under the hood.
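
The point about handling the failure yourself can be sketched like this. It is an illustration only: `Vertex`, `allocFrame` and `freeFrame` are hypothetical names, and calloc is used so the buffer starts out zeroed the way a global array would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct Vertex {                 // hypothetical mirror of SBitmap
    float x, y, z, rhw;
    std::uint32_t colour;
};

// Allocate a zeroed frame buffer. Returns nullptr (after logging) on failure
// so the caller can report a sensible error instead of crashing.
Vertex* allocFrame(std::size_t width, std::size_t height) {
    Vertex* frame = static_cast<Vertex*>(std::calloc(width * height, sizeof(Vertex)));
    if (frame == nullptr)
        std::fprintf(stderr, "Could not allocate %zu bytes for the frame buffer\n",
                     width * height * sizeof(Vertex));
    return frame;
}

// Release the buffer once it is no longer needed, which a global array cannot do.
void freeFrame(Vertex* frame) {
    std::free(frame);
}
```

A caller would test the result of `allocFrame(256, 240)` once at startup and show its own message if it is null.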
Zelex
Posts: 268
Joined: Fri Apr 29, 2011 9:44 pm

Re: Windowed VSync

Post by Zelex »

Wait, wait: how does a global variable have anything to do with the size of your stack? Is this some kind of "this is how it works with some compiler on some obscure embedded platform" thing?

Also, global data is allocated by the OS at exe init time. It's not allocated on first use. You may be thinking of a local static variable. For example:

int foo() {
	static int bar = 0;
	return bar++;
}

bar in this example is allocated at program startup by the OS; it is, however, only initialized to 0 when foo is first run. (Watch out for multi-threading issues.)
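
For reference, here is a compilable version of that kind of example. One caveat, stated as general C++ knowledge rather than anything from this thread: a function-local static with a constant initialiser such as 0 is normally initialised before the program starts running, while one with a runtime initialiser is initialised the first time control reaches its declaration (thread-safely since C++11). `nextId` and `firstCallValue` are hypothetical names.

```cpp
#include <cassert>

// Constant initialiser: the storage lives for the whole program and the
// value persists between calls.
int nextId() {
    static int counter = 0;
    return counter++;
}

// Runtime initialiser: evaluated only the first time control reaches the
// declaration; later calls ignore the argument.
int firstCallValue(int seed) {
    static int value = seed * 2;
    return value;
}
```

Neither variable lives on the stack; both have static storage duration, just like a global.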
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Windowed VSync

Post by koitsu »

Zelex wrote:Wait, wait: how does a global variable have anything to do with the size of your stack? Is this some kind of "this is how it works with some compiler on some obscure embedded platform" thing?

Also, global data is allocated by the OS at exe init time. It's not allocated on first use. You may be thinking of a local static variable. For example:

int foo() {
	static int bar = 0;
	return bar++;
}

bar in this example is allocated at program startup by the OS; it is, however, only initialized to 0 when foo is first run. (Watch out for multi-threading issues.)
That code does nothing and is mostly irrelevant to the discussion. Are we talking past one another? I'm not sure.

Tell me: what do you think happens when "the global data allocated by the OS at exe init time" fails? What happens when you do something like char buf[1024*1024*32]; (scope doesn't matter) and the allocation by the ELF loader (or the "executable loader") fails? I can tell you what happens: undefined behaviour, depending on too many situations/scenarios/variables. Instead, allocate memory at runtime and you can handle this situation elegantly.

Edit: Okay, it looks like scope does matter, at least on *IX systems. With char buf[1024*1024*32]; declared outside of main(), the allocation ends up being part of the process's active datasize. If declared inside of main(), it ends up being part of the process's active stacksize. If inside of a sub-function, the allocation only happens when that function is called and is part of the process's active stacksize. If inside of a sub-function that isn't used, the allocation never happens.

The behaviour if the allocation is too large (exceeds system limits, etc.) is undefined however; on some systems it might segfault right off the bat, but on others it might do god-knows-what or just lock up. The risk isn't worth it -- allocate "large chunks" (see those freebsd.org mailing list threads) dynamically so your program can always start/run and can elegantly handle situations where memory pressure is an issue. You should not depend on the ELF/executable loader to do this for you.
Zepper
Formerly Fx3
Posts: 3262
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Re: Windowed VSync

Post by Zepper »

What about creating a BITMAP structure, and then... BITMAP *bmp = malloc( foo ) ?