
All times are UTC - 7 hours





Post new topic Reply to topic  [ 721 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7 ... 49  Next
 Post subject:
PostPosted: Wed Mar 02, 2011 12:22 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19091
Location: NE Indiana, USA (NTSC)
Dwedit wrote:
DirectDraw provides horrible vblank waiting code, then SDL uses that horrible vblank waiting code.

Is there a standard workaround to this without dropping Windows altogether?


 Post subject:
PostPosted: Wed Mar 02, 2011 5:30 pm 
Offline
User avatar

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
Why is everyone guessing at what the root cause is? Can no one profile the actual emulator? :-)


 Post subject:
PostPosted: Wed Mar 02, 2011 5:34 pm 
Offline
User avatar

Joined: Fri Nov 19, 2004 7:35 pm
Posts: 3942
Because usually only bad vblank waiting code causes 100% CPU usage (of one core, don't be confused by numbers like "50% usage" on a dual core system), regardless of anything else.

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


 Post subject:
PostPosted: Wed Mar 02, 2011 5:37 pm 
Offline

Joined: Sat May 08, 2010 9:31 am
Posts: 225
I think the problem is my emulator (maybe), and that's why I keep searching for a solution in the code. I've uploaded the latest attempt if someone wants to try it. When I drink three beers I think better (maybe) :)


 Post subject:
PostPosted: Thu Mar 03, 2011 10:41 am 
Offline
User avatar

Joined: Sat Jan 22, 2005 8:51 am
Posts: 427
Location: Chicago, IL
tepples wrote:
Dwedit wrote:
DirectDraw provides horrible vblank waiting code, then SDL uses that horrible vblank waiting code.

Is there a standard workaround to this without dropping Windows altogether?


Prior to migrating my emulator to DX10 (where the problem has, apparently, been fixed), I would time the frame emulation and, if there were more than x milliseconds left before the next frame, I'd make a Sleep() call. The default granularity of Sleep() is 10ms, but if you want more accurate timing, it's possible to reduce it with another API call (don't remember what that call is...). Worked reasonably well.

_________________
get nemulator
http://nemulator.com


 Post subject:
PostPosted: Thu Mar 03, 2011 12:07 pm 
Offline

Joined: Sat May 08, 2010 9:31 am
Posts: 225
and that's exactly what I do in the version uploaded last night after three beers :)


 Post subject:
PostPosted: Thu Mar 03, 2011 1:06 pm 
Offline
User avatar

Joined: Mon Jan 03, 2005 10:36 am
Posts: 2961
Location: Tampere, Finland
FHorse wrote:
and that's exactly what I do in the version uploaded last night after three beers :)

Yeah CPU usage is down to 15% (software) / 25% (opengl) now.

_________________
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: kkfos.aspekt.fi


 Post subject:
PostPosted: Thu Mar 03, 2011 5:47 pm 
Offline
User avatar

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
Dwedit wrote:
Because usually only bad vblank waiting code causes 100% CPU usage (of one core, don't be confused by numbers like "50% usage" on a dual core system), regardless of anything else.

Or someone using while(1) with no delays (or even if they use delays, using too small of a delay). The renowned "amazing programmer" of Dwarf Fortress did this for many years; I wonder how many laptops he was responsible for destroying.


 Post subject:
PostPosted: Thu Mar 03, 2011 6:04 pm 
Offline
User avatar

Joined: Mon Oct 27, 2008 2:48 pm
Posts: 50
Location: Ålesund, Norway
Could you elaborate on that, please?


 Post subject:
PostPosted: Thu Mar 03, 2011 6:41 pm 
Offline
Formerly Fx3
User avatar

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
koitsu wrote:
Dwedit wrote:
Because usually only bad vblank waiting code causes 100% CPU usage (of one core, don't be confused by numbers like "50% usage" on a dual core system), regardless of anything else.

Or someone using while(1) with no delays (or even if they use delays, using too small of a delay). The renowned "amazing programmer" of Dwarf Fortress did this for many years; I wonder how many laptops he was responsible for destroying.


Code:
volatile int new_flag, old_flag;

/* Intended as a ~60 Hz timer callback (e.g. installed with Allegro's
   install_int_ex); each tick marks one frame. */
static void clock_me(void) { new_flag++; }

/* Main loop: sleep in 1 ms slices (rest() is Allegro's millisecond
   sleep) until the timer has ticked, instead of spinning flat-out. */
while (old_flag == new_flag) {
    rest(1);
}
old_flag = new_flag;


- Any other suggestion, Mr. ? :) I'm tired of destroying laptops.


 Post subject:
PostPosted: Fri Mar 04, 2011 2:44 am 
Offline
User avatar

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
Simple: the processor (or a single core if applicable) is spinning away mindlessly in a while(1) loop. Unless there are delays (sleeps) or other equivalents that result in the processor actually halting its operations, the processor is going to spin out of control. Period.

Many GUI-based libraries implement or wrap main() with their own routine to deal with this problem. On the Windows platform, the easiest way to halt things effectively is to use GetMessage() along with DispatchMessage() and TranslateMessage().

On UNIX and UNIX-like OSes, select() is a common choice, while other operating systems like FreeBSD offer things like kqueue() which are more efficient (and are very reminiscent of how Windows does it). Linux offers things like epoll(). Solaris offers things like poll(7d) (do not confuse this with poll(2), which is different).

EDIT: Oh, there's also POSIX threads (specifically pthread_cond_wait()), but FreeBSD behaves slightly different than OS X behaves slightly different than Linux behaves slightly different than Solaris. :-)
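A small sketch of the pthread_cond_wait() approach (hypothetical names, assuming some timer or driver callback acts as the vblank source): the emulation thread blocks in the kernel until signalled, rather than polling a flag the way the rest(1) loop quoted earlier in the thread does.

```c
/* Condvar-based vblank wait: the waiter sleeps until woken. */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  vbl  = PTHREAD_COND_INITIALIZER;
static int frame_count = 0;

/* Called from the vblank/timer source. */
void signal_vblank(void) {
    pthread_mutex_lock(&lock);
    frame_count++;
    pthread_cond_signal(&vbl);
    pthread_mutex_unlock(&lock);
}

/* Called from the emulation loop; returns the frame it woke for. */
int wait_vblank(int last_seen) {
    pthread_mutex_lock(&lock);
    while (frame_count == last_seen)   /* guards against spurious wakeups */
        pthread_cond_wait(&vbl, &lock);
    int seen = frame_count;
    pthread_mutex_unlock(&lock);
    return seen;
}
```

The while-loop around pthread_cond_wait() is required by POSIX (spurious wakeups are allowed), which is one of the cross-platform behavioral wrinkles alluded to above.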

In summary, please:

1) Don't program in 2010 like you're developing an application for a non-multitasking environment (e.g. MS-DOS days),

2) Keep in mind that just because you don't see the repercussions of what you are currently doing doesn't mean they're not happening (the more cores you have the less likely you'll see this as 100% CPU time; remember, multiple cores get aggregated in total CPU usage, e.g. 100% of a single core on an 8-core machine is 12.5%),

3) Spend the time to read up on whatever "core environment" you're using to find out if it wraps main() with its own routine. If there's no documentation, ask the author to explain how it works or don't use it at all.


 Post subject:
PostPosted: Fri Mar 04, 2011 5:32 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19091
Location: NE Indiana, USA (NTSC)
koitsu wrote:
Many GUI-based libraries implement or wrap main() with their own routine to deal with this problem. On the Windows platform, the easiest way to halt things effectively is to use GetMessage() along with DispatchMessage() and TranslateMessage().

On UNIX and UNIX-like OSes, select() is a common choice, while other operating systems like FreeBSD offer things like kqueue() which are more efficient (and are very reminiscent of how Windows does it). Linux offers things like epoll(). Solaris offers things like poll(7d) (do not confuse this with poll(2), which is different).

Provided that one waiting function actually works for all sources of wakeup events in which a program is interested. For example, select() on Windows doesn't work on anything but network sockets. And can GetMessage() or select()/kqueue()/epoll() see vertical blanks?

Quote:
3) Spend the time to read up on whatever "core environment" you're using to find out if it wraps main() with its own routine. If there's no documentation, ask the author to explain how it works

Both "core environments" that I've used on Windows (Allegro and SDL) rely on a function in DirectX pre-10 known to use spin waiting. Good luck getting clarification out of Microsoft for best sleeping practice in DirectX pre-10 when Microsoft wants video game programmers to switch to a newer API that isn't ported to Windows XP so that it can get more users off of Windows XP and onto Windows 7.

Quote:
or don't use it at all.

Don't use (Windows|Linux|Mac OS X) at all? That almost sounds like console fanboys on Slashdot, who claim that individual independent game developers don't deserve to have a platform on which to make and self-publish a video game. Instead, they should move to a city and state with lots of established video game studios and work for ten years as an apprentice before putting their ideas into practice. I hope I misunderstood you.


 Post subject:
PostPosted: Fri Mar 04, 2011 6:56 pm 
Offline
User avatar

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
tepples wrote:
Provided that one waiting function actually works for all sources of wakeup events in which a program is interested. For example, select() on Windows doesn't work on anything but network sockets. And can GetMessage() or select()/kqueue()/epoll() see vertical blanks?


I wouldn't use select() on Windows (I'm surprised it even exists), and Cygwin is a disgusting broken pile of shit -- please don't get me started on Cygwin, I can rant about it for weeks (and also do not respond with some nonsense like "well it works for me"; that's nice, it's still horribly broken -- there are too many reasons we're getting rid of it in our enterprise environment at my job and going pure Windows on our Win2K3 and 2K8 boxes). It's also not relevant to the discussion; we're not talking about how to accomplish POSIX on Win32, we're talking about how to not chew up 100% CPU time while waiting for VBL.

Regarding GetMessage() on Win32 and select/kqueue/epoll on *IX and VBL -- no, obviously these tools cannot "see" VBL. They aren't intended for interfacing with graphics subsystems. On Windows, DirectX provides that framework. I'm not familiar with GUI/graphics implementations on *IX OSes, but I imagine there are many (probably SDL being the most common). How those behave/work is unknown to me; I don't do graphics programming on Windows or on *IX. But I don't need to do graphics programming to apply the knowledge of the problem. EDIT: It looks like SDL can accomplish proper VBL waiting by using the SDL_Flip function (which also offers double buffering). Of course, what this function does behind the scenes (meaning how it waits for VBL) is the entire point of this discussion. :-)

Simply put, a VBL firing should induce an interrupt (think: like NMI on the NES), which the kernel traps via an exception handler. Video card drivers provide this (confirmed for nVidia, ATI, and Intel). The handlers let applications (userland) "tie in" to it, so that when the interrupt is seen the program then does something.

kqueue can't tie in to a hardware interrupt directly, but it can tie in to a file descriptor or vnode. The kernel driver would need to provide an interface (preferably through a file in /dev somewhere) for this to be beneficial. I sure as hell would hope video card manufacturers who offer drivers for *IX would offer this capability (maybe not through /dev but a documented handler which a library (SDL, etc.) could utilise and tie into for VBL handling).

GetMessage behaves similarly, as do select and epoll. On Win32 there may be some added abstraction needed given the use of DirectX, but yes, they should absolutely be used. If the graphics library/API provides its own methodology that accomplishes the same task, then that should be used. But it needs to be documented for people to know about it. I imagine Win32 absolutely offers some way to tie a hardware interrupt to a running program.

The point I'm making: while(1) { jack-off; } is wrong. Do not do it. Instead, your application should simply block (resulting in it doing NOTHING CPU-wise) and wait for the kernel to tell it to do something. If you ABSOLUTELY CANNOT do anything about it, then please sleep() for long durations of time within the while(1) loop to minimise impact.

And before someone smarmy brings it up I'll cover my ass: it is absolutely fine to use while(1) when you know *for an absolute fact* that the loop will be broken out of VERY quickly. A profiler will greatly help in determining how much time you spend inside such a loop. Most present-day kernels use while(1) where applicable. It's completely fine when you know for a fact the loop will exit very quickly. Otherwise, you need to use locks and behave like I described above. But for an overall "main" loop used in, say, main(), this is wrong and will result in CPU skyrocketing, which is exactly what Dwarf Fortress did and is what induced my "destroyed laptops" comment.

tepples wrote:
Both "core environments" that I've used on Windows (Allegro and SDL) rely on a function in DirectX pre-10 known to use spin waiting.


A spin wait/spin lock literally ties up the processor (a very small/short loop), and is heavily dependent upon its timeout/expire capability (hopefully set to a VERY small value). This broken design methodology is documented on Windows as well as in general (see 2nd paragraph). For the Wikipedia link, see the section titled "Busy-waiting alternatives". You'll also see in that section that sleep() is mentioned. This is important. EDIT: There's also an article about spinlock alternatives which delves into other options (such as switching threads; I wouldn't have thought of this, but it sounds like an excellent method).
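One common middle ground those articles describe (sketched here with hypothetical names, using C11 atomics) is a spin-then-yield-then-sleep wait: spin briefly in case the flag flips right away, then progressively give the CPU away.

```c
/* Bounded spin with backoff, instead of a pure busy-wait. */
#include <sched.h>
#include <time.h>
#include <stdatomic.h>

/* Wait until *flag reaches the expected value. */
void wait_on_flag(atomic_int *flag, int expected) {
    int spins = 0;
    while (atomic_load(flag) != expected) {
        if (spins < 1000) {
            spins++;               /* short spin: cheap if flag flips soon */
        } else if (spins < 2000) {
            sched_yield();         /* give up the rest of our timeslice */
            spins++;
        } else {
            /* still waiting: sleep a full millisecond per iteration */
            struct timespec ts = { 0, 1000000L };
            nanosleep(&ts, NULL);
        }
    }
}
```

The spin and yield thresholds are arbitrary illustration values; real implementations tune them (or measure) per workload.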

So again: do not use while(1) { jack-off; }. At bare minimum use sleep() or a wait condition inside of there, with a decently-sized number. Please do not tie up the CPU when it doesn't need to be.

tepples wrote:
Good luck getting clarification out of Microsoft for best sleeping practice in DirectX pre-10 when Microsoft wants video game programmers to switch to a newer API that isn't ported to Windows XP so that it can get more users off of Windows XP and onto Windows 7.


I understand/acknowledge the complaint, but all it means is more coding efforts required on the part of the application programmer. The programmer then has to use different design methodology for DX9 vs. DX10. Many implement this by defining separate models/methodologies if DX8 is detected vs. DX9 vs. DX10.

Encapsulating all of my points: take a look at the source code to NEStopia for how this is done. Said emulator handles VBlank properly in multiple Windows OSes (read: across DX versions), and does not tie up CPU when unneeded.

tepples wrote:
Don't use (Windows|Linux|Mac OS X) at all? That almost sounds like console fanboys on Slashdot ... I hope I misunderstood you.


This isn't what I said / what I intended. I said if you're using a graphics API or subsystem layer (example: Allegro), and it does not offer proper documentation outlining how the waiting methodology works, or the author cannot explain it, do not use the API/subsystem. This is in no way shape or form the same as "do not use {insert OS here}". There is no way I would advocate the latter; I am a vehement opponent of OS advocacy and a strong proponent of use-whatever-OS-suits-your-needs.


 Post subject:
PostPosted: Fri Mar 04, 2011 7:45 pm 
Offline
User avatar

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
In true koitsu fashion, I'm following up to my own post with further information -- mainly because I was curious how the hell SDL waits for VBL, particularly on Windows.

The source is hardly documented -- in true open-source fashion! :-) -- so I had to make some educated guesses about where the DirectX piece is for SDL. I believe this is it:

Code:
SDL-1.2.14/src/video/windx5/SDL_dx5video.c

 439 static int DX5_FlipHWSurface(_THIS, SDL_Surface *surface);
...
2096 static int DX5_FlipHWSurface(_THIS, SDL_Surface *surface)
2097 {
...
2103         /* to prevent big slowdown on fast computers, wait here instead of driver ring 0 code */
2104         /* Dmitry Yakimov (ftech@tula.net) */
2105         while(IDirectDrawSurface3_GetFlipStatus(dd_surface, DDGBS_ISBLTDONE) == DDERR_WASSTILLDRAWING);
...


I believe SDL 1.3 might use Direct3D instead -- which does VBL handling completely differently (and more reliably, from what I've been told by someone who does CUDA programming on a daily basis). Okay, so let's find out what the heck IDirectDrawSurface3_GetFlipStatus is and does. It looks like it's just GetFlipStatus.

So there's the likely while(1) equivalent in SDL, at least for DirectDraw, which doesn't help answer my question either, simply because I don't know what GetFlipStatus() does behind the scenes (DX/Microsoft code), but I sure hope it's intelligently done. Honestly, the while() loop above doesn't look like it waits for VBL at all, but again it depends on what GetFlipStatus() does behind the scenes.


 Post subject:
PostPosted: Fri Mar 04, 2011 11:53 pm 
Offline

Joined: Sat Jun 12, 2010 8:05 pm
Posts: 14
koitsu wrote:
I believe SDL 1.3 might use Direct3D instead -- which does VBL handling completely differently (and more reliably, from what I've been told by someone who does CUDA programming on a daily basis).


Yes, SDL HG has Direct3D/OpenGL blitters. Haven't tried it yet, but it does indeed use at least D3D9 for blits.

