puNES Emulator

Discuss emulation of the Nintendo Entertainment System and Famicom.


tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Dwedit wrote:DirectDraw provides horrible vblank waiting code, then SDL uses that horrible vblank waiting code.
Is there a standard workaround to this without dropping Windows altogether?
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu »

Why is everyone guessing at what the root cause is? Can no one profile the actual emulator? :-)
Dwedit
Posts: 4922
Joined: Fri Nov 19, 2004 7:35 pm

Post by Dwedit »

Because usually only bad vblank waiting code causes 100% CPU usage of one core, regardless of anything else (don't be confused by numbers like "50% usage" on a dual-core system).
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
FHorse
Posts: 232
Joined: Sat May 08, 2010 9:31 am

Post by FHorse »

I think the problem is my emulator (maybe), and that is why I keep searching for a solution in the code. I've uploaded the latest attempt if someone wants to try it. When I drink three beers I think better (maybe) :)
James
Posts: 431
Joined: Sat Jan 22, 2005 8:51 am
Location: Chicago, IL

Post by James »

tepples wrote:
Dwedit wrote:DirectDraw provides horrible vblank waiting code, then SDL uses that horrible vblank waiting code.
Is there a standard workaround to this without dropping Windows altogether?
Prior to migrating my emulator to DX10 (where the problem has, apparently, been fixed), I would time the frame emulation and, if there were more than x milliseconds left before the next frame, I'd make a Sleep() call. The default granularity of Sleep() is 10ms, but if you want more accurate timing, it's possible to reduce it with another API call (don't remember what that call is...). Worked reasonably well.
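Something like this, roughly (a sketch, not my actual code; the resolution call I couldn't remember is likely timeBeginPeriod() from winmm, and the frame budget and 2 ms margin here are arbitrary):

Code: Select all

#include <windows.h>
#include <mmsystem.h>   /* timeBeginPeriod/timeGetTime; link with winmm.lib */

extern void emulate_one_frame(void);    /* hypothetical emulator core */

void run(volatile int *quit)
{
    const DWORD frame_ms = 16;          /* ~60 Hz budget (assumption) */
    DWORD next, now;

    timeBeginPeriod(1);                 /* tighten Sleep() to ~1 ms granularity */
    next = timeGetTime() + frame_ms;
    while (!*quit) {
        emulate_one_frame();
        now = timeGetTime();
        if ((int)(next - now) > 2)      /* slack left before the next frame? */
            Sleep(next - now - 2);      /* sleep it off instead of spinning */
        next += frame_ms;
    }
    timeEndPeriod(1);                   /* restore the default timer resolution */
}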
get nemulator
http://nemulator.com
FHorse
Posts: 232
Joined: Sat May 08, 2010 9:31 am

Post by FHorse »

and that's exactly what I do in the version uploaded last night after three beers :)
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: 🇫🇮

Post by thefox »

FHorse wrote:and that's exactly what I do in the version uploaded last night after three beers :)
Yeah, CPU usage is down to 15% (software) / 25% (OpenGL) now.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu »

Dwedit wrote:Because usually only bad vblank waiting code causes 100% CPU usage (of one core, don't be confused by numbers like "50% usage" on a dual core system), regardless of anything else.
Or someone using while(1) with no delays (or even if they use delays, using too small of a delay). The renowned "amazing programmer" of Dwarf Fortress did this for many years; I wonder how many laptops he was responsible for destroying.
dr_sloppy
Posts: 52
Joined: Mon Oct 27, 2008 2:48 pm
Location: Ålesund, Norway

Post by dr_sloppy »

Could you elaborate on that, please?
Zepper
Formerly Fx3
Posts: 3262
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper »

koitsu wrote:
Dwedit wrote:Because usually only bad vblank waiting code causes 100% CPU usage (of one core, don't be confused by numbers like "50% usage" on a dual core system), regardless of anything else.
Or someone using while(1) with no delays (or even if they use delays, using too small of a delay). The renowned "amazing programmer" of Dwarf Fortress did this for many years; I wonder how many laptops he was responsible for destroying.

Code: Select all

/* Allegro-style frame wait: clock_me() is meant to be installed as a
   timer callback (e.g. via install_int_ex) that fires once per frame;
   rest(1) yields the CPU for ~1 ms between checks instead of spinning. */
volatile int new_flag, old_flag;
static void clock_me(void) { new_flag++; }

while (old_flag == new_flag) {
    rest(1);
}
old_flag = new_flag;
- Any other suggestions, Mr. ...? :) I'm tired of destroying laptops.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu »

Simple: the processor (or a single core if applicable) is spinning away mindlessly in a while(1) loop. Unless there are delays (sleeps) or other equivalents that result in the processor actually halting its operations, the processor is going to spin out of control. Period.

Many GUI-based libraries implement or wrap main() with their own routine to deal with this problem. On the Windows platform, the easiest way to halt things effectively is to use GetMessage() along with DispatchMessage() and TranslateMessage().
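A minimal sketch of that pump (this is the canonical form; nothing here is emulator-specific):

Code: Select all

#include <windows.h>

/* Classic blocking message pump, typically the tail end of WinMain().
   GetMessage() deschedules the thread until a message arrives, so an
   idle program consumes essentially zero CPU. */
int pump_messages(void)
{
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0) {
        TranslateMessage(&msg);  /* cook keystrokes into WM_CHAR etc. */
        DispatchMessage(&msg);   /* hand the message to the window proc */
    }
    return (int)msg.wParam;      /* exit code set by PostQuitMessage() */
}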

On UNIX and UNIX-like OSes, select() is a common choice, while other operating systems like FreeBSD offer things like kqueue() which are more efficient (and are very reminiscent of how Windows does it). Linux offers things like epoll(). Solaris offers things like poll(7d) (do not confuse this with poll(2), which is different).
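For example, a select()-based wait (sketched here with a placeholder descriptor and timeout) blocks the process entirely until an event arrives or the timer expires:

Code: Select all

#include <sys/select.h>

/* Block until fd becomes readable or timeout_ms elapses; the process
   is descheduled (zero CPU) while waiting. Returns >0 if fd is ready,
   0 on timeout, -1 on error. */
int wait_for_event(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}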

EDIT: Oh, there's also POSIX threads (specifically pthread_cond_wait()), but FreeBSD behaves slightly differently than OS X, which behaves slightly differently than Linux, which behaves slightly differently than Solaris. :-)
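A rough sketch of that pattern (all names are made up; a timer or VBL thread calls frame_signal() once per frame, and the main loop blocks in frame_wait()):

Code: Select all

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  tick = PTHREAD_COND_INITIALIZER;
static int frames = 0;

/* Called from a timer/VBL thread once per frame. */
void frame_signal(void)
{
    pthread_mutex_lock(&lock);
    frames++;
    pthread_cond_signal(&tick);
    pthread_mutex_unlock(&lock);
}

/* Called from the main loop: blocks (zero CPU) until signaled. */
void frame_wait(void)
{
    int seen;
    pthread_mutex_lock(&lock);
    seen = frames;
    while (frames == seen)           /* guards against spurious wakeups */
        pthread_cond_wait(&tick, &lock);
    pthread_mutex_unlock(&lock);
}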

In summary, please:

1) Don't program in 2010 like you're developing an application for a non-multitasking environment (e.g. MS-DOS days),

2) Keep in mind that just because you don't see the repercussions of what you're currently doing doesn't mean they're not happening (the more cores you have, the less likely you'll see this as 100% CPU time; remember, multiple cores get aggregated into total CPU usage, e.g. 100% of a single core on an 8-core machine shows up as 12.5%),

3) Spend the time to read up on whatever "core environment" you're using to find out if it wraps main() with its own routine. If there's no documentation, ask the author to explain how it works or don't use it at all.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

koitsu wrote:Many GUI-based libraries implement or wrap main() with their own routine to deal with this problem. On the Windows platform, the easiest way to halt things effectively is to use GetMessage() along with DispatchMessage() and TranslateMessage().

On UNIX and UNIX-like OSes, select() is a common choice, while other operating systems like FreeBSD offer things like kqueue() which are more efficient (and are very reminiscent of how Windows does it). Linux offers things like epoll(). Solaris offers things like poll(7d) (do not confuse this with poll(2), which is different).
Provided that one waiting function actually works for all sources of wakeup events in which a program is interested. For example, select() on Windows doesn't work on anything but network sockets. And can GetMessage() or select()/kqueue()/epoll() see vertical blanks?
koitsu wrote:3) Spend the time to read up on whatever "core environment" you're using to find out if it wraps main() with its own routine. If there's no documentation, ask the author to explain how it works
Both "core environments" that I've used on Windows (Allegro and SDL) rely on a function in DirectX pre-10 known to use spin waiting. Good luck getting clarification out of Microsoft for best sleeping practice in DirectX pre-10 when Microsoft wants video game programmers to switch to a newer API that isn't ported to Windows XP so that it can get more users off of Windows XP and onto Windows 7.
koitsu wrote:or don't use it at all.
Don't use (Windows|Linux|Mac OS X) at all? That almost sounds like console fanboys on Slashdot, who claim that individual independent game developers don't deserve to have a platform on which to make and self-publish a video game. Instead, they should move to a city and state with lots of established video game studios and work for ten years as an apprentice before putting their ideas into practice. I hope I misunderstood you.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu »

tepples wrote:Provided that one waiting function actually works for all sources of wakeup events in which a program is interested. For example, select() on Windows doesn't work on anything but network sockets. And can GetMessage() or select()/kqueue()/epoll() see vertical blanks?
I wouldn't use select() on Windows (I'm surprised it even exists), and Cygwin is a disgusting broken pile of shit -- please don't get me started on Cygwin, I can rant about it for weeks (and also do not respond with some nonsense like "well it works for me"; that's nice, it's still horribly broken -- there are too many reasons we're getting rid of it in our enterprise environment at my job and going pure Windows on our Win2K3 and 2K8 boxes). It's also not relevant to the discussion; we're not talking about how to accomplish POSIX on Win32, we're talking about how to avoid chewing up 100% CPU time while waiting for VBL.

Regarding GetMessage() on Win32 and select/kqueue/epoll on *IX and VBL -- no, obviously these tools cannot "see" VBL. They aren't intended for interfacing with the graphics subsystem. On Windows, DirectX provides that framework. I'm not familiar with GUI/graphics implementations on *IX OSes, but I imagine there are many (SDL probably being the most common). How those behave/work is unknown to me; I don't do graphics programming on Windows or on *IX. But I don't need to do graphics programming to apply my knowledge of the problem. EDIT: It looks like SDL can accomplish proper VBL waiting by using the SDL_Flip function (which also offers double buffering). Of course, what this function does behind the scenes (meaning how it waits for VBL) is the entire point of this discussion. :-)

Simply put, a VBL firing should induce an interrupt (think: like NMI on the NES), which the kernel traps via an exception handler. Video card drivers provide this (confirmed for nVidia, ATI, and Intel). The handlers let applications (userland) "tie in" to it, so that when the interrupt is seen the program then does something.

kqueue can't tie in to a hardware interrupt directly, but it can tie in to a file descriptor or vnode. The kernel driver would need to provide an interface (preferably through a file in /dev somewhere) for this to be beneficial. I sure as hell would hope video card manufacturers who offer drivers for *IX would offer this capability (maybe not through /dev but a documented handler which a library (SDL, etc.) could utilise and tie into for VBL handling).

GetMessage behaves similarly, as do select and epoll. On Win32 there may be some added abstraction needed given the use of DirectX, but yes, they should absolutely be used. If the graphics library/API provides its own methodology that accomplishes the same task, then that should be used instead. But it needs to be documented for people to know about it. I imagine Win32 absolutely offers some way to tie a hardware interrupt to a running program.

The point I'm making: while(1) { jack-off; } is wrong. Do not do it. Instead, your application should simply block (resulting in it doing NOTHING CPU-wise) and wait for the kernel to tell it to do something. If you ABSOLUTELY CANNOT do anything about it, then please sleep() for long durations of time within the while(1) loop to minimise impact.

And before someone smarmy brings it up, I'll cover my ass: it is absolutely fine to use while(1) when you know *for an absolute fact* that the loop will be broken out of VERY quickly. A profiler will greatly help in determining how much time you spend inside such a loop. Most present-day kernels use while(1) where applicable. Otherwise, you need to use locks and behave as I described above. But for an overall "main" loop used in, say, main(), this is wrong and will result in CPU usage skyrocketing, which is exactly what Dwarf Fortress did and is what induced my "destroyed laptops" comment.
tepples wrote:Both "core environments" that I've used on Windows (Allegro and SDL) rely on a function in DirectX pre-10 known to use spin waiting.
A spin wait/spin lock literally ties up the processor (in a very small/short loop) and is heavily dependent upon its timeout/expiry capability (hopefully set to a VERY minimal/small value). This broken design methodology is documented on Windows as well as in general (see the 2nd paragraph). For the Wikipedia link, see the section titled "Busy-waiting alternatives"; you'll also see in that section that sleep() is mentioned. This is important. EDIT: There's also an article about spinlock alternatives which delves into other options (such as switching threads; I wouldn't have thought of this, but it sounds like an excellent method).

So again: do not use while(1) { jack-off; }. At bare minimum, use sleep() or a wait condition in there, with a decently-sized number. Please do not tie up the CPU when it doesn't need to be.
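If you truly cannot block properly, a sketch of the damage control (frame_ready is a hypothetical flag set elsewhere, and 10 ms is an arbitrary figure):

Code: Select all

#include <time.h>

extern volatile int frame_ready;   /* hypothetical flag set by a timer/VBL source */

void poll_as_last_resort(void)
{
    /* Polling as a last resort: yield the CPU between checks
       instead of spinning flat-out. */
    while (!frame_ready) {
        struct timespec ts = { 0, 10L * 1000 * 1000 };  /* 10 ms */
        nanosleep(&ts, NULL);
    }
}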
tepples wrote:Good luck getting clarification out of Microsoft for best sleeping practice in DirectX pre-10 when Microsoft wants video game programmers to switch to a newer API that isn't ported to Windows XP so that it can get more users off of Windows XP and onto Windows 7.
I understand/acknowledge the complaint, but all it means is more coding effort required on the part of the application programmer. The programmer then has to use a different design methodology for DX9 vs. DX10. Many implement this by defining separate models/methodologies depending on whether DX8, DX9, or DX10 is detected.

Encapsulating all of my points: take a look at the source code to NEStopia to see how this is done. Said emulator handles VBlank properly across multiple Windows OSes (read: across DX versions) and does not tie up the CPU when it isn't needed.
tepples wrote:Don't use (Windows|Linux|Mac OS X) at all? That almost sounds like console fanboys on Slashdot ... I hope I misunderstood you.
This isn't what I said / what I intended. I said that if you're using a graphics API or subsystem layer (example: Allegro), and it does not offer proper documentation outlining how its waiting methodology works, or the author cannot explain it, do not use the API/subsystem. This is in no way, shape, or form the same as "do not use {insert OS here}". There is no way I would advocate the latter; I am a vehement opponent of OS advocacy and a strong proponent of use-whatever-OS-suits-your-needs.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu »

In true koitsu fashion, I'm following up to my own post with further information -- mainly because I was curious how the hell SDL waits for VBL, particularly on Windows.

The source is hardly documented -- in true open-source fashion! :-) -- so I had to make some educated guesses about where the DirectX piece is in SDL. I believe this is it:

Code: Select all

SDL-1.2.14/src/video/windx5/SDL_dx5video.c

 439 static int DX5_FlipHWSurface(_THIS, SDL_Surface *surface);
...
2096 static int DX5_FlipHWSurface(_THIS, SDL_Surface *surface)
2097 {
...
2103         /* to prevent big slowdown on fast computers, wait here instead of driver ring 0 code */
2104         /* Dmitry Yakimov (ftech@tula.net) */
2105         while(IDirectDrawSurface3_GetFlipStatus(dd_surface, DDGBS_ISBLTDONE) == DDERR_WASSTILLDRAWING);
...
I believe SDL 1.3 might use Direct3D instead -- which does VBL handling completely differently (and more reliably, from what I've been told by someone who does CUDA programming on a daily basis). Okay, so let's find out what the heck IDirectDrawSurface3_GetFlipStatus is and does. It looks like it's just GetFlipStatus.

So there's the likely while(1) equivalent in SDL, at least for DirectDraw. It doesn't answer my question either, simply because I don't know what GetFlipStatus() does behind the scenes (DX/Microsoft code), but I sure hope it's intelligently done. Honestly, the while() loop above doesn't look like it waits for VBL at all -- it appears to poll until the previous flip/blit has completed -- but again, it depends on what GetFlipStatus() does behind the scenes.
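For what it's worth, the DirectDraw call usually blamed for "horrible vblank waiting" is IDirectDraw7::WaitForVerticalBlank(); here's a sketch of the call (lpDD being an already-initialized interface pointer is an assumption). Whether it blocks sanely or spins in the driver is exactly the open question:

Code: Select all

#include <ddraw.h>   /* link with ddraw.lib */

/* lpDD: an already-initialized IDirectDraw7* (assumed set up elsewhere).
   Documented to return at the start of vertical blank, but many drivers
   are said to implement the wait as a spin loop. */
void wait_vbl(IDirectDraw7 *lpDD)
{
    IDirectDraw7_WaitForVerticalBlank(lpDD, DDWAITVB_BLOCKBEGIN, NULL);
}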
mudlord
Posts: 11
Joined: Sat Jun 12, 2010 8:05 pm

Post by mudlord »

koitsu wrote: I believe SDL 1.3 might use Direct3D instead -- which does VBL handling completely different (and more reliably from what I've been told by someone who does CUDA programming on a daily basis).
Yes, SDL HG has Direct3D/OpenGL blitters. Haven't tried it yet, but it does indeed use at least D3D9 for blits.