C++ WTF

Discuss emulation of the Nintendo Entertainment System and Famicom.

Moderator: Moderators

thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: 🇫🇮

Re: C++ WTF

Post by thefox »

When you come across a bug like that, IMO you should find out its root cause even if you can fix it by changing something unrelated. Otherwise the bug may resurface at some other time.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: C++ WTF

Post by koitsu »

thefox wrote:When you come across a bug like that, IMO you should find out its root cause even if you can fix it by changing something unrelated. Otherwise the bug may resurface at some other time.
+1

Any time a programmer describes an issue as "or something" it indicates they didn't take the time to find out what was really going on/causing the issue. I have a friend who writes code like this and makes preposterous claims like "calloc() crashes my program but using malloc() works just fine so the compiler or underlying C libraries obviously are broken" (I gave up talking to him about this sort of thing long ago).
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: C++ WTF

Post by WedNESday »

thefox wrote:When you come across a bug like that, IMO you should find out its root cause even if you can fix it by changing something unrelated. Otherwise the bug may resurface at some other time.
+1

Couldn't agree more. But to be honest, I think that if it was getting stuck in the PPU loop because not enough cycles were being emulated, then the problem has since gone away. Of course I will go over all my code at some point to check for the likes of buffer overflows.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: C++ WTF

Post by tepples »

koitsu wrote:Any time a programmer describes an issue as "or something" it indicates they didn't take the time to find out what was really going on/causing the issue.
It could be that, or it could be that he did consult the documentation and found it incomplete. This was the case for some of the OAM refresh "bugs" in the NES PPU that were characterized starting in early 2009.
I have a friend who writes code like this and makes preposterous claims like "calloc() crashes my program but using malloc() works just fine so the compiler or underlying C libraries obviously are broken"
I seem to remember one version of the C library for Windows segfaulting on free(NULL); when that's supposed to be a no-op according to the C standard. If a difference between the library's behavior and the standard can be demonstrated in a 20-line program, isn't it the library's fault?
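For what it's worth, a reduced test case for that kind of claim can be very small. A minimal sketch, assuming nothing beyond the standard library (the function name `free_null_is_noop` is made up for illustration): the C standard says `free` does nothing when handed a null pointer, so a conforming library should always reach the end of this function.

```cpp
#include <cstdlib>

// Hypothetical helper: returns true if control reaches the end,
// i.e. the library did not crash on free(NULL).
bool free_null_is_noop() {
    void* p = nullptr;
    std::free(p);  // per the C standard: if the pointer is null, no action occurs
    std::free(p);  // repeating it is equally harmless, since p is still null
    return true;
}
```

If a program this small misbehaves on one particular library version, that is the sort of evidence worth attaching to a bug report.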
Zelex
Posts: 268
Joined: Fri Apr 29, 2011 9:44 pm

Re: C++ WTF

Post by Zelex »

tepples wrote:
koitsu wrote:Any time a programmer describes an issue as "or something" it indicates they didn't take the time to find out what was really going on/causing the issue.
It could be that, or it could be that he did consult the documentation and found it incomplete. This was the case for some of the OAM refresh "bugs" in the NES PPU that were characterized starting in early 2009.
I have a friend who writes code like this and makes preposterous claims like "calloc() crashes my program but using malloc() works just fine so the compiler or underlying C libraries obviously are broken"
I seem to remember one version of the C library for Windows segfaulting on free(NULL); when that's supposed to be a no-op according to the C standard. If a difference between the library's behavior and the standard can be demonstrated in a 20-line program, isn't it the library's fault?
Possibly, but these situations are exceedingly rare. 99.9999% of the time it is user error.
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: C++ WTF

Post by rainwarrior »

I'd say I encounter a bug produced by a compiler about once every year or two. It's pretty common for the programmer to blame the compiler at first, but yeah, it's generally pretty rare that this is really the case. Slightly more common is being able to crash/fatal-error the compiler, which doesn't produce a bug since it failed to build your code, but can be pretty annoying to resolve. Library errors are not terribly uncommon, I'd say, especially with younger ones, or platform specific things that don't have as wide a testing net.

One of my favourite bugs that we tried to blame on a compiler was corruption of 32-bit floats that were being endian swapped for big-endian platforms. The process involved reinterpreting a float as a 32-bit int, swapping its bytes, then reinterpreting it back into a float. This occasionally produced a NaN, which if loaded into the floating point unit would end up with a few bits changed. However, with optimization on, sometimes the conversion back to a float was optimized away and the FPU was bypassed, leaving the value intact. The result was something that produced occasional corrupt data if run in debug, and very little corrupt data if run in release, so this bug actually lived for maybe 6 months before somebody figured out that something was wrong with the data being produced. And yeah, our initial reaction was to blame the compiler, until we looked hard at the assembly and couldn't find anything wrong with it. (The lesson learned was never to put the data back into a float type after endian swapping it; just keep it as an integer until you write it to disk.)
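A minimal sketch of that lesson, assuming a 32-bit IEEE 754 `float`; the function names are hypothetical and this is not the code from the project described above. The bits are copied out with `memcpy`, swapped as an integer, and never converted back to a float.

```cpp
#include <cstdint>
#include <cstring>

// Copy the float's bit pattern into an integer. The input is still a
// valid native float at this point, so handling it as a float is safe.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Reverse the byte order of a 32-bit value.
std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// The swapped pattern stays an integer all the way to the file write,
// so it never passes through the FPU where a NaN could be altered.
std::uint32_t swapped_float_bits(float f) {
    return byte_swap32(float_bits(f));
}
```

The returned integer is what would be handed to the file-writing code. The reading side on the big-endian platform can load those four bytes as a native float, since they are in its own byte order by then.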

Anyhow, it's a valid instinct to suspect the compiler, I think. It will be wrong most of the time, but on those few occasions where you can follow through and actually find a problem with the code it's producing, it pays off when you can report the bug and the compiler gets fixed.
Last edited by rainwarrior on Mon Jan 07, 2013 3:06 pm, edited 1 time in total.
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: C++ WTF

Post by WedNESday »

Code:

...
int Mapper;
int Mirror[4];
int NMI;
...
becomes

Code:

...
int WhatFetch;
int X;
int Mirror[4];
...
fixes most of my errors.

W. T. F.
Dwedit
Posts: 4922
Joined: Fri Nov 19, 2004 7:35 pm

Re: C++ WTF

Post by Dwedit »

Stack corruption? Out of bounds writes on that last array?
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: C++ WTF

Post by WedNESday »

No out of bound writes on that array whatsoever. How would I check for stack corruption?

...btw I did uninstall Visual Studio from D: and reinstall to C: WITHOUT reinstalling the OS. When it first ran I did get a load of weird error messages. Could that affect something?
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: C++ WTF

Post by rainwarrior »

When you find a variable is getting an unexpected value in it, put a data breakpoint on it. This will break whenever it gets written to, and you can often find the offending piece of code quite easily this way. (You can even make it break only when the offending value gets written.)

Stack corruption can be really hard to deal with, since it usually leaves you with unreadable/garbage call stack information in the debugger. I've usually had to do a lot of printf-style instrumentation of the code to try to figure out exactly how far it gets before crashing, and from that where to look.
WedNESday
Posts: 1284
Joined: Thu Sep 15, 2005 9:23 am
Location: Berlin, Germany

Re: C++ WTF

Post by WedNESday »

The 4 Mirror[] entries are 240, 4, 14 and 0 when the debugger is run. At no point in the code are they set to anything other than 0, 1, 2 or 3 for obvious reasons.
cpow
NESICIDE developer
Posts: 1097
Joined: Mon Oct 13, 2008 7:55 pm
Location: Minneapolis, MN

Re: C++ WTF

Post by cpow »

WedNESday wrote:The 4 Mirror[] entries are 240, 4, 14 and 0 when the debugger is run. At no point in the code are they set to anything other than 0, 1, 2 or 3 for obvious reasons.
Then you're memcpy()ing or otherwise walking beyond the end of some other nearby array in memory...

If you ever see something in memory you know your code isn't writing there, you're usually wrong. Your code is writing it there but not the code you'd expect. For example, looking at all of your accesses to Mirror[] will not lead you to the culprit--it may actually lead you to some other problem if you discover that you're accessing Mirror[] with an out-of-bounds index, but that's not what's causing this problem. Looking at accesses to nearby arrays could help...but it's a long shot.

Set a breakpoint somewhere that you'll hit each time through a frame. See if you can narrow down when the corruption is occurring. Does it start out bad?
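One way to narrow it down without stepping through by hand is to put sentinel values on either side of the suspect array and check them at that once-per-frame spot. A rough sketch only: `GuardedMirror`, `kGuard` and the two check functions are invented for illustration, and the struct is there because members of one struct are laid out in declaration order, whereas separate globals can end up anywhere.

```cpp
#include <cstdint>

constexpr std::uint32_t kGuard = 0xDEADBEEFu;  // arbitrary sentinel value

// Hypothetical layout: one sentinel on each side of the suspect array.
struct GuardedMirror {
    std::uint32_t before = kGuard;
    int Mirror[4] = {0, 0, 0, 0};
    std::uint32_t after = kGuard;
};

// True while neither sentinel has been overwritten.
bool guards_intact(const GuardedMirror& g) {
    return g.before == kGuard && g.after == kGuard;
}

// True while every entry holds one of the legal values 0..3.
bool values_valid(const GuardedMirror& g) {
    for (int v : g.Mirror) {
        if (v < 0 || v > 3) return false;
    }
    return true;
}
```

Calling both checks right after initialisation answers "does it start out bad?", and calling them once per frame tells you which frame the damage happened in. A tripped guard suggests a neighbour overran into the array; bad values with intact guards point at a stray pointer instead.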
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: C++ WTF

Post by koitsu »

cpow wrote:
WedNESday wrote:The 4 Mirror[] entries are 240, 4, 14 and 0 when the debugger is run. At no point in the code are they set to anything other than 0, 1, 2 or 3 for obvious reasons.
Then you're memcpy()ing or otherwise walking beyond the end of some other nearby array in memory...
Given this showing exactly that, I would say your theory is likely. Not picking on ya WedNESday! Just saying cpow's proposal sounds likely given some established history.

Too bad Windows doesn't have native valgrind; it can usually detect this kind of thing.

I'll expand on what cpow wrote here with something that's a little more technical but might make more sense to you:
cpow wrote:If you ever see something in memory you know your code isn't writing there, you're usually wrong. Your code is writing it there but not the code you'd expect. For example, looking at all of your accesses to Mirror[] will not lead you to the culprit--it may actually lead you to some other problem if you discover that you're accessing Mirror[] with an out-of-bounds index, but that's not what's causing this problem. Looking at accesses to nearby arrays could help...but it's a long shot.
The reason it's a "long shot" and so on has to do with how OSes handle memory allocation. When a program starts there's actually a boatload of memory allocated all over the place (based on relevant executable header data and all the underlying segments defined in the executable itself -- yes I'm greatly and intentionally simplifying). A crash/exception (again keeping it simple) only happens when trying to access memory that is outside of the allocated space for your program -- anything that your program has allocated (either intentionally, or the kernel allocating for your program as a result of the program loading, etc.) is game for being accessed (read or written) without any complaint.

Memory allocation schemes in an OS work in pages -- chunks of memory that are contiguous within themselves but not necessarily back-to-back with one another. Phrased differently, let's say you have this line: foo = malloc(65536); bar = malloc(65536); You might be inclined to think that the underlying VM would allocate both 64KByte blocks back-to-back, so that you could technically access foo[65536] and foo[65537] and actually be accessing the memory pointed to by bar. That assumption is wrong -- however, there may be memory (for other reasons) allocated for your program past that 64KByte allocation (referring to what foo points to) that can be accessed without an exception being generated. It could be for some variables you allocated on the heap or the stack (either one). It could be for some underlying API bits that your program uses that allocate memory themselves. All this is memory your program technically owns, which means you're actually free to access it in whatever ways you wish -- intentional or unintentional. This is how, for lack of a better term, "memory gets corrupted" when a program does something it shouldn't be doing.

The result of this is often the programmer resorting to stupid ideas that "seem" to work and make him/her think they've solved the problem. Things like "I turned off optimisation and the problem is gone", "the issue doesn't happen if I enable debug symbols", "if I run 5 instances of the program the 4th one works fine", or screwing around with stack size (I really hate it when I see people do this). All these result in the programmer suddenly believing the underlying OS or system "is unstable" when in fact it's their software that's broken.

I've mentioned this before (in the same thread I linked above actually). My point in bringing that up is that depending on where the VM decided to allocate memory for the pointer called Pixel, it could be next to memory used for other things. When I say "other things" I mean quite literally anything relating to your program. Again: accessing something out-of-bounds that's still associated with your process memory space won't result in an exception.
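Since the OS won't complain, the complaint has to come from the program itself. A small sketch of one option in C++, assuming the array can be turned into a `std::array` (the function name is hypothetical): `at()` checks the index and throws `std::out_of_range`, where a raw `[]` write would silently land in whatever memory happens to be next door.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Tries to store `value` at index `i`. Returns true if the write was
// rejected because the index is out of bounds, false if it went through.
bool write_is_rejected(std::array<int, 4>& mirror, std::size_t i, int value) {
    try {
        mirror.at(i) = value;  // at() throws std::out_of_range when i >= 4
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

This only covers accesses that go through the checked container; a stray `memcpy()` or pointer write elsewhere still gets past it.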

I myself learned about this the hard way, maybe a year or so after I had started learning C. I had a piece of code (a simple fread() call and nothing more) that worked when using -O2 (optimisation level 2, i.e. more optimisations), but broke (crashed) when using -O1 or -O0. I had no idea why; I started blaming the compiler because the situation seemed backwards (I'd heard of optimiser bugs but generating working code with -O2 but crashing code with -O1 or no optimisation?) and I was pompous. It wasn't until "other mysterious issues" happened a week later that I compared my code to that of an open-source program. It took me a while to understand what was going on, but it was quite simply the exact same thing you experienced above with COLORREF *Pixel (but for me it was with FILE *fp vs. FILE fp and how -- or rather, what -- I was passing to fread()).
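The original code isn't shown here, so the following is only a guess at the shape of the correct pattern, not a reconstruction of it: the program declares a `FILE *`, lets the library supply the `FILE` object it points to, and passes that pointer to `fread()`. The function name `round_trip` and the use of `tmpfile()` are assumptions made to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <cstdio>

// Writes `len` bytes from `src` to a temporary file, reads them back
// into `dst`, and returns how many bytes were read (0 on failure).
std::size_t round_trip(const unsigned char* src, unsigned char* dst,
                       std::size_t len) {
    // The library owns the FILE object; the program only holds a pointer.
    std::FILE* fp = std::tmpfile();
    if (fp == nullptr) return 0;

    std::fwrite(src, 1, len, fp);
    std::rewind(fp);  // back to the start before reading

    // fread() takes the pointer the library handed out, never the
    // address of a FILE the program declared itself.
    std::size_t n = std::fread(dst, 1, len, fp);
    std::fclose(fp);
    return n;
}
```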

Tracking down out-of-bounds accesses like this is somewhat difficult and often requires that you build your binaries with a kind of "guard" or "wrapper" that may wrap itself around every single system or library call in an attempt to do the messy work for you. I mentioned valgrind above; it does some of this as a wrapper, but there are other solutions that involve compiler features or third-party libraries that inject themselves into things (i.e. malloc() might now actually call a third-party library to do some tracking, then get handed off to the real malloc()). I'm sure there are tools for this under Windows; I just don't have enough familiarity with Windows development to be able to recommend any. :(
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Re: C++ WTF

Post by blargg »

Not sure if anyone mentioned, but enabling all warnings and adjusting code to quiet them is a good way to let the compiler help you.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: C++ WTF

Post by koitsu »

blargg wrote:Not sure if anyone mentioned, but enabling all warnings and adjusting code to quiet them is a good way to let the compiler help you.
...until you find people forcing typecasts to squelch said warnings (which in my experience is only necessary maybe 20-30% of the time; the rest are usually indicators of something anomalous). I strongly recommend -Wall -Werror. I still remember back in the early 90s when I was learning C and literally everyone I knew who did C kept telling me to "just ignore warnings". This came from a good 8 or 9 people. To this day it was the worst advice a large number of people (who now do professional programming) ever gave me. They were so incredibly wrong.
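To illustrate the typecast problem with a made-up example (both function names are hypothetical): comparing an `int` with an `unsigned` is the kind of thing gcc's -Wsign-compare flags. Casting makes the warning go away but leaves the bug in place, whereas handling the negative case is what the warning was really asking for.

```cpp
// The "squelched" version: the cast silences the warning, but a
// negative `a` still wraps around to a huge unsigned value.
bool less_squelched(int a, unsigned b) {
    return static_cast<unsigned>(a) < b;  // -1 becomes UINT_MAX here
}

// The fixed version: deal with negative values first, then compare.
bool less_fixed(int a, unsigned b) {
    return a < 0 || static_cast<unsigned>(a) < b;
}
```

With `a = -1` and `b = 1`, the squelched version says -1 is not less than 1, and the fixed one gets it right.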

For folks using gcc, this is what I've used for many years on my own projects for debug/beta builds; all of these are removed only for final production releases:

Code:

-g3 -ggdb -Werror -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wdisabled-optimization -Wfloat-equal -Winline -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-arith -Wredundant-decls -Wsign-compare -Wstrict-prototypes -Wunreachable-code -Wwrite-strings
For production, the only thing I use is:

Code:

-fno-inline
HTH.