PostPosted: Thu Nov 29, 2018 11:58 am 
Offline
Formerly Espozo
User avatar

Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3404
Location: Richmond, Virginia
Quote:
there's certainly lower impact ways to fix the problem than having to change your data structure everywhere else it's used.

I wouldn't think everywhere; aren't you able to omit the left set of square brackets and index past the row dimension in the right set of square brackets?

Quote:
The point of optimizing compilers is to keep the programmer from having to make brainless transpositions of the code like this and let them write something easier to read.

I guess I wasn't quite aware of the degree to which modern compilers are able to optimize code...


PostPosted: Thu Nov 29, 2018 12:08 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7011
Location: Canada
tepples wrote:
By a constraint of a superscalar microarchitecture requiring it to be scheduled on an execute...

I don't think that helps, tepples.

My simple point was that an add isn't "always" faster than a multiply, and it's generally better to let the compiler figure this out. (Yes, an add can be faster in the right circumstances.)

Most compilers are very good at figuring out the best instruction to use, locally. There's always stuff that can be improved "by hand" but in general you should not resort to assembly until you have identified a specific performance problem with a specific piece of code. (i.e. if you aren't already timing that code, you're not ready to optimize it.)
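
For instance, on the multiply/shift front (just a sketch, not anyone's actual code here; what actually gets emitted depends on the compiler, flags, and target):
Code:
// Both of these multiply by 8. With optimization on, a typical compiler emits
// the same shift (or lea, on x86) for both, so writing the shift by hand
// usually buys nothing.
unsigned mul8(unsigned x) { return x * 8u; }
unsigned shl3(unsigned x) { return x << 3; }

// Multiplying by a non-power-of-two constant is handled well too: the compiler
// picks lea/shift/add sequences or a plain multiply, whichever it judges
// cheaper for the target.
unsigned mul10(unsigned x) { return x * 10u; }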

Something that compilers can't do is figure out a better algorithm, or reorganize your data structures. That's why ap9 was talking about knowing whether your arrays are stored by rows or columns; if you know that one is more important than the other, you should arrange your array that way.
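
To illustrate the row/column point (just a sketch; the array name and sizes are made up, and the actual cost difference depends on the array size and the cache):
Code:
#include <stddef.h>

#define ROWS 1024
#define COLS 1024

static float a[ROWS][COLS];

/* C stores 2D arrays row-major, so a[r][c] and a[r][c+1] are adjacent in memory.
   With the column index innermost, the loop walks memory sequentially: */
void scale_rows_inner(void)
{
    for (size_t r = 0; r < ROWS; ++r)
        for (size_t c = 0; c < COLS; ++c)
            a[r][c] *= 2.0f;
}

/* Same result, but the inner loop now strides COLS*sizeof(float) bytes per step,
   which tends to be measurably slower once the array outgrows the cache: */
void scale_cols_inner(void)
{
    for (size_t c = 0; c < COLS; ++c)
        for (size_t r = 0; r < ROWS; ++r)
            a[r][c] *= 2.0f;
}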

Something that compilers couldn't do very well in the past but are getting better at is figuring out what the important cases in your data are. Usually this is an area where you have a lot more capacity to know what your data really looks like, so you might know that the common case should be optimized at the expense of the rare one. This is now somewhat approachable by the compiler through profile guided optimization, which is pretty neat.


Last edited by rainwarrior on Thu Nov 29, 2018 12:58 pm, edited 2 times in total.

PostPosted: Thu Nov 29, 2018 12:19 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7011
Location: Canada
Drew Sebastino wrote:
Quote:
there's certainly lower impact ways to fix the problem than having to change your data structure everywhere else it's used.

I wouldn't think everywhere; aren't you able to omit the left set of square brackets and index past the row dimension in the right set of square brackets?

I don't understand the question, but the hypothetical I was thinking of is a case where j is technically invariant but the compiler doesn't realize this for some reason.

Here's a couple of illustrations:
Code:
// in a case like this, current_row() probably won't be understood as invariant (though it might be if it's inlined)
for (int i=0; i<50; ++i)
   a[current_row()][i] = thing;

// you can manually make it invariant like this:
const int j = current_row();
for (int i=0; i<50; ++i)
   a[j][i] = thing;

// another common idiom is to take a pointer to a row of data and use that instead
// (not faster in this simple case, but it's a valid way to temporarily slice the structure, semantically)
int * const row = a[j];
for (int i=0; i<50; ++i)
   row[i] = thing;

It's hard to come up with a good example for this though, because it kinda has to get more complex than this before this stuff even matters to performance. With simple loops, there's often a bunch of ways to express the same idea that all compile to equivalent code.

It's hard to think of a case where it would matter whether you use a 2D fixed array vs. manually indexing a 1D fixed array. In most situations these are simply equivalent.
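
Something like this, for example (a minimal sketch; the names and sizes are made up, and "same code" is the typical outcome rather than a guarantee):
Code:
/* For fixed-size arrays these two functions index the same element, and an
   optimizing compiler will typically generate identical code for both. */
enum { W = 50, H = 20 };

static int a2[H][W];
static int a1[H * W];

int get2(int row, int col) { return a2[row][col]; }
int get1(int row, int col) { return a1[row * W + col]; }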


A more realistic situation would be to find that some loop is taking a lot of time, then look at a disassembly (or better, a debugger with pipeline visualization) and think about how to address whatever specific issue the code is having. Usually this just involves tweaking the C code a little to get the compiler to handle it better; more rarely you might add cache control instructions, or even rewrite the assembly... but often just understanding why something isn't working will give you a new idea for how to approach it on a higher level, and change the algorithm rather than trying to optimize at the finer level.

Like that's sort of what people are saying above: even if you had two ways of indexing an array where one has, e.g., an extra instruction, it still might not actually make any operations slower in your program. Successfully optimizing modern C software usually takes more careful situational consideration than just applying old canards like "use ++ as a prefix instead of suffix". A lot of very normal ways of doing loops and arrays and things like that have been well studied, and optimizers do a good job with them.
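
The ++ one is a good example (a sketch; with optimization on, both versions below typically compile to exactly the same thing):
Code:
/* When the value of the increment expression isn't used, as with an int loop
   counter, prefix and postfix forms generate identical code. (The advice mostly
   came from C++ iterator types with non-trivial copies.) */
int sum_prefix(const int *p, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}

int sum_postfix(const int *p, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += p[i];
    return s;
}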

Drew Sebastino wrote:
I guess I wasn't quite aware of the degree to which modern compilers are able to optimize code...

It's a completely different situation than writing assembly. (...or writing C with CC65, where the compiler doesn't really have an optimizer.)


PostPosted: Thu Nov 29, 2018 2:35 pm 
Offline
User avatar

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3722
Location: Mountain View, CA
Should we introduce him to https://godbolt.org/ or not?


PostPosted: Thu Nov 29, 2018 2:43 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7011
Location: Canada
I was going to mention it, but I don't think it really demonstrates the case he was asking about. Godbolt will show you the generated assembly, but it won't show you what's faster, where the pipeline is going to stall, what your cache does, etc., which are a lot more relevant to the array access question.

Writing some code and timing it will answer those questions a lot better. I think you need to know about what's actually faster before you can really speculate about the speed of the assembly output.
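
Even something crude answers a lot, e.g. (a rough sketch using standard C's clock(); work() is just a hypothetical stand-in for whatever you're curious about, and for real measurements you'd repeat runs and use a higher-resolution timer):
Code:
#include <stdio.h>
#include <time.h>

/* stand-in for the code you actually care about (hypothetical) */
static volatile long sink;
static void work(void)
{
    for (long i = 0; i < 1000000; ++i)
        sink += i;
}

int main(void)
{
    clock_t start = clock();
    for (int rep = 0; rep < 100; ++rep)
        work();
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%.3f s\n", seconds);
    return 0;
}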

Edit: just noticed it has "LLVM-MCA" in Godbolt's tools list, which seems to do some interesting analysis. Maybe that's worth playing with.


PostPosted: Thu Nov 29, 2018 3:26 pm 
Offline
User avatar

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3722
Location: Mountain View, CA
rainwarrior wrote:
Drew Sebastino wrote:
Quote:
there's certainly lower impact ways to fix the problem than having to change your data structure everywhere else it's used.

I wouldn't think everywhere; aren't you able to omit the left set of square brackets and index past the row dimension in the right set of square brackets?

I don't understand the question, but ...

I think I understand his question, but it just prompts me to ask him questions. Let's assume char b[8][1024] is declared:

If the question is "can you do something like b[1024]?" then the answer is no; the compiler will tell you at compile-time that the array index is out-of-bounds.

If the question is "is the memory in a multidimensional array guaranteed to be allocated linearly in memory?" then the answer is yes, assuming the array is declared as described. The below program demonstrates exactly that -- by accessing individual bytes of the array directly with a pointer:

Code:
#include <stdio.h>

int main(void) {
  char b[8][1024];
  char *ptr;

  b[0][0] = 'x';
  b[0][1] = 'A';
  b[1][0] = 'y';
  b[2][0] = 'z';

  ptr = (char *) b;

  printf("%c\n", *(ptr));            /* will output x */
  printf("%c\n", *(ptr+1));          /* will output A */
  printf("%c\n", *(ptr+(1*1024)));   /* will output y */
  printf("%c\n", *(ptr+(2*1024)));   /* will output z */
  printf("%c\n", *(ptr+1900));       /* what do you think this will show? */

  return 0;
}

The last printf(), if you aren't sure, will output whatever happens to be at that byte in memory at that time -- it'll probably be a gobbledegook character, but you have no way of guaranteeing what it is (I don't want to get into explaining how ld.so and ELF end up declaring this stuff, it's advanced material). I should have done something like memset(&b, '-', sizeof(b)); in advance, to pre-initialise all the bytes to the ASCII character -. NULL or '\0' is not relevant here; %c will indeed output a null byte if encountered (read: %c is not %s).
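
i.e. adding something like this to the program above (a sketch of that fix; the include goes at the top of the file and the memset right after the declaration of b, everything else stays the same):
Code:
#include <string.h>   /* for memset */

  char b[8][1024];
  memset(b, '-', sizeof(b));   /* every byte is now '-' until overwritten */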

Edit: typo.


Last edited by koitsu on Thu Nov 29, 2018 3:38 pm, edited 1 time in total.

PostPosted: Thu Nov 29, 2018 3:35 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 991
rainwarrior wrote:
Drew Sebastino wrote:
I guess I wasn't quite aware of the degree to which modern compilers are able to optimize code...
It's a completely different situation than writing assembly. (...or writing C with CC65, where the compiler doesn't really have an optimizer.)

I recently had an issue where changing a file in my C++ simulation code (to change how a flag was calculated) didn't work without me deleting every object file in the directory before running make. Naturally, make was assuming that since .o files existed for every .cc and .h file I hadn't changed, they didn't need to be recompiled. Unfortunately, it seems the Intel compiler with -O3 was doing optimization across multiple files; it had noticed that this particular flag was always zero in my old code and had optimized out the tests, so when the new code was linked with the older code, the new flag handling did nothing.

Fun stuff.


PostPosted: Thu Nov 29, 2018 5:54 pm 
Offline
User avatar

Joined: Fri May 08, 2015 7:17 pm
Posts: 2352
Location: DIGDUG
Quote:
didn't work without me deleting every object file in the directory

'make' options.

‘--always-make’
Consider all targets out-of-date. GNU make proceeds to [remake all] targets

‘--assume-new=file’
Pretend that the target file has just been modified.

or have a make recipe that specifically deletes all .o files, like (recipe lines must start with a tab):

clean:
	rm -f *.o

(Windows)
clean:
	del /f *.o

_________________
nesdoug.com -- blog/tutorial on programming for the NES


PostPosted: Thu Nov 29, 2018 7:15 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7011
Location: Canada
The description sounded more like a compiler bug to do with "whole program optimization"; one which I hope was reported and eventually fixed.


PostPosted: Thu Nov 29, 2018 7:37 pm 
Offline
User avatar

Joined: Tue Jun 24, 2008 8:38 pm
Posts: 2124
Location: Fukuoka, Japan
tepples wrote:
I assume koitsu was referring to my habit of pointing out edge cases, particularly if they are believed not to change the conclusion meaningfully. Shift instructions are good for multiplication and division by powers of 2 but not for arbitrary multipliers and divisors.


As for myself, I just love teasing Koitsu, it's one of my many hobbies :lol:

Still, my opinion remains: since you only just learned C, there are still a lot of things to learn before you're comfortable with it, and optimisation is the last thing you should focus on. You start to optimise and learn more about it once you have coded enough or the basic idea is done. That doesn't mean you shouldn't be aware of performance issues, but it shouldn't be your primary focus while learning. It will come naturally once the need arises.


PostPosted: Thu Nov 29, 2018 8:06 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 991
I'm not really worried about the bug, if you can call it that (I just figured that was how it worked and you had to watch out for it). It's easy to work around, and I have to focus all my attention on my research because it's running very late. Maybe I'll try to figure out who to report it to and what they need to reproduce it after I finish this project.

My main point was that modern optimizers can get pretty adventurous. In this case, the optimizer deleted parts of my code based on information parsed from a completely different file, which is not something you'd expect it to do if your only experience was '80s-era assembly. If you write lda #$00; bne + in a SNES program, the assembler will assemble it with no questions asked.


PostPosted: Thu Nov 29, 2018 9:31 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7011
Location: Canada
As you described it I definitely think Intel would consider that a bug, and would want to fix it. I'm sure other users of the compiler would appreciate not having to deal with the same bug, too.


PostPosted: Thu Nov 29, 2018 9:59 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 991
Okay, but is it Intel's problem or is it GNU's problem? Or neither? The optimization is working correctly as far as I can tell; it's just that make didn't realize that the optimization in the older files relied on assumptions that the newer file broke, so it didn't bother recompiling them. In fact, from what I can tell with a quick search, this may be something that can be fixed at the makefile level.

But I'm not a make guru at the best of times, and this particular makefile is downright byzantine. It's about 100 KB, plus a 53 KB makefile.def, and I didn't write it. I'm also far from the only person using it. Given the pressure I'm under, I think it's best if I file this as somebody else's problem for now.

Anyway, I don't want to hijack this thread. I was just commenting on the state of modern optimizing compilers. If you're writing C/C++ for a modern PC, coding for speed means something very different than it does when writing assembly for the SNES.


PostPosted: Thu Nov 29, 2018 11:41 pm 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7011
Location: Canada
Oh, well I suppose if you don't have header dependencies configured correctly a problem like that could easily happen regardless of optimization. I thought you were saying that this was caused specifically by a whole program optimization (/IPO?) feature, but maybe I misinterpreted that entirely. Yes, makefiles are very easy to get wrong. :S
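
For the header half of that, the usual trick (a sketch assuming a GCC-compatible compiler; the file names are made up, and this covers ordinary header edits rather than the cross-module optimization dependency) is to have the compiler write out .d dependency files and pull them back into the makefile:
Code:
# -MMD writes a .d file listing the headers each .o depends on;
# -MP adds dummy targets so deleted headers don't break the build.
CXXFLAGS += -O3 -MMD -MP

OBJS := main.o sim.o flags.o
DEPS := $(OBJS:.o=.d)

# note: recipe lines must be indented with a real tab
sim: $(OBJS)
	$(CXX) $(CXXFLAGS) -o $@ $(OBJS)

%.o: %.cc
	$(CXX) $(CXXFLAGS) -c $< -o $@

# include whatever .d files exist so edited headers trigger recompiles
-include $(DEPS)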


PostPosted: Fri Nov 30, 2018 12:49 am 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 991
Well, yeah, it seems to have been caused by whole program optimization inducing a dependency between otherwise independent files, which the makefile was not set up to know about.

All checks of a certain flag had been stripped out of the compiled object files, because the compiler had determined, by parsing across multiple files, that the flag was always set to 0. Then I changed one .cc file to conditionally set the flag to 1 and remade the program; make checked the timestamps and only recompiled the file I'd changed...

