C: multi vs single dimensional array performance?
Moderator: Moderators
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
C: multi vs single dimensional array performance?
Well, I finally got around to having learned C, and while I think I have a good understanding of how most features get translated into machine code, I either don't understand how multidimensional arrays work, or I do and they're really just as slow as I think they are. It appears to me that say in a two dimensional array, to generate the offset, the processor would multiply the number in the left set of square brackets by the size of the first dimension, then add this to the number in the right set of square brackets? I figure this can't actually be what's going on, because this is stupid inefficient past anything other than a single random access.
Kind of a shame there doesn't seem to be anything between C and assembly. Cross compatibility is great, but not at the sake of an unnecessarily large performance loss.
Kind of a shame there doesn't seem to be anything between C and assembly. Cross compatibility is great, but not at the sake of an unnecessarily large performance loss.
Re: C: multi vs single dimensional array performance?
Why do you think integer multiplication is meaningfully slow on modern computers? (By which I mean everything newer than the pentium 2)
Re: C: multi vs single dimensional array performance?
It takes more time to read from memory nowadays than it does to multiply two numbers together. Besides, most of those multiplications will get optimized by the compiler.
You don't know enough to hold such an opinion It's okay just learn a bit more. I think you'll see things differently when you do.Cross compatibility is great, but not at the sake of an unnecessarily large performance loss.
- rainwarrior
- Posts: 8732
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: C: multi vs single dimensional array performance?
If you want to know how fast or slow they are, time it.Drew Sebastino wrote:...or I do and they're really just as slow as I think they are.
However...
What are you comparing against? What's the alternative here? If you were to time how "slow" this thing is or isn't, what is the alternative that you would also time and compare with?Drew Sebastino wrote:It appears to me that say in a two dimensional array, to generate the offset, the processor would multiply the number in the left set of square brackets by the size of the first dimension, then add this to the number in the right set of square brackets?
If you've got something that's conceptually a 2D grid, what's your alternative way of dealing with it? You need to turn 2 coordinates into an address. How else do you propose to do that?
You should also understand that in most C compilers (even going back 20 years) these 3 things would generate exactly the same code:
Code: Select all
a[(y<<8)+x] // 1D array
a[(y*256)+x] // 1D array
a[y][x] // 2D array
Re: C: multi vs single dimensional array performance?
Going off of what I know of the OP: probably because when it comes to CPU architectures and assembly, all he's familiar with is 65816 -- the final member of the 65xx family that never had native multiplication/division instructions for integers of any value**. What's easily unknown to "younger folks" is that other CPUs at the time the 6502 came out also lacked such capability (read: Intel 808, Zilog Z80). The only exceptions I know of for that same time frame are the VAX-11 and IBM's System 360 mainframe (a bloody behemoth), neither of which are microprocessors (the latter two are microcomputers, whose CPUs were physically enormous). Today people are "spoiled" by mul/imul/div/idiv (x86, starting with 8086) or muls/mulu/divs/divu (68K) or whatever ARM offers (AFAIK, multiplication-only up until roughly ARMv6 or v6, I forget, which added division).lidnariq wrote:Why do you think integer multiplication is meaningfully slow on modern computers? (By which I mean everything newer than the pentium 2)
I mention age because when I started doing x86 (80286/80386) for the first time in 1993 or so -- I was 16 years old -- the native imul/idiv instructions were a godsend. So were having more than 3 registers (particularly having more than one common-purpose accumulator, subsequently followed by having 32-bit registers and linear addressing (real mode/segmented memory is AWFUL)). I believe the OP is equally young and may be equally unaware.
** -- I have to say this because some pendant will come along and pee in my corn flakes blabbing about bit shifts.
Re: C: multi vs single dimensional array performance?
Sample values for a Phenom II: integer multiplication 1-4 cycles, RAM access 121+ cycles.
Re: C: multi vs single dimensional array performance?
Which is also (so I hear) why it's good to know whether the language stores multidimensional arrays in row-column or column-row order, because if you write your inner loop so that it increments the wrong one, you can end up skipping all over memory and load yourself down with cache misses.
Maybe a modern compiler would optimize that out; I don't know. Quite a trick to optimize it out if you have multiple loops that don't use the same order...
Maybe a modern compiler would optimize that out; I don't know. Quite a trick to optimize it out if you have multiple loops that don't use the same order...
Re: C: multi vs single dimensional array performance?
There is a slight advantage to manually designing the array, say to pre-compute a position in an array for later use, if it doesn't change throughout an operation.
Example:
As for multiplication versus shifts or tables for performance, it depends on the CPU/RAM being used. PowerPC had particularly fast arithmetic compared to slow system RAM. Today's chips can execute so many instructions per cycle.
The compiler tends to take care of these things well enough that most cases are trivial on manual improvements. If there's anything I've learned, it's better to write the way you like or understand it, so you can get to prototype stage faster. Time/profile it afterward, if it matters. Making optimizations can be particularly time consuming.
Example:
Code: Select all
const int timings[5*8] = {...};
int current_frequency;
int *current_timings;
void set_freq(int freq)
{
current_frequency = freq;
current_timings = timings + freq*5;
}
void tick_frame()
{
ticks+= current_timings[frame]; // instead of timings[frame+freq*5]
++frame;
}
The compiler tends to take care of these things well enough that most cases are trivial on manual improvements. If there's anything I've learned, it's better to write the way you like or understand it, so you can get to prototype stage faster. Time/profile it afterward, if it matters. Making optimizations can be particularly time consuming.
Re: C: multi vs single dimensional array performance?
@Koitsu
Yes, some bits can shift in your corn flakes... what where we talking about, condiments?
@Drew Sebastino
On a serious note, unless you target a very specific CPU that have low specs or you just are starting to learn plain C, you don't have to worry yet about those things: those are the kind of things you worry later when you have specific bottle neck and have more experience on the subject. These days you don't have to worry much about that anymore.
Yes, some bits can shift in your corn flakes... what where we talking about, condiments?
@Drew Sebastino
On a serious note, unless you target a very specific CPU that have low specs or you just are starting to learn plain C, you don't have to worry yet about those things: those are the kind of things you worry later when you have specific bottle neck and have more experience on the subject. These days you don't have to worry much about that anymore.
Re: C: multi vs single dimensional array performance?
I assume koitsu was referring to my habit of pointing out edge cases, particularly if they are believed not to change the conclusion meaningfully. Shift instructions are good for multiplication and division by powers of 2 but not for arbitrary multipliers and divisors.Banshaku wrote:@Koitsu
Yes, some bits can shift in your corn flakes... what where we talking about, condiments?
Edge case II: Digital eventually turned VAX into a microprocessor several years later (CVAX, 1987). But by then, the market for microprocessors with a PDP-11-inspired 32-bit ISA was favoring sɯɐᴉllᴉM's 68K, and a few years later, Digital pivoted to Alpha.
Such as 6502, LR35902, or what have you.Banshaku wrote:On a serious note, unless you target a very specific CPU that have low specs
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
Re: C: multi vs single dimensional array performance?
Well, it's not like it's never been "meaningfully slow", but it'll always be slower than integer addition, I'd imagine. Not to mention it still has to do an extra addition for generating the offset in a multidimensional array that it wouldn't have to otherwise.lidnariq wrote:Why do you think integer multiplication is meaningfully slow on modern computers? (By which I mean everything newer than the pentium 2)
Even if in cache? (Depends on what level of cache, but you get the point)pubby wrote:It takes more time to read from memory nowadays than it does to multiply two numbers together.
With what, a stopwatch?rainwarrior wrote:If you want to know how fast or slow they are, time it.
The starting offset will be generated the same way no matter what, which is why I said it wouldn't be slower for a single access. Say if you wanted to increment all the data on the fifth [4] row of a 20x20 array though; this:rainwarrior wrote:If you've got something that's conceptually a 2D grid, what's your alternative way of dealing with it? You need to turn 2 coordinates into an address. How else do you propose to do that?
Code: Select all
i = 4 * 20;
j = i + 20;
for(; i < j; i++)
array[i]++;
Code: Select all
j = 4;
for(i = 0; i < 20; i++)
array[j][i]++;
The obvious thing would be just don't use a multidimensional array for this then. Except, I think there are many people who are not aware that the above code would be faster, never mind not caring; it looks uglier, for one thing. Not saying that programming needs to be fool-proofed though...
Actually 19 now (kind of pathetic at this point...) The NECV33 / 80186 in the Irem M92 has multiplication and division instructions, but I think I remember they're still a fair bit slower than addition / subtraction. Maybe I'm just thinking of the 68000 where they're infamously slow.koitsu wrote: I believe the OP is equally young and may be equally unaware.
Shit...calima wrote:RAM access 121+ cycles.
Re: C: multi vs single dimensional array performance?
Maybe. Maybe not. Optimizers are so crazy these days that it's hard to guess what will faster without actually checking. But for most things like this, the difference is irrelevant other than in the most extreme situations (a bottle-neck of code that has to run bajillions of times, or something extremely timing-sensitive).should be faster than this:Code: Select all
i = 4 * 20; j = i + 20; for(; i < j; i++) array[i]++;
Code: Select all
j = 4; for(i = 0; i < 20; i++) array[j][i]++;
My games: http://www.bitethechili.com
- rainwarrior
- Posts: 8732
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: C: multi vs single dimensional array performance?
How do you make a 1 cycle instruction faster than another 1 cycle instruction?Drew Sebastino wrote:Well, it's not like it's never been "meaningfully slow", but it'll always be slower than integer addition, I'd imagine.lidnariq wrote:Why do you think integer multiplication is meaningfully slow on modern computers? (By which I mean everything newer than the pentium 2)
Yes.Drew Sebastino wrote:Even if in cache? (Depends on what level of cache, but you get the point)pubby wrote:It takes more time to read from memory nowadays than it does to multiply two numbers together.
That's not an invalid way of timing a test that takes long enough, but of course you should probably use something designed for the purpose of timing software.Drew Sebastino wrote:With what, a stopwatch?rainwarrior wrote:If you want to know how fast or slow they are, time it.
No, in that case it won't be faster at all. ap9 was saying that in that example iterating over j will be slower than iterating over i. Iterating directly on the row won't have any meaningful improvement. For a loop like that the compiler would definitely know that j is a invariant here, and will operate accordingly.Drew Sebastino wrote:The starting offset will be generated the same way no matter what, which is why I said it wouldn't be slower for a single access. Say if you wanted to increment all the data on the fifth [4] row of a 20x20 array though; this:rainwarrior wrote:If you've got something that's conceptually a 2D grid, what's your alternative way of dealing with it? You need to turn 2 coordinates into an address. How else do you propose to do that?
should be faster than this:Code: Select all
i = 4 * 20; j = i + 20; for(; i < j; i++) array[i]++;
Edit: I think ap9 was talking about the same thing?Code: Select all
j = 4; for(i = 0; i < 20; i++) array[j][i]++;
It would generally be capable of doing this even if you put j=4 inside the loop. (Identifying loop invariants is something compilers usually do very well.) ...but even if it didn't it might not matter for performance due to latency/pipelines, and then if it does there's certainly lower impact ways to fix the problem than having to change your data structure everywhere else it's used.
...except it isn't faster. There's a reason I suggest that you time these things. What's seems to be "obvious" to you is factually wrong. You won't be able to correct your intuition about this without finding some way to actually verify what you believe.Drew Sebastino wrote:The obvious thing would be just don't use a multidimensional array for this then. Except, I think there are many people who are not aware that the above code would be faster...
Also, talking about shifts vs. multiplies, even though the instruction speed is a probably not different, there are often still advantages to keeping rows of your structures a power of 2 in length. Sometimes it helps with caching, sometimes it helps with other optimziations. (Most of them there is no reason to do "by hand".)
Even on older systems where e.g. a multiply by 10 might potentially have been a slower instruction than two shifts and an add, compilers would automatically compile * 10 into those shift and add instructions. Just because you write * does not specify imul. The compiler is allowed to get that operation done in whatever ways it knows how. The point of optimizing compilers is to keep the programmer from having to make brainless transpositions of the code like this and let them write something easier to read.
Re: C: multi vs single dimensional array performance?
A friend of mine was in a competition to write an AI for Laser Chess ≈ Khet. He managed to fit the entire board state into a single slice of L1 cache, and as such in the time limit alloted per turn he was able to run his less-clever AI to search through 3 or 4 more moves in the future ("ply"), which got him to second place out of 20-ish contestants (First place was an AI that really had a better intuition about how the game worked).93143 wrote:Which is also (so I hear) why it's good to know whether the language stores multidimensional arrays in row-column or column-row order, because if you write your inner loop so that it increments the wrong one, you can end up skipping all over memory and load yourself down with cache misses
Re: C: multi vs single dimensional array performance?
By a constraint of a superscalar microarchitecture requiring it to be scheduled on an execute unit that another instruction is already using. If you have one and a half units (U-V pipeline in Pentium and Pentium MMX as described in John Stokes' article), or one full unit and two half units (4-1-1 pipeline in Pentium Pro, Pentium II, and Pentium III), it's best for the compiler to interleave instructions that can run on the half units with instructions that need the full unit. Lately, even different threads have started to share resources, as seen in Intel processors with Hyper-Threading Technology and AMD processors with Clustered MultiThreading.rainwarrior wrote:How do you make a 1 cycle instruction faster than another 1 cycle instruction?Drew Sebastino wrote:Well, it's not like it's never been "meaningfully slow", but it'll always be slower than integer addition, I'd imagine.lidnariq wrote:Why do you think integer multiplication is meaningfully slow on modern computers? (By which I mean everything newer than the pentium 2)