"Modern compilers will outperform handwritten assembly."
- Drew Sebastino
- Formerly Espozo
- Posts: 3496
- Joined: Mon Sep 15, 2014 4:35 pm
- Location: Richmond, Virginia
"Modern compilers will outperform handwritten assembly."
...Is something that is almost always mentioned in any discussion about assembly programming in the present. I never see any code comparisons, statistics, or citations; I assume it's just supposed to be obvious enough that it doesn't require an in-depth explanation. Maybe this is due to me being biased (even after learning C and much of C++), but I find this difficult to believe. I don't see how a compiler could generate more efficient code when it has no real intuition of how any specific program operates, and (at least from what I've seen on x86), function calls will always be expensive no matter what, because seemingly no assumptions are made about where / how functions are called.
I then thought that maybe, with how complex instruction timing has become and the other headaches of modern computing, a compiler may be able to write more efficient code this way; but that level of optimization would change with just about every processor, or at least every microarchitecture, though I don't know how drastically.
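On the function-call point: with internal linkage, modern compilers usually erase call overhead entirely by inlining. A minimal sketch, assuming optimizations are on (e.g. -O2) -- all names here are made up for illustration:

```c
#include <assert.h>

/* With static (internal) linkage and a visible body, an optimizing
 * compiler will typically inline this, so no call/return sequence
 * is emitted at all. */
static int scale(int x)
{
    return x * 3;
}

int sum_scaled(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scale(a[i]);  /* usually compiled as total += a[i]*3 */
    return total;
}
```

So at least for small helpers visible in the same translation unit, the "function calls are always expensive" assumption no longer holds.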
- TmEE
- Posts: 960
- Joined: Wed Feb 13, 2008 9:10 am
- Location: Norway (50 and 60Hz compatible :P)
- Contact:
Re: "Modern compilers will outperform handwritten assembly."
A compiler can have intimate knowledge of how to order instructions so that various latencies inside modern CPUs don't add up but get overlapped instead. I know I'll be able to beat a compiler when writing 486 and Pentium code, since there's very little that gets in the way, but with a P4 and most current x86 CPUs I definitely won't be. As far as other modern high-performance architectures go, I don't know.
Re: "Modern compilers will outperform handwritten assembly."
Generating optimal code often requires knowing some things that compilers can know but programmers generally won't, but also requires knowing some things that programmers will know but compilers can't. Further, it requires recognizing that any "optimizations" predicated upon the notion that a programmer won't do X will be counter-productive if a programmer needs to do X.

Drew Sebastino wrote:Maybe this is due to me being biased (even after learning C and much of C++), but I find this difficult to believe. I don't see how a compiler could generate more efficient code when it has no real intuition of how any specific program operates, and (at least for what I've seen for x86), function calls will always be expensive no matter what, because seemingly no assumptions are made about where / how functions are called.
Unfortunately, development on C compilers like clang and gcc has been taken over by people who mistakenly believe "clever" and "stupid" are antonyms. The authors of the C Standard didn't think it necessary to say "This Standard makes no attempt to forbid obtuse compiler writers from doing stupid things that make their compilers unsuitable for many purposes". They acknowledge that principle in the published Rationale, but not within the Standard itself. Consequently, there's an increasing divergence between the language that became popular in the 1980s and the dialect that clang and gcc process with optimizations enabled, with the latter suitable only for a few specialized applications that never process potentially-invalid input.
Re: "Modern compilers will outperform handwritten assembly."
While it's possible for modern compilers to exploit things about architectures that programmers wouldn't be able to with any reasonable amount of effort, programmers know more than compilers about what needs to be done. Consider, for example, a function which is supposed to accept pointers to two 4096-byte blocks of memory and return 0 if they have matching contents or any non-zero value if they don't. One approach would be to simply compare the two blocks a byte at a time, stopping after all 4096 bytes have been compared or a mismatch has been found. Another would be to check the pointers for alignment, select one of 64 comparison methods based upon the pointer alignments, and then process eight bytes at a time, using whatever shifts are required. The first approach may be more than double the speed of almost anything else in cases where it's necessary to compare many blocks that match except for the first byte. The second approach may be almost eight times as fast as the first in cases where the two blocks match. If the Standard were to include alternate memory-comparison functions which implementations could simply chain to `memcmp` if desired, but which implementations would be encouraged to optimize for different cases, then picking the right routine in various situations could do more to improve performance than anything a clever implementation might hope to accomplish.

TmEE wrote:A compiler can have intimate knowledge of how to order instructions so that various latencies inside modern CPUs won't add up but get overlapped instead. I know I'll be able to beat a compiler when writing 486 and Pentium code since there's very little that gets in the way but with a P4 and most current x86 CPU I definitely won't be. As far as other modern high performance architectures go, I don't know.
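A minimal sketch of the two comparison strategies just described, simplified to the already-aligned case (function names are made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Byte at a time: often fastest when blocks usually differ early. */
int blocks_differ_bytewise(const void *a, const void *b)
{
    const unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < 4096; i++)
        if (pa[i] != pb[i])
            return 1;
    return 0;
}

/* Eight bytes at a time; assumes both pointers are 8-byte aligned,
 * which is the case the 64-variant dispatch described above would
 * select. Roughly 8x fewer loads and compares when the blocks match. */
int blocks_differ_wordwise(const void *a, const void *b)
{
    const uint64_t *pa = a, *pb = b;
    for (size_t i = 0; i < 4096 / 8; i++)
        if (pa[i] != pb[i])
            return 1;
    return 0;
}
```

Which routine wins depends entirely on the data distribution, which is precisely the thing the programmer knows and the compiler doesn't.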
While some compiler writers might suggest "profile-driven optimization" as a solution to such issues, that fails to consider the fact that a compiler will generally have no way of distinguishing cases where performance matters from those where it doesn't. If an operation needs to be performed 1000 times, it may be better to have it take 10us 99% of the time and 1000us 1% of the time, than to have it take 100us every time, provided the time for all 1000 operations never exceeds 200ms. If the time for 1000 operations could sometimes reach a full second, however, that may be unacceptable. Such issues can be especially important in cases where performance can depend upon the data a program receives from potentially-malicious sources. If a language has no way of expressing such concepts, compilers cannot be expected to generate optimal code that meets them.
Re: "Modern compilers will outperform handwritten assembly."
I suspect the idea that compilers can outperform handwritten assembly comes from modern 32- and 64-bit multi-core high-end processors that have super-complex instructions, deep pipelines that must be kept full, multi-level cache, out-of-order execution, and things that simply don't apply to the 6502. I've never heard of a 6502 compiler that could remotely compete with hand assembly. Perhaps a great compiler could be written if there were the commercial interest to invest the necessary man-hours, but I still don't think it would produce code that could match, let alone exceed, the performance of code from any skilled 6502 assembly-language programmer.
http://WilsonMinesCo.com/ lots of 6502 resources
Re: "Modern compilers will outperform handwritten assembly."
The idea post-dates most of the 32-bit era.
Compilers started getting competitive with hand-written code in the 486/Pentium era, and have gotten better as the used subset of x86 shrinks and the chips become a bit more regular as far as timing goes. A hefty amount of that is due to the microarchitecture doing a lot of the work for them.
For modern stuff, the only real reason to drop to inline assembly anymore is when you're trying to express concepts that C/C++ really don't want to do, or possibly SIMD. But those areas are shrinking rapidly, and SIMD is far, far easier to deal with via intrinsics.
Compilers are generally pretty good about tedious, mechanical, predictable, repetitive operations like register allocation, loop unrolling, and inlining (though unrolling is a lot more situational now, and arguably has been since branch prediction became a thing). They're at best tied on subexpression elimination, but their hands are tied a bit there by language semantics.
It's been a rather long while since I last ran into a situation where the compiler output wasn't mostly what I would have written anyways, for anything where it mattered, and even those cases there was usually a way to tweak the C/C++ code to make the compiler do the better thing without having to go to assembly.
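On the intrinsics point above, a hedged sketch (x86/SSE2 assumed): the SIMD work is explicit, but register allocation and scheduling stay with the compiler, and no inline assembly is needed:

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86/x86-64 assumed) */
#include <stdint.h>

/* Add two int32 arrays four lanes at a time; n assumed divisible by 4.
 * Unaligned loads/stores are used so callers needn't align buffers. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
}
```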
Re: "Modern compilers will outperform handwritten assembly."
Actually, the idea substantially predates the 6502. Many optimization techniques would result in code that was very brittle and would thus need substantial rework if requirements changed. If one were trying to write an assembly-language program to process a two-dimensional array, one would likely have to either write code that would only work for a power-of-two array size, or write less efficient code that would work with any size. If one were instead writing the code in FORTRAN, one wouldn't have to change anything other than the array dimension to make the compiler automatically generate whatever code was appropriate for the specified size. Hand-written assembly code specifically tailored for a particular array size might outperform the compiler-generated machine code for that size, but assembly code that needed to be easily maintainable could not outperform the machine code generated by a compiler that was not thus encumbered.

Garth wrote:I suspect the idea that compilers can outperform handwritten assembly comes from modern 32- and 64-bit multi-core high-end processors that have super-complex instructions, deep pipelines that must be kept full, multi-level cache, out-of-order execution, and things that simply don't apply to the 6502. I've never heard of a 6502 compiler that could remotely compete with hand assembly. Perhaps a great compiler could be written if there were the commercial interest to invest the necessary man-hours, but I still don't think it would produce code that could match, let alone exceed, the performance of code from any skilled 6502 assembly-language programmer.
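The array-dimension point can be made concrete with a hedged modern-C illustration (not period code): when the width is a power of two the compiler reduces the index multiply to a shift, and changing the width is a one-constant edit rather than a rewrite:

```c
/* Row-major access into a WIDTH-column grid stored as a flat array.
 * With WIDTH = 64, a compiler reduces row*WIDTH to row << 6; change
 * WIDTH to, say, 100 and it emits a multiply instead -- no source
 * rework, unlike hand assembly with the shift hard-coded. */
#define WIDTH 64

int element_at(const int *grid, int row, int col)
{
    return grid[row * WIDTH + col];
}
```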
Re: "Modern compilers will outperform handwritten assembly."
Fair 'nuff. My experience is almost entirely on 486+ PC hardware, retro consoles, and PS2/GC-present consoles.
I could certainly see the promise of a sufficiently smart compiler being closer to reality on the sort of machines people wrote FORTRAN code for. It definitely wasn't the case for most of the PC compilers (Watcom excepted).
Console compilers have been hit and miss, with a particularly painful stretch of false promises in the 360/PS3 era when IBM managed to convince everyone that out-of-order execution was a passing fad and the compiler would totally be able to schedule code properly.
Re: "Modern compilers will outperform handwritten assembly."
I think that's a very small part of programming though; and even then, you could write assembly macros that would generate the optimum code for the needed array size. If you decide to change the array size, there's no need to re-write the code that invokes the macros. I've had macro definitions that were 50 lines or more yet only laid down two ML instructions, depending on all the conditions the macros watched for.

supercat wrote:Actually, the idea substantially predates the 6502. Many optimization techniques would result in code that was very brittle, and would thus need to be substantially reworked if requirements changed. If one were trying to write an assembly-language program to process a two-dimensional array, one would likely have to either write code that would only work for a power-of-two array size, or write less efficient code that would work with any size. If one were instead writing the code in FORTRAN, however, one wouldn't have to change anything other than the array dimension to make the compiler automatically generate whatever code was appropriate for the specified size. Hand-written assembly code that was specifically tailored for a particular array size might outperform the compiler-generated machine code for that size, but assembly code that needed to be easily maintainable could not outperform the machine code generated by a compiler that was not thus encumbered.

Garth wrote:I suspect the idea that compilers can outperform handwritten assembly comes from modern 32- and 64-bit multi-core high-end processors that have super-complex instructions, deep pipelines that must be kept full, multi-level cache, out-of-order execution, and things that simply don't apply to the 6502. I've never heard of a 6502 compiler that could remotely compete with hand assembly. Perhaps a great compiler could be written if there were the commercial interest to invest the necessary man-hours, but I still don't think it would produce code that could match, let alone exceed, the performance of code from any skilled 6502 assembly-language programmer.
Last edited by Garth on Tue Dec 03, 2019 9:33 pm, edited 1 time in total.
Re: "Modern compilers will outperform handwritten assembly."
A macro assembler that lets the programmer specify and handle all of the proper cases will generally be more complicated than a compiler that can simply generate the proper code for those cases, and even when using such tools the level of programmer effort required will be greater than would have been needed in a higher-level language.

Garth wrote:I think that's a very small part of programming though; and even then, you could write assembly macros that would generate the optimum code for the needed array size. If you decide to change the array size, there's no need to re-write the code that calls the macros. I've had macro definitions that were 50 lines or more yet only laid down two ML instructions, depending on all the conditions the macros watched for.
I gave array sizes as a simple example of something I know compilers could do even in the 1960s. Register usage and constant reuse would actually be even more significant issues, but I'm not sure what the state of the art was when the 6502 was introduced. If one writes something in FORTRAN equivalent to the C code:
Code:
#define SCALE1 2.5f
#define SCALE2 1.5f

double adjust(float x, float y)
{
    /* SCALE2+1.0f is folded to a single constant at compile time */
    return x*SCALE1 + y*(SCALE2+1.0f);
}
Code:
int array1[1000];
void someFunction(void);

int test(int n)
{
    for (int i=0; i<100; i++)
    {
        /* the += and -- cannot be merged away: array1 has external
           linkage, so someFunction() might read or write it */
        array1[i] += 1;
        someFunction();
        array1[i]--;
    }
    return n;
}
Re: "Modern compilers will outperform handwritten assembly."
"Modern compilers will outperform handwritten assembly." I think it depends on the program being written, on the compiler, on the computer (or VM) being targeted, and on the programmer. (In some cases, I know because I have tried it, that the handwritten assembly code will work better.)
(Free Hero Mesh - FOSS puzzle game engine)
Re: "Modern compilers will outperform handwritten assembly."
I think the cases where people even try to hand-write assembly, and to optimize it, are now so rare that basically no engineers are able to do that anymore. The cost is not just the code itself but the time it takes to hand-write highly performing assembly. In industry it's not worth paying an engineer to spend two weeks optimizing an assembly routine when you could just use sub-optimal code and better hardware instead. The hardware might be more expensive, but the shorter time to market ultimately makes that the winning solution.
In the case of retro consoles it's different, because the hardware is fixed and we have to push it to its limits.
Re: "Modern compilers will outperform handwritten assembly."
Even on modern microcontrollers, there are many cases where spending a few hours on hand-written assembly may offer major performance improvements. Most of the improvements achievable within a couple weeks wouldn't take more than a few hours to achieve. A bigger issue is determining what dialect of C makes the most sense, and ensuring that one's compiler will actually process it without jumping the rails.

Bregalad wrote:I think the cases where people even try to hand-write assembly, and to optimize it, are now so rare that basically no engineers are able to do that anymore. The performance gain is not just the code but the time it takes to actually hand-write highly performing assembly code. In the industry it's not worth paying an engineer to take two weeks optimizing his assembly routine when you could just use sub-optimal code and better hardware instead. The hardware might be more expensive but the lower time to market finally makes this solution the most performing.
For example, if one needs a function that takes a pointer to a 32-bit-aligned byte address and interprets the four bytes there as a big-endian 32-bit unsigned int, such a function might be written a couple of ways for, e.g., the Cortex-M0:
Code:
#include <stdint.h>

uint32_t fetch32le_v1(void *p)
{
    uint8_t *pp = p;
    /* cast before shifting so pp[0]<<24 can't overflow a signed int */
    return ((uint32_t)pp[0]<<24) | ((uint32_t)pp[1]<<16) |
           ((uint32_t)pp[2]<<8) | pp[3];
}

uint32_t fetch32le_v2(void *p)
{
    /* relies on the 32-bit alignment guaranteed above */
    uint32_t result = *(uint32_t*)p;
    return (result<<24) | ((result<<8) & 0xFF0000) |
           ((result>>8) & 0xFF00) | (result>>24);
}
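For what it's worth, GCC and Clang also expose the byte swap directly as a builtin (a non-standard extension), and a memcpy-based variant sidesteps the aliasing and shift-promotion pitfalls of the versions above; recent compilers typically reduce it to a single load plus a REV/BSWAP instruction:

```c
#include <stdint.h>
#include <string.h>

/* Third variant: load the aligned word, then byte-swap with the
 * GCC/Clang builtin. Matches v1's result on a little-endian host. */
uint32_t fetch32le_v3(const void *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);       /* compilers fold this to one load */
    return __builtin_bswap32(w);   /* little-endian host assumed */
}
```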
Re: "Modern compilers will outperform handwritten assembly."
This seems more a problem with the C language being inappropriate for describing the program than with the compiler failing to optimize the assembly.
- rainwarrior
- Posts: 8732
- Joined: Sun Jan 22, 2012 12:03 pm
- Location: Canada
- Contact:
Re: "Modern compilers will outperform handwritten assembly."
Standard C maybe, but that's also why most compilers have extensions like __declspec(align) or __attribute__((aligned)). There are a lot of very practical ways to address domain-specific problems when you actually allow domain-specific solutions.

Bregalad wrote:This seems more a problem with C language being inapropriate to describe the program, rather than the compiler failing at optimizing the assembly.
Also, if you wait long enough, things that everybody implements anyway usually do eventually make it into the standard spec.
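For instance, a hedged GCC/Clang-flavored sketch of the alignment extension mentioned above (MSVC spells it __declspec(align(4)); C11 later standardized _Alignas):

```c
#include <stdint.h>

/* Force 4-byte alignment on a byte buffer so that word-at-a-time
 * code (like fetch32le_v2 earlier in the thread) can safely read it.
 * GCC/Clang spelling; non-standard but widely supported. */
static uint8_t packet[64] __attribute__((aligned(4)));

int packet_is_aligned(void)
{
    return ((uintptr_t)packet % 4) == 0;
}
```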