GBDK2020 - Is it smart trying to outsmart the compiler ?

Discussion of programming and development for the original Game Boy and Game Boy Color.
M2m
Posts: 24
Joined: Mon Feb 15, 2021 12:53 pm

GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by M2m »

I am honestly still quite new to GB dev (I do it as a hobby after work, as time permits). Anyway, I have a question regarding multiplication.

In assembler, a classic multiplication routine might be implemented something like this:

Code: Select all

;--------------- 
; multiply
; HL = D * E
;---------------
multiply:
	; the following should do HL = D * E (bc will be destroyed)
	ld hl, 0	; clear hl
	ld a, d	; load d in for multiply
	
	or a	; same as cp 0
	ret z	; if 0, we are done
	
	ld b, 0 ; setup bc for looped addition
	ld c, e

multiply_loop:
	add hl, bc	; add our value onto hl
	dec a		; and loop until a (which is d) is zero
	jr nz, multiply_loop

	ret
But notice that, for example, 3*13 takes 3 loop iterations (adding 13 three times), while 13*3 would consequently take 13 iterations (adding 3 thirteen times).
So it would most likely be 'smart' to compare multiplicand and multiplier and swap them accordingly, i.e. the multiplicand should always be the smaller one, to minimize the number of iterations.

Now, regarding GBDK2020: would the compiler take care of this? Or would it be better to implement it manually?

Code: Select all

if (a >= b) { c = b * a; }
else        { c = a * b; }
Does it even make a difference?

Thoughts?
Last edited by M2m on Thu Apr 08, 2021 8:25 pm, edited 1 time in total.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by lidnariq »

Do you know that SDCC is doing the slower thing?
M2m
Posts: 24
Joined: Mon Feb 15, 2021 12:53 pm

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by M2m »

lidnariq wrote: Thu Apr 08, 2021 8:15 pm Do you know that SDCC is doing the slower thing?
No, I do not know what SDCC is doing, and I am not skilled enough yet to debug/disassemble the SDCC-generated code. That's exactly why I am asking :D
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by lidnariq »

If you don't know that the compiler is doing the slow thing, then it's definitely premature to try to "help" it.

This is true for all optimizations ever, honestly. If you don't know what's taking up time, your assumption will probably be wrong and will make things worse instead.
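
One low-tech way to find out on the Game Boy is to read the .asm files SDCC generates, or to time the code with the hardware timer. Here is a rough GBDK-flavored sketch of the latter (register names as in GBDK's <gb/hardware.h>; treat this as an assumption-laden starting point, not a recipe):

Code: Select all

#include <gb/gb.h>      /* GBDK: pulls in the hardware register definitions */
#include <stdint.h>

/* volatile inputs so the compiler can't constant-fold the multiply */
volatile uint8_t ma = 13, mb = 3;

/* Crude measurement: start the timer at 262144 Hz (1 tick = 4
   machine cycles), run the code under test, read the elapsed ticks.
   Assumes no interrupt fires in between. */
uint8_t time_multiply(void)
{
    volatile uint16_t c;        /* volatile so the result isn't dropped */
    uint8_t start, elapsed;

    TAC_REG = 0x05;             /* timer on, 262144 Hz input clock */
    TIMA_REG = 0;

    start = TIMA_REG;
    c = (uint16_t)ma * mb;      /* the code under test */
    elapsed = (uint8_t)(TIMA_REG - start);

    (void)c;
    return elapsed;             /* multiply ticks by 4 for machine cycles */
}
In practice you would wrap the code under test in a loop, so the tick count grows large enough to be meaningful.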
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by tokumaru »

Most multiplication algorithms actually in use don't loop through the units of one of the factors; they loop through the bits of that factor, which means the number of iterations would be the same regardless of the order of the factors. What affects the time taken is the number of bits that are set (i.e. looping through 128 would be faster than looping through 15, because the former has only 1 bit set while the latter has 4), since each set bit requires an addition to be performed. It's also possible that the multiplications are table-based and take a constant amount of time to execute regardless of the values.
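
For illustration, here is a minimal C sketch of such a shift-and-add multiply (how a compiler's helper routine might work in principle; not necessarily what SDCC actually emits):

Code: Select all

#include <stdint.h>

/* Shift-and-add multiply: iterates over the bits of b.
   The loop runs at most 8 times for an 8-bit factor, and
   only the set bits of b cost an extra addition. */
uint16_t mul8(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;
    uint16_t addend = a;
    while (b) {
        if (b & 1)          /* set bit: add the shifted multiplicand */
            acc += addend;
        addend <<= 1;       /* each bit position is worth twice as much */
        b >>= 1;
    }
    return acc;             /* e.g. mul8(13, 3) == mul8(3, 13) == 39 */
}
So mul8(x, 128) performs a single addition while mul8(x, 15) performs four, independent of argument order.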

Anyway, lidnariq is 100% right: it's pointless to try to optimize something that might not even be doing what you think it is. Either trust the tools and address the problems as (or if) they arise, or look under the hood to find out what the tools are doing and whether you have a problem with that.
Shonumi
Posts: 342
Joined: Sun Jan 26, 2014 9:31 am

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by Shonumi »

Kinda reminds me of a saying I once heard. Something to the effect that if you can generate code by hand better than your compiler, you should probably be programming a better compiler.

Premature optimization is something that's best avoided. Focus on making your code correct, first and foremost, then iron out speed issues later on. I don't know exactly why you need multiplication (if I had to guess, you're using it for 2.5D math or a similar project), but it might ultimately not prove to be a bottleneck, or whatever slowdown does occur may be insignificant or easily compensated for elsewhere. My advice in this situation regarding compiler vs hand-coded assembly is to wait until you encounter a specific problem before reaching for a specific solution.
coto
Posts: 102
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by coto »

Usually you need a profiler tool and a unit test, both in hand, to test behavior. Otherwise it's a guessing game.

A unit test checks that the code's behavior matches 100% of what it's intended to do, as opposed to being erratic:
https://bitbucket.org/Coto88/toolchaing ... rc/master/


A profiler would compare how many instructions were generated and the cycle cost of each opcode, with a timer to measure each unit test. That's why you need both.
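
To make the unit-test half concrete, a minimal exhaustive test for an 8-bit multiply routine could look like this (my_mul and test_my_mul are made-up names; the reference result comes from the compiler's own multiply):

Code: Select all

#include <stdint.h>

/* hypothetical hand-written multiply routine under test */
extern uint16_t my_mul(uint8_t a, uint8_t b);

/* Exhaustively check all 65536 input pairs against the
   compiler's own multiply; returns the number of mismatches. */
uint16_t test_my_mul(void)
{
    uint16_t failures = 0;
    uint8_t a = 0;
    do {
        uint8_t b = 0;
        do {
            if (my_mul(a, b) != (uint16_t)(a * b))
                failures++;
        } while (++b != 0);
    } while (++a != 0);
    return failures;
}
Time the whole run with the profiler half and the same harness gives you both a correctness check and a speed number.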

ROM code generated by official tools is expected to behave better than its open-source counterparts, like the GCC compilers. This means custom compilers will have many more bugs, since it isn't a paid environment and the original engineering team isn't involved in the specs (considering the original unit tests).
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by calima »

coto wrote: Fri Apr 09, 2021 9:06 pm ROM code generated by official tools are expected to behave better than open source counter parts, like GCC compilers.
In my experience, IBM, Nintendo, AMD and Nvidia compilers suck in comparison to gcc.
coto
Posts: 102
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: GBDK2020 - Is it smart trying to outsmart the compiler ?

Post by coto »

calima wrote: Fri Apr 09, 2021 11:39 pm
coto wrote: Fri Apr 09, 2021 9:06 pm ROM code generated by official tools are expected to behave better than open source counter parts, like GCC compilers.
In my experience, IBM, Nintendo, AMD and Nvidia compilers suck in comparison to gcc.
From what I can tell, customer environments use either custom or paid compilers, which are guaranteed to work better. On ARM, GCC emits buggy relocation code (see here: http://forums.nesdev.com/viewtopic.php?f=23&t=18659). Since I switched to Clang, the binaries it produces are much better (albeit ~20% slower). This translates to having usable binaries rather than undefined-behavior binaries. With GCC, a test case I literally had to go through every time I added interrupt code to the ARM9 interrupt handler was to create a power-of-two buffer (32K, 64K, 128K) so the linked code would actually boot, and then cross my fingers hoping the code wouldn't explode.

On the Nintendo DS, CodeWarrior didn't use any Nintendo in-house compiler, but RVCT (ARM's), and it produces excellent code since it's a paid product. Some homebrew, like Moonshell, uses that compiler, and the code just works (you just need to read the Moonshell2 source code: the hand-made inline ARM code embedded in the C code requires a few passes over the translation units to understand, and it links not only the objects but also handles the processor registers carefully so as not to break anything)... for both ARMv4t and ARMv5te code.

GCC is known to have decreased code quality (because of either LTO or optimized codepaths):
- https://pagure.io/fesco/issue/2020

- https://opensource.apple.com/source/cla ... rison.html

- https://lwn.net/Articles/582697/
I haven't touched the LLVM code base in years, but way back when, I tried to use it as an embedded JIT. (And I wound up writing the first version of llvm-config, so it could be easily linked as a JIT.)
Along the way, I tripped over a compiler bug. I can't remember the details now, but I wound up diving into the LLVM C++ code to fix it. And the code was *nice*: there's one major intermediate representation, based on a strongly-typed assembly language with a very small number of operations. I was able to find my way around the code base and make my patch without unnecessary pain. Even better, I could dump the intermediate representation as an LLVM "assembly" file at any step of the compilation process, which made debugging very civilized.
So I'm not surprised that LLVM has since gone on to be quite successful. The architecture was always very approachable and very pleasant to hack on. At least at the time, that was not really the case with GCC.
Many people would rather put their code under GPL than have to interface to an unstable API. If the new "modular" GCC can provide useful, *stable* API's that are developed to meet a broad class of use cases, it could gain significant mindshare.
Full disclosure: I'm an LLVM developer. Hopefully this will help clarify the sense in which LLVM is useful as a library:
LLVM does have stable C APIs, but doing anything nontrivial with them usually requires exposing something else from C++-land, and so you end up having to get involved with upstream.
One thing to keep in mind is that LLVM's C++ APIs "simply expose LLVM's internals". LLVM is "library-based" if you define "library" as "a well-factored body of code that does something". However, if you include any notion of compatibility or attention to external use cases in your notion of a library, then LLVM doesn't really qualify.
LLVM is like a bunch of libraries that evolve in lock-step. I sometimes describe it as "LLVM is loosely coupled from a software architecture standpoint, but very tightly coupled from a development standpoint".
Also, LLVM is developed with very little attention to external use cases. Most software libraries exist solely to serve external use cases. What is exposed as stable in LLVM's C API is basically "the minimal amount of stuff we can expose right now to meet the use cases of the users that have come barking at us". The C++ APIs are literally the internal include/ dir. LLVM has no "public" include/ dir with its "public headers": all the headers in include/ (except for the C API) are evolved in lock-step with the goal of cleaner, better code, and no attention to backwards compatibility.
I'm not saying that this is a good or bad thing (actually, as I mentioned in another thread, this is a very strong disincentive for keeping private branches, which is a really good thing IMO). Personally, as an LLVM developer, I really like the freedom of not having to worry about backwards compatibility 99.9% of the time, even though I'm modifying ostensibly public interfaces. However, it's a headache for users.
Personally, I would not want GCC to go that same route. Or at least, I would like to see GCC learn from this aspect of LLVM and not end up in exactly the same situation. I have no clue what to do differently though.

I should also mention that LLVM does have a few important points of compatibility besides just the C API:
- The textual and bitcode format of LLVM IR, and its semantics.
- The syntax and semantics of the code that Clang translates (as you would expect from any production-quality compiler)
- The commandline options accepted by Clang (as you would expect from any production-quality compiler)
(Hell yes to the last paragraph; I can relate to that, since I took the same route developing ToolchainGenericDS, and the emitted code acts as a lockstep codebase (newlib + tgds librarian format + tgds project). This allows adding features without breaking older code; otherwise there'd be bugs in the older code itself.)



https://alibabatech.medium.com/gcc-vs-c ... 9ede2be378
From the benchmarking tests above, we can see that Clang offers more advantages for the construction of large projects, while GCC is always advantageous in performance optimization. Which one is better depends on your specific application.
In addition to performance comparison, I would like to share the advantages and disadvantages of GCC and Clang and LLVM:



Anyway, to get to the point: yes, GCC helps to emit good binaries as long as you run a set of unit tests; otherwise, be prepared to waste years maintaining broken crap. That's as opposed to Clang's steadily good emitted code quality, which DOES resemble a paid compiler's. It's just a matter of doing a side-by-side comparison of official ROM code against Clang-emitted code; you'll see cleaner code. Now do the same against GCC... yuck.

As a side note, Clang tends to break (and cause code section errors) when using nested preprocessor code (like using preprocessor defines that access a struct member at least 5 times, then using them inside a switch statement), while in GCC it compiles right away.
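
A hypothetical reconstruction of that kind of pattern, just to illustrate what's meant (the names here are made up; whether it actually trips Clang would depend on the exact toolchain and target):

Code: Select all

#include <stdint.h>

struct status {
    uint8_t mode;
    uint8_t flags;
};

/* Nested preprocessor defines that expand to struct member
   accesses; STATE(s) ends up touching the members 5 times. */
#define MODE(s)   ((s)->mode)
#define FLAGS(s)  ((s)->flags)
#define KEY(s)    ((MODE(s) & 0x0F) | (FLAGS(s) & 0xF0))
#define STATE(s)  (KEY(s) ^ (MODE(s) >> 4) ^ (FLAGS(s) >> 1) ^ (MODE(s) << 2))

uint8_t classify(struct status *s)
{
    /* the nested expansion is consumed inside a switch */
    switch ((uint8_t)STATE(s)) {
        case 0x00: return 1;
        case 0xFF: return 2;
        default:   return 0;
    }
}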