Posted by ingve 4 days ago
https://git.savannah.gnu.org/cgit/config.git/tree/
The `testsuite/` directory contains some data files with a fairly extensive list of known targets. The vendor field should be considered fully extensible, and new combinations of known machine/kernel/libc shouldn't be considered invalid, but anything else should have a patch submitted.
But that doesn't mean the article should be ignored in its entirety. LLVM's target triple parsing is more relevant for several projects (especially given that the GNU target triple scheme doesn't include native Windows, which is one of the most common targets in practice!). Part of the problem is that for many people "what is a target triple" is actually a lead-in to the question "what are the valid targets?", and trying to read config.guess is not a good vehicle to discover the answer. config.guess also isn't a good way to find out about target triples for systems that aren't designed for general-purpose computing, like if you're trying to compile for a GPU architecture, or even a weird x86 context like UEFI.
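For the narrow "is this triple valid?" question, the config.sub script in that same repo will canonicalize a triple for you, and recent clang builds can list the back-ends they were compiled with. A quick sketch (the output shown is typical; exact canonical forms and target lists vary by version and build):

    $ sh config.sub x86_64-linux
    x86_64-pc-linux-gnu
    $ sh config.sub amd64-freebsd
    x86_64-unknown-freebsd
    $ sh config.sub foo-bar-baz     # unrecognized machines fail with an error
    $ clang -print-targets          # lists the registered LLVM back-ends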
LLVM has a lot of problems, but it feels significantly more modern.
I do wish we had a "new LLVM" doing to LLVM what it did to gcc. Just because it's better doesn't mean it's perfect.
Basically, you can respect history while also being honest about the current state of things. But doing so requires you to care primarily about things like ease of use rather than things like licenses. Some people care about licenses first, usability second.
And as a bonus, it seems to me a nailed-down IR actually is that portable assembly language the C people keep telling us they wanted. Most of them don't actually want that and won't thank you - but if even 1% of the "I need a portable assembler" crowd actually did want a portable assembler, that's a large volume of customers from day one.
If you're ignoring the API and writing IR directly there are advantages to LLVM though.
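For instance, here's a minimal sketch of that workflow: hand-write a trivial function in textual IR, then lower the same file for two different triples (the file names and triples are arbitrary choices here):

    $ cat > add.ll <<'EOF'
    ; a trivial function in textual LLVM IR
    define i32 @add(i32 %a, i32 %b) {
      %sum = add i32 %a, %b
      ret i32 %sum
    }
    EOF
    $ llc -mtriple=aarch64-unknown-linux-gnu add.ll -o add.aarch64.s
    $ llc -mtriple=x86_64-unknown-linux-gnu add.ll -o add.x86_64.s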
Immature to the point of alienating.
Also, gcc is relatively old and comes with a lot of baggage. LLVM is sort of the de facto standard now, with improvements in performance.
Distributions, and therefore virtually all the software used by a distribution user, still generally use gcc. LLVM is only the de facto standard when doing something new, and for JIT.
From what I gathered, LLVM has a lot of C++ specific design choices in its IR language anyway. I think I'd count that as baggage.
I personally don't think one is better than the other. Sometimes clang produces faster code, sometimes gcc. I haven't really dealt with compiler bugs from either. They compile my projects at the same speed. Clang is better at certain analyses, gcc better at certain others.
> After all, you don’t want to be building your iPhone app on literal iPhone hardware.
iPhones are impressively powerful, but you wouldn't know it from the software lockdown that Apple holds on them.
Example: https://www.tomsguide.com/phones/iphones/iphone-16-is-actual...
There's a reason people were clamoring for Apple to make ARM laptops/desktops for years before Apple finally committed.
> A critical piece of history here is to understand the really stupid way in which GCC does cross compiling. Traditionally, each GCC binary would be built for one target triple. [...] Nobody with a brain does this ^2
You're doing GCC a great disservice by ignoring its storied and essential history. It's over 40 years old, and was created at a time when there were no free/libre compilers. Computers were small and slow. Of course you wouldn't bundle multiple targets in one distribution.
LLVM benefitted from a completely different architecture and starting from a blank slate when computers were already faster and much larger, and was heavily sponsored by a vendor that was innately interested in cross-compiling: Apple. (Guess where LLVM's creator worked for years and led the development tools team.)
Also, in this specific case, this ignores the history around LLVM offering itself up to the FSF. gcc could have benefitted from this fresh start too. But purely by accident, it did not.
On my system, "dnf repoquery --whatrequires cross-gcc-common" lists 26 gcc-*-linux-gnu packages (that is, kernel / firmware cross compilers for 26 architectures). The command "dnf repoquery --whatrequires cross-binutils-common" lists 31 binutils-*-linux-gnu packages.
The author writes, "LLVM and all cross compilers that follow it instead put all of the backends in one binary". Do those compilers support 25+ back-ends? And if they do, is it good design to install back-ends for (say) 23 such target architectures that you're never going to cross-compile for, in practice? Does that benefit the user?
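For the first question: you can ask an LLVM binary directly which back-ends it was built with. A sketch (abbreviated; a typical distro build registers dozens of targets, but the exact set depends on how it was configured):

    $ llc --version
      ...
      Registered Targets:
        aarch64    - AArch64 (little endian)
        amdgcn     - AMD GCN GPUs
        ...
        x86        - 32-bit X86: Pentium-Pro and above
        x86-64     - 64-bit X86: EM64T and AMD64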
My impression is that the author does not understand the modularity of gcc cross compilers / packages because he's unaware of (or doesn't care about) the scale that gcc aims at.
    rustc --print target-list | wc -l
    287
I'm kinda surprised at how large that is, actually. But yeah, I don't mind having the capability to cross-compile to x86_64-wrs-vxworks even though I'm never going to use it.

I am not an expert on all of these details in clang specifically, but with rustc, we take advantage of llvm's target specifications, so that you can even configure a backend the compiler doesn't yet know about by simply giving it a JSON file with a description: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_target...
While these built-in ones aren't defined as JSON, you can ask the compiler to print one for you:
    rustc +nightly -Z unstable-options --target=x86_64-unknown-linux-gnu --print target-spec-json
It's lengthy, so instead of pasting it here, I've put it in a gist: https://gist.github.com/steveklabnik/a25cdefda1aef25d7b40df3...

Anyway, it is true that gcc supports more targets than llvm, at least in theory. https://blog.yossarian.net/2021/02/28/Weird-architectures-we...
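To give a flavor of what goes in such a spec, here's a minimal sketch of a hypothetical bare-metal target file (the field names come from the rustc target-spec format, but the file name and the values, especially the data-layout string, are illustrative rather than copy-pasteable):

    $ cat > my-target.json <<'EOF'
    {
      "arch": "x86_64",
      "os": "none",
      "llvm-target": "x86_64-unknown-none",
      "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n8:16:32:64-S128",
      "target-pointer-width": "64",
      "target-endian": "little",
      "panic-strategy": "abort",
      "disable-redzone": true
    }
    EOF
    # custom specs generally need nightly plus building core yourself
    $ cargo +nightly build -Z build-std=core --target ./my-target.json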
I vaguely recall the FSF (or maybe only Stallman) arguing against the modular nature of LLVM because a monolithic structure (like GCC's) makes it harder for anti-GPL actors (Apple!) to undermine it. Was this related?
Chris Lattner offered to donate the copyright of LLVM to the FSF at one point: https://gcc.gnu.org/legacy-ml/gcc/2005-11/msg00888.html
He even wrote some patches: https://gcc.gnu.org/legacy-ml/gcc/2005-11/msg01112.html
However, due to Stallman's... idiosyncratic email setup, he missed this: https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00...
> I am stunned to see that we had this offer.
> Now, based on hindsight, I wish we had accepted it.
Note this email is in 2015, ten years after the initial one.
and innately disinterested in Free Software, too
At work we had some benchmarking suites that ran on physical devices, and even with significant effort put into cooling them, they spent more time sleeping to cool off than actually running the benchmarks.
So, it turns out, a lot of people actually do call it x64 — including the author's own friends! — it's just that the author dislikes it. Disliking something is fine, but why claim something you know first-hand to be false?
Also, the actual proper name for this ISA is, of course, EM64T. /s
> The fourth entry of the triple (and I repeat myself, yes, it’s still a triple)
Is there any actual justification beyond bald assertions of personal preference? Just call it a "tuple", or something...
If someone said "old compilers were usually cross-compilers", that would be an ahistoric statement (somewhat).
If someone used clang in a movie set in the 90s, that would be anachronistic.
Oh, sure, there have been plenty of native-host-only compilers. It was never a property of all compilers, though. Most system bring-ups, from the mainframes of the 1960s through the minis of the 1970s to the micros and embedded systems of the 1980s and onwards, have required cross compilers.
I think what he means is that a single-target toolchain is an anachronism. That's also not true, since even clang doesn't target everything under the sun in one binary. A toolchain needs far more than a compiler, for a start: it needs the headers and libraries, and it needs a linker. To go from source to executable (or herd of dynamic shared objects) requires a whole lot more than installing the clang (or whatever front-end) binary and choosing a nifty target triple. Most builds of clang don't even support all the interesting target triples, and you need to build it yourself, which requires a lot more computer than I can afford.
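As a concrete illustration of how much beyond the triple you have to supply (a sketch; the triple and sysroot path are placeholders for whatever target you actually have a sysroot for):

    # the triple alone isn't enough: you also point clang at target
    # headers and libraries (the sysroot) and a target-aware linker
    $ clang --target=aarch64-unknown-linux-gnu \
            --sysroot=/opt/sysroots/aarch64-linux-gnu \
            -fuse-ld=lld \
            hello.c -o hello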
Target triples are not even something limited to toolchains. I maintain software that gets cross-built to all kinds of targets all the time, and that requires target triples for the same reasons compilers do. Target triples are just a basic tool of the trade if you deal with anything other than scripting the browser, and they're a solved problem rediscovered every now and then by people who haven't studied their history.
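The classic example outside a compiler proper is an autotools-style build, where the same triples name the build and host systems (a sketch; the triples here are just common examples):

    # build on x86_64 Linux, produce binaries for 32-bit ARM Linux
    $ ./configure --build=x86_64-pc-linux-gnu --host=arm-linux-gnueabihf
    $ make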
And ironically in all of this, building a full toolchain based on GCC is still easier than with LLVM.