Posted by Qadriq 2 days ago
I wonder how true this is.
Haven't used Feldera, but with other Rust stuff I have: if I run it as debug it has serious performance problems. However, for testing I have it compile a few crates that do real work, like `image`, as optimized (and the vast majority as debug), and that is enough to make the performance issues not noticeable. So if the multi-crate split hadn't worked, possibly just compile only some of the stuff as optimized.
Edit: grammar
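For reference, a minimal sketch of how that mixed setup can be expressed with Cargo profile overrides (assuming the heavy dependency is the `image` crate; pick whichever crates actually matter for you):

```toml
# Cargo.toml: keep the dev profile unoptimized overall...
[profile.dev]
opt-level = 0

# ...but compile selected heavy dependencies optimized even in dev/test builds.
[profile.dev.package.image]
opt-level = 3
```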
It becomes more evident when you consider the amount of work it is doing as well.
Rust is fast in theory, but if in practice they can't even get their compiler to squeeze any juice from the CPU, then what's the value of that language from a software engineering viewpoint?
I don't agree. In a large project there is going to be a lot of stuff that can be compiled in parallel without problems.
I also think it's kind of why people get confused by Rust's module system; they assume that it works like the module system of whatever language they're coming from. But they all work differently!
Hopefully, you can see why this reasoning is a problem. The main stumbling point being compilation speed != runtime speed.
Because the C and C++ communities, for historical reasons, embrace binary libraries and binary component frameworks like COM, so while in theory a full build from scratch takes similar time to Rust, in practice that isn't the case.
Also note that D, a language as complex as C++, with three compilers, one of them based on LLVM, is largely faster to compile than Rust even when using the LLVM backend, because Walter Bright made the right decisions on what to focus on for the development workflow.
except the company had a straightforward solution to the problem
and it's not always possible in C++ land either
and COM isn't part of C/C++ but a Microsoft-specific extension which solves very different issues, as it's for cross-application communication, while here we have compile-time perf issues inside a single library
> binary libraries
I'm not sure if you mean dynamic linking or binary objects, but here's the thing:
- dynamic linking isn't the issue here, as the issue has nothing to do with re-compilation and similar (where dynamic linking can help),
- binary object files, on the other hand, are also something Rust has and uses, it just sets the boundaries in different places (crate instead of file), which makes development easier, can lead to better runtime performance, etc. It just has the drawback that you sometimes have to split things into multiple crates.
> a language as complex as
how complex a language is to write has not too much to do with how complex it is to split a single code unit into multiple parts for parallel compilation. C, C++ and D mainly side-step this by making each file a compilation unit, while in Rust it's each crate. But that isn't fundamentally better or worse, it's trade-offs.
> because Walter Bright did the right decisions on what to focus for development workflow.
and so did rust, just with different priorities and trade offs
and given that D is mostly irrelevant and Rust increasingly more successful, maybe it was pursuing the more important priorities
Also the OP case is about release builds, i.e. not the normal dev loop
nor does the splitting affect the dev experience, as it's all auto-generated code
and in projects which aren't auto-generated it's very normal to split out libraries etc. anyway, and whether you split them into their own module, file or crate doesn't matter too much as long as you keep the code clean (as in not having super entangled spaghetti code)
and I wouldn't even be sure whether D performs in any relevant way better in compile time if you compare its performance with the end result after they split the crate
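(As a side note, splitting into crates is usually just a workspace with path dependencies; a minimal sketch, with made-up crate names:)

```toml
# Workspace root Cargo.toml: every member is its own compilation unit,
# so independent members can be compiled and cached in parallel.
[workspace]
members = ["pipeline-core", "pipeline-codegen", "pipeline-api"]

# pipeline-api/Cargo.toml then pulls in its siblings by path:
# [dependencies]
# pipeline-core = { path = "../pipeline-core" }
# pipeline-codegen = { path = "../pipeline-codegen" }
```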
Turns out that it is mostly used on Windows since the Vista days, as a means to have an OOP-based OS, with most libraries written in C++, and having a stable ABI for such components.
So while it isn't ISO C++, it is mostly used by and from C++. .NET land usually only reaches for COM when using Windows APIs.
Binary libraries mean binary libraries, doesn't matter if statically or dynamically linked.
My point is that the Rust ecosystem does not use them; you always need to compile the complete dependency tree from source after a git clone, some of it even multiple times due to different feature flag configurations.
Not so with most commercial C and C++ development, we enjoy having binary libraries for dependencies; after a git clone only the main code needs to be compiled from scratch.
Yes, there are ways to kind of do this with sccache, but that is additional tooling, not something that apparently cargo will ever support.
Also if you watch recent talks from Microsoft regarding their Rust adoption, the lack of tooling support for binary libraries distribution is one of their pain points.
mainly the concepts of
- how build steps are cached in one instance of the project
- how build steps are cached across projects of common dependencies
- linking to system libraries
- bundling dependencies
--
Let's first look at it from the POV of system dependencies vs. bundled dependencies:
For each dependency you either link it as a system dependency (because you link against the one in your system and require systems to have it) or you bundle it; doesn't matter if it's C or Rust. The problem with system dependencies is that they don't just need API compatibility but also ABI compatibility, and not everything with ABI compatibility is actually API compatible. Which is all nice and fun, except a ton of highly useful (some would say required) things do not work well (or at all) if you need ABI compatibility, and there are tons of potential security bugs from libraries that seem API compatible but aren't.
In general, history has shown that making most dependencies system dependencies is a complete shit show not worth anyone's time and money, especially if people start mixing versions which seem compatible but aren't, leading to strange runtime bugs which aren't possible with supported builds but are anyway somehow your fault as the library maintainer.
Which is why the huge majority of the software industry _gave up on them_ for anything where they aren't strictly needed.
Rust can produce and use system dependencies, using a C API, and in some less officially supported ways also using rlibs (i.e. the binary libraries Rust produces when compiling a crate, so yes, it does use binary libraries).
But mostly it's not worth bothering with it, in the same way the majority of the rest of the software ecosystem stopped doing it.
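A minimal sketch of the C-API route (crate and function names made up; build it with `crate-type = ["cdylib"]` or `["staticlib"]` in Cargo.toml):

```rust
// lib.rs: exported with an unmangled symbol name and the C calling
// convention, so the boundary is the stable C ABI rather than the
// unstable Rust ABI.
#[no_mangle]
pub extern "C" fn ffi_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}
```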
--
Then let's look at reusing builds i.e. caching.
By default Rust does that, but only at the scope of the project. I.e. you build a project, change something, and then rebuild it, and dependencies won't be built again (except if you need to rebuild them, I come back to that later). To be clear, this is _not_ incremental building, which is a feature to re-use build parts at a more granular level than crates.
If you want it to cache things across projects or with some company build server you can do so using 3rd party software, i.e. same situation as with C.
> most commercial C and C++ development
Committing binary build artifacts to a source code repo is a huge anti-pattern and a terrible way to have distributed build caches. Stuff like that can easily make your company fail security reviews or be classified as having acted negligently if sued for damages (e.g. caused by a virus sneaked into your program).
Also please _never ever_ check out a 3rd party open source project with any pre-built binary artifacts in it, it's a huge security threat.
So in C/C++ you should also use the additional tools.
> Rust ecosystem does not use them
as mentioned, Rust produces rlibs, which are binary libraries (or you could say binary libraries bundled with metadata, and stuff which is roughly like how C++ templates are handled wrt. binary libraries)
And yes, the tooling for shipping pre-built rlibs could be better, and it probably will get better. It's not that it can't be done, it's just that priorities have been elsewhere so far.
> even multiple times due to different feature flags configurations.
Features are strictly additive, so no, that won't happen.
The only reason for a package being built multiple times is different incompatible versions of it (which from Rust's POV are two different dependencies altogether). And while that initially seems kinda dumb (unnecessary binary size/build time), I can't overstate how much of a blessing this turned out to be.
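To make "additive" concrete, a rough sketch: if two crates in the same build ask for different features of the same version of a dependency, Cargo compiles that dependency once with the union of the features, not once per dependent.

```toml
# crate_a/Cargo.toml
[dependencies]
serde = { version = "1", features = ["derive"] }

# crate_b/Cargo.toml
[dependencies]
serde = { version = "1", features = ["rc"] }

# In one build, serde 1.x is compiled a single time with
# features = ["derive", "rc"], i.e. the union of what everyone asked for.
```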
> not something that apparently cargo will ever support.
yes and make doesn't support distributed build caches without including 3rd party tools either. But it doesn't matter as long as you can just pull in the 3rd party tools if you need them.
EDIT: Rust features are like using #if and similar in the C/C++ pre-processor, i.e. if they change you have to rebuild, like in C/C++. Also, even without that, a crate might have been only partially compiled before (e.g. only 1 of 3 functions), so if you start using the other parts they still need to be compiled, which will look a lot like recompilation (and without the incremental build feature it might be a rebuild).
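Roughly, the analogy looks like this (a sketch with a made-up `metrics` feature):

```rust
// Only compiled when the crate is built with `--features metrics`;
// toggling the feature changes what rustc sees, so the crate gets
// rebuilt, much like changing a -D define forces recompilation in C/C++.
#[cfg(feature = "metrics")]
pub fn record_metric(name: &str, value: f64) {
    println!("{name} = {value}");
}

#[cfg(not(feature = "metrics"))]
pub fn record_metric(_name: &str, _value: f64) {
    // no-op when the feature is disabled
}
```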
it _has_ issues there, which can be improved on, people are working on it, just kinda slowly
but they aren't fundamental ones, and aren't because rust doesn't "embrace binary libraries"
the only thing AFAIK Rust misses out on is a "Rust ABI/API" suited to becoming a new system library binary interface. I.e. it doesn't provide what Swift provides for Apple. But it can implement many of the existing system library binary interfaces reasonably fine (mainly the C ABI, COM, etc.).
While D is complex, it's a different beast than Rust. I suspect various checks, from lifetimes to trait resolution, might make it more complex to parallelize than C++.
Sure it does. Or at least could, depending on what you mean by that term. See, e.g., GCC's -flto-partition option, which supports various partitioning strategies for parallel LTO codegen.
That said, it is a more fine grained parallelism, for sure. Rust does LTO as well as codegen-units.
Really, as you gesture towards, on some level, this is all semantics: our linkers are also basically compilers too, at this point.
The main point being, Rust focuses on the wrong type of optimizations.
Rust focuses on a different set of optimizations. I'm still working out when I prefer the Rust set or the C++ set or the Python set. I want to love Rust, but when doing exploratory work with unfamiliar APIs, the slow recompile loop and need to get everything correct for each incremental experimental build I do to try to figure out how something works is pretty painful. I don't know how much better it gets with familiarity. Rust is very nice when I fully understand the problem I'm solving and the environment I'm working in, and I vastly prefer it to C++ for that. But I frequently dive into unfamiliar codebases to make modifications.
cargo has the `-j` flag and defaults it to #cpus (logical CPUs), so by default it is using what is most of the time the optimal choice there
And this will parallelize the compilation of all "jobs", roughly like with make, where a job is normally (oversimplified) compiling one code unit into one object file (.o).
And cargo does that too.
The problem is that Rust and C/C++ (and I think D) set the code unit boundaries in different places.
In rust it's per crate. In C/C++ it's (oversimplified!!) per .h+.c file pair.
This has drawbacks and benefits. But one drawback is that it parallelizes less well. Hence Rust internally splits one "semantic code unit" into multiple internal codegen units passed to LLVM. So this is an additional level of parallelism on top of the -j flag.
In general this works fine, and when people speak about Rust builds being slow it is very rarely related to this aspect. But it puts a limit on how much code you want in a single crate, which people sometimes overlook.
But in the OP article they did run into it, due to placing like idk 100k lines of code (with proc macros maybe _way_ more than that) into a single crate. And then also running into a bug where this internal parallelization somehow failed.
Basically, imagine 100k+ lines of code in a single .cpp file; passing `-j` to the build will not help ;)
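For reference, the knob in question is per-profile in Cargo.toml; the numbers below are the documented defaults, written out explicitly as a sketch:

```toml
# How many pieces rustc may split each crate into for parallel LLVM
# codegen. More units = more parallelism, but fewer cross-unit
# optimizations in the generated code.
[profile.release]
codegen-units = 16    # default for release builds

[profile.dev]
codegen-units = 256   # default for debug builds
```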
I think one important takeaway is that it could make sense to create awareness about this by emitting a warning if your crate becomes way too big, with a link to an in-depth explanation. Though in practice most projects either aren't affected or split into crates way earlier for various reasons (which sometimes include build time, but related to caching and incremental rebuilds, not fully clean debug builds).
And the same has happened in C and C++ land, albeit in the opposite direction, where multiple compilation units can be optimized together, i.e. LTO. See, e.g., GCC's -flto-partition option for selecting strategies for partitioning symbols for LTO.
Also note that you can manually partition LTO in your Makefile by grouping compilation units into object files to be individually LTO'd.
you also have that in Rust, to link different crates (and the internal splits for parallel compilation) together
Rust/Cargo does this automagically, except the only control you have are the crate and module boundaries. The analogous approach for C is to (optionally) manually group compilation units into a smaller set of object files in your Makefile, LTO'ing each object file in parallel (make -j), and then (optionally) telling the compiler to partition and parallelize a second time on the backend. Which is what Rust does, basically, IIUC--a crate is nominally the LTO codegen unit, except to speed up compilation Rust has heuristics for partitioning crates internally for parallel LTO.
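The Rust-side knobs for that trade-off live in the Cargo profile; a sketch of the options (defaults as per the Cargo docs):

```toml
[profile.release]
# lto = false is the default: "thin local LTO" across the codegen units
# of each crate only.
lto = "thin"    # ThinLTO across the whole crate graph, still parallel
# lto = "fat"   # classic whole-program LTO, mostly single-threaded
```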
codegen in the OP article is machine code generation (i.e. running LLVM)
It's semantically kinda like having a single 100k-line file, but because Rust knows it often produces huge "files" there is a splitting step, somewhere between parsing the AST and generating machine code (I think after generating MIR, but not fully sure). And the codegen-units setting is how many parts Rust is allowed to split a thing which semantically is just one code unit into. By default for release builds it's 16 (and as it can affect the performance of the generated code, it's not based on #cpus). But in their case there seems to be a bug which makes it effectively more like 1! Which is much worse than it should be. (But also the statistics they show aren't sufficient to draw too many conclusions.)
- Compilation is an inherently hard thing to parallelize, and not just hard to parallelize, but there are a lot of trade-offs. These trade-offs aren't even Rust-specific (i.e. C, C++, etc. are affected just as much) and can lead to less performant generated binaries, and to much more memory pressure, which could make the compilation slower in total. Luckily for most code this doesn't matter as long as you don't go too parallel. But it's the reason why the default for codegen units is 16, not #num_cpus (for release builds; 256 for debug builds).
- Codegen means producing machine code, i.e. we are speaking about LLVM, so the bug might not be Rust-specific and might affect other languages, too.
- This still should mean 16 threads at high load, not 1, so they seem to have hit a performance bug, i.e. not how things normally work. It's very unclear if the bug is in Rust or LLVM; if it's the latter, C/C++ might be affected, too.
- While Rust does strongly prefer you to split your code into multiple crates, it still shouldn't be stuck at a single thread, i.e. from everything we can tell they are hitting some performance bug.
- Algorithms most of the time dominate performance much more than whether your language is slightly faster or slower. This is clearly an algorithmic error, i.e. failing to parallelize where it normally does parallelize, and the issue might be in C/C++ code, so "Rust is fast" has pretty much nothing to do with this.
- Though it should still be mentioned that for certain design reasons Rust doesn't want you to make a single crate too large. In most normal situations you end up splitting a crate for various reasons before it becomes too large, but if you have a huge blob of auto-generated code that is easy to miss. Funnily, what can lead to compile-time issues if your crate is way too big also "in general" leads to better performance, which brings us back to a lot of decisions in compilers having trade-offs.
> then what's the value of that language from a software engineering viewpoint?
You mean besides producing fast code in a way which is much easier to maintain than C++ (or many other languages), while keeping many of the benefits C++ has over C when it comes to being able to reuse code and algorithms? Which practically makes it much easier to use better algorithms, which is often a much bigger performance gain than any low-level code optimization, something that has been shown repeatedly in practice. Not even speaking about the fact that it tends to have fewer bugs, makes it much easier to communicate interface constraints in a reliable, maintainable way, etc.
The argument that a single case of running into a compile-time performance bug, while already doing something which isn't exactly a normal use case and doesn't follow the common advice to split crates once they become large, somehow implies that Rust has no value for software engineers is just kinda dumb. I mean, you also wouldn't go around saying cats have no value from a family POV in general because of one specific case where a cat repeatedly scratched a teenager.