Posted by Qadriq 2 days ago
I wonder how true this is.
Haven't used Feldera, but with other Rust stuff I have: if I run it as debug it has serious performance problems. However, for testing I have it compile a few crates that do real work, like `image`, as optimized (and the vast majority as debug), and that is enough to make the performance issues not noticeable. So if the multi-crate split hadn't worked, possibly just compile only some of the stuff as optimized.
Edit: grammar
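For reference, a minimal sketch of how that mixed setup can be expressed with Cargo profile overrides (assuming the heavy dependency is the `image` crate; pick whichever crates actually matter for you):

```toml
# Cargo.toml: keep the dev profile unoptimized overall...
[profile.dev]
opt-level = 0

# ...but compile selected heavy dependencies optimized even in dev/test builds.
[profile.dev.package.image]
opt-level = 3
```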
It becomes more evident when you consider the amount of work it is doing as well.
Rust is fast in theory, but if in practice they can't even get their compiler to squeeze any juice from the CPU, then what's the value of that language from a software engineering viewpoint?
I don't agree. In a large project there is going to be a lot of stuff that can be compiled in parallel without problems.
I also think it's kind of why people get confused by Rust's module system; they assume that it works like the module system of whatever language they're coming from. But they all work differently!
Hopefully, you can see why this reasoning is a problem. The main stumbling point being compilation speed != runtime speed.
Because the C and C++ communities, for historical reasons, embrace binary libraries and binary component frameworks like COM, so while in theory a full build from scratch takes similar time to Rust, in practice that isn't the case.
Also note that D, a language as complex as C++, with three compilers, one of them based on LLVM, is largely faster to compile than Rust even when using the LLVM backend, because Walter Bright made the right decisions on what to focus on for the development workflow.
except the company had a straightforward solution to the problem
and it's not always possible in C++ land either
and COM isn't part of C/C++ but a Microsoft-specific extension which solves very different issues, as it's for cross-application communication, while here we have compile-time perf issues inside a single library
> binary libraries
I'm not sure if you mean dynamic linking or binary objects, but here's the thing:
- dynamic linking isn't the issue here, as the issue has nothing to do with re-compilation and similar (where dynamic linking can help),
- binary object files, on the other hand, are also something Rust has and uses, it just sets the boundaries in different places (crate instead of file), which makes development easier, can lead to better runtime performance, etc. It just has the drawback that you sometimes have to split things into multiple crates.
> a language as complex as
how complex a language is to write has not too much to do with how complex it is to split a single code unit into multiple parts for parallel compilation. C, C++ and D mainly side-step this by making each file a compilation unit, while in Rust it's each crate. But that isn't fundamentally better or worse, it's trade-offs.
> because Walter Bright did the right decisions on what to focus for development workflow.
and so did rust, just with different priorities and trade offs
and given that D is mostly irrelevant and Rust increasingly more successful, maybe it was pursuing the more important priorities
Also the OP case is about release builds, i.e. not the normal dev loop
nor does the splitting affect the dev experience, as it's all auto-generated code
and in projects which aren't auto-generated it's very normal to split out libraries etc. anyway, and whether you split them into their own module, file or crate doesn't matter too much as long as you keep the code clean (as in not having super entangled spaghetti code)
and I wouldn't even be sure whether D performs in any relevant way better in compile time if you compare its performance with the end result after they split the crate
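(As a side note, splitting into crates is usually just a workspace with path dependencies; a minimal sketch, with made-up crate names:)

```toml
# Workspace root Cargo.toml: every member is its own compilation unit,
# so independent members can be compiled and cached in parallel.
[workspace]
members = ["pipeline-core", "pipeline-codegen", "pipeline-api"]

# pipeline-api/Cargo.toml then pulls in its siblings by path:
# [dependencies]
# pipeline-core = { path = "../pipeline-core" }
# pipeline-codegen = { path = "../pipeline-codegen" }
```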
Turns out that it is mostly used on Windows since the Vista days, as a means to have an OOP-based OS, with most libraries written in C++, and having a stable ABI for such components.
So while it isn't ISO C++, it is mostly used by and from C++. .NET land usually only reaches for COM when using Windows APIs.
Binary libraries mean binary libraries, doesn't matter if statically or dynamically linked.
My point is that the Rust ecosystem does not use them; you always need to compile the complete dependency tree from source after a git clone, some of it even multiple times due to different feature flag configurations.
Not so with most commercial C and C++ development, we enjoy having binary libraries for dependencies; after a git clone only the main code needs to be compiled from scratch.
Yes, there are ways to kind of do this with sccache, but that is additional tooling, not something that apparently cargo will ever support.
Also if you watch recent talks from Microsoft regarding their Rust adoption, the lack of tooling support for binary libraries distribution is one of their pain points.
mainly the concepts of
- how build steps are cached in one instance of the project
- how build steps are cached across projects of common dependencies
- linking to system libraries
- bundling dependencies
--
Let's first look at it from the POV of system dependencies vs. bundled dependencies:
For each dependency you either link it as a system dependency (because you link against the one in your system and require systems to have it) or you bundle it; doesn't matter if it's C or Rust. The problem with system dependencies is that they don't just need API compatibility but also ABI compatibility, and not everything with ABI compatibility is actually API compatible. Which is all nice and fun, except a ton of highly useful (some would say required) things do not work well (or at all) if you need ABI compatibility, and there are tons of potential security bugs from libraries that seem API compatible but aren't.
In general, history has shown that making most dependencies system dependencies is a complete shit show not worth anyone's time and money, especially if people start mixing versions which seem compatible but aren't, leading to strange runtime bugs which aren't possible with supported builds but are anyway somehow your fault as the library maintainer.
Which is why the huge majority of the software industry _gave up on them_ for anything where they aren't strictly needed.
Rust can produce and use system dependencies, using a C API, and in some less officially supported ways also using rlibs (i.e. the binary libraries Rust produces when compiling a crate, so yes, it does use binary libraries).
But mostly it's not worth bothering with it, in the same way the majority of the rest of the software ecosystem stopped doing it.
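A minimal sketch of the C-API route (crate and function names made up; build it with `crate-type = ["cdylib"]` or `["staticlib"]` in Cargo.toml):

```rust
// lib.rs: exported with an unmangled symbol name and the C calling
// convention, so the boundary is the stable C ABI rather than the
// unstable Rust ABI.
#[no_mangle]
pub extern "C" fn ffi_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}
```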
--
Then let's look at reusing builds i.e. caching.
By default Rust does that, but only at the scope of the project. I.e. you build a project, change something, and then rebuild it, and dependencies won't be built again (except if you need to rebuild them, I come back to that later). To be clear, this is _not_ incremental building, which is a feature to re-use build parts at a more granular level than crates.
If you want it to cache things across projects or with some company build server you can do so using 3rd party software, i.e. same situation as with C.
> most commercial C and C++ development
Committing binary build artifacts to a source code repo is a huge anti-pattern and a terrible way to have distributed build caches. Stuff like that can easily make your company fail security reviews or be classified as having acted negligently if sued for damages (e.g. caused by a virus sneaked into your program).
Also please _never ever_ check out a 3rd party open source project with any pre-built binary artifacts in it, it's a huge security threat.
So in C/C++ you should also use the additional tools.
> Rust ecosystem does not use them
as mentioned, Rust produces rlibs, which are binary libraries (or you could say binary libraries bundled with metadata, and stuff which is roughly like how C++ templates are handled wrt. binary libraries)
And yes, the tooling for shipping pre-built rlibs could be better, and it probably will get better. It's not that it can't be done, it's just that priorities have been elsewhere so far.
> even multiple times due to different feature flags configurations.
Features are strictly additive, so no, that won't happen.
The only reason for a package being built multiple times is different incompatible versions of it (which from Rust's POV are two different dependencies altogether). And while that initially seems kinda dumb (unnecessary binary size/build time), I can't overstate how much of a blessing this turned out to be.
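To make "additive" concrete, a rough sketch: if two crates in the same build ask for different features of the same version of a dependency, Cargo compiles that dependency once with the union of the features, not once per dependent.

```toml
# crate_a/Cargo.toml
[dependencies]
serde = { version = "1", features = ["derive"] }

# crate_b/Cargo.toml
[dependencies]
serde = { version = "1", features = ["rc"] }

# In one build, serde 1.x is compiled a single time with
# features = ["derive", "rc"], i.e. the union of what everyone asked for.
```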
> not something that apparently cargo will ever support.
yes and make doesn't support distributed build caches without including 3rd party tools either. But it doesn't matter as long as you can just pull in the 3rd party tools if you need them.
EDIT: Rust features are like using #if and similar in the C/C++ pre-processor, i.e. if they change you have to rebuild, like in C/C++. Also, even without that, a crate might have been only partially compiled before (e.g. only 1 of 3 functions), so if you start using the other parts they still need to be compiled, which will look a lot like recompilation (and without the incremental build feature it might be a rebuild).
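Roughly, the analogy looks like this (a sketch with a made-up `metrics` feature):

```rust
// Only compiled when the crate is built with `--features metrics`;
// toggling the feature changes what rustc sees, so the crate gets
// rebuilt, much like changing a -D define forces recompilation in C/C++.
#[cfg(feature = "metrics")]
pub fn record_metric(name: &str, value: f64) {
    println!("{name} = {value}");
}

#[cfg(not(feature = "metrics"))]
pub fn record_metric(_name: &str, _value: f64) {
    // no-op when the feature is disabled
}
```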
it _has_ issues there, which can be improved on, people are working on it, just kinda slowly
but they aren't fundamental ones, and aren't because rust doesn't "embrace binary libraries"
the only thing AFAIK Rust misses out on is a "Rust ABI/API" suited to becoming a new system library binary interface. I.e. it doesn't provide what Swift provides for Apple. But it can implement many of the existing system library binary interfaces reasonably fine (mainly the C ABI, COM, etc.).
While D is complex, it's a different beast than Rust. I suspect various checks, from lifetimes to trait resolution, might make it more complex to parallelize than C++.
Sure it does. Or at least could, depending on what you mean by that term. See, e.g., GCC's -flto-partition option, which supports various partitioning strategies for parallel LTO codegen.
That said, it is a more fine grained parallelism, for sure. Rust does LTO as well as codegen-units.
Really, as you gesture towards, on some level, this is all semantics: our linkers are also basically compilers too, at this point.
The main point being, Rust focuses on the wrong type of optimizations.
Rust focuses on a different set of optimizations. I'm still working out when I prefer the Rust set or the C++ set or the Python set. I want to love Rust, but when doing exploratory work with unfamiliar APIs, the slow recompile loop and need to get everything correct for each incremental experimental build I do to try to figure out how something works is pretty painful. I don't know how much better it gets with familiarity. Rust is very nice when I fully understand the problem I'm solving and the environment I'm working in, and I vastly prefer it to C++ for that. But I frequently dive into unfamiliar codebases to make modifications.
cargo has the `-j` flag and defaults it to #cpus (logical CPUs), so by default it is using what is most of the time the optimal choice there
And this will parallelize the compilation of all "jobs", roughly like with make, where a job is normally (oversimplified) compiling one code unit into one object file (.o).
And cargo does that too.
The problem is that Rust and C/C++ (and I think D) set the code unit boundaries in different places.
In rust it's per crate. In C/C++ it's (oversimplified!!) per .h+.c file pair.
This has drawbacks and benefits. But one drawback is that it parallelizes less well. Hence Rust internally splits one "semantic code unit" into multiple internal codegen units passed to LLVM. So this is an additional level of parallelism on top of the -j flag.
In general this works fine, and when people speak about Rust builds being slow it is very rarely related to this aspect. But it puts a limit on how much code you want in a single crate, which people sometimes overlook.
But in the OP article they did run into it, due to placing like idk 100k lines of code (with proc macros maybe _way_ more than that) into a single crate. And then also running into a bug where this internal parallelization somehow failed.
Basically, imagine 100k+ lines of code in a single .cpp file; passing `-j` to the build will not help ;)
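For reference, the knob in question is per-profile in Cargo.toml; the numbers below are the documented defaults, written out explicitly as a sketch:

```toml
# How many pieces rustc may split each crate into for parallel LLVM
# codegen. More units = more parallelism, but fewer cross-unit
# optimizations in the generated code.
[profile.release]
codegen-units = 16    # default for release builds

[profile.dev]
codegen-units = 256   # default for debug builds
```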
I think one important takeaway is that it could make sense to create awareness about this by emitting a warning if your crate becomes way too big, with a link to an in-depth explanation. Though in practice most projects either aren't affected or split into crates way earlier for various reasons (which sometimes include build time, but related to caching and incremental rebuilds, not fully clean debug builds).
And the same has happened in C and C++ land, albeit in the opposite direction, where multiple compilation units can be optimized together, i.e. LTO. See, e.g., GCC's -flto-partition option for selecting strategies for partitioning symbols for LTO.
Also note that you can manually partition LTO in your Makefile by grouping compilation units into object files to be individually LTO'd.
you also have that in Rust, to link different crates (and the internal splits for parallel compilation) together
Rust/Cargo does this automagically, except the only control you have are the crate and module boundaries. The analogous approach for C is to (optionally) manually group compilation units into a smaller set of object files in your Makefile, LTO'ing each object file in parallel (make -j), and then (optionally) telling the compiler to partition and parallelize a second time on the backend. Which is what Rust does, basically, IIUC--a crate is nominally the LTO codegen unit, except to speed up compilation Rust has heuristics for partitioning crates internally for parallel LTO.
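The Rust-side knobs for that trade-off live in the Cargo profile; a sketch of the options (defaults as per the Cargo docs):

```toml
[profile.release]
# lto = false is the default: "thin local LTO" across the codegen units
# of each crate only.
lto = "thin"    # ThinLTO across the whole crate graph, still parallel
# lto = "fat"   # classic whole-program LTO, mostly single-threaded
```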
codegen in the OP article is machine code generation (i.e. running LLVM)
It's semantically kinda like having a single 100k-line file, but because Rust knows it often produces huge "files" there is a splitting step, somewhere between parsing the AST and generating machine code (I think after generating MIR, but not fully sure). And the codegen-units setting is how many parts Rust is allowed to split a thing which semantically is just one code unit into. By default for release builds it's 16 (and as it can affect the performance of the generated code, it's not based on #cpus). But in their case there seems to be a bug which makes it effectively more like 1! Which is much worse than it should be. (But also the statistics they show aren't sufficient to draw too many conclusions.)
- Compilation is an inherently hard thing to parallelize, and not just hard to parallelize, but there are a lot of trade-offs. These trade-offs aren't even Rust-specific (i.e. C, C++, etc. are affected just as much) and can lead to less performant generated binaries, and to much more memory pressure, which could make the compilation slower in total. Luckily for most code this doesn't matter as long as you don't go too parallel. But it's the reason why the default for codegen units is 16, not #num_cpus (for release builds; 256 for debug builds).
- Codegen means producing machine code, i.e. we are speaking about LLVM, so the bug might not be Rust-specific and might affect other languages, too.
- This still should mean 16 threads at high load, not 1, so they seem to have hit a performance bug, i.e. not how things normally work. It's very unclear if the bug is in Rust or LLVM; if it's the latter, C/C++ might be affected, too.
- While Rust does strongly prefer you to split your code into multiple crates, it still shouldn't be stuck at a single thread, i.e. from everything we can tell they are hitting some performance bug.
- Algorithms most of the time dominate performance much more than whether your language is slightly faster or slower. This is clearly an algorithmic error, i.e. failing to parallelize where it normally does parallelize, and the issue might be in C/C++ code, so "Rust is fast" has pretty much nothing to do with this.
- Though it should still be mentioned that for certain design reasons Rust doesn't want you to make a single crate too large. In most normal situations you end up splitting a crate for various reasons before it becomes too large, but if you have a huge blob of auto-generated code that is easy to miss. Funnily, what can lead to compile-time issues if your crate is way too big also "in general" leads to better performance, which brings us back to a lot of decisions in compilers having trade-offs.
> then what's the value of that language from a software engineering viewpoint?
You mean besides producing fast code in a way which is much easier to maintain than C++ (or many other languages), while keeping many of the benefits C++ has over C when it comes to being able to reuse code and algorithms? Which practically makes it much easier to use better algorithms, which is often a much bigger performance gain than any low-level code optimization, something that has been shown repeatedly in practice. Not even speaking about the fact that it tends to have fewer bugs, makes it much easier to communicate interface constraints in a reliable, maintainable way, etc.
The argument that a single case of running into a compile-time performance bug, while already doing something which isn't exactly a normal use case and doesn't follow the common advice to split crates once they become large, somehow implies that Rust has no value for software engineers is just kinda dumb. I mean, you also wouldn't go around saying cats have no value from a family POV in general because of one specific case where a cat repeatedly scratched a teenager.