Introducing architecture variants

Posted by jnsgruk 3 days ago

Introducing architecture variants (discourse.ubuntu.com)

238 points | 142 comments | page 2

physicsguy 2 days ago|

This is quite good news, but it’s worth remembering that it’s a rare piece of software in the modern scientific/numerical world that can be compiled against the versions in distro package managers, as packaged versions can lag upstream by months or more.

If you’re doing that sort of work, you also shouldn’t use pre-compiled PyPI packages for the same reason - you leave a ton of performance on the table by not targeting the micro-architecture you’re running on.

PaulHoule 2 days ago||

My RSS reader trains a model every week or so and takes 15 minutes total with plain numpy, scikit-learn and all that. Intel MKL can do the same job in about half the time as the default BLAS. So you are looking at a noticeable performance boost, but a zero bullshit install with uv is worth a lot. If I was interested in improving the model then yeah, I might need to train 200 of them interactively and I’d really feel the difference. Thing is, the model is pretty good as it is, and to make something better I’d have to think long and hard about what ‘better’ means.

ciaranmca 2 days ago||

Out of interest, what reader is this? Sounds interesting

PaulHoule 2 days ago||

I've talked about it a lot here, see https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

colechristensen 2 days ago|||

Most of the scientific numerical code I ever used had been in use for decades and would compile on a unix variant released in 1992, much less the distribution version of dependencies that were a year or two behind upstream.

owlbite 2 days ago|||

Very true, but a lot of stuff builds on a few core optimized libraries like BLAS/LAPACK, and picking up a build of those targeted at a modern microarchitecture can give you 10x or more compared to a non-targeted build.

That said, most of those packages will just read the hardware capability from the OS and dispatch an appropriate codepath anyway. You maybe save some code footprint by restricting the number of codepaths it needs to compile.
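A minimal sketch of the runtime dispatch idea described above, assuming a Linux machine where /proc/cpuinfo exposes a "flags" line. The kernel names and the table of required flags are hypothetical and only illustrate the pattern; real libraries typically do this in C with CPUID.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags Linux reports, or an empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


# Hypothetical code paths, most specialised first, with the flags each one needs.
KERNELS = [
    ("avx512", {"avx512f"}),
    ("avx2", {"avx2", "fma"}),
    ("sse2", {"sse2"}),
]


def select_kernel(flags):
    """Pick the first code path whose required flags are all present."""
    for name, required in KERNELS:
        if required <= flags:
            return name
    return "generic"
```

Calling select_kernel(read_cpu_flags()) once at startup picks the code path. A build restricted to one architecture level only needs one entry in the table, which is the footprint saving mentioned above.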

physicsguy 2 days ago|||

I mean that’s just lucky and totally depends on your field and what is normal - just as an example, we used the LLNL SUNDIALS package for implicit time integration. On Ubuntu 24.04 the latest version is 6.4.1 where the latest published is v7.5.0. We found their major version releases tended to require changes.

There’s also the difference between being able to run and being able to run optimised. At least 5 years ago, the Ubuntu/Debian builds of FFTW didn’t include the parallelised OpenMP library.

In a past life I did HPC support and I recommend the Spack package manager a lot to people working in this area because you can get optimised builds with whatever compiler tool chain and options you need quite easily that way.

zipy124 2 days ago|||

Yup, if you're using OpenCV, for instance, compiling it yourself instead of using pre-built binaries can result in 10x or more speed-ups once you take into account AVX, threading, math/BLAS libraries, etc.

oofbey 2 days ago||

Yup. The irony is that the packages which are difficult to build are the ones that most benefit from custom builds.

niwtsol 2 days ago|||

Thanks for sharing this. I'd love to learn more about micro-architectures and instruction sets - would you have any recommendations for books or sources that would be a good starting place?

physicsguy 2 days ago||

My experience is mostly practical really - the trick is to learn how to compile stuff yourself.

If you do a typical: "cmake . && make install" then you will often miss compiler optimisations. There's no standard across different packages so you often have to dig into internals of the build system and look at the options provided and experiment.

Typically if you compile a C/C++/Fortran .cpp/.c/.fXX file by hand, you have to supply arguments to instruct the use of specific instruction sets. -march=native typically means "compile this binary to run with the maximum set of SIMD instructions that my current machine supports", but you can get quite granular by passing individual flags like "-msse4 -mavx -mavx2", for either compatibility reasons or to try out subsets.
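One way to see what -march=native resolves to on a given machine is to ask GCC for its target report with "gcc -march=native -Q --help=target". The sketch below assumes that report lists one option per line, followed by either a value or [enabled]/[disabled]; the function names are made up for illustration.

```python
import shutil
import subprocess


def parse_target_report(text):
    """Extract the -march value and the set of enabled -m flags from the report."""
    march = None
    enabled = set()
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 2 or not tokens[0].startswith("-m"):
            continue
        if tokens[0] == "-march=":
            march = tokens[1]
        elif tokens[1] == "[enabled]":
            enabled.add(tokens[0])
    return march, enabled


def native_target():
    """Run GCC and parse its report; return None if gcc is missing or fails."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "-march=native", "-Q", "--help=target"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_target_report(result.stdout)
```

Comparing the enabled set from native_target() with the flags a package's build system actually passes is a quick check on whether its default build leaves instruction sets unused.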

jeffbee 2 days ago||

I wonder who downvoted this. The juice you are going to get from building your core applications and libraries to suit your workload is going to be far larger than the small improvements available from microarchitectural targeting. For example, on Ubuntu I have some ETL pipelines that need libxml2. Linking it statically into the application cuts the ETL runtime by 30%. Essentially none of the practices of Debian/Ubuntu Linux are what you'd choose for efficiency. Their practices are designed around some pretty old and arguably obsolete ideas about ease of maintenance.

smlacy 2 days ago||

I presume the motivation is performance optimization? It would be more compelling to include some of the benefits in the announcement?

embedding-shape 2 days ago||

They do mention it in the linked announcement, although not really highlighted, just as a quick mention:

> As a result, we’re very excited to share that in Ubuntu 25.10, some packages are available, on an opt-in basis, in their optimized form for the more modern x86-64-v3 architecture level

> Previous benchmarks we have run (where we rebuilt the entire archive for x86-64-v3) show that most packages show a slight (around 1%) performance improvement and some packages, mostly those that are somewhat numerical in nature, improve more than that.

pushfoo 2 days ago||

ARM/RISC-V extensions may be another reason. If a widespread variant configuration exists, why not build for it? See:

- RISC-V's official extensions[1]
- ARM's JS-specific float-to-fixed[2]

1. https://riscv.atlassian.net/wiki/spaces/HOME/pages/16154732/...
2. https://developer.arm.com/documentation/dui0801/h/A64-Floati...

zdw 2 days ago||

A lot of other third-party software already requires x86-64-v2 or -v3.

I couldn't run something from NPM on an older NAS machine (HP Microserver Gen 7) recently because of this.

stabbles 2 days ago||

Seems like this is not using glibc's hwcaps (where shared libraries were located in microarch specific subdirs).

To me hwcaps feels like a very unfortunate feature creep of glibc now. I don't see why it was ever added, given that it's hard to compile only shared libraries for a specific microarch, and it does not benefit executables. Distros seem to avoid it. All it does is cause unnecessary stat calls when running an executable.
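For readers unfamiliar with hwcaps, here is a simplified model of the lookup being discussed, assuming the x86-64 subdirectory names glibc uses (glibc-hwcaps/x86-64-v2 through x86-64-v4). It ignores ld.so.cache and is not glibc's actual code; the function name is made up.

```python
from pathlib import PurePosixPath

# glibc-hwcaps subdirectory names for x86-64, most specific first.
HWCAPS_LEVELS = ["x86-64-v4", "x86-64-v3", "x86-64-v2"]


def candidate_paths(libname, search_dirs, cpu_level):
    """List the paths a hwcaps-aware loader would probe, in order.

    cpu_level is the x86-64 level of the running CPU (1 to 4).
    """
    usable = [name for name in HWCAPS_LEVELS if int(name[-1]) <= cpu_level]
    paths = []
    for directory in search_dirs:
        base = PurePosixPath(directory)
        # Microarchitecture-specific copies are tried before the generic one.
        for sub in usable:
            paths.append(str(base / "glibc-hwcaps" / sub / libname))
        paths.append(str(base / libname))
    return paths
```

On a v3 machine each search directory yields three candidates per library instead of one, which is the extra filesystem probing the comment above objects to.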

mwhudson 1 day ago|

No, it's not using hwcaps. That would only allow optimization of code in shared libraries, would be irritating to implement in a way that didn't require touching each package that includes shared libraries, and would (depending on details) waste a bunch of space on every user's system. I think hwcaps would only make sense for a small number of shared libraries if at all, not as a system-wide thing.

ElijahLynn 2 days ago||

I clicked on this article expecting an M series variant for Apple hardware...

malkia 2 days ago||

This is awesome, but ... If your process requires deterministic results (speaking about floats/doubles mostly here), then you need to get this straight.

sluongng 2 days ago||

Nice. This is one of the main reasons why I picked CachyOS recently. Now I can fall back to Ubuntu if CachyOS gets me stuck somewhere.

yohbho 2 days ago|

CachyOS uses this one percent of performance gains? Since it uses every performance gain, that's unsurprising. But now I wonder how my laptop from 2012 ran CachyOS; they seem to switch based on hardware, not at image download or boot.

topato 2 days ago||

correct, it just sets the repository in the pacman.conf to either cachyos, -v3, or -v4 during install time based on hardware probe
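A sketch of what such a hardware probe could look like, not CachyOS's actual installer code. It assumes the flag names Linux prints in /proc/cpuinfo ("pni" for SSE3, "abm" for LZCNT), uses the flag groups commonly checked for each x86-64 level, and reads the repository names "cachyos", "cachyos-v3" and "cachyos-v4" from the comment above.

```python
# Flags commonly checked for each x86-64 level, using /proc/cpuinfo spellings.
V2 = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
V3 = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
V4 = {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}


def x86_64_level(flags):
    """Return the highest x86-64 level (1 to 4) whose flags are all present.

    Level 1 is the baseline and assumes the CPU is x86-64 at all.
    Levels are cumulative, so a missing v2 flag caps the result at 1.
    """
    level = 1
    for required in (V2, V3, V4):
        if not required <= flags:
            break
        level += 1
    return level


def repo_for(flags):
    """Map the detected level to a repository name (names are an assumption)."""
    level = x86_64_level(flags)
    return {4: "cachyos-v4", 3: "cachyos-v3"}.get(level, "cachyos")
```

A 2012 laptop without AVX2 would land on level 2 and get the baseline repository, which is consistent with the install-time switch described above.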

tommica 2 days ago||

Once they have rebuilt with rust, they get to move away from GPL licenses and get to monetize things.

zer0zzz 2 days ago||

I thought there was a fat ELF project to solve this problem at one point.

DrNosferatu 2 days ago|

Link?

mariusor 2 days ago||

Maybe parent is referring to icculus' FatELF proposal from fifteen years ago? https://icculus.org/fatelf/

wyldfire 2 days ago|

Would we have something like aarch64 neon/SVE too?
