Posted by generichuman 1 day ago
- one is about the format of symbol information in the actual ELF binaries, which is only an issue if, for some strange reason, you are not using the standard libc functions for looking up symbols
- one is an issue that impacts targeting a lower version of glibc from a higher one, which is a configuration that was never supported (though it usually fails more loudly)
- the last one is a security policy change, which is legitimately an ABI break, but it mostly impacts programs that have their execstack flags set incorrectly
glibc actually goes to a fair bit of effort to be compatible with old binaries, unlike most of the rest of the Linux userspace. The binaries I built for my side project back in 2015 (BlastEm 0.3.0) still work fine on modern Linux, and they dynamically link against glibc. This is just a hobby project, not a piece of professional software, and a build from before this JangaFX company even existed works fine.
I find it really bizarre when people talk about Linux binary compat and then complain entirely about glibc rather than the sort of problems that the manylinux project has had to deal with. glibc is one of the few parts of userspace you can depend on. Yes, setting up your toolchain to build against an old glibc on a modern distro is a bit annoying. Sure, if you do something sufficiently weird you might find yourself outside what glibc considers part of their stable ABI. But from where I sit, it works pretty well.
It is actually quite trivial when building with the Zig toolchain since you can simply append the requested glibc version to the target-triple (e.g. `-target aarch64-linux-gnu.2.xx`), but I think this doesn't work with regular clang or gcc (makes one wonder why not when Zig can pull it off).
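For example, something like this should work (a minimal sketch; pick whichever glibc version you actually want to target):

    # build hello.c against glibc 2.28 on x86_64
    zig cc -target x86_64-linux-gnu.2.28 -o hello hello.c
    # cross-compiling for aarch64 with the same pinned glibc
    zig cc -target aarch64-linux-gnu.2.28 -o hello hello.c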
I wouldn't say it's trivial, but it's not rocket science either. Basically there are two main approaches. One is to just build inside a chroot or container with a sufficiently old distro inside. This is generally the path of least resistance because your build system doesn't really have to have any awareness of what's going on. You just build normally inside the chroot/container. The main downsides with this approach are that it's kind of wasteful (you have a whole distro's filesystem) and if you want to use a newer compiler than what the old distro in question shipped with you generally have to build it yourself inside said chroot/container.
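A rough sketch of the container variant (image and file names hypothetical; any sufficiently old distro image with a toolchain works):

    # build inside an old distro userland so the result links against its older glibc
    docker run --rm -v "$PWD":/src -w /src debian:11 \
        sh -c 'apt-get update && apt-get install -y gcc && gcc -o myapp myapp.c'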
The other main approach is to use a sysroot. gcc and clang both take an optional --sysroot parameter which is an alternative root for header and library lookups. This lets you use a compiler on the normal host, but old headers and libs. You can also bake this parameter in when compiling gcc (and also I assume clang, but less sure there) if you want a dedicated cross-toolchain.
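And a rough sketch of the sysroot variant (the path is hypothetical; the sysroot just has to contain the old distro's headers and libraries):

    # compile on a modern host against old headers/libs unpacked into /opt/oldroot
    gcc --sysroot=/opt/oldroot -o myapp myapp.c
    # clang accepts the same flag
    clang --sysroot=/opt/oldroot -o myapp myapp.c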
You can ship all of the libraries you use with your executable. This isn't possible to do with glibc. It's the exception, which is why it's talked about the most.
It's definitely not the only exception. libgl is another obvious example since different GPUs need different userland code. I would be surprised if there had never been compat regressions in those.
I agree with this article completely.
What exactly prevents this for glibc? I assume you'd need a dlopen equivalent from somewhere for bootstrapping, but are there other issues (like TLS or whatnot)?
That’s because when you’re trying to ship a product glibc is one of the primary sources of pain.
The distro as packager model ensures that everything is mixed together in the filesystem and is actively hostile to external packaging. Vendoring dependencies or static linking improves compatibility by choosing known working versions, but decreases incentive and ability for downstream (or users) to upgrade those dependencies.
The libc stuff in this article is mostly glibc-specific, and you'd have fewer issues targeting musl. Mixing static linking and dlopen doesn't make much sense, as said here[1] which is an interesting thread. Even dns resolution on glibc implies dynamic linking due to nsswitch.
Solutions like Snap, Flatpak, and AppImage work to contain the problem by reusing the same abstractions internally rather than introducing anything that directly addresses the issue. We won't have a clean solution until we collectively abandon the FHS for a decentralized filesystem layout where adding an application (not just a program binary) is as easy as extracting a package into a folder and integrates with the rest of the system. I've worked on this off and on for a while, but being so opinionated makes everything an uphill battle while accepting the current reality is easy.
[1] https://musl.openwall.narkive.com/lW4KCyXd/static-linking-an...
I have fond memories of copying installed Warlords Battle Cry 3, Warcraft 3, AOE2, etc. directories onto flash drives and distributing them to 20+ kids in high school (all using the same key). Good days.
They specifically say that it's their way of paying tribute to Civ playing by email.
Because, as far as I’ve heard, it borrowed that wholesale from Sun, who desperately needed an application to show off their new dynamic linking toy. There’s no reason they couldn’t’ve done a godsdamned daemon (that potentially dynamically loaded plugins) instead, and in fact making some sort of NSS compatibility shim that does work that way (either by linking the daemon with Glibc, or more ambitiously by reimplementing the NSS module APIs on top of a different libc) has been on my potential project list for years. (Long enough that Musl apparently did a different, less-powerful NSS shim in the meantime?)
The same applies to PAM word for word.
> Mixing static linking and dlopen doesn't make much sense, as said [in an oft-cited thread on the musl mailing list].
It’s a meh argument, I think.
It’s true that there’s something of a problem where two copies of a libc can’t coexist in a process, and that entails the problem of pulling in the whole libc that’s mentioned in the thread, but that to me seems more due to a poorly drawn abstraction boundary than anything else. Witness Windows, which has little to no problem with multiple libcs in a process; you may say that’s because most of the difficult-to-share stuff is in KERNEL32 instead, and I’d say that was exactly my point.
The host app would need to pull in a full copy of the dynamic loader? Well duh, but also (again) meh. The dynamic loader is not a trivial program, but it isn’t a huge program, either, especially if we cut down SysV/GNU’s (terrible) dynamic-linking ABI a bit and also only support dlopen()ing ELFs (elves?) that have no DT_NEEDED deps (having presumably been “statically” linked themselves).
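For what it's worth, checking whether a given ELF carries DT_NEEDED entries is trivial (file name hypothetical):

    # any "(NEEDED)" lines are DT_NEEDED deps; no output means none
    readelf -d plugin.so | grep NEEDED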
So that thread, to me, feels like it has the same fundamental problem as Drepper’s standard rant[1] against static linking in general: it mixes up the problems arising from one libc’s particular implementation with problems inherent to the task of being a libc. (Drepper’s has much more of an attitude problem, of course.)
As for why you’d actually want to dlopen from a static executable, there’s one killer app: exokernels, loading (parts of) system-provided drivers into your process for speed. You might think this an academic fever dream, except that is how talking to the GPU works. Because of that, there’s basically no way to make a statically linked Linux GUI app that makes adequate use of a modern computer’s resources. (Even on a laptop with integrated graphics, using the CPU to shuttle pixels around is patently stupid and wasteful—by which I don’t mean you should never do it, just that there should be an alternative to doing it.)
Stretching the definitions a little, the in-proc part of a GPU driver is a very very smart RPC shim, and that’s not the only useful kind: medium-smart RPC shims like KERNEL32 and dumb ones like COM proxy DLLs and the Linux kernel’s VDSO are useful to dynamically load too.
And then there are plugins for stuff that doesn’t really want to pass through a bytestream interface (at all or efficiently), like media format support plugins (avoided by ffmpeg through linking in every media format ever), audio processing plugins, and so on.
Note that all of these intentionally have a very narrow waist[2] of an interface, and when done right they don’t even require both sides to share a malloc implementation. (Not a problem on Windows where there’s malloc at home^W^W^W a shared malloc in KERNEL32; the flip side is the malloc in KERNEL32 sucks ass and they’re stuck with it.) Hell, some of them hardly require wiring together arbitrary symbols and would be OK receiving and returning well-known structs of function pointers in an init function called after dlopen.
Only so long as you don't pass data structures from one to the other. The same caveats wrt malloc/free or fopen/fclose across libc boundaries still apply.
Well, not anymore, but only because libc is a system DLL on Windows now with a stable ABI, so for new apps they all share the same copy.
That's one of the reasons that OpenBSD is rather compelling. BSDAuth doesn't open arbitrary libraries to execute code, it forks and execs binaries so it doesn't pollute your program's namespace in unpredictable ways.
> It's true that there's something of a problem where two copies of a libc can't coexist in a process...
That's the meat of this article. It goes beyond complaining about a relatable issue and talks about the work and research they've done to see how it can be mitigated. I think it's a neat exercise to wonder how you could restructure a libc to allow multi-libc compatibility, but I question why anyone would even want to statically link to libc in a program that dlopens other libraries. If you're worried about a stable ABI with your libc, but acknowledge that other libraries you use link to a potentially different and incompatible libc, thus making the problem even more complicated, you should probably go the BSDAuth route instead of introducing both additional complexity and incompatibility with existing systems. I think almost everything should be suitable for static linking, and that Drepper's clarification is much more interesting than the rant. Polluting the global lib directory with a bunch of your private dependencies should be frowned upon and hides the real scale of applications. Installing an application shouldn't make the rest of your system harder to understand, especially when it doesn't do any special integration. When you have to dynamically link anyway:
> As for why you’d actually want to dlopen from a static executable, there’s one killer app: exokernels, loading (parts of) system-provided drivers into your process for speed.
If you're dealing with system resources like GPU drivers, those should be opaque implementations loaded by intermediaries like libglvnd. [1] This comes to mind as even more reason why dynamic dependencies of even static binaries are terrible. The resolution works, but it would be better if no zlib symbols leaked from Mesa at all (by using --exclude-libs and linking statically; a sketch follows at the end of this comment), so a compiled dependency cannot break the program that depends on it. So yes, I agree that dynamic dependencies of static libraries should be static themselves (though enforcing that is questionable), but I don't agree that the libc should be considered part of that problem and statically linked as well. That leads us to:
> ... when done right they don't even require both sides to share a malloc implementation
Better API design for libraries can eliminate a lot of these issues, but enforcing that is a much harder problem in the current landscape where both sides are casually expected to share a malloc implementation -- hence the complication described in the article. "How can we force everything that exists into a better paradigm" is a lot less practical of a question than "what are the fewest changes we'd need to ensure this would work with just a recompile". I agree with the idea of a "narrow waist of an interface", but it's not useful in practice until people agree where the boundary should be and you can force everyone to abide by it.
[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28...
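As a sketch of the --exclude-libs point above (file and library names hypothetical): link the dependency statically into the shared object and keep its symbols out of the dynamic symbol table so they can't leak into, or collide with, whatever loads it.

    # zlib is linked in statically and the archive's symbols are not re-exported
    gcc -shared -fPIC -o libplugin.so plugin.o /path/to/libz.a -Wl,--exclude-libs,ALL
    # sanity check: no inflate/deflate symbols in the dynamic symbol table
    nm -D libplugin.so | grep -i inflate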
> Even if you managed to statically link GLIBC—or used an alternative like musl—your application would be unable to load any dynamic libraries at runtime.
But then they immediately said they actually statically link all of their deps aside from libc.
> Instead, we take a different approach: statically linking everything we can.
If they're statically linking everything other than libc, then using musl or statically linking glibc will finish the job. Unless they have some need for loading shared libs at runtime which they didn't already have linked into their binary (i.e. manual dlopen), this solves the portability problem on Linux.
What am I missing (assuming I know of the security implications of statically linked binaries -- which they didn't mention as a concern)?
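For what it's worth, the fully static musl route is about as simple as it gets (a sketch; musl-gcc is the wrapper shipped with musl's dev package):

    musl-gcc -static -o myapp myapp.c
    ldd ./myapp    # prints "not a dynamic executable"
    file ./myapp   # reports "statically linked"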
Neither static nor dynamic linking is looking to solve the 20 year old binaries issue, so both will have different issues.
But I think it's easier for me to find a 20-year-old ISO of Red Hat/Slackware where I can simply run the statically linked binary. Dependency hell for older distros becomes really difficult when the older packages are not archived anywhere anymore.
Even for simple 2D "Super VGA" you need to choose the correct XFree86 implementation and still tweak your Xorg configuration. The emulated hardware also has bugs, since most of the focus is now on virtio drivers.
(The 20-year-old program was linked against libsdl, which amusingly means on my modern system it supports Wayland with no issues.)
Some things built on top of that:
https://manpages.debian.org/man/debsnap https://manpages.debian.org/man/debbisect https://wiki.debian.org/BisectDebian https://metasnap.debian.net/ https://reproduce.debian.net/
Docker can run WASM runtimes, but I don't think podman or nerdctl can yet.
From https://news.ycombinator.com/item?id=38779803 :
docker run \
--runtime=io.containerd.wasmedge.v1 \
--platform=wasi/wasm \
secondstate/rust-example-hello
From https://news.ycombinator.com/item?id=41306658 : > ostree native containers are bootable host images that can also be built and signed with a SLSA provenance attestation; https://coreos.github.io/rpm-ostree/container/ :
rpm-ostree rebase ostree-image-signed:registry:<oci image>
rpm-ostree rebase ostree-image-signed:docker://<oci image>
Native containers run on the host and can host normal containers if a container engine is installed. Compared to an Electron runtime, IDK how minimal a native container with systemd, podman, WASM runtimes, and portable GUI rendering libraries could be. CoreOS - which was for creating minimal host images that host containers - became Fedora Atomic, now Fedora Atomic Desktops and rpm-ostree: Silverblue, Kinoite, Sericea; and Bazzite and Secureblue.
Secureblue has a hardened_malloc implementation.
From https://jangafx.com/insights/linux-binary-compatibility :
> To handle this correctly, each libc version would need a way to enumerate files across all other libc instances, including dynamically loaded ones, ensuring that every file is visited exactly once without forming cycles. This enumeration must also be thread-safe. Additionally, while enumeration is in progress, another libc could be dynamically loaded (e.g., via dlopen) on a separate thread, or a new file could be opened (e.g., a global constructor in a dynamically loaded library calling fopen).
FWIU, ROP (Return-Oriented Programming) and gadget-finding approaches have implementations of things like dynamic header discovery of static and dynamic libraries at runtime, in order to compile more at runtime (which isn't safe, though: nothing reverifies what's mutated after loading the PE into process space, after NX tagging or not, before and after secure enclaves and LD_PRELOAD (which some Go binaries don't respect, for example)).
Can a microkernel do eBPF?
What about a RISC machine for WASM and WASI?
"Customasm – An assembler for custom, user-defined instruction sets" (2024) https://news.ycombinator.com/item?id=42717357
Maybe that would shrink some of these flatpaks which ship their own Electron runtimes instead of like the Gnome and KDE shared runtimes.
Python's manylinux project specifies a number of libc versions that manylinux packages portably target.
Manylinux requires a tool called auditwheel for Linux, delocate for macOS, and delvewheel for Windows.
Auditwheel > Overview: https://github.com/pypa/auditwheel#overview :
> auditwheel is a command line tool to facilitate the creation of Python wheel packages for Linux (containing pre-compiled binary extensions) that are compatible with a wide variety of Linux distributions, consistent with the PEP 600 manylinux_x_y, PEP 513 manylinux1, PEP 571 manylinux2010 and PEP 599 manylinux2014 platform tags.
> auditwheel show: shows external shared libraries that the wheel depends on (beyond the libraries included in the manylinux policies), and checks the extension modules for the use of versioned symbols that exceed the manylinux ABI.
> auditwheel repair: copies these external shared libraries into the wheel itself, and automatically modifies the appropriate RPATH entries such that these libraries will be picked up at runtime. This accomplishes a similar result as if the libraries had been statically linked without requiring changes to the build system. Packagers are advised that bundling, like static linking, may implicate copyright concerns
github/choosealicense.com: https://github.com/github/choosealicense.com
From https://news.ycombinator.com/item?id=42347468 :
> A manylinux_x_y wheel requires glibc>=x.y. A musllinux_x_y wheel requires musl libc>=x.y; per PEP 600
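Typical auditwheel usage, for reference (the wheel filename is hypothetical):

    # list external shared libraries the wheel links against
    auditwheel show dist/mypkg-1.0-cp312-cp312-linux_x86_64.whl
    # copy them into the wheel, fix RPATHs, and retag for a manylinux policy
    auditwheel repair --plat manylinux2014_x86_64 dist/mypkg-1.0-cp312-cp312-linux_x86_64.whl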
Same as any other kernel—the runtime is just a userspace program.
> Can a microkernel do eBPF?
If it implements it, why not?
Linux containers have process isolation features that userspace sandboxes like bubblewrap and runtimes don't.
Flatpaks bypass SELinux and AppArmor policies and run unconfined (on DAC but not MAC systems) because the path to the executable in the flatpak differs from the system policy for */s?bin/* and so wouldn't be relabeled with the necessary extended filesystem attributes even on `restorecon /` (which runs on reboot if /.autorelabel exists).
Thus, e.g. Firefox from a signed package in a container on the host, and Firefox from a package on the host are more process-isolated than Firefox in a Flatpak or from a curl'ed statically-linked binary because one couldn't figure out the build system.
Container-selinux, Kata containers, and GVisor further secure containers without requiring the RAM necessary for full VM virtualization with Xen or Qemu; and that is possible because of container interface standards.
Linux machines run ELF binaries, which could include WASM instructions.
/? ELF binary WASM : https://www.google.com/search?q=elf+binary+wasm :
mewz-project/wasker https://github.com/mewz-project/wasker :
> What's new with Wasker is, Wasker generates an OS-independent ELF file where WASI calls from Wasm applications remain unresolved.
> This unresolved feature allows Wasker's output ELF file to be linked with WASI implementations provided by various operating systems, enabling each OS to execute Wasm applications.
> Wasker empowers your favorite OS to serve as a Wasm runtime!
Why shouldn't we container2wasm everything? Because (rootless) Linux containers better isolate the workload than any current WASM runtime in userspace.
/? awesome return oriented programming site:github.com https://www.google.com/search?q=awesome+return+oriented+prog...
This can probably find multiple versions of libc at runtime, too: https://github.com/0vercl0k/rp :
> rp++ is a fast C++ ROP gadget finder for PE/ELF/Mach-O x86/x64/ARM/ARM64 binaries.
The problem is that they also have problems, which motivates people to statically link.
I remember back in the Amiga days when there were multiple libraries that provided file requesters. At one point I saw a unifying file requester library that implemented the interfaces of multiple others so that all requesters had the same look.
It's something that hasn't been done on Linux as far as I am aware, partially because of the problems with Linux dynamic libraries.
I think the answer isn't just static linking.
I think the solution is a commitment.
If you are going to make a dynamic library, commit to backwards compatibility. If you can't provide that, that's ok, but please statically link.
Perhaps making a base-level library with a forever backwards-compatible interface, plus a static version for breaking changes, would help. That might allow for a blend of ongoing bug fixes and adding future features.
There are entire distros, like Alpine, built on musl. I find this very hard to believe.
Glibc's NSS is mostly relevant for LANs. Which is a lot of corporate and home networks.
[1] They're playing with fire here because you can't really assume you know for sure how the module 'dns' behaves. A user could replace the lib that backs it with their own that resolves everything to zombo.com. It would be one thing if nsswitch described behavior which was well defined and could be emulated, but it doesn't; it specifies a specific implementation.
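For context, the service names in /etc/nsswitch.conf map straight to shared objects that glibc dlopen()s on first use (the exact line varies by distro; this is just a typical example):

    # "files" -> libnss_files.so.2, "dns" -> libnss_dns.so.2, etc.
    hosts: files mdns4_minimal [NOTFOUND=return] dns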
Where do the resolvers come from? It needs to be possible to install resolvers separately and dynamically load them. Unless you want to have NIS always installed. Better to install LDAP for those who need it.
[1] https://steamcommunity.com/sharedfiles/filedetails/?id=28643...
I disagree with their idea for fixing it by splitting up glibc. I think it's a bad idea because it doesn't actually fix the problems that lead to compat breakage, and it's bad because it's harder than it seems.
They cite these compat bugs as part of their reasoning for why glibc should be split up:
- https://sourceware.org/bugzilla/show_bug.cgi?id=29456
- https://sourceware.org/bugzilla/show_bug.cgi?id=32653
- https://sourceware.org/bugzilla/show_bug.cgi?id=32786
I don't see how a single one of these would be fixed by splitting up glibc. If their proposed libdl or libthread were updated and had one of these regressions, it would cause just as much of a bug as if a monolithic libc updated with one of these regressions.
So, splitting up glibc wouldn't fix the issue.
Also, splitting up glibc would be super nasty because of how the threading, loading, and syscall parts of libc are coupled (some syscalls are implemented with deep threading awareness, like the setxid calls, threads need to know about the loader and vice-versa, and other issues).
I think the problem here is how releases are cut. In an ideal world, glibc devs would have caught all three of those bugs before shipping 2.41. Big corpos like Microsoft manage that by having a binary compatibility team that runs All The Apps on every new version of the OS. I'm guessing that glibc doesn't have (as much of) that kind of process.
1. Zig's toolchain statically links with musl libc, producing binaries that depend only on the Linux kernel syscall ABI, not any specific glibc version.
2. This eliminates all the symbol versioning nightmares (`GLIBC_2.xx not found`) that plague distribution across different Linux systems (a quick way to check what versions a binary actually requires is sketched after this list).
3. Vulkan provides a standardized, vendor-neutral GPU API that's supported across all modern graphics hardware, eliminating driver-specific dependencies.
4. The resulting binary is completely self-contained - no external library dependencies, no version mismatches, no containerization needed.
5. You get forward AND backward compatibility - the binary will run on both ancient and cutting-edge distros without modification.
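On point 2, checking which glibc symbol versions a given binary actually demands is a one-liner (binary name hypothetical):

    # lists every GLIBC_x.y version tag referenced by the dynamic symbols
    objdump -T ./myapp | grep -o 'GLIBC_[0-9.]*' | sort -Vu
    # anything newer than the target system's glibc fails at load time with
    # "version `GLIBC_2.xx' not found"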
The only real limitation is for NVIDIA CUDA-specific workloads, where you'd still need their proprietary stack.
Furthermore, for those who prefer a higher-level API, Zig CC + WebGPU offers similar benefits with a simpler programming model, though with slightly less mature implementations and a possible small performance overhead.
However, in cases where this wasn't consistently feasible - e.g. COM - Windows instead mandates the use of a common API to manage memory: CoGetMalloc etc.
> More importantly, separating the dynamic linker from the C library itself would allow multiple versions of libc to coexist, eliminating a major source of compatibility issues. This is exactly how Windows handles it, which is one of the reasons Windows maintains such strong binary compatibility. You can still run decades-old Windows software today because Microsoft doesn’t force everything to be tied to a single, ever-changing libc.
One of the questions with multiple versions on the same box is what about the security issues of those older versions... The REAL reason Windows maintains binary compatibility is that it is commercial and nobody ships source code.
In fact, many applications ship a whole boatload of DLLs, which I think is the commercial equivalent of static linking.
It is, in the sense that the package is bigger, and the package ships "everything it needs".
It isn't in the sense that those parts can be updated independently as long as the DLL interface is backward compatible.
For example, I ship OpenSSL dlls with my app. Which means swapping in a later (compatible) OpenSSL can be done (by the user if necessary.)
If I'm making a small utility I static link it - and I still use utilities daily I compiled 25 years ago. Obviously those dynamically link to KERNEL etc, but Microsoft has insane levels of compatibility there.
And perhaps that's the fundamental root of the issue. Windows has one provider, very committed to the longevity of software. Linux, well, does not.
That's OK. The world has room for different philosophies. And each one will have strengths and weaknesses.
Historically, they (the Linux syscalls) almost never break, and new syscall numbers are steadily added rather than reusing existing ones with different parameters.
As WASI is also implementing syscalls for WASM, I'd argue that the binary format doesn't really matter as long as it's using the same syscalls in the end.
I understand this topic is mostly focusing on glibc/musl problems, but if you want to develop stable software, use CGo-free Go binaries. They will likely run in 10 years the same way they do today.
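For reference, that usually just means disabling cgo at build time (a sketch; the package path is whatever your main package is):

    # pure-Go build: no libc dependency, only the kernel syscall ABI
    CGO_ENABLED=0 go build -o myapp .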
C ABI compatibility is a mess on linux mostly because upstream maintainers don't give a damn about semantic versioning. Just take a look at the SO file headers, and how they differ from upstream "semantic" versions of the library. As long as shared objects differ in versions due to breaking changes, and as long as the C ecosystem doesn't enforce correct versioning, this won't change.
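For reference, the version the dynamic linker actually matches on is the SONAME embedded in the object, not the upstream release number (path and library name hypothetical):

    # the SONAME only changes when the maintainer declares an ABI break
    readelf -d /usr/lib/libfoo.so | grep SONAME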
In a way, we already have it in the form of ChromeOS, and I certainly don't want ChromeOS or Android to be the default experience, or for the community to start to cater to that model.
All of the things Linux would need to become to get Windows level marketshare would strip away what makes it appealing in the first place, in particular, full user control. I don't want a nanny desktop, but that's exactly what it would become.
Linux can stay niche for those who appreciate it as it is.