Posted by vermaden 6 hours ago
The way capabilities usually work is you more or less turn off the usual do-whatever-you-want syscalls, and have to do restricted things through FDs that carry the capability to do them. So, no more opening any path with open; you have to use openat with an FD for your directory of interest. But that requires the program to understand how to use the capabilities and how to be passed them. It's not something that you can just impose.
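To make the openat point concrete, here is a minimal sketch in Python, whose `os.open(..., dir_fd=...)` maps onto openat(2). The helper name `read_in_dir` is hypothetical, and the path check is only a userland stand-in: in Capsicum's capability mode the kernel itself refuses lookups that leave the directory, whereas this code merely imitates that behaviour.

```python
import os
import tempfile


def read_in_dir(dir_fd: int, name: str) -> bytes:
    """Read `name` relative to dir_fd, resolved the way openat(2) would."""
    # Illustration only: outside capability mode the kernel would happily
    # follow an absolute path or ".." out of the directory, so we reject
    # those here in userland.
    if os.path.isabs(name) or ".." in name.split("/"):
        raise ValueError(f"path escapes the directory: {name!r}")
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 65536)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "note.txt"), "w") as f:
            f.write("hello")
        # The directory FD is the "capability": everything is opened through it.
        dir_fd = os.open(workdir, os.O_RDONLY | os.O_DIRECTORY)
        try:
            print(read_in_dir(dir_fd, "note.txt"))
        finally:
            os.close(dir_fd)
```

The program has to be written around `dir_fd` from the start, which is exactly why this model can't simply be imposed on existing code.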
My understanding of SELinux is that it can be imposed on a program without the program's knowledge, because it's more or less matching rules against syscalls... rather than giving the program a restricted FD to use with openat, you restrict the options for open.
But I am pretty sure you CAN get your capabilities from a parent process using Capsicum, since they are just file descriptors.
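As a rough sketch of that idea, the parent below opens a file and hands only the descriptor to a child process, which never sees a path. This uses plain FD inheritance via `subprocess` and does not sandbox anything; on FreeBSD the child would additionally call cap_enter. The names `CHILD_CODE` and `run_child_with_fd` are made up for the example.

```python
import os
import subprocess
import sys
import tempfile

# Child program: it receives an FD number on argv and reads from it.
# It is never told which file the descriptor refers to.
CHILD_CODE = (
    "import os, sys; "
    "fd = int(sys.argv[1]); "
    "sys.stdout.write(os.read(fd, 4096).decode())"
)


def run_child_with_fd(fd: int) -> str:
    """Spawn a child that inherits `fd` and return what it read from it."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(fd)],
        pass_fds=[fd],  # keep this descriptor open in the child
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("delegated")
    fd = os.open(tmp.name, os.O_RDONLY)
    try:
        print(run_child_with_fd(fd))
    finally:
        os.close(fd)
        os.unlink(tmp.name)
```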
Some time ago I wrote a library for a customer that did exactly that: open a number of resources (e.g., stdin, stdout, stderr, a pipe or two, a socket or two), make the seccomp calls necessary to restrict the use of read/write/etc. to the associated file descriptors, then lock out all other system calls, including the seccomp-related ones.
Basically, the library took a very Capsicum-like approach of whitelisting specific actions then sealing itself against further changes.
This is a LOT of work, of course, and the available APIs don't make it particularly easy or elegant, but it is definitely doable. I chose this approach because the docker whitelist approach was far too open ended and "uncurated", if you will, for the use-case we were targeting.
In this particular case, I was aided by the fact that the library was written to support the very specific use-case of filters running in containers using FIFOs for IPC, logging, and reporting: every filter saw exactly the same interfaces to the world, so it was relatively easy to lock things down.
Having said that, I wish Linux had a Capsicum-equivalent call, or, even better for the approach I took, a friendlier way to whitelist specific calls.
Capsicum attaches rights to descriptors and gives kernel-enforced primitives like cap_enter and cap_rights_limit, so delegation is explicit and easier to reason about. If you want Linux parity, use libseccomp to shrink the syscall surface, combine it with mount and user namespaces and Landlock for filesystem constraints, and design your app around FD-based delegation instead of trying to encode every policy into BPF.
I'm not sure what glibc's latest policy is on linking statically, but at least it used to be basically unsupported and bugs about it were ignored. But even if it is supported, you can't know whether, under some configurations or runtime circumstances, it uses dlopen for something.
Or maybe once you juggle more than X file descriptors some code switches from using `poll()` to using `select()` (or `epoll_wait()`).
My thoughts last time I looked at seccomp: https://blog.habets.se/2022/03/seccomp-unsafe-at-any-speed.h...
That would break capsicum, too, so I don’t see how that’s a problem when “comparing Capsicum to using seccomp in the same way”.
“The goal of Landlock is to enable restriction of ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock is a stackable LSM [(Linux Security Module)], it makes it possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. ... Landlock empowers any process, including unprivileged ones, to securely restrict themselves.”
The one restriction I'm not sure about is that you can't say "~/ except ~/.gnupg". You have to actually enumerate everything you do want to allow. But maybe that's for the best. Both because it mandates rules not becoming too complex to reason about, and because that's a weird requirement in general. Like did you really mean to give access to ~/.gnupg.backup/? Probably not. Probably best to enumerate the allowlist.
And if you really want to, I guess you can listdir() and compose the exhaustive list manually, after subtracting the "except X".
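A minimal sketch of that listdir() approach, assuming only the top level of the directory needs enumerating. The function name `allowlist_except` is hypothetical, and the resulting list would still have to be fed to Landlock rules separately; this only computes the paths.

```python
import os
import tempfile


def allowlist_except(base: str, excluded: set) -> list:
    """Return every top-level entry of `base` except the names in `excluded`."""
    # Exact-name matching only: ".gnupg" does not exclude ".gnupg.backup".
    return sorted(
        os.path.join(base, entry)
        for entry in os.listdir(base)
        if entry not in excluded
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        for name in (".gnupg", ".gnupg.backup", "Documents"):
            os.mkdir(os.path.join(home, name))
        for path in allowlist_except(home, {".gnupg"}):
            print(path)
```

Note that in the demo `.gnupg.backup` ends up on the allowlist, which is precisely the kind of surprise an "everything except X" rule invites. The list is also a snapshot: entries created after the call are not covered.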
I find seccomp unusable and not fit for purpose, but Landlock closes many doors.
Maybe you know better? I'd love to hear your take.
On Linux I understand that Landlock is the way to go.
Between file system, bind/connect, and sending signals, that covers most of it. Probably the biggest remaining risk is any unpatched bugs in the kernel itself.
So an attacker would first need to gain execution in the process and then elevate that access inside the kernel, in a way that doesn't just leave them root but still Landlocked, and all of that with a much smaller effective syscall attack surface. Even if there's a kernel bug in ioctl on devices, Landlock can turn that off too.
I already find it very frustrating that most open source projects spawning on HN's front page are resume-boosting AI slop but if blogs start being the same the internet is definitely dead.
Edit: it doesn't even look like it's resume-boosting in this case; the “person” behind it doesn't even appear to exist. We can only speculate about the intent behind this.
But hey, it's a game!
It reminds me of the pinnacle of design: Microsoft Authenticator. On Android, out of the blue, it displays a global overlay asking you to select one of 3 numbers to confirm a login.
The overlay is ... transparent.
The UI is fun but unreadable, but the content is solid. Please explain how this is slop.
1. The post mainly reiterates a single idea (Capsicum enumerates what the process can do, seccomp provides a configurable filter) in many different ways. There is not much actual depth, code samples notwithstanding. Nothing on why different designs were chosen, how easy each is to use, outcomes besides the Chrome example, etc.
2. There are a lot of AI writing tells, like staccato sentences, parallelism ("Same browser. Same threat model. Same problem."), pointless summary tables, "it's not X, it's Y" contradiction ("This is not a bug. It is the original Unix security model"), etc.
3. The author publishes roughly a blog post a day, on widely varied topics and all in the same writing style. Unless the author has deep expertise on a remarkably wide range of topics and spends all their time writing, these can't reflect deep insight or experience, only minimal editing of AI output.
So yes, it's pretty sloppy.