Bugs Rust won't catch

Posted by lwhsiao 9 hours ago

348 points | 166 comments

collinfunk 6 hours ago|

Hi, I am one of the maintainers of GNU Coreutils. Thanks for the article, it covers some interesting topics. In the little Rust that I have used, I have felt that it is far too easy to write TOCTOU races using std::fs. I hope the standard library gets an API similar to openat eventually.

I just want to mention that I disagree with the section titled "Rule: Resolve Paths Before Comparing Them". Generally, it is better to make calls to fstat and compare the st_dev and st_ino. However, that was mentioned in the article. A side effect that seems less often considered is the performance impact. Here is an example in practice:

  $ mkdir -p $(yes a/ | head -n $((32 * 1024)) | tr -d '\n')
  $ while cd $(yes a/ | head -n 1024 | tr -d '\n'); do :; done 2>/dev/null
  $ echo a > file
  $ time cp file copy

  real 0m0.010s
  user 0m0.002s
  sys 0m0.003s
  $ time uu_cp file copy

  real 0m12.857s
  user 0m0.064s
  sys 0m12.702s

I know people are very unlikely to do something like that in real life. However, GNU software tends to work very hard to avoid arbitrary limits [1].

Also, the larger point still stands, but the article says "The Rust rewrite has shipped zero of these [memory saftey bugs], over a comparable window of activity." However, this is not true [2]. :)

[1] https://www.gnu.org/prep/standards/standards.html#Semantics [2] https://github.com/advisories/GHSA-w9vv-q986-vj7x

dapperdrake 5 hours ago||

First of all, thank you for presenting a succinct take on this viewpoint from the other side of the fence from where I am at.

So how can I learn from this? (Asking very aggressively, especially for Internet writing, to make the contrast unmistakable. And contrast helps with perceiving differences and mistakes.) (You also don’t owe me any of your time or mental bandwidth, whatsoever.)

So here goes:

Question 1:

How come "speed", "performance", race conditions and st_ino keep getting brought up?

Speed (latency), physically writing things out to storage (sequentially, atomically (ACID), all of HDD NVME SSD ODD FDD tape, "haskell monad", event horizons, finite speed of light and information, whatever) as well as race conditions all seem to boil down to the same thing. For reliable systems like accounting the path seems to be ACID or the highway. And "unreliable" systems forget fast enough that computers don’t seem to really make a difference there.

Question 2:

Does throughput really matter more than latency in everyday application?

Question 3 (explanation first, this time):

The focus on inode numbers is at least understandable with regards to the history of C and unix-like operating systems and GNU coreutils.

What about this basic example? Just make a USB thumb drive "work" for storing files (ignoring nand flash decay and USB). Without getting tripped up in libc IO buffering, fflush, kernel buffering (Hurd if you prefer it over Linux or FreeBSD), more than one application running on a multi-core and/or time-sliced system (to really weed out single-core CPUs running only a single user-land binary with blocking IO).

dijit 3 hours ago|||

> Does throughput really matter more than latency in everyday application?

In my experience latency and throughput are intrinsically linked unless you have the buffer-space to handle the throughput you want. Which you can't guarantee on all the systems where GNU Coreutils run.

awesome_dude 2 hours ago|||

> Question 2:

> Does throughput really matter more than latency in everyday application?

IME as a user, hell yes

Getting a video I don't mind if it buffers a moment, but once it starts I need all of that data moving to my player as quickly as possible

OTOH if there's no wait, but the data is restricted (the amount coming to my player is less than the player needs to fully render the images), the video is "unwatchable"

WJW 58 minutes ago||

I don't mean to nitpick, but absolute values for both of these matter much less than how much it is compared to "enough". As long as the throughput is enough to prevent the video from stuttering, it doesn't matter if the data is moved to your video player program at 1 GB/s or 1 TB/s. Conversely, you say you don't mind if a video buffers for a moment but I'm willing to bet there's some value of "a moment" where it becomes "too long". Nobody is willing to wait an hour buffering before their video starts.

The perception of speed in using a computer is almost entirely latency driven these days. Compare using `rg` or `git` vs loading up your banking website.

gzread 4 minutes ago|||

I see even the coreutils maintainers find themselves needing -n (no newlines) and -c (count) options to "yes".

s20n 6 hours ago|||

Sorry, complete noob here. Why didn't you just cd into $(yes a/ | head -n $((32 * 1024)) | tr -d '\n')? Why do you need to use the while loop for cd?

EDIT: got it. -bash: cd: a/a/a/....../a/a/: File name too long

collinfunk 6 hours ago||

No need to apologize at all. Doing it in one cd invocation would fail since the file name is longer than PATH_MAX. In that case passing it to a system call would fail with errno set to ENAMETOOLONG.

You could probably make the loop more efficient, but it works good enough. Also, some shells don't allow you to enter directories that deep entirely. It doesn't work on mksh, for example.

dapperdrake 5 hours ago||

Facetious reply:

> However, GNU software tends to work very hard to avoid arbitrary limits [1].

Joker_vD 5 hours ago|||

Yes? The quote says "tends to", and you still can cd into that directory, albeit not in a single invocation. Windows has similar limitations [0], it's just that their MAX_PATH is just 260 so it's somewhat more noticeable... and IIRC the hard limit of 32 K for paths in non-negotiable.

[0] https://learn.microsoft.com/en-us/windows/win32/fileio/maxim...

nonameiguess 1 hour ago|||

It's not a GNU limit. It's in Linux: https://github.com/torvalds/linux/blob/v6.19/include/uapi/li...

theteapot 3 hours ago|||

Probably a dumb question, but is GNU Core utils interested in / planning on doing its own rust rewrite?

greatgib 1 hour ago||

The rewrite in Rust is mostly vanity and marketing but not based on a real technical need...

So I don't see why they would want to do that.

kibwen 42 minutes ago||

Canonical's usage of uutils is likely for marketing. But the codebase itself was developed for fun, as an excuse for people to have a hands-on way to learn Rust back before Rust was even released, with a minor justification as being cross-platform. From the original README in 2013:

Why?

----

Many GNU, linux and other utils are pretty awesome, and obviously some effort has been spent in the past to port them to windows. However those projects are either old, abandonned, hosted on CVS, written in platform-specific C, etc.

Rust provides a good platform-agnostic way of writing systems utils that are easy to compile anywhere, and this is as good a way as any to try and learn it.

https://github.com/uutils/coreutils/blob/9653ed81a2fbf393f42...

PunchyHamster 10 minutes ago||

>Canonical's usage of uutils is likely for marketing

Currently their usage is actively worsening the security of their distro

cyberax 5 hours ago|||

To be fair, Vec::set_len bug in Rust was in 2021. And even then it had to be annotated as `unsafe`. It was then deprecated and a linter check was added: https://github.com/rust-lang/rust-clippy/issues/7681

Dr_Emann 5 hours ago|||

To be even fair-er, it wasn't actually memory unsafety, it was "just" unsoundness, there was a type, that IF you gave it an io reader implementation that was weird, that implementation could see uninit data, or expose uninit data elsewhere, but the only readers actually used were well behaved readers.

orlp 2 hours ago|||

Vec::set_len is by no means deprecated. The lint you linked only covers a very specific unsound pattern using set_len.

kibwen 2 hours ago||

Indeed, and it doesn't need to be deprecated, because it's an API explicitly designed to give you low-level control where you need it, and because it is appropriately defined as an `unsafe` function with documented safety invariants that must be manually upheld in order for usage to be memory-safe. The documentation also suggests several other (safe) functions that should be used instead when possible, and provides correct usage examples: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.set... .

bcjdjsndon 5 minutes ago||

> and because it is appropriately defined as an `unsafe` function with documented safety invariants that must be manually upheld in order for usage to be memory-safe.

Didn't we learn from c, and the entire raison detre for rust, is that coders cannot be trusted to follow rules like this?

If coders could "(document) safety invariants that must be manually upheld in order for usage to be memory-safe." there's be no need for Rust.

This is the tautology underlying rust as I see it

lyrie 32 minutes ago||

[dead]

wahern 6 hours ago||

> What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing

They knew how to write Rust, but clearly weren't sufficiently experienced with Unix APIs, semantics, and pitfalls. Most of those mistakes are exceedingly amateur from the perspective of long-time GNU coreutils (or BSD or Solaris base) developers, issues that were identified and largely hashed out decades ago, notwithstanding the continued long tail of fixes--mostly just a trickle these days--to the old codebases.

concinds 3 hours ago||

Reading that Canonical thread was jaw-dropping. Paraphrased: "Rust is more secure, security is our priority, therefore deploying this full-rewrite of core utils is an emergency. If things break that's fine, we'll fix it :)".

I would not want to run any code on my machines made by people who think like this. And I'm pro-Rust. Rust is only "more secure" all else being equal. But all else is not equal.

A rewrite necessarily has orders of magnitude more bugs and vulnerabilities than a decades-old well-maintained codebase, so the security argument was only valid for a long-term transition, not a rushed one. And the people downplaying user impact post-rollout, arguing that "this is how we'll surface bugs", and "the old coreutils didn't have proper test cases anyway" are so irresponsible. Users are not lab rats. Maintainers have a moral responsibility to not harm users' systems' reliability (I know that's a minority opinion these days). Their reasoning was flawed, and their values were wrong.

zx8080 1 hour ago|||

Agree with the point. Asking sincerely, how to filter out installing any rust-rewrite packages on my machines? Does anyone know the way?

theandrewbailey 32 minutes ago|||

    sudo apt install coreutils-from-gnu

https://computingforgeeks.com/ubuntu-2604-rust-coreutils-gui...

kibwen 38 minutes ago|||

If you don't want Canonical's packages, you should probably just be using Debian rather than Ubuntu. It's not 2008 anymore, stock Debian is quite user-friendly.

tambre 17 minutes ago|||

Worth noting is that in Debian experimental coreutils defaults to coreutils-from-uutils [0]. This came as a big surprise and as far as I can tell there's been no discussion. A Canonical developer seems to have unilaterally overwritten the coreutils package without discussing with the maintainer. All the package renames that are in Ubuntu aren't in Debian so you can't switch to GNU utils either without deep trickery in a separate recovery environment.

I'm used to running experimental software but I wasn't ready for my computer to not boot one day because of uutils. The `-Z` flag for `cp` wasn't implemented in the 9 month old version shipped in Debian at that time so initramfs creation failed...

[0] https://packages.debian.org/experimental/coreutils

KnuthIsGod 11 minutes ago|||

Or use a sane distribution like Arch or Gentoo instead of Ubuntu based systems.

oefrha 19 minutes ago|||

This leaves such a bad taste in my mouth. If you fucking found 44 CVEs with some relatively amateurish ones (I'm no security engineer but even I've done that exact TOCTOU mitigation before) in such a core component of your system a month before 26.04 LTS release (or a couple months if you count from their round 1), surely the response should be "we need to delay this to 28.04 LTS to give it time to mature", not "we'll ship this thing in LTS anyway but leave out the most obviously problematic parts"?

The snap BS wasn't enough to move me since I was largely unaffected once stripping it out, but this might finally convince me to ditch.

PunchyHamster 8 minutes ago||

Ubuntu has been doing careless shit like that their entire existence, it's nothing new

nine_k 6 hours ago|||

More than that: it seems that Rust stdlib nudges the developer towards using neat APIs at an incorrect level of abstraction, like path-based instead of handle-based file operations. I hope I'm wrong.

NobodyNada 5 hours ago|||

Nearly every available filesystem API in Rust's stdlib maps one-to-one with a Unix syscall (see Rust's std::fs module [0] for reference -- for example, the `File` struct is just a wrapper around a file descriptor, and its associated methods are essentially just the syscalls you can perform on file descriptors). The only exceptions are a few helper functions like `read_to_string` or `create_dir_all` that perform slightly higher-level operations.

And, yeah, the Unix syscalls are very prone to mistakes like this. For example, Unix's `rename` syscall takes two paths as arguments; you can't rename a file by handle; and so Rust has a `rename` function that takes two paths rather than an associated function on a `File`. Rust exposes path-based APIs where Unix exposes path-based APIs, and file-handle-based APIs where Unix exposes file-handle-based APIs.

So I agree that Rust's stdilb is somewhat mistake prone; not so much because it's being opinionated and "nudg[ing] the developer towards using neat APIs", but because it's so low-level that it's not offering much "safety" in filesystem access over raw syscalls beyond ensuring that you didn't write a buffer overflow.

[0]: https://doc.rust-lang.org/std/fs/index.html

juergbi 2 hours ago|||

> So I agree that Rust's stdilb is somewhat mistake prone; not so much because it's being opinionated and "nudg[ing] the developer towards using neat APIs", but because it's so low-level that it's not offering much "safety" in filesystem access over raw syscalls beyond ensuring that you didn't write a buffer overflow.

`openat()` and the other `*at()` syscalls are also raw syscalls, which Rust's stdlib chose not to expose. While I can understand that this may not be straight forward for a cross-platform API, I have to disagree with your statement that Rust's stdlib is mistake prone because it's so low-level. It's more mistake prone than POSIX (in some aspects) because it is missing a whole family of low-level syscalls.

kibwen 1 hour ago||

They're not missing, Rust just ships them (including openat) as part of the first-party libc crate rather than exposing them directly from libstd. You'll find all the other libc syscalls there as well: https://docs.rs/libc/0.2.186/libc/ . I agree that Rust's stdlib could use some higher-level helper functions to help head off TOCTOU, but it's not as simple as just exposing `openat`, which, in addition to being platform-specific as you say, is also error-prone in its own right.

nh2 1 hour ago|||

But those are all unsafe, taking raw strings.

Why can I easily use "*at" functions from Python's stdlib, but not Rust's?

They are much safer against path traversal and symlink attacks.

Working safely with files should not require *const c_char.

This should be fixed .

kibwen 11 minutes ago||

> But those are all unsafe, taking raw strings.

The parent was asking for access to the C syscall, and C syscalls are unsafe, including in C. You can wrap that syscall in a safe interface if you like, and many have. And to reiterate, I'm all for supporting this pattern in Rust's stdlib itself. But openat itself is a questionable API (I have not yet seen anyone mention that openat2 exists), and if Rust wanted to provide this, it would want to design something distinct.

> Why can I easily use "at" functions from Python's stdlib, but not Rust's?*

I'm not sure you can. The supported pattern appears to involve passing the optional `opener` parameter to `os.open`, but while the example of this shown in the official documentation works on Linux, I just tried it on Windows and it throws a PermissionError exception because AFAIK you can't open directories on Windows.

bonzini 44 minutes ago|||

The correct comparison is to rustix, not libc, and rustix is not first-party. And even then the rustix API does not encapsulate the operations into structs the same way std::fs and std::io do.

kibwen 30 minutes ago||

The correct comparison to someone asking for first-party access to a C syscall is to the first-party crate that provides direct bindings to C syscalls. If you're willing to go further afield to third-party crates, you might as well skip rustix's "POSIX-ish" APIs (to quote their documentation) and go directly to the openat crate, which provides a Rust-style API.

bonzini 26 minutes ago||

If I have to use unsafe just to open a file, I might as well use C. While Rustix is a happy middle that is usually enough and more popular than the open at crate, libc is in the same family as the "*-sys" crate and, generally speaking, it is not intended for direct use outside other FFI crates.

masklinn 4 hours ago|||

> For example, Unix's `rename` syscall takes two paths as arguments; you can't rename a file by handle

And then there’s renameat(2) which takes two dirfd… and two paths from there, which mostly has all the same issues rename(2) does (and does not even take flags so even O_NOFOLLOW is not available).

I’m not sure what you’d need to make a safe renameat(), maybe a triplet of (dirfd, filefd, name[1]) from the source, (dirfd, name) from the target, and some sort of flag to indicate whether it is allowed to create, overwrite, or both.

As the recent https://blog.sebastianwick.net/posts/how-hard-is-it-to-open-... talks about (just for file but it applies to everything) secure file system interaction is absolutely heinous.

[1]: not path

mort96 3 hours ago||

How about fd of the file you wanna rename, dirfd of the directory you want to open it in, and name of the new file? You could then represent a "rename within the same directory" as: dfd = opendir(...); fd = openat(dfd, "a"); rename2(fd, dfd, "b");

I can't think of a case this API doesn't cover, but maybe there is one.

masklinn 3 hours ago||

The file may have been renamed or deleted since the fd was opened, and it might have been legitimate and on purpose, but there’s no way to tell what trying to resolve the fd back to a path will give you.

And you need to do that because nothing precludes having multiple entries to the same inode in the same directory, so you need to know specifically what the source direntry is, and a direntry is just a name in the directory file.

JuniperMesos 5 hours ago||||

After reading this article, I'm inclined to think that the right thing for this project to do is write their own library that wraps the Rust stdlib with a file-handle-based API along with one method to get a file handle from a Path; rewrite the code to use that library rather than rust stdlib methods, and then add a lint check that guards against any use of the Rust standard library file methods anywhere outside of that wrapper.

dbdr 3 hours ago|||

If that's the right approach, then it would be useful to make that library public as a crate, because writing such hardened code is generally useful. Possibly as a step before inclusion in the rust stdlib itself.

akoboldfrying 27 minutes ago|||

Agreed. (This approach feels like a cousin of Parse, Don't Validate.)

PunchyHamster 6 minutes ago||||

That's a norm in most languages, this is just more convenient way to operate

jeroenhd 4 hours ago|||

If anything, I find the rust standard library to default to Unix too much for a generic programming language. You need to think very Unixy if you want to program Rust on Windows, unless you're directly importing the Windows crate and foregoing the Rust standard library. If you're writing COBOL style mainframe programs, things become even more forced, though I doubt the overlap between Rust programmers and mainframe programmers that don't use a Unix-like is vanishingly small.

This can also be a pain on microcontrollers sometimes, but there you're free to pretend you're on Unix if you want to.

Someone 26 minutes ago|||

If you want to support file I/O in the standard library, you have to choose _some_ API, and that either is limited to the features common to all platforms, or it covers all features, but call that cannot be supported return errors, or you pick a preferred platform and require all other platforms to try as hard as they can to mimic that.

Almost all languages/standard libraries pick the latter, and many choose UNIX or Linux as the preferred platform, even though its file system API has flaws we’ve known about for decades (example: using file paths too often) or made decisions back in 1970 we probably wouldn’t make today (examples: making file names sequences of bytes; not having a way to encode file types and, because of that, using heuristics to figure out file types. See https://man7.org/linux/man-pages/man1/file.1.html)

bonzini 58 minutes ago|||

That's the same for the C or Python standard libraries. The difference is that in C you tend to use the Win32 functions more because they're easily reached for; but Python and Rust are both just as Unixy.

jeroenhd 7 minutes ago||

Indeed, though for C it makes sense given its origins, and Python sort of grew from a fun project into a massive ecosystem by accident.

AlotOfReading 6 hours ago|||

Someone once coined a related term, "disassembler rage". It's the idea that every mistake looks amateur when examined closely enough. Comes from people sitting in a disassembler and raging the high level programmers who had the gall to e.g. use conditionals instead of a switch statement inside a function call a hundred frames deep.

We're looking solely at the few things they got wrong, and not the thousands of correct lines around them.

Cthulhu_ 2 hours ago|||

Thing is, these tools are so critical that even one error may cause systems to be compromised; rewriting them should never be taken lightly.

(Actually ideally there's formal verification tools that can accurately test for all of the issues found in this review / audit, like the very timing specific path changes, but that's a codebase on its own)

irishcoffee 6 hours ago|||

When I read the article I came away with the impression that shipping bugs this severe in a rewrite of utils used by hundreds of millions of people daily (hourly?) isn’t ok. I don’t think brushing the bad parts off with “most of the code was really good!” is a fair way to look at this.

Cloudflare crashed a chunk of the internet with a rust app a month or so ago, deploying a bad config file iirc.

Rust isn’t a panacea, it’s a programming language. It’s ok that it’s flawed, all languages are.

gmueckl 5 hours ago|||

I think that legitimate real world issues in rust code should be talked about more often. Right now the language enjoys a reputation that is essentiaöly misleading marketing. It isn't possible to create a programing language that doesn't allow bugs to happen (even with formal verification you can still prove correctness based on a wrong set of assumptions). This weird, kind of religious belief that rust leads to magically completely bug free programs needs to be countered and brought in touch with reality IMO.

jeroenhd 4 hours ago|||

Nobody believes Rust programs are but free, though. Rust never promised that. It doesn't even promise memory safety, it only promises memory safety if you restrict yourself to safe APIs which simply isn't always possible.

Galanwe 4 hours ago||

> it only promises memory safety if you restrict yourself to safe APIs which simply isn't always possible.

Less than that actually, considering Rust has its own definition of what "safe" means.

user_of_the_wek 4 hours ago||

Ah, the Dwarf Fortress approach :)

https://dwarffortresswiki.org/DF2014:Fun&redirect=no

testdelacc1 5 hours ago|||

Is it possible you’ve misunderstood what Rust promises?

> It isn't possible to create a programing language that doesn't allow bugs to happen

Yes, that’s true. No one doubts this. Except you seem to think that Rust promises no bugs at all? I don’t know where you got this impression from, but it is incorrect.

Rust promises that certain kinds of bugs like use-after-free are much, much less likely. It eliminates some kinds of bugs, not all bugs altogether. It’s possible that you’ve read the claim on kinds of bugs, and misinterpreted it as all bugs.

I’ve had this conversation before, and it usually ends like https://www.smbc-comics.com/comic/aaaah

adrian_b 2 hours ago||

"Rust" obviously does not promise that.

On the other hand, there are too many less-experienced Rust fans who do claim that "Rust" promises this and that any project that does not use Rust is doomed and that any of the existing decades-old software projects should be rewritten in Rust to decrease the chances that they may have bugs.

What is described in TFA is not surprising at all, because it is exactly what has been predicted about this and other similar projects.

Anyone who desires to rewrite in Rust any old project, should certainly do it. It will be at least a good learning experience and whenever an ancient project is rewritten from scratch, the current knowledge should enable the creation of something better than the original.

Nonetheless, the rewriters should never claim that what they have just produced has currently less bugs than the original, because neither they nor Rust can guarantee this, but only a long experience with using the rewritten application.

Such rewritten software packages should remain for years as optional alternatives to the originals. Any aggressive push to substitute the originals immediately is just stupid (and yes, I have seen people trying to promote this).

Moreover, someone who proposes the substitution of something as basic as coreutils, must first present to the world the results of a huge set of correctness tests and performance benchmarks comparing the old package with the new package, before the substitution idea is even put forward.

testdelacc1 1 hour ago||

Where are these rust fans? Are they in the room with us right now?

You’ve constructed a strawman with no basis in reality.

You know what actual Rust fans sound like? They sound like Matthias Endler, who wrote the article we’re discussing. Matthias hosts a popular podcast Rust in Production where talks with people about sharp edges and difficulties they experienced using Rust.

A true Rust advocate like him writes articles titled “Bugs Rust Won’t Catch”.

> Such rewritten software packages should remain for years as optional alternatives to the originals.

This project was started a decade ago. (https://news.ycombinator.com/item?id=7882211)

> must first present to the world the results of a huge set of correctness tests and performance benchmarks

Yeah, you can see those in https://github.com/uutils/coreutils. This project has also worked with GNU coreutils maintainers to add more tests over time. Check out the graph where the total number of tests increases over time.

> before the substitution idea is even put forward

I partly agree. But notice that these CVEs come from a thorough security audit paid for by Canonical. Canonical is paying for it because they have a plan to substitute in the immediate future.

Without a plan to substitute it’s hard to advocate for funding. Without funding it’s hard to find and fix these issues. With these issues unfixed it’s hard to plan to substitute.

Chicken and egg problem.

> less bugs

Fewer.

fluffybucktsnek 3 hours ago||||

If I'm not mistaken, in the Cloudflare case, both the Rust rewrite and the C++ original version crashed. The primary cause being the bad config file.

adrian_b 1 hour ago||

Yes, but the point was that rewriting something in Rust is not sufficient per se to prevent such bugs.

The goal claimed by all these rewrites is the elimination of bugs.

saghm 31 minutes ago||

The "elimination of bugs" is not synonymous with "the elimination of all bugs". The way you're presenting it, any single bug in a rewrite would be grounds to consider the the entire endeavor a failure, which is a ridiculous standard.

There are plenty of strong arguments to be made against rewriting something in Rust, but this is a pretty weak one.

lelanthran 5 hours ago|||

I find it hilarious that this comment is being downvoted.

Exactly what is the controversial take here?

> I don’t think brushing the bad parts off with “most of the code was really good!” is a fair way to look at this.

Nope. this is fine.

> Cloudflare crashed a chunk of the internet with a rust app a month or so ago, deploying a bad config file iirc.

Maybe this?

> Rust isn’t a panacea, it’s a programming language. It’s ok that it’s flawed, all languages are.

Nope, this is fine too.

dbdr 3 hours ago|||

I didn't downvote, but I feel the last two points show a lack of nuance. It's saying "Rust doesn't prevent 100% of the bugs, like all other programming languages", while failing to acknowledge that if a programming language prevents entire classes of bugs, it's a very significant improvement.

adrian_b 1 hour ago||

Nobody disputes that Rust is one of the programming languages that prevent several classes of frequent bugs, which is a valuable feature when compared with C/C++, even if that is a very low bar.

What many do not accept among the claims of the Rust fans is that rewriting a mature and very big codebase from another language into Rust is likely to reduce the number of bugs of that codebase.

For some buggier codebases, a rewrite in Rust or any other safer language may indeed help, but I agree with the opinion expressed by many other people that in most cases a rewrite from scratch is much more likely to have bugs, regardless in what programming language it is written.

If someone has the time to do it, a rewrite is useful in most cases, but it should be expected that it will take a lot of time after the completion of the project until it will have as few bugs as mature projects.

kibwen 1 hour ago|||

As other people have mentioned, the goal of uutils was not "let's reduce bugs in coreutils by rewriting it in Rust", it was "it's 2013 and here's a pre-1.0 language that looks neat and claims to be a credible replacement for C, let's test that hypothesis by porting coreutils, giving us an excuse to learn and play with a new language in the process". It seems worth emphasizing that its creation was neither ideologically motivated nor part of some nefarious GPL-erasure scheme, it was just some people hacking on a codebase for fun.

Whether or not it was wise for Canonical to attempt to then take that codebase and uplift it into Ubuntu is a different story altogether, but one that has no bearing on the motivations of the people behind the original port itself.

You can see an alternative approach with the authors of sudo-rs. Rather than porting all of userspace to Rust for fun, they identified a single component of a particularly security-critical nature (sudo), and then further justified their rewrite by removing legacy features, thereby producing an overall simpler tool with less surface area to attack in the first place. It was not "we're going to rewrite sudo in Rust so it has fewer bugs", it was "we're going to rewrite sudo with the goal of having fewer bugs, and as one subcomponent of that, we're going to use Rust". And of course sudo-rs has had fresh bugs of its own, as any rewrite will. But the mere existence of bugs does not invalidate their hypothesis, which is that a conscientious rewrite of a tool can result in fewer bugs overall.

bonzini 55 minutes ago|||

It's not a low bar when C/C++/D are basically the only languages in which you can write certain kinds of programs.

huimang 3 hours ago|||

Because the bugs were caused by programmer error, not anything inherent to rust. It was more notable due to cloudflare being a critical dependency for half the internet, but that particular issue could've happened in any language.

This kind of melodramatic reaction to rust code is fatiguing, honestly. Rust does not bill itself as some programming panacea or as a bug free language, and neither do any of the people I know using it. That's a strawman that just won't go away.

Rust applies constraints regarding memory use and that nearly eliminates a class of bugs, provided safe usage. And that's compelling to enough people that it warrants migration from other languages that don't focus on memory safety. Bugs introduced during a rewrite aren't notable. It happens, they get fixed, life moves on.

adrian_b 1 hour ago||

> caused by programmer error, not anything inherent to Rust

Your argument does not work as a praise for Rust because the bugs in any program are caused by programmer errors, except the very rare cases when there are bugs in the compiler tool chain, which are caused by errors of other programmers.

The bugs in a C or C++ program are also caused by programmer errors, they are not inherent to C/C++. It is rather trivial to write C/C++ carefully, in order to make impossible any access outside bounds, numeric overflow, use-after-free, etc.

The problem is that many programmers are careless, especially when they might be pressed by tight time schedules, so they make some of these mistakes. For the mass production of software, it is good to use more strict programming languages, including Rust, where the compiler catches as many errors as possible, instead of relying on better programmers.

huimang 5 minutes ago||

I'm neither praising or admonishing rust. Did you read the parent comment or its parents' comment I was responding to at all?

(grandparent comment): "Cloudflare crashed a chunk of the internet with a rust app a month or so ago"

The actual bug had nothing to do with rust, yet rust is specifically brought up here.

(grandparent comment): "Rust isn’t a panacea, it’s a programming language. It’s ok that it’s flawed, all languages are."

No Rust programmer thinks it's a panacea! Rust has never advertised itself this way.

slopinthebag 6 hours ago|||

Seems pretty impressive they rewrote the coreutils in a new language, with so little Unix experience, and managed to do such a good job with very little bugs or vulns. I would have expected an order of magnitude more at least.

Shows how good Rust is, that even inexperienced Unix devs can write stuff like this and make almost no mistakes.

nine_k 6 hours ago|||

Yes, it's the lack of Unix experience that's terrifying. So many of mistakes listed are rookie mistakes, like not propagating the most severe errors, or the `kill -1` thing. Why were people who apparently did not have much experience using coreutils assigned to rewrite coreutils?

aw1621107 5 hours ago|||

> Why were people who apparently did not have much experience using coreutils assigned to rewrite coreutils?

From what I understand, "assigned" probably isn't the best way to put it. uutils started off back in 2013 as a way to learn Rust [0] way before the present kerfuffle.

[0]: https://github.com/uutils/coreutils/tree/9653ed81a2fbf393f42...

nineteen999 5 hours ago|||

Yeah perhaps learning UNIX API's and Rust at the same time doesn't lead to a drop in replacement ready to be shipped in major distributions. Who whould have thunk it.

aw1621107 2 hours ago||

Strictly speaking it doesn't preclude eventually producing a production-ready drop-in replacement either, though evidently that needs a fresh set of eyes.

bpbp-mango 4 hours ago|||

exactly this. I wrote one of them back then as a learning experience. some of the code I wrote is still intact, incredibly.

JuniperMesos 5 hours ago|||

Why is it even possible to represent a negative PID, let alone treat the integer -1 as a PID meaning "all effective processes"? This seems like a mistake (if not a rookie mistake) in the Linux kernel API itself.

nine_k 4 hours ago|||

-1 is a special case, a way to represent a PID with all bits set in a platform-independent way. It's not very clean, and it comes from ancient times when writing some extra code and storing an extra few bytes was way more expensive.

bonzini 53 minutes ago||

No, -1 is simply the process group with pgid 1:

https://stackoverflow.com/questions/392022/whats-the-best-wa...

The problem is that -DIGIT doubles as both "signal number" and process group. The right way to invoke kill for a process group however would be "kill [OPTS]... -- -PGID".

dminik 4 hours ago||||

It feels a bit like a "better is better" language hitting all of the quirks of a "worse is better" environment.

antonvs 4 hours ago|||

Pretty much all the rough edges being discussed here are design mistakes in Linux or Unix, and/or a consequence of using an unsafe language with limited abstractions and a weak type system. But because of ubiquity, this is everyone’s problem now.

adrian_b 1 hour ago||

You are right, but those who set for themselves the goal to substitute a Linux/UNIX package must implement programs that handle correctly all the quirks of the existing Linux/POSIX specifications.

If they do not like the design mistakes, great, they should set for themselves the goal to write a new operating system together with all base applications, where all these mistakes are corrected.

As long as they have not chosen the second goal, but the first, they are constrained by the existing interfaces and they must use them correctly, no matter how inconvenient that may be.

Anyone who learns English may be frustrated by many design mistakes of English, but they must still use English as it is spoken by the natives, otherwise they will not be understood.

gblargg 3 hours ago||||

Rewriting perfectly good code was a colossal mistake.

Cthulhu_ 2 hours ago|||

Not necessarily, but was the reasoning sound and have the tradeoffs been made? The website (https://uutils.github.io/) shows some reasonable "why"s (although I disagree with making "Rust is more appealing" a compelling reason, but that's just me (disclaimer: I don't like C and don't know Rust so take this comment as you will)), but I think what's missing is how they will ensure both compatibility and security / edge case handling, which requires deep knowledge and experience in the original code and "tribal knowledge" of deep *nix internals.

dwattttt 1 hour ago|||

I do wonder whether people got down the article enough to see the list of bugs patched in GNU coreutils.

That "perfectly good code" that it sounds like no one should question included "split --line-bytes has a user controlled heap buffer overflow".

twhitmore 4 hours ago|||

[flagged]

pando85 5 hours ago||

Memory safety catches buffer overflows. CI catches logic bugs. Neither catches the Unix API gotchas nobody documented.

vhantz 34 minutes ago|||

How does CI catch logic bugs?

bjourne 3 hours ago||||

CI catches all kinds of bugs.

cubefox 4 hours ago|||

LLM account

lionkor 2 hours ago||

I struggle to find anything on this post that wouldn't be caught by some kind of unit test or manual review, especially when comparing with the GNU source for the coreutils. The whole coreutils rewrite is a terrible idea[1] and clearly being done in the wrong way (without the knowledge gained from the previous software).

If you do a rewrite, you should fully understand and learn from the predecessor, otherwise youre bound to repeat all the mistakes. Embarassing.

To be clear; I love Rust, I use it for various projects, and it's great. It doesn't save you from bad engineering.

[1]: https://www.joelonsoftware.com/2000/04/06/things-you-should-...

cwillu 1 hour ago|

I expect nothing less from the creators of unity, upstart, and snap.

hombre_fatal 6 hours ago||

One thing that's hard about rewriting code is that the original code was transformed incrementally over time in response to real world issues only found in production.

The code gets silently encumbered with those lessons, and unless they are documented, there's a lot of hidden work that needs to be done before you actually reach parity.

TFA is a good list of this exact sort of thing.

Before you call people amateur for it, also consider it's one of the most softwarey things about writing software. It was bound to happen unless coreutils had really good technical docs and included tests for these cases that they ignored.

TheDong 4 hours ago||

What's even harder is doing that while trying to avoid the GPL, so doing that without reading the original source code.

uutils would be so much better imo if it was GPL and took direct inspiration from the coreutils source code.

dbdr 3 hours ago||

The GPL prevents you from reading the licensed code before writing related non-GPL code? Which section of the GPL says that?

TheDong 3 hours ago|||

It's based on an interpretation of "derived from".

It does not matter if it's in the GPL explicitly or not since we're talking about uutils and their stance on it, and they've written that:

https://github.com/uutils/coreutils/blob/6b8a5a15b4f077f8609...

> we cannot accept any changes based on the GNU source code [..]. It is however possible to look at other implementations under a BSD or MIT license like Apple's implementation or OpenBSD.

The wording of that clearly implies that you should not look at GNU source code in order to contribute to uutils.

i_think_so 54 minutes ago||

"clearly implies"

Hmmmm....

snovv_crash 1 hour ago|||

This is clean room implementation 101, and why LLMs are so controversial in terms of licensing.

aykutseker 2 hours ago|||

good example from the article: the chroot+nss CVE. the rule that nss is dynamic and dlopens libraries from inside the chroot isn't anywhere obvious. it's encoded in 25+ years of sysadmins finding it out. clean-room rewrites end up re-learning that, usually as new CVEs. and LLM ports of the same code inherit the problem: the function signature is what they read, but the scars are what they need.

cataflutter 2 hours ago||

> the function signature is what they read, but the scars are what they need.

This feels like a golden quote. Don't know if you intended for it to rhyme, but well done :D

aykutseker 2 hours ago||

thanks. honestly didn't catch the rhyme, accidental aphorism :D

einpoklum 2 hours ago||

> The code gets silently encumbered with those lessons, and unless they are documented, there's a lot of hidden work that needs to be done before you actually reach parity.

It should be stressed that failure to document such lessons, or at least the bugs/vulnerabilities avoided, is poor practice. Of course one can't document the bugs/vulnerabilities one has avoided implicitly by writing decent code to begin with, but it is important to share these lessons with the future reader, even if that means "wasting" time and space on a bunch of documentation such as "In here we do foo instead of bar because when we did bar in conditions ABC then baz happens which is bad because XYZ."

Joker_vD 5 hours ago||

> The pattern is always the same. You do one syscall to check something about a path, then another syscall to act on the same path. Between those two calls, an attacker with write access to a parent directory can swap the path component for a symbolic link. The kernel re-resolves the path from scratch on the second call, and the privileged action lands on the attacker’s chosen target.

It's actually even worse than that somewhat, because the attacker with write access to a parent directory can mess with hard links as well... sure, it only messes with the regular files themselves but there is basically no mitigations. See e.g. [0] and other posts on the site.

[0] https://michael.orlitzky.com/articles/posix_hardlink_heartac...

sysguest 5 hours ago|

hmm... maybe a 'write lock' on the directory? though this will become more hairy without timeouts/etc...

masklinn 3 hours ago||

To the extent that locking exists in posix it is various degrees of useless and broken. And as far as I know while BSDs have extensions which make some use cases workable Linux is completely hopeless.

tdiff 3 hours ago||

Ok if there were some rust guys rewriting coreutils with no experience in linux, but how come Ubuntu accepted it into its mainline?

Joeboy 2 hours ago|

Because it's Ubuntu policy to replace some foundational part of the system with some janky unfinished experiment in every release.

I agree with you that that's more the story here than "OMG, somebody wrote Rust code with bugs in it".

12_throw_away 2 hours ago||

Right? Canonical wanted (still wants?) to use a coreutils implementation where "rm ./" would print "invalid input" while silently deleting the directory anyway.

I don't really care that some very amateur enthusiasts wrote some bad code for fun, but how in the world did anyone who knows anything about linux take this seriously as a coreutils replacement?

alkonaut 3 hours ago||

> What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing

So does this mean that neither did the original utils have any test harness, the process of rewriting them didn't start by creating one either?

Sure there are many edge cases, but surely the OS and FS can just be abstracted away and you can verify that "rm .//" actually ends up doing what is expected (Such as not deleting the current directory)?

This doesn't seem like sloppy coding, nor a critique of the language, it's just the same old "Oh, this is systems programming, we don't do tests"?

Alternatively: if the original utils _did_ have tests, and there were this many holes in the tests, then maybe there is a massive lack in the original utils test suite?

omcnoe 3 hours ago|

My understanding is the uutils development process involved extensive testing against the behaviour of the original utilities, including preserving bugs.

alkonaut 2 hours ago||

But we still have CVE's for trivial things? I mean just a medium sized test suite for "rm" alone should probably be many thousand test cases or so. And you'd think that deleting "." and "./" respectively would be among them? Hindsight is always 20/20 and for inputs involving text input you can never be entirely covered, but still....

marcosscriven 2 hours ago||

That’s a great article, and indeed a very good blog. Just spent ages reading lots of their other articles.

Of the bugs mentioned I think the most unforgivable one is the lossy UTF conversion. The mind boggles at that one!

oconnor663 5 hours ago||

> The trap is that get_user_by_name ends up loading shared libraries from the new root filesystem to resolve the username.

That's kind of horrifying. Is there a reliable list somewhere of all the functions that do that? Is that list considered stable?

Joker_vD 5 hours ago|

Nope! But basically, expect anything that resolves usernames, or host names, to be done in the userspace by NSS.

    Sun engineers Thomas Maslen and Sanjay Dani were the first to design and implement
    the Name Service Switch. They fulfilled Solaris requirements with the nsswitch.conf
    file specification and the implementation choice to load database access modules as
    dynamically loaded libraries, which Sun was also the first to introduce.

    Sun engineers' original design of the configuration file and runtime loading of name
    service back-end libraries has withstood the test of time as operating systems have
    evolved and new name services are introduced. Over the years, programmers ported the
    NSS configuration file with nearly identical implementations to many other operating
    systems including FreeBSD, NetBSD, Linux, HP-UX, IRIX and AIX.[citation needed] More
    than two decades after the NSS was invented, GNU libc implements it almost identically.

It's by design, you see.

misja111 4 hours ago|

The root cause of some of the bugs seems to be the opaque nature of some of the Unix API. E.g.

> The trap is that get_user_by_name ends up loading shared libraries from the new root filesystem to resolve the username. An attacker who can plant a file in the chroot gets to run code as uid 0.

To me such a get_user_by_name function is like a booby trap, an accident that is waiting to happen. You need to have user data, you have this get_user_by_name function, and then it goes and starts loading shared libraries. This smells like mixing of concerns to me. I'd say, either split getting the user data and loading any shared libraries in two separate functions, or somehow make it clear in the function name what it is doing.

12_throw_away 2 hours ago||

> The root cause of some of the bugs seems to be the opaque nature of some of the Unix API.

Some, maybe, but if you've decided to rewrite coreutils from scratch, understanding the POSIX APIs is literally your entire job.

And in any case, their test for whether a path was pointing to the fs root was `file == Path::new("/")`. That's not an API problem, the problem is that whoever wrote that is uniquely unqualified to be working on this project.

aw1621107 1 hour ago||

Interestingly, it looks like the `file == Path::new("/")` bit was basically unchanged from when it was introduced... 12 (!) years ago [0] (though back then it was `filename == "/"`). The change from comparing a filename to a path was part of a change made 8 months ago to handle non-UTF-8 filenames.

> That's not an API problem, the problem is that whoever wrote that is uniquely unqualified to be working on this project.

To be fair, uutils started out with far smaller ambitions. It was originally intended to be a way to learn Rust.

[0]: https://github.com/uutils/coreutils/commit/7abc6c007af75504f...

emmelaich 2 hours ago|||

Rather, I think that using a functional safe language tricks people into thinking that the data it deals with is stateless. Whereas many many things change in operating systems all the time.

Until we have a filesystem that can present a snapshot, everything has to checked all the time.

i.e. we need an API which gives input -> good result or failure. Not input -> good result or failure or error.

geocar 3 hours ago|||

> The root cause of some of the bugs seems to be the opaque nature of some of the Unix API.

Seems and smells is weasel words. The root cause is not thinking: Why is root chrooting into a directory they do not control?

Whatever you chroot into is under control of whoever made that chroot, and if you cannot understand this you have no business using chroot()

> To me such a get_user_by_name function is like a booby trap

> I'd say, either split getting the user data and loading any shared libraries in two separate functions, or somehow make it clear in the function name what it is doing.

You'd probably still be in the trap: there's usually very little difference between writing to newroot/etc/passwd and newroot/usr/lib/x86_64-linux-gnu/libnss_compat.so or newroot/bin/sh or anything else.

So I think there's no reason for /usr/sbin/chroot look up the user id in the first place (toybox chroot doesn't!), so I think the bug was doing anything at all.

Joker_vD 2 hours ago||

> The root cause is not thinking: Why is root chrooting into a directory they do not control?

Because you can't call chroot(2) unless you're root. And "control a directory" is weasel words; root technically controls everything in one sense of the word. It can also gain full control (in a slightly different sense of the word) over a directory: kill every single process that's owned by the owner of that directory, then don't setuid into that user in this process and in any other process that the root currently executes, or will execute, until you're done with this directory. But that's just not useful for actual use, isn't it?

Secure things should be simple to do, and potentially unsafe things should be possible.

geocar 1 hour ago||

> And "control a directory" is weasel words;

I did not choose the term to confuse you, that's from the definition document linked to the CVE:

https://cwe.mitre.org/data/definitions/426.html

The CVE itself uses the language "If the NEWROOT is writable by an attacker" which could refer to a shared library (as indicated in the report), or even a passwd file as would have been true since the origin of chroot()

> root technically controls everything in one sense of the word.

But not the sense we're talking about.

> Because you can't call chroot(2) unless you're root

Well you can[1], but this is /usr/sbin/chroot aka chroot(8) when used with a non-numeric --userspec, and the point is to drop root to a user that root controls with setuid(2). Something needs to map user names to the numeric userids that setuid(2) uses, and that something is typically the NSS database.

Now: Which database should be used to map a username to a userid?

- The one from before the chroot(2)?

- Or the one that you're chroot(2)ing into

If you're the author of the code in-question, you chose the latter, and that is totally obvious to anyone who can read because that's the order the code appears in, but it's also obvious that only the first one* is under control of root, and so only the first one could be correct.

[1]: if you're curious: unshare(CLONE_USERNS|CLONE_FS) can be used. this is part of how rootless containers work.

justincormack 2 hours ago||

Yes thats one thing Musl libc removes.

More comments...