Futurelock: A subtle risk in async Rust

Posted by bcantrill 2 days ago

Futurelock: A subtle risk in async Rust (rfd.shared.oxide.computer)

This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

434 points | 242 comments

hitekker 2 days ago|

Skimming through, this document feels thorough and transparent. Clearly, a hard lesson learned. The footnotes, in particular, caught my eye https://rfd.shared.oxide.computer/rfd/397#_external_referenc...

> Why does this situation suck? It’s clear that many of us haven’t been aware of cancellation safety and it seems likely there are many cancellation issues all over Omicron. It’s awfully stressful to find out while we’re working so hard to ship a product ASAP that we have some unknown number of arbitrarily bad bugs that we cannot easily even find. It’s also frustrating that this feels just like the memory safety issues in C that we adopted Rust to get away from: there’s some dynamic property that the programmer is responsible for guaranteeing, the compiler is unable to provide any help with it, the failure mode for getting it wrong is often undebuggable (by construction, the program has not done something it should have, so it’s not like there’s a log message or residual state you could see in a debugger or console), and the failure mode for getting it wrong can be arbitrarily damaging (crashes, hangs, data corruption, you name it). Add on that this behavior is apparently mostly undocumented outside of one macro in one (popular) crate in the async/await ecosystem and yeah, this is frustrating. This feels antithetical to what many of us understood to be a core principle of Rust, that we avoid such insidious runtime behavior by forcing the programmer to demonstrate at compile-time that the code is well-formed

csande17 2 days ago||

In case anyone else was confused: the link/quote in this comment are from the previous "async cancellation issue" write-up, which describes a situation where you "drop" a future: the code in the async function stops running, and all the destructors on its local variables are executed.

The new write-up from OP is that you can "forget" a future (or just hold onto it longer than you meant to), in which case the code in the async function stops running but the destructors are NOT executed.

Both of these behaviors are allowed by Rust's fairly narrow definition of "safety" (which allows memory leaks, deadlocks, infinite loops, and, obviously, logic bugs), but I can see why you'd be disappointed if you bought into the broader philosophy of Rust making it easier to write correct software. Even the Rust team themselves aren't immune -- see the "leakpocalypse" from before 1.0.
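
To make the drop/forget distinction concrete, here is a minimal std-only sketch (no Tokio involved). The names `Guard`, `Never`, and `destructor_ran` are made up for illustration and appear in neither write-up. It polls an async block once so that it suspends while holding a guard, then either drops the future or `mem::forget`s it, and reports whether the guard's destructor ran.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Sets a shared flag when dropped, standing in for any destructor.
struct Guard(Rc<Cell<bool>>);
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// A future that never completes.
struct Never;
impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Polls a future once so it suspends while holding a `Guard`, then
/// drops or forgets it. Returns whether the guard's destructor ran.
pub fn destructor_ran(forget: bool) -> bool {
    let dropped = Rc::new(Cell::new(false));
    let flag = dropped.clone();
    let mut fut = Box::pin(async move {
        let _guard = Guard(flag);
        Never.await;
    });

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(fut.as_mut().poll(&mut cx).is_pending());

    if forget {
        // The future is leaked: its code never resumes, nothing is dropped.
        std::mem::forget(fut);
    } else {
        // Cancellation: locals held across the await point are dropped.
        drop(fut);
    }
    dropped.get()
}
```

In this sketch `destructor_ran(false)` returns `true` and `destructor_ran(true)` returns `false`. Holding a future without polling it behaves like the forget case for as long as the future stays alive.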

zozbot234 1 day ago|||

> The new write-up from OP is that you can "forget" a future (or just hold onto it longer than you meant to), in which case the code in the async function stops running but the destructors are NOT executed.

If you're relying for global correctness on some future being continuously polled, you should just be spawning async tasks instead. Then the runtime takes care of the polling for you, you can't just neglect it - unless the whole thread is blocked, which really shouldn't happen. "Futures" are intentionally a lower-level abstraction than "async runtime tasks".

nialv7 1 day ago||||

Yeah, Rust mostly just eliminates memory safety and data race problems, which is an enormous improvement compared to what we had previously. Unfortunately right now if you really want to write software that's guaranteed to be correct, there's no alternative to formal verification.

dap 1 day ago|||

I would say it can go further than that: Rust enables you to construct many APIs in a way that can’t be misused. It’s not at all unique in this way, but compared with C or Go or the like, you can encode so many more constraints in types.

pjmlp 1 day ago||||

Only if the data structures aren't exposed to outside of the program, in which case, Rust cannot guarantee safety from data race problems caused by OS IPC mechanisms like memory mapped data, shared memory segments or DMA buffers, accessed by external events.

IshKebab 1 day ago|||

Minor nit: formal verification doesn't guarantee correctness.

formerly_proven 1 day ago|||

async rust continues to strike me as half-baked and too complex, if you’re developing an application (as opposed to some high performance utility like e.g. a data plane component) just use threads, they’re plenty cheap and not even half as messy.

kibwen 1 day ago|||

Async Rust is as complex as it needs to be given its constraints. But I wholeheartedly agree with you that people need to treat threads (especially scoped ones) as the default concurrency primitive. My intuition is that experience with other languages has led people astray; in most languages threads are a nightmare and/or async is the default or only way to achieve concurrency, but threads in Rust are absolutely divine by comparison. Async should only be used when you have a good reason that threads don't suffice.
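
For readers who haven't used scoped threads, a small sketch of what that looks like with `std::thread::scope` (the function name `parallel_sum` is just an example). The spawned threads can borrow from the caller's stack, and the scope does not return until they have all finished.

```rust
use std::thread;

/// Sums the two halves of a slice on two scoped threads.
/// The threads borrow `data` directly; no `Arc` or cloning is needed.
pub fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        // Both threads are joined before `scope` returns.
        l.join().unwrap() + r.join().unwrap()
    })
}
```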

redman25 1 day ago|||

It's a good idea in concept but tons of popular libraries use async which makes it difficult to avoid. Want to do anything with a web server or sending requests, most likely async for popular libraries.

galangalalgol 1 day ago||

Yeah, the non-async NATS client got deprecated, for instance. It really is a shame, because very few projects will ever scale large enough to need async, and apart from things like this, there are costs in portability and supply chain attack surface area when you bring in tokio.

mwcampbell 1 day ago|||

In the spirit of "every non-trivial program will expand until ...", I think preemptively choosing async for anything much more complex than a throwaway script might be justified. In this case, the relevant thing isn't performance or expected number of concurrent users/connections, but whether the program is likely to become or include a non-trivial state machine. My primary influence on this topic is this post from @sunshowers: https://sunshowers.io/posts/nextest-and-tokio/

pjmlp 1 day ago||||

The main issue was shipping it without proper runtime support, and even nowadays async/await is synonymous with Tokio.

Look at .NET, it took almost a decade to sort out async/await across all platform and language layers, and even today there are a few gotchas.

https://github.com/gerardo-lijs/Asynchronous-Programming

Rust still has a similar path to travel, with async traits, better Pin ergonomics, async lambdas, async loops... (yes, I know some of them have been dealt with).

xmodem 1 day ago|||

I work on an application that has various components split between sync and async rust. For certain tasks, async actually makes things a lot simpler.

rtpg 2 days ago||

I guess one big question here is whether there's a higher layer abstraction that is available to wrap around patterns to avoid this.

It does feel like there's still generally possibilities of deadlocks in Rust concurrency right? I understand the feeling here that it feels like ... uhh... RAII-style _something_ should be preventing this, because it feels like statically we should be able to identify this issue in this simple case.

I still have a hard time understanding how much of this is incidental and how much of this is just downstream of the Rust/Tokio model not having enough to work on here.

embedding-shape 2 days ago|||

> I guess one big question here is whether there's a higher layer abstraction that is available to wrap around patterns to avoid this.

Something like Actors, on top of Tokio, would be one way: https://ryhl.io/blog/actors-with-tokio/

smallstepforman 2 days ago|||

I love Actors and have used them professionally for over 6 years (C++). However to solve real world problems I have had to introduce “locks” to the Actor framework to support various scenarios. With my home-grown actor library, this was trivial to add, however for some 3rd party actor libraries, ideology is dominant and the devs refuse to add such a purity-breaking feature to their actor framework, and hence I cannot use their library for real-world code.

eklavya 2 days ago|||

That sounds interesting, what kind of actor use cases would require adding locks to actors?

logicchains 1 day ago|||

What scenario requires locks that can't be solved by just having a single actor that owns the resource and controls access?

smallstepforman 1 day ago|||

Any scenario where you have to atomically update 2 actors. To use a simple analogy for illustrative purposes, transferring money between 2 accounts, you need to lock both actors before incrementing/decrementing. Because in the real world, the accounts can change from other pending parallel transactions and edits. Handshakes are very error prone. Lock the actor, do the critical transaction, unlock.

In a rational world, this works. In a prejudiced world, devs fight against locks in actor models.

Hence why I had to roll my own …
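
The two-account transfer can be sketched with plain mutexes rather than an actor library. This is only an illustration of "lock both, do the critical transaction, unlock": `Account` and `transfer` are hypothetical names, and locking in ascending `id` order is one common way to keep two opposite transfers from deadlocking each other, not something the comment above prescribes.

```rust
use std::sync::Mutex;

/// Hypothetical account type: an id plus a mutex-protected balance.
pub struct Account {
    id: u32,
    balance: Mutex<i64>,
}

impl Account {
    pub fn new(id: u32, balance: i64) -> Self {
        Account { id, balance: Mutex::new(balance) }
    }

    pub fn balance(&self) -> i64 {
        *self.balance.lock().unwrap()
    }
}

/// Moves `amount` from one account to another while holding both locks.
pub fn transfer(from: &Account, to: &Account, amount: i64) -> Result<(), &'static str> {
    if from.id == to.id {
        return Err("cannot transfer to the same account");
    }
    // Always take the lock of the lower id first, so that concurrent
    // A->B and B->A transfers acquire the locks in the same order.
    let (mut from_bal, mut to_bal) = if from.id < to.id {
        let f = from.balance.lock().unwrap();
        let t = to.balance.lock().unwrap();
        (f, t)
    } else {
        let t = to.balance.lock().unwrap();
        let f = from.balance.lock().unwrap();
        (f, t)
    };
    if *from_bal < amount {
        return Err("insufficient funds");
    }
    *from_bal -= amount;
    *to_bal += amount;
    Ok(())
    // Both guards are dropped here, releasing the locks.
}
```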

rtpg 1 day ago|||

I would imagine that in... "soft realtime" might be a bit much, but in performance-sensitive scenarios the actual cost of having some coordination code in that space might start mattering.

Maybe actor abstractions end up compiling away fairly nicely in Rust though!

gf000 1 day ago|||

Then you just replace deadlocks with livelocks, the fundamental problem AFAIK can't be avoided.

gf000 1 day ago||||

> It does feel like there's still generally possibilities of deadlocks in Rust concurrency right?

I mean, is there any generic computation model where you can't have deadlocks? Even with stuff like actors you can trivially have cycles and now your blocking primitive is just different (not CPU-level), and we call it a livelock, but it's fundamentally the same.

IshKebab 1 day ago|||

The Fuchsia guys use the trait system to enforce a global mutex locking order, which can statically prevent deadlocks due to two threads locking mutexes that they are both waiting for.

Doesn't help in this case, but it does suggest that we might be able to do better.
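
A rough sketch of the general idea of encoding a lock order in the trait system. This is not Fuchsia's actual API; `LockBefore`, `Token`, `OrderedMutex`, and the level types are names invented here for illustration. Each lock takes a token for the highest level currently held and hands back a token for its own level, so acquiring locks out of order fails to compile.

```rust
use std::marker::PhantomData;
use std::sync::{Mutex, MutexGuard};

// Marker types for lock levels (illustrative names).
pub struct Unlocked;
pub struct LevelA;
pub struct LevelB;

/// `X: LockBefore<Y>` means: while the highest level held is X, Y may be taken.
pub trait LockBefore<Next> {}
impl LockBefore<LevelA> for Unlocked {}
impl LockBefore<LevelB> for Unlocked {}
impl LockBefore<LevelB> for LevelA {}
// Deliberately no `impl LockBefore<LevelA> for LevelB`.

/// Zero-sized proof of the highest level held. The lifetime ties it to the
/// token it was derived from, so a stale token cannot be reused later.
pub struct Token<'a, L>(PhantomData<(&'a (), L)>);

/// Starting token. A real design would restrict who can create one;
/// this sketch does not.
pub fn root_token() -> Token<'static, Unlocked> {
    Token(PhantomData)
}

pub struct OrderedMutex<L, T> {
    inner: Mutex<T>,
    _level: PhantomData<L>,
}

impl<L, T> OrderedMutex<L, T> {
    pub fn new(value: T) -> Self {
        OrderedMutex { inner: Mutex::new(value), _level: PhantomData }
    }

    /// Locks only if the held level `P` is allowed to come before `L`.
    /// The guard and the new token keep `_held` mutably borrowed.
    pub fn lock<'a, P: LockBefore<L>>(
        &'a self,
        _held: &'a mut Token<'_, P>,
    ) -> (MutexGuard<'a, T>, Token<'a, L>) {
        (self.inner.lock().unwrap(), Token(PhantomData))
    }
}

/// Takes A then B, which the trait impls above permit.
pub fn sum_in_order(a: &OrderedMutex<LevelA, i32>, b: &OrderedMutex<LevelB, i32>) -> i32 {
    let mut root = root_token();
    let (guard_a, mut held_a) = a.lock(&mut root);
    let (guard_b, _held_b) = b.lock(&mut held_a);
    // Writing `a.lock(&mut _held_b)` here would not compile, because
    // `LevelB: LockBefore<LevelA>` is not implemented.
    *guard_a + *guard_b
}
```

As noted above, this kind of check only covers mutex ordering; it says nothing about a future that holds a lock and is never polled.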

thenewwazoo 1 day ago||

Any chance you could dig up a link to that code? I’m curious to learn more

IshKebab 1 day ago||

https://lwn.net/Articles/995814/

Dagonfly 2 days ago||

That's a really subtle version of the deadlock described in withoutboats' FuturesUnordered post [0].

When using “intra-task” concurrency, you really have to ensure that none of the futures are starving.

Spawning tasks should probably be the default. For timeouts use tokio::select! but make sure all pending futures are owned by it. I would never recommend FuturesUnordered unless you really test all edge-cases.

[0] https://without.boats/blog/futures-unordered/

singron 2 days ago||

This sounds very similar to priority inversion. E.g. if you have Thread T_high running at high priority and thread T_low running at low priority, and T_low holds a lock that T_high wants to acquire, T_high won't get to run until T_low gets scheduled.

The OS can detect this and make T_low "inherit" the priority of T_high. I wonder if there is a similar idea possible with tokio? E.g. if you are awaiting a Mutex held by a future that "can't run", then poll that future instead. I would guess detecting the "can't run" case would require quite a bit of overhead, but maybe it can be done.

I think an especially difficult factor is that you don't even need to use a direct await.

    let mut future1 = do_async_thing("op1", lock.clone()).boxed();
    tokio::select! {
      _ = &mut future1 => {
        println!("do_stuff: arm1 future finished");
      }
      _ = sleep(Duration::from_millis(500)) => {
        // No .await, but both will futurelock on future1.
        tokio::select! {
          _ = do_async_thing("op2", lock.clone()) => {},
          _ = do_async_thing("op3", lock.clone()) => {},
        };
      }
    };

I.e. a "can't run" detector needs to determine that no other task will run the future, and that the future isn't in the current set of things being polled by this task.

oconnor663 2 days ago||

> I wonder if there is a similar idea possible with tokio? E.g. if you are awaiting a Mutex held by a future that "can't run", then poll that future instead.

Something like this could make sense for Tokio tasks. (I don't know how complicated their task scheduler is; maybe it already does stuff like this?) But it's not possible for futures within a task, as in this post. This goes all the way back to the "futures are inert" design of async Rust: You don't necessarily need to communicate with the runtime at all to create a future or to poll it or to stop polling it. You only need to talk to the runtime at the task level, either to spawn new tasks, or to wake up your own task. Futures are pretty much just plain old structs, and Tokio doesn't know how many futures my async function creates internally, any more than it knows about my integers or strings or hash maps.
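
A tiny std-only sketch of the "futures are inert" point, with no runtime involved at all. `NoopWake` and `inert_demo` are names made up for this example. Creating the future runs none of its body; only an explicit `poll` does, and nothing outside this function knows the future exists.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns (body ran after creation, body ran after one poll).
pub fn inert_demo() -> (bool, bool) {
    let ran = Cell::new(false);

    // Creating the future is just building a struct; the body has not run.
    let mut fut = Box::pin(async {
        ran.set(true);
    });
    let after_creation = ran.get();

    // Polling is an ordinary method call made by whoever owns the future.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let _ = fut.as_mut().poll(&mut cx);

    (after_creation, ran.get())
}
```

Here `inert_demo()` returns `(false, true)`.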

mycoliza 2 days ago|||

Yeah, a coworker coming from Go asked a similar question about why Rust doesn't have something like the Go runtime's deadlock detector. Your comment is quite similar to the explanation I gave him.

Go, unlike Rust, does not really have a notion of intra-task concurrency; goroutines are the fundamental unit of concurrency and parallelism. So, the Go runtime can reason about dependencies between goroutines quite easily, since goroutines are the things which it is responsible for scheduling. The fact that channels are a language construct, rather than a library construct implemented in the language, is necessary for this too. In (async) Rust, on the other hand, tasks are the fundamental unit of parallelism, but not of concurrency; concurrency emerges from the composition of `Future`s, and a single task is a state machine which may execute any number of futures concurrently (but not in parallel), by polling them until they cannot proceed without waiting and then moving on to poll another future until it cannot proceed without waiting. But critically, this is not what the task scheduler sees; it interacts with these tasks as a single top-level `Future`, and is not able to look inside at the nested futures they are composed of.

This specific failure mode can actually only happen when multiple futures are polled concurrently but not in parallel within a single Tokio task. So, there is actually no way for the Tokio scheduler to have insight into this problem. You could imagine a deadlock detector in the Tokio runtime that operates on the task level, but it actually could never detect this problem, because when these operations execute in parallel, it actually cannot occur. In fact, one of the suggestions for how to avoid this issue is to select over spawned tasks rather than futures within the same task.
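
A simplified, std-only sketch of the shape of the problem, polled by hand inside what would be a single task. It is not the RFD's reproduction and does not use `tokio::sync::Mutex`: `ToyLock`, `Acquire`, `YieldOnce`, and `futurelock_demo` are hypothetical, and here future `a` holds the lock outright rather than being a queued waiter that the lock was handed to. The point it illustrates is the same: `b` cannot make progress while `a` is alive but not polled, and polling `a` once more is all it takes to unblock it.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy single-threaded async lock: `true` means held.
#[derive(Clone, Default)]
struct ToyLock(Rc<Cell<bool>>);

/// Releases the toy lock when dropped.
struct ToyGuard(Rc<Cell<bool>>);
impl Drop for ToyGuard {
    fn drop(&mut self) {
        self.0.set(false);
    }
}

struct Acquire(Rc<Cell<bool>>);
impl Future for Acquire {
    type Output = ToyGuard;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<ToyGuard> {
        if self.0.get() {
            Poll::Pending // a real lock would register the waker here
        } else {
            self.0.set(true);
            Poll::Ready(ToyGuard(self.0.clone()))
        }
    }
}

impl ToyLock {
    fn lock(&self) -> Acquire {
        Acquire(self.0.clone())
    }
}

/// Returns Pending once, then Ready: a stand-in for a short wait.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Returns (b finished while a was unpolled, b finished after a was polled).
pub fn futurelock_demo() -> (bool, bool) {
    let lock = ToyLock::default();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let l = lock.clone();
    let mut a = Box::pin(async move {
        let _g = l.lock().await;
        YieldOnce(false).await; // suspends while holding the lock
    });
    let l = lock.clone();
    let mut b = Box::pin(async move {
        let _g = l.lock().await;
    });

    // `a` takes the lock and suspends; from here on only `b` is polled.
    assert!(a.as_mut().poll(&mut cx).is_pending());
    let stuck = (0..1000).all(|_| b.as_mut().poll(&mut cx).is_pending());

    // One more poll of `a` lets it finish and release the lock.
    assert!(a.as_mut().poll(&mut cx).is_ready());
    let unblocked = b.as_mut().poll(&mut cx).is_ready();
    (!stuck, unblocked)
}
```

Here `futurelock_demo()` returns `(false, true)`. A task-level scheduler only ever sees the outer future, so from its point of view the task is simply being polled and returning pending.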

mjevans 2 days ago|||

Thank you. Every time I've tried to approach the concept of Rust's parallelism this is what rubs me the wrong way.

I haven't yet read a way to prove it's correct, or even to reasonably prove a given program's use is not going to block.

With more traditional threads my mental model is that _everything_ always has to be interrupt-able, have some form of engineer chosen timeout for a parallel operation, and address failure of operation in design.

I never see any of that in the toy examples that are presented as educational material. Maybe Rust's async also requires such careful design to be safely utilized.

zenmac 2 days ago||

Guess Rust is more built for memory safety, not concurrency? Erlang maybe? Why can't we just have a language that is memory safe and built for concurrency? Like OCaml and Erlang combined?

jitl 2 days ago|||

Are you looking for Gleam? Simple but powerful typed functional language for BEAM and JavaScript. It’s a bit high level compared to OCaml in terms of needing a thick runtime and being somewhat far from machine code.

Really beautiful language design imo. Does a great job avoiding the typelevel brainfuck problem I have with Haskell.

https://gleam.run/

kibwen 1 day ago|||

Rust is absolutely built for concurrency, even more so than for memory safety--it just so happens that memory safety is a prerequisite for thread safety. You're going to have a hard time finding any other industrial-strength language that statically prevents data races. If you can use Erlang, then sure, use Erlang. But if you can't use Erlang, and you need concurrency, you're not going to find a better candidate than Rust.

gf000 1 day ago|||

I think an important takeaway here that many often ignore is that in language design, not having low-level control over something is sometimes just as important a design tradeoff as having it.

From that it also follows that it may not be too fruitful to try to tackle every domain there is with a single language only.

(With that said, I absolutely love sync Rust, and Go is definitely not a good example of an elegantly designed language, I am talking in a more general way here)

newpavlov 2 days ago|||

>This goes all the way back to the "futures are inert" design of async Rust

Yeap. And this footgun is yet another addition to the long list of reasons why I consider the Rust async model with its "inert" futures managed in user space a fundamentally flawed un-Rusty design.

filleduchaos 2 days ago||

I feel there's a difference between a preference and a flaw. Rust has targets that make anything except inert futures simply unworkable, and in my opinion it's entirely valid for a programming language to prioritise those targets.

Rusky 2 days ago||

The requirement is that the futures are not separate heap allocations, not that they are inert.

It's not at all obvious that Rust's is the only possible design that would work here. I strongly suspect it is not.

In fact, early Rust did some experimentation with exactly the sort of stack layout tricks you would need to approach this differently. For example, see Graydon's post here about the original implementation of iterators, as lightweight coroutines: https://old.reddit.com/r/ProgrammingLanguages/comments/141qm...

vlovich123 2 days ago|||

If it’s not inert, how do you use async in the kernel or microcontrollers? A non-inert implementation presumes a single runtime implementation within std+compiler and not usable in environments where you need to implement your own meaning of dispatch.

tux3 1 day ago|||

I think the kernel and microcontroller use-case has been overstated.

A few bare metal projects use stackless coroutines (technically resumable functions) for concurrency, but it has turned out to be a much smaller use-case than anticipated. In practice C and C++ coroutines are really not worth the pain that they are to use, and Rust async has mostly taken off with heavy-duty executors like Tokio that very much don't target tiny #![no_std] 16-bit microcontrollers.

The Kernel actually doesn't use resumable functions for background work, it uses kernel threads. In the wider embedded world threads are also vastly more common than people might think, and the really low-end uniprocessor systems are usually happy to block. Since these tiny systems are not juggling dozens of requests per second that are blocking on I/O, they don't gain that much from coroutines anyways.

We mostly see bigger Rust projects use async when they have to handle concurrent requests that block on IO (network, FS, etc), and we mostly observe that the ecosystem is converging on tokio.

Threads are not free, but most embedded projects today that process requests in parallel — including the kernel — are already using them. Eager futures are more expensive than lazy futures, and less expensive than threads. They strike an interesting middle ground.

Lazy futures are extremely cheap at runtime. But we're paying a huge complexity cost in exchange that benefits a very small user-base that hasn't really fully materialized as we hoped it would.

kibwen 1 day ago||

> it has turned out to be a much smaller use-case than anticipated

Well, no, at the time of the design of Rust's async MVP, everyone was pretty well aware that the vast majority of the users would be writing webservers, and that the embedded use case would be a decided minority, if it ever existed at all. That Embassy exists, and that its ecosystem is as vibrant as it is, is, if anything, an unexpected triumph.

But regardless of how many people were actually expected to use it in practice, the underlying philosophy remained thus: there exist no features of Rust-the-language that are incompatible with no_std environments (e.g. Rust goes well out of its way, and introduces a lot of complexity, to make things like closures work given such constraints), and it would be exceptional and unprecedented for Rust to violate this principle when it comes to async.

tux3 1 day ago||

Point taken, I might have formed the wrong impression at the time.

With my C++ background, I'm very much at home with that philosophy, but I think there is room for nuance in how strictly orthodox we are.

C++ does have optional language features that introduce some often unwelcome runtime overhead, like RTTI and unwinding.

Rust does not come configured for freestanding environments out of the box either. Like C++, you are opting out of language features like unwinding as well as the standard library when going freestanding.

I want to affirm that I'm convinced Rust is great for embedded. It's more that I mostly love async when I get to use it for background I/O with a full fledged work stealing thread-per-core marvel of engineering like tokio!

In freestanding Rust the I/O code is platform specific, suddenly I'd have to write the low-level async code myself, and it's not clear this makes the typical embedded project that much higher performance, or all that easy to maintain.

So, I don't want to say anything too radical. But I think the philosophy doesn't have to be as clear cut as no language feature ever incompatible with no_std. Offering a std-only language feature is not necessarily closing a door to embedded. We sort of already make opt-out concessions to have a friendlier experience for most people.

(Apologies for the wall of text)

Rusky 1 day ago|||

"Not inert" does not at all imply "a single runtime within std+compiler." You've jumped way too far in the opposite direction there.

The problem is that the particular interface Rust chose for controlling dispatch is not granular enough. When you are doing your own dispatch, you only get access to separate tasks, but for individual futures you are at the mercy of combinators like `select!` or `FuturesUnordered` that only have a narrow view of the system.

A better design would continue to avoid heap allocations and allow you to do your own dispatch, but operate in terms of individual suspended leaf futures. Combinators like `join!`/`select!`/etc. would be implemented more like they are in thread-based systems, waiting for sub-tasks to complete, rather than being responsible for driving them.

vlovich123 21 hours ago||

If you’ve got eager dispatch I’m eager (pun intended) to learn how you have an executor that’s not baked into the std library and limited to a single runtime per process because at the time of construction you need the language to schedule dispatch of the created future. This is one of the main challenges behind the pluggable executor effort - the set of executors that could be written is so different (work stealing vs thread per core) that it’s impossible to unify without an effect system and even then you’ve got challenges of how to encode that in the language structure because the executor is a global thing determined at runtime but then it’s also local in the sense that you don’t know which executor a given piece of code will end up actually being dispatched into since you could have the same async function invoked on different executors.

For better or worse eager dispatch I think generally implies also not being able to cancel futures since ownership is transferred to the executor rather than being retained by your code.

Rusky 12 hours ago||

You don't need any of that, and you can keep cancellation too.

The core of an eager cooperative multitasking system does not even need the concept of an executor. You can spawn a new task by giving it some stack space and running its body to its first suspension point, right there on the current thread. When it suspends, the leaf API (e.g. `lock`) grabs the current top of the stack and stashes it somewhere, and when it's time to resume it again just runs the next part of the task right there on the current thread.

You can build different kinds of schedulers on top of this first-class ability to resume a particular leaf call in a task. For example, a `lock` integrated with a particular scheduler might queue up the resume somewhere instead of invoking it immediately. Or, a generic `lock` might be wrapped with an adapter that re-suspends and queues that up. None of this requires that the language know anything about the scheduler at all.

This is all typical of how higher level languages implement both stackful and stackless coroutines. The difference is that we want control over the "give it some stack space" part- we want the compiler to compute a maximum size and have us specify where to store it, whether that's on the heap (e.g. tokio::spawn) or nested in some other task's stack (e.g. join, select) or some statically-allocated storage (e.g. on a microcontroller).

(Of course the question then becomes, how do you ensure `lock` can't resume the task after it's been freed, either due to normal resumption or cancellation? Rust answers this with `Waker`, but this conflates the unit of stack ownership with the unit of scheduling, and in the process enables intermediate futures to route a given wakeup incorrectly. These must be decoupled so that `lock` can hold onto both the overall stack and the exact leaf suspension point it will eventually resume.)

Cancellation doesn't change much here. Given a task held from the "caller end" (as opposed to the leaf callee resume handles above), the language needs to provide a way to destruct the stack and let the decoupled `Waker` mechanism respond. This still propagates naturally to nested tasks like join/select arms, though there is now an additional wrinkle that a nested task may be actively running (and may even be the thing that indirectly provoked the cancellation).

filleduchaos 2 days ago|||

On the other hand, early Rust also for instance had a tracing garbage collector; it's far from obvious to me how relevant its discarded design decisions are supposed to be to the language it is today.

Rusky 2 days ago||

This one is relevant because it avoids heap allocation while running the iterator and for loop body concurrently. Which is exactly the kind of thing that `async` does.

wahern 2 days ago||

It avoids heap allocation in some situations. But in principle the exact same optimization could be done for stackful coroutines. Heck, right now in C I could stack-allocate an array and pass it to pthread_create as the stack for a new thread. To avoid an overlarge allocation I would need to know exactly how much stack is needed, but this is exactly the knowledge the Rust compiler already requires for async/await.

What people care about are semantics. async/await leaks implementation details. One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM, which might require some feature work to support a deeper level of integration of async without losing what benefits the current implementation provides. Rust has a few warts like this where semantics are stilted in order to confine the implementation work to the high-level Rust compiler.

Rusky 1 day ago|||

> in principle the exact same optimization could be done for stackful coroutines.

Yes, I totally agree, and this is sort of what I imagine a better design would look like.

> One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM

This I would argue is simply a failure of imagination. All you need from the LLVM layer is tail calls, and then you can manage the stack layout yourself in essentially the same way Rust manages Future layout.

You don't even need arbitrary tail calls. The compiler can limit itself to the sorts of things LLVM asks for- specific calling convention, matching function signatures, etc. when transferring control between tasks, because it can store most of the state in the stack that it laid out itself.

zozbot234 1 day ago|||

In order to know for sure how much stack is needed (or to replace the stack with a static allocation, which used to be common on older machines and still today in deep embedded code, and even on GPU!), you must ensure that any functions you call within your thread are non-reentrant, or else that they resort to an auxiliary stack-like allocation if reentrancy is required. This is a fundamental constraint (not something limited to current LLVM) which in practice leads you right back into the "what color are your functions?" world.

Veserv 2 days ago|||

I thought Rust async is a colored stackless coroutine model and thus it would be unsafe to continue execution of previously executing async functions.

To explain, generally speaking, stackless coroutine async only need coloring because they are actually “independent stack”less coroutines. What they actually do is that they share the stack for their local state. This forces async function execution to proceed in LIFO order so you do not blow away the stack of the async function executing immediately after which demands state machine transforms to be safe. This is why you need coloring unlike stackful coroutine models which can execute, yield, and complete in arbitrary order since their local state is preserved in a safe location.

oconnor663 2 days ago|||

Rust futures are "just" structs with a poll() method. The poll() method is a function like any other, so it can have local variables on the stack as usual, but anything it wants to save between calls needs to be a field of the struct instead of a stack local. The magic of async/await is that the compiler figures out which of your async function's variables need to be fields on that struct, and it generates the struct and the poll method for you.
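
As a rough illustration of that description, here is a hand-written future for something like `async fn add_after_yield(a: u32, b: u32) -> u32 { let sum = a + b; yield_once().await; sum }`. Both that function and the `AddAfterYield` type are hypothetical, and the compiler's real output differs in detail; the sketch only shows that state needed across the suspension point (`sum`) lives in the struct, while everything else is an ordinary stack local inside `poll`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hand-written state machine: one variant per suspension state.
pub enum AddAfterYield {
    /// Not polled yet: holds the arguments.
    Start { a: u32, b: u32 },
    /// Suspended at the await: holds only what is needed afterwards.
    Suspended { sum: u32 },
    Done,
}

impl Future for AddAfterYield {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // The enum holds only plain integers, so it is `Unpin`.
        let this = self.get_mut();
        match *this {
            AddAfterYield::Start { a, b } => {
                let sum = a + b; // ordinary stack local during this call
                *this = AddAfterYield::Suspended { sum }; // saved across the yield
                cx.waker().wake_by_ref(); // ask to be polled again
                Poll::Pending
            }
            AddAfterYield::Suspended { sum } => {
                *this = AddAfterYield::Done;
                Poll::Ready(sum)
            }
            AddAfterYield::Done => panic!("polled after completion"),
        }
    }
}

/// Polls the state machine twice by hand.
/// Returns (first poll was pending, value from second poll).
pub fn drive() -> (bool, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = AddAfterYield::Start { a: 2, b: 3 };

    let first_pending = Pin::new(&mut fut).poll(&mut cx).is_pending();
    let value = match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("second poll completes in this sketch"),
    };
    (first_pending, value)
}
```

Here `drive()` returns `(true, 5)`.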

I have a blog series that goes into the concrete details if you like: https://jacko.io/async_intro.html

Veserv 1 day ago||

I see. The Rust implementation effectively splats out the transitive closure of all your callee stack frames upfront which would enable continuing previously executing async functions.

treyd 2 days ago|||

> thus it would be unsafe to continue execution of previously executing async functions.

There's more nuance than this. You can keep polling futures as often as you want. When an async fn gets converted into the state machine, yielding is just expressed as the poll fn returning as not ready.

So it is actually possible for "a little bit" of work to happen, although that's limited and gets tricky because the way wakers work ensures that normally futures only get polled by the runtime when there's actually work for them to do.
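
A minimal sketch of that waker contract, using only std and following the thread-parking pattern from the `std::task::Wake` documentation. `block_on`, `YieldTimes`, and `polls_needed` are names chosen here; real runtimes are far more involved. The executor polls, and if the future returns pending it goes to sleep until the future's waker is invoked.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny executor: poll, then sleep until woken, then poll again.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // "Not ready" is just a return value; nothing else happens
            // until someone calls the waker.
            Poll::Pending => thread::park(),
        }
    }
}

/// Yields `remaining` times, then completes with the number of polls seen.
struct YieldTimes {
    remaining: u32,
    polls: u32,
}

impl Future for YieldTimes {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.remaining == 0 {
            return Poll::Ready(self.polls);
        }
        self.remaining -= 1;
        // Without this wake-up, `block_on` would stay parked forever.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Number of polls needed to finish a future that yields `yields` times.
pub fn polls_needed(yields: u32) -> u32 {
    block_on(YieldTimes { remaining: yields, polls: 0 })
}
```

In this sketch a future that yields three times is polled four times, so `polls_needed(3)` returns `4`.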

johnisgood 2 days ago|||

Off-topic but that code looks quite... complicated as opposed to what I would write in Erlang, Elixir, Go, or even C. Maybe it is just me.

jerf 2 days ago||

Erlang/Elixir and Go "solve" this problem by basically not giving you the rope to hang yourself in this particular way in the first place. This is a perfectly valid and sensible solution... but it is not the only solution. It means you're paying for some relatively expensive full locks that the Rust async task management is trying to elide, for what can be quite significant performance gains if you're doing a lot of small tasks.

It is good that not every language gives you this much control and gives some easier options for when those are adequate, but it is also good that there is some set of decent languages that do give you this degree of control for when it is necessary, and it is good that we are not surrendering that space to just C and/or C++. Unfortunately such control comes with footguns, at least over certain spans of time. Perhaps someone will figure out a way to solve this problem in Rust in the future.

johnisgood 1 day ago||

> It means you're paying for some relatively expensive full locks that the Rust async task management is trying to elide, for what can be quite significant performance gains if you're doing a lot of small tasks.

The point of Erlang/Elixir is that it is as performant as possible, and Erlang's history is a testament to this. BEAM is wonderful, and really fast, along with the languages on it being ergonomic (OTP behaviors, supervisors, etc.).

jerf 1 day ago||

This is a myth, from the old days when BEAM was the only thing that could juggle thousands of "processes" without losing performance, and even back then, people routinely missed that while BEAM could juggle those thousands of processes, each of them was individually not that fast. That is, BEAM's extremely high performance was only in one isolated thing, not high performance across the board.

Now BEAM is far from the only runtime juggling that many processes, but it remains a relatively slow VM. I rule-of-thumb it at 10x slower than C, making it a medium performance VM at best, and you want to watch your abstraction layers in those nicer languages like Gleam because further multiplicative slow downs can really start to bite.

The first serious Go program I wrote was a replacement for something written in Erlang, there was no significant architectural improvement in the rewrite (it was already reasonably well-architected), and from the first deployment, we went from 4 systems, sometimes struggling with the load spikes, to where just one could handle it all, even with BEAM being over a decade more mature and the Go clustering code being something I wrote over a few weeks rather than battle tested, optimized code.

BEAM is good at managing concurrency, but it is slowish in other ways. It's better than the dynamic scripting languages like Python by a good amount but it is not performance-competitive with a modern compiled language.

johnisgood 19 hours ago||

Go's runtime is indeed faster for CPU-bound and raw throughput scenarios, but it lacks the same fault-tolerance semantics, hot-code reloading, and actor-level isolation - things that make BEAM indispensable in telecoms, messaging systems, and fault-resilient distributed architectures. Go may run faster, but Erlang and Elixir recover faster and fail more gracefully, which in distributed systems often matters more than raw speed. Go might win on throughput, but the BEAM wins on fault-tolerant system design - and that's why systems built on it keep running reliably for decades.

Additionally, high-performance Erlang/Elixir systems often delegate compute-intensive work to NIFs (native implemented functions) in C or Rust, or use ports to external services.

elchananHaas 2 days ago|||

My view of this is that it's closer to the basic two-lock deadlock.

Thread 1 acquires A. Thread 2 acquires B. Thread 1 tries to acquire B. Thread 2 tries to acquire A.

In this case, the role of "A" is being played by the front of the Mutex's lock queue. The role of "B" is being played by Tokio's actively executing task.

Based on this understanding, I agree that the surprising behavior is due to Tokio's Mutex/Lock Queue implementation. If this was an OS Mutex, and a thread waiting for the Mutex can't wake for some reason, the OS can wake a different thread waiting for that Mutex. I think the difficulty in this approach has to do with how Rust's async is implemented. My guess is the algorithm for releasing a lock goes something like this:

1. Pop the head of the wait queue.

2. Poll the top level tokio::spawn'ed task of the Future that is holding the Mutex.

What you want is something like this

For each Future in the wait queue (Front to Back):
    Poll the Future.
    If Success - Break
???Something if everything fails???

The reason this doesn't work has to do with how futures compose. Futures compile to states within a state machine. What happens when a future polled within the wait queue completes? How is control flow handed back to the caller?

I guess you might be able to have some fallback that polls the futures independently then polls the top level future to try and get things unstuck. But this could cause confusing behavior where futures are being polled even though no code path within your code is await'ing them. Maybe this is better though?

dboreham 1 day ago||

Which is why "async" is a pox on our house. System threads can and do address these edge issues. User level concurrency generally doesn't (perhaps with the exceptions of golang and erlang).

Matthias247 2 days ago||

As far as I remember from building these things with others within the async Rust ecosystem (hey Eliza!), there was a certain tradeoff: if you weren’t able to select on references, you couldn’t run into this issue. However, you also wouldn’t be able to use select! in a while loop and try to acquire the same lock (or read from the same channel) without losing your position in the queue.

I fully agree that this and the cancellation issues discussed before can lead to surprising issues even to seasoned Rust experts. But I’m not sure what really can be improved under the main operating model of async rust (every future can be dropped).

But compared to working with callbacks the amount of surprising things is still rather low :)

mycoliza 2 days ago||

Indeed, you are correct (and hi Matthias!). After we got to the bottom of this deadlock, my coworkers and I had one of our characteristic "how could we have prevented this?" conversations, and reached the somewhat sad conclusion that actually, there was basically nothing we could easily blame for this. All the Tokio primitives involved were working precisely as they were supposed to. The only thing that would have prevented this without completely re-designing Rust's async from the ground up would be to ban the use of `&mut future`s in `select!`...but that eliminates a lot of correct code, too. Not being able to do that would make it pretty hard to express a lot of things that many applications might reasonably want to express, as you described. I discussed this a bit in this comment[1] as well.

On the other hand, it also wasn't our coworker who had written the code where we found the bug who was to blame, either. It wasn't a case of sloppy programming; he had done everything correctly and put the pieces together the way you were supposed to. All the pieces worked as they were supposed to, and his code seemed to be using them correctly, but the interaction of these pieces resulted in a deadlock that it would have been very difficult for him to anticipate.

So, our conclusion was, wow, this just kind of sucks. Not an indictment of async Rust as a whole, but an unfortunate emergent behavior arising from an interaction of individually well-designed pieces. Just something you gotta watch out for, I guess. And that's pretty sad to have to admit.

[1] https://news.ycombinator.com/item?id=45776868

kibwen 1 day ago|||

> All the Tokio primitives involved were working precisely as they were supposed to. The only thing that would have prevented this without completely re-designing Rust's async from the ground up would be to ban the use of `&mut future`s in `select!`...but that eliminates a lot of correct code, too.

But it still suggests that `tokio::select` is too powerful. You don't need to get rid of `tokio::select`, you just need to consider creating a less powerful mechanism that doesn't risk exhibiting this problem. Then you could use that less powerful mechanism in the places where you don't need the full power of `tokio::select`, thereby reducing the possible places where this bug could arise. You don't need to get rid of the fully powerful mechanism, you just need to make it optional.

tux3 1 day ago||

I feel like select!() is a good case study because the common future timeout use-case maps pretty closely to a select!(), so there is only so much room to weaken it.

The ways I can think of for making select!() safer all involve runtime checks and allocations (possibly this is just a failure of my imagination!). But if that's the case, I would find it bothersome if our basic async building blocks like select/timeout in practice turn out to require more expensive runtime checks or allocations to be safe.

We have a point in the async design space where we pay a complexity price, but in exchange we get really neat zero-cost futures. But I feel like we only get our money's worth if we can actually statically prove that correct use won't deadlock, without the expensive runtime checks! Otherwise, can we afford to spend all this complexity budget?

The implementation of select!() does feel way too powerful in a way (it's a whole mini scheduler that creates implicit future dependencies hidden from the rest of the executor, and then sometimes this deadlocks!). But the need is pretty foundational, it shows up everywhere as a building block.

kibwen 1 day ago||

It feels to me like there's plenty of design space to explore. Sure, it's possible to view "selection" as a basic building block, but even that is insufficiently precise IMO. There's a reason that Javascript provides all of Promise.any and Promise.all and Promise.allSettled and Promise.race. Selection isn't just a single building block, it's an entire family of building blocks with distinct semantics.

imtringued 1 day ago|||

You must guarantee forward progress inside your critical sections and that means your critical sections are guaranteed to finish. How hard is that to understand? From my perspective this situation was basically guaranteed to happen.

There is no real difference between a deadlock caused by a single thread acquiring the same non-reentrant lock twice and a single thread with two virtual threads where the first thread calls the code of the second thread inside the critical section. They are the same type of deadlock caused by the same fundamental problem.
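The single-thread, same-lock case can be observed without actually hanging the program. This is a small sketch using `std::sync::Mutex` (not Tokio's mutex); `relock_on_same_thread` is a made-up helper name. `try_lock` is used because a second `lock()` on the same thread is documented to never return (it may deadlock or panic).

```rust
use std::sync::{Mutex, TryLockError};

/// Returns (second attempt blocked while the guard was held,
///          lock free again after the guard was dropped).
fn relock_on_same_thread() -> (bool, bool) {
    let m = Mutex::new(0u32);
    let guard = m.lock().unwrap();

    // A plain `m.lock()` here would never return; `try_lock` reports it instead.
    let blocked_while_held = matches!(m.try_lock(), Err(TryLockError::WouldBlock));

    // Ending the critical section is the only thing that unblocks other acquirers.
    drop(guard);
    let free_after_release = m.try_lock().is_ok();

    (blocked_while_held, free_after_release)
}
```

The async variant differs only in who the second acquirer is: another future driven by the same task rather than the same function.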

>Remember too that the Mutex could be buried beneath several layers of function calls in different modules or packages. It could require looking across many layers of the stack at once to be able to see the problem.

That is a fundamental property of mutexes. Whenever you have a critical section, you must be 100% aware of every single line of code inside that critical section.

>There’s no one abstraction, construct, or programming pattern we can point to here and say "never do this". Still, we can provide some guidelines.

The programming pattern you're looking for is guaranteeing forward progress inside critical sections. Only synchronous code is allowed to be executed inside a critical section. The critical section must be as small as possible. It must never be interrupted, ever.

Sounds like a pain in the ass, right? That's right, locks are a pain in the ass.

octoberfranklin 2 days ago||

> However you also wouldn’t be able to use select! in a while loop and try to acquire the same lock (or read from the same channel) without losing your position in the queue.

No, just have select!() on a bunch of owned Futures return the futures that weren't selected instead of dropping them. Then you don't lose state. Yes, this is awkward, but it's the only logically coherent way. There is probably some macro voodoo that makes it ergonomic. But even this doesn't fix the root cause because dropping an owned Future isn't guaranteed to cancel it cleanly.

For the real root cause: https://news.ycombinator.com/item?id=45777234

mycoliza 2 days ago||

> No, just have select!() on a bunch of owned Futures return the futures that weren't selected instead of dropping them. Then you don't lose state.

How does that prevent this kind of deadlock? If the owned future has acquired a mutex, and you return that future from the select so that it might be polled again, and the user assigns it to a variable, then the future that has acquired the mutex but has not completed is still not dropped. This is basically the same as polling an `&mut future`, but with more steps.

octoberfranklin 2 days ago||

> How does that prevent this kind of deadlock?

Like I said, it doesn't:

> even this doesn't fix the root cause because dropping an owned Future isn't guaranteed to cancel it cleanly.

It fixes this:

> However you also wouldn’t be able to use select! in a while loop and try to acquire the same lock (or read from the same channel) without losing your position in the queue.

If you want to fix the root cause, see https://news.ycombinator.com/item?id=45777234

jacquesm 2 days ago||

If any rust designers are lurking about here: what made you decide to go for the async design pattern instead of the actor pattern, which - to me at least - seems so much cleaner and so much harder to get wrong?

Ever since I started using Erlang it has felt like I finally found 'the right way'; before then I did a lot of work with sockets and asynchronous worker threads. But even though that usually worked as advertised, it had a large number of really nasty pitfalls which the actor model seemed to - effortlessly - sidestep.

So I'm seriously wondering what the motivation was. I get why JS uses async, there isn't any other way there, by the time they added async it was too late to change the fundamentals of the language to such a degree. But rust was a clean slate.

sunshowers 2 days ago||

Not a Rust designer, but a big motivation for Rust's async design was wanting it to work on embedded, meaning no malloc and no threads. This unfortunately precludes the vast majority of the design space here, from active futures as seen in JS/C#/Go to the actor model.

You can write code using the actor model with Tokio. But it's not natural to do so.

oconnor663 2 days ago|||

Kind of a tangent, but I think "systems programming" tends to bounce back and forth between three(?) different concerns that turn out to be closely related:

1. embedded hardware, like you mentioned

2. high-performance stuff

3. "embedding" in the cross-language sense, with foreign function calls

Of course the "don't use a lot of resources" thing that makes Rust/C/C++ good for tiny hardware also tends to be helpful for performance on bigger iron. Similarly, the "don't assume much about your runtime" thing that's necessary for bare metal programming also helps a lot with interfacing with other languages. And "run on a GPU" is kind of all three of those things at once.

So yeah, which of those concerns was async Rust really designed around? All of them I guess? It's kind of like, once you put on the systems programming goggles for long enough, all of those things kind of blend together?

kibwen 2 days ago||

> So yeah, which of those concerns was async Rust really designed around? All of them I guess?

Yes, all of them. Futures needed to work on embedded platforms (so no allocation), needed to be highly optimizable (so no virtual dispatch), and needed to act reasonably in the presence of code that crosses FFI boundaries (so no stack shenanigans). Once you come to terms with these constraints--and then add on Rust's other principles regarding guaranteed memory safety, references, and ownership--there's very little wiggle room for any alternative designs other than what Rust came up with. True linear types could still improve the situation, though.

oconnor663 2 days ago|||

> so no virtual dispatch

Speaking of which, I'm kind of surprised we landed on a Waker design that requires/hand-rolls virtual dispatch. Was there an alternate universe where every `poll()` function was generic on its Waker?

mjevans 2 days ago|||

In my view, the major design sin was not _forcing_ failure into the outcome list.

.await(DEADLINE) (where deadline is any non 0 unit, and 0 is 'reference defined' but a real number) should have been the easy interface. Either it yields a value or it doesn't, then the programmer has to expressly handle failure.

Deadline would only be the minimum duration after which the language, when evaluating the future / task, would return the empty set/result.

kibwen 1 day ago|||

> Deadline would only be the minimum duration after which the language, when evaluating the future / task, would return the empty set/result.

This appears to be misunderstanding how futures work in Rust. The language doesn't evaluate futures or tasks. A future is just a struct with a poll method, sort of like how a closure in Rust is just a struct with a call method. The await keyword just inserts yield points into the state machine that the language generates for you. If you want to actually run a future, you need an executor. The executor could implement timeouts, but it's not something that the language could possibly have any way to enforce or require.
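As a rough illustration of "the executor could implement timeouts", here is a toy std-only executor that gives up after a deadline. `block_on_with_deadline` and `NoopWake` are hypothetical names, and busy-polling with `yield_now` is an assumption made to keep the sketch short; a real executor would sleep until a waker fires.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Waker that does nothing; this toy executor simply polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` until it completes or `timeout` has elapsed.
/// Returning `None` drops the future, which is what cancels it.
fn block_on_with_deadline<F: Future>(fut: F, timeout: Duration) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let deadline = Instant::now() + timeout;
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return Some(out);
        }
        if Instant::now() >= deadline {
            // The executor, not the language, decides to stop polling.
            return None;
        }
        std::thread::yield_now();
    }
}
```

Nothing in the `Future` trait knows about the deadline; a different executor is free to poll the same future forever.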

danielheath 2 days ago|||

Does that imply a lot of syscalls to get the monotonic clock value? Or is there another way to do that?

muvlon 1 day ago|||

On Linux there is the VDSO, which on all mainstream architectures allows you to do `clock_gettime` without going through a syscall. It should take on the order of (double digit) nanoseconds.

mjevans 2 days ago|||

If the scheduler is doing _any_ sort of accounting at all to figure out any remote sort of fairness balancing at all, then whatever resolution that is probably works.

At least for Linux, offhand, popular task scheduler frequencies used to be 100 and 1000 Hz.

Looks like the Kernel's tracking that for tasks:

https://www.kernel.org/doc/html/latest/scheduler/sched-desig...

"In CFS the virtual runtime is expressed and tracked via the per-task p->se.vruntime (nanosec-unit) value."

I imagine the .vruntime struct field is still maintained with the newer "EEVDF Scheduler".

...

A Userspace task scheduler could similarly compare the DEADLINE against that runtime value. It would still reach that deadline after the minimum wait has passed, and thus be 'background GCed' at a time of the language's choice.

jitl 2 days ago||

The issue is that no scheduler manages futures. The scheduler sees tasks, futures are just a struct. See discussion of embedded above: there is no “kernel esque” parallel thread

shepmaster 2 days ago||||

> But it's not natural to do so.

I tend to write most of my async Rust following the actor model and I find it natural. Alice Ryhl, a prominent Tokio contributor, has written about the specific patterns:

https://ryhl.io/blog/actors-with-tokio/

sunshowers 2 days ago|||

Oh I do too, and that's one of the recommendations in RFD 400 as well as in my talk. cargo-nextest's runner loop [1] is also structured as two main actors + one for each test. But you have to write it all out and it can get pretty verbose.

[1] https://nexte.st/docs/design/architecture/runner-loop/

baq 1 day ago|||

The ‘Beware of cycles’ section at the end has some striking similarities with futurelock avoidance recommendations from the original article… not sure what to make of this except to say that this stuff is hard?

lll-o-lll 2 days ago||||

As a curious bystander, it will be interesting to see how the Zig async implementation pans out. They have the advantage of getting to see the pitfalls of those that have come before.

Getting back to Rust, even if not natural, I agree with the parent that the actor model is simply the better paradigm. Zero runtime allocation should still be possible, you just have to accept some constraints.

I think async looks simple because it looks like writing imperative code; unfortunately it is just obfuscating the complex reality underlying. The actor model makes things easier to reason about, even if it looks more complicated initially.

sunshowers 2 days ago|||

I think you can do a static list of actors or tasks in embedded, but it's hard to dynamically spin up new ones. That's where intra-task concurrency is helpful.

throwawaymaths 2 days ago||||

iiuc zig has thought about this specifically and there is a safe async-cancel in the new design that wasn't there in the old one.

sgt 1 day ago|||

I was wondering when someone would bring up Zig. I think it's fascinating how far it has come in the last couple of years and now the new IO interface/async implementation.

Question is - when will Zig become mature enough to become a legit choice next to say, Go or Rust?

I mean for a regular dev team, not necessarily someone who works deeply along with Andrew Kelley etc like Tigerbeetle.

fpoling 2 days ago||||

Rust async still uses a native stack, which is just a form of memory allocator that uses LIFO order. And controlling stack usage in the embedded world is just as important as not relying on the system allocator.

So it's a pity that the Rust async design tried so hard to avoid any explicit allocations rather than using an explicit allocator that embedded code can use to preallocate and reuse objects.

zbentley 2 days ago|||

> a native stack [is] just a form of memory allocator

There is a lot riding on that “just”. Hardware stacks are very, very unlike heap memory allocators in pretty much every possible way other than “both systems provide access to memory.”

Tons and tons of embedded code assumes the stack is, indeed, a hardware stack. It’s far from trivial to make that code “just use a dummy/static allocator with the same api as a heap”; that code may not be in Rust, and it’s ubiquitous for embedded code to not be written with abstractions in front of its allocator—why would it do otherwise, given that tons of embedded code was written for a specific compiler+hardware combination with a specific (and often automatic or compiler-assisted) stack memory management scheme? That’s a bit like complaining that a specific device driver doesn’t use a device-agnostic abstraction.

fpoling 2 days ago||

During the design phase of Rust async there was no async embedded code to draw inspiration from. For systems with a tight memory budget it is common to pre-allocate everything, often using custom bump allocation, or to split memory into a few regions for fixed-size things and allocate from those.

And then the need for the runtime to poll futures means that async in Rust requires a non-trivial runtime, going against the desire to avoid abstractions in embedded code.

Async without polling, while stack-unfriendly, requires less runtime. And if Rust supported type-safe region-based allocators, where a bunch of things are allocated one by one and then released all at once, it could be a better fit for the embedded world.

tony69 2 days ago|||

Stack allocation/deallocation does not fragment memory, that’s a yuge difference for embedded systems and the main reason to avoid the heap

fpoling 2 days ago||

Even with the stack, memory can fragment. Consider creating 10 futures on the stack where the last one created completes last: the memory for the first 9 will not be released until that last one completes.

This problem does not happen with a custom allocator where things to allocate are of roughly the same size and allocator uses same-sized cells to allocate.

jacquesm 2 days ago||

Indeed, arena allocators are quite fast and allow you to really lock down the amount of memory that is in use for a particular kind of data. My own approach in the embedded world has always been to simply pre-allocate all of my data structures. If it boots it will run. Dynamic allocation of any kind is always going to have edge cases that will cause run-time issues. Much better to know that you have a deterministic system.

RossBencina 2 days ago|||

Why would the actor model require malloc and/or threads?

gf000 1 day ago||

You basically have a concurrency-safe message queue. It would be pretty limited without malloc (fixed max queue size).

the__alchemist 2 days ago|||

Your application needs concurrency. So, the answer is... switch your entire application, code style, and libraries it uses into a separate domain that is borderline incompatible with normal one? And has its own dialects that have their own compatibility barriers? Doesn't make sense to me.

toast0 2 days ago||

It's easier to write all applications, concurrent or not, in a style that works well for concurrency. Lots of applications can benefit from concurrency.

You can do straight line, single threaded, non concurrent code in an actor model. Mostly, that's what most of the actor code will look like. Get a message, update local state in a straight forward way, send a response, repeat.
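A minimal sketch of that loop, using OS threads and `std::sync::mpsc` rather than any particular actor library. `Msg`, `spawn_counter`, and `demo_counter` are made-up names for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages the (hypothetical) counter actor understands.
enum Msg {
    Add(u64),
    /// Carries a reply channel so the actor can send a response.
    Get(mpsc::Sender<u64>),
}

/// Spawns the actor and returns its mailbox plus a join handle.
fn spawn_counter() -> (mpsc::Sender<Msg>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // State is owned by the actor alone, so no lock is needed.
        let mut total = 0u64;
        // Get a message, update local state, maybe reply, repeat.
        for msg in rx {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(total);
                }
            }
        }
        // The loop ends once every sender has been dropped.
    });
    (tx, handle)
}

/// Drives the actor with two updates and one query.
fn demo_counter() -> u64 {
    let (actor, handle) = spawn_counter();
    actor.send(Msg::Add(2)).unwrap();
    actor.send(Msg::Add(40)).unwrap();

    let (reply_tx, reply_rx) = mpsc::channel();
    actor.send(Msg::Get(reply_tx)).unwrap();
    let total = reply_rx.recv().unwrap();

    drop(actor); // closing the mailbox lets the actor thread exit
    handle.join().unwrap();
    total
}
```

The body of the actor is ordinary single-threaded code; all the concurrency is confined to the mailbox.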

raggi 2 days ago|||

_an answer_ is performance - the necessity of creating copyable/copied messages for inter-actor communication everywhere in the program _can be_ expensive.

that said there are a lot of parts of a lot of programs where a fully inlined and shake optimized async state machine isn't so critical.

it's reasonable to want a mix, to use async which can be heavily compiler optimized for performance sensitive paths, and use higher level abstractions like actors, channels, single threaded tasks, etc for less sensitive areas.

lll-o-lll 2 days ago||

I’m not sure this is actually true? Do messages have to be copied?

raggi 2 days ago||

if you want your actors to be independent computation flows and they're in different coroutines or threads, then you need to arrange that the data source can not modify the data once it arrives at the destination, in order to be safe.

in a single threaded fully cooperative environment you could ensure this by implication of only one coroutine running at a time, removing data races, but retaining logical ones.

if you want to eradicate logical races, or have actual parallel computation, then the source data must be copied into the message, or the content of the message be wrapped in a lock or similar.

in almost all practical scenarios this means the data source copies data into messages.

sapiogram 2 days ago|||

Rust solves this at compile-time with move semantics, with no runtime overhead. This feature is arguably why Rust exists, it's really useful.
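A small sketch of what that looks like with a std channel: the `Vec` is moved into the message, so the sender loses access at compile time and the heap buffer itself is not duplicated. `send_by_move` is a made-up helper; it compares the buffer address on both sides purely for demonstration.

```rust
use std::sync::mpsc;
use std::thread;

/// Sends a Vec to another thread and returns the heap buffer address
/// observed by the sender and by the receiver.
fn send_by_move() -> (usize, usize) {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let payload = vec![0u8; 1024 * 1024];
    let addr_before = payload.as_ptr() as usize;

    let receiver = thread::spawn(move || {
        let received = rx.recv().unwrap();
        received.as_ptr() as usize
    });

    // Ownership moves into the channel here.
    tx.send(payload).unwrap();
    // Using `payload` from now on is a compile error, so the sender cannot
    // mutate the data behind the receiver's back:
    // payload.push(1); // error[E0382]: borrow of moved value: `payload`

    let addr_after = receiver.join().unwrap();
    (addr_before, addr_after)
}
```

Only the small `Vec` handle (pointer, length, capacity) is copied by the move; the megabyte it points to stays where it was.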

zorgmonkey 2 days ago|||

Rust moves are a memcpy where the source becomes effectively uninitialized after the move (that is to say, it is undefined to access it after the move). The copies are often optimized away by the compiler but it isn't guaranteed.

This actually caused some issues with Rust in the kernel because moving large structs could cause you to run out of the small amount of stack space available on kernel threads (they only allocate 8-16KB of stack compared to a typical 8MB for a userspace thread). The pinned-init crate is how they ended up solving this [1].

[1] https://crates.io/crates/pinned-init

raggi 2 days ago|||

if you can always move the data that's the sweet spot for async, you just pass it down the stack and nothing matters.

all of the complexity comes in when more than one part of the code is interested in the state at the same time, which is what this thread is about.

gleenn 2 days ago||||

Isn't that something Rust is particularly good at, controlling the mutation of shared memory?

raggi 2 days ago||

yes

vlovich123 2 days ago|||

In Rust wouldn’t you just Send the data?

mdasen 2 days ago|||

I'd recommend watching this video: https://www.infoq.com/presentations/rust-2019/; and reading this: https://tokio.rs/blog/2020-04-preemption

I'm not the right person to write a tl;dr, but here goes.

For actors, you're basically talking about green threads. Rust had a hard constraint that calls to C not have overhead and so green threads were out. C is going to expect an actual stack so you have to basically spin up a real stack from your green-thread stack, call the C function, then translate it back. I think Erlang also does some magic where it will move things to a separate thread pool so that the C FFI can block without blocking the rest of your Erlang actors.

Generally, async/await has lower overhead because it gets compiled down to a state machine and event loop. Languages like Go and Erlang are great, but Rust is a systems programming language looking for zero cost abstractions rather than just "it's fast."

To some extent, you can trade overhead for ease. Garbage collectors are easy, but they come with overhead compared to Rust's borrow checker method or malloc/free.

To an extent it's about tradeoffs and what you're trying to make. Erlang and Go were trying to build something different where different tradeoffs made sense.

EDIT: I'd also note that before Go introduced preemption, it too would have "pitfalls". If a goroutine didn't trigger a stack reallocation (like function calls that would make it grow the stack) or do something that would yield (like blocking IO), it could starve other goroutines. Now Go does preemption checks so that the scheduler can interrupt hot loops. I think Erlang works somewhat similarly to Rust in scheduling in that its actors have a certain budget, every function call decrements their budget, and when they run out of budget they have to yield back to the scheduler.

jacquesm 2 days ago||

Indeed, in Erlang the budget is counted in 'reductions'. Technically Erlang uses the BEAM as a CPU with some nifty extra features which allow you to pretend that you are pre-empting a process when in fact it is the interpreter of the bytecode that does the work and there are no interrupts involved. Erlang would not be able to do this if the Erlang input code was translated straight to machine instructions.

But Go does compile down to machine code, so that's why until it did pre-emption it needed that yield or hook.

Come to think of it: it is strange that such quota management isn't built into the CPU itself. It seems like a very logical thing to do. Instead we rely on hardware interrupts for pre-emption and those are pretty fickle. It also means that there is a fixed system wide granularity for scheduling.

yxhuvud 1 day ago||

Fickle? Pray tell, when the OS switches your thread for another thread, in what way does that fickleness show?

jacquesm 1 day ago||

I take it you've never actually interfaced directly with hardware?

Interrupts are at the most basic level an electrical signal to the CPU to tell it to load a new address into the next instruction pointer after pushing the current one and possibly some other registers onto the stack. That means you don't actually know when they will happen and they are transparent to the point that those two instructions that you put right after one another are possibly detoured to do an unknown amount of work in some other place.

Any kind of side effect from that detour (time spent, changes made to the state of the machine) has the potential to screw up the previously deterministic path that you were on.

To make matters worse, there are interrupts that can interrupt the detour in turn. There are ways in which you can tell the CPU 'not now' and there are ways in which those can be overridden. If you are lucky you can uniquely identify the device that caused the interrupt to be triggered. But this isn't always the case and given the sensitivity of the inputs involved it isn't rare at all that your interrupt will trigger without any ground to do so. If that happens and the ISR is not written with that particular idea in mind you may end up with a system in an undefined state.

Interrupts are a very practical mechanism. But they're also a nightmare to deal with in the otherwise orderly affairs of computing and troubleshooting interrupt related issues can eat up days, weeks or even months if you are really unlucky.

the__alchemist 2 days ago||

I'm surprised to learn this too. I know the hobby embedded and HTTP-server OSS ecosystems have committed to async, but I didn't expect Oxide would.

mycoliza 2 days ago||

We actually don't use Rust async in the embedded parts of our system. This is largely because our firmware is based on a multi-tasking microkernel operating system, Hubris[1], and we can express concurrency at the level of the OS scheduler. Although our service processors are single-core systems, we can still rely on the OS to schedule multiple threads of execution.

Rust async is, however, very useful in single-core embedded systems that don't have an operating system with preemptive multitasking, where one thread of execution is all you ever get. It's nice to have a way to express that you might be doing multiple things concurrently in an event-driven way without having to have an OS to manage preemptive multitasking.

[1] https://hubris.oxide.computer/

jabedude 2 days ago||

Heh, this is super interesting to hear. Single threaded async/concurrent code is so fun and interesting to see. I’ve run some tokio programs in single threaded mode just to see it in action.

oconnor663 2 days ago||

> FAQ: doesn’t future1 get cancelled?

I guess cancellation is really two different things, which usually happen at the ~same time, but not in this case: 1) the future stops getting polled, and 2) the future gets dropped. In this example the drop is delayed, and because the future is holding a guard,* the delay has side effects. So the future "has been cancelled" in the sense that it will never again make forward progress, but it "hasn't been cancelled yet" in the sense that it's still holding resources. I wonder if it's practical to say "make sure those two things always happen together"?

* Technically a Tokio-internal `Acquire` future that owns a queue position to get a guard, but it sounds like the exact same bug could manifest after it got the guard too, so let's call it a guard.
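
A minimal sketch of that distinction, using only the standard library. `Guard` and `HoldsGuard` are hypothetical names, and the `AtomicBool` is a stand-in for a real lock; nothing here is Tokio's implementation. It shows a future that has stopped being polled but still owns its resource until it is dropped.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a lock guard: clears the flag when dropped.
struct Guard(Arc<AtomicBool>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.store(false, Ordering::SeqCst);
    }
}

/// A future that takes the guard on its first poll and never completes.
struct HoldsGuard {
    held: Arc<AtomicBool>,
    guard: Option<Guard>,
}

impl Future for HoldsGuard {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.guard.is_none() {
            self.held.store(true, Ordering::SeqCst);
            let guard = Guard(self.held.clone());
            self.guard = Some(guard);
        }
        Poll::Pending
    }
}

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns (held while the future is alive but unpolled, held after drop).
fn demo() -> (bool, bool) {
    let held = Arc::new(AtomicBool::new(false));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(HoldsGuard { held: held.clone(), guard: None });
    let _ = fut.as_mut().poll(&mut cx);

    // Sense 1 of "cancelled": nobody polls `fut` again. It still owns the guard.
    let while_unpolled = held.load(Ordering::SeqCst);

    // Sense 2: the future is dropped, and only now is the resource released.
    drop(fut);
    let after_drop = held.load(Ordering::SeqCst);

    (while_unpolled, after_drop)
}

fn main() {
    println!("{:?}", demo());
}
```

The gap between the two reads in `demo` is the window the question is about: the longer a future sits unpolled and undropped, the longer its resource stays taken.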

mleonhard 2 days ago||

> // Start a background task that takes the lock and holds it for a few seconds.

Holding a lock while waiting for IO can destroy a system's performance. With async Rust, we can prevent this by making the MutexGuard !Send, so it cannot be held across an await. Specifically, because it is !Send, it cannot be stored in the Future [2], so it must be dropped immediately, freeing the lock. This also prevents Futurelock deadlock.

This is how I wrote safina::sync::Mutex [0]. I did try to make it Send, like Tokio's MutexGuard, but stopped when I realized that it would become very complicated or require unsafe.

> You could imagine an unfair Mutex that always woke up all waiters and let them race to grab the lock again. That would not suffer from risk of futurelock, but it would have the thundering herd problem plus all the liveness issues associated with unfair synchronization primitives.

Thundering herd is when clients overload servers. This simple Mutex has O(n^2) runtime: every task must acquire and release the mutex, which adds all waiting tasks to the scheduler queue. In practice, scheduling a task is very fast (~600ns). As long as polling the lock-mutex-future is fast and you have <500 waiting tasks, then the O(n^2) runtime is fine.
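
A back-of-the-envelope check of those numbers, as a sketch. It assumes every unlock wakes all tasks still waiting and that each wakeup costs the ~600ns quoted above; `total_wakeups` is a hypothetical helper, not part of Safina.

```rust
/// Wakeups issued if every unlock wakes all tasks still waiting (assumed model).
fn total_wakeups(tasks: u64) -> u64 {
    // The k-th unlock finds `tasks - k` waiters left, so the sum is n(n-1)/2.
    (1..=tasks).map(|k| tasks - k).sum()
}

fn main() {
    let wakeups = total_wakeups(500);
    let nanos = wakeups * 600; // ~600ns per scheduled task, the figure quoted above
    println!("{wakeups} wakeups, about {} ms", nanos / 1_000_000);
}
```

Under those assumptions, 500 contending tasks cost 124,750 wakeups in total, roughly 75ms of scheduling work spread across the whole run.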

Performance is hard to predict. I wrote Safina using the simplest possible implementations and assumed they would be slow. Then I wrote some micro-benchmarks and found that some parts (like the async Mutex) actually outperform Tokio's complicated versions [1]. I spent days coding optimizations that did not improve performance (work stealing) or even reduced performance (thread affinity). Now I'm hesitant to believe assumptions and predictions about performance, even if they are based on profiling data.

[0] https://docs.rs/safina/latest/safina/sync/struct.MutexGuard....

[1] https://docs.rs/safina/latest/safina/index.html#benchmark

[2] Multi-threaded async executors require futures to be Send.
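
A small sketch of the `!Send` guard idea, using `std::sync::Mutex` as a stand-in because its `MutexGuard` is also `!Send`. This is not Safina's API, and unlike an async mutex, `lock()` here blocks the thread instead of awaiting. `tick`, `require_send` and `block_on` are hypothetical helpers for the example.

```rust
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical await point; stands in for real I/O.
async fn tick() {}

async fn increment(counter: Arc<Mutex<u64>>) -> u64 {
    let value = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard is dropped here, before any await
    tick().await;
    value
}

/// Compiles only if the future could be moved to another thread.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

/// Waker that does nothing; enough for a busy-polling executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, adequate for futures that never really wait.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let counter = Arc::new(Mutex::new(41));
    let value = block_on(require_send(increment(counter)));
    println!("{value}");
}
```

Moving `tick().await` inside the inner block would keep the guard alive across the await, and `require_send` would then reject the future at compile time. That is the same check a multi-threaded executor applies when it requires futures to be `Send`.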

ufmace 1 day ago||

Considering this issue did also make me think - maybe the real footgun here is the async mutex. I think a better "rule" to avoid this issue might be something like, don't just use the tokio async mutex by default just because it's there and you're in an async function; instead default to a sync mutex that errors when held across awaits and think very hard about what you're really doing before you switch to the async one.

ufmace 1 day ago||

Actually I think I might be a little misguided here - confusing a mutex with an awaitable lock method versus blocking, and a mutex whose LockGuard is Send and can be held across other await points.

To clarify, I do still think it's probably wise to prefer using a mutex whose LockGuard is not Send. If you're in an async context though, it seems clearly preferable to use a mutex that lets you await on lock instead of possibly blocking. Looks like that's what Safina gives you.

It does bring to mind the point though: does it really make sense to call all of these things Mutexes? Most Mutexes, including the one in std, seem relatively simplistic, with no provision for exactly what happens if multiple threads/tasks are waiting to acquire the lock. It's as if they're designed for the case where it's rare for multiple threads to actually need the thing at once, but we have to guard against it just to be certain. The other case, where a resource is in high demand, lots of threads spend lots of time waiting for the lock, and it actually matters which requesters get the lock in what order, seems different enough that it maybe ought to have a different name, plus more flexibility in choosing the algorithm that controls the lock order.

dvratil 1 day ago|||

I would guess this is just to make the explanation of the bug easier.

In the real world, the futurelock could occur even with very short locks, it just wouldn't be so deterministic. A minimal reproducer that you have to run a thousand times before it maybe futurelocks doesn't really make for a good example :)

imtringued 1 day ago||

> In the real world, the futurelock could occur even with very short locks, it just wouldn't be so deterministic.

You have to explain the problem properly then. The problem here has nothing to do with duration whatsoever so don't bring that up. The problem here is that if you acquire a lock, you're inside a critical section. Critical sections have a programming paradigm that is equivalent to writing unsafe Rust. You're not allowed to panic inside unsafe Rust or inside critical sections. It's simply not allowed.

You're also not allowed to interrupt the critical section by something that does not have a hard guarantee that it will finish. This rules out await inside the critical section. You're not allowed to do await. It's simply not allowed. The only thing you're allowed to do is execute an instruction that guarantees that N-1 instructions are left to be executed, where N is a finite number. Alternatively you do the logical equivalent. You have a process that has a known finite bound on how long it will take to execute and you are waiting for that external process.

After that process has finished, you release the lock. Then you return to the scheduler and execute the next future. The next future cannot be blocked because the lock has already been released. It's simply impossible.

You now have to explain how the impossible happened. After all, by using the lock you've declared that you took all possible precautions to avoid interrupting the critical section. If you did not, then you deserve any bugs coming your way. That's just how locks are.

dap 1 day ago||

I think you misunderstand the problem. The only purpose of the sleep in this example is to control interleaving of execution to ensure the problem happens. Here's a version where the background task (the initial lock holder) only runs a bounded number of instructions with the lock held, just as you suggest:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

It still futurelocks.

> After that process has finished, you release the lock. Then you return to the scheduler and execute the next future. The next future cannot be blocked because the lock has already been released. It's simply impossible.

This is true with threads and with tasks that only ever poll futures sequentially. It is not true in the various cases mentioned in this RFD (notably `tokio::select!`, but also others). Intuitively: when you have one task polling on multiple futures concurrently, you're essentially adding another layer to the scheduler (kernel thread scheduler, tokio task scheduler, now some task is acting as its own future scheduler). The problem is it's surprisingly easy to (1) not realize that and (2) accidentally have that "scheduler" not poll the next runnable future and then get stuck, just like if the kernel scheduler didn't wake up a runnable thread.
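
A toy model of that "task as its own scheduler" failure, as a sketch only. It assumes a FIFO lock that hands ownership directly to the oldest waiter on release, which is my reading of the scenario in the RFD. `FairLock` and `Acquire` are hypothetical types, and `Acquire::poll` is a simplified stand-in for `Future::poll`, with no wakers or executor.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Toy FIFO lock: on release, ownership passes directly to the oldest waiter.
#[derive(Default)]
struct FairLock {
    next_ticket: u32,
    holder: Option<u32>,
    waiters: VecDeque<u32>,
}

impl FairLock {
    fn release(&mut self) {
        self.holder = self.waiters.pop_front();
    }
}

/// Stand-in for a lock-acquisition future; it doubles as the guard once ready.
struct Acquire {
    lock: Rc<RefCell<FairLock>>,
    ticket: Option<u32>,
}

impl Acquire {
    fn new(lock: &Rc<RefCell<FairLock>>) -> Self {
        Acquire { lock: Rc::clone(lock), ticket: None }
    }

    /// Simplified `poll`: returns true once this waiter owns the lock.
    fn poll(&mut self) -> bool {
        let mut lock = self.lock.borrow_mut();
        let ticket = match self.ticket {
            Some(t) => t,
            None => {
                // First poll: take a ticket, then own the lock or join the queue.
                let t = lock.next_ticket;
                lock.next_ticket += 1;
                self.ticket = Some(t);
                if lock.holder.is_none() {
                    lock.holder = Some(t);
                } else {
                    lock.waiters.push_back(t);
                }
                t
            }
        };
        lock.holder == Some(ticket)
    }
}

impl Drop for Acquire {
    fn drop(&mut self) {
        if let Some(t) = self.ticket {
            let mut lock = self.lock.borrow_mut();
            if lock.holder == Some(t) {
                lock.release(); // owner goes away: hand the lock on
            } else {
                lock.waiters.retain(|&w| w != t); // give up the queue slot
            }
        }
    }
}

/// Returns (future2 ready while future1 sits unpolled, future2 ready after future1 is dropped).
fn demo() -> (bool, bool) {
    let lock = Rc::new(RefCell::new(FairLock::default()));

    let mut background = Acquire::new(&lock);
    assert!(background.poll()); // the background task takes the lock

    let mut future1 = Acquire::new(&lock);
    assert!(!future1.poll()); // queued behind the background task

    drop(background); // the lock is handed straight to future1's ticket

    // The task now runs a different branch instead of polling future1,
    // and that branch wants the same lock.
    let mut future2 = Acquire::new(&lock);
    let ready_while_stuck = (0..1000).any(|_| future2.poll());

    drop(future1); // only dropping future1 lets the lock move on
    let ready_after_drop = future2.poll();

    (ready_while_stuck, ready_after_drop)
}

fn main() {
    println!("{:?}", demo());
}
```

In this model no amount of polling `future2` helps, because the lock already belongs to `future1`, and the only code that could poll or drop `future1` is the code waiting on `future2`.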

yxhuvud 1 day ago||

Work stealing is more a technique to function better when the architecture is pessimal (think mixing slow and fast tasks in one queue) than something that makes things go faster in general. It also tends to shuffle complexity around a bit, in ways that are sometimes nice.

Same thing with task preemption, though that one has less organizational impact.

In general, getting something to perform well enough on specific tasks is a lot easier than performing well enough on tasks in general. At the same time, most tasks have kinda specific needs when you start looking at them.

forrestthewoods 2 days ago||

I feel like I’m pretty good at writing multithreaded code. I’ve done it a lot. As long as you use primitives like Rust Mutex that enforce correctness for data access (ie no accessing data without the lock) it’s pretty simple. Define a clean boundary API and you’re off to the races.

async code is so so so much more complex. It’s so hard to read and reason about. I could not follow this post. I tried. But it’s just a full extra order of complexity.

Which is a shame because async code is supposed to make code simpler! But I’m increasingly unconfident that’s true.

foota 2 days ago||

Async code isn't supposed to be simpler than sync code, it's supposed to be simpler than doing things like continuation passing.

amelius 2 days ago||

Async code is simpler because you're implicitly holding a lock on the CPU. That's also why you should stay away from it: it increases latency. Especially since Rust is about speed and responsiveness. In general, async programming in Rust makes little sense.

forrestthewoods 2 days ago||

I love Rust. But I’m 100% convinced Rust chose the wrong tradeoffs with their async model. Just give me green threads and use malloc to grow the stack. It’s fine. That would have been better imho.

mjevans 2 days ago|||

Hell, even force threads to be allocated from a bucket of N threads defined at compile time. Surely that'd work for embedded / GPU space?

gf000 1 day ago|||

You can't have a low-level language and green threads at the same time.

forrestthewoods 1 day ago|||

Why not?

crabmusket 1 day ago||

This feels like the sort of thing that has led to the development of deterministic simulation testing (DST) techniques as pioneered by FoundationDB and TigerBeetle.

https://notes.eatonphil.com/2024-08-20-deterministic-simulat...

I hope something like this becomes popular in the Rust/Tokio space. It seems like Turmoil is that?

https://tokio.rs/blog/2023-01-03-announcing-turmoil

orthecreedence 2 days ago|

Great read, and the example code makes sense. This stuff can be a nightmare to find, but once you do it's like a giant 1000 piece puzzle just clicks together instantly.

bcantrill 2 days ago|

Indeed. One of the interesting side effects of being a remote company that records everything[0] is that we have the instant where the "1000 piece puzzle just clicks together" recorded, and it's honestly pretty wild. In this case, it was very much a shared brainstorming between four engineers (Eliza, Sean, John and Dave) -- and there is almost a passing of the baton where they start to imagine the kind of scenario that could induce this and then realize that those are exactly the conditions that exist in the software.

We are (on brand?) going to do a podcast episode on this on Monday[1]; ahead of that conversation I'm going to get a clip of that video out, just because it's interesting to see the team work together to debug it.

[0] https://rfd.shared.oxide.computer/rfd/0537

[1] https://discord.gg/QrcKGTTPrF?event=1433923627988029462

mycoliza 2 days ago||

As a member of (Eliza, Sean, John, and Dave), I can second that debugging this was certainly an adventure. I'm not going to go as far as to say that we had fun, since...you can't have a heroic narrative without real struggle. But it was certainly rewarding to be in the room for that "a-ha!" moment, in which all the pieces really did begin to fit together very quickly. It was like the climax of a detective story --- and it was particularly well-scripted the way each of us contributed a little piece of the puzzle.

littlestymaar 2 days ago||

Since you are one of the people working directly on this codebase, may I ask why select! is being used/allowed in the first place?

Its footgun-y nature has been known for years (IIRC even the first version of the tokio documentation warned against that) and as such I don't really understand why people are still using it. (For context I was the lead of a Rust team working on a pretty complex async networking program and we had banned select! very early in the project and never regretted this decision once).

0xdeafbeef 2 days ago||

What to use instead?

surajrmal 15 hours ago||

Your favorite LLM will give you several good options. You can use an mpsc channel where you have a task per sender which waits on each branch in the select and then sends a message, and then just wait on the receiver end of that channel. Or you could use https://docs.rs/futures/latest/futures/future/fn.select.html (or the select_all version). Both make it more obvious what is going on. The last way would be to implement a future manually. This would probably be my favorite option in many cases as it would have the least overhead, but if you've never implemented a future it may be a bit intimidating.

Edit: tokio docs actually provide several suggestions: https://docs.rs/tokio/latest/tokio/macro.select.html#alterna...
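
A rough sketch of the channel option. To stay dependency-free it uses OS threads and `std::sync::mpsc` in place of async tasks and an async channel, so treat it as an analogy for the shape of the code, not as Tokio usage. The `Event` enum and `run` function are made up for the example.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical message type: one variant per former select branch.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Event {
    Data(u32),
    Timeout,
}

fn run() -> Vec<Event> {
    let (tx, rx) = mpsc::channel();

    // One producer per branch. Each is scheduled independently, so the
    // consumer's choices cannot leave a producer half-finished and unpolled.
    let tx_timer = tx.clone();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        let _ = tx_timer.send(Event::Timeout);
    });
    thread::spawn(move || {
        let _ = tx.send(Event::Data(7));
    });

    // The single consumer: one receive loop instead of a select over branches.
    // The loop ends once every sender has been dropped.
    let mut seen = Vec::new();
    for event in rx {
        seen.push(event);
    }
    seen
}

fn main() {
    println!("{:?}", run());
}
```

The trade-off is an extra hop through the channel and one task (here, one thread) per branch, in exchange for a consumer that only ever waits on one thing.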
