Posted by todsacerdoti 10/29/2025

Zig's New Async I/O (andrewkelley.me)
https://www.youtube.com/watch?v=mdOxIc0HM04
329 points | 172 comments
thefaux 10/30/2025|
I find the direction of zig confusing. Is it supposed to be a simple language or a complex one? Low level or high level? This feature is to me a strange mix of high and low level functionality and quite complex.

The io interface looks like OO but violates the Liskov substitution principle. For me, this does not solve the function color problem, but instead hides it. Every function with an IO interface cannot be reasoned about locally because of unexpected interactions with the io parameter input. This is particularly nasty when IO objects are shared across library boundaries. I now need to understand how the library internally manages io if I share that object with my internal code. Code that worked in one context may surprisingly not work in another context. As a library author, how do I handle an io object that doesn't behave as I expect?

Trying to solve this problem at the language level fundamentally feels like a mistake to me because you can't anticipate in advance all of the potential use cases for something as broad as io. That's not to say that this direction shouldn't be explored, but if it were my project, I would separate this into another package that I would not call standard.

throwawaymaths 10/30/2025||
i think you are missing that a proper io interface should encapsulate all abstractions that care about asynchrony and patterns thereof. is that possible? we will find out. It's not unreasonable to be skeptical but can you come up with a concrete example?

> As a library author, how do I handle an io object that doesn't behave as I expect

you ship with tests against the four or five default patterns in the stdlib, and if anyone wants to do anything substantially crazier to the point that it doesn't work, that's on them; they can submit a PR and you can curbstomp it if you want.

> function coloring

i recommend reading the function coloring article. there are five criteria that make up the function coloring problem; it's not just that there are "more than one class of function calling conventions"

jvanderbot 10/31/2025||
An interface is a library decision, not a language decision. The level of abstraction possible is part of a language decision. GP is saying that this adds "too much" possible abstraction, and therefore qualifies as "too high level". Another benchmark about "too high level" would be that it requires precisely the "guess the internal plumbing" tests that you describe.

Not really advocating anything, just connecting the two a little better.

throwawaymaths 10/31/2025||
yes indeed true. but the standard library is in the end just a library, you could reimplement the "pre-io" patterns in the new std if you wanted.
lukaslalinsky 10/31/2025|||
What exactly makes it unpredictable? The functions in the interface have a fairly well defined meaning: take this input, run the I/O operation, and return results. Some implementations will suspend your code via user-space context switching, some will just directly run the syscall. This is no different from approaches like the virtual thread API in Java, where you use the same APIs for I/O no matter the context. In the Python world, before async/await, this was solved in gevent by monkey patching all the I/O functions in the standard library. This interface just abstracts that part out.
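Roughly, a sketch of the idea (based only on the io.async/await shape shown in the article; the helper functions here are made up for illustration, not real std API):

  const std = @import("std");

  // Stand-ins for real work; in practice these would do reads/writes via `io`.
  fn loadUsers(io: std.Io) !void { _ = io; }
  fn loadOrders(io: std.Io) !void { _ = io; }

  // Written once against the interface; the caller decides what `io` is.
  fn loadBoth(io: std.Io) !void {
      var users = io.async(loadUsers, .{io});
      var orders = io.async(loadOrders, .{io});
      // An evented or threaded implementation may overlap these; a plain
      // blocking implementation already ran each call inside io.async.
      try users.await(io);
      try orders.await(io);
  }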
scuff3d 10/31/2025|||
I like Zig a lot, but something about this has been bothering me since it was announced. I can't put my finger on why, I honestly don't have a technical reason, but it just feels like the wrong direction to go.

Hopefully I'm wrong and it's wildly successful. Time will tell I guess.

whateveracct 10/31/2025|||
It's funny how this makes the Haskell IO type so clearly valuable. It is inherently async and the RTS makes it Just Work. Ofc there are always dragons afoot, but mostly you just program and benefit.
geysersam 10/31/2025|||
> Every function with an IO interface cannot be reasoned about locally because of unexpected interactions with the io parameter input. This is particularly nasty when IO objects are shared across library boundaries.

Isn't this just as true of any function using io in any other language?

> As a library author, how do I handle an io object that doesn't behave as I expect?

But isn't that the point of having an interface? To specify how the io object can and can't behave.

lvass 10/30/2025|||
It's more about allowing a-library-fits-all than forcing it. You don't have to ask for io, you just should, if you are writing a library. You can even do it the Rust way and write separate libraries, for example for users who do and don't want async, if you really want to.
hmmokidk 10/31/2025|||
First thought is that IO is just hard
mordnis 10/30/2025|||
Couldn't the same thing be said about functions that accept allocators?
butterisgood 10/30/2025|||
I start to want a Reader Monad stack for all the stuff I need to thread through all functions.
charlieflowers 10/31/2025||
Yeah, these kinds of "orthogonal" things that you want to set up "on the outside" and then have affect the "inner" code (like allocators, "io" in this case, and maybe also presence/absence of GC, etc.) all seem to cry out for something like Lisp dynamic variables.
pjmlp 10/31/2025||
A few languages have those, and I don't miss them, because in large codebases it becomes a pain to debug.

xBase, Clipper, Perl, Tcl upvars

int_19h 10/31/2025||
It depends on how you do it. XSLT 2.0 had <xsl:tunnel>, where you still had to declare them explicitly as function (well, template) parameters, just with a flag. No explicit control over levels, you just get the most recent one that someone has passed with <xsl:with-param tunnel="yes"> with the matching qualified name.

For something like Zig, it would make sense to go one step further and require them to be declared to be passed, i.e. no "tunneling" through interleaving non-Io functions. But it could still automatically match them e.g. by types, so that any argument of type Io, if marked with some keyword to indicate explicit propagation, would be automatically passed to a call that requires one.

gpderetta 10/31/2025||
that's basically implicit parameters, the typed, well behaved version of dynamic scoping.
int_19h 11/1/2025||
Yep, I believe that's what Scala called them.

And I think we need something like this to get people off globals.

metaltyphoon 10/30/2025|||
I don't think so, because the result of calling an allocator is either you get memory or you don't, while the IO here will be “it depends”
chotmat 10/31/2025||
I don't get it, what's the difference between "got or don't" vs "it depends"?
metaltyphoon 10/31/2025||
The allocator’s output has only two possibilities: either you get memory or you don’t. To quote GP:

> Every function with an IO interface cannot be reasoned about locally because of unexpected interactions with the io parameter input

This means that with the IO interface it is not quite clear what WILL happen, so it “depends”.

hoppp 10/31/2025|||
I thought the same, that zig is too low level to have async implemented in the language. It's experimental and probably going to change
Zambyte 11/13/2025|||
What about the standard library? That's where it's being implemented.
anonymoushn 10/31/2025|||
fwiw i thought the previous async based on whole-program analysis and transformation to stackless coroutines was pretty sweet, and similar sorts of features ship in rust and C++ as well
hoppp 10/31/2025||
Both of those (C++ and Rust) ship language support for async but not runtimes.
PaulHoule 10/30/2025||
It seems to me that async io struggles whenever people try it.

For instance it is where Rust goes to die because it subverts the stack-based paradigm behind ownership. I used to find it was fun to write little applications like web servers in aio Python, particularly if message queues and websockets were involved, but for ordinary work you're better off using gunicorn. The trouble is that conventional async i/o solutions are all single threaded, and in an age where it's common to have a 16-core machine on your desktop that makes no sense. It would be like starting a chess game by dumping out all your pieces except your King.

Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.

pron 10/30/2025||
> Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.

First, that would be Java and Go, not Java and .NET, as .NET offers a separate construct (async/await) for high-throughput concurrency.

Second, while "unfashionable" in some sense, I guess, it's no wonder that Java is many times more popular than any "fashionable" language. Also, if "fashionable" means "much discussed on HN", then that has historically been a terrible predictor of language success. There's almost an inverse correlation between how much a language is discussed on HN and its long-term success, and that's not surprising, as it's the less commonplace things that are more interesting to talk about. HN is more Vogue magazine than the New York Times.

vanviegen 10/31/2025|||
Hacker News has mostly discussed JavaScript and TypeScript over the past 15 years. These languages do seem to have some long-term success.
funflame 10/31/2025|||
JavaScript has success because it has a monopoly in the browser, anything you want to do there has to go through JavaScript, not because of any merit of the language.
pron 10/31/2025|||
I don't think so. Things built in those languages may have been discussed on HN, but the amount of discussion about those languages has not been proportional at all to their popularity.
pjmlp 10/31/2025||||
Kind of; in .NET you also have structured concurrency and dataflow, which are another way to do it without having to explicitly write async/await.

Yes, sadly Java and .NET are unfashionable in circles like HN and recent SaaS startups. I keep seeing products that only offer nodejs based SDKs, and when they do offer Java/.NET SDKs they are generally outdated versus the nodejs one.

CapsAdmin 10/31/2025|||
In this context I interpret unfashionable as boring/normal/works/good enough/predictable etc.
vlovich123 10/30/2025|||
> Unfashionable languages like Java and .NET that have quality multithreaded runtimes are the way to go because they provide a single paradigm to manage both concurrency and parallelism.

At the cost of not being able to actually provide the same throughput, latency, or memory usage that lower level languages, which don't enforce the same performance-pessimizing abstractions on everything, can. Engineering is about tradeoffs, but pretending that Java or .NET have solved this is naive.

pron 10/30/2025|||
> At the cost of not being able to actually provide the same throughput, latency, or memory usage

Only memory usage is true with regards to Java in this context (.NET actually doesn't offer a shared thread abstraction; it's Java and Go that do), and even that is often misunderstood. Low-level languages are optimised for minimal memory usage, which is very important on RAM-constrained devices, but it could be wasting CPU on most machines: https://youtu.be/mLNFVNXbw7I

This optimisation for memory footprint also makes it harder for low-level languages to implement user-mode threading as efficiently as high-level languages.

Another matter is that there are two different use cases for asynchronous constructs that may tempt implementors to address them with a single implementation. One is the generator use case. What makes it special is that there are exactly two communicating parties, and both of their state may fit in the CPU cache. The other use case is general concurrency, primarily for IO. In that situation, a scheduler juggles a large number of user-mode threads, and because of that, there is likely a cache miss on every context switch, no matter how efficient it is. However, in the second case, almost all of the performance is due to Little's law rather than context switch time (see my explanation here: https://inside.java/2020/08/07/loom-performance/). That means that a "stackful" implementation of user-mode threads can have no significant performance penalty for the second use case (which, BTW, I think has much more value than the first), even though a more performant implementation is possible for the first use case. In Java we decided to tackle the second use case with virtual threads, and so far we've not offered something for the first (for which the demand is significantly lower). What happens in languages that choose to tackle both use cases with the same construct is that in the second and more important use case they gain no more than negligible performance (at best), but they're paying for that with a substantial degradation in user experience.

vlovich123 10/31/2025||
It sounds like you’re disagreeing, yet no case is made that throughput and latency aren’t worse.

For example, the best frameworks on TechEmpower are all Rust, C and C++, with the best Java coming in at 25% slower on that microbenchmark. My point stands - it is generally true that well written Rust/C/C++ outperforms well written Java and .NET, and not just with lower memory usage. The “engineering effort per performance” may skew toward Java, but that’s different from absolute performance. With Rust, to me it’s also less clear whether that is actually even true.

[1] https://www.techempower.com/benchmarks/#section=data-r23

pjmlp 10/31/2025|||
This kind of discussion is always a wasted effort, because in the end we are all using Electron based apps, and Python scripting for AI tools.

Winning benchmark games doesn't matter if the customer doesn't get what they need, even if it runs at blazing speed.

CJefferson 10/31/2025||||
Honestly, if these languages are only winning by 25% in microbenchmarks, where I’d expect the difference to be biggest, that’s a strong boost for Java for me. I didn’t realise it was so close, and I hate async programming so I’m definitely not doing it for an, at most, 25% boost.
jmaker 11/1/2025|||
It’s not about the languages only, but also about runtimes and libraries. The Vert.x verticles are reactive. Java devrel folks push everyone from reactive to virtual threads now. You won’t see it perform in that ballpark. If you look at the bottom of the benchmark results table, you’ll find Spring Boot (servlets and a bit higher Reactor), together with Django (Python). So “Java” in practice is different from niche Java. And if you look inside at the codebase, you’ll see the JVM options. In addition, they don’t directly publish CPU and memory utilization. You can extract it from the raw results, but it’s inconclusive.

This stops short of actually validating the benchmark payloads and hardware against your specific scenario.

vips7L 11/2/2025||
> So “Java” in practice is different from niche Java.

This is an odd take, especially in a discussion of Rust. In practice, the number of projects using Rust as an HTTP server backend is non-existent in comparison. Does that mean we just get to write off the Rust benchmarks?

Java performs, as shown by the benchmarks.

jmaker 11/2/2025||
I don’t understand what you’re saying. Typical Java is Spring Boot. Typical Rust is Axum and Actix. I don’t see why it would make sense to push the argument ad absurdum. Vert.x is not typical Java; it’s not easy to get it right. But Java the ecosystem profits from Netty in terms of performance, which does the best it can to avoid the JVM, the runtime system. And it’s not always about “HTTP servers”, though that’s what the TechEmpower benchmark is about - frameworks, not just languages.

Your last sentence reads like an expression of faith. I’ll only remark that performance is relative to one’s project specs.

pron 11/3/2025||
In some of those benchmarks, Quarkus (which is very much "typical Java") beats Axum, and there's far more software being written in "niche Java" than in "typical Rust". As for Netty, it's "avoiding the JVM" (standard library, really) less now, and to the extent that it still does, it might not be working in its favour. E.g. we've been able to get better results with plain blocking code and virtual threads than with Netty, except in situations where Netty's codecs have optimisations done over many years, and could have been equally applied to ordinary Java blocking code (as I'm sure they will be in due time).
jmaker 11/3/2025||
Hey Ron, I’ve got deep respect for what you do and appreciate what you’re sharing, that’s definitely good to know. And I understand that many people take any benchmark as a validation for their beliefs. There are so many parameters that are glossed over at best. More interesting to me is the total cost of bringing that performance to production. If it’s some gibberish that takes a team of five a month to formulate and then costs extra CPU and RAM to execute, and then becomes another Perlesque incantation that no one can maintain, it’s not really a “typical” thing worth consideration, except where it’s necessary, scoped to a dedicated library, and the budget permits.

I don’t touch Quarkus anymore for a variety of issues. Yes, sometimes it’s Quarkus ahead, sometimes it’s Vert.x, from what I remember it’s usually bare Vert.x. It boils down to the benchmark iteration and runtime environment. In a gRPC benchmark, Akka took the crown in a multicore scenario - at a cost of two orders of magnitude more RAM and more CPU. Those are plausible baselines for a trivial payload.

By Netty avoiding the JVM I referred mostly to its off-heap memory management, not only the JDK APIs you guys deprecated.

I’m deeply ingrained in the Java world, but your internal benchmarks rarely translate well to my day-to-day observations. So I’m quite often a bit perplexed when I read your comments here and elsewhere or watch your talks. Without pretending I comprehended the JVM on a level comparable to yours, in my typical scenarios, I do quite often manage to get close to the throughput of my Rust and C++ implementations, albeit at a much higher CPU and memory cost. Latency & throughput at once is a different story though. I genuinely hope that one day Java will become a platform for more performance-oriented workloads, with less nondeterminism. I really appreciate your efforts toward introducing more consistency into JDK.

vlovich123 10/31/2025|||
I didn’t make the claim that it’s worth it. But when it is absolutely needed Java has no solution.

And remember, we’re talking about a very niche and specific I/O microbenchmark. Start looking at things like SIMD (currently - I know Java is working on it), or at more compute-bound work in general, and the gap will widen. Java still doesn’t yet have the tools to write really high performance code.

pron 10/31/2025|||
But it does. Java already gives you direct access to SIMD, and the last major hurdle to 100% of hardware performance with idiomatic code, flattened structs, will be closed very soon. The gap has been closing steadily, and there's no sign of change in the trend. Actually, it's getting harder and harder to find cases where a gap exists at all.
pjmlp 10/31/2025|||
It is called JNI, or Panama nowadays.

Too many people go hard on everything having to be 100% pure; meanwhile Python is taking over the AI world via native library bindings.

pron 10/31/2025||||
First, in all benchmarks but two, Java performs just as well as C/C++/Rust, and in one of those two, Go performs as well as the low-level languages. Second, I don't know the details of that one benchmark where the low-level languages indeed perform better than high-level ones, but I don't see any reason to believe it has anything to do with virtual threads.

Modern Java GCs typically offer a boost over more manual memory management. And on latency, even if virtual threads were very inefficient and you'd add a GC pause with Java's new GCs, you'd still be well below 1ms, i.e. not a dominant factor in a networked program.

(Yes, there's still one cause for potential lower throughput in Java, which is the lack of inlined objects in arrays, but that will be addressed soon, and isn't a big factor in most server applications anyway or related to IO)

BTW, writing a program in C++ has always been more or less as easy as writing it in Java/C# etc.; the big cost of C++ is in evolution and refactoring over many years, because in low-level languages local changes to code have a much more global impact, and that has nothing to do with the design of the language but is an essential property of tracking memory management at the code level (unless you use smart pointers, i.e. a refcounting GC for everything, but then things will be really slow, as refcounting does sacrifice performance in its goal of minimising footprint).

jandrewrogers 10/31/2025|||
A 1-millisecond pause is an eternity. That’s disk access latencies. Unless your computation is completely and unavoidably dominated by slow network, that latency will have a large impact on performance.

Ironically, Java has okay performance for pure computation. Where it shows poorly is I/O intensive applications. Schedule quality, which a GC actively interferes with, has a much bigger impact on performance for I/O intensive applications than operation latency (which can be cheaply hidden).

pron 10/31/2025|||
> A 1-millisecond pause is an eternity

Who said anything about a 1ms pause? I said that even if virtual thread schedulers had terrible latencies (which they don't) and you added GC pauses, you'd still be well below 1ms, which is not an eternity in the context of network IO, which is what we're talking about here.

gpderetta 10/31/2025||
to be fair, 1ms is an eternity for network IO as well. Only over the internet is it considered acceptable.
pron 10/31/2025||
It is not "an eternity". A roundtrip of 100-200us - which is closer to the actual GC pause time these days (remember, I said well below 1ms) - is considered quite good and is within the 1ms order of magnitude. Getting a <<1ms pause once every several seconds is not a significant impact to all but a few niche programs, and you may even get better throughput. OS-imposed hiccups (such as page faults or scheduler decisions) are about the same as those caused by today's Java GCs. Programs for which these are "an eternity" don't use regular kernels.
pjmlp 10/31/2025|||
Performance without a goal is wasted effort. Sometimes that 1 millisecond matters, but most of the time it doesn't, hence why everyone is using Web browsers with applications written in dynamic languages with even worse GC pauses.
lkjdsklf 10/31/2025|||
Any gc pause is unacceptable if your goal is predictable throughput and latency

Modern gcs can be pauseless, but either way you’re spending CPU on gc and not servicing requests/customers.

As for c++, std::unique_ptr has no ref counting at all.

shared_ptr does, but that’s why you avoid it at all costs if you need to move things around. you only pay the cost when copying the shared_ptr itself, but you almost never need a shared_ptr and even when you need it, you can always avoid copying in the hot path

pron 10/31/2025||
> Modern gcs can be pauseless, but either way you’re spending CPU on gc and not servicing requests/customers.

Since memory is finite and all computation uses some, every program spends CPU on memory management regardless of technique. Tracing GCs often spend less CPU on memory management than low-level languages.

> std::unique_ptr has no ref counting at all.

It still needs to do work to free the memory. Tracing GCs don't. The whole point of tracing GCs is that they spend work on keeping objects alive, not on freeing memory. As the size of the working set is pretty much constant for a given program and the frequency of GC is the ratio of allocation rate (also constant) to heap size, you can arbitrarily reduce the amount of CPU spent on memory management by increasing the heap.

vips7L 10/31/2025|||
I honestly doubt any of the frameworks in that benchmark are using virtual threads yet. The top one is still using vert.x which is an event loop on native platform threads.
pjmlp 10/31/2025|||
What matters is if it is good enough for project acceptance criteria.
josephg 10/31/2025|||
> It seems to me that async io struggles whenever people try it.

Promises work great in javascript, either in the browser or in node/bun. They're easy to use, and easy to reason about (once you understand them). And the language has plenty of features for using them in lots of ways - for example, Promise.all(), "for await" loops, async generators and so on. I love this stuff. It's fast, simple to use and easy to reason about (once you understand it).

Personally I've always thought the "function coloring problem" was overstated. I'm happy to have some codepaths which are async and some which aren't. Mixing sync and async code willy nilly is a code smell.

Personally I'd be happy to see more explicit effects (function colors) in my languages. For example, I'd like to be able to mark which functions can't panic. Or effects for non-divergence, or capability safety, and so on.

nine_k 10/31/2025|||
Promises in JS are particularly easy because JS is single-threaded. You can be certain that your execution flow won't be preempted at an arbitrary point. This greatly reduces the need for locks, atomics, etc.
o11c 10/31/2025||
Also task-local variables, which almost all systems other than C-level threads basically give up on despite being widely demanded.
int_19h 10/31/2025||
.NET has had task-local vars for about a decade now: https://learn.microsoft.com/en-us/dotnet/api/system.threadin...

Python added them in 3.7: https://docs.python.org/3/library/contextvars.html

o11c 10/31/2025||
I'll admit to unfamiliarity with the .NET version, but for Python even `threading.local` is a useless implementation if you care at all about performance.

Performant thread-local variables require ahead-of-time mapping to a 1-or-2-level integer sequence, with a register to quickly reach the base array, and some kind of trap to handle the "not allocated" case. Task-local variables are worse than thread-locals since they are swapped out much more frequently.

This requires special compiler support, not being a mere library.

int_19h 10/31/2025||
I would argue that if you're using Python, you already don't care about performance (unless it's just a little glue between other things).

In .NET they do virtual dispatch via a very basic map-like interface that has a bunch of micro-optimized implementations that are swapped in and out as needed if new items are added. For N up to 4 variables, they use a dedicated implementation that stores them as fields and does simple branching to access the right one, for each N. Beyond that it becomes an array, and at some point, a proper Dictionary. I don't know the exact perf characteristics, but FWIW I don't recall that ever being a source of an actual, non-hypothetical perf problem. Usually you'll have one local that is an object with a bunch of fields, so you only need one lookup to fetch that, and from there it's as fast as field access.

MangoToupe 10/31/2025|||
> Promises work great in javascript, either in the browser or in node/bun.

I can't disagree more. They suffer from the same stuff rust async does: they mess with the stack trace and obscure the actual guarantees of the function you're calling (eg a function returning a promise can still block, or the promise might never resolve at all).

Personally I think all solutions will come with tradeoffs; you can simply learn them well enough to be productive anyway. But you don't need language-level support for that.

josephg 10/31/2025||
> I can't disagree more. They suffer from the same stuff rust async does: they mess with the stack trace and obscure the actual guarantees of the function you're calling (eg a function returning a promise can still block, or the promise might never resolve at all).

These are inconveniences, but not show stoppers. Modern JS engines can "see through" async call stacks. Yes, bugs can result in programs that hang - but that's true in synchronous code too.

But async in rust is way worse:

- Compilation times are horrible. An async hello world in javascript starts instantly. In rust I need to compile and link to tokio or something. Takes ages.

- Rust doesn't have async iterators or async generators. (Or generators in any form.) Rust has no built in way to create or use async streams.

- Rust has 2 different ways to implement futures: the async keyword and impl Future. You need to learn both, because some code is impossible to write with the async keyword. And some code is impossible to write with impl Future. It's incredibly confusing and complicated, and it's difficult to learn it properly.

- Rust doesn't have a built in run loop ("executor"). So - best case - your project pulls in tokio or something, which is an entire kitchen sink and all your dependencies use that. Worst case, the libraries you want to use are written for different async executors and ??? shoot me. In JS, everything just works out of the box.

I love rust. But async rust makes async javascript seem simple and beautiful.

MangoToupe 10/31/2025|||
I stand by my assessment. You seem to simply see javascript as better because the tradeoffs are easier to internalize, in part because it can't (and doesn't try to) tackle the generalizations of async code that rust does.

> Modern JS engines can "see through" async call stacks.

I did not know that. I'll have to figure out how this works and what it looks like.

> Rust doesn't have async iterators or async generators. (Or generators in any form.) Rust has no built in way to create or use async streams.

This is not necessary. Library-level streams work just fine. Perhaps a "yield" keyword and associated compiler/runtime support would simplify this code, but this is not really a restriction for people willing to learn the libraries.

Rust has many issues, and so does its async keyword, but javascript is only obviously better if you want to use the tradeoffs javascript offers: an implicit and unchangeable async runtime that doesn't offer parallelism and relies on a jit interpreter. If you have cpu-bound code, or you want to ship a statically-compiled binary (or an embeddable library), this is not a good set of tradeoffs.

I find rust's tradeoffs to be worth the benefits—i literally do not care about compilation time and I internalized the type constraints many years ago—and I find the pain of javascript's runtime constraints to be not worth its simplicity or "beauty", although I admit I simply do not view code aesthetically. Perhaps we just prefer to tackle differently-shaped problems.

josephg 10/31/2025|||
> javascript is only obviously better if you want to use the tradeoffs javascript offers: an implicit and unchangeable async runtime that doesn't offer parallelism and relies on a jit interpreter.

Yes - I certainly wouldn’t use JavaScript to compile and ship binaries to end users. But as an application developer, i think the tradeoffs it makes are pretty great. I want fast iteration (check!). I want all libraries in the ecosystem to just work and interoperate out of the box (check!). And I want to be able to just express my software using futures without worrying I’m holding them wrong.

Even in systems software I don’t know if I want to be picking my own future executor. It’s like, the string type in basically every language is part of the standard library because it makes interoperability easy. I wish future executors in rust were in std for the same reason - so we could stop arguing about it and just get back to writing code.

MangoToupe 10/31/2025||
> And I want to be able to just express my software using futures without worrying I’m holding them wrong.

Well, there you go: you just happen to want to build stuff that javascript is good for. If you wanted to express different software you'd prefer a different language. But not everyone wants to write io-bound web services.

int_19h 10/31/2025|||
> I did not know that. I'll have to figure out how this works and what it looks like.

They basically stitch together a dummy async stack based on the causality chain. It's not really a stack anymore, since you can have a bunch of tasks interleaved on it, which has to be shown somehow, but it's still nice.

It's also not JS specific. .NET has the same async model (despite also having multithreaded concurrency), and it also has similar debugger support. Not just linearized async stacks, but also the ability to diagram them etc.

https://learn.microsoft.com/en-us/visualstudio/debugger/walk...

And in profiler as well, not just the debugger. So it's entirely a tooling issue, and part of the problem is that JS ecosystem has been lagging behind on this.

tcfhgj 10/31/2025|||
Aren't streams async iterators?

generators, at least, are available on nightly.

josephg 10/31/2025||
Yeah, generators have been available on nightly for 8 years or something. They're clearly stable enough that async is built on top of the generator infrastructure within the compiler.

But I haven’t heard anything about them ever moving to stable. Here’s to another 8 years!

erichocean 10/30/2025|||
Project Loom makes Java in particular really nice, virtual threads can "block" without blocking the underlying OS thread. No callbacks at all, and you can even use Structured Concurrency to implement all sorts of Go- and Erlang-like patterns.

(I use it from Clojure, where it pairs great with the "thread" version of core.async (i.e. Go-style) channels.)

seabrookmx 10/31/2025|||
> but for ordinary work you're better off using gunicorn

I'd like to see some evidence for this. Other than simplicity, IMO there's very little reason to use synchronous Python for a web server these days. Streaming files, websockets, etc. are all areas where asyncio is almost a necessity (in the past you might have used twisted), to say nothing of the performance advantage for typical CRUD workloads. The developer ergonomics are also much better if you have to talk to multiple downstream services or perform actions outside of the request context. Needing to manage a thread pool for this or defer to a system like Celery is a ton more code (and infrastructure, typically).

> async i/o solutions are all single threaded

And your typical gunicorn web server is single threaded as well. Yes you can spin up more workers (processes), but you can also do that with an asgi server and get significantly higher performance per process / for the same memory footprint. You can even use uvicorn as a gunicorn worker type and continue to use it as your process supervisor, though if you're using something like Kubernetes that's not really necessary.

ddorian43 10/31/2025|||
Maybe he meant gevent? Which is better than async io in python.
seabrookmx 10/31/2025||
Agree to disagree. Monkey patching the stdlib is a terrible hack and having to debug non-trivial gevent apps is a nightmare (not that asyncio apps are perfect either).
no_flaks_given 10/31/2025|||
Not many use cases actually need websockets. We're still building new shit in sync python and avoiding the complexity of all the other bullshit
brandonbloom 10/30/2025|||
If you watched the video closely, you'll have noticed that this design parameterizes the code by an `io` interface, which enables pluggable implementations. Correctly written code in this style can work transparently with evented or threaded runtimes.
PaulHoule 10/30/2025||
Really? Ordinary synchronous code calls an I/O routine which returns. Asynchronous code calls an I/O routine and then gets called back. That’s a fundamental difference, and you can only square it by making the synchronous code look like asynchronous code (the callback gets called right away) or the asynchronous code look like synchronous code (something like async Python, which breaks up a subroutine into multiple subroutines and has an event loop manage who calls whom).
int_19h 10/31/2025||
OP didn't say that code looks like ordinary sync code, only that it's possible to write code that works equally well for both sync and async. If you RTFA, it looks like this:

  var a = io.async(doWork, .{ io, "hard" });
  ...
  a.await(io);
If your `io` is async, this behaves like an async call returning a promise and then awaiting said promise. If `io` is sync then `io.async` will make the call immediately and `await` is a no-op.
hinkley 10/30/2025|||
I know that it depends on how much you disentangle your network code from your business logic. The question is the degree. Is it enough, or does it just dull the pain?

If you give your business logic the complete message or send it a stream, then the flow of ownership stays much cleaner. And the unit tests stay substantially easier to write and more importantly, to maintain.

I know too many devs who don't see when they bias their decisions to avoid making changes that will lead to conflict with bad unit tests and declare that our testing strategy is Just Fine. It's easier to show than to debate, but it still takes an open mind to accept the demonstration.

andrewmcwatters 10/30/2025||
I haven’t actually seen it in the wild yet, just talked about in technical talks from engineers at different studios, but I’m interested in designs in which there isn’t a traditional main thread anymore.

Instead, everything is a job, and even what is considered the main thread is no longer an orchestration thread, but just another worker: after some nominal set up, you scaffold enough threads, usually one fewer than the core count, to all serve as lockless, work-stealing worker threads.

Conventional async programming relies too heavily on a critical main thread.

I think it’s been so successful though, that unfortunately we’ll be stuck with it for much longer than some of us will want.

It reminds me of how many years of inefficient programming we have been stuck with because cache-unfriendly traditional object-oriented programming was so successful.

tele_ski 10/31/2025||
This sounds like running an event loop per thread instead of 1 event loop with a backing thread pool. Or am I misunderstanding you?

It works great for small tasks, but larger tasks block local events and you can get weird latency issues; that was the major tradeoff I ran into when I used it. Works great if your tasks are tiny though, and not having the event loop handoff to the worker thread is a good throughput boost. But then we introduced larger tasks, and we started having latency issues because they would keep the local event loop from getting to those events.

I think Scylladb works somewhat like this but does message passing to put certain data on certain threads so any thread can handle incoming events but it still moves the request to the pinned thread the data lives on. One thread can get overwhelmed if your data isn't well distributed.

andyferris 10/30/2025||
Personally I find this really cool.

One thing I like about the design is it locks in some of the "platforms" concepts seen in other languages (e.g. Roc), but in a way that goes with Zig's "no hidden control flow" mantra.

The downstream effect is that it will be normal to create your own non-posix analogue of `io` for wherever you want code to hook into. Writing a game engine? Let users interact with a set of effectful functions you inject into their scripts.

As a "platform" writer (like the game engine), essentially you get to create a sandbox. The missing piece may be controlling access to calling arbitrary extern C functions - possibly that capability would need to be provided by `io` to create a fool-proof guarantees about what some code you call does. (The debug printing is another uncontrolled effect).

pyrolistical 10/30/2025||
Oh boy. Example 7 is a bit of a mindfuck. You get the returned string in both await and cancel.

Feels like this violates zig “no hidden control flow” principle. I kinda see how it doesn’t. But it sure feels like a violation. But I also don’t see how they can retain the spirit of the principle with async code.

rdtsc 10/30/2025||
> Feels like this violates zig “no hidden control flow” principle.

A hot take here is that the whole async thing is a hidden control flow. Some people noticed that ever since plain callbacks were touted as a "webscale" way to do concurrency. The sequence of callbacks being executed or canceled forms a hidden, implicit control flow running concurrently with the main control logic. It can be harder to debug and manage than threads.

But that said, unless Zig adds a runtime with its own scheduler and turns into a bytecode VM, there is not much it can do. Co-routines and green threads have been done before in C and C-like languages, but I'm not sure how easily they would fit with Zig and its philosophy.

throwawaymaths 10/30/2025|||
"no hidden control flow" means control flow only occurs at function boundaries, keywords, short circuiting operators, or builtins. i believe there is a plan for asyncresume and asyncsuspend builtins that would show the actual sites where control flow happens.
int_19h 10/31/2025|||
They have plans for multiple implementations including green threads and stackless.
rdtsc 10/31/2025||
That's pretty neat. Thanks for pointing it out.

The abstraction on top is still async based, and I agree it makes sense for Zig. But in general I don't like that abstraction. I like it when it's flipped around -- the abstraction is process/thread/green-thread-like and sends messages or events around. Underneath it may involve having a few IO pollers with select/epoll/io_uring, a thread pool, etc. But the top level API doesn't handle promises, futures, callbacks, deferreds, etc. I am thinking of Go's goroutines, BEAM VM processes, or even plain threads or processes talking over the network or locally via a queue.

HacklesRaised 10/31/2025||
hard for me to argue with your point but if the understanding is that cancellation causes early return of the function, then i suppose the signature is, eerrr, consistent?
travisgriggs 10/31/2025||
I deal with async function coloring in swift and Kotlin. Have avoided it (somehow) in our Python codebase. And in Elixir, I do things on separate processes all the time, but never feel like I’m wrestling with function coloring. I do like Zig (what little I’ve played with), but continue to wish that for concurrent style computation, people would just use BEAM based languages.
mgdev 10/31/2025|
Hear hear. Elixir is a dream for this kind of stuff. But it requires very different decisions "all the way down" to make it work outside of BEAM. And BEAM itself feels heavy to most systems devs.

(IMO it's not for many use cases, and to the extent it is I'm happy to see things like AtomVM start to address it.)

I'm just happy I can use Elixir + Zig for NIFs.

travisgriggs 10/31/2025||
Indeed. Zigler is tres cool.
comex 10/30/2025||
It's worth noting that this is not async/await in the sense of essentially every other language that uses those terms.

In other languages, when the compiler sees an async function, it compiles it into a state machine or 'coroutine', where the function can suspend itself at designated points marked with `await`, and be resumed later.

In Zig, the compiler used to support coroutines but this was removed. In the new design, `async` and `await` are just functions. In the threaded implementation used in the demo, `await` just blocks the thread until the operation is done.

To be fair, the bottom of the post explains that there are two other Io implementations being planned.

One of them is "stackless coroutines", which would be similar to traditional async/await. However, from the discussion so far this seems a bit like vaporware. As discussed in [1], andrewrk explicitly rejected the idea of just (re-)adding normal async/await keywords, and instead wants a different design, as tracked in issue 23446. But in issue 23446 there seems to be zero agreement on how the feature would work, how it would improve on traditional async/await, or how it would avoid function coloring.

The other implementation being planned is "stackful coroutines". From what I can tell, this has more of a plan and is more promising, but there are significant unknowns.

The basis of the design is similar to green threads or fibers. Low-level code generation would be identical to normal synchronous code, with no state machine transform. Instead, a library would implement suspension by swapping out the native register state and stack, just like the OS kernel does when switching between OS threads. By itself, this has been implemented many times before, in libraries for C and in the runtimes of languages like Go. But it has the key limitation that you don't know how much stack to allocate. If you allocate too much stack in advance, you end up being not much cheaper than OS threads; but if you allocate too little stack, you can easily hit stack overflow. Go addresses this by allocating chunks of stack on demand, but that still imposes a cost and a dependency on dynamic allocation.

andrewrk proposes [2] to instead have the compiler calculate the maximum amount of native stack needed by a function and all its callees. In this case, the stack could be sized exactly to fit. In some sense this is similar to async in Rust, where the compiler calculates the size of async function objects based on the amount of state the function and its callees need to store during suspension. But the Zig approach would apply to all function calls rather than treating async as a separate case. As a result, the benefits would extend beyond memory usage in async code. The compiler would statically guarantee the absence of stack overflow, which benefits reliability in all code that uses the feature. This would be particularly useful in embedded where, typically, reliability demands are high and memory available is low. Right now in embedded, people sometimes use a GCC feature ("-fstack-usage") that does a similar calculation, but it's messy enough that people often don't bother. So it would be cool to have this as a first-class feature in Zig.

But.

There's a reason that stack usage calculators are uncommon. If you want to statically bound stack usage:

First, you have to ban recursion, or else add some kind of language mechanism for tracking how many times a function can possibly recurse. Banning recursion is common in embedded code but would be rather annoying for most codebases. Tracking recursion is definitely possible, as shown by proof languages like Agda or Coq that make you prove termination of recursive functions - but those languages have a lot of tools that 'normal' languages don't, so it's unclear how ergonomic such a feature could be in Zig. The issue [2] doesn't have much concrete discussion on how it would work.

Second, you have to ban dynamic calls (i.e. calls to function pointers), because if you don't know what function you're calling, you don't know how much stack it will use. This has been the subject of more concrete design in [3] which proposes a "restricted" function pointer type that can only refer to a statically known set of functions. However, it remains to be seen how ergonomic and composable this will be.

Zooming back out:

Personally, I'm glad that Zig is willing to experiment with these things rather than just copying the same async/await feature as every other language. There is real untapped potential out there. On the other hand, it seems a little early to claim victory, when all that works today is a thread-based I/O library that happens to have "async" and "await" in its function names.

Heck, it seems early to finalize an I/O library design if you don't even know how the fancy high-performance implementations will work. Though to be fair, many applications will get away just fine with threaded I/O, and it's nice to see a modern I/O library design that embraces that as a serious option.

[1] https://github.com/ziglang/zig/issues/6025#issuecomment-3072...

[2] https://github.com/ziglang/zig/issues/157

[3] https://github.com/ziglang/zig/issues/23367

wahern 10/30/2025||
> But it has the key limitation that you don't know how much stack to allocate. If you allocate too much stack in advance, you end up being not much cheaper than OS threads; but if you allocate too little stack, you can easily hit stack overflow.

With a 64-bit address space you can reserve large contiguous chunks (e.g. 2MB), while only allocating the minimum necessary for the optimistic case. The real problem isn't memory usage, per se, it's all the VMA manipulation and noise. In particular, setting up guard pages requires a separate VMA region for each guard (usually two per stack, above and below). Linux recently got a new madvise feature, MADV_GUARD_INSTALL/MADV_GUARD_REMOVE, which lets you add cheap guard pages without installing a distinct, separate guard page. (https://lwn.net/Articles/1011366/) This is the type of feature that could be used to improve the overhead of stackful coroutines/fibers. In theory fibers should be able to outperform explicit async/await code, because in the non-recursive, non-dynamic call case a fiber's stack can be stack-allocated by the caller, thus being no more costly than allocating a similar async/await call frame, yet in the recursive and dynamic call cases you can avoid dynamic frame bouncing, which in the majority of situations is unnecessary--the poor performance of dynamic frame allocation/deallocation in deep dynamic call chains is the reason Go switched from segmented stacks to moveable stacks.

Another major cost of fibers/thread is context switching--most existing solutions save and restore all registers. But for coroutines (stackless or stackful), there's no need to do this. See, e.g., https://photonlibos.github.io/blog/stackful-coroutine-made-f..., which tweaked clang to erase this cost and bring it line with normal function calls.

> Go addresses this by allocating chunks of stack on demand, but that still imposes a cost and a dependency on dynamic allocation.

The dynamic allocation problem exists the same whether using stackless coroutines, stackful coroutines, etc. Fundamentally, async/await in Rust is just creating a linked-list of call frames, like some mainframes do/did. How many Rust users manually OOM check Boxed dyn coroutine creation? Handling dynamic stack growth is technically a problem even in C, it's just that without exceptions and thread-scoped signal handlers there's no easy way to handle overflow so few people bother. (Heck, few even bother on Windows where it's much easier with SEH.) But these are fixable problems, it just requires coordination up-and-down the OS stack and across toolchains. The inability to coordinate these solutions does not turn ugly compromises (async/await) into cool features.

> First, you have to ban recursion, or else add some kind of language mechanism for tracking how many times a function can possibly recurse. [snip]
>
> Second, you have to ban dynamic calls (i.e. calls to function pointers)

Both of which are the case for async/await in Rust; you have to explicitly Box any async call that Rust can't statically size. We might frame this as being transparent and consistent, except it's not actually consistent because we don't treat "ordinary", non-async calls this way, which still use the traditional contiguous stack that on overflow kills the program. Nobody wants that much consistency (too much of a "good" thing?) because treating each and every call as async, with all the explicit management that would entail with the current semantics would be an indefensible nightmare for the vast majority of use cases.

throwawaymaths 10/31/2025||
> If you allocate too much stack in advance, you end up being not much cheaper than OS threads;

Maybe. a smart event loop could track how many frames are in flight at any given time and reuse preallocated frames when their frames dispatch out.

int_19h 10/31/2025|||
Regarding recursive functions, would it really be that annoying? We kinda take the ability to recurse for granted, yet it is rarely used in practice, and often when it happens it's unintentional and a source of bugs (due to unforeseen re-entrancy). Intuitively it feels that if `recursive` was a required modifier when declaring intentionally recursive functions, like in Fortran, it wouldn't actually be used all that much. Most functions don't need to call via function pointers, either.

Being explicit about it might also allow for some interesting compiler optimizations across shared library boundaries...

pklausler 10/31/2025||
Fortran is recursive by default.
SkiFire13 10/30/2025|||
> tracking how many times a function can possibly recurse.

> Tracking recursion is definitely possible, as shown by proof languages like Agda or Coq that make you prove termination of recursive functions

Proof languages don't really track how many times a function can possibly recurse, they only care that it will eventually terminate. The amount of recursive steps can even easily depend on the inputs, making it unknown at the moment a function is defined.

tines 10/31/2025||
Right. They use rules like "a function body must destructure its input and cannot use a constructor" which implies that, since input can't be infinite, the function will terminate. That doesn't mean that the recursion depth is known before running.
SkiFire13 10/31/2025||
The actual rule is more refined than that and doesn't prevent you from using constructors in the body or recursing without deconstructing any input, it just needs to prove the new inputs are "smaller" according to a fixed well-ordered relation. This is most often related to the structure of the input but is not required. For example I can have f(5) recurse into f(6) if that's "smaller" according to my custom relation, but no relation will allow me to continue increasing the argument forever.
int_19h 10/31/2025||
This blog post had a better explanation than the one linked from the story IMO:

https://kristoff.it/blog/zig-new-async-io/

SV_BubbleTime 10/31/2025||
If I know anything about anything it is that a new language debuting a new async/await plan will be well received by casual and expert alike.
ozgrakkurt 10/30/2025||
Really really hope they make it easy and ergonomic to integrate with single threaded cooperative scheduling paradigm like seastar or glommio.

I wrote a library that I use for this but it would be really nice to be able to cleanly integrate it into async/await.

https://github.com/steelcake/csio

lukaslalinsky 10/31/2025|
I wrote a library that does single threaded cooperative scheduling and async I/O, and from the ground up it was designed to implement this interface.

https://github.com/lalinsky/zio

SkiFire13 10/31/2025||
That's quite an unfortunate name given there's already a famous async framework named ZIO for Scala

https://github.com/zio/zio

lukaslalinsky 10/31/2025||
I knew about this, but I think the language scope is so different that it's OK.
miki123211 10/31/2025||
How does downstream code know which kind of asynchrony is requested / allowed?

Assume some code is trying to blink an LED, which can only be done with the led_on and led_off system calls. Those system calls block until they get an ack from the LED controller, which, in the worst case, will timeout after 10s if the controller is broken.

In e.g. Python, my function is either sync or async; if it's async, I know I have to go through the rigamarole of accessing the event loop and scheduling the blocking syscall on a background thread. If it's sync, I know I'm allowed to block, but I can't assume an event loop exists.

In Zig, how would something like this be accomplished (assuming led_on and led_off aren't operations natively supported by the IO interface?)

ivanjermakov 10/31/2025|
> How does downstream code know which kind of asynchrony is requested / allowed?

Certainly not from the function signatures, so documentation is the only way.

Seems like properly-written downstream code would deadlock if given an IO implementation that does not support concurrency. If that's true, to me this looks like a bad IO abstraction.

jaspday 10/31/2025||
> Seems like properly written downstream code would deadlock if given an IO implementation that does not support concurrency.

No, it would fail with error.ConcurrencyUnavailable. See example 10 from TFA.

More to the point, this implementation decouples asynchrony from concurrency. In GP's example of a microcontroller blinking an LED, there's no explicit requirement for concurrency, only asynchrony. That is, it would be nice to be able to do other things while we wait for the LED controller to potentially time out, but it does not fundamentally change the application logic if the program blocks until the function completes. If you use io.async to run the call, then it will do exactly that: run other code while waiting for the controller if a concurrent Io implementation is passed, and block until the controller returns if a synchronous Io implementation is passed.

Functions that require concurrency would, however, need to be documented. Several people have proposed adding a feature like Rust traits or C++ concepts to Zig that would enable you to annotate type signatures to indicate which features need to be available for a generic substitution to succeed, but until such a thing exists, you're pretty much best off reading the source itself. In practice, most code probably doesn't require concurrency, and library authors are free to sprinkle asynchrony everywhere without worrying about whether or not their users' Io implementation will actually support concurrency.
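A minimal sketch of GP's LED case under these semantics (led_on/led_off are the hypothetical blocking calls from the grandparent comment, stubbed out here; only the io.async/await shape comes from the article):

  const std = @import("std");

  fn led_on() !void {}  // stand-in for the blocking led_on syscall
  fn led_off() !void {} // stand-in for the blocking led_off syscall
  fn doHousekeeping() void {}

  fn blinkOnce(io: std.Io) !void {
      var on = io.async(led_on, .{});
      doHousekeeping();  // overlaps with the wait only if `io` is concurrent
      try on.await(io);  // with a synchronous io, led_on already ran in io.async
      try led_off();     // a plain call always just blocks until ack or timeout
  }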

geon 10/31/2025|
So a cancelable function must poll the `cancelRequested` function, and return the error `Canceled`, right?

https://github.com/ziglang/zig/blob/master/lib/std/Io.zig#L1...

https://github.com/ziglang/zig/blob/master/lib/std/Io.zig#L7...

AndyKelley 11/2/2025|
In the vast majority of cases, cancellation will be handled transparently by virtue of `try` being commonly used. The thing that takes relatively longer to do is I/O operations, and those will now return error.Canceled when requested.

Polling cancelRequested is generally a bad idea since it introduces overhead, but you could do it to introduce cancellation points into long-running CPU tasks.
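For the CPU-bound case, a rough sketch of that polling pattern (assuming cancelRequested is the Io method linked above and takes no arguments; exact signature not checked here):

  const std = @import("std");

  fn crunch(io: std.Io, items: []const u64) error{Canceled}!u64 {
      var sum: u64 = 0;
      for (items, 0..) |x, i| {
          // Check only every few thousand iterations to keep the overhead low.
          if (i % 4096 == 0 and io.cancelRequested()) return error.Canceled;
          sum +%= x;
      }
      return sum;
  }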
