Posted by zdw 3 days ago
Not really. The author provides Go as evidence, but Go's CSP-based approach far predates the popularity of async/await. Meanwhile, Zig's approach still has function coloring; it's just that one color is "I/O function" and the other is "non-I/O function". And this isn't a problem! Function coloring is fine in many contexts, especially in languages that seek to give the user low-level control! I feel like I'm taking crazy pills every time people harp about function coloring as though it were something deplorable. It's just a bad way of talking about effect systems, which are extremely useful. And sure, if you want a high-level managed language like Go with an intrusive runtime, then you can build an abstraction that dynamically papers over the difference at some runtime cost. This is probably the uniformly correct choice for high-level languages, like dynamic or scripting languages (although it must be said that Go's approach to concurrency in general leaves much to be desired; I'm begging people to learn about structured concurrency).
> Function signatures don’t change based on how they’re scheduled, and async/await become library functions rather than language keywords.
The functions have the same calling conventions regardless of IO implementation. Functions return data and not promises, callbacks, or futures. Dependency injection is not function coloring.
Certainly it's true that Go invented neither: both Erlang and Haskell had truly parallel green threads without function coloring before Go or Node existed.
You could imagine a programming language that expressed “comptime” as a function argument of a type that is only constructible at compile-time. And one for runtime as well, and then functions that can do both can take the sum type “comptime | runtime”.
Java had green threads in 1997, removed them in 2000 and brought them back properly now as virtual threads.
I'm kinda glad they've sat out the async mania; with virtual threads/goroutines, the async stuff just feels like lipstick on a pig. Debugging, stack traces, etc. are just jumbled.
Their purpose and implementation are so different that they don't share anything at all.
And for my money I prefer async/await to the structured concurrency stuff.
There are also related discussions on other platforms that are worth reading.
A while back I just started leaning in. I write a lot of Python at work, and any time I have to use a library that relies on asyncio, I just write the entire damn app as an asynchronous one. Makes function coloring a non-issue. If I'm in a situation where the two have to coexist, the async runtime gets its own thread and communication back and forth is handled at specific boundaries.
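The "async runtime in its own thread" boundary described above can be sketched with the standard library alone. This is a minimal illustration, not the commenter's actual setup; the `fetch` coroutine is a hypothetical stand-in for real async-library work.

```python
import asyncio
import threading

# Run an event loop on a dedicated thread; the rest of the
# program stays synchronous and talks to it across the boundary.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def fetch(url):
    await asyncio.sleep(0.01)  # stand-in for a real async-library call
    return f"body of {url}"

# Synchronous code submits work to the loop and blocks for the result.
future = asyncio.run_coroutine_threadsafe(fetch("https://example.com"), loop)
print(future.result(timeout=5))
```

`run_coroutine_threadsafe` returns a `concurrent.futures.Future`, so the synchronous side never touches coroutines directly; that keeps the coloring confined to the loop's thread.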
Yes, having to rewrite literally all of your code because you need to use an async function somewhere is an issue.
An even bigger issue is that now you have two (incompatible!) versions of literally every library dependency.
It's typically less than a hundred kilobytes and (on the systems I've benchmarked using std::thread) it takes 60usec (wall time in userspace) to create and destroy a thread.
Threads have gotten so fast that paying the async function coloring price makes very little sense for most software.
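The cost claim above is easy to sanity-check. Here is a rough Python sketch; note that Python adds interpreter overhead on top of the raw OS-thread cost, and absolute numbers vary widely by OS and hardware, so treat the output as an order-of-magnitude probe only.

```python
import threading
import time

def noop():
    pass

# Measure average create+join cost for a short-lived OS thread.
N = 200
start = time.perf_counter()
for _ in range(N):
    t = threading.Thread(target=noop)
    t.start()
    t.join()
avg_us = (time.perf_counter() - start) / N * 1e6
print(f"avg thread create+join: {avg_us:.0f} µs")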
Now function colouring is interesting but not for the reason these articles get excited. Recolouring is easy and has basically no impact on code maintenance. BUT if you need that code path to really fly then marking it as async is a killer, as all those tiny little promises add tiny delays in the form of many tasks. Which add up to performance problems on hot code paths. This is particularly frustrating if functions are sometimes async, like lazy loaders or similar cache things. To get around this you can either use callbacks instead or use selective promise chaining to only use promises when you get a promise. Both strategies can be messy and trip up people who don’t understand these careful design decisions.
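The "only use promises when you get a promise" idea above can be sketched in Python, where the same sometimes-async problem shows up with cached lazy loaders. This is a hypothetical `maybe_cached` helper, not code from the comment:

```python
import asyncio
import inspect

async def maybe_cached(loader, cache, key):
    """Await the loader's result only if it's actually awaitable."""
    if key in cache:
        return cache[key]            # synchronous fast path: no extra task hop
    value = loader(key)
    if inspect.isawaitable(value):   # pay the scheduling cost only when forced
        value = await value
    cache[key] = value
    return value

cache = {}
print(asyncio.run(maybe_cached(str.upper, cache, "a")))  # sync loader, prints A
```

The fast path returns without suspending, which is exactly the property that makes such helpers fragile: callers who don't know about the design can reintroduce an unconditional await and lose the optimization.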
One other fun thing is indexeddb plays terribly with promises, as it uses a “transactions close at end of task” mechanism, making certain common patterns impossible with promises due to how they behave with the task system. Although some API designers have come up with ways around this to give you promise interfaces for databases. Normally by using callbacks internally and only doing one operation per transaction.
This is a solved problem in C#: you can use ValueTask&lt;T&gt; instead of Task&lt;T&gt;, and no Task object is allocated when the method completes synchronously.
Is this because the functions are async or is that because most of the time async is used for things that are I/O like and therefore susceptible to these kinds of delays?
That depends on the language/framework. In some languages, `await foo()` is equivalent to `Future f = foo(); await f`. In others (e.g. Python), it's a primitive operation and you have to use a different syntax if you want to create a future/task. In Trio (an excellent Python alternative to asyncio), there isn't even the concept of a future at all!
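The Python distinction mentioned above looks like this in asyncio: awaiting a coroutine runs it inline, while `asyncio.create_task` is the separate syntax that turns it into a concurrently scheduled task you can await later. A minimal sketch:

```python
import asyncio

async def foo():
    return 42

async def main():
    # Awaiting the coroutine directly runs it to completion inline.
    x = await foo()
    # Creating a Task schedules it on the loop; you await the handle.
    t = asyncio.create_task(foo())
    y = await t
    return x + y

print(asyncio.run(main()))  # prints 84
```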
However, in embedded rust async functions are amazing! Combine it with a scheduler like rtic or embassy, and now hardware abstractions are completely taken care of. Serial port? Just two layers of abstraction and you have a DMA system that shoves bytes out UART as fast as you can create them. And your terminal thread will only occupy as much time as it needs to generate the bytes and spit them out, no spin locking or waiting for a status register to report ready.
The other advantage is a rough classification in the type system. Not marking a function as async means that the author believes it can be run in a reasonable amount of time and is safe to run eg. on a UI main thread. In that sense, the propagation through the call hierarchy is a feature, not a bug.
I can see that maintaining multiple versions of a function is annoying for library authors, but on the other hand, functions like fs.readSync shouldn’t even exist. Other code could be running on this thread, so it's not acceptable to just freeze it arbitrarily.
Saying that fs.readSync shouldn't exist is really weird. Not all code written benefits from async nor even requires it. Running single threaded, sync programs is totally valid.
In a good API design, you should expose functions that each do one thing and can easily be composed together. The 'readSync' function doesn't meet that requirement, so it's arguably not necessary - it would be better to expose two separate functions.
This was not a big issue when computers only had a single processor or if the OS relied on cooperative multi-threading to perform I/O. But these days the OS and disk can both run in parallel to your program so the requirement to block when you read is a design wart we shouldn't have to live with.
i don't see it as very useful or elegant to integrate any form of parallelism or concurrency into every imaginable api. depends on context of course. but generalized, just no. if a kind of io takes a microsecond, why bother.
No, it tells the OS "schedule the current thread to wake up when the data read task is completed".
Having to implement that with other OS primitives is a) complex and error-prone, and b) not atomic.
Maybe, but is it useful to have sync options?
You can still write single threaded programs
Sync options are useful. If everything is on the net probably less so. But if you have a couple of 1ms io ops that you want to get done asap, it's better to get them done asap.
But in ordinary JS there just can't be a race condition, everything is single threaded.
Two async functions doing so is not a data race either.
Serially, completely synchronously overwriting values is none of these categories though.
Concurrency is needed for race conditions, parallelism is needed for data races. Many single threaded runtimes including JS have concurrency, and hence the potential for race conditions, but don't have parallelism and hence no data races.
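The point above (race conditions without data races in a single-threaded runtime) is easy to demonstrate in asyncio, which has the same concurrency-without-parallelism model as JS. This is an illustrative sketch:

```python
import asyncio

counter = 0

async def incr():
    global counter
    v = counter             # read
    await asyncio.sleep(0)  # yield: another task runs here
    counter = v + 1         # write back a now-stale value

async def main():
    await asyncio.gather(*(incr() for _ in range(10)))
    # Typically prints 1, not 10: a race condition, yet no data race,
    # because no two tasks ever touch `counter` at the same instant.
    print(counter)

asyncio.run(main())
```

Every read and write is atomic with respect to the other tasks, so there is no data race; the lost updates come purely from interleaving across the await point.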
Why is reserving a megabyte of stack space "expensive"?
> and takes roughly a millisecond to create
I'm not sure where this number is from, it seems off by a few orders of magnitude. On Linux, thread creation is closer to 10 microseconds.
Because if you use one thread for each of your 10,000 idle sockets you will use 10GB to do nothing.
So you'll want to use a better architecture such as a thread pool.
And if you want your better architecture to be generic and ergonomic, you'll end up with async or green threads.
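The thread-pool alternative mentioned above is a few lines in most languages. A minimal Python sketch, with a hypothetical `handle` function standing in for per-connection work:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(conn_id):
    # Stand-in for servicing one connection's request.
    return f"handled {conn_id}"

# A small fixed pool serves many connections without reserving
# one stack per socket; idle sockets cost no thread at all.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle, range(10_000)))

print(results[0], len(results))
```

The ergonomic gap shows up when `handle` itself needs to block mid-request: a fixed pool stalls, which is the pressure that pushes designs toward async or green threads.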
1. On a system that is handling 10k concurrent requests, the 10GB of RAM is going to be a fraction of what is installed.
2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
My example (and the c10k problem) is 10k concurrent connections, not 10k concurrent requests.
> 2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
Yes, and that's both memory and cpu usage that isn't needed when using a better concurrency model. That's why no high-performance server software uses a huge number of threads, and many use the reactor pattern.
No, it literally is not. The "memory" is just entries in a page table in the kernel and MMU. It shouldn't worry you at all.
Nor is the CPU used by the kernel to manage those threads going to be necessarily less efficient than someone's handrolled async runtime. In fact given it gets more eyes... likely more.
The sole argument I can see is avoiding a handful of syscalls and not crossing the kernel&lt;-&gt;userspace blood-brain barrier too much.
I've written massively concurrent systems where each connection only handled maybe a few kilobytes of data.
Async io is a massive win in those situations.
This describes many rest endpoints. Fetch a few rows from a DB, return some JSON.
You don't pay for stack space you don't use unless you disable overcommit. And if you disable overcommit on modern linux the machine will very quickly stop functioning.
It’s a programming model that has some really risky drawbacks.
Operating systems can shrink the memory usage of a stack.
madvise(page, size, MADV_DONTNEED);

This leaves the memory mapping intact, but the kernel frees the underlying resources; subsequent accesses get either new zero pages or the original file's pages. Linux also supports mremap, which is essentially a kernel version of realloc and supports both growing and shrinking memory mappings.

stack = mremap(stack, old_size, old_size / 2, MREMAP_MAYMOVE);

Whether existing systems make use of this is another matter entirely. My language uses mremap for growth and shrinkage of stacks. C programs can't do it because pointers to stack-allocated objects may exist. They also say:
>The system spends time managing threads that could be better spent doing useful work.
What do they think the async runtime in their language is doing? It's literally doing the same thing the kernel would be doing. There's nothing that intrinsically makes scheduling 10k coroutines in userspace more efficient than the kernel scheduling 10k threads. Context switches are really only expensive when the switch is happening between different processes; the overhead of a context switch on a CPU between two threads in the same process is very small (and they're not free when done in userspace anyway).
There are advantages to doing scheduling in the kernel and there are advantages to doing scheduling in userspace, but this article doesn't really touch on any of the actual pros and cons here, it just assumes that userspace scheduling is automatically more efficient.
I feel like we're now, what, 20, 25 years on and people still haven't adjusted to the fact that the machines we have now are multicore and have boatloads of cache, or to how that cache is shared (or not) between cores.
Nor is there apparently a real understanding of the difference between VSS and RSS.
Nor of the fact that modern machines are really really fast if you can keep stuff in cache. And so you really should be focused on how you can make that happen.
While virtual memory allocation does not require physical allocation, it immediately runs into the kinds of performance problems that huge pages are designed to solve. On modern systems, you can burn up most of your virtual address space via casual indifference to how it maps to physical memory and the TLB space it consumes. Spinning up thousands of stacks is kind of a pathological case here.
10µs is an eternity for high-performance software architectures. That is also around the same order of magnitude as disk access with modern NVMe. An enormous amount of effort goes into avoiding blocking on NVMe disk access with that latency for good reason. 10µs is not remotely below the noise floor in terms of performance.
The program could use one page's worth of stack space, which is optimal. The program could use like 200 bytes of stack space, which wastes the rest of the page. The program could recurse all the way to 9.9 MB of stack usage, stop just before overflow and then unwind back to constant 200 bytes stack space usage, and never touch all those pages ever again.
Guess it's not a huge issue in these 64-bit days, but back in the 32-bit days it was a real limitation to how many threads you could spin up due to the limited address space.
Of course most applications which hit this would override the 1MB default.
So much so that they'll sign themselves up for async frameworks that work-steal at will and bounce things all over cores, causing cache-line bouncing and the associated memory stalls, not understanding what this is doing to their performance profile.
And endure complexity, etc. through awkward async call chains and function colouring.
Most people's applications would be totally fine just spawning OS threads and using them without fear and dropping into a futex when waiting on I/O; or using the kernel's own async completion frameworks. The OS scheduler is highly efficient, and it is very good at managing multiple cores and even being aware of asymmetric CPU hierarchies, etc. Likely more efficient than half the async runtimes out there.
Equally, if a megabyte of stack is a lot for your usecase, can't you just ask pthreads to reserve less? I believe it goes down to like 16k
EDIT: Something else to consider: what if your REST call needs to make 5 queries? Do you serialize them? Now your latency can be worse. Do you launch a thread per query? Now you need to a) synchronize and b) pay 5x the thread cost. Async patterns, green threads, and coroutines enable more efficient overlapping of operations and potentially better concurrency (though a server that handles lots of concurrent requests may already have "enough" concurrency anyway).
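The overlapping-queries case above is where async is at its most ergonomic: no extra threads, no synchronization, and the five queries run concurrently on one thread. A sketch with a hypothetical `query` coroutine standing in for a DB round trip:

```python
import asyncio

async def query(db, sql):
    await asyncio.sleep(0.01)  # stand-in for one DB round trip
    return sql

async def handle_request(db):
    # Five independent queries overlap instead of running back to back,
    # so total latency is roughly one round trip, not five.
    return await asyncio.gather(*(query(db, f"q{i}") for i in range(5)))

print(asyncio.run(handle_request(None)))
```

`gather` preserves submission order in its results, so the handler's logic stays as readable as the serialized version.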
There’s a benefit in being able to code the handling of a request in synchronous logic. A case has to be made for the particular application that it would cause performance or resource issues, before opting for asynchronous code that adds more complexity.
I totally agree there are applications for which this is overkill and adds complexity. It's just a tool in the toolbox. Video games famously are just a single thread/main loop kind of application.
Why does async make it harder to reason about when resources are released?
No such thing. In a preemptive multitasking OS (that's basically all of them today) you will get context switching regardless of what you do. Most modern OS's don't even give you the tools to mess with the scheduler at all; the scheduler knows best.
Linux kernel uses 8k stacks (TBH, it's been a while), but there's also some copy-on-write overhead. Still, this is not the C10k problem...
Async precisely improves disk/network I/O-bound applications because synchronous code has to waste a whole thread sitting around waiting for an I/O response (each with its own stack memory and scheduler overhead), and in something like an application server there will be many incoming requests doing so in parallel. Cancellation is also easier with async.
CPU-bound code would not benefit because the CPU is already busy, and async adds overhead
See e.g. https://learn.microsoft.com/en-us/aspnet/web-forms/overview/... and https://learn.microsoft.com/en-us/aspnet/web-forms/overview/...
I'll publish my results shortly. I did these as baselines because I'm testing and finishing the User Managed Concurrency Groups proposal for the linux kernel, which is an extension to provide faster kernel threads (which beat both of them).
I'm pretty close to being done - I'm hoping to publish the entire GitHub repository with tests for the community to validate by next week.
UMCG is essentially an open source version of Google Fibers, which is their internal extension to the linux kernel for "light weight" threads. It requires you to build a user space scheduler, but that allows you to create different types of schedulers. I can not remember which scheduler showed ^ results but I have at least 6 different UMCG schedulers I was testing.
So essentially you get the benefits of something like tokio where you can have different types of schedulers optimized for different use cases, but the power of kernel threads which means easy cancellation, easy programming (at least in rust). It's still a linux thread with an entire 8mb(?) stack size, but from my testing it's far faster than what Tokio can provide, without the headache of async/await programming.
Using async for languages like Rust or C++ is cargo cult by people who don't know what the hell they're doing.
[Caveat: there's a use case for async if you're doing embedded development where you don't have threads or call stacks at all.]
Assuming for the sake of argument that they are together, you're still saving stack memory for every thread that isn't created. In fact you could say it allows the CPU to be idle, by spending less time context switching. On top of that, async/await is a perfect fit for OS overlapped I/O mechanisms for similar reasons, namely not requiring a separate blocking thread for every pending I/O (see e.g. https://en.wikipedia.org/wiki/Overlapped_I/O, https://stackoverflow.com/a/5283082)
So avoiding that would mean a server can handle more traffic before running into limits based on thread count.
I mean, I suppose we could move the scheduling and tracking out of kernel mode and into user mode...
But then guess what we've just reinvented?
Honestly, it's a mostly obsolete approach. OS threads are fast. We have lots of cores. The cost of bouncing tasks around on the same core and losing L1 cache locality is higher than the cost of firing up a new OS thread that could land on its own core.
The kernel scheduler gets tuned. Language specific async runtimes are unlikely to see so many eyeballs.
This whole thing is basically snake oil. The best thing a backend can do instead is have a dedicated thread pool where each real thread has its own queue of limited size. Each element in the queue contains the input and output state of a request and the code to deal with them. Once a queue grows over a certain size, the backend should simply return an error code immediately (too busy). A much more sound strategy in my opinion.
There are more complex cases of course (like computationally expensive requests with no io that take a long time). Handling those would require some extra logic. Async stuff, however, will not help here either.
Libraries like Tokio (mentioned in the article) have support for this built-in. Goroutines sidestep the issue completely. C# Tasks are batteries included in that regard. In fact function colors aren't an issue in most languages that have async/await. JavaScript is the odd one out, mostly due to being single-threaded. Can't really be made to work in a clean way in existing JS engines.
Take Rust: you return `Result<T,E>`, you are coloring your function the same way as you are when using `async`. Same for Option. Errors as return values in Go: again, function coloring.
One of your nested function starts taking a "serverUrl" input parameter instead of reading an environment variable: you've colored your function and you now need to color the entire call stack (taking the url parameter themselves).
All of them are exactly as annoying, as you need to rewrite the entire call stack's function signatures to accommodate the change, but somehow people obsess about async in particular as if it were something special.
It's not special, it's just the reflection that something can either be explicit and require changing many function signatures at once when making a change, or be implicit (with threads, exceptions or global variables) which is less work, but less explicit in the code, and often more brittle.
Async functions are colored because they force a change in the rest of the call stack, not just the caller. If you have a function nested ten levels deep and it calls a function that returns a Result, and you change that function to no longer return a result because it lost all its error cases, you only have to change the direct callers. If you are ten layers deep in a stack of synchronous functions and suddenly need to make an asynchronous call, the type signature of every individual function in the stack has to change.
You might say "well, if I'm ten layers deep in a stack of functions that don't return errors and have to make a call that returns an error, well now I have to change the entire stack of functions to return the error", but that's not true. The type change from sync to async is forced. The error is not. You could just discard it. You could handle it somehow in one of the intervening calls and terminate the propagation of the type-signature changes halfway up. The caller might log the error and then fail to propagate it upwards for any number of reasons. You aren't being forced to this change by the type system. You may be forced to change by the rest of the software engineering situation, but that's not a "color".
For similar reasons, the article is incorrect about Go's "context.Context" being a coloration. It's just a function parameter like anything else. If you're ten layers deep into non-Context-using code and you need to call a function that takes a context, you can just pass it one with context.Background() that does nothing context-relevant. You may, for other software engineering reasons, choose to poke that use of a context up the stack to the rest of the functions. It's probably a good idea. But you're not being forced to by the type system.
"Coloration" is when you have a change to a function that doesn't just change the way it interacts with the functions that directly call it. It's when the changes forcibly propagate up the entire call stack. Not just when it may be a good idea for other reasons but when the language forces the changes.
It is not, in the maximally general sense, limited to async. It's just that sync/async is the only such color that most languages in common use expose.
well, this isn't really true - at least for Rust:
runtime.block_on(async{});
https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.htm...
So I don't buy that async colors are fundamentally different.
Threading methodology is unrelated though. How exactly the call stack is scheduled is orthogonal to the question of whether or not making a call to a particular function results in type changes being forced on all functions in the entire stack.
There may also be cases where you can take "async" code and run it entirely out of the context of any sort of scheduler, where it can simply be turned into the obvious sync code. While that does decolor the resulting call (or, if you prefer, recolor it back into the "sync" color), it doesn't mean that async is not generally a color in code where that is not an option. Solving concurrency by simply turning it off certainly has a time and place (e.g., a shell script may be perfectly happy to run "async" code completely synchronously because it may be able to guarantee nothing will ever happen concurrently), but that doesn't make the coloration problem go away when that is not an option.
Here's the list of requirements:

1. Every function has a color.
2. The way you call a function depends on its color.
3. You can only call a red function from within another red function.
4. Red functions are more painful to call.
5. Some core library functions are red.
You are complaining about point 3. You are saying that if there's any way to call a red function from a blue function then it's not real. The type change from sync to async is not forced any more than changing T to Result&lt;T,E&gt;: you just get a Promise from the async function. So you logically think that async is not a color. You think even a Haskell IO value can be used in a pure function if you don't actually do the IO, or if you use unsafePerformIO. This is nonsense. Anything that makes the function hard to use can be a color.
And Result and Option usually mean something else. Option is a value or none, and none doesn't necessarily mean the function failed. Result is the value or an error. You can have Result&lt;Option, Error&gt;.
That's different from async, where you can call the other type.
Like in Haskell there is the IO monad used to denote the IO effect. And there are unsafe ways to actually execute it - does that make everything in Haskell impure?
As soon as you start using function arguments instead of using a global variable, you are coloring your function in the exact same way. Yet I don't think anyone would make the case that we should stop using function arguments and use global variables instead…
Effects are another way of making functions incompatible, for better or worse. It can be done badly. Java fell into that trap with checked exceptions. They meant well, but it resulted in fragmentation.
Sometimes it’s worth making an effort to make functions more compatible by standardizing types. By convention, all functions in Go that return an error use the same type. It gives you less information about what errors can actually happen, but that means the implementation of a function can be modified to return a new error without breaking callers.
Another example is standardizing on a string type. There are multiple ways strings can be implemented, but standardization is more important.
Or if you mean that returning a new error breaks API compatibility, then yes that's the point. If now you can error in a different way, your users now need to handle that. But if it's all generic and inferred, it can still just bubble up to wherever they want to do that with no changes to middle layers.
In this way, declaring a type guides people to write calling code that doesn't break, provided you set it up that way. It makes things easier for the implementation to change.
Sometimes you do need handlers for specific errors, but in Go you always need to write generic error handling, too.
(A type variable can do something similar. It forces the implementation to be generic because the type isn't known, or is only partially known.)
Throwing the baby out with the bathwater, like Go sort of does with its error handling, is no solution. The proper solution is a better type system (e.g. a Result type with generics handles what Go can't).
For effects, though, we need type systems that support them, and those are only available in research languages so far. You can actually just be generic in effects (e.g. an fmap function applying a lambda to a list could just "copy" the effect of the lambda to the whole function; this can be properly written down and enforced by the compiler).