Posted by zdw 3 days ago
Not really. The author provides Go as evidence, but Go's CSP-based approach far predates the popularity of async/await. Meanwhile, Zig's approach still has function coloring; it's just that one color is "I/O function" and the other is "non-I/O function". And this isn't a problem! Function coloring is fine in many contexts, especially in languages that seek to give the user low-level control! I feel like I'm taking crazy pills every time people harp about function coloring as though it were something deplorable. It's just a bad way of talking about effect systems, which are extremely useful. And sure, if you want a high-level managed language like Go with an intrusive runtime, then you can build an abstraction that dynamically papers over the difference at some runtime cost. This is probably the uniformly correct choice for high-level languages, like dynamic or scripting languages (although it must be said that Go's approach to concurrency in general leaves much to be desired; I'm begging people to learn about structured concurrency).
> Function signatures don’t change based on how they’re scheduled, and async/await become library functions rather than language keywords.
The functions have the same calling conventions regardless of IO implementation. Functions return data and not promises, callbacks, or futures. Dependency injection is not function coloring.
Certainly it's true that Go invented neither: both Erlang and Haskell had truly parallel green threads without function coloring before Go or Node existed.
You could imagine a programming language that expressed “comptime” as a function argument of a type that is only constructible at compile-time. And one for runtime as well, and then functions that can do both can take the sum type “comptime | runtime”.
Java had green threads in 1997, removed them in 2000 and brought them back properly now as virtual threads.
I'm kinda glad they've sat out the async mania; with virtual threads/goroutines, the async stuff just feels like lipstick on a pig. Debugging, stack traces, etc. are just jumbled.
Their purpose and implementation are so different that they don't share anything at all.
And for my money I prefer async/await to the structured concurrency stuff.
There are also related discussions on other platforms that are worth reading.
A while back I just started leaning in. I write a lot of Python at work, and any time I have to use a library that relies on asyncio, I just write the entire damn app as an asynchronous one. Makes function coloring a non-issue. If I'm in a situation where the two have to coexist, the async runtime gets its own thread and communication back and forth is handled at specific boundaries.
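The "async runtime in its own thread" boundary described above can be sketched with the standard library alone. This is a minimal illustration, not the commenter's actual setup; the `fetch` coroutine is a hypothetical stand-in for real async-library work.

```python
import asyncio
import threading

# Run an event loop on a dedicated thread; the rest of the
# program stays synchronous and talks to it across the boundary.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def fetch(url):
    await asyncio.sleep(0.01)  # stand-in for a real async-library call
    return f"body of {url}"

# Synchronous code submits work to the loop and blocks for the result.
future = asyncio.run_coroutine_threadsafe(fetch("https://example.com"), loop)
print(future.result(timeout=5))
```

`run_coroutine_threadsafe` returns a `concurrent.futures.Future`, so the synchronous side never touches coroutines directly; that keeps the coloring confined to the loop's thread.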
Yes, having to rewrite literally all of your code because you need to use an async function somewhere is an issue.
An even bigger issue is that now you have two (incompatible!) versions of literally every library dependency.
It's typically less than a hundred kilobytes and (on the systems I've benchmarked using std::thread) it takes 60usec (wall time in userspace) to create and destroy a thread.
Threads have gotten so fast that paying the async function coloring price makes very little sense for most software.
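The cost claim above is easy to sanity-check. Here is a rough Python sketch; note that Python adds interpreter overhead on top of the raw OS-thread cost, and absolute numbers vary widely by OS and hardware, so treat the output as an order-of-magnitude probe only.

```python
import threading
import time

def noop():
    pass

# Measure average create+join cost for a short-lived OS thread.
N = 200
start = time.perf_counter()
for _ in range(N):
    t = threading.Thread(target=noop)
    t.start()
    t.join()
avg_us = (time.perf_counter() - start) / N * 1e6
print(f"avg thread create+join: {avg_us:.0f} µs")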
Now function colouring is interesting but not for the reason these articles get excited. Recolouring is easy and has basically no impact on code maintenance. BUT if you need that code path to really fly then marking it as async is a killer, as all those tiny little promises add tiny delays in the form of many tasks. Which add up to performance problems on hot code paths. This is particularly frustrating if functions are sometimes async, like lazy loaders or similar cache things. To get around this you can either use callbacks instead or use selective promise chaining to only use promises when you get a promise. Both strategies can be messy and trip up people who don’t understand these careful design decisions.
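The "only use promises when you get a promise" idea above can be sketched in Python, where the same sometimes-async problem shows up with cached lazy loaders. This is a hypothetical `maybe_cached` helper, not code from the comment:

```python
import asyncio
import inspect

async def maybe_cached(loader, cache, key):
    """Await the loader's result only if it's actually awaitable."""
    if key in cache:
        return cache[key]            # synchronous fast path: no extra task hop
    value = loader(key)
    if inspect.isawaitable(value):   # pay the scheduling cost only when forced
        value = await value
    cache[key] = value
    return value

cache = {}
print(asyncio.run(maybe_cached(str.upper, cache, "a")))  # sync loader, prints A
```

The fast path returns without suspending, which is exactly the property that makes such helpers fragile: callers who don't know about the design can reintroduce an unconditional await and lose the optimization.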
One other fun thing is indexeddb plays terribly with promises, as it uses a “transactions close at end of task” mechanism, making certain common patterns impossible with promises due to how they behave with the task system. Although some API designers have come up with ways around this to give you promise interfaces for databases. Normally by using callbacks internally and only doing one operation per transaction.
This is a solved problem in C#: you can use ValueTask&lt;T&gt; instead of Task&lt;T&gt;, and no Task object is allocated when the method completes synchronously.
Is this because the functions are async or is that because most of the time async is used for things that are I/O like and therefore susceptible to these kinds of delays?
That depends on the language/framework. In some languages, `await foo()` is equivalent to `Future f = foo(); await f`. In others (e.g. Python), it's a primitive operation and you have to use a different syntax if you want to create a future/task. In Trio (an excellent Python alternative to asyncio), there isn't even the concept of a future at all!
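The Python distinction mentioned above looks like this in asyncio: awaiting a coroutine runs it inline, while `asyncio.create_task` is the separate syntax that turns it into a concurrently scheduled task you can await later. A minimal sketch:

```python
import asyncio

async def foo():
    return 42

async def main():
    # Awaiting the coroutine directly runs it to completion inline.
    x = await foo()
    # Creating a Task schedules it on the loop; you await the handle.
    t = asyncio.create_task(foo())
    y = await t
    return x + y

print(asyncio.run(main()))  # prints 84
```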
However, in embedded rust async functions are amazing! Combine it with a scheduler like rtic or embassy, and now hardware abstractions are completely taken care of. Serial port? Just two layers of abstraction and you have a DMA system that shoves bytes out UART as fast as you can create them. And your terminal thread will only occupy as much time as it needs to generate the bytes and spit them out, no spin locking or waiting for a status register to report ready.
The other advantage is a rough classification in the type system. Not marking a function as async means that the author believes it can be run in a reasonable amount of time and is safe to run eg. on a UI main thread. In that sense, the propagation through the call hierarchy is a feature, not a bug.
I can see that maintaining multiple versions of a function is annoying for library authors, but on the other hand, functions like fs.readSync shouldn’t even exist. Other code could be running on this thread, so it's not acceptable to just freeze it arbitrarily.
Saying that fs.readSync shouldn't exist is really weird. Not all code written benefits from async nor even requires it. Running single threaded, sync programs is totally valid.
In a good API design, you should expose functions that each do one thing and can easily be composed together. The 'readSync' function doesn't meet that requirement, so it's arguably not necessary - it would be better to expose two separate functions.
This was not a big issue when computers only had a single processor or if the OS relied on cooperative multi-threading to perform I/O. But these days the OS and disk can both run in parallel to your program so the requirement to block when you read is a design wart we shouldn't have to live with.
i don't see it as very useful or elegant to integrate any form of parallelism or concurrency into every imaginable api. depends on context of course. but generalized, just no. if a kind of io takes a microsecond, why bother.
No, it tells the OS "schedule the current thread to wake up when the data read task is completed".
Having to implement that with other OS primitives is a) complex and error-prone, and b) not atomic.
Maybe, but is it useful to have sync options?
You can still write single threaded programs
Sync options are useful. If everything is on the net probably less so. But if you have a couple of 1ms io ops that you want to get done asap, it's better to get them done asap.
But in ordinary JS there just can't be a race condition, everything is single threaded.
Two async functions doing so is not a data race either.
Serially, completely synchronously overwriting values is none of these categories though.
Concurrency is needed for race conditions, parallelism is needed for data races. Many single threaded runtimes including JS have concurrency, and hence the potential for race conditions, but don't have parallelism and hence no data races.
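The point above (race conditions without data races in a single-threaded runtime) is easy to demonstrate in asyncio, which has the same concurrency-without-parallelism model as JS. This is an illustrative sketch:

```python
import asyncio

counter = 0

async def incr():
    global counter
    v = counter             # read
    await asyncio.sleep(0)  # yield: another task runs here
    counter = v + 1         # write back a now-stale value

async def main():
    await asyncio.gather(*(incr() for _ in range(10)))
    # Typically prints 1, not 10: a race condition, yet no data race,
    # because no two tasks ever touch `counter` at the same instant.
    print(counter)

asyncio.run(main())
```

Every read and write is atomic with respect to the other tasks, so there is no data race; the lost updates come purely from interleaving across the await point.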
Why is reserving a megabyte of stack space "expensive"?
> and takes roughly a millisecond to create
I'm not sure where this number is from, it seems off by a few orders of magnitude. On Linux, thread creation is closer to 10 microseconds.
Because if you use one thread for each of your 10,000 idle sockets you will use 10GB to do nothing.
So you'll want to use a better architecture such as a thread pool.
And if you want your better architecture to be generic and ergonomic, you'll end up with async or green threads.
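The thread-pool alternative mentioned above is a few lines in most languages. A minimal Python sketch, with a hypothetical `handle` function standing in for per-connection work:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(conn_id):
    # Stand-in for servicing one connection's request.
    return f"handled {conn_id}"

# A small fixed pool serves many connections without reserving
# one stack per socket; idle sockets cost no thread at all.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle, range(10_000)))

print(results[0], len(results))
```

The ergonomic gap shows up when `handle` itself needs to block mid-request: a fixed pool stalls, which is the pressure that pushes designs toward async or green threads.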
1. On a system that is handling 10k concurrent requests, the 10GB of RAM is going to be a fraction of what is installed.
2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
My example (and the c10k problem) is 10k concurrent connections, not 10k concurrent requests.
> 2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
Yes, and that's both memory and cpu usage that isn't needed when using a better concurrency model. That's why no high-performance server software uses a huge number of threads, and many use the reactor pattern.
No, it literally is not. The "memory" is just entries in a page table in the kernel and MMU. It shouldn't worry you at all.
Nor is the CPU used by the kernel to manage those threads going to be necessarily less efficient than someone's handrolled async runtime. In fact given it gets more eyes... likely more.
The sole argument I can see is avoiding a handful of syscalls and not crossing the kernel&lt;-&gt;userspace blood-brain barrier too much.
I've written massively concurrent systems where each connection only handled maybe a few kilobytes of data.
Async io is a massive win in those situations.
This describes many rest endpoints. Fetch a few rows from a DB, return some JSON.
You don't pay for stack space you don't use unless you disable overcommit. And if you disable overcommit on modern linux the machine will very quickly stop functioning.
It’s a programming model that has some really risky drawbacks.
Operating systems can shrink the memory usage of a stack.
madvise(page, size, MADV_DONTNEED);

This leaves the memory mapping intact, but the kernel frees the underlying resources; subsequent accesses get either new zero pages or the original file's pages. Linux also supports mremap, which is essentially a kernel version of realloc and supports both growing and shrinking memory mappings.

stack = mremap(stack, old_size, old_size / 2, MREMAP_MAYMOVE);

Whether existing systems make use of this is another matter entirely. My language uses mremap for growth and shrinkage of stacks. C programs can't do it because pointers to stack-allocated objects may exist. They also say:
>The system spends time managing threads that could be better spent doing useful work.
What do they think the async runtime in their language is doing? It's literally doing the same thing the kernel would be doing. There's nothing that intrinsically makes scheduling 10k coroutines in userspace more efficient than the kernel scheduling 10k threads. Context switches are really only expensive when the switch is happening between different processes; the overhead of a context switch on a CPU between two threads in the same process is very small (and they're not free when done in userspace anyway).
There are advantages to doing scheduling in the kernel and there are advantages to doing scheduling in userspace, but this article doesn't really touch on any of the actual pros and cons here, it just assumes that userspace scheduling is automatically more efficient.
I feel like we're now, what, 20, 25 years on and people still haven't adjusted to the fact that the machines we have now are multicore and have boatloads of cache, or to how that cache is shared (or not) between cores.
Nor is there apparently a real understanding of the difference between VSS and RSS.
Nor of the fact that modern machines are really really fast if you can keep stuff in cache. And so you really should be focused on how you can make that happen.
While virtual memory allocation does not require physical allocation, it immediately runs into the kinds of performance problems that huge pages are designed to solve. On modern systems, you can burn up most of your virtual address space via casual indifference to how it maps to physical memory and the TLB space it consumes. Spinning up thousands of stacks is kind of a pathological case here.
10µs is an eternity for high-performance software architectures. That is also around the same order of magnitude as disk access with modern NVMe. An enormous amount of effort goes into avoiding blocking on NVMe disk access with that latency for good reason. 10µs is not remotely below the noise floor in terms of performance.
The program could use one page's worth of stack space, which is optimal. The program could use like 200 bytes of stack space, which wastes the rest of the page. The program could recurse all the way to 9.9 MB of stack usage, stop just before overflow and then unwind back to constant 200 bytes stack space usage, and never touch all those pages ever again.
Guess it's not a huge issue in these 64-bit days, but back in the 32-bit days it was a real limitation to how many threads you could spin up due to the limited address space.
Of course most applications which hit this would override the 1MB default.
So much so that they'll sign themselves up for async frameworks that work-steal at will and bounce things all over cores, causing cache-line bouncing and the associated memory stalls, not understanding what this is doing to their performance profile.
And endure complexity, etc. through awkward async call chains and function colouring.
Most people's applications would be totally fine just spawning OS threads and using them without fear and dropping into a futex when waiting on I/O; or using the kernel's own async completion frameworks. The OS scheduler is highly efficient, and it is very good at managing multiple cores and even being aware of asymmetric CPU hierarchies, etc. Likely more efficient than half the async runtimes out there.
Equally, if a megabyte of stack is a lot for your usecase, can't you just ask pthreads to reserve less? I believe it goes down to like 16k
EDIT: Something else to consider: what if your REST call needs to make 5 queries? Do you serialize them? Now your latency can be worse. Do you launch a thread per query? Now you need to a) synchronize and b) pay 5x the thread cost. Async patterns, green threads, and coroutines enable more efficient overlapping of operations and potentially better concurrency (though a server that handles lots of concurrent requests may already have "enough" concurrency anyway).
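The overlapping-queries case above is where async is at its most ergonomic: no extra threads, no synchronization, and the five queries run concurrently on one thread. A sketch with a hypothetical `query` coroutine standing in for a DB round trip:

```python
import asyncio

async def query(db, sql):
    await asyncio.sleep(0.01)  # stand-in for one DB round trip
    return sql

async def handle_request(db):
    # Five independent queries overlap instead of running back to back,
    # so total latency is roughly one round trip, not five.
    return await asyncio.gather(*(query(db, f"q{i}") for i in range(5)))

print(asyncio.run(handle_request(None)))
```

`gather` preserves submission order in its results, so the handler's logic stays as readable as the serialized version.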
There’s a benefit in being able to code the handling of a request in synchronous logic. A case has to be made for the particular application that it would cause performance or resource issues, before opting for asynchronous code that adds more complexity.
I totally agree there are applications for which this is overkill and adds complexity. It's just a tool in the toolbox. Video games famously are just a single thread/main loop kind of application.
Why does async make it harder to reason about when resources are released?
No such thing. In a preemptive multitasking OS (that's basically all of them today) you will get context switching regardless of what you do. Most modern OS's don't even give you the tools to mess with the scheduler at all; the scheduler knows best.
Linux kernel uses 8k stacks (TBH, it's been a while), but there's also some copy-on-write overhead. Still, this is not the C10k problem...
Async precisely improves disk/network I/O-bound applications because synchronous code has to waste a whole thread sitting around waiting for an I/O response (each with its own stack memory and scheduler overhead), and in something like an application server there will be many incoming requests doing so in parallel. Cancellation is also easier with async.
CPU-bound code would not benefit because the CPU is already busy, and async adds overhead
See e.g. https://learn.microsoft.com/en-us/aspnet/web-forms/overview/... and https://learn.microsoft.com/en-us/aspnet/web-forms/overview/...
I'll publish my results shortly. I did these as baselines because I'm testing and finishing the User Managed Concurrency Groups proposal for the linux kernel, which is an extension to provide faster kernel threads (which beat both of them).
I'm pretty close to being done - I'm hoping to publish the entire GitHub repository with tests for the community to validate by next week.
UMCG is essentially an open source version of Google Fibers, which is their internal extension to the linux kernel for "light weight" threads. It requires you to build a user space scheduler, but that allows you to create different types of schedulers. I can not remember which scheduler showed ^ results but I have at least 6 different UMCG schedulers I was testing.
So essentially you get the benefits of something like tokio where you can have different types of schedulers optimized for different use cases, but the power of kernel threads which means easy cancellation, easy programming (at least in rust). It's still a linux thread with an entire 8mb(?) stack size, but from my testing it's far faster than what Tokio can provide, without the headache of async/await programming.
Using async for languages like Rust or C++ is cargo cult by people who don't know what the hell they're doing.
[Caveat: there's a use case for async if you're doing embedded development where you don't have threads or call stacks at all.]
Assuming for the sake of argument that they are together, you're still saving stack memory for every thread that isn't created. In fact you could say it allows the CPU to be idle, by spending less time context switching. On top of that, async/await is a perfect fit for OS overlapped I/O mechanisms for similar reasons, namely not requiring a separate blocking thread for every pending I/O (see e.g. https://en.wikipedia.org/wiki/Overlapped_I/O, https://stackoverflow.com/a/5283082)
So avoiding that would mean a server can handle more traffic before running into limits based on thread count.
I mean, I suppose we could move the scheduling and tracking out of kernel mode and into user mode...
But then guess what we've just reinvented?
Honestly, it's a mostly obsolete approach. OS threads are fast. We have lots of cores. The cost of bouncing tasks around on the same core and losing L1 cache locality is higher than the cost of firing up a new OS thread that could land on its own core.
The kernel scheduler gets tuned. Language specific async runtimes are unlikely to see so many eyeballs.
This whole thing is basically snake oil. The best thing a backend can do instead is have a dedicated thread pool where each real thread has its own queue of limited size. Each element in the queue contains the input and output state of a request and the code to deal with them. Once a queue grows over a certain size, the backend should simply return an error code immediately (too busy). A much more sound strategy in my opinion.
There are more complex cases of course (like computationally expensive requests with no io that take a long time). Handling those would require some extra logic. Async stuff, however, will not help here either.
Libraries like Tokio (mentioned in the article) have support for this built-in. Goroutines sidestep the issue completely. C# Tasks are batteries included in that regard. In fact function colors aren't an issue in most languages that have async/await. JavaScript is the odd one out, mostly due to being single-threaded. Can't really be made to work in a clean way in existing JS engines.
Take Rust: you return `Result<T,E>`, you are coloring your function the same way as you are when using `async`. Same for Option. Errors as return values in Go: again, function coloring.
One of your nested function starts taking a "serverUrl" input parameter instead of reading an environment variable: you've colored your function and you now need to color the entire call stack (taking the url parameter themselves).
All of them are exactly as annoying, as you need to rewrite the entire call stack's function signatures to accommodate the change, but somehow people obsess about async in particular as if it were something special.
It's not special, it's just the reflection that something can either be explicit and require changing many function signatures at once when making a change, or be implicit (with threads, exceptions or global variables) which is less work, but less explicit in the code, and often more brittle.
Async functions are colored because they force a change in the rest of the call stack, not just the caller. If you have a function nested ten levels deep and it calls a function that returns a Result, and you change that function to no longer return a result because it lost all its error cases, you only have to change the direct callers. If you are ten layers deep in a stack of synchronous functions and suddenly need to make an asynchronous call, the type signature of every individual function in the stack has to change.
You might say "well, if I'm ten layers deep in a stack of functions that don't return errors and have to make a call that returns an error, well now I have to change the entire stack of functions to return the error", but that's not true. The type change from sync to async is forced. The error is not. You could just discard it. You could handle it somehow in one of the intervening calls and terminate the propagation of the type-signature changes halfway up. The caller might log the error and then fail to propagate it upwards for any number of reasons. You aren't being forced to this change by the type system. You may be forced to change by the rest of the software engineering situation, but that's not a "color".
For similar reasons, the article is incorrect about Go's "context.Context" being a coloration. It's just a function parameter like anything else. If you're ten layers deep into non-Context-using code and you need to call a function that takes a context, you can just pass it one with context.Background() that does nothing context-relevant. You may, for other software engineering reasons, choose to poke that use of a context up the stack to the rest of the functions. It's probably a good idea. But you're not being forced to by the type system.
"Coloration" is when you have a change to a function that doesn't just change the way it interacts with the functions that directly call it. It's when the changes forcibly propagate up the entire call stack. Not just when it may be a good idea for other reasons but when the language forces the changes.
It is not, in the maximally general sense, limited to async. It's just that sync/async is the only such color that most languages in common use expose.
well, this isn't really true - at least for Rust:
runtime.block_on(async{});
https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.htm...
So I don't buy that async colors are fundamentally different.
Threading methodology is unrelated though. How exactly the call stack is scheduled is orthogonal to the question of whether or not making a call to a particular function results in type changes being forced on all functions in the entire stack.
There may also be cases where you can take "async" code and run it entirely out of the context of any sort of scheduler, where it can simply be turned into the obvious sync code. While that does decolor the resulting call (or, if you prefer, recolor it back into the "sync" color), it doesn't mean that async is not generally a color in code where that is not an option. Solving concurrency by simply turning it off certainly has a time and place (e.g., a shell script may be perfectly happy to run "async" code completely synchronously because it may be able to guarantee nothing will ever happen concurrently), but that doesn't make the coloration problem go away when that is not an option.
Here's the list of requirements:

1. Every function has a color.
2. The way you call a function depends on its color.
3. You can only call a red function from within another red function.
4. Red functions are more painful to call.
5. Some core library functions are red.
You are complaining about point 3. You are saying that if there's any way to call a red function from a blue function then it's not real. The type change from sync to async is not forced any more than changing T to Result&lt;T,E&gt;: you just get a Promise from the async function. So you logically think that async is not a color. You think even a Haskell IO value can be used in a pure function if you don't actually do the IO, or if you use unsafePerformIO. This is nonsense. Anything that makes the function hard to use can be a color.
And Result and Option usually mean something else. Option is a value or none, and none doesn't necessarily mean the function failed. Result is the value or an error. You can have Result&lt;Option, Error&gt;.
That's different from async, where you can call the other type.
Like in Haskell there is the IO monad used to denote the IO effect. And there are unsafe ways to actually execute it - does that make everything in Haskell impure?
As soon as you start using function arguments instead of using a global variable, you are coloring your function in the exact same way. Yet I don't think anyone would make the case that we should stop using function arguments and use global variables instead…
Effects are another way of making functions incompatible, for better or worse. It can be done badly. Java fell into that trap with checked exceptions. They meant well, but it resulted in fragmentation.
Sometimes it’s worth making an effort to make functions more compatible by standardizing types. By convention, all functions in Go that return an error use the same type. It gives you less information about what errors can actually happen, but that means the implementation of a function can be modified to return a new error without breaking callers.
Another example is standardizing on a string type. There are multiple ways strings can be implemented, but standardization is more important.
Or if you mean that returning a new error breaks API compatibility, then yes that's the point. If now you can error in a different way, your users now need to handle that. But if it's all generic and inferred, it can still just bubble up to wherever they want to do that with no changes to middle layers.
In this way, declaring a type guides people to write calling code that doesn't break, provided you set it up that way. It makes things easier for the implementation to change.
Sometimes you do need handlers for specific errors, but in Go you always need to write generic error handling, too.
(A type variable can do something similar. It forces the implementation to be generic because the type isn't known, or is only partially known.)
Throwing the baby out with the bathwater, like Go sort of does with its error handling, is no solution. The proper solution is a better type system (e.g. a Result type with generics handles what Go can't).
For effects, though, we need type systems that support them, and those are only available in research languages so far. You can actually just be generic in effects (e.g. an fmap function applying a lambda to a list could just "copy" the effect of the lambda to the whole function; this can be properly written down and enforced by the compiler).