Posted by willm 9/2/2025
The truth is that in python, async was too little, too late. By the time it was introduced, most people who actually needed to do lots of io concurrently had their own workarounds (forking, etc) and people who didn't actually need it had found out how to get by without it (multiprocessing etc).
Meanwhile, go showed us what good green threads can look like. Then java did it too. And js had better async support the whole time. But all it did was show us that async code just plain sucks compared to green-thread code that can just block, instead of having to do the async dances.
So, why engage with it when you already had good solutions?
I take so much flak for this opinion at work, but I agree with you 100%.
Code that looks synchronous, but is really async, has funny failure modes and idiosyncrasies, and I generally see more bugs in the async parts of our code at work.
Maybe I’m just old, but I don’t think it’s worth it. Syntactic sugar over continuations/closures basically..
The comment you are responding to prefers green threads to be managed like goroutines, where the code looks synchronous, but really it's cooperative multitasking managed by the runtime, to explicit async/await.
But then you criticize "code that looks synchronous but is really async". So you prefer the explicit "async" keywords? What exactly is your preferred model here?
Goroutines feel like old-school, threaded code to me. I spawn a goroutine and interact with other “threads” through well defined IPC. I can’t tell if I’m spawning a green thread or a “real” system thread.
C#’s async/await is different IMO and I prefer the other model. I think the async-concept gets overused (at my workplace at least).
If you know Haskell, I would compare it to overuse of laziness, when strictness would likely use fewer resources and be much easier to reason about. I see many of the same problems/bugs with async/await..
Async in C# is awesome, and there's nothing stopping you from writing sync code where appropriate or using threads if you want proper multi threading. Async is primarily used to avoid blocking for non-cpu-bound work, like waiting for API/db/filesystem etc. If you use it everywhere then it's used everywhere, if you don't then it isn't. For a lot of apps it makes sense to use it a lot, like in web apis that do lots of db calls and such. This incurs some overhead but it has the benefit of avoiding blocked threads so that no threads sit idle waiting for I/O.
You can imagine in a web API receiving a large number of requests per second there's a lot of this waiting going on, and if threads were idle waiting for responses you wouldn't be able to handle nearly as much throughput.
Problem is that it self-reinforces, and before you know it every little function is suddenly async.
The irony is that it is used where you want to write in a synchronous style...
An obvious advantage of doing it that way is you don’t need any runtime/OS-level support. Eg your runtime doesn’t need to even have a concept of threads. It works on bare metal embedded.
Another advantage is that it’s fully cooperative model. No magic preemption. You control the points where the switch can happen, there is no magic stuff suddenly running in background and messing up the state.
It is nothing like what you just described
Used to be well-hidden cooperative, these days it's preemptive.
But this appears to be describing languages with green threads, rather than languages that make async explicit.
You may think of the use of an async keyword as what makes code explicitly async, but that is very much not the case.
If you want to see async code without the keyword, most of the code of Linux is asynchronous.
Kernel-style async code, where everything is explicit:
* You write a poller that opens up queues and reads structs representing work
* Your functions are not tagged as "async" but they do not block
* When those functions finish, you explicitly put that struct in another queue based on the result
Async-await code, where the runtime is implicit:
* All async functions are marked and you await them if they might block
* A runtime of some sort handles queueing and runnability
Green threads, where all asynchrony is implicit:
* Functions are functions and can block
* A runtime wraps everything that can block to switch to other local work before yielding back to the kernel
which are no different from the app's POV than kernel threads, or any threads for that matter.
the whole async stuff came up because context switch per event is way more expensive than just shoveling down a page of file descriptor state.
thus poll, kqueue, epoll, io_uring, whatever.
think of it as batch processing
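The batch-processing view can be sketched with Python's stdlib selectors module (epoll/kqueue under the hood); this is a toy poller, not anyone's production design, using a socketpair so it's self-contained:

```python
import selectors
import socket

# A minimal sketch of the explicit, kernel-style model: one poll call
# returns a batch of ready file descriptors, and we dispatch on them
# without any async keywords.

sel = selectors.DefaultSelector()
server, client = socket.socketpair()
client.sendall(b"ping")

# Register interest in readability and attach a label to the registration
sel.register(server, selectors.EVENT_READ, data="server-readable")

received = []
# One select() shovels down every ready event at once: batch processing
for key, events in sel.select(timeout=1):
    received.append((key.data, key.fileobj.recv(4)))

sel.close()
server.close()
client.close()
print(received)
```

A real event loop would run that select() call forever and re-queue continuations, but the shape is the same: poll, get a batch, dispatch.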
Let me try to clarify my point of view:
I don’t mean that async/await is more or less explicit than goroutines. I mean regular threaded code is more explicit than async/await code, and I prefer that.
I see colleagues struggle to correctly analyze resource usage, for instance. Someone tries to parallelize some code (perhaps naively) by converting it to async/await and then runs out of memory.
Again, I don’t mean to judge anyone. I just observe that the async/await-flavor has more bugs in the code bases I work on.
More explicit in what sense? I've written both regular threaded Python and async/await Python. Only the latter shows me precisely where the context switches occur.
Everything is in a run loop that does not exist in my codebase.
The context switching points are obvious but the execution environment is opaque.
At least that's how it looks to me.
Green threads are better (IMHO), because they actually do hide all the machinery. As a developer in a language with mature green threads (Erlang), I don't have to know about the machinery[1], I just write code that blocks from my perspective and BEAM makes magic happen. As I understand it, that's the model for Java's Project Loom aka Java Green Threads 2: 2 Green 2 Threads. The first release had some issues with the machinery, but I think I read the second release was much better, and I haven't seen much since... I'm not a Cafe Babe, so I don't follow Java that closely.
[1] It's always nice to know about the machinery, but I don't have to know about it, and I was able to get started pretty quick and figure out the machinery later.
If even this does not help, rm -rf is your friend.
I would say that green threads still have "function coloring stuff", we just decided that every function will be async-colored.
Now, what happens if you try to cross an FFI-border and try to call a function that knows nothing about your green-thread runtime is an entirely different story...
Thank you for explaining much more clearly than I could.
> none of the function coloring stuff
And it’s this part that I don’t like (and see colleagues struggling to implement correctly at work).
DESPITE THAT: even if you're doing everything "right" (TM) -- using a single thread and doing all your networking I/O sequentially is simply slow as hell. A very, very good example of this is bottle.py. Let's say you host a static web server with bottle.py. Every single web request for files leads to sequential loading, which makes page load times absolutely laughable. This isn't the case for every Python web framework, but it seems to be a common theme to me. (Cause: single thread, event loop.)
With asyncio, the most consistent behavior I've had with it seems to be to avoid having multiple processes and then running event loops inside them. Even though this approach seems like it's necessary (or at least threading) to avoid the massive downsides of the event loop. But yeah, you have to keep everything simple. In my own library I use a single event loop and don't do anything fancy. I've learned the hard way how asyncio punishes trying to improve it. It's a damn cool piece of software, it just has some huge limitations for performance.
To be fair that also happens with other solutions.
A lot of the async problems in other languages exist because they haven't bought into the concept fully: some 3rd-party code uses it and some doesn't. JS went all-in with async.
[1]: Yes I know about service workers, but they are not threads in the sense that there is no shared memory*. It is good for some types of parallelization problems, but not others because of all the memory copying required.
[2]: Yes I know about SharedArrayBuffer and there is a bunch of proposals to add support for locks and all that fun stuff to them, which also brings all the complexity back.
https://nodejs.org/en/learn/asynchronous-work/overview-of-bl...
I will agree with what some have said above: BEAM is pretty great. I have been using it recently through Elixir.
I'd add one other aspect that we sort of take for granted these days, but affordable multi-threaded CPUs have really taken off in the last 10 years.
Not only does the stack based on green-threads "just work" without coloring your codebase with async/no-async, it allows you to scale a single compute instance gracefully to 1 instance with N vCPUs vs N pods of 2-vCPU instances.
The memory and execution model for higher level work needs to not have async. Go is the canonical example of it done well from the user standpoint IMO.
Actually, I was and am primarily a Dart developer, not a JS developer. But function color is a problem in any language that uses that style of asynchrony: JS, Dart, etc.
However, gevent has to do its magic by monkeypatching. Wanting to avoid that, IIRC, was a significant reason why the async/await syntax and the underlying runtime implementation was developed for Python.
Another significant reason, of course, was wanting to make async functions look more like sync functions, instead of having to be written very differently from the ground up. Unfortunately, requiring the "async" keyword for any async function seriously detracted from that goal.
To me, async functions should have worked like generator functions: when generators were introduced into Python, you didn't have to write "gen def" or something like it instead of just "def" to declare one. If the function had the "yield" keyword in it, it was a generator. Similarly, if a function has the "await" keyword in it, it should just automatically be an async function, without having to use "async def" to declare it.
Similarly, a function that calls an async function wouldn't itself be async unless it also had the await keyword. But of course the usual way of calling an async function would be to await it. And calling it without awaiting it wouldn't return a value, just as with a generator; calling a generator function without yielding from it returns a generator object, and calling an async function without awaiting it would return a future object. You could then await the future later, or pass it to some other function that awaited it.
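The generator analogy can be made concrete. The sketch below shows today's actual behavior at the call site, which is already parallel to generators; the commenter's proposal is only about dropping the explicit "async" marker on the def line:

```python
import asyncio

# Calling a generator function doesn't run its body; it just returns a
# generator object. The presence of "yield" alone marks it as a generator,
# with no special "def" variant required.
def counter(n):
    for i in range(n):
        yield i

gen = counter(3)              # nothing has executed yet
assert list(gen) == [0, 1, 2]

# "async def" already behaves analogously when called: you get a coroutine
# object back, and the body has not run. The proposal is to infer async-ness
# from the presence of "await", the way generator-ness is inferred from "yield".
async def work():
    await asyncio.sleep(0)
    return 42

coro = work()                 # a coroutine object; the body has not run
assert asyncio.run(coro) == 42
```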
In JavaScript async doesn’t have a good way to nice your tasks, which is an important feature of green threads. Sindre Sorhus has a bunch of libraries that get close, but there’s still a hole.
What coroutines can do is optimize the instruction cache. But I’m not sure goroutines entirely accomplish that. There’s nothing preventing them from doing so but implementation details.
The problem is not python, it's a skill issue.
First of all forking is not a workaround, it's the way multiprocessing works at the low level in Unix systems.
Second of all, forking is multiprocessing, not multithreading.
Third of all, there's the standard threading library which just works well. There's no issue here, you don't need async.
What I did have issues with, though, was async. For example, pytest's async thingy has been buggy for years with no fix in sight, so in one project I had to switch to manually making an event loop in those tests.
But isn't the whole purpose of async that it enables concurrency, not parallelism, without the weight of a thread? I agree that in most cases it is not necessary to go there, but I can imagine systems without many resources that benefit from such an approach when they do lots of io.
People act like it's dead but it still works perfectly well and, at least for me, makes async networking so much simpler.
To me, Go is really well designed when it comes to multithreading because it is built upon a mutual contract where it will break easily and at compile time when you mess up the contract between the scheduling thread and the sub threads.
But, for the love of Go, I have no idea who the person was that decided that the map data type has to be not threadsafe. Once you start scaling / rewriting your code to use multiple goroutines, it's like you're being thrown in the cold water without having learnt to swim before.
Mutexes are a real pain to use in Go, and they could have been avoided if the language just decided to make read/write access threadsafe for at least maps that are known to be accessed from different threads.
I get the performance aspect of that decision, but man, this is so painful because you always have to rewrite large parts of your data structures everywhere, and abstract the former maps away into a struct type that manages the mutexes, which in return feels so dirty and unclean as a provided solution.
For production systems I just use haxmap from the start, because I know its limitations (of hashes of keys due to atomics), because that is way easier to handle than forgetting about mutexes somewhere down the codebase when you are still before the optimization phase of development.
I'll be sold on this when a green thread native UI paradigm becomes popular but it seems like all the languages with good native UI stories have async support.
Promises/thenables gave people the time to get used to the idea of deferred evaluation via a familiar callback approach... Then when async/await came along, people didn't see it as a radically new feature but more as syntactic sugar to do what they were already doing in a more succinct way without callbacks.
People in the Node.js community were very aware of async concepts since the beginning and put a lot of effort in not blocking the event loop. So Promises and then async/await were seen as solutions to existing pain points which everyone was already familiar with. A lot of people refactored their existing code to async/await.
The main difference being that now both models are simultaneously supported instead of being an implementation detail of each JVM.
Doing async in python has the same fundamental design. You have an executor, a scheduler, and event-driven wakers on futures or promises. But you’re doing it in a fundamentally handcuffed environment.
You don’t get benefits like static compilation, real work-stealing, a large library ecosystem, or crazy performance boosts. Except in certain places in the stack.
Using fastapi with async is a game-changer. Writing a cli to download a bunch of stuff in parallel is great.
But if you want to use async to parse faster or make a parallel-friendly GUI, you are more than likely wasting your time using python. The benefits will be bottlenecked by other language design features. Still the GIL mostly.
I guess there is no reason you can’t make tokio in python with multiprocessing or subinterpreters, but to my knowledge that hasn’t been done.
Learning tokio was way more fun, too.
I am happy to hear stories of using pypy or something to radically improve an architecture. I don’t have any from personal experience.
I guess twisted and stackless, a long time ago.
Is this no longer true?
python is kind of a slow choice for that sort of thing regardless and i don't think the complexity of async is all that justified for most usecases.
i still maintain my position that a good computer system should let you write logic synchronously and the system will figure out how to do things concurrently with high performance. (although getting this right would be very hard!)
Generations of programmers have given up on downloading data async in their Python scripts and just gone to bash and added a & at the end of a curl call inside a loop.
Even then, nginx might be a better solution.
Taking a general case, let's say a forum, in order to render a thread one needs to search for all posts from that thread, then get all the extra data needed for rendering and finally send the rendered output to the client.
In the "regular" way of doing this, one will compose a query, that will filter things out, join all the required data bla bla, send it to the database, wait for the answer from the database and all the data to be transferred over, loop over the results and do some rendering and send the thing over to the client.
It doesn't matter how async your app code is; in this way of doing things, the bottleneck is the database, as there is a fixed limit on how many things a db server can do at once, and if doing one of these things takes a long time, you still end up waiting too much.
In order for async to work, one needs to split the workload into very small chunks that can be done in parallel and very fast; therefore, sending a big query and waiting for all the result data is out of the window.
An async approach would split the db query into a search query, that returns a list of object ids, say posts, then create N number of async tasks that given a post id will return a rendered result. These tasks will do their own query to retrieve the post data, then assemble another list of async tasks to get all the other data required and render each chunk and so on. Throw in a bunch of db replicas and you get the benefits of async.
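The fan-out described above might look like the following sketch. The db calls are stand-ins (fetch_post_ids and render_post are hypothetical names, with sleeps in place of real queries):

```python
import asyncio

async def fetch_post_ids(thread_id: int) -> list[int]:
    await asyncio.sleep(0.01)          # stands in for the search query
    return [1, 2, 3]

async def render_post(post_id: int) -> str:
    await asyncio.sleep(0.01)          # stands in for the per-post queries
    return f"<div>post {post_id}</div>"

async def render_thread(thread_id: int) -> str:
    ids = await fetch_post_ids(thread_id)
    # One task per post: the per-post queries now overlap in time instead
    # of running inside one big sequential join.
    parts = await asyncio.gather(*(render_post(i) for i in ids))
    return "\n".join(parts)

html = asyncio.run(render_thread(42))
print(html)
```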
This approach is not generally used, because, let's face it, we like making the systems we use do complicated things, eg complicated sql requests.
However, async tasks on a single core mean potentially a lot of switching between those tasks. So async alone does not save the day here. It will have to be combined with true parallelism to result in the speedup we want. Otherwise a single task rendering all the parts in sequence would be faster.
Also note that it depends on where your db is. The process you describe implies at least 2 rounds of db communication: first the initial get-forum-thread query, then a second one for all the async get-forum-replies requests. So if communication with the db takes a long time, you might well lose what you gained, because you did 2 rounds of that communication.
So I guess it's not a trivial matter.
By now, the downsides are well-known, but I think Python's implementation did a few things that made it particularly unpleasant to use.
There is the usual "colored functions" problem. Python has that too, but on steroids: There are sync and async functions, but then some of the sync functions can only be called from an async function, because they expect an event loop to be present, while others must not be called from an async function because they block the thread or take a lot of CPU to run or just refuse to run if an event loop is detected. That makes at least four colors.
The API has the same complexity: In JS, there are 3 primitives that you interact with in code: Sync functions, async functions and promises. (Understanding the event loop is needed to reason about the program, but it's never visible in the code).
Whereas Python has: Generators, Coroutines, Awaitables, Futures, Tasks, Event Loops, AsyncIterators and probably a few more.
All that for not much benefit in everyday situations. One of the biggest advantages of async/await was "fearless concurrency": The guarantee that your variables can only change at well-defined await points, and can only change "atomically". However, python can't actually give the first guarantee, because threaded code may run in parallel to your async code. The second guarantee already comes for free in all Python code, thanks to the GIL - you don't need async for that.
Function colours can get pretty verbose when you want to write functional wrappers. You can end up writing nearly the exact same code twice because one needs to be async to handle an async function argument, even if the real functionality of the wrapper isn't async.
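The duplication described above is easy to demonstrate. In this sketch, a trivial timing wrapper has to be written twice, once per color, even though the timing logic itself has nothing asynchronous about it:

```python
import asyncio
import functools
import time

def timed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.elapsed = time.perf_counter() - start
        return result
    return wrapper

def timed_async(fn):
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = await fn(*args, **kwargs)   # the only real difference
        wrapper.elapsed = time.perf_counter() - start
        return result
    return wrapper

@timed
def add(a, b):
    return a + b

@timed_async
async def add_async(a, b):
    return a + b

assert add(1, 2) == 3
assert asyncio.run(add_async(1, 2)) == 3
```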
Coroutines vs futures vs tasks are odd. More than is pleasant, you have one but need the other for an API for no intuitive reason. Some waiting functions work on some types and not on others. But you can usually easily convert between them - so why make a distinction in the first place?
I think if you create a task but don't await it (which is plausible in a server type scenario), it's not guaranteed to run because of garbage collection or something. That's weird. Such behaviour should be obviously defined in the API.
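This is indeed documented behavior: the event loop keeps only a weak reference to tasks, so a fire-and-forget task can be garbage-collected mid-flight. The workaround recommended in the asyncio docs is to hold a strong reference yourself:

```python
import asyncio

background_tasks = set()

async def job():
    await asyncio.sleep(0)
    return "done"

async def main():
    task = asyncio.create_task(job())
    background_tasks.add(task)                        # strong reference
    task.add_done_callback(background_tasks.discard)  # drop it once finished
    return await task

result = asyncio.run(main())
print(result)
```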
Sorry for the possibly naive question. If I need to call a synchronous function from an async function, why can't I just call await on the async argument?
    def foo(bar: str, baz: int):
        # some synchronous work
        pass

    async def other(bar: Awaitable[str]):
        foo(await bar, 0)
Maybe a useful approach for a language would be to make "colors" a first-class part of the type system and support them in generics, etc.
Or go a step further and add full-fledged time complexity tracking to the type system.
Rust has been trying to do that with "keyword generics": https://blog.rust-lang.org/inside-rust/2023/02/23/keyword-ge...
This is what languages with higher-kinded types do and it's glorious. In Scala you write your code in terms of a generic monad and then you can reuse it for sync or async.
I think that use case doesn't work well in async, because async effectively creates a tree of Promises that resolve in order. A task that doesn't get await-ed is effectively outside its own tree of Promises because it may outlive the Promise it is a child of.
I think the solution would be something like Linux's zombie process reaping, and I can see how the devs prefer just not running those tasks to dealing with that mess.
If you just do
    async def myAsyncFunction():
        ...
        await someOtherAsyncFunction()
        ...
then the call to someOtherAsyncFunction will not spawn any kind of task or delegate to the event loop at all - it will just execute someOtherAsyncFunction() within the task and event loop iteration that myAsyncFunction() is already running in. This is a major difference from JS.

If you just did
someOtherAsyncFunction()
without await, this would be a fire-and-forget call in JS, but in Python, it doesn't do anything. The statement creates a coroutine object for the someOtherAsyncFunction() call, but doesn't actually execute the call and instead just throws the object away again.

I think this is what triggers the "coroutine is not awaited" warning: It's not complaining about fire-and-forget being bad style, it's warning that your code probably doesn't do what you think it does.
The same pitfall is running things concurrently. In JS, you'd do:
task1 = asyncFunc1();
task2 = asyncFunc2();
await task1;
await task2;
In Python, the functions will be run sequentially, in the await lines, not in the lines with the function calls. To actually run things in parallel, you have to do
loop.create_task(asyncFunc())
or one of the related methods. The method will schedule a new task and return a future that you can await on, but don't have to. But that "await" would work completely differently from the previous awaits internally.

If you do `someOtherAsyncFunction()` without await and Python tried to execute it similarly to a version with `await`, then the one without await would happen in the same task and event loop iteration, but there's no guarantee that it's done by the time the outer function is. Thus the existing task/event loop iteration has to be kept alive, or the non-await'ed task needs to be reaped by some other task/event loop iteration.
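The difference between awaiting bare coroutines and creating tasks first can be sketched directly; this toy measurement shows the awaits running back-to-back in one case and overlapping in the other:

```python
import asyncio
import time

async def slow(delay):
    await asyncio.sleep(delay)

async def sequential():
    c1, c2 = slow(0.05), slow(0.05)
    await c1              # runs the first coroutine to completion here
    await c2              # only then does the second one start

async def concurrent():
    t1 = asyncio.create_task(slow(0.05))  # scheduled immediately
    t2 = asyncio.create_task(slow(0.05))
    await t1              # both sleeps are already in flight
    await t2

start = time.perf_counter()
asyncio.run(sequential())
seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent())
conc = time.perf_counter() - start

assert conc < seq  # the task-based version overlaps the waits
```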
> loop.create_task(asyncFunc())
This sort of intuitively makes sense to me because you're creating a new "context" of sorts directly within the event loop. It's similar-ish to creating daemons as children of PID 1 rather than children of more-ephemeral random PIDs.
As far as I understood it, calling an async function without await (or create_task()) does not run the function at all - there is no uncertainty involved.
Async functions work sort of like generators in that the () operator just creates a temporary object to store the parameters. The 'await' or create_task() are the things that actually execute the function - the first immediately runs it in the same task as the containing function, the second creates a new task and puts that in the event queue for later execution.
So
asyncFunc()
without anything else is a no-op. It creates the object for parameter storage ("coroutine object") and then throws it away, but never actually calls (or schedules) asyncFunc.

When queuing the function in a new task with create_task(), then you're right - there is no guarantee the function would finish, or even would have started, before the outer function completed. But the new task won't have any relationship to the task of the outer function at all, except if the outer function explicitly chooses to wait for the other task, using the Future object that was returned by create_task.
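This is easy to verify with a small experiment: the side effect only happens once the coroutine is actually awaited, never at call time:

```python
import asyncio

ran = []

async def side_effect():
    ran.append("ran")

async def main():
    coro = side_effect()     # creates a coroutine object; the body never runs
    assert ran == []         # nothing has executed
    coro.close()             # close it explicitly to silence the
                             # "coroutine was never awaited" warning
    await side_effect()      # now the body runs, inside the current task
    assert ran == ["ran"]

asyncio.run(main())
print(ran)
```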
If I remember correctly, the Python async API was still in experimental phase at that time.
I agree that that's annoying but tbh it sounds like any other piece of code to me that relies on global state. (Man, I can't wait for algebraic effects to become mainstream…)
I recognise that this situation is possible, but I don't think I've ever seen it happen. Can you give an example?
This is used by most of asyncio's synchronization primitives, e.g. asyncio.Queue.
A consequence is that you cannot use asyncio Queues to pass messages or work items between async functions and worker threads. (And of course you can't use regular blocking queues either, because they would block).
The only solution is to build your own ad-hoc system using loop.call_soon_threadsafe() or use third-party libs like Janus[2].
[1] https://github.com/python/cpython/blob/e4e2390a64593b33d6556...
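One shape the ad-hoc bridge can take (a sketch, not the Janus API): a worker thread hands items to an asyncio.Queue by scheduling each put on the event loop with run_coroutine_threadsafe, which is built on call_soon_threadsafe:

```python
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue[int] = asyncio.Queue()

    def worker():
        for i in range(3):
            # Schedule the put on the loop's thread; .result() blocks this
            # worker thread until the loop has performed it.
            asyncio.run_coroutine_threadsafe(queue.put(i), loop).result()

    t = threading.Thread(target=worker)
    t.start()
    received = [await queue.get() for _ in range(3)]
    t.join()
    return received

items = asyncio.run(main())
print(items)
```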
But I think generators are still sometimes mentioned in tutorials for this reason.
But, to sum it all up for those who want to talk here, there are several ways to look at concurrency but only one that matters. Is my program correct? How long will it take to make my program correct? Structured concurrency makes that clear(er) in the syntax of the language. Unstructured concurrency requires that you hold all the code in your head.
[1]: https://glyph.twistedmatrix.com/2014/02/unyielding.html
[2]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...
[3]: https://vorpus.org/blog/notes-on-structured-concurrency-or-g...
I already kinda had this idea while working with Rust. In Rust, Futures won’t execute unless `await`ed. In practice, that meant that all my futures were joined. It was just the only way I could wrap my head around doing anything useful with async.
    async with mk_nursery() as nursery:
        with os.fopen(...) as file:
            nursery.start_soon(lambda: file.read())
The with block may have ended before the task starts...

One of the most memorable "real software engineering" bugs of my career involved async Python. I was maintaining a FastAPI server which was consistently leaking file descriptors when making any outgoing HTTP requests, due to failing to close the socket. This manifested in a few ways: once the server ran out of available file descriptors, it degraded to a bizarre world where it would accept new HTTP requests but then refuse to transmit any information, which was also exciting due to increasing the difficulty of remotely debugging this. Occasionally the server would run out of memory before running out of file descriptors on the OS, which was a fun red herring that resulted in at least one premature "I fixed the problem!" RAM bump.
The exact culprit was never found - I spent a full week debugging it, and concluded that the problem had to do with someone on the library/framework/system stack of FastAPI/aiohttp/asyncio having expectations about someone else in the stack closing the socket after picking up the async context, but that never actually occurring. It was impenetrable to me due to the constant context switching between the libraries and frameworks, such that I could not keep the thread of who (above my application layer) should have been closing it.
My solution was to monkey patch the native python socket class and add a FastAPI middleware layer so that anytime an outgoing socket opened, I'd add it to a map of sockets by incoming request ID. Then when the incoming request concluded I'd lookup sockets in the map and close them manually.
It worked, the servers were stable, and the only follow-up request was to please delete the annoying "Socket with file descriptor <x> manually closed" message from the logs, because they were cluttering things up. And thus, another brick in the wall of my opinion that I do not prefer Python for reliable, high-performance HTTP servers.
This point doesn't get enough coverage. When I saw async coming into Python and C# (the two ecosystems I was watching most closely at the time) I found it depressing just how much work was going into it that could have been productively expended elsewhere if they'd have gone with blocking calls to green threads instead.
To add insult to injury, when implementing async it seems inevitable that what's created is a bizarro-world API that mostly-mirrors-but-often-not-quite the synchronous API. The differences usually don't matter, until they do.
So not only does the project pay the cost of maintaining two APIs, the users keep paying the cost of dealing with subtle differences between them that'll probably never go away.
> I do not prefer Python for reliable, high-performance HTTP servers
I don't use it much anymore, but Twisted Matrix was (is?) great at this. Felt like a superpower to, in the oughties, easily saturate a network interface with useful work in Python.
You must be an experienced developer to write maintainable code with Twisted; otherwise, when the codebase grows a little, it will quickly become a bunch of spaghetti code.
Eventually I wrote an "image sorter" that I found was hanging up when the browser was trying to download images in parallel. The image serving should not have been CPU bound - I was even using sendfile() - but I think other requests would hold up the CPU and would block the tiny amount of CPU needed to set up that sendfile.
So I switched from aiohttp to the flask API and serve with either Flask or Gunicorn, I even front it with Microsoft IIS or nginx to handle the images so Python doesn't have to. It is a minor hassle because I develop on Windows so I have to run Gunicorn inside WSL2 but it works great and I don't have to think about server performance anymore.
My take on gunicorn is that it doesn't need any tuning or care to handle anything up to the large workgroup size other than maybe "buy some more RAM" -- and now if I want to do some inference in the server or use pandas to generate a report I can do it.
If I had to go bigger I probably wouldn't be using Python in the server and would have to face up to either dual language or doing the ML work in a different way. I'm a little intimidated about being on the public web in 2025 though with all the bad webcrawlers. Young 'uns just never learned everything that webcrawler authors knew in 1999. In 2010 there were just two bad Chinese webcrawlers that never sent a lick of traffic to anglophone sites, but now there are new bad webcrawlers every day it seems.
Async is for juggling lots of little initialisations, completions, and coordinating work.
Many apps are best single threaded with a thread pool to run (single threaded) long running tasks.
1) Use the network thread pool to also run application code. Then your entire program has to be super careful to not block or do CPU intensive work. This is efficient but leads to difficult to maintain programs.
2) The network thread pool passes work back and forth between an application executor. That way, the network thread pool is never starved by the application, since it is essentially two different work queues. This works great, but now every request performs multiple thread hops, which increases latency.
There has been a lot of interest lately in combining scheduling and work-stealing algorithms to create a best-of-both-worlds executor.
You could imagine, theoretically, an executor that auto-scales, maintains different work queues, and avoids thread hops when possible, but still ensures there are always threads available for the network.
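Option 2 above can be sketched with asyncio's standard executor hand-off (the names `crunch` and `handler` are illustrative, not from any real codebase):

```python
import asyncio
import concurrent.futures

def crunch(n):
    # CPU-bound work that would starve the event loop if run inline.
    return sum(i * i for i in range(n))

async def handler(pool):
    loop = asyncio.get_running_loop()
    # The "thread hop": push work to an application executor so the
    # network (event-loop) thread stays free, at the cost of latency.
    return await loop.run_in_executor(pool, crunch, 10_000)

async def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(handler(pool) for _ in range(3)))

results = asyncio.run(main())
```

The trade-off from the comment is visible here: the loop never blocks on `crunch`, but every request pays for two thread hops (loop to pool and back).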
Writing a FastAPI websocket that reads from a Redis pub/sub is a documentation-less flailfest.
I realized, years later, that the (non-)documentation was directed at people who were already familiar with the feature from JavaScript. But I hadn't been familiar with it from JavaScript, and I didn't even know that JavaScript had such a feature.
So that's my tiny contribution to this discussion, one data point: Python's async might have been one unit more popular if it had had any documentation, or even a cross-reference to the JavaScript documentation.
The documentation is directed at people who want coroutines and futures and know what that means. If you don't know what coroutines and futures are, the Python docs aren't going to help you. The documentation isn't going to guide anybody into using the async features who isn't already seeking them out. Maybe that's intentional, but it's not going to grow adoption of the async features.
And while Python implements async directly in the VM, its semantics are such that it can be treated as syntactic sugar for callbacks there as well.
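One way to see the "syntactic sugar" point: an `async def` function is just an object you drive with `send()`, exactly like a generator, so a callback-based loop could drive it without any VM magic. A minimal stdlib sketch:

```python
import asyncio

async def add(a, b):
    await asyncio.sleep(0)  # a suspension point (a bare yield underneath)
    return a + b

# Drive the coroutine by hand, as an event loop or callback chain would:
coro = add(1, 2)
try:
    coro.send(None)   # advances to the first await, then suspends
    coro.send(None)   # resumes; completion surfaces as StopIteration
except StopIteration as exc:
    result = exc.value  # 3
```

A real event loop does the same thing, just with the second `send` scheduled as a callback once the awaited I/O completes.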
> Because parallelism in Python using threads has always been so limited, the APIs in the standard library are quite rudimentary. I think there is an opportunity to have a task-parallelism API in the standard library once free-threading is stabilized.
> I think in 3.14 the sub-interpreter executor and free-threading features make more parallel and concurrency use cases practical and useful. For those, we don’t need async APIs and it alleviates much of the issues I highlighted in this post.
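The "rudimentary" standard-library APIs referred to are roughly `concurrent.futures`; a minimal sketch of today's thread-based task parallelism:

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # Under the GIL this only pays off if work() releases it
    # (I/O, C extensions); with free-threading it could scale across cores.
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(5)))  # [0, 1, 4, 9, 16]
```

A richer task-parallelism API, as the quoted post suggests, would presumably build on this once free-threading stabilizes.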
Armin recently put up a post that goes into those issues in more depth: https://lucumr.pocoo.org/2025/7/26/virtual-threads/
Which led me to a pre-PEP discussion on the possibility of Virtual Threads in Python, which was probably way more than I needed to know but found interesting: https://discuss.python.org/t/add-virtual-threads-to-python/9...
Ultimately Python already has function coloring, and libraries are forced into that. This proposal seems poorly thought out, and also too little too late.
> and also too little too late.
I think it very likely that Python will still be around and popular 10 years from now. Probably 20 years from now. And maybe 30 years from now. I think that's plenty of time for a new and good idea that addresses significant pain points to take root and become a predominant paradigm in the ecosystem.
So I don't agree that it's too little too late. But whether or not a Virtual Threads implementation can/will be developed and be good enough to gain wide adoption, I just can't speak to. If it's possible to create a better devx than async and get multi-core performance and usage, I'm all for the effort.
If you choose a non-preemptive system, you naturally need yield points for cooperation. Those can either be explicit (await) or implicit (e.g. every function call). But you can get away with a minimal runtime and a stackless design.
Meanwhile, in a preemptive system you need a runtime that can interrupt other units of work. And it pushes you towards a stackful design.
All those decisions are downstream of the preemptive-vs.-cooperative choice.
In either case, you always need to be able to interface with CPU-heavy work. Either through preemption, or by isolating the CPU-heavy work.
The same goes for C++, which now has co_await.
Once you have this in place, you can notice that you can "submit the task to the same thread", and just switch between tasks at every `await` point; you get coroutines. This is how generators work: `yield` is the `await` point.
If all the task is doing is waiting for I/O, and your runtime is smart enough to yield to another coroutine while the I/O is underway, you can do something useful, or at least issue another I/O task, not waiting for the first one to complete. This allows typical server code that does a lot of different I/O requests to run faster.
Older things like `gevent` just automatically added yield / await points at certain I/O calls, with an event loop running implicitly.
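The yield-point machinery described above can be sketched with plain generators: a round-robin loop resumes each task until its next `yield`, which plays the role of the `await` point (names here are illustrative):

```python
from collections import deque

def task(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield  # the "await" point: hand control back to the loop

def run(tasks):
    # A minimal round-robin event loop over generator-based tasks.
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)          # resume until the next yield
            queue.append(t)  # not finished: reschedule
        except StopIteration:
            pass

log = []
run([task("a", 2, log), task("b", 2, log)])
# The tasks interleave: [("a", 0), ("b", 0), ("a", 1), ("b", 1)]
```

A gevent-style runtime works the same way, except the yields are inserted for you inside the I/O calls, so application code never sees them.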
If you have 1 async thread, 4 very slow clients don't impact your server in the slightest.
Speed might be similar but resource usage is not the same at all.
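A quick way to see the resource-usage point: one thread servicing many slow clients concurrently, where the total wall time is one client's delay, not the sum (the 0.1 s delay stands in for a slow client):

```python
import asyncio
import time

async def slow_client():
    # Simulates a client that takes 0.1 s to complete its request.
    await asyncio.sleep(0.1)
    return "done"

async def main():
    # 100 slow clients serviced concurrently on a single thread.
    return await asyncio.gather(*(slow_client() for _ in range(100)))

start = time.perf_counter()
replies = asyncio.run(main())
elapsed = time.perf_counter() - start  # roughly 0.1 s, not 10 s
```

Doing the same with one OS thread per client would need 100 threads (and their stacks) to sit idle waiting.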
IME writing an asyncio Python application is a bit like fixing a broken Linux boot. You frantically Google things, the documentation doesn't mention it, and eventually you find a rant on a forgotten Finnish embedded electronics forum where someone has the same problem as you, and is kindly sharing a solution. After 30 mins of C&P of random commands from a stranger on the web, it works, for no reason you can decipher. Thank goodness for the Finns and Google Translate.
1) It's infectious. You need to make everything async, or nothing.
2) It has non-obvious program flow.
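The "infectious" point in miniature: a plain function can't `await`, so it either becomes async itself or has to stand up an event loop to bridge the gap (`fetch_value` is a hypothetical async leaf):

```python
import asyncio

async def fetch_value():
    # An async leaf; anything that wants its result gets pulled
    # into the async world too.
    await asyncio.sleep(0)
    return 42

async def async_caller():
    return await fetch_value()   # must itself be async

def sync_caller():
    # A plain function can't await; it has to spin up an event loop.
    return asyncio.run(fetch_value())

value = sync_caller()  # 42
```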
Even though it is faster in a lot of cases (I had a benchmark-off with a colleague, multi-threaded vs async for a web/socket server, and the async version was faster), for me it is a pain to force into a class.
The thing I like about threads is that the flow of data is there and laid out neatly _per thread_, whereas to me async feels like surprise goto. Async feels like it accepts a request and then, at some point in the future, either triggers more async or craps out, mixing loads of state from different requests all over the place.
To me it feels like a knotted wool bundle, whereas threaded/multi-process feels like a freshly wound bobbin.
Now, this is all viiiiiibes man, so it's subjective.
In general, the architectures that developed because of the GIL, like Celery and Gunicorn, handle most of the problems we run into that async/await solves, with slightly better horizontal scaling IMO. The problem with a lot of async code is that it tends not to think beyond the single machine that's running it, and by the time you do, you need to rearchitect things to scale better horizontally anyway.
For most Python applications, especially with web development, just start with something like Celery and you're probably fine.