Posted by gw2 4/2/2025
Async/promise/deferred code just re-implements separate chains of control flow via callbacks, and where IO is concerned that is not so different from threads. You'll still need mutexes, semaphores and such.
That's why things like async-mutex exist, and it's not just a JavaScript problem: Python's Twisted has them too, as DeferredLock and DeferredSemaphore: https://docs.twistedmatrix.com/en/stable/api/twisted.interne...
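For example, with the async-mutex package (a minimal sketch; the balance scenario and function names are invented for illustration):

    import { Mutex } from 'async-mutex';

    // Even in single-threaded Node, two awaits inside one logical operation
    // let other tasks interleave between them, so read-modify-write sequences
    // on shared state still need a critical section.
    const balanceLock = new Mutex();
    let balance = 100; // hypothetical shared state

    async function settleWithBank(amount: number): Promise<void> {
      return new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for real IO
    }

    async function withdraw(amount: number): Promise<void> {
      await balanceLock.runExclusive(async () => {
        const current = balance;      // read
        await settleWithBank(amount); // suspension point: other tasks run here
        balance = current - amount;   // write, safe only because we hold the lock
      });
    }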
> The best design is the one where complexity is kept minimal, and where locality is kept maximum. That is where you get to write code that is easy to understand without having these bottomless holes of mindbogglingly complex CPU-dependent memory barrier behaviors. These designs are the easiest to deploy and write. You just make your load balancer cut the problem in isolated sections and spawn as many threads or processes of your entire single threaded program as needed
Wholeheartedly agree. That's exactly how Elixir and Erlang processes work, they are small, lightweight and have isolated heaps.
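In Node terms, the closest built-in analogue to that design is the cluster module; Erlang processes are far lighter than OS processes, but the shape is the same. A minimal sketch:

    import cluster from 'node:cluster';
    import http from 'node:http';
    import os from 'node:os';

    if (cluster.isPrimary) {
      // One isolated copy of the whole single-threaded program per core;
      // the primary distributes incoming connections across the workers.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    } else {
      http
        .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
        .listen(3000);
    }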
If the author wrote that sincerely, they should promptly drop Node.js on the floor and start learning Elixir.
I don't think so. V8 is a marvel of engineering. JS is the problem (a quirky-to-downright-ugly language that is also extremely ubiquitous), not Node.
> But it was made with one very accurate observation: multithreading sucks even more.
Was it made with that observation? (If so, I would like a source to corroborate it.) Or was it simply that when all you have is a hammer (single-threaded execution), everything looks like a nail (you're going to solve every problem within that single thread)?
The evented/async/non-blocking style of programming that you need when serving many requests from one thread already existed before Node; it just wasn't that popular. When you choose this style, all your IO-heavy libraries need to be built for it, and they usually were not.
Since Node had no other option, all its IO libraries were evented/async/non-blocking from the get-go. I don't think this was a design choice so much as a design requirement/necessity.
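Concretely, the core Node IO APIs were callback-based (and are now also promise-based) from day one, so the event loop never blocks while the OS does the work. A sketch:

    import { readFile } from 'node:fs/promises';

    // Non-blocking: the read is handed to the OS/libuv and the event loop
    // keeps serving other requests until the result comes back.
    async function handle(): Promise<void> {
      const config = await readFile('config.json', 'utf8'); // hypothetical file
      console.log('loaded', config.length, 'chars');
    }

    // The blocking equivalent, readFileSync, would stall every other request
    // in the process for the duration of the read.
    handle().catch(console.error);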
The reality is that hardware-provided cache coherence is an extremely powerful paradigm. Building your application on top of message passing not only gives away some performance, but it means that if you have any sort of cross thread logical shared state that needs to be kept in sync, you have to implement cache coherence yourself, which is an extremely hard problem.
With my apologies to Greenspun, any sufficiently complicated distributed system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of MESI.
But of course, if you have a trivially parallel problem, rejoice! You do not need much communication and shared memory is not as useful. But not all, or even most, problems are trivially parallel.
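To make the "implement cache coherence yourself" point concrete, here is a hedged sketch (all names invented) of the minimum you sign up for once workers keep local copies of shared state: an owner that tracks every reader and broadcasts invalidations on each write. Hardware does this per cache line, in nanoseconds; once these calls become real messages between processes, the stale-read window in between is your problem:

    type Invalidation = { key: string };

    // A message-passing "worker" caching a value it read from the owner.
    class WorkerCache {
      private cache = new Map<string, number>();
      constructor(private fetchFromOwner: (key: string) => number) {}

      read(key: string): number {
        if (!this.cache.has(key)) this.cache.set(key, this.fetchFromOwner(key));
        return this.cache.get(key)!; // stale if an invalidation hasn't arrived yet
      }
      onInvalidate(msg: Invalidation): void {
        this.cache.delete(msg.key); // your hand-rolled MESI "Invalid" state
      }
    }

    class Owner {
      private data = new Map<string, number>([['x', 1]]);
      private readers: WorkerCache[] = [];
      attach(w: WorkerCache): void { this.readers.push(w); }
      get(key: string): number { return this.data.get(key) ?? 0; }
      write(key: string, value: number): void {
        this.data.set(key, value);
        for (const r of this.readers) r.onInvalidate({ key }); // broadcast on every write
      }
    }

    const owner = new Owner();
    const a = new WorkerCache((k) => owner.get(k));
    owner.attach(a);
    a.read('x');         // caches 1
    owner.write('x', 2); // the invalidation must reach every reader
    a.read('x');         // re-fetches 2; correctness now rests on your protocol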
In the 1990s it became "well known" that threading is virtually impossible for mere mortals. But this is a classic case of misdiagnosis. The problem wasn't threading. The problem was a lock-based threading model, where threading is achieved by identifying "critical sections" and trying to craft a system of locks that lets many threads run around the entire program's memory space and operate simultaneously.
This becomes exponentially complex and essentially infeasible fairly quickly. Even the programs of the time that "work" contain numerous bombs in their state space; they've just been ground out by effort.
But that's not the only way to write threaded code. You can go fully immutable, like Haskell. You can go full actor, like Erlang, where absolutely every variable is tied to an actor. You can write lock-based code in a way that never takes multiple simultaneous locks (which is where the real murder begins), using techniques like actors to avoid the need. There are a variety of other safe techniques.
I like to say that these take writing multithreaded code from exponential to polynomial, and a rather small polynomial at that. No, it isn't free, but it doesn't have to be insane, doesn't take a wizard, and is something that can be taught and learned with only a reasonable level of difficulty.
Indeed, when done correctly, it can be easier to understand than Node-style concurrency, which in the limit can get crazy with the scheduling you may need to do yourself. Sending a message to another actor is not that difficult to wrap your head around.
So the author is arguably correct, if you approach concurrency like it's 1999, but concurrency has moved on since then. Done properly, with time-tested techniques and safe practices, I find threaded concurrency much easier to deal with than async code, and generally higher-performing too.
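To make the actor point concrete: a minimal mailbox-style actor in TypeScript (a sketch of the idea, not Erlang semantics; the message shapes are invented). The state is owned by exactly one actor and is only ever touched while processing one message at a time, so there is nothing to lock:

    type Message =
      | { kind: 'deposit'; amount: number }
      | { kind: 'query'; reply: (balance: number) => void };

    class AccountActor {
      private balance = 0;             // owned exclusively by this actor
      private mailbox: Message[] = [];
      private running = false;

      send(msg: Message): void {
        this.mailbox.push(msg);
        if (!this.running) void this.drain();
      }

      private async drain(): Promise<void> {
        this.running = true;
        while (this.mailbox.length > 0) {
          const msg = this.mailbox.shift()!;
          // One message at a time: no interleaving, hence no locks.
          switch (msg.kind) {
            case 'deposit': this.balance += msg.amount; break;
            case 'query': msg.reply(this.balance); break;
          }
          await Promise.resolve(); // yield so other actors get a turn
        }
        this.running = false;
      }
    }

    const account = new AccountActor();
    account.send({ kind: 'deposit', amount: 50 });
    account.send({ kind: 'query', reply: (b) => console.log('balance:', b) });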
The choice is between a few days of messing around with actors while it still doesn't work, and 20 minutes rewriting with Executors and being done. The trick with threads is having a good set of primitives to work with, and Java gives you that. In some areas of software the idea of composing a minimal set of operations really gets you somewhere; when it comes to threads, it gets you to the painhouse.
I went through a phase of having a huge amount of fun writing little servers/clients with async Python, but switched to sync when the CPU demands increased. The idea that “parallelism” and “concurrency” aren't closely related is a bad idea, like the alleged clean split between “authentication” and “authorization” -- Java is great because it gives you 100% adequate tools that handle parallelism and concurrency with the same paradigm. [1]
[1] You could do error handling and teardown with monads, but drunk on the alleged superiority of a new programming paradigm, many people don't -- so you meet coders who travel from job to job like itinerant martial artists looking for functional programming enlightenment. TAOCP (Turing) stands the test of time whereas SICP (lambda calculus) is a fad.
But message passing is not a panacea. Sometimes shared mutable state is the solution that is simplest to implement and reason about. If you think about it, what are databases if not shared mutable state, and they have been wildly successful. The key, of course, is proper concurrency control abstractions.
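One classic concurrency-control abstraction over shared mutable state, sketched here with invented names, is the optimistic versioning that many databases expose as row versions or compare-and-swap: read a version, do the work, and retry if someone else committed first:

    interface Versioned<T> { value: T; version: number }

    const store = new Map<string, Versioned<number>>([['counter', { value: 0, version: 0 }]]);

    // Commit only if nobody wrote since we read; otherwise signal a conflict.
    function compareAndSwap(key: string, expected: number, next: number): boolean {
      const rec = store.get(key)!;
      if (rec.version !== expected) return false; // someone else won the race
      store.set(key, { value: next, version: expected + 1 });
      return true;
    }

    function incrementWithRetry(key: string): void {
      for (;;) {
        const { value, version } = store.get(key)!; // optimistic read
        if (compareAndSwap(key, version, value + 1)) return;
        // conflict: loop and retry against the fresh value
      }
    }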
As someone fairly well versed in MESI and cache optimization: it really isn't. It's a minority of people who understand it (and really, who need to).
> Building your application on top of message passing not only gives away some performance
This really isn't universally true either. If you're optimizing for throughput, pipelining with pinned threads + message passing is usually the way to go if the data model allows for it.
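A rough sketch of that shape in Node (which cannot pin threads to cores without native help, so take the pinning on faith; assumes this file is compiled to CommonJS before running). Each stage owns a thread and hands its output to the next stage as a message:

    import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';

    // Pipeline: [produce] -> parse stage -> sum stage -> [collect]
    if (isMainThread) {
      const parse = new Worker(__filename, { workerData: 'parse' });
      const sum = new Worker(__filename, { workerData: 'sum' });
      parse.on('message', (nums) => sum.postMessage(nums)); // stage 1 -> stage 2
      sum.on('message', (total) => console.log('total:', total));
      for (const line of ['1,2,3', '4,5,6']) parse.postMessage(line);
      // (A real pipeline would also send a shutdown message; workers keep the process alive.)
    } else if (workerData === 'parse') {
      parentPort!.on('message', (line: string) =>
        parentPort!.postMessage(line.split(',').map(Number)));
    } else {
      parentPort!.on('message', (nums: number[]) =>
        parentPort!.postMessage(nums.reduce((a, b) => a + b, 0)));
    }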
Instead of "just throw it on a thread and forget about it", in a production environment use a job queue. You gain isolation and observability: you can see the job parameters and know nothing else came across, except data from the DB etc.
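One concrete shape of that, using BullMQ as the example library (my choice for illustration; queue and job names are invented, and a local Redis is assumed):

    import { Queue, Worker } from 'bullmq';

    const connection = { host: 'localhost', port: 6379 }; // assumes a local Redis

    // Producer: instead of fire-and-forget on a thread, persist the job and its params.
    const emails = new Queue('emails', { connection });
    await emails.add('welcome', { userId: 42, template: 'welcome' }); // top-level await: ESM

    // Consumer: an isolated handler whose only inputs are the recorded job
    // parameters (plus whatever it loads from the DB itself); that is what
    // makes the job inspectable and replayable.
    new Worker('emails', async (job) => {
      console.log(`sending ${job.name} to user ${job.data.userId}`);
    }, { connection });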
I mean, if you go this route then you may as well say zero-copy doesn't exist: every time you move things between registers, things are copied. I guess OP also disables all their cores and runs their OS on a single core; it's more efficient after all. I take the other view: the more effective CPU time you can use, the better, for a ton of non-UI use cases. I would say it is more efficient to use 2x the CPU time to reduce wall clock time by 10%, for example. In fact, any CPU time that goes unused is inefficient in a way: it's just sitting there, forever lost to time.
I'm sure I've been doing it wrong. I just had better luck optimizing the performance per core rather than trying to spread the load over multiple cores.
CPU-bound applications MUST use multithreading to be able to utilize multiple cores. In many cases the framework provides an API that masks the need for the developer to set up a worker thread pool, as with web application frameworks - but eventually you need one.
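In Node, for example, the usual escape hatch is a worker-thread pool. A sketch using the piscina library (one common choice, not the only one; the worker filename and workload are invented):

    // main.ts: dispatch CPU-bound work to a pool sized to the machine's cores.
    import Piscina from 'piscina';
    import { resolve } from 'node:path';

    const pool = new Piscina({ filename: resolve(__dirname, 'hash-worker.js') });

    async function main(): Promise<void> {
      const inputs = [1, 2, 3, 4];
      const results = await Promise.all(
        inputs.map((n) => pool.run(n)) // each call runs on a pooled thread
      );
      console.log(results);
    }
    main().catch(console.error);

    // hash-worker.js (the worker file) exports the CPU-heavy function:
    // module.exports = (n) => { /* expensive computation */ return n * n; };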
Learn how to be an engineer, and use the right solution for the problem.