Posted by gw2 4/2/2025
Async/promise/deferred code just re-implements separate chains of control flow via callbacks, and where IO is concerned that is not so different from threads. You'll still need mutexes, semaphores and such.
That's why things like async-mutex exist, and it's not just a JavaScript problem: Python's Twisted has them too, as DeferredLock and DeferredSemaphore: https://docs.twistedmatrix.com/en/stable/api/twisted.interne...
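For example, with the async-mutex package (a minimal sketch; the balance scenario and function names are invented for illustration):

    import { Mutex } from 'async-mutex';

    // Even in single-threaded Node, two awaits inside one logical operation
    // let other tasks interleave between them, so read-modify-write sequences
    // on shared state still need a critical section.
    const balanceLock = new Mutex();
    let balance = 100; // hypothetical shared state

    async function settleWithBank(amount: number): Promise<void> {
      return new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for real IO
    }

    async function withdraw(amount: number): Promise<void> {
      await balanceLock.runExclusive(async () => {
        const current = balance;      // read
        await settleWithBank(amount); // suspension point: other tasks run here
        balance = current - amount;   // write, safe only because we hold the lock
      });
    }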
> The best design is the one where complexity is kept minimal, and where locality is kept maximum. That is where you get to write code that is easy to understand without having these bottomless holes of mindbogglingly complex CPU-dependent memory barrier behaviors. These designs are the easiest to deploy and write. You just make your load balancer cut the problem in isolated sections and spawn as many threads or processes of your entire single threaded program as needed
Wholeheartedly agree. That's exactly how Elixir and Erlang processes work, they are small, lightweight and have isolated heaps.
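In Node terms, the closest built-in analogue to that design is the cluster module; Erlang processes are far lighter than OS processes, but the shape is the same. A minimal sketch:

    import cluster from 'node:cluster';
    import http from 'node:http';
    import os from 'node:os';

    if (cluster.isPrimary) {
      // One isolated copy of the whole single-threaded program per core;
      // the primary distributes incoming connections across the workers.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    } else {
      http
        .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
        .listen(3000);
    }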
If the author wrote that sincerely, they should promptly drop Node.js on the floor and start learning Elixir.
I don't think so. V8 is a marvel of engineering. JS is the problem (a quirky-to-downright-ugly language that is also extremely ubiquitous), not Node.
> But it was made with one very accurate observation: multithreading sucks even more.
Was it made with that observation? (If so, I would like a source to corroborate it.) Or was it simply that when all you have is a hammer (single-threaded execution), everything looks like a nail (you're going to solve every problem within that single thread)?
The evented/async/non-blocking style of programming that you need when serving many requests from one thread already existed before Node; it just wasn't that popular. When you choose this style, all your IO-heavy libraries need to be built for it, and they usually were not.
Since Node had no other option, all its IO libraries were evented/async/non-blocking from the get-go. I don't think this was a design choice so much as a design requirement/necessity.
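Concretely, the core Node IO APIs were callback-based (and are now also promise-based) from day one, so the event loop never blocks while the OS does the work. A sketch:

    import { readFile } from 'node:fs/promises';

    // Non-blocking: the read is handed to the OS/libuv and the event loop
    // keeps serving other requests until the result comes back.
    async function handle(): Promise<void> {
      const config = await readFile('config.json', 'utf8'); // hypothetical file
      console.log('loaded', config.length, 'chars');
    }

    // The blocking equivalent, readFileSync, would stall every other request
    // in the process for the duration of the read.
    handle().catch(console.error);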
The reality is that hardware-provided cache coherence is an extremely powerful paradigm. Building your application on top of message passing not only gives away some performance, but it means that if you have any sort of cross thread logical shared state that needs to be kept in sync, you have to implement cache coherence yourself, which is an extremely hard problem.
With my apologies to Greenspun, any sufficiently complicated distributed system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of MESI.
But of course, if you have a trivially parallel problem, rejoice! You do not need much communication and shared memory is not as useful. But not all, or even most, problems are trivially parallel.
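To make the "implement cache coherence yourself" point concrete, here is a hedged sketch (all names invented) of the minimum you sign up for once workers keep local copies of shared state: an owner that tracks every reader and broadcasts invalidations on each write. Hardware does this per cache line, in nanoseconds; once these calls become real messages between processes, the stale-read window in between is your problem:

    type Invalidation = { key: string };

    // A message-passing "worker" caching a value it read from the owner.
    class WorkerCache {
      private cache = new Map<string, number>();
      constructor(private fetchFromOwner: (key: string) => number) {}

      read(key: string): number {
        if (!this.cache.has(key)) this.cache.set(key, this.fetchFromOwner(key));
        return this.cache.get(key)!; // stale if an invalidation hasn't arrived yet
      }
      onInvalidate(msg: Invalidation): void {
        this.cache.delete(msg.key); // your hand-rolled MESI "Invalid" state
      }
    }

    class Owner {
      private data = new Map<string, number>([['x', 1]]);
      private readers: WorkerCache[] = [];
      attach(w: WorkerCache): void { this.readers.push(w); }
      get(key: string): number { return this.data.get(key) ?? 0; }
      write(key: string, value: number): void {
        this.data.set(key, value);
        for (const r of this.readers) r.onInvalidate({ key }); // broadcast on every write
      }
    }

    const owner = new Owner();
    const a = new WorkerCache((k) => owner.get(k));
    owner.attach(a);
    a.read('x');         // caches 1
    owner.write('x', 2); // the invalidation must reach every reader
    a.read('x');         // re-fetches 2; correctness now rests on your protocol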
In the 1990s it became "well known" that threading is virtually impossible for mere mortals. But this is a classic case of misdiagnosis. The problem wasn't threading. The problem was a lock-based threading model, where threading is achieved by identifying "critical sections" and trying to craft a system of locks that lets many threads run around the entire program's memory space and operate simultaneously.
This becomes exponentially complex and essentially infeasible fairly quickly. Even the programs of the time that "work" contain numerous bombs in their state space; they've just been ground out by effort.
But that's not the only way to write threaded code. You can go fully immutable, like Haskell. You can go full actor, like Erlang, where absolutely every variable is tied to an actor. You can write lock-based code in a way that never takes multiple simultaneous locks (which is where the real murder begins), using techniques like actors to avoid the need. There are a variety of other safe techniques.
I like to say that these take writing multithreaded code from exponential to polynomial, and a rather small polynomial at that. No, it isn't free, but it doesn't have to be insane, doesn't take a wizard, and is something that can be taught and learned with only a reasonable level of difficulty.
Indeed, when done correctly, it can be easier to understand than Node-style concurrency, which in the limit can get crazy with the scheduling you may need to do yourself. Sending a message to another actor is not that difficult to wrap your head around.
So the author is arguably correct, if you approach concurrency like it's 1999, but concurrency has moved on since then. Done properly, with time-tested techniques and safe practices, I find threaded concurrency much easier to deal with than async code, and generally higher-performing too.
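To make the actor point concrete: a minimal mailbox-style actor in TypeScript (a sketch of the idea, not Erlang semantics; the message shapes are invented). The state is owned by exactly one actor and is only ever touched while processing one message at a time, so there is nothing to lock:

    type Message =
      | { kind: 'deposit'; amount: number }
      | { kind: 'query'; reply: (balance: number) => void };

    class AccountActor {
      private balance = 0;             // owned exclusively by this actor
      private mailbox: Message[] = [];
      private running = false;

      send(msg: Message): void {
        this.mailbox.push(msg);
        if (!this.running) void this.drain();
      }

      private async drain(): Promise<void> {
        this.running = true;
        while (this.mailbox.length > 0) {
          const msg = this.mailbox.shift()!;
          // One message at a time: no interleaving, hence no locks.
          switch (msg.kind) {
            case 'deposit': this.balance += msg.amount; break;
            case 'query': msg.reply(this.balance); break;
          }
          await Promise.resolve(); // yield so other actors get a turn
        }
        this.running = false;
      }
    }

    const account = new AccountActor();
    account.send({ kind: 'deposit', amount: 50 });
    account.send({ kind: 'query', reply: (b) => console.log('balance:', b) });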
The choice is between a few days of messing around with actors while it still doesn't work, and 20 minutes rewriting with Executors and being done. The trick with threads is having a good set of primitives to work with, and Java gives you that. In some areas of software the idea of composing a minimal set of operations really gets you somewhere; when it comes to threads, it gets you to the painhouse.
I went through a phase of having a huge amount of fun writing little servers/clients with async Python, but switched to sync when the CPU demands increased. The idea that “parallelism” and “concurrency” aren't closely related is a bad idea, like the alleged clean split between “authentication” and “authorization” -- Java is great because it gives you 100% adequate tools that handle parallelism and concurrency with the same paradigm. [1]
[1] You could do error handling and teardown with monads, but drunk on the alleged superiority of a new programming paradigm, many people don't -- so you meet coders who travel from job to job like itinerant martial artists looking for functional programming enlightenment. TAOCP (Turing) stands the test of time whereas SICP (lambda calculus) is a fad.
But message passing is not a panacea. Sometimes shared mutable state is the solution that is simplest to implement and reason about. If you think about it, what are databases if not shared mutable state, and they have been wildly successful. The key, of course, is proper concurrency control abstractions.
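One classic concurrency-control abstraction over shared mutable state, sketched here with invented names, is the optimistic versioning that many databases expose as row versions or compare-and-swap: read a version, do the work, and retry if someone else committed first:

    interface Versioned<T> { value: T; version: number }

    const store = new Map<string, Versioned<number>>([['counter', { value: 0, version: 0 }]]);

    // Commit only if nobody wrote since we read; otherwise signal a conflict.
    function compareAndSwap(key: string, expected: number, next: number): boolean {
      const rec = store.get(key)!;
      if (rec.version !== expected) return false; // someone else won the race
      store.set(key, { value: next, version: expected + 1 });
      return true;
    }

    function incrementWithRetry(key: string): void {
      for (;;) {
        const { value, version } = store.get(key)!; // optimistic read
        if (compareAndSwap(key, version, value + 1)) return;
        // conflict: loop and retry against the fresh value
      }
    }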
As someone fairly well versed in MESI and cache optimization: it really isn't. It's a minority of people who understand it (and really, who need to).
> Building your application on top of message passing not only gives away some performance
This really isn't universally true either. If you're optimizing for throughput, pipelining with pinned threads + message passing is usually the way to go if the data model allows for it.
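A rough sketch of that shape in Node (which cannot pin threads to cores without native help, so take the pinning on faith; assumes this file is compiled to CommonJS before running). Each stage owns a thread and hands its output to the next stage as a message:

    import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';

    // Pipeline: [produce] -> parse stage -> sum stage -> [collect]
    if (isMainThread) {
      const parse = new Worker(__filename, { workerData: 'parse' });
      const sum = new Worker(__filename, { workerData: 'sum' });
      parse.on('message', (nums) => sum.postMessage(nums)); // stage 1 -> stage 2
      sum.on('message', (total) => console.log('total:', total));
      for (const line of ['1,2,3', '4,5,6']) parse.postMessage(line);
      // (A real pipeline would also send a shutdown message; workers keep the process alive.)
    } else if (workerData === 'parse') {
      parentPort!.on('message', (line: string) =>
        parentPort!.postMessage(line.split(',').map(Number)));
    } else {
      parentPort!.on('message', (nums: number[]) =>
        parentPort!.postMessage(nums.reduce((a, b) => a + b, 0)));
    }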
Instead of "just throw it on a thread and forget about it", in a production environment use a job queue. You gain isolation and observability: you can see the job parameters and know nothing else came across, except data from the DB etc.
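One concrete shape of that, using BullMQ as the example library (my choice for illustration; queue and job names are invented, and a local Redis is assumed):

    import { Queue, Worker } from 'bullmq';

    const connection = { host: 'localhost', port: 6379 }; // assumes a local Redis

    // Producer: instead of fire-and-forget on a thread, persist the job and its params.
    const emails = new Queue('emails', { connection });
    await emails.add('welcome', { userId: 42, template: 'welcome' }); // top-level await: ESM

    // Consumer: an isolated handler whose only inputs are the recorded job
    // parameters (plus whatever it loads from the DB itself); that is what
    // makes the job inspectable and replayable.
    new Worker('emails', async (job) => {
      console.log(`sending ${job.name} to user ${job.data.userId}`);
    }, { connection });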
I mean, if you go this route then you may as well say zero-copy doesn't exist: every time you move things between registers, things are copied. I guess OP also disables all their cores and runs their OS on a single core; it's more efficient after all. I take the other view: the more effective CPU time you can use, the better, for a ton of non-UI use cases. I would say it is more efficient to use 2x the CPU time to reduce wall clock time by 10%, for example. In fact, any CPU time that goes unused is inefficient in a way: it's just sitting there, forever lost to time.
I'm sure I've been doing it wrong. I just had better luck optimizing the performance per core rather than trying to spread the load over multiple cores.
CPU-bound applications MUST use multithreading to be able to utilize multiple cores. In many cases the framework provides an API that masks the need for the developer to set up a worker thread pool, as with web application frameworks - but eventually you need one.
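In Node, for example, the usual escape hatch is a worker-thread pool. A sketch using the piscina library (one common choice, not the only one; the worker filename and workload are invented):

    // main.ts: dispatch CPU-bound work to a pool sized to the machine's cores.
    import Piscina from 'piscina';
    import { resolve } from 'node:path';

    const pool = new Piscina({ filename: resolve(__dirname, 'hash-worker.js') });

    async function main(): Promise<void> {
      const inputs = [1, 2, 3, 4];
      const results = await Promise.all(
        inputs.map((n) => pool.run(n)) // each call runs on a pooled thread
      );
      console.log(results);
    }
    main().catch(console.error);

    // hash-worker.js (the worker file) exports the CPU-heavy function:
    // module.exports = (n) => { /* expensive computation */ return n * n; };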
Learn how to be an engineer, and use the right solution for the problem.