Posted by gw2 1 day ago
The reality is that hardware-provided cache coherence is an extremely powerful paradigm. Building your application on top of message passing not only gives away some performance; it also means that if you have any sort of cross-thread logical shared state that needs to be kept in sync, you have to implement cache coherence yourself, which is an extremely hard problem.
With my apologies to Greenspun, any sufficiently complicated distributed system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of MESI.
But of course, if you have a trivially parallel problem, rejoice! You do not need much communication and shared memory is not as useful. But not all, or even most, problems are trivially parallel.
In the 1990s it became "well known" that threading is virtually impossible for mere mortals. But this is a classic case of misdiagnosis. The problem wasn't threading. The problem was a lock-based threading model, where threading is achieved by identifying "critical sections" and trying to craft a system of locks that lets many threads run around the entire program's memory space and operate simultaneously.
This becomes exponentially complex and essentially infeasible fairly quickly. Even the programs of the time that "work" contain numerous bombs in their state space; the bombs that surfaced have simply been ground out by effort.
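To make those "bombs" concrete, here is a minimal sketch (my own illustration, in Go) of the classic lock-ordering hazard this style invites:

```go
package main

import "sync"

// Illustration only: two accounts, each guarded by its own lock.
// transfer() takes both locks in argument order, so two concurrent
// transfers in opposite directions can each grab one lock and then
// block forever waiting for the other.
type account struct {
	mu      sync.Mutex
	balance int
}

func transfer(from, to *account, amount int) {
	from.mu.Lock()
	defer from.mu.Unlock()
	to.mu.Lock() // second lock: this is where the deadlock potential lives
	defer to.mu.Unlock()
	from.balance -= amount
	to.balance += amount
}

func main() {
	x, y := &account{balance: 100}, &account{balance: 100}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); transfer(x, y, 10) }() // locks x, then y
	go func() { defer wg.Done(); transfer(y, x, 10) }() // locks y, then x
	wg.Wait() // may never return; nothing in the code marks the hazard
}
```

Nothing flags the bug at compile time or in review; it only fires when the two acquisition orders interleave just so.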
But that's not the only way to write threaded code. You can go full immutable like Haskell. You can go full actor like Erlang, where absolutely every variable is tied to an actor. You can write lock-based code such that you never take multiple simultaneous locks (which is where the real murder begins), using techniques like actors to sidestep the need. There's a variety of other safe techniques.
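For contrast, here is a minimal actor-style sketch in Go (illustrative only): one goroutine owns the state outright, everything else talks to it through channels, and no locks appear anywhere:

```go
package main

import "fmt"

// Actor-style sketch (illustrative): the counter is owned by exactly
// one goroutine, and all access goes through messages, so no two
// goroutines ever touch the same memory at the same time.

type getMsg struct{ reply chan int }

func counterActor(add <-chan int, get <-chan getMsg) {
	count := 0 // private state; nothing outside this goroutine can reach it
	for {
		select {
		case n := <-add:
			count += n
		case m := <-get:
			m.reply <- count
		}
	}
}

func main() {
	add := make(chan int)
	get := make(chan getMsg)
	go counterActor(add, get)

	for i := 0; i < 10; i++ {
		add <- 1 // "send a message" is the entire concurrency model
	}
	reply := make(chan int)
	get <- getMsg{reply}
	fmt.Println(<-reply) // prints 10
}
```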
I like to say that these take writing multithreaded code from exponential to polynomial, and a rather small polynomial at that. No, it isn't free, but it doesn't have to be insane, doesn't take a wizard, and is something that can be taught and learned with only a reasonable level of difficulty.
Indeed, when done correctly, it can be easier to understand than Node-style concurrency, which in the limit can get crazy with the requisite scheduling you may need to do. Sending a message to another actor is not that difficult to wrap your head around.
So the author is arguably correct, if you approach concurrency like it's 1999, but concurrency has moved on since then. When it's done properly, with time-tested techniques and safe practices, I find threaded concurrency much easier to deal with than async code, and generally higher performance too.
As someone fairly well versed in MESI and cache optimization: it really isn't. It's a minority of people that understand it (and really, that need to).
> Building your application on top of message passing not only gives away some performance
This really isn't universally true either. If you're optimizing for throughput, pipelining with pinned threads + message passing is usually the way to go if the data model allows for it.
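A rough Go sketch of that shape (the stages and workloads are invented for illustration; real core pinning would need OS-specific affinity calls on top of runtime.LockOSThread):

```go
package main

import (
	"fmt"
	"runtime"
)

// Pipeline sketch (stages and workloads invented for illustration):
// each stage is a goroutine locked to its own OS thread, and records
// flow one way through buffered channels, so each value has exactly
// one owner at a time.

func stage(in <-chan int, out chan<- int, f func(int) int) {
	runtime.LockOSThread() // keep this stage on one OS thread
	for v := range in {
		out <- f(v)
	}
	close(out)
}

func main() {
	src := make(chan int, 1024)
	mid := make(chan int, 1024)
	dst := make(chan int, 1024)

	go stage(src, mid, func(v int) int { return v * 2 }) // e.g. parse
	go stage(mid, dst, func(v int) int { return v + 1 }) // e.g. score

	go func() {
		for i := 0; i < 5; i++ {
			src <- i
		}
		close(src)
	}()
	for v := range dst {
		fmt.Println(v)
	}
}
```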
Instead of "just throw it on a thread and forget about it" - in a production environment, use the job queue. You gain isolation and observability - you can see the job parameters and know nothing else came across, except data from the DB etc.
I'm sure I've been doing it wrong. I just had better luck optimizing the performance per core rather than trying to spread the load over multiple cores.
CPU-bound applications MUST use multithreading to be able to utilize multiple cores. In many cases the framework can give the developer an API that masks the need to set up a worker thread pool, as with web application frameworks - but eventually you need one.
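Roughly the pool such a framework sets up behind the scenes, as a sketch assuming CPU-bound tasks and one worker per core:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Worker-pool sketch (illustration only): one worker per core draining
// a shared channel of CPU-bound tasks, which is what lets the work
// actually occupy every core.

func expensive(n int) int { // stand-in for real CPU-bound work
	sum := 0
	for i := 0; i < n; i++ {
		sum += i * i
	}
	return sum
}

func main() {
	tasks := make(chan int, 64)
	results := make(chan int, 64)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ { // one worker per core
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range tasks {
				results <- expensive(n)
			}
		}()
	}

	go func() {
		for i := 0; i < 100; i++ {
			tasks <- 1_000_000
		}
		close(tasks)
	}()

	go func() { wg.Wait(); close(results) }()

	total := 0
	for r := range results {
		total += r
	}
	fmt.Println("checksum:", total)
}
```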
Learn how to be an engineer, and use the right solution for the problem.
Avoiding multi-threading doesn't remove concurrency issues. It just moves them to a different point in the application's execution: a point where you don't have debuggers and need to create overly fault-tolerant behavior for everything. This is bad for performance but worse for debugging. With a regular synchronous threaded application I have a clean, obvious stack trace to a failure in most cases. Asynchronous or process-based code gives me almost nothing for production failures or even regular debugging.