Posted by kelindar 5 days ago
Consumers subscribing to the same event type are placed in a group. There is a single lock for the whole group. When publishing, the lock is taken once and the event is replicated to each consumer's queue. Consumers take the lock and swap their entire queue buffer, which lets them consume up to 128 events per lock/unlock.
Since channels each have a lock and only take 1 element at a time, they would require a lot more locking and unlocking.
There is also some frequent polling to maintain group metadata, so this could be less ideal in low volume workloads where you want CPU to go to 0%.
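To make the swap idea concrete, here's a minimal sketch of how I read the description above (my own reconstruction, not kelindar/event's actual code; the `group`, `Publish`, and `Drain` names are made up):

```go
package main

import (
	"fmt"
	"sync"
)

// group fans one event type out to several consumer queues under a
// single lock (a reconstruction of the design described above).
type group[T any] struct {
	mu     sync.Mutex
	queues [][]T // one pending-event slice per consumer
}

// Publish takes the group lock once and replicates the event into
// every consumer's queue.
func (g *group[T]) Publish(ev T) {
	g.mu.Lock()
	for i := range g.queues {
		g.queues[i] = append(g.queues[i], ev)
	}
	g.mu.Unlock()
}

// Drain swaps consumer i's entire buffer out under the lock, so one
// lock/unlock pays for a whole batch of events.
func (g *group[T]) Drain(i int, recycled []T) []T {
	g.mu.Lock()
	out := g.queues[i]
	g.queues[i] = recycled[:0] // reuse the processed batch's storage
	g.mu.Unlock()
	return out
}

func main() {
	g := &group[int]{queues: make([][]int, 2)} // two consumers
	g.Publish(42)
	fmt.Println(g.Drain(0, nil)) // [42]
}
```

The consumer processes the returned slice in one go (up to 128 events per lock acquisition in the scheme described) and hands the emptied slice back on its next Drain call.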
For low-traffic messages you still only send one message at a time, but if the receiver slows down, the sender can avoid resorting to back pressure until the buffer is more than half full.
Use 1 ring buffer, and N read pointers. The writer determines how far it can write by taking the minimum of the read pointers. ... I think that's all you'd need? The writer will block if a reader is too slow, but any broadcast that preserves messages and has bounded memory will have that problem. Maybe the difficulties will be in adding and removing readers.
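Something like this, as a rough sketch of that scheme (single-writer assumption, spin-waiting instead of parking, and all names invented here):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Ring is one buffer shared by every reader; each reader keeps its
// own cursor, and the writer is gated by the slowest one.
type Ring[T any] struct {
	buf     []T
	write   atomic.Uint64    // next slot to be written
	readers []*atomic.Uint64 // one read cursor per consumer
}

// minReader returns the position of the slowest consumer.
func (r *Ring[T]) minReader() uint64 {
	min := r.readers[0].Load()
	for _, c := range r.readers[1:] {
		if v := c.Load(); v < min {
			min = v
		}
	}
	return min
}

// Publish spins while the slowest reader is a full lap behind,
// which is what bounds memory.
func (r *Ring[T]) Publish(v T) {
	w := r.write.Load()
	for w-r.minReader() >= uint64(len(r.buf)) {
		runtime.Gosched()
	}
	r.buf[w%uint64(len(r.buf))] = v
	r.write.Store(w + 1) // single writer: no CAS needed
}

// Consume returns the next value for reader i, spinning while the
// buffer looks empty from that reader's point of view.
func (r *Ring[T]) Consume(i int) T {
	c := r.readers[i]
	pos := c.Load()
	for pos >= r.write.Load() {
		runtime.Gosched()
	}
	v := r.buf[pos%uint64(len(r.buf))]
	c.Store(pos + 1)
	return v
}

func main() {
	r := &Ring[int]{
		buf:     make([]int, 8),
		readers: []*atomic.Uint64{new(atomic.Uint64), new(atomic.Uint64)},
	}
	go func() { r.Publish(7) }()
	fmt.Println(r.Consume(0), r.Consume(1)) // both readers see 7
}
```

Adding and removing readers is indeed the fiddly part: a new reader's cursor has to be registered at the current write position, and a departing reader's cursor has to be unregistered so it stops gating the writer.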
To a first approximation, you can treat any decently maintained concurrency primitive as already being extremely highly optimized, which means, on the flip side, that no additional capability, like "multi-to-multi thread communication", ever comes for free versus something that doesn't offer that capability. The key to high-performance concurrency is to use as little "concurrency power" as possible.
That's not a Go-specific thing, it's a general rule.
Channels are in some sense like dynamic scripting languages: they prioritize ease of use and flexibility over performance at all costs. They're a very powerful primitive, and convenient in their flexibility, but also a pretty big stick to hit a problem with. And just as dynamic scripting languages are suitable for many tasks despite not being the fastest things, in a lot of code channels aren't the performance problem. But if you are doing a ton of channel operations, and for some reason you can't do the easy thing of just sending more work at a time through them, you may need to figure out how to use simpler pieces to do what you want. A common example: if you've just got a counter of some kind, don't send a message through a channel to another goroutine to increment it; use the atomic increment operation in the sync/atomic package.
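For instance, a minimal sketch of the counter case (the `hits` name is made up):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var hits atomic.Int64 // shared counter; no channel or dedicated goroutine

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			hits.Add(1) // lock-free increment, safe from any goroutine
		}()
	}
	wg.Wait()
	fmt.Println(hits.Load()) // 100
}
```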
(If you need absolute performance, you probably don't want to use Go. The runtime locks you away from the very lowest level things like memory barriers; it uses them to implement its relatively simple memory model but you can't use them directly yourself. However, it is important to be sure that you do need such things before reaching for them.)
Multiple writers can send on a channel, but only one reader will receive any given message from the channel. This makes them unsuitable for the broadcast use case; the phrasing here makes Go channels sound more general-purpose than they are in practice.
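A quick way to see it (toy example, nothing to do with the library):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	ch := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for msg := range ch {
				fmt.Printf("reader %d got %q\n", id, msg)
			}
		}(i)
	}
	ch <- "hello" // exactly one of the three readers receives this
	close(ch)     // the other two just see the channel close
	wg.Wait()
}
```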
The easiest example to see is distributed versus non-distributed locks, since the two are so dramatically and obviously different in expense, but the principle extends all the way down to the different sorts of memory barriers, where different guarantees carry different costs.
When each of these is optimized to within an inch of its life, as they typically are, all the way down to the hardware level, stepping up to a higher guarantee level is never free.
Edit: https://news.ycombinator.com/item?id=44416345 seems to have done a much more detailed analysis of the code. There's likely more to this.
There are a lot of spinoff libraries out there that have provoked a reaction from the core team, cutting the cost of their implementation by 25-50%. And that's a rising tide that lifts all boats.
I was building a small multiplayer game in Go. Started with a channel fan-out but (for no particular reason) wanted to see if I could do better. Put together this tiny event bus to test, and on my i7-13700K it delivers events in 10-40ns, roughly 4-10x faster than the plain channel loop, depending on the configuration.
I'd be interested to learn why/how and what the underlying structural differences are that make this possible.
A different design, without channels, could improve on those.
In most cases where you want to send data between concurrent goroutines, channels are a better primitive, as they allow the sender and receiver to safely and concurrently process data without needing explicit locks. (Internally, channels are protected with mutexes, but that's a single, battle-tested and likely bug-free implementation shared by all users of channels.)
The fact that channels also block on send/receive and support buffering means that there's a lot more to them, but that's how you should think of them. The fact that channels look like a queue if you squint is a red herring that has caused many a junior developer to abuse them for that purpose, but they are a surprisingly poor fit for it. Even backpressure tends to be something you want to control manually (using intermediate buffers and so on), because channels can be fiendishly hard to debug once you chain more than a couple of them. Something forgets to close a channel, and your whole pipeline can stall. Channels are also slow, requiring mutex locking even in scenarios where the data isn't in need of locking and could just be passed directly between functions.
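For example, a minimal sketch of the forgotten-close failure mode:

```go
package main

import "fmt"

// producer sends its inputs and then closes the channel; drop the
// deferred close and the range in main blocks forever once the values
// are drained (the classic stalled-pipeline bug).
func producer(nums []int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch) // forget this and the consumer deadlocks
		for _, n := range nums {
			ch <- n
		}
	}()
	return ch
}

func main() {
	for n := range producer([]int{1, 2, 3}) {
		fmt.Println(n)
	}
}
```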
Lots of libraries (such as Rill and go-stream) have sprung up that wrap channels to model data pipelines (especially with generics, it's become easier to build generic operators like deduping, fan-out, buffering and so on), but I've found them to be a bad idea. Channels should remain a low-level primitive for building pipelines, not your main API surface.
I remember hearing (not sure where) that this is a lesson that was learned early on in Go. Channels were the new hotness, so let's use them to do things that were not possible before. But it turned out that Go was better for doing what was already possible before, but more cleanly.
I'm a bit out of practice with Go, but I never thought channels were "slow", so getting 4-10x the speed is pretty impressive. I wonder if it shares any design with the LMAX Disruptor...
I've recently switched from using Disruptor.NET to Channel<T> in many of my .NET implementations that require inter-thread sync primitives. Disruptor can be faster, but I really like the semantics of the built-in types.
https://learn.microsoft.com/en-us/dotnet/core/extensions/cha...
https://learn.microsoft.com/en-us/dotnet/api/system.threadin...
I personally will use traditional Java BlockingQueue for about 95% of stuff, since they're built in and more than fast enough for nearly everything, but Disruptor kicks its ass when dealing with high-throughput stuff.
https://medium.com/@ocoanet/improving-net-disruptor-performa...
Wow - that’s a pretty impressive accomplishment. I’ve been meaning to move some workers I have to a pub/sub on https://www.typequicker.com.
I might try using this in prod. I don't really need the insane performance benefits as I don't have that kind of traffic lol - but I always like experimenting with new open source libraries - especially while the site isn't very large yet
> btw do some tech twitter promos.
Yes - I plan to. Still relatively new to the marketing, SEO, etc. world. I recently quit my job to build products (TypeQuicker is the first in line) and up until now I've only ever done software dev work.
How would you suggest doing twitter promos - just post consistently about the app features and such?
https://github.com/picosh/pubsub
With this impl can you stream data or is it just for individual events?