Posted by kelindar 5 days ago
Consumers subscribing to the same event type are placed in a group. There is a single lock for the whole group. When publishing, the lock is taken once and the event is replicated to each consumer's queue. Consumers take the lock and swap their entire queue buffer, which lets them consume up to 128 events per lock/unlock.
Since channels each have a lock and only take 1 element at a time, they would require a lot more locking and unlocking.
There is also some frequent polling to maintain group metadata, so this could be less ideal in low volume workloads where you want CPU to go to 0%.
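To make the swap idea concrete, here's a minimal sketch of how I read the description above (my own reconstruction, not kelindar/event's actual code; the `group`, `Publish`, and `Drain` names are made up):

```go
package main

import (
	"fmt"
	"sync"
)

// group fans one event type out to several consumer queues under a
// single lock (a reconstruction of the design described above).
type group[T any] struct {
	mu     sync.Mutex
	queues [][]T // one pending-event slice per consumer
}

// Publish takes the group lock once and replicates the event into
// every consumer's queue.
func (g *group[T]) Publish(ev T) {
	g.mu.Lock()
	for i := range g.queues {
		g.queues[i] = append(g.queues[i], ev)
	}
	g.mu.Unlock()
}

// Drain swaps consumer i's entire buffer out under the lock, so one
// lock/unlock pays for a whole batch of events.
func (g *group[T]) Drain(i int, recycled []T) []T {
	g.mu.Lock()
	out := g.queues[i]
	g.queues[i] = recycled[:0] // reuse the processed batch's storage
	g.mu.Unlock()
	return out
}

func main() {
	g := &group[int]{queues: make([][]int, 2)} // two consumers
	g.Publish(42)
	fmt.Println(g.Drain(0, nil)) // [42]
}
```

The consumer processes the returned slice in one go (up to 128 events per lock acquisition in the scheme described) and hands the emptied slice back on its next Drain call.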
For low-traffic messages you still only send one message at a time, but if the receiver slows down, the sender can avoid resorting to back pressure until the buffer is more than half full.
Use 1 ring buffer, and N read pointers. The writer determines how far it can write by taking the minimum of the read pointers. ... I think that's all you'd need? The writer will block if a reader is too slow, but any broadcast that preserves messages and has bounded memory will have that problem. Maybe the difficulties will be in adding and removing readers.
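Something like this, as a rough sketch of that scheme (single-writer assumption, spin-waiting instead of parking, and all names invented here):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Ring is one buffer shared by every reader; each reader keeps its
// own cursor, and the writer is gated by the slowest one.
type Ring[T any] struct {
	buf     []T
	write   atomic.Uint64    // next slot to be written
	readers []*atomic.Uint64 // one read cursor per consumer
}

// minReader returns the position of the slowest consumer.
func (r *Ring[T]) minReader() uint64 {
	min := r.readers[0].Load()
	for _, c := range r.readers[1:] {
		if v := c.Load(); v < min {
			min = v
		}
	}
	return min
}

// Publish spins while the slowest reader is a full lap behind,
// which is what bounds memory.
func (r *Ring[T]) Publish(v T) {
	w := r.write.Load()
	for w-r.minReader() >= uint64(len(r.buf)) {
		runtime.Gosched()
	}
	r.buf[w%uint64(len(r.buf))] = v
	r.write.Store(w + 1) // single writer: no CAS needed
}

// Consume returns the next value for reader i, spinning while the
// buffer looks empty from that reader's point of view.
func (r *Ring[T]) Consume(i int) T {
	c := r.readers[i]
	pos := c.Load()
	for pos >= r.write.Load() {
		runtime.Gosched()
	}
	v := r.buf[pos%uint64(len(r.buf))]
	c.Store(pos + 1)
	return v
}

func main() {
	r := &Ring[int]{
		buf:     make([]int, 8),
		readers: []*atomic.Uint64{new(atomic.Uint64), new(atomic.Uint64)},
	}
	go func() { r.Publish(7) }()
	fmt.Println(r.Consume(0), r.Consume(1)) // both readers see 7
}
```

Adding and removing readers is indeed the fiddly part: a new reader's cursor has to be registered at the current write position, and a departing reader's cursor has to be unregistered so it stops gating the writer.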
To a first approximation, you can treat any decently maintained concurrency primitive as already being extremely highly optimized, which means, on the flip side, that no additional capability, like "multi-to-multi thread communication", ever comes for free versus something that doesn't offer that capability. The key to high-performance concurrency is to use as little "concurrency power" as possible.
That's not a Go-specific thing, it's a general rule.
Channels are in some sense like dynamic scripting languages: they prioritize ease of use and flexibility over performance at all costs. They're a very powerful primitive, and convenient in their flexibility, but also a pretty big stick to hit a problem with. And just as dynamic scripting languages are suitable for many tasks despite not being the fastest things, in a lot of code channels aren't the performance problem. But if you are doing a ton of channel operations, and for some reason you can't do the easy thing of just sending more work at a time through them, you may need to figure out how to use simpler pieces to do what you want. A common example: if you've just got a counter of some kind, don't send a message through a channel to another goroutine to increment it; use the atomic increment operation in the sync/atomic package.
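For instance, a minimal sketch of the counter case (the `hits` name is made up):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var hits atomic.Int64 // shared counter; no channel or dedicated goroutine

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			hits.Add(1) // lock-free increment, safe from any goroutine
		}()
	}
	wg.Wait()
	fmt.Println(hits.Load()) // 100
}
```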
(If you need absolute performance, you probably don't want to use Go. The runtime locks you away from the very lowest level things like memory barriers; it uses them to implement its relatively simple memory model but you can't use them directly yourself. However, it is important to be sure that you do need such things before reaching for them.)
Multiple writers can send on a channel, but only one reader will receive any given message from the channel. This makes them unsuitable for the broadcast use case; the phrasing here makes Go channels sound more general-purpose than they are in practice.
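A quick way to see it (toy example, nothing to do with the library):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	ch := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for msg := range ch {
				fmt.Printf("reader %d got %q\n", id, msg)
			}
		}(i)
	}
	ch <- "hello" // exactly one of the three readers receives this
	close(ch)     // the other two just see the channel close
	wg.Wait()
}
```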
The easiest example to see is distributed versus non-distributed locks, since the two are so dramatically and obviously different in expense, but the principle extends all the way down to the different sorts of memory barriers, where different guarantees carry different costs.
When each of these is optimized to within an inch of its life, as they typically are, all the way down to the hardware level, stepping up to a higher guarantee level is never free.
Edit: https://news.ycombinator.com/item?id=44416345 seems to have done a much more detailed analysis of the code. There's likely more to this.
There are a lot of spinoff libraries out there that have provoked a reaction from the core team, cutting the cost of their implementation by 25-50%. And that's a rising tide that lifts all boats.
I was building a small multiplayer game in Go. Started with a channel fan-out but (for no particular reason) wanted to see if I could do better. Put together this tiny event bus to test, and on my i7-13700K it delivers events in 10-40ns, roughly 4-10x faster than the plain channel loop, depending on the configuration.
I'd be interested to learn why/how and what the underlying structural differences are that make this possible.
A different design, without channels, could improve on those.
In most cases where you want to send data between concurrent goroutines, channels are a better primitive, as they allow the sender and receiver to safely and concurrently process data without needing explicit locks. (Internally, channels are protected with mutexes, but that's a single, battle-tested and likely bug-free implementation shared by all users of channels.)
The fact that channels also block on send/receive and support buffering means that there's a lot more to them, but that's how you should think of them. The fact that channels look like a queue if you squint is a red herring that has caused many a junior developer to abuse them for that purpose, but they are a surprisingly poor fit for it. Even backpressure tends to be something you want to control manually (using intermediate buffers and so on), because channels can be fiendishly hard to debug once you chain more than a couple of them. Something forgets to close a channel, and your whole pipeline can stall. Channels are also slow, requiring mutex locking even in scenarios where the data isn't in need of locking and could just be passed directly between functions.
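For example, a minimal sketch of the forgotten-close failure mode:

```go
package main

import "fmt"

// producer sends its inputs and then closes the channel; drop the
// deferred close and the range in main blocks forever once the values
// are drained (the classic stalled-pipeline bug).
func producer(nums []int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch) // forget this and the consumer deadlocks
		for _, n := range nums {
			ch <- n
		}
	}()
	return ch
}

func main() {
	for n := range producer([]int{1, 2, 3}) {
		fmt.Println(n)
	}
}
```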
Lots of libraries (such as Rill and go-stream) have sprung up that wrap channels to model data pipelines (especially with generics, it's become easier to build generic operators like deduping, fan-out, buffering and so on), but I've found them to be a bad idea. Channels should remain a low-level primitive for building pipelines, not your main API surface.
I remember hearing (not sure where) that this is a lesson that was learned early on in Go. Channels were the new hotness, so let's use them to do things that were not possible before. But it turned out that Go was better for doing what was already possible before, but more cleanly.
I'm a bit out of practice with Go, but I never thought channels were "slow", so getting 4-10x the speed is pretty impressive. I wonder if it shares any design with the LMAX Disruptor...
I've recently switched from using Disruptor.NET to Channel<T> in many of my .NET implementations that require inter-thread sync primitives. Disruptor can be faster, but I really like the semantics of the built-in types.
https://learn.microsoft.com/en-us/dotnet/core/extensions/cha...
https://learn.microsoft.com/en-us/dotnet/api/system.threadin...
I personally will use traditional Java BlockingQueue for about 95% of stuff, since they're built in and more than fast enough for nearly everything, but Disruptor kicks its ass when dealing with high-throughput stuff.
https://medium.com/@ocoanet/improving-net-disruptor-performa...
Wow - that’s a pretty impressive accomplishment. I’ve been meaning to move some workers I have to a pub/sub on https://www.typequicker.com.
I might try using this in prod. I don't really need the insane performance benefits as I don't have that kind of traffic lol - but I always like experimenting with new open source libraries - especially while the site isn't very large yet
> btw do some tech twitter promos.
Yes - I plan to. Still relatively new to the marketing, SEO, etc. world. I recently quit my job to build products (TypeQuicker is the first in line) and up until now I've only ever done software dev work.
How would you suggest doing twitter promos - just post consistently about the app features and such?
https://github.com/picosh/pubsub
With this impl can you stream data or is it just for individual events?