I gave up on self-hosted Sentry (2024)

Posted by roywashere 4/18/2025

I gave up on self-hosted Sentry (2024)(www.bugsink.com)

186 points | 150 comments

Weryj 4/18/2025|

We self-host sentry in Hetzner, but with a high-end server. 96c, 512gb. It ends up only costing around $300 a month, however with the scale of events that it processes, the managed version would be in the 10's of thousands.

The overhead at low volume is pretty high, but in the higher volumes (25M transactions/24h) it's a massive cost saving for us.

Edit:

There were just some initial headaches with needing to increase kafka partitions and add replications to the transaction processors, otherwise we didn't quite leverage the available compute and the backpressure would fill Redis up until OOM.
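For a rough sense of scale, the figures in this comment can be turned into per-second rates. The sketch below is plain arithmetic. The per-consumer throughput and the consumer counts are hypothetical numbers, chosen only to illustrate why too few consumers lets a backlog keep growing in whatever buffer sits in front of them.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(events: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Average event rate over the given window."""
    return events / seconds


def backlog_growth(ingest_rate: float, consumers: int, rate_per_consumer: float) -> float:
    """Events per second piling up in the buffer (0.0 if consumers keep up)."""
    return max(0.0, ingest_rate - consumers * rate_per_consumer)


# The 25M transactions/24h figure quoted in the comment above.
ingest = per_second(25_000_000)
print(f"average ingest: {ingest:.0f} events/s")

# Hypothetical assumption: each consumer handles 50 events/s.
for consumers in (2, 4, 6):
    growth = backlog_growth(ingest, consumers, rate_per_consumer=50.0)
    print(f"{consumers} consumers -> backlog grows by {growth:.0f} events/s")
```

Averages hide peaks, so a real deployment would need headroom beyond what this suggests.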

pebble 4/18/2025||

Same here with the community maintained Helm chart. Not the easiest thing but quite reasonable for almost two years now. This is for 50M transactions per month and we're seeing massive cost savings compared to SaaS at this volume as well.

For those interested in only errors, the self-hosted version recently introduced errors-only mode which should cut down on the containers.

vanschelven 4/18/2025|||

Yeah I fully get how that's a volume where going self-hosted Sentry makes perfect sense at the bottom line and including any upkeep you might have.

Bugsink's also quite scalable[0], but I wouldn't recommend it at 25M/day.

[0] https://www.bugsink.com/scalable-and-reliable/

selcuka 4/18/2025||

> Bugsink's also quite scalable[0], but I wouldn't recommend it a 25M/day.

Well, your homepage disagrees with this statement:

> Bugsink can deal with millions of events per day on dirt cheap hardware

kelnos 4/18/2025||

To me, "millions" usually means less than 10M. 25M falls into "tens of millions" to me.

But it's a very fuzzy way of quantifying something, and open to various interpretations.

selcuka 4/18/2025||

Fair point, but if it can cope with millions on "dirt cheap hardware" it logically follows that it can do 25M on more expensive hardware.

iJohnDoe 4/18/2025|||

> a high-end server. 96c, 512gb. It ends up only costing around $300 a month

Wow, that's really cheap. I'm seriously overpaying for my cloud provider and need to try Hetzner. I always assumed Hetzner was only European based.

mdaniel 4/19/2025||

Just be forewarned it doesn't seem to offer one iota of IAM, so whether or not one is "overpaying" for a cloud provider depends on what you're getting from them. If you mean "rent a machine," then likely. If you mean "have the machines heal themselves instead of Pagerduty waking me up" then reasonable people can differ about where that money is going

FWIW, https://lowendbox.com/ is good fun for the former set of things, too

lnenad 4/18/2025|||

> It ends up only costing around $300 a month, however with the scale of events that it processes, the managed version would be in the 10's of thousands.

I think this is a repeated question but... are you considering the cost of the people managing the deployment, security oversight, dealing with downtime etc?

bayindirh 4/18/2025|||

If you can keep the people doing all the things, they become cheaper over time. Because as your system settles and people become more competent, both downtime and effort required to mend these problems reduce dramatically, and you can give more responsibilities to the same people without overloading them.

Disclosure: I'm a sysadmin.

eptcyka 4/18/2025||

I wonder what your manager's take on this is, given your incentives here.

bayindirh 4/18/2025||

Honestly asking, what do my incentives look like from there?

eptcyka 4/18/2025||

You are incentivised to argue that it is good to keep employing sysadmins for self hosting, because that will keep you employed. You have a monetary incentive, thus you are a bit biased, in my opinion.

bayindirh 4/18/2025||

I think I didn't elaborate my point enough, so there's a misunderstanding.

What I said is true for places where they already have sysadmins for various tasks. For the job I do (it's easy to find), you have to employ system administrators to begin with.

So, at least for my job, working the way I described in my original comment is the modus operandi for the job itself.

If the company you're working in doesn't prefer self-hosting things, and doesn't need system administrators for anything, you might be right. But having a couple of capable sysadmins on board both enables self-hosting and allows this initiative to grow without much extra cost, because it gets cheaper as the sysadmins learn and understand what they're doing, so they can handle more things with the same or less effort.

See, system administrators are lazy people. They'd rather solve problems for once and for all and play PacMan in their spare time.

Weryj 4/18/2025||||

I am the person. Occasionally I log in to delete a log file that I just haven't set up to rotate. About once a month; apart from that, no intervention needed (so far).

rtpg 4/18/2025|||

I have to imagine at that size they have an ops team already for all the other services so those are pretty amortized.

Weryj 4/18/2025|||

I do have one major complaint though, in dotnet, the tracing/errors are always captured regardless of the sampling rate. So you end up with a lot more memory usage on high throughput/low memory services with no way to lower it.

There's a ticket now open to stop this, but it's still in progress.

brungarc 4/20/2025|||

Avoiding allocations when a transaction isn’t sampled should be pretty trivial. The only gotcha is that you’d still need trace ID propagation to tie errors together regardless of transactions. But tracing and transactions got decoupled a while ago, so that shouldn’t be a big problem. I’ll leave a comment on the ticket pointing to this post.
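The idea in this comment can be sketched roughly as follows: make the sampling decision first, and only allocate the transaction object when it was sampled, while still handing out a trace ID either way. This is a minimal illustration in Python, not the sentry-dotnet implementation; `Transaction`, `TraceContext` and `start_transaction` are hypothetical names invented for the example.

```python
import random
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Transaction:
    """Hypothetical stand-in for the (comparatively heavy) transaction object."""
    trace_id: str
    name: str
    spans: List[str] = field(default_factory=list)


@dataclass
class TraceContext:
    """Always created; carries the trace ID so errors can still be tied together."""
    trace_id: str
    transaction: Optional[Transaction]  # None when the transaction was not sampled


def start_transaction(
    name: str,
    sample_rate: float,
    rng: Callable[[], float] = random.random,
) -> TraceContext:
    trace_id = uuid.uuid4().hex
    # Decide first, allocate second: unsampled requests never build a Transaction.
    sampled = rng() < sample_rate
    transaction = Transaction(trace_id, name) if sampled else None
    return TraceContext(trace_id, transaction)


ctx = start_transaction("GET /health", sample_rate=0.0)
print(ctx.trace_id, ctx.transaction)
```

With a sample rate of 0.0 the context still has a trace ID, but no transaction is ever constructed.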

parthdesai 4/18/2025||||

It's open source, you guys could always create a PR to fix it. That's the power of open source!

no_wizard 4/18/2025||

there's no guarantee it will get merged though, even if a PR is created.

Forking has down sides that can't be hand waved away too, especially for a service like this.

5Qn8mNbc2FNCiVV 4/19/2025||

It's just a client library, what's the alternative, put a proxy in front that drops 90% of the events
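For what it's worth, the "drop 90% of the events" idea can be sketched as a deterministic keep/drop decision keyed on a trace ID, so that all events of one trace share the same fate. This is only an illustration of the sampling logic such a proxy might use, not an actual proxy. The function name and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def keep_event(trace_id: str, keep_rate: float) -> bool:
    """Deterministically keep roughly `keep_rate` of all trace IDs.

    The first 8 bytes of the hash are read as an integer in [0, 2**64)
    and compared against an integer threshold, which avoids float rounding
    at the edges (keep_rate=1.0 keeps everything, 0.0 keeps nothing).
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(keep_rate * 2**64)
    return bucket < threshold


# Keeping 10% is the same as dropping 90%.
kept = sum(keep_event(f"trace-{i}", 0.10) for i in range(10_000))
print(f"kept {kept} of 10000 events")
```

Because the decision depends only on the trace ID, repeated events from the same trace are either all kept or all dropped.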

zeeg 4/18/2025|||

Any chance you can link me to the ticket?

Feel free to email - david at sentry

Weryj 4/18/2025||

It looks like there was some motion a week ago.

https://github.com/getsentry/sentry-dotnet/issues/3636#event...

tgv 4/18/2025||

Am I reading correctly that your software generates 25 million error messages per day?

xmodem 4/18/2025|||

Sentry does a lot more than tracking errors - presumably most of those transactions are 'breadcrumb'-style events.

Weryj 4/18/2025||||

Nope, 25M transactions. In Sentry a transaction is more like an OTEL-trace. Errors are much lower ;)

almd 4/18/2025||

sorry if this is a silly question, the quick google search didn’t give me clues.

Transactions like full user flows start to finish, or 1 transaction = 1 post/get and 1 response?

Volundr 4/18/2025||

> Transactions like full user flows start to finish, or 1 transaction = 1 post/get and 1 response?

For most applications we are talking closer to 1 transaction = 1 web request. Distributed tracing across microservices is possible; the level of extra effort required depends on your stack. But that's also the out-of-the-box, plug-and-play stuff. With lower-level APIs you define your own transactions, when they start and end, which is needed for tracing applications where there isn't a built-in framework integration (e.g. not a web application).

pebble 4/18/2025|||

That is more likely performance traces or session replays.

adamcharnock 4/18/2025||

This absolutely mirrors my experience. Sentry was straightforward to deploy years ago, but now seems like one of the more egregious offenders in the, 'self host-able but increasingly prohibitively complex by design' category.

As others have said, we've [0] found the only practical way to deploy this for our clients is Kubernetes + Helm chart, and that's on bare-metal servers (mostly Hetzner). It runs well if you can throw hardware and engineering time at it, which thankfully we can. But given the option we would love a simpler solution.

[0]: https://lithus.eu

wg0 4/18/2025|

And how do you install and maintain/upgrade kubernetes? Are you running databases also on kubernetes?

adamcharnock 4/18/2025||

In our case we have a collection of Ansible roles we use for the purpose. We run databases using the Stackgres operator, either using logical replication on fast local NVMe drives or on top of OpenEBS/Mayastor replicated block storage.

But we specialise in this so that our clients don't have to. As much as I do actually love Kubernetes, the fact that the _easiest_ way to self-host Sentry is via Kubernetes is not a good sign. And choosing to spin up a Kubernetes cluster just to run Sentry would feel a lot like the lady who swallowed a fly[0].

[0]: https://en.wikipedia.org/wiki/There_Was_an_Old_Lady_Who_Swal...

apexalpha 4/18/2025|||

Thanks for the poem, it seems pretty apt for IT in 2025.

That said I would honestly prefer if the industry would just settle on K8s as our OS.

I really do not see any benefit that sentry could bring on its own compared to a solid set of Helm charts for k8s.

baq 4/18/2025|||

Does it really have to be a proper cluster, though? Can it be e.g. a single node k3s?

adamcharnock 4/18/2025||

Yes, you could definitely do that. Although if going that route I'd consider taking a run at their docker compose self-hosting instructions first.

vanschelven 4/18/2025||

Hey, that's me!

When I posted this myself on Reddit, I said the following:

I've long held off on actually posting this article to a platform like this one (don't bash your competition and all that), but "isn't Sentry self-hosted?" _is_ one of the most asked questions I get, and multiple people have told me this blog-post actually explains the rationale for Bugsink better than the rest of the site, so there you have it.

yarekt 4/18/2025||

Well done! I came to the same conclusion (with the exact same bewilderment steps) as I do love Sentry myself. I will definitely try Bugsink, it’s something I’ve been looking for for ages.

Feedback on competition bashing: sometimes they deserve it, they should really just come out and say it: “open sourcing our stuff isn’t working for us, we want to keep making money on the hosting”, and that would be ok

zeeg 4/18/2025||

fwiw I was always pretty transparent about our priorities:

https://blog.sentry.io/building-an-open-source-service/

We enable self-hosting because not everyone can use a cloud service (e.g. government regulation), otherwise we probably wouldn't even spend energy on it. We don't commercialize it at all, and likely never will. I strongly believe people should not run many systems themselves, and something that monitors your reliability is one such system. The lesson you learn building a venture backed company, and one that most folks miss: focus on growth, not cost-cutting. Self-hosting for many is a form of cost-cutting.

We do invest in making it easier, and it's 100% a valid complaint that the entire thing is awful today to self-host, and most people don't need a lot of the functionality we ship. It's not intentional by any means, it's just really hard to enable a tiny-scale use-case while also enabling someone like Disney Plus.

yarekt 4/20/2025||

Despite my earlier comment, the way Sentry approaches open source is at least a little better than the majority, so it's good that you at least provide an escape hatch that means heavy integration with Sentry isn't a complete vendor lock.

On your second point, you're right that self-hosting is cost cutting, but I found that when a business is growing, the features that Sentry offers are more of a nice to have along the way, not quite critical to its success (No point capturing errors if no-one's using your bloody thing). It's easy to rack up that monthly bill with nothing but nice to have hosted services.

What I'd love to have is dumb versions of tools to self host initially when the requirements (and traffic) are very low. Kibana for example is a pig to self host, at one point it was taking up 25% of our production capacity. I found Loki is much better for simple cases in the beginning.

A good example that strikes a balance is Grafana / Prometheus. It's practically impossible to run a software shop without them, and everywhere I worked went through the same phases: Chuck a set of random containers in prod -> Deploy helm chart -> Migrate to Thanos -> Move to hosted Grafana when user/teams management gets out of hand.

Benefit is that absolutely everyone is familiar (and even likes) your tooling, I hope that's enough of a reason to give away a simpler offering.

miyuru 4/18/2025||

Your project is awesome, you should do a show HN later.

vanschelven 4/18/2025||

Thanks for the kind words!

In fact I did one last week, but it got only a fraction of today's article's traction... I'll try again in whatever the prescribed interval is :-)

apitman 4/18/2025||

You can submit 2-3 times over a couple days.

stebian_dable 4/18/2025||

FOSS Sentry fork GlitchTip keeps things more simple and self-hosting friendly.

https://glitchtip.com/

crimsonnoodle58 4/18/2025||

+1 for Glitchtip.

We also found the same problem as OP with self hosting sentry. Each release would unleash more containers and consume more memory until we couldn't run anything on the 32gb server except Sentry.

We looked at both GlitchTip and BugSink and only settled on GlitchTip as it was maintained by a bigger team. Feature wise they were quite similar and both good alternatives.

So far so good with GlitchTip.

And thanks Op for making BugSink, the more alternatives the better.

mstaoru 4/18/2025|||

GlitchTip had replaced our Sentry (9.x, pre-Clickhouse madness). It was just a matter of updating DSN in a few Configmaps/Secrets, good to go from day one. The UI is a bit buggy and "resolve" doesn't always work, but it does 99% of what Sentry did with 10% of the effort to maintain a modern Sentry setup.
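The reason a migration like this can be "just a matter of updating DSN" is that a Sentry-style SDK learns where to send events from a single URL. A small sketch of pulling such a DSN apart is below, assuming the commonly documented layout `scheme://public_key@host/project_id`. The helper name and the example values are made up for illustration.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn: str) -> dict:
    """Split a Sentry-style DSN into its parts (assumed layout, see above)."""
    parts = urlsplit(dsn)
    return {
        "scheme": parts.scheme,
        "public_key": parts.username,
        "host": parts.hostname,
        # The project ID is the last path segment.
        "project_id": parts.path.rsplit("/", 1)[-1],
    }


# Hypothetical DSN; a real one is issued by whichever server you point at.
print(parse_dsn("https://abc123@errors.example.com/42"))
```

Swapping backends then amounts to replacing this one string in the application's configuration with the DSN issued by the new server.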

mdaniel 4/19/2025||

And actually open source, which matters to some folks

vanschelven 4/18/2025|||

Indeed it does!

Although with Bugsink (which is what came out of this origin story of annoyance) I'm aiming for _even more_ simple (1 Docker container at minimum, rather than 4), 30 seconds up and running etc.

bufke 4/18/2025||

Hello, ping me (GlitchTip lead) if you want to collaborate. Your stack is also Django. I'd be open to simplifying - we could probably make Redis optional. I have an experimental script that runs celery and Django in one process. But I think Postgres is a must have. So that gets down to two. My hope is that GlitchTip works for super small use cases and scales with minimal thought required.

thrilleratplay 4/18/2025|||

I saw the headline and wanted to make sure someone mentioned GlitchTip. It doesn't have all of the functionality of Sentry but has all of the functionality I need. We have been running it in production for a year with no problems. Given our small user base (<1000 users), Sentry did not make sense.

brandonaaron 4/18/2025|||

Also a fan of GlitchTip so far. I only recently (last month or so) started using it. I made a railway template for it and have been using it to monitor a handful of sites. I used valkey and minio for storage. Makes it super easy to spool up an instance.

anonzzzies 4/18/2025||

We run them both as we are evaluating glitchtip, but, at least for us, it has so many bugs vs sentry. But it's so much lighter so we try to stick with it.

npodbielski 4/18/2025||

Can you say what kind of bugs? This application looks interesting; I want to try it instead of https://healthchecks.io/

epolanski 4/18/2025||

The biggest issue I have with these solutions is indeed local debugging.

I use Sentry with most of my clients, and for effective debugging I need to spin my own Sentry in a Docker container which ends up being quite heavy on my machine especially when combined with Grafana and Prometheus.

I'm really unhappy with virtually all monitoring/telemetry/tracking solutions.

It really feels they are all designed to vendor lock you in their expensive cloud solutions and I really don't feel I'm getting my $s back at all. Worst of all those cloud vendors would rather add new features non-stop rather than honing what they currently have.

Their sales are everywhere, I've seen two different clients getting Datadog sales people join private Slacks to evangelize their products.

Both times I escalated to the CTO, both times I ended up suspecting someone in management had something to gain from pushing teams to adopt those solutions.

vanschelven 4/18/2025||

I actually wrote about that scenario!

Killing flies with hammers and all, but since I really like my hammer I actually do all my local development with my full-blown error tracker too:

https://www.bugsink.com/blog/local-development/

delusional 4/18/2025||

Sentry's sales team is incredibly aggressive. I've seen multiple colleagues hijacked for sales presentations over a few months. It would not surprise me at all if they just asked random employees to be added to the company slack, and even less if those people then just did it.

I can only commend the hustle on their part, but it does feel a little like a high pressure time share situation.

zeeg 4/18/2025||

If you - or anyone reading this - ever ends up in a situation where we came off as aggressive, send me a direct email and I will take care of it. This is not something we believe in at Sentry, and while you can't manage everything, it's important to us that we never become "one of those companies" like so many other successful companies become.

david at sentry.io

notpushkin 4/18/2025||

I still love Sentry, but it’s so enormous now that it isn’t practical to self-host for smaller businesses. A “small” alternative is always great to see!

I’m not sure how I feel about the license though (Polyform Shield, basically use-but-don’t-compete). It’s a totally valid choice – I just wish it would convert to FOSS at some point. (I understand the concern as I’ve had to make a similar decision when releasing https://lunni.dev/. I went with AGPL, but I’m still overthinking it :-)

vanschelven 4/18/2025||

also in recent news: https://it-notes.dragas.net/2024/12/28/i-almost-died-for-a-f...

rmnclmnt 4/18/2025||

> Code needs to be written properly; you can’t just waste money and resources endlessly to cover up inefficiencies.

Quite rare to hear this wise line these days. And I guess with AI coding assistants, this is only the beginning of this kind of horror story.

nurettin 4/18/2025||

For me, the horror story started when people ditched optimal desktop apps for Electron because they knew js and css.

beng-nl 4/18/2025|||

Perhaps we can make an AI translate electron apps into native apps…

poisonborz 4/18/2025|||

So tired of hearing this trope. Electron is alright. Memory is cheap. Tell me a single better way to write cross-platform UI other than a worse version of Electron.

mdaniel 4/18/2025|||

> Memory is cheap

If you're running on commodity hardware, sure; if you happen to be a $1T company that solders 3x marked-up RAM then that's definitely not true https://www.apple.com/shop/buy-mac/macbook-pro/16-inch-space... is the entry model, and clicking the 48GB option dials the price up to $3k

poisonborz 4/18/2025||

The cheapest Apple laptops start with 16GB now. That's enough for quite a few desktop apps regardless of Electron.

nurettin 4/18/2025|||

Sure, but what do you think would count as better?

kelnos 4/18/2025|||

Great ending, honestly. I hope that dev got fired and truly understood what he had done, and felt the appropriate amount of shame. Not for his error, because we all make mistakes, but for his hubris that allowed him to keep making that same mistake over and over, while insisting he was doing the right thing.

raverbashing 4/18/2025|||

Ok so your dev can't be told off for bringing the stuff out (and for being a moron doing sync calls to a logging service) and this brought the company down

But it was a good call sending it to the cloud. Better than "my problem" it is something being "somebody else's problem"

tr33house 4/18/2025||

enjoyable read. thanks for sharing

mplantsheer 4/18/2025||

The amount of shell script that needs to be executed to install is a bit of a no no for me. It also doesn't make sense to spin up a 16GB machine (minimum!) to track the errors on those 4-8GB VPS which are running my production services.

seanwilson 4/18/2025||

> we don’t necessarily recommend self-hosting to everyone. In addition to existing hidden costs, as Sentry evolves, our self-hosted version will become more complex, demanding additional types of infrastructure.

Any insights on why Sentry is so complex and needs so many resources? Is collecting, storing, and organizing error messages and stack traces at scale difficult? Or is it the other features on top of this?

eitland 4/18/2025||

Some ideas:

- they had enough money that they never needed to think seriously about maintenance cost, and the sales process was strong enough to keep customers arriving anyway (look to Oracle for another example of a hopelessly complicated installation process that people keep using anyway)

- at some point someone realized this was actually a feature: the more complicated it got, the harder it became to self host. And from that perspective it is a win-win for the company: they can claim it is open source without being afraid that most people will choose to self host.

vanschelven 4/18/2025||

I'd say that the architecture that is desirable from the point of view of a large scale SaaS is very different than the one that's desirable from the point of view of a tool that just needs to work for a single organization. And since the SaaS is bringing in all the money, that's where the architecture follows.

> actually a feature

I would guess that for a few people (e.g. the ones who made the scary visual of rising costs) this is explicitly so, but for most people it's more implied. i.e. I don't think anyone advanced their career with Sentry by making self-hosting easier.

BYK 4/19/2025||

Well I actually did as I was literally hired for that and worked on it for almost 3 years :)

slyall 4/18/2025||

Anything gets complex at scale. This is the same software they use to host their SaaS system. It presumably has to scale to many thousands of customers and a huge number of events per second.

They have all sorts of caching, autoscaling, distributed systems and other stuff that's complete overkill for all except the largest installations. Plus all sorts of software features only needed by a few customers and extra layers to be multi-customer.

It's the difference between a hoop in your back yard and an NBA stadium.

seanwilson 4/18/2025||

Is this a common architectural issue for self-hosted options from SaaS companies?

As in, a huge SaaS company offers their product for self-hosting to individual companies, but it's not practical to self-host because the code is highly specialized for supporting hundreds of companies instead of just one? And it's hard to have an architecture that works well for just one and for hundreds?

pebble 4/18/2025||

It's more about scale than tenancy. Not many SaaS companies offer such an option in the first place but it is typical that the in-house product is the priority and the architectural decisions are made with that in mind firstly, and self-hosting second if at all.

For example Sentry requires ClickHouse, Postgres, Kafka, and Redis presumably because they were the right tools for their needs and either they have the resources to operate them all or the money to buy the managed options from vendors.

Also, the main concern people have with hosting Sentry is the sheer number of containers required but most of them are just consumers for different Kafka queues which again is presumably this way because Sentry ops prefers it this way, whether it be for fine tuning the scaling of each one or whatever the reason.

What makes sense for a SaaS company rarely translates to sensible for self-hosting.

XCSme 4/21/2025|

This is what usually happens with self-hosted software that offers a cloud paid alternative. If it's too easy to install and maintain, the cloud version would make no money.

For my self-hosted analytics platform [0], I chose to only use the LAMP stack (PHP/MySQL), as it's one of the easiest to set up and maintain. Yes, it won't be able to track 100 million events per day, but most people don't have that amount of traffic anyway. It runs on the cheapest VPS, and can be installed in many different ways (adding more and more, if the platform is self-hosted only, my goal is to make installation as simple as possible, especially with the free trial, so customers can try it anywhere in a few minutes). Sorry if it sounds like a rant, but it's hard competing with huge corporations promising free "self-hosted options", only to use it as a lead-magnet for the cloud service. I saw this trend growing, even with smaller companies, where their self-hosted one-click app installs fail and they provide no support for it or have no intention to fix the installer.

[0]: https://www.uxwizz.com
