Posted by roywashere 4/18/2025

I gave up on self-hosted Sentry (2024) (www.bugsink.com)
186 points | 150 comments
rwky 4/18/2025|
I use both hosted and self-hosted Sentry. I prefer hosted, it's less to manage, but self-hosted isn't too awful as long as you can budget the resources. If you just want the Team Sentry plan it's going to be cheaper to use hosted; I would only self-host if you had to for legal/compliance reasons. In the many years I've been managing Sentry we've only had it truly crap out once, and that was the upgrade from 9 to their new date versioning: basically the whole hosting method changed and it was just easier to start over from scratch.
tanepiper 4/18/2025||
This is also my opinion of Backstage. It's positioned as this easy-to-use open source tool for IDPs - but in reality, it's a product that needs a team to maintain it constantly.
mtndew4brkfst 4/18/2025|
It has its own conference now! Definitely not a signal for something that is reliable set-it-and-forget-it tech.
domysee 4/19/2025||
HyperDX is a great alternative to Sentry, much easier to self-host, and also open source.

It's relatively new and did take some tinkering to make it work properly, so I wrote a short article about it: https://weberdominik.com/blog/self-host-hyperdx

But the feature set and user experience is great!
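
Since HyperDX ingests standard OpenTelemetry data, pointing an app at a self-hosted instance looks roughly like this (a Python sketch, assuming the default OTLP/HTTP port 4318 and a placeholder service name; depending on your deployment you may also need an ingestion API key header):

    from opentelemetry import trace
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

    # Assumed endpoint: the bundled OTel collector on the default OTLP/HTTP
    # port; adjust host/port (and auth headers) to match your deployment.
    exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")

    provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("hello-hyperdx"):
        print("traced work happens here")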

elAhmo 4/18/2025||
So many services are going this route. I remember Reddit had an install script and almost everything worked out of the box 10-15 years ago on an Ubuntu VPS.
exiguus 4/18/2025||
I'm a big fan of self-hosting, but it requires the right infrastructure. You need a robust virtualization solution like Proxmox, equipped with sufficient CPU cores and RAM. For instance, allocating 16GB of RAM shouldn't be an issue if your server has 128GB to share.

In my experience, solutions like Mailcow, which involve multiple services and containers (such as SMTP, IMAP, Redis, SSO, webmail, Rspamd, etc.), work very well. I have extensive experience running these systems, performing backups, restoring data, and updating their ecosystems.

Additionally, I've had a positive experience setting up and running a self-hosted Sentry instance with Docker for a project that spanned several years. However, this experience might be somewhat outdated, as it was a few years ago.

npodbielski 4/18/2025|
I am running Mailcow on just some VM with docker. No Proxmox needed. So far it works.
exiguus 4/21/2025||
Sure, that point was more about the 16GB of RAM needed for Sentry. If you already have RAM to share, 16GB sounds much more doable.
afro88 4/18/2025||
> First, there’s a signal being given about the complexity and fragility of the software. A 16GB RAM requirement just screams “this is a big, complex piece of software that will break in mysterious ways”. Not very scientific, I know, but based on experience.

This lines up with my experience self hosting a headless BI service. In "developer mode" it takes maybe 1GB RAM. But as soon as you want to go prod you need multiple servers with 4+ cores and 16GB+ RAM that need a strongly consistent shared file store. Add confusing docs to the mix, mysterious breakages, incomplete logging and admin APIs, and a reliance on community templates for stuff like k8s deployment... it was very painful. I too gave up on self hosted.

AdrianB1 4/18/2025|
In most corporate projects that I worked on, the day the product went to production started the clock on disbanding the core team, and optimizations were not on the table. When the product works, they move on. The only area where performance optimization is a mandatory step is video games, but that is a field where your clients are external. Most business software is in the "finally works, but barely" state, and new features beat performance improvements.

This is caused by short-sighted management that needs to deliver and move on. "Long term" contradicts their business model; in this case "long term" means "after product launch".

BenGosub 4/18/2025||
I worked at a startup around 2019-2020 where we started using Sentry for distributed tracing. It was a company in the IoT space and we had a lot of events going through. The bill started climbing into the $3,000-5,000 range just from the tracing, so we decided to host it on our own. When I looked at the sheer complexity of the project, I was flabbergasted. We need a name for managing the infrastructure of company-backed open source, self-hosted solutions. Oftentimes the right choice would be to pick a different, simpler open source solution; in this case there are some neat distributed tracing tools that are easier to manage.
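
For anyone wondering where that tracing volume comes from: it's controlled on the SDK side. A minimal Python sketch with a made-up DSN; the sample rate is the main knob for how many transactions (and how much of the bill, or of the self-hosted load) you generate:

    import sentry_sdk

    sentry_sdk.init(
        # Placeholder DSN; points at sentry.io or your self-hosted instance.
        dsn="https://examplePublicKey@o0.ingest.example.com/0",
        # Only send a fraction of transactions; at high event volume this
        # is what keeps tracing costs (or ingest load) under control.
        traces_sample_rate=0.05,
    )

    # Each transaction started like this is sampled at that rate.
    with sentry_sdk.start_transaction(op="task", name="process_device_event"):
        pass  # ... actual work ...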
KronisLV 4/18/2025||
A good APM tool I’ve been using for a few years is Apache Skywalking: https://skywalking.apache.org/

It might not do everything Sentry does, but it has definitely helped with tracking down some issues, even production ones, and it runs in a fairly manageable setup (in comparison to how long even the Sentry self-hosted Docker Compose file is).

What’s more, if you want, you can even use regular PostgreSQL as the backing data store (might not be quite as efficient as ElasticSearch for a metrics use case, but also doesn’t eat your RAM like crazy).

mdaniel 4/18/2025|
Any product that uses etcd is immediately off my radar[1]; it is nice of them to have a whole page dedicated to how to babysit the PoS https://skywalking.apache.org/docs/skywalking-banyandb/next/...

1: yes, I'm a huge k8s fanboi and yes I long every day for them to allow me to swap out etcd for something sane

KronisLV 4/18/2025||
That’s kind of the beauty of it: you can opt for a different storage solution too https://skywalking.apache.org/docs/main/v10.2.0/en/setup/bac...

Personally, no hate towards their BanyanDB but after getting burnt by OrientDB in Sonatype Nexus, I very much prefer more widespread options.

azthecx 4/18/2025||
At first glance it's not immediately obvious to me why you would pick Sentry or Bugsink over something included in the Grafana stack. What's the use case?
mdaniel 4/19/2025|
I haven't used Sentry's observability portions, but the error capture has very strong deduplication built into it, and it can optionally integrate with your SCM, CI, and issue tracker if one wishes. Thus, the flow can go like this (there's a rough SDK sketch at the end of this comment):

- error happens, can be attributed to release 1.2.3

- every subsequent time that error happens to a different user, it can track who was affected by it, without opening a new error report

- your project can opt-in to accepting end-user feedback on error: "please tell us what you were doing when this exploded, or feel free to rant and rave, we read them all"

- it knows from the stack trace that the error is in src/kaboom/onoz.py line 55

- onoz.py:55 was last changed by claude@example.com last week, in PR #666

- sentry can comment upon said PR to advise the reviewers of the bad outcome

- sentry can create a Jira with the relevant details

- claude.manager@example.com can mark the bug as "fixed in the next release", which will cause sentry to suppress chirping about it until it sees a release 1.2.4

- if it happens again it will re-open the prior error report, marking it as a regression

Unless you know something I don't, Grafana does *ABSOLUTELY NONE* of that
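
To make the release-attribution part concrete, a minimal Python SDK sketch (the DSN and release string are placeholders; the SCM/PR/Jira integrations above are configured in Sentry itself, not in code):

    import sentry_sdk

    sentry_sdk.init(
        # Placeholder DSN for a hosted or self-hosted Sentry instance.
        dsn="https://examplePublicKey@o0.ingest.example.com/0",
        # The release tag is what lets Sentry attribute an error to 1.2.3,
        # suppress it once marked "fixed in the next release", and re-open
        # it as a regression if it shows up again later.
        release="myapp@1.2.3",
        environment="production",
    )

    try:
        1 / 0  # stand-in for the failure at src/kaboom/onoz.py:55
    except Exception as exc:
        # Events with the same stack trace are grouped into one issue;
        # repeat occurrences increment the affected-user count instead of
        # opening a new report.
        sentry_sdk.capture_exception(exc)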

azthecx 4/23/2025||
Thanks for the detailed response
precommunicator 4/18/2025|
We just applied a helm chart a while back. It just works. We've maybe had a few incidents over the years, requiring stuff like Kafka queues to be wiped.

The argument that you have to read a shell script doesn't make sense to me. Are you gonna read the source code of every piece of software referenced in that script, or of anything else you download? No? What's the difference between that and a bash script? At the end of the day both can do damage.

xyzzy123 4/18/2025||
We used the helm chart but things didn't get updated often enough to keep our container security stuff happy.

Helm is a huge pain in the butt if you have mitigation obligations because the overall supply chain for a 1-command install can involve several different parties, who all update things at different frequencies :/

So chart A includes subchart B, which consumes an image from party C, who haven't updated to foobar X yet. You either need to wait for 3 different people to update stuff to get mainline fixed, or you roll up your sleeves and start rebuilding things, hosting your own images and forking charts. At first you build 1 image and set a value but the problem grows over time.

If you update independently you end up running version combinations of software that the OG vendor has never tested.

This is not helm's fault of course; it's just the reality of deploying software with a lot of moving parts.

vanschelven 4/18/2025|||
Rereading that section, I'd agree it's probably not the best-argued point because it implies security concerns... I guess what I'm saying is: for something I'm setting up to keep around for a while, I'd like to know a bit about what's in the package before I deploy it. In that sense, the shell script serves as a table of contents... and if the table of contents is 800 lines, that makes me wonder how many moving parts there are and how many things might break at inconvenient times because of that.
precommunicator 4/18/2025||
Personally, I would just run it on a clean cluster/VM somewhere (to be destroyed afterwards) just to see what happens. If you have no local resources to spare, an hour of even a very high-end (to save time) VM/cluster at a provider, e.g. AWS, costs next to nothing.
vanschelven 4/18/2025||
That solution didn't apply for me at the time, since I was in an environment that combined security-consciousness with thick layers of bureaucracy, meaning that hardware came at a premium (and had to be on premise).
precommunicator 4/18/2025||
Sure, but I'm not suggesting running there, just testing there. We also have to run in specific providers in specific locations, but nothing stops us from renting a clean large VM in AWS for an hour or two to test stuff without using any customer data. Hell, that costs pretty much nothing, so if my employer didn't allow it I would just pay with my own money. It's much better for your work efficiency to work out the kinks without having to do 10 cleanups after failed deployments - it's much easier to just delete a VM.
precommunicator 4/18/2025||
Oh, and the most difficult part of setting up, from what I remember, was GitHub SSO and the GitHub and Slack integrations, as they weren't well documented.