Posted by thunderbong 4 days ago
[1] https://github.com/openobserve/openobserve?tab=readme-ov-fil...
There are free and open source solutions like Keycloak and Zitadel. I don't dispute they are less common than Okta and Entra, but they definitely exist and are deployed in the real world. My workplace (state government) uses Keycloak, for example.
Another thing the article doesn't really touch on is that the SSO tax locks a security best practice behind a paywall. With SSO, when someone leaves the organization you can disable their single account and be confident they are locked out of your shared folders, GitLab, Jira, etc., rather than having to manually track down and disable each account individually, with a high likelihood of missing something. This matters for any organization larger than one person, from a bootstrapped startup all the way to the Fortune 500. Hiding it behind a higher price tier makes it more likely that an org will try to do without and suffer a security breach as a result.
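To make the offboarding point concrete: with an IdP, deprovisioning can be a single call against one system rather than a hunt through every SaaS app. A minimal sketch below builds (without sending) the Keycloak admin REST call that disables a user; the base URL, realm, user ID, and token are all placeholders.

```python
import json
import urllib.request


def build_disable_request(base_url: str, realm: str, user_id: str,
                          token: str) -> urllib.request.Request:
    """Build (but do not send) the Keycloak admin API call disabling a user.

    Keycloak disables an account via PUT /admin/realms/{realm}/users/{id}
    with the body {"enabled": false}. All arguments here are placeholders.
    """
    url = f"{base_url}/admin/realms/{realm}/users/{user_id}"
    body = json.dumps({"enabled": False}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# One disabled IdP account locks the ex-employee out of every app federated
# through it; no per-app cleanup in GitLab, Jira, shared drives, etc.
req = build_disable_request("https://sso.example.com", "acme",
                            "user-123", "TOKEN")
```

Without SSO, the equivalent is one such cleanup per application, each with its own API (or none), which is exactly where accounts get missed.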
I also take issue with:
> Developing and maintaining SSO solutions requires significant investment in research, development, and infrastructure.
Having done it myself, this is overstated. No feature is free but implementing a SAML or OAuth flow is not THAT much work, nor does it represent a huge amount of ongoing maintenance.
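For a sense of the scale of work involved: the service-provider side of an OAuth 2.0 authorization-code flow boils down to two steps, a browser redirect to the IdP and a server-to-server code-for-token exchange. A sketch under assumed placeholder endpoints and credentials:

```python
import secrets
from urllib.parse import urlencode

# Placeholder IdP endpoints; a real integration reads these from the
# provider's OIDC discovery document or configuration.
AUTHORIZE_URL = "https://idp.example.com/oauth/authorize"
TOKEN_URL = "https://idp.example.com/oauth/token"


def authorization_redirect(client_id: str, redirect_uri: str):
    """Step 1: URL to send the browser to; keep `state` to verify on return."""
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email profile",
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


def token_request_body(code: str, client_id: str, client_secret: str,
                       redirect_uri: str) -> dict:
    """Step 2: form body for the server-to-server POST to TOKEN_URL."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }


url, state = authorization_redirect("my-app", "https://app.example.com/callback")
```

Validation, token storage, and error handling add to this, but the core flow really is a redirect plus one POST; SAML is heavier, though mature libraries exist for both.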
I actually don't mind the SSO tax too much in cases where it's the differentiator between free or open source vs paid. I find it far more egregious when it's a product that already has a cost and SAML auth jacks up the price 2-10x. I don't think the blog post is a particularly good discussion of the tradeoffs though.
Ahh you want to make it easier to enable this in your org, in order to get better adoption and ensure the data in our app is more secure, yeah you're going to need to pay us for that.
| Vendors need to recoup costs
| Industry standards: The SSO tax has become an industry standard
Well in that case fine /s
1 - Is OpenObserve providing SSO security for ALL your applications? No. Is it doing SCIM, identity governance, provisioning? No... It's like saying you pay for a sandwich, so why don't you pay for the door you used to come into the shop as well. Door tax.
I bet they don't charge you to recoup the cost of implementing a JS library. So why are they 'recouping costs' on adding support for the OIDC/SAML standards? Build your solution to support SAML, SCIM and OAuth, and allow anyone to consume it.
Why?
Adoption and security. Anyone who's a Google Workspace or Microsoft shop has an IDP (albeit basic but OK). Most orgs see the IDP capability there as free. They are then seeing the ability to leverage it as a paid offering in the SaaS apps they buy. So on the one hand, the Identity Provider is free, but the SSO endpoint on the app is paid? Wild.
Also, this is wild:
| For our cloud service we provide SSO in our free tier for following providers with plan to support more in future: Google, GitHub, GitLab, Microsoft
This is great, well done.
| SAML and OIDC are available in our enterprise tier.
WTF? The built-out integrations that you had to make UI elements for, you offer free (the vendor cost-recoup argument died here). The generic ones are the ones you charge for. Ahh, that's right: the generic ones are the ones that let you use Okta, Ping, OneLogin, Keycloak, etc. Got it, the "valuable" ones.
I'm really starting to get sick of companies that claim they operate at petabyte scale, then tell you that you need to spend $400k a month to support that scale.
How many open source log systems work at PB scale given any number of resources? Also FWIW, OpenObserve can ingest data at 28 MB/Sec/Core (We are working on optimizing it even more) and ingesting 1 PB of data would cost just $435 based on on-demand prices (AWS m7g family).
Is that production rate, inbound bandwidth, rate to persistence, rate to processed, or rate to display?
It's not "only 28 MB/Sec/Core". Try doing the same with Splunk/Elasticsearch: you won't get past 5 MB/Sec/Core on their best day (typically it will be lower).
Suppose I have 28 GB of trace data in memory on a machine and then I fire that off. What do I have after 1000 seconds?
Do I just have a file of 28 GB of raw trace?
Do I have 28 GB of raw trace in memory ready to be indexed?
Do I have a data structure in memory ready to be searched?
Do I have the full trace information rendered on my screen (or an aggregated visualization derived after processing all the data)?
If it is the first, that would be ridiculously slow. If it is one of the latter ones, then it would depend on what querying operations are fast.
28 MB/core-second makes no sense without the context of what you can do quickly after the “processing” is done.
We have spent over 2 years building OpenObserve into a simple, highly usable and efficient observability tool. You could run it using a single binary that provides all the functionality of logs, metrics, traces, front end monitoring, dashboards (18 different chart types), alerts and pipelines.
OpenObserve is being used by startups, mid-tier enterprises and Fortune 100 companies. There are thousands of active installations of OpenObserve globally.
Folks have replaced Elasticsearch, Splunk, Graylog, Datadog, New Relic and more with OpenObserve.
Comment from a user:
> We moved from a 5 node OpenSearch cluster to a single node OpenObserve and measured using our actual everyday queries, which are reasonably complex queries (1 to 5 conditions applied) over our real logging data. We see that typically they complete in about the same time. OpenObserve costs us 10 times less though (instances + storage).
Also, we are currently working on replacing one of the world's largest Splunk installations.
p.s. I am one of the maintainers of OpenObserve. Feel free to ask questions. I will be happy to answer them. You can also visit our slack workspace at https://short.openobserve.ai/community for discussions.
I know personally of several companies which use this stack with tens of thousands of nodes. Everything runs on object storage which means it is easy to scale up, and you can move between storage providers as long as they implement the S3 API. The grafana ecosystem is very sticky as well. If I download some random helm chart it most likely will come with a grafana dashboard which I can easily import and instantly have dashboards and alerts for the helm chart.
Not very convincing why I wouldn’t go for the LGTM stack which has been proven to be effective.
For those looking for much more simplicity and much higher performance, OpenObserve is the way to go. Many folks have moved from Loki to OpenObserve due to performance issues with Loki. Many have moved off the LGTM stack completely to OpenObserve. Many have chosen to use Grafana as a front end for OpenObserve too.
Take a look at how easy it can be to build dashboards in OpenObserve.
It takes time for a community and ecosystem to build around great products. Grafana started in 2014. OpenObserve started in 2022.
Show me OpenObserve ingesting a few billion metric streams and show me how query times are faster than Mimir.
We run benchmarks internally and will publish them once we are ready, but we are unlikely to benchmark someone else's product. Benchmarking someone else's product always leads to a conversation along the lines of "you ran your benchmark with the most optimized settings but used our non-optimized ones", or some version of that.
People who use OpenObserve like it for its ease of setup, ease of management, high performance and rich feature set.
Especially for logs, Grafana and Loki are no match for OpenObserve in terms of features and performance. I'll let you test it yourself if you are curious and have some spare time.
OpenObserve is used by people ingesting MBs of data in their basement to PBs of data in large clusters in AWS, Azure, GCP and other cloud environments.
BTW, here is a story of a large EV company who moved to OpenObserve for traces and increased performance by a factor of 10x and reduced their cost at the same time - https://openobserve.ai/blog/jidu-journey-to-100-tracing-fide...
Like I said, the industry is small and I know several colleagues running LGTM at large scales (10k-100k nodes) so I know the system works and it is a safe bet.
https://github.com/openobserve/openobserve/blob/v0.12.1/LICE...
OpenObserve clusters can ingest PBs of data every day. While more could be discussed, I would rather focus on what your needs are when it comes to observability. Let's talk about them. I will be more excited to answer those questions.
OO integration may save me a couple of days of setup, but long-term it's a dangerously limiting idea / lock-in.