Posted by ibobev 4 days ago

Garage – An S3 object store so reliable you can run it outside datacenters (garagehq.deuxfleurs.fr)
708 points | 174 comments
supernes 4 days ago|
I tried it recently. Uploaded around 300 documents (1GB) and then went to delete them. Maybe my client was buggy, because the S3 service inside the container crashed and couldn't recover - I had to restart it. It's a really cool project, but I wouldn't really call it "reliable" from my experience.
JonChesterfield 4 days ago||
Corrupts data on power loss according to their own docs. Like what you get outside of data centers. Not reliable then.
lxpz 4 days ago|
Losing a node is a regular occurrence, and a scenario for which Garage has been designed.

The assumption Garage makes, which is well documented, is that of the 3 replica nodes, at most 1 will be in a crash-like situation at any time. With 1 crashed node, the cluster is still fully functional. With 2 crashed nodes, the cluster is unavailable until at least one additional node is recovered, but no data is lost.

In other words, Garage makes a very precise promise to its users, and that promise is fully respected. Database corruption upon power loss falls under the definition of a "crash state", just like a node being offline because its internet connection dropped. We recommend taking metadata snapshots so that recovery of a crashed node is faster and simpler, but they're not strictly required: Garage can always start over from an empty database and recover the data from the remaining copies in the cluster.

To talk about concrete scenarios: if you have 3 replicas in 3 different physical locations, the assumption of at most one crashed node is pretty reasonable, since it's quite unlikely that 2 of the 3 locations will be offline at the same time. As for data corruption on power loss, the probability of losing power at 3 distant sites at the exact same time, with the same data sitting in the write buffers, is extremely low, so I'd say in practice it's not a problem.

Of course, this all implies a Garage cluster running with 3-way replication, which everyone should do.
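As a concrete sketch of what that looks like (zone names, capacities and node IDs below are placeholders, and the exact config key and flag names can vary between Garage versions, so check the docs for your release):

    # garage.toml on each of the 3 nodes
    replication_factor = 3      # named replication_mode in older releases

    # assign each node to a distinct zone (= physical site), then apply
    garage layout assign -z site-a -c 1T <node-id-a>
    garage layout assign -z site-b -c 1T <node-id-b>
    garage layout assign -z site-c -c 1T <node-id-c>
    garage layout apply --version 1

With that layout, every object is stored at all 3 sites, which is what the at-most-one-crashed-node reasoning above relies on.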

JonChesterfield 4 days ago|||
That is a much stronger guarantee than your documentation currently claims. One site falling over and being rebuilt without loss is great. One site losing power, corrupting the local state, then propagating that corruption to the rest of the cluster would not be fine. Different behaviours.
lxpz 4 days ago||
Fair enough, we will work on making the documentation clearer.
jiggawatts 4 days ago|||
So if you put a 3-way cluster in the same building and they lose power together, then what? Is your data toast?
lxpz 4 days ago|||
If I make certain assumptions and you respect them, I will give you certain guarantees. If you don't respect them, I won't guarantee anything. I won't guarantee that your data will be toast either.
Dylan16807 3 days ago||
If you can't guarantee anything for all the nodes losing power at the same time, that's really bad.

If it's just the write buffer at risk, that's fine. But the chance of overlapping power loss across multiple sites isn't low enough to risk all the existing data.

rakoo 2 days ago||
I disagree that it's bad; it's a choice. You can't protect against everything. The team did the math and decided that the cost of protecting against this very low-probability event isn't worth it. If all the nodes lose power, you may have a bigger problem than that.
Dylan16807 2 days ago||
Power outages across big areas are common enough.

It's downright stupid to build a system that loses all existing data when all nodes go down uncleanly, not even simultaneously but just with overlapping downtime. What if you just happen to issue a shutdown command the wrong way?

I really hope they meant to just say the write buffer gets lost.

rakoo 14 hours ago||
That's why you need to go to other regions, not remain in the same area. Putting all your eggs in one basket (single area) _is_ stupid. Having a single shutdown command for the whole cluster _is_ stupid. Still accepting writes when the system is in a degraded state _is_ stupid. Don't make it sound worse than it actually is just to prove your point.
Dylan16807 12 hours ago||
> Still accepting writes when the system is in a degraded state _is_ stupid.

Again, I'm not concerned for new writes, I'm concerned for all existing data from the previous months and years.

And getting into this situation only takes one wide outage or one bad push that takes down the cluster. Even if that's stupid, it's a common enough kind of stupid that you should never risk your data on the certainty that you won't make that mistake.

You can't protect against everything, but you should definitely protect against unclean shutdown.

rakoo 6 hours ago||
If it's a common enough occurrence to have _all_ your nodes down at the same time, maybe you should reevaluate your deployment choices. The whole point of multi-node clustering is that _some_ of the nodes will always be up and running; otherwise what you're doing is useless.

Also, Garage gives you the ability to automatically snapshot the metadata, along with advice on how to do the snapshotting at the filesystem level and how to restore it.
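As a rough sketch for recent Garage versions (option and command names may differ slightly between releases, so double-check the docs for the one you run):

    # garage.toml: take a metadata snapshot automatically every 6 hours
    metadata_auto_snapshot_interval = "6h"

    # or trigger one manually on every node of the cluster
    garage meta snapshot --all

Restoring a crashed node from a recent snapshot is then much faster than letting it rebuild its metadata from the other replicas.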

InitialBP 4 days ago|||
It sounds like that's a possibility, but why on earth would you take the time to set up a 3-node cluster of object storage for reliability and then ignore one of the key tenets of what makes it reliable?
eduardogarza 3 days ago||
I use this for booting up S3-compatible buckets for local development and testing -- paired with s5cmd, I can seed 15GB and over 60,000 items (seed/mock data) in under 60s, and have a perfect replica of a staging environment with Docker containers (api, db, cache, objects) all up in less than 2 minutes. Super simple to set up for my case and it's been working great.

Previously I used LocalStack S3, but ultimately didn't like that persistence isn't available in the OSS version. MinIO OSS is apparently no longer maintained? I also looked at SeaweedFS and RustFS, but from a quick read this one was the easiest to set up.
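For anyone wanting to reproduce that kind of setup, the gist is roughly this (image tag, bucket name and paths are illustrative, and the credentials come from keys created with the garage CLI):

    # single-node Garage for local dev, S3 API on port 3900
    docker run -d --name garage -p 3900:3900 \
      -v $PWD/garage.toml:/etc/garage.toml \
      -v garage-meta:/var/lib/garage/meta \
      -v garage-data:/var/lib/garage/data \
      dxflrs/garage:v1.0.0

    # bulk-seed a bucket with s5cmd, which uploads objects concurrently
    export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=...
    s5cmd --endpoint-url http://localhost:3900 cp 'seed-data/*' s3://dev-bucket/

The concurrent uploads are what make tens of thousands of small objects land in under a minute.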

chrislusf 3 days ago|
I work on SeaweedFS. So very biased. :)

Just run "weed server -s3 -dir=..." to have an object store.

eduardogarza 3 days ago||
I'll try it!
awoimbee 4 days ago||
How is garage for a simple local dev env? I recently used seaweedfs since they have a super simple minimal setup compared to garage, which seemed to require a config file just to get started.
doctorpangloss 4 days ago||
https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main-...

This is the reliability question, no?

lxpz 4 days ago|
I talked about the meaning of the Jepsen test and the results we obtained in the FOSDEM'24 talk:

https://archive.fosdem.org/2024/schedule/event/fosdem-2024-3...

Slides are available here:

https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/4efc8...

wyattjoh 4 days ago||
Wasn't expecting to see it hosted on forgejo. Kind of a breath of fresh air to be honest.
apawloski 4 days ago||
Is it the same consistency model as S3? I couldn't see anything about it in their docs.
lxpz 4 days ago|
Read-after-write consistency: yes (after PutObject has finished, the object will be immediately visible in all subsequent requests, including GetObject and ListObjects).

Conditional writes: no, we can't do it with CRDTs, which are the core of Garage's design.
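To make "conditional writes" concrete: this is the If-Match / If-None-Match support on PutObject, i.e. "only write this object if it doesn't already exist / if it still has this ETag". Against a store that supports it, with a recent AWS CLI, it looks roughly like this (bucket and key names are placeholders):

    # create-only-if-absent: the PUT fails if another writer got there first
    aws s3api put-object --bucket demo-bucket --key lock.json \
      --body lock.json --if-none-match '*'

That is a compare-and-swap: the replicas have to agree on the object's current version before accepting the write, which is the consensus step a CRDT-based design avoids.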

skrtskrt 4 days ago||
Do RAMP or CURE offer any possibility of conditional writes with CRDTs? I have had these papers on my list to read for months, specifically wondering whether they could be applied to Garage.

https://dd.thekkedam.org/assets/documents/publications/Repor...

http://www.bailis.org/papers/ramp-sigmod2014.pdf

lxpz 4 days ago||
I had a very quick look at these two papers; it looks like neither of them allows implementing compare-and-swap, which is required for If-Match / If-None-Match support. They have a weaker definition of a "transaction", which is to be expected, as they only implement causal consistency at best and not consensus, whereas consensus is required for compare-and-swap.
skrtskrt 3 days ago||
ack - makes sense, thank you for looking!
ianopolous 3 days ago||
@lxpz It would be great to do a follow-up to this blog post with the latest Peergos. All the issues with baseline bandwidth and requests have gone away, even with federation on. The baseline is now 0, and even many locally initiated requests will be served directly from a Peergos cache without touching S3.

https://garagehq.deuxfleurs.fr/blog/2022-ipfs/

Let's talk!

agwa 4 days ago|
Does this support conditional PUT (If-Match / If-None-Match)?
codethief 4 days ago|
https://news.ycombinator.com/item?id=46328218