Posted by rrreese 15 hours ago
The one thing they have to do is back up everything, and when you see it in their console you can rest assured they are going to continue backing it up.
They’ve let the desktop client linger; it’s difficult to add meaningful exceptions. It’s obvious they want everyone to use B2 now.
Borg backup is a good tool in my opinion and has everything that I need (deduplication, compression, mountable snapshots).
Hetzner Storage Box is nothing fancy but good enough for a backup, and it's noticeably cheaper than the alternatives (I pay about 10 EUR/month for 5 TB of storage).
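For anyone curious, the Borg side is nothing exotic; it's roughly this (the storage box user/host, paths, and archive name below are placeholders; port 23 is Hetzner's SSH port for this, if I remember right):

    # repo lives on the storage box, reachable over ssh
    export BORG_REPO='ssh://u123456@u123456.your-storagebox.de:23/./backups/laptop'

    borg init --encryption=repokey                  # one-time setup; encrypted client-side
    borg create --compression zstd --stats ::{hostname}-{now} ~/documents ~/photos
    borg mount ::laptop-2024-06-01 /mnt/restore     # browse/restore a snapshot via FUSE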
Before that I was using s3cmd [3] to back up to an S3 bucket.
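That was basically just a periodic one-liner, something like this (bucket name is made up):

    # mirror the local tree into the bucket; drop remote copies of files deleted locally
    s3cmd sync --delete-removed ~/data/ s3://my-backup-bucket/data/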
D'argh.
In other words, a backup can be degraded into a sync-to-nothing situation if the client logic is untrustworthy.
Just this weekend, my backup tool went rogue and exhausted my quota on rsync.net (some bad Borg config on my part). I emailed them, and they promptly added 100 GB of storage for a day so that I could recover the situation. Plus, their product has been rock solid in the few years I've been using them.
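(Generic aside, not necessarily what went wrong in my case: a missing or too-generous prune step is a classic way for a Borg repo to blow past a quota. A sane retention pass looks roughly like this; the repo URL and numbers are placeholders.)

    export BORG_REPO='ssh://user@backup.example.net/./backups/laptop'
    # thin out old archives so the repo doesn't grow without bound
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6
    # actually reclaim the freed space (borg >= 1.2)
    borg compact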
Sorry to hear about your troubles. Hope your backup situation's sorted out?
Do you recall if you used a link like this to sign up?
https://www.rsync.net/signup/order.html?code=experts
If you don't recall, a good heuristic would be to see how much you pay per GB - if it's less than a cent, you probably did. The plans that come with support are typically a shade above a cent per GB.
Just to clarify - there are discounted plans that don't have free ZFS snapshots but you can still have them ... they just count towards your quota.
If your files don't change much - you don't have much "churn" - they might not take up any real space anyway.
There is 100% a difference between "dead data" (e.g. movie.mp4) and "live data" (e.g. a git directory with `chmod` attributes) - S3 and similar often don't preserve attributes and metadata without a special secondary pass, even though the `md5` might be the same.
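Concretely, getting the "live data" case right usually means an explicitly metadata-preserving copy; with rsync that's something like this (paths are placeholders, and flag support varies a bit by platform):

    # -a perms/times/symlinks, -A ACLs, -X xattrs, -H hard links
    rsync -aAXH /src/project/ /dst/project/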
    </bzexclusions>
    <excludefname_rule plat="mac" osVers="*" ruleIsOptional="f"
        skipFirstCharThenStartsWith="*" contains_1="/users/username/dropbox/"
        contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
That is the exact path to my Dropbox folder, and I presume if I move my Dropbox folder this xml file will be updated to point to the new location. The top of the xml file states "Mandatory Exclusions: editing this file DOES NOT DO ANYTHING".
.git files seem to still be backing up on my machine, although they are hidden by default in the web restore (you must open Filters and enable Show Hidden Files). I don't see an option to show hidden files/folders in the Backblaze Restore app.
That would be nice, they'd be able to get their history back!
Try checking bzexcluderules_editable.xml. A few years ago, Backblaze would back up .git folders for Mac but not Windows. Not sure if this is still the case.
Regardless of the OP's issues:
- On macOS, since 9.0.2.784 (released in 2023), all .git folders are included in backups.
- Cloud drives are problematic to back up because they all use extension plugins to hide the network, and your local disk only contains stubs instead of the actual files. If Backblaze scans it fully, it'll download everything and exhaust your disk space; there's no easy solution here.
I don't buy for a minute that they were trying to be "sneaky" to save some $$. I instead feel that, for the majority of users, they felt it was misleading to back up stubs only and would rather not brick user computers by downloading all the files. Remember, they can't access your cloud disk directly, so the only way they can get the file contents is by doing an fread and letting the cloud drive client sync the content on demand.
Basically it works like this:
- I have syncthing moving files between all my devices. The larger the device, the more stuff I move there[2]. My phone only has my keepass file and a few other docs, my gaming PC has that plus all of my photos and music, etc.
- All of this ends up on a raspberry pi with a connected USB hard drive, which has everything on it. Why yes, that is very shoddy and short term! The pi is mirrored on my gaming PC though, which is awake once every day or two, so if it completely breaks I still have everything locally.
- Nightly a restic job runs, which backs up everything on the pi to an s3-compatible cloud[3] and cleans out old snapshots (30 days, 52 weeks, 60 months, then yearly); there's a rough sketch of the job after the footnotes.
- Yearly I test restoring a random backup, both on the pi, and on another device, to make sure there is no required knowledge stuck on there.
This was somewhat of a pain to set up, but since the pi is never off it just ticks along, and I check it periodically to make sure nothing has broken.
[1] there is always weirdness with these tools. They don't sync how you think, or when you actually want to restore it takes forever, or they are stuck in perpetual sync cycles
[2] I sync multiple directories, broadly "very small", "small", "dumping ground", and "media", from smallest to largest.
[3] Currently Wasabi, but it really doesn't matter. Restic encrypts client side; you just need to trust the provider enough that they don't completely collapse at the same time that you need backups.
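Roughly what the nightly job boils down to - not my literal script, and the bucket, paths, password file, and retention counts here are placeholders:

    #!/bin/sh
    # credentials and repo location (placeholders)
    export AWS_ACCESS_KEY_ID=...
    export AWS_SECRET_ACCESS_KEY=...
    export RESTIC_REPOSITORY='s3:https://s3.wasabisys.com/my-backup-bucket'
    export RESTIC_PASSWORD_FILE=/home/pi/.restic-password

    # back up everything on the pi's data drive
    restic backup /mnt/data

    # retention: 30 daily, 52 weekly, 60 monthly, then yearly; prune whatever falls out
    restic forget --keep-daily 30 --keep-weekly 52 --keep-monthly 60 --keep-yearly 100 --prune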
I still trust restic's checksums to actually check whether a restore is correct, but this way a random part of the storage gets tested every so often, in case some old pack file has been damaged.
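FWIW, recent restic can do that spot check itself; this verifies the repository structure and then reads back and checksums a random subset of the pack files (the percentage is arbitrary):

    restic check --read-data-subset=5%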
Props for getting this implemented and seemingly trusted... I wish there was an easier way to handle some of this stuff (eg: tiny secure key material => hot syncthing => "live" git files => warm docs and photos => cold bulk movies, isos, etc)... along with selective "on demand pass through browse/fetch/cache"
They all have different policy, size, cost, technical details, and overall SLA/quality tradeoffs.
~ 5 years ago, I had a development flow that involved a large source tree (1-10K files, including build output) that was syncthing-ed over a residential network connection to some k8s stuff.
Desyncs/corruptions happened constantly, even though it was a one-way send.
I've never had similar issues with rsync or unison (well, I have with unison, but that's two-way sync, and it always stopped and asked for help by design).
Anyway, my decade-old synology is dying, so I'm setting up a replacement. For other reasons (mostly a decade of systemd / pulse audio finding novel ways to ruin my day, and not really understanding how to restore my synology backups), I've jumped ship over to FreeBSD. I've heard good things about using zfs to get:
sanoid + syncoid -> zfs send -> zfs recv -> restic
In the absence of ZFS, I'd do:
rsync -> restic
Or:
unison <-> unison -> restic.
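Roughly, with made-up dataset and host names (sanoid takes snapshots on a schedule, syncoid handles the send/recv over ssh):

    # replicate the dataset and its snapshots to the backup host
    syncoid tank/home backup@backuphost:backup/home

    # which, for a single snapshot, amounts to
    zfs send tank/home@autosnap_2024-06-01 | ssh backup@backuphost zfs recv backup/home

    # restic then ships the received copy off-site
    restic backup /backup/home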
So, similar to what you've landed on, but with one size tier. I have docker containers that the phone talks to for stuff like calendars, and just have the source of the backup flow host my git repos.
One thing to do no matter what:
Write at least 100,000 files to the source, then restore from backup (/ on a Linux VM is great for this). Run rsync in dry run / checksum mode on the two trees. Confirm the metadata + contents match on both sides. I haven't gotten around to this yet with the flow I just proposed. Almost all consumer backup tools fail this test. Comments here suggest Backblaze's consumer offering fails it badly. I'm using B2, but I haven't scrubbed my backup sets in a while. I get the impression it has much higher consistency / durability.
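By "dry run / checksum mode" I mean something along these lines (paths are placeholders): -n only reports differences without changing anything, and -c forces full-content checksums instead of trusting size/mtime.

    # list every file whose contents or metadata differ between the two trees
    rsync -n -avcAXH --delete /original/tree/ /restored/tree/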
One particular issue I've encountered is that syncthing 2.x does not work well on systems without an SSD: the storage backend switched to SQLite, which doesn't perform as well as LevelDB on HDDs, so scans of the 6 TB folder were taking excessively long compared to 1.x with LevelDB. I haven't encountered any issues with mixing 1.x and 2.x in my setup. The only other issues I've run into are usually filename incompatibilities between filesystems.
syncthing is not perfect, and can get into weird states if you add and remove devices from it for example, but for my case it is I think the best option.
I'll never trust them again with my data.
Well, "no problem" is an overstatement. Once you need a restore, you learn that their promise of end-to-end encryption is actually a lie. (As in, you have to break the end-to-end encryption to restore since everything has to be decrypted on their servers.)
[0] My home file server, migrating a four-disk mirrored-pairs ZFS array to RAID5, including replacing the smaller pair of disks with ones matching the larger pair, so the old ZFS filesystem had to be totally destroyed in the process and I needed somewhere to put the data for the ~15 minutes the logical disk wouldn't exist in any form. The alternative would have been to build an entire new four-disk array, doubling the disk cost of the project and requiring some kind of second host machine. This approach saved me $400 or more; I probably wouldn't have attempted it otherwise, since the cost would have been too high. It ended up costing somewhere in the tens of dollars as I recall.
Not backing up .git folders however is completely unacceptable.
I have hundreds of small projects where I use git to track history locally, with no remote at all. The intention is never to push them anywhere. I don't like to say these sorts of things, and I don't say it lightly, but someone should be fired over this decision.
But if that's truly their stance, then they are being deceptive about their non-business offering at the point of sale.
EDIT - see my other comment where I found the actual email
According to support's reply just now, my backups are crippled just like every other customer. No git, no cloud synced folders, even if those folders are fully downloaded locally.
(This is also my personal backup strategy for iCloud Drive: one Mac is set to fully download complete iCloud contents, and that Mac backs up to Backblaze.)
I know this is beside the point somewhat, but: learn your tools, people. The commit history could probably have been easily restored without involving any backup. The commits are not just instantly gone.
Indeed, the commits and blobs might even have still been available on the GitHub remote - I'm not sure whether they clean them up on some interval - but a bunch of the stuff you "delete" from git still stays on the remote regardless of what you push.
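(For completeness, and assuming the objects still exist somewhere: if the local .git survives, `git reflog` usually still lists the "lost" commits; if only the remote has them, you can often fetch a commit directly by its SHA - dig it out of CI logs, PR pages, or old terminal scrollback. The SHAs below are placeholders.)

    # local: resurrect a commit the reflog still knows about
    git reflog
    git branch recovered <sha-from-reflog>

    # remote: fetch a specific commit by hash (GitHub generally allows this) and branch it
    git fetch origin <full-commit-sha>
    git branch recovered-from-remote FETCH_HEAD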