| ▲ | Neil44 5 hours ago |
| The issue with a client app backing up Dropbox and OneDrive folders on your computer is the Files On Demand feature: you can sync a 1 TB OneDrive to your 250 GB laptop, and it's OK because of smart/selective sync, aka Files On Demand. Then Backblaze tries to back the folder up, requests a download of every single file, and now you have zero bytes free, still no backup, and a sick laptop.
You could OAuth the Backblaze app to access OneDrive directly, but if you want to back your OneDrive up, you need a different product, IMO. |
|
| ▲ | appreciatorBus 3 hours ago | parent | next [-] |
| Shoutout to Arq backup, which simply gives you an option in backup plans for what to do with cloud-only files: report an error, ignore, or materialize. Regardless, if you make backup software that doesn't give this level of control to users, and you change which files you're going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes. |
| |
| ▲ | Lord_Zero an hour ago | parent | next [-] | | Why no linux support? | | |
| ▲ | CamperBob2 34 minutes ago | parent [-] | | If it's open-source, Linux support is only a few hours away with Claude. If it's not open-source, but the protocol is documented, see above. If it's not open-source and the protocol isn't documented, well... that makes the decision easy, doesn't it? | | |
| ▲ | cobertos 30 minutes ago | parent [-] | | Backup software written by Claude? No thanks. I've used enough Claude-coded applications that I wouldn't trust one with a backup, unless it had extensive tests along with it. | | |
| ▲ | kid64 16 minutes ago | parent | next [-] | | "Dear Claude, please create an extensive testing suite for this app. Love, cobertos" | |
| ▲ | CamperBob2 15 minutes ago | parent | prev [-] | | And I've used enough "gold standard" commercial applications, like the one being discussed in this very article, that I don't trust those either. If you recoil in horror at code written by LLMs, I'm afraid that the vendors you're already working with have some really bad news for you. You can get over it now or get over it later. You will get over it. I can audit and verify Claude's output. Code running at Backblaze, not so much. Take some responsibility for your data. Rest assured, nobody else will. |
|
|
| |
| ▲ | vunderba 30 minutes ago | parent | prev | next [-] | | I honestly didn't even realize Backblaze had a clientside app. Very happy user of Arq - been running a daily scheduled dual backup of my HDD to an external NAS and Backblaze B2 for years with zero issues. | | | |
| ▲ | decadefade 2 hours ago | parent | prev [-] | | Love Arq! |
|
|
| ▲ | ineedasername an hour ago | parent | prev | next [-] |
That seems like a pretty straightforward issue to solve: simply back up only those files that are actually on the system, not the stubs. If it's on your computer, it should be able to get backed up. If it's just a shadow, a pointer, it shouldn't. Making the change without making it clear, though, is just awful. A clear recipe for catastrophic loss and a drip, drip, drip of news in the vein of "How Backblaze Lost My Stuff" |
| |
| ▲ | wrs 19 minutes ago | parent | next [-] | | The OP’s complaint is that the files were not backed up. If they had discovered that only stubs were backed up, I don’t think they’d be any happier. | |
| ▲ | lazide an hour ago | parent | prev [-] | | The stubs are the thing on your computer? | | |
| ▲ | coldtea an hour ago | parent [-] | | Imagine if they could detect stub or real file, huh? Space technology, I know! Or just fucking copy the stubs as stubs and what's actually downloaded as actually downloaded! Boggles the mind! Or maybe just do what they do now, but WARN about that in HUGE RED LETTERS, on the website and in the app, instead of burying it in an update note like weasels! |
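For what it's worth, telling a stub from a real file is feasible without opening it: on Windows, cloud sync clients mark placeholders with documented file-attribute flags. A minimal Python sketch under that assumption (the attribute constants are Windows' documented values; the helper names are mine, not any vendor's API):

```python
import os

# Documented Windows file-attribute flags used by cloud placeholder files.
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

PLACEHOLDER_MASK = (FILE_ATTRIBUTE_OFFLINE
                    | FILE_ATTRIBUTE_RECALL_ON_OPEN
                    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)

def is_placeholder(attrs: int) -> bool:
    """True if the attribute bits mark a cloud-only stub."""
    return bool(attrs & PLACEHOLDER_MASK)

def is_local_file(path: str) -> bool:
    """True if the file's bytes are actually on disk.

    st_file_attributes only exists on Windows; elsewhere we
    conservatively assume the file is local.
    """
    st = os.stat(path, follow_symlinks=False)
    attrs = getattr(st, "st_file_attributes", 0)
    return not is_placeholder(attrs)
```

A backup client could walk the tree with checks like this and skip (or report) stubs rather than hydrating them.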
|
|
|
| ▲ | thecapybara 3 hours ago | parent | prev | next [-] |
| That would make sense for online-only files, but I have my Dropbox folder set to synchronize everything to my PC, and Backblaze still started skipping over it a few months ago. I reached out to support, and they confirmed that they are now skipping Dropbox/OneDrive/etc. folders entirely, regardless of whether the files are stored locally or not. |
|
| ▲ | whimblepop 3 hours ago | parent | prev | next [-] |
| The whole "just sync everything, and if you can't sync everything, pretend to sync everything with fake files and then download the real ones ad hoc" model of storage feels a bit ill-conceived to me. It tries to present a simple facade, but I'm not sure it actually simplifies things. It always results in nasty user surprises and sometimes data loss. I've seen Microsoft OneDrive do the same thing to people at work. |
| |
| ▲ | carefulfungi an hour ago | parent [-] | | I’ve lost data not realizing I was backing up placeholder files (iCloud). Hiding the network always ends in pain. But never goes out of style. | | |
| ▲ | whimblepop 11 minutes ago | parent [-] | | My own approach to simplicity generally means "hide complexity behind a simple interface" rather than pushing for simple implementations, because I feel that too much emphasis on simplicity of implementation often means sacrificing correctness. This particular example is a useful one for me to think about, because it's a version of hiding complexity to present a simple interface that I actually hate. (WYSIWYG editors are another one, for similar reasons: they always end up buggy and unpredictable.) |
|
|
|
| ▲ | bombcar 10 minutes ago | parent | prev | next [-] |
| The issue really isn't that it's not backing up the folder (which I can see an argument for both sides and various ways to do it) - it's that they changed what they did in a surprising way. Your backup solution is not something you ever want to be the source of surprises! |
|
| ▲ | signorovitch 2 hours ago | parent | prev | next [-] |
The primary trouble I have with Backblaze is that this change was not clearly communicated, even if it could perhaps be justified. |
|
| ▲ | bastawhiz 3 hours ago | parent | prev | next [-] |
| That doesn't really make a lot of sense, though. Reading a file that's not actually on disk doesn't download it permanently. If I have zero of 10TB worth of files stored locally on my 1TB device, read them all serially, and measure my disk usage, there's no reason the disk should be full, or at least it should be cache that can be easily freed. The only time this is potentially a problem is if one of the files exceeds the total disk space available. Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space. |
| |
| ▲ | jrmg 3 hours ago | parent | next [-] | | Right, but even if that’s working it breaks the user experience of services like this that ‘files I used recently are on my device’. After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available! | | |
| ▲ | wtallis an hour ago | parent | next [-] | | That shouldn't be seen as Backblaze's problem. It's Dropbox's problem that they made their product too complicated for users to reason about. The original Dropbox concept was "a folder that syncs" and there would be nothing problematic about Backblaze or anything else trying to back it up like any other folder. Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer. | | |
| ▲ | realo 28 minutes ago | parent [-] | | If I backup a file, I need to read that file. The rest is in the management layer underneath that file. Seems simple enough to do for Backblaze, no? |
| |
| ▲ | NetMageSCW 3 hours ago | parent | prev [-] | | There’s no reason to think that would happen - files you had from ten years ago would have been backed up ten years ago and would be skipped over today. | | |
| ▲ | jrmg 3 hours ago | parent [-] | | Good point (I’m assuming you’re right here and it trusts file metadata and doesn’t read files it’s already backed up?) It would still happen with the first backup - or first connection of the cloud drive - though, which isn’t a great post-setup new user experience. It probably drove complaints and cancellations. I feel like I’ve accidentally started defending the concept of not backing up these folders, which I didn’t really intend to. I’d also want these backed up. I’m just thinking out loud about the reasons the decision was made. |
|
| |
| ▲ | bombcar 3 hours ago | parent | prev [-] | | It's generally handled decently well now, but with three or four of these things it can make backups take annoyingly long, since without "smarts" (which are not always present) it may force a download of the entire OneDrive/Box each time, even if it never crashes out. | | |
| ▲ | bastawhiz an hour ago | parent [-] | | > it may force a download of the entire OneDrive/Box each time - even if it never crashes out. I am not aware of any evidence supporting this. |
|
|
|
| ▲ | danpalmer 5 hours ago | parent | prev | next [-] |
| This is a complexity that makes it harder, but not insurmountable. It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider. |
| |
| ▲ | bayindirh 4 hours ago | parent | next [-] | | > Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider. When you have a couple of terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time? Also, a high number of small files is a problem for these services. I have a large font collection in my cloud account, and oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending. | | |
| ▲ | jtbayly 4 hours ago | parent | next [-] | | Reading your comments, it sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files. I know you haven’t technically said that, but that’s what it sounds like. I assume you don’t think that, so I’m curious, what would you propose positively? | | |
| ▲ | bayindirh 3 hours ago | parent [-] | | > I know you haven’t technically said that, but that’s what it sounds like. Yes, I didn't technically say that. > It sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files. I'm not arguing either of those. What I said is that with "on demand file download", traditional backup software faces a hard problem. However, there are better ways to do this, the primary candidate being rclone. You can register a new application ID for your rclone installation with your Google Drive and Dropbox accounts, and use rclone as a very efficient, rsync-like tool to back up your cloud storage. That's what I do. I'm currently backing up my cloud storage to a local TrueNAS installation. rclone automatically hash-checks everything and downloads only the changed files. If you can mount Backblaze via FUSE or something similar, you can use rclone as an intelligent MITM agent to smartly pull from the cloud and push to Backblaze. Also, using restic or Borg as a backup container is a good idea, since they can deduplicate and/or only store the differences between snapshots, saving tons of space in the process, plus encrypting things for good measure. | | |
| ▲ | nine_k 2 hours ago | parent [-] | | This. You should not try to backup your local cache of cloud files as if those were your local files. Use a tool that talks to the cloud storage directly. Use tools with straightforward, predictable semantics, like rclone, or synching, or restic/Borg. (Deduplication rules, too.) |
|
| |
| ▲ | vladvasiliu 4 hours ago | parent | prev | next [-] | | But if the files are only on the remote storage and not local, chances are they haven't been modified recently, so it shouldn't download them fully, just check the metadata cache for size / modification time and let them be if they didn't change. So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup. | | |
| ▲ | bayindirh 4 hours ago | parent [-] | | You can't trust size and modification time all the time; though mtime is a better indicator, it's not foolproof. The only reliable way is checksumming. Interestingly, rclone supports that on many providers, but for Backblaze to support that, it would need to integrate rclone, connect to the providers via that channel, and request checks, which is messy, complicated, and computationally expensive. And that's assuming you won't hit API rate limits on the cloud provider. | | |
| ▲ | NetMageSCW 3 hours ago | parent [-] | | If you can’t trust modification time you are doing something so unusual that you probably need to be handling your backups privately anyway. | | |
| ▲ | bayindirh 3 hours ago | parent [-] | | I don't think so. Sometimes the modification time of a file which is not downloaded on computer A, but modified by computer B, is not immediately reflected to computer A. Hence, backup software running on computer A will think the file has not been modified. This is a known problem in file synchronization. Also, some applications that modify files revert or preserve the mtime of the file for various reasons. They are rare, but they're there. |
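The trade-off being debated here can be made concrete. A hedged Python sketch of the two change-detection strategies: trusting size+mtime (cheap, but fooled by the sync-lag and mtime-preservation cases above) versus hashing content (reliable, but requires reading every byte, which on a files-on-demand drive means downloading the file). The function names are illustrative, not Backblaze's:

```python
import hashlib
import os

def changed_by_metadata(path, last_size, last_mtime):
    """Cheap check that trusts the filesystem. Can miss changes when a
    sync client lags or preserves mtime updates."""
    st = os.stat(path)
    return st.st_size != last_size or st.st_mtime != last_mtime

def changed_by_hash(path, last_sha256, chunk=1 << 20):
    """Reliable check, but reads the whole file -- on a files-on-demand
    drive, this is what triggers a full download."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest() != last_sha256
```

rclone sidesteps this by asking the provider's API for its stored checksum instead of re-reading the bytes, which is why it can verify without hydrating.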
|
|
| |
| ▲ | Chaosvex 3 hours ago | parent | prev [-] | | Then do it in memory, assuming those services allow you to read the files like that. It sounds like they do based on your other comments. | | |
| ▲ | bayindirh 3 hours ago | parent [-] | | The problem is, downloading files and disk management are not in your control; that part is managed by the cloud client (Dropbox, Google Drive, et al.) transparently. The application accessing the file just waits, akin to waiting for a disk to spin up. The filesystem is a black box for this software, since it doesn't know where a file resides. If you want control, you need to talk with every party, incl. the cloud provider, a la rclone. |
|
| |
| ▲ | NetMageSCW 3 hours ago | parent | prev [-] | | Why would they do new backups of old files all the time? They would just skip those. |
|
|
| ▲ | Dylan16807 4 hours ago | parent | prev [-] |
| Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry. And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money. |
| |
| ▲ | bayindirh 4 hours ago | parent [-] | | > Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry. The moment you call read() (or fopen(), or your favorite function), the download is triggered. It's a hook sitting between you and the file. You can't ignore it. The only way to bypass it is to remount the drive over rclone or something and use the "ls" and "lsd" commands to query filenames. Otherwise it'll download, and that's how it's expected to work. | |
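The distinction both sides are circling is that directory listings and stat calls use metadata the sync client already keeps locally, while open()/read() is what trips the on-demand hook. A minimal sketch of enumerating a folder with metadata only, assuming the platform's usual behavior that stat on a placeholder does not hydrate it (the function name is mine):

```python
import os

def enumerate_without_hydrating(root):
    """Walk a folder using only directory entries and cached stat data.

    On a files-on-demand drive, this reads metadata the sync client
    keeps locally; nothing is downloaded. It's calling open() on an
    entry that trips the on-demand hook and pulls the file's bytes.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path, follow_symlinks=False)
            yield path, st.st_size, st.st_mtime  # metadata only
```

This is roughly what rclone's `ls`/`lsd`, mentioned above, do against the provider's API: names and sizes without touching file contents.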
| ▲ | Dylan16807 4 hours ago | parent [-] | | Why would it use either of those on all the files at once? It should only be opening enough files to fill the upload buffer. | | |
| ▲ | ethin 3 hours ago | parent | next [-] | | I think you might be confusing Backblaze reading files with how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support. There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows will ask Dropbox or whatever to open and read the file for it. How that is done is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory, with all that that entails, so Backblaze thinks it can just zip through the directory and read data from disk, even though that isn't really what's happening. I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason: my Nextcloud is hosted on my server, and Duplicati was sending it so many requests it would cause my server to start sending back error 500s. And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "hey, only proxy this traffic but let me handle everything else" (assuming that's even possible, given that the usual flow is to put your entire domain behind them). | | |
| ▲ | Dylan16807 3 hours ago | parent [-] | | No, I'm not confusing anything. Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed. If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about. |
| |
| ▲ | bayindirh 4 hours ago | parent | prev [-] | | Maybe it'll, maybe it won't, but it'll cycle all files in the drive and will stress everything from your cloud provider to Backblaze, incl. everything in between; software and hardware-wise. | | |
| ▲ | Dylan16807 4 hours ago | parent [-] | | That sounds very acceptable to get those files backed up. It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload. | | |
| ▲ | bayindirh 4 hours ago | parent [-] | | I mean, cycling a couple of terabytes of data through a 512 GB drive is at least 4 full drive writes, which is too much for that kind of thing. > more ISPs need to improve upload. I was yelling the same thing into the void for the longest time; then I had the brilliant idea of reading the technical specs of the technology coming into my home. Lo and behold, the numbers I got were the technical limits of the technology I had at home (PON for the time being), and going higher would need very large and expensive rewiring with new hardware and technology. | | |
| ▲ | Dylan16807 4 hours ago | parent | next [-] | | 4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole Drive Write Per Day quota for the upload duration, let alone the entire month. > the technical limits of the technology that I had at home (PON for the time being) Isn't that usually symmetrical? Is yours not? | | |
| ▲ | bayindirh 3 hours ago | parent | next [-] | | > 4 writes out of what, 3000? Depends on your device capacity and how much is in actual use. Wear leveling also adds writes as it moves data around. > For something you'll need to do once or twice ever? I don't know about you, but my cloud storage is living, and even if it weren't, if the software can't smartly ignore files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle. > Isn't that usually symmetrical? Is yours not? GPON (Gigabit PON) is asymmetric. The theoretical limits are 2.4 Gbps down, 1.2 Gbps up. I have 1000 Mbit/75 Mbit at home. | |
| ▲ | Dylan16807 3 hours ago | parent [-] | | > I don't know you, but my cloud storage is living But you're probably changing less than 1% each day. And new changes are likely already in the cache, no need to download them. > if the software can't smartly ignore files, it'll Backblaze checks the modification date. > GPON (Gigabit PON) is asymmetric. Theoretical limits is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home. 2:1 is fine. If you're getting worse than 10:1 then that does sound like your ISP failed you? |
| |
| ▲ | jonhohle 3 hours ago | parent | prev [-] | | How do you know how often those files need to be backed up without reading them? Timestamps and sizes are not reliable; only content hashes are. How do you get a content hash? You read the file. | |
| ▲ | NetMageSCW 3 hours ago | parent | next [-] | | If timestamps aren’t reliable, you fall way outside the user that can trust a third party backup provider. Name a time when modification timestamp fails but a cloud provider will catch the need to download the file. | |
| ▲ | Dylan16807 3 hours ago | parent | prev [-] | | Backblaze already trusts the modification date. |
|
| |
| ▲ | NetMageSCW 3 hours ago | parent | prev [-] | | Why would it do that more than once unless you are modifying 4TB of data every day, in which case you are causing the problem. | | |
| ▲ | bayindirh 3 hours ago | parent [-] | | I don't know how your client works, but reading metadata (e.g. requesting the size) of a file causes some cloud clients to download it completely. Of course I'm not modifying 4 TB on a cloud drive every day. |
|
|
|
|
|
|
|