bayindirh 4 hours ago

> Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.

The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.

The only way to bypass it is to remount it via rclone or something similar and use the "ls" and "lsd" commands to query filenames. Otherwise it'll download, and that's how it's expected to work.
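For what it's worth, on a locally mounted tree you can enumerate names and cached metadata without ever calling read(); whether that actually avoids triggering a download depends entirely on the sync client (some hydrate even on a stat). A minimal sketch of the idea:

```python
import os

def list_names(root):
    """Walk a directory tree collecting file names and reported sizes,
    without ever opening a file's contents (no open()/read() calls)."""
    names = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.stat reads metadata only; on some sync clients even
            # this can trigger hydration, on others it stays cheap.
            info = os.stat(path)
            names.append((path, info.st_size))
    return names
```

rclone's `ls` and `lsd` go one step further: they query the remote's API directly, skipping the mount (and any hydration hook) entirely.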

Dylan16807 4 hours ago | parent [-]

Why would it use either of those on all the files at once? It should only be opening enough files to fill the upload buffer.

ethin 3 hours ago | parent | next [-]

I think you might be confusing Backblaze reading files with how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support. There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows will ask Dropbox (or whatever) to open and read the file for it. How that is done is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory, with all that that entails, so Backblaze thinks it can just zip through the directory and read data from disk, even though that isn't really what's happening.

I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason: my Nextcloud is hosted on my server, and Duplicati was sending it so many requests that it would cause my server to start returning error 500s.
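On Windows the placeholder state is actually visible in the file attributes: unhydrated cloud files carry FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS (0x00400000 in the Windows SDK), so a backup tool could, in principle, detect and skip them. A hedged sketch; the attribute only exists on Windows, so everywhere else this falls back to "not a placeholder":

```python
import os

# Windows attribute flag for cloud-file placeholders ("recall on data
# access"); the numeric value comes from the Windows SDK.
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

def is_cloud_placeholder(path):
    """True if the file looks like an unhydrated cloud placeholder.
    st_file_attributes only exists on Windows; elsewhere assume False."""
    info = os.stat(path)
    attrs = getattr(info, "st_file_attributes", 0)
    return bool(attrs & FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)
```

Whether any given backup tool bothers to check this before opening the file is, of course, up to that tool.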

And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "Hey, only proxy this traffic but let me handle everything else" (assuming that's even possible, given that the usual flow is to put your entire domain behind them).

Dylan16807 3 hours ago | parent [-]

No, I'm not confusing anything.

Dropbox and OneDrive can handle Backblaze zipping through and opening many files. The risk is pulling down too many gigabytes at once, but that shouldn't happen, because Backblaze should only open enough files for immediate upload. If it does happen, it's very easily fixed.

If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.

bayindirh 4 hours ago | parent | prev [-]

Maybe it will, maybe it won't, but it'll cycle through every file on the drive and stress everything from your cloud provider to Backblaze, including everything in between, software- and hardware-wise.

Dylan16807 4 hours ago | parent [-]

That sounds very acceptable to get those files backed up.

It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.

bayindirh 4 hours ago | parent [-]

I mean, cycling a couple of terabytes of data through a 512GB drive is at least 4 full drive writes, which is too much for that kind of thing.

> more ISPs need to improve upload.

I was yelling the same thing into the void for the longest time, then I had the brilliant idea of reading the technical specs of the technology coming into my home.

Lo and behold, the numbers I got were at the technical limits of the technology I have at home (PON, for the time being), and going higher would require very large and expensive rewiring, with new hardware and technology.

Dylan16807 4 hours ago | parent | next [-]

4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole Drive Write Per Day quota for the upload duration, let alone the entire month.

> the technical limits of the technology that I had at home (PON for the time being)

Isn't that usually symmetrical? Is yours not?

bayindirh 3 hours ago | parent | next [-]

> 4 writes out of what, 3000?

Depends on your device capacity and how much of it is in actual use. Wear leveling also adds its own wear while it moves things around.

> For something you'll need to do once or twice ever?

I don't know about you, but my cloud storage is a living thing, and even if it weren't, if the software can't smartly ignore files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle.

> Isn't that usually symmetrical? Is yours not?

GPON (Gigabit PON) is asymmetric. The theoretical limits are 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.

Dylan16807 3 hours ago | parent [-]

> I don't know you, but my cloud storage is living

But you're probably changing less than 1% each day. And new changes are likely already in the cache, no need to download them.

> if the software can't smartly ignore files, it'll

Backblaze checks the modification date.

> GPON (Gigabit PON) is asymmetric. Theoretical limits is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.

2:1 is fine. If you're getting worse than 10:1 then that does sound like your ISP failed you?

jonhohle 3 hours ago | parent | prev [-]

How do you know whether those files need to be backed up without reading them? Timestamps and sizes are not reliable, only content hashes are. How do you get a content hash? You read the file.
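The usual compromise between those two positions is an index: trust mtime+size to decide whether a file *might* have changed, and only read and hash the candidates that differ. A sketch with hypothetical helper names (this illustrates the technique, not how Backblaze or any other product actually implements it):

```python
import hashlib
import os

def scan(root, previous_index):
    """Return (new_index, changed_paths). Files whose (mtime, size) match
    the previous scan are skipped without being read; only candidates
    that differ get opened and hashed."""
    new_index, changed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            key = (info.st_mtime_ns, info.st_size)
            prev = previous_index.get(path)
            if prev is not None and prev["key"] == key:
                new_index[path] = prev           # unchanged: no read at all
                continue
            with open(path, "rb") as f:          # only now do we read
                digest = hashlib.sha256(f.read()).hexdigest()
            new_index[path] = {"key": key, "sha256": digest}
            if prev is None or prev["sha256"] != digest:
                changed.append(path)
    return new_index, changed
```

On a hydrate-on-read mount, the `open()` in the fallback path is exactly what triggers a download, so the mtime+size shortcut is what keeps repeat scans cheap; you only pay the full read on files that claim to have changed.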

NetMageSCW 3 hours ago | parent | next [-]

If timestamps aren’t reliable, you fall well outside the set of users who can trust a third-party backup provider. Name a case where the modification timestamp fails but a cloud provider would still catch the need to download the file.

Dylan16807 3 hours ago | parent | prev [-]

Backblaze already trusts the modification date.

NetMageSCW 3 hours ago | parent | prev [-]

Why would it do that more than once, unless you are modifying 4TB of data every day? In which case, you are the one causing the problem.

bayindirh 3 hours ago | parent [-]

I don't know how your client works, but reading metadata (e.g. requesting size) off any file causes some cloud clients to download it completely.

Of course I'm not modifying 4TB on a cloud drive every day.