dekhn · 3 hours ago
The simple model for scp (rsync is likely more complex): loop over all files; for each file, determine its metadata with stat, then open it and copy bytes in chunks until done, then proceed to the next file. I don't know what rsync does on top of that (pipelining could mean many different things), but my empirical experience is that copying one 1 TB file is far faster than copying a billion 1 KB files (both sum to ~1 TB), and that load balancing/partitioning/parallelizing the tool when copying large numbers of small files leads to significant speedups, likely because the per-file overhead is hidden by the parallelism (in addition to dealing with individual copies stalling due to TCP or whatever else).

I guess the question is whether rsync uses multiple threads or otherwise accesses the filesystem in parallel, which I don't think it does, while tools like rclone, kopia, and aws sync all take advantage of parallelism (multiple file lookups and copies in flight at once).
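A minimal sketch of that simple model, plus a thread-pool variant to show where parallelism hides the per-file overhead (the chunk size and worker count are arbitrary choices for illustration, not what any of these tools actually uses):

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 128 * 1024  # arbitrary chunk size for illustration

def copy_one(src: str, dst: str) -> None:
    """scp-style inner loop: stat for metadata, then open and copy in chunks."""
    st = os.stat(src)  # per-file metadata lookup
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            fout.write(chunk)
    os.utime(dst, (st.st_atime, st.st_mtime))  # preserve timestamps

def copy_sequential(pairs):
    # One file at a time: the per-file overhead (stat, open, close, any
    # per-file acknowledgement) is paid serially, which is what hurts
    # with huge numbers of small files.
    for src, dst in pairs:
        copy_one(src, dst)

def copy_parallel(pairs, workers=32):
    # Many files in flight at once: per-file overheads overlap, so
    # throughput moves back toward the disk/network limit.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda p: copy_one(*p), pairs))
```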
nh2 · 2 hours ago · parent
> I guess the question is whether rsync uses multiple threads or otherwise accesses the filesystem in parallel

No, that is not the question. Even Wikipedia explains that rsync is single-threaded. And even if it were multithreaded, "or otherwise" used concurrent file IO: the question is whether rsync's _transmission_ is pipelined or not, meaning: does it wait for one file to be transferred and acknowledged before sending the data of the next? Somebody has to go check that.

If yes, then parallel filesystem access won't matter, because a network round trip has brutally higher latency than reading data sequentially off an SSD.
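A back-of-envelope calculation makes the latency point concrete. All of the numbers below are assumptions picked for illustration (a fast LAN round trip, a mid-range SSD), not measurements:

```python
# Why one acknowledgement per file dominates for many small files.
n_files   = 1_000_000_000   # one billion 1 KB files (~1 TB total)
file_size = 1_000           # bytes
rtt       = 0.5e-3          # assumed 0.5 ms round trip on a fast LAN
ssd_bw    = 500e6           # assumed 500 MB/s sequential SSD read

# Stop-and-wait (no pipelining): every file costs at least one RTT.
t_roundtrips = n_files * rtt
print(f"round-trip cost alone: {t_roundtrips / 3600:.0f} hours")  # ~139 hours

# Pipelined: data streams continuously, bounded by bandwidth instead.
t_stream = n_files * file_size / ssd_bw
print(f"pure streaming time:   {t_stream / 3600:.1f} hours")      # ~0.6 hours
```

Under those assumptions, the per-file round trips alone cost two orders of magnitude more time than streaming the same bytes, which is why pipelining the transmission matters far more than parallel filesystem reads.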