guenthert 2 days ago
Now I'm no expert in that matter, but the fs deduplicators I've seen were block-based, not file-based. Those clearly cannot use the file length, as they are blissfully unaware of files (or of any structure, for that matter). They use a rather expensive hash function (you really want to avoid hash collisions), but (at least some ten years ago) memory, not processing speed, was the limiting factor.
eru 2 days ago
https://github.com/sahib/rmlint is the one I had in mind.

> Those use a rather expensive hash function (you really want to avoid hash collisions), [...]

Then we are clearly not thinking of the same kind of software.

> but (at least some ten years ago) memory, not processing speed, was the limiting factor.

In what I described, IO is the limiting factor. You want to avoid having to read the whole file, if you can. I think you are thinking of block-level online deduplicators that are integrated into the file system?
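The strategy described above (rule out files by length before reading any bytes, then hash incrementally so the full content is only read as a last resort) can be sketched roughly as follows. This is a minimal illustration of the general idea, not rmlint's actual implementation; the function name, block size, and hash choice are all assumptions.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(paths, first_block=4096):
    """Size-first duplicate finder sketch (not rmlint's real code).

    Files with a unique length are ruled out without reading a byte;
    a cheap hash of the first block prunes most of the rest before
    any full read happens. IO, not hashing, is the cost being avoided.
    """
    # Stage 1: group by file size -- free, no file content is read.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    candidates = [g for g in by_size.values() if len(g) > 1]

    def head_hash(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read(first_block)).hexdigest()

    def full_hash(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(1 << 16):
                h.update(chunk)
        return h.hexdigest()

    duplicates = []
    for group in candidates:
        # Stage 2: hash only the first block of same-sized files.
        by_head = defaultdict(list)
        for p in group:
            by_head[head_hash(p)].append(p)
        for same_head in by_head.values():
            if len(same_head) < 2:
                continue
            # Stage 3: full-content hash only for the survivors.
            by_full = defaultdict(list)
            for p in same_head:
                by_full[full_hash(p)].append(p)
            duplicates.extend(g for g in by_full.values() if len(g) > 1)
    return duplicates
```

In the common case most files have a unique size and never get opened at all, which is why this kind of offline deduplicator is IO-bound rather than memory- or CPU-bound.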