jiggawatts 5 days ago
What I would love to see in an SCM that properly supports large binary blobs is storing the contents using Prolly trees instead of a simple SHA hash.

Prolly trees are very similar to Merkle trees or the rsync algorithm, but they support mutation and version history retention with some nice properties. For example: you always obtain exactly the same tree (with the same root hash) irrespective of the order of incremental edit operations used to get to the same state.

In other words, two users could edit a subset of a 1 TB file, both could merge their edits, and both will then agree on the root hash without having to re-hash or even download the entire file!

Another major advantage on modern many-core CPUs is that Prolly trees can be constructed in parallel instead of having to be streamed sequentially on one thread.

Then the really big brained move is to store the entire SCM repo as a single Prolly tree for efficient incremental downloads, merges, or whatever. I.e.: a repo fork could share storage with the original not just up to the point-in-time of the fork, but all future changes too.
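To make the "same root hash regardless of edit order" property concrete, here is a minimal sketch in Python. It is not taken from any real SCM or Prolly tree library: the names `root_hash`, `chunk_level`, `is_boundary`, and `apply_edits`, the `block0007`-style keys, and the average fan-out of 4 are all assumptions made for illustration. It works on keyed blocks rather than a raw byte stream, and it rebuilds the whole tree on every call for simplicity, whereas a real implementation would only re-hash the nodes along the edited path.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def is_boundary(digest: bytes, avg_fanout: int = 4) -> bool:
    # Content-defined split: a node closes its chunk when its own hash
    # matches a pattern, so chunk boundaries depend only on content.
    return int.from_bytes(digest[:4], "big") % avg_fanout == 0


def chunk_level(digests: list[bytes]) -> list[bytes]:
    # Group one level of node hashes into parent nodes.
    parents, current = [], []
    for d in digests:
        current.append(d)
        if is_boundary(d):
            parents.append(h(b"node" + b"".join(current)))
            current = []
    if current:  # trailing group without a boundary hash
        parents.append(h(b"node" + b"".join(current)))
    return parents


def root_hash(entries: dict[str, bytes]) -> bytes:
    # Leaves are hashed in key order, never in insertion order.
    level = [h(b"leaf" + k.encode() + b"\0" + v) for k, v in sorted(entries.items())]
    if not level:
        return h(b"empty")
    while len(level) > 1:
        level = chunk_level(level)
    return level[0]


def apply_edits(entries: dict[str, bytes], edits) -> None:
    # An edit is (key, new_value); a value of None deletes the key.
    for key, value in edits:
        if value is None:
            entries.pop(key, None)
        else:
            entries[key] = value


# Hypothetical data: 200 keyed blocks standing in for parts of a large file.
base = {f"block{i:04d}": b"original" for i in range(200)}
alice = [("block0007", b"alice's change"), ("block0200", b"new block")]
bob = [("block0150", b"bob's change"), ("block0042", None)]

one = dict(base)
apply_edits(one, alice)
apply_edits(one, bob)

two = dict(reversed(list(base.items())))  # different insertion history
apply_edits(two, bob)
apply_edits(two, alice)

print(root_hash(one) == root_hash(two))   # same content, same root
print(root_hash(one) == root_hash(base))  # different content, different root
```

Because the chunk boundaries are derived from the hashes themselves, the shape of the tree is a function of the final content alone, which is why both users end up with the same root here. The sketch only covers non-conflicting edits; it says nothing about how a merge would resolve two changes to the same block.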
Dylan16807 5 days ago
Can you list some realistic workflows where people would be touching the same huge file but only changing much smaller parts of it?

And yes you can represent a whole repo as a giant tar file, but because the boundaries between hash segments won't line up with your file boundaries you get an efficiency hit with very little benefit. Unless you make it file-aware in which case it ends up even closer to what git already does.

Git knows how to store deltas between files. Making that mechanism more reliable is probably able to achieve more with less.
hinkley 5 days ago
Git has had a good run. Maybe it’s time for a new system built by someone who learned about DX early in their career, instead of via their own bug database. If there’s a new algorithm out there that warrants a look…