jiggawatts 4 days ago:
That will probably never happen because of the fundamental nature of blob storage. Individual objects are split into multiple blocks, each of which can be stored independently on a different underlying server. Each server can see its own block, but not any of the others. Calculating a hash like SHA-256 therefore requires a sequential scan through all of the blocks. This could be done with a minimum of network traffic if, instead of streaming the bytes to a central server to hash, the hash state were forwarded from block server to block server in sequence. Even so, it would be a very slow serial operation, and fairly chatty too if there are many tiny blocks. What could work is a Merkle tree hash construction where some of the subdivision boundaries match the block boundaries.
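A minimal sketch of that construction, assuming block boundaries line up with the tree's leaf boundaries (all names here are illustrative, not any storage system's actual API):

```python
import hashlib

def leaf_hash(block: bytes) -> bytes:
    # Domain-separate leaves from interior nodes (0x00 vs. 0x01 prefix).
    return hashlib.sha256(b"\x00" + block).digest()

def merkle_root(leaf_digests: list[bytes]) -> bytes:
    # Combine per-block digests pairwise until one root remains. Only
    # these 32-byte digests cross the network; the block bytes stay put.
    nodes = list(leaf_digests)
    while len(nodes) > 1:
        if len(nodes) % 2 == 1:
            nodes.append(nodes[-1])  # odd count: duplicate the last node
        nodes = [
            hashlib.sha256(b"\x01" + nodes[i] + nodes[i + 1]).digest()
            for i in range(0, len(nodes), 2)
        ]
    return nodes[0]

# Each block server hashes only the block it holds, in parallel:
blocks = [b"block-0" * 512, b"block-1" * 512, b"block-2" * 512]
print(merkle_root([leaf_hash(b) for b in blocks]).hex())
```

The per-block hashing parallelizes across servers, and only fixed-size digests need to move, which avoids both the serial scan and the chattiness.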
texthompson 4 days ago:
Why would you PUT an object, then download it again to a central server in the first place? If a service is accepting an upload, it is already doing a pass over all the bytes anyway. It doesn't seem like a ton of overhead to calculate SHA-256 over 4096-byte chunks as the upload progresses. I suspect that sort of calculation would happen anyway.
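A hedged sketch of that idea: fold each chunk into a running SHA-256 as it arrives, so the digest is ready the moment the last byte lands (receive_upload and store_chunk are made-up names standing in for the service's ingest path):

```python
import hashlib
import io

CHUNK_SIZE = 4096  # 4 KiB chunks, as in the comment above

def receive_upload(stream, store_chunk) -> str:
    # Single pass: every chunk is handed to storage and folded into the
    # running hash state at the same time; no second read is needed.
    hasher = hashlib.sha256()
    while chunk := stream.read(CHUNK_SIZE):
        store_chunk(chunk)
        hasher.update(chunk)
    return hasher.hexdigest()

# Usage with an in-memory stream standing in for the network socket:
digest = receive_upload(io.BytesIO(b"payload" * 10_000), store_chunk=lambda c: None)
print(digest)
```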
| ||||||||||||||||||||||||||||||||||||||
flakes 4 days ago:
You have just re-invented IPFS! https://en.m.wikipedia.org/wiki/InterPlanetary_File_System | ||||||||||||||||||||||||||||||||||||||
losteric 4 days ago:
Why does the architecture of blob storage matter? The hash can be calculated as the data streams in on the first write, before it gets dispersed into multiple physically stored blocks.
| ||||||||||||||||||||||||||||||||||||||
Salgat 4 days ago:
Isn't that the point of the metadata? Calculate the hash ahead of time and store it in the metadata as part of the blob's atomic commit (at least on S3).
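For illustration, here is roughly what that flow looks like with boto3, assuming a recent SDK with S3's additional-checksums support (bucket and key are placeholders):

```python
import base64
import hashlib
import boto3

# Compute the digest up front, then let S3 verify it atomically on PUT.
body = b"example object contents"
digest_b64 = base64.b64encode(hashlib.sha256(body).digest()).decode()

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-bucket",
    Key="example-key",
    Body=body,
    ChecksumSHA256=digest_b64,  # S3 rejects the PUT if the body doesn't match
)

# The stored checksum can later be read back without downloading the body:
head = s3.head_object(Bucket="example-bucket", Key="example-key", ChecksumMode="ENABLED")
print(head.get("ChecksumSHA256"))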