alt227 | 3 days ago
The biggest-capacity HDD available today is 30TB. 17PB is about 567 of those drives... being totally filled... per day. I was hoping somebody would come along and say this is a simple spelling error or something. The cost of the drives alone seems astronomical, let alone the logistics of the data center keeping up with storing that much data. EDIT: I have just realised that they are probably only processing at this speed rather than storing it all. Can anyone confirm whether they store all the logs they process?
enether | 3 days ago | parent
I would assume storage varies greatly. LinkedIn quoted an average read fanout ratio of 5.5x in Kafka, meaning each byte written was read 5.5 times. Assuming that still holds, the total traffic is writes plus 5.5x reads, so we divide by 6.5 to get the daily write volume. That comes out to about 87 disks a day. Assuming a 7-day retention period (which is on the high side), it's not unthinkable to have a 600-1800 disk deployment (accounting for replication copies).
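The arithmetic above can be sketched out explicitly. All inputs here are the assumptions from this thread (17PB/day processed, 30TB drives, 5.5x fanout, 7-day retention, 3x replication), not measured figures:

```python
# Back-of-envelope capacity estimate; every input is an assumption
# taken from the discussion above, not a measured value.
total_pb_per_day = 17      # total processed throughput, PB/day
disk_tb = 30               # largest HDD available today, TB
read_fanout = 5.5          # LinkedIn's quoted Kafka read fanout

# Total traffic = writes + reads = writes * (1 + fanout),
# so divide by 6.5 to isolate the daily write volume.
write_pb_per_day = total_pb_per_day / (1 + read_fanout)
disks_per_day = write_pb_per_day * 1000 / disk_tb

retention_days = 7         # on the high side, per the comment
replication = 3            # a typical Kafka replication factor (assumption)
disks_needed = disks_per_day * retention_days * replication

print(f"{disks_per_day:.0f} disks/day of writes")
print(f"~{disks_needed:.0f} disks with retention and replication")
```

With these inputs it lands at roughly 87 disks of fresh writes per day and on the order of 1800 disks total at 3x replication, i.e. the upper end of the 600-1800 range quoted above.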