IshKebab a day ago
Damn, surely you stop using ASCII formats before your dataset gets to 2 TB??
rurban a day ago
Ha, it gets worse. Search engines or blacklist processors often use gigantic URL lists, stored as plain ASCII, which are then fed into a perfect hash generator that accesses those URLs in no particular order. I.e. they need to create a second ordering index to access the URL list. The perfect hashing guys are mathematicians, so they don't care: their definition of an MPHF (minimal perfect hash function) is just a mapping of each key to a unique index in 0..n-1, in effectively random order, and they don't bother to store that ordering as well. So we have ASCII and no index.
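One way to read the "second ordering index": an MPHF tells you which slot a URL hashes to, but going from a slot back to the URL in the plain-text list needs a separate slot-to-byte-offset table. Below is a minimal Python sketch under that assumption. `ToyMPHF`, `read_keys`, `build_offset_index` and `url_at` are hypothetical names, and the MPHF is a toy hash-and-displace construction (loosely CHD-style), not what any particular search engine or MPHF library actually uses. It also assumes one unique, non-empty URL per line.

```python
import hashlib
import os
import tempfile


def _h(seed: int, key: bytes) -> int:
    """Deterministic 64-bit hash of `key`, varied by `seed` (blake2b keyed with the seed)."""
    d = hashlib.blake2b(key, digest_size=8, key=seed.to_bytes(8, "little"))
    return int.from_bytes(d.digest(), "little")


class ToyMPHF:
    """Toy hash-and-displace MPHF mapping n distinct keys onto 0..n-1.

    Hypothetical illustration: it stores one seed per bucket and never the keys,
    so it can tell you a URL's slot but not which URL sits in a given slot.
    Keys must be unique, otherwise construction never terminates.
    """

    def __init__(self, keys: list[bytes], bucket_size: int = 4):
        self.n = len(keys)
        self.r = max(1, -(-self.n // bucket_size))  # number of buckets = ceil(n / bucket_size)
        buckets: list[list[bytes]] = [[] for _ in range(self.r)]
        for k in keys:
            buckets[_h(0, k) % self.r].append(k)
        self.seeds = [0] * self.r
        taken = [False] * self.n
        # Place the largest buckets first, while most slots are still free.
        for b in sorted(range(self.r), key=lambda i: len(buckets[i]), reverse=True):
            if not buckets[b]:
                continue
            seed = 1  # seed 0 is reserved for bucket assignment
            while True:
                slots = [_h(seed, k) % self.n for k in buckets[b]]
                if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                    break
                seed += 1
            for s in slots:
                taken[s] = True
            self.seeds[b] = seed

    def __call__(self, key: bytes) -> int:
        return _h(self.seeds[_h(0, key) % self.r], key) % self.n


def read_keys(path: str) -> list[bytes]:
    """One URL per line in a plain ASCII file."""
    with open(path, "rb") as f:
        return [line.rstrip(b"\r\n") for line in f]


def build_offset_index(path: str, mphf: ToyMPHF) -> list[int]:
    """The 'second index': slot -> byte offset of that URL's line in the file."""
    offsets = [0] * mphf.n
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets[mphf(line.rstrip(b"\r\n"))] = pos
            pos += len(line)
    return offsets


def url_at(path: str, offsets: list[int], slot: int) -> bytes:
    """Recover the URL for a slot with one seek into the unchanged ASCII list."""
    with open(path, "rb") as f:
        f.seek(offsets[slot])
        return f.readline().rstrip(b"\r\n")


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}".encode() for i in range(20)]
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"\n".join(urls) + b"\n")
    mphf = ToyMPHF(read_keys(tmp.name))
    offsets = build_offset_index(tmp.name, mphf)
    print(mphf(urls[7]), url_at(tmp.name, offsets, mphf(urls[7])))
    os.remove(tmp.name)
```

The MPHF stays small precisely because it keeps no keys, so the slot-to-offset array (8 bytes per URL if stored as 64-bit integers) is an extra structure that someone has to build and ship alongside the ASCII list.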
bede a day ago
BAM format is widely used, but assemblies are still typically generated and exchanged as FASTA text. BAM is quite a big spec, and I think it's fair to say that none of the simpler binary equivalents to FASTA and FASTQ have caught on yet (cf. the XKCD on competing standards).
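To make "simpler binary equivalent" concrete, here is a minimal Python sketch of the core idea behind compact binary sequence formats: packing nucleotides at 2 bits per base instead of one ASCII byte. It assumes uppercase A/C/G/T only (no N runs, IUPAC codes or soft-masked lowercase) and that the sequence length is stored elsewhere. `read_fasta`, `pack` and `unpack` are hypothetical helpers, and the bit layout is not that of BAM or of any particular published format.

```python
# Map each base to a 2-bit code; any other symbol (N, IUPAC codes, lowercase) raises KeyError.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def read_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA text; sequence lines are concatenated."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def pack(seq: str) -> bytes:
    """Pack an ACGT-only sequence at 2 bits per base, 4 bases per byte (low bits first)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= _CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); the caller must store the sequence length separately."""
    return "".join(_BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


if __name__ == "__main__":
    fasta = ">chr_toy example\nACGTACGTTG\nCA\n"
    for name, seq in read_fasta(fasta):
        packed = pack(seq)
        assert unpack(packed, len(seq)) == seq
        print(name, len(seq), "bases ->", len(packed), "bytes")  # chr_toy example 12 bases -> 3 bytes
```

That is a 4x saving on sequence bytes before any general-purpose compression. Real formats such as UCSC's .2bit also keep side tables for N blocks and masked regions, which hints at why even "simple" binary formats end up needing specs of their own.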
hhh a day ago
No, I power through indefinitely with no recourse.
amelius a day ago
People rely on compression for that ;) |