IshKebab a day ago

Damn, surely you stop using ASCII formats before your dataset gets to 2 TB??

rurban a day ago | parent | next [-]

Ha, it gets worse. Search engines and blacklist processors often use gigantic URL lists stored as plain ASCII, which are then fed into a perfect hash generator that accesses those URLs in no particular order. That is, they need to build a second ordering index to access the URL list. The perfect hashing guys are mathematicians, so they don't care: their definition of an MPHF (minimal perfect hash function) is just a random ordering of unique indices, and they don't bother to store the ordering as well. So we have ASCII and no index.
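
For illustration, here is a minimal Python sketch of the kind of side index such a plain-text URL list needs before it can be read out of order. It only records the byte offset of each line. The names `build_offset_index` and `get_line` and the example URLs are made up for this sketch, and it does not implement an MPHF. A real setup would presumably also need an array mapping each hash slot to a line number.

```python
import io
from array import array

def build_offset_index(f):
    """Scan a binary, newline-delimited file once and return the byte
    offset of the start of every line (hypothetical helper)."""
    offsets = array("Q")  # unsigned 64-bit, so multi-TB files still fit
    f.seek(0)
    pos = 0
    for line in f:
        offsets.append(pos)
        pos += len(line)
    return offsets

def get_line(f, offsets, i):
    """Fetch line i directly via seek instead of scanning from the start."""
    f.seek(offsets[i])
    return f.readline().rstrip(b"\n")

# Stand-in for a huge URL list; a real one would be a file opened in "rb" mode.
urls = io.BytesIO(
    b"http://a.example/\n"
    b"http://bb.example/x\n"
    b"http://c.example/\n"
)
idx = build_offset_index(urls)
second = get_line(urls, idx, 1)
```

At 8 bytes per line the index is small next to the list itself, and it can be written to disk and memory-mapped rather than rebuilt on every run.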

bede a day ago | parent | prev | next [-]

The BAM format is widely used, but assemblies still tend to be generated and exchanged as FASTA text. BAM is quite a big spec, and I think it's fair to say that none of the simpler binary equivalents to FASTA and FASTQ have caught on yet (XKCD competing standards etc.)

e.g. https://github.com/ArcInstitute/binseq

hhh a day ago | parent | prev | next [-]

no, I power thru indefinitely with no recourse

amelius a day ago | parent | prev [-]

People rely on compression for that ;)