totalperspectiv a day ago

Removing the line-wrapping newlines from the FASTA/FASTQ convention also dramatically improves parsing perf, because you don't have to do as much lookahead to find record ends.

Gethsemane a day ago | parent | next [-]

Unfortunately, when you write a program that doesn't wrap output FASTAs, you have a bunch of people telling you off because SOME programs (cough bioperl cough) have hard limits on line length :)

sharedptr 11 hours ago | parent | next [-]

Is BioPerl still standard, or did people move to BioPython?

When I was shown BioPerl I was tempted to write a better, C++ version, but was overwhelmed by other university stuff and let it go.

o11c a day ago | parent | prev [-]

You can use content-defined chunking to wrap at a predictable place so that compression still works.
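One possible reading of this suggestion, as a rough sketch: choose line breaks from the bases themselves rather than from a fixed column, so that an insertion or deletion only disturbs the lines near the edit and later lines come out byte-identical again. The function name `cdc_wrap` and the parameters `window`, `divisor`, and `max_len` are hypothetical, and a plain CRC32 over a short window stands in for the rolling hash a real implementation would likely use.

```python
import zlib


def cdc_wrap(seq, window=8, divisor=64, max_len=256):
    """Wrap seq at content-defined positions (illustrative sketch only).

    A line ends after position `end` when the CRC32 of the last `window`
    bases is divisible by `divisor`, or when the line reaches `max_len`
    (so that tools with a hard line-length limit still cope).
    """
    lines = []
    start = 0
    for end in range(1, len(seq) + 1):
        # Forced cut: keeps every line at or under max_len characters.
        forced = end - start >= max_len
        # Content cut: depends only on the `window` bases before `end`,
        # not on where the current line happened to start.
        content_cut = (
            end >= window
            and zlib.crc32(seq[end - window:end].encode("ascii")) % divisor == 0
        )
        if forced or content_cut:
            lines.append(seq[start:end])
            start = end
    if start < len(seq):
        lines.append(seq[start:])  # trailing partial line
    return lines
```

Because every content cut depends only on nearby bases, two sequences that differ by a small edit realign at the first content cut after the edit. With a `divisor` of 64 the average line is somewhat under 64 bases, and `max_len` caps the worst case.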

bede a day ago | parent | prev [-]

Thanks for reminding me to benchmark this!

totalperspectiv a day ago | parent [-]

I've only tested this when writing my own parser where I could skip the record end checks, so idk if this improves perf on an existing parser. Excited to see what you find!
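To illustrate the record-end point, here is a minimal sketch, not taken from any existing parser, contrasting the two cases. The function names are hypothetical, and the unwrapped version assumes every record is exactly one header line followed by one sequence line.

```python
import io


def parse_unwrapped_fasta(handle):
    """Yield (name, sequence) pairs, assuming exactly two lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.startswith(">"):
            raise ValueError(f"expected a header line, got {header!r}")
        # No lookahead: the very next line is the whole sequence.
        seq = handle.readline()
        yield header[1:].rstrip("\n"), seq.rstrip("\n")


def parse_wrapped_fasta(handle):
    """Yield (name, sequence) pairs from FASTA with wrapped sequence lines."""
    name = None
    chunks = []
    for line in handle:
        line = line.rstrip("\n")
        # Every line must be inspected to see whether a new record starts.
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__":
    unwrapped = io.StringIO(">r1\nACGT\n>r2\nGGCC\n")
    print(list(parse_unwrapped_fasta(unwrapped)))
```

The wrapped parser has to check each line for a `>` and join the pieces, while the unwrapped one reads two lines and moves on. Whether that difference is measurable in a given parser is exactly what a benchmark would need to show.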