optionalsquid 17 hours ago
In my experience, the fact that these formats cannot represent degenerate bases (Ns in particular, but also the remaining IUPAC codes) makes them unusable for many, if not most, use cases, including storing FASTQ data.
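To make the limitation concrete, here is a minimal sketch, assuming "these formats" pack each base into 2 bits. With only four bit patterns, A, C, G, and T use up every code, so an N has nowhere to go. A 4-bit code with one bit per canonical base can hold all 15 IUPAC symbols at twice the storage cost. The function names are illustrative and not taken from any particular tool.

```python
# 2-bit code: exactly four symbols, so every bit pattern is already taken.
TWO_BIT = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

# 4-bit code: one bit per canonical base. An IUPAC symbol is the OR of the
# bases it may stand for, so N (any base) sets all four bits.
A, C, G, T = 1, 2, 4, 8
IUPAC = {
    "A": A, "C": C, "G": G, "T": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T, "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
}
DECODE = {mask: symbol for symbol, mask in IUPAC.items()}


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte; raise ValueError on anything but A, C, G, T."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            if base not in TWO_BIT:
                raise ValueError(f"cannot encode {base!r} in 2 bits")
            byte = (byte << 2) | TWO_BIT[base]
        # Left-align a short final chunk so earlier bases keep their position.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def pack_4bit(seq: str) -> bytes:
    """Pack two IUPAC symbols per byte, first symbol in the high nibble."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        high = IUPAC[seq[i]]
        low = IUPAC[seq[i + 1]] if i + 1 < len(seq) else 0
        out.append((high << 4) | low)
    return bytes(out)


def unpack_4bit(data: bytes, length: int) -> str:
    """Inverse of pack_4bit; the length is needed to drop the padding nibble."""
    symbols = []
    for i in range(length):
        byte = data[i // 2]
        nibble = byte >> 4 if i % 2 == 0 else byte & 0x0F
        symbols.append(DECODE[nibble])
    return "".join(symbols)
```

In this sketch `pack_2bit("ACGN")` raises `ValueError`, while `unpack_4bit(pack_4bit("ACGN"), 4)` round-trips the sequence unchanged.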
dwattttt 16 hours ago
The question of how to represent things not specified in the original format is a tough one. At the loosest end, a format can leave lots of space for new symbols, and you can just use those to represent something new. But then not everyone agrees on what a new symbol means and, worse, multiple groups can use the same symbol to mean different things. At the other end of the spectrum, you can be strict about the format and leave no space for new symbols. Then to represent new things you need a new standard, and people to agree on it. It's mostly a question of how well code can be updated and agreed upon, and how strict you can require your tooling to be w.r.t. formats.
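The two ends of that spectrum can be sketched as follows. Everything here is hypothetical: the code values, the two tools, and their extension tables are made up purely to show how a loose format lets the same bytes decode differently, while a strict one refuses anything it does not define.

```python
# Codes defined by the (hypothetical) base format.
CORE = {0: "A", 1: "C", 2: "G", 3: "T"}

# Loose format: codes 4 and up are "reserved for extensions", with no registry.
# Two hypothetical tools independently claim code 4.
TOOL_X_EXT = {4: "N"}  # tool X: 4 means "unknown base"
TOOL_Y_EXT = {4: "-"}  # tool Y: 4 means "alignment gap"


def decode_loose(codes, extensions):
    """Decode using the core table plus whatever extensions the reader assumes."""
    table = {**CORE, **extensions}
    return "".join(table[c] for c in codes)


def decode_strict(codes):
    """Decode core codes only; anything else needs a new version of the format."""
    for c in codes:
        if c not in CORE:
            raise ValueError(f"code {c} is not defined by this format version")
    return "".join(CORE[c] for c in codes)


data = [0, 1, 4, 3]
print(decode_loose(data, TOOL_X_EXT))  # ACNT
print(decode_loose(data, TOOL_Y_EXT))  # AC-T
```

With the same input, `decode_strict(data)` raises `ValueError`: the data cannot be silently misread, but it also cannot be stored until a revised standard defines code 4.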