jakobnissen a day ago
SAM is not a bad file format. What's bad about SAM?
optionalsquid a day ago
I don't dislike the format, and it is much, much better than what it replaced, but SAM and its binary sister format BAM do have some flaws:
- The original index format could not handle large chromosomes, so now there are two index formats: .bai and .csi
- For BAM, the CIGAR (alignment description) operation count is limited to 16 bits, which means that very long alignments (more than 65,535 operations) cannot be represented directly. One workaround I've seen (but thankfully not used) is saving the CIGAR as a string in a tag
- SAM cannot unambiguously represent sequences with only a single base (e.g. after trimming), since a '*' in the quality column can be interpreted either as a single Phred score (9) or as a special value meaning "no qualities". BAM can represent such sequences unambiguously, but most tools output SAM
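To make the CIGAR and quality points concrete, here is a minimal Python sketch. The function names are hypothetical and not taken from any library; it assumes the usual Phred+33 quality encoding and that the BAM operation count is stored as an unsigned 16-bit integer.

```python
# Illustrative helpers only; these names are made up for this example.

def phred_from_qual(qual: str) -> list[int]:
    """Decode a SAM QUAL string as Phred+33 scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def qual_interpretations(seq: str, qual: str) -> list[str]:
    """List every way the QUAL column could be read for a given SEQ."""
    readings = []
    if qual == "*":
        # '*' is the SAM placeholder for "no qualities stored".
        readings.append("no qualities stored")
    if len(qual) == len(seq):
        # A QUAL string as long as SEQ is also a valid list of scores.
        readings.append(f"Phred scores {phred_from_qual(qual)}")
    return readings


# Largest operation count that fits in an unsigned 16-bit field.
MAX_BAM_CIGAR_OPS = 2**16 - 1


def fits_in_bam_cigar(n_ops: int) -> bool:
    """True if a CIGAR with n_ops operations fits the 16-bit count."""
    return n_ops <= MAX_BAM_CIGAR_OPS


if __name__ == "__main__":
    print(qual_interpretations("A", "*"))   # two readings: ambiguous
    print(qual_interpretations("AC", "*"))  # one reading: unambiguous
    print(fits_in_bam_cigar(70_000))
```

For a one-base read both readings apply, which is the ambiguity; for anything longer, a lone '*' can only mean missing qualities.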