bglazer 3 days ago

This process is called genome alignment. It’s actually quite a fascinating computer science problem that has received a ton of study over the years. I think the classical techniques treat it as a dynamic programming problem but I’m not sure how the most modern alignment tools work.
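To make the dynamic programming idea concrete, here is a minimal sketch of Needleman-Wunsch global alignment, one of the classical textbook formulations. The scoring values (match=1, mismatch=-1, gap=-1) and the function name are illustrative assumptions, and real aligners use far more elaborate indexing and heuristics to scale to whole genomes.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the two lengths. That is fine for short sequences, and it is the reason this exact approach alone does not scale to aligning millions of reads against a whole genome.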

There are a number of ways that we can check for errors. First, there are many different sequencing and alignment tools, each with different characteristics. For example, by cross-checking long-read sequencing from a nanopore sequencing device against the more common Illumina paired-end sequencing, we can see where they agree or disagree, and then check further with another validated method like Sanger sequencing if we’re really confused about which is correct. Also, we already know a bit about biology, so we can check the sequence for obviously wrong patterns. If our sequencing says the ferret has a mutation that would destroy a critical protein’s function (e.g. a frameshift or premature stop codon) but the ferret looks fine, then we can reasonably infer that the sequencing was wrong somehow. Finally, you’re right that there’s not a “baseline”. All processes in biology are inherently lossy. That said, genome sequencing uses pieces of the cellular machinery (DNA polymerase) that can copy gene sequences with incredibly high fidelity, so we rely on biology’s incredible achievement to be reasonably sure that we’re getting the “right” answers.
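As a rough illustration of the "obviously wrong pattern" check, here is a small sketch that scans a coding sequence for an in-frame stop codon before the final codon. The function name and the assumption that the input is a DNA coding sequence starting in frame are mine, and it uses the three stop codons of the standard genetic code. Real pipelines rely on proper annotation tools rather than something this simple.

```python
# Stop codons in the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_premature_stop(cds):
    """Return the 0-based offset of the first in-frame stop codon that
    occurs before the last full codon, or None if there is none.

    Assumes `cds` is a coding sequence that starts in reading frame 0.
    """
    cds = cds.upper()
    # Start index of the last complete codon; it is allowed to be a stop.
    last_codon_start = len(cds) - len(cds) % 3 - 3
    for pos in range(0, last_codon_start, 3):
        if cds[pos:pos + 3] in STOP_CODONS:
            return pos
    return None
```

A hit here on an essential gene in a healthy animal is a hint to re-check the reads at that position, not proof that the sequencing was wrong.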

MrMcCall 2 days ago | parent [-]

That is a truly fantastic answer. Thank you so much. Our son was studying his high school genetics today, so it's also a fantastic tie-in. (He prefers his days off during the week, so he put in some hours today; flexibility FTW.)

This is the best of what the internet can be.

bglazer 2 days ago | parent [-]

Cheers! I hope your son enjoys learning about genetics. Sometimes the introductory classes neglect the really weird, beautiful, fascinating parts in favor of easily tested concepts.