MrMcCall 3 days ago

I'm curious how many defects creep into the process. I doubt the scientists have any idea, and I'm pretty sure the process involves a great deal of statistical inference to reconstruct the genome from many, many small chunks of DNA.

How would they even measure their accuracy? By definition, there's not even a baseline to measure against, right?

bglazer 3 days ago | parent | next [-]

This process is called genome alignment. It’s actually quite a fascinating computer science problem that has received a ton of study over the years. I think the classical techniques treat it as a dynamic programming problem but I’m not sure how the most modern alignment tools work.
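
To make the dynamic programming idea concrete, here is a minimal sketch of the classic Needleman-Wunsch global alignment score in Python. The function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not what any particular alignment tool uses, and real read aligners add indexing and heuristics on top of this.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the best global alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from a real tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty string costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap   # base of a aligned to a gap
            gap_in_a = score[i][j - 1] + gap   # base of b aligned to a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]
```

Each cell depends only on its three neighbours, which is why the whole table can be filled in time proportional to the product of the two sequence lengths.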

There are a number of ways that we can check for errors. First, there are many different sequencing and alignment tools, each with different characteristics. For example, by cross-checking long-read sequencing from a nanopore sequencing device against the more common Illumina paired-end sequencing, we can see where they agree or disagree, and then check further with another validated method like Sanger sequencing if we’re really confused about which is correct. Also, we already know a bit about biology, so we can check the sequence for obviously wrong patterns. If our sequencing says the ferret has a mutation that would destroy a critical protein’s function (e.g. a frameshift or premature stop codon) but the ferret looks fine, then we can reasonably infer that the sequencing was wrong somehow. Finally, you’re right that there’s not a “baseline”. All processes in biology are inherently lossy. That said, genome sequencing uses pieces of the cellular machinery (DNA polymerase) that can copy gene sequences with incredibly high fidelity, so we rely on biology’s incredible achievement to be reasonably sure that we’re getting the “right” answers.
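
As a rough illustration of the "obviously wrong pattern" check, here is a small Python sketch that looks for a premature stop codon in a coding sequence. The function name is hypothetical, and the code assumes the standard genetic code and a sequence that already starts in the correct reading frame; real pipelines use gene annotations to decide that.

```python
from typing import Optional

# Stop codons in the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop before the final codon.

    Assumes cds is already in frame; returns None if no premature stop exists.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    # The last codon is expected to be a stop, so it is not checked.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None
```

A hit here does not prove a sequencing error on its own; it only flags a position worth cross-checking with another method.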

MrMcCall 2 days ago | parent [-]

That is a truly fantastic answer. Thank you so much. Our son was studying his high school genetics today, so it's also a fantastic tie-in. (He prefers his days off during the week, so he put in some hours today; flexibility FTW.)

This is the best of what the internet can be.

bglazer 2 days ago | parent [-]

Cheers! I hope your son enjoys learning about genetics. Sometimes the introductory classes neglect the really weird, beautiful, fascinating parts in favor of easily tested concepts.

Etheryte 3 days ago | parent | prev | next [-]

Finding out how accurate this approach is should be trivial, no? Take some other baseline tissue where you know the ground truth, freeze it, then follow the procedure and check what you get. Do this a few times and you'll have a pretty good estimate of the accuracy.
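
A minimal Python sketch of that estimate, assuming each replicate has already been aligned to the known sequence so that positions line up one to one (the function names and that equal-length assumption are mine, for illustration only):

```python
from typing import Sequence


def per_base_accuracy(truth: str, called: str) -> float:
    """Fraction of positions where the called base matches the known base."""
    if not truth or len(truth) != len(called):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(t == c for t, c in zip(truth, called))
    return matches / len(truth)


def mean_accuracy(truth: str, replicates: Sequence[str]) -> float:
    """Average per-base accuracy over several repeated freeze-and-sequence runs."""
    if not replicates:
        raise ValueError("need at least one replicate")
    return sum(per_base_accuracy(truth, r) for r in replicates) / len(replicates)
```

For example, `mean_accuracy("ACGT", ["ACGT", "ACGA"])` averages a perfect run with one that has a single wrong base out of four.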

anotherpaul 2 days ago | parent | prev | next [-]

I could not find the answer for you, but I speculate not many: the frozen DNA was only stored for 35 years. Cloning also does not need to introduce additional mutations into the DNA, afaik. So I'd expect very few, if any, mutations from the material they froze.

As for the reconstruction: I don't think that applies in this case, since they talk about working with frozen cells, and we are not in the realm of frozen mammoths whose genomes have degraded and need to be reconstructed. In that case, yes, we would be talking about smaller pieces.

MrMcCall 2 days ago | parent [-]

Thanks, that's another fantastic answer.

dbetteridge 3 days ago | parent | prev | next [-]

It's an interesting question, but it prompts the rebuttal: isn't that just how things work in nature?

No process is 100% clean in biology; that's how we get random mutation and evolution of species.

gus_massa 3 days ago | parent | prev [-]

> How would they even measure their accuracy?

My guess is that they already used the same tools with sheep or rats in a freezer. Combining that info with some other tool to measure how much damage the sample got [1] may give an estimate.

[1] I guess the exact damage depends on how fast it froze, the changes in temperature, and other difficult-to-know details.