▲ | leetharris 6 hours ago | |
The main challenge with using LLMs pretrained on internet text for transcript correction is that you reduce verbatimicity due to the nature of an LLM wanting to format every transcript as internet text. Talking has a lot of nuances to it. Just try to read a Donald Trump transcript. A professional author would never write a book's dialogue like that. Using a generic LLM on transcripts almost always reduces accuracy as a whole. We have endless benchmark data to demonstrate this at RevAI. It does, however, help with custom vocabulary, rare words, proper nouns, and some people prefer the "readability" of an LLM-formatted transcript. It will read more like a wikipedia page or a book as opposed to the true nature of a transcript, which can be ugly, messy, and hard to parse at times. | ||
▲ | dylan604 4 hours ago | parent [-] | |
> A professional author would never write a book's dialogue like that. That's a bit too far. Ever read Huck Finn? |