The main challenge with using LLMs pretrained on internet text for transcript correction is that they reduce verbatimicity, because the model tends to reformat every transcript so it reads like internet text.

Talking has a lot of nuances to it. Just try to read a Donald Trump transcript. A professional author would never write a book's dialogue like that.

Using a generic LLM on transcripts almost always reduces overall accuracy. We have endless benchmark data to demonstrate this at RevAI. It does, however, help with custom vocabulary, rare words, and proper nouns, and some people prefer the "readability" of an LLM-formatted transcript. It will read more like a Wikipedia page or a book than like a true transcript, which can be ugly, messy, and hard to parse at times.
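To make the accuracy point concrete, here is a minimal sketch of how a verbatim word error rate (WER) could penalize an LLM "cleanup" pass. The function name, the normalization, and the three example strings are all made up for illustration; this is not RevAI's benchmark or scoring code.

```python
import string


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Illustrative normalization: lowercase and drop ASCII punctuation.
    strip_punct = str.maketrans("", "", string.punctuation)
    ref = reference.lower().translate(strip_punct).split()
    hyp = hypothesis.lower().translate(strip_punct).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Standard Levenshtein dynamic programme, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said, a raw ASR output, and an LLM-polished version.
verbatim = "um so I I think we should uh go"
raw_asr = "um so I think we should a go"
llm_cleaned = "So, I think we should go."

raw_wer = word_error_rate(verbatim, raw_asr)
cleaned_wer = word_error_rate(verbatim, llm_cleaned)
print(f"raw ASR WER:     {raw_wer:.3f}")
print(f"LLM-cleaned WER: {cleaned_wer:.3f}")
```

In this toy case the polished version reads better but scores worse against the verbatim reference, because every filler word and repetition it removes counts as a deletion.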

phrotoma 8 months ago

I googled "verbatimicity" and all I could find was stuff published by rev.ai which didn't (at a quick glance) define the term. Can you clarify what this means?

	depr 8 months ago
		Most likely they mean the degree of being verbatim or exact in reproduction.

dylan604 8 months ago

> A professional author would never write a book's dialogue like that.

That's a bit too far. Ever read Huck Finn?