JCharante 2 hours ago
Why not? Datasets aren't only finished works; there are datasets covering the process too, they're just available in smaller quantities.
elmomle an hour ago | parent
Take the work of Raymond Carver as just one example. He would type drafts that went through repeated iteration, with a massive amount of hand-written markup, revision, and excision by his editor. To really recreate his writing style, you would need the notes he started with for himself, the drafts that never even reached his editor, the drafts that did make it to the editor, all the edits made, and the final product, all properly sequenced and encoded as data. In theory, one could munge this data and train an LLM on it, and it would probably get significantly better at writing terse prose in which coherent, deep things are actually going on in the underlying story. (More generally, this is complicated by the fact that many authors intentionally destroy their notes so the work can stand on its own--and this gives them another reason to do so.) But until that's done, you're going to get LLMs replicating style without the deep cohesion that makes such writing rewarding to read.