JCharante 2 hours ago
Why not? Datasets aren't only finished works; there are datasets covering the process too, they're just available in smaller quantities.
elmomle an hour ago | parent
Take the work of Raymond Carver as just one example. He would type drafts that went through repeated iteration, with a massive amount of hand-written markup, revision, and excision by his editor. To really recreate his writing style, you would need the notes he started with for himself, the drafts that never even reached his editor, the drafts that did make it to the editor, all the edits made, and the final product, all properly sequenced and encoded as data. In theory, one could munge this data and train an LLM on it, and it would probably get significantly better at writing terse prose in which coherent, deep things are actually going on in the underlying story. (More generally, this is complicated by the fact that many authors intentionally destroy their notes so the work can stand on its own--and this gives them another reason to do so.) But until that's done, you're going to get LLMs replicating style without the deep cohesion that makes such writing rewarding to read.