▲ | ijk 7 days ago
The judge's ruling from earlier certainly seemed to me to suggest that the training was fair use. Obviously, that's not part of the current settlement. I'm no expert on this, so I don't know the extent to which the earlier ruling applies.
▲ | hcs 7 days ago | parent
If I'm reading this right, yes, the training was fair use, but I was responding (unclearly) to the claim that the pirated books weren't used to train commercially released LLMs. The judge complained that it wasn't clear what was actually used. From the June order https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/... [pdf]:

> Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs — and yet Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs.

> We know that Anthropic has more information about what it in fact copied for training LLMs (or not). Anthropic earlier produced a spreadsheet that showed the composition of various data mixes used for training various LLMs — yet it clawed back that spreadsheet in April. A discovery dispute regarding that spreadsheet remains pending.