maxbond (13 hours ago):
I don't think there was much abuse of "not just A, but B" before ChatGPT. I think that's more a product of RLHF than of the initial training. Very few people wrote in the incredibly overwrought and flowery style of AI, and the English-speaking Internet, where most of the (English-language) training data was sourced, is written in largely casual, everyday language. I imagine other language communities on the Internet are similar, but I wouldn't know.

Don't we all remember 5 years ago? Did you regularly encounter people who wrote as if every follow-up question was absolutely brilliant and every document was life-changing?

I think about why's (poignant) Guide to Ruby [1], a book explicitly about how learning to program is a beautiful experience. Yet even its language is pedestrian compared to the language in this book. That's because most people find writing like that saccharine, and so they don't write that way, even when they're writing poetically.

Regardless, some people born in England can speak French with a French accent. If someone speaks French to you with a French accent, where are you going to guess they were born?
PaulRobinson (12 hours ago), in reply:
It's been alleged that major sources of training data for many LLMs were libgen and SciHub, which are hardly casual.