e12e 4 hours ago
Odd, I'd imagine Wikisource (in many/all languages) would be part of the training data for all LLMs with SOTA ambitions?
vidarh 4 hours ago
You'd think so. It seems like there are a lot of odd gaps like that. I also have a favourite English-language PhD thesis that I ask every new model about, and they still struggle to find it even though there's a Wikipedia article about it that links to a blog post I wrote about it. Anyone who thinks the models have exhausted even the publicly crawlable resources should ask them about some obscure stuff.