refactor_master 2 hours ago
Can someone give me a sound argument for how, when these things supposedly hold:

- LLMs scale with the amount of data on a subject
- Even frontier labs themselves have a hard time gauging exactly how well their models perform, even across quite rigorous test suites

this can also be true: prompting in a low-data "niche language" (what is the volume of literature written in Caveman?) supposedly performs just as well. That anecdotally doesn't hold for e.g. niche programming languages, as shown by a handful of completely arbitrarily designed tests. We've barely convinced ourselves that LLMs actually increase measurable industry productivity, instead of us just spending time sending slop to each other.