refactor_master 2 hours ago

Can someone give me a sound argument for why, when these things supposedly hold:

- LLMs scale with amount of data on the subject

- Even frontier labs themselves have a hard time gauging exactly how well their models perform, despite quite rigorous test suites covering all aspects

then this can simultaneously be true:

Using a low-data "niche language" (what is the volume of literature written in Caveman?) supposedly yields equal performance — even though, anecdotally, this doesn't hold for e.g. niche programming languages — and the claim is "proven" by a handful of completely arbitrarily designed tests.

We've barely convinced ourselves that LLMs actually increase measurable industry productivity, rather than us just spending our time sending slop to each other.