Remix.run Logo
d4rkn0d3z 5 days ago

This is a case of the metric becoming the target. The tools used to evaluate LLM performance are shaping the LLM. First you make your tools then your tools make you.

If we take a formal systems approach, then an LLM is a model of a complex hierarchy of production rules corresponding to the various formal and informal grammatical, logical, and stylistic rules and habits employed by humans to form language that expresses their intelligence. It should not be surprising that simply executing the production rules, or a model thereof, will give rise to sentences that cannot be assigned a meaning. It should also give rise to sentences that we cannot prove or make sense of immediately but we would not want to discard these due to uncertainty. Why? because every once in a while the sentence that would be culled is actually the stroke of brilliance we are looking for, uncertainty be damned. The citation here would be literally nearly every discovery ever made.

When I recall information and use it, when I "think", I don't just produce sentences by the rules, formal and informal, I don't consider at all how often I have seen one word precede another in past, rather as I meandre the landscape of a given context, a thought manifold if you will, I am constantly evaluating whether this is in contradiction with that, if this can be inferred from that via induction or deduction, does this preclude that, etc.. That is the part that is missing from an LLM; The uncanny ability of the human mind to reproduce the entire manifold of concepts as they relate to one another in a mesh from any small piece of the terrain that it might recall, and to verify anew that they all hang together unsupported by one's own biases.

The problem is that just as the scarcity of factual information in the corpus makes it difficult to produce, so is actual reasoning rarefied among human language samples. Most of what appears as reasoning is language games and will to power. The act of reasoning in an unbiased way is so foreign to humans, so painful and arduous, so much like bending over backwards or swimming upstream against a strong current of will to power, that almost nobody does it for long.