adamzwasserman 3 days ago:
LLMs don't use 'overall probability' in any meaningful sense. During training, gradient descent creates highly concentrated 'gravity wells' of correlated token relationships: the probability distribution is extremely non-uniform, heavily weighted toward patterns seen in training data. The model isn't selecting from 'astronomically many possible sequences' with equal probability; it's navigating pre-carved channels in high-dimensional space. That's fundamentally different from novel discovery.
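You can see how non-uniform that distribution is by inspecting a model's next-token probabilities directly. A minimal sketch, assuming PyTorch and the Hugging Face transformers library; GPT-2 and the prompt are illustrative choices, not anything specified in the thread:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used purely as a small, convenient example model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Any prefix shows the same concentration effect; this one is arbitrary.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=10)

# The vocabulary has ~50k tokens, but a handful carry most of the mass.
print(f"top-10 tokens hold {top.values.sum().item():.1%} of the probability")

# Entropy of the actual distribution vs. a uniform one over the vocabulary.
entropy = -(probs * torch.log_softmax(logits, dim=-1)).sum().item()
print(f"entropy: {entropy:.2f} nats vs. {math.log(probs.numel()):.2f} uniform")

for tok, p in zip(top.indices.tolist(), top.values.tolist()):
    print(f"{tokenizer.decode([tok])!r}: {p:.3f}")
```

On prompts like this, the top few tokens typically soak up nearly all of the probability mass, and the entropy sits far below the uniform bound: the 'pre-carved channels' point in concrete form.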
KoolKat23 3 days ago:
That's exactly the same for humans in the real world. You're focusing too close; abstract up a level. Your point concerns how the system functions at the "micro" level, not the wider "macro" result (think emergent capabilities).