nelox 5 days ago

That framing is too narrow. A weather model is built on physics equations, yet it still relies on statistical patterns in past observations to produce forecasts. A language model is trained on patterns in human text, but that text already encodes mathematics, code, and reasoning. When prompted with a math problem, the model is not running a symbolic solver; it is reproducing the learned statistical structure of solutions people have written before.

The distinction between “predicting language” and “solving math” is smaller than it seems, because the training data couples symbols to meaning. Dismissing the outputs as “just predicting words” misses the fact that word distributions encode information-rich representations of knowledge. That is why large models can, in practice, generate working code, prove theorems, and reason through problems, even if they do so imperfectly.

The right framing is not that people are misusing these models, but that the models generalize beyond their design intent, because language itself is the medium through which so many other domains are expressed.
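To make that concrete, here is a minimal sketch, assuming the Hugging Face transformers package and the public gpt2 checkpoint (my choices for illustration; any small causal LM would do). Nothing in it invokes a math engine; the only operation performed is next-token prediction, yet the continuation can reproduce the solution pattern the training text encodes:

    # Minimal sketch: ask a small causal LM to continue a math prompt.
    # Assumes the `transformers` package and the public `gpt2` checkpoint.
    from transformers import pipeline

    generate = pipeline("text-generation", model="gpt2")
    out = generate("Q: What is 7 + 5?\nA: 7 + 5 =",
                   max_new_tokens=4, do_sample=False)
    print(out[0]["generated_text"])
    # The model never calls an arithmetic routine; it greedily picks the
    # most probable next tokens given statistics learned from human text.

Whether the completion is actually “12” depends on how well that checkpoint’s word statistics encode arithmetic; the mechanism is identical either way, which is the point.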