coldtea 2 hours ago
> But in general it's self-evident that training the model on information that is irrelevant to your use case does not necessarily improve ability, otherwise you'd have AGI just from reinforcing your model on memorizing the first 10^50 digits of pi.

It's hardly self-evident, and your counter-example is hardly applicable. The first 10^50 digits of pi are not the same as having BREADTH of information in the training data, which is the whole point, not just any random "information that is irrelevant to your use case". Not to mention that the first 10^50 digits of pi compress to quite a small formula, so there's not much information there to begin with from a Shannon/Kolmogorov perspective.
kibwen an hour ago | parent
It is self-evident. Bringing up Kolmogorov complexity is irrelevant; we're talking about rote memorization. But if you can't ignore the given example, then replace "digits of pi" with "bits of output from a true random number generator". There's an infinite amount of information that we could shove into a model, and a finite number of bits with which to store any of that information such that it can be usefully recalled or form useful logical associations.
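For readers following the pi-versus-random-bits exchange, here is a minimal sketch of the distinction both comments lean on. It is illustrative only and not something either commenter posted. The names `pi_digits`, `first_pi_digits`, and `zlib_ratio` are made up for this example. The digit generator uses Gibbons' unbounded spigot algorithm, `os.urandom` is an OS-level CSPRNG standing in for a true random number generator, and zlib output size is only a crude upper bound on Kolmogorov complexity, which is itself uncomputable.

```python
import os
import zlib
from itertools import islice


def pi_digits():
    """Yield decimal digits of pi forever (Gibbons' unbounded spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough precision yet: consume one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def first_pi_digits(count):
    """Return the first `count` digits of pi as a string, without the decimal point."""
    return "".join(str(d) for d in islice(pi_digits(), count))


def zlib_ratio(data):
    """Compressed size divided by raw size; a rough proxy, not true Kolmogorov complexity."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    # A generator a few lines long describes as many digits as you care to wait for.
    print(first_pi_digits(30))
    # Random bytes have no such short description; zlib cannot shrink them.
    print(zlib_ratio(os.urandom(100_000)))
```

The short generator is the "quite a small formula" from coldtea's comment: its source length stays fixed however many digits are requested. The random bytes are kibwen's replacement example: a general-purpose compressor finds nothing to exploit, so storing them costs roughly as many bits as the data itself.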