skydhash 2 hours ago |
> The actual reason is due to complex biases that arise from the interaction of network architectures and the optimizers and persist in the regime where data scales proportionally to model size. The multiscale nature of the data induces neural scaling laws that enable better performance than any other class of models can hope to achieve.

That's a lot of words to say that, if you encode a class of things as numbers, there's a formula somewhere that can approximate an instance of that class. It works for linear regression, and it works just as well for neural networks. The key thing here is approximation.
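(The commenter's approximation point can be sketched with a toy example; this is my illustration, not from the thread. A target function is "encoded as numbers" and a simple fitted formula, here a least-squares polynomial via NumPy, approximates it closely, the same role a neural network plays at scale.)

```python
import numpy as np

# Encode an "instance of a class of things" as numbers: samples of a function.
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x)  # the target we want to approximate

# Find "a formula somewhere" that approximates it: a degree-9 polynomial
# fitted by least squares. A neural net is a fancier version of this step.
coeffs = np.polyfit(x, y, deg=9)
approx = np.polyval(coeffs, x)

# The fit is only an approximation, but a good one on this interval.
max_err = float(np.max(np.abs(approx - y)))
print(max_err)
```

The point carries over: swapping the polynomial for a neural network changes the family of formulas being searched, not the fact that the result is an approximation of the encoded data.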
bubblyworld an hour ago | parent |
That isn't what they are saying at all, lol.