energy123 2 hours ago
> 67,383 * 426,397 = 71,371,609,051 ...

You need to say why it can do some novel tasks but could never do others. Model interpretability gives us the answers. The reason LLMs can (almost) do new multiplication tasks is that they saw many multiplication problems in their training data, and it was cheaper to learn compressed/abstract multiplication strategies and encode them as circuits in the network than to memorize the times tables up to some large N. This gives them the ability to approximate multiplication problems they haven't seen before.
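A quick sanity check makes the "approximate" point concrete. Assuming the quoted answer above is a model-generated output, we can compare it against the exact product: it has the right number of digits (the model gets the rough magnitude) but the digits themselves are wrong.

```python
# Compare the quoted (presumably model-generated) answer against
# the exact product, to show what "approximate multiplication" means.
exact = 67_383 * 426_397          # true product
model = 71_371_609_051            # answer quoted above

print(exact)                                  # 28731909051
print(len(str(model)) == len(str(exact)))     # True: same digit count / magnitude
print(model == exact)                         # False: exact digits are wrong
```

So the learned circuit lands in the right ballpark without reproducing the exact arithmetic, which is what you'd expect from a compressed strategy rather than a memorized lookup table.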