| ▲ | singpolyma3 7 hours ago | |
Next do "why LLMs work" | ||
| ▲ | inkysigma 34 minutes ago | parent | next [-] | |
This is essentially an open research question. ML theory is unfortunately very weak relative to where the empirics are. I think a relatively optimistic paper was posted here a while back, but I would take it with a grain of salt: https://arxiv.org/abs/2604.21691 There are of course empirical results and relatively weak theoretical results like the universal approximation theorem (UAT), but I don't think those fully answer your question, especially since it seems impossible to definitively answer the questions the industry seems to be betting on, such as whether there is a lower bound on the error rate or whether hallucination can be solved. We have a much stronger idea of what linear regression is doing than of what LLMs are doing. | ||
| ▲ | krackers 5 hours ago | parent | prev | next [-] | |
See Tegmark's "Why does deep and cheap learning work so well?" (well, not so cheap anymore...): https://www.youtube.com/watch?v=5MdSE-N0bxs It is remarkably prescient given that it was written before LLMs. | ||
| ▲ | sheeshkebab 7 hours ago | parent | prev | next [-] | |
Considering they work with any architecture/configuration given enough compute, just more or less efficiently, maybe it's fundamental, in the same sense as why electricity works... | ||
| ▲ | soupspaces 6 hours ago | parent | prev | next [-] | |
Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws. | ||
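Of the pieces listed above, self-attention is probably the easiest to show in a few lines. This is only a rough sketch of single-head scaled dot-product attention in plain Python, assuming no learned projections, no masking, and no batching; the function names `softmax` and `attention` are made up for illustration and are not from any library.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention (illustrative sketch).

    Each query is compared to every key with a dot product, the scores are
    scaled by sqrt(d) and passed through softmax, and the output is the
    resulting weighted average of the value vectors.
    """
    d = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(out_dim)])
    return outputs
```

A query that matches one key much more strongly than the others gets an output close to that key's value; a query that matches all keys equally gets the plain average of the values.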
| ▲ | skydhash 6 hours ago | parent | prev [-] | |
Why does linear regression work? Why do computers work? Because it's about math and encoding information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very good at finding that relation even if it's noisy. | ||
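To make the linear regression half of that concrete, here is a minimal sketch that recovers a relation from noisy numbers using the closed-form least-squares fit for one feature. The true slope and intercept, the noise level, and the name `fit_line` are all assumptions chosen for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: a known linear relation plus Gaussian noise.
rng = random.Random(0)
true_a, true_b = 2.0, -1.0
xs = [i / 10 for i in range(200)]
ys = [true_a * x + true_b + rng.gauss(0, 0.5) for x in xs]

a, b = fit_line(xs, ys)
print(f"estimated slope={a:.3f}, intercept={b:.3f}")
```

The estimate lands close to the true parameters despite the noise, and here we can say exactly why: the formula minimizes squared error. No comparably complete account exists for what a large network finds.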