getoffit | 8 hours ago
"Small models" will always outperform as they are deterministic (or closer to it). This was realized in 2023 already: https://newsletter.semianalysis.com/p/google-we-have-no-moat... "Less is best" is not a new realization. The concept exists across contexts. Music described as "overplayed". Prose described as verbose. We just went through an era of compute that chanted "break down your monoliths". NPM ecosystem being lots of small little packages to compose together. Unix philosophy of small composable utilities is another example. So models will improve as they are compressed, skeletonized down to opcodes, geometric models to render, including geometry for text as the bytecode patterns for such will provide the simplest model for recreating the most outputs. Compressing out useless semantics from the state of the machines operations and leaving the user to apply labels at the presentation layer. | ||
nguyentran03 | 6 hours ago
Small models aren't more deterministic than large ones. Determinism comes from temperature and sampling settings, not parameter count. A 7B model at temp 0.7 is just as stochastic as a 405B model.

The "no moat" memo you linked was about open source catching up to closed models through fine-tuning, not about small models outperforming large ones.

I'm also not sure what "skeletonized down to opcodes" or "geometry for text as bytecode patterns" means in the context of neural networks. Model compression is a real field (quantization, distillation, pruning), but none of it works the way you're describing here.
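To make the determinism point concrete, here's a minimal sketch with toy logits standing in for any model's output (the numbers and names are purely illustrative): whether decoding repeats exactly is a property of the sampling step, not of how many parameters produced the logits.

    import numpy as np

    def decode(logits, temperature, rng):
        # temperature == 0 -> greedy argmax: same token every time.
        # temperature > 0  -> softmax sampling: stochastic, no matter
        # whether the logits came from a 7B or a 405B model.
        if temperature == 0:
            return int(np.argmax(logits))
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))

    logits = np.array([2.0, 1.5, 0.3, -1.0, 0.8])  # toy logits over a 5-token vocab
    rng = np.random.default_rng()
    print([decode(logits, 0.0, rng) for _ in range(5)])  # identical picks every run
    print([decode(logits, 0.7, rng) for _ in range(5)])  # varies run to run

(Even at temperature 0, production serving can drift slightly because of non-deterministic GPU kernels and batching, but that has nothing to do with parameter count either.)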
BoredomIsFun | an hour ago
> "Small models" will always outperform as they are deterministic (or closer to it). Your whole comment feels like, pardon me, like LARPing. No, small models do not outperform the large ones, unless finetuned. Saying that as someone who uses small models 95% vs cloud ones. | ||