Remix clone Hacker News


	▲	johndough 3 hours ago
		Do you have any clues to guess the total model size? I do not see any limitations to making models ridiculously large (besides training), and the Scaling Law paper showed that more parameters = more better, so it would be a safe bet for companies that have more money than innovative spirit.
	▲	magicalhippo an hour ago
		> I do not see any limitations to making models ridiculously large (besides training)

		From my understanding, the "besides training" is a big issue. As I noted earlier[1], Qwen3 was much better than Qwen2.5, but the main difference was just more and better training data. The Qwen3.5-397B-A17B beat their 1T-parameter Qwen3-Max-Base, and again a large part of the change was more and better training data.

		[1]: https://news.ycombinator.com/item?id=47089780