| ▲ | johndough 3 hours ago | |
Do you have any clues to guess the total model size? I do not see any limitations to making models ridiculously large (besides training), and the Scaling Law paper showed that more parameters = more better, so it would be a safe bet for companies that have more money than innovative spirit. | ||
| ▲ | magicalhippo an hour ago | parent [-] | |
> I do not see any limitations to making models ridiculously large (besides training) From my understanding, the "besides training" is a big issue. As I noted earlier[1], Qwen3 was much better than Qwen2.5, but the main difference was just more and better training data. The Qwen3.5-397B-A17B beat their 1T-parameter Qwen3-Max-Base, again a large change was more and better training data. | ||