Remix.run Logo
arendtio 3 hours ago

Thanks for the info, but I don't think it answers the question. I mean, you could train a 20-node network on 36 trillion tokens. Wouldn't make much sense, but you could. So I was asking more about the number of nodes / parameters or GB of file size.

In addition, there seem to be many different versions of Qwen3. E.g. here the list from ollama library: https://ollama.com/library/qwen3/tags

gunalx an hour ago | parent [-]

This is the Max series models with unreleased weights, so probably larger than the largest released one. Also when refering to models, use huggingface or modelscope (wherever it is published) ollama is a really poor source on model info. they have some some bad naming (like confusing people on the deepseek R1 models), renaming, and more on model names, and they default to q4 quants, witch is a good sweet-spot but really degrades performance compared to the raw weigths.