arendtio 8 hours ago

> By scaling up model parameters and leveraging substantial computational resources

So, how large is that new model?

marcd35 6 hours ago | parent | next [-]

While Qwen2.5 was pre-trained on 18 trillion tokens, Qwen3 uses nearly twice that amount, with approximately 36 trillion tokens covering 119 languages and dialects.

https://qwen.ai/blog?id=qwen3

arendtio 4 hours ago | parent [-]

Thanks for the info, but I don't think it answers the question. I mean, you could train a 20-node network on 36 trillion tokens. It wouldn't make much sense, but you could. So I was asking more about the number of nodes/parameters, or the file size in GB.

In addition, there seem to be many different versions of Qwen3. E.g. here the list from ollama library: https://ollama.com/library/qwen3/tags
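To make the question concrete: file size follows pretty directly from parameter count and numeric precision. A minimal back-of-the-envelope sketch in Python (the 235e9 parameter count is just a placeholder taken from the largest released Qwen3 checkpoint, Qwen3-235B-A22B, since the Max weights are unreleased):

    # Rough size estimate: params * bits_per_weight / 8 bytes,
    # ignoring embedding tables and file-format overhead.
    def approx_size_gb(params: float, bits_per_weight: int) -> float:
        return params * bits_per_weight / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{approx_size_gb(235e9, bits):.0f} GB")
    # 16-bit: ~470 GB, 8-bit: ~235 GB, 4-bit: ~118 GB

So even knowing just the parameter count pins down the on-disk size to within the choice of quantization.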

gunalx 2 hours ago | parent [-]

This is the Max series of models with unreleased weights, so it is probably larger than the largest released one. Also, when referring to models, use Hugging Face or ModelScope (wherever the model is published); ollama is a really poor source for model info. They have a history of bad naming (like confusing people about the DeepSeek R1 models), renaming, and other issues with model names, and they default to q4 quants, which are a good sweet spot but noticeably degrade quality compared to the raw weights.
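To illustrate why q4 is lossy, here is a minimal sketch of naive symmetric round-to-nearest 4-bit quantization on fake weights. This is not ollama's or llama.cpp's actual scheme (their q4 variants quantize in small blocks with per-block scales, which holds up better than this single-scale version):

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(0, 0.02, size=4096).astype(np.float32)  # fake weight tensor

    # One scale for the whole tensor; int4 holds -8..7, use the symmetric +-7 range.
    scale = np.abs(w).max() / 7
    q = np.clip(np.round(w / scale), -8, 7)  # quantize to 4-bit integers
    w_hat = q * scale                        # dequantize back to float

    err = np.abs(w - w_hat).mean()
    print(f"mean abs error: {err:.2e} (scale {scale:.2e})")

The rounding error is small per weight but systematic, and it accumulates across billions of weights, which is why a q4 quant benchmarks measurably below the full-precision release.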
