minimaxir 4 days ago:
> Note: we are not releasing any post-trained / IT checkpoints.

I get not trying to cannibalize Gemma, but that's weird. A 540M multimodal model that performs well on queries would be useful, and "just post-train it yourself" is not always an option.
jeffjeffbear 4 days ago:
Isn't fine-tuning the point of the T5-style models, since they perform better at smaller parameter counts? A sketch of what that looks like in practice follows below.
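For context, "post-train it yourself" on a T5-style (encoder-decoder) checkpoint amounts to a standard seq2seq fine-tuning loop. A minimal sketch using Hugging Face transformers; the checkpoint name and the toy data are placeholders, not the model discussed in this thread:

```python
# Minimal supervised fine-tuning sketch for a T5-style (encoder-decoder)
# model with Hugging Face transformers. The checkpoint name and the toy
# example pair are placeholders, not the release discussed in this thread.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/t5-v1_1-small"  # placeholder; swap in the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy instruction-style pair; a real post-train needs a proper dataset and batching.
pairs = [
    ("Summarize: The quick brown fox jumps over the lazy dog.",
     "A fox jumps over a dog."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for prompt, target in pairs:
    inputs = tokenizer(prompt, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    # Encoder-decoder models compute the standard seq2seq cross-entropy
    # loss internally when labels are passed in.
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```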
sundarurfriend 3 days ago:
This made me compare the figures: did they accidentally switch those around, or are the post-training Reasoning and Factuality scores actually significantly lower than the pre-training ones?

Edit: Just noticed:
> Also note pre-training and post-training benchmarks are different, so scores are not comparable across plots.
The paper gives more details about the specific benchmarks and the scores obtained on them: https://arxiv.org/html/2512.14856v1#S4