Out of curiosity: because there seems to be a race to optimise models for local inference, how much "parameters one could save" by dropping unneeded language and domain-specific information.

Like, can you have a model that is English-only, but does more with the same amount of parameters if Chinese and European languages are dropped from the training?

▲

canyon289 6 days ago | parent | next [-]

This is a key question we faced when building this model. It comes down to basically to "how good" to you need to be at "how many things". We had to make some choices with this model and do our best to maximize performance in those areas.

To answer this more precisely its a matter of choosing different data and training regimes and checking performance with evals.

And to make this fully concrete you're welcome to give it a try! Train this model on a taskset of your choice and measure the performance tradeoffs. You'll get a good sense of how LLM capabilities shift

▲

tucnak 6 days ago | parent | prev [-]

Unfortunately, it doesn't quite work like that. Google this: transfer learning.

▲

bigmadshoe 6 days ago | parent [-]

I’m work in ML and I don’t understand your point. Transfer learning usually refers to leveraging data for a different task to help with a task for which you have limited data.

You’re saying that the knowledge gained from the other languages transfers to English? I don’t think for a 270M parameter model the bottleneck is the availability of enough English language training data.

▲

tucnak 6 days ago | parent [-]

> You’re saying that the knowledge gained from the other languages transfers to English?

Yes, there has been many results circa 2020 or so, that have shown this to be the case. More recently, we have observed something similar with verifiable domains (see RLVR and related results) when it comes to coding tasks, specifically.

	▲	bigmadshoe 2 days ago \| parent [-]
		Right, but my point is that a 270M parameter model will not be bottlenecked by the availability of data for the entire English language.