And distillation makes the compute moat irrelevant. You could spend trillions to train a model, but some company is going to extract enough data from your model to distill its own at a much lower upfront cost. That would let them offer it at a lower inference price too, totally defeating the point of spending crazy money on training.


fredoliveira 3 days ago

A couple of counter-arguments:

Labs can just step up the way they track signs of prompts meant for model distillation. Distillation requires a fairly large number of prompt/response tuples, and I am quite certain that all of the main labs have the capability to detect and impede that type of use if they put their backs into it.

Distillation doesn't make the compute moat irrelevant. You can get good results from distillation, but (intuitively; maybe I'm wrong here, because I haven't run evals on this myself) you can't beat the upstream model's performance. That means most (albeit obviously not all) customers will simply gravitate toward the better-performing model if its cost per token works for them.

Are there always going to be smaller labs? Sure, yes. Is the compute moat real, and does it matter? Absolutely.

	serf 2 days ago
		> Labs can just step up the way they track signs of prompts meant for model distillation. Distillation requires a fairly large number of prompt/response tuples, and I am quite certain that all of the main labs have the capability to detect and impede that type of use if they put their backs into it.

		...while degrading their service for paying customers. This is the same problem as forwarding threats to law enforcement agencies or training LLMs to avoid user harm: it's great if it works as intended, but more often than not it mistakenly throws far more prompt cancellations at actual users, refuses queries erroneously, and just ruins the user experience. I'm not convinced any of the groups can prevent distillation without ruining the customer experience.