Remix.run Logo
Lerc 7 hours ago

When training lots of models with subtly different parameters like this, Is there anything to be learned from the differences in logprobs between them for the same input. Obviously a model with a lower loss has better logprobs but are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?

itissid 7 hours ago | parent [-]

> are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?

It seems like you want to know what median, 5-95 or 1-99 differences might be? I also wonder how the "residual" plot looks like... If there are too many residual data points for a scatter plot then a histogram might be useful to visualize the modes. I suspect that as loss decreases multiple modes should condense or altogether collapse into one.