_fizz_buzz_ 7 hours ago
> their main trick for model improvement is distilling the SOTA models

Could you elaborate? How is this done, and what does it mean?
MobiusHorizons 7 hours ago | parent
I am by no means an expert, but I think it is a process that lets you train an LLM from another LLM's outputs, without needing as much compute or nearly as much data as training from scratch. I think DeepSeek's releases drew a lot of attention to it recently, though the underlying idea (knowledge distillation) is older. Don't quote me on any of that though.
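For anyone wanting something more concrete than the hand-wave above: the classic form of this is a distillation loss, where a "student" model is trained to match a "teacher" model's soft output distribution instead of hard labels. The sketch below is purely illustrative (toy sizes, random logits standing in for a real teacher forward pass), not DeepSeek's actual recipe.

```python
# Minimal sketch of knowledge distillation, assuming PyTorch.
# A frozen "teacher" produces soft token probabilities; a smaller "student"
# is trained to match them, which takes far less data/compute than
# pretraining from scratch. All sizes/names here are placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the KL
    # divergence so the student mimics the teacher's full output
    # distribution, not just its top-1 token.
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 token positions over a 32k-token vocabulary.
student_logits = torch.randn(4, 32000, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(4, 32000)  # in practice: a teacher forward pass
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

In practice labs also do cheaper variants, like simply fine-tuning the student on text generated by the stronger model, but the idea is the same: the bigger model's outputs become the training signal.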