| ▲ | noosphr 7 hours ago | |||||||||||||
>We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. | ||||||||||||||
| ▲ | smallerize 5 hours ago | parent | next [-] | |||||||||||||
Does this mean that if your model is "overfitting", the solution is to train for even more epochs? | ||||||||||||||
| ▲ | ForceBru 6 hours ago | parent | prev [-] | |||||||||||||
Right, isn't double descent one of the reasons why modern Extremely Large Language Models work at all? I think I heard somewhere that basically all today's "smart" (reasoning, solving math problems, etc) LLMs are trained in the "double descent" territory (whatever this means, I'm not entirely sure). | ||||||||||||||
| ||||||||||||||