naasking a day ago
This layer duplication strikes me as a bit of a "poor man's" version of looped language models. Pretty cool though. LLM brain surgery.
dnhkng a day ago | parent
Agreed, but one thing to note: from the experiments, I really think that 'organs' (not sure what to term this) develop during massive pretraining. This also means that looping the entire model may actually not be efficient. Maybe a better layout is [linear input section -> loop 1 -> linear section -> loop 2 -> linear section -> ... -> loop n -> linear output]? This would give the 'organs' space to develop.
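
A minimal sketch of what that layout could look like, in PyTorch-style code. All names, layer counts, and loop iteration counts here are illustrative assumptions, not anything from the thread; the point is just the alternation of ordinary (non-looped) sections with weight-tied looped sections:

    import torch.nn as nn

    class Block(nn.Module):
        """Stand-in for a standard transformer layer."""
        def __init__(self, d_model):
            super().__init__()
            self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                    nn.Linear(4 * d_model, d_model))
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x):
            return x + self.ff(self.norm(x))

    class LoopedBlock(nn.Module):
        """One block whose weights are reapplied for several iterations (the 'loop')."""
        def __init__(self, d_model, n_iters):
            super().__init__()
            self.block = Block(d_model)
            self.n_iters = n_iters

        def forward(self, x):
            for _ in range(self.n_iters):  # same parameters applied repeatedly
                x = self.block(x)
            return x

    class InterleavedLoopedModel(nn.Module):
        """Linear (non-looped) sections interleaved with looped sections,
        leaving ordinary layers in between where specialised 'organs' could develop."""
        def __init__(self, d_model=512, n_sections=3, layers_per_section=2, loop_iters=4):
            super().__init__()
            stages = []
            for _ in range(n_sections):
                stages += [Block(d_model) for _ in range(layers_per_section)]  # linear section
                stages.append(LoopedBlock(d_model, loop_iters))                # looped section
            stages += [Block(d_model) for _ in range(layers_per_section)]      # linear output section
            self.stages = nn.Sequential(*stages)

        def forward(self, x):
            return self.stages(x)

Compared with looping the whole model, this keeps dedicated, untied layers at the input, output, and between loops, which is where the specialised structure would have room to form.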