| ▲ | throw310822 an hour ago | |
I keep thinking of the RYS (Repeat Yourself) experiment of simply looping some of the inner layers of LLMs for better results and wonder if any progress was made on it. https://dnhkng.github.io/posts/rys/ Feels it should be straightforward to integrate in LLMs a network to control the looping. Or just duplicate entire blocks of layers after the initial training. | ||