driese 4 hours ago
Ever since I read about this, I have been thinking about the next logical step: train a NN to route the internal loops dynamically after each layer. Instead of fixing a given set of layers that get repeated, let a new classifier decide whether to loop, where to loop, whether to loop multiple times, whether to loop over a large block, or whether to jump straight to the final layers. Each token could loop more or less depending on how hard it is. It has some similarities to a MoE architecture, but instead of choosing experts, it chooses layer routes. If it works, training this routing classifier jointly with the LLM could drastically reduce the number of layers needed for a given level of capability. If anyone wants to work on this, feel free to send me a message.
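To make the idea concrete, here is a minimal toy sketch of the routing loop (not a real implementation). Everything is hypothetical: the "layers" are random linear maps standing in for transformer layers, `W_route` is a tiny stand-in for the learned router, and `EXIT` is the action that jumps straight to the output. A real version would train the router jointly with the model (e.g. with a straight-through or RL-style estimator, since argmax routing is not differentiable).

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8           # hidden size
N_LAYERS = 4    # number of reusable "layers" (toy random linear maps)
MAX_STEPS = 12  # hard cap on layer applications per token

# Toy stand-ins for transformer layers.
layers = [rng.normal(scale=0.1, size=(D, D)) for _ in range(N_LAYERS)]

# Hypothetical router: a linear classifier over the hidden state that,
# after each step, picks the next layer to run -- or EXIT to stop looping.
W_route = rng.normal(scale=0.1, size=(D, N_LAYERS + 1))
EXIT = N_LAYERS  # last logit = "jump to the final layers straight away"

def route(h):
    """Choose the next action (a layer index, or EXIT) for one token."""
    return int(np.argmax(h @ W_route))

def forward(h):
    """Run one token through dynamically routed layers until EXIT or cap."""
    path = []
    for _ in range(MAX_STEPS):
        action = route(h)
        if action == EXIT:
            break
        h = np.tanh(h @ layers[action])  # apply the chosen layer
        path.append(action)
    return h, path

h_out, path = forward(rng.normal(size=(D,)))
print("route taken:", path)
```

The point of the sketch is only the control flow: each token gets its own path through (possibly repeated) layers, so "easy" tokens can exit early while "hard" tokens loop, much like per-token expert choice in MoE but over depth instead of width.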