▲ | lispitillo 4 days ago
I hope/fear this HRM model is going to be merged with MoE very soon. Given the huge economic pressure to develop powerful LLMs, I think this could be done in just a month. The paper seems to study only problems like Sudoku solving, not question answering or other applications of LLMs, and it omits a section on future applications or on fusing HRM with current LLMs. I think anyone working in this field can envision the applications, but the details of a MoE combined with an HRM model could be their next paper. I only skimmed the paper and I am not an expert; surely others can explain why they don't discuss such a new structure. Anyway, my post is just blissful ignorance of the complexity involved and of the impossible task of predicting change.

Edit: A more general idea is that Mixture of Experts relates to clusters of concepts, and now we would have to consider clusters of concepts related by the time they take to be grasped. In that sense the model would hold, in latent space, an estimate of the depth, number of layers, and time required for each concept, just as we adapt our reading style for a dense math book versus a short newspaper story.
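To make that last idea concrete, here is a purely hypothetical sketch of a MoE-style router that predicts not only which expert a token should go to, but also how many refinement steps it deserves. Nothing like this appears in the HRM paper; the class, both heads, and the argmax routing are my own assumptions.

    import torch
    import torch.nn as nn

    class DepthAwareRouter(nn.Module):
        # Hypothetical: a MoE-style gate that picks an expert AND predicts
        # how many recurrent refinement steps the token should receive
        # (the "time needed to grasp a concept" idea above).
        def __init__(self, dim, n_experts, max_steps=16):
            super().__init__()
            self.expert_head = nn.Linear(dim, n_experts)  # standard MoE gate
            self.steps_head = nn.Linear(dim, max_steps)   # per-token compute budget

        def forward(self, h):                             # h: (batch, dim) token states
            expert = self.expert_head(h).argmax(-1)       # which expert to route to
            steps = self.steps_head(h).argmax(-1) + 1     # 1..max_steps iterations
            return expert, steps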
▲ | yorwba 4 days ago | parent | next
This HRM is essentially purpose-designed for solving puzzles with a small number of rules interacting in complex ways. Because the number of rules is small, a small model can learn them. Because the model is small, it can be run many times in a loop to resolve all interactions.

In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other, so I don't think you could ever get away with a similarly small model. Fortunately, a comparatively small number of steps typically seems to be enough to get decent results.

But if you tried to use an LLM-sized model in an HRM-style loop, it would be dog slow, so I don't expect anyone to try it anytime soon. Certainly not within a month. Maybe you could have a hybrid where an LLM has a smaller HRM bolted on to solve the occasional constraint-satisfaction task.
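For readers who haven't opened the paper: the loop is a slow high-level module and a fast low-level module updating each other. A minimal sketch of that nested loop, with GRU cells as stand-ins for the paper's recurrent transformer blocks; the cell type, sizes, and step counts here are my assumptions, not the authors' implementation:

    import torch
    import torch.nn as nn

    class TwoTimescaleLoop(nn.Module):
        # HRM-style nested loop: a small low-level module takes many fast
        # steps per single slow high-level update. GRU cells are stand-ins
        # here; the actual paper uses recurrent transformer blocks.
        def __init__(self, dim=128, low_steps=8, high_steps=4):
            super().__init__()
            self.low = nn.GRUCell(dim, dim)   # fast, fine-grained solver
            self.high = nn.GRUCell(dim, dim)  # slow, abstract planner
            self.low_steps, self.high_steps = low_steps, high_steps

        def forward(self, x):                 # x: (batch, dim) encoded puzzle
            z_low = torch.zeros_like(x)
            z_high = torch.zeros_like(x)
            for _ in range(self.high_steps):
                for _ in range(self.low_steps):
                    z_low = self.low(x + z_high, z_low)  # refine under current plan
                z_high = self.high(z_low, z_high)        # revise plan from result
            return z_high

The cost point is visible in the sketch: every parameter in self.low is exercised low_steps * high_steps times per forward pass, which is why an LLM-sized module in that position would be impractical.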
▲ | buster 4 days ago | parent | prev
I must say I am suspicious in this regard, as they don't show applications other than a Sudoku solver and don't discuss downsides.