yorwba | 4 days ago
This HRM is essentially purpose-designed for solving puzzles with a small number of rules interacting in complex ways. Because the number of rules is small, a small model can learn them. Because the model is small, it can be run many times in a loop to resolve all interactions. In contrast, language modeling requires storing a large number of arbitrary phrases and their relations to each other, so I don't think you could ever get away with a similarly small model. Fortunately, a comparatively small number of steps typically seems to be enough to get decent results.

But if you tried to use an LLM-sized model in an HRM-style loop, it would be dog slow, so I don't expect anyone to try it anytime soon. Certainly not within a month. Maybe you could have a hybrid where an LLM has a smaller HRM bolted on to solve the occasional constraint-satisfaction task.
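The shape of that loop (a small, fixed rule set applied over and over until all interactions settle) can be sketched in a few lines of Python. This is a toy illustration, not anything from the actual HRM paper: the "small model" here is just a hand-written relaxation rule for shortest paths, iterated to a fixed point.

```python
# Toy sketch of the "small model in a loop" idea: apply a small, fixed
# rule repeatedly until nothing changes (all interactions resolved).
def iterate_to_fixpoint(step, state, max_steps=100):
    for _ in range(max_steps):
        new = step(state)
        if new == state:  # no rule fired: interactions are fully resolved
            return new
        state = new
    return state

# The "rule": Bellman-Ford-style edge relaxation on a tiny graph.
edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
INF = float("inf")

def relax(dist):
    out = dict(dist)
    for (u, v), w in edges.items():
        out[v] = min(out[v], out[u] + w)
    return out

dist = iterate_to_fixpoint(relax, {"a": 0, "b": INF, "c": INF})
print(dist)  # {'a': 0, 'b': 1, 'c': 3}
```

The rule itself is trivially small; the work happens in the repetition, which is the trade the comment is describing.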
marcosdumay | 3 days ago
> In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other

A person has a vocabulary of some ~10k words, with words fitting specific places in a really small set of rules. All combined, we probably have something on the order of a few million rules in a language. Which, yes, is larger than what the model in this paper can handle, but nowhere near as large as a problem that should require something the size of a modern LLM.

So it's well worth trying to enlarge models with other architectures, trying hybrid models (note that this one is already necessarily hybrid), and exploring every other possibility out there.
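The arithmetic behind that estimate, spelled out. The ~10k vocabulary is the comment's figure; the ~300 rule-slot pairings per word is an assumed number chosen only to show how "a few million" falls out:

```python
# Back-of-envelope: comment's ~10k vocabulary times an assumed ~300
# grammatical-slot pairings per word lands in the low millions.
vocab_size = 10_000
pairings_per_word = 300  # assumption for illustration, not a measurement
approx_rules = vocab_size * pairings_per_word
print(approx_rules)  # 3000000, i.e. "a few million rules"
```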
energy123 | 4 days ago
What about many small HRM models that solve conceptually distinct subtasks, as determined and routed by a master model that then analyzes and aggregates their outputs, with all of it learned during training?
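Structurally that proposal looks like a mixture-of-experts: a router dispatches each subtask to a specialist, and an aggregator combines the results. A minimal sketch, where the router, experts, and aggregator are hand-written stand-ins for what the proposal would learn end to end:

```python
# Hypothetical sketch of the routing idea: a "master" picks which small
# solver handles each subtask, then aggregates the outputs. Every piece
# here is a hard-coded toy; in the proposal they would all be learned.
def route_and_aggregate(subtasks, router, experts, aggregate):
    outputs = [experts[router(t)](t) for t in subtasks]
    return aggregate(outputs)

# Toy stand-in experts, one per subtask type.
experts = {
    "arith": lambda t: t["a"] + t["b"],
    "logic": lambda t: t["p"] and t["q"],
}
router = lambda t: t["kind"]  # the learned master model, in the proposal

result = route_and_aggregate(
    [{"kind": "arith", "a": 2, "b": 3},
     {"kind": "logic", "p": True, "q": False}],
    router, experts, aggregate=lambda outs: outs,
)
print(result)  # [5, False]
```

The open question the comment raises is whether the routing and aggregation can be trained jointly with the small solvers, rather than wired up by hand as above.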