Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space(arxiv.org)
53 points by gmays a day ago | 4 comments
miven 20 hours ago | parent | next [-]

I'm really glad that these HNet-inspired approaches are getting traction; I'm a big fan of that paper.

Though I wonder how much of the gain in this case is actually due to the 75% extra parameters compared to the baseline, even if the inference FLOPs are matched.

Can't help but see this as just a different twist on the sparse parameter-use idea leveraged by MoE models, since those also gain performance at constant forward-pass FLOPs because of the extra parameters.
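
To make that comparison concrete, here's a minimal sketch of top-k MoE routing (my own illustration, not anything from the paper; the layer sizes, expert count, and k are arbitrary assumptions). Total parameters grow with the number of experts, but each token only runs through k of them, so per-token FLOPs stay roughly flat:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                       # x: (n_tokens, d_model)
            gate = self.router(x)                   # (n_tokens, n_experts)
            weights, idx = gate.topk(self.k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.k):              # only k experts fire per token
                for e in range(len(self.experts)):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
            return out

    moe = TopKMoE()
    n_params = sum(p.numel() for p in moe.parameters())
    print(f"{n_params/1e6:.1f}M params, but each token only touches {moe.k} of 8 experts")
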

sorenjan 21 hours ago | parent | prev | next [-]

Would this enable a model to learn concepts in one language and generate answers about them in another, as long as it learns general translations between them?

notrealyme123 21 hours ago | parent [-]

My educated guess: not more than any other LLM. The text-latent encoder and latent-text decoder just find a more efficient representation of the tokens; it's more of a compression than a conversion of words/sentences into abstract concepts. There will be residuals of the input language in there.
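
A toy sketch of that "compression" reading (my own assumption about the rough shape of such an encoder/decoder, not the paper's actual architecture): spans of token embeddings get pooled into fewer latent vectors and then expanded back, so language-specific detail from the input can easily survive in the latents:

    import torch
    import torch.nn as nn

    class LatentBottleneck(nn.Module):
        def __init__(self, d_model=512, chunk=4):
            super().__init__()
            self.chunk = chunk
            self.encode = nn.Linear(chunk * d_model, d_model)   # text -> latent
            self.decode = nn.Linear(d_model, chunk * d_model)   # latent -> text

        def forward(self, tok_emb):     # (n_tokens, d_model), n_tokens divisible by chunk
            n, d = tok_emb.shape
            latents = self.encode(tok_emb.reshape(n // self.chunk, self.chunk * d))
            recon = self.decode(latents).reshape(n, d)
            return latents, recon

    toks = torch.randn(16, 512)                     # 16 token embeddings
    latents, recon = LatentBottleneck()(toks)
    print(latents.shape, recon.shape)               # 4 latents stand in for 16 tokens
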

notrealyme123 21 hours ago | parent | prev [-]

Broken citations. My inner reviewer gets sad. :(