Remix.run Logo
pornel 6 days ago

You're looking at this from the perspective of what would make sense for the model to produce. Unfortunately, what really dictates the design of the models is what we can train the models with (efficiently, at scale). The output is then roughly just the reverse of the training. We don't even want AI to be an "autocomplete", but we've got tons of text, and a relatively efficient method of training on all prefixes of a sentence at the same time.

There have been experiments with preserving embedding vectors of the tokens exactly without loss caused by round-tripping through text, but the results were "meh", presumably because it wasn't the input format the model was trained on.

It's conceivable that models trained on some vector "neuralese" that is completely separate from text would work better, but it's a catch 22 for training: the internal representations don't exist in a useful sense until the model is trained, so we don't have anything to feed into the models to make them use them. The internal representations also don't stay stable when the model is trained further.

LudwigNagasena 5 days ago | parent [-]

It’s indeed a very tricky problem with no clear solution yet. But if someone finds a way to bootstrap it, it may be a new qualitative jump that may reverse the current trend of innovating ways to cut inference costs rather than improve models.