devmor 7 days ago
That’s a really astute observation. It would be interesting if we could find a way to train models to signal when they are “stretching” the vector distance too far from the context window, because the available training data is too sparse or nonexistent. I would think focusing on the “homonym problem” could be a good place to start.
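For illustration, a quick sketch of why homonyms are a natural probe for this (bert-base-uncased and the sentences are just stand-ins I picked): the same surface token gets a different contextual embedding depending on its sense, so you can at least measure how far apart those senses land.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Toy illustration of the "homonym problem": one surface token, two senses,
    # two rather different contextual embeddings. A rarely seen sense would have
    # thinner training support behind whichever region it lands in.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def token_embedding(sentence: str, word: str) -> torch.Tensor:
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
        # Position of the first occurrence of the word's token id in the input.
        pos = (enc.input_ids[0] == tokenizer.convert_tokens_to_ids(word)).nonzero()[0, 0]
        return hidden[pos]

    river = token_embedding("he sat on the bank of the river.", "bank")
    money = token_embedding("she deposited the check at the bank.", "bank")
    print(torch.cosine_similarity(river, money, dim=0).item())  # noticeably below 1.0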
tdtr 7 days ago
I'm pretty sure the canonical choice is to pick anchor vectors, either by a k-NN distance to other vectors, by "hand", or even via something like cross entropy, but then that is already in the loss function. Another method would be to create some kind of adversarial setup where the output is "stretched" intentionally and then criticized by another LLM. AFAIK the problem is scale, as manually going through a bunch of vectors just to ground the latent space isn't exactly economical. People are also quite conservative, especially in the big model runs; stuff like Muon wasn't exactly popularized until the new Qwen or Kimi. Obviously this is all speculation about open models, and folks with more experience can chime in.
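To make the k-NN-to-anchors idea concrete, here's a minimal sketch assuming you already have a matrix of anchor embeddings (random stand-ins below; in practice they'd be pulled from training data or chosen by hand) and just want a crude "how far from anything we've anchored" score for a query vector:

    import numpy as np

    def knn_anchor_distance(query: np.ndarray, anchors: np.ndarray, k: int = 5) -> float:
        """Mean cosine distance from a query embedding to its k nearest anchors."""
        q = query / np.linalg.norm(query)
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        sims = a @ q                      # cosine similarity to every anchor
        top_k = np.sort(sims)[-k:]        # keep the k most similar anchors
        return float(1.0 - top_k.mean())  # higher = further from anything anchored

    # Random stand-ins; real anchors would come from a model's embedding space.
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(1000, 768))
    query = rng.normal(size=768)
    print(knn_anchor_distance(query, anchors))

Cosine distance and k=5 are arbitrary choices here; the point is only that any "stretched too far" score needs some reference set to be measured against, which is exactly where the scale problem bites.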
delusional 6 days ago
There is, to my knowledge, no vector signifying "truth" and therefore no vector to measure the distance from. You cannot get a "truthiness" measure out of these models, because they don't have a concept of truth. They use "likeliness" as a proxy for "truth". You could decide that the text is "too unlikely"; the problem there is that you'll quickly discover that most human sentences are actually pretty unlikely.
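To make that last point concrete, here's a rough sketch of the "likeliness" proxy using GPT-2 through Hugging Face transformers (the model and the sentences are arbitrary stand-ins): it scores a sentence by its mean per-token log-probability, and a perfectly true but mundane sentence can still come out looking fairly unlikely.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def mean_logprob(text: str) -> float:
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # With labels == input_ids the model returns mean cross-entropy
            # over the predicted tokens; negate it to get mean log-probability.
            loss = model(ids, labels=ids).loss
        return -loss.item()

    # Both sentences are true; the second is just rarer text and will
    # typically score as less "likely".
    print(mean_logprob("The capital of France is Paris."))
    print(mean_logprob("My aunt repainted her kayak teal last Tuesday."))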