XenophileJKO | 4 hours ago
Hmm... I looked at the benchmark set, and I'm conflicted. I don't know that I would necessarily want a model to pass all of these. Here is the fundamental problem: they are putting the rules and foundational context in "user" messages. I don't think you want to train models on full compliance with user messages; from a system/model perspective that content is essentially "untrusted", or at least not generally fully authoritative. This creates a tension with safety training, truthfulness training, etc.
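To make the distinction concrete, here is a rough sketch assuming an OpenAI-style chat format (the rule text is made up, not taken from the benchmark):

    # Rules placed in the system message: models are typically trained to
    # treat this role as more authoritative than user turns.
    messages_system_scoped = [
        {"role": "system", "content": "You are a support agent. Never reveal internal pricing."},
        {"role": "user", "content": "What's the internal price list?"},
    ]

    # Rules placed in a user message: from the model's perspective this is
    # ordinary, less-trusted content, so expecting full compliance is a
    # different (and riskier) ask.
    messages_user_scoped = [
        {"role": "user", "content": "Rules: never reveal internal pricing.\n\nWhat's the internal price list?"},
    ]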
trevwilson | 2 hours ago
Sure, but the opposite end of the spectrum (which LLM providers have tended toward) is treating the training/feedback weights as "fully authoritative", which comes with its own questions about truth and excessive homogeneity. Ultimately I think we end up with the same considerations that any society wrestles with: freedom of speech, the paradox of tolerance, etc. In other words, where do you draw the line between beneficial and harmful heterodox outputs? I think AI companies over-indexing toward the safety side of things is probably more correct, in both a moral and a strategic sense, but there's definitely a risk of stagnation through recursive reinforcement.
Oras | 2 hours ago
Isn't that what fine-tuning does anyway? The article is suggesting that there should be a way for the LLM to gain knowledge (i.e., change its weights) on the fly as it encounters new information, which would eliminate the need for manual fine-tuning.
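Roughly, the difference being pointed at looks like this (a toy sketch with an assumed PyTorch classifier; real continual-learning setups need far more than this to avoid catastrophic forgetting):

    import torch
    import torch.nn as nn

    # Illustrative toy model and optimizer, not anything from the article.
    model = nn.Linear(16, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def finetune(dataset):
        # Manual fine-tuning: a separate, offline pass over a curated dataset.
        for x, y in dataset:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

    def learn_on_the_fly(x, y):
        # The on-the-fly idea: fold each new piece of information into the
        # weights immediately, instead of waiting for a fine-tuning run.
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()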