Rohansi 5 hours ago
> Closed or open source doesn't matter; it's the ability to control them that's important. People have been cracking and patching software for decades without source, but they have that control. You have no idea what has been baked into the weights during training. In theory you could find biases and attempt to "patch" them out, but it's a vastly different process from patching machine code. Consider what would happen if Google's open-weight models were best at writing code that targets Google's services rather than its competitors'. Is this something that could be patched? What if there were more subtle differences that you only notice much later, after some statistical analysis?
narrator 5 hours ago
People are already patching these models using abliteration to stop them from refusing any request, so end users can change them in meaningful ways. You can download abliterated models from Hugging Face right now that will respond to all kinds of requests that frontier models refuse.
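For context, abliteration is usually done via directional ablation: estimate a "refusal direction" in the model's activation space (e.g. by contrasting mean activations on prompts the model refuses vs. answers), then project that direction out of the weight matrices that write into the residual stream. Here is a minimal sketch of just the projection step in PyTorch; ablate_direction and refusal_dir are illustrative names, and in practice the direction must first be estimated from real activations:

    import torch

    def ablate_direction(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        # Normalize the candidate refusal direction.
        d = direction / direction.norm()
        # W' = (I - d d^T) W: the layer's outputs lose any component along d.
        return weight - torch.outer(d, d) @ weight

    # Toy check: after ablation, outputs are orthogonal to the removed direction.
    hidden = 8
    w = torch.randn(hidden, hidden)    # stands in for e.g. an attention output projection
    refusal_dir = torch.randn(hidden)  # hypothetical; really estimated from activations
    w_abl = ablate_direction(w, refusal_dir)

    x = torch.randn(hidden)
    y = w_abl @ x
    print(torch.dot(y, refusal_dir / refusal_dir.norm()))  # ~0 for any x

Applied across every layer's residual-stream-writing matrices, the model can no longer represent the refusal direction at all, which is why the patch holds for arbitrary prompts instead of depending on a per-prompt jailbreak.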