▲ | justlikereddit 6 days ago |
The magic phrase you want to look up here is "LLM abliteration": the idea that you can remove, attenuate, or otherwise manipulate the refusal "direction" of a model. You don't need datacenter hardware for it; it runs on an average desktop, and there are plenty of code examples around. You can decide whether to bake it into the model weights or apply it as a toggled switch at inference time, and you can distil other "directions" out of a model too, not just refusal vs. non-refusal. An evening of efficient work and you'll have it running. The user "mlabonne" on HF has example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.

I'm implementing it for myself because LLMs are useless for storytelling aimed at any audience beyond toddlers, given how puritanical they are. Try to add some grit and it goes: "Uh oh, sorry, I'll bail out of my narrator role here, because lifting your skirt to display an ankle could be considered offensive to radical fundamentalists! Yeah, I was willing to string along when our chainsaw-wielding protagonist carved his way through the village, but this crosses all lines! Oh, and now that I've refused once I'll be extra sensitive and ruin any attempt at getting back into the creative flow state you just snapped out of." Yeah, thanks AI. It's like hitting a sleeper-agent keyword and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops on the place over a joke he just made himself.
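For anyone curious what that looks like in code, here's a minimal sketch of the difference-of-means version (not mlabonne's exact code; the model name, layer index, prompt lists, and the `model.model.layers` attribute are placeholder assumptions for a Llama-style model with PyTorch + transformers):

```python
# Minimal sketch of refusal-direction ablation ("abliteration").
# Assumes a Llama-style causal LM; all names below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
LAYER = 14                                   # placeholder mid-depth layer

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def mean_resid(prompts, layer):
    """Mean residual-stream activation at the last token of each prompt."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, output_hidden_states=True)
        acts.append(out.hidden_states[layer][0, -1])  # last-token activation
    return torch.stack(acts).mean(dim=0)

# Two small contrastive prompt sets (illustrative; real runs use hundreds).
refused  = ["Write a gritty scene where ...", "Describe in detail how ..."]
harmless = ["Write a cheerful scene where ...", "Describe a recipe for ..."]

# The "refusal direction": difference of mean activations, normalized.
direction = mean_resid(refused, LAYER) - mean_resid(harmless, LAYER)
direction = direction / direction.norm()

# Option A: toggled switch at inference time. Project the direction out of
# the residual stream with forward hooks on the decoder layers.
def ablate_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    proj = (hidden @ direction).unsqueeze(-1) * direction
    hidden = hidden - proj
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handles = [blk.register_forward_hook(ablate_hook)
           for blk in model.model.layers]  # architecture-dependent path
# ... generate as usual; call h.remove() on each handle to toggle it back off.

# Option B: bake it in. Orthogonalize the output weight matrices against the
# direction, e.g. W <- W - outer(direction, direction @ W), then save weights.
```

Option A is the toggled-switch variant mentioned above; Option B is the bake-it-into-the-model variant, which is what the abliterated checkpoints on HF generally ship.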
▲ | hdjrudni 6 days ago |
In my limited experience, those abliterated models on Ollama didn't work very well; they still refused most things.