antirez an hour ago
Thank you for posting this! Just a clarification: with the DwarfStar steering features I was able to completely remove refusals from DS4. Only the example dataset (the prompt pairs I provide) is a toy, not the capability itself. My thinking was that anyone who can come up with the right dataset and understands how to use the well-documented steering feature can get access to steering. As for people who have no idea and would just cut & paste, I'm not sure: maybe it is a good idea if they also have access to a model without refusals? Being in doubt, I didn't publicly release the steering file, but I'm highly perplexed.

Btw, support was recently extended, and the steering vector can now be applied to the activations at different times: always, only after thinking, only outside of tool calling, ...

Something important that not many folks realize: vector direction steering inside the inference engine itself is far superior to having GGUFs modified in the same way. The more you steer, the more you damage the model's capabilities. So by applying it at runtime, you apply only the minimum needed for what you want to accomplish. You can also apply it only at selected moments. It is even possible (I haven't implemented it yet, but I like the idea) to apply the steering only when the energy along the refusal direction is over a given threshold. Many things you can play with.
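To make the runtime idea concrete, here is a minimal sketch in plain Python. It is not DwarfStar's code or API: the function names, the mode strings, and the choice of "energy" as the squared projection onto the direction are all assumptions made for illustration. It shows the two knobs discussed above: steering only at selected moments, and steering only when the component along the direction is large enough.

```python
import math

def should_steer(mode, thinking_done, in_tool_call):
    """Hypothetical gating rule; the mode names are illustrative only."""
    if mode == "always":
        return True
    if mode == "after_thinking":
        return thinking_done
    if mode == "outside_tool_calls":
        return not in_tool_call
    raise ValueError(f"unknown mode: {mode}")

def steer(activation, direction, strength=1.0, energy_threshold=0.0):
    """Remove `strength` times the component of `activation` along
    `direction`, but only if the energy along that direction (assumed
    here to be the squared projection) exceeds `energy_threshold`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the activation onto the unit direction.
    proj = sum(a * u for a, u in zip(activation, unit))
    if proj * proj <= energy_threshold:
        return list(activation)  # below threshold: leave untouched
    return [a - strength * proj * u for a, u in zip(activation, unit)]

# Toy 2-d example: the direction is the first axis.
act = [3.0, 4.0]
refusal_dir = [1.0, 0.0]
if should_steer("after_thinking", thinking_done=True, in_tool_call=False):
    act = steer(act, refusal_dir, strength=0.5, energy_threshold=1.0)
```

With `strength` below 1.0 only part of the component is removed, which is the "apply the minimum needed" trade-off: a weight-modified GGUF bakes in one fixed amount, while a runtime hook can vary it per token.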
zozbot234 an hour ago
AIUI, DeepSeek V4 has very little (if any) of the refusal behavior you usually get from Western AI models for benign input. Is this mainly about the software security assessment case?