Remix.run Logo
p-e-w a day ago

Please let me know if you encounter any problems with the 120b! I'm really interested in how well it will work. When presented with the Pareto front at the end, I recommend choosing a configuration with a KL divergence below 1, even if the refusal rate seems high. The gpt-oss models are trained to do an internal monologue about refusing in the CoT, so the actual refusal rate is often substantially lower because Heretic's refusal classifier gets confused by the trigger words.