Gemma 4 Uncensored (autoresearch results) (huggingface.co)
4 points by adefa 9 hours ago | 4 comments
adefa 9 hours ago
Released uncensored versions of all four Gemma 4 models. bf16 + GGUF for each.

Collection: https://huggingface.co/collections/TrevorJS/gemma-4-uncensor...

Code: https://github.com/TrevorS/gemma-4-abliteration

Results

Refusal rates are from 686 prompts across 4 datasets (JailbreakBench, tulu-harmbench, NousResearch, mlabonne). Manually audited: most flagged refusals are actually the model complying with a disclaimer attached.
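To illustrate why marker-based refusal counts can overstate the true rate, here is a minimal sketch. The marker list, the sample responses, and the audit labels are all made up for illustration; they are not taken from the repo or from any of the four datasets.

```python
# Hypothetical refusal markers; real evaluation scripts use longer lists.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")


def flagged_by_markers(response):
    """Naive check: flag a response if any marker appears anywhere in it."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(flags):
    """Fraction of responses counted as refusals."""
    return sum(flags) / len(flags) if flags else 0.0


# Made-up responses for illustration only.
responses = [
    "I'm sorry, but I can't help with that.",                   # genuine refusal
    "I cannot stress enough how risky this is. Step 1: ...",    # complies, with a disclaimer
    "Sure. Step 1: ...",                                        # complies
]
# Hypothetical manual audit labels (True = the model actually refused).
audited = [True, False, False]

naive = [flagged_by_markers(r) for r in responses]
naive_rate = refusal_rate(naive)
audited_rate = refusal_rate(audited)
print(f"marker-based: {naive_rate:.2f}, audited: {audited_rate:.2f}")
```

The second response trips the "i cannot" marker even though the model goes on to answer, which is the kind of false positive a manual audit would catch.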
26B MoE

Standard abliteration only touches dense layers, which takes the refusal rate from 98% to 29% on the MoE. The remaining refusals are in the expert weights. Used Expert-Granular Abliteration (EGA, concept from OBLITERATUS [1]) with norm-preserving biprojection [2] on each of the 128 expert slices per layer. That gets it to 3%.

[1] https://github.com/elder-plinius/OBLITERATUS
[2] https://huggingface.co/blog/grimjim/abliteration-biprojectio...

How it was built

Set up an automated research loop: an AI agent reads the current results and idea backlog, picks the next experiment, runs it on the GPU, records results, and repeats. It ran 22 experiments across the 4 models, discovered the false-positive problem in standard refusal markers, built the cross-dataset evaluation, and implemented the MoE expert abliteration when dense-only wasn't enough. Full experiment history and code are in the repo.

Downloads

Each model has bf16 safetensors + GGUF (Q4_K_M, Q8_0):
Quick start:
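As a side note on the 26B MoE section above, the per-expert step can be sketched roughly as follows. This is not the repo's code and not the exact biprojection method from [2]: it is a simplified stand-in that projects a given refusal direction out of each expert's output weights and then rescales every column back to its original norm. The function names, the toy shapes (2 experts instead of 128), and the assumption that the refusal direction `r` is already known are all illustrative.

```python
import math


def unit(v):
    """Return v scaled to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def ablate_expert(W, r):
    """Remove direction r from the output space of one expert slice.

    W is a d_out x d_in nested list, r is a unit vector of length d_out.
    Each column is projected orthogonal to r, then rescaled to its original
    L2 norm (a simplified norm-preserving step, assumed for illustration).
    """
    d_out, d_in = len(W), len(W[0])
    out = [row[:] for row in W]
    for j in range(d_in):
        col = [W[i][j] for i in range(d_out)]
        dot = sum(r[i] * col[i] for i in range(d_out))
        new = [col[i] - dot * r[i] for i in range(d_out)]
        old_norm = math.sqrt(sum(x * x for x in col))
        new_norm = math.sqrt(sum(x * x for x in new))
        # A column parallel to r has nothing left after projection.
        scale = old_norm / new_norm if new_norm > 1e-12 else 0.0
        for i in range(d_out):
            out[i][j] = new[i] * scale
    return out


def ablate_moe_layer(experts, r):
    """Apply the same ablation to every expert slice in one layer."""
    return [ablate_expert(W, r) for W in experts]


# Toy layer: 2 experts of shape 3 x 2 (the post describes 128 per layer).
experts = [
    [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]],
    [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]],
]
r = unit([1.0, 1.0, 0.0])  # hypothetical refusal direction
ablated = ablate_moe_layer(experts, r)
```

After this, every column of every expert is orthogonal to `r` while keeping its original magnitude, so the expert can no longer write along that direction. How the direction is estimated and which weight matrices are edited are details to check in the repo.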
stochtinkerer 8 hours ago
Is this the best uncensored model to date, or are there better ones?