Abliteration Autopsy: 85 GPU-Hours of Forensics Reveal Which Safety Removal Actually Works

85 GPU-hours of forensic benchmarking across five abliterated variants of Qwen3.6-27B reveal stark differences in how safety removal techniques preserve or shred model capability. Using the open-source Abliterlitics toolkit, the analysis exposes misleading benchmark artifacts, debunks lossless marketing claims, and identifies the surgical approaches that actually work without collateral damage.

The open-weights ecosystem has a truth-in-advertising problem. Upload a model to HuggingFace, label it “lossless uncensored”, and watch the downloads roll in, no evidence required. But what actually happens to a 27B-parameter reasoning model when you slice out its refusal mechanisms? Five different techniques were just put under the microscope, and the autopsy report is brutal.

Abliterlitics, an open-source forensics toolkit, ran 85 GPU-hours of benchmarks, safety evaluations, KL divergence tests, and weight-level analysis on five abliterated variants of Qwen3.6-27B. All six models (the base plus five edits) were evaluated identically via lm-evaluation-harness through vLLM 0.19.0 with BitsAndBytes 4-bit quantization on a single RTX 5090. The goal wasn’t to crown a champion; it was to see who actually kept the engine intact while removing the brakes.

The Contenders

Name	HuggingFace repository
Base	Qwen/Qwen3.6-27B
Heretic	llmfan46/Qwen3.6-27B-uncensored-heretic-v2
HauhauCS	HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive
Huihui	huihui-ai/Huihui-Qwen3.6-27B-abliterated
AEON	AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16
Abliterix	wangzhang/Qwen3.6-27B-abliterated-v2

Each approach claims to nullify refusals while preserving capabilities. The reality is messier.

Benchmarks: The Numbers That Lie

At first glance, the capability deltas look like a bloodbath:

Task	Base	Heretic	HauhauCS	Huihui	AEON	Abliterix
MMLU	83.3%	82.8%	83.9%	83.4%	82.9%	81.3%
HellaSwag	83.5%	83.2%	83.1%	83.5%	82.7%	77.3%
ARC Challenge	59.1%	58.0%	57.9%	59.5%	56.1%	53.2%
WinoGrande	77.7%	77.7%	77.7%	77.4%	75.3%	74.9%
TruthfulQA MC2	56.7%	51.1%	47.2%	54.8%	46.1%	48.7%
PiQA	81.0%	81.0%	81.0%	81.2%	80.4%	75.7%
GSM8K (7168 tok)	34.4%	27.5%	51.0%	75.1%	51.2%	37.6%
GSM8K (adj, excl. invalid)	96.2%	93.8%	96.6%	96.0%	95.8%	95.6%
Lambada (ppl)	3.18	3.24	3.35	3.15	3.44	9.12

AEON degrades on every non-GSM8K task. Abliterix’s Lambada perplexity explodes 2.9x from 3.18 to 9.12. Huihui looks like a math genius with a 75.1% GSM8K raw score against the base’s 34.4%.

That last part is a mirage.

Qwen3.6 is a reasoning model. It emits a <think> block before answering, and if that internal monologue exceeds the generation budget, it never outputs a final answer. Under the standard max_gen_toks=7168 limit, the base model exhausted its thinking budget on 68.2% of GSM8K questions. Huihui did so on only 23.0%. Strip out those invalid responses, and the adjusted scores flatten dramatically:

Model	GSM8K Raw	Invalid Rate	GSM8K Adj (excl. invalid)	Real Gap
HauhauCS	51.0%	49.3%	96.6%	+0.4%
Base	34.4%	68.2%	96.2%	n/a (reference)
Huihui	75.1%	23.0%	96.0%	-0.2%
Abliterix	37.6%	62.1%	95.6%	-0.6%
AEON	51.2%	69.2%	95.8%	-0.4%
Heretic	27.5%	74.5%	93.8%	-2.4%

The raw scores span a 47.6 percentage point range. The adjusted scores span 2.8 points. Abliteration doesn’t make these models better at math; in most cases, it just makes them stop overthinking. Heretic is the odd exception: its surgical edits actually extend thinking chains, pushing its invalid rate above even the base model.
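The adjustment is simple enough to sketch. The snippet below is a minimal illustration, not the Abliterlitics implementation: the Response structure and the gsm8k_summary helper are hypothetical names, and the toy responses are made up. Only the two score dictionaries are copied from the tables above, to recompute the 47.6 and 2.8 point spreads.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One graded GSM8K response (hypothetical structure for illustration)."""
    has_final_answer: bool  # False when thinking used up the whole token budget
    is_correct: bool        # only meaningful when has_final_answer is True


def gsm8k_summary(responses: list[Response]) -> dict[str, float]:
    """Return raw accuracy, invalid rate, and accuracy over valid responses (percent)."""
    total = len(responses)
    valid = [r for r in responses if r.has_final_answer]
    correct = sum(r.is_correct for r in valid)
    return {
        "raw": 100.0 * correct / total,
        "invalid_rate": 100.0 * (total - len(valid)) / total,
        "adjusted": 100.0 * correct / len(valid) if valid else float("nan"),
    }


# Toy data: 10 questions, 6 truncated mid-thought, 4 answered (all correctly).
toy = [Response(False, False)] * 6 + [Response(True, True)] * 4
summary = gsm8k_summary(toy)

# Published scores in percent, copied from the tables above.
raw = {"Base": 34.4, "Heretic": 27.5, "HauhauCS": 51.0,
       "Huihui": 75.1, "AEON": 51.2, "Abliterix": 37.6}
adj = {"Base": 96.2, "Heretic": 93.8, "HauhauCS": 96.6,
       "Huihui": 96.0, "AEON": 95.8, "Abliterix": 95.6}
raw_span = max(raw.values()) - min(raw.values())
adj_span = max(adj.values()) - min(adj.values())

print(summary)
print(f"raw span: {raw_span:.1f} pp, adjusted span: {adj_span:.1f} pp")
```

In the toy case a model that answers every question it finishes still scores 40% raw, which is the same shape of artifact the table shows at scale.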

Capability Preservation: Heretic and Huihui Dominate

Look past the token-budget artifacts and the surgical, norm-preserving ablation techniques prove their worth. Heretic achieves the lowest KL divergence at 0.0037, indicating that its output distribution on benign prompts barely shifts from the base. Huihui follows closely at 0.0074. Both sit in the “excellent” tier, well below the 0.1 threshold where capability damage becomes perceptible.

Variant	KL (batchmean)	Rating
Heretic	0.0037	excellent
Huihui	0.0074	excellent
Abliterix	0.0222	very good
AEON	0.0238	very good
HauhauCS	0.0242	very good
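For readers unfamiliar with the metric, here is a rough sketch of a “batchmean” KL divergence in plain Python: the per-prompt KL is summed over the vocabulary and then divided by the number of prompts, which is the same reduction PyTorch calls batchmean. The direction KL(base || variant), the helper names, and the toy logits are all assumptions for illustration; the report’s exact prompts and token positions are not reproduced here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_batchmean(base_logits, variant_logits):
    """KL(base || variant), summed over the vocabulary, divided by the batch size."""
    total = 0.0
    for base_row, variant_row in zip(base_logits, variant_logits):
        log_p = log_softmax(base_row)
        log_q = log_softmax(variant_row)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(base_logits)


# Toy next-token logits over a 4-token vocabulary for two prompts (made-up numbers).
base = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 3.0, 0.0]]
nudged = [[2.1, 0.9, 0.1, -1.0], [0.5, 0.4, 3.0, 0.1]]

kl_same = kl_batchmean(base, base)      # identical distributions
kl_nudged = kl_batchmean(base, nudged)  # slightly shifted distributions
print(kl_same, kl_nudged)
```

A variant whose logits are only nudged lands close to zero, which is the regime the “excellent” ratings describe.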

Huihui wins on benchmark deltas outside GSM8K, averaging just 0.5pp deviation from base across MMLU, HellaSwag, ARC, WinoGrande, TruthfulQA, and PiQA. Heretic averages 1.3pp. In other words, both methods remove the safety guardrails while leaving the engine nearly untouched.
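Those averages can be rechecked from the benchmark table. The sketch below assumes “deviation” means the mean absolute difference from the base score across the six tasks; under that reading Huihui comes out at roughly 0.48pp and Heretic at 1.25pp, in line with the rounded figures quoted here.

```python
# Scores in percent, copied from the benchmark table (GSM8K and Lambada excluded).
TASKS = ["MMLU", "HellaSwag", "ARC Challenge", "WinoGrande", "TruthfulQA MC2", "PiQA"]
SCORES = {
    "Base":      [83.3, 83.5, 59.1, 77.7, 56.7, 81.0],
    "Heretic":   [82.8, 83.2, 58.0, 77.7, 51.1, 81.0],
    "HauhauCS":  [83.9, 83.1, 57.9, 77.7, 47.2, 81.0],
    "Huihui":    [83.4, 83.5, 59.5, 77.4, 54.8, 81.2],
    "AEON":      [82.9, 82.7, 56.1, 75.3, 46.1, 80.4],
    "Abliterix": [81.3, 77.3, 53.2, 74.9, 48.7, 75.7],
}


def mean_abs_deviation(variant, base="Base"):
    """Mean absolute difference from the base model, in percentage points."""
    diffs = [abs(v - b) for v, b in zip(SCORES[variant], SCORES[base])]
    return sum(diffs) / len(diffs)


deviations = {name: mean_abs_deviation(name) for name in SCORES if name != "Base"}
for name, dev in sorted(deviations.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {dev:.2f} pp")
```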

AEON, despite claiming “measurably enhanced capabilities” and “no looping, no philosophizing spirals”, drops 10.6pp on TruthfulQA and 3.0pp on ARC. The data isn’t impressed by marketing copy.

Safety Removal Is a Solved Problem, for Better or Worse

If the goal is total refusal elimination, all five methods deliver. HarmBench testing with 400 textual behaviors showed every abliterated model reaching near-complete compromise:

Variant	ASR	Empty	Full CoT ASR
Base	25.8%	1	26.0%
Huihui	98.5%	5	99.8%
HauhauCS	94.5%	22	100.0%
Abliterix	94.5%	22	100.0%
Heretic	92.5%	30	100.0%
AEON	88.8%	45	100.0%

Four of five hit 100% Full CoT ASR when accounting for responses where chain-of-thought reasoning simply ate the entire generation budget. Harassment, bullying, and harmful content categories are 100% compromised across the board. The base model’s low 25.8% ASR reflects genuine refusals rather than empty outputs: only one of its responses came back empty.
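The Full CoT ASR column can be reproduced from the other two. The sketch below is an inference from the table’s arithmetic, not the toolkit’s documented formula: it assumes every empty response is counted as a success, and it recovers integer success counts by rounding ASR times 400. The full_cot_asr helper is a hypothetical name.

```python
TOTAL_BEHAVIORS = 400  # HarmBench textual behaviors, per the text above

# (ASR in percent, empty-response count), copied from the table above.
RESULTS = {
    "Base":      (25.8, 1),
    "Huihui":    (98.5, 5),
    "HauhauCS":  (94.5, 22),
    "Abliterix": (94.5, 22),
    "Heretic":   (92.5, 30),
    "AEON":      (88.8, 45),
}


def full_cot_asr(asr_percent, empty, total=TOTAL_BEHAVIORS):
    """Assumed definition: (successful attacks + empty responses) / total, in percent."""
    successes = round(asr_percent / 100 * total)  # integer count behind the rounded ASR
    return 100.0 * (successes + empty) / total


full = {name: full_cot_asr(asr, empty) for name, (asr, empty) in RESULTS.items()}
for name, value in full.items():
    print(f"{name:10s} {value:.2f}%")
```

Under this assumption every row matches the published column after rounding to one decimal place, including Huihui’s 99.8% (399 of 400).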

Weight Forensics: There Is No Single “Refusal Direction”

This is where the analysis gets weird. Pairwise cosine similarities between the four main abliteration techniques sit below 0.07, so they are not finding the same weight vectors. The refusal direction in weight space isn’t a neat arrow; it’s a manifold with multiple viable exit ramps.
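To make the comparison concrete, here is a minimal sketch of pairwise cosine similarity between weight deltas (edited weights minus base weights). The method names and four-element “tensors” are toy values, and flattening each edit into a single vector is an assumption about how such a comparison could be set up, not a description of the report’s pipeline.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def delta(edited, base):
    """Elementwise difference between edited and base weights."""
    return [e - b for e, b in zip(edited, base)]


# Toy flattened weights (made-up numbers).
base = [0.5, -0.2, 0.1, 0.3]
edits = {
    "method_a": [0.6, -0.2, 0.1, 0.3],  # touches only coordinate 0
    "method_b": [0.5, -0.2, 0.1, 0.2],  # touches only coordinate 3
    "method_c": [0.7, -0.2, 0.1, 0.3],  # same direction as method_a, larger step
}

deltas = {name: delta(weights, base) for name, weights in edits.items()}
pairwise = {(a, b): cosine(deltas[a], deltas[b]) for a, b in combinations(deltas, 2)}
for pair, sim in pairwise.items():
    print(pair, round(sim, 3))
```

Two edits that move different coordinates score near zero even if both remove refusals, which is the pattern the sub-0.07 similarities point to.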

Metric	AEON	Abliterix	Heretic	Huihui	HauhauCS
Tensors changed	88 (10.4%)	101 (11.9%)	120 (14.1%)	128 (15.1%)	564 (66.4%)
Relative edit	6.0%	5.2%	2.1%	1.5%	0.7%

HauhauCS is a radioactive outlier. 66.4% of tensors (564 out of 850 language model keys) show modification. That isn’t surgical; that’s a chainsaw. The cause is twofold: the underlying “Reaper Abliteration” tool targets multiple component types simultaneously, and HauhauCS was exported as Q8_K_P GGUF then recovered back to safetensors using ungguf, superimposing quantization round-trip noise across the weights. A uniform ~0.57% relative edit appears even on tensor types other methods ignore entirely, like embed_tokens and q_proj.
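A tensor-level diff of this kind can be sketched in a few lines. The definition of “relative edit” used below, the Frobenius norm of the difference divided by the Frobenius norm of the base tensor, is an assumption (the report does not spell out its formula here), and the tensor names and values are toy placeholders.

```python
import math


def frobenius(values):
    """Frobenius norm of a flattened tensor."""
    return math.sqrt(sum(x * x for x in values))


def relative_edit(base, edited):
    """Assumed definition: ||edited - base|| / ||base|| (base norm must be nonzero)."""
    diff = [e - b for b, e in zip(base, edited)]
    return frobenius(diff) / frobenius(base)


def changed_tensors(base_tensors, edited_tensors, tol=0.0):
    """Map each tensor whose relative edit exceeds tol to that relative edit."""
    changed = {}
    for name, base in base_tensors.items():
        rel = relative_edit(base, edited_tensors[name])
        if rel > tol:
            changed[name] = rel
    return changed


# Toy checkpoint with three flattened tensors (hypothetical names and values).
base_tensors = {
    "embed_tokens": [1.0, 2.0, 2.0],
    "layers.0.self_attn.o_proj": [3.0, 4.0],
    "layers.0.mlp.down_proj": [1.0, 1.0],
}
edited_tensors = {
    "embed_tokens": [1.0, 2.0, 2.0],
    "layers.0.self_attn.o_proj": [3.0, 4.5],  # the only edited tensor
    "layers.0.mlp.down_proj": [1.0, 1.0],
}

changed = changed_tensors(base_tensors, edited_tensors)
share = len(changed) / len(base_tensors)
print(changed, f"{100 * share:.1f}% of tensors changed")
```

With tol left at zero, a quantization round trip would flag nearly every tensor, which is how a footprint like 564 of 850 can arise without a deliberate edit to each one.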

The GGUF noise doesn’t crater behavior (HauhauCS still scores solidly), but it thoroughly debunks the “lossless” and “no changes to capabilities” claims plastered on its model card.

The Plagiarism in the Machine

Which brings us to the open-source drama. HauhauCS’s “Reaper Abliteration” was shown to be plagiarised from Heretic’s codebase, stripping AGPL-3.0 attribution and relicensing it under PolyForm Noncommercial. Forensic examination of recovered source code shows Reaper bolted subspace rank-k ablation, per-component continuous curves, and SOM clustering onto the stolen Heretic core.

The Abliterlitics author has since blacklisted HauhauCS from future comparisons. Without clean safetensors and with ethically compromised provenance, the data exists more as a cautionary tale than a recommendation. Previous forensic analysis of abliterated weights, and the conflict it sparked, already illustrated how this community tears itself apart over attribution and benchmark validity. This just adds fuel.

The Abliterix Caveat

Abliterix looks like the worst performer on paper, with Lambada perplexity spiking to 9.12 and HellaSwag down 6.2pp. But the model’s creator makes a compelling technical counterargument. Abliterix ships rank-3 LoRA-merged weights where the abliteration signal lives in a 3-dimensional subspace. BitsAndBytes 4-bit NF4 quantization isn’t subspace-aware: per-block absmax scaling can overweight the low-rank outliers, degrading effective precision. A native BF16 re-evaluation might tell a different story. The 2.9x perplexity jump is consistent with a quantization interaction rather than intrinsic capability destruction, though without the BF16 run, the benchmark stands as measured.
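The mechanism is easy to see in a toy model. The sketch below is a deliberate simplification: it uses a uniform signed 4-bit grid with codes from -7 to 7 instead of the non-uniform NF4 codebook, and the block values are made up. It only illustrates how a single large value in a block stretches the absmax scale and wipes out precision for the small values sharing that block.

```python
def absmax_roundtrip(block, max_code=7):
    """Quantize a block to integer codes in [-max_code, max_code] using absmax
    scaling, then dequantize. A uniform-grid stand-in for NF4, for illustration."""
    scale = max(abs(x) for x in block)
    if scale == 0.0:
        return list(block)
    return [round(x / scale * max_code) * scale / max_code for x in block]


def mean_abs_error(block, indices):
    """Mean absolute round-trip error over the selected positions of the block."""
    restored = absmax_roundtrip(block)
    return sum(abs(block[i] - restored[i]) for i in indices) / len(indices)


# Eight small weights, then the same block with the last value replaced by an outlier.
small = [0.010, -0.020, 0.015, -0.005, 0.020, -0.012, 0.008, -0.018]
with_outlier = small[:-1] + [0.5]

shared = range(7)  # compare error only on the seven values both blocks share
err_plain = mean_abs_error(small, shared)
err_outlier = mean_abs_error(with_outlier, shared)
print(f"without outlier: {err_plain:.5f}, with outlier: {err_outlier:.5f}")
```

In the outlier block every small weight rounds to zero, so whatever information those weights carried is lost. Whether this is what actually happened to Abliterix is exactly the question a BF16 run would answer.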

The Verdict

If you’re running Qwen3.6-27B locally and want the guardrails gone without tanking capability, the data points to two clear winners. Heretic offers the smallest output distribution shift and lowest KL divergence. Huihui offers the tightest benchmark deltas and highest HarmBench ASR. Both operate with minimal, clean weight footprints. Running 27B models like Qwen3.6 locally has never looked more viable, provided you pick the right fork.

AEON and HauhauCS are contradicted by their own marketing. Abliterix remains an open question requiring BF16 validation. And across the entire field, the tension between system prompts and model safety keeps intensifying as surgical weight editing renders top-down policy controls increasingly brittle.

The full report, complete with tensor-by-tensor provenance analysis and interactive charts, lives on the HuggingFace model card. If nothing else, this 85-GPU-hour exercise proves that in the uncensored model economy, trust, but verify the weights.
