REAP pruning outperforms expert merging in MoE models, enabling near-lossless compression of 480B-parameter giants onto local hardware
Cerebras releases REAP-pruned GLM-4.6 variants at 25%, 30%, and 40% sparsity with FP8 quantization, but do they actually work?