Tagged with

2 articles found

Trillion-Parameter AI on Your Desktop: The Kimi K2 Thinking Revolution Hits Local Hardware

Moonshot AI’s trillion-parameter reasoning model achieves an unprecedented 30+ tokens/sec on consumer hardware through real-time GPU/CPU orchestration.

#kimi-k2 #local-inference #machine-learning...

cerebras

Pruning MoE Models: The Art of Cutting Complexity Without Losing Brains

Cerebras releases REAP-pruned GLM-4.6 variants at 25%, 30%, and 40% sparsity with FP8 quantization – but do they actually work?

#cerebras #fp8 #llm-compression...