Tagged with

4 articles found

EU DDR5 Prices Just Crashed 28% in 25 Days, Your LLM Rig Just Got Cheaper

Tracking DDR5 across four EU countries reveals a 28% price drop in under a month. Germany is 20% cheaper than the Netherlands. Here’s what it means for local LLM builders.

#ddr5 #EU hardware market #LLM Inference...

Deep Learning

3000 Tokens Per Second? Xiaomi’s MiMo V2.5 Just Rewrote the Physics of Inference

Xiaomi’s MiMo V2.5 hits 3000 tokens per second with a 1-trillion-parameter model, using radical FP4 quantization and a ‘block-diffusion’ drafter. Here’s the tech that made it happen, and the catch.

#Deep Learning #LLM Inference #Open Source...

ik_llama.cpp

72.9 tok/s on 24GB VRAM: How ik_llama.cpp Won the Qwen 3.6 27B Backend War

A detailed technical comparison of llama.cpp, ik_llama.cpp, BeeLlama, and vLLM for running Qwen 3.6 27B on 24GB VRAM, with decode speeds of up to 72.9 tok/s under specific quantizations.

#ik_llama.cpp #LLM Inference #Local LLM...

amd

AMD’s R9700 Is Quietly Making NVIDIA’s AI Dominance Look Overpriced

The Radeon R9700’s 32GB VRAM and ROCm maturity are enabling 128GB local LLM builds that cost less than a single RTX 6000 Blackwell, but the community is discovering some uncomfortable truths about advertised memory.

#amd #LLM Inference #Multi-GPU...