Llama.cpp’s MTP Merge Tanks Throughput on Constrained VRAM. Here’s How a Community Fork Pushes 110 tok/s on a 12GB Card.
After llama.cpp’s MTP merge caused a 20% performance regression, ik_llama.cpp brings back 110 tok/s for local Qwen3.6 inference on constrained VRAM.