NO ONE CARES ABOUT CODE

Tagged with

#speculative-decoding

1 article found

Llama.cpp’s MTP Merge Tanks Throughput on Constrained VRAM. Here’s How a Community Fork Pushes 110 tok/s on a 12GB Card.

After llama.cpp’s MTP merge caused a 20% performance regression, ik_llama.cpp brings back 110 tok/s for local Qwen3.6 inference on constrained VRAM.

#ik_llama.cpp #MTP #qwen3.6...