BANANDRE
NO ONE CARES ABOUT CODE

Tagged with

#ik_llama.cpp

3 articles found

Llama.cpp’s MTP Merge Tanks Throughput on Constrained VRAM. Here’s How a Community Fork Pushes 110 tok/s on a 12GB Card.

After llama.cpp’s MTP merge caused a 20% performance regression, ik_llama.cpp brings back 110 tok/s for local Qwen3.6 inference on constrained VRAM.

#ik_llama.cpp #MTP #qwen3.6 ...

72.9 tok/s on 24GB VRAM: How ik_llama.cpp Won the Qwen 3.6 27B Backend War

A detailed technical comparison of llama.cpp, ik_llama.cpp, BeeLlama, and vLLM for running Qwen 3.6 27B on 24GB VRAM, achieving up to 72.9 tok/s decode with specific quantizations.

#ik_llama.cpp #LLM Inference #Local LLM ...

The Fork That Finally Forked Back: llama.cpp Adopts ik_llama’s Secret Quantization Sauce

A controversial PR ports advanced IQ*_K quantization methods from the ik_llama.cpp fork into mainline llama.cpp, promising smaller models and better edge performance, but not without drama over code ownership and MIT license politics.

#ik_llama.cpp #llama.cpp #model-compression ...

2026 BANANDRE