A technical deep-dive into how llama.cpp’s V-less KV cache optimization cuts memory usage by nearly 50%, enabling 90K-token contexts on consumer GPUs.
The Radeon R9700’s 32GB VRAM and ROCm maturity are enabling 128GB local LLM builds that cost less than a single RTX 6000 Blackwell, but the community is discovering some uncomfortable truths about advertised memory.
New Triton kernels and smart packing cut VRAM usage by 90% and speed up training 5x, with no accuracy loss and no $10,000 GPU required.