A technical deep-dive into how llama.cpp’s V-less KV cache optimization cuts memory usage by nearly 50%, enabling 90K-token contexts on consumer GPUs.
The Radeon R9700’s 32GB VRAM and ROCm maturity are enabling 128GB local LLM builds that cost less than a single RTX 6000 Blackwell, but the community is discovering some uncomfortable truths about advertised memory.
New Triton kernels and smart packing cut VRAM usage by 90% and speed up training 5x, with no accuracy loss and no $10,000 GPU required.