1 article found
Benchmarks reveal Vulkan achieving up to 2.2× speedup over CUDA for select quantized models on RTX 3080, challenging assumptions about optimal local inference backends.