BANANDRE
NO ONE CARES ABOUT CODE



Tagged with

#llm-inference

2 articles found

Featured
Vulkan Is Quietly Outpacing CUDA for Specific LLMs on Consumer GPUs

Benchmarks reveal Vulkan achieving up to 2.2× speedup over CUDA for select quantized models on RTX 3080, challenging assumptions about optimal local inference backends.

#cuda #gpu-acceleration #llama.cpp ...
Tencent’s WeDLM 8B: When Diffusion Models Beat Autoregressive LLMs at Their Own Game

Tencent’s diffusion-based language model achieves 3-6× faster inference than vLLM-optimized Qwen3-8B on math reasoning, challenging the token-by-token generation paradigm that has dominated LLMs since GPT-2.

#diffusion-models #llm-inference #math-reasoning ...
© 2026 BANANDRE
Privacy Policy · Terms · Impressum
Built with 🍌