Gemma 4’s MTP Drafters: Not Just a Speed Hack, But an Architectural Power Shift
Google’s Multi-Token Prediction (MTP) drafters for Gemma 4 promise 2-3x faster inference with no loss in output quality. We dive into the mechanics, the ‘tiny’ 78M-parameter secret, and what they mean for the future of local AI.