1000 Tokens Per Second on a 1T Model? Xiaomi Just Broke Physics (or At Least the Latency Barrier)
Xiaomi’s MiMo v2.5 hits 1000 TPS on a trillion-parameter model using commodity GPUs. Here’s a deep dive into the FP4 quantization, DFlash speculative decoding, and TileRT systems alchemy that made it possible.