NVIDIA packs 30B, 23B, and 12B reasoning models into one checkpoint, achieving a 360x reduction in training cost and dynamic speed scaling.