Step-3.5-Flash: The 196B-Parameter Model That Makes Giants Look Wasteful
StepFun’s sparse mixture-of-experts (MoE) model activates only 11B of its 196B parameters per token, yet it outperforms models 3-5x larger on coding and agentic tasks. It delivers 100-300 tokens per second on consumer hardware and forces a reckoning with the parameter-count arms race.