4 articles found

Step-3.5-Flash: The 196B Parameter Model That Makes Giants Look Wasteful

StepFun’s sparse MoE model activates only 11B of its 196B parameters yet outperforms models 3-5x larger on coding and agentic tasks. It delivers 100-300 tok/s on consumer hardware and forces a reckoning with the parameter-count arms race.

#efficiency #moe #sparse-activation...

128k-context

Tencent’s 2B-Parameter Youtu-LLM Redefines Efficiency by Outperforming Models 4x Its Size

Tencent’s Youtu-LLM-2B challenges LLM scaling laws with 128K context and superior agentic capabilities despite having only 1.96B parameters.

#128k-context #efficiency #LLM...

attention-mechanisms

Linear Attention’s Revenge: How Kimi Delta Attention Smashes the KV Cache Bottleneck

Moonshot AI’s hybrid architecture delivers 6x decoding speed with 75% less memory, making 1M-token contexts actually practical.

#attention-mechanisms #efficiency #LLM...

efficiency

The AI Scaling Lie: How a 7M-Parameter Model Just Embarrassed Giants Like Gemini and DeepSeek

Samsung’s Tiny Recursive Model, with just 7M parameters, beats massive LLMs on reasoning tasks, challenging the ‘bigger is better’ dogma.

#efficiency #machine-learning #reasoning...