BANANDRE
NO ONE CARES ABOUT CODE


Tagged with

#long-context

2 articles found


500K Context Fine-Tuning on One GPU: The Breakthrough No One’s Talking About Honestly

Unsloth’s new algorithms push LLM context windows to 750K tokens on single GPUs, but the real story isn’t the numbers; it’s what happens when you actually try to use them.

#Fine-tuning #gpu-optimization #LLM...

Linear Attention’s Revenge: How Kimi Delta Attention Smashes the KV Cache Bottleneck

Moonshot AI’s hybrid architecture delivers 6x decoding speed with 75% less memory, making 1M-token contexts actually practical.

#attention-mechanisms #efficiency #LLM...
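To make the "75% less memory" claim concrete, here is a back-of-the-envelope sketch of full-attention KV-cache growth at 1M tokens. All hyperparameters (layer count, KV heads, head dimension) are illustrative assumptions, not Moonshot AI's actual configuration, and the 25% factor simply restates the article's headline figure.

```python
def kv_cache_bytes(seq_len, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_val=2):
    """Full-attention KV cache: K and V tensors per layer, growing linearly with seq_len.
    Hyperparameters are assumed for illustration (fp16 values, GQA-style 8 KV heads)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * seq_len

full = kv_cache_bytes(1_000_000)       # full attention at 1M tokens
hybrid = full * 0.25                   # hybrid architecture at 75% less, per the article
print(f"full attention: {full / 1e9:.0f} GB, hybrid: {hybrid / 1e9:.0f} GB")
```

Under these assumed dimensions, the full cache alone is roughly 197 GB at 1M tokens, which is why a constant-memory linear-attention component changes what "practical" means for long contexts.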
© 2026 BANANDRE