Tagged with

15 articles found

MiniMax-2.5: The 230B Open Model Running on 101GB That Makes Claude Opus Look Overpriced

MiniMax-2.5 achieves 80.2% on SWE-Bench Verified with 200K context, runs locally at 3-bit precision, and costs $1/hour, forcing a reckoning for proprietary AI pricing.

#minimax #moe #open-source-ai...

ai-models

MiniMax M2.5: The $1/Hour Model That Makes Claude Opus Look Overpriced

MiniMax’s 230B MoE model hits 80.2% on SWE-Bench at 1/20th the cost of competitors. Here’s why the AI pricing model just collapsed.

#ai-models #coding-agents #m2.5...

consumer-gpu

Unsloth’s MoE Coup: The 12x Speedup That Kills the VRAM Arms Race

Unsloth’s custom Triton kernels deliver 12x faster MoE training with 35% less VRAM, enabling Qwen3 and DeepSeek fine-tuning on consumer GPUs. But the real story is what this means for AI democratization and hardware vendor lock-in.

#consumer-gpu #deepseek #Fine-tuning...

AI4Science

A Trillion Parameters and a Single Purpose: How Intern-S1-Pro Reshapes Scientific Reasoning

Intern-S1-Pro’s 1T MoE architecture delivers SOTA scientific reasoning while activating only 22B parameters, challenging closed-source models and redefining specialized AI for chemistry, materials, and life sciences.

#AI4Science #moe #multimodal...

efficiency

Step-3.5-Flash: The 196B Parameter Model That Makes Giants Look Wasteful

Stepfun’s sparse MoE model activates only 11B parameters yet outperforms models 3-5x larger on coding and agentic tasks, delivering 100-300 tok/s on consumer hardware and forcing a reckoning with the parameter-count arms race.

#efficiency #moe #sparse-activation...

moe

MOVA Breaks the Silent Era of Open-Source Video Generation, And It’s Not Asking Permission

OpenMOSS’s MOVA delivers synchronized video-audio generation with 18B active parameters, challenging closed models with fully open weights and day-0 SGLang support.

#moe #multimodal #sglang...

huggingface

Hugging Face’s Transformers v5 Delivers 11x MoE Speedups by Admitting They Were Doing It Wrong All Along

Transformers v5’s 6x-11x performance gains for Mixture-of-Experts models reveal more about v4’s limitations than v5’s innovations. The API simplification and dynamic weight loading rewrite the rules for LLM inference.

#huggingface #moe

glm-47

GLM-4.7-Flash: The Reasoning Model That Can’t Stop Thinking

Z.ai’s new 30B MoE model promises transparent step-by-step reasoning, but its meticulous thought process reveals a deeper tension in local AI deployment: when interpretability becomes a performance bottleneck.

#glm-47 #moe #reasoning-models

agentic workflows

GLM-4.7-Flash: The Local LLM That Actually Does What It Promises (Mostly)

GLM-4.7-Flash is delivering reliable agentic performance on consumer hardware, but the path to getting it running reveals the messy reality of local AI deployment.

#agentic workflows #GLM-4.7 #llama.cpp...

mamba

Nemotron-3-nano 30B Outperforms Llama 3.3 70B: The Local LLM Efficiency Breakdown

A 30-billion-parameter model is beating Llama 3.3 70B on reasoning tasks while using a fraction of the compute. Here’s how NVIDIA’s hybrid architecture changes the local AI game.

#mamba #moe #nemotron...

benchmarks

Xiaomi’s MiMo-V2-Flash: The 309B-Parameter Underdog Giving GPT-5 a Run for Its Money

An in-depth look at how Xiaomi’s modestly sized MoE model delivers elite performance at a fraction of the cost, and why the community isn’t buying it.

#benchmarks #LLM #moe...

local-llms

NVIDIA’s Nemotron-3-Nano: A 30B Hybrid Reasoning Model That Actually Delivers 1M Context (Mostly)

NVIDIA’s new open-weight Nemotron-3-Nano promises 1M token context and best-in-class reasoning performance, but early deployments reveal a more complicated reality. Here’s what the benchmarks don’t tell you.

#local-llms #moe #nemotron...

diffusion

Diffusion Language Models Break the Autoregressive Cage – And LLaDA2.0 is Jangling the Keys

LLaDA2.0’s MoE-powered diffusion architecture challenges everything we know about local AI deployment.

#diffusion #llama.cpp #local-ai...

cerebras

When Less Is Actually More: Cerebras’ REAP Exposes Expert Merging as Flawed MoE Strategy

REAP pruning outperforms expert merging in MoE models, enabling near-lossless compression of 480B giants to fit local hardware.

#cerebras #compression #LLM...

LLM

Qwen Next Just Made Every Other Local LLM Look Obsolete

Alibaba’s hybrid MoE architecture delivers 80B-parameter performance at the cost of activating only 3B parameters, revolutionizing local task automation.

#LLM #local-llm #moe