Kimi Just Made Residual Connections Obsolete: The 10-Year Assumption That Crumbled Overnight
Moonshot AI’s Attention Residuals architecture replaces the decade-old residual connection with selective, depth-wise attention over earlier layers, delivering a 1.25x gain in compute efficiency and breaking the PreNorm dilution bottleneck that has plagued deep transformers.
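To make the contrast concrete, here is a minimal sketch in plain Python. It is not Moonshot AI's implementation. It assumes only the general idea stated above: a standard residual stream adds every earlier layer output with a fixed weight of 1, while a depth-wise attention mix weights earlier layer outputs selectively. The function names, the `query` vector, and the dot-product scoring with square-root scaling are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_sum(layer_outputs):
    """Standard residual stream: every earlier output is added with weight 1."""
    dim = len(layer_outputs[0])
    return [sum(out[i] for out in layer_outputs) for i in range(dim)]


def depthwise_attention_mix(layer_outputs, query):
    """Hypothetical alternative: a softmax-weighted mix over earlier layer outputs.

    `query` stands in for whatever learned quantity a real model would use;
    scaled dot-product scoring is an assumption, not a detail from the source.
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(q * o for q, o in zip(query, out))
              for out in layer_outputs]
    weights = softmax(scores)
    mixed = [sum(w * out[i] for w, out in zip(weights, layer_outputs))
             for i in range(dim)]
    return mixed, weights


# Toy outputs of three earlier layers, each a 2-dimensional vector.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # hypothetical query that favours the first coordinate

print(residual_sum(outputs))  # [2.0, 2.0]

mixed, weights = depthwise_attention_mix(outputs, query)
print(weights)  # layers aligned with the query get more weight
print(mixed)
```

In the plain sum, each layer's contribution shrinks relative to the total as depth grows, which is the dilution effect the subhead refers to. In the sketch's weighted mix, the weights always sum to 1 and can concentrate on whichever earlier layers score highest against the query.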