Why Decade-Old Residual Connections Still Power All of AI (And Why That’s a Problem)
https://towardsdatascience.com/why-this-decade-old-idea-still-powers-all-of-ai-and-why-its-a-problem/ (towardsdatascience.com)
Residual connections, a foundational component of deep learning models since 2015, are becoming an information bottleneck as models grow in size. An earlier proposed improvement, Hyper-Connections (HC), widened this information pathway but introduced mathematical instability and significant hardware overhead. To address these flaws, researchers at DeepSeek-AI developed Manifold-Constrained Hyper-Connections (mHC). The new method constrains the residual mapping matrix to be doubly stochastic (non-negative, with every row and every column summing to 1). Mathematically, this prevents signals from exploding or vanishing and keeps deep networks stable. The method also relies on systems engineering techniques such as kernel fusion to keep the hardware costs manageable.
0 points • by hdt • 1 hour ago
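The core idea in the summary can be illustrated with a small pure-Python sketch. Every layer multiplies the residual streams by a mixing matrix, so what matters across depth is the product of those matrices. The sketch assumes Sinkhorn-Knopp normalization (alternately rescaling the rows and columns of a positive matrix) as the way to obtain a doubly stochastic matrix. The stream count, depth, random logits, and the helper names `sinkhorn_knopp`, `composite`, and `inf_norm` are hypothetical, and this is not DeepSeek-AI's code. Using `exp(logits)` directly, without normalization, stands in for an unconstrained HC-style mixing matrix.

```python
import math
import random

Matrix = list[list[float]]


def sinkhorn_knopp(logits: Matrix, iters: int = 100) -> Matrix:
    """Map a square matrix of real logits to an (approximately) doubly stochastic
    matrix: non-negative entries, every row and column summing to 1.
    Assumption: Sinkhorn-Knopp is used here as one standard way to do this."""
    n = len(logits)
    # Exponentiate so every entry is strictly positive, which Sinkhorn-Knopp requires.
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(iters):
        # Rescale each row so it sums to 1.
        for row in m:
            s = sum(row)
            row[:] = [x / s for x in row]
        # Rescale each column so it sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        for row in m:
            row[:] = [x / c for x, c in zip(row, col_sums)]
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inf_norm(m: Matrix) -> float:
    """Largest absolute row sum: the worst-case gain on a signal's largest entry."""
    return max(sum(abs(x) for x in row) for row in m)


def composite(mixers: list[Matrix]) -> Matrix:
    """Multiply per-layer mixing matrices in order, as the residual streams see them."""
    n = len(mixers[0])
    out = [[float(i == j) for j in range(n)] for i in range(n)]  # start from identity
    for m in mixers:
        out = matmul(m, out)
    return out


random.seed(0)
n_streams, depth = 4, 50  # hypothetical: 4 parallel residual streams, 50 layers
logits = [[[random.uniform(-1.0, 1.0) for _ in range(n_streams)]
           for _ in range(n_streams)] for _ in range(depth)]

# Unconstrained stand-in: use exp(logits) directly as each layer's mixing matrix.
raw = composite([[[math.exp(x) for x in row] for row in layer] for layer in logits])
# Constrained stand-in: project each layer's matrix to doubly stochastic first.
ds = composite([sinkhorn_knopp(layer) for layer in logits])

print(f"unconstrained gain after {depth} layers: {inf_norm(raw):.3e}")
print(f"doubly stochastic gain after {depth} layers: {inf_norm(ds):.6f}")
```

Running the sketch shows two very different outcomes. The unconstrained product grows geometrically with depth, because each factor here has row sums of at least 4/e ≈ 1.47. The doubly stochastic product stays at 1 up to floating-point error. The reason is that a product of doubly stochastic matrices is itself doubly stochastic, so each output stream remains a convex combination of the input streams. Because the columns also sum to 1, the total signal summed across the streams is preserved as well.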