01 DeepSeek’s Latest Transformer Advances

DeepSeek’s latest papers introduce Native Sparse Attention and manifold-constrained hyper-connections, boosting transformer efficiency, scalability, and long-context reach.
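As a loose illustration of the sparsity idea behind methods like Native Sparse Attention, the NumPy sketch below has each query attend to only its top-scoring key blocks instead of the full sequence. The block-scoring scheme, function names, and shapes are assumptions chosen for clarity, not DeepSeek's actual algorithm or kernel design.

```python
# Toy block-sparse attention: each query attends only to its top_k key
# blocks. A generic sketch of the sparsity idea, not DeepSeek's NSA.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block_size=16, top_k=2):
    """Score key blocks by their mean key, keep the top_k blocks per
    query, and run dense attention over only the selected keys."""
    seq_len, dim = k.shape
    n_blocks = seq_len // block_size
    # Coarse block summaries: one mean key vector per block (assumed scheme).
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, dim)
    block_keys = k_blocks.mean(axis=1)                          # (n_blocks, dim)
    # Score blocks per query and select the top_k highest-scoring ones.
    block_scores = q @ block_keys.T / np.sqrt(dim)              # (n_q, n_blocks)
    top_blocks = np.argsort(block_scores, axis=-1)[:, -top_k:]  # (n_q, top_k)

    out = np.zeros((q.shape[0], dim))
    for i, query in enumerate(q):
        # Gather key/value positions belonging to this query's chosen blocks.
        idx = (top_blocks[i][:, None] * block_size
               + np.arange(block_size)).ravel()
        scores = query @ k[idx].T / np.sqrt(dim)
        out[i] = softmax(scores) @ v[idx]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 32))    # 4 queries, head dim 32
k = rng.normal(size=(128, 32))  # 128 keys -> 8 blocks of 16
v = rng.normal(size=(128, 32))
print(block_sparse_attention(q, k, v).shape)  # (4, 32)
```

With top_k blocks of size block_size, each query touches top_k * block_size keys rather than all seq_len, which is where the long-context efficiency gain in block-sparse schemes comes from.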