Shenzhen, China – Huawei has officially launched its Pangu 5.5 AI model at the Huawei Developer Conference 2025 (HDC 2025), marking a significant leap forward in the company’s AI capabilities. The new iteration encompasses five foundational models spanning natural language processing (NLP), multi-modality, prediction, scientific computing, and computer vision (CV), aiming to further empower industry digitalization.
Huawei’s Pangu series has always occupied a distinctive position among Chinese large models. Rather than writing poems, the series emphasizes getting work done: it cultivates deep expertise in specific industries and aims to drive their intelligent upgrading. From Pangu 1.0 to Pangu 5.0, Huawei has focused on using large models to solve practical industrial problems, an approach that has been widely recognized by the market.
According to Wang Yunhe, Director of Huawei Noah’s Ark Lab, the Pangu 5.5 model features three key components in the NLP domain: Pangu Ultra MoE, Pangu Pro MoE, and Pangu Embedding. It also incorporates a high-efficiency reasoning strategy that combines fast and slow thinking, as well as the Pangu DeepDiver research product.
The highlight of the release is Pangu Ultra MoE, a near trillion-parameter deep-thinking model with 718 billion parameters. Built on full-stack hardware-software synergy with Ascend, the model leads domestically and, according to Huawei, rivals world-class performance.
Training such a massive and highly sparse MoE model presents significant challenges, particularly in keeping the training process stable. To address this, the Huawei Pangu team innovated on both the model architecture and the training methods, and successfully completed full-process training of the near trillion-parameter MoE model on the CloudMatrix 384 cluster, Huawei's next-generation AI data center architecture built on Ascend NPUs.
Specifically, the Pangu team introduced the Depth-Scaled Sandwich-Norm (DSSN) stable architecture and the TinyInit small-initialization method, enabling stable long-horizon training on more than 10 trillion tokens on Ascend NPUs. Huawei also proposed an EP-group (expert-parallel group) load-balancing loss, which ensures good load balance across experts while improving overall training efficiency.
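The exact formulations of DSSN, TinyInit, and the EP-group loss have not been published in this report, but the general ideas behind such techniques are well known. The sketch below is a hypothetical numpy illustration, not Huawei's implementation: a residual block with normalization on both sides of the sublayer and a depth-scaled residual branch (in the spirit of sandwich norm), weights drawn with a deliberately tiny standard deviation (in the spirit of small initialization), and a Switch-Transformer-style auxiliary loss computed per expert group rather than globally. All function names and scaling constants here are illustrative assumptions.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """Root-mean-square normalization over the feature dimension."""
    return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

def sandwich_block(x, w, depth_scale):
    """Illustrative 'sandwich norm' residual block: the sublayer is
    wrapped in normalization on BOTH sides, and the residual branch
    is scaled down with depth (hypothetical stand-in for DSSN)."""
    h = rms_norm(rms_norm(x) @ w)   # norm -> sublayer -> norm
    return x + depth_scale * h      # depth-scaled residual connection

def tiny_init(rng, d_model, num_layers):
    """Hypothetical small initialization: shrink the usual 1/sqrt(d)
    scale further by the network depth."""
    std = 1.0 / np.sqrt(d_model * num_layers)
    return rng.normal(0.0, std, size=(d_model, d_model))

def load_balance_loss(router_probs, expert_assignments,
                      num_experts, ep_group_size):
    """Illustrative auxiliary load-balancing loss computed per
    expert-parallel (EP) group instead of over all experts.

    router_probs: (tokens, num_experts) softmax router outputs
    expert_assignments: (tokens,) index of the chosen expert per token
    """
    losses = []
    # Partition experts into contiguous EP groups; balance within each.
    for start in range(0, num_experts, ep_group_size):
        experts = range(start, start + ep_group_size)
        # Fraction of tokens dispatched to each expert in the group.
        frac_tokens = np.array(
            [(expert_assignments == e).mean() for e in experts])
        # Mean router probability mass assigned to each expert.
        frac_probs = router_probs[:, start:start + ep_group_size].mean(axis=0)
        # Switch-style loss: minimized when load is uniform in the group.
        losses.append(ep_group_size * np.dot(frac_tokens, frac_probs))
    return float(np.mean(losses))
```

As a sanity check, perfectly uniform routing yields a lower auxiliary loss than routing every token to a single expert, which is the property such losses exploit to keep sparse experts evenly utilized during training.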
The Pangu Pro MoE model also stands out, sharing first place among Chinese models in SuperCLUE's billion-parameter model ranking.
Huawei’s commitment to developing practical, industry-focused AI solutions is evident in the Pangu 5.5 model. By addressing the challenges of training large-scale models and focusing on real-world applications, Huawei is positioning itself as a key player in driving the intelligent transformation of various industries.
References:
- 机器之心 (Machine Heart), original source of this report.
