News Title: Huawei Team Unveils Pangu-π, a High-Performance Large Language Model Architecture
Keywords: Huawei, Pangu-π, Large Language Model, Performance Improvement
News Content:
Huawei’s team has improved the Transformer architecture and unveiled Pangu-π, a high-performance large language model architecture. The result comes from joint research led by Huawei’s Noah’s Ark Lab, which optimizes the traditional Transformer by enhancing its nonlinearity. The improved Pangu-π mitigates the feature-collapse problem while significantly increasing the expressive power of the model’s outputs.
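The article attributes the gains to “enhancing nonlinearity” but does not describe the exact mechanism. As a rough illustration only, the PyTorch sketch below shows one generic way to add an extra nonlinear branch to a Transformer feed-forward block; the class name NonlinearlyAugmentedFFN and the augmentation branch are hypothetical assumptions for illustration, not the published Pangu-π design.

```python
import torch
import torch.nn as nn

class NonlinearlyAugmentedFFN(nn.Module):
    """Illustrative sketch: a feed-forward block with an added nonlinear
    branch. This is NOT the published Pangu-π design; it only shows the
    general idea of injecting extra nonlinearity to counter feature collapse."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Standard Transformer FFN: up-projection, activation, down-projection.
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.act = nn.GELU()
        # Hypothetical lightweight nonlinear branch added alongside the
        # residual path (an assumption; the article gives no specifics).
        self.aug = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection + standard FFN + extra nonlinear branch.
        return x + self.down(self.act(self.up(x))) + self.aug(x)

# Minimal usage example:
# blk = NonlinearlyAugmentedFFN(d_model=512, d_hidden=2048)
# y = blk(torch.randn(2, 16, 512))  # (batch, sequence, d_model)
```

The intuition behind such designs is that extra nonlinear pathways keep token representations from becoming near-identical across layers, which is what “feature collapse” refers to.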
Trained on the same data, Pangu-π (7B) outperforms same-scale models such as LLaMA 2 on multi-task benchmarks while delivering roughly 10% faster inference. Even at the 1B scale, Pangu-π reaches state-of-the-art (SOTA) results. Notably, the researchers also used this architecture to train “Yunshan”, a large model for finance and law.
This groundbreaking achievement further demonstrates China’s strong presence in the field of artificial intelligence, laying a solid foundation for future research and applications of large language models.
[Source] https://mp.weixin.qq.com/s/Beg3yNa_dKZKX3Fx1AZqOw