News Title: Huawei Team Unveils Pangu-π, a High-Performance Large Language Model Architecture
Keywords: Huawei, Pangu-π, Large Language Model, Performance Improvement
News Content:
Huawei’s team has improved the Transformer architecture and unveiled Pangu-π, a high-performance large language model architecture. The result comes from joint research led by Huawei’s Noah’s Ark Lab, which optimizes the traditional Transformer by enhancing its nonlinearity. The improved Pangu-π mitigates the feature-collapse problem while significantly increasing the expressive power of the model’s outputs.
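The article attributes the gains to “enhancing nonlinearity” but does not describe the exact mechanism. As a rough illustration only, the PyTorch sketch below shows one generic way to add an extra nonlinear branch to a Transformer feed-forward block; the class name NonlinearlyAugmentedFFN and the augmentation branch are hypothetical assumptions for illustration, not the published Pangu-π design.

```python
import torch
import torch.nn as nn

class NonlinearlyAugmentedFFN(nn.Module):
    """Illustrative sketch: a feed-forward block with an added nonlinear
    branch. This is NOT the published Pangu-π design; it only shows the
    general idea of injecting extra nonlinearity to counter feature collapse."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Standard Transformer FFN: up-projection, activation, down-projection.
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.act = nn.GELU()
        # Hypothetical lightweight nonlinear branch added alongside the
        # residual path (an assumption; the article gives no specifics).
        self.aug = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection + standard FFN + extra nonlinear branch.
        return x + self.down(self.act(self.up(x))) + self.aug(x)

# Minimal usage example:
# blk = NonlinearlyAugmentedFFN(d_model=512, d_hidden=2048)
# y = blk(torch.randn(2, 16, 512))  # (batch, sequence, d_model)
```

The intuition behind such designs is that extra nonlinear pathways keep token representations from becoming near-identical across layers, which is what “feature collapse” refers to.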
Trained on the same data, Pangu-π (7B) outperforms same-scale models such as LLaMA 2 on multi-task benchmarks while delivering roughly 10% faster inference. Even at the 1B scale, Pangu-π reaches state-of-the-art (SOTA) results. Notably, the researchers also used this architecture to train “Yunshan”, a large model for finance and law.
This groundbreaking achievement further demonstrates China’s strong presence in the field of artificial intelligence, laying a solid foundation for future research and applications of large language models.
[Source] https://mp.weixin.qq.com/s/Beg3yNa_dKZKX3Fx1AZqOw