Title: OpenNLPLab Team Launches Lightning Attention-2 to Solve Long Sequence Problems in Large Language Models
The OpenNLPLab team has recently released Lightning Attention-2, a new attention mechanism aimed at the cost of processing long sequences in large language models.
According to the OpenNLPLab team, Lightning Attention-2 is a linear attention mechanism that keeps the training and inference cost of long sequences in line with that of a 1K-token sequence. In other words, until GPU memory becomes the bottleneck, increasing the sequence length does not slow down model training. The team says this opens the door to pretraining on sequences of unlimited length.
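To illustrate why a linear attention mechanism can avoid the usual quadratic growth, the sketch below computes causal attention without softmax in two orders. It is a generic illustration, not the Lightning Attention-2 kernel: the function names are hypothetical, and the unnormalized, softmax-free formulation is an assumption made for clarity.

```python
import random


def causal_quadratic(Q, K, V):
    """Score every (t, j) pair with j <= t: work grows as n^2 * d."""
    n, d_v = len(Q), len(V[0])
    out = []
    for t in range(n):
        row = [0.0] * d_v
        for j in range(t + 1):  # causal mask: only positions j <= t
            score = sum(q * k for q, k in zip(Q[t], K[j]))
            for c in range(d_v):
                row[c] += score * V[j][c]
        out.append(row)
    return out


def causal_linear(Q, K, V):
    """Carry a d_k x d_v running state instead: work grows as n * d^2."""
    d_k, d_v = len(K[0]), len(V[0])
    # state holds the running sum of outer(k_j, v_j) over all j seen so far
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        for r in range(d_k):
            for c in range(d_v):
                state[r][c] += k[r] * v[c]
        out.append([sum(q[r] * state[r][c] for r in range(d_k))
                    for c in range(d_v)])
    return out


if __name__ == "__main__":
    random.seed(0)
    n, d = 32, 4  # illustrative sizes only
    Q, K, V = ([[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    a, b = causal_quadratic(Q, K, V), causal_linear(Q, K, V)
    gap = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print("max difference between the two orders:", gap)
```

Both functions return the same outputs up to floating-point rounding, but the second one does a fixed amount of work per token, so its total cost grows linearly with sequence length.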
In addition, Lightning Attention-2 can keep the inference cost of very long texts at or below that of 1K tokens. This would substantially reduce what current large language models spend on inference over very long inputs and make them more efficient.
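A rough operation count shows why per-token inference cost can stay flat as the context grows. This is a back-of-the-envelope sketch, not a measurement of Lightning Attention-2: it assumes a single attention head, a hypothetical head dimension of 128, and counts only the attention core (projections and feed-forward layers are the same in both cases and are left out).

```python
def softmax_decode_macs(context_len, d):
    """Multiply-adds to decode one token against a KV cache of context_len entries.

    Scores against every cached key cost context_len * d, and the weighted
    sum over cached values costs another context_len * d.
    """
    return 2 * context_len * d


def linear_decode_macs(d):
    """Multiply-adds to decode one token with a fixed d x d running state.

    The outer-product state update costs d * d and the query readout costs
    d * d; neither term depends on how many tokens came before.
    """
    return 2 * d * d


if __name__ == "__main__":
    d = 128  # hypothetical head dimension
    for n in (1_024, 16_384, 262_144):
        print(n, softmax_decode_macs(n, d), linear_decode_macs(d))
```

Under these assumptions the softmax count grows in proportion to the context length, while the linear-attention count is the same at every context length.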
The OpenNLPLab team's work offers a new approach to the long-sequence problem in large language models and suggests new directions for future deep learning research. If its claims hold up in practice, it could have a significant impact on the field of artificial intelligence.
Source: https://www.jiqizhixin.com/articles/2024-01-18-5