**Headline:** Amazon Builds Largest-Ever Text-to-Speech Model, Demonstrates ‘Emergence’
**Keywords:** Text-to-speech, largest model, emergence
**Body:**
Amazon AI researchers have unveiled what they describe as the largest text-to-speech model ever created, and it shows signs of “emergence.”
The model, called BASE TTS (short for Big Adaptive Streamable TTS with Emergent abilities), has 980 million parameters in its largest version and was trained on 100,000 hours of public-domain speech recordings, mostly in English.
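Details of BASE TTS’s development and training have been published on the preprint server arXiv. The researchers say that the model’s size and training dataset allow it to exhibit “emergent abilities”: capabilities it was never explicitly trained for, which appear only once a model grows large enough. Echoing similar reports for large language models, they found that BASE TTS variants with more than 500 million parameters, trained on more than 10,000 hours of data, began to produce natural prosody on textually complex sentences.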
Compared to other text-to-speech models, BASE TTS offers several advantages:
* Higher fidelity: BASE TTS generates speech that is more natural-sounding and human-like.
* Stronger expressiveness: The model can capture subtle nuances in text and produce speech with appropriate intonation and emotion.
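* Streaming generation: BASE TTS converts text into discrete “speechcodes” and decodes them into audio incrementally, so playback can begin before a whole utterance has been synthesized. This makes it suitable for real-time applications such as voice assistants and automated customer service systems.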
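Real-time speech depends on streaming: audio is emitted in pieces while the model is still generating, rather than after the whole sentence is finished. The Python sketch below illustrates only that general idea, under stated assumptions. `fake_speechcodes`, `decode_chunk`, `synthesize_stream`, `real_time_factor`, the 24 kHz sample rate and the 20 ms of audio per code are all hypothetical placeholders, not Amazon’s model, decoder, API or settings. It also computes a real-time factor (compute time divided by audio duration), a common way to quantify “real time”: a value below 1.0 means synthesis outpaces playback.

```python
from typing import Iterator, List

SAMPLE_RATE = 24_000    # hypothetical output sample rate in Hz
SAMPLES_PER_CODE = 480  # hypothetical: each code covers 20 ms of audio


def fake_speechcodes(text: str) -> Iterator[int]:
    """Hypothetical stand-in for the autoregressive model: one code per character."""
    for ch in text:
        yield ord(ch) % 1024


def decode_chunk(codes: List[int]) -> List[float]:
    """Hypothetical stand-in for the streaming decoder: returns silence of the right length."""
    return [0.0] * (len(codes) * SAMPLES_PER_CODE)


def synthesize_stream(text: str, chunk_size: int = 8) -> Iterator[List[float]]:
    """Yield audio as soon as chunk_size codes exist, instead of waiting for the full utterance."""
    buffer: List[int] = []
    for code in fake_speechcodes(text):
        buffer.append(code)
        if len(buffer) == chunk_size:
            yield decode_chunk(buffer)
            buffer = []
    if buffer:  # flush any codes left over at the end of the text
        yield decode_chunk(buffer)


def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time / audio duration; values below 1.0 are faster than real time."""
    return compute_seconds / audio_seconds


if __name__ == "__main__":
    for i, chunk in enumerate(synthesize_stream("Hello, world!", chunk_size=4)):
        # A real player could start playing this chunk before later chunks exist.
        print(f"chunk {i}: {len(chunk) / SAMPLE_RATE:.3f} s of audio")
```

The generator design is the point of the sketch: the caller receives the first chunk after only `chunk_size` codes, so perceived latency depends on the chunk size rather than on the length of the whole sentence.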
Amazon says that BASE TTS has already been integrated into its Alexa voice assistant and other products. The researchers believe that the model will drive further advances in text-to-speech technology and open up new possibilities for natural human-machine interaction.
It’s worth noting that BASE TTS is not the first model to demonstrate emergence: similar abilities have been widely reported in large language models such as OpenAI’s GPT-3. However, BASE TTS’s size and training dataset make it a significant breakthrough in the field of text-to-speech.
As AI technology continues to advance, emergence is likely to become a more common feature of AI models. This will greatly expand the range of applications for AI and provide new approaches to solving complex problems.
Source: https://www.ithome.com/0/750/680.htm