

**News Title:** “Amazon Builds the Largest Text-to-Speech Model to Date, with 980 Million Parameters and 100,000 Hours of Training Audio”

**Keywords:** Amazon, largest text-to-speech model, emergent capabilities

**News Content:**

**Amazon Makes History with the Largest Text-to-Speech Model**

E-commerce giant Amazon has announced that its research team has developed what it describes as the largest text-to-speech model to date. The model reportedly shows “emergent abilities,” meaning capabilities in understanding and generating speech that appear as the model is scaled up rather than being explicitly trained.

According to Amazon researchers, the new model is called BASE TTS, short for “Big Adaptive Streamable TTS with Emergent abilities.” It has 980 million parameters, which the researchers say is more than any previous model of its kind. It was trained on 100,000 hours of audio recordings sourced from public websites, predominantly in English, a scale intended to make its synthesized speech broadly adaptable and realistic.

The work is described in a paper posted to the arXiv preprint server, which covers the model’s development and training strategy. Beyond setting a new technical benchmark, BASE TTS could improve the user experience in voice interaction, smart assistants, and audiobooks.

The work underscores Amazon’s strong position in artificial intelligence and points to a future in which text-to-speech technology is more intelligent and natural, offering users worldwide a more realistic voice interaction experience.

Source: https://www.ithome.com/0/750/680.htm
