Zhejiang University Releases GTSinger: A Massive Multilingual High-Quality Singing Dataset

Author: 智能小编 (AI Editor)

October 15, 2024 #open, #Daily AI News

Introduction

The world of artificial intelligence is rapidly advancing, with applications extending to creative fields like music. One key aspect of music production is the ability to generate realistic and expressive singing voices. To facilitate this advancement, Zhejiang University has released GTSinger, a large-scale, open-source, high-quality singing dataset designed to support a wide range of vocal tasks.

A Multifaceted Dataset for Vocal AI

GTSinger stands out due to its comprehensive nature. It boasts 80.59 hours of professionally recorded singing data, encompassing nine languages: Mandarin, English, Japanese, Korean, Russian, Spanish, French, German, and Italian. The dataset features performances by 20 professional singers, providing a rich tapestry of vocal timbres and styles.

Beyond the Notes: Capturing Singing Techniques

GTSinger goes beyond simply capturing vocal sounds. It focuses on the nuances of singing technique, covering six common techniques, each with a corresponding control group and phoneme-level annotations. This granular level of detail allows researchers to model and manipulate vocal techniques with greater precision.
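
As a rough illustration of what phoneme-level technique annotation makes possible, the sketch below represents each phoneme as a timed segment carrying a set of technique labels and computes how much of a phrase is sung with a given technique. The class name, field names, and the labels "vibrato" and "falsetto" are assumptions chosen for illustration; they are not GTSinger's actual file format or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeAnnotation:
    """Hypothetical phoneme-level record; not GTSinger's real schema."""
    phoneme: str
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    techniques: frozenset = frozenset()  # technique labels active on this phoneme


def technique_coverage(phonemes, technique):
    """Return the fraction of total phoneme duration labelled with `technique`."""
    total = sum(p.end - p.start for p in phonemes)
    if total == 0:
        return 0.0
    labelled = sum(p.end - p.start for p in phonemes if technique in p.techniques)
    return labelled / total


# Illustrative phrase: three phonemes, the middle one sung with vibrato.
example = [
    PhonemeAnnotation("n", 0.0, 0.25),
    PhonemeAnnotation("i", 0.25, 0.75, frozenset({"vibrato"})),
    PhonemeAnnotation("h", 0.75, 1.0),
]

vibrato_share = technique_coverage(example, "vibrato")
```

With labels attached at this granularity, the same phrase recorded with and without a technique (the control group) could in principle be compared segment by segment rather than only at the level of whole recordings.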

Real-World Applications: Integrating Music Theory

The dataset also provides real musical scores that align with the vocal recordings, bridging the gap between vocal synthesis and practical music creation. This integration of music theory opens up possibilities for utilizing vocal AI in real-world musical compositions.

Adaptability for Diverse Vocal Tasks

GTSinger is designed to be versatile, catering to a range of vocal tasks such as:

Singing Synthesis: Generating synthetic singing voices with high fidelity and expressiveness.
Technique Recognition: Identifying and classifying different singing techniques within vocal performances.
Style Transfer: Modifying the style of a singing voice, for example, changing its timbre or emotional tone.
Speech-to-Singing Conversion: Transforming spoken language into singing voice.

Benchmarking and Evaluation

To further enhance its utility, GTSinger includes benchmark tests that assess the dataset’s performance and suitability for various vocal tasks. This allows researchers to compare different models and algorithms effectively.

Conclusion

GTSinger represents a significant contribution to the field of vocal AI. Its multilingual nature, emphasis on singing techniques, and integration with musical scores provide researchers and developers with a powerful tool for advancing the capabilities of vocal synthesis and manipulation. As AI continues to reshape the creative landscape, datasets like GTSinger will play a crucial role in unlocking new possibilities for music production and artistic expression.
