SoulX-Singer is an open-source AI model, released in mid-February 2026, that generates realistic singing from text and a melody. It can sing in voices it was never trained on, a capability known as zero-shot synthesis. The system comes from Soul AI Lab, working with Tianjin University and Northwestern Polytechnical University in China.
The model focuses on singing that sounds realistic and stays controllable. It works in Mandarin, English, and Cantonese. Users can guide pitch, rhythm, and expression with melody curves or MIDI scores, so you are not locked into a single input style.
Unlike many text-to-speech tools trained on a single voice, this one adapts to new singers on the fly. It keeps audio quality high while doing so, and it aims to handle real-world use, not just lab tests.
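To make the input side more concrete, here is a minimal sketch of what a MIDI-style score plus a zero-shot reference clip could look like in code. The note structure, class names, and method calls below are assumptions for illustration only, not the project's actual API; consult the official repository for the real interface.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI note number, e.g. 60 = middle C
    start: float     # onset time in seconds
    duration: float  # note length in seconds
    lyric: str       # syllable or word sung on this note

# A short three-note melody with lyrics attached to each note.
score = [
    Note(pitch=67, start=0.0, duration=0.5, lyric="shin-ing"),
    Note(pitch=69, start=0.5, duration=0.5, lyric="bright"),
    Note(pitch=72, start=1.0, duration=1.0, lyric="to-night"),
]

# Hypothetical usage -- zero-shot means the model takes its timbre from a few
# seconds of reference audio it never saw during training. These names are
# placeholders, not the published API:
# model = SoulXSinger.from_pretrained("path/to/checkpoint")
# audio = model.synthesize(score=score, reference_audio="my_voice.wav", language="en")
# audio.save("output.wav")
```

In practice the melody could also be supplied as a continuous pitch curve rather than discrete notes, which is the other input style the project describes.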
Core features.
Extra tools.
Research background.
The team also shared a preprint titled "SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis." It describes training on more than 42,000 hours of singing data and sets up benchmarks for measuring zero-shot results. The design targets strong performance and expressive output at an industry level.
In short, SoulX-Singer blends control, sound quality, and open access. It focuses only on vocals and pushes zero-shot singing synthesis forward in a clear, hands-on way.
If you'd like to access the model, you can explore the following options: