When running inference on English-only text, hyphens and digits are read aloud in Chinese.
"text-to-speech" is read as "text jian to jian speech"
"CosyVoice2, Fish-Speech" is read as "cose voi ker, fish jian speech"
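Until this is fixed in the model's text front end, one possible workaround is to normalize the English text before passing it to inference. The sketch below is only an illustration of that idea: `normalize_english` is a hypothetical helper, not part of IndexTTS, and it assumes that replacing hyphens with spaces and spelling out digits one by one is acceptable for the input. I have not verified that this changes the synthesized audio.

```python
import re

# Hypothetical helper (not an IndexTTS API): English words for single digits.
_DIGIT_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def normalize_english(text: str) -> str:
    """Replace in-word hyphens with spaces and spell out ASCII digits."""
    # A hyphen between two word characters becomes a space.
    text = re.sub(r"(?<=\w)-(?=\w)", " ", text)
    # Each ASCII digit becomes its English word, padded with spaces.
    text = re.sub(r"[0-9]", lambda m: f" {_DIGIT_WORDS[int(m.group())]} ", text)
    # Collapse runs of whitespace introduced by the padding.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove any space left in front of punctuation.
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    return text


if __name__ == "__main__":
    print(normalize_english("text-to-speech"))
    print(normalize_english("CosyVoice2, Fish-Speech"))
```

Spelling digits one at a time is a deliberate simplification: multi-digit numbers such as "2024" would come out as "two zero two four", so a fuller number-to-words step would be needed for those.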
Input text
IndexTTS is a GPT-style text-to-speech (TTS) model mainly based on XTTS and Tortoise. Trained on tens of thousands of hours of data, our system achieves state-of-the-art performance, outperforming current popular TTS systems such as XTTS, CosyVoice2, Fish-Speech, and F5-TTS.
Prompt audio (English)
https://jmp.sh/GjWuuxZ6
Output
https://jmp.sh/7Ux3qcsW
Attachments
wav.zip