Meta unveiled Voicebox, an AI model that generates synthetic speech from text prompts. The model was trained on a diverse dataset of more than 50,000 hours of unfiltered speech from audiobooks in multiple languages. It can generate conversational-style audio clips, and Meta claims that speech recognition models trained on Voicebox-generated speech perform almost as well as models trained on real speech, with only a 1% degradation in error rate.