Meta model generates synthetic voices

Meta unveiled Voicebox, an AI model that generates synthetic voices from text prompts.

The model was trained on a diverse dataset of more than 50,000 hours of unfiltered speech from audiobooks in multiple languages.

The system can generate conversational-style speech clips. According to Meta, speech recognition models trained on Voicebox-generated speech perform almost as well as models trained on real speech, with only a 1% degradation in error rate.

Voicebox can also edit audio clips, removing unwanted noise and replacing misspoken words.
It relies on a new training method called Flow Matching, which Meta says allows Voicebox to outperform existing systems in intelligibility and audio similarity.
Meta has not released the Voicebox app or its source code to the public, possibly due to concerns about potential misuse.
