Meta model generates synthetic voices

 


Meta unveiled Voicebox, an AI model that generates synthetic voices from text prompts. 

The model was trained on a diverse dataset of more than 50,000 hours of unfiltered audiobook speech spanning multiple languages.

The system can generate conversational-style speech clips. Meta also reports that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech, with only a 1% degradation in error rate.

  • Voicebox can also edit audio clips, removing unwanted noise and replacing misspoken words.
  • It is built on a new training method called Flow Matching, and Meta reports that Voicebox outperforms existing systems in intelligibility and audio similarity.
  • Meta has not released the Voicebox model or its source code to the public, possibly because of concerns about potential misuse.
