Amphion Text-to-Speech (TTS) Demo

Evaluations

Comprehensive objective metrics for the generated speech can be computed with the Amphion evaluation pipeline. The evaluation metrics in Amphion include:

- F0 Modeling: F0 Pearson Coefficients, F0 Periodicity Root Mean Square Error, F0 Root Mean Square Error, Voiced/Unvoiced F1 Score, etc.
- Energy Modeling: Energy Root Mean Square Error, Energy Pearson Coefficients, etc.
- Speaker Similarity: Cosine Similarity
- Spectrogram Distortion: Fréchet Audio Distance (FAD), Mel Cepstral Distortion (MCD), Multi-Resolution STFT Distance (MSTFT), Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), etc.
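Several of the metrics listed above have short closed-form definitions. The following is a minimal NumPy sketch of a few of them; it is not Amphion's implementation, and the function names are hypothetical. It assumes F0 contours are frame-aligned arrays in Hz with 0 marking unvoiced frames, that speaker embeddings come from some external speaker-verification model, and that mel-cepstra are already time-aligned (for example by DTW) with the energy coefficient c0 in column 0.

```python
import numpy as np


def f0_rmse(f0_ref, f0_gen):
    """F0 RMSE in Hz over frames that are voiced in both contours (0 = unvoiced)."""
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_gen = np.asarray(f0_gen, dtype=float)
    voiced = (f0_ref > 0) & (f0_gen > 0)
    if not voiced.any():
        return float("nan")
    diff = f0_ref[voiced] - f0_gen[voiced]
    return float(np.sqrt(np.mean(diff ** 2)))


def f0_pearson(f0_ref, f0_gen):
    """Pearson correlation of F0 over frames voiced in both contours."""
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_gen = np.asarray(f0_gen, dtype=float)
    voiced = (f0_ref > 0) & (f0_gen > 0)
    if voiced.sum() < 2:
        return float("nan")
    return float(np.corrcoef(f0_ref[voiced], f0_gen[voiced])[0, 1])


def vuv_f1(f0_ref, f0_gen):
    """F1 score of voiced/unvoiced decisions, treating 'voiced' as the positive class."""
    ref_v = np.asarray(f0_ref, dtype=float) > 0
    gen_v = np.asarray(f0_gen, dtype=float) > 0
    tp = int(np.sum(ref_v & gen_v))
    fp = int(np.sum(~ref_v & gen_v))
    fn = int(np.sum(ref_v & ~gen_v))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mcd(mcep_ref, mcep_gen):
    """Mel Cepstral Distortion in dB, averaged over aligned frames, excluding c0."""
    diff = np.asarray(mcep_ref, dtype=float)[:, 1:] - np.asarray(mcep_gen, dtype=float)[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

Choices such as scoring F0 only on jointly voiced frames, reporting F0 error in Hz rather than cents, and dropping c0 from MCD vary between toolkits, so numbers from this sketch are not directly comparable with Amphion's reported results.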

Here are the results of the objective and subjective (Mean Opinion Score, MOS) evaluations of Amphion and several open-source systems.
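A MOS is the arithmetic mean of listener ratings, conventionally collected on the 1 to 5 absolute category rating scale described in ITU-T P.800. The sketch below computes a MOS together with an approximate 95% confidence interval; the normal-approximation interval and the function name `mos_with_ci` are assumptions for illustration, not a description of how the reported scores were produced.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate confidence interval).

    Assumes independent ratings and a normal approximation; z=1.96 gives ~95%.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    mos = float(mean(ratings))
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Example: five listeners rating one system.
mos, ci = mos_with_ci([4, 5, 3, 4, 4])
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a MOS gap between two systems is larger than the rating noise.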

Samples

Here are TTS samples generated by Amphion and several open-source systems.
