Amphion
Text-to-Speech (TTS) Demo
Evaluations
The generated speech can be evaluated with a comprehensive set of objective metrics by following the
Amphion evaluation pipeline.
The evaluation metrics in Amphion include:
- F0 Modeling: F0 Pearson Coefficients, F0 Periodicity Root Mean Square Error, F0 Root Mean Square Error, Voiced/Unvoiced F1 Score, etc.
- Energy Modeling: Energy Root Mean Square Error, Energy Pearson Coefficients, etc.
- Speaker Similarity: Cosine similarity
- Spectrogram Distortion: Fréchet Audio Distance (FAD), Mel Cepstral Distortion (MCD), Multi-Resolution STFT Distance (MSTFT), Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), etc.
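As a rough illustration of a few of the metrics listed above, the following sketch computes F0 Root Mean Square Error, the F0 Pearson coefficient, the Voiced/Unvoiced F1 score, and cosine similarity with NumPy. It is not Amphion's evaluation code. The function names `f0_metrics` and `cosine_similarity` are hypothetical, the F0 contours are assumed to be frame-aligned with 0 marking unvoiced frames, and the speaker embeddings are assumed to be plain vectors produced by some speaker encoder.

```python
import numpy as np


def f0_metrics(f0_ref, f0_gen):
    """Compare two frame-aligned F0 contours in Hz.

    Assumption: a value of 0 marks an unvoiced frame in both contours.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_gen = np.asarray(f0_gen, dtype=float)
    if f0_ref.shape != f0_gen.shape:
        raise ValueError("F0 contours must be frame-aligned and equal in length")

    voiced_ref = f0_ref > 0
    voiced_gen = f0_gen > 0
    both_voiced = voiced_ref & voiced_gen
    if both_voiced.sum() < 2:
        raise ValueError("need at least two frames voiced in both contours")

    # RMSE and Pearson correlation over frames voiced in both signals.
    ref_v = f0_ref[both_voiced]
    gen_v = f0_gen[both_voiced]
    rmse = float(np.sqrt(np.mean((ref_v - gen_v) ** 2)))
    pearson = float(np.corrcoef(ref_v, gen_v)[0, 1])

    # Voiced/Unvoiced F1, treating "voiced" as the positive class.
    tp = int(both_voiced.sum())
    fp = int((~voiced_ref & voiced_gen).sum())
    fn = int((voiced_ref & ~voiced_gen).sum())
    vuv_f1 = 2 * tp / (2 * tp + fp + fn)

    return {"f0_rmse": rmse, "f0_pearson": pearson, "vuv_f1": vuv_f1}


def cosine_similarity(emb_a, emb_b):
    """Cosine similarity between two speaker embedding vectors."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))


if __name__ == "__main__":
    # Toy contours: the generated one is 2 Hz sharp and voices one extra frame.
    reference = [0, 100, 110, 120, 0]
    generated = [0, 102, 112, 122, 130]
    print(f0_metrics(reference, generated))
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

In practice the F0 contours would come from a pitch extractor run on the reference and generated audio, and the two signals would need to be time-aligned before a frame-wise comparison like this is meaningful.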
Here are the results of objective and subjective (Mean Opinion Score, MOS) evaluation for Amphion and
several open-source systems.
Samples
Here are some TTS samples from Amphion and several open-source systems.