Generative Tasks
Benchmarking Results for Generative Tasks
Low Bitrate
| Model | SE DNSMOS ↑ | SE dWER ↓ | SE SpkSim ↑ | SS DNSMOS ↑ | SS dWER ↓ | SS SpkSim ↑ | TTS UTMOS ↑ | TTS dWER ↓ |
|---|---|---|---|---|---|---|---|---|
| Discrete HuBERT | 3.33 | 15.47 | 0.824 | 3.52 | 80.86 | 0.840 | 3.24 | 2.55 |
| Discrete WavLM | 3.26 | 16.52 | 0.830 | 3.43 | 62.34 | 0.847 | 3.84 | 3.01 |
| Discrete Wav2Vec2 | 3.55 | 18.86 | 0.779 | 3.75 | 96.70 | 0.787 | 3.32 | 3.45 |
| EnCodec | 3.15 | 34.35 | 0.852 | 3.11 | 83.55 | 0.877 | 1.46 | 8.85 |
| DAC | 3.30 | 57.41 | 0.853 | 3.01 | 102.00 | 0.854 | 1.97 | 10.68 |
| SpeechTokenizer | 3.18 | 30.13 | 0.858 | 3.13 | 85.25 | 0.874 | 2.51 | 3.69 |
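As commonly defined, and assumed here, dWER is the word error rate of ASR transcripts of the generated audio measured against transcripts of the reference, so it tracks intelligibility loss rather than absolute recognition accuracy. The ASR system and text normalization behind these tables are not stated in this section; the minimal sketch below assumes transcripts are already available and applies only lowercasing and whitespace tokenization. `word_edits` and `dwer` are hypothetical names. Because insertions are counted against the reference length, the score is not capped at 100, which is consistent with entries such as 102.00 and 208 in the tables.

```python
def word_edits(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def dwer(reference_transcripts: list[str], generated_transcripts: list[str]) -> float:
    """Hypothetical corpus-level dWER in percent (ASR step not shown).

    Assumes normalization is just lowercasing + whitespace splitting.
    """
    if len(reference_transcripts) != len(generated_transcripts):
        raise ValueError("transcript lists must have the same length")
    edits = 0
    words = 0
    for ref_text, gen_text in zip(reference_transcripts, generated_transcripts):
        ref = ref_text.lower().split()
        hyp = gen_text.lower().split()
        edits += word_edits(ref, hyp)
        words += len(ref)
    return 100.0 * edits / max(words, 1)


print(dwer(["the cat sat"], ["the cat sat"]))     # 0.0
print(dwer(["the cat"], ["a dog sat down"]))      # 200.0: insertions push it past 100
```

Pooling edits and reference words over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score; whether the benchmark pools the same way is an assumption.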
Medium Bitrate
| Model | SE DNSMOS ↑ | SE dWER ↓ | SE SpkSim ↑ | SS DNSMOS ↑ | SS dWER ↓ | SS SpkSim ↑ | TTS UTMOS ↑ | TTS dWER ↓ |
|---|---|---|---|---|---|---|---|---|
| Discrete HuBERT | 3.48 | 12.62 | 0.875 | 3.70 | 66.29 | 0.891 | 3.80 | 3.40 |
| Discrete WavLM | 3.48 | 10.18 | 0.889 | 3.68 | 34.03 | 0.912 | 3.82 | 2.45 |
| Discrete Wav2Vec2 | 3.54 | 17.60 | 0.858 | 3.75 | 78.42 | 0.866 | 3.68 | 2.89 |
| EnCodec | 3.10 | 19.07 | 0.885 | 3.09 | 48.57 | 0.906 | 1.50 | 9.46 |
| DAC | 3.49 | 31.14 | 0.906 | 3.26 | 55.43 | 0.924 | 1.71 | 71.26 |
| SpeechTokenizer | 3.49 | 23.44 | 0.876 | 3.42 | 60.75 | 0.906 | 1.96 | 53.26 |
High Bitrate
| Model | SE DNSMOS ↑ | SE dWER ↓ | SE SpkSim ↑ | SS DNSMOS ↑ | SS dWER ↓ | SS SpkSim ↑ | TTS UTMOS ↑ | TTS dWER ↓ |
|---|---|---|---|---|---|---|---|---|
| EnCodec | 2.87 | 68.22 | 0.814 | 2.95 | 97.73 | 0.839 | N.C | N.C |
| DAC | 2.95 | 46.07 | 0.860 | 2.53 | 208 | 0.784 | N.C | N.C |
Continuous Baseline
| Model | SE DNSMOS ↑ | SE dWER ↓ | SE SpkSim ↑ | SS DNSMOS ↑ | SS dWER ↓ | SS SpkSim ↑ | TTS UTMOS ↑ | TTS dWER ↓ |
|---|---|---|---|---|---|---|---|---|
| SSL | 3.49 | 4.92 | 0.928 | 3.68 | 9.97 | 0.939 | 3.71 | 2.94 |
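The arrows in the column headers give each metric's direction (↑ higher is better, ↓ lower is better). To compare the best discrete tokenizer at each bitrate against the continuous SSL baseline, one can pick the per-column best while honoring that direction and skipping N.C cells. The sketch below hard-codes two columns copied from the tables (SE dWER and TTS UTMOS); the dictionary layout and the `best` helper are illustrative only and are not part of the benchmark's tooling. N.C entries are simply treated as missing (`None`).

```python
from typing import Optional

# SE dWER (lower is better), copied from the Low/Medium/High Bitrate tables.
SE_DWER: dict[str, dict[str, Optional[float]]] = {
    "low": {"Discrete HuBERT": 15.47, "Discrete WavLM": 16.52, "Discrete Wav2Vec2": 18.86,
            "EnCodec": 34.35, "DAC": 57.41, "SpeechTokenizer": 30.13},
    "medium": {"Discrete HuBERT": 12.62, "Discrete WavLM": 10.18, "Discrete Wav2Vec2": 17.60,
               "EnCodec": 19.07, "DAC": 31.14, "SpeechTokenizer": 23.44},
    "high": {"EnCodec": 68.22, "DAC": 46.07},
}
SSL_SE_DWER = 4.92  # Continuous Baseline, SE dWER

# TTS UTMOS (higher is better); N.C cells become None.
TTS_UTMOS: dict[str, dict[str, Optional[float]]] = {
    "low": {"Discrete HuBERT": 3.24, "Discrete WavLM": 3.84, "Discrete Wav2Vec2": 3.32,
            "EnCodec": 1.46, "DAC": 1.97, "SpeechTokenizer": 2.51},
    "medium": {"Discrete HuBERT": 3.80, "Discrete WavLM": 3.82, "Discrete Wav2Vec2": 3.68,
               "EnCodec": 1.50, "DAC": 1.71, "SpeechTokenizer": 1.96},
    "high": {"EnCodec": None, "DAC": None},
}
SSL_TTS_UTMOS = 3.71  # Continuous Baseline, TTS UTMOS


def best(scores: dict[str, Optional[float]],
         higher_is_better: bool) -> Optional[tuple[str, float]]:
    """Return (model, score) for the best non-missing entry, or None if all are missing."""
    valid = {model: s for model, s in scores.items() if s is not None}
    if not valid:
        return None
    pick = max if higher_is_better else min
    model = pick(valid, key=lambda m: valid[m])
    return model, valid[model]


if __name__ == "__main__":
    for bitrate, scores in SE_DWER.items():
        model, value = best(scores, higher_is_better=False)
        print(f"SE dWER, {bitrate}: {model} {value:.2f} "
              f"({value - SSL_SE_DWER:+.2f} vs SSL)")
    for bitrate, scores in TTS_UTMOS.items():
        result = best(scores, higher_is_better=True)
        if result is None:
            print(f"TTS UTMOS, {bitrate}: no reported value")
        else:
            model, value = result
            print(f"TTS UTMOS, {bitrate}: {model} {value:.2f} "
                  f"({value - SSL_TTS_UTMOS:+.2f} vs SSL)")
```

Passing the direction explicitly, instead of inferring it from the metric name, keeps the helper usable for every column in the tables.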