Discriminative Tasks
Low Bitrate
| Models/Tasks | ASR-En Clean (WER ↓) | ASR-En Other (WER ↓) | ASR-multiling Welsh (WER ↓) | ASR-multiling Basque (WER ↓) | ER (ACC ↑) | IC (ACC ↑) | KS (ACC ↑) | SI (ACC ↑) | SV (EER ↓) |
|---|---|---|---|---|---|---|---|---|---|
| Discrete HuBERT | 8.99 | 21.14 | 58.50 | 26.83 | 57.20 | 68.70 | 90.54 | 0.90 | 24.99 |
| Discrete WavLM | 11.72 | 27.56 | 60.37 | 28.63 | 59.80 | 73.40 | 97.94 | 0.70 | 26.02 |
| Discrete Wav2Vec2 | 12.14 | 28.65 | 66.30 | 32.25 | 57.80 | 74.10 | 96.16 | 0.40 | 33.53 |
| EnCodec | 52.37 | 77.04 | 92.01 | 58.20 | 44.70 | 31.50 | 86.00 | 58.30 | 17.40 |
| DAC | 63.96 | 83.61 | 94.86 | 66.29 | 49.20 | 22.10 | 81.00 | 45.10 | 20.62 |
| SpeechTokenizer | 19.77 | 43.12 | 76.67 | 47.92 | 49.10 | 57.90 | 95.09 | 47.40 | 20.41 |
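Because the columns mix "lower is better" metrics (WER, EER) with "higher is better" ones (ACC), picking the strongest tokenizer per task means honoring each column's arrow. The sketch below does this for the low-bitrate rows, copied from the table above; the names `COLUMNS`, `LOWER_IS_BETTER`, `LOW_BITRATE` and `best_per_column` are illustrative helpers, not part of the benchmark's own tooling.

```python
# Low-bitrate results copied from the table above (hypothetical helper data).
# Column order: ASR-En Clean, ASR-En Other, Welsh, Basque, ER, IC, KS, SI, SV
COLUMNS = ["ASR-En Clean", "ASR-En Other", "Welsh", "Basque",
           "ER", "IC", "KS", "SI", "SV"]

# True where a lower value is better (the WER and EER columns, marked with ↓).
LOWER_IS_BETTER = [True, True, True, True, False, False, False, False, True]

LOW_BITRATE = {
    "Discrete HuBERT":   [8.99, 21.14, 58.50, 26.83, 57.20, 68.70, 90.54, 0.90, 24.99],
    "Discrete WavLM":    [11.72, 27.56, 60.37, 28.63, 59.80, 73.40, 97.94, 0.70, 26.02],
    "Discrete Wav2Vec2": [12.14, 28.65, 66.30, 32.25, 57.80, 74.10, 96.16, 0.40, 33.53],
    "EnCodec":           [52.37, 77.04, 92.01, 58.20, 44.70, 31.50, 86.00, 58.30, 17.40],
    "DAC":               [63.96, 83.61, 94.86, 66.29, 49.20, 22.10, 81.00, 45.10, 20.62],
    "SpeechTokenizer":   [19.77, 43.12, 76.67, 47.92, 49.10, 57.90, 95.09, 47.40, 20.41],
}


def best_per_column(table):
    """Return {column name: best model}, using min for ↓ columns and max for ↑ columns."""
    best = {}
    for i, name in enumerate(COLUMNS):
        pick = min if LOWER_IS_BETTER[i] else max
        # pick() runs immediately, so the lambda sees the current value of i.
        best[name] = pick(table, key=lambda model: table[model][i])
    return best


print(best_per_column(LOW_BITRATE)["SI"])  # prints "EnCodec"
```

Applied to the low-bitrate rows, the semantic SSL tokenizers win every ASR column, while EnCodec leads on both speaker tasks (SI and SV).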
Medium Bitrate
| Models/Tasks | ASR-En Clean (WER ↓) | ASR-En Other (WER ↓) | ASR-multiling Welsh (WER ↓) | ASR-multiling Basque (WER ↓) | ER (ACC ↑) | IC (ACC ↑) | KS (ACC ↑) | SI (ACC ↑) | SV (EER ↓) |
|---|---|---|---|---|---|---|---|---|---|
| Discrete HuBERT | 7.91 | 18.95 | 54.77 | 23.63 | 62.10 | 70.50 | 94.69 | 67.40 | 15.71 |
| Discrete WavLM | 8.52 | 20.35 | 54.22 | 22.06 | 57.60 | 78.00 | 98.09 | 80.80 | 8.00 |
| Discrete Wav2Vec2 | 8.76 | 21.32 | 60.39 | 26.64 | 59.10 | 75.10 | 96.64 | 65.47 | 17.64 |
| EnCodec | 46.80 | 74.24 | 91.23 | 47.95 | 51.30 | 31.40 | 88.70 | 91.90 | 7.81 |
| DAC | 59.54 | 81.48 | 97.43 | 56.16 | 45.80 | 18.90 | 76.60 | 83.80 | 11.78 |
| SpeechTokenizer | 18.32 | 41.21 | 75.17 | 38.94 | 52.10 | 57.80 | 94.86 | 91.40 | 7.88 |
High Bitrate
| Models/Tasks | ASR-En Clean (WER ↓) | ASR-En Other (WER ↓) | ASR-multiling Welsh (WER ↓) | ASR-multiling Basque (WER ↓) | ER (ACC ↑) | IC (ACC ↑) | KS (ACC ↑) | SI (ACC ↑) | SV (EER ↓) |
|---|---|---|---|---|---|---|---|---|---|
| EnCodec | 45.18 | 72.56 | 93.40 | 87.65 | 46.40 | 19.60 | 83.60 | 92.81 | 7.18 |
| DAC | 99.53 | 99.38 | 99.40 | 99.68 | 46.00 | 15.70 | 75.20 | 85.61 | 10.89 |
Continuous Baseline
| Models/Tasks | ASR-En Clean (WER ↓) | ASR-En Other (WER ↓) | ASR-multiling Welsh (WER ↓) | ASR-multiling Basque (WER ↓) | ER (ACC ↑) | IC (ACC ↑) | KS (ACC ↑) | SI (ACC ↑) | SV (EER ↓) |
|---|---|---|---|---|---|---|---|---|---|
| SSL | 3.37 | 7.04 | 41.77 | 14.32 | 63.10 | 86.10 | 99.00 | 99.70 | 2.10 |
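One way to read the continuous SSL row is as a reference point: for each task, how far does a discrete tokenizer fall behind it? The sketch below computes a signed per-column gap, oriented so that a positive number always means "worse than the continuous baseline" regardless of the metric's direction. This gap is an illustrative reading aid assumed here, not a metric reported by the benchmark, and `gap_to_baseline`, `SSL_BASELINE` and `MEDIUM_BITRATE` are hypothetical names; the numbers are copied from the medium-bitrate and continuous-baseline tables.

```python
# Column order: ASR-En Clean, ASR-En Other, Welsh, Basque, ER, IC, KS, SI, SV
COLUMNS = ["ASR-En Clean", "ASR-En Other", "Welsh", "Basque",
           "ER", "IC", "KS", "SI", "SV"]

# True where a lower value is better (WER and EER columns).
LOWER_IS_BETTER = [True, True, True, True, False, False, False, False, True]

# Continuous SSL baseline row (copied from the table above).
SSL_BASELINE = [3.37, 7.04, 41.77, 14.32, 63.10, 86.10, 99.00, 99.70, 2.10]

# Two medium-bitrate rows (copied from the Medium Bitrate table).
MEDIUM_BITRATE = {
    "Discrete HuBERT": [7.91, 18.95, 54.77, 23.63, 62.10, 70.50, 94.69, 67.40, 15.71],
    "Discrete WavLM":  [8.52, 20.35, 54.22, 22.06, 57.60, 78.00, 98.09, 80.80, 8.00],
}


def gap_to_baseline(values, baseline=SSL_BASELINE):
    """Signed gap per column in metric points; positive means worse than the baseline."""
    gaps = {}
    for name, value, base, lower in zip(COLUMNS, values, baseline, LOWER_IS_BETTER):
        # For ↓ metrics a higher value is worse; for ↑ metrics a lower value is worse.
        gaps[name] = round(value - base if lower else base - value, 2)
    return gaps


print(gap_to_baseline(MEDIUM_BITRATE["Discrete WavLM"])["KS"])  # prints 0.91
```

For both rows every gap is positive, i.e. neither discrete tokenizer matches the continuous baseline on any task; the largest gap for Discrete WavLM is on speaker identification (18.9 accuracy points).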