Benchmark Overview
Public leaderboards allow researchers to keep track of state-of-the-art methods and encourage reproducible research.
We publicly release DASB as a modular code repository built on the popular SpeechBrain toolkit and licensed under Apache 2.0.
The package helps integrate and evaluate new audio tokenizers on speech tasks such as speech recognition, speaker identification, emotion recognition, keyword spotting, intent classification, speech enhancement, speech separation, and text-to-speech. It offers an interface for easy model integration and testing, and a protocol for comparing different audio tokenizers. The leaderboard ranks audio tokenizers under this shared protocol, so that results are directly comparable. For each task, we consider different downstream architectures and report the result of the best-performing one.
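The "report the best-performing architecture" rule can be illustrated with a minimal sketch. This is not DASB's actual code: the function name `best_architecture`, the `HIGHER_IS_BETTER` table, and the metric names are hypothetical, and the scores are made-up placeholders rather than benchmark results. The sketch assumes each task has a single metric whose direction (higher or lower is better) is known.

```python
from typing import Dict, Tuple

# Assumed metric directions (hypothetical, for illustration only):
# error rates are minimized, accuracies are maximized.
HIGHER_IS_BETTER = {"accuracy": True, "wer": False}


def best_architecture(scores: Dict[str, float], metric: str) -> Tuple[str, float]:
    """Return the (architecture, score) pair with the best score for `metric`."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Pick max or min depending on whether a higher score is better.
    pick = max if HIGHER_IS_BETTER[metric] else min
    name = pick(scores, key=scores.get)
    return name, scores[name]


# Placeholder numbers, NOT real benchmark results.
asr_scores = {"BiLSTM": 12.0, "ContextNet": 15.0}     # word error rate (%)
ks_scores = {"X-Vectors": 0.90, "ECAPA-TDNN": 0.95}   # accuracy

print(best_architecture(asr_scores, "wer"))
print(best_architecture(ks_scores, "accuracy"))
```

Keeping the metric direction explicit avoids silently picking the worst model when a task is scored with an error rate instead of an accuracy.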
⚡ Datasets and Recipes
Dataset | Task | Abbr. | 1st Architecture | 2nd Architecture | Dataset Link |
---|---|---|---|---|---|
LibriSpeech | Speech Recognition | ASR | BiLSTM | ContextNet | openslr.org/12 |
CommonVoice 17.0 | Speech Recognition | ASR-multiling | BiLSTM | Linear | commonvoice.mozilla.org/en/datasets |
VoxCeleb1 | Speaker Verification/Identification | SI-SV | ECAPA-TDNN | X-Vectors | robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html |
IEMOCAP | Emotion Recognition | ER | ECAPA-TDNN | Time-Pooling + Linear | sail.usc.edu/iemocap/ |
Speech Commands | Keyword Spotting | KS | X-Vectors | ECAPA-TDNN | tensorflow.org/datasets/catalog/speech_commands |
SLURP | Intent Classification | IC | BiLSTM + Linear | Time-Pooling + Linear | zenodo.org/record/4274930 |
VoiceBank | Speech Enhancement | SE | Conformer | CRDNN | datashare.ed.ac.uk/handle/10283/2791 |
Libri2Mix | Speech Separation | SS | Conformer | CRDNN | github.com/JorisCos/LibriMix |
LJSpeech | Text-to-Speech | TTS | Shallow Transformer | Deep Transformer | keithito.com/LJ-Speech-Dataset/ |
Explore Leaderboards