Benchmark Overview
We publicly release DASB as a modular code repository built on the popular SpeechBrain toolkit and licensed under Apache 2.0.
The package makes it easy to integrate and evaluate new audio tokenizers in the speech, music, and general audio domains.
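To make "integrating a tokenizer" concrete, the following minimal sketch shows the basic contract a discrete audio tokenizer satisfies: it maps a waveform to a sequence of discrete token ids and maps those ids back to audio. Everything here is hypothetical and is not the DASB or SpeechBrain API. The `AudioTokenizer` protocol, the toy `UniformQuantTokenizer` (plain uniform scalar quantization standing in for a learned neural codec), and the `reconstruction_mse` helper are illustrative names only.

```python
from typing import List, Protocol


class AudioTokenizer(Protocol):
    """Hypothetical tokenizer contract (illustrative, not the DASB API)."""

    def encode(self, waveform: List[float]) -> List[int]: ...

    def decode(self, tokens: List[int]) -> List[float]: ...


class UniformQuantTokenizer:
    """Toy tokenizer: uniform scalar quantization of samples in [-1, 1].

    Stands in for a learned neural codec purely for illustration.
    """

    def __init__(self, num_levels: int = 256) -> None:
        if num_levels < 2:
            raise ValueError("num_levels must be at least 2")
        self.num_levels = num_levels

    def encode(self, waveform: List[float]) -> List[int]:
        tokens = []
        for x in waveform:
            x = max(-1.0, min(1.0, x))  # clip to the valid amplitude range
            # Scale [-1, 1] onto [0, num_levels - 1] and round to the nearest level.
            tokens.append(round((x + 1.0) / 2.0 * (self.num_levels - 1)))
        return tokens

    def decode(self, tokens: List[int]) -> List[float]:
        # Map each level index back to its amplitude in [-1, 1].
        return [t / (self.num_levels - 1) * 2.0 - 1.0 for t in tokens]


def reconstruction_mse(tokenizer: AudioTokenizer, waveform: List[float]) -> float:
    """Mean squared error between a waveform and its encode/decode round trip."""
    if not waveform:
        raise ValueError("waveform must be non-empty")
    recon = tokenizer.decode(tokenizer.encode(waveform))
    return sum((a - b) ** 2 for a, b in zip(waveform, recon)) / len(waveform)


tok = UniformQuantTokenizer(num_levels=3)
print(tok.encode([-1.0, 0.0, 1.0]))  # [0, 1, 2]
```

A benchmark like DASB evaluates tokenizers by how well downstream models perform on their tokens, so a real evaluation would swap the toy quantizer for a neural codec and train task heads on its output rather than look only at reconstruction error.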
We consider a wide range of tasks, including speech recognition, speaker verification, emotion recognition, keyword spotting, intent classification, sound event classification, music genre classification, speech enhancement, speech separation, text-to-speech, and music and general sound source separation. For reliable evaluation, we apply extensive hyperparameter tuning, test two downstream architectures per task, and average results over multiple seeds.
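Averaging over seeds typically means reporting the mean and spread of each metric across independent training runs. The sketch below assumes one scalar score per seed and per downstream architecture. The function names and the example numbers are hypothetical placeholders, not DASB results, and the number of seeds DASB uses is not specified here.

```python
import statistics
from typing import Dict, Tuple


def aggregate_over_seeds(scores_by_seed: Dict[int, float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of one metric across seeds."""
    scores = list(scores_by_seed.values())
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


def summarize(results: Dict[str, Dict[int, float]]) -> Dict[str, str]:
    """Format 'mean ± std' for each downstream architecture (hypothetical helper)."""
    summary = {}
    for arch, runs in results.items():
        mean, std = aggregate_over_seeds(runs)
        summary[arch] = f"{mean:.2f} ± {std:.2f}"
    return summary


# Placeholder scores (e.g. WER in %) keyed by seed; not real DASB numbers.
print(summarize({"BiLSTM": {0: 10.0, 1: 12.0, 2: 14.0}}))
# {'BiLSTM': '12.00 ± 2.00'}
```

Reporting the standard deviation next to the mean shows whether a gap between two tokenizers is larger than ordinary seed-to-seed variation.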
🧾 Datasets and Recipes
Dataset | Task | 1st Architecture | 2nd Architecture | Dataset Link |
---|---|---|---|---|
LibriSpeech | Speech Recognition | BiLSTM | Branchformer | openslr.org/12 |
CommonVoice 17.0 | Speech Recognition | BiLSTM | Branchformer | commonvoice.mozilla.org/en/datasets |
VoxCeleb1 | Speaker Verification/Identification | ECAPA-TDNN | BiLSTM + Linear | robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html |
IEMOCAP | Emotion Recognition | ECAPA-TDNN | Time-Pooling + Linear | sail.usc.edu/iemocap/ |
Speech Commands | Keyword Spotting | ECAPA-TDNN | X-Vectors | tensorflow.org/datasets/catalog/speech_commands |
SLURP | Intent Classification | BiLSTM + Linear | Time-Pooling + Linear | zenodo.org/record/4274930 |
VoiceBank | Speech Enhancement | Conformer | CRDNN | datashare.ed.ac.uk/handle/10283/2791 |
Libri2Mix | Speech Separation | Conformer | CRDNN | github.com/JorisCos/LibriMix |
LibriTTS / LJSpeech | Text-to-Speech | VALL-E | Shallow Transformer | LibriTTS: openslr.org/60; LJSpeech: keithito.com/LJ-Speech-Dataset/ |
FUSS | Audio Source Separation | Conformer | CRDNN | github.com/google-research/sound-separation/blob/master/datasets/fuss/ |
MUSDB | Music Source Separation | Conformer | CRDNN | sigsep.github.io/datasets/musdb.html |
ESC50 | Sound Classification | ECAPA-TDNN | Linear | github.com/karolpiczak/ESC-50 |
GTZAN | Music Genre Classification | ECAPA-TDNN | Linear | huggingface.co/datasets/marsyas/gtzan |
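For scripting over the benchmark (for example, listing every dataset used for a given task), the table above can be transcribed into a plain lookup structure. This is a sketch only: `Recipe`, `RECIPES`, `datasets_for_task`, and `architectures_in_use` are hypothetical names, and this is not how the repository itself stores its recipes.

```python
from typing import Dict, List, NamedTuple, Tuple


class Recipe(NamedTuple):
    task: str
    architectures: Tuple[str, str]  # (1st, 2nd) downstream architecture


# Hypothetical lookup table transcribed from the "Datasets and Recipes" table.
RECIPES: Dict[str, Recipe] = {
    "LibriSpeech": Recipe("Speech Recognition", ("BiLSTM", "Branchformer")),
    "CommonVoice 17.0": Recipe("Speech Recognition", ("BiLSTM", "Branchformer")),
    "VoxCeleb1": Recipe("Speaker Verification/Identification", ("ECAPA-TDNN", "BiLSTM + Linear")),
    "IEMOCAP": Recipe("Emotion Recognition", ("ECAPA-TDNN", "Time-Pooling + Linear")),
    "Speech Commands": Recipe("Keyword Spotting", ("ECAPA-TDNN", "X-Vectors")),
    "SLURP": Recipe("Intent Classification", ("BiLSTM + Linear", "Time-Pooling + Linear")),
    "VoiceBank": Recipe("Speech Enhancement", ("Conformer", "CRDNN")),
    "Libri2Mix": Recipe("Speech Separation", ("Conformer", "CRDNN")),
    "LibriTTS / LJSpeech": Recipe("Text-to-Speech", ("VALL-E", "Shallow Transformer")),
    "FUSS": Recipe("Audio Source Separation", ("Conformer", "CRDNN")),
    "MUSDB": Recipe("Music Source Separation", ("Conformer", "CRDNN")),
    "ESC50": Recipe("Sound Classification", ("ECAPA-TDNN", "Linear")),
    "GTZAN": Recipe("Music Genre Classification", ("ECAPA-TDNN", "Linear")),
}


def datasets_for_task(task: str) -> List[str]:
    """Return the benchmark datasets used for a task, sorted by name."""
    return sorted(name for name, recipe in RECIPES.items() if recipe.task == task)


def architectures_in_use() -> List[str]:
    """Return every distinct downstream architecture, sorted by name."""
    return sorted({arch for recipe in RECIPES.values() for arch in recipe.architectures})


print(datasets_for_task("Speech Recognition"))  # ['CommonVoice 17.0', 'LibriSpeech']
```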
Explore Leaderboards