Benchmark Overview

We publicly release DASB as a modular code repository built on the popular SpeechBrain toolkit and licensed under Apache 2.0.


The package helps integrate and evaluate new audio tokenizers across the speech, music, and general audio domains.

We consider a wide range of tasks, including speech recognition, speaker verification, emotion recognition, keyword spotting, intent classification, sound event classification, music genre classification, speech enhancement, speech separation, text-to-speech, and music and general sound source separation.

For reliable evaluation, we apply extensive hyperparameter tuning, test two downstream architectures per task, and average results over multiple seeds.
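The seed-averaging step can be illustrated with a minimal sketch. It assumes each downstream architecture has been trained once per seed and that a single scalar metric was recorded for every run. The function name `summarize_runs` and the numbers are hypothetical and made up for illustration. They are not part of the DASB code base and are not benchmark results.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Aggregate per-seed scores for each downstream architecture.

    `scores` maps an architecture name to a list of metric values,
    one value per random seed. Returns {architecture: (mean, sample_std)}.
    This helper is hypothetical and only illustrates the protocol.
    """
    summary = {}
    for arch, values in scores.items():
        if not values:
            raise ValueError(f"no runs recorded for {arch!r}")
        # The sample standard deviation needs at least two runs;
        # report 0.0 when only one seed is available.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[arch] = (mean(values), spread)
    return summary


# Made-up metric values for one tokenizer: two architectures, three seeds each.
example_scores = {
    "BiLSTM": [10.0, 11.0, 12.0],
    "Branchformer": [8.0, 9.0, 10.0],
}

summary = summarize_runs(example_scores)
for arch, (avg, spread) in summary.items():
    print(f"{arch}: {avg:.2f} +/- {spread:.2f}")
```

Reporting the spread alongside the mean makes it easier to judge whether a gap between two tokenizers exceeds the seed-to-seed variation.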

🧾 Datasets and Recipes

| Dataset | Task | 1st Architecture | 2nd Architecture | Dataset Link |
|---|---|---|---|---|
| LibriSpeech | Speech Recognition | BiLSTM | Branchformer | openslr.org/12 |
| CommonVoice 17.0 | Speech Recognition | BiLSTM | Branchformer | commonvoice.mozilla.org/en/datasets |
| VoxCeleb1 | Speaker Verification/Identification | ECAPA-TDNN | BiLSTM + Linear | robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html |
| IEMOCAP | Emotion Recognition | ECAPA-TDNN | Time-Pooling + Linear | sail.usc.edu/iemocap/ |
| Speech Commands | Keyword Spotting | ECAPA-TDNN | X-Vectors | tensorflow.org/datasets/catalog/speech_commands |
| SLURP | Intent Classification | BiLSTM + Linear | Time-Pooling + Linear | zenodo.org/record/4274930 |
| VoiceBank | Speech Enhancement | Conformer | CRDNN | datashare.ed.ac.uk/handle/10283/2791 |
| Libri2Mix | Speech Separation | Conformer | CRDNN | github.com/JorisCos/LibriMix |
| LibriTTS / LJSpeech | Text-to-Speech | VALL-E | Shallow Transformer | LibriTTS: openslr.org/60 / LJSpeech: keithito.com/LJ-Speech-Dataset/ |
| FUSS | Audio Source Separation | Conformer | CRDNN | github.com/google-research/sound-separation/blob/master/datasets/fuss/ |
| MUSDB | Music Source Separation | Conformer | CRDNN | sigsep.github.io/datasets/musdb.html |
| ESC50 | Sound Classification | ECAPA-TDNN | Linear | github.com/karolpiczak/ESC-50 |
| GTZAN | Music Genre Classification | ECAPA-TDNN | Linear | huggingface.co/datasets/marsyas/gtzan |

Explore Leaderboards