Benchmark Overview

Public leaderboards allow researchers to keep track of state-of-the-art methods and encourage reproducible research.

We publicly release DASB as a modular code repository built on the popular SpeechBrain toolkit and licensed under Apache 2.0.


The package makes it straightforward to integrate and evaluate new audio tokenizers on a range of speech tasks: speech recognition (English and multilingual), speaker identification and verification, emotion recognition, keyword spotting, intent classification, speech enhancement, speech separation, and text-to-speech. It provides an interface for plugging in and testing new models, together with a standardized protocol for comparing audio tokenizers. Because every submission is evaluated under the same protocol, the leaderboard supports reliable comparisons between new tokenizers. For each task, we evaluate two downstream architectures (listed below) and report the one that performs best.
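As a rough illustration of what integrating a tokenizer involves, the sketch below assumes a minimal interface: a tokenizer maps a waveform to discrete token IDs (`encode`) and maps token IDs back to a waveform (`decode`). The class names `AudioTokenizer` and `UniformQuantizerTokenizer` are hypothetical and do not reflect DASB's actual API. The toy quantizer bins each sample into one of `num_levels` values and only stands in for a real learned tokenizer.

```python
from abc import ABC, abstractmethod
from typing import List


class AudioTokenizer(ABC):
    """Hypothetical interface: waveform <-> discrete token IDs (not DASB's actual API)."""

    @abstractmethod
    def encode(self, waveform: List[float]) -> List[int]:
        """Map a mono waveform (samples in [-1, 1]) to a sequence of token IDs."""

    @abstractmethod
    def decode(self, tokens: List[int]) -> List[float]:
        """Map token IDs back to an approximate waveform."""


class UniformQuantizerTokenizer(AudioTokenizer):
    """Toy tokenizer: per-sample uniform scalar quantization into `num_levels` bins."""

    def __init__(self, num_levels: int = 256):
        if num_levels < 2:
            raise ValueError("num_levels must be at least 2")
        self.num_levels = num_levels
        # Distance between adjacent reconstruction levels over [-1, 1].
        self.step = 2.0 / (num_levels - 1)

    def encode(self, waveform: List[float]) -> List[int]:
        tokens = []
        for x in waveform:
            x = max(-1.0, min(1.0, x))  # clip samples to the valid range
            tokens.append(round((x + 1.0) / self.step))  # nearest level index
        return tokens

    def decode(self, tokens: List[int]) -> List[float]:
        # Map each level index back to its amplitude in [-1, 1].
        return [t * self.step - 1.0 for t in tokens]
```

In this setting, a downstream model consumes the token IDs produced by `encode`; tasks that generate audio, such as enhancement, separation, or text-to-speech, would also need `decode` to turn predicted tokens back into a waveform.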


⚡ Datasets and Recipes

Dataset | Task | Abbr. | 1st Architecture | 2nd Architecture | Dataset Link
--- | --- | --- | --- | --- | ---
LibriSpeech | Speech Recognition | ASR | BiLSTM | ContextNet | openslr.org/12
CommonVoice 17.0 | Speech Recognition | ASR-multiling | BiLSTM | Linear | commonvoice.mozilla.org/en/datasets
VoxCeleb1 | Speaker Verification/Identification | SI-SV | ECAPA-TDNN | X-Vectors | robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html
IEMOCAP | Emotion Recognition | ER | ECAPA-TDNN | Time-Pooling + Linear | sail.usc.edu/iemocap/
Speech Commands | Keyword Spotting | KS | X-Vectors | ECAPA-TDNN | tensorflow.org/datasets/catalog/speech_commands
SLURP | Intent Classification | IC | BiLSTM + Linear | Time-Pooling + Linear | zenodo.org/record/4274930
VoiceBank | Speech Enhancement | SE | Conformer | CRDNN | datashare.ed.ac.uk/handle/10283/2791
Libri2Mix | Speech Separation | SS | Conformer | CRDNN | github.com/JorisCos/LibriMix
LJSpeech | Text-to-Speech | TTS | Shallow Transformer | Deep Transformer | keithito.com/LJ-Speech-Dataset/
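The reporting rule (evaluate two downstream architectures per task, keep the better one) can be sketched as follows. The task-to-architecture mapping is copied from the table above; the function name `best_architecture` and the `lower_is_better` flag are assumptions for illustration. Whether a metric improves as it decreases (for example, an error rate) or as it increases (for example, accuracy) depends on the task, so the caller must supply it.

```python
from typing import Dict, Tuple

# (1st, 2nd) downstream architectures per task, taken from the table above.
DOWNSTREAM_ARCHITECTURES: Dict[str, Tuple[str, str]] = {
    "ASR": ("BiLSTM", "ContextNet"),
    "ASR-multiling": ("BiLSTM", "Linear"),
    "SI-SV": ("ECAPA-TDNN", "X-Vectors"),
    "ER": ("ECAPA-TDNN", "Time-Pooling + Linear"),
    "KS": ("X-Vectors", "ECAPA-TDNN"),
    "IC": ("BiLSTM + Linear", "Time-Pooling + Linear"),
    "SE": ("Conformer", "CRDNN"),
    "SS": ("Conformer", "CRDNN"),
    "TTS": ("Shallow Transformer", "Deep Transformer"),
}


def best_architecture(
    task: str, scores: Dict[str, float], lower_is_better: bool
) -> Tuple[str, float]:
    """Hypothetical helper: return (architecture, score) of the better downstream head."""
    candidates = DOWNSTREAM_ARCHITECTURES[task]
    missing = [arch for arch in candidates if arch not in scores]
    if missing:
        raise KeyError(f"no score for {missing} on task {task}")
    # Error-style metrics pick the minimum; accuracy-style metrics pick the maximum.
    pick = min if lower_is_better else max
    arch = pick(candidates, key=lambda a: scores[a])
    return arch, scores[arch]
```

Keeping the metric direction as an explicit argument avoids hard-coding an assumption about which metric each task reports.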
