Benchmark Overview

Public leaderboards allow researchers to keep track of state-of-the-art methods and encourage reproducible research.

We publicly release DASB as a modular code repository built on the popular SpeechBrain toolkit and licensed under Apache 2.0.

The package helps integrate and evaluate new audio tokenizers on speech tasks: speech recognition, speaker identification, emotion recognition, keyword spotting, intent classification, speech enhancement, speech separation, and text-to-speech. It offers an interface for integrating and testing models, and a common protocol for comparing audio tokenizers. For each task we consider different downstream architectures (two per dataset in the table below) and report the best-performing one, so that leaderboard results do not hinge on a single choice of downstream model.

⚡ Datasets and Recipes

Dataset	Task	Abbr.	1st Architecture	2nd Architecture	Dataset Link
LibriSpeech	Speech Recognition	ASR	BiLSTM	ContextNet	openslr.org/12
CommonVoice 17.0	Speech Recognition	ASR-multiling	BiLSTM	Linear	commonvoice.mozilla.org/en/datasets
VoxCeleb1	Speaker Verification/Identification	SI-SV	ECAPA-TDNN	X-Vectors	robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html
IEMOCAP	Emotion Recognition	ER	ECAPA-TDNN	Time-Pooling + Linear	sail.usc.edu/iemocap/
Speech Commands	Keyword Spotting	KS	X-Vectors	ECAPA-TDNN	tensorflow.org/datasets/catalog/speech_commands
SLURP	Intent Classification	IC	BiLSTM + Linear	Time-Pooling + Linear	zenodo.org/record/4274930
VoiceBank	Speech Enhancement	SE	Conformer	CRDNN	datashare.ed.ac.uk/handle/10283/2791
Libri2Mix	Speech Separation	SS	Conformer	CRDNN	github.com/JorisCos/LibriMix
LJSpeech	Text-to-Speech	TTS	Shallow Transformer	Deep Transformer	keithito.com/LJ-Speech-Dataset/
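
The selection rule described above (train each listed downstream architecture for a task, then report the best one) can be sketched in a few lines of Python. This is only an illustration, not code from the DASB repository: the `Recipe` class, the `best_architecture` helper, and the `higher_is_better` flag are hypothetical names, the metric direction for each task is an assumption (an error rate for ASR, an accuracy for ER), and the scores in the usage note are placeholders rather than benchmark results.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Recipe:
    """One row of the table above (hypothetical container, not a DASB class)."""
    dataset: str
    task: str
    architectures: Tuple[str, str]
    # Assumption: whether a larger metric value means a better model.
    higher_is_better: bool


# Two rows copied from the table; the metric directions are assumptions.
RECIPES: Dict[str, Recipe] = {
    "ASR": Recipe("LibriSpeech", "Speech Recognition",
                  ("BiLSTM", "ContextNet"), higher_is_better=False),
    "ER": Recipe("IEMOCAP", "Emotion Recognition",
                 ("ECAPA-TDNN", "Time-Pooling + Linear"), higher_is_better=True),
}


def best_architecture(recipe: Recipe, scores: Dict[str, float]) -> Tuple[str, float]:
    """Return (architecture, score) for the best-performing architecture."""
    missing = [a for a in recipe.architectures if a not in scores]
    if missing:
        raise KeyError(f"no score reported for: {missing}")
    # Pick the maximum or minimum depending on the metric direction.
    pick = max if recipe.higher_is_better else min
    name = pick(recipe.architectures, key=lambda a: scores[a])
    return name, scores[name]
```

For example, calling `best_architecture(RECIPES["ASR"], {"BiLSTM": 10.0, "ContextNet": 12.5})` with these placeholder error rates selects `BiLSTM`, because the ASR entry is marked as lower-is-better. Keeping the metric direction next to the recipe avoids accidentally maximizing an error rate.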
