Get Started
DASB: Discrete Audio and Speech Benchmark
Overview of the Discrete Audio and Speech Benchmark (DASB) pipeline:
- DASB includes a diverse set of discrete audio encoders from all three categories: semantic (Discrete HuBERT, Discrete WavLM, Discrete Wav2Vec2), compression (EnCodec, DAC), and hybrid (SpeechTokenizer).
- DASB supports a wide range of discriminative tasks, including speech, speaker, and emotion recognition, keyword spotting, and intent classification. It also includes generative tasks, such as speech enhancement, speech separation, and text-to-speech.
- For a more reliable assessment, two different downstream architectures are considered for each task.
- Novel audio tokenizers can easily be evaluated on the DASB benchmark via reproducible and realistic evaluation protocols.
▶️ Quickstart
Running a single task
If you have a specific discrete model and want to benchmark it on a specific task, run the following command:
python LibriSpeech/ASR/LSTM/train_[tokenizer_name].py LibriSpeech/ASR/LSTM/hparams/train_[tokenizer_name].yaml --output_folder my-output-folder --data_folder mypath/to/LibriSpeech
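For instance, a run of the EnCodec tokenizer on LibriSpeech ASR with the LSTM probing head might look like the sketch below (it simply fills in the pattern above with encodec as the tokenizer name; the output and data folders are placeholders to adapt to your setup):

python LibriSpeech/ASR/LSTM/train_encodec.py LibriSpeech/ASR/LSTM/hparams/train_encodec.yaml --output_folder results/LibriSpeech-ASR-LSTM-encodec --data_folder /path/to/LibriSpeech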
Running multiple tasks
To run all tasks, make the following changes:
- Edit the run_discriminative_benchmark.sh and run_genarative_benchmark.sh files and modify the tokenizer-related values, such as the bitrate and the number of codebooks.
- Choose a set of tasks from the provided list and, for each task, select a downstream architecture from the available options (see list below).
- Update the variables defined in run_benchmark.sh with two lists of equal size. In the ConsideredTasks list, specify the tasks you want to run (e.g., 'LibriSpeechASR' 'LibriSpeechASR' 'IEMOCAP'). In the Downstreams list, specify the corresponding downstream architecture for each task (e.g., 'BiLSTM', 'contextnet', 'ecapa_tdnn'). For example, if you set ConsideredTasks=('LibriSpeechASR' 'LibriSpeechASR' 'IEMOCAP') and Downstreams=('BiLSTM' 'contextnet' 'ecapa_tdnn'), the benchmark will be executed as follows (see the sketch after this list):
  - LibriSpeechASR with BiLSTM as the probing head
  - LibriSpeechASR with contextnet as the probing head
  - IEMOCAP with ecapa_tdnn as the probing head
- Run the following commands:
bash run_discriminative_benchmark.sh [tokenizer_name]
bash run_genarative_benchmark.sh [tokenizer_name]
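For illustration, with the example above the two lists in run_benchmark.sh would read as follows (only these two variables are shown; the rest of the script is left unchanged):

ConsideredTasks=('LibriSpeechASR' 'LibriSpeechASR' 'IEMOCAP')
Downstreams=('BiLSTM' 'contextnet' 'ecapa_tdnn')

Note that Bash arrays are space-separated, so no commas go between the elements. The benchmark could then be launched with, e.g., bash run_discriminative_benchmark.sh encodec to evaluate the EnCodec tokenizer on those task/architecture pairs.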
You can also pass extra arguments, as long as they are consistent across all tasks.
For generative tasks, make sure to set the utmos_path required for TTS evaluation.
📝 Incorporating Your Audio Tokenizer
Let’s now assume you’ve designed an audio and speech tokenizer in PyTorch and wish to integrate it into our benchmark. You’re in luck because we’ve made this step as simple as possible for you! Here are the steps you should follow:
- Write your model’s code in a Python module saved in benchmarks/DASB/model (e.g., benchmarks/DASB/model/my_model.py).
- Create a YAML and a Python file for each task you want to experiment with. Thankfully, you don’t have to start from scratch. For example, if you’re working with LibriSpeech/ASR/contextnet, copy benchmarks/DASB/LibriSpeech/ASR/contextnet/hparams/train_encodec.yaml (and the corresponding training script) and save them in the same folder under a different name (e.g., train_my_model.yaml and train_my_model.py).
- Edit the relevant sections of your train_my_model.yaml and train_my_model.py. Redefine the codec: entry to reference your custom model (e.g., codec: !new:models.my_model.my_model).
- Ensure you include the hyperparameters specific to your model.
- Now, follow the instructions above to run experiments across tasks.
Note: If you’re not familiar with YAML, you can refer to our HyperPyYAML tutorial on the SpeechBrain website for guidance.
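To make the codec redefinition concrete, the corresponding section of train_my_model.yaml might end up looking like the sketch below. The codec: !new: line mirrors the example above; the hyperparameter names (num_codebooks, sample_rate) are placeholders, not part of DASB, and should be replaced with whatever arguments your model’s constructor actually takes:

codec: !new:models.my_model.my_model
    num_codebooks: 8    # placeholder hyperparameter
    sample_rate: 16000  # placeholder hyperparameter

With HyperPyYAML, the !new: tag instantiates the class at the given import path and passes the nested keys as constructor arguments, so your model-specific hyperparameters can be declared directly in the YAML file alongside the rest of the recipe.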