Peter Plantinga is a Postdoctoral Researcher at McGill University’s Department of Neurology and Neurosurgery, where his research leverages speech and audio data to develop biomarkers for neurodegenerative diseases. With a long-standing passion for applying AI to assistive technologies, Peter has published extensively on enhancing speech intelligibility in noisy environments for both human listeners and automated systems. He is a core developer of the open-source SpeechBrain toolkit, widely used in the speech processing and conversational AI communities, and previously led speech AI projects at JPMorganChase’s Machine Learning Center of Excellence, contributing to several patents in conversational AI technologies. Peter’s current work sits at the intersection of neuroscience and AI, aiming to advance the understanding and treatment of different neurological disorders through innovations in interpretable machine learning for voice analysis.
Continual learning in end-to-end automatic speech recognition (E2E-ASR) often suffers from catastrophic forgetting, where fine-tuning leads to significant performance degradation on previously seen data. While adapters offer a way to switch between fine-tuned models, they still underperform in unseen domains—a challenge when the input domain is unknown. We propose a method that reduces forgetting to just 3.4%, significantly outperforming fine-tuning strategies like LoRA, which exhibits a 49% forgetting rate. By linearly interpolating the parameters of multiple models fine-tuned from the same generalist model, we achieve a unified model that excels across diverse datasets. Moreover, this model can be iteratively fine-tuned and averaged while maintaining low forgetting rates. Our experiments demonstrate the robustness of this approach across various datasets and models, presenting a promising solution for continual learning in E2E-ASR.
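To make the core idea concrete, below is a minimal PyTorch sketch of linear parameter interpolation (weight averaging) across models fine-tuned from the same generalist checkpoint. The function name and the commented ASR model class are illustrative placeholders, not the exact recipe presented in the talk.

# Minimal sketch: average the parameters of several fine-tuned checkpoints
# that share one architecture and a common generalist initialization.
import torch

def average_state_dicts(state_dicts, weights=None):
    """Linearly interpolate parameters across checkpoints (uniform by default)."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        averaged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return averaged

# Usage (hypothetical paths and model class):
# ckpts = [torch.load(p, map_location="cpu") for p in ["ckpt_a.pt", "ckpt_b.pt"]]
# unified_model.load_state_dict(average_state_dicts(ckpts))

Because all checkpoints start from the same generalist model, their parameters stay close enough in weight space for this interpolation to yield a single model that performs well across the fine-tuning domains.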
Cem Subakan is an Assistant Professor in the Computer Science Department at Laval University, an Affiliate Assistant Professor at Concordia University, and an Associate Academic Member at Mila. His research focuses on machine learning for speech and audio, with a recent emphasis on explainable machine learning. He recently co-organized the Explainable AI for Speech and Audio workshop at ICASSP 2024 and will serve as a general chair of the IEEE MLSP 2025 conference.
He will discuss his recent work on generating explanations for audio models. While deep learning models excel at achieving high performance, they often function as black boxes, offering little transparency into their decision-making processes. His aim in this line of work is to develop methods that produce listenable explanations for these black-box audio models without compromising their original performance. Through several metrics, he demonstrates that the explanations generated by his approach remain faithful to the original model and are both listenable and understandable.
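As a rough illustration of what a listenable explanation can be, here is a generic masking-based sketch (one common strategy, not necessarily the method presented in the talk): optimize a time-frequency mask so that the masked spectrogram preserves the black-box model's prediction, then invert the masked spectrogram back to audio that can be played. The classifier and input file below are hypothetical placeholders.

# Generic sketch of a listenable, masking-based explanation for an audio model.
import torch
import torchaudio

class DummyClassifier(torch.nn.Module):          # stand-in for a black-box model
    def forward(self, spec):                     # spec: (1, freq, time)
        return torch.softmax(spec.mean(dim=-1), dim=-1)

classifier = DummyClassifier().eval()
spec_fn = torchaudio.transforms.Spectrogram(n_fft=512, power=None)
inv_fn = torchaudio.transforms.InverseSpectrogram(n_fft=512)

waveform, sr = torchaudio.load("example.wav")    # hypothetical input file
waveform = waveform.mean(dim=0, keepdim=True)    # force mono for simplicity
complex_spec = spec_fn(waveform)
magnitude = complex_spec.abs()

with torch.no_grad():
    target = classifier(magnitude).argmax(dim=-1)

mask = torch.nn.Parameter(torch.zeros_like(magnitude))
opt = torch.optim.Adam([mask], lr=0.05)
for _ in range(200):
    m = torch.sigmoid(mask)
    probs = classifier(m * magnitude)
    # Keep the original decision while suppressing as much of the input as possible.
    loss = -torch.log(probs[0, target]) + 0.1 * m.mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Listenable explanation: the masked spectrogram converted back to a waveform.
explanation = inv_fn((torch.sigmoid(mask) * complex_spec).detach())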
Mirco Ravanelli received his Ph.D. (with cum laude distinction) from the University of Trento, Italy, in December 2017. He is currently an Assistant Professor at Concordia University, Montreal, QC, Canada, an Adjunct Professor at the Université de Montréal, and a Mila Associate Member. He is the founder and leader of the SpeechBrain project, which aims to build an open-source toolkit for conversational AI and speech processing. He has authored or co-authored more than 80 papers on his research interests, which include deep learning and conversational AI, and he is an active member of the speech and machine learning communities.
Discrete audio tokens have recently gained considerable attention for their potential to connect audio and language processing, enabling the creation of modern multimodal large language models. Ideal audio tokens must effectively preserve phonetic and semantic content along with paralinguistic information, speaker identity, and other details. While several types of audio tokens have been recently proposed, identifying the optimal tokenizer for various tasks is challenging due to the inconsistent evaluation settings in existing studies. To address this gap, we release the Discrete Audio and Speech Benchmark (DASB), a comprehensive leaderboard for benchmarking discrete audio tokens across a wide range of discriminative tasks, including speech recognition, speaker identification and verification, emotion recognition, keyword spotting, and intent classification, as well as generative tasks such as speech enhancement, separation, and text-to-speech. Our results show that, on average, semantic tokens outperform compression tokens across most discriminative and generative tasks. However, the performance gap between semantic tokens and standard continuous representations remains substantial, highlighting the need for further research in this field.
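For readers unfamiliar with the distinction, "semantic" tokens are commonly obtained by clustering the hidden features of a self-supervised speech model into a discrete vocabulary, whereas "compression" tokens come from neural audio codecs trained for reconstruction. The sketch below assumes torchaudio's HuBERT bundle and scikit-learn k-means and is purely illustrative; it is not the DASB evaluation pipeline.

# Illustrative sketch: derive discrete "semantic" tokens by clustering
# self-supervised speech features with k-means.
import torch
import torchaudio
from sklearn.cluster import KMeans

bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("example.wav")    # hypothetical input file
waveform = waveform.mean(dim=0, keepdim=True)    # force mono
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    features, _ = model.extract_features(waveform)
layer_feats = features[6].squeeze(0).numpy()     # one intermediate layer, (frames, dim)

# Fit a small codebook; in practice the k-means model is trained on a large corpus.
tokens = KMeans(n_clusters=100, n_init=10).fit_predict(layer_feats)
print(tokens[:20])                               # discrete tokens for the first frames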
Haibin Wu is a senior applied scientist at Microsoft. He received his Ph.D. from National Taiwan University, where he worked with Prof. Hung-yi Lee and Prof. Lin-shan Lee on machine learning and speech processing. His expertise spans speech foundation models, neural audio codecs, prompt engineering, speech LLMs, speech enhancement, and deepfake detection.
TBA
Martijn Bartelds is a Postdoctoral Scholar at Stanford University, advised by Dan Jurafsky. His research focuses on multilingual speech and language processing, with particular interests in understanding where language variety and dialect information is encoded in neural speech models, in benchmarking, and in model training. He received his PhD with the highest distinction from the University of Groningen, where his thesis was nominated for the university's best thesis award. He also received a prestigious NWO Rubicon fellowship and was a visiting researcher at Delft University of Technology and the University of Pennsylvania.
TBA
Pooneh Mousavi (she/her) is a computer science PhD student at Mila and Concordia University, supervised by Professor Mirco Ravanelli. She has a broad interest in deep learning for conversational AI. Her research focuses on discrete self-supervised learning for speech and audio, exploring its potential to bridge audio and language models. She is also one of the main contributors to the SpeechBrain project, a popular open-source conversational AI toolkit.
Website, Google Scholar, LinkedIn
Hiba Akhaddar (she/her) is a master’s student in Computer Science at Concordia University and Mila, supervised by Prof. Tristan Glatard and Prof. Mirco Ravanelli. Her interests center on applications of deep learning in the medical field; she works on detecting Parkinson’s disease and tracking its progression from speech.
Website, LinkedIn