Machine Listening for Environmental Sound Understanding

Machine listening is the auditory sibling of computer vision and aims to extract meaningful information from audio signals. It has the potential to provide valuable information for numerous tasks including understanding and improving the health of our cities (e.g., monitoring and mitigating noise pollution), natural environments (e.g., monitoring and conserving biodiversity), and more. In our research, we investigate problems all along the machine listening pipeline, from both human and technical perspectives, with the aim of building better tools for understanding sound at scale.

Audio Representation Learning

A graphical depiction of an audio encoder-decoder model.

We aim to learn compact, semantically rich representations from large amounts of unlabeled audio, which can then be transferred to related downstream tasks with limited labeled data. We investigate both methods for learning these representations and ways to use them effectively.
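As a minimal illustration of this transfer setup (a self-contained sketch, not our actual models): a hypothetical frozen encoder — here just a fixed random projection standing in for a network pretrained on unlabeled audio — maps inputs to compact embeddings, and a simple nearest-neighbour classifier then handles a downstream task from a handful of labeled examples. All names and dimensions below are illustrative.

```python
import math
import random

random.seed(0)

# Stand-in for a pretrained audio encoder: in practice this would be a
# neural network trained on large amounts of unlabeled audio; here it is
# a fixed random projection so the example runs on its own.
DIM_IN, DIM_EMB = 16, 8
PROJ = [[random.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_EMB)]

def encode(signal):
    """Map a raw feature vector to a compact embedding (frozen encoder)."""
    return [sum(w * x for w, x in zip(row, signal)) for row in PROJ]

def cosine(a, b):
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn_predict(query_emb, labeled, k=1):
    """Downstream task: k-nearest-neighbour classification on embeddings."""
    ranked = sorted(labeled, key=lambda e: cosine(query_emb, e[0]), reverse=True)
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Tiny labeled downstream dataset (toy feature vectors, not real audio).
siren = [1.0] * 8 + [0.0] * 8
birdsong = [0.0] * 8 + [1.0] * 8
labeled = [(encode(siren), "siren"), (encode(birdsong), "birdsong")]
```

The encoder stays frozen; only the cheap classifier touches the small labeled set, which is the appeal of the transfer setting when labels are scarce.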

Crowdsourced Audio Annotation and Quality Evaluation

A bunch of ears in a cloud.

To meet the needs of modern, data-hungry machine learning algorithms, audio researchers often turn to crowdsourcing to hasten and scale audio annotation and audio quality evaluation. We research best practices for crowdsourced audio annotation and quality evaluation, with the aim of increasing annotation quality, throughput, and user engagement.
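One recurring piece of any crowdsourcing pipeline is aggregating redundant annotations from multiple workers. The helper below is a generic majority-vote sketch (a common baseline, not our deployed tooling): it returns the consensus label for a clip along with the fraction of annotators who agreed, a crude per-item quality signal.

```python
from collections import Counter

def aggregate(annotations):
    """Majority-vote aggregation over one clip's crowdsourced labels.

    Returns (consensus_label, agreement), where agreement is the
    fraction of annotators who chose the consensus label.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)

# Example: three workers labeled the same urban sound clip.
clip_annotations = ["dog bark", "dog bark", "jackhammer"]
label, agreement = aggregate(clip_annotations)  # "dog bark", 2/3
```

Low-agreement clips can then be routed to more annotators or to expert review, trading throughput against quality.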

Human-Centered Audio Production Tools

Screenshot from SynthAssist.

Audio production tools aid audio producers and engineers in creating the audio content we hear in music, film, games, podcasts, and more. However, commercial audio production tools are often designed not for everyone but for professionals and serious hobbyists with particular knowledge, experience, and abilities. We research audio production tools that address users' needs left unmet by commercial developers. For example, to address the needs of novices, we reframe the controls to work within the interaction paradigms identified by research on how audio engineers and musicians communicate auditory concepts to each other: evaluative feedback, natural language, vocal imitation, and exploration. More recently, we have begun investigating how to address the audio production needs of deaf and hard-of-hearing users.

Music Information Retrieval

Graphic depicting a computer transcribing a trio of instruments.

Music information retrieval (MIR) aims to extract information from music and has applications in automatic music transcription, music recommendation, computational musicology, music generation, and more. We investigate new methods, data, and tools to support MIR tasks.
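Many MIR tasks, automatic transcription included, rest on simple shared building blocks. One standard example is mapping a detected fundamental frequency to a MIDI note number (MIDI note 69 is A4 at 440 Hz, with 12 notes per octave); the sketch below shows that conversion, not any particular transcription system of ours.

```python
import math

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number.

    Standard mapping: MIDI 69 = A4 = 440 Hz, 12 semitones per octave.
    """
    return 69 + 12 * math.log2(freq_hz / 440.0)

hz_to_midi(440.0)    # 69.0  (A4)
hz_to_midi(261.63)   # ~60   (C4, middle C)
```

A transcription system would round the fractional value to the nearest integer note, keeping the residual as a tuning or pitch-bend estimate.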