Crowdsourced Pairwise-Comparison for Source Separation Evaluation

Cartwright, M., Pardo, B., Mysore, G. Crowdsourced Pairwise-Comparison for Source Separation Evaluation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.


Automated objective methods of audio source separation evaluation are fast, cheap, and require little effort by the investigator. However, their output often correlates poorly with human quality assessments, and they typically require ground-truth (perfectly separated) signals to evaluate algorithm performance. Subjective multi-stimulus human ratings (e.g. MUSHRA) of audio quality are the gold standard for many tasks, but they are slow and require a great deal of effort to recruit participants and run listening tests. Recent work has shown that a crowdsourced multi-stimulus listening test can produce results comparable to lab-based multi-stimulus tests. While these results are encouraging, MUSHRA multi-stimulus tests are limited to evaluating 12 or fewer stimuli, and they require ground-truth stimuli for reference. In this work, we evaluate a web-based pairwise-comparison listening approach that promises to make listening tests faster and easier to conduct, while also addressing some of the shortcomings of multi-stimulus tests. Using audio source separation quality as our evaluation task, we compare our web-based pairwise-comparison listening test to both web-based and lab-based multi-stimulus tests. We find that pairwise-comparison listening tests perform comparably to multi-stimulus tests, but without many of their shortcomings.