It has been postulated that anchors in multi-stimulus listening tests for audio quality evaluation should have an item-independent quality, because listeners are likely to shift their rating scale if the quality of the anchor varies. However, expert listeners have a very stable internal rating scale, as shown by the repeatability of their results when they perform the same test multiple times, so they may keep to their usual scale even if the anchor varies. We find that listeners do not shift their rating scale by the full amount the anchor is shifted, but only by up to 60% of that amount. Nevertheless, this makes quantitative comparisons between different test results difficult, even if the anchor varies by only 5 MUSHRA points.