ACM Transactions on Applied Perception

Do Prosody and Embodiment Influence the Perceived Naturalness of Conversational Agents' Speech?

Abstract

For conversational agents' speech, either all possible sentences have to be prerecorded by voice actors or the required utterances can be synthesized. While synthesizing speech is more flexible and economical in production, it can also reduce the perceived naturalness of the agents, among other reasons because of mistakes at various linguistic levels. In our article, we are interested in the impact of adequate and inadequate prosody, here particularly in terms of accent placement, on the perceived naturalness and aliveness of the agents. We compare (1) inadequate prosody, as generated by off-the-shelf text-to-speech (TTS) engines with synthetic output; (2) the same inadequate prosody imitated by trained human speakers; and (3) adequate prosody produced by those speakers. The speech was presented either as audio-only or by embodied, anthropomorphic agents, to investigate the potential masking effect of a simultaneous visual representation of those virtual agents. To this end, we conducted an online study with 40 participants listening to four different dialogues, each presented in the three Speech levels and the two Embodiment levels. Results confirmed that adequate prosody in human speech is perceived as more natural (and the agents are perceived as more alive) than inadequate prosody in both human (2) and synthetic speech (1). Thus, it is not sufficient to just use a human voice for an agent's speech to be perceived as natural; what is decisive is whether the prosodic realisation is adequate. Furthermore, and surprisingly, we found no masking effect of speaker embodiment, since neither a human voice with inadequate prosody nor a synthetic voice was judged as more natural when a virtual agent was visible than in the audio-only condition. On the contrary, the human voice was even judged as less "alive" when accompanied by a virtual agent.
In sum, our results emphasize, on the one hand, the importance of adequate prosody for perceived naturalness, especially in terms of accents being placed on important words in the phrase, while showing, on the other hand, that the embodiment of virtual agents plays a minor role in the naturalness ratings of voices.
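
The factorial structure described in the abstract (three Speech levels crossed with two Embodiment levels, applied to four dialogues) can be enumerated with a short sketch. The condition labels and dialogue numbering below are hypothetical names chosen for illustration; the abstract does not state how stimuli were ordered or distributed across the 40 participants, so that part is deliberately left out.

```python
from itertools import product

# Hypothetical labels for the three Speech levels named in the abstract.
SPEECH_LEVELS = (
    "synthetic_inadequate",  # (1) TTS output with inadequate prosody
    "human_inadequate",      # (2) human speakers imitating the inadequate prosody
    "human_adequate",        # (3) human speakers with adequate prosody
)

# Hypothetical labels for the two Embodiment levels.
EMBODIMENT_LEVELS = ("audio_only", "embodied_agent")

# The abstract mentions four different dialogues; the numbering is an assumption.
DIALOGUES = (1, 2, 3, 4)

# Full crossing of Speech x Embodiment.
conditions = list(product(SPEECH_LEVELS, EMBODIMENT_LEVELS))

# Every dialogue rendered in every condition.
stimuli = [(dialogue, speech, embodiment)
           for dialogue in DIALOGUES
           for speech, embodiment in conditions]

print(len(conditions), "conditions,", len(stimuli), "dialogue-condition combinations")
```

Under these assumptions the crossing yields six conditions, and pairing each with the four dialogues yields 24 dialogue-condition combinations.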
