
Eye gaze for reference resolution in multimodal conversational interfaces.



Abstract

Multimodal conversational interfaces allow users to carry on a spoken dialogue with an artificial conversational agent while looking at a graphical display. The dialogue is used to accomplish purposeful tasks. Motivated by previous psycholinguistic findings, this dissertation investigates how eye gaze contributes to automated spoken language understanding in such a setting, specifically focusing on robust reference resolution---a process that identifies the referring expressions in an utterance and determines which entities these expressions refer to. As a part of this investigation, we attempt to model user focus of attention during human-machine conversation by utilizing the user's naturally occurring eye gaze. We study which eye gaze and auxiliary visual factors contribute to this model's accuracy. Among the various features extracted from eye gaze, fixation intensity has been shown to be the most indicative of attention. We combine user speech with this gaze-based attentional model in an integrated reference resolution framework. This framework fuses linguistic, dialogue, domain, and eye gaze information to robustly resolve the various kinds of referring expressions that occur during human-machine conversation. Our studies have shown that, within this framework, eye gaze can compensate for limited domain models and dialogue processing capability. We further extend the framework to handle recognized speech input acquired during situated dialogue within an immersive virtual environment. We utilize word confusion networks to model the set of alternative speech recognition hypotheses and incorporate these networks into the reference resolution framework. The empirical results indicate that incorporating eye gaze significantly improves reference resolution performance, especially when only limited domain model information is available to the reference resolution framework. The empirical results also indicate that modeling recognized speech via confusion networks, rather than the single best recognition hypothesis, leads to better reference resolution performance.
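The gaze-based attentional model described above rests on fixation intensity: how long the user's gaze dwells on each on-screen object around the time of an utterance. A minimal sketch of that idea follows; the `(object_id, start_time, end_time)` fixation schema, the window bounds, and the normalization into a salience score are illustrative assumptions, not the dissertation's actual formulation.

```python
from collections import defaultdict

def fixation_intensity(fixations, window_start, window_end):
    """Aggregate gaze fixation duration per on-screen object within a
    time window (e.g. the seconds surrounding a referring expression).

    `fixations` is a list of (object_id, start_time, end_time) tuples;
    this schema is hypothetical, chosen for illustration.
    """
    intensity = defaultdict(float)
    for obj, start, end in fixations:
        # Clip each fixation to the window and accumulate its overlap.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            intensity[obj] += overlap
    total = sum(intensity.values())
    # Normalize to a probability-like salience score over objects.
    return {obj: d / total for obj, d in intensity.items()} if total else {}

# Example: two fixations on "cup_1" and one on "plate_2" in a 2 s window;
# "cup_1" receives the highest salience.
fixations = [("cup_1", 0.1, 0.5), ("plate_2", 0.6, 0.9), ("cup_1", 1.0, 1.6)]
print(fixation_intensity(fixations, 0.0, 2.0))
```

A reference resolver could then combine such salience scores with linguistic and domain evidence when ranking candidate referents.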
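A word confusion network, as used above to represent alternative recognition hypotheses, can be pictured as a sequence of slots, each holding competing word hypotheses with posterior probabilities. The sketch below uses toy probabilities (not output of any real recognizer) and assumes one word per slot with independence between slots, a simplification for illustration.

```python
def phrase_posterior(network, phrase):
    """Posterior of a word sequence under a confusion network, assuming
    one word per slot and independence between slots (a simplification).

    `network` is a list of dicts mapping word hypotheses to posteriors.
    """
    prob = 1.0
    for slot, word in zip(network, phrase):
        prob *= slot.get(word, 0.0)
    return prob

# Toy network: the 1-best path is "the bread cup", yet the competing
# referring expression "the red cup" retains substantial probability,
# so reference resolution can weigh both against the visual context.
confusion_net = [
    {"the": 0.9, "a": 0.1},
    {"red": 0.4, "bread": 0.6},
    {"cup": 0.7, "cap": 0.3},
]
print(phrase_posterior(confusion_net, ["the", "red", "cup"]))
```

Scoring referring expressions over all paths of the network, rather than over the single best hypothesis alone, is what lets the framework recover referents that the 1-best transcription would miss.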

Bibliographic record

  • Author

    Prasov, Zahar.

  • Affiliation

    Michigan State University.

  • Degree grantor: Michigan State University.
  • Subject: Psychology, Cognitive; Computer Science.
  • Degree: Ph.D.
  • Year: 2011
  • Pagination: 168 p.
  • Total pages: 168
  • Format: PDF
  • Language: eng
  • CLC classification
  • Keywords
