Journal of VLSI Signal Processing

Speech and Language Processing for Multimodal Human-Computer Interaction



Abstract

In this paper, we describe our recent work at Microsoft Research, in the project codenamed Dr. Who, aimed at developing enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present in detail MiPad, the first Dr. Who application, which specifically addresses the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel alternative to the prevailing practice of pecking with tiny styluses or typing on minuscule keyboards on today's PDAs and smart phones. Although the implementation is not yet complete, the user study reported in this paper shows that speech and pen together have the potential to significantly improve the user experience. In this system-oriented paper, we describe the main components of MiPad, with a focus on robust speech processing and spoken language understanding. The MiPad components discussed in detail include: distributed speech recognition considerations in the speech processing algorithm design; a stereo-based speech feature enhancement algorithm used for noise-robust front-end speech processing; Aurora2 evaluation results for this front-end processing; speech feature compression (source coding) and error protection (channel coding) for distributed speech recognition in MiPad; HMM-based acoustic modeling for continuous speech recognition decoding; a unified language model integrating a context-free grammar and an N-gram model for speech decoding; schema-based knowledge representation for MiPad's personal information management task; a unified statistical framework that integrates speech recognition, spoken language understanding, and dialogue management; the robust natural language parser used in MiPad to process the speech recognizer's output; a machine-aided grammar learning and development process used for spoken language understanding in the MiPad task; Tap & Talk multimodal interaction and user interface design; back-channel communication and MiPad's error repair strategy; and finally, user study results that demonstrate the superior throughput achieved by the Tap & Talk multimodal interaction over the existing pen-only PDA interface. These user study results highlight the crucial role played by speech in enhancing the overall user experience in MiPad-like human-computer interaction devices.
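
To make the stereo-based feature enhancement idea mentioned above more concrete, the sketch below illustrates one simple way such a scheme can work: paired clean/noisy ("stereo") training features are used to learn region-dependent correction vectors in the noisy feature space, which are then applied to new noisy frames. This is a minimal, assumption-laden illustration in Python/NumPy, not the algorithm evaluated in the paper; all function names and parameters here are hypothetical.

```python
import numpy as np

def train_stereo_enhancer(noisy, clean, n_regions=4, n_iter=10, seed=0):
    """noisy, clean: (N, D) paired feature matrices (e.g. cepstral features).

    Partitions the noisy feature space with a crude k-means, then learns a
    per-region bias as the mean difference between clean and noisy frames.
    """
    rng = np.random.default_rng(seed)
    centers = noisy[rng.choice(len(noisy), size=n_regions, replace=False)].copy()
    labels = np.zeros(len(noisy), dtype=int)
    for _ in range(n_iter):
        d = ((noisy[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(n_regions):
            if np.any(labels == k):
                centers[k] = noisy[labels == k].mean(0)
    biases = np.zeros_like(centers)
    for k in range(n_regions):
        if np.any(labels == k):
            biases[k] = (clean[labels == k] - noisy[labels == k]).mean(0)
    return centers, biases

def enhance(frames, centers, biases):
    """Apply the bias of the nearest region to each noisy frame."""
    d = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return frames + biases[d.argmin(1)]

if __name__ == "__main__":
    # Toy usage with synthetic stereo data: clean features plus an offset as "noise".
    rng = np.random.default_rng(1)
    clean = rng.normal(size=(500, 13))                      # 13-dim features
    noisy = clean + 0.5 + 0.1 * rng.normal(size=clean.shape)
    centers, biases = train_stereo_enhancer(noisy, clean)
    enhanced = enhance(noisy, centers, biases)
    print(float(np.mean((noisy - clean) ** 2)),
          float(np.mean((enhanced - clean) ** 2)))
```

In practice, stereo-based enhancement of this general flavor replaces the hard region assignment with a probabilistic model of the noisy feature space, but the core idea, learning corrections from paired clean/noisy data and applying them in the front end before recognition, is the same.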
