Journal of VLSI Signal Processing

Speech and Language Processing for Multimodal Human-Computer Interaction



Abstract

In this paper, we describe our recent work at Microsoft Research, in the project codenamed Dr. Who, aimed at developing enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present in detail MiPad, the first Dr. Who application, which specifically addresses the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel alternative to the prevailing practice of pecking with tiny styluses or typing on minuscule keyboards on today's PDAs and smart phones. Although the implementation is not yet complete, the user study reported in this paper shows that speech and pen together have the potential to significantly improve the user experience. In this system-oriented paper, we describe the main components of MiPad, with a focus on robust speech processing and spoken language understanding. The MiPad components discussed in detail include: distributed speech recognition considerations in the speech processing algorithm design; a stereo-based speech feature enhancement algorithm used for noise-robust front-end speech processing; Aurora2 evaluation results for this front-end processing; speech feature compression (source coding) and error protection (channel coding) for distributed speech recognition in MiPad; HMM-based acoustic modeling for continuous speech recognition decoding; a unified language model integrating a context-free grammar and an N-gram model for speech decoding; schema-based knowledge representation for MiPad's personal information management task; a unified statistical framework that integrates speech recognition, spoken language understanding, and dialogue management; the robust natural language parser used in MiPad to process the speech recognizer's output; a machine-aided grammar learning and development process used for spoken language understanding in the MiPad task; Tap & Talk multimodal interaction and user interface design; back-channel communication and MiPad's error repair strategy; and finally, user study results that demonstrate the superior throughput achieved by the Tap & Talk multimodal interaction over the existing pen-only PDA interface. These user study results highlight the crucial role played by speech in enhancing the overall user experience in MiPad-like human-computer interaction devices.
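
To make the stereo-based feature enhancement idea mentioned above more concrete, the sketch below illustrates one simple way such a scheme can work: paired clean/noisy ("stereo") training features are used to learn region-dependent correction vectors in the noisy feature space, which are then applied to new noisy frames. This is a minimal, assumption-laden illustration in Python/NumPy, not the algorithm evaluated in the paper; all function names and parameters here are hypothetical.

```python
import numpy as np

def train_stereo_enhancer(noisy, clean, n_regions=4, n_iter=10, seed=0):
    """noisy, clean: (N, D) paired feature matrices (e.g. cepstral features).

    Partitions the noisy feature space with a crude k-means, then learns a
    per-region bias as the mean difference between clean and noisy frames.
    """
    rng = np.random.default_rng(seed)
    centers = noisy[rng.choice(len(noisy), size=n_regions, replace=False)].copy()
    labels = np.zeros(len(noisy), dtype=int)
    for _ in range(n_iter):
        d = ((noisy[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(n_regions):
            if np.any(labels == k):
                centers[k] = noisy[labels == k].mean(0)
    biases = np.zeros_like(centers)
    for k in range(n_regions):
        if np.any(labels == k):
            biases[k] = (clean[labels == k] - noisy[labels == k]).mean(0)
    return centers, biases

def enhance(frames, centers, biases):
    """Apply the bias of the nearest region to each noisy frame."""
    d = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return frames + biases[d.argmin(1)]

if __name__ == "__main__":
    # Toy usage with synthetic stereo data: clean features plus an offset as "noise".
    rng = np.random.default_rng(1)
    clean = rng.normal(size=(500, 13))                      # 13-dim features
    noisy = clean + 0.5 + 0.1 * rng.normal(size=clean.shape)
    centers, biases = train_stereo_enhancer(noisy, clean)
    enhanced = enhance(noisy, centers, biases)
    print(float(np.mean((noisy - clean) ** 2)),
          float(np.mean((enhanced - clean) ** 2)))
```

In practice, stereo-based enhancement of this general flavor replaces the hard region assignment with a probabilistic model of the noisy feature space, but the core idea, learning corrections from paired clean/noisy data and applying them in the front end before recognition, is the same.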
